TumblThree - A Tumblr Backup Application

TumblThree is the code rewrite of TumblTwo, a free and open source Tumblr blog backup application, using C# with WPF and the MVVM pattern. It uses the Win Application Framework (WAF). It downloads photo, video, audio and text posts from a given tumblr blog.

Features:

  • Source code at github (Written in C# using WPF and MVVM).
  • Multiple concurrent downloads of a single blog.
  • Multiple concurrent downloads of different blogs.
  • Internationalization support (currently available: en, zh, ru, de, fr).
  • A download queue.
  • Autosave of the queuelist.
  • Save, clear and restore the queuelist.
  • A clipboard monitor that detects blogname.tumblr.com urls in the clipboard (copy and paste) and automatically adds the blog to the bloglist.
  • A settings panel (change download location, turn preview off/on, define number of concurrent downloads, set the imagesize of downloaded pictures, set download defaults, enable portable mode, etc.).
  • Uses Windows proxy settings.
  • A bandwidth throttler.
  • An option to download a url list instead of the actual files.
  • Set a start time for an automatic download (e.g. at night).
  • An option to skip the download of a file if it has already been downloaded before in any currently added blog.
  • Uses SSL connections.
  • Preview of photos & videos.
  • Taskbar buttons and key bindings.
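
The url detection of the clipboard monitor mentioned in the feature list could look roughly like the following Python sketch. It is only an illustration, not TumblThree's actual C# code: the regular expression and the function name extract_blog_name are assumptions.

```python
import re

# Hypothetical pattern (an assumption of this sketch): an optional
# http(s):// prefix followed by "<blogname>.tumblr.com" and any path.
BLOG_URL = re.compile(
    r"^(?:https?://)?(?P<name>[a-z0-9][a-z0-9-]*)\.tumblr\.com(?:/.*)?$",
    re.IGNORECASE,
)


def extract_blog_name(clipboard_text):
    """Return the blog name if the text looks like a blog url, else None."""
    match = BLOG_URL.match(clipboard_text.strip())
    if match is None:
        return None
    name = match.group("name").lower()
    # "www.tumblr.com" is the main site (search, liked/by, ...), not a blog.
    return None if name == "www" else name
```

A monitor would call such a function whenever the clipboard content changes and add the blog only when a name is returned.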

Blog backup/download:

  • Download of photo, video (only tumblr.com hosted), text, audio, quote, conversation, link and question posts.
  • Download meta information for photo, video and audio posts.
  • Downloads inlined photos and videos (e.g. photos embedded in question&answer posts).
  • Download of _raw image files (original/higher resolution pictures).
  • Support for downloading Imgur, Gfycat, Webmshare, Mixtape, Lolisafe, Uguu, Catbox and SafeMoe linked files in tumblr posts.
  • Download of safe mode/NSFW blogs.
  • Can download only the original content of a blog and skip reblogged posts.
  • Can download only tagged posts.
  • Can download only specific blog pages instead of the whole blog.
  • Can download blog posts within a defined time span.
  • Can download hidden blogs (login required / dashboard blogs).
  • Can download password protected blogs (non-hidden blogs only).

Liked/by backup/download:

  • A downloader for downloading "liked by" photos and videos instead of a tumblr blog (e.g. https://www.tumblr.com/liked/by/wallpaperfx/) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download posts within a defined time span.

Tumblr search backup/download:

  • A downloader for downloading photos and videos from the tumblr search (e.g. http://www.tumblr.com/search/my+keywords).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download only specific blog pages instead of the whole blog.

Tumblr tag search backup/download:

  • A downloader for downloading photos and videos from the tumblr tag search (e.g. http://www.tumblr.com/tagged/my+keywords) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download posts within a defined time span.

Program Usage:

  • Extract the .zip file and run the application by double clicking TumblThree.exe.
  • Copy the url of any tumblr.com blog you want to back up into the textbox at the bottom left. Afterwards, click on 'Add Blog' on the right side of it.
  • Alternatively, if you copy (ctrl-c) a tumblr.com blog url from the address bar/text file, the clipboard monitor from TumblThree will detect it and automatically add the blog.
  • To start the download process, click on 'Crawl'. The application will regularly check for (new) blogs in the queue and start processing them, until you stop the application by pressing 'Stop'. So, you can either add blogs to the queue via 'Add to Queue' or double click/drag'n'drop first and then click 'Crawl', or you start the download process first and add blogs to the queue afterwards.
  • A light blue bar to the left of a blog in the queue indicates an actively downloading blog.
  • The blog manager on the left side also indicates the state of each blog. A red background shows an offline blog, a green background an actively crawling blog and a purple background an enqueued blog.
  • You can change the download location, the number of concurrent connections, the default backup settings for each newly added blog and various other settings in 'Settings'.
  • In the Details window you can view statistics of your blog and set blog specific options. Here you can choose which post types (photo, video, audio, text, conversation, quote, link) to download.
  • For downloading only tagged posts, follow these steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the tags in the Tags textbox in a comma separated list without the leading hash (#) sign. E.g. great big car,bears would search for images that are tagged for either a great big car or bears or both.
  • For downloading password protected blogs, follow these steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the password in the Password textbox.
  • For downloading hidden blogs (login required blogs), follow these steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the blog url.
  • For downloading liked photos and videos, follow these steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the blog url including the liked/by string in the url (e.g. https://www.tumblr.com/liked/by/wallpaperfx/).
    3. For downloading your own likes, make sure you've (temporarily) enabled the following options in your blog's settings (i.e. https://www.tumblr.com/settings/blog/yourblogname):
      1. Likes -> Share posts you like (to enable the publicly visible liked/by page)
      2. Visibility -> blog is explicit (to see/download NSFW likes)
  • For downloading photos and videos from the tumblr search, follow these steps:
    1. Add the search url including your key words separated by plus signs (+) in the url (e.g. https://www.tumblr.com/search/my+special+tags).
  • For downloading photos and videos from the tumblr tag search, follow these steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the search url including your tags separated by plus signs (+) in the url (e.g. https://www.tumblr.com/tagged/my+special+tags).
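
The tag filter described in the steps above (a comma separated list without the hash sign, where a post matches if it carries any of the tags) can be sketched as follows. This is a Python illustration under assumptions, not the application's own code: the function names are hypothetical, and the case-insensitive comparison follows the changelog entry from 2016-06-08.

```python
def parse_tags(text):
    """Split the content of the Tags textbox into normalized tags."""
    return [tag.strip().lower() for tag in text.split(",") if tag.strip()]


def post_matches(post_tags, wanted_tags):
    """True if the post carries at least one of the wanted tags."""
    normalized = {tag.strip().lower() for tag in post_tags}
    return any(tag in normalized for tag in wanted_tags)


# The example from the text: two tags, one of them containing spaces.
wanted = parse_tags("great big car,bears")
```

Note that a tag may contain spaces, so only the comma separates tags: a post tagged just "car" does not match "great big car".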

Key Mappings:

  • Currently mapped keys:
    • double click on a blog adds it to the queue
    • drag and drop of blogs from the manager (left side) to the queue
    • space -- start crawl
    • ctrl-space -- pause crawl
    • shift-space -- stop crawl
    • del -- remove blog from queuelist
    • shift-del -- remove blog from blogmanager
    • ctrl-shift-g -- manually trigger the garbage collection

Getting Started:

The default settings should cover most users. You should only have to change the download location and the kind of posts you want to download. For this, in the Settings (click on the Settings button in the lower panel of the main user interface) you might want to change:

  • General -> Download location: Specifies where to download the files. The default is a folder named Blogs next to TumblThree.exe.
  • Blog -> Settings applied to each blog upon addition:
    • Here you can set which posts newly added blogs will download by default. To change what an individual blog downloads, click on the blog in the main interface, select the Details tab on the right and change the settings. This separation lets you download different kinds of posts for different blogs. You can change the download settings for multiple existing blogs by selecting them with shift+left click for a range or ctrl-a for all of them.
    • Note: You might want to always select:
      • Download Reblogged posts: Downloads reblogs, not just original content of the blog author.

Settings you might want to change if the download speed is not satisfactory:

  • Connection -> Concurrent connections: Specifies the number of connections used for downloading posts. The number is shared between all actively downloading blogs.
  • Connection -> Concurrent video connections: Specifies the number of connections used for downloading tumblr video posts. The vt.tumblr.com host regularly closes connections if the number is too high. Thus, the maximum number of vt.tumblr.com connections can be specified here independently.
  • Connection -> Concurrent blogs: Number of blogs to download in parallel.
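
How two independent connection limits behave can be illustrated with a small Python sketch. It is not TumblThree's implementation: the function names are hypothetical, the example urls in the usage note are placeholders, and treating video connections as a pool completely separate from the general one is an assumption.

```python
import asyncio
from urllib.parse import urlsplit


class Counter:
    """Tracks how many transfers run at once and the peak value."""

    def __init__(self):
        self.active = 0
        self.peak = 0


async def transfer(limit, counter):
    async with limit:              # wait for a free connection slot
        counter.active += 1
        counter.peak = max(counter.peak, counter.active)
        await asyncio.sleep(0.01)  # stand-in for the actual network transfer
        counter.active -= 1


async def crawl(urls, concurrent_connections, concurrent_video_connections):
    """Run all transfers and return the peak (general, video) concurrency."""
    general = asyncio.Semaphore(concurrent_connections)
    video = asyncio.Semaphore(concurrent_video_connections)
    general_stats, video_stats = Counter(), Counter()
    tasks = []
    for url in urls:
        # Assumption: only the video host gets its own, smaller limit.
        if urlsplit(url).hostname == "vt.tumblr.com":
            tasks.append(transfer(video, video_stats))
        else:
            tasks.append(transfer(general, general_stats))
    await asyncio.gather(*tasks)
    return general_stats.peak, video_stats.peak
```

With, say, 20 photo urls and 10 video urls, asyncio.run(crawl(urls, 8, 2)) never has more than 8 photo and 2 video transfers in flight, no matter how many blogs contributed the urls.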

Most likely you don't have to change any of the other connection settings. In particular, settings you should never change, unless you're sure you know what you are doing:

  • Connection -> Limit Tumblr Api Connections: Leave this checkbox checked and do not change the corresponding values of 90 connections per 60 seconds. If you still change them, you might end up with offline blogs or missing downloads.
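
To see what a limit of 90 connections per 60 seconds means in practice, here is a minimal sliding-window limiter in Python. How TumblThree enforces the limit internally is not described here, so the sliding window, the class name and its methods are all assumptions of this sketch.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allows at most max_calls within any window of period seconds."""

    def __init__(self, max_calls=90, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of the recent calls

    def wait_time(self, now):
        """Seconds to wait before another call is allowed at time now."""
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # The window is full: wait until the oldest call drops out.
        return self.period - (now - self.calls[0])

    def record(self, now):
        """Remember that a call was made at time now."""
        self.calls.append(now)
```

After 90 calls at second 0, a caller asking at second 10 has to wait another 50 seconds. Raising the values makes the crawler faster only until the server starts rejecting requests, which is the situation the warning above describes.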

Further Insights:

  • Note: All the following files are stored in json format and can be opened in any editor.
  • Application settings are stored in C:\Users\Username\AppData\Local\TumblThree\.
  • You can use the portable mode (settings->general) to store the application settings in the same folder as the executable.
  • For each blog there is also a database (serialized class) file in the Index folder of the download location, named after the blog (blogname.tumblr). It stores blog-related information such as which files have been downloaded, the url of the blog and when it was added. This allows you to move your downloaded files (photos, videos, audio files) to a different location without interfering with the download process.
  • Some settings aren't hooked up to the graphical user interface. You can view all TumblThree settings by opening settings.json, located in C:\Users\Username\AppData\Local\TumblThree\, in any editor. Their names should be self-explanatory. Some notable settings for further fine-tuning the application include:
    • BufferSize: Sets the buffer size for downloading binary files (photos, videos) in multiples of 4KB. The default is 2MB, thus the BufferSize has a value of 512. Increasing this value reduces disk fragmentation, as more of the file is kept in memory before it gets written out to disk, but increases memory usage.
    • MaxNumberOfRetries: Sets the maximum number of retries if a tumblr server forcefully closes the connection. This might regularly happen on the tumblr video host (vt.tumblr.com) if too many connections were opened in parallel. After the limit is exhausted, the file is left truncated but is not registered as a successful download. Thus, the download can be resumed in the next crawl.
    • TumblrHosts: Contains a list of hosts which are tried for downloading _raw photos if the photo size is set to raw. If none of the hosts contains the _raw version, the actually scanned host is tried with the next lower resolution (1280).
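
The BufferSize arithmetic above (512 x 4KB = 2MB) and its effect on the number of disk writes can be checked with a short Python sketch. The helper names are hypothetical and the copy loop is only a stand-in for the real download code.

```python
import io

BLOCK = 4 * 1024  # the 4KB unit in which BufferSize is expressed


def buffer_bytes(buffer_size):
    """Convert the BufferSize setting (multiples of 4KB) into bytes."""
    return buffer_size * BLOCK


def copy_buffered(source, target, buffer_size=512):
    """Copy a stream in chunks of buffer_bytes(buffer_size); return the number of writes."""
    chunk_size = buffer_bytes(buffer_size)
    writes = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        target.write(chunk)
        writes += 1
    return writes


# Example: a 5MB payload copied once with the default and once with BufferSize 1.
payload = b"x" * (5 * 1024 * 1024)
default_writes = copy_buffered(io.BytesIO(payload), io.BytesIO(), 512)
small_writes = copy_buffered(io.BytesIO(payload), io.BytesIO(), 1)
```

For the 5MB payload the default of 512 needs 3 writes, while a BufferSize of 1 (4KB) needs 1280, which is the trade-off between memory usage and many small concurrent writes described above.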

Changelog:

2018-07-05:

  • Implements the Tumblr login process and cookie handling in code instead of relying on the Internet Explorer for the Tumblr login process.

2018-06-09:

  • Fixes hidden Tumblr blog download problems caused by the new Tumblr ToS.

2018-05-20:

  • Programmatically agrees to new ToS and GDPR.
  • Implements SVC authentication changes. The SVC service is used to display the dashboard blogs (i.e. hidden tumblr blogs). Changes in this internal Tumblr api prohibited TumblThree's access.
  • Saves the last post id in successful hidden tumblr downloads.
  • Improves the text parser of the tumblr api and tumblr svc data models. Separated the slug from the url as the data models are inconsistent. Separated the photoset urls from the photo urls. Moved the date information into a separate column.
  • Minor text changes of some user interface elements.

2018-04-18:

  • Updates the tumblr blog crawler and the hidden tumblr datamodel to reflect tumblr api changes that break blog download of previous TumblThree versions.

2018-02-28:

  • Can download only specific pages of hidden Tumblr blogs and in the tumblr search.
  • Improves the proxy settings. TumblThree now uses the default Windows (Internet Explorer) settings if not overridden within TumblThree.
  • Changes the behavior of the timeout value (Settings->Connection->Timeout). The timeout now applies to each file chunk of 4KB instead of the whole file download. Thus, it should better detect a stalled download or a dropped connection without canceling active downloads of larger files (e.g. videos).
  • Changes default timeout value (for new users) from 600s to 30s.
  • Fixes possible download of the same photo but with different resolutions. This happened if the _raw file download was interrupted (the timeout hit), then the same photo was queued for download with the _1280 resolution. If the blog was then subsequently queued again, the _raw file was downloaded next to the _1280 file.
  • Fixes reblog/original post detection in the tumblr hidden crawler.
  • Fixes the "check blog status during startup" option.
  • Fixes download of password protected tumblr blogs.
  • Adds Mixtape, Lolisafe, Uguu, Catbox and SafeMoe parser (thanks to bun-dev).
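
The per-chunk timeout from this release can be illustrated with a Python sketch. It is not the application's code: read_with_chunk_timeout and make_reader are hypothetical names, and the fake reader only simulates a connection that delivers one chunk after a fixed delay.

```python
import asyncio


async def read_with_chunk_timeout(read_chunk, timeout):
    """Collect chunks until read_chunk() returns b''; every read gets its own timeout."""
    data = bytearray()
    while True:
        # The timer restarts for every chunk, so a large but steadily
        # progressing download is never cancelled.
        chunk = await asyncio.wait_for(read_chunk(), timeout)
        if not chunk:
            return bytes(data)
        data.extend(chunk)


def make_reader(chunks, delay):
    """Fake connection (for illustration only): yields one chunk per call after a delay."""
    iterator = iter(chunks)

    async def read_chunk():
        await asyncio.sleep(delay)
        return next(iterator, b"")

    return read_chunk
```

A transfer of 40 chunks that each arrive after 0.01s completes with a timeout of 0.2s although the whole transfer takes longer than that, while a reader that stalls for a full second on one chunk raises asyncio.TimeoutError.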

2017-12-31:

  • Fixes a bug that released the video connection semaphore too often. That means the slider in the settings for limiting the video downloads didn't work at all. It should properly limit the connections to the vt.tumblr.com host and prevent incomplete video downloads now.
  • Includes a rewrite of the blog detection during blog addition. It should reduce latency if you mass add blogs by copying urls into the clipboard (ctrl-c). Offline blogs aren't added anymore.
  • Notifies the user when a connection timeout has occurred. The message states whether the timeout has occurred during downloading or crawling. If it happened during crawling, you might want to re-queue the blog at some point to grab missing posts. A connection timeout should only happen if your connection is wonky. You can decrease/increase the timeout in the settings (settings->connection).
  • You can now specify in the Details panel for each blog where its files should be downloaded. If the text box control is empty, the files are downloaded as in previous releases in the folder specified in the global download location (settings->general), plus the blog's name.
  • Imgur.com linked albums in tumblr posts are now entirely downloaded if enabled (details panel->external->download imgur). Previously, only directly linked images were detected.
  • Adds an option to load all blog databases into memory and compare each to-download binary file to all databases across TumblThree before downloading. If the file has already been downloaded in any blog before, the file is skipped and will not be counted as downloaded. You can enable this in the settings (settings->global).
  • Can add hidden tumblr blogs using the dashboard url (i.e. https://www.tumblr.com/dashboard/blog/blogtobackup).
  • Can add all blog types without the protocol prefix (i.e. wallpaperfx.tumblr.com, www.tumblr.com/search/cars).
  • Adds an option to enable a confirmation dialog before removing blogs (#186, #130, #98). It's off by default.
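
The cross-blog duplicate check from this release can be pictured as one set built from all blog databases. The following Python sketch assumes that files are compared by file name only (as the 2016-12-10 entry describes for the per-blog check); the function names and the example data in the usage note are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def file_name(url):
    """Return the last path segment of a url, ignoring host and query string."""
    return posixpath.basename(urlsplit(url).path)


def build_global_index(blog_databases):
    """Merge the downloaded file names of all blogs into one set."""
    index = set()
    for names in blog_databases.values():
        index.update(names)
    return index


def should_download(url, global_index):
    """False if any blog has already downloaded a file with this name."""
    return file_name(url) not in global_index
```

With databases such as {"blog-a": ["tumblr_abc_1280.jpg"], "blog-b": ["tumblr_def_raw.jpg"]}, a url ending in tumblr_abc_1280.jpg is skipped regardless of the host it is served from. Keeping all databases in memory is what makes the lookup cheap, at the cost of the additional memory usage.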

2017-11-17:

  • Adds support for downloading Imgur.com, Gfycat.com and Webmshare.com linked files in tumblr posts.
  • Improves downloading of tumblr liked/by photos and videos.

2017-10-20:

  • Restores bandwidth limiter functionality.

2017-10-13:

  • Changes the default _raw photo host.

2017-10-09:

  • Fixes crawler stop in hidden tumblr blog downloads.
  • Adds options to set the default blog settings for the download from time, download to time and tags in the settings menu.
  • Adds some Google Translate translations (ar, el, es, fa, fi, he, hi, it, ja, ko, no, pa, pl, pt, th, tr and vi).

2017-09-08:

  • Can download password protected blogs (non-hidden blogs only).
  • Minor UI updates.

2017-08-22:

2017-08-21:

  • French, Spanish and simplified Chinese translations.
  • Removes user interface lag during blog addition.
  • The buffer size for downloading binary files can now be set in the settings.json in multiples of 4KB. The variable is called BufferSize. The new default is 2MB, thus the BufferSize has a value of 512. Previously it was set to 4KB, but apparently Windows does not do any useful caching on NTFS if multiple writes are concurrent and async. Thus, this should reduce disk fragmentation.
  • Uses .NET Framework 4.6 now as it should be available for all supported Windows versions (Windows Vista and above).
  • Improved the selection handling in the details panel. If multiple blogs are selected, old values are now kept if they are the same for all blogs and changes are immediately reflected.
  • Audio file download support for tumblr and hidden tumblr blogs.
  • More code refactoring.

2017-07-03:

  • Can download hidden (login required/dashboard) blogs.

2017-06-30:

  • Improved performance and bugfixes.

2017-06-20:

  • Downloads high resolution (_raw) images.
  • Updated translations (German and Russian).
  • Applies changed settings immediately.

2017-06-04:

  • Sets the "date modified" of downloaded files in the Explorer to the post's time.
  • Can download single pages or ranges of blog pages.
  • Full screen media preview.
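
Setting a file's modification date to the post's time, as introduced in this release, amounts to one call in most languages. A Python sketch (the function name is hypothetical, and the temporary file and the timestamp are only there to make the example runnable):

```python
import os
import tempfile
from datetime import datetime, timezone


def set_modified_to_post_time(path, post_time):
    """Set a file's access and modification time to the post's timestamp."""
    stamp = post_time.timestamp()
    os.utime(path, (stamp, stamp))


# Usage with a throwaway file and an example post time:
with tempfile.NamedTemporaryFile(delete=False) as handle:
    path = handle.name
posted = datetime(2017, 6, 4, 12, 0, tzinfo=timezone.utc)
set_modified_to_post_time(path, posted)
modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
os.remove(path)
```

Afterwards, sorting the download folder by date in the file manager reflects the order in which the posts were published rather than the order in which they were downloaded.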

2017-05-20:

  • Option to skip reblogged posts.
  • Improves detection of inlined photos and videos in text posts (e.g. in answer posts).

2017-05-14:

  • Portable mode.
  • Downloads liked photos and videos.

2017-04-18:

  • Code refactoring.
  • Uses async/await in most of the code instead of tasks from the threadpool.
  • Uses a producer-consumer pattern for grabbing and downloading as the Tumblr api v1 is now rate limited.
  • Downloads are now resumable.
  • Data files are now saved as json instead of binary.
  • Reduced memory usage by separating out the downloaded file list and loading it only if needed.
  • Improves ui responsiveness.
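
The producer-consumer pattern mentioned in this release decouples the (rate limited) crawling of posts from the downloading of files. A minimal Python sketch with asyncio follows; TumblThree itself is written in C#, so the names, the queue size and the sentinel handling are assumptions for illustration only.

```python
import asyncio


async def crawler(pages, queue):
    """Producer: walks the blog pages and queues every media url found."""
    for page in pages:
        for url in page:
            await queue.put(url)
    await queue.put(None)  # sentinel: nothing more to come


async def downloader(queue, downloaded):
    """Consumer: takes urls off the queue until the sentinel arrives."""
    while True:
        url = await queue.get()
        if url is None:
            await queue.put(None)  # pass the sentinel on to the other consumers
            return
        await asyncio.sleep(0)     # stand-in for the actual download
        downloaded.append(url)


async def run(pages, consumers=3):
    queue = asyncio.Queue(maxsize=10)  # bounded, so the crawler cannot run far ahead
    downloaded = []
    await asyncio.gather(
        crawler(pages, queue),
        *(downloader(queue, downloaded) for _ in range(consumers)),
    )
    return downloaded
```

Because the downloaders only depend on the queue, the crawler can be slowed down to respect the api rate limit without leaving download connections idle as long as urls are queued.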

2017-01-08:

  • Improves the speed of the network code.
  • Adds an option to use a http proxy.
  • Downloads inline images of tumblr posts.
  • Added Russian translation.

2016-12-13:

  • Improves the ui scaling of the main window for smaller resolutions.
  • Prevents crawling of offline blogs.
  • If the same blog is in the queue multiple times and one instance is already active, any other free crawler task skips and removes the duplicate entry and proceeds to the next inactive blog in the queue.
  • Improved German translation.

2016-12-10:

  • The check for already downloaded files is now independent of the actual host and based entirely on the filename. It looks like the host/mirror does actually vary, which would result in downloading the file again since its url changed.
  • Add scrollbars to the settings window if the controls do not fit.
  • Safely replaces blog indexes. If there is an error (e.g. no disk space left) during the update of the index file, the old state should not be corrupted anymore.
  • Changes some colors and adds an alternate color for the blog manager.
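
The safe replacement of index files mentioned in this release is commonly done by writing to a temporary file and swapping it into place. The Python sketch below shows that general technique; it is an assumption that TumblThree works this way, the function name is hypothetical, and json is used only for brevity.

```python
import json
import os
import tempfile


def save_index(path, data):
    """Write the index to a temporary file first, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data reached the disk
        # os.replace swaps the file in a single step; if writing failed
        # above, the old index has not been touched.
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If the disk runs full or the data cannot be serialized, the exception is raised before os.replace is reached, so the previous index file stays complete and readable.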

2016-11-23:

  • Fixes application crashes which occurred by adding tumblr blogs without title or description.
  • Decreases determination time of already downloaded files for large blogs (>100,000 posts) by at least three orders of magnitude.

2016-11-22:

  • Creates more meta information (post id, reblog key, timestamp, tags, slug, title) of the posts, including image, video and audio types.
  • Fixes the progress calculation by adding the found duplicates to the progress. Also states them in the details window.
  • Fixes a locking issue for the meta files (*.txt) which resulted in incomplete downloads.
  • Updates the details and settings view for a better understanding on how to use the application.

2016-11-20:

  • Fixes the counting of downloaded files.
  • Fully implements the details window (context menus, etc.).

2016-11-18:

  • Fixes the initial automatic queue restore function.
  • Fixes the autodownload function.

2016-11-16:

  • Picture and video preview in the details window.
  • Can download text, audio, quote, conversation and link type posts.
  • Download of text, audio, quote, conversation and link posts and of .gif images is now configured per blog instead of by a global setting and can be turned on/off in the details view. The settings in the settings window are used as a template for newly added blogs.
  • Modified .tumblr index files are now always saved upon application exit, regardless of the crawler's state. Previously, if the application was closed during an active crawl, the index wasn't updated.
  • Inlined the WAF code under lib for easier project setup for newcomers that want to contribute code.
  • Bugfixes, UI and memory enhancements.

2016-10-15:

  • Bandwidth throttling.
  • Connection timeout settings.
  • Auto queue and start download function.
  • Saves the state of the UI (column size and order).
  • Download of hidden blogs.
  • Fixes saving of the ratings and tags.

2016-06-11:

  • Added German translation.

2016-06-10:

  • Support for tumblr.com hosted videos. Check the settings window to enable video download (default: off).

2016-06-08:

  • Tag crawling now works properly and is case-insensitive.
  • Fixed a crash caused by blogs with an image count of zero in the queue list (e.g. the blog is offline or the tag search didn't return any images).
  • Fixed randomly occurring crash in the clipboard monitor.
  • Changed icons (requested by the TumblOne creator).

2016-04-12:

  • Now with progress output in the Queue tab (during url crawling for imageurls -- the number of posts evaluated; during downloading -- the current image url).
  • Added missing resume button in the taskbar control.

2016-04-11:

  • Support for urls starting with https:
  • Fixes an application crash upon pressing the Stop button, caused by improper exception handling.
  • Now always saves the index file. Previously, if the application was closed while crawling was still active, it exited without waiting for the tasks to finish and save their state. Now there is a grace period for the tasks to finish. The same applied if the crawl was paused before exiting.

Download:

Comments

anonymous (not verified)
Sat, 05/11/2016 - 12:49

If you highlight a blog in the queuelist, perhaps it should also highlight in the bloglist

excellent functionality 9/10

BBiggz007 (not verified)
Thu, 10/11/2016 - 02:14

Hello

Will you be adding the feature to be able to recognize and download your "likes" from your own Tumblr page? I've tried almost every one of the apps out there and none seem to have the feature or ability to scan through all of the "My Likes" pages and download the Likes section, only what you've posted or reblogged.

Sorry to post in here. Not sure where else to post questions regarding this great program. I know nothing about coding so if this request is impossible, I apologize.

Hemant (not verified)
Thu, 17/11/2016 - 07:53

Hi, first of all thanks for this wonderful app, and all the features that you keep working on.

I've downloaded too many images using TumblOne & TumblTwo, now I'm using TumblThree, it creates its own index files which is fine. But it is downloading all the files previously downloaded.
I have already deleted/replaced previously downloaded files with null files (0KB) with same name. Is there a way to avoid overwriting files having same name?

Whenever I delete an image I create a null file of same name, thinking that app will ignore downloading the same file again, but it doesn't. Please help me... I have already downloaded/deleted/replaced many files with null file and I don't want to download them over again.

Please help me

Slow (not verified)
Thu, 24/11/2016 - 20:59

Wow, you have been busy adding things....... wonderful

I just updated to this release and notice the following:

When I launch the application I get this error at the top in red "error 1: could not restore UI settings"
However I do not see anything wrong, and the application runs just fine.

The progress bars do not reflect any actual activity. They are solid green all the way to the end, and the number of files does not change while it is downloading a given blog. (Thanks for making them larger)

An offline blog goes into the Que now and stays there without giving any error like it did before. I guess this is ok, but does add a step to remove it that was not there before. Maybe a global variable to not download offline blogs can be added.

I like where you are going with the details, and this window would be a perfect place to implement the added file structure for saving files. You could add a "group" textbox where you could type in the group name which would become the next level of where the files were stored.
blogs/group/blogname

That would let you place individual blogs into a group of blogs such as "cars", "people",boats etc as the group. If no group was entered, the blog would go directly where the master setting points: blogs/blogname like it does now.

Slow (not verified)
Fri, 25/11/2016 - 02:17

I made a test setup to test this release more, and I see what the issue was I was having with the progress bar. I was downloading my current blogs which have thousands of photos. The new photos being downloaded were but a fraction of the total, so the progress bar probably had only a single pixel or so added which was not distinguishable to me. When I did a new blog it acted exactly like I expected it to. I think I was thinking it showed the progress of the present download activity rather than the contents of the entire blog.

Slow (not verified)
Thu, 24/11/2016 - 21:15

I noticed another change in this release and do not see any mention of it in the changelog.

It appears that the file date sometimes is the original post date but the majority of the date/time are the "saved" time on my system.

Can you explain when the program does not use the "original" date and time and uses the "saved" date instead? I wish they could all be the original saved date, but maybe that is not possible.

Slow (not verified)
Fri, 25/11/2016 - 02:26

In reading the meta data it looks like there could be three possible dates a file has. In the case of a reblog there is the original date the file was posted on Tumblr, the post date on the blog you are downloading, and last the date the file is saved on the users system.

If it is possible to use the dates a file could have following that order it would be the most useful in weeding out duplicate files that would exist between blogs.

Highest preference would be the original post date
next would be the current blog post date
last would be the date the file was saved on the users system

zab
Fri, 25/11/2016 - 07:58

It's all possible, like restoring the date. It's probably just 2-3 lines of code if you do it messy, but takes a while to think everything through and make it properly with choices. I am just alone and I thought there were more important things to fix right now. But maybe I'll add it over the coming weeks. Right now I wasn't planning on coding much on this project since I didn't do anything else for almost 2 weeks now.

I've added it on the possible features/enhancements page. Maybe someone else is willing to do it. The code isn't too bad and much more maintainable and abstracted, so won't break other things as easily as in TumblTwo.

Slow (not verified)
Fri, 25/11/2016 - 20:54

Zab, I certainly did not mean to suggest this was more important than the other work you have planned. I do understand about "being alone". I do some coding work on a few projects myself. I'm not a professional programmer, so I do not feel qualified to offer help on a project that is working as well as TumblThree. I fear my contribution would only mess things up.

Thank you for an excellent software application. Your talents are very much appreciated.

anonymous (not verified)
Sun, 04/12/2016 - 08:51

It gets better with every new version, yay!

However, please consider rearranging elements in settings window because it:
a) doesn't fit on screen with resolution height less than 800px;
b) has overlapping elements as shown on the pic: http://i.cubeupload.com/cTz2oc.png.

Also, is it possible to highlight progressbar of currently crawling blog with different colour?

zab
Sun, 04/12/2016 - 13:54

That's a bummer, for sure. I'll fix that quite soon.

The color change is probably also easy to add.
Thanks!

anonymous (not verified)
Sat, 10/12/2016 - 11:35

Thanks for the scrollbar!
The "Authenticate" button is still overlaying some of the options though.

Taranchuk (not verified)
Sun, 04/12/2016 - 16:55

Hi! You make excellent updates! Adding function to create meta data list is a great addition to the program! Personally, I use the metadata list to convert it to a list of the names of the files, then to rename the downloaded files. With a caption and tags to filenames. What then gives the ability to search for certain words in the file names! I have a suggestion to improving the functionality of the program. It is possible to add a function searching and downloading images for the entire Tumblr by tags and captions? For example, download all of the images with the keyword Star Wars. Or something like that! It would be a wonderful addition to the program! Thank you for your wonderful program!

zab
Sun, 04/12/2016 - 18:17

Yes, it would have been easily possible to search the whole tumblr webpage for specific tags, not just a specific tumblr blog, with the api v2. I've actually already implemented using their newer api, but the thing is, they control it per application since every program has to apply for a specific key. Each key can only access the site for 5000 searches a day. You can ask to remove that rate limit, but I don't think they will do so for TumblThree as it probably is against their terms of service. I've already asked once and didn't get a reply back, so I guess, they don't want it.

It's however still possible to parse the whole webpage like a webbrowser is doing it by using a library like Html Agility Pack or AngleSharp. We could use the internal .NET browser to open the login page and use the cookie container to grab all the cookies and then use them within the WebRequests and parse the results with Html Agility Pack/AngleSharp.

It's probably worth a try and also necessary to access private blogs which require authentication. It's just too much for me to try out right now. Maybe during christmas, new years eve.

anonymous (not verified)
Wed, 07/12/2016 - 23:20

I was using 1.0.0.0 previously and found it to work just fine.

However I cannot get 1.0.4.6 to work at all. It does nothing when I tell it to crawl tumblrs. I only can get it to work when I switch from my old computer (windows 7) to a newer one (windows 7 SP1) and even then it ONLY works on a fresh copy of tumblthree. It also only works once. If I close and reopen tumblrthree it doesn't crawl or continue any tumblr downloads. It also keeps crashing if I try to change the settings on parallel connections and blogs before crawling (I was having problems downloading videos because it would timeout so I need to reduce the amount of connections at once). And when I did manage to get the settings to change, tumblethree refused to crawl through the tumblrs.

I've tried deleting some or all of the settings files in the AppData folder and the index files for the blogs, but I cannot get TumblThree to continue downloading or recheck any blog it has already searched. I also tried running as administrator, and that doesn't fix anything either.

I also have problems where the program would just crash when I add a blog or 5 seconds afterwards.

zab
Thu, 08/12/2016 - 06:37

You probably have to check all the checkboxes for your already existing tumblr blogs as described here: https://github.com/johanneszab/TumblThree/issues/13

The application now uses blog-based settings instead of global application-wide settings to decide whether it should download pictures, videos, etc. from a specific blog. Since that setting wasn't implemented in version 1.0.0.0, it defaults to off for any search.

You can see the checkboxes in the new screenshot as well. Just select all your blogs in the library with Ctrl+A and then check the boxes you want to process.

anonymous (not verified)
Thu, 08/12/2016 - 07:15

I've tried this and it still doesn't work. The blog in the queue is highlighted in green, but then nothing happens at all. The details tab shows that nothing is downloaded. I even tried it with nothing checked, and the blog still just sits there with the green highlight with nothing happening. I left it there for about 30 minutes and nothing got downloaded.

The only time I see it work is with a fresh copy of TumblThree with all the settings files deleted, but I can't use the program because 25 concurrent downloads is too much for my connection. On top of that, the program stops working after I shut it down and start it back up, and I get the same problem I mentioned in the last paragraph.

zab
Thu, 08/12/2016 - 09:30

You could send me or upload one of the .tumblr files of an affected blog from the \Index\ folder so I can check it with the debugger and see exactly what happens when the download starts. Otherwise I cannot say much about why this would happen.

Also I don't know why the application would crash if you change the settings there. You can still set the settings in the .xml file in the Settings folder under AppData\Local\TumblThree\Settings.xml. Maybe that works.

anonymous (not verified)
Sat, 10/12/2016 - 09:46

Hello and thank you for the best and most powerful Tumblr grabber so far! There are a few bug reports / feature requests that could be easily implemented (I hope) and would make the program even more awesome:

1. Is it possible to store the settings in the application directory itself, not in /AppData? This way TumblThree would be fully portable, and you wouldn't accidentally delete all your settings. This could be useful if you store one copy of the program and blogs on an external HDD and keep another copy with another set of blogs on your computer.
2. A rare bug I experienced once on the Nov 23 build. One blog I had crawled earlier was removed from Tumblr. I didn't notice it at first, added it to the queue with the rest of the blogs and hit "Crawl" to update them all. When its turn came, the grabber just got stuck: there was no progress on this blog, it wasn't removed from the queue, and the blog downloading at the same time didn't move further through the queue. Even when I manually stopped the crawling process, removed the faulty element from the queue and resumed crawling, its name stayed in the application window title bar along with the name of the other currently crawling blog, and the number of blogs downloading in parallel was reduced by one, as if the faulty blog had occupied a download thread while doing nothing. I restarted the application and removed this blog, and everything seems to have worked fine since then, but I'm worried that this could happen again when another blog from my blog list goes offline.
3. How about adding a changes.txt file to the GitHub archives with new releases?

Slow (not verified)
Sun, 11/12/2016 - 01:28

The issue you describe in #2 isn't really rare at all. If you have an offline blog and select it for crawling, it does get placed into the queue, and when its turn comes, it permanently captures one of the thread slots. Removing it from the queue only takes it out of the window and does not release the thread it occupies. The result I see is that the very last blog in the queue does not start crawling. You need to add the last blog to the queue again and then delete the one that will not start. As you point out, it also reduces the number of available threads by the number of threads stuck trying to download an offline blog.
It is easy to put an offline blog into the queue if you are in the habit of selecting all blogs and adding them to the queue.
I believe past versions of TumblThree either removed the offline blog or never placed it into the queue; I can't remember which.
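
To make the stuck-slot behaviour concrete, here is a minimal sketch in Python (not TumblThree's actual C# code) of a crawl loop that always hands its slot back, even when a blog turns out to be offline. The callbacks `is_online` and `crawl` and the function `crawl_all` are hypothetical names used only for illustration.

```python
import threading


def crawl_all(blogs, is_online, crawl, max_parallel=2):
    """Crawl blogs with at most max_parallel workers at a time.

    Offline blogs are skipped and reported back instead of
    holding on to a worker slot forever.
    """
    slots = threading.BoundedSemaphore(max_parallel)
    skipped = []
    skipped_lock = threading.Lock()

    def worker(blog):
        try:
            if not is_online(blog):
                with skipped_lock:
                    skipped.append(blog)
                return
            crawl(blog)
        finally:
            # Released on every path: success, offline blog, or exception.
            slots.release()

    threads = []
    for blog in blogs:
        slots.acquire()  # blocks while all slots are in use
        thread = threading.Thread(target=worker, args=(blog,))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()
    return skipped
```

The important part is the `finally` block: whatever happens to a single blog, the slot becomes available again for the next one in the queue.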

zab
Sun, 11/12/2016 - 20:58

Thanks!

It will be fixed in the next release.

anonymous (not verified)
Thu, 15/12/2016 - 22:48

I wonder if other TumblThree users and I could help you with translation/localisation for languages other than English and German. Are the strings from Resources.resx the only ones that need to be translated?

zab
Fri, 16/12/2016 - 08:31

Yes, right. Everything is in the Resources files. Keep in mind there are three of them: one for the applications, one for the domain and one for the presentation.

One exception is the creation of the meta file bodies, where I was simply too lazy at the time to add all the strings to the resources files since they are so cumbersome to handle. So the following strings should be the only exceptions:

https://github.com/johanneszab/TumblThree/blob/master/src/TumblThree/Tum...

It's actually a good idea to call for people to help. That's something I've already had in mind too. Maybe we could add a simple message to the README.md and say "these strings have to be translated"..

anonymous (not verified)
Fri, 16/12/2016 - 11:00

Hello, could you please provide direct links to these files on GitHub?
I'll try to work on them over the Xmas holidays.

zab
Fri, 16/12/2016 - 19:00

Sure.

Every value starting from these positions needs to be translated:
https://github.com/johanneszab/TumblThree/blob/master/src/TumblThree/Tum...

https://github.com/johanneszab/TumblThree/blob/master/src/TumblThree/Tum...

and the lines in the file above (https://github.com/johanneszab/TumblThree/blob/master/src/TumblThree/Tum...), which I'll have added to the first linked resource file by Christmas.

You can simply translate the statements/words and send me everything in a text file if you don't have Visual Studio. It takes a long time to install and it's probably not worth it just for this..

anonymous (not verified)
Sun, 18/12/2016 - 22:56

Hello Zab.

Here's the result of my attempt to translate the text strings into Russian:
http://s000.tinyupload.com/index.php?file_id=16696396071451556716
I hope I didn't mess up the file/text encoding etc. It will probably need some more cleanup/translation fixes once we can see the translation in action (as I've never seen or used some of the texts and options before).

There are also some questions about certain strings/texts that could be brought up by other translators:
1. There are two strings, CouldNotRemoveBlog and CouldNotRemoveBlogIndex, with the same text in the source files. What's the difference between them?
2. Is the message in the InternalErrorDescription string just a placeholder at the moment?
3. I suppose the value Return corresponds to the Return key on the keyboard?
4. I'm not sure about my translation for Reblog key, as I have no idea what it is.

zab
Mon, 19/12/2016 - 09:54

Woohoo, even as .resx files! Thanks a lot. All I had to do was add them to the solution.

I also never checked the German translation in action until recently, and immediately noticed how broken it was. It's somehow not easy to start the application in a different language in Visual Studio. I think I'd have to change my user/system language settings for that every time. That's why I've never done it.

The application should start in Russian for you now if your system/user language is set to Russian. Just put the ru folder with the .dll files in the main TumblThree folder where the TumblThree.exe sits, just like the en folder.

anonymous (not verified)
Fri, 23/12/2016 - 19:51

Thanks Zab! I just checked the Russian translation and everything seems OK, although some messages should probably be shortened for aesthetic reasons. I hope to do it (and figure out how to use GitHub) in a few days. Also, I was unable to find the strings for the blog status ("Online"|"Offline") in the provided resx files, so they're still untranslated.

Taranchuk (not verified)
Fri, 16/12/2016 - 13:44

Hi. Could you add the possibility of creating meta information in the text file for the other images in photosets? I convert the metadata file into a file list with captions and tags as filenames, which I then use to rename the files in the folder. However, only some of the files get renamed (single non-photoset images and one image from each photoset). The rest of the images from photosets aren't renamed because their file names are absent from the metadata file. It makes me sad. Please add this if it is possible. Thanks!

zab
Sat, 17/12/2016 - 09:04

Yep, it's possible and also straightforward. I haven't put much thought into the meta file structures..

If you have any recommendations, let me hear them. It should also be possible to make them user-customizable in the future ..

Taranchuk (not verified)
Sat, 17/12/2016 - 11:42

Thanks for the answer! For me personally, it would be enough if the program were able to create meta information for the other images in photosets. It should look like:
Post ID: 152659720722, Date: 2016-11-02 22:42:31 GMT
Url with slug: [url of blog]
Photourl: [link of image]/tumblr_og1dqvKMdD1spdgdfo1_1280.jpg
Reblog key: [reblog key]
Photo Caption: [caption]
Tags: [tags]

Post ID: 152659720722, Date: 2016-11-02 22:42:31 GMT
Url with slug: [url of blog]
Photourl: [link of image]/tumblr_og1dqvKMdD1spdgdfo2_1280.jpg
Reblog key: [reblog key]
Photo Caption: [caption]
Tags: [tags]

Post ID: 152659720722, Date: 2016-11-02 22:42:31 GMT
Url with slug: [url of blog]
Photourl: [link of image]/tumblr_og1dqvKMdD1spdgdfo3_1280.jpg
Reblog key: [reblog key]
Photo Caption: [caption]
Tags: [tags]

where the Photourl part of each image URL contains the name of the file in the photoset, like these:
tumblr_og1dqvKMdD1spdgdfo1_1280.jpg
tumblr_og1dqvKMdD1spdgdfo2_1280.jpg
tumblr_og1dqvKMdD1spdgdfo3_1280.jpg
I would be very grateful if you find an opportunity to implement it!
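
The requested layout, one meta entry per photo of a photoset, could be produced along these lines. This is only a sketch in Python; the `post` dictionary and its keys, as well as `photoset_meta_entries` and `filename_of`, are assumptions made for illustration and not TumblThree's real data model. The field labels follow the example above.

```python
from posixpath import basename
from urllib.parse import urlparse


def filename_of(photo_url):
    """Return the file name part of a photo URL."""
    return basename(urlparse(photo_url).path)


def photoset_meta_entries(post):
    """Build one meta text entry for every photo URL of a post."""
    entries = []
    for photo_url in post["photo_urls"]:
        entries.append("\n".join([
            f"Post ID: {post['id']}, Date: {post['date']}",
            f"Url with slug: {post['url_with_slug']}",
            f"Photourl: {photo_url}",
            f"Reblog key: {post['reblog_key']}",
            f"Photo Caption: {post['caption']}",
            f"Tags: {', '.join(post['tags'])}",
        ]))
    return entries
```

Since every entry repeats the caption and tags next to its own Photourl, a renaming script can map each file name (via `filename_of`) to its caption and tags, including the second and later images of a photoset.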

Red (not verified)
Sat, 17/12/2016 - 01:18

1. I've just upgraded from TumblOne. How do I get TumblThree to locate and use the indexes I have saved from the first program?

2. TumblThree doesn't seem to save the Tumblrs I've ripped in the past, so I have to type in the tumblr address of the page I want to re-rip every time I start the program, even if I've entered and crawled it multiple times before. Is there a way to make TumblThree remember my past crawls, so that when I start the program they are listed and all I have to do is add them to the queue to download new images?

zab
Sat, 17/12/2016 - 08:59

1. You cannot right now.

2. It does save the past Tumblrs and their state. Make sure you don't have the TumblOne files in that directory; maybe those cause the issue. I haven't tested this. Any error messages?

Red (not verified)
Sat, 17/12/2016 - 23:14

By files, do you mean the TumblOne indexes, or the images downloaded by TumblOne? I assumed I could just copy and paste my previous blog folders to the new TumblThree directory, and that TumblThree would use them and just add any new files to them. Should I just separate the TumblOne downloaded images from the new directory?

zab
Sun, 18/12/2016 - 08:48

If you add a blog and there is no blogname.tumblr file in the \Index\ folder relative to your download location, something is wrong: no state will be saved and you'll have to start from the beginning again. I doubt, however, that it's the application's fault. Do you have write permissions? Any error messages?

I meant the .tumblr files from TumblOne. I've never checked what happens if you try to load those. I guess it doesn't even work, but maybe there is some interference that I programmed it to skip the loading entirely.

You can leave the downloaded pictures in the \Blogname\ folders, but they won't do anything. TumblOne did indeed look in the \Blogname\ folders for filenames matching the pictures about to be downloaded and would skip them. The .tumblr files in TumblOne were only used to store the blog's URL and the number of already downloaded pictures.

That's not how TumblThree handles it. All URLs and filenames are stored in the \Index\blogname.tumblr file, and that is where we check for already processed files. The main advantages are that the whole .tumblr structure is loaded into memory, which provides a fast lookup, and that you can safely remove any downloaded files without interfering with new downloads. Scanning a folder with, say, 100,000 pictures in it just to check whether there is a picture with a matching filename is insanely slow.
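
As a rough picture of why the index approach is fast, here is a small Python sketch. It assumes a hypothetical index file that is simply a JSON list of already processed file names; the real .tumblr file format is not described in this thread and is certainly different. `load_index` and `needs_download` are illustrative names.

```python
import json


def load_index(index_path):
    """Load the list of processed file names into a set.

    Assumption: the index is a JSON array of file names.
    """
    with open(index_path, encoding="utf-8") as handle:
        return set(json.load(handle))


def needs_download(filename, index):
    """Constant-time membership test against the in-memory index.

    The download folder is never touched, so deleting downloaded
    files does not cause them to be fetched again.
    """
    return filename not in index
```

A set lookup costs about the same for a hundred entries as for a hundred thousand, whereas listing a huge directory for every candidate file gets slower as the folder grows.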

anonymous (not verified)
Sun, 18/12/2016 - 23:00

I just noticed that all downloaded MP3 files are actually just multiple copies of the same Flash audio player.

zab
Mon, 19/12/2016 - 07:48

Yes, I know. I changed it in the recent build to try to download the .swf files, but I couldn't get it working, probably because it's streaming and not a straightforward file download. The bad thing is that not all audio posts offer mp3 files, but all of them offer the swf player to hear the music. Otherwise downloading the mp3s would probably have been the best and also the easiest option.

bramnet (not verified)
Fri, 11/08/2017 - 22:45

I've been using this recently, and the .swf files aren't working anymore. I get nothing but a white background and any swf unpacking programs that I use say that it's an invalid file.

zab
Sat, 12/08/2017 - 21:28

It's still not fixed, since the requests get a "403 Forbidden" error because some authentication or referrer is missing.

It's possible right now to download the .mp3 files by changing the code, but they aren't always provided. Some posts have them, some don't. But it's probably worth changing. Some is still better than none.

Sonu Meena (not verified)
Sun, 25/12/2016 - 03:21

I don't know what I am doing wrong, but except for version 1.0, none of the versions are working. Every time I add a blog, whether manually or automatically, it crashes.

Taranchuk (not verified)
Sun, 25/12/2016 - 13:30

This also happened to me once: instead of D:\Tumblthree\Blogs\Index\ the program wrote the .tumblr files to C:\Tumblthree\Blogs\index\, and then it crashed after crawling. Try searching your entire computer for .tumblr files; a wrong directory containing these files may be stored somewhere else. Also try searching the whole computer for the files Settings.xml, Manager.xml and Queuelist.xml; duplicates of them may be stored somewhere and cause the crash. If no duplicates are found, try deleting the original files and running the program again.

anonymous (not verified)
Sun, 25/12/2016 - 08:23

TumblThree doesn't seem to download inline images. Any idea how to get TumblThree to fetch those?

Taranchuk (not verified)
Sun, 25/12/2016 - 12:51

I can confirm this. They are recorded in the metadata, but they are not present in the folder where the downloaded files are stored. This concerns the files whose names begin with tumblr_inline_*.

Slow (not verified)
Fri, 27/01/2017 - 16:45

Zab,

I wanted to take a few minutes to say thank you for a very useful and time-saving program. The present release runs wonderfully.
I am sure this effort has taken countless hours of your time, and a heartfelt thank you is always nice to get.
Wishing you the best for the new year.
Slow

zab
Fri, 27/01/2017 - 18:03

Thanks!

Yep, it has actually taken quite some time by now. I think you only realize this once you start programming on your own and see how slowly some things develop and how much thought has gone into them. I never thought about this when I was younger, when open source programs were really lacking in functionality and people were complaining about the lousy software quality compared to some heavy Windows applications made by larger companies.

I'm quite short of time right now, but I'll fix the high memory usage that occurs if you have multiple larger blogs in the manager by moving the URL lists to separate files which only get loaded upon crawling. Right now, every blog is fully loaded into memory, including all the URLs. Probably sometime around the end of February.

I'll also still add the possibility to download private blogs. I have it partly working already; it just needs some detection for when to choose which path to crawl, plus some heavy optimization.

I think those will probably be the major issues we still need to tackle. Also, the download of sound files is still broken ..

Slow (not verified)
Fri, 27/01/2017 - 22:34

Zab,

I see you have a full list of things to work on. I haven't had any issues with memory usage unless I am running other things at the same time.

I do have a question about the "download specified size even if it is not offered" check box.

What exactly does it do? The help text confused me totally.
What I want to do is download the largest size available, which usually is the 1280 size. But if that size is not available, then I want the next lower size that is available.
I got the impression that checking the largest size would do that, until that new check box appeared. I checked it and ran a new crawl on almost all of my blogs. I made the mistake of not giving it a test first.

What I ended up with was an almost total re-download of all the photos, and this time it saved both versions of the photos that had them, which gave me tons of duplicates. The total file size of my blogs nearly doubled. At first I thought it was getting me new photos that did not have the 1280 size, but that was not the case.

Almost every blog now shows I have more photos for each blog than is available to be downloaded.
Example:
Downloaded files = 52241
Number of downloads = 20890

It appears that the only fix would be to delete the blogs and do another total crawl with just the 1280 selected and leave that other option unchecked.

I would recommend you remove the comment "It's safe to turn this on". For people who only want one version of a photo, namely the highest quality available, checking this box appears to give you every version of the photo.

Is it possible to get the highest resolution of a photo or is that not built into the program?
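
The behaviour asked for here, taking the largest size but falling back to the next lower one that exists, boils down to a small selection rule. A minimal Python sketch follows; `pick_size` is a hypothetical helper, not part of TumblThree, and it says nothing about how the existing check box actually behaves.

```python
def pick_size(available_widths, preferred=1280):
    """Return the largest available width not exceeding the preferred one.

    Returns None when nothing at or below the preferred width is offered.
    """
    candidates = [width for width in available_widths if width <= preferred]
    return max(candidates) if candidates else None
```

With this rule exactly one version per photo is chosen, so a crawl would never store both a 1280 file and a smaller copy of the same picture.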

Slow (not verified)
Sun, 29/01/2017 - 17:49

- I deleted the index file of each blog.
- I added the blog back into TumblThree without deleting the actual blog photo folder or files.
- I ran a new crawl with both "1280 size" and "download specified size even if it is not offered" checked.

I am not sure what caused the blog folder to end up with both 1280 and smaller duplicate files on my first crawl after checking the "download specified size even if it is not offered".

This problem happened to existing blogs that were not originally set with that option checked.

Fortunately, my normal procedure uses a third-party program, "Digital Janitor", to place all of the photo files into unique subfolders by file extension. This had saved the original files prior to the bad crawl.

That made it fairly easy to use another program "Duplicate File Finder" to delete all exact duplicate files and a duplicate photo finder program "VisPics" to remove all the duplicate small resolution photos.

I usually use the Duplicate File Finder program against the entire "BLOG" root folder to remove all the duplicates caused by REBLOG and double posting within the same blog. I keep the oldest dated file which usually is the original posting of that file, and discard all duplicate photos with a more recent date.
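
For anyone without such tools, the "keep the oldest copy, discard newer exact duplicates" rule can be approximated with a short Python script. This is only a sketch of the idea, not how Duplicate File Finder works internally; `file_digest` and `find_newer_duplicates` are illustrative names, and the script only lists candidates instead of deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_newer_duplicates(root):
    """List byte-identical files under root, excluding the oldest copy.

    Files are grouped by content hash; within each group the file with
    the earliest modification time is kept and the rest are returned.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[file_digest(path)].append(path)

    redundant = []
    for paths in by_hash.values():
        paths.sort(key=os.path.getmtime)  # oldest first
        redundant.extend(paths[1:])
    return redundant
```

This only catches exact duplicates. Lower-resolution copies of the same picture have different bytes and still need a visual comparison tool.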

Dani (not verified)
Sat, 28/01/2017 - 14:12

Firstly I wanted to say thank you for this - it's very useful and I'm very grateful for its existence!

I wanted to know whether there's currently any way to download the captions of images. I'm trying to back up a blog where all of the text is stored as the caption of an image post, and it seems like this downloads only the images. Am I missing something, or is it not a feature of the application?

thanks again!
