TumblThree - A Tumblr Backup Application

TumblThree - A Tumblr Backup Application

TumblThree is the code rewrite of TumblTwo, a free and open source Tumblr blog backup application, using C# with WPF and the MVVM pattern. It uses the Win Application Framework (WAF). It downloads photo, video, audio and text posts from a given tumblr blog.

Screenshots:

TumblThree - A Tumblr Backup Application.</a></p>
<p><a id=

Features:

  • Source code at github (Written in C# using WPF and MVVM).
  • Multiple concurrent downloads of a single blog.
  • Multiple concurrent downloads of different blogs.
  • Internationalization support (currently available: en, zh, ru, de, fr).
  • A download queue.
  • Autosave of the queuelist.
  • Save, clear and restore the queuelist.
  • A clipboard monitor that detects blogname.tumblr.com urls in the clipboard (copy and paste) and automatically adds the blog to the bloglist.
  • A settings panel (change download location, turn preview off/on, define number of concurrent downloads, set the imagesize of downloaded pictures, set download defaults, enable portable mode, etc.).
  • Uses Windows proxy settings.
  • A bandwidth throttler.
  • An option to download an url list instead of the actual files.
  • Set a start time for a automatic download (e.g. during nights).
  • An option to skip the download of a file if it has already been downloaded before in any currently added blog.
  • Uses SSL connections.
  • Preview of photos & videos.
  • Taskbar buttons and key bindings.

Blog backup/download:

  • Download of photo, video (only tumblr.com hosted), text, audio, quote, conversation, link and question posts.
  • Download meta information for photo, video and audio posts.
  • Downloads inlined photos and videos (e.g. photos embedded in question&answer posts).
  • Download of _raw image files (original/higher resolution pictures).
  • Support for downloading Imgur, Gfycat, Webmshare, Mixtape, Lolisafe, Uguu, Catbox and SafeMoe linked files in tumblr posts.
  • Download of safe mode/NSFW blogs.
  • Allows to download only original content of the blog and skip reblogged posts.
  • Can download only tagged posts.
  • Can download only specific blog pages instead of the whole blog.
  • Allows to download blog posts in a defined time span.
  • Can download hidden blogs (login required / dash board blogs).
  • Can download password protected blogs (of non-hidden blogs).

Liked/by backup/download:

  • A downloader for downloading "liked by" photos and videos instead of a tumblr blog (e.g. https://www.tumblr.com/liked/by/wallpaperfx/) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Allows to download posts in a defined time span.

Tumblr search backup/download:

  • A downloader for downloading photos and videos from the tumblr search (e.g. http://www.tumblr.com/search/my+keywords).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download only specific blog pages instead of the whole blog.

Tumblr tag search backup/download:

  • A downloader for downloading photos and videos from the tumblr tag search (e.g. http://www.tumblr.com/tagged/my+keywords) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Allows to download posts in a defined time span.

Program Usage:

  • Extract the .zip file and run the application by double clicking TumblThree.exe.
  • Copy the url of any tumblr.com blog you want to backup from into the textbox at the bottom left. Afterwards, click on 'Add Blog' on the right side of it.
  • Alternatively, if you copy (ctrl-c) a tumblr.com blog url from the address bar/text file, the clipboard monitor from TumblThree will detect it and automatically add the blog.
  • To start the download process, click on 'Crawl'. The application will regularly check for (new) blogs in the queue and start processing them, until you stop the application by pressing 'Stop'. So, you can either add blogs to the queue via 'Add to Queue' or double click/drag'n'drop first and then click 'Crawl', or you start the download process first and add blogs to the queue afterwards.
  • A light blue bar left to the blog in the queue indicates a actively downloading blog.
  • The blog manager on the left side also indicates the state of each blog. A red background shows an offline blog, a green background an actively crawling blog and a purple background an enqueued blog.
  • You change the download location, the number of concurrent connections, the default backup settings for each newly added blog and various other settings in the 'Settings'.
  • In the Details window you can view statistics of your blog and set blog specific options. You can here what kind of post type (photo, video, audio, text, conversation, quote, link) to download.
  • For downloading only tagged posts, you'll have to do some steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the tags in the Tags textbox in a comma separated list without the leading hash (#) sign. E.g. great big car,bears would search for images that are tagged for either a great big car or bears or both.
  • For downloading password protected blogs, you'll have to do some steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the password in the Password textbox.
  • For downloading hidden blogs (login required blogs), you have to do some steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successfully, the label will change and display your email address. The email address and password are not stored locally on disk but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the blog url.
  • For downloading liked photos and videos, you'll have to do some steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successfully, the label will change and display your email address. The email address and password are not stored locally on disk but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the blog url including the liked/by string in the url (e.g. https://www.tumblr.com/liked/by/wallpaperfx/).
    3. For downloading your own likes, make sure you've (temporarily) enabled the following options in your blogs settings (i.e. https://www.tumblr.com/settings/blog/yourblogname):
      1. Likes -> Share posts you like (to enable the publicly visible liked/by page)
      2. Visibility -> blog is explicit (to see/download NSFW likes)
  • For downloading photos and videos from the tumblr search, you'll have to do some steps:
    1. Add the search url including your key words separated by plus signs (+) in the url (e.g. https://www.tumblr.com/search/my+special+tags).
  • For downloading photos and videos from the tumblr tag search, you'll have to do some steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successfully, the label will change and display your email address. The email address and password are not stored locally on disk but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the search url including your tags separated by plus signs (+) in the url (e.g. https://www.tumblr.com/tagged/my+special+tags).

Key Mappings:

  • Currently mapped keys:
    • double click on a blog adds it to the queue
    • drag and drop of blogs from the manager (left side) to the queue
    • space -- start crawl
    • ctrl-space -- pause crawl
    • shift-space -- stop crawl
    • del -- remove blog from queuelist
    • shift-del -- remove blog from blogmanager
    • ctrl-shift-g -- manually trigger the garbage collection

Getting Started:

The default settings should cover most users. You should only have to change the download location and the kind of posts you want to download. For this, in the Settings (click on the Settings button in the lower panel of the main user interface) you might want to change:

  • General -> Download location: Specifies where to download the files. The default is in a folder Blogs relative to the TumblThree.exe
  • Blog -> Settings applied to each blog upon addition:
    • Here you can set what posts newly added blogs will download per default. To change what each blog downloads, click on a blog in the main interface, select the Details Tab on the right and change the settings. This separation allows to download different kind of post for different blogs. You can change the download settings for multiple existing blogs by selecting them with shift+left click for a range or ctrl-a for all of them.
    • Note: You might want to always select:
      • Download Reblogged posts: Downloads reblogs, not just original content of the blog author.

Settings you might want to change if the download speed is not satisfactory:

  • Connection -> Concurrent connections: Specifies the number of connections used for downloading posts. The number is shared between all actively downloading blogs.
  • Connection -> Concurrent video connections: Specifies the number of connections used for downloading tumblr video posts. The vt.tumblr.com host regularly closes connections if the number is too high. Thus, the maximum number of vt.tumblr.com connections can be specified here independently.
  • Connection -> Concurrent blogs: Number of blogs to download in parallel.

Most likely you don't have to change any of the other connection settings. In particular, settings you should never change, unless you're sure you know what you are doing:

  • Connection -> Limit Tumblr Api Connections: Leave this checkbox checked and do not change the corresponding values of 90 connections per 60 seconds. If you still change them, you might end up with offline blogs or missing downloads.

Further Insights:

  • Note: All the following files are stored in json format and can be opened in any editor.
  • Application settings are stored in C:\Users\Username\AppData\Local\TumblThree\.
  • You can use the portable mode (settings->general) to stores the application settings in the same folder as the executable.
  • For each blog there is also a database (serialized class) file in the Index folder of the download location named after the blogname.tumblr. Here blog relative information is stored like what files have been downloaded, the url of the blog and when it was added. This allows you to move your downloaded files (photos, videos, audio files) to a different location without interfering with the download process.
  • Some settings aren't hooked up to the graphical user interface. It's possible to view all TumblThree settings by opening the settings.json in any editor located in C:\Users\Username\AppData\Local\TumblThree\. Their names should be self explainatory. Some notable settings to further fine tune the application include:
    • BufferSize: Allows to set the buffer size for downloading binary files (photos, videos) in multiples of 4KB. The default is 2MB, thus the BufferSize has a value of 512. Increasing this value reduces disk fragmentation as more of the file is kept in the memory before it gets written out to the disk but increases the memory usage.
    • MaxNumberOfRetries: Sets the maximum number of retries if a tumblr server forcefully closes the connection. This might regularly happen on the tumblr video host (vt.tumblr.com) if too many connections were opened in parallel. After the limit is exhausted, the file is left truncated, but is also not registered as a successful downloaded. Thus, the file can be resumed in the next crawl.
    • TumblrHosts: Contains a list of hosts which is tried for downloading _raw photos if the photo size is set to raw. If none of the hosts contains the _raw version, the actually scanned host is tried with the next lower resolution (1028).

Changelog:

2018-07-05:

  • Implements the Tumblr login process and cookie handling in code instead of relying on the Internet Explorer for the Tumblr login process.

2018-06-09:

  • Fixes hidden Tumblr blog download problems caused by the new Tumblr ToS.

2018-05-20:

  • Programmatically agrees to new ToS and GDPR.
  • Implements SVC authentication changes. The SVC service is used to display the dash board blogs (i.e. hidden tumblr blogs). Changes in this internal Tumblr api prohibited TumblThrees access.
  • Saves the last post id in successful hidden tumblr downloads.
  • Improves the text parser of the tumblr api and tumblr svc data models. Separated the slug from the url as the data models are inconsistent. Separated the photoset urls from the photo urls. Moved the date information into a separate column.
  • Minor text changes of some user interface elements.

2018-04-18:

  • Updates the tumblr blog crawler and the hidden tumblr datamodel to reflect tumblr api changes that break blog download of previous TumblThree versions.

2018-02-28:

  • Allows to download only specific pages of hidden Tumblr blogs and in the tumblr search.
  • Improves the proxy settings. TumblThree now uses the default Windows (Internet Explorer) settings if not overridden within TumblThree.
  • Changes the behavior of the timeout value (Settings->Connection->Timeout). The timeout value now counts file chunks of 4kb instead of the whole file download, thus it should better detect if a download is stalled or a connection dropped without canceling active downloads of larger files (e.g. videos).
  • Changes default timeout value (for new users) from 600s to 30s.
  • Fixes possible download of the same photo but with different resolutions. This happened if the _raw file download was interrupted (the timeout hit), then the same photo was queued for download with the _1280 resolution. If the blog was then subsequently queued again, the _raw file was downloaded next to the _1280 file.
  • Fixes reblog/original post detection in the tumblr hidden crawler.
  • Fixes check blog status during startup-option.
  • Fixes download of password protected tumblr blogs.
  • Adds Mixtape, Lolisafe, Uguu, Catbox and SafeMoe parser (thanks to bun-dev).

2017-12-31:

  • Fixes a bug that released the video connection semaphore too often. That means the slider in the settings for limiting the video downloads didn't work at all. It should properly limit the connections to the vt.tumblr.com host and prevent incomplete video downloads now.
  • Includes a rewrite of the blog detection during blog addition. It should reduce latency if you mass add blogs by copying urls into the clipboard (ctrl-c). Offline blogs aren't added anymore.
  • Notifies the user when a connection timeout has occurred. The message states whether the timeout has occurred during downloading or crawling. If it happened during crawling, you might want to re-queue the blog at some point to grab missing posts. A connection timeout should only happen if your connection is wonky. You can decrease/increase the timeout in the settings (settings->connection).
  • You can now specify in the Details-panel for each blog where its files should be downloaded. If the text box control is empty, the files are downloaded as in previous releases in the folder specified in the global download location (settings->general), plus the blogs name.
  • Imgur.com linked albums in tumblr posts are now entirely downloaded if enabled (details panel->external->download imgur). Previously, only directly linked images were detected.
  • Adds an option to load all blog databases into memory and compare each to-download binary file to all databases across TumblThree before downloading. If the file has already been downloaded in any blog before, the file is skipped and will not be counted as downloaded. You can enable this in the settings (settings->global).
  • Allows to add hidden tumblr blogs using the dashboard url (i.e. https://www.tumblr.com/dashboard/blog/blogtobackup).
  • Allows to add all blog types without the protocol suffix (i.e. wallpaperfx.tumblr.com, www.tumblr.com/search/cars).
  • Adds an option to enable a confirmation dialog before removing blogs (#186, #130, #98). It's off by default.

2017-11-17:

  • Adds support for downloading Imgur.com, Gfycat.com and Webmshare.com linked files in tumblr posts.
  • Improves downloading of tumblr liked/by photos and videos.

2017-10-20:

  • Restores bandwidth limiter functionality.

2017-10-13:

  • Changes the default _raw photo host.

2017-10-09:

  • Fixes crawler stop in hidden tumblr blog downloads.
  • Adds options to set the default blog settings for the download from time, download to time and tags in the settings menu.
  • Adds some (ar, el, es, fa, fi, he, hi, it, ja, ko, no, pa, pl, pt, th, tr and vi) google translate translations.

2017-09-08:

  • Can download password protected blogs of non-hidden blogs.
  • Minor UI updates.

2017-08-22:

2017-08-21:

  • French, Spanish and simplified Chinese translations.
  • Removes user interface lag during blog addition.
  • Allows to set the buffer size for downloading binary files in the settings.json in multiples of 4KB. The variable is called BufferSize. The new default is 2MB, thus the BufferSize has a value of 512. Previously it was set to 4KB, but apparently Windows does not do any useful caching on NTFS if multiple writes are concurrent and async. Thus, this should reduce disk fragmentation.
  • Uses .NET Framework 4.6 now as it should be available for all supported windows versions (Windows Vista and above).
  • Improved the selection handling in the details panel. If multiple blogs are selected, old values are now kept if they are the same for all blogs and changes are immediately reflected.
  • Audio file download support for tumblr and hidden tumblr blogs.
  • More code Refactoring.

2017-07-03:

  • Can download hidden (login required/dash board) blogs.

2017-06-30:

  • Improved performance and bugfixes.

2017-06-20:

  • Downloads high resolution (_raw) images.
  • Updated translations (German and Russian).
  • Applies changed settings immediately.

2017-06-04:

  • Sets the date modified date in the Explorer to the posts time.
  • Allows to download single or ranges of blog pages.
  • Full screen media preview.

2017-05-20:

  • Option to skip reblogged posts.
  • Improves detection of inlined photos and videos in text posts (e.g. in answer posts).

2017-05-14:

  • Portable mode.
  • Downloads liked photos and videos.

2017-04-18:

  • Code refactoring.
  • Uses async/await in most of the code instead of tasks from the threadpool.
  • Uses a consumer producer pattern for grabbing and downloading as the Tumblr api v1 is now rate limited.
  • Downloads are now resumable.
  • Data files are now saved as json instead of binary.
  • Reduced memory usage by layering off the downloaded file list and only load it if needed.
  • Improves ui responsiveness.

2017-01-08:

  • Improves the speed of the network code.
  • Adds an option to use a http proxy.
  • Downloads inline images of tumblr posts.
  • Added Russian translation.

2016-12-13:

  • Improves the ui scaling of the main window for smaller resolutions.
  • Prevents crawling of offline blogs.
  • If the same blog is multiple times in the queue and already once active, any other free crawler task will skip and remove any already active blog and proceed to the next inactive blog in the queue.
  • Improved german translation.

2016-12-10:

  • The check for already downloaded files is now independent from the actual host and based entirely on the filename. It look likes the host/mirror does actual vary which would result in a reload of the file since its url changed.
  • Add scrollbars to the settings window if the controls do not fit.
  • Safely replaces blog indexes. If there is an error (e.g. no disk space left) during the update of the index file, the old state should not be corrupted anymore.
  • Changes some color and adds an alternate color for the blog manager.

2016-11-23:

  • Fixes application crashes which occurred by adding tumblr blogs without title or description.
  • Decreases determination time of already downloaded files for large blogs (>100,000 posts) by at least three orders of magnitude.

2016-11-22:

  • Creates more meta information (post id, reblog key, timestamp, tags, slug, title) of the posts, including image, video and audio types.
  • Fixes the progress calculation by adding the found duplicates to the progress. Also states them in the details window.
  • Fixes a locking issue for the meta files (*.txt) which resulted in incomplete downloads.
  • Updates the details and settings view for a better understanding on how to use the application.

2016-11-20:

  • Fixes proper counting of downloaded files.
  • Fully implements the details window (context menus, etc.).

2016-11-18:

  • Fixes the initial automatic queue restore function.
  • Fixes the autodownload function.

2016-11-16:

  • Picture- and videopreview in the details window.
  • Allows the download of text, audio, quote, conversation, link type posts.
  • Download of text, audio, quote, conversation, link and .gif images are based on each blog instead of a global setting and can be turned on/off in the details view. The settings in the settings window are used as template for newly added blogs.
  • Modified .tumblr index files get now always saved upon application exit regardless of the crawlers state. Previously if the application was closed during an active crawl, the index wasn't updated.
  • Inlined the WAF code under lib for easier project setup for newcomers that want to contribute code.
  • bugfixes, UI and memory enhancements.

2016-10-15:

  • Bandwidth throttling.
  • Connection timeout settings.
  • auto queue and start download function.
  • save states of the UI (column size and order).
  • download of hidden blogs.
  • fix proper saving of the ratings and tags.

2016-06-11:

  • Added German translation.

2016-06-10:

  • Support for tumblr.com hosted videos. Check the settings window to enable video download (default: off).

2016-06-08:

  • Tag crawling now properly working. Also it's case-insensitive now.
  • Fixed crash upon blogs with zero-image count in the queue list (e.g. blog is offline, or tag search didn't evaluate any images).
  • Fixed randomly occurring crash in the clipboard monitor.
  • Changed icons (requested by the TumblOne creator).

2016-04-12:

  • Now with progress output in the Queue tab (during url crawling for imageurls -- the number of posts evaluated; during downloading -- the current image url).
  • Added missing resume button in the taskbar control.

2016-04-11:

  • Support for urls starting with https:
  • Fixes application crashed upon pressing the stop-button due to improper exception handling
  • Now saves the index file at every time. Previously the application would exit if the crawling processes was still active without properly waiting them to finish and save its state. Now there is a grace period for the tasks to finish. Same was true if the crawl was paused and then exited.

Download:

Comments

zab
Thu, 06/12/2018 - 07:25

Use the 1.0.8.X version! You don't have to download anything else.

You probably didn't queue any of your blogs. Read the instructions on this page!

You not only have to add the blogs you're interested in, but also have to queue the ones you actually want to download now. Take a look at the Details pane (right side, same place as the queue pane) as well.

Lily (not verified)
Thu, 06/12/2018 - 02:27

Good idea but it doesn't work for me... I can open it and add the links of the blogs just fine but when I click Crawl nothing happens, no matter how much I wait.

zab
Thu, 06/12/2018 - 07:24

You probably didn't queue any of your blogs. Read the instructions on this page!

You not only have to add the blogs you're interested in, but also have to queue the ones you actually want to download now. Take a look at the Details pane (right side, same place as the queue pane) as well.

RedRaspberry (not verified)
Thu, 06/12/2018 - 10:28

Hi, I have this issue where when I run the program it loads up but all the bottom buttons are grayed out and do nothing except for Crawl, Check Clipboard (depresses but has no effect), Settings, and About.
Any ideas about how to fix this?

zab
Thu, 06/12/2018 - 10:34

Not sure why this should be an issue?

You need to add a blog in a valid format. Read the manual on this page or the "advices for new user" section I've added two days ago in the current release description on github.

I've also answered the same question several times in the last days: https://www.jzab.de/comment/4207#comment-4207

Jason (not verified)
Thu, 06/12/2018 - 16:06

What are best settings/best practices for backup of a tumblr blog (photos/gifs + captions) so that it can easily be recreated on some other platform with captions correlated back to photos? And, what other platform should I consider?
Context: I am first time user trying to backup a tumblr with photos and caption texts so that it can easily be recreated. The captions are critical, original writings. The photos successfully download, but the only place I see the captions is in a file "images.txt" which is in html or xml (?). I am not very computer savvy and I don't see how to easily correlate these back again to the right photos or gifs, nor on what platform I could do this. Sorry for such basic questions. Help appreciated!

Mura (not verified)
Fri, 07/12/2018 - 03:24

I need to pull down full posts. That means media AND text as it would appear on the site. So far I can only find a way to pull down images. If I can't get the text that goes with the images there is no point.

zab
Fri, 07/12/2018 - 04:07

Check the checkbox: download text/conversation/quote/etc post in the details pane of your blogs. Open the file text/conversation/quote/etc.txt/json in the download folder. In that file are your text posts.

Read the instructions here, the readme, the comments and issues on github! I'm sure it's explained already several times. For example here: https://github.com/johanneszab/TumblThree/issues/296

devin1229 (not verified)
Sat, 08/12/2018 - 03:29

Cannot log in -- keeps asking for 2FA, even when it's not enabled on my blog. If I enable it and try to use a one-time app password, it doesn't accept it.

Vee (not verified)
Sat, 08/12/2018 - 17:48

Is there a version of this for mac?

zab
Sun, 09/12/2018 - 13:44

No.

BlesBy (not verified)
Sun, 09/12/2018 - 14:41

The text posts are downloaded as links. After the mass delete , i wont be able to acces them obviously-
Is there a way to save the asks , texts and ect. the way they are ?

Thank you. This app is a lifesaver.

Jamed (not verified)
Sun, 09/12/2018 - 17:11

Help: been searching for comparable solution for Mac and have not found one. Trying to get the videos quickly from a couple of blogs which are not mine before trey get pulled. Thanks!!

kally (not verified)
Mon, 10/12/2018 - 03:10

I clicked add blog and nothing happens...I'm not sure what's wrong. Just typed in "[blogname].tumblr.com" like you're supposed to do. Tried a few different blog names, nothing.

zab
Mon, 10/12/2018 - 04:44

If you can click the "Add blog"-button, then maybe no write permissions in your download location? If you installed it to C:\Program* , then it might be this.

Paul (not verified)
Mon, 10/12/2018 - 20:04

Thank you for a great efficient app! In appreciation, I donated to your beer fund! ;-)

Chris Steele (not verified)
Tue, 11/12/2018 - 00:33

I wish there was more time this app has no only allowed me to save all of my data but other great blogs that will die post Dec 17, 2018. I have tried to tell as many people about it and I intend to donate as soon as money permits. I do have a question though, when choosing the META data, it looks a little like HTML but it won't load as an HTML page. How does one look at videos.txt when it won't load with any known browser? Thanks again for such a simple yet effective application.

Maryl (not verified)
Tue, 11/12/2018 - 22:50

With Tumblr's incoming purge of NSFW content on December 17th this program has helped me back up the work of a lot of friends of mine (models, performers, and illustrators) so that they still have their portfolio afterwards. The purge is going to hurt the ability of a lot of decent people to make money and this program (alongside a sufficiently large hard drive) is helping curb the damage.

Lys (not verified)
Wed, 12/12/2018 - 19:47

Any blog I input will crawl to various points and then stop. They never finish, never get to to point of downloading images. Not sure what's going on, but would appreciate any help.

Pages