TumblThree - A Tumblr Backup Application

TumblThree is a rewrite of TumblTwo, a free and open source Tumblr blog backup application. It is written in C# with WPF and the MVVM pattern, and uses the Win Application Framework (WAF). It downloads photo, video, audio and text posts from a given Tumblr blog.

Features:

  • Source code on GitHub (written in C# using WPF and MVVM).
  • Multiple concurrent downloads of a single blog.
  • Multiple concurrent downloads of different blogs.
  • Internationalization support (currently available: en, zh, ru, de, fr).
  • A download queue.
  • Autosave of the queuelist.
  • Save, clear and restore the queuelist.
  • A clipboard monitor that detects blogname.tumblr.com urls in the clipboard (copy and paste) and automatically adds the blog to the bloglist.
  • A settings panel (change download location, turn preview off/on, define number of concurrent downloads, set the imagesize of downloaded pictures, set download defaults, enable portable mode, etc.).
  • Uses Windows proxy settings.
  • A bandwidth throttler.
  • An option to download a URL list instead of the actual files.
  • Set a start time for an automatic download (e.g. at night).
  • An option to skip the download of a file if it has already been downloaded before in any currently added blog.
  • Uses SSL connections.
  • Preview of photos & videos.
  • Taskbar buttons and key bindings.

Blog backup/download:

  • Download of photo, video (only tumblr.com hosted), text, audio, quote, conversation, link and question posts.
  • Download meta information for photo, video and audio posts.
  • Downloads inlined photos and videos (e.g. photos embedded in question&answer posts).
  • Download of _raw image files (original/higher resolution pictures).
  • Support for downloading Imgur, Gfycat, Webmshare, Mixtape, Lolisafe, Uguu, Catbox and SafeMoe linked files in tumblr posts.
  • Download of safe mode/NSFW blogs.
  • Can download only the original content of a blog and skip reblogged posts.
  • Can download only tagged posts.
  • Can download only specific blog pages instead of the whole blog.
  • Can download blog posts within a defined time span.
  • Can download hidden blogs (login required / dashboard blogs).
  • Can download password-protected blogs (non-hidden blogs only).

Liked/by backup/download:

  • A downloader for downloading "liked by" photos and videos instead of a tumblr blog (e.g. https://www.tumblr.com/liked/by/wallpaperfx/) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download posts within a defined time span.

Tumblr search backup/download:

  • A downloader for downloading photos and videos from the tumblr search (e.g. http://www.tumblr.com/search/my+keywords).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download only specific blog pages instead of the whole blog.

Tumblr tag search backup/download:

  • A downloader for downloading photos and videos from the tumblr tag search (e.g. http://www.tumblr.com/tagged/my+keywords) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download posts within a defined time span.

Program Usage:

  • Extract the .zip file and run the application by double clicking TumblThree.exe.
  • Copy the url of any tumblr.com blog you want to back up into the text box at the bottom left. Afterwards, click on 'Add Blog' to the right of it.
  • Alternatively, if you copy (ctrl-c) a tumblr.com blog url from the address bar/text file, the clipboard monitor from TumblThree will detect it and automatically add the blog.
  • To start the download process, click on 'Crawl'. The application will regularly check for (new) blogs in the queue and start processing them, until you stop the application by pressing 'Stop'. So, you can either add blogs to the queue via 'Add to Queue' or double click/drag'n'drop first and then click 'Crawl', or you start the download process first and add blogs to the queue afterwards.
  • A light blue bar to the left of a blog in the queue indicates that the blog is actively downloading.
  • The blog manager on the left side also indicates the state of each blog. A red background shows an offline blog, a green background an actively crawling blog and a purple background an enqueued blog.
  • You can change the download location, the number of concurrent connections, the default backup settings for each newly added blog and various other settings in 'Settings'.
  • In the Details window you can view statistics of your blog and set blog-specific options. Here you can choose which post types (photo, video, audio, text, conversation, quote, link) to download.
  • For downloading only tagged posts, you'll have to do some steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the tags in the Tags textbox in a comma separated list without the leading hash (#) sign. E.g. great big car,bears would search for images that are tagged for either a great big car or bears or both.
  • For downloading password protected blogs, you'll have to do some steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the password in the Password textbox.
  • For downloading hidden blogs (login required blogs), you have to do some steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in JSON format.
    2. Add the blog url.
  • For downloading liked photos and videos, you'll have to do some steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in JSON format.
    2. Add the blog url including the liked/by string in the url (e.g. https://www.tumblr.com/liked/by/wallpaperfx/).
    3. For downloading your own likes, make sure you've (temporarily) enabled the following options in your blog's settings (i.e. https://www.tumblr.com/settings/blog/yourblogname):
      1. Likes -> Share posts you like (to enable the publicly visible liked/by page)
      2. Visibility -> blog is explicit (to see/download NSFW likes)
  • For downloading photos and videos from the tumblr search, you'll have to do some steps:
    1. Add the search url including your key words separated by plus signs (+) in the url (e.g. https://www.tumblr.com/search/my+special+tags).
  • For downloading photos and videos from the tumblr tag search, you'll have to do some steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in JSON format.
    2. Add the search url including your tags separated by plus signs (+) in the url (e.g. https://www.tumblr.com/tagged/my+special+tags).
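
The url forms used in the steps above (plain blog, liked/by, search, tag search) can be told apart with a few patterns. The following is a minimal sketch in Python, not TumblThree's actual code (which is written in C#). The function name classify_tumblr_url and the category labels are made up for illustration.

```python
import re
from urllib.parse import urlparse


def classify_tumblr_url(url: str) -> str:
    """Return a hypothetical category label for a Tumblr url."""
    # Accept urls given without a protocol, e.g. "wallpaperfx.tumblr.com".
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path

    if host in ("www.tumblr.com", "tumblr.com"):
        if re.match(r"^/liked/by/[^/]+", path):
            return "liked-by"
        if re.match(r"^/search/[^/]+", path):
            return "search"
        if re.match(r"^/tagged/[^/]+", path):
            return "tag-search"
        return "unknown"
    # Anything of the form blogname.tumblr.com is treated as a regular blog.
    if re.match(r"^[a-z0-9-]+\.tumblr\.com$", host):
        return "blog"
    return "unknown"
```

A url that matches none of the patterns is reported as "unknown" rather than guessed at, which mirrors the idea that only recognizable Tumblr urls should be added to the blog list.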

Key Mappings:

  • Currently mapped keys:
    • double click on a blog adds it to the queue
    • drag and drop of blogs from the manager (left side) to the queue
    • space -- start crawl
    • ctrl-space -- pause crawl
    • shift-space -- stop crawl
    • del -- remove blog from queuelist
    • shift-del -- remove blog from blogmanager
    • ctrl-shift-g -- manually trigger the garbage collection

Getting Started:

The default settings should cover most users. You should only have to change the download location and the kind of posts you want to download. For this, in the Settings (click on the Settings button in the lower panel of the main user interface) you might want to change:

  • General -> Download location: Specifies where to download the files. The default is a Blogs folder relative to the location of TumblThree.exe.
  • Blog -> Settings applied to each blog upon addition:
    • Here you can set which posts newly added blogs will download by default. To change what an individual blog downloads, click on the blog in the main interface, select the Details tab on the right and change the settings. This separation lets you download different kinds of posts for different blogs. You can change the download settings for multiple existing blogs at once by selecting them with shift+left click for a range or ctrl-a for all of them.
    • Note: You might want to always select:
      • Download Reblogged posts: Downloads reblogs, not just original content of the blog author.

Settings you might want to change if the download speed is not satisfactory:

  • Connection -> Concurrent connections: Specifies the number of connections used for downloading posts. The number is shared between all actively downloading blogs.
  • Connection -> Concurrent video connections: Specifies the number of connections used for downloading tumblr video posts. The vt.tumblr.com host regularly closes connections if the number is too high. Thus, the maximum number of vt.tumblr.com connections can be specified here independently.
  • Connection -> Concurrent blogs: Number of blogs to download in parallel.
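
To illustrate what "shared between all actively downloading blogs" means for the connection count, here is a minimal Python sketch using one semaphore for all blogs. It is an assumption about the general technique, not TumblThree's implementation. The function download_all and the simulated transfer are hypothetical.

```python
import asyncio


async def download_all(blogs: dict[str, list[str]], max_connections: int) -> int:
    """Simulate downloads for several blogs that share one connection limit.

    Returns the highest number of simultaneous "connections" observed.
    """
    limit = asyncio.Semaphore(max_connections)  # one limit shared by all blogs
    active = 0
    peak = 0

    async def fetch(url: str) -> None:
        nonlocal active, peak
        async with limit:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the real network transfer
            active -= 1

    async def crawl_blog(urls: list[str]) -> None:
        await asyncio.gather(*(fetch(url) for url in urls))

    # All blogs run in parallel, but every download passes through `limit`.
    await asyncio.gather(*(crawl_blog(urls) for urls in blogs.values()))
    return peak
```

Because the semaphore is created once and passed through every download, adding more parallel blogs does not raise the total number of open connections.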

Most likely you don't have to change any of the other connection settings. In particular, settings you should never change, unless you're sure you know what you are doing:

  • Connection -> Limit Tumblr Api Connections: Leave this checkbox checked and do not change the corresponding values of 90 connections per 60 seconds. If you still change them, you might end up with offline blogs or missing downloads.
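
A limit such as "90 connections per 60 seconds" can be enforced with a sliding window over the timestamps of recent requests. The sketch below is one common way to do that in Python. It is not taken from TumblThree, and the class name SlidingWindowLimiter and its methods are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period` seconds (illustrative)."""

    def __init__(self, max_calls: int = 90, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self._stamps = deque()  # times of the calls still inside the window

    def try_acquire(self) -> bool:
        """Record the call and return True if it fits the window, else return False."""
        now = self.clock()
        # Forget calls that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False
```

A caller that gets False would wait and retry instead of sending the request, which is how exceeding the server-side limit (and the resulting offline blogs or missing downloads) is avoided.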

Further Insights:

  • Note: All the following files are stored in json format and can be opened in any editor.
  • Application settings are stored in C:\Users\Username\AppData\Local\TumblThree\.
  • You can use the portable mode (settings->general) to store the application settings in the same folder as the executable.
  • For each blog there is also a database (serialized class) file in the Index folder of the download location, named after the blog (blogname.tumblr). It stores blog-related information such as which files have been downloaded, the url of the blog and when it was added. This allows you to move your downloaded files (photos, videos, audio files) to a different location without interfering with the download process.
  • Some settings aren't hooked up to the graphical user interface. You can view all TumblThree settings by opening settings.json, located in C:\Users\Username\AppData\Local\TumblThree\, in any editor. Their names should be self-explanatory. Some notable settings for fine-tuning the application include:
    • BufferSize: Sets the buffer size for downloading binary files (photos, videos) in multiples of 4KB. The default is 2MB, thus the BufferSize has a value of 512. Increasing this value reduces disk fragmentation, as more of the file is kept in memory before it is written to disk, but increases memory usage.
    • MaxNumberOfRetries: Sets the maximum number of retries if a tumblr server forcefully closes the connection. This might regularly happen on the tumblr video host (vt.tumblr.com) if too many connections were opened in parallel. After the limit is exhausted, the file is left truncated, but is also not registered as a successful download. Thus, the file can be resumed in the next crawl.
    • TumblrHosts: Contains a list of hosts that are tried for downloading _raw photos if the photo size is set to raw. If none of the hosts has the _raw version, the originally scanned host is tried with the next lower resolution (1280).
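
The relation between the BufferSize value and the actual buffer size is simple arithmetic: 512 multiples of 4KB give 2MB. The following Python sketch shows the calculation. It assumes that BufferSize is a top-level key in settings.json, which the text does not confirm, and the function name buffer_bytes is hypothetical.

```python
import json

CHUNK_BYTES = 4 * 1024  # BufferSize is counted in multiples of 4KB


def buffer_bytes(settings_json: str) -> int:
    """Return the download buffer size in bytes for a settings document.

    Assumes "BufferSize" is a top-level key (an assumption for this sketch).
    """
    settings = json.loads(settings_json)
    return settings["BufferSize"] * CHUNK_BYTES
```

For example, buffer_bytes('{"BufferSize": 512}') yields 2097152 bytes, which is 2MB.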

Changelog:

2018-07-05:

  • Implements the Tumblr login process and cookie handling in code instead of relying on the Internet Explorer for the Tumblr login process.

2018-06-09:

  • Fixes hidden Tumblr blog download problems caused by the new Tumblr ToS.

2018-05-20:

  • Programmatically agrees to new ToS and GDPR.
  • Implements SVC authentication changes. The SVC service is used to display the dashboard blogs (i.e. hidden tumblr blogs). Changes in this internal Tumblr api had blocked TumblThree's access.
  • Saves the last post id in successful hidden tumblr downloads.
  • Improves the text parser of the tumblr api and tumblr svc data models. Separated the slug from the url as the data models are inconsistent. Separated the photoset urls from the photo urls. Moved the date information into a separate column.
  • Minor text changes of some user interface elements.

2018-04-18:

  • Updates the tumblr blog crawler and the hidden tumblr datamodel to reflect tumblr api changes that break blog download of previous TumblThree versions.

2018-02-28:

  • Allows downloading only specific pages of hidden Tumblr blogs and of the tumblr search.
  • Improves the proxy settings. TumblThree now uses the default Windows (Internet Explorer) settings if not overridden within TumblThree.
  • Changes the behavior of the timeout value (Settings->Connection->Timeout). The timeout now applies to each 4kb file chunk instead of the whole file download, so it should better detect a stalled download or a dropped connection without canceling active downloads of larger files (e.g. videos).
  • Changes default timeout value (for new users) from 600s to 30s.
  • Fixes possible download of the same photo but with different resolutions. This happened if the _raw file download was interrupted (the timeout hit), then the same photo was queued for download with the _1280 resolution. If the blog was then subsequently queued again, the _raw file was downloaded next to the _1280 file.
  • Fixes reblog/original post detection in the tumblr hidden crawler.
  • Fixes the 'check blog status during startup' option.
  • Fixes download of password protected tumblr blogs.
  • Adds Mixtape, Lolisafe, Uguu, Catbox and SafeMoe parser (thanks to bun-dev).
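
The per-chunk timeout described in this release can be sketched as follows: the clock restarts for every chunk, so a slow but steady download survives while a stalled one is aborted. This is a Python illustration of the idea, not the C# code used by TumblThree. The function read_with_chunk_timeout and the read_chunk callable are hypothetical.

```python
import asyncio

CHUNK_SIZE = 4 * 1024  # 4kb chunks, as in the changelog entry


async def read_with_chunk_timeout(read_chunk, timeout: float) -> bytes:
    """Read until end of stream; the timeout applies to each chunk, not the whole file.

    `read_chunk` is any coroutine function taking a byte count and returning
    bytes (empty bytes at end of stream), e.g. `asyncio.StreamReader.read`.
    """
    data = bytearray()
    while True:
        # A fresh timeout per chunk: raises asyncio.TimeoutError only if a
        # single chunk takes longer than `timeout` seconds to arrive.
        chunk = await asyncio.wait_for(read_chunk(CHUNK_SIZE), timeout)
        if not chunk:
            return bytes(data)
        data.extend(chunk)
```

With a whole-file timeout, a large video could be canceled even though data is still arriving. With the per-chunk variant, only a connection that delivers nothing for the full timeout is treated as dropped.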

2017-12-31:

  • Fixes a bug that released the video connection semaphore too often. That means the slider in the settings for limiting the video downloads didn't work at all. It should properly limit the connections to the vt.tumblr.com host and prevent incomplete video downloads now.
  • Includes a rewrite of the blog detection during blog addition. It should reduce latency if you mass add blogs by copying urls into the clipboard (ctrl-c). Offline blogs aren't added anymore.
  • Notifies the user when a connection timeout has occurred. The message states whether the timeout has occurred during downloading or crawling. If it happened during crawling, you might want to re-queue the blog at some point to grab missing posts. A connection timeout should only happen if your connection is wonky. You can decrease/increase the timeout in the settings (settings->connection).
  • You can now specify in the Details-panel for each blog where its files should be downloaded. If the text box control is empty, the files are downloaded as in previous releases in the folder specified in the global download location (settings->general), plus the blog's name.
  • Imgur.com linked albums in tumblr posts are now entirely downloaded if enabled (details panel->external->download imgur). Previously, only directly linked images were detected.
  • Adds an option to load all blog databases into memory and compare each to-download binary file to all databases across TumblThree before downloading. If the file has already been downloaded in any blog before, the file is skipped and will not be counted as downloaded. You can enable this in the settings (settings->global).
  • Allows adding hidden tumblr blogs using the dashboard url (i.e. https://www.tumblr.com/dashboard/blog/blogtobackup).
  • Allows adding all blog types without the protocol prefix (e.g. wallpaperfx.tumblr.com, www.tumblr.com/search/cars).
  • Adds an option to enable a confirmation dialog before removing blogs (#186, #130, #98). It's off by default.
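
The option from this release that skips files already downloaded in any blog can be pictured as a lookup of the file name across all loaded blog databases. The Python sketch below makes that concrete under the assumption that each database can be reduced to a set of file names. The names file_name_of and already_downloaded, and the example host in the usage note, are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def file_name_of(url: str) -> str:
    """Return the last path segment of a url, ignoring host and query string."""
    return posixpath.basename(urlparse(url).path)


def already_downloaded(url: str, indexes: dict[str, set[str]]) -> bool:
    """Return True if any blog's index already contains the url's file name.

    `indexes` maps a blog name to the set of file names recorded for it.
    """
    name = file_name_of(url)
    return any(name in files for files in indexes.values())
```

For example, with indexes = {"a": {"tumblr_abc_1280.jpg"}}, the url https://media.example.com/x/tumblr_abc_1280.jpg would be skipped no matter which blog it is queued for.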

2017-11-17:

  • Adds support for downloading Imgur.com, Gfycat.com and Webmshare.com linked files in tumblr posts.
  • Improves downloading of tumblr liked/by photos and videos.

2017-10-20:

  • Restores bandwidth limiter functionality.

2017-10-13:

  • Changes the default _raw photo host.

2017-10-09:

  • Fixes crawler stop in hidden tumblr blog downloads.
  • Adds options to set the default blog settings for the download from time, download to time and tags in the settings menu.
  • Adds some (ar, el, es, fa, fi, he, hi, it, ja, ko, no, pa, pl, pt, th, tr and vi) google translate translations.

2017-09-08:

  • Can download password protected blogs of non-hidden blogs.
  • Minor UI updates.

2017-08-22:

2017-08-21:

  • French, Spanish and simplified Chinese translations.
  • Removes user interface lag during blog addition.
  • Allows to set the buffer size for downloading binary files in the settings.json in multiples of 4KB. The variable is called BufferSize. The new default is 2MB, thus the BufferSize has a value of 512. Previously it was set to 4KB, but apparently Windows does not do any useful caching on NTFS if multiple writes are concurrent and async. Thus, this should reduce disk fragmentation.
  • Uses .NET Framework 4.6 now as it should be available for all supported Windows versions (Windows Vista and above).
  • Improved the selection handling in the details panel. If multiple blogs are selected, old values are now kept if they are the same for all blogs and changes are immediately reflected.
  • Audio file download support for tumblr and hidden tumblr blogs.
  • More code refactoring.

2017-07-03:

  • Can download hidden (login required/dash board) blogs.

2017-06-30:

  • Improved performance and bugfixes.

2017-06-20:

  • Downloads high resolution (_raw) images.
  • Updated translations (German and Russian).
  • Applies changed settings immediately.

2017-06-04:

  • Sets the file's "Date modified" shown in the Explorer to the post's time.
  • Allows downloading single pages or ranges of blog pages.
  • Full screen media preview.
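
Setting a downloaded file's modification date to the time of its post amounts to a single call that overrides the file timestamps. Here is a minimal Python sketch of the idea. It assumes the post time is available as a Unix timestamp, and the function name set_modified_time is hypothetical.

```python
import os


def set_modified_time(path: str, post_timestamp: float) -> None:
    """Set a file's access and modification times to the post's Unix timestamp."""
    os.utime(path, (post_timestamp, post_timestamp))
```

After this call, sorting the download folder by "Date modified" orders the files by when they were posted rather than when they were downloaded.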

2017-05-20:

  • Option to skip reblogged posts.
  • Improves detection of inlined photos and videos in text posts (e.g. in answer posts).

2017-05-14:

  • Portable mode.
  • Downloads liked photos and videos.

2017-04-18:

  • Code refactoring.
  • Uses async/await in most of the code instead of tasks from the threadpool.
  • Uses a consumer producer pattern for grabbing and downloading as the Tumblr api v1 is now rate limited.
  • Downloads are now resumable.
  • Data files are now saved as json instead of binary.
  • Reduces memory usage by splitting off the downloaded file list and loading it only when needed.
  • Improves ui responsiveness.
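
The consumer producer pattern mentioned in this release decouples the rate-limited step (asking the api for post urls) from the downloading step. The Python sketch below shows the general shape with a bounded queue. It is an illustration only, and the function crawl_and_download and its simulated work are hypothetical.

```python
import asyncio


async def crawl_and_download(pages: list[list[str]], workers: int = 2) -> list[str]:
    """One producer finds urls page by page; several consumers download them."""
    # Bounded queue: the producer waits if the consumers fall behind.
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    done: list[str] = []

    async def producer() -> None:
        for page in pages:  # stand-in for rate-limited api requests
            for url in page:
                await queue.put(url)
        for _ in range(workers):  # one stop marker per consumer
            await queue.put(None)

    async def consumer() -> None:
        while True:
            url = await queue.get()
            if url is None:
                return
            await asyncio.sleep(0)  # stand-in for the actual file download
            done.append(url)

    await asyncio.gather(producer(), *(consumer() for _ in range(workers)))
    return done
```

Downloads can start as soon as the first urls are found, instead of waiting for the whole blog to be scanned.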

2017-01-08:

  • Improves the speed of the network code.
  • Adds an option to use a http proxy.
  • Downloads inline images of tumblr posts.
  • Added Russian translation.

2016-12-13:

  • Improves the ui scaling of the main window for smaller resolutions.
  • Prevents crawling of offline blogs.
  • If the same blog is in the queue multiple times and one instance is already active, any other free crawler task skips and removes the duplicate entry and proceeds to the next inactive blog in the queue.
  • Improved German translation.

2016-12-10:

  • The check for already downloaded files is now independent of the actual host and based entirely on the filename. It looks like the host/mirror does actually vary, which would result in the file being downloaded again because its url changed.
  • Add scrollbars to the settings window if the controls do not fit.
  • Safely replaces blog indexes. If there is an error (e.g. no disk space left) during the update of the index file, the old state should not be corrupted anymore.
  • Changes some colors and adds an alternate color for the blog manager.
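
Safely replacing an index file is commonly done by writing the new content to a temporary file and then swapping it into place, so that a failure (e.g. a full disk) never damages the old file. The Python sketch below shows that general technique. Whether TumblThree does exactly this is not stated, JSON is used here only for illustration, and the function name save_index_safely is hypothetical.

```python
import json
import os
import tempfile


def save_index_safely(path: str, index: dict) -> None:
    """Write `index` to `path` without ever leaving a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory, so the final step is a
    # rename on the same volume rather than a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(index, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data has reached the disk
        os.replace(tmp_path, path)  # swap in the new file in one step
    except BaseException:
        # On any error the old index at `path` is still untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If writing fails midway, only the temporary file is lost and the previous index remains valid.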

2016-11-23:

  • Fixes application crashes which occurred by adding tumblr blogs without title or description.
  • Decreases determination time of already downloaded files for large blogs (>100,000 posts) by at least three orders of magnitude.

2016-11-22:

  • Creates more meta information (post id, reblog key, timestamp, tags, slug, title) of the posts, including image, video and audio types.
  • Fixes the progress calculation by adding the found duplicates to the progress. Also states them in the details window.
  • Fixes a locking issue for the meta files (*.txt) which resulted in incomplete downloads.
  • Updates the details and settings view for a better understanding on how to use the application.

2016-11-20:

  • Fixes proper counting of downloaded files.
  • Fully implements the details window (context menus, etc.).

2016-11-18:

  • Fixes the initial automatic queue restore function.
  • Fixes the autodownload function.

2016-11-16:

  • Picture and video preview in the details window.
  • Allows the download of text, audio, quote, conversation and link type posts.
  • Download of text, audio, quote, conversation and link posts and of .gif images is now configured per blog instead of by a global setting, and can be turned on/off in the details view. The settings in the settings window are used as a template for newly added blogs.
  • Modified .tumblr index files are now always saved upon application exit, regardless of the crawler's state. Previously, if the application was closed during an active crawl, the index wasn't updated.
  • Inlined the WAF code under lib for easier project setup for newcomers that want to contribute code.
  • Bugfixes, UI and memory enhancements.

2016-10-15:

  • Bandwidth throttling.
  • Connection timeout settings.
  • Auto queue and start download function.
  • Saves the state of the UI (column size and order).
  • Download of hidden blogs.
  • Fixes saving of the ratings and tags.

2016-06-11:

  • Added German translation.

2016-06-10:

  • Support for tumblr.com hosted videos. Check the settings window to enable video download (default: off).

2016-06-08:

  • Tag crawling now works properly and is case-insensitive.
  • Fixed a crash caused by blogs with a zero image count in the queue list (e.g. the blog is offline, or the tag search didn't return any images).
  • Fixed randomly occurring crash in the clipboard monitor.
  • Changed icons (requested by the TumblOne creator).

2016-04-12:

  • Now with progress output in the Queue tab (during url crawling for imageurls -- the number of posts evaluated; during downloading -- the current image url).
  • Added missing resume button in the taskbar control.

2016-04-11:

  • Support for urls starting with https:
  • Fixes application crashes upon pressing the Stop button, caused by improper exception handling.
  • Now always saves the index file. Previously, if crawling was still active, the application would exit without waiting for the crawler tasks to finish and save their state. Now there is a grace period for the tasks to finish. The same was true if the crawl was paused and the application was then exited.

Download:

Comments

John Albrecht (not verified)
Mon, 11/04/2016 - 20:38

Is there a way to filter tumblr posts by date?

What I'm getting at is that if you can set a certain date to begin crawling the tumblr blogs at while ignoring all older images/posts, it may be a good workaround that sidesteps the need to build in support for TumblOne and TumblTwo index files. Or just ignore images before said date and don't download them.

We could all simply set the date to begin crawling to the day or day before our last complete crawl of the various blogs, and grab everything newer. Crawling those earlier dates could work too, but just don't download any images from posts set before that date, that way the new index files are essentially up to date.

Just a thought I had that I figured I would pass along.

Taranchuk (not verified)
Sat, 16/04/2016 - 02:39

Hi!

1) Please add a function to search all of Tumblr, across all blogs, by tags, as was implemented in tumblripper. That feature does not work for me there, so I would like this program to offer a tag search across Tumblr.
2) I need a function that lets the program rename files by adding the text descriptions that caption the images in blogs to the file names. I would also like the images' tags added to the file names!

Thank you very much for such a program !

Taranchuk (not verified)
Thu, 21/04/2016 - 10:52

I'm sorry, I just noticed that I did not finish the text in 1), which may distort its meaning.

By 1) I mean that a function is needed for searching and downloading images across all of Tumblr, with all blogs, by tags. The site already offers a function to search images by tags; the program would need to download the images that are found. Would that be feasible?

Jerome (not verified)
Tue, 26/04/2016 - 18:13

Hi, it's me (again lol)

This is going to be long, and I mean really long, because I have several Bugs, Features & Suggestions I'd like to submit, each. So I became very anxious to give this great app a spin...I decided to painfully & manually migrate my tumble 1 & 2 data over...and upon doing so, I discovered quite a few issues. So I'll start with Bugs first in order of importance (to me).

BUGS:
1) Uses tooooo much memory...upon starting the app, on my system, it uses about 3.5gb (yes, GB)
I do have quite a few blogs imported, so that is why its so high for me. But I think the app can be further optimized so that it can use less memory resources.

Possible Solution:
Once app starts up & read index files to check online status & current img download & count, app should release index file from memory, which I think would greatly reduce memory consumption. Then, only time index file should be called for again is if blog is currently updating/downloading, then release index file for that blog again. The blogs that are in que that are waiting to be downloaded should only have their index files loaded once they actually begin downloading...and not while waiting to be downloaded. This in theory should keep app resource to a very minimum.

2) When adding an offline blog to que, it crashes app, no matter how big or small the blog is.

3) Upon crashing, app does NOT load up que list before crash.

4) Upon crashing, images still are there in their folders, but Downloaded Images count isn't.
(its not really a biggie)

5) Cant run multiple tumblthree's on the same harddrive because the 2nd or 3rd instance reads from the 1st instance.
And if 2nd/3rd instance overwrites that data, then the next time I use instance 1, it will reflect data from the last instance I used. With tumblone/two I was able to load multiple instances to have my own personal "folder/tag" system.

Solution: Well, I heard others mentioning something about tags, so maybe that can be implemented to help sort blogs.
Tags/Folder for all cars blogs, truck, bikes, boats, etc....multiple tags would be welcomed much as well.
(I'd personally still like to be able to use multiple crawlers though, for better sortability)

6) Cant download from blogs that have custom domain names anymore.
I was able to do so with tumblone/two

7) Stars do NOT maintain ratings if you rate a blog before or after you crawl it.
Only way to get them to maintain your rating is to rate them while they are in que waiting to download.

8) if set parallel imgs to 5 or below, it wont crawl.
Possible fix could be "scan img urls" could be its own independent action/command and not linked/tied to how many imgs to be downloaded in parallel.

FEATURES & SUGGESTIONS:

1) Confirmation for Removing Blog (most important one)
I accidentally removed several blogs (I meant to press remove from que) and only way for me to find out which blogs they were was for me to compare my folders against the index files. I think it also redownloaded the same images again, Im not sure, haven't doubled checked whether it does or doesn't.

2) sort or download blogs to certain folders/directories before/afterwards to categorize blogs
this would be extremely convenient and would remove my need to run multiple instances

3) tag/filters/comments section for blogs to quickly locate whatever you're looking for
(this is part of the other reason why I wanted to run multiple crawlers)

4) auto update "x" mins, "x" hrs, "x" days, weekly, bi-weekly.....auto update schedule at "x" time

5) should NOT be able to add blog to que more than once at a time.

6) dont delete tumblr file, remove to another location to recheck if blog come back online later in future.
(another reason why I'd like to run multiple instances...to recheck offline blogs)

7) make top tabs re-organizable/removable

8) allow imgs greater than 1280
.
9) if an img have different resolutions, only download highest one.
yes, I know that I can use some 3rd party tool to clean up folder and removing smaller dimension images. then again, it may actually be best to use a 3rd party tool...because this would require extra code to write for you.

10) ability to select different color stars for rating, or different background color for the app?.
The yellow stars are hard to see on a white background (on my led monitor) Darker stars would be an easy fix.
Blue, Green, Red, ect,.

11) option to include gif on a per blog basis.
Not another global "Yes Gif's or No Gif's" because most of us probably dont want gifs from every blog, but it would be neat & cool if we could only download gif's from certain ones that we'd like. Otherwise, after all blogs have been crawled, we'd have to restart app & enable gif's, select the blog we'd like, then once done, disable gif's & restart app again.

12) import img urls list?
this would be a really great feature too

13) Much quicker way to drag blogs while in que order to prioritize downloads.
Drag freely. Currently, I have to drag to top of app, and wait for the app to scroll to top of que...which is pretty slow.

14) MultiSelect blogs in que to drag to prioritize downloads.
Currently, can only grab & drag 1 blog.

15) progress bar should be replaced with numbers

16) Yep, you guessed it.....Video support (for tumblr hosted videos)
:)
Some blogs have billions of videos, so if we can have the option to download from a certain time period, then that would be good too.

Well, I did warn ya that this would be long, lol. But those additions would make the ULTIMATE tumblr ap, for me at least :)
I do have 1 small question, what is the purpose to remove blog index after crawl? To only download a blog once?

zab
Wed, 27/04/2016 - 13:39

Thanks again for your list and exhaustive comment.

I'll look into all of it once I have more time again to really address things more deeply. I've also noticed some of your points while using the program, but haven't fixed anything yet.

Jerome (not verified)
Sat, 15/04/2017 - 09:15

Hi buddy, long time no see (again, lol)

So i have a problem that you can easily & quickly help me solve.
Ever since I manually imported ALL of my tumbltwo blogs over to tumblthree (when you first created it) I haven't used this great app. Now that I need to use it, I can't because of a few issues:

a) I can't find any info on how to convert my original tumblthree blogs using v1.0.4.31.
I doubled check and noticed you said this version would modify/convert the files on first run. After I noticed that, I then went to my temp folder "C:\Users\Username\AppData\Local\TumblThree\" to delete those files and was hoping everything would be reset so that next time I start tumblthree, it would convert blog.tumblr files into the new format. That didn't work.

2nd thing I noticed is that once I scrape a brand new blog from tumblr, it does not show up/reload in the app as a blog already being scraped, so that means I can't rescan the blog at all, cause it just doesn't show up in the app.
Is that a bug with only with v1.0.4.31? Common sense tells me to not even worry about that because whenever I successfully convert my existing tumblthree .tumblr files into the new format, then I'd immediately start using the latest stable/pre-release version anyway.

As usual, very nice job buddy. I see you've been pretty busy with very useful updates. Since I was last here, I wasn't sure how long it would take you to address the memory issue I described in my previous post, so I eventually took the plunge and upgraded to 64gb of memory :)

Well, that doesn't matter in this system anymore, because I then got a 2nd system (ryzen) and took half of the memory and put it in there, so both systems have 32gb each. Now that memory usage is greatly improved, I can scrape happily ever after :) especially when you implement other sites, especially instagram. That would really be a dream come true.

I can't give enough thanks :)

Jerome (not verified)
Sun, 16/04/2017 - 01:44

Nevermind, I finally converted them to the new format, and am now using the latest pre-release, v1.0.4.41 :)

Will report back to you any bugs/improvements I find or can think of.
NICE WORK!!!

Jerome (not verified)
Mon, 17/04/2017 - 16:12

Thanks for the quick response. I've followed your recommendations.

I see that you've made quite a few good improvements to the UI. I'm experiencing some bugs with the latest .42 release.

1) When scraping, the "last complete crawl" doesn't get updated if:
a) blog gets stuck (then you have to do a force restart)

b) there hasn't been any new content added to the blog. I believe this should probably be changed to reflect the last time you crawled the blog, regardless of whether something was downloaded during the crawl. For example, in the above post I said that I haven't used the app since you first created tumblthree, so my last crawl dates reflect April 2016. Even though I just scanned/crawled the blog, it still shows the date as April 2016 (there wasn't any new content added).

2) It appears that crawling is missing/skipping a whole bunch of content. As I just said, I haven't used tumblthree since last April, so I know that a lot of new content has been added to the blogs, which isn't being reflected during the scanning process. I even went as far as trying a small blog with fewer than 30 images, and a few of those photos were skipped. It seems that it skips photos that the blogger reblogged and did NOT post themselves.

3) Upon shutdown, I notice that the app is still running in task manager. This presents a huge problem if a blog is stuck for whatever reason and you try to force a restart to continue scraping. Because the app isn't terminating fully, when you resume, it's still in that "stuck" mode. The only fix is to end the process in task manager.

4) The "skip gif" option occasionally still downloads gifs. That problem is very minor; the bigger problem is that gifs are still displayed in the preview, which I think is also causing the app to get stuck (sometimes). I think if one decides to disable any type of media, the preview for that media should be disabled too. This is quite important to me: since I'm trying to update a ton of blogs from a year ago and am experiencing numerous bugs, I can't tell whether a blog is actually downloading and running fine unless I can see the images in the preview. And because it shows gifs and is extremely buggy (it keeps showing the same gifs, and gifs get stuck/stop moving), I'm thinking that this may have something to do with why quite a few images are being skipped.

Seeing the images (and how rapid they change) is a visual cue to me to know that blogs are running perfectly fine, and app is running perfectly fine.

5) For some weird reason, on some of my blogs "number of downloads" has a negative value.

6) If a blog gets stuck, and you remove it from the queue and stop the crawl, then add another blog to the queue (or already have blogs loaded into the queue) and restart the crawl, the blogs that were in the queue don't download any media; they only scan for files. I have to investigate this further to track down the issue so that I can give you more precise information.

I'm going to stay with the .42 release, but if I have too much trouble trying to update my blogs, I'm going to have to downgrade to a more stable previous release. I want to THANK YOU VERY MUCH for coming up with a better way to make this more memory efficient. That was one of my biggest issues with tumblthree initially. It's also very brilliant to offload the files in the manner that you did :)

Will report back any more findings.

Jerome (not verified)
Wed, 19/04/2017 - 03:50

I see you removed v.42. Do I need to use .41?

If I use .41, will it solve my issues of skipping too many posts?
If I use .41, will the downloaded images from .42 be overwritten & redownloaded?
Or will .41 recognize that the images have already been downloaded & skip those?

Is .41 stable enough to let me load in all my blogs and let them crawl automatically without crashing?
Or do I simply need to just revert back to .31?
If the answer is yes, will the downloaded images from .42 be overwritten & redownloaded?
Or will .31 recognize that the images have already been downloaded & skip those?

anonymous (not verified)
Tue, 10/05/2016 - 15:05

I'm liking this program so far, but I got a few problems with it.

1. Program keeps crashing when I try to add a blog through the add blog textfield. I have to load the tumblr page in a web browser multiple times until tumblthree adds it in.

2. Download by tags does not seem to work. It might actually be doing the opposite and filtering them out instead. This is very annoying since there are some tumblrs out there that have 10,000 reblogs and all I want are the guy's own posts.

I would like to see you add an "ignore reblogs" button. If all this gets fixed up, I can safely say this will be the best tumblr downloader compared to the rest.

One last thing I would like to see is if you could eventually support websites like Soup.io as well (Very similar site to tumblr, doesn't seem anyone made a downloader for that site yet). Thanks again for the great program.

John (not verified)
Sat, 21/05/2016 - 20:17

Does either this or Tumbltwo grab reblogs? If not, can that be added?

Bernard (not verified)
Sat, 28/05/2016 - 12:59

Hi, I really like this software.
I'm using a Mac.
Can TumblThree or TumblTwo run in a virtualization environment like Virtualbox or VMware running Windows 7 ?
Thank you.

zab
Sat, 28/05/2016 - 20:36

Yes, it can run in a virtualized Windows.
Virtualbox, VMware or Parallels should all work fine.

John Albrecht (not verified)
Wed, 01/06/2016 - 16:16

I went ahead and abandoned TumblTwo for TumblThree. It took some doing, but I realized I could update both on the same day and then use DupeGuru to wipe out all the duplicates added to TumblThree. Now TumblThree is up to date and I can keep using it from here on out. I ran into a few issues with it, but it seems pretty great so far.

Xavier (not verified)
Thu, 09/06/2016 - 17:39

Good work.

But when I tried to use this today, it always crashed, and TumblOne and TumblTwo both did not work either. Is there anything wrong? Thanks.

zab
Fri, 10/06/2016 - 21:20

hi!

The exact error message would be helpful, either for Tumbltwo or Tumblthree. Tumblthree does not run under Windows XP and for TumblTwo there is a special version for Windows XP. Just an educated guess.

Tobi (not verified)
Sat, 11/06/2016 - 14:30

Hi,

first of all, many thanks for your work! I know how much effort it takes to build such an application.

Here are my experiences on a Win7 64 Bit (German) System.

I get both of these error entries in the Windows system event log (Application):

#############################################################################

- System

- Provider

[ Name] .NET Runtime

- EventID 1026

[ Qualifiers] 0

Level 2

Task 0

Keywords 0x80000000000000

- TimeCreated

[ SystemTime] 2016-06-11T10:57:04.000000000Z

EventRecordID 2478474

Channel Application

Computer XXXXX

Security

- EventData

Application: TumblThree.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.NullReferenceException
at TumblThree.Applications.Controllers.ManagerController.ExtractBlogname(System.String)
at TumblThree.Applications.Controllers.ManagerController+<AddBlogAsync>d__55.MoveNext()
Exception Info: System.AggregateException
at System.Threading.Tasks.TaskExceptionHolder.Finalize()

#############################################################################

- System

- Provider

[ Name] Application Error

- EventID 1000

[ Qualifiers] 0

Level 2

Task 100

Keywords 0x80000000000000

- TimeCreated

[ SystemTime] 2016-06-11T10:57:07.000000000Z

EventRecordID 2478475

Channel Application

Computer XXXXX

Security

- EventData

TumblThree.exe
1.0.0.0
575ab71e
KERNELBASE.dll
6.1.7601.23418
5708a89c
e0434352
000000000001a06d
1ce4
01d1c3cfddb5f5e3
C:\TumblThree\TumblThree.exe
C:\Windows\system32\KERNELBASE.dll
3cf8ea6b-2fc3-11e6-9f06-c04a00035fa0

#############################################################################

The error occurs when the automatic clipboard observation is switched off and you try to add a blog-URL over the text box with the "Add-Blog"-Button.

Furthermore, only a double click on the blog entry in the left window makes the blog show up in the right Queue window. Then the Crawl-Button works. Without that previous double click the Crawl-Button doesn't work. Is that intended?

Some trifles are:

- The progress bar doesn't work. There is only a small grey horizontal line.
- The Details window is empty all the time.

I hope it will help to find some errors in that fine application.

Greetings and many thanks again

Tobi

zab
Sat, 11/06/2016 - 15:32

Whoops, sorry. The application crash is fixed already. Wonder how I missed that one.

And yes. The 'Crawl'-button starts tasks in the background (the number can be specified in the settings window) that process the queue. So, if there is nothing inside, they don't do anything except regularly check whether there is some input in the queue. If you add anything, one task will crawl one blog (indicated by the green bar next to it) until it's done, and then continue with the next one if there is still anything left in the queue.
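
The behaviour described here (a fixed number of background tasks that each take one blog at a time from the queue and simply idle while it is empty) can be sketched roughly as follows. This is only a minimal Python illustration of the pattern, not TumblThree's actual C# code; `crawl_blog`, `worker`, `run_crawlers` and the task count are hypothetical names.

```python
import queue
import threading

def crawl_blog(name):
    # Hypothetical stand-in for the real crawl; it only reports the blog name.
    return f"crawled {name}"

def worker(blog_queue, results, stop_event):
    # Each task keeps checking the queue; an empty queue just means "wait".
    while not stop_event.is_set():
        try:
            blog = blog_queue.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing queued yet, check again shortly
        results.append(crawl_blog(blog))  # one task crawls one blog at a time
        blog_queue.task_done()

def run_crawlers(blogs, task_count=2):
    blog_queue = queue.Queue()
    results = []
    stop_event = threading.Event()
    threads = [
        threading.Thread(target=worker, args=(blog_queue, results, stop_event))
        for _ in range(task_count)
    ]
    for thread in threads:
        thread.start()
    for blog in blogs:
        blog_queue.put(blog)
    blog_queue.join()   # wait until every queued blog has been processed
    stop_event.set()    # then tell the idle tasks to stop
    for thread in threads:
        thread.join()
    return results
```

With an empty list the tasks start, find nothing to do and stop again, which matches the "Crawl button seems to do nothing" observation when the queue is empty.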

The details window is empty because there isn't anything implemented yet. I was thinking of adding a picture preview there as in TumblTwo, but I'm not really sure if that's a good thing to add at all, as I guess it cannot display videos. And you can simply open the folder by right clicking on the blog in the bloglist and see the previews in the Windows Explorer anyway ;)

H3AsO4 (not verified)
Sun, 19/06/2016 - 03:35

I tried that blog and it works well for me.

zab
Sun, 19/06/2016 - 10:38

Yes, I guess it works now because the tumblr is now publicly viewable without logging in to the tumblr website. When I quickly checked the blog on the day he posted, it required authentication first.

TumblThree hasn't implemented that yet. It's not hard to do, as I guess it simply requires storing the cookie after successful authentication and using it to access the site.

bonnysn (not verified)
Fri, 24/06/2016 - 11:12

Quite happy with what you've done and looking forward to what your future work has in store.
Still waiting on compatibility, but thank you for catering.

Slow (not verified)
Tue, 28/06/2016 - 23:00

I have been using tumbltwo now for about 6 months, and am trying out tumblthree. Both are great programs.

I was wondering if it would be possible to add two options to the way it saves the files.

Using the current blog name is great, but could you add an option that would allow it to save the files into sub folders that match the file extension, such as .gif, .png, .mov, .mp4, .jpg and so on? This would be an option in settings, something like the "skip GIF files" option.

The other would be to allow the blog to be given a "category" folder as a prefix to the blog, so you could put similar blogs into a sub folder of the main "blog" folder. This would have to be a field you could edit within the blog entry itself.
The result would be that blogs could be placed within "category" folders, and within the blogs themselves, the files could be saved in sub folders by file extension. Another possible sub folder option might be by tags, if that makes any sense.

I have not attempted to use the tags yet.

Thanks for a great program.

Slow (not verified)
Mon, 31/10/2016 - 17:26

I know I requested this feature before, but in reading what I wrote it even confuses me.

The two features I requested were about how the reader saves the files into folders.
Currently it saves the files into a master folder which is set in the "settings" function. This allows all blogs to be saved in this master folder by the blog name.

I would like the ability to assign any blog to a "group" folder so that the file structure would be:
master/group1/blog1
master/group1/blog2
master/blog3
master/blog4
master/blog5
master/group2/blog6
master/group2/blog7

This would allow similar blogs to be grouped into a group folder by each blog.

The implementation of this could be a "group" column in the blog list where a special "group" folder could be added. Much like you do the "tags".

The other feature that would be useful would be the ability to select per blog whether the files should also be placed into additional sub folders by file extension.

Again this would be another field added to the blog list and would only need to be a checkbox kind of thing to enable or disable placing the files into a sub folder named after the file extension in that blog's folder. Here is an example:

master/group1/blog2/gif
master/group1/blog2/jpg
master/group1/blog2/png
master/group1/blog2/mov

Having the ability to set these by individual blog would give much greater control on how the files were saved.
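
The proposed layout boils down to a small path-building rule. The sketch below is only an illustration of that rule in Python; `target_folder` and its parameters are made-up names, not part of TumblThree.

```python
from pathlib import PurePosixPath

def target_folder(master, blog, group=None, filename=None, by_extension=False):
    # Start at the master download folder and add the optional group folder.
    path = PurePosixPath(master)
    if group:
        path = path / group
    path = path / blog
    # Optionally add a sub folder named after the file extension (without the dot).
    if by_extension and filename:
        extension = PurePosixPath(filename).suffix.lstrip(".").lower()
        if extension:
            path = path / extension
    return str(path)
```

A blog without a group and without the extension option simply ends up in `master/<blog>`, so the current behaviour stays the default.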

Thanks for a great program.

Slow

Really (not verified)
Wed, 29/06/2016 - 08:05

Thank you. This is amazing!
I wish that tumblthree could crawl blogs that require a login.
Thank you again.

Slow (not verified)
Fri, 01/07/2016 - 02:28

Whenever I add a new blog I get a red line at the top of the screen saying "ERROR # The blog already exists: xxxxxxxxx".
It does not matter if I use "add blog" or "check clipboard" to add the blog, both give this error.

The blog does not exist in the tumblthree index; it does exist on tumblr.

It adds the blog and downloads it just fine.

Slow (not verified)
Fri, 01/07/2016 - 15:47

I am not sure if I am doing things right, but so far I have not been able to make tumblthree save the rating settings or the column sizes I set.

The stars do turn yellow when I set them, and if I stretch a particular column wider or narrower, it does change. When I close tumblthree and then reopen it, those changes are gone: no ratings are selected, and the column sizes are back to the defaults, which do not allow the full date and time to be displayed for "date added" or "last crawl".

These are minor issues, but thought I should make note of them.

My system is Windows 10 Pro if that is of any interest.

I do have one question about the "de" directory. In the first ZIP file I downloaded back in May it was present, but in the current ZIP file it is not there. Has it been eliminated, or is it now not necessary? I copied it over to the new update just in case it was not an intentional omission.

My comment about the "ERROR" when adding a new blog for some reason has vanished. Intensive testing has not been able to reproduce that failure. I did a total fresh install of everything, including deleting all blogs, and the problem was gone.

Jake (not verified)
Sun, 03/07/2016 - 06:39

I'd like to start off saying I appreciate your continuous development on this project and I enjoy the features it offers so far.

The only problem I encountered so far is that it didn't completely download the images from the tumblr. It scanned for images with a tag and only downloaded 271/436. I tried adding it to the queue again, but it just stops at 271 again.

Joshua (not verified)
Sat, 16/07/2016 - 02:40

Hi, just want to say nice work on the program. It is excellent. I just have a few features to request for the next version.
1. Tag Categories- I think it would be an excellent addition for the program to download the images from the blog and categorize them according to their tags, by naming folders after the tags and then adding the pictures to those folders. This would instantly give this software an excellent backup feature that many other Tumblr downloaders lack.

2. Auto-Update- Another comment mentioned this feature of setting the program to automatically update the tumblr blog download at certain and pre-set times like daily, weekly, etc.

3. Highest Res Download- I think setting the software to download only the highest resolution of any image would be an invaluable addition.

Thanks for the software. Nice work. Once this software reaches prime time, I'm quite certain users and bloggers alike won't mind purchasing it.

Astra (not verified)
Wed, 20/07/2016 - 23:52

Love it! It's super useful thank you! I was just wondering:
Does it also pick up images that are in the comment section of a tumblr post?
I'd like to be able to download those too, sorry if this has already been answered or written somewhere, I probably missed it.
thanks!

Brian (not verified)
Sat, 23/07/2016 - 19:27

I just downloaded TumblThree and am using it for the first time.
I wanted to download images with a tag. I added the blog, entered the tag (which is hyphenated - say First-Last), then added to queue. It indexed and said there were like 9,000 images (rounding for simplicity). That was it, it wouldn't/didn't do anything else. I kept trying to figure out how to make it download... On a whim I removed the tag, and voila -- it's downloading now.
So it appears that hyphenated tags do not work?

Joe (not verified)
Mon, 29/08/2016 - 22:35

Hi, first of all thank you for a great software.
Second, I have a suggestion/request ;) I have a huge list of tumblr post urls, and it would be great to be able to import an xml list of these post urls and let tumblthree download the content of each particular post (images and videos).

example links:
e.g. (http://fullthrottleauto.tumblr.com/post/148095330854, http://fullthrottleauto.tumblr.com/post/148118945754, etc)

And third: Again thank you for a great work.

Preston Martin (not verified)
Thu, 08/09/2016 - 02:11

Hello. Quite a few times when I have closed Tumblthree and then reopened it, it would give me an error saying it cannot load the library. I have blogs that have been downloaded and put in order and such, and now I have to start all over every time that happens. Why is this application so unstable?

zab
Thu, 08/09/2016 - 06:41

Any chance you immediately shut down after closing the application and the download was still running? Or restarted the application immediately? No disk space left on the drive where the download folder is?

The UI closes when you exit, but the application still continues in the background until all previously started files have been successfully downloaded; then the library file gets written to the same drive where you downloaded your files. That should not corrupt the library, I think. But it certainly will if there is no disk space left. It's just a guess, though. It has never happened to me, but I actually don't really use the application myself, or any other downloader, right now. So basically, I cannot say.

Happy User (not verified)
Wed, 08/03/2017 - 13:04

Hello-
I think your program is really cool. I too did have the same error after my system crashed, where I could not load library.
I'd like to know if there is a way to backup the configuration file so when the problem happens again I can keep my current blogs.
Is there a way to import old blog data which you have downloaded (or merge it into an existing/new session?)
Thanks

zab
Wed, 08/03/2017 - 14:26

The file which is broken should be mentioned in the error if you move your mouse above the blue box.
For a backup, you can save the Index folder in your download location, as it contains the blog specific data. Lastly, maybe you can even fix the file manually if you compare it to a healthy one by opening both in your favorite text editor. There is probably just something in its structure that is corrupted. It's a .json file now.
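
As a rough illustration of both suggestions (copying the Index folder and finding the file whose structure is broken), here is a minimal Python sketch. It assumes only what is stated above: the folder is called Index, lives in the download location, and holds JSON files. The function names and the backup location are made up for the example.

```python
import json
import shutil
from pathlib import Path

def backup_index(download_dir, backup_dir):
    # Copy the whole Index folder. copytree raises an error if the target
    # already exists, so an older backup is never silently overwritten.
    source = Path(download_dir) / "Index"
    target = Path(backup_dir) / "Index"
    shutil.copytree(source, target)
    return target

def find_broken_files(index_dir):
    # Return the names of all files in the folder that do not parse as JSON.
    broken = []
    for path in sorted(Path(index_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            json.loads(path.read_text(encoding="utf-8-sig"))
        except ValueError:  # covers JSON syntax errors and undecodable bytes
            broken.append(path.name)
    return broken
```

Running `find_broken_files` on the backup rather than the live folder keeps the check away from files the application may still be writing.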

El Wood (not verified)
Mon, 12/09/2016 - 16:17

If using Tumblone, Tumbltwo and Tumblthree, why does each program give a different number of images? For instance, if using One, it may save 1000 images, yet when I use Two it may only save 980 images, same with three?

zab
Tue, 13/09/2016 - 21:34

The 1000 images are in the folder, or stated by the application itself?

TumblThree at least first displays the number of posts in total, and after the evaluation the number of found images (for the given tag). I've actually never checked if the count matches, but TumblThree and TumblTwo should download all images and images from imagesets.

Anomalous (not verified)
Wed, 14/09/2016 - 01:33

It works great. Are there plans for adding support for inline images in the future?

Preston (not verified)
Sun, 25/09/2016 - 20:14

Hi there, before the update, I was able to "ctrl + C" on my bookmarks and they would automatically be added to TumblThree; now I have to manually enter each url into TumblThree for it to be added.

"A clipboard monitor that detects *http(s):// .tumblr.com* urls in the clipboard (copy and paste) and automatically adds the blog to the bloglist." This feature is not working anymore.

zab
Mon, 26/09/2016 - 14:35

Still works for me. I've just downloaded the 1.0.2 release and the clipboard manager works as in the previous versions.

Maybe just turn it off/on once?!
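
For anyone wondering what the clipboard monitor discussed in this thread has to recognise: it is essentially a pattern match over the copied text. The following Python sketch shows one way to pull blog names out of arbitrary text; the pattern and the function name are assumptions made for illustration and are not taken from TumblThree's source.

```python
import re

# Assumed pattern: http or https, one host label, then ".tumblr.com".
BLOG_URL = re.compile(r"https?://([a-z0-9][a-z0-9-]*)\.tumblr\.com", re.IGNORECASE)

def extract_blog_names(text):
    # Return unique blog names in order of first appearance.
    # Empty or missing text yields an empty list instead of an error.
    names = []
    for match in BLOG_URL.finditer(text or ""):
        name = match.group(1).lower()
        if name != "www" and name not in names:
            names.append(name)
    return names
```

Returning an empty list for text without a blog url (rather than a null value) also avoids the kind of null reference crash reported earlier in the comments for the add-blog text box.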

anonymous (not verified)
Wed, 12/10/2016 - 08:54

Hi, will there be a way to download higher resolution photos instead of it being maxed out at 1280?
Keep up the great work and thanks!

Slow (not verified)
Mon, 31/10/2016 - 17:32

Is it possible to make the progress bar about 3 times thicker than it currently is? It is so thin it is very difficult to see the change in color. I am using Windows 10 if that is of any significance.

Slow

zab
Mon, 31/10/2016 - 19:06

Should be easy and probably not more than 3 words of code. I'll change that in the next release.

Thanks for the suggestion!

Slow (not verified)
Tue, 01/11/2016 - 18:21

Thanks for the fast response. Each release makes tumblthree much more useful than tumbltwo.

If there were a voting system on what you work on, I would vote for adding the ability to use the "post date" for the photo rather than the date/time that the file is actually being stored on the persons computer like it is now.

My reasoning is this: I download thousands of photos from some 500 blogs. There is a huge number of duplicates due to "reblog". I then run a duplicate finder that matches the CRC, CHECKSUM, and file date. I always save the oldest file thinking that this usually would be the first appearance of this photo in the original blog. That logic works if by chance I happen to be downloading the original blog it was posted in first in the list of 500.

If the post date were used, then no matter what order the blogs were downloaded in, the oldest would likely be the original blog the photo was posted in.

I presume the post date would be that of the current blog and not that of the actual originating blog?

If the post date is really the date it was posted to the current blog you are downloading, no matter if it is an original or a reblog, then this method would always walk back the photo to the original blog that posted it if you happened to be downloading that blog.

Kind of a cool thing when you think about it.

When you do add this function, I hope you make it selectable in the event reblogs all use the original date from the originating blog, which would really mess up my present way of eliminating duplicates.
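
The duplicate handling described above (group identical files, keep the one with the oldest date) can be sketched like this in Python. It is only an approximation of that workflow: SHA-256 is used here in place of the CRC/checksum of the duplicate finder, the file's modification time stands in for the "file date", and the function names are made up.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path):
    # Hash the file content so identical pictures get the same key.
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()

def duplicates_to_remove(paths):
    # Group files by content, keep the oldest file of each group,
    # and return every other copy as a removal candidate.
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(path)
    remove = []
    for same in groups.values():
        same.sort(key=os.path.getmtime)  # oldest first
        remove.extend(same[1:])
    return remove
```

If the file date were set to the post date, the file kept by this rule would be the earliest post among the downloaded blogs, independent of the download order.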

Slow (not verified)
Tue, 01/11/2016 - 18:58

I have another idea I would like to propose. Add a function to settings so that tumblthree would close automatically when all blogs have been downloaded.
I have not tried the timed re-scan feature yet, but I presume you have to leave tumblthree running constantly for this time interval to trigger.
A more practical way would be to let the scheduling functions of Windows or Linux cron start the program. If the re-scan interval had been reached, it would trigger the re-scan, and if you have the "auto shutdown on completion" function checked, tumblthree would close and be ready to check the re-scan trigger the next time it is launched.
If it was not checked, then tumblthree would re-trigger the scan as soon as the time interval expired.
This would allow totally hands-free automation of downloads while not requiring the program to be constantly running (that is, if my assumption about how the re-scan works is correct).
The usefulness of an auto shutdown does not depend on a person using the re-scan function. It would be nice to optionally have tumblthree close when it has finished its work even if you were only doing a one-time scan.
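
To make the proposal a bit more concrete, the decision at program start could look roughly like the Python sketch below. It only models the logic as I described it; `rescan_due`, `actions_on_launch` and the action names are invented for the example, and how tumblthree really stores the last crawl time is not known to me.

```python
from datetime import datetime, timedelta

def rescan_due(last_crawl, interval, now):
    # A blog that was never crawled, or whose interval has passed, is due.
    return last_crawl is None or now - last_crawl >= interval

def actions_on_launch(last_crawl, interval, now, auto_shutdown):
    # Decide what the program should do when an external scheduler starts it.
    actions = []
    if rescan_due(last_crawl, interval, now):
        actions.append("crawl")
    # With auto shutdown the program exits when its work is done;
    # otherwise it keeps running and waits for the interval to expire.
    actions.append("exit" if auto_shutdown else "wait")
    return actions
```

For example, with a one day interval and a last crawl two days ago, `actions_on_launch` yields a crawl followed by an exit when auto shutdown is checked.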
