TumblThree - A Tumblr Backup Application

TumblThree is the code rewrite of TumblTwo, a free and open source Tumblr blog backup application, using C# with WPF and the MVVM pattern. It uses the Win Application Framework (WAF). It downloads photo, video, audio and text posts from a given tumblr blog.

Features:

  • Source code at github (Written in C# using WPF and MVVM).
  • Multiple concurrent downloads of a single blog.
  • Multiple concurrent downloads of different blogs.
  • Internationalization support (currently available: en, zh, ru, de, fr).
  • A download queue.
  • Autosave of the queuelist.
  • Save, clear and restore the queuelist.
  • A clipboard monitor that detects blogname.tumblr.com urls in the clipboard (copy and paste) and automatically adds the blog to the bloglist.
  • A settings panel (change download location, turn preview off/on, define number of concurrent downloads, set the imagesize of downloaded pictures, set download defaults, enable portable mode, etc.).
  • Uses Windows proxy settings.
  • A bandwidth throttler.
  • An option to download a url list instead of the actual files.
  • Set a start time for an automatic download (e.g. during the night).
  • An option to skip the download of a file if it has already been downloaded before in any currently added blog.
  • Uses SSL connections.
  • Preview of photos & videos.
  • Taskbar buttons and key bindings.

Blog backup/download:

  • Download of photo, video (only tumblr.com hosted), text, audio, quote, conversation, link and question posts.
  • Download meta information for photo, video and audio posts.
  • Downloads inlined photos and videos (e.g. photos embedded in question&answer posts).
  • Download of _raw image files (original/higher resolution pictures).
  • Support for downloading Imgur, Gfycat, Webmshare, Mixtape, Lolisafe, Uguu, Catbox and SafeMoe linked files in tumblr posts.
  • Download of safe mode/NSFW blogs.
  • Can download only the original content of a blog and skip reblogged posts.
  • Can download only tagged posts.
  • Can download only specific blog pages instead of the whole blog.
  • Can download blog posts from a defined time span.
  • Can download hidden blogs (login required / dashboard blogs).
  • Can download password protected blogs (non-hidden blogs only).

Liked/by backup/download:

  • A downloader for downloading "liked by" photos and videos instead of a tumblr blog (e.g. https://www.tumblr.com/liked/by/wallpaperfx/) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download posts from a defined time span.

Tumblr search backup/download:

  • A downloader for downloading photos and videos from the tumblr search (e.g. http://www.tumblr.com/search/my+keywords).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download only specific blog pages instead of the whole blog.

Tumblr tag search backup/download:

  • A downloader for downloading photos and videos from the tumblr tag search (e.g. http://www.tumblr.com/tagged/my+keywords) (login required).
  • Download of _raw image files (original/higher resolution pictures).
  • Can download posts from a defined time span.
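
The downloader types above are distinguished by the shape of the url that is added. The following is a minimal sketch of such a classification, assuming only the url forms that appear as examples in this document. The function name classify_tumblr_url and the returned labels are hypothetical and are not taken from TumblThree's code.

```python
import re

# Hypothetical patterns modelled on the example urls in this document.
_PATTERNS = [
    ("liked_by", re.compile(r"^(?:https?://)?www\.tumblr\.com/liked/by/([^/?#]+)/?$", re.I)),
    ("search", re.compile(r"^(?:https?://)?www\.tumblr\.com/search/([^/?#]+)/?$", re.I)),
    ("tag_search", re.compile(r"^(?:https?://)?www\.tumblr\.com/tagged/([^/?#]+)/?$", re.I)),
    ("hidden_blog", re.compile(r"^(?:https?://)?www\.tumblr\.com/dashboard/blog/([^/?#]+)/?$", re.I)),
    ("blog", re.compile(r"^(?:https?://)?(?!www\.)([a-z0-9-]+)\.tumblr\.com/?$", re.I)),
]


def classify_tumblr_url(url):
    """Return (kind, name) for a recognised url, or None otherwise."""
    url = url.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(url)
        if match:
            return kind, match.group(1)
    return None
```

For example, classify_tumblr_url("https://www.tumblr.com/liked/by/wallpaperfx/") yields the pair ("liked_by", "wallpaperfx"), while an unrelated url yields None.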

Program Usage:

  • Extract the .zip file and run the application by double clicking TumblThree.exe.
  • Copy the url of any tumblr.com blog you want to backup from into the textbox at the bottom left. Afterwards, click on 'Add Blog' on the right side of it.
  • Alternatively, if you copy (ctrl-c) a tumblr.com blog url from the address bar/text file, the clipboard monitor from TumblThree will detect it and automatically add the blog.
  • To start the download process, click on 'Crawl'. The application will regularly check for (new) blogs in the queue and start processing them, until you stop the application by pressing 'Stop'. So, you can either add blogs to the queue via 'Add to Queue' or double click/drag'n'drop first and then click 'Crawl', or you start the download process first and add blogs to the queue afterwards.
  • A light blue bar to the left of a blog in the queue indicates an actively downloading blog.
  • The blog manager on the left side also indicates the state of each blog. A red background shows an offline blog, a green background an actively crawling blog and a purple background an enqueued blog.
  • You can change the download location, the number of concurrent connections, the default backup settings for each newly added blog and various other settings in the 'Settings'.
  • In the Details window you can view statistics of your blog and set blog specific options. Here you can choose which post types (photo, video, audio, text, conversation, quote, link) to download.
  • To download only tagged posts, follow these steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the tags in the Tags textbox in a comma separated list without the leading hash (#) sign. E.g. great big car,bears would search for images that are tagged for either a great big car or bears or both.
  • To download password protected blogs, follow these steps:
    1. Add the blog url.
    2. Open the blog in the details tab, enter the password in the Password textbox.
  • To download hidden blogs (login required blogs), follow these steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the blog url.
  • To download liked photos and videos, follow these steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the blog url including the liked/by string in the url (e.g. https://www.tumblr.com/liked/by/wallpaperfx/).
    3. To download your own likes, make sure you've (temporarily) enabled the following options in your blog's settings (i.e. https://www.tumblr.com/settings/blog/yourblogname):
      1. Likes -> Share posts you like (to enable the publicly visible liked/by page)
      2. Visibility -> blog is explicit (to see/download NSFW likes)
  • To download photos and videos from the tumblr search, follow this step:
    1. Add the search url including your key words separated by plus signs (+) in the url (e.g. https://www.tumblr.com/search/my+special+tags).
  • To download photos and videos from the tumblr tag search, follow these steps:
    1. Go to Settings, click on the Connection tab and fill in your tumblr email address (login) and password, then click the Authenticate button. If the login was successful, the label will change and display your email address. The email address and password are not stored locally on disk, but cookies are generated and saved in %LOCALAPPDATA%\TumblThree in json format.
    2. Add the search url including your tags separated by plus signs (+) in the url (e.g. https://www.tumblr.com/tagged/my+special+tags).
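
The tag and search conventions in the steps above can be sketched as follows: the Tags textbox takes a comma separated list without the hash sign, a post qualifies if it carries any of the listed tags, and search urls join the key words with plus signs. All function names here are hypothetical, and the case-insensitive comparison is an assumption based on the 2016-06-08 changelog entry. The url builder assumes each key word is a single word with no characters that need escaping.

```python
def parse_tag_list(text):
    """Split a comma separated tag list, dropping blanks and a leading '#'."""
    tags = []
    for raw in text.split(","):
        tag = raw.strip().lstrip("#").strip().lower()
        if tag:
            tags.append(tag)
    return tags


def post_matches(post_tags, wanted_tags):
    """True if the post carries at least one of the wanted tags."""
    have = {tag.strip().lower() for tag in post_tags}
    return any(tag in have for tag in wanted_tags)


def search_url(keywords, tagged=False):
    """Build a search or tag search url with '+' between the key words."""
    section = "tagged" if tagged else "search"
    return "https://www.tumblr.com/{}/{}".format(section, "+".join(keywords))
```

With the example from the steps, parse_tag_list("great big car,bears") gives the two tags "great big car" and "bears", and a post tagged only "bears" matches.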

Key Mappings:

  • Currently mapped keys:
    • double click on a blog adds it to the queue
    • drag and drop of blogs from the manager (left side) to the queue
    • space -- start crawl
    • ctrl-space -- pause crawl
    • shift-space -- stop crawl
    • del -- remove blog from queuelist
    • shift-del -- remove blog from blogmanager
    • ctrl-shift-g -- manually trigger the garbage collection

Getting Started:

The default settings should cover most users. You should only have to change the download location and the kind of posts you want to download. For this, in the Settings (click on the Settings button in the lower panel of the main user interface) you might want to change:

  • General -> Download location: Specifies where to download the files. The default is in a folder Blogs relative to the TumblThree.exe
  • Blog -> Settings applied to each blog upon addition:
    • Here you can set which posts newly added blogs will download by default. To change what an individual blog downloads, click on the blog in the main interface, select the Details tab on the right and change the settings. This separation lets you download different kinds of posts for different blogs. You can change the download settings for multiple existing blogs by selecting them with shift+left click for a range or ctrl-a for all of them.
    • Note: You might want to always select:
      • Download Reblogged posts: Downloads reblogs, not just original content of the blog author.

Settings you might want to change if the download speed is not satisfactory:

  • Connection -> Concurrent connections: Specifies the number of connections used for downloading posts. The number is shared between all actively downloading blogs.
  • Connection -> Concurrent video connections: Specifies the number of connections used for downloading tumblr video posts. The vt.tumblr.com host regularly closes connections if the number is too high. Thus, the maximum number of vt.tumblr.com connections can be specified here independently.
  • Connection -> Concurrent blogs: Number of blogs to download in parallel.

Most likely you don't have to change any of the other connection settings. In particular, settings you should never change, unless you're sure you know what you are doing:

  • Connection -> Limit Tumblr Api Connections: Leave this checkbox checked and do not change the corresponding values of 90 connections per 60 seconds. If you still change them, you might end up with offline blogs or missing downloads.
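
To illustrate what a limit of 90 connections per 60 seconds means in practice, here is a minimal sliding window limiter. It is a sketch under stated assumptions, not TumblThree's implementation: the class name, its methods and the injectable clock are all hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls=90, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps = deque()  # start times of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def try_acquire(self):
        """Record a call and return True if one is allowed right now."""
        if self.wait_time() > 0.0:
            return False
        self._stamps.append(self.clock())
        return True
```

A caller would check try_acquire() before each api request and sleep for wait_time() seconds when it returns False. Raising the limit beyond what the server tolerates is what leads to the offline blogs and missing downloads mentioned above.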

Further Insights:

  • Note: All the following files are stored in json format and can be opened in any editor.
  • Application settings are stored in C:\Users\Username\AppData\Local\TumblThree\.
  • You can use the portable mode (settings->general) to store the application settings in the same folder as the executable.
  • For each blog there is also a database (serialized class) file in the Index folder of the download location, named after the blog (blogname.tumblr). It stores blog-related information such as which files have been downloaded, the url of the blog and when it was added. This allows you to move your downloaded files (photos, videos, audio files) to a different location without interfering with the download process.
  • Some settings aren't hooked up to the graphical user interface. It's possible to view all TumblThree settings by opening the settings.json in any editor located in C:\Users\Username\AppData\Local\TumblThree\. Their names should be self-explanatory. Some notable settings to further fine tune the application include:
    • BufferSize: Sets the buffer size for downloading binary files (photos, videos) in multiples of 4KB. The default is 2MB, thus the BufferSize has a value of 512. Increasing this value reduces disk fragmentation, as more of the file is kept in memory before it is written out to the disk, but it also increases the memory usage.
    • MaxNumberOfRetries: Sets the maximum number of retries if a tumblr server forcefully closes the connection. This might regularly happen on the tumblr video host (vt.tumblr.com) if too many connections were opened in parallel. After the limit is exhausted, the file is left truncated, but is also not registered as a successful download. Thus, the download can be resumed in the next crawl.
    • TumblrHosts: Contains a list of hosts which is tried for downloading _raw photos if the photo size is set to raw. If none of the hosts contains the _raw version, the actually scanned host is tried with the next lower resolution (1280).
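
The relation between the BufferSize value and the actual buffer size is plain arithmetic: 512 units of 4KB are 2MB. A small sketch, assuming a parsed settings.json that contains the BufferSize key named above. The helper name buffer_bytes and the trimmed example file are hypothetical.

```python
import json

UNIT_BYTES = 4 * 1024  # BufferSize counts multiples of 4KB


def buffer_bytes(settings):
    """Return the download buffer size in bytes for a parsed settings.json."""
    return int(settings["BufferSize"]) * UNIT_BYTES


# A trimmed, hypothetical settings.json containing only the key discussed here.
example = json.loads('{"BufferSize": 512}')
print(buffer_bytes(example))  # 2097152 bytes, i.e. 2MB
```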

Changelog:

2018-07-05:

  • Implements the Tumblr login process and cookie handling in code instead of relying on the Internet Explorer for the Tumblr login process.

2018-06-09:

  • Fixes hidden Tumblr blog download problems caused by the new Tumblr ToS.

2018-05-20:

  • Programmatically agrees to new ToS and GDPR.
  • Implements SVC authentication changes. The SVC service is used to display the dashboard blogs (i.e. hidden tumblr blogs). Changes in this internal Tumblr api prohibited TumblThree's access.
  • Saves the last post id in successful hidden tumblr downloads.
  • Improves the text parser of the tumblr api and tumblr svc data models. Separated the slug from the url as the data models are inconsistent. Separated the photoset urls from the photo urls. Moved the date information into a separate column.
  • Minor text changes of some user interface elements.

2018-04-18:

  • Updates the tumblr blog crawler and the hidden tumblr datamodel to reflect tumblr api changes that break blog download of previous TumblThree versions.

2018-02-28:

  • Allows to download only specific pages of hidden Tumblr blogs and in the tumblr search.
  • Improves the proxy settings. TumblThree now uses the default Windows (Internet Explorer) settings if not overridden within TumblThree.
  • Changes the behavior of the timeout value (Settings->Connection->Timeout). The timeout value now counts file chunks of 4kb instead of the whole file download, thus it should better detect if a download is stalled or a connection dropped without canceling active downloads of larger files (e.g. videos).
  • Changes default timeout value (for new users) from 600s to 30s.
  • Fixes possible download of the same photo but with different resolutions. This happened if the _raw file download was interrupted (the timeout hit), then the same photo was queued for download with the _1280 resolution. If the blog was then subsequently queued again, the _raw file was downloaded next to the _1280 file.
  • Fixes reblog/original post detection in the tumblr hidden crawler.
  • Fixes check blog status during startup-option.
  • Fixes download of password protected tumblr blogs.
  • Adds Mixtape, Lolisafe, Uguu, Catbox and SafeMoe parser (thanks to bun-dev).
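
The changed timeout behavior in this release, where the timeout applies to each 4kb chunk rather than to the whole file, can be sketched like this. The read_chunk parameter is a hypothetical coroutine function standing in for the network stream, and the sketch is not TumblThree's actual code.

```python
import asyncio

CHUNK = 4 * 1024  # the timeout applies to each 4kb chunk, not the whole file


async def download(read_chunk, timeout):
    """Collect chunks until read_chunk returns b''.

    read_chunk is a hypothetical coroutine function taking a size and
    returning bytes. asyncio.TimeoutError is raised if any single chunk
    takes longer than `timeout` seconds.
    """
    parts = []
    while True:
        data = await asyncio.wait_for(read_chunk(CHUNK), timeout)
        if not data:
            return b"".join(parts)
        parts.append(data)
```

With this arrangement a large video that keeps delivering data is never cancelled, however long the whole transfer takes, while a stalled connection is detected after a single timeout period.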

2017-12-31:

  • Fixes a bug that released the video connection semaphore too often. That means the slider in the settings for limiting the video downloads didn't work at all. It should properly limit the connections to the vt.tumblr.com host and prevent incomplete video downloads now.
  • Includes a rewrite of the blog detection during blog addition. It should reduce latency if you mass add blogs by copying urls into the clipboard (ctrl-c). Offline blogs aren't added anymore.
  • Notifies the user when a connection timeout has occurred. The message states whether the timeout has occurred during downloading or crawling. If it happened during crawling, you might want to re-queue the blog at some point to grab missing posts. A connection timeout should only happen if your connection is wonky. You can decrease/increase the timeout in the settings (settings->connection).
  • You can now specify in the Details-panel for each blog where its files should be downloaded. If the text box control is empty, the files are downloaded as in previous releases in the folder specified in the global download location (settings->general), plus the blog's name.
  • Imgur.com linked albums in tumblr posts are now entirely downloaded if enabled (details panel->external->download imgur). Previously, only directly linked images were detected.
  • Adds an option to load all blog databases into memory and compare each to-download binary file to all databases across TumblThree before downloading. If the file has already been downloaded in any blog before, the file is skipped and will not be counted as downloaded. You can enable this in the settings (settings->global).
  • Hidden tumblr blogs can now be added using the dashboard url (i.e. https://www.tumblr.com/dashboard/blog/blogtobackup).
  • All blog types can now be added without the protocol prefix (i.e. wallpaperfx.tumblr.com, www.tumblr.com/search/cars).
  • Adds an option to enable a confirmation dialog before removing blogs (#186, #130, #98). It's off by default.
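
The semaphore bug fixed in this release is easy to reintroduce whenever acquire and release calls are paired by hand. The sketch below shows one way to limit concurrent video connections so that each slot is released exactly once. It is an illustration in Python with hypothetical names and a simulated transfer, not the C# code used by TumblThree.

```python
import asyncio


async def download_video(name, limiter, stats):
    """Simulated video download guarded by a shared semaphore."""
    # `async with` acquires one slot and releases it exactly once,
    # even if the body raises, so the limit cannot drift upwards.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for the real transfer
        stats["active"] -= 1
    return name


async def download_all(names, max_video_connections):
    """Run all downloads and report the highest number active at once."""
    limiter = asyncio.Semaphore(max_video_connections)
    stats = {"active": 0, "peak": 0}
    done = await asyncio.gather(
        *(download_video(name, limiter, stats) for name in names)
    )
    return done, stats["peak"]
```

If the semaphore were released more often than it was acquired, its counter would grow past max_video_connections and the limit would stop having any effect, which matches the symptom described in the first entry above.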

2017-11-17:

  • Adds support for downloading Imgur.com, Gfycat.com and Webmshare.com linked files in tumblr posts.
  • Improves downloading of tumblr liked/by photos and videos.

2017-10-20:

  • Restores bandwidth limiter functionality.

2017-10-13:

  • Changes the default _raw photo host.

2017-10-09:

  • Fixes crawler stop in hidden tumblr blog downloads.
  • Adds options to set the default blog settings for the download from time, download to time and tags in the settings menu.
  • Adds some (ar, el, es, fa, fi, he, hi, it, ja, ko, no, pa, pl, pt, th, tr and vi) google translate translations.

2017-09-08:

  • Can download password protected blogs of non-hidden blogs.
  • Minor UI updates.

2017-08-22:

2017-08-21:

  • French, Spanish and simplified Chinese translations.
  • Removes user interface lag during blog addition.
  • Allows to set the buffer size for downloading binary files in the settings.json in multiples of 4KB. The variable is called BufferSize. The new default is 2MB, thus the BufferSize has a value of 512. Previously it was set to 4KB, but apparently Windows does not do any useful caching on NTFS if multiple writes are concurrent and async. Thus, this should reduce disk fragmentation.
  • Uses .NET Framework 4.6 now as it should be available for all supported windows versions (Windows Vista and above).
  • Improved the selection handling in the details panel. If multiple blogs are selected, old values are now kept if they are the same for all blogs and changes are immediately reflected.
  • Audio file download support for tumblr and hidden tumblr blogs.
  • More code Refactoring.

2017-07-03:

  • Can download hidden (login required/dash board) blogs.

2017-06-30:

  • Improved performance and bugfixes.

2017-06-20:

  • Downloads high resolution (_raw) images.
  • Updated translations (German and Russian).
  • Applies changed settings immediately.

2017-06-04:

  • Sets the "date modified" timestamp shown in the Explorer to the post's time.
  • Allows to download single or ranges of blog pages.
  • Full screen media preview.

2017-05-20:

  • Option to skip reblogged posts.
  • Improves detection of inlined photos and videos in text posts (e.g. in answer posts).

2017-05-14:

  • Portable mode.
  • Downloads liked photos and videos.

2017-04-18:

  • Code refactoring.
  • Uses async/await in most of the code instead of tasks from the threadpool.
  • Uses a consumer producer pattern for grabbing and downloading as the Tumblr api v1 is now rate limited.
  • Downloads are now resumable.
  • Data files are now saved as json instead of binary.
  • Reduces memory usage by splitting off the downloaded file list and loading it only if needed.
  • Improves ui responsiveness.
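
The consumer producer pattern mentioned in this release can be sketched as follows: a crawler puts found urls on a bounded queue and several downloaders take them off. This is a generic illustration with hypothetical names and simulated downloads, not the structure of TumblThree's code.

```python
import asyncio


async def crawler(pages, queue, n_downloaders):
    """Producer: queue every found url, then one stop marker per consumer."""
    for page in pages:
        for url in page:
            await queue.put(url)
    for _ in range(n_downloaders):
        await queue.put(None)


async def downloader(queue, results):
    """Consumer: take urls until the stop marker arrives."""
    while True:
        url = await queue.get()
        if url is None:
            return
        await asyncio.sleep(0)  # stands in for the real download
        results.append(url)


async def run_pipeline(pages, n_downloaders=2):
    """Crawl and download concurrently, returning the downloaded urls."""
    queue = asyncio.Queue(maxsize=8)  # bounded, so the crawler cannot run far ahead
    results = []
    workers = [downloader(queue, results) for _ in range(n_downloaders)]
    await asyncio.gather(crawler(pages, queue, n_downloaders), *workers)
    return results
```

Decoupling the two sides means a rate limited crawler and the file downloads do not block each other: downloads start as soon as the first urls are found.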

2017-01-08:

  • Improves the speed of the network code.
  • Adds an option to use a http proxy.
  • Downloads inline images of tumblr posts.
  • Added Russian translation.

2016-12-13:

  • Improves the ui scaling of the main window for smaller resolutions.
  • Prevents crawling of offline blogs.
  • If the same blog is in the queue multiple times and one instance is already active, any other free crawler task skips and removes the duplicate and proceeds to the next inactive blog in the queue.
  • Improved German translation.

2016-12-10:

  • The check for already downloaded files is now independent of the actual host and based entirely on the filename. It looks like the host/mirror does actually vary, which would result in a reload of the file since its url changed.
  • Add scrollbars to the settings window if the controls do not fit.
  • Safely replaces blog indexes. If there is an error (e.g. no disk space left) during the update of the index file, the old state should not be corrupted anymore.
  • Changes some color and adds an alternate color for the blog manager.
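
A common way to replace an index file safely, as described in this release, is to write the new content to a temporary file and swap it in only after the write has succeeded. The sketch below shows that technique in Python. It is an assumption about the general approach, not a description of TumblThree's code, and save_index is a hypothetical name.

```python
import json
import os
import tempfile


def save_index(path, data):
    """Write `data` as json to `path` without exposing a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so the final
    # rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # the old index stays intact until this point
    except BaseException:
        # On any failure (e.g. disk full) drop the temp file and keep the old index.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write fails halfway, only the temporary file is affected and the previous index remains readable.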

2016-11-23:

  • Fixes application crashes which occurred by adding tumblr blogs without title or description.
  • Decreases determination time of already downloaded files for large blogs (>100,000 posts) by at least three orders of magnitude.

2016-11-22:

  • Creates more meta information (post id, reblog key, timestamp, tags, slug, title) of the posts, including image, video and audio types.
  • Fixes the progress calculation by adding the found duplicates to the progress. Also states them in the details window.
  • Fixes a locking issue for the meta files (*.txt) which resulted in incomplete downloads.
  • Updates the details and settings view for a better understanding on how to use the application.

2016-11-20:

  • Fixes proper counting of downloaded files.
  • Fully implements the details window (context menus, etc.).

2016-11-18:

  • Fixes the initial automatic queue restore function.
  • Fixes the autodownload function.

2016-11-16:

  • Picture and video preview in the details window.
  • Allows the download of text, audio, quote, conversation, link type posts.
  • Download of text, audio, quote, conversation, link and .gif images are based on each blog instead of a global setting and can be turned on/off in the details view. The settings in the settings window are used as template for newly added blogs.
  • Modified .tumblr index files are now always saved upon application exit regardless of the crawler's state. Previously, if the application was closed during an active crawl, the index wasn't updated.
  • Inlined the WAF code under lib for easier project setup for newcomers that want to contribute code.
  • Bugfixes, UI and memory enhancements.

2016-10-15:

  • Bandwidth throttling.
  • Connection timeout settings.
  • Auto queue and start download function.
  • Saves states of the UI (column size and order).
  • Download of hidden blogs.
  • Fixes proper saving of the ratings and tags.

2016-06-11:

  • Added German translation.

2016-06-10:

  • Support for tumblr.com hosted videos. Check the settings window to enable video download (default: off).

2016-06-08:

  • Tag crawling now properly working. Also it's case-insensitive now.
  • Fixed crash upon blogs with zero-image count in the queue list (e.g. blog is offline, or tag search didn't evaluate any images).
  • Fixed randomly occurring crash in the clipboard monitor.
  • Changed icons (requested by the TumblOne creator).

2016-04-12:

  • Now with progress output in the Queue tab (during url crawling for imageurls -- the number of posts evaluated; during downloading -- the current image url).
  • Added missing resume button in the taskbar control.

2016-04-11:

  • Support for urls starting with https:
  • Fixes application crashes upon pressing the stop-button due to improper exception handling.
  • Now always saves the index file. Previously, if the crawling process was still active, the application would exit without waiting for the tasks to finish and save their state. Now there is a grace period for the tasks to finish. The same was true if the crawl was paused and the application was then exited.

Download:

Comments

anonymous (not verified)
Sat, 14/10/2017 - 06:15

Hello, I've been trying to use this program but the first 5 buttons and check clipboard buttons do not work. Tumbltwo works fine however. I've tried reinstalling and downloading different versions and even tried a different computer but the program itself still doesn't work. It doesn't give me an error when I click the add blog button, but the buttons I mentioned were completely unresponsive.

On a side note, can it download things from larger blogs than tumbltwo? There's one blog I wanted to pull art from and it has 123671 posts, which causes tumblrtwo to simply not even try downloading and tell me that the queue is "complete" without downloading.

zab
Sat, 14/10/2017 - 06:35

TumblTwo most likely hits the Tumblr api limit as it's not rate limited, thus it will show around ten thousands of posts, and always a different number.

I'm sure the buttons work. You'll have to supply a whole blog url including http(s):// and .com into the box, then the buttons will become active. Then click add blog. Select the blog, then click add to queue, then click crawl.

It's all written in the instructions on this website, and in the readme on github.

anonymous (not verified)
Mon, 16/10/2017 - 11:17

Thank you! My apologies for failing to see the requirement for the full URL as the old program automatically inputted the https, but it works like a charm.

T3-Q.Anon (not verified)
Sat, 14/10/2017 - 16:52

Minor issue so far, I'm certain it's just a result of recent events.

No meta data seems to be acquired at all among these posts. No image, video, or anything. I know currently hidden blogs are unable to give video meta data (unsure of what else), but I'd like to bring this to your attention.

zab
Sat, 14/10/2017 - 21:33

Can you provide me a hidden video blog url/name? I'm not sure right now if I've ever tested it since I don't know a video only hidden blog..

What else do you mean with "among these posts"? From non-hidden blogs, all meta data works for me as it did before the _raw photo url change. Except for inlined videos/photos.

T3-Q.Anon (not verified)
Sun, 15/10/2017 - 03:23

What I meant was that I wasn't getting meta data in general. I'm not at home as I type this to test things though to check if I'm just blind or something. I'll report back later when I've done some testing

T3-Q.Anon (not verified)
Sun, 15/10/2017 - 08:22

So I'm doing a bit more testing to see if it's maybe a mistake or maybe there's some weird related cause, but I'm not able to download image meta data.

Example of blogs where I'm unable to download meta data (images only blog):
https://fumio936.tumblr.com/
http://mikeluckas.tumblr.com/

Now to let you know, I realized I wasn't getting image meta data cause I added a bunch of blogs fairly recently that wasn't giving any image metadata.

The above is an example of ones not giving metadata. This one, however, is. And I deleted the folder and the url from TumblrThree to verify it wasn't some fluke. It does repeatably give a text document with metadata.

http://deadflow.tumblr.com/

anonymous (not verified)
Tue, 17/10/2017 - 09:06

So just have only "Download Image Meta" or the like selected and it should download the meta?

zab
Tue, 17/10/2017 - 09:48

I'm still not sure what he did, but it's all explained in the text.

Select Download image meta if you want the meta data, obviously. If you don't click Download reblogs as well, only the meta data from the original posts will be downloaded.

T3-Q.Anon (not verified)
Wed, 18/10/2017 - 02:12

No, what I'm saying is that I only get meta data IF the only thing I have selected is "Download Image Meta".

I just did further testing on a hunch and I'm finding I won't get Image Meta if I am downloading the images themselves. So if you have "Download Images" and "Download Image Meta" checked, you won't get image meta. But if you have "Download Images" unchecked, image meta will download.

I don't know if this applies to Video or Audio as well since I don't follow many people who do such things

zab
Wed, 18/10/2017 - 05:45

Okay, now I got it.

Sorry, I really don't have time to test anything more thoroughly, and downloading the blog with photos and meta data was already too much for me, since fixing and uploading all the crap takes at least an hour even if it's only a small change.

There is a bug indeed, but I changed that code and introduced that bug not long ago to make it download inlined photos/videos of all posts. I'll probably upload a fixed release at the weekend ..

Thanks for sharing!

T3-Q.Anon (not verified)
Wed, 18/10/2017 - 06:48

It's alright, that's what I'm here for. Just trying to provide as much test data as possible to get reproducible results

anonymous (not verified)
Wed, 18/10/2017 - 18:46

Hello,

There's updated Russian translation:

TumblThree.Presentation https://pastebin.com/esmSXrFi
TumblThree.Application https://pastebin.com/P9rRkWJm

Sorry it took so long. Had to make some strings shorter so they would fit into their place.
I also didn't get what some strings are for, so there could be some minor errors that could be fixed if would be found in future releases.

Suggestion: synchronising order of data/value elements of .resx files for English and other localisations would make the translation checking/updating process much more convenient.

anonymous (not verified)
Sun, 22/10/2017 - 13:43

Hope to translate README.MD eventually as well.

I also have some minor UI suggestions / request. Wonder if it would be hard to implement them.

  1. Made URL input field more flexible, so it would be limited both by panel buttons and "Enter URL" text. Now it could overlap the text to the left.
    • Maybe you could make buttons on text adaptive to window width? As half of the labels consists of two or more words, it may be one-line text labels for wide window & two-line labels for narrow window. Would need to increase panel height though.
  2. On "Details" tab in "Blog settings" section the single header "Download:" could be used and checkboxed elements behind could omit "Download..." part of text string, so it would be just "Images", "Audio" etc.
  3. The invisible border between Blog Settings and Blog Stats columns on Details tab probably should be aligned to the Blog Settings column right margin, so text wouldn't be overlapped. I've made push commit with shorter text in Blog Stats column (hope I did it right?), but it still could be the problem with large number of available/downloaded/duplicated files.

These problems may be specific for low-resolution or small-sized program window, or for certain translations with long text strings. You may even not consider them as bugs, but they still spoil the UX. There's how main window looks on my 1280*800 laptop screen so you could get the idea: https://ibb.co/chaZ86

John (not verified)
Mon, 30/10/2017 - 04:01

Is there any way you could add support for content that is linked in the body of a Tumblr post? A lot of Tumblr's I follow now include more pictures / videos / zip files in the body of a post, and this program doesn't seem to catch them all the time. Can you add that feature, or do you know of a way to do that?

Thanks for what you've been doing thus far. I appreciate it.

zab
Mon, 30/10/2017 - 07:08

Please add example posts or at least small blogs with multiple examples for all of these. I'll not randomly try tumblr blogs until I'd find one that might match your description, but is potentially still a different case..

I cannot say I'll add those, but I might take a look. Right now we're RegEx'ing all images (i.e. pattern matching all *.png/.jpg/.gif urls) and videos based on extension and the tumblr host, thus it could potentially be extended. But if they are hosted on different sites that in turn require complete parsing like youtube, vimeo, etc. I'll personally not add it since at some point it's too much work and would result in a whole web crawler. Sure, if someone is interested in implementing it, I'll happily take the code.

Did you try jDownloader or httrack?! They both should be more complete (web-)crawler.

John (not verified)
Wed, 01/11/2017 - 23:43

Something like this Tumblr (sfw) - http://wgettest.tumblr.com/

Here's an example of an image hotlinked in the body of a post - http://wgettest.tumblr.com/post/159385784773/

Example of 2 images on different sites hotlinked in body of post - http://wgettest.tumblr.com/post/159352717063

Example of image linked in body of post that's hosted on gfycat - http://wgettest.tumblr.com/post/159353039203/

_____

Most art Tumblrs link alternate versions in the body of the post and the pictures and videos (webms, gifs, etc) are usually hosted on sites like Imgur / Gfycat / Webmshare.

I can provide more information if need be. I've tried JDownloader and HTTrack but can't get them to work properly, since they usually don't want to grab the images linked in the body of the post.

Sonny (not verified)
Sat, 04/11/2017 - 05:52

This would actually be a great feature, most media is stored outside of tumblr; megadownload, imgurs. Getting the ones aforementioned would cover most media. Next level backup. Thank you zab!

Sonny (not verified)
Sun, 05/11/2017 - 07:56

Thank you for continuing support of this program*

zab
Mon, 13/11/2017 - 07:45

Do you mind doing some testing?

I've added the code for downloading files from Imgur, Gfycat and Webmshare last weekend. It's not fully hooked up into the user interface yet (the settings window is missing an update, and the preview might not always work depending on the file type), I know that. I'm also not certain where to put all the options. It's getting crowded and is probably not very intuitive for newcomers. However, you can enable the downloads in the Details panel on the right side.

Can you check whether it downloads everything from the three hosts, and whether the download stalls for some reason or the application crashes?

I'll probably not finish the implementation before the end of the next weekend, so there is no rush.

TumblThree-v1.0.8.30-Application.zip

Thanks a lot!

John (not verified)
Wed, 20/12/2017 - 06:28

Hey, I'm sorry for the late reply; I've been busy and put all this on hold.

I've tried the most recent version [v1.0.8.33], and the program seems to download images and video files hosted on imgur and some webm sites. I've tried it with a handful of blogs, and the program managed to download everything. I created the tumblr WgetTest many months ago to try out wget with tumblr. That went nowhere, but I still use the blog to test programs' ability to download different things from tumblr.

I haven't had the chance to test it yet, but does TumblThree support downloading .zip files linked in the body of a post? I can see why that would be hard to implement, but what about adding a text document that lists the urls it can't rip, say from Rapidgator / 4shared / etc.? That way the person can rip them themselves and not have to worry about the program supporting it.

Sorry, I think I'm rambling now, but anyway I want to really thank you for the support thus far.

Nicole (not verified)
Tue, 31/10/2017 - 07:40

First, thanks for this awesome program. I use it nearly every single day. I have a question. If a blog you've been crawling changed its name, do you have any options besides adding the new name as a new blog and crawling the whole thing again?

zab
Tue, 31/10/2017 - 07:49

Right now there is nothing implemented, but it should be possible to change it with some manual labor (under 1 minute).

Make sure TumblThree is closed, then you can manually change TumblThree's databases in the Index folder of your download location. Say you download to D:\TumblThree\Blogs, then the right place would be D:\TumblThree\Blogs\Index\.

In that folder there are two files for each blog you've added. You'll have to manually rename them to the blog's new name, and additionally open both files in Notepad (or any other text editor) and change:

  • In the non _files file (e.g. blogname.tumblr):
    • ChildID, Name, Url
  • In the _files file (e.g. blogname_files.tumblr):
    • Name

You could make a backup of the databases beforehand (a simple copy), but don't store the copied files in the same folder. A sub folder is okay, but you must not have the same database twice in the Index folder.
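For anyone who would rather script the backup and rename, here is a rough Python sketch of those steps. It is not part of TumblThree: the function name rename_blog_index and the "backup" sub folder name are made up for this example. It only copies and renames the two files. The ChildID, Name and Url entries still have to be edited by hand as described above, because the file format isn't spelled out in this thread. Run it only while TumblThree is closed.

```python
import shutil
from pathlib import Path

def rename_blog_index(index_dir, old_name, new_name):
    """Back up and rename the two index files of one blog.

    Assumes the files are called <blogname>.tumblr and
    <blogname>_files.tumblr, as in the example above.
    """
    index_dir = Path(index_dir)
    pairs = [
        (index_dir / (old_name + suffix), index_dir / (new_name + suffix))
        for suffix in (".tumblr", "_files.tumblr")
    ]
    # Check everything first so a failure cannot leave a half-renamed blog.
    for src, dst in pairs:
        if not src.is_file():
            raise FileNotFoundError(src)
        if dst.exists():
            raise FileExistsError(dst)
    # The backup goes into a sub folder, never next to the originals.
    backup_dir = index_dir / "backup"
    backup_dir.mkdir(exist_ok=True)
    for src, dst in pairs:
        shutil.copy2(src, backup_dir / src.name)
        src.rename(dst)
    return [dst for _, dst in pairs]
```

With the example path from above, the call would be rename_blog_index(r"D:\TumblThree\Blogs\Index", "oldname", "newname").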

Nicole (not verified)
Tue, 31/10/2017 - 20:55

Thank you so so much for the help! I wasn't sure that changing the Index files would work so I didn't touch them in case it messed something up, this solution sounds perfect!!

zab
Tue, 31/10/2017 - 07:58

Alternatively, if you still have all the downloaded files, you can simply remove the blog, re-add it under its new name, and move all the downloaded files into the new download folder of the blog.

If an image, video or audio is already fully downloaded in the download folder of the blog, TumblThree will register it as already downloaded (increase the download counter) but skip the actual download.
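The skip behaviour described above can be pictured roughly like this. It is a Python sketch under assumptions, not TumblThree's code: how the application decides that a file is "fully downloaded" isn't stated here, so the example simply treats any existing, non-empty file as complete, and plan_downloads is a made-up name.

```python
from pathlib import Path

def plan_downloads(blog_dir, file_names):
    """Split file names into an 'already there' count and a to-do list.

    Assumption: a file counts as downloaded if it exists in the blog's
    download folder and is not empty.
    """
    blog_dir = Path(blog_dir)
    already_downloaded = 0
    to_fetch = []
    for name in file_names:
        target = blog_dir / name
        if target.is_file() and target.stat().st_size > 0:
            already_downloaded += 1   # counted, but not fetched again
        else:
            to_fetch.append(name)
    return already_downloaded, to_fetch
```

This is why moving the old files into the new blog folder before crawling saves the re-download: every file found on disk only bumps the counter.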

anonymous (not verified)
Sat, 18/11/2017 - 11:08

Hello, recently I've been trying to download some blogs but when I'm making use of the Authentication button (as the blogs ask me to log in), I enter my usual username and password, which automatically closes the window. The strangeness comes later though, when I try to download from any blog, the light blue banner up top says I need to authenticate although I already have - queuing any blogs just results in the program quickly "glossing" over the blog and not actually downloading anything. I try to reauthenticate but it either just automatically closes the window again (which I think means it's logged in) or asks me to relog, which leads to the same situation. This started happening recently (around the "best of" dashboard update) so I'm not sure what's going wrong. Thank you in advance!

zab
Sat, 18/11/2017 - 12:15

When the window closes after you click authenticate, that means the authentication was successful, since the browser (the window is actually an Internet Explorer) was redirected to your dashboard.

As for your actual problem, unfortunately I cannot say much, since that has never happened to me, and I've logged in and out many times already. Someone on Github, however, mentioned the same thing recently. He cleared the Internet Explorer cache and cookies from inside Internet Explorer and the issue went away.

Any chance you're using the old Internet Explorer and changed its settings? Maybe something that could interfere with accessing cookies?! I'm only guessing, but the cookies we're using are stored in the Internet Explorer cache, as the window that opens is Internet Explorer itself. So, maybe some (internet) protection software, like Norton something. Do you use any antivirus software, and which one?

anonymous (not verified)
Sat, 18/11/2017 - 11:36

I'm still testing out and trying to understand this new feature you've implemented (a great feature though) and I want to inquire about it.

There's a post I'm trying to get downloaded, but try as I may, it doesn't seem to download. I've tried re-enabling "download reblogged posts" and the like, but to no avail.

The post in question is this one here.

http://krekk0v.tumblr.com/post/164925138167/krekk0v-illegal

zab
Sat, 18/11/2017 - 12:10

It's quite simple. The image is linked differently compared to the example blog I got from John a few comments above.

In your example the link is:
<a href="http://imgur.com/a/jD324">!!!!<

whereas in the blog above from John they are like this instead:
<a href="https://i.imgur.com/fyL1aVB.jpg">Hyperlinked image test.</a>

and the regular expression I'm using doesn't catch your links because the host is missing the i. and the url has an additional slash with an a (/a/) before the ID. Since I don't use Tumblr actually, nor Imgur, I didn't know that.
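To make the difference concrete, here is a small Python sketch. The two patterns are illustrative guesses only, not the expressions actually used in TumblThree, and classify_imgur_link is a made-up name. The first pattern covers direct links on i.imgur.com, the second the album form with /a/ before the ID.

```python
import re

# Illustrative patterns only; TumblThree's real expressions are not shown here.
DIRECT = re.compile(r'https?://i\.imgur\.com/(?P<id>[A-Za-z0-9]+)\.(?P<ext>[A-Za-z0-9]+)')
ALBUM = re.compile(r'https?://(?:www\.)?imgur\.com/a/(?P<id>[A-Za-z0-9]+)')

def classify_imgur_link(url):
    """Return ('image', id), ('album', id) or None for an Imgur URL."""
    match = DIRECT.match(url)
    if match:
        return ("image", match.group("id"))
    match = ALBUM.match(url)
    if match:
        return ("album", match.group("id"))
    return None
```

With only the direct pattern, http://imgur.com/a/jD324 falls through, while https://i.imgur.com/fyL1aVB.jpg is matched. An album link additionally needs a second request to list the images inside the album, which this sketch leaves out.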

It's only a few lines to fix, but uploading all the changes and releasing new binaries takes something like 30 minutes, and I've just uploaded a new release. So I'll fix it maybe tomorrow or at some point later.

Thanks for the comment and making me aware!

anonymous (not verified)
Sun, 19/11/2017 - 07:53

Okay, I was worried that I was doing something wrong. I'm glad I brought up something unforeseen in the process.

jkr (not verified)
Fri, 01/12/2017 - 01:37

Wonder if it's a related issue that I'm seeing: I've turned on downloading just about everything I can think of, but on this one blog the blogger is quoting/reblogging his older posts and adding new images, and those are not getting picked up.

Here's an example of a post where the later images aren't being picked up: http://idelacio.tumblr.com/post/167746593676/a-competition-for-the-best-...

zab
Tue, 05/12/2017 - 20:59

The v1.0.7.40 release here will download your pictures, but requires a login to access anything. This version misuses an internal tumblr api for displaying the website. If you activate the new "dump crawler data" option in the details-panel, you can check the .json files in the download folder of the blog. That is the data that the v1.0.7.x branch uses for downloading.

It's not so easy to grab the missing pictures in the current v1.0.8.x releases. It would require pulling twice the number of pages just to check if someone posted an answer that quotes a previously posted post with a downloadable file.

jkr (not verified)
Wed, 06/12/2017 - 14:31

Seems to be working perfectly, thanks so much for the assistance!

zab
Sat, 09/12/2017 - 13:28

It's an Imgur album again.

I've added support for it today, but since I've changed quite a lot more things recently, I'll not officially "release" a new version until it's a bit more tested.

You can get binaries from today here. But be warned, it might not be as stable as the latest release (v1.0.8.32) from the release site.

Psycho (not verified)
Wed, 20/12/2017 - 10:20

Okay, this is the first version I ever used (2017-12-18), so I do not know if this is persistent, but there are some problems I encountered:
- process does not die after i close the application.
- Options, almost always not saved. (btw, it is not intuitive that options are closed on pressing 'save', 'save & close' would be better.)
- I could not guess the relationships between option page and selected options on details page of the blog. This might be because of previous problem i mentioned though.

zab
Wed, 20/12/2017 - 18:58

2) The settings aren't saved to disk until TumblThree is closing. Thus, if you kill it, the settings will not be saved and your changes are only held in memory.
3) It's explained on the website and in the Readme.MD on github.
1) EDIT: Could you also try the version v1.0.8.32? Some people said their blog crawl never finished with v1.0.8.33, but I couldn't reproduce it.
1) It never happened to me, so I can only guess. I could imagine that it happens if your internet connection is wonky and some async tasks aren't finished yet. Usually they finish either when all the I/O is done (i.e. the web page/file is completely downloaded) or when they are canceled because you stop the crawl/close the application. But if the underlying connection (TCP/IP) dies at some point, some of them could potentially keep waiting until the timeout is hit if they are stuck awaiting the web request response (that the server will never return).
You could try to lower the timeout value (Settings->Connection->Timeout) depending on what you download, your internet connection speed and how many concurrent connections you've set up. It should not kill working connections that just take longer. Say you download four 200 MB videos with a 10 Mbit/s connection. They'll certainly take longer than twenty 1 MB pictures with a 100 Mbit/s connection.
As I've said, it's all just a guess.
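
To put rough numbers on the two download examples above, here is a back-of-the-envelope calculation in Python. It assumes 1 MB equals 8 Mbit, that the full bandwidth is available, and that there is no protocol overhead, so real downloads will take longer. The function name transfer_seconds is made up for this example.

```python
def transfer_seconds(file_count, file_size_mb, link_mbit_per_s):
    """Ideal time to move file_count files of file_size_mb each over the link."""
    total_mbit = file_count * file_size_mb * 8   # 1 MB = 8 Mbit
    return total_mbit / link_mbit_per_s

videos = transfer_seconds(4, 200, 10)      # four 200 MB videos at 10 Mbit/s
pictures = transfer_seconds(20, 1, 100)    # twenty 1 MB pictures at 100 Mbit/s
print(f"videos: {videos:.0f} s, pictures: {pictures:.1f} s")
```

Under these assumptions the videos need about 640 seconds (over ten minutes) and the pictures under two seconds, which is why a sensible timeout depends so much on what is being downloaded.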

How many concurrent connections did you set up? Is there any error displayed at the top of TumblThree (Error X: ..) or any additional info you could provide?

Psycho (not verified)
Fri, 22/12/2017 - 00:49

1) 8.32 does not have that problem, it closes after pressing [x], and settings are saved.
in 8.33 my crawls never finish more often than not, but i associated it with the fact that I seem to be banned on tumblr now after abusing TumblOne too much... %(
My internet is pretty good, and I did not alter any related numbers yet. I doubt it will help anyway.

Right now i tried 33 and wut, it closes normally... haa.. although is still takes it time to kill process, like a minute. Despite that I did not even try to download anything in this session.

zab
Fri, 22/12/2017 - 07:51

> Right now i tried 33 and wut, it closes normally... haa.. although is still takes it time to kill process, like a minute. Despite that I did not even try to download anything in this session.

Did you add any blogs during that session? Could you paste them here/upload them somewhere? Did you enable the "check blogs online status during startup"-option?
Since those two options (add blogs, check online status) would be the only way TumblThree would open a new connection, that's all that's left over to check. I'm almost certain that there cannot be any other, newly added code that would delay the shutdown for one minute, except for a (stalled) outgoing connection.

I've already wasted around an hour trying to reproduce the "issue", but I simply cannot based on the descriptions, even though three people are now complaining about apparently the same thing. I've downloaded 10000 and more posts, added 100 different blogs, modified all settings, but in the end, v1.0.8.33 behaves exactly the same for me as v1.0.8.32. Thus, there is nothing I can fix.

anonymous (not verified)
Sun, 07/01/2018 - 16:29

Blog backup is fine. But could you add one feature? I want to open the downloaded blog in a web browser, just as if I had saved someone's blog page as HTML to view it offline. Is that possible?

TumblrMouse (not verified)
Mon, 08/01/2018 - 18:35

IMHO, unnecessary function. For this purpose, there are specialized tools like offline browsers, for example, Offline Explorer.

zab
Mon, 08/01/2018 - 18:54

You could also try something like httrack or wget or anything that crawls websites.

What you're asking for isn't easily done and would take several days/weeks to code. I'm not even sure if it's possible with all the javascript nowadays.

GentleBeast (not verified)
Tue, 09/01/2018 - 22:20

You are a God. I will be signing up for Paypal just to send you a beer. And the fact that you consistently update with features that are insanely awesome makes me love you even more... and I'm a dude.

Good freaking job!

kj (not verified)
Sat, 13/01/2018 - 21:05

just wanted to say i love the application. it works flawlessly and i use it every day! you're a life saver :) i just had a little idea for a new feature. perhaps there could be an option for a refined way to save the photos? like if each photoset would be automatically saved to its own unique folder (named after the permalink or something) so that the blogs folder looks cleaner and more organized.
thank you for all your hard work!
