windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.OS.X.-.App.dmg
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.OS.X.-.Extract.only.tar.gz
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v196.tar.gz
I had a good week. I fixed and improved some things, and I made some important changes to the autocomplete tag code.
The past couple of weeks have brought a lot of new mappings to my public tag repository, and I think we hit 50 million yesterday! It represents about 700k different tags applied to 3 million files! I'm really pleased, and I appreciate the contributions everyone has made. While this expansion is revealing some terrible lag in places, I still think it is neat that we have collectively gathered so much stuff.
autocomplete rewrite
I made good progress this week on the new autocomplete cache layer I have been planning. I have a good plan of where I am headed, a semi-fleshed skeleton of the cache database written, and a fair amount of optimism about bringing it all together. I am generally confident the new cache will deliver sub-second autocomplete results for pretty much all queries when it is done. I still need to think about some things, and there is a lot of work still to do, but I've made some decisions on what I can and cannot easily attempt to calculate in quick time. Disabling 'all known files'/'all known tags' last week was part of that.
I also gave the existing code a polish and converted it to this new calculation philosophy. Some sibling stuff is fixed, and some bits of inefficient code are cleaned up, but the important thing is that 'all known tags' autocomplete results (e.g. the sort you get when you open a new local search page and start typing) are calculated in a faster but sometimes less accurate way. If you only have 'local tags' and my PTR, then you will likely see no change, but if you have multiple services that contain the same tags for the same files, you will see counts that are the sum of those services' tag counts rather than the size of their union, so they will sometimes be a bit higher than is accurate.
I am not sure if I will be able to bring back 100% accurate counts for absolutely all cases, but I will revisit it after the first version of the cache layer is done. I might end up including it in a second layer that collapses sibling counts and applies local censorship rules as well, but we will see. For now, I plan to add '<=' signs to uncertain counts and see how much of a big deal it turns out to be.
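To make the sum-versus-union difference concrete, here is a minimal sketch. It is not the actual hydrus code: the data layout, the sample data, and the names `summed_counts`, `union_counts` and `display_count` are all made up for illustration, and the '<=' label only mimics the planned indicator.

```python
from collections import Counter

# Hypothetical data: service name -> tag -> set of file ids carrying that tag.
mappings = {
    "local tags": {"blue eyes": {1, 2, 3}, "skirt": {2}},
    "public tag repository": {"blue eyes": {2, 3, 4}, "hat": {5}},
}


def summed_counts(mappings):
    """Fast path: add up each service's own per-tag count."""
    totals = Counter()
    for tags in mappings.values():
        for tag, file_ids in tags.items():
            totals[tag] += len(file_ids)
    return dict(totals)


def union_counts(mappings):
    """Accurate path: count each file once per tag across all services."""
    merged = {}
    for tags in mappings.values():
        for tag, file_ids in tags.items():
            merged.setdefault(tag, set()).update(file_ids)
    return {tag: len(file_ids) for tag, file_ids in merged.items()}


def display_count(tag, mappings):
    """Show a plain count if one service has the tag, else a '<=' upper bound."""
    service_count = sum(1 for tags in mappings.values() if tag in tags)
    total = summed_counts(mappings)[tag]
    if service_count > 1:
        return f"{tag} (<={total})"
    return f"{tag} ({total})"


print(summed_counts(mappings)["blue eyes"])  # files 2 and 3 are counted twice
print(union_counts(mappings)["blue eyes"])   # each file counted once
print(display_count("blue eyes", mappings))
```

The summed figure can never be lower than the union figure, which is why a '<=' prefix is a fair label for any tag that appears in more than one service.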
Anyway, the upshot of changing this calculation is that all autocomplete results seem to be returning a bit quicker than usual already. That still often means ten seconds, so the new cache is still needed, but at least we are headed in the right direction. Regular tag processing, as well, which interacts with the existing cache, seems to be running up to 40% faster, and some other big jobs like resetting services no longer have to do a very laggy autocomplete purge. Let me know how you get on!
A lot of orphaned cache entries need to be deleted, so the update may take a few minutes.
other stuff
The 8chan thread watcher should now work for all boards. Let me know if it doesn't work for something unusual.
I also fixed the tumblr downloader, which recently broke due to a subtle API change.
The IPFS downloader now detects directory multihashes and gives a more pleasant error. I will next add directory downloading and then directory pinning.
The recent 'load tags from neighbouring .txt files' import option is now available for import folders. If you are interested in this, check it out in the edit import folders dialog and let me know if the new option makes sense. The .txt files should be deleted or moved alongside the media files they refer to, depending on how you have those actions set up.
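For anyone preparing files for this option, the idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the client's real import code: the function name `read_neighbouring_tags` is hypothetical, and both the naming scheme (the media filename plus '.txt') and the one-tag-per-line format are assumptions that this post does not spell out.

```python
import os
import tempfile


def read_neighbouring_tags(media_path):
    """Return tags from '<media_path>.txt' (assumed naming), one tag per line."""
    txt_path = media_path + ".txt"
    if not os.path.exists(txt_path):
        return []
    with open(txt_path, encoding="utf-8") as f:
        # Skip blank lines and strip surrounding whitespace from each tag.
        return [line.strip() for line in f if line.strip()]


# Usage: build a throwaway folder with one media file and its .txt neighbour.
with tempfile.TemporaryDirectory() as folder:
    media_path = os.path.join(folder, "image.jpg")
    open(media_path, "wb").close()
    with open(media_path + ".txt", "w", encoding="utf-8") as f:
        f.write("blue eyes\n\nseries:example\n")
    print(read_neighbouring_tags(media_path))
```

A media file with no .txt neighbour simply yields an empty list here, so a folder with mixed tagged and untagged files needs no special handling in this sketch.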
full list
- fixed the 8chan thread watcher for boards that host content on media.8ch.net
- improved the thread watcher url check logic so it won't lag with the new fix
- cleaned up the ac generation code a little
- 'all known tags' ac counts are now summed from all the known tag services rather than calculated directly (a <= indicator for when these cases overlap will be forthcoming). this speeds up file add/delete, service reset, a/c fetch time, and general tag processing, and reduces the size of the db
- ac generation code now deals with 'is the entry text an exact match or not?' better
- ac generation code will no longer produce non-exact-match siblings on an exact match search
- ac generation code will no longer save half complete search text into the db as new tags
- on update, the a/c cache and its helper table 'existing tags' will be cleaned of a lot of orphans, which may take a few minutes
- fixed some bad unicode path parsing when importing files in some OSes, I think!
- fixed some bad read autocomplete sibling substitution
- fixed a bug where autocomplete predicate lists would not update if the new list was merely a reorder (which can happen in some unusual sibling cases)
- fixed the tumblr parser for the subtly new API
- import folders now support loading tags from neighbouring .txt files; check the dialog to set up which tag services you would like to import to
- the ipfs file downloader now queries DAG object links, determines if the given multihash is a directory or other complicated object, and if so politely dumps out (handling of directory downloads is forthcoming)
- some db code is cleaned up
- prepared db code for some future subclasses
- wrote most of the new ac cache db
- misc cleanup
- added some browser addon links to the ipfs help
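The reorder fix in the list above is the kind of bug that can come from comparing list contents without regard to order. The sketch below shows one plausible way that happens. It is an assumption for illustration only, not the actual hydrus code or a confirmed cause, and both function names are hypothetical.

```python
def needs_refresh_by_membership(old_predicates, new_predicates):
    # Order-insensitive check: a pure reorder looks like "no change".
    return set(old_predicates) != set(new_predicates)


def needs_refresh_by_order(old_predicates, new_predicates):
    # Order-sensitive check: a pure reorder is detected as a change.
    return list(old_predicates) != list(new_predicates)


old_predicates = ["character:a", "character:b"]
new_predicates = ["character:b", "character:a"]  # same items, new order

print(needs_refresh_by_membership(old_predicates, new_predicates))
print(needs_refresh_by_order(old_predicates, new_predicates))
```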
next week
I will continue this cache layer stuff, and I'll also see about IPFS directory parsing and a bit of gui to pick the files you want to download.
My overall near-future 'big stuff' plan is to go something like IPFS directories->cache layer->extract service data from client.db->suggested tags control->faster dupe searching->make a new poll on what to work on next.