
/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.



New user? Start here ---> http://hydrusnetwork.github.io/hydrus/

Currently prioritising: simple IPFS plugin


YouTube embed. Click thumbnail to play.

 No.2316[Reply]

windows

zip: https://github.com/hydrusnetwork/hydrus/releases/download/v198/Hydrus.Network.198.-.Windows.-.Extract.only.zip

exe: https://github.com/hydrusnetwork/hydrus/releases/download/v198/Hydrus.Network.198.-.Windows.-.Installer.exe

os x

app: https://github.com/hydrusnetwork/hydrus/releases/download/v198/Hydrus.Network.198.-.OS.X.-.App.dmg

tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v198/Hydrus.Network.198.-.OS.X.-.Extract.only.tar.gz

linux

tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v198/Hydrus.Network.198.-.Linux.-.Executable.tar.gz

source

tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v198.tar.gz

I had a good week. I added IPFS directory downloads and sped up file search result generation.

IPFS directories

If the IPFS downloader receives a directory multihash, it now asks IPFS for the full nested directory listing and throws that up on a new checkbox tree dialog. You select the files and folders you want to download, hit ok, and the IPFS downloader should attempt to get them all.
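A minimal sketch of how such a nested listing can be fetched and flattened via a local IPFS daemon's HTTP API (assuming a default daemon at 127.0.0.1:5001; the /api/v0/ls endpoint and the Name/Hash/Type fields come from the public IPFS HTTP API, and the Type codes for directories are an assumption worth verifying; this is not hydrus's own code):

```python
import json
from urllib import request

def ipfs_ls(multihash, api_base='http://127.0.0.1:5001/api/v0'):
    """Ask the local IPFS daemon for the links under a multihash."""
    # Modern daemons expect POST on the API; older ones also accept GET.
    req = request.Request('%s/ls?arg=%s' % (api_base, multihash), method='POST')
    with request.urlopen(req) as resp:
        data = json.load(resp)
    # Each link carries a Name, a Hash, and a Type code.
    return data['Objects'][0]['Links']

def walk_directory(multihash, ls=ipfs_ls, prefix=''):
    """Recursively flatten a nested IPFS directory into (path, multihash) pairs."""
    results = []
    for link in ls(multihash):
        path = prefix + link['Name']
        if link['Type'] in (1, 5):  # assumed directory type codes: recurse
            results.extend(walk_directory(link['Hash'], ls, path + '/'))
        else:
            results.append((path, link['Hash']))
    return results
```

The flattened (path, multihash) pairs are what a checkbox tree like the one described would then present for selection.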

Post too long. Click here to view the full text.
3 posts omitted. Click reply to view.

 No.2333

>>2332

Just installed the extract version, it doesn't give me any errors, but the process just starts and closes itself after a second.




File: 1457561016502.jpg (32.87 KB, 433x380, 433:380, 1395023535367.jpg)

 No.2231[Reply]

I feel like we need a single thread to contain all the single-line answer questions that are asked around here. Would provide a nice archive of "noob questions", and would prevent the board from being clogged up by simple, single-purpose Q&A threads.

26 posts and 7 image replies omitted. Click reply to view.

 No.2320

>>2319

Options > Maintenance and Processing

will let you tell Hydrus to either perform 3 hours of maintenance on shutdown

OR

let you set "Assume the system is idle if…" to like 0-1 minutes. Hydrus will only do maintenance (which includes tag repo synchronization) if it considers your system to be idle.


 No.2321

File: 1458845670312.png (69.12 KB, 800x466, 400:233, 3ef0182a0a0c1ac98551be8db6….png)

>>2320

Thanks!


 No.2327

Gonna ask another question, is there a problem if I'm downloading the public tag repository tags, and it's stuck on "content row 0/100,164: writing"?

It hasn't moved in a while…


 No.2329

>>2327

Specifically, it seems hydrus locks up when doing this (according to the db logs):

2016/03/27 14:08:34: Profiling write content_update_package


 No.2331

File: 1459205090473.jpg (7.83 KB, 225x166, 225:166, 149512a6d5b46c1040f4453b83….jpg)

Is there a way to delete tags that aren't in use by any image, or was that one of the "coming soon" features? I recently fixed a spelling error for a character name in all my files, but I'm worried I'll mess up again since the misspelling still comes up in the tag suggestions.




File: 1426721772716.png (100.78 KB, 1624x1081, 1624:1081, 1327614072601.png)

 No.471[Reply]


Drag and drop windows with tag rules. Show two windows side by side and one window can be programmed with the rule "ADD tag foo" and the other one has the rule "REMOVE tag foo, ADD tag bar" and you can drag and drop files to them.

Deriving tags from regex of other tags/namespace tags. A file has the tag "filename:big_ugly_name" and we could regex that namespace for another tag.

Tag sets with hotkeys: save a set of tags under a hotkey so it's quick to add them to a file while filtering

Opaque window behind tag list in the corner so it doesn't get hidden by picture background

Option to default certain mime types to be excluded from slideshow and only open externally, will help with videos with odd codecs that don't preview in the slideshow correctly

Option to specify hamming distance in "find similar images"; currently you can't change the option once it's in the filter window, and you have to enter the hash manually in the "system:similar to" option
235 posts and 102 image replies omitted. Click reply to view.

 No.2254

Is it at all possible to exclude files of a certain rating? Like "-system:rating for R = 0.5" or something. Or indeed, a ≠ operator.


 No.2256

File: 1457822713832-0.png (207.12 KB, 972x995, 972:995, regex_dir_names.png)

File: 1457822713836-1.jpg (168.81 KB, 1058x1500, 529:750, 03ecc524364e1c9baef29fa769….jpg)

>>2226

That's an interesting thought, and I am not sure of the best answer. The more generalised and powerful I make tag archives, the more they look like hydrus services. I am strongly considering moving all tag archive functions to the new extracted service database I am planning. When I pull client.db apart, all service-specific data will be extracted to its own folder. It'll be a portable container for most sorts of hydrus-compatible content, so it'll probably be able to do everything tag archives currently do, and with siblings and stuff as well.

So, in future, I'd like the user to be able to go something like file->import service->pick file/directory/whatever->would you like to import it as a new service, or merge it into an existing service?. Maybe with a single wizard that covers a lot of potential bases, rather than having different functions hidden all over the client.

For your plan for now, it might be reasonable to make three-monthly update archives subsequent to your original 'e621 up to 2015-09' or whatever, and then people can add those smaller (~50MB? something like that?) files a few times a year rather than having to deal with larger ones all the time. Having two archives of 1970-01->2015-09 + 2015-09->2015-12 works the same as 1970-01->2015-12 from hydrus's perspective, it'll just take two db queries. Having your users manage multiple files could be awkward, but it is easy.

The only difficulty there is figuring out what is the new content. If there is a way to grab that from the site (like if you know the guy who runs it, who has direct db access), then that's easy, but I presume you are expecting to have to do a full sync every time you update your tag archive, in which case you can go:

A = Tag archive 2015-09

B = Tag archive 2015-12

C = New tag archive

For every mapping in B:

If it is not in A:

Add it to C.
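In Python, with each archive held as a set of (file hash, tag) pairs, that loop is just a set difference (hypothetical variable names; not hydrus code):

```python
# A: mappings in the older archive, B: mappings in the newer archive.
# Each mapping is a (file_hash, tag) pair.
archive_a = {('hash1', 'creator:foo'), ('hash2', 'species:cat')}
archive_b = {('hash1', 'creator:foo'), ('hash2', 'species:cat'),
             ('hash3', 'creator:bar')}

# C: everything in B that is not in A, i.e. the new mappings since A.
new_mappings = archive_b - archive_a
```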

Post too long. Click here to view the full text.

 No.2259

>>2256

Your logic re new content is what I was musing about - the problem is removed tags. If there is a mapping in A that is not in B, I have no way, in the HTA, to say "this mapping should be removed if it exists". I suppose I could do an "add these" update db and a "remove these" update db, which could be one-time applied, but I'm not sure that's worth it. I'm especially not sure that running a DB that has 50,000 mappings on 50,000 files into Hydrus is going to be appreciably faster than running one that has 500,000 mappings on 50,000 files (just using numbers pulled out of my ass). Question: Does the permanent sync still take a really really long initial processing time, or does it just sit in the background?

For FA, incremental updates can and will work, though.

I like your services idea - that has potential.


 No.2264

File: 1457896706821.jpg (517.02 KB, 1475x990, 295:198, 441fed36d15b471ae29e77fad8….jpg)

>>2259

Ah, I see. Yeah, tag archives have no deleted knowledge yet. The service db will, so again that's probably the direction we'll head.

Initial sync takes ages, although it no longer locks up the gui. It now runs off a pausable and cancellable popup window that does 50 files at a time or something, so the program runs a little laggier, but is still usable. The total time it will take will be roughly proportional to the number of files that are cross-referenced. If only 5,000 files are matched, it will be about ten times faster than if 50,000 were. (The exception to this is if the tag archive is sha256, in which case 100% of the files are matched, so it'll take ages every time.)

I was heartened to see tag processing generally working faster with the recent autocomplete cache change, so we'll see how that generally holds up as I continue to alter the cache. I expect some further speed-up and also some slow-down, but I am not sure how much they will balance out.


 No.2328

Would it be possible to add a "censor this tag" option to the right-click menu you get in the selection tags menu?




File: 1422868263828.jpg (1022.74 KB, 3493x1037, 3493:1037, hydrus_client_2015-02-02_0….jpg)

 No.173[Reply]

I suppose I'll start a thread for bug reports?

Description
Hydrus fails to upload the petition for files if files were queued only for deletion on the server. This does not happen if files are both queued for uploading and some are queued for deletion.

Basically, it boils down to the files pending count: anything like (0/x) fails, while (>0/x) works.

Pic related.

Traceback
UnboundLocalError
local variable 'i' referenced before assignment
Traceback (most recent call last):
File "C:\code\Hydrus\build\client\out00-PYZ.pyz\include.HydrusThreading", line 163, in run
File "C:\code\Hydrus\build\client\out00-PYZ.pyz\include.ClientGUI", line 235, in _THREADUploadPending
Post too long. Click here to view the full text.
278 posts and 156 image replies omitted. Click reply to view.

 No.2270

>>2269

yeah that's their new thing. Make sure you edited the locale file and ran locale-gen too


 No.2303

After bumping up my hydrus installation a few versions last week, the thread importer is currently failing to import pictures from /furry/ threads entered with their json url. It seems to be looking for the pictures on media.8ch.net, while the images for /furry/ don't seem to be hosted on that subdomain :/


 No.2304

>>2303

It should work if you put in the html url, like http://8ch.net/hydrus/res/173.html for this thread. I didn't even realise the json url worked previously!

The new version of the thread checker sometimes actually visits the url you put in to figure out which image domain it should be downloading from, which is absent from the json API.


 No.2315

File: 1458738757278.png (62.22 KB, 1018x565, 1018:565, client_2016-03-23_14-11-43.png)

Should the subscription downloader be downloading images in a completely random order? It's fucking my workflow up something fierce and I don't recall it being like this.


 No.2326

>>2304

The json URLs worked just fine, they even made hydrus skip downloading items/hashes in the json which were already in the database (these would import way faster).

Now it sadly doesn't work when putting in an html url to a thread either; I get stuff like this (using https://8ch.net/furry/res/515077.html as an example, hydrus still seems to be assuming the images are hosted on media.8ch.net, but they're actually hosted on plain 8ch.net, giving me NotFoundExceptions):


Traceback (most recent call last):
File "include\ClientImporting.py", line 2602, in _WorkOnFiles
HydrusGlobals.client_controller.DoHTTP( HC.GET, file_url, report_hooks = report_hooks, temp_path = temp_path )
File "include\ClientController.py", line 358, in DoHTTP
def DoHTTP( self, *args, **kwargs ): return self._http.Request( *args, **kwargs )
File "include\ClientNetworking.py", line 300, in Request
( response, size_of_response, response_headers, cookies ) = self._DoRequest( method, location, path, query, request_headers, body, follow_redirects = follow_redirects, report_hooks = report_hooks, temp_path = temp_path )
File "include\ClientNetworking.py", line 249, in _DoRequest
( parsed_response, redirect_info, size_of_response, response_headers, cookies ) = connection.Request( method, path_and_query, request_headers, body, report_hooks = report_hooks, temp_path = temp_path )
File "include\ClientNetworking.py", line 710, in Request
elif response.status == 404: raise HydrusExceptions.NotFoundException( parsed_response )
NotFoundException: <html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.8.1</center>
</body>
</html>




File: 1454040568237.jpg (135.42 KB, 1510x594, 755:297, aaaaa.jpg)

 No.1914[Reply]

Here's something I did to make copying individual booru tags easier. hydrus is good as a booru browser, but its page downloader doesn't read tags. And sometimes you just wanna quickly add tags to an individual image.

So this tamper/greasemonkey script will add a button and a key bind to booru posts and the Illustration2Vec demo site to copy all tags on the page including their schemas and rating. Then you can just right click your image in hydrus > manage file tags > paste tags > apply. The whole process takes about 5 seconds.

github.com/JetBoom/boorutagparser

22 posts and 3 image replies omitted. Click reply to view.

 No.2276

>>2273

Thanks for the fix!


 No.2322

File: 1458879933851.png (31.42 KB, 134x127, 134:127, face.png)

( ˇωˇっ)3


 No.2323

File: 1458886226733.gif (2 MB, 500x578, 250:289, 1440819090172-tech.gif)

>>2170

Do you know how I might go about getting boorutagparser-server running in Arch?


 No.2324

File: 1458888809809.png (453.15 KB, 700x940, 35:47, db57a2386108cd7634c29bb17b….png)

>>2323

# pacman -S npm
$ npm install boorutagparser-server
$ node node_modules/boorutagparser-server

This will install boorutagparser-server in the present working directory into a folder named node_modules along with all its dependencies, hence the final command. There's probably a way to add node_modules to PATH or something, but I'm not familiar enough with node.


 No.2325

File: 1458953000441.png (139.33 KB, 518x632, 259:316, 1458296718827-1.png)

>>2324

Thanks.




File: 1458684649258.jpg (350.7 KB, 774x744, 129:124, 2b3835ab88226d6366285097f4….jpg)

 No.2310[Reply]

I had a good week. I fixed some bugs, added IPFS directory parsing and download, and massively sped up file query results generation.

The release should be as normal tomorrow.

 No.2312

>>2310

>added IPFS directory parsing and download

get hype!




 No.2277[Reply]

Hello,

first of all: neat program! :) Really like the features. I've been looking for a suitable application for collecting stuff such as memes and artwork forever. I'll probably also use it for my No Man's Sky screenshots when it's finally out. I'm expecting to take a shit ton and to tag them with all the relevant data like planet name, available resources, creatures etc. so I can look it up later if needed or just for the memories - since we don't really know how much the ingame encyclopedia will track.

Anyway, I've connected the hydrus network client to the remote tag repository and updated all the way. I've then imported just a single folder of 40 quite popular images (ranging from around 500kb to 5mb) to see if the auto-tagging is working. Works like a charm. All the pictures are sitting at around 50 tags each now.

One question though: is it normal that loading the images via system:everything takes around 20 seconds (used my phone timer to measure)? Remember, I do only have 40 pictures in the database (and all the tag mappings from the repo, but coming from a dev background myself I'd think they'd be ignored in the query when using system:everything). Using the title:xyz tag to limit the result set to a single image also takes around 17 seconds. Normal?

I'm sitting behind a fairly powerful gaming pc that's also kept very clean and has no performance issues in any other application. Relevant specs: i5 4690k, 32 gb ddr3 2400mhz, 1tb samsung 850 pro series ssd (yes, my only drive, I only care for the performance and don't need that much space anyway :P). I guess my gtx 980 ti won't matter much in this case. :D

I've already tried vacuuming the db. Any suggestions or feedback?

Second question: I'm also using a macbook pro with a high-dpi display I'd also like to run the client on (probably syncing the db between the two devices). It seems the client doesn't support the doubled resolution of its so-called retina display (it has a 2880x1800 resolution but is basically rendering it as it would a 1440x
Post too long. Click here to view the full text.

2 posts and 1 image reply omitted. Click reply to view.

 No.2291

Thanks for the suggestion of creating some db profiles. However, I did a clean reinstall of v197 (I also deleted the DB) and that seemed to fix it. Like you said, images now load in less than a second (probably more like 100ms).

I remember that I had a few Internet disconnects when I first synced with the repository (having some problems with my router lately, it sometimes crashes when I'm downloading with combined speeds > 100 Mbit/s). Could that have created some issues in the database (e.g., corrupt tables) that would have caused it to slow down that heavily?

Anyway, thanks again for your help. I'll probably use the OS X client quite extensively (since I won't be home at my PC all the time due to work reasons) so if you need any help with checking for bugs or optimizing stuff (like the DPI scaling) I'd be happy to help. :)


 No.2300

>>2287

Well, I think I spoke too soon. I have now experienced the same problem I had on my PC on OS X too. The scenario:

- Fresh Installation

- Complete sync with the repo (in the client)

- No local tags

- Import of a single image (821KB png, 68 tags in the repo)

- Vacuumed the db

- "system:everything"

This takes around 30 seconds before the loading is finished and the image is rendered. The macbook is not as powerful as my PC, but still quite the machine (2.5GHz i7, 16GB RAM, 512GB SSD). Running on the latest OS X 10.11.3. Something definitely seems fishy here.

Here the relevant part from the log I created: http://pastebin.com/6VYDvUAE

Kind of weird to me that most of the stuff shows up as taking up no time at all (0.000), but you can clearly see the one call taking over 27 seconds.

Hope you can help. :(

P.S.: Performance was fine with several hundred files before syncing to the tag repository. Since I did report that performance on PC was fine after reinstalling, I guess it may be because I didn't sync to the repository after the reinstall because I didn't find the time to do so. What I also noticed: everything seems slow in general. Opening the "manage tags" window for the image takes around a minute.


 No.2301

>>2287

And, since I thought it might be useful for finding the error, also a log for opening the manage tags window (which takes around a minute as I already mentioned): http://pastebin.com/tik35T4u

This one's a lot longer since there are more entries with actual time values and I didn't want to leave anything out.


 No.2306

File: 1458670341559.jpg (533.17 KB, 1964x1538, 982:769, 9ad701a527740bffb2b7e7136d….jpg)

>>2291

>>2300

>>2301

Thank you for these profiles. I repeated them on my dev machine and discovered something similar, although not pronounced enough for me to previously notice, and found it was all due to a single very inefficient line of sql: the bit that fetches raw tag ids for files. I can't remember it being a problem before, and I don't really understand why it is running so slow now (sqlite's query optimiser reports to have no problem with it), but tweaking the way it takes its parameters somehow reduces its time from 2.5 seconds to about 9ms!

This fix will be in tomorrow's release. Please give it a go sometime and let me know if your problem is any better.
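As a generic illustration of how the shape of a query's parameters can matter in sqlite (this is not the actual hydrus query; the table and column names are made up): fetching rows for a batch of ids with one big IN clause and joining against a temp table of ids return the same rows, but the planner may treat the two forms very differently on large tables:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE mappings ( hash_id INTEGER, tag_id INTEGER )')
con.executemany('INSERT INTO mappings VALUES ( ?, ? )',
                [(h, h % 10) for h in range(1000)])

wanted = [5, 17, 99]

# Version 1: a single query with an IN clause built from placeholders.
q = 'SELECT hash_id, tag_id FROM mappings WHERE hash_id IN (%s)' % \
    ','.join('?' * len(wanted))
rows_in = set(con.execute(q, wanted))

# Version 2: load the ids into a temp table and join against it.
con.execute('CREATE TEMP TABLE wanted_ids ( hash_id INTEGER PRIMARY KEY )')
con.executemany('INSERT INTO wanted_ids VALUES ( ? )', [(h,) for h in wanted])
rows_join = set(con.execute(
    'SELECT m.hash_id, m.tag_id FROM mappings m NATURAL JOIN wanted_ids'))

assert rows_in == rows_join  # same answer, potentially very different plans
```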


 No.2309

>>2306

Thanks, sounds good. I'll try the update as soon as it's available. :)




YouTube embed. Click thumbnail to play.

 No.2274[Reply]

windows

zip: https://github.com/hydrusnetwork/hydrus/releases/download/v197/Hydrus.Network.197.-.Windows.-.Extract.only.zip

exe: https://github.com/hydrusnetwork/hydrus/releases/download/v197/Hydrus.Network.197.-.Windows.-.Installer.exe

os x

app: https://github.com/hydrusnetwork/hydrus/releases/download/v197/Hydrus.Network.197.-.OS.X.-.App.dmg

tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v197/Hydrus.Network.197.-.OS.X.-.Extract.only.tar.gz

linux

tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v197/Hydrus.Network.197.-.Linux.-.Executable.tar.gz

source

tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v197.tar.gz

I had a great week. I fixed a bunch of bugs, and I managed to finish the first version of the first part of my new autocomplete cache layer. Depending on the size of your database and the speed of your computer, this week's update could take a little while, but once it is done, a lot of autocomplete results will appear very quickly, every time.

new cache layer

So, everything has generally gone to Post too long. Click here to view the full text.

3 posts and 1 image reply omitted. Click reply to view.

 No.2289

Is there a way to control the order tags are listed in the autocomplete section of the 'manage tags' dialogue?

It was either because of the latest update or some setting I unwittingly changed, but there doesn't seem to be a rhyme or reason to the ordering anymore, which makes the autocomplete fairly useless at the moment


 No.2292

File: 1458495003102.jpg (1.61 MB, 1800x1362, 300:227, 84cab086dc51acbb56ffc07eee….jpg)

>>2289

I accidentally broke it in this release; it is fixed for next week! Some sibling replacement stuff should work better as well!


 No.2298

Is it just me or is the upnp broken?


 No.2305

File: 1458665245458.jpg (314.79 KB, 1837x1139, 1837:1139, 500fc0e0bc2e06a54ad738558c….jpg)

>>2298

It seems to be generally working for me. Which part is giving you problems? Do you have an error message?


 No.2308

>>2305


2016/03/21 20:27:09: Daemon UPnP encountered an exception:
2016/03/21 20:27:09: Exception:
2016/03/21 20:27:09: Exception

Problem while trying to add UPnP mapping:



upnpc : miniupnpc library test client. (c) 2005-2013 Thomas Bernard

Go to http://miniupnp.free.fr/ or http://miniupnp.tuxfamily.org/

for more information.

List of UPNP devices found on the network :

desc: http://192.168.1.1:5000/rootDesc.xml

st: urn:schemas-upnp-org:device:InternetGatewayDevice:1



Found valid IGD : http://192.168.1.1:5000/ctl/IPConn

Local LAN ip address : 192.168.1.247

ExternalIPAddress = 10.5.131.97

AddPortMapping(25510, 25510, 192.168.56.1) failed with code 718 (ConflictInMappingEntry)

GetSpecificPortMappingEntry() failed with code 714 (NoSuchEntryInArray)



Traceback (most recent call last):
File "include\HydrusThreading.py", line 218, in run
self._callable( self._controller )
File "include\ClientDaemons.py", line 399, in DAEMONUPnP
HydrusNATPunch.AddUPnPMapping( local_ip, internal_port, external_port, protocol, description, duration = duration )
File "include\HydrusNATPunch.py", line 84, in AddUPnPMapping
raise Exception( 'Problem while trying to add UPnP mapping:' + os.linesep * 2 + HydrusData.ToUnicode( output ) )




File: 1424393184272.jpg (1.57 MB, 1500x1978, 750:989, 1b42a554ea243f40c1ec80b391….jpg)

 No.290[Reply]

Here is a 7zip of the client database, version 147, freshly initialised and synced up to my public tag repository as of today.

http://www.mediafire.com/download/0nh9z1994iao4jl/Hydrus_Network_Client_bare_database_with_PTR_up_to_update_1012.7z

If you want to start a new client that connects to my public tag repository, you can swap in this database right after you install, and you won't have to spend twenty hours sitting around waiting for 7.8 million mappings to process.

If you have no idea what this is, I suggest you ignore it and install the client normally, learning about how hydrus works using my help files first.
57 posts and 33 image replies omitted. Click reply to view.

 No.2175

Hi guys, OP of the original (2014) e621 db here. I'm reripping e621 now, under my original sanitization regime - read: normalized creator:, series:, species:, character: namespaces instead of the site's native artist:, copyright:, species:, character: set.

Should be up in a week or two, for anyone interested.

To the other guy making site archives: what hardware are you using, out of curiosity? I've been playing with short-duration EC2 c4.8xlarge instances, and they're obscenely quick for this, but pricy.


 No.2176

>>2175

(by "obscenely quick" I mean "I ripped sha1 and md5 hashes of all of e621 in about 90 minutes at 1gbit". It's like renting a Bugatti to spin cookies on the front lawn of the Augusta National.)


 No.2186

File: 1457097320773.png (3.76 KB, 716x247, 716:247, screen.1457096014.png)

>>2175


>To the other guy making site archives: what hardware are you using, out of curiosity?

I'm just using my cheap seedbox from https://seedboxes.cc/. Only $14 a month. I've been with them probably since 2013 now, and they're definitely reliable. Although advertised as a seedbox, they let you basically do anything on it. Of course, it's not just used for downloading images/making tag archives. I also run h@h from sad panda on it, and also download a ton of torrents from private trackers.

20Gbps Up/Down. However, there's a 3TB monthly upload cap. As for making the tag archive, it's just a simple 50 line python script that parses their index.xml page: "https://e621.net/post/index.xml?limit=1000&page={page_id}"
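The parsing side of such a script only needs the standard library; a minimal sketch (the post element's md5 and tags attributes match the e621 XML post API described here, but verify them against a live response before relying on this):

```python
import xml.etree.ElementTree as ET

def parse_index_xml(xml_text):
    """Extract (md5, tag list) pairs from an e621-style index.xml page."""
    root = ET.fromstring(xml_text)
    results = []
    for post in root.iter('post'):
        md5 = post.get('md5')
        # Tags arrive as one space-separated attribute string.
        tags = post.get('tags', '').split()
        results.append((md5, tags))
    return results
```

Paging through {page_id} and writing the pairs into a Hydrus Tag Archive would sit on top of this.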

>short-duration EC2 c4.8xlarge instances

I actually had no idea Amazon offered this type of service. I'll have to look into it.

>(by "obscenely quick" I mean "I ripped sha1 and md5 hashes of all of e621 in about 90 minutes at 1gbit"

Yeah e621 and gelbooru can be ripped quickly. Most tag archives from big sites can be done in a few hours, with the only exception being sankaku complex. Their incredibly restrictive server makes it a huge bottleneck.

Speaking of sankaku, I had to start over. I didn't realize they used a limit of 100 instead of the usual 1000 when doing post api searches, which meant my date ranges had gotten messed up. It doesn't help that it has 10 minute timeouts after 150 requests either. Makes it a huge pain.


 No.2192

>>2186

Ah, interesting service - will have to look into it. I need something less limited than a t2.micro, and c4.8xlarges are $1.67 an HOUR (!).

I remember the last time I did this having an insane amount of difficulty with the e621 API - part of it was that I was a much worse programmer then, but part of it was that individual API pages didn't seem to include things like tag categories and ratings. I also don't, in theory, trust their hashes - in practice they work fine, but I'm developing here with an eye towards sites like Derpibooru that optimize the image and don't update the hash in their API, so if I want anything to match on that (since as you mentioned, nobody downloads originals) I'm going to have to manually get the hashes of both the orig_ and the optimized images.

So the way I'm doing it this time is using BeautifulSoup to actually scrape every single page on the site (/post/show/{id}, iterated from 000001 to 900000, ignoring 404s), grab the tags including namespaces, grab the rating, stream the image and hash it myself, put everything in a JSON file, save it into the locally running instance of MongoDB that I keep around, repeat.

I broke up the iteration into work-blocks, which are 1000 IDs each and are served from Mongo as well. So I can fire up any number of workers, which point themselves at Mongo, grab the top block, and send the data back.

Now that I have it all in Mongo, all I'll have to do is write a quick script to iterate through it and dump it through the Hydrus Tag Archive generator.

This is definitely overkill, but it's extensible overkill - I don't need to worry about API differences, I don't need to worry about a process choking and hosing my output file, I don't need to worry about parallelism conflicts, etc.
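The work-block partitioning described above can be sketched in a few lines (block size of 1000 as mentioned; the Mongo claim/return plumbing is omitted):

```python
def make_work_blocks(first_id, last_id, block_size=1000):
    """Split an inclusive id range into (start, end) work-blocks."""
    blocks = []
    start = first_id
    while start <= last_id:
        end = min(start + block_size - 1, last_id)
        blocks.append((start, end))
        start = end + 1
    return blocks
```

Each worker would then claim the top unclaimed block, scrape its id range, and write the results back.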

The ultimate goal is to re-do Furaffinity, properly this time. Have to do hashing on the fly for that, too, and this method will work for sure now that their servers are less shit (thanks, IMV
Post too long. Click here to view the full text.


 No.2278

If I synced with a downloaded db, gelbooru for example, after the sync do I still need to keep that db file, or are all the mappings now inside my client db?




File: 1458081918095.png (631.15 KB, 666x666, 1:1, ea522233d79a14020e87416fcd….png)

 No.2272[Reply]

I had a great week. I fixed a lot of bugs, and I managed to finish the first part of my new autocomplete tags cache. I still need to test it on a real-world database, and the cache does not yet cover all search domains, but in general it seems to be producing autocomplete results for a 6,000 file/20m mappings db in about 40-100ms. I am obviously extremely pleased. The cache takes a bit of CPU to initialise and sync with an existing service, so I expect the update to take a few minutes, possibly longer.

Absent any problems in my extended testing, the release should be as normal tomorrow.



File: 1457483155180.jpg (11.02 KB, 208x200, 26:25, 4862683 _b2974ab5efb1f36cc….jpg)

 No.2217[Reply]

A local booru program is exactly what I've been looking for! Thank you Hydev.

One q: Is it possible to take your local images, compare them to a booru's images, and apply all tags that the booru has for the images to your local ones?

Even if it's time-consuming, it would make my life a ton easier.

4 posts omitted. Click reply to view.

 No.2222

>>2221

So if I import the tag archives the dev gives, it'll auto-tag anything with a hash it recognizes?

Also, the different databases seems to be able to be done with different installs. Which is a bit of a pain, but at least it's feasible.


 No.2224

>>2222

Right. The file hash is how Hydrus identifies the file. That does lead to issues with watermarked or resized files not being recognized.

There's also someone working on a script that will dump all an image's tags into a text file with the same name as the image when you save it, and Hydrus can import those tags with the image.


 No.2225

>>2224

And here I was thinking the tag archives/repos were just for the tags themselves. That's pretty awesome.

Thanks!


 No.2263

File: 1457895853639.jpg (177.44 KB, 1000x508, 250:127, 03e9cc69acf41d2988939f7128….jpg)

>>2217

I am glad you like it!

As well as the tag archives that people are creating and sharing, the most popular files' tags on the popular boorus are shared on my public tag repository. If you sync with that, you will find many of your files gain tags naturally, and increasingly so as more are added in future.

Also, if you end up importing some tag archives and also sync with my ptr, please feel free to upload any mappings your client generates, so other people see them as well.

>>2220

>>2222

I recommend running multiple copies of the program to do this. I designed it to be 100% portable (all settings and data are stored under install_dir/db) with the explicit intention of allowing this, and it means there is no privacy bleed from one database to the other. If you have a sfw client, you can search it with someone looking over your shoulder without any embarrassing autocomplete counts popping up, for instance!

With the increasing size of individual repository-syncing databases, however, I may reconsider this. I may end up permitting the creation of new non-overlapping local file services, but it really depends on what people are interested in. If you end up running several clients at once, let me know how it works out for you!


 No.2271

>>2263

I get that you like the portable approach, but it really has a number of problems, and you really should implement a proper system to allow for installations. You're kind of bucking the standard convention of literally every OS.

https://github.com/hydrusnetwork/hydrus/issues/106#issuecomment-167012174




File: 1457489329988.gif (76.53 KB, 712x774, 356:387, wc124.gif)

 No.2223[Reply]

I think it would be worth it to have a proper timestamp namespace with some usability features:

Options to pull the stamp from the filename (most imageboards use the Unix timestamp as the filename), otherwise optionally falling back to the file's modified time from the filesystem

Autocompletion for missing time fields, e.g. if only a year and month are entered (publication date), then the rest of the timestamp is set to midnight on the first of the month, or some other configurable default

 No.2265

File: 1457898008340.jpg (787.69 KB, 1239x1749, 413:583, 88b72b50b42d8143705d45b355….jpg)

I think I would like something like this as well, although I am not sure which timestamps would be useful to store.

I have sometimes applied 'date:' to represent date of creation, and I have been using pseudo-ISO 8601 (year-month-day, which is lexicographically sortable), so like:

date:1992-06

date:2000-01-01

date:2008

And I figured I could parse that inside the program for a system:date>1999-02 or something that would intelligently fill in the midnight-blanks as you suggest, although I haven't actually seen many date tags or yet wanted to search that way, so I haven't put programming time into it. I suspect the gathering of accurate timestamps is difficult to automate, although perhaps that is something future Deviant-Art-like parsers could generate. Most artist websites have an uploaded date somewhere, don't they?
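The midnight-blank filling described above could look something like this (a hedged sketch, not hydrus code — the function name is made up):

```python
from datetime import datetime

def parse_partial_date(value):
    # Expand a partial 'date:' value such as '1992-06' or '2008' to a full
    # datetime, filling the missing fields with their minimums (month 1,
    # day 1, midnight). A toy illustration, not hydrus's actual code.
    for fmt in ('%Y-%m-%d', '%Y-%m', '%Y'):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError('unrecognised date tag value: ' + value)
```

A hypothetical system:date>1999-02 predicate could then compare files' parsed dates against parse_partial_date('1999-02'), i.e. midnight on 1999-02-01.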

If you would like to parse imageboard upload timestamp into hydrus, you might be able to neatly do it with a script that went:

for filename in directory:

try to parse timestamp from that (142454546115642.jpg)

create ISO 8601 datestring from that (or whatever)

copy/rename that file to be '2000-01-01.jpg'

And then import the files into hydrus and use regex path tagging to create your 4chan_timestamp: namespace or whatever.
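A runnable version of that sketch might look like this. It assumes the leading ten digits of the filename are a Unix timestamp in seconds (most boards append milliseconds after that), which may not hold for every board:

```python
import os
import re
from datetime import datetime, timezone

def rename_by_timestamp(directory):
    # Rename imageboard-style files (e.g. '1424545461156.jpg') to ISO 8601
    # date names, assuming the first ten digits are a Unix timestamp in seconds.
    for filename in os.listdir(directory):
        match = re.match(r'(\d{10})\d*\.(\w+)$', filename)
        if match is None:
            continue  # not a timestamp-named file
        seconds, ext = int(match.group(1)), match.group(2)
        stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
        candidate = f'{stamp:%Y-%m-%d}.{ext}'
        counter = 1
        while os.path.exists(os.path.join(directory, candidate)):
            # several files can share a date, so disambiguate
            candidate = f'{stamp:%Y-%m-%d} ({counter}).{ext}'
            counter += 1
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, candidate))
```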


 No.2268

You should take into consideration video timestamps while thinking about this.

mpv and mpc-hc save screenshots in the format: "videoname.ext [timestamp].png"

In mpv, [timestamp] is [hh:mm:ss.000]

For any screencap, I've actually gotten in the habit of adding tags in the following format:

series:archer

episode:s01e05

timestamp:00:05:01.500

I have no suggestion or request, just informing you of another use case.
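For reference, the mpv screenshot filename format above parses easily with a short regex (a toy sketch; the tag names in the returned dict are made up for illustration):

```python
import re

# Matches an mpv-style screenshot filename: 'videoname.ext [hh:mm:ss.000].png'
MPV_SHOT = re.compile(
    r'(?P<name>.+)\.(?P<vext>\w+) \[(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3})\]\.png$')

def parse_mpv_screenshot(filename):
    match = MPV_SHOT.match(filename)
    if match is None:
        return None
    return {'title': match.group('name'), 'timestamp': match.group('ts')}
```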




YouTube embed. Click thumbnail to play.

 No.2230[Reply]

windows

zip: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.Windows.-.Extract.only.zip

exe: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.Windows.-.Installer.exe

os x

app: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.OS.X.-.App.dmg

tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.OS.X.-.Extract.only.tar.gz

linux

tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v196/Hydrus.Network.196.-.Linux.-.Executable.tar.gz

source

tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v196.tar.gz

I had a good week. I fixed and improved some things, and I made some important changes to the autocomplete tag code.

The past couple of weeks have brought a lot of new mappings to my public tag repository, and I think we yesterday hit 50 million! It represents about 700k different tags applied to 3 million files! I'm really pleased, and I appreciate the contributions everyone has made. While this expansion is revealing some terrible lag in places, I still think it's neat that we have col […]

1 post and 1 image reply omitted.

 No.2239

I'm new to Hydrus, just testing it out for now. I downloaded the PTR sync database and about 10 different tag databases. Then imported about 1000 images. After importing the images Hydrus starts processing them and this is taking a very long time, up to 30 seconds per file. Is this normal?


 No.2242

I got many copies of these errors when I tried to load my inbox.

UnicodeDecodeError

'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

File "site-packages\wx-3.0-msw\wx\_core.py", line 16766, in <lambda>

File "include\HydrusController.py", line 227, in ProcessPubSub

try: self._pubsub.Process()

File "include\HydrusPubSub.py", line 127, in Process

callable( *args, **kwargs )

File "include\ClientGUICommon.py", line 4096, in SetTagsByMediaPubsub

self.SetTagsByMedia( media, force_reload = force_reload )

File "include\ClientGUICommon.py", line 4007, in SetTagsByMedia

self._RecalcStrings( tags_changed )

File "include\ClientGUICommon.py", line 3821, in _RecalcStrings

tag_string = self._GetTagString( tag )

File "include\ClientGUICommon.py", line 3763, in _GetTagString

if self._show_pending and tag in self._pending_tags_to_count: tag_string += ' (+' + HydrusData.ConvertIntToPrettyString( self._pending_tags_to_count[ tag ] ) + ')'

UnicodeDecodeError

'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

File "include\ClientGUIMedia.py", line 1948, in EventPaint

self._DrawCanvasPage( page_index, bmp )

File "include\ClientGUIMedia.py", line 1254, in _DrawCanvasPage

dc.DrawBitmap( thumbnail.GetBmp(), x, y )

File "include\ClientGUIMedia.py", line 2981, in GetBmp

upper_info_string += ', '.join( series )

I tried opening a new tab to see if maybe there were some particular images/tags causing the issue, but this new tab is loading everything fine. It's only when I switch back to the original inbox tab that the errors appear.

[…]

 No.2243

>>2242

Wait, I've been messing around some more. The errors are actually popping up whenever I archive an image or send it to the inbox. It has nothing to do with the image itself or its tags (it happens with an image with no tags at all, too).


 No.2251

File: 1457808074078.jpg (110.96 KB, 1000x671, 1000:671, 62912fe9e49f6d6379956041f8….jpg)

>>2239

30 seconds sounds quite high. Even a slow computer with a lot of tags might take perhaps 2 seconds to import a typical file. I expect the many recent tags you added have fragged up your database file and/or your hard drive, so I suggest you go database->maintenance->vacuum, which will clean up your client.db (it may take ten minutes to complete), and then shut the client down and run a hard drive defrag. Once that is done, restart the client, and see if things are running faster.

If things are still running slowly, please check out:

http://hydrusnetwork.github.io/hydrus/help/reducing_lag.html

>>2242

>>2243

Thank you for this report. I believe a tag got imported and was not properly decoded to unicode at the correct point in the import process. The bad tag hung around in gui memory and hence failed to render to screen. The inbox/archive events were triggering a refresh of the tag gui controls, repeating the error.

I think this error is temporary. If you have since restarted your client, it should have gone already. I had a look and think I might have fixed it, but let's check:

Were the .txt files parsed through a manual import you set up, or through the new import folder code?

Did any of the tags in your .txt files have unusual characters, like rare accents, punctuation, or japanese text?

If you load up a fresh page with something like system:age<7 days to show all those files you imported with tags, do you get any display errors at all? Are there an […]


 No.2253

File: 1457811812775.jpg (316.53 KB, 843x675, 281:225, 1435928990501.jpg)

Thank you hydrus.

After all the IPFS stuff is integrated I'll have no choice but to migrate all my images to hydrus. The only thing holding me back is laziness.




File: 1457278380449.jpg (148.5 KB, 1111x1587, 1111:1587, sts98plume_nasa_big.jpg)

 No.2203[Reply]

I've been trying to use tags with a hierarchy that lets subtags describe the tags above them, using periods as delimiters.

hair

hair.black

hair.long

hair.long.knee length

hair.style.braids

hair.style.ribbon

This is nice for keeping relevant tags close and finding characteristics of tags that I want to describe in more detail.

The problem is searching. If I try to search "braid" it doesn't hit on "hair.style.braids", and if I search *braid it takes a very long time, with the added danger of the autocomplete running off on a bare-asterisk search and locking up the client for several minutes.

I've been able to work around this a little bit with siblings and searching for "braided hair" -> "hair.style.braids".

Can you think of a way to start searches inside tags following some kind of delimiter like a period?

 No.2210

File: 1457414115214.jpg (53.61 KB, 813x357, 271:119, 080a2df3c5830895725f35a6b3….jpg)

There are namespaces, i.e.

someNamespace:some tag

which, if you type the namespace and one letter and then autocomplete, will make hydrus fetch all the tags in that namespace starting with that letter; and if you type a namespaced tag without its namespace, hydrus will still fetch the file.

For example, let's say you have the "hair" namespace which you populate with the tags "black," "long," and "braided." If you type "long" into the search bar, hydrus will get all the files with the tag "long" ignoring any namespace (so if you have the tags "hair:long," "long," and "legs:long," hydrus will return all of them), but if you type "hair:long" hydrus will only return that.

The only problem is that namespaces cannot be nested, so you cannot have "hair:style:braids", as hydrus will just interpret it as the tag "style:braids" in the "hair" namespace. Which actually sounds like a bug, as you wouldn't want the namespace delimiter inside tags.

If your "subnamespaces" aren't abstract, you can use parent tags so that the children get the parent tag in the tagging process.
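The namespace matching described above could be sketched like so (a toy illustration, not hydrus's actual code):

```python
def tag_matches(stored_tag, search_term):
    # A namespaced search term must match exactly; an unnamespaced one
    # matches the subtag of any namespace (or a bare tag).
    if ':' in search_term:
        return stored_tag == search_term
    return stored_tag.split(':', 1)[-1] == search_term

tags = ['hair:long', 'long', 'legs:long']
hits = [t for t in tags if tag_matches(t, 'long')]        # all three
exact = [t for t in tags if tag_matches(t, 'hair:long')]  # just 'hair:long'
```

Splitting on only the first colon also keeps tags like "series:2001: a space odyssey" intact, as discussed in the replies below.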


 No.2211

>>2210

Namespace delimiters not being recursive is tricky - you could think of it as a bug, but also there's stuff like "series:the legend of zelda: ocarina of time" in which you do actually want "series" to be the namespace and the rest of it to be a full tag.


 No.2212

>>2211

Or less controversially, "series:2001: a space odyssey"


 No.2216

>>2210

Unfortunately the sub-namespaces are pretty much abstract and arbitrary. I was wondering if the delimiter could be an option or maybe two characters. The idea would be to avoid using asterisks in the search to keep the search time down, otherwise I could just put asterisks on both sides of the search term. Namespaces kind of do this already, but like you guys said it's hard to avoid recursion blowing up the search.

Ideally I'd like the search to hit on any of the nested tags individually. I picture it working like this:

set delimiter to ":."

long matches

hair:.long

hair:.long:.knee length

clothing:.shirt:.long sleeves
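The matching pictured above could be sketched like this (a toy example, not a proposal for hydrus's internals): split each tag on the configured delimiter and hit if any component starts with the search term.

```python
def subtag_matches(tag, term, delimiter=':.'):
    # Hit if any delimiter-separated component of the tag starts with
    # the search term -- a toy sketch of the idea above.
    return any(part.startswith(term) for part in tag.split(delimiter))

tags = ['hair:.long', 'hair:.long:.knee length', 'clothing:.shirt:.long sleeves']
long_matches = [t for t in tags if subtag_matches(t, 'long')]  # all three
```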




File: 1457476890109.gif (57.82 KB, 355x370, 71:74, cba9acb11d176a37067b9a3b5f….gif)

 No.2215[Reply]

I had a good week. I fixed a lot of stuff, including the 8chan thread watcher and the tumblr downloader, added the new .txt tag parsing to import folders, and significantly improved parts of the tag autocomplete code in preparation for the new a/c cache layer.

The release should be as normal tomorrow.


