
/loli/ - Lolis

Lolis are Love, Lolis are Life


File: 1434571763316.png (59.99 KB, 1387x537, 1387:537, db.png)

 No.21565

sup guys, I'm building the ultimate software to collect and organize porn, and I'd like to know what you think of it and hear your suggestions.

It is currently in alpha but should be in beta soon.

How does it work?

Let's say you downloaded a doujin from exhentai: all you have to do is copy-paste the HTML source of the page and give the folder path where it's saved on your computer, and all the info is sent to a local SQL database (tags by category, main category, file name, rating, date, etc.).

If a tag doesn't exist yet, it is created.

You can choose your own rating and change/add the tags as you want.

I plan to add support for e-hentai, gelbooru, danbooru.

I can also add support for websites that don't have tag systems.

 No.21588

sounds neat, but i can't say much else without trying it.

Can you support nHentai too? And what language is it being made in? Is it open source?


 No.21593

I would keep paheal's rule34 in mind as well.


 No.21595

You're creating a table for every tag…

>facepalm.jpg

Take a database class FFS.


 No.21617

>>21588

I didn't know about nHentai. I want to first support a few websites and make it fully functional, then add new supported websites.

It is made with C#.

It will be open source.

If you don't like the Windows Forms GUI, all the important stuff is in DLLs, so you cant make your own GUI

>>21593

will add it to my bucket list

>>21595

I must admit that SQL is not my speciality and I haven't used it for years.

There is no table for every tag; there is a table for every category of tag. Each tag is a boolean.

A friend of mine suggested I use a single nvarchar for all the tags of a file, but I think in the long term it may have performance issues.

BOOLs are tinyint(1), and it's easier to manage and manipulate the available tags for tagging and searching.

There will be a search function for tags.

But if you have any suggestions for the SQL, go on.


 No.21618

>>21617

*can make your own GUI

Sorry typo error


 No.21627

>>21565

> it is currently in alpha but should be soon in beta

I'm not sure what I'm supposed to think of a developer who doesn't know that a piece of software is beta first and becomes alpha as soon as the teething problems have been solved.

I'm sorry, I don't get your database structure either. Why does the 'Artist' table contain only a Boolean field, for instance? I'd rather expect it to contain string fields (varchar or something like that in most SQL dialects) that tell you the name (preferably separated into first name, family name, pseudonym etc.) of the artist. Depends on how the information you can acquire from exhentai is structured though (if you want to avoid tedious post-processing).


 No.21679

>>21617

I think you're overcomplicating it, create as few classes as possible and have ones pertaining to a certain type of tag implement an interface corresponding to that tag

i.e.

Loli implements Female

English implements Language

DBZ implements Parody

etc


 No.21683

a good database and data structure means a good output

you really don't need tables for tags

tb_image (all images goes here)

id_image

artist (for directory)

filename

tags(separated by spaces)

date

resolution

datatype

tags=loli black_bullet uncensored questionable

select * from tb_image where tags like "%uncensored%"

tb_anime_suggestions (for listbox, dropdownlist, link, datagrids/tables, etc.)

black_bullet

hidan_no_aria

hitsugi_no_chaika

…etc

select * from tb_image where tags like "%" + anime_suggestion_dropdown.text + "%"

>inb4 c#

select * from tb_image where tags like "%uncensored%" (link)

The problem here is how to count all "uncensored" tags in descending order. A select over every row can count them all, but you can also +1/-1 a tag counter on every uploaded/removed image, or add a "refresh all counts" to check integrity.

Involves a lot of 'select'

You could also just fork an existing open-source booru, make it as lightweight as possible, and add great features like the ability to run offline or on a LAN over wifi (like NAS drives) and serve as a PDF reader too.
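The "refresh all counts" idea above could look roughly like the sketch below. It assumes tags are stored as one space-separated string per image, as in the tb_image layout in this post; the function name count_tags is made up for illustration.

```python
from collections import Counter


def count_tags(tag_strings):
    """Recount every tag from scratch, most used first.

    tag_strings: one space-separated tag string per image.
    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter()
    for tags in tag_strings:
        # set() so a tag repeated on one image is only counted once
        counts.update(set(tags.split()))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A full recount like this touches every row, so it fits an occasional integrity check better than something run on every search; the +1/-1 counter stays cheap day to day.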


 No.21684

>>21683

host it to run under localhost as a server let's say http://127.0.0.1:51707 (lolis upsde down)

And you can also access it over wifi from other kinds of devices, since it's the server doing the work

>mfw no loli server yet


 No.21685

File: 1434712937744.jpg (1006.69 KB, 1400x992, 175:124, 43557296.jpg)

Protip: Doujins aren't the only type of porn on the internet.

Also your current theory of design will become insanely slow and unusable.

- Buy some botnets off of Russians

- Make them web spider the fuck out of all porn material. Videos, images, audio, text, the lot.

- Generate a tag cloud.

- Routinely process said tag cloud into three categories. Popular, median and quiet.

- Don't use search indexes on popular content searches, simply web crawl the view numbers from the original websites you used as the content source. Quantity of likes/views/blog notes/faves/etc, is far more informative than "200 people searched for latino", because you directly have a number that says "this quantity of people actually consumed the content in some way or another".

- Store it all in Redis. Stop using a poorly designed SQL structure that would get insanely slow and unusable as the volume of diversity increases.

- Configure it to not hit fail-points like sadpanda (because there is always a proxy that will get you around that).

- Generalize your search results UI. Be the first one to have proper preview thumbnail sheets that work properly no matter what type of content is being shown in the search results grid. When the mouse is over a search result thumb, cycle through the thumbnail sheet at a 1 second interval. Don't be afraid to mix different types of content into one page on your search results.

— If its a video, preload an entire thumbnail sheet as one image like YouTube does

— If its a doujin imageset, merge the thumbs into one thumbnail sheet image, like YouTube does

— If its text, analyze it for nouns, describing words, unique words, erotic words, etc, and then locate the pages in the book/PDF where those occur, and generate thumbnails of those areas, with the words in different colors. Then merge those all into a single thumbnail sheet image, like YouTube does

— If its audio, analyze the waveform for areas that have pops and clicks, but where the mid-range isn't completely empty. Split it up into 5 key sections of preview audio, with fade-in and fade-out on each section. Merge that into one single OGG file per search result, and play that on mouse over.

Call it "FapNet".


 No.21736

>>21565

>>21617

Dat scheme. What am I even looking at?

-ImageSet - i think it's individual image. Not bad so far

-All other tables - you want to make 1500 column table for 1500 artists? DB does not work that way.

Have image_tag table:

-imageset_id -> foreign key to imageset

-tag -> varchar of tag

To organise tags have tag info table:

-tag - varchar primary key

-is_parody bool

-is_artist bool

-etc

Also field names. What's wrong with just "id"?
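A minimal sketch of the two-table layout described above, using Python's built-in sqlite3 so it runs anywhere. The table and column names follow this post; the function names and the composite primary key are assumptions added for illustration, not OP's actual code.

```python
import sqlite3


def open_db():
    """Create an in-memory database with the image_tag / tag_info layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE imageset (
            id       INTEGER PRIMARY KEY,
            filename TEXT NOT NULL
        );
        CREATE TABLE tag_info (
            tag       TEXT PRIMARY KEY,
            is_parody INTEGER NOT NULL DEFAULT 0,
            is_artist INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE image_tag (
            imageset_id INTEGER NOT NULL REFERENCES imageset(id),
            tag         TEXT    NOT NULL REFERENCES tag_info(tag),
            PRIMARY KEY (imageset_id, tag)
        );
    """)
    return conn


def add_image(conn, filename, tags):
    """Insert one image; tags that don't exist yet are created."""
    cur = conn.execute("INSERT INTO imageset (filename) VALUES (?)", (filename,))
    image_id = cur.lastrowid
    for tag in tags:
        conn.execute("INSERT OR IGNORE INTO tag_info (tag) VALUES (?)", (tag,))
        conn.execute(
            "INSERT OR IGNORE INTO image_tag (imageset_id, tag) VALUES (?, ?)",
            (image_id, tag),
        )
    return image_id


def find_by_tag(conn, tag):
    """Return filenames carrying exactly this tag."""
    rows = conn.execute(
        "SELECT i.filename FROM imageset i "
        "JOIN image_tag t ON t.imageset_id = i.id "
        "WHERE t.tag = ? ORDER BY i.filename",
        (tag,),
    )
    return [r[0] for r in rows]


def tag_counts(conn):
    """Return (tag, count) pairs, most used first."""
    return conn.execute(
        "SELECT tag, COUNT(*) AS n FROM image_tag "
        "GROUP BY tag ORDER BY n DESC, tag"
    ).fetchall()
```

A new tag is just a new row, so 1500 artists means 1500 rows in tag_info rather than 1500 columns, and the schema never changes when tags are added.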


 No.21757

File: 1434841041396.jpg (1021.03 KB, 3493x1037, 3493:1037, hydrus_client_2015-02-02_0….jpg)

>>21565

I'd say hydrus already serves this purpose quite well.

There's currently rips of the entirety of danbooru's and gelbooru's tag mapping for hydrus and it supports importing single or collections of images from a lot of different image hosts.

>>>/hydrus/


 No.21781

>>21627

Alpha is first.


 No.21782

>>21736

Turns out doing it like that is inefficient as fuck. A single varchar on each post with all the tags space-separated has turned out to be fairly speedy with a full text search engine… But even with a LIKE it's quite okay.
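One catch with LIKE on a space-separated tag column, sketched with Python's sqlite3: a bare substring match also hits longer tags that contain the search term. The tb_image and tags names are borrowed from >>21683; the function names are hypothetical, and this says nothing about how any real booru does it.

```python
import sqlite3


def make_db(rows):
    """rows: iterable of (filename, space_separated_tags)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tb_image ("
        "id_image INTEGER PRIMARY KEY, filename TEXT, tags TEXT)"
    )
    conn.executemany("INSERT INTO tb_image (filename, tags) VALUES (?, ?)", rows)
    return conn


def search_substring(conn, tag):
    """Naive match: 'censored' also matches 'uncensored'."""
    sql = (
        "SELECT filename FROM tb_image "
        "WHERE tags LIKE '%' || ? || '%' ORDER BY filename"
    )
    return [r[0] for r in conn.execute(sql, (tag,))]


def search_whole_tag(conn, tag):
    """Pad both sides with spaces so only whole tags match.

    Note: '_' and '%' inside the search term are still LIKE wildcards;
    escaping them (with an ESCAPE clause) is left out of this sketch.
    """
    sql = (
        "SELECT filename FROM tb_image "
        "WHERE ' ' || tags || ' ' LIKE '% ' || ? || ' %' ORDER BY filename"
    )
    return [r[0] for r in conn.execute(sql, (tag,))]
```

Both queries scan every row, so "quite okay" holds for a personal collection; a proper full text index is what keeps it fast on bigger ones.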


 No.21791

>>21781

In the Greek alphabet, yes. Not in software development processes, at least if what I have learned is correct. Wikipedia doesn't help much either – according to what it says, both alpha and beta are for first independent testing and may contain bugs. The schematic image there, however, speaks volumes: Seems you were right and the software engineers who educated me wrong.


 No.21793

>>21791

Not him, but he's right: even in software, alpha is first and beta is later. Just an example: the game World of Warships was in closed alpha testing a number of months back; it is now in closed beta testing, and open beta is coming soon.


 No.21796

File: 1434903415043.png (113.77 KB, 1433x753, 1433:753, abcdd7cef2ceb44d621f19fae9….png)

Not really on topic but can anyone give me feedback on an otakudb I want to create? Mostly for myself to categorize and list stuff.


 No.21802

File: 1434908450392.jpg (448.41 KB, 1600x1200, 4:3, ha.jpg)

>>21565

a feature that I think would be pretty rad would be a fap count checkbox that would tally up the number of times you fapped to an image and being able to filter your search to "number of times fapped" to make finding a particular image a little quicker


 No.21807

>>21757

haven't tried hydrus yet,

Does it automatically grab tags from boorus and rename files?

I'm thinking of making one that can also batch download and image search like saucenao/google imgs

I'd also love to put in a translator using tesseract ocr.

(haven't compiled shit for a year!)


 No.21823

>>21793

Hm, strange. Probably I misunderstood something someone told me. Presumably because I'd otherwise have wondered what gamma, delta etc. versions would be. And what happens if you reach the omega version.

Anyway, >>21796

I currently don't have time to review your design, but there are some things that catch my eye:

> AnimeNameRomanji

Doesn't really matter, but it's usually 'Romaji' (without 'n'), i.e. 'Roman letters'. Don't know where the 'n' is supposed to come from (although you see this transcription quite often).

> AnimeNameKana

Why only Kana? What about Kanji?

> airedEpisodes TINYINT(3)

> totalEpisodes TINYINT(3)

As far as I know (I don't use SQL that regularly), the number in the TINYINT brackets is only a display width in MySQL; it doesn't change the storage. A TINYINT is always one byte, so it holds -128 to 127 signed or 0 to 255 unsigned. Isn't that a bit too little for the episode counts of long-running series?
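A quick arithmetic check of what fits in which integer type, assuming MySQL semantics where TINYINT is always 8 bits and SMALLINT 16 bits regardless of the number in parentheses. The helper names are made up for this sketch.

```python
def int_range(bits, signed=True):
    """Smallest and largest value an integer of this many bits can hold."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def fits(value, bits, signed=True):
    """True if value can be stored in an integer of this many bits."""
    low, high = int_range(bits, signed)
    return low <= value <= high


TINYINT_BITS = 8    # one byte, whatever display width is declared
SMALLINT_BITS = 16  # two bytes
```

So a series with 300 episodes overflows even an unsigned TINYINT, while SMALLINT has plenty of room for one extra byte per row.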

> Status_StatusID

What is this?


 No.21824

>>21823

They don't go that far. After beta is full release product.


 No.21877

>>21824

Yes, unless it's perpetual beta, but I'd like to settle it with that.

>>21823 cont.

> AnimeProducer, AnimeTranslator (likewise: NovelProducer, NovelTranslator, MangaProducer, MangaTranslator)

Somewhat superfluous tables in my opinion. As long as there isn't any other information to be included that makes separate tables necessary, merging them would increase readability. That is, as long as producer and translator are pieces of information that are unambiguous for any anime, the producer's ID and the translator's ID should be included in the 'Anime' table. A different matter of course if there can be more than one producer (translator, respectively) per anime: Then two separate tables for each do make sense.

Same goes for AnimeGenre, by the way, but I think with genres it does happen quite often that there are several per film.

> producerAddress

What kind of address is this supposed to be? If it's a postal address, it should be broken down into whatever an address is supposed to consist of (-> semantics). That can be difficult, however; a Japanese address may have a completely different format than a European/American one (Japanese addresses often use housing block numbers instead of street names). I don't expect this field to denote a postal address though; it's probably a web address, and then a single VARCHAR field is fine.

> novelType; 'Status' table

What is this supposed to be?

> 'Manga' table

Should contain fields like manga name etc., but maybe you just haven't included them yet. By the way, the TINYINT(3) problem I've seen in the 'Anime' table applies to manga, too.

> VARCHAR(45)

You always use 45 as VARCHAR length; why? VARCHAR is usually stored as one (sometimes even two) byte(s) holding an integer number that denotes the (individual) length, followed by that number of bytes or characters. (Whether it's bytes or characters doesn't usually have to be of interest for the database user, but for the database driver developer. The byte and char lengths differ if the string contains non-ASCII characters, e.g. umlauts, kana, etc.) So database systems usually allow up to 255 as VARCHAR length; MySQL allows even 65535. There is no point (since no space gain) in using anything less than 255. I'd recommend this especially for fields that are supposed to hold web addresses (e.g. producerAddress?) as web addresses can easily get longer than 45 bytes.
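To make the bytes-versus-characters point concrete, here is a simplified model of VARCHAR storage: a length prefix plus the encoded bytes. It assumes MySQL-style rules (one prefix byte when the column can never exceed 255 bytes, two otherwise) and a character set of up to 4 bytes per character; the function name and parameters are invented for this sketch, and real engines add row overhead that is ignored here.

```python
def varchar_storage_bytes(value, max_chars, max_bytes_per_char=4):
    """Approximate bytes used to store value in a VARCHAR(max_chars) column."""
    if len(value) > max_chars:
        raise ValueError("value longer than declared VARCHAR length")
    payload = len(value.encode("utf-8"))  # bytes, not characters
    # The prefix size depends on the column's worst case, not on this value.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return prefix + payload
```

Under this model "abc" costs 4 bytes in VARCHAR(45) and 5 in VARCHAR(255), so with a multi-byte character set the longer declaration can cost one extra length byte per value. That is negligible next to a web address that gets cut off at 45 characters.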

One general remark: Relational databases are called relational databases because their tables represent finite relations over certain sets of (possible) values, each row corresponding to an (abstract) tuple. So if you want to design a relational database structure, it's worth trying to think in a 'relational' way, that is, regarding the items you want to represent and all their properties (excluding 'technical' ones like ID) as individuals and asking yourself what the relations between them are. For instance, 'X has property Y' is a basic relation between X and Y. You may then analyse the found relations to gain insight into how to make database tables out of them. In the context of database tables, one of the most important aspects of relations is whether they are '1:1', '1:n' (or 'n:1', respectively), or 'n:m'. '1:1' relations can (and should usually) be merged into larger relations (i.e. tables with more columns) while the others cannot.


 No.22038

File: 1435162401537.jpg (71.12 KB, 472x814, 236:407, 1406045847362.jpg)

>>21877

Thanks for the input. Status was meant for me to add things like: finished/airing/released/preorder and other stuff like that.

Yeah, address was meant for website, and the varchar thing I probably misunderstood. I thought it would store less with 45 but it was the other way around.

I'll try to read some SQL guides/websites about the things you mentioned to optimize my tables.


 No.22049

>>21565

>lets say you downloaded a doujin on exhentai, all you have to do is copy paste the html source of the page and give the folder path of where its saved in your computer and all the info are sent in a local SQL database. (tags by category, main category, file name, rating, date, and etc…)

damn this sounds great, something that bothered me with hydrus is that I had to tag everything myself


 No.22087

>>21757

>>21794

I use Hydrus to store single images, but I was never able to get it to store comics because you had to use regexes or something.

>>21565

My main question is will the program support comic book archives, because all of mine are stored in .cbz.


 No.24092

>>21807

Yes and no. It does grab tags, but you have to check a box to have it done, and the box is unchecked by default, though god knows why. If you want a tag automatically renamed, or for a certain tag to always imply another tag, you can set it to do that automatically. It automatically prevents duplicate images (by image hash) and has a function to search for near-identical images, but this is imperfect.

All in all, the only downside is having to tag several thousand porn images, gifs, flashes, and webms when you join the hydrus master race.

I don't see this doing much that Hydrus can't. Thanks for your effort, OP, but it's unfortunately not needed.


 No.33044

Found this gem:

http://pewpews.github.io/happypanda/

It does everything OP wants flawlessly


 No.33188

I feel like these sorts of things only work for either people with very small collections of doujinshi or very consistently organized collections.

I have rar and zip archives, images saved to folders, some folders are sorted by artist, others are just titles, some I don't have names or artists for and have them lumped under "Unknown 2" etc. And my collection numbers in the thousands.

I don't know if there's even an indexing program that can handle this mess. I've been gradually trying to update my collection when I have the time, but I think I've only managed to convert about 10% of it to properly named archives.

And this is only counting my doujinshi and manga collections, to say nothing of my gifs, webms, videos, or regular image collections (sorted by fetish and by author and by parody).


 No.33226

Add CRC32 Duplication checking, and importable DATA-TAGS that others can share on a universal website, and you have one hell of a system to share.

Like the old CDDB system once was. (The thing that everyone uses now to identify CDs, DVDs and individual song tracks, built into almost all CD/DVD players.)

You don't have to keep thumbnails of the images, just CRC32's of the images, for comparison. But base the CRC32's off the BMP equivalent of the image, not the file-CRC32 which may contain useless altered metadata.

You could extend that for similar images, if the images were all reduced to fit into 128x128 format and then CRC32 those standard thumbnails as a possible similarity. (Eg, when someone has stupidly re-compressed or reduced an image and uploaded or saved the crappy low-quality version.)

Using a per-pixel color-value threshold would be more accurate, like "VisPic" does for comparison of duplicates. (Which also rotates the images to a "highest value on top" for rotation comparison, and also does slip-trimming to detect "same image" inside of bordered images or from cropped versions of images.)
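A rough sketch of the two checks described above, using only Python's zlib. It assumes the images have already been decoded to raw pixel bytes of the same size and layout (the decoding step is left out), and the function names and the default threshold of 8 are arbitrary picks for illustration, not what VisPic actually does.

```python
import zlib


def pixel_crc32(pixels: bytes) -> int:
    """CRC32 of decoded pixel data, so file metadata does not matter."""
    return zlib.crc32(pixels) & 0xFFFFFFFF


def within_threshold(a: bytes, b: bytes, max_diff: int = 8) -> bool:
    """True if every channel value differs by at most max_diff.

    Both buffers must be the same size; anything else counts as different.
    """
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= max_diff for x, y in zip(a, b))


# Toy data: four channel values standing in for a decoded image.
original = bytes([10, 20, 30, 40])
file_a = b"META-A" + original          # same pixels, different metadata
file_b = b"META-B" + original
recompressed = bytes([11, 19, 30, 42])  # slightly altered pixels
```

One caveat: a CRC changes completely when a single pixel changes, so matching CRCs only prove exact pixel duplicates. For recompressed or resized copies, the threshold comparison (or some perceptual hash) is the part that does the work.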

Include a direct link to Google's "reverse image search" for images we are unsure of, and we can get more info to collect and share.

Why waste time reinventing the wheel? If we all share the data, we can cover more ground, only having to identify the unknowns, not keep re-identifying the knowns.



