
/jp/ - The Last Bastion of VIP

Take it easy!
Our bunker is at nanashi@bbs.shiptoasting.com (SSH connection)!


 No.42797

As many of you are probably well aware, GeoCities JP is set to go onto the chopping block within the next year. You can help Archive Team with that; there are probably a dozen redundant backups of the entire site by now, but I'm sure more would always be appreciated. The thing is, archival isn't some transient issue that gets resolved every time people are called into action to deal with whatever the current crisis is. There's a CONSTANT crisis going on, and people can hardly be alerted to it every other day of the week. The real issue is that we aren't prepared for data erasure unless we're given prior warning. You can't always just assume that somebody else has already got the job done.

How can you help? There are two simple ways to do this:

Download and seed torrents of things you like. Easy as pie. Just remember to always try to keep the ones that most closely reflect the original media, even if that somehow detracts from the experience for you. For anime, for example, downloading raws and using your own external subtitle files means nothing if you haven't got an actual ISO. Manga are kind of tricky: where digital editions exist, some content is often cut from the printed version, and there's no real definitive way to perfectly scan paper, as it's an analogue medium. As for games, downloading ISOs rather than loose directories with these shitty pre-installed patches is kind of just a given, so please just do that.

Another way which might turn out to be substantially more useful to everybody if you have some money to burn (better the paper of money than the paper of books), is to buy media which isn't currently available anywhere you've searched on the Internet and scan it yourself. This methodology is less accessible to anyone living outside of Japan (as you can't really rent items from overseas, and renting is much less expensive than buying), but it's still possible through online marketplaces like eBay or Yahoo (or even Amazon, if it comes down to that). If you do this, it's very important that you remember not to make any alterations to the media before uploading it. Anime seems to get treated the worst, as transcodes will typically somehow appear much earlier than actual ISO images. And PLEASE use ISOs, every last bit of your disc must be captured or else it'll be absolutely worthless.
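On a Unix-like system, a data disc can be imaged bit-for-bit with plain dd. A minimal sketch, assuming a drive at something like /dev/sr0; the image_disc helper is made up for this example, and audio CDs or copy-protected discs need dedicated ripping tools rather than dd:

```sh
#!/bin/sh
# Hypothetical helper (not from the original post): copy a data disc
# bit-for-bit into an image file, then re-read the source and compare.
image_disc() {
    src=$1
    out=$2
    # bs=2048 matches the 2048-byte sector size of data CDs/DVDs.
    # No conv=noerror here: a read error should stop the copy rather
    # than silently leave a zero-filled, damaged image behind.
    dd if="$src" of="$out" bs=2048 2>/dev/null || return 1
    # Second pass: byte-for-byte comparison of image and source.
    cmp -s "$src" "$out"
}
```

Usage: image_disc /dev/sr0 mydisc.iso. Leaving out conv=noerror is deliberate; an image with silently padded holes is exactly the kind of incomplete copy this post warns about.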

It should be kept in mind that paying for things that are already freely available to pirate is generally not only shameful, but also somewhat harmful to the archival effort. If you own something on a disc, or in book form or whatever, chances are you're not going to download and seed any torrents for it. If you've already got it, and it's already widely available, then what's the point in keeping it on your hard drive as well, right? I guess that kind of attitude can be deliberately avoided, but I think the money would be better spent on some obscure crap that you won't necessarily enjoy, but that just happens not to be on the net.

Thanks for reading these foolish words.

 No.42798

Nice, the email field doesn't even have enough space to fit a full email address. Very nice.


 No.42834


 No.42836

>>42834

「RECORDED!」


 No.42839


I'd argue that improved versions of existing material are worth preserving over whatever shitty original exists. Especially video games: having the original ISO is great and all, but if it doesn't fucking work without patches and cracks, what's the point? Hell, the reason more GOOD archives haven't been uploaded to trackers is the autistic rules policing for quality. Obscure media is obscure for a reason, and people won't preserve files they don't deem worth preserving. Tragic, I know, but I believe the best I can do for the cause is keep whatever I have and offer it to others on request.


 No.42841



 No.42842

>>42839

Actually, my exact problem with torrent trackers lately is that they often refuse to distribute the original ISO at all. Gazelle Games, for example, has a policy that undub patches, translations, and cracks must STRICTLY be applied to the game beforehand. What the FUCK is up with that! Do they really think I have the space on my hard drive to keep 3 different versions of a game when I shouldn't have to, instead of just keeping one, plus two really, really small files that'll give those guys what they want? It's fucking ridiculous. Probably one of the worst things about it is that emulators nowadays can often do "live patching", so those retards who can't even pick two files in a patcher don't even need to go THAT far.

Anyway, I'll keep pretending to myself that I DO have the space to keep a billion different versions of every game, but space is running thin. Fuck you.


 No.42843

i really like modern jaypee


 No.42844


 No.42899

>>42843

Then copy it, while I continue to act oblivious to your harsh criticism!


 No.43400

>every last bit of your disc must be captured or else it'll be absolutely worthless

Holy hyperbole, dude. All data has worth, even if it's not the same worth. I'd be happy to get encoded video too. I'd even prefer it as an end user for convenience and space consumption. Certainly better than nothing regardless, as that's the difference in seeing/playing it or not.

If people would ONLY back up disc images or nothing, there would be almost no PC98 game rips at all, because by the time people willing to archive them got around to it, only installed copies existed for most. So they pretty much had to back up the hard drives or nothing. To call that worthless just because it's impure is beyond absurd.

Ideally I'd like disc backups as well, yes. But they're a far secondary goal. Sharing the media in a form people can easily store and use is more important, especially since that makes more people willing to hold and seed stuff, which means more lively torrents and whatnot.


 No.43601

>>42843

Is this sarcasm?


 No.43602

I've got a Senran Kagura art book, am I supposed to scan that, my dudes? Well, I don't have a scanner.


 No.43603

>>43602

Commercial scanners are no good, you've got to get that into a professional bootlegger.


 No.43604

>>43400

It is worthless. You don't collect games and other media just for the temporary, superficial purpose of enjoying them.


 No.43631

>>42797

It would be very nice if a soft translation format emerged for manga. Nobody hardsubs anime any more. It's about time the same transition happened for manga.


 No.43632

>>43604

To insert a bizarre philosophical claim in your otherwise sensible post is a classic technique for keeping the conversation going!


 No.43633


>>43603

Woah. I think you are right. The people need to be able to see this.


 No.43639

>>42797

Nothing because all must come to an end. The Internet is very superficial, which is a real damn shame.


 No.43791

Nothing, but a 2ch poster is lurking uboachan right now, apparently, if any of you have input.

https://uboachan.net/fg/res/14021.html


 No.43953



 No.43954

>>43953

calm down dude I pirated more games okay


 No.43955

>>43954

An entire website needs to be pirated and it seems like nobody could really be assed to do any manual work up until this point.

Of course, at this point, saving the entire thing would probably be comparable to holding up the moon.

>Yahoo! GeoCities will end its service as of March 31, 2019.

30 hours remain.


 No.43961

─24 hours left─


 No.43969

All I know is that the Japanese make love through a hole in their "tatami" mat. Is that enough?


 No.43971



 No.43974

>>43955

>>43961

Is GeoCities Japan closed?


 No.43976

>>43974

Yeah. Whatever was recovered from it will probably be posted about here eventually, one day: https://archiveteam.org/index.php?title=GeoCities_Japan


 No.43981

I thought the whole namefag just for a single thread thing might be kind of cool, but it ended up just feeling really lame. Apologies to anybody I annoyed doing this.


 No.43987

http://www.asahi-net.or.jp/~AD8y-hys/index.htm Copied!

I can't ignore the voices of any poor website! Can't you hear them calling?


 No.44112

YouTube/Any video hosting site in existence:

Youtube-DL https://github.com/ytdl-org/youtube-dl

Typical command:

youtube-dl -o "/yourfavouriteabsolutepathtoadirectory/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s" --netrc --write-info-json --write-thumbnail --write-annotations --write-description --download-archive "/yourfavouriteabsolutepathtoadirectory/downloads list" {CHANNEL/VIDEO URL}

o: Output path template. Variables like %(uploader)s can be read about on their GitHub page.

netrc: Reads login details for various sites from your ~/.netrc file. You'll need an entry for Niconico Douga.

write-annotations: Never mind, rest in peace (YouTube has removed annotations).

download-archive: A list of video IDs that have already been downloaded to read from and write to, for skipping redundant downloads. You can avoid using it if you prefer, or put it in another directory.
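To run the typical command above over a whole list of channels, youtube-dl's --batch-file option reads URLs from a file, one per line, ignoring lines that start with #. A small wrapper sketch; ARCHIVE_ROOT, YTDL and archive_videos are made-up names, and --write-annotations is left out since annotations are gone:

```sh
#!/bin/sh
# Sketch: the "typical command" above, wrapped in a function that takes
# a file of channel/video URLs. ARCHIVE_ROOT and YTDL are hypothetical
# variables for this example; point YTDL at a fork if you use one.
ARCHIVE_ROOT=${ARCHIVE_ROOT:-"$HOME/archive/video"}
YTDL=${YTDL:-youtube-dl}

archive_videos() {
    # $1: text file with one URL per line; lines starting with '#'
    # are treated as comments by youtube-dl.
    "$YTDL" \
        -o "$ARCHIVE_ROOT/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s" \
        --netrc \
        --write-info-json \
        --write-thumbnail \
        --write-description \
        --download-archive "$ARCHIVE_ROOT/downloads list" \
        --batch-file "$1"
}
```

Usage: archive_videos channels.txt. Because the download archive is shared, rerunning it on the same list only fetches new uploads.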

Twitter:

Twint https://github.com/twintproject/twint

Typical command (BE GRATEFUL THIS TOOK FUCKING AGES):

A shell script made of spaghetti code attached to this post, by yours truly. Twint in itself isn't actually intended for this kind of function, so I had to include it in a string of commands. It doesn't have anything helpful like a manual or a "help" option, so here's the rundown:

You give it a Twitter username (the bit after the @), and it'll download all of their tweets to a file, which gets sorted into chronological order. Then the script downloads all the images and videos they've posted. It even continues where it left off (if the username is still the same) (I think this works, but all tests say it doesn't) and everything. One word of warning though: if you're scraping an account for the first time, you have to let the script get all the way to the beginning. Otherwise, it'll pick up from the most recent tweet it's scraped and continue from there, and you'll have to get the rest manually.

Defective though it may be, I'm going to upload it anyway. Can't attach this actually, so I'll just paste this: https://pastebin.com/P5j1Ru4C
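The "sort into chronological order" and "continue where it left off" steps can be sketched on their own. This is not the pastebin script; it assumes each saved line starts with the numeric tweet ID followed by date and time (check that against your own Twint output), and assumes your Twint version accepts --since. The function names are made up:

```sh
#!/bin/sh
# ASSUMPTION: each line of the tweet file looks roughly like
#   1111111111111111111 2019-03-30 12:00:00 JST <user> some text
# i.e. tweet ID, date, time first. Verify before relying on this.

# Sort oldest-first by tweet ID (IDs grow over time) and drop lines
# whose ID is already present, e.g. from overlapping runs.
sort_tweets() {
    sort -n -u -k1,1 -o "$1" "$1"
}

# Print "date time" of the newest saved tweet, as a resume point, e.g.
#   twint -u someuser --since "$(newest_tweet_time tweets.txt)"
newest_tweet_time() {
    sort -n -k1,1 "$1" | tail -n 1 | awk '{ print $2, $3 }'
}
```

Resuming from the newest timestamp will re-fetch tweets from that same second, which is why sort_tweets also de-duplicates by ID.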

MediaWiki/Probably any kind of wiki software:

WikiTeam https://github.com/WikiTeam/wikiteam

Typical command:

dumpgenerator.py {LINK TO WIKI} --xml --images --path . --resume

(Move dumpgenerator.py somewhere convenient, like /usr/bin/wikiteam-dump.)

Downloads every page of a wiki, along with each page's history, into a single file. This isn't very convenient for reading, but there are tools to restore the pages to their normal form. It also downloads all images (not sure about audio and video), but only their current versions.

Unless you're batshit insane, don't even think of trying to dump Wikipedia. They already have their own, regularly updated dumps you can download from.
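For several wikis, the dump can run in a loop with one directory per wiki. A sketch, assuming dumpgenerator.py was moved to a command called wikiteam-dump as suggested above, and assuming dumpgenerator creates the --path directory itself on a fresh run, so --resume is only passed once that directory exists. DUMPGEN, wiki_dir and dump_wikis are made-up names:

```sh
#!/bin/sh
# Hypothetical batch wrapper around WikiTeam's dumpgenerator.py.
DUMPGEN=${DUMPGEN:-wikiteam-dump}

# "https://wiki.example.org/w/api.php" -> "wiki.example.org"
wiki_dir() {
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|/.*$||'
}

# $1: file with one wiki URL per line (blank lines are skipped).
dump_wikis() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        dir=$(wiki_dir "$url")
        # </dev/null stops any interactive prompt from eating the list.
        if [ -d "$dir" ]; then
            # An earlier run left a dump here: continue it.
            "$DUMPGEN" "$url" --xml --images --path "$dir" --resume </dev/null
        else
            "$DUMPGEN" "$url" --xml --images --path "$dir" </dev/null
        fi
    done < "$1"
}
```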

Websites:

GNU Wget https://www.gnu.org/software/wget/

Typical command:

wget -e robots="off" --mirror {SITE URL}

Please be careful to only download small, static sites this way. If it's a site that generates pages, you're going to give yourself and the webmaster a very hard time if you don't get a little creative, especially with robots.txt checking turned off (so many of them just seem to think they know what's better for their site than I do, the GALL of them!). In those situations, the --include-directories and --exclude-directories options will probably be helpful to you. Avoid the --accept and --reject options: while they don't clutter up your filesystem, wget still downloads rejected HTML pages in full to scan them for links, and only deletes them afterwards.
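A sketch of a gentler version of the mirror command for sites that generate pages: skip the generated sections and space out requests. --exclude-directories, --wait, --random-wait and --no-parent are standard wget options; the polite_mirror name, the WGET variable and the example directories are made up:

```sh
#!/bin/sh
# Hypothetical wrapper: mirror a site while skipping generated
# sections and pausing between requests.
WGET=${WGET:-wget}

# $1: site URL
# $2: comma-separated directories to skip, e.g. "/search,/cgi-bin"
polite_mirror() {
    "$WGET" -e robots=off --mirror --no-parent \
        --wait=1 --random-wait \
        --exclude-directories="$2" \
        "$1"
}
```

Usage: polite_mirror http://example.com/ /search,/cgi-bin. Which directories to exclude depends entirely on the site, so browse it first and look for the parts that spawn endless URL variations.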

Flash and JavaScript are a major pain in the ass, because wget doesn't look inside them for embedded links. For those, you'll have to go onto the pages yourself, open your web browser's debug tools (generally F12), and see if you can get a list of assets while they're loading. Then you're going to have to wave your cursor around, click on the Flash files and stuff, and manually download the files as they appear in the log. It's a massive pain in the ass, but I guess it gives you something to do.
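Rather than copying URLs out of the log one by one, the network panel in most browsers' debug tools can save the whole log as a HAR file (a JSON format). A rough sketch that pulls matching URLs out of such a file with grep and sed, assuming the usual "url": "..." layout; har_urls is a made-up name, and this is text matching, not a real JSON parser:

```sh
#!/bin/sh
# Print each distinct URL in a HAR file whose path ends in one of the
# given extensions (an extended-regex alternation, default "swf").
# Crude text matching: fine for typical browser exports, but a URL
# containing an escaped quote would be cut short.
har_urls() {
    har=$1
    exts=${2:-swf}
    grep -o '"url": *"[^"]*"' "$har" \
        | sed -e 's/^"url": *"//' -e 's/"$//' \
        | grep -E '\.('"$exts"')(\?.*)?$' \
        | sort -u
}
```

Usage: har_urls session.har 'swf|mp3|xml' | wget -x -i - (wget reads the URL list from standard input and -x recreates the site's directory layout).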


 No.44113

Pixiv galleries/Sadpanda:

Gallery-DL https://github.com/mikf/gallery-dl

Typical command:

gallery-dl {URL}

Configuring this program is very, very annoying, so I'll just let you figure that out on your own. With a proper setup, you can download galleries into pre-zipped, pre-cbz'd archives, named according to the Japanese titles.

https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst

https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl-example.conf

https://github.com/mikf/gallery-dl/blob/master/docs/gallery-dl.conf

These files are going to be interesting to you, but they'll get much less interesting once you realise you've been spending two hours trying to figure everything out. Hang in there.

Messageboards (like /jp/!):

GNU Wget https://www.gnu.org/software/wget/, or a specialised program if you can find one (that's probably better)

Alright, these can be tricky, and it often differs case by case.

A lot of boards will have a /boardname/res/ directory, and if it's publicly browsable, things are going to be very easy for you. You just wget -x -i those directories, and you'll be graciously given all of the threads on that board, without having to worry about the site generating a web page for every single post, and every single combination of every single post (seriously, Kareha does this).

You'll generally want to wget --page-requisites the index of each board too, just to make sure that you've got the page's CSS, extra images, and stuff.
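If the /res/ listing is readable, the thread pages can also be picked out of a saved copy of it explicitly, which keeps parent-directory and column-sort links out of the download list. A sketch, assuming thread pages are named like 12345.html (as on vichan-style boards); thread_urls is a made-up name:

```sh
#!/bin/sh
# Print absolute URLs of the thread pages linked from a saved copy of a
# board's /res/ directory listing, in thread-number order.
# $1: saved listing HTML; $2: base URL ending in /res/
thread_urls() {
    grep -o 'href="[0-9][0-9]*\.html"' "$1" \
        | sed -e 's/^href="//' -e 's/"$//' \
        | sort -n -u \
        | sed -e "s|^|$2|"
}
```

Usage, with a placeholder site: wget -O res.html https://example.org/jp/res/ && thread_urls res.html https://example.org/jp/res/ | wget -x -i -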

Here's a specialised shell script I made for 8chan; it only works on a thread-by-thread basis. I think you can find a list of threads at boardname/threads.json or similar API endpoints, but to exploit that I'd have to do something like I just did for the Twitter script, and I really don't feel like that right now. Here's the incomplete version anyway: you just feed it thread URLs, and it works with several.


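#!/bin/sh
# Usage: pass one or more 8chan thread URLs as arguments.
cd "/yourfavouriteabsolutepathtoadirectory/" || exit 1
# Save each thread page plus its CSS/images; --timestamping skips files that haven't changed
wget --page-requisites --timestamping "$@"
# Fetch the full-size files with gallery-dl into the same media.8ch.net/file_store/ layout wget uses
gallery-dl --ignore-config --option "base-directory=./media.8ch.net/file_store/" --option "filename={tim}{ext}" --option "directory=" "$@"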
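The threads.json idea can be sketched without writing a full JSON parser. This assumes a vichan-style file (a list of pages, each with a "threads" array whose entries carry a numeric "no" field) and thread pages at res/<no>.html; both are assumptions to check against the actual board, and thread_urls_from_json is a made-up name:

```sh
#!/bin/sh
# Turn a saved threads.json into thread URLs, one per line, in
# thread-number order. Text matching only, not real JSON parsing.
# $1: saved threads.json; $2: base URL ending in /res/
thread_urls_from_json() {
    grep -o '"no": *[0-9][0-9]*' "$1" \
        | grep -o '[0-9][0-9]*$' \
        | sort -n -u \
        | sed -e "s|^|$2|" -e 's|$|.html|'
}
```

The output can then be handed to the thread script above, e.g. thread_urls_from_json threads.json https://8ch.net/jp/res/ | xargs sh 8chan-threads.sh, where 8chan-threads.sh is just a placeholder name for wherever you saved it.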

Anything else you need, there's a good chance you can find it by searching GitHub for "[x] scraper".

>>43632

If you don't take your philosophy to its logical extreme, then it's open to attack.


 No.44120

>>44112

oh shit, I forgot to give one of the @s a dollar sign

that's probably why continuing didn't work

don't use twitscrape's continue feature unless you fix it yourself, otherwise it won't continue


 No.44226

>>43604

It's not worthless. And I do. Enjoying them is about 40% of the motivation for collecting them.

Having content for review and sharing with others is the majority of the rest of the motivation.

Having absolutely perfect rips of anything at all is but a small portion of what I care about. I only ever go for that when it's actually feasible and reasonable: with video game console collections small enough to fit in a few dozen GB, like most of No-Intro; when the most perfect rip is already right in front of me without an exhaustive search; or when it's the only version I can actually find.

Regardless of motivation, having the content accessible in the first place is the absolute goal. Perfect rips are just icing on the cake, so to speak.

Having things in a usable format is inherently desirable, even for archival.

Even if you consider it imperfect, even if you consider it of less worth than perfect data, it's still data, which is miles better than NO data. And it's usable data at that.

But I'd go so far as to say imperfections may actually ADD worth. Particularly the inclusion of subtitles, or the removal of anti-piracy messages and advertisements. Or in some cases, like this one rip of Qwaser I have, drastically cleaning up the fucking awful scaling of the Blu-ray release by pre-filtering, so that I don't have to waste electricity doing it myself.

I do not like dealing only in absolutes like all or nothing, because it's absolutely retarding. Literally so. And also generally impossible.

Worth less? Maybe; arguably at least. Worthless? Fuck no. All discernible data has worth, even if it's relatively worth less than other data.

>>44113

I'd argue that if you ONLY take your philosophy to its logical extreme, then it's even more open to attack, since that leaves no room for rationality and reason.

Hell, such extremist philosophy deserves to be attacked, and it's begging for it.

Like, I love that you want to archive things, and that you want to archive them as pure as possible. I don't mind that you personally aren't intending to archive anything impure. But to forsake anything impure as absolutely valueless? That gets under my skin. It's like you don't care about the content of the data at all, whatsoever, and just have a fetish for the data itself.


 No.44228

>>44226

Information is cheap, data is invaluable.

Well I guess I'm not actually serious about that, it's just part of my character here.


 No.44291

>>44228

Information isn't always cheap. In fact, people often work to limit its availability to try and make it expensive. Especially real-life secrets, but also simple media.

Ideally, information for simple media would be cheap and readily available. Maybe not immediately, for the sake of profit and production, but eventually at least.

Ideals aren't realistic though.

Data itself may be considered invaluable, in certain contexts. But data without the information it contains is worthless; simple noise.

Even though the data and format do indeed matter, contents matter more.

Only when the contents are guaranteed to exist and be obtainable do the formats come into play. And even then it is often quite a tradeoff, for space, convenience, and sometimes finance if it must be purchased.

I'd still gladly take a crappy XVID encode of a series from the days of old, if the only other option is not having it at all due to being lost to time.

If I can find a better format then great.

The XVID encode would be worth less, but not worthless, as it still contains the contents however degraded they may be.

I do wish you well though. Archive all you can.


 No.44298

>>44291

Thanks, and sorry if my opinions seem unreasonable. The amount of junk rips compared to proper copies makes it seem like absolutely anything is available on the internet, as long as you're happy with getting an inferior or legitimately incomplete (that is, not by my own insane standards) version. But I guess that's probably not the case, I'd probably have an easier time knowing if I actually downloaded things with the intention of using them.

Unless we start travelling through time and start digitising the space around a master film reel/the first performance of a piece of classical music/the really big fish dad caught but didn't take a photo of, I guess there's no point in trying to archive everything, anyway.

Do I want to archive things to make them more accessible, or just so they can keep existing in some form? If it's the second, there's probably some bullshit metaphysics I could come up with to relieve myself. Otherwise, I guess archiving anything I see would be the best I can do. Maybe I should be a little more reasonable about my goals.



