

As many of you are probably well aware, Geocities JP is set to go onto the chopping block within the next year. You can help Archive Team with that. There's probably a dozen redundant backups of the entire site by now, but I'm sure that more would always be appreciated. The thing is, archival isn't some transient issue that gets resolved every time people are called into action over whatever the current crisis is. There's a CONSTANT crisis going on, and people can hardly be alerted to it every other day of the week. The fact that we aren't prepared for data erasure unless we're given prior warning is the issue here. You can't always just assume that somebody else has got the job done already.

How can you help? There's two simple ways to do this:

Download and seed torrents of things you like. Easy as pie. Just remember to always try to at least keep the ones which most closely reflect the original media, even if that somehow detracts from the experience for you. Downloading raws and using your own external subtitle files means nothing if you haven't got an actual ISO for anime, for example. Manga are kind of tricky, since often in cases where they do have digital editions, some content is cut out from the printed version, and there's no real definitive way to perfectly scan paper, as it's an analogue medium. As for games, downloading ISOs rather than floating directories with these shitty pre-installed patches for video games is kind of just a given, so please just do that.

Another way which might turn out to be substantially more useful to everybody if you have some money to burn (better the paper of money than the paper of books), is to buy media which isn't currently available anywhere you've searched on the Internet and scan it yourself. This methodology is less accessible to anyone living outside of Japan (as you can't really rent items from overseas, and renting is much less expensive than buying), but it's still possible through online marketplaces like eBay or Yahoo (or even Amazon, if it comes down to that). If you do this, it's very important that you remember not to make any alterations to the media before uploading it. Anime seems to get treated the worst, as transcodes will typically somehow appear much earlier than actual ISO images. And PLEASE use ISOs, every last bit of your disc must be captured or else it'll be absolutely worthless.

It should be kept in mind that paying for things that are already freely available to pirate is generally not only shameful, but also somewhat harmful to the archival effort. If you've got a copy of whatever media you've got on a disc, or in book form or whatever, chances are, you're not going to download and seed any torrents for it. If you've already got it, and it's already very available, then what's the point in keeping it on your hard drive as well, right? I guess that kind of attitude can be deliberately avoided, but I think the money would be better spent on some obscure crap that you won't necessarily enjoy, but just happened to not be on the net.

Thanks for reading these foolish words.


Nice, the email field doesn't even have enough space to fit a full email address. Very nice.







I'd argue that improved versions of existing material is worth preserving over whatever shitty original exists. Especially video games, having the original ISO is great and all but if it doesn't fucking work without patches and cracks what's the point. Hell the reason why more GOOD archives haven't been uploaded to trackers is due to the autistic rules policing for quality. Obscure media is obscure for a reason, and people won't preserve files that they don't deem worth preserving. Tragic I know, but I believe the best I can do for the cause is keep whatever I have, and offer it to others on request.





Actually, exactly my problem with torrent trackers lately is that they often refuse to distribute the original ISO at all. Gazelle Games for example, has this strict policy where undub patches, translations, and cracks are STRICTLY meant to be used on the game beforehand. What the FUCK is up with that! Do they really think I have the space on my hard drive to keep 3 different versions of a game when I shouldn't have to, instead of just keeping one, and two really really small files that'll give those guys what they want? It's fucking ridiculous. Probably one of the worst things about it is that emulators nowadays are often able to do this thing called "live patching" where those retards who can't even choose two files in a program patcher don't even need to go THAT far.

Anyway, I'll keep pretending to myself that I DO have the space to keep a billion different versions of every game, but space is running thin. Fuck you.


Then copy it, while I continue to act oblivious to your harsh criticism!


>every last bit of your disc must be captured or else it'll be absolutely worthless

Holy hyperbole, dude. All data has worth, even if it's not the same worth. I'd be happy to get encoded video too. I'd even prefer it as an end user for convenience and space consumption. Certainly better than nothing regardless, as that's the difference in seeing/playing it or not.

If people would ONLY back up disc images or nothing, there would be almost no PC98 game rips at all. By the time people willing to archive them bothered, only installed copies existed for most of them. So they pretty much had to back up the hard drives or nothing. To call that worthless just because it's impure is beyond absurd.

Ideally I'd like disc backups as well, yes. But they're a far secondary goal. Sharing the media in a form people can easily store and use it is more important. Especially since that makes more people willing to hold and seed stuff, which means more lively torrents and whatnot.



I got a Senran Kagura art book am I supposed to scan that my dudes? well I don't have a scanner.



Commercial scanners are no good, you've got to get that into a professional bootlegger.



It is worthless. You don't collect games and other media just for the temporary, superficial purpose of enjoying them.



It would be very nice if a soft translation format emerged for manga. Nobody hardsubs anime any more. It's about time the same transition happened for manga.



To insert a bizarre philosophical claim in your otherwise sensible post is a classic technique for keeping the conversation going!




Woah. I think you are right. The people need to be able to see this.



Nothing because all must come to an end. The Internet is very superficial, which is a real damn shame.


Nothing, but a 2ch poster is lurking uboachan right now, apparently, if any of you have input.






An entire website needs to be pirated and it seems like nobody could really be assed to do any manual work up until this point.

Of course, at this point, saving the entire thing would probably be comparable to holding up the moon.


30 hours remain.








Is GeoCities Japan closed?



Yeah. Whatever was recovered from it will probably be posted about here eventually, one day: https://archiveteam.org/index.php?title=GeoCities_Japan


I thought the whole namefag just for a single thread thing might be kind of cool, but it ended up just feeling really lame. Apologies to anybody I annoyed doing this.


http://www.asahi-net.or.jp/~AD8y-hys/index.htm Copied!

I can't ignore the voices of any poor website! Can't you hear them calling?


YouTube/Any video hosting site in existence:

Youtube-DL https://github.com/ytdl-org/youtube-dl

Typical command:

youtube-dl -o "/yourfavouriteabsolutepathtoadirectory/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s" --netrc --write-info-json --write-thumbnail --write-annotations --write-description --download-archive "/yourfavouriteabsolutepathtoadirectory/downloads list" {CHANNEL/VIDEO URL}

o: Output path. Variables like %(uploader)s and stuff can be read about on their github wiki.

netrc: Login details for various sites. You'll need one for niconicodouga.

write-annotations: Nevermind, rest in peace.

download-archive: A list of video IDs that have already been downloaded to read from and write to, for skipping redundant downloads. You can avoid using it if you prefer, or put it in another directory.
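To make the download-archive behaviour a bit more concrete: the archive is a plain text file with one "extractor id" pair per line (for example a line reading "youtube" followed by a video ID). Here's a minimal sketch of checking and updating such a file by hand. The function names and the sample ID in the usage note are made up for illustration, and this is not how youtube-dl itself is implemented.

```shell
#!/bin/sh
# Sketch only: manual bookkeeping for a youtube-dl style download archive.
# Assumes one "extractor id" entry per line, e.g. "youtube abc123".

# in_archive ARCHIVE_FILE EXTRACTOR VIDEO_ID
# Succeeds (exit status 0) if that exact entry is already listed.
in_archive() {
    [ -f "$1" ] && grep -Fxq -- "$2 $3" "$1"
}

# add_to_archive ARCHIVE_FILE EXTRACTOR VIDEO_ID
# Appends the entry only when it is missing, so the list stays duplicate-free.
add_to_archive() {
    in_archive "$1" "$2" "$3" || printf '%s %s\n' "$2" "$3" >> "$1"
}
```

Something like `in_archive "downloads list" youtube abc123 && echo "already have it"` would then tell you whether a video is going to be skipped before you start a long run.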


Twitter:

Twint https://github.com/twintproject/twint


A shell script made of spaghetti code attached to this post, by yours truly. Twint in itself isn't actually intended for this kind of function, so I had to include it in a string of commands. It doesn't have anything helpful like a manual or a "help" option, so here's the rundown:

You give it a twitter username (that's their @), and it'll download all of their tweets to a file, which gets sorted into chronological order. Then, the script downloads all images and videos they've posted. It even continues where it left off (if the username is still the same) (I think this works, but all tests say it doesn't) and everything. One word of warning though, if you're scraping an account for the first time, you have to let the script get all the way to the beginning. Otherwise, it'll pick up from the most recent tweet it's scraped and continue from there, you'll have to get the rest manually.

Defective though it may be, I'm going to upload it anyway. Can't attach this actually, so I'll just paste this: https://pastebin.com/P5j1Ru4C
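For anybody who wants to fix or rewrite the thing, the sorting and "continue where it left off" parts could look roughly like the sketch below. It assumes each line of the tweet file starts with the numeric tweet ID followed by a space, and that newer tweets always have larger IDs. Both are assumptions, and the function names are made up. This is not the code from the pastebin.

```shell
#!/bin/sh
# Sketch only: keep a tweet file in chronological order and find the resume point.
# Assumed line format: "<numeric tweet id> <rest of the tweet...>".

# sort_tweets FILE
# Sorts in place by numeric ID (oldest first) and drops repeated IDs.
sort_tweets() {
    sort -n -u -k1,1 -o "$1" "$1"
}

# last_tweet_id FILE
# Prints the newest ID already saved; prints nothing if the file is empty.
# Only meaningful after sort_tweets has been run on the file.
last_tweet_id() {
    if [ -s "$1" ]; then
        tail -n 1 "$1" | cut -d ' ' -f 1
    fi
}
```

The resume step is then just a matter of asking the scraper for tweets newer than whatever last_tweet_id prints, which is also why a first run has to reach the very beginning of the account before it's interrupted.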

Mediawiki/Probably any kind of wiki software:

WikiTeam https://github.com/WikiTeam/wikiteam

Typical command:

dumpgenerator.py {LINK TO WIKI} --xml --images --path . --resume

(Move dumpgenerator.py somewhere convenient first, like /usr/bin/wikiteam-dump.)

Downloads every page of a wiki, along with each page's revision history, into a single file. This isn't very convenient for readability, but there are tools to restore the files to their normal form. It also downloads all images (not sure about audio and video), but only in their present form.

Unless you're batshit insane, don't even think of trying to dump Wikipedia. They already have their own, regularly updated dumps you can download from.
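Since the single-file dump is hard to eyeball, here's a small sketch for checking what actually ended up in it. It assumes the usual MediaWiki XML export layout, where every page has a title element and every revision element starts on its own line. The function names are made up, and titles come out with XML entities (like &amp;) still encoded.

```shell
#!/bin/sh
# Sketch only: quick sanity checks on a MediaWiki XML dump.
# Assumes <title>...</title> and <revision> each sit on their own line.

# list_titles DUMP_FILE
# Prints one page title per line.
list_titles() {
    sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$1"
}

# count_revisions DUMP_FILE
# Prints how many lines open a <revision> element.
count_revisions() {
    grep -c '<revision>' "$1"
}
```

Comparing the output of list_titles against the wiki's own "All pages" listing is a cheap way to notice a dump that stopped halfway.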


GNU Wget https://www.gnu.org/software/wget/

Typical command:

wget -e robots="off" --mirror {SITE URL}

Please be careful to only download small, static sites through this method. If it's a site that generates pages, you're going to give yourself and the webmaster a very hard time if you don't get a little creative, especially with robots.txt checking turned off (so many of them just seem to think they know what's better for their site than I do, the GALL of them!). In those situations, the include-directories and exclude-directories options will probably be helpful to you. Avoid the accept or reject options, as while they don't clutter up your filesystem, they still do download the entire file anyway for some reason, before deciding to delete it.
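As an illustration of "getting a little creative", here's a sketch of a gentler mirror command wrapped in a function. The function name, the example URL and the directory list are made up. The wget options themselves (--mirror, --no-parent, --page-requisites, --wait, --random-wait, --include-directories) are standard GNU Wget options. Note that this version leaves robots.txt checking on, unlike the command above.

```shell
#!/bin/sh
# Sketch only: mirror part of a site without hammering it.

# polite_mirror SITE_URL INCLUDE_DIRS
# INCLUDE_DIRS is a comma-separated list of directories to stay inside,
# e.g. "/blog,/images". Everything outside those directories is skipped.
polite_mirror() {
    wget --mirror --no-parent --page-requisites \
         --wait=1 --random-wait \
         --include-directories="$2" \
         "$1"
}
```

Usage would be something like `polite_mirror "http://example.invalid/" "/blog,/images"`. Swapping --include-directories for --exclude-directories works the other way around, for sites where only one or two generated sections cause trouble.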

Flash and Javascript are a major pain in the ass, and even though wget should be able to navigate through them for embedded links, it just decides not to. For those, you'll have to go onto the pages yourself, open your web browser's debug tools (generally F12 or something, I don't know), and see if you can get a list of assets while they're loading. And then you're going to have to wave your cursor around, click on the flash files and stuff, and manually download the files as they appear in the log. It's a massive pain in the ass, but I guess it gives you something to do.


Pixiv galleries/Sadpanda:

Gallery-DL https://github.com/mikf/gallery-dl

Typical command:

gallery-dl {URL}

Configuring this program is very, very annoying, so I'll just let you figure that out on your own. With a proper setup, you can download galleries into pre-zipped, pre-cbz'd archives, named according to the Japanese titles.
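To save somebody a bit of that annoyance, here's a sketch of the smallest config I'd expect to produce .cbz archives, written out by a shell function. It relies on gallery-dl's "zip" post-processor and its "extension" option. The function name and file path are made up, and naming folders after the Japanese title is left out because the right metadata key depends on the site (gallery-dl -K {URL} lists the available ones).

```shell
#!/bin/sh
# Sketch only: write a minimal gallery-dl config that zips each gallery as .cbz.

# write_cbz_config CONFIG_PATH
# Overwrites CONFIG_PATH with a config enabling the zip post-processor.
write_cbz_config() {
    cat > "$1" <<'EOF'
{
    "extractor": {
        "postprocessors": [
            { "name": "zip", "extension": "cbz" }
        ]
    }
}
EOF
}
```

After `write_cbz_config ./cbz.json`, running `gallery-dl --config ./cbz.json {URL}` should pick it up. Putting the same content in ~/.config/gallery-dl/config.json makes it the default instead.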




These files are going to be interesting to you, but they'll get much less interesting once you realise you've been spending two hours trying to figure everything out. Hang in there.

Messageboards (like /jp/!):

GNU Wget https://www.gnu.org/software/wget/ / A specialised program if you can find one, it's probably better

Alright these can be tricky, and it's often different on a case by case basis.

A lot of boards will have a /boardname/res/ directory, and if it's publicly available, then things are going to be very easy for you. You just wget -x -i those directories, and you'll be graciously given all of the threads on that board, without having to worry about the site generating a web page for every single post, and every single combination of every single post (seriously, Kareha does this).

You'll generally want to wget --page-requisites the index of each board too, just to make sure that you've got the page's CSS, extra images, and stuff.
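Since wget -i wants a list of URLs rather than a directory, one way to get that list is to save the /res/ directory listing and pull the thread links out of it. The sketch below assumes the listing links to threads as plain NUMBER.html files. That layout, the function name and the example URL are all assumptions, so check what your board actually serves.

```shell
#!/bin/sh
# Sketch only: turn a saved /res/ directory listing into a URL list for wget -i.
# Assumes thread links look like href="12345.html".

# res_urls BASE_URL < listing.html
# Prints one absolute thread URL per line, lowest thread number first.
res_urls() {
    grep -o 'href="[0-9][0-9]*\.html"' |
        sed -e 's/^href="//' -e 's/"$//' |
        sort -n -u |
        while IFS= read -r name; do
            printf '%s%s\n' "$1" "$name"
        done
}
```

Something like `res_urls "http://example.invalid/jp/res/" < listing.html > threads.txt` followed by `wget -x -i threads.txt` then fetches every thread into a matching directory tree.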

Here's a specialised shell script I made for 8chan, it only works on a thread-by-thread basis. I think you can find a list of threads at boardname/threads.json or similar API points, but to exploit that I'd have to do something like I just did for the Twitter script, and I really don't feel like doing that right now. Here's the incomplete version anyway, you just feed it thread URLs, and it works with several.

cd "/yourfavouriteabsolutepathtoadirectory/" || exit 1
wget --page-requisites --timestamping "$@"
gallery-dl --ignore-config --option "base-directory=./media.8ch.net/file_store/" --option "filename={tim}{ext}" --option "directory=" "$@"
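If anybody wants to finish the threads.json idea, the missing piece is roughly the sketch below. It assumes the file follows the vichan-style layout, where each thread is an object with a numeric "no" field, and it uses grep instead of a real JSON parser, so treat it as a starting point. The function name is made up.

```shell
#!/bin/sh
# Sketch only: pull thread numbers out of a saved threads.json.
# Assumes entries of the form {"no":12345,"last_modified":...}.

# thread_numbers < threads.json
# Prints each thread number on its own line, in the order they appear.
thread_numbers() {
    grep -o '"no": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}
```

Each number can then be turned into a thread URL for the board in question and passed to the script above as arguments.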

Anything else you need, there's a good chance you can find it by running a search for "[x] scraper" on GitHub.


If you don't take your philosophy to its logical extreme, then it's open to attack.



oh shit, I forgot to give one of the @s a dollar sign

that's probably why continuing didn't work

don't use twitscrape's continue feature unless you fix it yourself, otherwise it won't continue

