
/qresearch/ - Q Research Board

Research and discussion about Q's crumbs

Attention newfags: Leave the Name/Email field blank for your safety/anonymity. Do not create new threads. You may post in any thread already created. Thank you.

File: 008d0a7934ff522⋯.jpg (76.31 KB, 800x450, 16:9, howToArchive.jpg)

23d7ce No.974637

This is a guide for archiving websites offline using HTTrack:

https:// www.httrack.com/

If someone knows of a different method, please feel free to talk about it here. Also, I'm no expert on this…so any tips are welcome.

The benefits of copying (aka "mirroring") websites or website pages offline are myriad. For one, you know it won't be deleted unless you delete it. For another, you get a complete copy of the structure of the site–from the directory structure on down. This has actually led me to find folders that I wouldn't otherwise have known existed, as well as other things that the designer might have tucked away.

The downside is that it can be slightly complicated. That's why this anon is writing this guide–to help you identify common errors, and how to overcome them. By twiddling with certain settings and understanding what they mean, you'll be getting results in no time.

23d7ce No.974690

File: d541b745f7b6e81⋯.png (91.74 KB, 1467x505, 1467:505, Capture1.PNG)

Step 1: go to this page and select the version appropriate for your system:

https:// www.httrack.com/page/2/en/index.html

Go ahead and install it where you want. I can't remember what options pop up, but if it asks you where you would like to archive your stuff, give it an appropriate directory–there will be one folder for each site/page you download, so I recommend you have a completely separate directory/folder so it doesn't mess everything else up.


23d7ce No.974732

File: 5e80f077fab7be0⋯.jpg (38.9 KB, 832x774, 416:387, Capture2.jpg)

Step 2: Once you've installed it, click "next." The blackened area in the image is where you'll see your directory structure. I've hidden mine so you don't see my 2.9 TB directory of nasty midget porn.


23d7ce No.974792

File: b65529dba1c629f⋯.jpg (132.26 KB, 1353x794, 1353:794, Capture3.jpg)

Step 3: Give your project an appropriate name; I recommend naming it after the website that you're going to mirror. After that, put it into an appropriate category–in my example, I've put it under "qresearch_NK", which is where I've put my other North Korea-related mirrors as well.

After that, click on "next"


23d7ce No.974849

File: 4de674c14173c52⋯.jpg (43.17 KB, 838x768, 419:384, Capture4.jpg)

Step 4: copy & paste the url from your browser into the indicated box. Before you move forward, the most important step comes up: you have to set the options.

These options are rarely "one size fits all." Different websites have different setups, so you've got to adapt your setup in order to get what you want. We'll get into that next.

After your options are set, click "next"


23d7ce No.974883

File: ba318e7d6f2356b⋯.jpg (20.75 KB, 513x435, 171:145, Capture5.jpg)

Step 4a: If you aren't using a proxy, un-click the "use proxy for ftp transfers" box under the "Proxy" tab.


23d7ce No.975019

File: 18ed4bd28c2eb56⋯.jpg (42.16 KB, 513x440, 513:440, Capture6.jpg)

step 4b: Under the "Scan Rules" tab, make sure you check each of the boxes if you want to download that type of media for the page. Typically you want to get the pictures that go with the site, so check the "gif, jpg, jpeg…" box. If there are movies on the site that you want, check the "mov, mpg, mpeg…" box. If the website has files that you can download, select the "zip, tar, tgz…" box.

What you select is really about what you're after–if you want a complete record, select all of them…but if you just need the text, don't check any. Whether or not you select these items can make a huge difference in how big the result is–but don't worry: if it's taking too long or the result is getting too huge, you can always cancel and try again.


23d7ce No.975096

File: 02bed38f2967de2⋯.jpg (31.21 KB, 513x437, 27:23, Capture7.jpg)

Step 4c: This setting tells HTTrack whether to obey the site's robots.txt rules as it goes through the website. If you want more information, go here:

https:// moz.com/learn/seo/robotstxt

I'd say keep it off, but sometimes you run into issues with this setting…so I'm mentioning it here because having the wrong setting often gives an error, and you can try twiddling it between "follow" and "don't follow" to fix the error.


23d7ce No.975203

File: 9bc06354fbc594c⋯.jpg (32.91 KB, 513x439, 513:439, Capture9.jpg)

File: 6b9b98dd46f7db7⋯.jpg (44.88 KB, 830x769, 830:769, Capture12.jpg)

Step 4d: Under the "Browser ID" tab, you have the option of setting your "Browser Identity". I'm guessing that, by telling the website which browser you're using, the website will present certain features in order to take advantage of that browser. If you find that you get an error almost immediately after trying to move forward (like pic related), go into your options and change these to "none" and it should clear it up.


23d7ce No.975296

File: 3efe64c3c3e1cb5⋯.jpg (34.33 KB, 512x437, 512:437, Capture10.jpg)

Step 4e: If, after mirroring the website, you find that you didn't get what you wanted, you might try messing with these settings. Essentially what they do is tell HTTrack how to move about the website–can it only move downward through the directory structure, or can it go upward as well?

Depending on how the website is set up, you may have to mess with these…but I suggest making the settings slightly less restrictive each time, until you get only what you need. The reason I say this is that you may find yourself downloading all manner of things from every website connected to your target–every ad from the ad sites, every movie from links, etc. When I was downloading liddlekidz.org, I found myself well past 2 GB before I realized that I wasn't just getting stuff from that website–I was pulling stuff from at least a dozen websites, and most of it was being downloaded before the stuff from liddlekidz. So be conservative here, otherwise you're wasting your time and hard drive space.
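
For anons who prefer a terminal, HTTrack also ships a command-line program. Below is a rough sketch of how the choices from Steps 4b to 4e might look there. The URL, the output folder and the function name httrack_mirror are placeholders I made up for illustration, and the flag meanings are my reading of "httrack --help", so check them against your installed version. By default it only prints the command so you can look it over first.

```shell
#!/bin/sh
# Sketch only: print (or run) an HTTrack command that mirrors the GUI choices above.
# httrack_mirror, the URL and the output folder are hypothetical placeholders.

httrack_mirror() {
  hm_url=$1
  hm_outdir=$2
  # Derive the host name from the URL so the filter keeps the crawl on that host.
  hm_host=${hm_url#*://}
  hm_host=${hm_host%%/*}

  # -O   folder for this mirror
  # -s0  never follow robots.txt rules (try -s2 if you hit errors)
  # -D   only travel down the directory tree (the conservative choice)
  # +pattern allows matching links, -pattern excludes them (movies and archives here)
  set -- httrack "$hm_url" -O "$hm_outdir" -s0 -D \
    "+${hm_host}/*" \
    "-*.mov" "-*.mpg" "-*.mpeg" \
    "-*.zip" "-*.tar" "-*.tgz"

  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"     # dry run: just show the command
  else
    "$@"                   # real run
  fi
}

httrack_mirror "https://example.com/docs/" "$HOME/mirrors/example.com"
```

Run it with DRY_RUN=0 set in the environment to start the actual download. Loosen one thing at a time (drop -D, or remove an exclusion) if the first pass misses what you need.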


23d7ce No.975311

File: 6a52af0353b7647⋯.jpg (45.12 KB, 835x769, 835:769, Capture11.jpg)

Step 5: after you've set your options and clicked "next", you'll get to this page. Just click "next," and hopefully everything goes well.


23d7ce No.975521

File: 831b685b5bd98de⋯.jpg (34.55 KB, 829x769, 829:769, Capture13.jpg)

File: 133639a7d1a7608⋯.png (2.94 KB, 259x66, 259:66, Capture14.PNG)

Step 6: After HTTrack has completed, you'll get this page. If there's an error, you'll get a flashing notifier–you can take a look at the log to get the details, and use that information to search the web for a solution. For the most part, twiddling with the settings that I've mentioned will handle any of the errors you get…and it won't take long before you get familiar with them.

Sometimes, an error is essentially meaningless. For instance, I often get errors that state that HTTrack couldn't get an image from an ad site because of my settings–that's not important, so I don't worry about it.

You can click on "browse mirrored site" to see how your copy looks. If you're unhappy, change the options and try again.

Finally, you can go into your archive folder, and you'll see a new folder with the project. You can go in there any time, click on "index.html," and it will open up your fresh new copy of the website.

Now get out there and archive offline!

One final note: if you find something really, like that image of Hillary sacrificing children to Moloch that we all know is floating around, make sure to archive first before you come here and tell everyone else. We know that 8ch is being watched, and by blabbing what you've found you're giving them a chance to pull their stuff offline before anyone else can get to it. But once you've got it, then by all means, tell everyone–the more people there are that have a copy, the better it is for you…after all, you don't want to be the only person with that kind of evidence on your hard drive, do you?

Happy archiving!


ae3e60 No.978477

File: 763fc0d2f62af43⋯.png (310.01 KB, 1836x904, 459:226, The-Best-Tools-for-Saving-….png)

File: 9c76c36dc3a8818⋯.png (99.32 KB, 732x2358, 122:393, The-Best-Tools-for-Saving-….png)

File: 6e1097ea3288c25⋯.png (125.14 KB, 3450x1550, 69:31, Wikipedia-Synopsis-for-Wge….png)

File: 6bbfbb564deb481⋯.png (119.4 KB, 944x986, 472:493, GitHub-Synopsis-for-Wget.png)

There are a LOT of options for archival, as listed in graphic (1) and article (2). There are synopses for Wget in (3) and (4).

I prefer Wget, for the simple reason of power and flexibility. Those of you who use *nix prolly already know this, but the mirror of choice is Wget, and that goes really well if you have access to a VPS. You can queue it up and then tgz and sftp it when it's complete. Sometimes it can take days to mirror a full site if they've got aggressive leech protection.

You'll want to be aware of the robots option and the retry option, if you notice a server blocking your access because of too many requests in rapid succession, or a bitchy robots.txt.

       MAN:: Wget - The non-interactive network downloader.

'''SYNOPSIS'''
wget [option]... [URL]...

'''OPTIONS'''
Download Options
-w seconds
--wait=seconds
Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using "h" suffix, or in days using "d" suffix.

Specifying a large value for this option is useful if the network or the destination host is down, so that Wget can wait long enough to reasonably expect the network error to be fixed before the retry. The waiting interval specified by this function is influenced by "--random-wait", which see.

My recommended initial configuration is below, but I'm sure you can tailor it to suit your needs.

wget --mirror --page-requisites --adjust-extension --no-parent --no-check-certificate --convert-links -e robots=off https://example.com/

Happy archiving.
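
If a server starts blocking you for hammering it, a slower variant of the mirror command may help. This is only a sketch: polite_mirror is a made-up helper name, and the wait and retry numbers are guesses to tune per site. It uses the --wait option from the man page excerpt above plus --random-wait, --tries and --waitretry. I left out --no-clobber because, as far as I know, it conflicts with the timestamping that --mirror switches on, and --no-check-certificate because it turns off TLS verification; add that one back only if a site's certificate is broken. By default it just prints the command.

```shell
#!/bin/sh
# Sketch only: a slower, retry-friendly version of the mirror command.
# polite_mirror and the default numbers are illustrative assumptions.

polite_mirror() {
  pm_url=$1
  pm_wait=${2:-2}   # seconds to wait between requests

  # --wait/--random-wait  space out requests so the server is less likely to block you
  # --tries/--waitretry   retry failed downloads, backing off up to 30 seconds
  # -e robots=off         ignore robots.txt, as in the command above
  set -- wget \
    --mirror --page-requisites --adjust-extension --no-parent \
    --convert-links \
    --wait="$pm_wait" --random-wait \
    --tries=5 --waitretry=30 \
    -e robots=off \
    "$pm_url"

  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"   # dry run: show the command without touching the network
  else
    "$@"
  fi
}

polite_mirror "https://example.com/" 2
```

Set DRY_RUN=0 to actually start the mirror.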


3d2665 No.978833

>>978477

for those who use IPFS, i wont go into how to install it, look it up, it's a decentralized filesystem, this script allows you to wget a website and upload it to your ipfs repo

https:// github.com/VictorBjelkholm/ipfscrape


dca4c0 No.980849

>>975521

Great guide, cheers anon


9d7f52 No.982172

>>974637

Thank you TechExperts from a Research&InfoDistributor Luddite.


9d7f52 No.982196

>>982172

Correction: Research-er


9d7f52 No.982212

>>978833

>>978477

>>982172

Thank you TechExperts from a Researcher&InfoDistributor Luddite.


9d7f52 No.982298

>>974690

>>974637

>>978477

>>978833

Brilliant. This was what I was asking for.

1. It should be linked to at the top of each new bread's Resources,

AND

2. on first page of qresearch index with the "ARCHIVE, ARCHIVE, ARCHIVE EVERYTHING OFFLINE" instructions for "newfags"/"normies".


67f895 No.983930

>>978477

Have used wget before but always with hesitation because the options don't seem to always do the expected.

When testing your suggested wget + parms I don't see any images being downloaded…?

Can you speak to this and recommend a fix?

On Dec 17th someone sent me this wget command which scrapes all the images referenced in a single thread:

wget -P ./thread/ -nd -r -l 1 -H -D media.8ch.net -A png,gif,jpg,jpeg,webm https://insert_thread_URL_here.html

and it works but does not fetch the fullsize images, only the thumbnails, and I haven't figured out how to modify your wget mirror -the-whole-qresearch command to also fetch images and adjust references so the local HTML pages refer to locally-mirrored images.

I just don't have the time or patience to work this out.


9d7f52 No.984029

>>974637 (OP)

>>978477

>>978833

>>983930

A serious question: How would a clever anon go about saving a person of interest's entire twitter feed from inception, including pics, in case of deletion?

I get the feeling that these methods can narrow select/focus the file folder to be copied, yes?


f2398b No.984291

>>984029

I'm not really sure about twitter specifically; I don't use it. But when researching POTUS' tweets I came across a couple of good websites and figured that they are using twitter's api.

For people that don't program, an "api" is an "application programming interface," which is basically a set of tools for you to get what you want, designed by the maker of the app. It works in the best interest of these social media companies to develop a good api because it allows others to use and display their stuff on other websites–free advertising and spread of influence. I'll look further into it, as it relates to an app I'm working on.

As far as using HTTrack on it, I haven't tried. For the most part I've had success with websites that don't involve user accounts; there may be a way around that, but again, I'm pretty new to it myself. Just taught myself like 4 days before Q highlighted liddlekidz.


9d7f52 No.985562

>>984291

>>974637

>>978477

>>978833

>>983930

>>877198

>>122807

>>93735

>>3138

Thanks for your reply.

I know that it is possible to copy a particular Twitter discussion thread by printing a pdf, or by using 'ThreadReader' I think it is called, but again this is only for one discussion thread, not the root or branches, and in this case only if Twitter author allows it.

So the challenge for non-Tech anons like me is to find an easy technical way to copy all roots and branch discussion threads of a particular person/institution without copying absolutely everyone on Twitter.

Maybe there is an easy solution, if so great, that is why I am asking.

Thanking all anons in advance for their patience, consideration, and time.

We all know of instances where evidence has disappeared before archiving.

Non-Tech Anons need to archive particular person's/institution's complete thread discussions for possible use as evidence etc.


353534 No.987185

>>985562

Newfag here. Long-time lurker, first post.

Wrote a beta PHP script for Twitter that works in conjunction with Youtube-DL and wget (all freeware) to archive an entire conversation piece. Vids, whether native Twitter uploads or click-thrus to Youtube, are saved in their entirety, in best res available. Pics, PDFs, same thing. Any Web URLs, their front-page HTML is saved off as an HTML file.

Let me know if interested. It is by no means a finished product but it works very well as long as the convo is PUBLIC. Oh yeah, Instagram too.


e39ab8 No.987352

I'm speaking from a Linux operating system perspective (though aspects may apply to the equivalent windows version)

>>978477

wget manual can be downloaded here:

>>>/pdfs/8640

this board doesn't allow pdfs

I would also recommend using the -U or --user-agent options to change how the website sees the wget program. (wget can impersonate a browser when making connections). This can get around some sites that actually look for and filter wget connections.

(see p14 of attached manual)

>>983930

It looks like you were in that poster thread anon ;^)

>it works but does not fetch the fullsize images, only the thumbnails, and I haven't figured out how to modify your wget mirror -the-whole-qresearch command to also fetch images and adjust references so the local HTML pages refer to locally-mirrored images

fetch fullsize: adjust your recursion depth, if I recall correctly from -l1 to -l2 as with -l1 you would only be grabbing that page's content, not anything it links to (the full size images). I think there was a way to only get the items at depth level 2 (the full size content)

adjust references: the pages should contain relative links from the current page to the other and not absolute links (i.e. page1.html has a link to page2.html not http:// somewebsite/fulladdress/page2.html). You may wish to look at using the -m option for site mirroring. WARNING: it has infinite recursion depth and can chew disk space as it attempts to grab anything linked, and anything those links point to, etc, ad infinitum! MAKE SURE -l depth is set to stop it.

See this webpage for more:

https:// stackoverflow.com/questions/4602153/how-do-i-use-wget-to-download-all-images-into-a-single-folder-from-a-url
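
Another way to get at the fullsize images, if tuning the recursion depth doesn't pan out, is to save the thread's HTML first and pull the image links out of it yourself. The sketch below assumes the fullsize files are the href targets on media.8ch.net (the host from the command quoted above) and that thumbnails only show up as img src; I haven't verified that against the board's actual markup, and extract_media_links is just a name I picked.

```shell
#!/bin/sh
# Sketch only: list the fullsize media links found in a saved thread page.
# Assumption: fullsize files are href targets on the media host; thumbnails are img src.

extract_media_links() {
  # $1 = saved thread HTML file, $2 = media host (defaults to the one in the quoted command)
  eml_host=${2:-media.8ch.net}
  # Escape the dots in the host name so they match literally in the regex.
  eml_host_re=$(printf '%s' "$eml_host" | sed 's/\./\\./g')
  eml_pattern="href=\"https?://${eml_host_re}/[^\"]+\.(png|gif|jpe?g|webm)\""

  grep -oE "$eml_pattern" "$1" |
    sed -e 's/^href="//' -e 's/"$//' |   # strip the href="..." wrapper
    sort -u                              # one line per unique URL
}
```

Usage would be something like: extract_media_links thread.html > grab-these-URLs.txt, then wget -P ./thread/ -i grab-these-URLs.txt to download the list.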

>>984029

>>985562

>How would a clever anon go about saving a person of interest's entire twitter feed

Off the top of my head, you would be looking at a scraper script using curl for server requests, most likely written in Python, Perl, PHP, or similar. Search "twitter scraper" for lots of hits on the sort of thing you'd be using. There are lots of scripts on gitHub or similar.

Getting a list of URLs when you have 1 per line in a file "grab-these-URLs.txt"

wget -i grab-these-URLs.txt

Downloading videos from YouTube, Twitter, basically anywhere

youtube-dl -F http-URL-goes-here

will give you a list of the available formats to download with a CODE by each (on sites like YouTube) (e.g. CODE 18 RESOLUTION 640X360, CODE 22 RESOLUTION 1280X720….)

youtube-dl -fCODE http-URL-goes-here

will download that version specified by the CODE

youtube-dl http-URL-goes-here

will download the best quality version of the video (= largest file size)

Creating a folder for each day of the year on Linux (not sure if you'd need this, but it was on my mind for some reason):

mkdir -p {01,03,05,07,08,10,12}/{01..31} 02/{01..28} {04,06,09,11}/{01..30}

This will create a folder for each month 01-12. Inside each month folder will be 30 or 31 folders for each day of the month, except 28 for February (which can be changed to 29 for a leap year)
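
If you'd rather not edit the February part by hand, here is a sketch that works out the leap year itself. It differs from the one-liner above in that it adds a year folder on top (year/month/day), and it uses plain loops instead of brace expansion so it should also run in shells other than bash. make_day_folders is a name I made up.

```shell
#!/bin/sh
# Sketch only: create <root>/<year>/<month>/<day> folders for every day of a given year.

make_day_folders() {
  mdf_root=$1
  mdf_year=$2

  # Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400.
  if [ $((mdf_year % 4)) -eq 0 ] &&
     { [ $((mdf_year % 100)) -ne 0 ] || [ $((mdf_year % 400)) -eq 0 ]; }; then
    mdf_feb=29
  else
    mdf_feb=28
  fi

  mdf_month=1
  # Days per month, January through December.
  for mdf_days in 31 "$mdf_feb" 31 30 31 30 31 31 30 31 30 31; do
    mdf_day=1
    while [ "$mdf_day" -le "$mdf_days" ]; do
      # Zero-pad month and day so the folders sort in calendar order.
      mkdir -p "$(printf '%s/%04d/%02d/%02d' "$mdf_root" "$mdf_year" "$mdf_month" "$mdf_day")"
      mdf_day=$((mdf_day + 1))
    done
    mdf_month=$((mdf_month + 1))
  done
}
```

For example, make_day_folders ./archive 2018 would create ./archive/2018/01/01 through ./archive/2018/12/31.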


9d7f52 No.987515

>>987185

Welcome!

A very useful, thoughtful first post. Not everyone can say that.

Sounds great.

I got nothing for Twitter now apart from pdf-printers so hell yeah, I'd love to test it if you're up for it. Anything is better than nothing. Thank you for your reply.

>>987352

Thanks very much for your detailed and useful post with easy to read, easy to follow explanations. Really great work.

Ok, I will try to follow through with all the kind anons' advice and test on sites and Twitter.

Thank you all anons :)


9d7f52 No.987600

>>987185

>>987352

>>984291

>>983930

>>978833

>>978477

>>974637

You guys are like geniuses.

Gotta learn some coding.


ae3e60 No.987606

File: f57f982a4c4a997⋯.jpg (62.46 KB, 1024x576, 16:9, Python_Twitter-1024x576.jpg)

Twitter scraping can be achieved with something like Python. The juridical viability for use in a legal proceeding will vary by jurisdiction, as this is a prime vulnerability for distortion.

If you intend to use the data for a deposition, it might be best to scrape as well as printing (with timestamp) to PDF and/or hardcopy. When you want to admit it, you'll need it to be certified by the court, so the more supporting information, the better.

https:// medium.com/@dawran6/twitter-scraper-tutorial-with-python-requests-beautifulsoup-and-selenium-part-1-8e76d62ffd68


9d7f52 No.987620


ae3e60 No.987712

https:// duckduckgo.com/?q=twitter+scrape+python&bext=msl&atb=v70-6&ia=web


ae3e60 No.987798

File: 5dad98242c16205⋯.png (172.76 KB, 1024x3870, 512:1935, A-beginner's-guide-to-coll….png)

https:// knightlab.northwestern.edu/2014/03/15/a-beginners-guide-to-collecting-twitter-data-and-a-bit-of-web-scraping/


e39ab8 No.987830

>>987606

>When you want to admit it, you'll need it to be certified by the court,

You may well be looking at "digital timestamping" and creating file/document hashes. You might be interacting with a time server/authentication party via OpenSSL.

The digital timestamp proves it was created after some other document, which in turn was created after a different document, and so on, right back to the beginning of the chain - anchored in time. This proves 1) the file existed at 2) that point in time.

The file or document hashes are a one-way math operation that reduces a file to a short signature. Two different files will, for all practical purposes, never share a signature, and if you change even one character of the file the new signature will change dramatically. This can be used to prove a document copy is identical to the original, or that a downloaded file was not corrupted/intercepted during download.

For example, a free service for timestamping documents (1st result from a quick search)

https:// www.freetsa.org/index_en.php

There are other ways it can be done with file hashes in a crypto currency blockchain.
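
To make the hashing part concrete, here is a small sketch that fingerprints every file in a mirror folder and checks them again later. It assumes GNU coreutils (sha256sum, sort -z) as found on most Linux systems; make_manifest and verify_manifest are names I invented, and choosing SHA-256 is my assumption, not something a court or timestamping service requires.

```shell
#!/bin/sh
# Sketch only: record a SHA-256 hash for every file in a folder, then verify later.
# Assumes GNU coreutils; keep the manifest file OUTSIDE the folder being hashed.

make_manifest() {
  # $1 = folder to fingerprint, $2 = manifest file to write
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum ) > "$2"
}

verify_manifest() {
  # $1 = folder to check, $2 = manifest written earlier
  # Exits non-zero if any file is missing or its contents changed.
  ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

Usage: make_manifest ./mirror mirror.sha256 right after archiving, then verify_manifest ./mirror mirror.sha256 whenever you need to show nothing changed. The manifest is a single small text file, so that is the thing you would hand to a timestamping service.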


23d7ce No.987840

File: f124fdc995d9f86⋯.jpg (34.31 KB, 729x732, 243:244, Capture.jpg)

File: f8853e8b3d61f0d⋯.jpg (142.69 KB, 1006x980, 503:490, Capture1.jpg)

File: 389b02954fd0f0b⋯.png (13.63 KB, 246x646, 123:323, Capture3.PNG)

>>985562

I think I have a simple solution. I can't quite verify yet, but so far it seems to be doing what I think you would want it to do.

First, you need to go to twitter's advanced search*:

https:// twitter.com/search-advanced?lang=en&lang=en

After entering the user's name, and the starting date from which you would like to collect tweets, click "search"

After that, you'll get a results page. Copy & paste the url into HTTrack. I didn't have to adjust my settings at all–I just chose to download the images, not movies.

From then on, it should start downloading without any serious issues. In my first two images, I did a search for @snowden's tweets from October 25th to today. In my trial run (which I'm currently still processing), I chose to grab all of @JulianAssange's tweets since 1-1-2017. Big mistake–as the poor man is locked up, he probably averages about 10-15 tweets a day. So far as I can tell, not only am I gathering his tweets, but also the tweets of those that he has retweeted and the tweets on their profiles. I'm at 9 GB so far, and almost 20k files downloaded.

I'm sure there's a setting somewhere that might tell it not to go too far, but I haven't figured that out yet. Regardless, it's pretty much doing what you would want–as you can see from the image, there's a separate folder for each person that Assange has interacted with. Below those folders are some completed .html files, and tons of html.tmp files–which are basically unfinished downloads.

When this is all done, I'll go over it and confirm how well it turned out. At this point, I can individually click on some of the .html files and they bring up profiles, so I'm pretty confident.

*Twitter's advanced search doesn't work well on mobile devices. If you can't find it on your mobile device and want to reach it, go into your browser's settings and click "request desktop view." Also, it may not be necessary to go to advanced search at all–if you look in the "Capture1.jpg" image I've posted, you can see "from:snowden since:2017-10-25". You may just be able to enter a query like that into their regular search to get the results you want.
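
If the plain query works for you, the results URL can be put together without touching the search form. This sketch assumes the results page takes the query in a "q" parameter on twitter.com/search, with ":" and spaces percent-encoded; that is how it appeared to behave, but Twitter may change it. twitter_search_url is a made-up name.

```shell
#!/bin/sh
# Sketch only: build a search-results URL for "from:<user> since:<date>".
# Assumption: the results page accepts the query in the "q" parameter.

twitter_search_url() {
  # $1 = account name without the "@", $2 = start date as YYYY-MM-DD
  # %3A is an encoded ":" and %20 an encoded space ("%%" prints a literal "%").
  printf 'https://twitter.com/search?q=from%%3A%s%%20since%%3A%s\n' "$1" "$2"
}

twitter_search_url snowden 2017-10-25
```

The printed URL is what you would paste into HTTrack in place of the one copied from the browser.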


9d7f52 No.987855

>>987606

Good advice, thank you.

For legal evidence archiving, do both a pdf print and program site scrape for context.

Both a pdf print and a site scrape may be necessary for court admission to prevent defense lawyers throwing out good evidence.

In reality I hope that our backups won't be needed for court cases, but better safe than sorry.

Almost any personal pain is worth it to see these legions of evil people be served the justice they rightfully deserve for their disgusting crimes.

We have it easy, think of the multitudes of victims who have no voice, and rely on us and other good people to investigate and give them a voice.

Let alone our relatives and the dead from the world wars, or more recent affairs. The scars are deep.


353534 No.987937

>>987515

While I'm fluent on normie platforms (fb, twitter, insta) when it comes to what I can/can't upload, I've never attached anything on 8ch. Can I upload a ZIPped archive that contains my PHP scripts? Or is a scanner gonna see scripting in my archive and go apeshit?

I would upload the source in plain-text (not copyrighting anything here…) but…it's a lot.


9d7f52 No.988012

>>987712 Thanks lol, very droll, my searches were no use

>>987798 Thanks, will try

>>987830 Thanks, good point. Will experiment.

>>987840 Thank you. This might be the most elegant solution, but without pdf-print and timestamp. I will try this too. Thanks.


ae3e60 No.988086

File: 678521ac85227e3⋯.png (94.37 KB, 673x2195, 673:2195, No-Quick-Way-To-Electronic….png)

There is NO quick and easy way to do juridical data collection. The cause is worth it, so put some sweat and tears into this shit. It's worth it.

http:// technology.findlaw.com/electronic-discovery.html


9d7f52 No.988088

>>987937

Some other anons used arc hiv e.is for the first archives, but then switched to Meg aU plo ad.

Your work sounds very useful and pretty cool tbh, so I would still love to try it if you're ok with that. I don't know anything about locking down the plain-text source so it doesn't get f-d with. I assume that once it is there on Me gaU pl oad for example, then it is safe from alteration?

Gotta start some coding.


e39ab8 No.988164

A useful one line script to grab pdfs listed 1-per-line in the file "pdflist", and convert each into text

for i in $(grep -E '^htt.*\.pdf$' pdflist); do foo=$(basename "$i" .pdf); wget "$i"; pdftotext -r 300 "$foo.pdf"; done

What this does:

for i in :perform a loop

$(grep -E '^htt.*\.pdf$' pdflist) :search for any lines in the file "pdflist" using a regular expression where the line starts with "htt" and ends with ".pdf" - essentially this matches any URLs listing PDFs in the file.

; do foo=$(basename "$i" .pdf) :call a variable "foo" the basename of $i - this strips "http….some.thing/blah/whatever.pdf" down to "whatever"

; wget "$i" :grab the PDF at the URL and save it as "whatever.pdf"

; pdftotext -r 300 "$foo.pdf" :convert the grabbed PDF "whatever.pdf" into text using a print level scan resolution (300ppi) and save it in "whatever.txt"

; done :loop for the next URL found in the "pdflist" file.

I've used this when downloading Hillary Clinton Emails from the Judicial Watch website.

I end up with text versions of the 1000s of emails. I can then use the grep program to search through them all and get a list of search term matches.

grep -HC4 pizza *.txt

search all files ending in ".txt" for the term pizza, print out each matching line along with the filename, in context of 4 lines before the match & 4 lines after

The search term I first look for isn't "pizza" but "B1" because this indicates Classified Emails.
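
With thousands of text files it can help to see which ones mention a term most before reading any of them. A sketch, assuming file names without ":" in them; rank_matches is a name I made up, and it counts matching lines, not individual occurrences.

```shell
#!/bin/sh
# Sketch only: list files by how many lines mention a term, busiest first.
# Assumes file names contain no ":" (it is used as the field separator).

rank_matches() {
  # $1 = search term (fixed string, case-insensitive); remaining args = files to search
  rm_term=$1
  shift
  # grep -c prints "file:count" per file; /dev/null forces the file name to be
  # shown even when only one real file is given.
  grep -ciF -- "$rm_term" "$@" /dev/null |
    awk -F: '$NF > 0' |      # drop files with no matching lines
    sort -t: -k2,2nr         # highest count first
}
```

For example, rank_matches B1 *.txt prints lines like "whatever.txt:12", and the files at the top are the ones to open first with the grep command above.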


e39ab8 No.988218

>>987937

I'd recommend using pastebin or similar and post the URL.

If it is a file collection then you could use filedropper, mixtape.moe, MEGA NZ, or similar to upload the zip archive.

If the files are small, you could create a b64 from the zip then upload the resulting text file to pastebin, but only oldfags/nerds may understand how to decode it.


23d7ce No.988648

File: 2401d5653e921ec⋯.png (44.57 KB, 1332x841, 1332:841, Capture.PNG)

>>988012

You actually do kind of get a timestamp, in that the creation of the files on your computer have a creation date as they are written. So long as you don't go about editing them, the date remains intact. You might want to consider making a copy and storing it someplace safe; if it were for a legal case, I would perhaps throw it onto a thumb drive and give that to, say, a lawyer or notary public. You could upload it to some website, but if all of this archiving is about having information while the web is down, then that presents a problem…

I've found something interesting about archiving with HTTrack, but it doesn't solve the Twitter problem. You can set the "mirror depth" to a certain number, which represents the number of "clicks" away from your page you want to copy. If you don't set this (in options >> limits >> Maximum mirror depth), you may wind up with a gigantic download.

Consider this scenario: You're downloading the last 100 tweets from someone. In one of those tweets, they re-tweeted someone else…so that person is clicked on, which brings up all of their tweets. Each of those is clicked on…and on and on and on…

So this is what I recommend: for social media, start with a setting of 1 or 2; if it doesn't get enough, bump it up one until you get what you need. Leaving it unset means that it will continue onwards with infinite clicks–when it comes to social media, that means that it could go on for a -very- long time, as people quote other people, etc.

As far as Twitter is concerned, I've tried it with a setting of 1, 2, and 3; 1 and 2 got me his profile, 3 got me his profile in a ton of different languages and maybe a month's worth of tweets (with Chinese headings). So it doesn't look like it's necessarily a productive means of getting the info you want–more than likely it's going to be a matter of using the api to get the results you want.

I found the issue I've been having–it has to do with another setting.


e39ab8 No.988778

>>988648

>You actually do kind of get a timestamp, in that the creation of the files on your computer have a creation date as they are written. So long as you don't go about editing them, the date remains intact.

This would be insufficient proof, since anyone can change the clock on your computer to give any date.

The "digital timestamping" and file hashes mentioned previously are a recognized form of document authentication.

>>987830


e39ab8 No.988857

>>988648

>1 and 2 got me his profile, 3 got me his profile in a ton of different languages and maybe a month's worth of tweets (with Chinese headings).

You may wish to see if there are filtering options, so you can ignore content that doesn't match the filter.

So a depth of 3 + filter will ignore irrelevant information like the Chinese headings.

This would be equivalent to

-r -l3 -A ext1,ext2,ext3

in wget (to recursively grab, to a depth of 3, files ending in "ext1", "ext2", or "ext3")


0066ec No.990154

File: 14ffacb28e1b4a7⋯.jpg (87.58 KB, 640x356, 160:89, downloadfile-28.jpg)

>>974637

>Nobody cares.


f1dd0a No.991469

Thanks for the link. I used to be a fucking wizard on a PC, but got into another platform and still need to figure out a lot on this side of things for this platform. Thanks again, and I will post back if I find something useful.


78e364 No.992419

File: 0711dddcb86d119⋯.jpg (47.68 KB, 620x350, 62:35, gettyimages-944396598.jpg)

>>990154

A new hot theory from reddit.com/r/greatawakening

https:// www.reddit.com/r/greatawakening/comments/8bd84s/trump_is_our_real_17th_president_since_lincoln/?st=jfuhf6fm&sh=d68afeb7

Trump is our real 17th President. Since Lincoln, 16th, we have only had corporate CEOs pretending to be president. self.greatawakening

submitted 2 hours ago by BlackSand7New arrival.

With "The Act Of 1871" - Our Republic became a corporation named "THE UNITED STATES". (Names in all caps represent corporations). Since then all our presidents have just been corporate CEOs. Now we can get our Republic back and have true Presidents again. Thank you Donald Trump!

So yes, his jersey 17=Q, but 17 might also mean our true 17th President.


4c2f44 No.993455

>>984029

ask quinn michaels, he is a programmer

find him on youtu be,he programs bots that search twitter for all items related to a specific hashtag


068feb No.995721

Unbroken link test

https%3A%2F%2F8ch.net/qresearch/catalog.html

>>991469

No one knows who you are replying to unless the post is linked in your reply. Click on the post number "No. XXXXXXX" that comes after the poster's "ID: xxxxxx"

to automatically have it inserted into your reply.

>>992419

Use the correct thread for your posts, regardless of how excited you feel about something. That is irrelevant to this thread. It is like your sports team wins something, so you interrupt a meeting of strangers to shout about it - i.e. rude. Yes it is good news, and I read your post about it in another thread. Putting it here too is poor form, and spamming. Spamming is always bad etiquette.


068feb No.995745

>>995721

This test successfully kept the URL intact, but is still broken as far as the browser/ search engine sees it.

You can paste the entire address into a browser and go without editing.

Unbroken link test 2

https:%2F%2F8ch.net/qresearch/catalog.html

Testing to see if the browser, or 8ch filters the URL based on these characters.

Unbroken link test 3

https:/%2F8ch.net/qresearch/catalog.html

Testing to see if the browser, or 8ch filters the URL based on these characters.

Unbroken link test 4

https:%2F/8ch.net/qresearch/catalog.html

Testing to see if the browser, or 8ch filters the URL based on these characters.


068feb No.995761

>>995745

Tests 2, 3, and 4 all produce URLs that are not broken by 8ch's board software and that the browser can still interpret correctly, opening a new tab when the URL is selected and the right-click menu is used.
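
If you would rather turn one of the test links back into a plain URL yourself, a small sketch like this should do it. decode_link is a made-up helper name, and it only handles the two escapes used in these tests (%3A and %2F), not percent-encoding in general.

```shell
#!/bin/sh
# Sketch only: decode_link is a hypothetical helper that undoes the two
# percent-escapes used in the tests above (%3A for ":" and %2F for "/").
decode_link() {
    printf '%s\n' "$1" | sed -e 's/%3[Aa]/:/g' -e 's/%2[Ff]/\//g'
}

decode_link 'https%3A%2F%2F8ch.net/qresearch/catalog.html'
decode_link 'https:%2F/8ch.net/qresearch/catalog.html'
```

Both calls should print the same plain catalog address.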


a5583b No.995964

>>995761

Clever workaround. I think the board owner threw the word filter in to avoid drawing the attention of "sniffer" programs. I doubt they would register your altered versions.


e22d67 No.998062

>>992419

It can also be 2017 National Champions / POTUS officially became CIC in 2017….


ae3e60 No.1004570

File: 3db889c160fcd75⋯.png (713.05 KB, 1808x1400, 226:175, Screenshot 2018-04-11 at 2….png)

>>995761

You're over complicating the issue… in most cases, there is no need to encode the URL. So if you configure a shortcut for the "Open URL" service on most systems, it's a no-brainer.


23d7ce No.1006231

>>988857

I think I misspoke when I said "headings." What I meant was that the entire page is in Chinese, except for the tweet itself.

What's happening is that I'm getting copies of certain .html files in every language–1000 in total for the index, login, and search .html files.

It looks like they have a typedef set up to link a language to a 4 digit hexadecimal code which is appended like so: indexffbb.html, etc. Also, the other files have different designations.

It would be trivial to write a program that found the right one, but you need to know the right one beforehand in order to set up the right filter so the whole purpose is defeated. Who knows how often they change it? That having been said, it really doesn't matter–the files are relatively small.

I think a setting of "3" is best. I pushed it to "4," and ended up getting far more than I wanted. Once your download completes, look for the folder of the person whose tweets you're collecting, and they should all be in there. It will be in "project name folder" >> twitter.com >> "person whose tweets you're getting" >> status. For me they're in English.

The difference between a setting of "3" and "4", in my case, was about a 50x increase in my download size. Twitter put me on a time-out because of it, lulz.


23d7ce No.1006290

>>1006231

Oh, and if it helps you wget users, the file structure is as I'd mentioned: twitter.com >> (twitter handle, without the '@' in front) >> status >> *.html
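
A rough sketch for listing the saved tweet pages from a shell, assuming your mirror follows that layout. list_tweet_pages and the handle "someuser" are made up for the example, and the demo builds a throwaway folder so it can be tried without a real mirror.

```shell
#!/bin/sh
# Sketch only: list_tweet_pages is a hypothetical helper. It assumes the
# mirror was saved with the layout described above.
list_tweet_pages() {
    root=$1      # project folder that contains twitter.com/
    handle=$2    # twitter handle without the leading @
    find "$root/twitter.com/$handle/status" -type f -name '*.html' 2>/dev/null | sort
}

# Throwaway demo tree standing in for a real mirror.
demo=$(mktemp -d)
mkdir -p "$demo/twitter.com/someuser/status"
touch "$demo/twitter.com/someuser/status/1.html" "$demo/twitter.com/someuser/status/2.html"
list_tweet_pages "$demo" someuser
rm -rf "$demo"
```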


ae3e60 No.1008724

>>988857

By default, wget grabs everything so it's actually better to cast a wide net when mirroring unknown servers, as you said.. you can find more hidden gems this way. If you add the flags, you'll limit your retrieval to those filetypes.


4c2796 No.1009935

For local archive searching, I'm using Java DocFinder with decent results, curious about other standalone options for a private/local fulltext search supporting fuzzy and near, etc.


4c2796 No.1009947

This works great for saving this board, it was posted a while back.

wget -nH -k -p -nc -np -H -m -l 2 -e robots=off -I stylesheets,static,js,file_store,file_dl,qresearch,res -U "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4" --random-wait --no-check-certificate https://8ch.net/qresearch


e6f97a No.1010366

Glad to see someone else noticed this limitation with wget. I do have the patience. Searched all over the web for a solution for a couple weeks now and even tried alternatives to wget but really wanted the wget solution to work. Finally wrote my own solution. It is a bash script. Couldn't get it to work with a one-liner but got it down to two wget calls and a couple sed calls to rewrite the img URLs. It's a very early iteration but I have a feeling it will have a few more before this is all done with. Calling it 'qarkzilla.sh'. I'm releasing it further down this thread so others can start tinkering with the idea and optimize it.


3a6b7f No.1010558

>>988088

can anyone give me a FQDN, to test an archiving script? Possibly bitcoin registered outside USA (dotCOM)?

Archive.is/fo/today is censored. I've had more than a few archive links change the url they archived and the domain searching yields nothing. All in the past few months, so I know it's related to Q stuff.


2cb4f9 No.1010886

File: dd23d58107ed65d⋯.jpg (120.89 KB, 774x767, 774:767, qarkzilla.0.3.jpg)

>>1010366

Searched for days to find a way to easily archive a thread in a scriptable manner. Finally determined that wget does not have the ability to download both the thumbnail pics embedded in the page AND the larger size images. Would love to hear of someone who got it working in a one-liner, but failing that, I created a bash script that performs two wget calls and does very minor sed URL rewriting to get the local page working with minimal transformations.

This is beta code. It works, but is not optimized, and can grow to be something really cool. For example, it currently downloads a single thread at a time, but you could easily get it to download from a whole list of threads, or even autodetect/update threads, etc. Anything is possible, so this is just the core of a handy little archival utility for chans. There are others, this one's merit is super lightweight yet effective.

Calling it Qarkzilla because it ARKives Qresearch threads and is kind of like a Zilla.

The script is available at https:// github.com/subqarkanon/Qarkzilla


545696 No.1010995

Just curious, how big in Gig is the Qresearch site when you use Httrack? I am trying out the Linux version, and it seems to be taking forever.


2030ac No.1011009

>>1010886

Thanks anon!

Why is the ssh key included?

> wget does not have the ability to download both the thumbnail pics embedded in the page AND the larger size images

? wget -l 2 would grab them, like this anon >>1009947 using the -m gives a mirror copy.

I wrote a oneliner to grab only the larger images a while back, but it was probably in the early hours in the middle of an autism attack, and I can't find it in my history…Ah!! I MIGHT HAVE MADE A NOTE! HANG ON….

rm 11245726.html*; wget https://8ch.net/pol/res/11245726.html; grep -Po '(?<=href=")[^"]*' 11245726.html | grep -vE '\.html$' | sort -u | wget -nc -A jpeg,jpg,bmp,gif,png -i -

The leading rm is to remove the previously downloaded html, because this one liner was run as an update. You can use clobber options to overwrite the old file & the -N to "get if newer than current file" and leave off the rm really.

Summary of what it does:

wget -grab thread html

grep -parse html and find any download links

grep -find any of those links that don't end in html

sort -u -sort and remove duplicates

wget -grab the files from those links if they are images
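
The parsing half of that summary can be tried offline. This is only a sketch: list_image_links is a made-up name, it uses plain grep and sed rather than grep -P, and the sample HTML is invented rather than taken from a real thread page.

```shell
#!/bin/sh
# Sketch only: list_image_links is a hypothetical helper. It reads HTML on
# stdin and prints the unique href targets that look like image files.
list_image_links() {
    grep -o 'href="[^"]*"' |
        sed -e 's/^href="//' -e 's/"$//' |
        grep -Ei '\.(jpe?g|png|gif|bmp)$' |
        sort -u
}

# Tiny made-up sample standing in for a saved thread page.
cat <<'EOF' | list_image_links
<a href="/file_store/aaa.jpg">full</a> <a href="/qresearch/res/1.html">thread</a>
<a href="/file_store/aaa.jpg">dup</a> <a href="/file_store/bbb.PNG">full</a>
EOF
```

The output of list_image_links can then be fed to wget -i - the same way the one-liner does.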


2030ac No.1011225

>>1004570

Those who aren't technically versed in creating shortcuts would be the ones who gain the most benefit from viewing (not posting) hex coded URLs.

As you correctly implied ("in most cases" but not all), not all systems would allow shortcut configuration either, and the prescribed method works for every case.


faceb2 No.1012849

File: d21db1a69887903⋯.png (186.73 KB, 1366x713, 1366:713, Screenshot from 2018-04-12….png)

File: 859658d2b620366⋯.png (126.3 KB, 1293x558, 431:186, Screenshot from 2018-04-12….png)

File: 1234e8bb65c0b65⋯.png (121.51 KB, 704x651, 704:651, Screenshot from 2018-04-12….png)

I use zfs. This is how I archive:

cd /bread/qresearch/ && wget -nH -k -p -np -H -m -e robots=off -I stylesheets,static,js,file_store,file_dl,qresearch,res -U "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4" --random-wait --no-check-certificate http://8ch.net/qresearch ; snapshotname=$(date +%Y%m%d-%H%M) && zfs snapshot bread/qresearch@manual-grab-n-snap-$snapshotname

The four main parts are:

Change to the directory

And, if that succeeds, wget the board

Regardless of whether wget succeeded, set a snapshot timestamp

And, if that succeeds, create a zfs snapshot

Pics related
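
If the mix of && and ; in that one-liner is confusing, here is a sketch that keeps the same shape (A && B ; C && D) but swaps the real commands for a fake one, so it can be run without wget or zfs. The helper name step is made up.

```shell
#!/bin/sh
# Sketch only: "step" is a hypothetical stand-in for the real commands so the
# control flow can be tried without wget or zfs installed.
step() {
    echo "$1"       # say which step ran
    return "$2"     # pretend exit status: 0 = success, non-zero = failure
}

# Same shape as the one-liner: A && B ; C && D
# Here the pretend wget fails, yet the timestamp and snapshot steps still run.
step "cd" 0 && step "wget" 1 ; step "timestamp" 0 && step "snapshot" 0
```

Changing the first call to step "cd" 1 shows the other side: wget is skipped, but the part after the ; still runs.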


353534 No.1017385

>>987937

Here goes…and we'll take it from here.

Sorry for the wait. Several components were gonna need to be prerequisites and, considering that not everyone's a tech anon, I put some extra work into rolling up the extra components into the existing folder.

———————————–

CTTPortable beta v0.1

-Tested ONLY in Windows 10 and Server 2012, 64-bit, but any 64-bit windows should suffice.

-Will require running from a Command Prompt

-Instagram features only include downloading videos and pics

————————————–

INSTALLATION:

————————————–

1. Extract the compressed folder anywhere. (My example will be c:\temp\CTTPortable)

2. Navigate to a specific Twitter conversation or thread in a web browser, and copy the URL. (My example will be the latest [InTheMatrix] thread: https:// twitter.com/intheMatrixxx/status/984559849873780736 )

3. Open a command prompt window, navigating to C:\temp\CTTPortable and type

php cttURL.php https:// twitter.com/intheMatrixxx/status/984559849873780736

If all goes smoothly, this will:

1. create a subfolder roughly named after the URL

2. Place a WGETed copy of the original thread file "984559849873780736.html"

3. Download locally all youtube videos referenced within the thread

4. Download locally all pics contained within the thread

5. Download locally any external website references (html,pdf,etc)

6. Generate a SECOND copy of the Twitter Thread("984559849873780736new.html"), substituting remote links with the locally stored copies of each respective file.

I'll post a couple instructional vids later if anyone struggles with this.


353534 No.1017494

>>1017385

Size too large. Sharing from this link:

https:// drive.google.com/open?id=1flFOJafHIF9hTSybhZ4JuLv5k35nk9xx


7b3ab0 No.1017595

>>1010995

That wget command is from:

>>493884

It's a 20+ GB download of images and HTML for all of qresearch, and it takes quite a while the first time. You may not want to run it, or you can remove file_store from the list of things to get, which reduces the size greatly.


353534 No.1017666

>>1017595

I've done this before as well. It's the pics and vids that make up the bulk.


06909a No.1028019

bumping a very useful and important thread


bee361 No.1051517

bump


014740 No.1172486

bump


104106 No.1206291

The dough of every bread has "we need better tools".

This thread discusses better tools.

BO requesting sticky please


104106 No.1207179

>>1206291

Thank you BO!

Ripping an Instagram profile without login

The publicly viewable content in Instagram can be archived using Instalooter - a Python program.

https://github.com/althonos/InstaLooter

The usage manual is found at https://instalooter.readthedocs.io/en/latest/usage.html

To rip an archive for a user called "foo-bar":

python -m instalooter user foo-bar -v -d

-v option rips videos as well as images

-d option rips JSON metadata containing comments found on the Instagram posts

A helpful one-liner to use with the JSON files saved - Searching the JSON metadata for keywords of interest:

grep shortcode `grep -li pizza *.json`|sed 's/^.*"shortcode":\s"\(.*\)".*/http:\/\/www\.instagram\.com\/p\/\1/' -

This will run a case-insensitive search for "pizza" and return all URLs in the Instagram profile that contain that term in the comments.
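
Here is the same idea as a small sketch that can be tested on dummy files first. urls_for_keyword is a made-up name, and it assumes one post per .json file with a "shortcode": "..." field in it; the two sample files are invented.

```shell
#!/bin/sh
# Sketch only: urls_for_keyword is a hypothetical helper. It assumes each
# *.json file in the given directory holds one post and contains a
# "shortcode": "..." field, as the one-liner above does.
urls_for_keyword() {
    keyword=$1
    dir=$2
    # -l lists matching files only, -i ignores case.
    grep -li -- "$keyword" "$dir"/*.json 2>/dev/null |
    while IFS= read -r file; do
        # Pull out the shortcode value and turn it into a post URL.
        sed -n 's|.*"shortcode":[[:space:]]*"\([^"]*\)".*|http://www.instagram.com/p/\1|p' "$file" |
            head -n 1
    done
}

# Throwaway sample files standing in for ripped metadata.
demo=$(mktemp -d)
printf '{"shortcode": "AAA111", "text": "Pizza night"}\n' > "$demo/1.json"
printf '{"shortcode": "BBB222", "text": "nothing here"}\n' > "$demo/2.json"
urls_for_keyword pizza "$demo"
rm -rf "$demo"
```

Using [^"]* instead of .* for the shortcode keeps the match from running on to a later quote when several fields share a line.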


c305f5 No.1207335

>>1010995

I never completed an entire download. I think I got to 11 GB one time and figured something was off (I was just trying to get one thread). If I were to stick to a depth of one, it probably would've worked.

>>1017595

Knowing there's an upper limit is helpful. 20 GB isn't as bad as I would've imagined.


a41fa5 No.1214842

comms are unfolding.

-you already know


35e9de No.1224047

>>1214842

>comms are unfolding.

May your origami be beautiful and to your liking anon.

Recommendation by anons to use clipconverter.cc for making video clips.

https://www.clipconverter.cc/

Those who are more technically minded can make use of youtube-dl as described above in this thread.


3bb71a No.1241764

File: 2c3303d8fbc1b01⋯.png (1.81 MB, 2191x1951, 2191:1951, I have Mac.png)


6dde74 No.1251551

>>1251422

We need a spiritual cooking element to the whole thing. Show the entree' like one of those sick parties where someone is laying on a table.

Cocktail…blood bag?


ca2872 No.1251964

Illuminati overlord Albert Pike explained in 1871 that the Third World War will focus on the mutual destruction of the Islamic World and the Political Zionists.

Cabalists planned to initiate this war that will bring ‘complete physical, moral spiritual and economic exhaustion’ for decades. The entire populist/Trump/Q movement has arisen at this time for the sole purpose of preventing the ultimate conflict before it starts.

The catalyst that will initiate the Third World War was to be an Iranian Nuclear strike on Israel, conducted with a nuclear weapon built in Syria, containing ‘Russian’ Uranium (U1). The Iran deal will ensure immediate US involvement and the Russian aspect would be used to foment war between the EU Countries and Russia, thus dragging the whole of Western Society into a Global Conflict.

https://www.youtube.com/watch?v=9RC1Mepk_Sw

https://www.youtube.com/watch?v=yWAFvIT-NHs&feature=youtu.be

North Korea was to be used as an agent for initiating conflict in the East. A Medium-Range ICBM was provided to the regime to engage Hawaii. The ‘false alert’ on Hawaii that occurred several months ago was a test missile, launched from NK. The missile passed over Japan, causing a simultaneous ‘false alert’. This secondary nuclear attack will drag NK and China into the World War.

Patriots have installed themselves in Jordan. They will attempt to intercept the Iranian missile that would strike Israel, thereby preventing the Third World War. However, war may still be declared on the grounds of breaking the terms of the Iran deal.

Everything has been leading up to this moment, this is a crossroad in our civilisation.

SA -> NK.

NK -> Armenia.

Armenia -> Iran

Iran ->ENDGAME


c0f3a4 No.1259141

# OS X + httrack command line

Steps to install `httrack` on the OS X command line and run it. `httrack` has a lot of options to play with depending on your desired results.

# Install homebrew

https://brew.sh/

# Install httrack

`brew install httrack`

# Make a directory for your httrack 8ch files

`cd ~/Downloads`

`mkdir 8ch`

`cd 8ch`

# Run httrack

`cd ~/Downloads/8ch`

`httrack https://8ch.net/qresearch/`

# httrack update

`cd ~/Downloads/8ch`

`httrack --update`

# httrack help

`httrack --help`

`man httrack`


92310b No.1265626

Requesting modification to (and reminding people of) the search script here: pastebin.com/tM53Q6AM

The original works by searching in arcdir/qresearch/res. It only works in one folder at a time. A modification is needed to search my /bread/qresearch/.zfs/snapshot/*/qresearch/res. Note the asterisk. If I run ls /bread/qresearch/.zfs/snapshot/*/qresearch/res then I can successfully get a wall of text.

Not sure how to do so myself. Doesn't look like a simple oneliner to me. I've also thought to modify the temp dir to use the folder I run the script from for temp (since the snapshots are read-only and we can't modify hundreds of folders to have a temp folder).

This is what I get when I just change the arcdir:

# Error! No HTML-files found in "/bread/qresearch/.zfs/snapshot/*/qresearch/res"

# Please check if archivePath ("arcDir=…") is set correct.

I'd like to see if I can cause any headaches revive any dead posts.


1c8064 No.1271968

Should PROOFS/Side x Sides be posted in research threads or is there a designated thread for them?


2e8bb2 No.1276775

>>1265626

>This is what I get when a just change the arcdir:

I have no idea what line you changed, since there are several places in the script you could have done that, or what you changed it to, as your question is fairly vague.

However, I set up the necessary directory structure, and tested the script, and had no problems with it searching multiple directories.

For instance, I used a glob (i.e. * ) to pick more than one directory on line 11:

arcDir="${HOME}/../../tmp/a/*/b"

Works as expected, and searched through multiple directories.

My guess is you changed line 11 from:

arcDir="${HOME}/archive/qresearch/res"

to:

arcDir="/bread/qresearch/.zfs/snapshot/*/qresearch/res"

but the archive is in your home directory, not root directory???

If so, line 11 should be:

arcDir="${HOME}/bread/qresearch/.zfs/snapshot/*/qresearch/res"

which would specify the correct path.

Otherwise check you really do have any html files in those directories.
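
A quick way to check that last point, as a sketch: count_html_in is a made-up helper (it is not part of the pastebin script) that takes the glob as a quoted string and expands it inside, which is one way a * in a path like arcDir can end up covering many snapshot folders. The demo tree is invented.

```shell
#!/bin/sh
# Sketch only: count_html_in is a hypothetical helper, not part of the
# pastebin script. The pattern is passed quoted and expanded inside.
count_html_in() {
    pattern=$1
    total=0
    # Leaving $pattern unquoted here is deliberate: that is what lets the
    # shell expand the * into one directory per snapshot. It also means
    # paths containing spaces would be split, so this assumes there are none.
    for dir in $pattern; do
        [ -d "$dir" ] || continue
        for f in "$dir"/*.html; do
            [ -e "$f" ] && total=$((total + 1))
        done
    done
    echo "$total"
}

# Throwaway tree standing in for two snapshots.
demo=$(mktemp -d)
mkdir -p "$demo/snapshot/a/qresearch/res" "$demo/snapshot/b/qresearch/res"
touch "$demo/snapshot/a/qresearch/res/1.html" "$demo/snapshot/b/qresearch/res/2.html"
count_html_in "$demo/snapshot/*/qresearch/res"
rm -rf "$demo"
```

If this prints 0 for your real path, the problem is the path itself (or empty folders) rather than the search script.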


33eb04 No.1276833

Help. Not tech savvy. Received a message on my computer to call Apple Support "NOW" regarding security issues. The message listed my IP address and gave me a case number, which has me freaked out b/c the end of that case number is: ….-qch8nt Could it be someone from 8ch is trying to contact me?


2e8bb2 No.1276841

>>1276775

>fairly vague

..and by "vague", I mean the problem is clear ( I understand what you're trying to do, and what the end result should be), BUT the specification is vague (your description isn't specific enough to know exactly what you're doing wrong, so I'm guessing where you need to fix to get the required result.)


2e8bb2 No.1276872

>>1276833

Likely a scam.

Never bareback an imageboard, i.e. use a VPN at least.

Highly doubtful anyone would contact you, since this is an anonymous imageboard.


33eb04 No.1276914

>>1276872

Thank you. Sorry, but I'm a tard and don't even know what "bareback" is or how to use a VPN


33eb04 No.1276929

>>1276914

Also, I remember one of Q's posts saying we would be receiving a scary message, or something to that effect, so thought maybe this tied in to it somehow.


2e8bb2 No.1276987

>>1276914

>>1276929

bareback = connect from your home router without any proxy to cover what your home IP is.

VPN = virtual private network. You connect to one of a VPN company's proxy servers to get an IP in the country of your choice, so it looks like you are connecting from somewhere other than your home. (There's a lot more to it, but those are the basics.)

Also, in certain countries like China, EU, UK the government collect what websites you visit and rank their citizens accordingly.


33eb04 No.1277184

>>1276987

This grannyVet thanks you, not only for helping me, but for helping to get the truth out to the world.


ea5f76 No.1277388

>>984291

>>987600

>You guys are like geniuses.

Yes, thank you anon(s).

>>984029

>saving a person of interest's entire twitter feed

Just attempted. Account wasn't set on private, I wasn't a subscriber. HTTrack hummed along 15h, was up to 30 GB on last check, and it just finished with this error:

>15:31:39 Panic: Too many URLs, giving up..(>100000)

The site did mirror, but only to about 20 tweets.

So it was a fail. If you have any ideas HTTrackanon, I'd be grateful.

I know NSA has it all ultimately, but I suspect this twitter account has leads to high level tech elite pedo stuff on west coast. I don't want to publish to anons w/o mirror. Meantime, I'm just copying to word from time to time. -Ty


2e8bb2 No.1277646

File: 4166db75e85f367⋯.jpeg (1.36 MB, 4944x8000, 309:500, Qanon03_5of10.jpeg)

>>1277184

>This grannyVet thanks you, not only for helping me, but for helping to get the truth out to the world

You're welcome.


093d6a No.1277909

>>1276833

Just FYI, there are a lot of hoax websites that will pop up a scary-looking message telling you that you've been infected. Then, they'll give you some fake contact information, you'll contact them, and they'll try to convince you to do something that ultimately will allow them to screw you. They can be very, very tricky, but believe me: no legitimate business operates that way.

Here's how you deal with those kinds of things: close the tab in your web browser that the scary page is on. If you only have one tab open, open another and close the bad one. If that doesn't work, close your web browser (Opera, Firefox, Chrome, or whatever you use to get on the internet). If you open it again and you get the same message, you need to change your homepage to something normal, like startpage.com.

Don't ever, ever contact those people–they're lying. If you're really, really worried, take a picture of the message and the web page address, then call Apple and ask them about it.

Hope that helps, and welcome to the internet–you'll get used to it in no time :)


093d6a No.1278055

>>1277388

It sounds like you just need to change one setting–see this post:

>>988648

I think a mirror depth of "2" is enough. This is the thing–you won't be able to open the folder, click on "index.html", and get all of the posts up like you would if you did an advanced search…but if you look in the folders, you'll find one that's named the same as the person you're trying to "grab." Inside that folder will be all of the tweets–you can double-click each one and take a look. It's not pretty, but it's pretty thorough.

Also, make sure that you do an advanced search before you start mirroring, with the start date as far back as you think you'll have to go to get them all.

If a "mirror depth" of 2 doesn't get you everything you want after doing both of those things, then go ahead and try "3". But don't bother going up to "4," because it gets way out of hand.

Another note: if you didn't use a vpn, odds are that twitter is going to rate-limit you after such a large download. You'll have to go through a proxy if that's the case. Good luck!


33eb04 No.1278274

>>1277909

Thank you, as well for helping me. Again, what really spooked me was the end numbers/letters of the "case number," which was qch8nt (Q + 8chan.) You guys and gals are the BEST


ea5f76 No.1278294

>>1278055

Ty anon. lol at the large download. I should have used a VPN, but didn't. This anon has always made a clumsy operative–either sperg out on every detail or just decide fuck it and dive in headfirst.

I'll report back if I make progress.

Thanks again for your detailed help.

Much appreciation


7a6ae1 No.1278651

https://8ch.net/qresearch/res/974637.html#1278055


7a6ae1 No.1278665

https://www.youtube.com/watch?v=HzCiPjS7KeI


9a62d3 No.1282594

Different way of archiving but it can archive all media, links, etc of a site. You can run it locally if you have any skill or run it from the webrecorder.io site.


6db461 No.1284257

>>988164

>for i in `grep -E '^htt.*\.pdf$' pdflist`;do foo=$(basename $i .pdf);wget $i; pdftotext -r 300 $foo; done

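No need for basename, as the shell can handle the string parsing itself. Note that ${i%.pdf} on its own only strips the suffix and leaves the rest of the URL in place, and pdftotext wants the name of the downloaded file, so strip the path with ${i##*/} instead:

for i in `grep -E '^htt.*\.pdf$' pdflist`;do wget "$i"; pdftotext -r 300 "${i##*/}"; done

which writes the .txt next to each downloaded PDF.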

¢♄Δ⊕$
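
A short sketch of the two expansions side by side with basename, for anyone who has not used them. The URL is a made-up placeholder.

```shell
#!/bin/sh
# Sketch only: the URL is a made-up placeholder.
url='https://example.com/docs/report.pdf'

file=${url##*/}     # drop everything up to the last "/"
stem=${file%.pdf}   # drop the trailing ".pdf"

echo "$file"
echo "$stem"
basename "$url" .pdf    # same result as $stem, via an external command
echo "${url%.pdf}"      # suffix stripped, but the URL path is still there
```

The last line is the reason ${i%.pdf} alone is not a drop-in replacement for basename when the list holds full URLs.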


4981c3 No.1287062

Fireshot works for easily saving high-quality PDF renders of webpages. Can't handle really long webpages, however.


093d6a No.1289581

>>1278294

don't worry about it–I speak from experience re: the large download. And I sperg as well. You might notice some big posts where the author calls out and corrects mistakes right after putting it up–that's me a lot of times.


093d6a No.1289675

>>1278274

I would've made a double-take as well.

I notice that I get a "Q" in around half of my captcha challenges. If you figure 26 capital letters, 26 lower case, and ten digits, that makes for a one-in-sixty-two chance of getting a "Q" for each character. With six characters, the odds of getting one should be something like 9.3%. Superstition or not, it gives me comfort.
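
For anyone checking the 9.3% figure: assuming each of the six characters is drawn independently and uniformly from the 62 symbols, and counting only a capital "Q", the chance of at least one is one minus the chance of none:

```latex
P(\text{at least one Q}) = 1 - \left(\frac{61}{62}\right)^{6} \approx 1 - 0.907 = 0.093
```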

It's funny how noticing patterns can either make you an idiot or a genius. It's probably the most important predictor of both mathematical ability and paranoid schizophrenia, ha ha.


272be6 No.1290990

>>974637

now i can finally download terabytes of my interracial lesbian cuck midget tranny porn….


093d6a No.1291196

>>1290990

…is that you sucking up all the bandwidth on my server?


648419 No.1307392

Wireshark will help you see what's going where


1c64f1 No.1311116

File: 6c4f227eb73cc5d⋯.mp4 (833.97 KB, 1280x720, 16:9, hi [INSERT RACIAL SLUR HER….mp4)


6a246e No.1316720

# Twitter: Scrape / download a user's tweets on OS X

Isn't real pretty but it worked for me

Doesn't require a twitter API key

Scrapes twitter search instead

Most steps are performed in an OS X Terminal/shell

Requires basic shell experience

### Install homebrew

https://brew.sh/

### Install jq

`brew install jq`

### Install TweetScraper

https://github.com/jonbakerfish/TweetScraper

`cd ~/Downloads/`

`git clone https://github.com/jonbakerfish/TweetScraper.git`

`cd TweetScraper/`

`pip install -r requirements.txt`

### Run TweetScraper

Here, `SaRaAshcraft`, is an example twitter user name

`scrapy crawl TweetScraper -a query="from:SaRaAshcraft"`

`cd Data/tweet/`

`find . -type f -print0 | while IFS= read -r -d '' file; do jq 'select((.is_reply==false) and .is_retweet==false) | .text' "$file" ; done > ../saraashcraft-all.txt`

### Open your new text file

Use any text editor to open your new `saraashcraft-all.txt` file

`vim ../saraashcraft-all.txt`


4ae2db No.1317520

When POTUS speaks at various events he should ensure that there are strategically placed mirrors behind him that forces media to show the crowd!


228139 No.1325796

>>987600

Teleport pro

greatest tool for archiving - ever

www. tenmax.com/teleport/pro/home.htm


feb0d5 No.1330780

Once you have the data mirrored, this is awesome for smaller filesets, handles Q archive fine, with regular expressions "near" phrase, synonyms, etc. No image searching, but does filenames and content of text, html, pdf, etc.

Open Source, java, increase java memory limit if doing dozens of gigs of text/PDF.

http://docfetcher.sourceforge.net/de/index.html


ac9690 No.1334628

File: 8c93d0311a41537⋯.png (70.36 KB, 895x567, 895:567, DownThemAllFilterAnnotated.png)

File: 43e81f9ba373d3b⋯.png (38.75 KB, 585x560, 117:112, DownThemAllPreferences.png)

If you're browsing qresearch with Tor (or older Firefox), you can add DownThemAll and configure the following filter to pull down all the large size images of a given thread. (Ctrl-S will save the page, but it only captures the smaller size images. This gets the larger ones too.)

/^(?!.+\/(\d{10})-?[1-9]?\..+).+(file_dl\/.+(jpg|png|jpeg|gif)\/)/i

pics related


34c219 No.1344502

File: 9c076c8a3464010⋯.png (473.29 KB, 649x493, 649:493, ClipboardImage.png)


34c219 No.1344533

File: 8ed3cff395f0f33⋯.png (333.82 KB, 586x414, 293:207, ClipboardImage.png)


ca795e No.1434024

>>1344533

Eyes Wide Open?


e5e46d No.1485667

File: b28a3f49ed3ca06⋯.png (34.96 KB, 652x355, 652:355, SDavis re Kristol 5-20-18.PNG)


85358a No.1497828

File: ab14a2cbaddabbc⋯.png (229.82 KB, 567x206, 567:206, ClipboardImage.png)

File: a0b7836d8767318⋯.png (1.39 MB, 1024x744, 128:93, ClipboardImage.png)

File: 9b328bc7006d135⋯.png (1.36 MB, 1045x1200, 209:240, ClipboardImage.png)

File: 2df5d32fc9656e3⋯.png (416.65 KB, 705x623, 705:623, ClipboardImage.png)


85358a No.1497913

File: 198dd2c107686ff⋯.png (877.36 KB, 1200x675, 16:9, ClipboardImage.png)

File: 527d853c793cba1⋯.png (1.12 MB, 1200x675, 16:9, ClipboardImage.png)

File: 297336b29ccbef8⋯.png (662.38 KB, 1200x675, 16:9, ClipboardImage.png)


85358a No.1497942

https://www.youtube.com/embed/tpH5L8zCtSk


85358a No.1498246

File: 9deaf6ad9d4f385⋯.png (1.24 MB, 960x642, 160:107, ClipboardImage.png)

File: b799102008fd914⋯.png (947.4 KB, 640x904, 80:113, ClipboardImage.png)

File: ec15a43c2c2d49c⋯.png (1.4 MB, 1200x675, 16:9, ClipboardImage.png)

File: 71fd4834ef7ab11⋯.png (429.79 KB, 750x873, 250:291, ClipboardImage.png)


85358a No.1498284

File: 149df5ba8799f77⋯.png (499.52 KB, 750x863, 750:863, ClipboardImage.png)


85358a No.1498327

File: 4a4d26987c756b2⋯.png (823.41 KB, 750x864, 125:144, ClipboardImage.png)

File: 28074f1fb64d745⋯.png (544.04 KB, 750x859, 750:859, ClipboardImage.png)

File: 68013b41642147e⋯.png (626.7 KB, 750x864, 125:144, ClipboardImage.png)

File: cb48e5005035421⋯.png (650.77 KB, 750x871, 750:871, ClipboardImage.png)


85358a No.1498375

File: c8759b28a769447⋯.png (621.61 KB, 750x866, 375:433, ClipboardImage.png)

File: b132c6501996fee⋯.png (1.04 MB, 1200x675, 16:9, ClipboardImage.png)

File: 8f2e8121506d0ac⋯.png (447.4 KB, 599x495, 599:495, ClipboardImage.png)

File: 25abbfb1cb1527e⋯.png (758.98 KB, 960x540, 16:9, ClipboardImage.png)


85358a No.1498428

File: 10e7f78f8d7d3f4⋯.png (359.69 KB, 600x429, 200:143, ClipboardImage.png)

File: 3e7bbde27a40352⋯.png (493.65 KB, 666x500, 333:250, ClipboardImage.png)

File: df18af02befbdda⋯.png (453.87 KB, 642x528, 107:88, ClipboardImage.png)

File: 50b81b191c4b029⋯.png (467.15 KB, 987x552, 329:184, ClipboardImage.png)


85358a No.1498641

File: c398d4df33b431b⋯.png (225.04 KB, 552x386, 276:193, ClipboardImage.png)

File: aef8f0a10fde3a1⋯.png (374.48 KB, 1080x695, 216:139, ClipboardImage.png)

File: 802f362b12eeeb0⋯.png (816.22 KB, 892x892, 1:1, ClipboardImage.png)

File: 5ac99e3ff736d09⋯.png (650.55 KB, 750x773, 750:773, ClipboardImage.png)

File: 073d456d5d4d0a0⋯.png (1.2 MB, 960x897, 320:299, ClipboardImage.png)


85358a No.1498686

File: ef8202ac7cc2046⋯.png (392.49 KB, 610x457, 610:457, ClipboardImage.png)

File: 9ae500461da8562⋯.png (545.72 KB, 750x870, 25:29, ClipboardImage.png)

File: 14cc9fbf7f5c4ad⋯.png (570.85 KB, 750x875, 6:7, ClipboardImage.png)

File: e699379e767d3bb⋯.png (951.82 KB, 750x856, 375:428, ClipboardImage.png)


85358a No.1498919

File: 66da7b51f1a10b2⋯.png (316.61 KB, 600x339, 200:113, ClipboardImage.png)

Barack & Michelle Obama just signed a multi-year contract with Netflix.

Some titles for their new shows have already been released.

- House of Race Cards

- Orange is the New Barack

- 13 Reasons Why I Was Indicted

- Stranger Things Than Michelle

- Better Call Saul Alinsky


36a986 No.1504090

Kek! These are some great titles, Anon… thanks for the morning laugh.

Related: I cancelled Netflix because they went full-retard with their Soros-level programming. Namely the show that about how shitty white people are. When I heard Obummer and Rice had partnered with the company, that was it for me.

Unrelated: I can't stop seeing Biden in this pic as Simple Jack.


7f80cb No.1505178

>>1344533

>>1344502

>>1434024

>>1485667

>>1497828

>>1497913

>>1498246

>>1498284

>>1498327

>>1498375

>>1498428

>>1498641

>>1498686

>>1498919

>>1504090

WRONG THREAD. USE THE CORRECT ONE

This thread is for contributing methods to archiving evidence.


ffc98a No.1505538

>>1505178

This is what happens when bad programmers dont know how to program for humans…and think all humans will understand your code structure.

its called updating your site beyond 2002 style.

Thank you

– The autists and The anons and other organized diggers who have moved beyond 2002.


7f80cb No.1505995

>>1505538

>its called updating your site beyond 2002 style.

>it's not my fault I didn't read what page I was on, it's someone else's fault.

No, it's called "reading the page you're on and posting accordingly - not being lazy and blaming someone else."

>updating your site beyond 2002 style.

This site can have ANY STYLE YOU CHOOSE!

You can write your OWN style/theme and use it under the options mention. So don't blame "2002 style" for your own laziness for 1) not finding this out by reading the FAQ, and 2) not writing your own style.

sage for off-topic


7f80cb No.1506006

>>1505995

>mention

*menu


093d6a No.1531142

File: ba9b1572c17ba08⋯.png (16.16 KB, 511x431, 511:431, Capture.PNG)

>>974637

Just wanted to drop off some good settings for getting one thread from 8ch.net, separate from the entire board.

In the options, under the Limits tab, set "Maximum Mirroring Depth" to "2" and "Maximum External Depth" to "1". Under the "Scan Rules" tab, I usually select the first box (for graphics), but if you really want to make a complete copy, select the other ones as well. I believe it will download every movie that's embedded on the page…so be prepared for a long download.

Setting the "Maximum Mirroring Depth" to "2" will get you not just the thumbnails, but also the larger images you see when clicking on thumbnails. Also, bear in mind that even if you don't download the videos, the links to the videos will still be there…so you will still be able to watch them so long as they're still being hosted elsewhere. But if they're taken down, not so much.
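
For anyone who prefers the command line over the GUI, here is a rough sketch of the same settings as a small Python helper that builds an httrack command. It assumes the GUI's "Maximum Mirroring Depth" and "Maximum External Depth" correspond to httrack's -r and -%e options, and that the graphics scan rule can be approximated with "+*.ext" filters. The function names, the thread URL, and the output folder are placeholders made up for illustration, not something from this guide.

```python
import shlex
import subprocess


def build_httrack_command(thread_url, output_dir,
                          mirror_depth=2, external_depth=1,
                          include_images=True):
    """Build an httrack argument list for mirroring a single thread.

    Assumption: -rN sets the mirror depth and -%eN sets the external
    links depth, matching the two fields under the GUI's Limits tab.
    """
    cmd = [
        "httrack", thread_url,
        "-O", output_dir,          # where the mirror is written
        f"-r{mirror_depth}",       # depth 2: thread page plus full-size images
        f"-%e{external_depth}",    # follow external links one level
    ]
    if include_images:
        # Rough stand-in for ticking the first (graphics) scan-rule box.
        cmd += ["+*.png", "+*.gif", "+*.jpg", "+*.jpeg"]
    return cmd


def run_mirror(cmd):
    """Run the command; requires httrack to be installed and on PATH."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URL and folder; swap in the thread you want to save.
    example = build_httrack_command(
        "https://8ch.net/qresearch/res/974637.html", "qresearch_974637")
    # Only print the command here so nothing is downloaded by accident.
    print(shlex.join(example))
```

Running the file only prints the command so you can check it first. To actually start the download, pass the list to run_mirror. To grab videos as well, you would add more filters (for example "+*.webm" and "+*.mp4"), which will make the download much longer.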


3b4171 No.1531253

>>1251964

The Hawaii missile alert was at 8 am on Saturday. The Japan missile alert was on Tuesday. They were days apart, not simultaneous.



