/prog/ - Programming

Programming board



File: 1453177111172.png (94.86 KB, 337x450, 337:450, 403.png)

8d5b74 No.3852

I wrote a Python script to download images from Tinyboard/vichan imageboards.

It works on every imageboard I've tried except 8ch, which returns a 403 Forbidden error. I tried changing my User-Agent within the script (perhaps unsuccessfully), but I still get a 403. What gives?


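#!/usr/bin/env python3
"""Download every image from a Tinyboard/vichan thread."""

import argparse
import os
import urllib.parse
import urllib.request

import bs4

parser = argparse.ArgumentParser()
parser.add_argument("url", help="Link to thread")
parser.add_argument("-d", help="Directory to download to")
args = parser.parse_args()

# Create the target directory if needed and download into it.
if args.d:
    os.makedirs(args.d, exist_ok=True)
    os.chdir(args.d)

# Fetch and parse the thread page (an explicit parser avoids a bs4 warning).
soup = bs4.BeautifulSoup(urllib.request.urlopen(args.url), "html.parser")

# Rebuild "scheme://host" from the thread URL to prefix each image href.
domain = urllib.parse.urlparse(args.url).netloc
http = urllib.parse.urlparse(args.url).scheme + "://"

# Assumes the element right after each <p class="fileinfo"> is the <a>
# that links to the full-size image.
for link in soup.find_all("p", class_="fileinfo"):
    image = http + domain + link.next_sibling.get("href")
    filename = image.rsplit("/", 1)[1]
    # Skip files already downloaded on a previous run.
    if not os.path.exists(filename):
        urllib.request.urlretrieve(image, filename)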
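
If the 403 really is triggered by urllib's default "Python-urllib/3.x" User-Agent (an assumption; nothing in the thread confirms it), one likely reason a change doesn't "take" is that it only reaches one of the two calls. A minimal sketch: install a global opener whose headers replace the default, so both urlopen() and urlretrieve() in the script above send it. The USER_AGENT value and the install_user_agent() helper are hypothetical.

```python
import urllib.request

# Hypothetical browser-like User-Agent; substitute any value you like.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0"


def install_user_agent(user_agent: str) -> urllib.request.OpenerDirector:
    """Install a global opener that sends `user_agent` on every request.

    Both urllib.request.urlopen() and urllib.request.urlretrieve() go
    through the globally installed opener, so this covers the thread
    fetch and the image downloads without changing either call.
    """
    opener = urllib.request.build_opener()
    # Replace the default [("User-agent", "Python-urllib/3.x")] entry.
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener
```

Calling install_user_agent(USER_AGENT) right after parse_args(), before the thread page is fetched, should be enough; no other line needs to change. If the server blocks on something other than the User-Agent, this won't help.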

8d5b74 No.3853

Note that I accidentally left out the closing parenthesis on the very last line.

- urllib.request.urlretrieve(image, filename

+ urllib.request.urlretrieve(image, filename)


701bd5 No.3882

>>3852

If I had to guess,

>image = http + domain + link.next_sibling.get("href")

On 8chan, the href of that link is the absolute URL https://media.8ch.net/prog/src/1453177111172.png, so prepending http + domain doesn't produce a valid URL. Look at it in a page inspector.

I think it's because hotwheels has some weird hacky shit with the servers due to bui and/or site growth.
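
If the absolute href is the culprit, a more robust approach is to let urllib.parse.urljoin() resolve the href against the thread URL instead of gluing strings together: an absolute href like the media.8ch.net one comes back unchanged, while a root-relative /prog/src/... href gets the thread's scheme and host. A sketch; image_url(), local_filename(), and the thread URL in the example are hypothetical.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def image_url(thread_url: str, href: str) -> str:
    """Resolve an image link's href against the thread URL.

    urljoin() returns an absolute href (e.g. https://media.8ch.net/...)
    unchanged and resolves a root-relative one (e.g. /prog/src/...)
    against the thread's scheme and host.
    """
    return urljoin(thread_url, href)


def local_filename(url: str) -> str:
    """Use the last path component as the file name, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


if __name__ == "__main__":
    # Hypothetical thread URL, for illustration only.
    thread = "https://8ch.net/prog/res/3852.html"
    for href in ("https://media.8ch.net/prog/src/1453177111172.png",
                 "/prog/src/1453177111172.png"):
        url = image_url(thread, href)
        print(url, local_filename(url))
```

In the original loop this amounts to replacing the `image = http + domain + ...` line with `image = urljoin(args.url, link.next_sibling.get("href"))`. It only fixes URL construction; if the 403 comes from fetching the thread page itself, it won't help on its own.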

You might have figured this out by yourself already, since this was a while ago.

I might try writing my own version in JavaScript or Bash specifically for 8chan. I'll probably post it here if I do.

