bug/request functionality to bypass or work around rate limiting 429 errors #184

Closed

stubkan opened this issue Apr 2, 2024 · 3 comments

@stubkan

stubkan commented Apr 2, 2024

  • Ripme version: 2.1.9-7 (latest release)

  • Java version: openjdk 17.0.10

  • Operating system: Ubuntu 22.04

  • Exact URL you were trying to rip when the problem occurred: thebarchive multiple threads

  • Please include any additional information about how to reproduce the problem:

Expected Behavior

Ripping should take anti-ripping rate limiting into account: let the user adjust the download rate to prevent 429 errors, notify the user when files are not downloaded because of a 429 rate-limit block, and allow a URL to be re-downloaded to fetch the files that were missed due to the rate limit.

Actual Behavior

None of these three things seem to happen. Rate limiting and 429 errors are common when ripping sites, so I think a good ripping tool should have functions to prevent or work around them.

[screenshot]

Downloading URLs from 4chan's thebarchive gets rate limited quickly, so on average only 2 out of 3 pictures are fetched when ripping a dozen threads.

The rate-limited files are declared 'unretrievable' even though a simple wait or retry would actually work fine; I'm not sure why they are declared unretrievable. They are also tagged as completed in the final result list, even though they were never downloaded.
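For illustration, a minimal Java sketch of what "wait and retry on 429" could look like. This is not RipMe's actual downloader code; the class and method names are hypothetical, and the Retry-After handling only covers the numeric form of the header for brevity:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: a 429 response is retried after a wait instead of
// being marked unretrievable.
public class RetryOn429 {
    static byte[] fetchWithRetry(HttpClient client, URI uri, int maxRetries) throws Exception {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            HttpResponse<byte[]> resp = client.send(
                    HttpRequest.newBuilder(uri).GET().build(),
                    HttpResponse.BodyHandlers.ofByteArray());
            if (resp.statusCode() == 429) {
                // Honour the server's Retry-After header if present (numeric
                // seconds only), otherwise back off for 5 seconds.
                long waitSeconds = resp.headers().firstValue("Retry-After")
                        .map(Long::parseLong).orElse(5L);
                Thread.sleep(waitSeconds * 1000);
                continue;
            }
            return resp.body();
        }
        throw new RuntimeException("Still rate limited after " + maxRetries + " retries: " + uri);
    }
}
```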

After the scrape is over, all files blocked by 429 rate limiting are placed in the 'completed' list of the history together with the files that actually downloaded, so it reports, for example, that 150 files succeeded. Unless you check the log or have debug mode on, you will think everything downloaded properly when it didn't.

If you try to fix this by check-marking the threads and clicking the re-download button, it ignores all the files missed due to 429 errors and gives you no option to re-download them, because the log says "Already downloaded" when they were not.

[screenshot]

In the configuration, the only option for reducing 429 rate-limit errors is to reduce the thread count to 1, but this is not enough. I suggest adding a delay between each download, similar to gallery-dl's --sleep or --sleep-request options; this would let users avoid 429 rate-limit errors. A sketch of what that could look like is below.
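A rough sketch of the requested behaviour, assuming a hypothetical configuration value read into the constructor (the setting name and class are illustrative, not an existing RipMe option):

```java
import java.util.List;

// Hypothetical sketch of a configurable pause between downloads, similar to
// gallery-dl's --sleep.
public class ThrottledDownloader {
    private final long sleepBetweenRequestsMs;

    public ThrottledDownloader(long sleepBetweenRequestsMs) {
        this.sleepBetweenRequestsMs = sleepBetweenRequestsMs;
    }

    public void downloadAll(List<String> urls) throws InterruptedException {
        for (String url : urls) {
            download(url);                        // existing per-file download logic
            Thread.sleep(sleepBetweenRequestsMs); // stay under the site's rate limit
        }
    }

    private void download(String url) {
        // placeholder for the actual per-file download call
        System.out.println("downloading " + url);
    }
}
```

Even a fixed delay of a second or two per request would keep a single-threaded rip under most sites' limits, while still being faster than manually re-ripping threads to pick up missed files.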

[screenshot]

Also, the retry option is set to 10 retries, but no retries actually seem to happen, so I am unsure whether that option works at all.

@soloturn

soloturn commented Apr 6, 2024

Not that I'd expect a big change, but would you mind trying with the latest release, 2.1.9?

@stubkan
Author

stubkan commented Apr 6, 2024

Oh, my bad. I am using 2.1.9-7, the latest release. I misread it as 2.1.7 (you can confirm by looking at the green version text in my last screenshot).

@stubkan stubkan changed the title from "bug/request functionality to bypass or work around rate limiting 429 errors when getting urls is absent" to "bug/request functionality to bypass or work around rate limiting 429 errors" Apr 6, 2024
@soloturn

soloturn commented Jan 4, 2025

Closing in favor of RipMeApp#2049; we moved back to ripme.

@soloturn soloturn closed this as completed Jan 4, 2025