Incomplete Twitter Rips #65

Open
receptakill opened this issue Jan 28, 2022 · 1 comment

Comments


receptakill commented Jan 28, 2022

  • Ripme version: 2.0.4-13-03e32cb7
  • Java version: v8u321
  • Operating system: win7 x64
  • Exact URL you were trying to rip when the problem occurred: https://twitter.com/hexacult_beast/media (CW: lewd 18+ furry content)
  • Please include any additional information about how to reproduce the problem:

Expected Behavior

Expected a clean rip of the media posts of the account from top to bottom.

Actual Behavior

RipMe grabbed the first 64 media posts, then stopped and reported 'Rip Complete'. There were no errors in the log. Repeating the rip just iterates over the same items and stops again at the same spot, even though the account has more content beyond that point.

I repeated this several times, then went away for a while. When I came back and tried again, it nabbed 103 images (inclusive of those it had grabbed before) and then quit prematurely again. Repeating the rip then had it stop at this number over and over as well. I'm not sure what changed between the first set of attempts and the second.

I tried several other Twitter accounts and they ripped fine from start to finish. I'm not sure what's special about this one. The tweets are not private.

I tried ripping with URL History unchecked, and there's no problem with that. I'm using the default Twitter auth; I tried using my own API key to see if it would make a difference, but I'm dumb and couldn't get that to work. [UPDATE] I was finally able to generate my own v1.1 API key, and this did not change the ripper's behavior at all. So I doubt it's a rate limit problem or anything else related to a shared API key.

I notice that towards the end of the rip, it grabs fewer and fewer items between 'Downloading next page' entries in the log. At the very end, there are just several 'Downloading next page' lines without any image grabs at all, even though this account is basically all self-posted media from top to bottom.

@receptakill (Author)

OK, revisiting this 3 months later with the latest version (2.1.2-3) and Java 18, the behavior for Twitter rips appears unchanged. For some Twitter accounts with a large number of tweets (including RTs and non-media tweets), there appears to be an arbitrary point beyond which the ripper simply fails to fetch any more tweets. Testing various accounts, it seems like RipMe taps out consistently at retrieval 17, at 200 tweets per retrieval, so it effectively quits after approx. 3500-4000 tweets have been processed.

Here is another account prolific enough to experience the problem: https://twitter.com/0zmnds/media (SFW). I attached a RipMe log for it. You can check the oldest image downloaded at the bottom of the log; it only dates back approx. two months, and this account has been posting images for years.

Is this a rate limit issue? The logger throws no errors related to it; it simply concludes the rip as if nothing is wrong. Futzing with download threads, twitter.max_requests, twitter.rip_retweets, and even twitter.max_items_requests does not change the behavior.

If it's a rate limit issue, can the ripper be updated to complete a full Twitter rip across several sessions? Or, barring that, could users be allowed to specify a starting status ID in rip.properties to crawl back from, so they can work through old tweets manually?
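To make the last suggestion concrete, here is a minimal Java sketch of crawling backwards through a timeline with a max_id-style cursor, optionally starting from a user-supplied status ID. Everything in it is hypothetical: the class and method names, the `startStatusId` parameter (which would correspond to a rip.properties key that does not exist today), and the in-memory list of IDs standing in for the Twitter API. It is not RipMe's actual ripper code. It also shows how a crawl that hits a request cap or an empty page simply stops without any error, which matches the silent 'Rip Complete' described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not RipMe code): page backwards through a timeline
 * using a max_id cursor, optionally starting from a user-supplied status ID.
 * An in-memory list of IDs stands in for the real API.
 */
public class TimelineCrawlSketch {

    // Stand-in for one API page: up to `count` IDs <= maxId, newest first.
    static List<Long> fetchPage(List<Long> timelineNewestFirst, long maxId, int count) {
        return timelineNewestFirst.stream()
                .filter(id -> id <= maxId)
                .limit(count)
                .collect(Collectors.toList());
    }

    /**
     * Crawls from startStatusId (or from the newest tweet if null) towards the oldest.
     * Stops after maxRequests pages or on the first empty page; neither case raises
     * an error, so a truncated crawl looks the same as a finished one.
     */
    static List<Long> crawl(List<Long> timeline, Long startStatusId, int pageSize, int maxRequests) {
        List<Long> seen = new ArrayList<>();
        long maxId = (startStatusId != null) ? startStatusId : Long.MAX_VALUE;
        for (int request = 0; request < maxRequests; request++) {
            List<Long> page = fetchPage(timeline, maxId, pageSize);
            if (page.isEmpty()) {
                break; // nothing older returned: treated as "Rip Complete"
            }
            seen.addAll(page);
            // Next page: everything strictly older than the oldest ID seen so far.
            maxId = page.get(page.size() - 1) - 1;
        }
        return seen;
    }

    public static void main(String[] args) {
        // Simulated timeline: IDs 10 (newest) down to 1 (oldest).
        List<Long> timeline = new ArrayList<>();
        for (long id = 10; id >= 1; id--) {
            timeline.add(id);
        }
        System.out.println(crawl(timeline, null, 3, 100)); // [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
        System.out.println(crawl(timeline, null, 3, 2));   // [10, 9, 8, 7, 6, 5]
        System.out.println(crawl(timeline, 4L, 3, 100));   // [4, 3, 2, 1]
    }
}
```

With a page size of 3 and a cap of 2 requests, the crawl stops after 6 tweets, the same shape as stopping after 17 retrievals of 200 tweets. A second session started with `startStatusId` set to 4 (one less than the oldest ID fetched) picks up the remaining older tweets.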

ripme.log.0zmnds.txt
