panicked at 'internal error: entered unreachable code: received unknown error (timeout) #120

Closed
leaty opened this issue Jan 15, 2021 · 8 comments · Fixed by #126
Contributor

leaty commented Jan 15, 2021

When crawling a website, I get this when it happens upon a certain page:

thread 'tokio-runtime-worker' panicked at 
'internal error: entered unreachable code: received unknown error (timeout) for INTERNAL_SERVER_ERROR status code',
/home/spooder/.cargo/registry/src/github.com-1ecc6299db9ec823/fantoccini-0.15.0/src/session.rs:806:34

It also seems geckodriver dies at this point, as I'll get the following on the next pages.

webdriver connection lost: WebDriver session was closed while waiting
webdriver connection lost: WebDriver session has been closed
webdriver connection lost: WebDriver session has been closed
// etc

Is there a way to circumvent this error? Anything I could do about it?

Contributor Author

leaty commented Jan 15, 2021

Also, to be clear: it doesn't matter if I'm unable to scrape that specific page. I just want to keep geckodriver from dying.

Owner

jonhoo commented Jan 23, 2021

Huh, that's interesting. The webdriver spec does say that "timeout" is a valid error code, specifically with the meaning:

An operation did not complete before its timeout expired.

What operation were you trying to do when this error occurred?
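To illustrate the point about "timeout" being a valid error code, here is a minimal sketch of mapping the `error` string of a WebDriver error response to a typed value, so that an unexpected code becomes an ordinary error rather than a panic. This is not fantoccini's actual code: the `WebDriverErrorKind` enum and its methods are hypothetical names, and only a handful of the spec's error codes are modelled.

```rust
/// A small, illustrative subset of WebDriver error codes (hypothetical type,
/// not part of fantoccini).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WebDriverErrorKind {
    NoSuchElement,
    ScriptTimeout,
    Timeout,
    UnknownError,
    /// Any code this sketch does not model; the raw string is preserved.
    Other(String),
}

impl WebDriverErrorKind {
    /// Map the `error` string of an error response to a kind.
    /// Unrecognized codes fall through to `Other` instead of panicking.
    pub fn from_code(code: &str) -> Self {
        match code {
            "no such element" => Self::NoSuchElement,
            "script timeout" => Self::ScriptTimeout,
            "timeout" => Self::Timeout,
            "unknown error" => Self::UnknownError,
            other => Self::Other(other.to_string()),
        }
    }

    /// True for the codes that mean an operation ran past its time limit.
    pub fn is_timeout(&self) -> bool {
        matches!(self, Self::Timeout | Self::ScriptTimeout)
    }
}
```

With a mapping like this, a crawler could check `is_timeout()` on the returned error and move on to the next page.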

Contributor Author

leaty commented Jan 23, 2021

Sorry, at this time I don't know exactly which operation causes it, but these are the only ones I use:

client.goto(url).await?;
client.find_all(Locator::Css("a")).await?;

// Then for each <a> tag
link.attr("href").await?;

Owner

jonhoo commented Jan 27, 2021

The error suggests to me that it's the browser window that basically ends up hanging. What do you see in the window?

Contributor Author

leaty commented Jan 27, 2021

Interesting thought, I'll try running it non-headless.

Contributor Author

leaty commented Feb 25, 2021

Hello! I got some time for this again, very sorry for the late update.

So apparently, when running it non-headless, I saw a download window pop up asking me to save something somewhere. After this happens, it just sits there, and eventually fantoccini times out the connection to the webdriver. The webdriver, however, stays alive and well until my crawler reaches the finish line.

Thus the "error" is clearly not related to fantoccini; it just waits until it times out because the webdriver did not respond in time. But if at all possible, I'd be very happy to hear some ideas on how one could circumvent this.

Could you for example:

  1. Disable all downloads entirely? Limiting certain links is impossible, since any one of them could redirect to a download. This is obviously related to the webdriver itself, though, and not fantoccini.
  2. Instruct fantoccini to tell the webdriver to cancel the previous action after a given amount of time?

Thanks in advance!
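On the first question, one approach that may help with Firefox is to set preferences through the `moz:firefoxOptions` capability so that certain downloads are saved without a dialog. The sketch below only builds the capability as JSON text using the standard library; the function names are hypothetical, and in a real project a JSON library would be the more natural choice. Note the limitation: `browser.helperApps.neverAsk.saveToDisk` takes an explicit list of MIME types, so this suppresses the prompt only for the types listed and does not disable downloads as such.

```rust
/// Escape the two characters that would break a JSON string literal here.
fn json_escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Build a capabilities object (as JSON text) whose Firefox preferences save
/// the listed MIME types into `download_dir` without showing a dialog.
/// Hypothetical helper for illustration only.
pub fn firefox_download_caps(download_dir: &str, mime_types: &[&str]) -> String {
    format!(
        concat!(
            r#"{{"moz:firefoxOptions":{{"prefs":{{"#,
            // 2 = use the directory given in browser.download.dir
            r#""browser.download.folderList":2,"#,
            r#""browser.download.dir":"{dir}","#,
            r#""browser.helperApps.neverAsk.saveToDisk":"{mimes}""#,
            r#"}}}}}}"#
        ),
        dir = json_escape(download_dir),
        mimes = json_escape(&mime_types.join(",")),
    )
}
```

The resulting text would still need to be parsed into whatever capabilities type the WebDriver client expects when creating the session.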

Contributor Author

leaty commented Feb 25, 2021

After looking around a bit, I've seen no clear solution for disabling it. In fact, it gets worse: apparently this would happen with any sort of browser prompt, e.g. push notifications, downloads, printing, HTTP Auth and so on. Not all of these can (from what I've found) be disabled, so whenever any of these prompts appears, fantoccini will wait for a response and remain stuck there until it decides to time out the connection.

My ideas:

  1. If fantoccini could simply return an error after a timeout instead of killing the connection, perhaps a new .goto() on a different link would cancel these dialogs. Regardless, I'll at least attempt a full reconnect and try a .goto() with a different link when this problem occurs. I'm hopeful that the dialog vanishes and it continues on its merry way.

  2. Since pressing e.g. ESC manually gets rid of the prompts I've tried so far, doing that programmatically could be an option, but I don't know the extent of the webdriver API or whether it offers any such control.
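The first idea above, getting an error back after a time limit instead of losing the connection, can be sketched on the caller's side. The version below uses only the standard library (a worker thread and a channel), which is a stand-in technique and not how fantoccini works; in async code, `tokio::time::timeout` wrapped around the call would be the closer equivalent. The names `run_with_deadline` and `DeadlineError` are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Why no result was produced within the limit.
#[derive(Debug, PartialEq, Eq)]
pub enum DeadlineError {
    /// The operation was still running when the limit expired.
    TimedOut,
    /// The worker thread ended without sending a result (e.g. it panicked).
    WorkerDied,
}

/// Run `op` on a separate thread and wait at most `limit` for its result.
/// On timeout the caller gets an error back and can carry on; the worker
/// thread is NOT cancelled and keeps running in the background.
pub fn run_with_deadline<T, F>(limit: Duration, op: F) -> Result<T, DeadlineError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out; ignore that.
        let _ = tx.send(op());
    });
    rx.recv_timeout(limit).map_err(|e| match e {
        mpsc::RecvTimeoutError::Timeout => DeadlineError::TimedOut,
        mpsc::RecvTimeoutError::Disconnected => DeadlineError::WorkerDied,
    })
}
```

As with the dialogs described above, a caller-side limit only stops the waiting. Whatever is blocking the browser stays in place until something dismisses it.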

Contributor Author

leaty commented Feb 25, 2021

Okay, I've narrowed down which timeout causes fantoccini to drop it. It's the pageLoad timeout, which by default is 5 minutes. I've now set it to 5 seconds for testing, and I'm getting the same error after those 5 seconds. Unless fantoccini is really just getting thrown out after that timeout hits, it might be a bug. The geckodriver debug is clean.

If it is a bug, would it be possible to make fantoccini simply return the error without destroying the connection? As far as I can tell, geckodriver and the session within it are still running. I can't be sure the session is still usable, though; I'm only assuming (well, hoping) that a .goto() afterwards would invalidate the previous request. I've been unable to test this, as I can't reconnect to the same session, but I saw #100, which I might try.
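For reference, the pageLoad timeout discussed above is one of three session timeouts in the WebDriver spec, and the 5-minute default corresponds to 300,000 ms. The sketch below models them with a hypothetical `Timeouts` struct and renders the standard `timeouts` capability as JSON text; how that capability is handed to the client when the session is created is left out, since it depends on the client version.

```rust
/// The three WebDriver session timeouts, in milliseconds
/// (hypothetical struct for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Timeouts {
    pub script_ms: u64,
    pub page_load_ms: u64,
    pub implicit_ms: u64,
}

impl Default for Timeouts {
    /// Defaults as given in the WebDriver spec: 30 s script,
    /// 300 s (5 minutes) page load, 0 implicit wait.
    fn default() -> Self {
        Self {
            script_ms: 30_000,
            page_load_ms: 300_000,
            implicit_ms: 0,
        }
    }
}

impl Timeouts {
    /// Render the `timeouts` capability as JSON text.
    pub fn to_capability_json(&self) -> String {
        format!(
            r#"{{"timeouts":{{"script":{},"pageLoad":{},"implicit":{}}}}}"#,
            self.script_ms, self.page_load_ms, self.implicit_ms
        )
    }
}
```

Lowering `page_load_ms` to 5,000, as in the test described above, makes a stuck page fail fast instead of after five minutes.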
