Aborted with sctp_timeout_handler: tmr->self corrupted, but tmr->self is null in the core dump #673
Comments
This has now happened three more times, with the same symptoms. Please let me know any further information you need.
How often does it happen? Do you have a way to reproduce this?
Sadly, I don't have a way to reproduce it reliably. It happens once every few days across our fleet of meet.jit.si production servers. I have the core dumps so if there's any other information I can share that would be useful let me know. See also #676 which is a rarer crash (I've only seen it once so far) but I suspect has the same root cause, and may be more revealing?
I've finally managed to extract a Java heap dump corresponding to this core dump, which lets me correlate my user-level objects and logs with the usrsctp objects. (For most of the crashes I'm running into a Java bug which is preventing this heap dump from being created.) In this case it appears that the socket with the crashing timer received an SCTP packet just under 200 ms before the crash. Five other SCTP sockets received a packet in the interval between that packet receipt and the crash. The crashing timer appears to be an SCTP_TIMER_TYPE_RECV timer.
That is the delayed ACK timer, which normally expires after 200 ms. I'm looking at a timer-related problem where a socket is freed twice. Once I have a fix committed, you could try it to see whether it also fixes your issue. I'll let you know once I have solved the issue.
Any news on this?
Hi - I'm debugging an assert crash inside the usrsctp library.
What I see is that it printed the debug message
sctp_timeout_handler: tmr->self corrupted
from netinet/sctputil.c:1820 just before it aborted, but when I look at the core dump, I see this value correctly as NULL:
So I assume this must be a race condition of some sort where the value of tmr->self isn't properly protected.
I don't see any other threads running inside usrsctp in my core dump, but from my logs, it appears that usrsctp_conninput was called immediately before the crash (within the same millisecond).
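To illustrate the kind of race suspected here, the following is a minimal sketch and not usrsctp code: `struct demo_timer` and the `demo_*` functions are hypothetical names made up for this example. It assumes a timer whose `self` field points at the timer while it is armed and is cleared when it is stopped. If a handler read `self` without synchronization while another thread was stopping the timer, it could see a value that fails the sanity check, while the field is already NULL by the time a core dump is written. Reading the field under the same lock that the stop path holds turns that situation into an ordinary "timer already stopped" case.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer record for illustration; NOT usrsctp's struct sctp_timer. */
struct demo_timer {
    pthread_mutex_t lock;     /* guards self */
    struct demo_timer *self;  /* == this timer while armed, NULL once stopped */
};

void demo_timer_init(struct demo_timer *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->self = NULL;
}

void demo_timer_start(struct demo_timer *t)
{
    pthread_mutex_lock(&t->lock);
    t->self = t;              /* mark the timer as armed */
    pthread_mutex_unlock(&t->lock);
}

void demo_timer_stop(struct demo_timer *t)
{
    pthread_mutex_lock(&t->lock);
    t->self = NULL;           /* mark the timer as stopped */
    pthread_mutex_unlock(&t->lock);
}

/*
 * Called when the timer expires. Returns true if the timer was still armed
 * (the caller should do the timeout work), false if it had been stopped.
 * The check and the one-shot disarm happen under the lock, so a concurrent
 * demo_timer_stop() cannot change self between the two.
 */
bool demo_timeout_handler(struct demo_timer *t)
{
    bool armed;

    pthread_mutex_lock(&t->lock);
    armed = (t->self == t);
    if (armed) {
        t->self = NULL;       /* one-shot: disarm before doing the work */
    }
    pthread_mutex_unlock(&t->lock);
    return armed;
}
```

With this shape, a stop that wins the race makes the handler return false instead of tripping an assertion. Whether the real crash follows this pattern is an assumption; the core dump only shows that the field was NULL after the abort.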
This is usrsctp c1d6cb3 built for Linux/x86_64, as built in https://github.com/jitsi/jitsi-sctp, running as native code under an OpenJDK 11.0.18 Java VM running https://github.com/jitsi/jitsi-videobridge.
Let me know if there's any other information I can provide.