Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datachannel send stopped suddenly #721

Open
thinkboy opened this issue Dec 3, 2024 · 4 comments
Open

datachannel send stopped suddenly #721

thinkboy opened this issue Dec 3, 2024 · 4 comments

Comments

@thinkboy
Copy link

thinkboy commented Dec 3, 2024

I encountered a problem: when using datachannel to send a 170MB+ file, the sending suddenly stopped in the middle. I found that the SACK packet was lost by capturing the packet through tcpdump.
By adding logs in usersctp, I found that when sctp_mark_all_for_resend was marked, many messages were not sent (because the condition "if (chk->sent_rcv_time.tv_usec >= min_wait.tv_usec)" was met).
But why was sctp_mark_all_for_resend not triggered again within dozens of seconds, that is, there was no logic to execute sctp_os_timer_start? Is it because the packet retransmission will not be triggered after the SACK is lost?
The last log after stopping sending is as follows:

2024-12-03 10:19:32.788 VERB [7006654] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: m-c-o put out 1
2024-12-03 10:19:32.788 VERB [7006654] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: m-c-o put out 0
2024-12-03 10:19:32.788 VERB [7006654] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Ok, we have put out 2 chunks
2024-12-03 10:19:32.788 VERB [7006654] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: chunk OUTPUT returns
2024-12-03 10:19:32.788 VERB [7006654] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sorele(0x13c20f8c0) -> 1, sctp_common_input_processing:/Users/jason/work/tools/libdatachannel/deps/usrsctp/usrsctplib/netinet/sctp_input.c:6205
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Timer type 1 goes off.
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: soref(0x13c20f8c0) -> 2, sctp_timeout_handler:/Users/jason/work/tools/libdatachannel/deps/usrsctp/usrsctplib/netinet/sctputil.c:1888
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Error count for 0x13c307b90 now 1 thresh:5
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Overall error count for 0x13e808258 now 1 thresh:5 state:3
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce7f chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 782926, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce80 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 782926, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce81 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 784200, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce82 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 784200, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce83 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 785001, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce84 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 785001, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce85 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 785630, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce86 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 785630, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce87 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 786297, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce88 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 786297, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce89 chk->sent=1
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 786906, 556576
2024-12-03 10:19:32.794 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce8a chk->sent=1
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 786906, 556576
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce8b chk->sent=1
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 787452, 556576
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce8c chk->sent=1
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 787452, 556576
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce8d chk->sent=1
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 787452, 556576
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce8e chk->sent=1
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 788141, 556576
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend last_acked:6ea6ce7e chk:6ea6ce8f chk->sent=1
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sctp_mark_all_for_resend chk->sent_rcv_time.tv_sec=1733192372, min_wait.tv_sec=1733192372 , 788141, 556576
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: m-c-o put out 0
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Ok, we have put out 0 chunks
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: sorele(0x13c20f8c0) -> 1, sctp_timeout_handler:/Users/jason/work/tools/libdatachannel/deps/usrsctp/usrsctplib/netinet/sctputil.c:2216
2024-12-03 10:19:32.795 VERB [7006659] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Timer type 1 handler finished.
2024-12-03 10:19:33.015 VERB [7007263] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Send called addr:0x0 send length 1025
2024-12-03 10:19:33.015 VERB [7007263] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: USR Send complete qo:1 prw:4696537 unsent:988100 tf:17697 cooq:17 toqs:1005797 err:0
2024-12-03 10:19:33.015 VERB [7007263] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Send called addr:0x0 send length 1025
2024-12-03 10:19:33.015 VERB [7007263] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: USR Send complete qo:1 prw:4696537 unsent:989125 tf:17697 cooq:17 toqs:1006822 err:0
2024-12-03 10:19:33.015 VERB [7007263] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: Send called addr:0x0 send length 1025
2024-12-03 10:19:33.015 VERB [7007263] [rtc::impl::SctpTransport::DebugCallback@1013] usrsctp: USR Send complete qo:1 prw:4696537 unsent:990150 tf:17697 cooq:17 toqs:1007847 err:0

@thinkboy
Copy link
Author

thinkboy commented Dec 5, 2024

I may have figured out what caused it. Please see if my analysis of the code is correct. If it is correct, it should be a bug.
First, there is a piece of code in the sctp_timeout_handler method as follows:

                 if ((stcb->asoc.num_send_timers_up == 0) &&
		    (stcb->asoc.sent_queue_cnt > 0)) {
			struct sctp_tmit_chunk *chk;

			/*
			 * Safeguard. If there on some on the sent queue
			 * somewhere but no timers running something is
			 * wrong... so we start a timer on the first chunk
			 * on the send queue on whatever net it is sent to.
			 */
			TAILQ_FOREACH(chk, &stcb->asoc.sent_queue, sctp_next) {
				if (chk->whoTo != NULL) {
					break;
				}
			}
			if (chk != NULL) {
				sctp_timer_start(SCTP_TIMER_TYPE_SEND, stcb->sctp_ep, stcb, chk->whoTo);
			}
		}

I understand that this code means: if there is still data in the send queue after the current timer tick is executed, the timer should be started again, but I found through debugging that the stcb->asoc.num_send_timers_up variable will continue to increase.
By analyzing the code, I guess the code calling process that causes this phenomenon is: in the sctp_handle_tick method, "c->c_flags &= ~SCTP_CALLOUT_PENDING;" is first executed, and then "c_func(c_arg);" is executed before the sctp_timeout_handler method, the SCTP_CALLOUT_PENDING mark is released. Then it is possible to successfully start the timer again at "https://github.com/sctplab/usrsctp/blob/b28f0b55b00bde67f6be80d6623e2775b88026b8/usrsctplib/netinet/sctp_output.c#L10441C5-L10441C21", so that num_send_timers_up increases twice and only decreases once. As a result, this section of code cannot be executed forever.
If my analysis is correct, then this is indeed a bug.

@thinkboy
Copy link
Author

thinkboy commented Dec 6, 2024

I tried to remove the "stcb->asoc.num_send_timers_up == 0" condition. After multiple tests, I found that the problem was solved. That is to say, there is indeed a logic loophole that causes the timer not to start.
But I am not sure if this is a correct modification?

@avcodecs
Copy link

avcodecs commented Dec 6, 2024

Hello, what kind of sending mode are you using, upcall or calling sendv to send directly?

@thinkboy
Copy link
Author

Hello, what kind of sending mode are you using, upcall or calling sendv to send directly?

To be precise, I directly rely on the https://github.com/paullouisageneau/libdatachannel library, which relies on usersctp. I am not sure about the difference between the two modes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants