Deadlock between usrsctp_conninput and usrsctp_close #711

JonathanLennox · 2024-04-22T19:06:22Z

In a stress-test of usrsctp (the same test as was attached to #709) I saw a deadlock between usrsctp_close and usrsctp_conninput. Looking at the code, I suspect this could happen for the kernel implementation as well.

The issue is that sctp_common_input_processing acquires (in sctp_findassociation_addr) stcb->tcb_mtx, then, through the call stack sctp_process_data -> sctp_process_a_data_chunk -> sctp_add_to_readq, tries to acquire inp->inp_mtx. Meanwhile, sctp_close acquires inp->inp_mtx, then, in sctp_inpcb_free, tries to acquire stcb->tcb_mtx.
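
A minimal sketch of the lock-order inversion described above, not usrsctp code. The two plain pthread mutexes are hypothetical stand-ins for stcb->tcb_mtx and inp->inp_mtx, and `demo_lock_inversion`, `input_path`, and the semaphores are illustrative names. It uses pthread_mutex_trylock so the program terminates. With blocking locks, each EBUSY result below would instead be a thread parked in futex_wait.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-ins for stcb->tcb_mtx and inp->inp_mtx. */
static pthread_mutex_t tcb_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inp_mtx = PTHREAD_MUTEX_INITIALIZER;
static sem_t tcb_held;     /* posted once the input path holds tcb_mtx */
static sem_t close_tried;  /* posted once the close path has tried tcb_mtx */
static int input_rc;       /* result of the input path's attempt on inp_mtx */

/* Input path: takes tcb_mtx first, then wants inp_mtx. */
static void *input_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&tcb_mtx);
    input_rc = pthread_mutex_trylock(&inp_mtx);
    if (input_rc == 0)
        pthread_mutex_unlock(&inp_mtx);
    sem_post(&tcb_held);
    sem_wait(&close_tried);   /* keep tcb_mtx held until close has tried it */
    pthread_mutex_unlock(&tcb_mtx);
    return NULL;
}

/* Close path: takes inp_mtx first, then wants tcb_mtx.
 * Returns 1 if both paths found the other's lock busy, 0 if not,
 * and -1 if the thread could not be created. */
int demo_lock_inversion(void)
{
    pthread_t t;
    int close_rc;

    sem_init(&tcb_held, 0, 0);
    sem_init(&close_tried, 0, 0);
    pthread_mutex_lock(&inp_mtx);
    if (pthread_create(&t, NULL, input_path, NULL) != 0) {
        pthread_mutex_unlock(&inp_mtx);
        sem_destroy(&tcb_held);
        sem_destroy(&close_tried);
        return -1;
    }
    sem_wait(&tcb_held);
    close_rc = pthread_mutex_trylock(&tcb_mtx);
    if (close_rc == 0)
        pthread_mutex_unlock(&tcb_mtx);
    sem_post(&close_tried);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&inp_mtx);
    sem_destroy(&tcb_held);
    sem_destroy(&close_tried);
    return input_rc == EBUSY && close_rc == EBUSY;
}
```

Each path holds the lock the other one needs, so neither attempt can succeed until one side backs off.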

(Note: the line numbers shown in the crash are from #710, but nothing in that PR should have affected this deadlock.)

Excerpted gdb info:

(gdb) info threads
  Id   Target Id                                           Frame 
* 1    Thread 0xffffa8923020 (LWP 2157812) "crash_repro"   futex_wait (private=0, expected=2, futex_word=0xffff8c015c70)
    at ../sysdeps/nptl/futex-internal.h:146
...
  13   Thread 0xffffa253f120 (LWP 2165461) "crash_repro"   futex_wait (private=0, expected=2, futex_word=0xffff8c02e4e8)
    at ../sysdeps/nptl/futex-internal.h:146

(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0xffff8c015c70) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0xffff8c015c70, private=private@entry=0) at ./nptl/lowlevellock.c:49
#2  0x0000ffffa868070c in lll_mutex_lock_optimized (mutex=0xffff8c015c70) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0xffff8c015c70) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000ffffa88b0aa4 in sctp_inpcb_free (inp=inp@entry=0xffff8c02e140, immediate=immediate@entry=1, from=from@entry=1)
    at ../../usrsctplib/netinet/sctp_pcb.c:4083
#5  0x0000ffffa88b854c in sctp_close (so=so@entry=0xffff8c02b1e0) at ../../usrsctplib/netinet/sctp_usrreq.c:891
#6  0x0000ffffa8863a7c in sofree (so=0xffff8c02b1e0) at ../../usrsctplib/user_socket.c:287
#7  0x0000ffffa8867aa8 in usrsctp_close (so=<optimized out>) at ../../usrsctplib/user_socket.c:2005
#8  0x0000aaaab6c020c8 in close_socket (o=0xaaaada80e3e0) at crash_repro.c:164
#9  run_test (close_ns=close_ns@entry=198272357) at crash_repro.c:245
#10 0x0000aaaab6c014a8 in main () at crash_repro.c:284

#4  0x0000ffffa88b0aa4 in sctp_inpcb_free (inp=inp@entry=0xffff8c02e140, immediate=immediate@entry=1, from=from@entry=1)
    at ../../usrsctplib/netinet/sctp_pcb.c:4083
4083			SCTP_TCB_LOCK(stcb);
(gdb) p stcb->tcb_mtx
$1 = {__data = {__lock = 2, __count = 0, __owner = 2165461, __nusers = 1, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\002\000\000\000\000\000\000\000\325\n!\000\001\000\000\000\002", '\000' <repeats 30 times>, __align = 2}

(gdb) thread 13
[Switching to thread 13 (Thread 0xffffa253f120 (LWP 2165461))]
#0  futex_wait (private=0, expected=2, futex_word=0xffff8c02e4e8) at ../sysdeps/nptl/futex-internal.h:146
146	../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0xffff8c02e4e8) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0xffff8c02e4e8, private=private@entry=0) at ./nptl/lowlevellock.c:49
#2  0x0000ffffa868070c in lll_mutex_lock_optimized (mutex=0xffff8c02e4e8) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0xffff8c02e4e8) at ./nptl/pthread_mutex_lock.c:93
#4  0x0000ffffa88df50c in sctp_add_to_readq (inp=0xffff8c02e140, stcb=stcb@entry=0xffff8c015450, control=0xffff9801cd70, sb=0xffff8c02b298, 
    end=end@entry=1, inp_read_lock_held=inp_read_lock_held@entry=0, so_locked=so_locked@entry=0) at ../../usrsctplib/netinet/sctputil.c:5383
#5  0x0000ffffa887b63c in sctp_process_a_data_chunk (chk_type=<optimized out>, last_chunk=1, break_flag=<synthetic pointer>, abort_flag=0xffffa253e488, 
    high_tsn=0xffffa253e670, net=0xffff8c02dec0, chk_length=<optimized out>, offset=<optimized out>, m=0xffffa253e7f0, asoc=0xffff8c0154a8, 
    stcb=0xffff8c015450) at ../../usrsctplib/netinet/sctp_indata.c:2154
#6  sctp_process_data (mm=mm@entry=0xffffa253e7f0, iphlen=iphlen@entry=0, offset=offset@entry=0xffffa253e66c, length=length@entry=48, inp=0xffff8c02e140, 
    stcb=stcb@entry=0xffff8c015450, net=0xffff8c02dec0, high_tsn=high_tsn@entry=0xffffa253e670) at ../../usrsctplib/netinet/sctp_indata.c:2806
#7  0x0000ffffa888b1c8 in sctp_common_input_processing (mm=mm@entry=0xffffa253e7f0, iphlen=iphlen@entry=0, offset=<optimized out>, offset@entry=12, 
    length=length@entry=48, src=src@entry=0xffffa253e7f8, dst=dst@entry=0xffffa253e808, sh=0xffff9801a4e0, ch=0xffff9801a4ec, compute_crc=1 '\001', 
    ecn_bits=ecn_bits@entry=0 '\000', vrf_id=vrf_id@entry=0, port=port@entry=0) at ../../usrsctplib/netinet/sctp_input.c:6095
#8  0x0000ffffa8869860 in usrsctp_conninput (addr=<optimized out>, buffer=0xaaaada80e950, length=48, ecn_bits=ecn_bits@entry=0 '\000')
    at ../../usrsctplib/user_socket.c:3321
#9  0x0000aaaab6c01ad4 in input_packet_data (arg=0xffff8c02a2b0) at crash_repro.c:373
#10 0x0000ffffa867d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#11 0x0000ffffa86e5edc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

#4  0x0000ffffa88df50c in sctp_add_to_readq (inp=0xffff8c02e140, stcb=stcb@entry=0xffff8c015450, control=0xffff9801cd70, sb=0xffff8c02b298, 
    end=end@entry=1, inp_read_lock_held=inp_read_lock_held@entry=0, so_locked=so_locked@entry=0) at ../../usrsctplib/netinet/sctputil.c:5383
5383			SCTP_INP_READ_LOCK(inp);
(gdb) p inp->inp_mtx
$2 = {__data = {__lock = 1, __count = 0, __owner = 2157812, __nusers = 1, __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
  __size = "\001\000\000\000\000\000\000\000\364\354 \000\001\000\000\000\002", '\000' <repeats 30 times>, __align = 1}
(gdb) bt
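
The two mutex dumps above can be read as a wait-for graph: LWP 2157812 (thread 1) waits on a mutex whose __owner is 2165461, and LWP 2165461 (thread 13) waits on a mutex whose __owner is 2157812. The sketch below checks such a graph for a cycle. `struct wait_edge`, `has_wait_cycle`, and `issue_711_cycle` are hypothetical names for illustration only, and the sketch assumes each thread waits on at most one lock.

```c
#include <stddef.h>

/* One edge of a wait-for graph: `waiter` is blocked on a lock held by `owner`. */
struct wait_edge {
    long waiter;
    long owner;
};

/* Returns the LWP that `lwp` is waiting on, or -1 if it is not blocked. */
long owner_blocking(const struct wait_edge *edges, size_t n, long lwp)
{
    for (size_t i = 0; i < n; i++) {
        if (edges[i].waiter == lwp)
            return edges[i].owner;
    }
    return -1;
}

/* Follows the chain of owners from `start`; returns 1 if it leads back to `start`. */
int has_wait_cycle(const struct wait_edge *edges, size_t n, long start)
{
    long cur = start;
    for (size_t steps = 0; steps < n; steps++) {
        cur = owner_blocking(edges, n, cur);
        if (cur < 0)
            return 0;
        if (cur == start)
            return 1;
    }
    return 0;
}

/* The two edges read off the gdb output: waiting LWP and the __owner field. */
int issue_711_cycle(void)
{
    static const struct wait_edge edges[] = {
        { 2157812, 2165461 },  /* thread 1 waits on stcb->tcb_mtx */
        { 2165461, 2157812 },  /* thread 13 waits on inp->inp_mtx */
    };
    return has_wait_cycle(edges, 2, 2157812);
}
```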

JonathanLennox · 2024-04-23T17:22:34Z

Actually, unfortunately, it looks like this was a consequence of #710 after all: the library unexpectedly appears to depend on the socket's reference count not going to zero during sctp_common_input_processing.
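
As a generic illustration of that kind of dependency (an assumption-laden sketch, not how usrsctp manages its sockets): if the input path pins the object with an extra reference, a concurrent close only drops the count and teardown waits until input processing is done. `struct demo_sock` and the `demo_sock_*` functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted socket; not a usrsctp type. */
struct demo_sock {
    atomic_int refcnt;
    bool freed;
};

void demo_sock_init(struct demo_sock *s)
{
    atomic_init(&s->refcnt, 1);  /* the owner's reference */
    s->freed = false;
}

void demo_sock_hold(struct demo_sock *s)
{
    atomic_fetch_add(&s->refcnt, 1);
}

/* Drops one reference; returns true if this call tore the object down. */
bool demo_sock_release(struct demo_sock *s)
{
    if (atomic_fetch_sub(&s->refcnt, 1) == 1) {
        s->freed = true;  /* stand-in for the real teardown */
        return true;
    }
    return false;
}

/* Simulates a close arriving while input processing still holds a reference. */
int demo_close_during_input(void)
{
    struct demo_sock s;
    demo_sock_init(&s);

    demo_sock_hold(&s);                           /* input path pins the socket */
    bool freed_by_close = demo_sock_release(&s);  /* close drops the owner's ref */
    bool alive_mid_input = !s.freed;              /* still usable by the input path */
    bool freed_by_input = demo_sock_release(&s);  /* input path finishes */

    return !freed_by_close && alive_mid_input && freed_by_input && s.freed;
}
```

In this sketch the teardown runs on whichever path drops the last reference, so it never happens while the input path is still using the object.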

JonathanLennox closed this as completed Apr 23, 2024

JonathanLennox mentioned this issue Apr 23, 2024

Add configure option to disable upcalls #710

Open
