Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make static variables in worker be in thread-local storage #523

Closed
wants to merge 1 commit into from

Conversation

nazar-pc
Copy link
Collaborator

@nazar-pc nazar-pc commented Mar 7, 2021

This is another necessary change towards ability to run worker in a thread instead of own whole process.

Tests pass, but I'm a bit worried about "SCTP iterator" thread that is still being created, see sctplab/usrsctp#472

@ibc
Copy link
Member

ibc commented Mar 8, 2021

This is another necessary change towards ability to run worker in a thread instead of own whole process.

What does this mean and why do we need this?

@ibc
Copy link
Member

ibc commented Mar 8, 2021

but I'm a bit worried about "SCTP iterator" thread that is still being created, see sctplab/usrsctp#472

Definitely this must be fixed in usrsctp side. Said that, since mediasoup-worker is still a separate process I assume we don't have any issue even if we merge this, right?

However, I don't understand yet the purpose of this "ability to run worker in a thread instead of own whole process". Definitely mediasoup-worker is designed to be a separate process, no matter it is a single thread process or not.

@nazar-pc
Copy link
Collaborator Author

nazar-pc commented Mar 8, 2021

What does this mean and why do we need this?

However, I don't understand yet the purpose of this "ability to run worker in a thread instead of own whole process". Definitely mediasoup-worker is designed to be a separate process, no matter it is a single thread process or not.

Right, this is related to my work on librarifying mediasoup-worker (as I mentioned on the forum), which is to be able to wrap it into Rust library and be able to install as a dependency with Rust port automatically instead of requiring to build manually and specify path to manually, just as seamless as it works with Node.js. Building executable as part of the dependency installation in this case would be weird for Rust library and confusing in case of building static binaries, etc.

Worker works in a thread in my fork already as soon as there is just one worker per app, but ideally it should be possible to create more than 1 worker, for which I need each worker to be isolated within its thread and not use global state (making worker multi-thread aware is, obviously, out of consideration here).

Here is a quite from another PR:
For that I started with creating lib.cpp, which is basically main.cpp with main() function renamed to run() and some minor tweaks. I then build that into static library and link into Rust library that calls run(), here is simple wrapper I have right now: https://github.com/nazar-pc/mediasoup/blob/02529bc44bf1c7ea90b910beac8d0b9c5a0f3979/worker/src/lib.rs

Said that, since mediasoup-worker is still a separate process I assume we don't have any issue even if we merge this, right?

The concern here is that if one of 3 callbacks currently set in the code are called from that "SCTP iterator" thread, then thread-local variables will not be accessible and things will fail. I do not know whether it is the case, I'll probably raise a question in usrsctp repo to clarify that.

@nazar-pc
Copy link
Collaborator Author

This clearly doesn't work, closing for now

@nazar-pc nazar-pc closed this Mar 10, 2021
@ibc
Copy link
Member

ibc commented Mar 10, 2021

Why do you say that it doesn't work?

@nazar-pc
Copy link
Collaborator Author

First of all, more things need to be made thread-local, which I did in my fork for testing.

Second, in concurrent execution with multiple workers in different threads I still get weird SCTP-related segfaults that I don't quite understand the origin of (call site is protected by lock in usrsctp, yet it fails), something like this:

Thread 101 "data_consumer::" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe6f5fc640 (LWP 2791843)]
0x00005555562fa7c9 in sctp_find_ifn (ifn=0x0, ifn_index=4294967295) at ../deps/usrsctp/usrsctp/usrsctplib/netinet/sctp_pcb.c:265
265			if (sctp_ifnp->ifn_index == ifn_index) {
(gdb) print sctp_ifnp
$1 = (struct sctp_ifn *) 0x6953726566667542
(gdb) print sctp_ifnp->ifn_index
Cannot access memory at address 0x6953726566667582
(gdb) bt
#0  0x00005555562fa7c9 in sctp_find_ifn (ifn=0x0, ifn_index=4294967295) at ../deps/usrsctp/usrsctp/usrsctplib/netinet/sctp_pcb.c:265
#1  0x00005555562faf55 in sctp_add_addr_to_vrf (vrf_id=0, ifn=0x0, ifn_index=4294967295, ifn_type=0, if_name=0x555556758320 "conn", ifa=0x0, addr=0x7ffe6f5d7610, ifa_flags=0, dynamic_add=0) at ../deps/usrsctp/usrsctp/usrsctplib/netinet/sctp_pcb.c:569
#2  0x000055555616cb00 in usrsctp_register_address (addr=0x1) at ../deps/usrsctp/usrsctp/usrsctplib/user_socket.c:3153
#3  0x0000555555fd39c0 in RTC::SctpAssociation::SctpAssociation (this=0x7ffe5c005340, listener=0x7ffe5c002ac0, os=1024, mis=1024, maxSctpMessageSize=262144, sctpSendBufferSize=262144, isDataChannel=true) at ../src/RTC/SctpAssociation.cpp:128
#4  0x0000555555fd8e03 in RTC::Transport::Transport (this=0x7ffe5c002aa0, id=..., listener=0x7ffe5c00ae00, data=...) at ../src/RTC/Transport.cpp:172
#5  0x0000555555ffd881 in RTC::WebRtcTransport::WebRtcTransport (this=0x7ffe5c002aa0, id=..., listener=0x7ffe5c00ae00, data=...) at ../src/RTC/WebRtcTransport.cpp:32
#6  0x0000555555fafa2a in RTC::Router::HandleRequest (this=0x7ffe5c00ae00, request=0x7ffe5c0081d0) at ../src/RTC/Router.cpp:188
#7  0x0000555555f6e951 in Worker::OnChannelRequest (this=0x7ffe6f5db320, request=0x7ffe5c0081d0) at ../src/Worker.cpp:285
#8  0x0000555555f85e80 in Channel::UnixStreamSocket::OnConsumerSocketMessage (this=0x7ffe5c000bc0, 
    msg=0x7ffe1d9f807f "{\"id\":1,\"method\":\"router.createWebRtcTransport\",\"data\":{\"enableSctp\":true,\"enableTcp\":false,\"enableUdp\":true,\"initialAvailableOutgoingBitrate\":600000,\"isDataChannel\":true,\"listenIps\":[{\"ip\":\"127.0.0.1"..., msgLen=453) at ../src/Channel/UnixStreamSocket.cpp:118
#9  0x0000555555f8655f in Channel::ConsumerSocket::UserOnUnixStreamRead (this=0x7ffe5c000bd0) at ../src/Channel/UnixStreamSocket.cpp:253
#10 0x00005555561a5086 in UnixStreamSocket::OnUvRead (this=0x7ffe5c000bd0, nread=458) at ../src/handles/UnixStreamSocket.cpp:262
#11 0x00005555561a45bb in onRead (handle=0x7ffe5c001880, nread=458, buf=0x7ffe6f5d7ed0) at ../src/handles/UnixStreamSocket.cpp:30
#12 0x0000555555f51da6 in uv__read (stream=0x7ffe5c001880) at ../deps/libuv/libuv/src/unix/stream.c:1239
#13 0x0000555555f52089 in uv__stream_io (loop=0x7ffe5c000cc0, w=0x7ffe5c001908, events=1) at ../deps/libuv/libuv/src/unix/stream.c:1306
#14 0x0000555555f588d7 in uv__io_poll (loop=0x7ffe5c000cc0, timeout=-1) at ../deps/libuv/libuv/src/unix/linux-core.c:462
#15 0x0000555555f47eb7 in uv_run (loop=0x7ffe5c000cc0, mode=UV_RUN_DEFAULT) at ../deps/libuv/libuv/src/unix/core.c:385
#16 0x0000555555f5d4e2 in DepLibUV::RunLoop () at ../src/DepLibUV.cpp:52
#17 0x0000555555f6cc8f in Worker::Worker (this=0x7ffe6f5db320, channel=0x7ffe5c000bc0, payloadChannel=0x7ffe5c001aa0, handleSignals=false) at ../src/Worker.cpp:39
#18 0x0000555555ef98ae in run (argc=4, argv=0x7ffe5c000c60, version=0x7fff44001570 "0.0.0", consumerChannelFd=242, producerChannelFd=245, payloadConsumeChannelFd=246, payloadProduceChannelFd=249) at ../src/lib.cpp:134
#19 0x0000555555ef3297 in mediasoup_sys::run (args=..., consumer_channel_fd=242, producer_channel_fd=245, payload_consumer_channel_fd=246, payload_producer_channel_fd=249) at /web/github/mediasoup/worker/src/lib.rs:47
#20 0x0000555555ec194b in mediasoup::worker::utils::spawn_with_worker_channels::{{closure}} () at /web/github/mediasoup/rust/src/worker/utils.rs:34
#21 0x0000555555ecc6c2 in std::sys_common::backtrace::__rust_begin_short_backtrace (f=...) at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125
#22 0x0000555555e78601 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} () at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:474
#23 0x0000555555ecc021 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=()) at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:322
#24 0x0000555555edd179 in std::panicking::try::do_call (data=0x7ffe6f5db878 "") at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379
#25 0x0000555555edd32d in __rust_try ()
#26 0x0000555555edd0a4 in std::panicking::try (f=...) at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343
#27 0x0000555555ecc093 in std::panic::catch_unwind (f=...) at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396
#28 0x0000555555e7809c in std::thread::Builder::spawn_unchecked::{{closure}} () at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:473
#29 0x0000555555defe7f in core::ops::function::FnOnce::call_once{{vtable-shim}} () at /home/nazar-pc/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227
#30 0x00005555566b5b1a in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/alloc/src/boxed.rs:1328
#31 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/alloc/src/boxed.rs:1328
#32 std::sys::unix::thread::Thread::new::thread_start () at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b//library/std/src/sys/unix/thread.rs:71
#33 0x00007ffff7f5f590 in start_thread (arg=0x7ffe6f5fc640) at pthread_create.c:463
#34 0x00007ffff7d30223 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

If someone could help me with that, that would be awsome, but lack of any documentation on those methods from usrsctp and tons of conditionally compiled code makes it basically undebuggable for me.

@ibc
Copy link
Member

ibc commented Mar 11, 2021

I see. Problem here is:

@nazar-pc
Copy link
Collaborator Author

Yeah, I saw those. And considering this is not something that is of any priority for you, I'll have to figure it out on my own it seems. Will post on the forum any updates I have. Will submit another PR if I find a way to make it work, but in this form it is not useful for anyone.

@ibc
Copy link
Member

ibc commented Mar 11, 2021

More than priority it's more about we don't find confident at all to solve such an internal hell in usrsctp...

@nazar-pc nazar-pc deleted the thread-local-statics branch March 13, 2021 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants