What's new with io_uring in 6.10
MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the io_uring side did not. In local testing, the crossover point for send zerocopy being faster is now around 3000 byte packets, and it performs better than the sync syscall variants as well. This improvement is transparent to the application; no changes are needed in how zerocopy sends are used.
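Since the change is transparent, existing code keeps working as-is. As a point of reference, here is a minimal sketch of a zerocopy send using liburing's existing io_uring_prep_send_zc() helper; the two-CQE pattern (a result CQE plus a later buffer-release notification) is part of the pre-existing interface, not something introduced in 6.10.

```c
#include <liburing.h>

/* Minimal zerocopy send sketch; the 6.10 improvement requires no changes
 * to application code like this. */
static int send_zc(struct io_uring *ring, int sockfd,
		   const void *buf, size_t len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int ret;

	io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
	io_uring_submit(ring);

	/* First CQE carries the send result and has IORING_CQE_F_MORE set. */
	ret = io_uring_wait_cqe(ring, &cqe);
	if (ret < 0)
		return ret;
	ret = cqe->res;
	io_uring_cqe_seen(ring, cqe);

	/* Second CQE is the notification (IORING_CQE_F_NOTIF) signaling that
	 * the buffer may be reused. */
	if (io_uring_wait_cqe(ring, &cqe) == 0)
		io_uring_cqe_seen(ring, cqe);

	return ret;
}
```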
Bundles are multiple buffers used in a single operation. On the receive side, this means a single receive may utilize multiple buffers, reducing the trips through the networking stack from one per buffer to just a single one. On the send side, it also enables better handling of how an application deals with sends on a socket, eliminating the need to serialize sends on a single socket. Bundles work with provided buffers, hence this feature also adds support for provided buffers for send operations.
See the liburing io_uring_prep_send_bundle(3) man page for more details.
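As a rough illustration, the sketch below queues a bundle send against an assumed, previously registered provided-buffer group 0 (set up elsewhere, e.g. with io_uring_setup_buf_ring()); the exact argument semantics are covered by the man page above.

```c
#include <liburing.h>

/* Sketch: queue a bundle send that pulls its data from provided-buffer
 * group 0, which is assumed to have been registered and populated earlier
 * (io_uring_setup_buf_ring() + io_uring_buf_ring_add()/advance()). */
static int queue_send_bundle(struct io_uring *ring, int sockfd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -1;

	/* A len of 0 lets the send consume what the selected buffers hold;
	 * see io_uring_prep_send_bundle(3) for the exact semantics. */
	io_uring_prep_send_bundle(sqe, sockfd, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;	/* pick buffers from a provided group */
	sqe->buf_group = 0;			/* buffer group ID registered earlier */

	return io_uring_submit(ring);
}
```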
Accept now supports IORING_ACCEPT_DONT_WAIT, allowing applications to issue a non-retryable accept attempt. IORING_ACCEPT_POLL_FIRST was also added, which works like IORING_RECVSEND_POLL_FIRST in that no immediate accept attempt is made; rather, io_uring relies solely on a poll trigger to gauge when it is a good idea to retry the operation. As on the receive side, this can be combined with the newly added signaling of IORING_CQE_F_SOCK_NONEMPTY to eliminate unnecessary accept attempts.
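A hedged sketch of how these might be combined: issue accepts with IORING_ACCEPT_POLL_FIRST when the socket was last seen empty, and only retry immediately when IORING_CQE_F_SOCK_NONEMPTY indicates more connections are queued. Placing the accept flag in sqe->ioprio mirrors how the existing IORING_ACCEPT_MULTISHOT flag is passed and is an assumption here; consult the liburing/kernel headers for the final interface.

```c
#include <liburing.h>
#include <stdbool.h>
#include <stddef.h>

/* Queue an accept; with poll_first set, no immediate accept attempt is made
 * and io_uring waits for a poll trigger instead. */
static void queue_accept(struct io_uring *ring, int listen_fd, bool poll_first)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
	if (poll_first)
		sqe->ioprio |= IORING_ACCEPT_POLL_FIRST;	/* assumed flag placement */
}

/* On completion, only attempt another immediate accept when the kernel says
 * more connections are already pending on the socket. */
static void handle_accept_cqe(struct io_uring *ring, int listen_fd,
			      struct io_uring_cqe *cqe)
{
	bool more_pending = cqe->flags & IORING_CQE_F_SOCK_NONEMPTY;

	if (cqe->res >= 0)
		queue_accept(ring, listen_fd, !more_pending);
}
```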
This is more of an internal cleanup with no user visible changes, but it does reduce the complexity of how retries are done. Rather than maintaining on-stack state that is then copied to allocated state as needed, the same state is now used whether this is the initial issue attempt or a retry. Various improvements were also made to how efficiently this state can be allocated and freed.
Rather than use remap_pfn_range(), vm_insert_page(s)() is now used. This applies to the rings and SQ/CQ arrays, as well as to ring provided buffers. Again, this is not a directly user visible change, as everything should work exactly as it did before. But it does have the added benefit of not requiring physically contiguous memory, which will help make restarts of longer running services and bigger rings more reliable; those were previously prone to running into memory fragmentation that prevented allocation of bigger rings. Outside of that, the code for mapping data was also cleaned up and unified, and the end result is that roughly 400 lines of code could be removed from the code base.
NOP commands don't do anything; they simply post a completion with a result of 0 to the CQ ring. Support was added for controlling what the completion result is, which means it can now be used to inject errors as well. This is handy for testing purposes. If IORING_NOP_INJECT_RESULT is set in sqe->nop_flags, then sqe->len will be used as the posted completion result.
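For example, a minimal sketch that forces a NOP to complete with -EINVAL, handy for exercising error handling in completion paths (this assumes kernel and liburing headers new enough to carry the 6.10 definitions):

```c
#include <liburing.h>
#include <errno.h>

/* Sketch: post a NOP whose completion result is forced to -EINVAL. */
static void queue_failing_nop(struct io_uring *ring)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_nop(sqe);
	sqe->nop_flags = IORING_NOP_INJECT_RESULT;	/* use sqe->len as the result */
	sqe->len = (__u32) -EINVAL;			/* posted as cqe->res */
}
```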