Replies: 13 comments 5 replies
-
What would be the point of waiting to collect a bunch of cqes? In the time wasted you could have finished those cqes and created new ones. If you can, keep a counter. The trick would be to use multiple small functions depending on the usage, vs one general function.
-
Just to reduce wakeups, and thus lower CPU usage.
Right for send()/write(), though it's not very effective for recv()/read(), which can be unpredictable. E.g. a read of a tun/pipe and a recv of a network socket could both arrive within a short timespan (say 1ms), so batching would help here.
As you mentioned, a counter of send()/write() operations can be used.
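As a rough sketch of that counting idea in C with liburing (the `nr_sends`/`recv_pending` names are mine, just for illustration):

```c
#include <liburing.h>

/* Sketch: size the wait by the number of short, predictable ops in
 * flight (sends/writes), plus one if a long op (recv/read) is pending.
 * `nr_sends` and `recv_pending` are illustrative names, not from the
 * thread. */
static void submit_and_reap(struct io_uring *ring, unsigned nr_sends,
                            int recv_pending)
{
    unsigned nr_wait = nr_sends + (recv_pending ? 1 : 0);

    /* single syscall: submit the sqes and sleep until nr_wait cqes */
    io_uring_submit_and_wait(ring, nr_wait);
}
```

The obvious caveat is the recv side: a recv can complete at any time, so the count really only works for sends/writes, as noted above.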
-
Currently the code below is how I am writing the event manager (limited to the language I am using). I am not sure what you mean by "might wake up twice"! I don't see it happening, unless it's something that io_uring/the kernel does, which is out of my control.

```python
while counter := (io_uring_submit(ring) + counter - cq_ready):
    # get a count of how many event(s) are ready and fill `cqe`
    while not (cq_ready := io_uring_peek_batch_cqe(ring, cqe, counter)):
        # wait for at least `1` event to be ready
        io_uring_wait_cqe(ring, cqe)
    for i in range(cq_ready):
        ...  # do stuff with `cqe[i]`
    io_uring_cq_advance(ring, cq_ready)  # free seen entries
```

Depending on the language you are using, you can improve on the above code to get better results.
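For instance, the same drain pattern in C with liburing's helpers might look like this (a sketch, not a drop-in equivalent of the code above):

```c
#include <liburing.h>

/* Sketch: submit, block until at least one completion, then drain
 * everything that is ready in a single pass. */
static void event_loop_tick(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;
    unsigned head, seen = 0;

    io_uring_submit(ring);
    io_uring_wait_cqe(ring, &cqe);        /* wait for at least 1 event */

    io_uring_for_each_cqe(ring, head, cqe) {
        /* do stuff with cqe->res / io_uring_cqe_get_data(cqe) */
        seen++;
    }
    io_uring_cq_advance(ring, seen);      /* free seen entries */
}
```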
-
During idle … Now if …
-
And I have another idea: could it be possible to mask certain successful cqes (e.g. send/write), so that they will not wake up the waiter?
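For what it's worth, something close to this exists in newer kernels, if I'm not mistaken: the `IOSQE_CQE_SKIP_SUCCESS` sqe flag (added around 5.17) suppresses the cqe when a request completes successfully. A minimal sketch, assuming `ring`, `fd`, `buf` and `len` are set up elsewhere:

```c
#include <liburing.h>

/* Sketch: suppress cqes for sends that succeed, so only failures
 * (and other request types) wake the waiter. ring/fd/buf/len are
 * assumed to exist elsewhere. */
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_send(sqe, fd, buf, len, 0);
io_uring_sqe_set_flags(sqe, IOSQE_CQE_SKIP_SUCCESS); /* no cqe on success */
io_uring_submit(ring);
```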
-
Anytime you use anything with … I get that you want to wait 1ms to collect cqes by using … In my example, correction: I use … I can't really use …
You would have to talk to @isilence or @axboe about that, and your other ideas.
-
It's a perfectly valid question to ask; CQ batching is important for performance. The pure number of syscalls is not representative, it's more interesting what they do, i.e. whether there was real waiting and how many times the task was actually scheduled out/in. When there are long operations, like recv, then it's indeed pretty hard to batch, and it usually degrades to:

```c
nr_wait = nr_sends;
if (nr_recv != 0)
    nr_wait++;
io_uring_wait(nr_cqe_to_wait);
```

If you have submit separated from wait:

```c
submit();
while (peek) { handle(cqe); }; // or for_each_cqe();
io_uring_wait(1); // wait receives and other long ops
```

Another approach we're trying is to specify the minimum time it's supposed to wait. To give an idea:

```c
submit();
...
sleep(min_time_to_wait);
io_uring_wait(1);
```

I think Jens had patches to make it a bit more controllable and efficient, where that sleep() is moved inside the io_uring syscall. It was discussed before, and it works, but the folks who were thinking about how to use it in production systems are not super excited, though. I hope we'll have a more generic solution.
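Spelled out with real liburing calls, that last idea might look like the sketch below; `min_wait` is just an illustrative knob, and the patches mentioned above would move the sleep into the io_uring syscall itself:

```c
#include <liburing.h>
#include <time.h>

/* Sketch of the "wait at least a minimum time" idea: submit, sleep so
 * completions can accumulate, then take one wakeup for the whole batch. */
static void submit_and_reap_batched(struct io_uring *ring,
                                    const struct timespec *min_wait)
{
    struct io_uring_cqe *cqe;
    unsigned head, seen = 0;

    io_uring_submit(ring);
    nanosleep(min_wait, NULL);         /* let cqes pile up */
    io_uring_wait_cqe(ring, &cqe);     /* one wakeup for the batch */

    io_uring_for_each_cqe(ring, head, cqe)
        seen++;                        /* handle each cqe here */
    io_uring_cq_advance(ring, seen);
}
```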
-
Moving wait out of the kernel and into … After you start to batch cqes on the io_uring side, you will soon want a priority/heap queue, since you will want to process certain tasks sooner than others. Then, if you have 100k+ entries to go through, you now have connection/timeout/limit issues to deal with, ...
-
Be a little careful.
Wish you good luck! Reference: …
-
It would be less efficient if some sends were punted to io-wq or delayed, right?
I suppose the timestamp is based on the submitter, rather than on the first cqe. Though couldn't a NAPI/IRQ-defer style probably be more efficient? It wouldn't be so hard for io_uring to init a timeout request inside the kernel once the first cqe arrives.
Most output operations can return immediately, so they probably think masking is not conspicuous. But for a large chunk of sends flooding small socket buffers, wouldn't it be a lengthy pain to waste time? Anyway, I believe both solutions mentioned are mutually complementary.
-
@pyhd Just tested …
-
@pyhd I actually liked your idea of checking … The manual doesn't make …
-
@isilence @pyhd You guys are talking about …
-
`io_uring_submit_and_wait_timeout` can be used to collect more cqes, but it is costly to repeat if the interval is very short, especially during idle. So I just wonder if there is any better method to collect more cqes in one wait, like IRQ coalescing: i.e. once the first cqe has arrived, a timer (e.g. 1ms) can be kicked off to wake up `wait_cqes` later. In other words, the delay is based on the timestamp of the first cqe. Now the problem is the additional timer request, which means one more syscall. Then my question is: could it be possible to kick the timer in the kernel? Even a new blocking function like `io_uring_wait_cqe(s)_batch/delay/coalesce`? Apart from that, is there a better solution?
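A userspace approximation of the above, to make the idea concrete (the in-kernel timer would remove the extra syscall; the 1ms figure is just the example value):

```c
#include <liburing.h>

/* Sketch: block for the first completion, then give further
 * completions up to ~1ms to coalesce before draining the queue. */
static unsigned reap_coalesced(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;
    struct __kernel_timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };
    unsigned head, seen = 0;

    io_uring_wait_cqe(ring, &cqe);                 /* first cqe arrives */
    io_uring_wait_cqes(ring, &cqe, 2, &ts, NULL);  /* up to 1ms for one
                                                      more; may return
                                                      -ETIME, that's fine */
    io_uring_for_each_cqe(ring, head, cqe)
        seen++;                                     /* handle each cqe */
    io_uring_cq_advance(ring, seen);
    return seen;
}
```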