-
I was using multiple threads, with a single io_uring instance per thread, to do concurrent reads on a file. I observed no improvement when enabling register_files. With a network socket it does work, and I can see the performance difference, so I imagine my code is correct.
-
Could it be that time to completion is dominated by the underlying storage rather than by in-kernel file descriptor bookkeeping?
-
Was it the same file descriptor for all rings, or did you open the file multiple times? If the former, the cost is two atomics per I/O, almost always executed by the same CPU, which is not much. For the latter, it would depend on what kind of contention we're talking about, how heavy the I/O stack below io_uring is, and so on, but you would definitely need something fast enough, and without a memcpy into the page cache, to see the difference. You can take perf profiles and see what fraction of CPU time those atomics take.
-
@isilence Sorry, my fault. It seems I posted an incorrect benchmark. I re-ran the tests several times, and I'm sure that register_files is quite helpful for performance, even with the page cache involved.

Setup: 4 threads, 1 ring per thread, buffered reads on a small file, opened once, registered 4 times.
enable register_files: 5M qps
disable register_files: 3.6M qps