Slow Cached Performance of io_uring #944
-
Hi, I observed the following with io_uring. All the data is served from the Linux page cache (buffered mode, with no data read directly from disk), and the tests use sequential access.
Test details: io_uring with batched submission, with the batch size equal to the queue depth (QD). To make sure my code is not the culprit, I also verified the io_uring performance numbers from my code against https://github.com/axboe/fio/blob/master/t/io_uring.c. Profiling (using VTune Amplifier) shows io_uring is more memory bound (69%) than pread (46%).
Are my observations correct? If they are, what is the reason?
Machine Details
Code :
-
Could you test on a more recent version and also try XFS?
-
According to this comment, it does.
-
Ran a quick test on a local box here, buffered IO and file fully cached. Here's what I see:
This is NOT using registered buffers, because otherwise io_uring is a lot faster. As an example, 4 threads using 4k:
This is using XFS, a 10G file, on an AMD 7950X. I'm using t/io_uring for the tests, à la:
with bs, pread, and threads being the only things changed.
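For anyone reproducing the comparison without t/io_uring, the pread side amounts to a blocking, sequential, buffered read loop. The sketch below is a minimal illustration under that assumption, not the code from t/io_uring.c or from the original post; the function name `pread_sequential` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read `path` front to back with blocking pread() calls of `bs` bytes.
 * The file is opened without O_DIRECT, so reads go through the page cache.
 * Returns the number of bytes read, or -1 on error. */
static int64_t pread_sequential(const char *path, size_t bs)
{
    if (bs == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(bs);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    int64_t total = 0;
    for (;;) {
        /* One request in flight at a time, i.e. the QD=1 case. */
        ssize_t n = pread(fd, buf, bs, (off_t)total);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0) /* end of file */
            break;
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

Timing a call such as `pread_sequential("testfile", 4096)` on a file that is already fully cached gives a rough single-thread baseline; `testfile` is a placeholder path.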
I just did a quick test sharing the same buffer within a thread, and performance no longer degrades as the QD increases.
For example, compared to the original test with bs=512K and QD=64, performance has improved by 40-50%, although there is no change for QD=1, for obvious reasons. I will also repeat the test with hugepages and post an update.
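One possible reading of this result, stated as an assumption rather than a confirmed diagnosis: with one buffer per in-flight request, the destination memory a thread writes into grows with the QD, whereas a shared buffer keeps it at a single block. The helper below (the name `buffer_footprint_bytes` is hypothetical) only does that arithmetic, taking 512K to mean 512 × 1024 bytes.

```c
#include <stdint.h>

/* Bytes of destination buffer one thread touches while keeping `qd`
 * reads of `bs` bytes in flight.
 * shared_buffer != 0: every request reuses the same buffer.
 * shared_buffer == 0: each in-flight request has its own buffer. */
static uint64_t buffer_footprint_bytes(uint32_t qd, uint64_t bs, int shared_buffer)
{
    return shared_buffer ? bs : (uint64_t)qd * bs;
}
```

For bs=512K and QD=64 this gives 32 MiB per thread with separate buffers versus 512 KiB with a shared one, and the two are identical at QD=1, which matches the lack of change there. Note that sharing one buffer across in-flight reads means they overwrite each other's data, so it is only meaningful as a benchmark experiment.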