register_files performance for file IO in multi-thread situation? #807

beef9999 · 2023-03-01T05:09:45Z

I was using multiple threads, with a single io_uring instance per thread, to do concurrent reads on one file.

I observed no improvement when enabling register_files.

With a network socket it does work, and I can see the performance difference, so I assume my code is correct.

redbaron · 2023-03-01T08:01:37Z

Could it be that time to completion is dominated by the underlying storage rather than by in-kernel file descriptor bookkeeping?

isilence · 2023-03-02T04:04:56Z
Collaborator

Was it the same file descriptor for all rings, or did you open the file multiple times? If the former, the cost is two atomic operations per I/O, almost always executed by the same CPU, which is not much. For the latter, it depends on what kind of contention we're talking about, how heavy the I/O stack below io_uring is, and so on, but you would definitely need something performant enough, and without a memcpy into the page cache, to see the difference. You can take perf profiles and see what fraction of CPU time those atomics take.

beef9999 Mar 2, 2023
Author

Yes, it was the same fd for all rings. Each ring registered that same fd, and the file was opened only once.

I'm just curious why you don't want the page cache involved. Even when I run direct I/O reads on an SSD, my program can easily keep the device 100% busy, so I suspect the hardware would then be the bottleneck and no difference would show up.

isilence Mar 5, 2023
Collaborator

Because, depending on the I/O size, memory copies can be expensive, which makes the share of CPU time spent on refcounting negligibly small. E.g., if the app spends 95% of CPU time in memcpy(), refcounting takes only a fraction of the remaining 5%, probably less than 0.5% of the total, which is not really noticeable. It's usually not 95%, but it still shifts the numbers and also pollutes the caches.
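The arithmetic behind the 95% example can be written out as two tiny helpers. The 10% figure for refcounting's share of the non-memcpy time is a hypothetical value chosen to match the "less than 0.5%" estimate, and the speedup formula is a plain Amdahl-style upper bound, not something measured in this thread.

```c
#include <assert.h>

/* Share of total CPU time spent on fd refcounting, given the share spent in
 * memcpy() and the (hypothetical) fraction of the remaining time that goes to
 * refcounting. */
static double refcount_share(double memcpy_share, double refcount_frac_of_rest)
{
    assert(memcpy_share >= 0.0 && memcpy_share <= 1.0);
    assert(refcount_frac_of_rest >= 0.0 && refcount_frac_of_rest <= 1.0);
    return (1.0 - memcpy_share) * refcount_frac_of_rest;
}

/* Amdahl-style upper bound on the throughput ratio if that share of CPU time
 * were eliminated entirely. */
static double max_speedup(double removed_share)
{
    assert(removed_share >= 0.0 && removed_share < 1.0);
    return 1.0 / (1.0 - removed_share);
}
```

With these assumptions, `refcount_share(0.95, 0.10)` is 0.005 (0.5% of total CPU), so removing refcounting entirely could raise throughput by at most about 0.5%. If memcpy() took only 50%, the same 10% would be 5% of the total and allow up to roughly 5.3%.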

That said, it shouldn't make registered files slower than normal ones. How stable are the results? Do you have a repro we can run?

beef9999 · 2023-03-07T10:00:00Z
Author

@isilence Sorry, my fault. It seems I posted an incorrect benchmark. I re-ran the tests several times, and I'm now sure that register_files does help performance quite a bit, even with the page cache involved.

4 threads, 1 ring per thread, buffered reads on a small file, opened once and registered 4 times (once per ring)
register_files enabled: 5M qps
register_files disabled: 3.6M qps
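To put these aggregate numbers in per-I/O terms, a small sketch can convert qps into average time per read in each thread. It assumes all four threads were fully busy for the whole run and shared the load evenly, which the post does not state; `ns_per_read` and `throughput_gain` are hypothetical helper names.

```c
#include <assert.h>

/* Average time per read in one thread, assuming every thread was busy for the
 * whole run and the aggregate qps was split evenly across threads. */
static double ns_per_read(double total_qps, int nthreads)
{
    assert(total_qps > 0.0 && nthreads > 0);
    return 1e9 * nthreads / total_qps;
}

/* Relative throughput gain of one configuration over another
 * (0.25 means 25% more qps). */
static double throughput_gain(double with_qps, double without_qps)
{
    assert(with_qps > 0.0 && without_qps > 0.0);
    return with_qps / without_qps - 1.0;
}
```

Under these assumptions, `ns_per_read(5e6, 4)` is 800 ns and `ns_per_read(3.6e6, 4)` is about 1111 ns, so registered files save roughly 311 ns per read per thread, and `throughput_gain(5e6, 3.6e6)` is about 0.39, i.e. roughly 39% more qps.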
