Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QEMU soft-TLB improvements #150

Merged
merged 8 commits into from
Jun 28, 2021
Merged

QEMU soft-TLB improvements #150

merged 8 commits into from
Jun 28, 2021

Conversation

arichardson
Copy link
Member

This is just a rebase onto dev, I haven't reviewed the code yet.

@arichardson
Copy link
Member Author

Benchmarks for booting MFS_ROOT kernels:

RISC-V purecap kernel:

alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' -m 3
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):     20.976 s ±  0.159 s    [User: 20.590 s, System: 0.167 s]
  Range (min … max):   20.794 s … 21.085 s    3 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):     12.973 s ±  0.058 s    [User: 12.288 s, System: 0.111 s]
  Range (min … max):   12.921 s … 13.036 s    3 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.62 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-PURECAP-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'

RISC-V hybrid kernel:

alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb  '{qemu} -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'
Benchmark #1: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      4.585 s ±  0.026 s    [User: 4.118 s, System: 0.085 s]
  Range (min … max):    4.551 s …  4.627 s    10 runs

Benchmark #2: /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh
  Time (mean ± σ):      3.648 s ±  0.020 s    [User: 3.347 s, System: 0.089 s]
  Range (min … max):    3.623 s …  3.692 s    10 runs

Summary
  '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh' ran
    1.26 ± 0.01 times faster than '/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline -M virt -m 2048 -nographic -bios bbl-riscv64cheri-virt-fw_jump.bin -kernel /local/scratch/alr48/cheri/output/kernel-riscv64-purecap.CHERI-QEMU-MFS-ROOT -append init_path=/sbin/startup-benchmark.sh'

accel/tcg/cputlb.c Outdated Show resolved Hide resolved
accel/tcg/cputlb.c Outdated Show resolved Hide resolved
Copy link
Member

@nwf nwf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a first rough read through, this looks great with just some minor points. I'll merge this into my tree with #117 and see how that goes.

accel/tcg/cputlb.c Outdated Show resolved Hide resolved
* we let it "see around" the TLB capability load inhibit. That should
* perhaps change?
* The Mips CLoadTags, can "see around" the TLB capability load inhibit.
* That should perhaps change?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true only of MIPS? I don't think there's harm in making CLoadTags respect the tag-clearing or tag-load-trapping behaviors. It's used very rarely, and at present exclusively by the kernel through the dmap region which never has those attributes set (that is, the dmap always permits cap loads and stores).

target/cheri-common/cheri_tagmem.c Outdated Show resolved Hide resolved
@nwf
Copy link
Member

nwf commented Jun 17, 2021

Not that it's indicative of perfect functionality by any means, but I rebased #117 atop this and it passes the basic cheribsdtest-purecap smoke tests.

@arichardson
Copy link
Member Author

Benchmarks for booting to multi-user and shutting down:
purecap kernel + purecap userspace

alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 2
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     611.770 s ±  3.921 s    [User: 597.733 s, System: 12.143 s]
  Range (min … max):   608.997 s … 614.543 s    2 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     311.638 s ±  0.213 s    [User: 300.455 s, System: 2.044 s]
  Range (min … max):   311.488 s … 311.789 s    2 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.96 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'

hybrid kernel + purecap userspace

alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 2
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     208.867 s ±  2.144 s    [User: 194.562 s, System: 2.332 s]
  Range (min … max):   207.351 s … 210.382 s    2 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     141.067 s ±  0.650 s    [User: 121.470 s, System: 7.372 s]
  Range (min … max):   140.607 s … 141.526 s    2 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.48 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'

Copy link
Member Author

@arichardson arichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed some scripts/checkpatch.pl fixes on top of the original commit.

I think this looks good. We need a reasonable commit message for "Always use soft-TLB for tag reads/writes" and the v2r comment needs to be updated. Once that has been done I think we can merge it.

target/cheri-common/cheri_tagmem.c Outdated Show resolved Hide resolved
target/cheri-common/cheri_tagmem.c Show resolved Hide resolved
accel/tcg/cputlb.c Show resolved Hide resolved
target/cheri-common/cheri_tagmem.c Show resolved Hide resolved
Copy link
Member

@nwf nwf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't mean to be holding this up; if people are happy with it, it should land.

@arichardson
Copy link
Member Author

I am happy to merge this once CI passes. Have all your comments been addressed @jrtc27 @nwf?

@LawrenceEsswood
Copy link
Contributor

Addressed most of the comments. If somebody wants to clean up clang-format to not do silly things, I guess feel free. The only real issue is what whether we are happy for loadtags to do what I just changed it to do. Just to re-itterate: not only have I removed the ifdef for MIPS, I have also made it only be exceptional in the case the result of the load contains some tags, for the sake of consistency.

@arichardson
Copy link
Member Author

arichardson commented Jun 24, 2021

I can update the commit message before merging to include the performance numbers.

This will be used by Morello. Also slightly refactor getting / setting
multiple tags with atomicity in mind.
This significantly speeds up QEMU since a large fraction of the total
runtime is spent looking up the physical address for tag memory.
A purecap kernel+purecap userspace boot is almost twice as fast with
this change (310 instead of 610 seconds) and purecap userspace with
a hybrid kernel now takes 140 instead of 207 seconds.

purecap kernel + purecap userspace:
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 2
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     611.770 s ±  3.921 s    [User: 597.733 s, System: 12.143 s]
  Range (min … max):   608.997 s … 614.543 s    2 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     311.638 s ±  0.213 s    [User: 300.455 s, System: 2.044 s]
  Range (min … max):   311.488 s … 311.789 s    2 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.96 ± 0.01 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel.CHERI-PURECAP-QEMU/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```
hybrid kernel + purecap userspace:
```
alr48@waltham:/local/scratch/alr48/cheri/cheribsd(dev u=)> hyperfine -L qemu /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline,/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd {qemu} --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' -m 2
Benchmark #1: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     208.867 s ±  2.144 s    [User: 194.562 s, System: 2.332 s]
  Range (min … max):   207.351 s … 210.382 s    2 runs

Benchmark #2: /home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest
  Time (mean ± σ):     141.067 s ±  0.650 s    [User: 121.470 s, System: 7.372 s]
  Range (min … max):   140.607 s … 141.526 s    2 runs

Summary
  '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.softtlb --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest' ran
    1.48 ± 0.02 times faster than '/home/alr48/devel/cheribuild/test-scripts/run_cheribsd_tests.py --ssh-key /home/alr48/.ssh/insecure_id_ed25519.pub --architecture riscv64-purecap --kernel /local/scratch/alr48/cheri/output/rootfs-riscv64-purecap/boot/kernel/kernel --qemu-cmd /local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri.baseline --disk-image /local/scratch/alr48/cheri/output/cheribsd-riscv64-purecap.img --no-run-cheribsdtest'
```

TODO: write a more detailed commit message including performance numbers
target/riscv/cheri-archspecific-early.h Show resolved Hide resolved
target/mips/helper.c Outdated Show resolved Hide resolved
target/cheri-common/cheri_tagmem.c Outdated Show resolved Hide resolved
Copy link
Member

@jrtc27 jrtc27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge once comments are addressed

@arichardson arichardson merged commit 031dff6 into dev Jun 28, 2021
@arichardson arichardson deleted the soft-tlb-tags-dev branch June 28, 2021 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants