MicroVMs with QEMU can conflict in vsock port space on the host side #311

Open
vikanezrimaya opened this issue Dec 13, 2024 · 2 comments

@vikanezrimaya
Contributor

vikanezrimaya commented Dec 13, 2024

While implementing the QEMU version of the notify-over-vsock feature, I seem to have overlooked the fact that the vsock port space is shared between units, and unsharing it would require a network namespace. In contrast, it looks like cloud-hypervisor doesn't actually use vsock and somehow fakes it with a UNIX socket. QEMU cannot do that. Therefore we need to choose a unique port on the host side of the vsock for every single VM, and communicate that port via the systemd credential we pass to the VM.

Because of this, it is impossible to run more than one QEMU MicroVM with a notify socket at the same time.

@vikanezrimaya changed the title from "MicroVMs with QEMU can conflict in vsock address space" to "MicroVMs with QEMU can conflict in vsock port space on the host side" on Dec 13, 2024
@vikanezrimaya
Contributor Author

It feels like one could solve this problem by binding to port -1 (VMADDR_PORT_ANY) and then somehow divining the port number that was assigned. (Apparently vsock port numbers are 4 bytes long; wow, that's a lot of ports. Ports below 1024 are 'privileged', just like with ordinary networking.)

socat -d -d logs the result of getsockname on the listening socket. Parsing the port number out of there is, admittedly, a hack:

```sh
${pkgs.socat}/bin/socat -d -d VSOCK-LISTEN:-1,fork UNIX-SENDTO:$NOTIFY_SOCKET 2> $TMP/socat.log &
NOTIFY_VSOCK_PORT=$(
  ${pkgs.coreutils}/bin/tail -f $TMP/socat.log \
    | ${pkgs.gawk}/bin/awk '/listening on/ { if (match($0, /port:[0-9]*/)) { print substr($0, RSTART + 5, RLENGTH - 5); exit; } }'
)
```
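For comparison, here is a minimal sketch of the same parsing step that polls the log a bounded number of times instead of following it with `tail -f`. The function name `extract_vsock_port`, the attempt count, and the 0.1 s delay are all hypothetical. The awk pattern is the one from the snippet above, so this is just as much of a hack and depends on the same log wording.

```sh
# Print the port from the first "listening on ... port:N" line of a socat log.
# Polls the file instead of following it, and gives up after a fixed number
# of attempts so that the caller never blocks forever.
extract_vsock_port() {
    log_file=$1
    attempts=${2:-50}   # hypothetical default: 50 tries, about 0.1 s apart
    while [ "$attempts" -gt 0 ]; do
        port=$(awk '/listening on/ && match($0, /port:[0-9]+/) {
                        print substr($0, RSTART + 5, RLENGTH - 5); exit
                    }' "$log_file" 2>/dev/null)
        if [ -n "$port" ]; then
            printf '%s\n' "$port"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 0.1
    done
    return 1   # no matching line appeared in time
}
```

It would be called as `NOTIFY_VSOCK_PORT=$(extract_vsock_port "$TMP/socat.log") || exit 1`, which also makes the failure case explicit.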

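However, the QEMU command line is assembled with lib.escapeShellArgs, which quotes every argument. A $NOTIFY_VSOCK_PORT reference would therefore reach QEMU as a literal string instead of being expanded, so I cannot use this environment variable there.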

However, QEMU does not need to take literal arguments in fw_cfg: it can also take a path to a file, which I found quite useful before for passing through credentials from systemd units. Therefore, one only needs to somehow securely create a file and refer to it in QEMU... and we're back to square one, because filename collisions are also a thing, and I'd like to avoid them... unless we use relative paths and cd to a safe directory first?

It turns out we're in /var/lib/microvms/%i, which looks safe enough. In fact, some other parts of the script also place load-bearing files in this directory. Why not a few more auxiliary files? (eh, just make sure to clean them up, I guess.)
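A rough sketch of that idea: write the credential value to a file relative to the current directory and hand QEMU a `file=` reference instead of a literal string. The helper name `write_notify_credential`, the default file name `notify-vsock.cred`, the credential name `vmm.notify_socket`, and the `vsock-stream:2:PORT` value format are all assumptions made for illustration. They are not taken from the actual script.

```sh
# Write the notify address to a file in the current directory and print the
# matching value for QEMU's -fw_cfg option.
# Assumptions: credential name, value format, and file name are illustrative.
write_notify_credential() {
    port=$1
    cred_file=${2:-notify-vsock.cred}   # hypothetical relative file name
    # CID 2 is the well-known vsock address of the host.
    ( umask 077 && printf 'vsock-stream:2:%s' "$port" > "$cred_file" ) || return 1
    printf '%s\n' "name=opt/io.systemd.credentials/vmm.notify_socket,file=$cred_file"
}
```

Since the path is relative and the unit already runs in its own per-VM directory, the argument stays a constant string and survives shell escaping. The file would still need to be removed when the VM stops.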

Internally, VMADDR_PORT_ANY assigns ports by counting up from a random 32-bit starting point. The mechanism can fail if some program has explicitly bound (that is, not through VMADDR_PORT_ANY) more than 24 consecutive ports right after the last assigned one, but I'd consider the DoS potential extremely low (you wouldn't want to be running untrusted software directly on the hypervisor host anyway).
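To make the failure mode concrete, here is a toy model of that allocation in plain shell. It is not kernel code: it treats 24 as the number of attempts (an assumption based on the figure above; the exact threshold is whatever the kernel implements), and it ignores wraparound and privileged ports. `pick_any_port` and `MAX_RETRIES` are hypothetical names.

```sh
# Toy model of VMADDR_PORT_ANY allocation as described above.
# $1: port where the search starts; $2: space-separated list of bound ports.
MAX_RETRIES=24   # assumed retry limit, taken from the discussion above

pick_any_port() {
    next_port=$1
    taken=" $2 "
    i=0
    while [ "$i" -lt "$MAX_RETRIES" ]; do
        candidate=$((next_port + i))
        case $taken in
            *" $candidate "*) ;;                          # already bound, try the next one
            *) printf '%s\n' "$candidate"; return 0 ;;    # free port found
        esac
        i=$((i + 1))
    done
    return 1   # every candidate in the window was already bound
}
```

In this model the allocation only fails when the whole window of candidates after the starting point is occupied, which takes a deliberate run of explicit binds on the host.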

However, I don't seem to be able to get it working reliably for some reason. Sometimes, somehow, it just fails, and I suspect the problem is on socat's side. I'll send a draft PR in a few minutes, but socat's behavior might need to be debugged more precisely.

@astro
Owner

astro commented Dec 13, 2024

Thanks for sharing your findings!

PR is #313
