Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QEMU hangs when mem is exactly 2GB #171

Open
mutantmell opened this issue Dec 20, 2023 · 13 comments
Open

QEMU hangs when mem is exactly 2GB #171

mutantmell opened this issue Dec 20, 2023 · 13 comments

Comments

@mutantmell
Copy link

Hello,

I have a microvm that, microvm.mem is sufficiently large, will refuse to boot under qemu. This behavior only show after this commit.

The following (relatively minimal) configuration causes boot to hang:

{ pkgs, config, lib, ...}:
{
  microvm.shares = [{
    source = "/nix/store";
    mountPoint = "/nix/.ro-store";
    tag = "ro-store";
    proto = "virtiofs";
  }];

  microvm.mem = 2048;
  microvm.vcpu = 1;
}

the journal (journalctl -u [email protected]) for the vm ends with the following two lines:

Dec 20 23:30:12 host microvm@mid-mem[32008]: Memory KASLR using RDRAND RDTSC...
Dec 20 23:30:12 host microvm@mid-mem[32008]: Poking KASLR using RDRAND RDTSC...

The following config, however, boots as expected:

{ pkgs, config, lib, ...}:
{
  microvm.shares = [{
    source = "/nix/store";
    mountPoint = "/nix/.ro-store";
    tag = "ro-store";
    proto = "virtiofs";
  }];
  microvm.mem = 1024;
  microvm.vcpu = 1;
}
@oddlama
Copy link
Contributor

oddlama commented Dec 22, 2023

Can confirm, got the exact same problem here... No idea how to debug this

@astro
Copy link
Owner

astro commented Dec 22, 2023

I fail to reproduce this. qemu boots for me.

Off the top of my hat:

  • Does /var/tmp have at least 2g of free space? (Is that directory even used with memfd?)
  • Play with our use of the qemu parameter -object memory-backend-memfd,...
  • Start qemu without -sandbox on?
  • strace -f

@oddlama
Copy link
Contributor

oddlama commented Dec 22, 2023

Thanks, not sure about OP but I had 32G free space on /tmp. I'll try the others suggestions in the coming days :)

@oddlama
Copy link
Contributor

oddlama commented Dec 28, 2023

TL;DR: Adding acpi=off fixes the issue. Not yet sure why, will debug later.

Neither sandbox nor memory-backend seem to be the issue. Using strace is impossible since it's not directly related to a call on the hypervisor, the issue seems to be inside of the guest.

Setting loglevel=7 shows that the last print call has to do with ACPI initialization:

Dec 28 01:38:54 ward microvm@samba[950331]: Poking KASLR using RDRAND RDTSC...
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.236077] ftrace: allocating 37517 entries in 147 pages
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.245250] ftrace: allocated 147 pages with 4 groups
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.247023] Dynamic Preempt: voluntary
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.247616] rcu: Preemptible hierarchical RCU implementation.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.248293] rcu:         RCU event tracing is enabled.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.248831] rcu:         RCU restricting CPUs from NR_CPUS=384 to nr_cpu_ids=1.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.249590]         Trampoline variant of Tasks RCU enabled.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.250165]         Rude variant of Tasks RCU enabled.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.250700]         Tracing variant of Tasks RCU enabled.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.251225] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.252104] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.258410] NR_IRQS: 24832, nr_irqs: 256, preallocated irqs: 0
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.259202] rcu: srcu_init: Setting srcu_struct sizes based on contention.
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.260152] Console: colour *CGA 80x25
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.260662] printk: console [ttyS0] enabled
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.260662] printk: console [ttyS0] enabled
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.261622] printk: bootconsole [earlyser0] disabled
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.261622] printk: bootconsole [earlyser0] disabled
Dec 28 01:38:54 ward microvm@samba[950331]: [    0.262812] ACPI: Core revision 20220331
[... hangs]
Dec 28 01:41:48 ward systemd[1]: Stopping MicroVM 'samba'...

So naturally I tried disabling acpi first via acpi=off and that indeed fixes the issue, or more likely circumvents it. Now the boot continues normally. I had a quick look at it in gdb - not sure what to make of it yet - but included for future reference:

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x000000000000fff0 in exception_stacks ()
(gdb) nt
Undefined command: "nt".  Try "help".
(gdb) bt
#0  0x000000000000fff0 in exception_stacks ()
#1  0x0000000000000000 in ?? ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
acpi_tb_checksum (length=<optimized out>, 
    buffer=0xffffc900400aa0da <error: Cannot access memory at address 0xffffc900400aa0da>)
    at ../drivers/acpi/acpica/tbprint.c:206
206	../drivers/acpi/acpica/tbprint.c: No such file or directory.
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
acpi_tb_checksum (length=<optimized out>, 
    buffer=0xffffc900400dc17f <error: Cannot access memory at address 0xffffc900400dc17f>)
    at ../drivers/acpi/acpica/tbprint.c:206
206	in ../drivers/acpi/acpica/tbprint.c
(gdb) bt
#0  acpi_tb_checksum (length=<optimized out>, 
    buffer=0xffffc900400dc17f <error: Cannot access memory at address 0xffffc900400dc17f>)
    at ../drivers/acpi/acpica/tbprint.c:206
#1  acpi_tb_verify_checksum (table=0xffffc90040000280, length=<optimized out>) at ../drivers/acpi/acpica/tbprint.c:169
#2  0xffffffff815bb2d3 in acpi_tb_verify_temp_table (
    table_desc=table_desc@entry=0xffffffff82d077a0 <initial_tables+32>, 
    signature=signature@entry=0x0 <fixed_percpu_data>, table_index=table_index@entry=0xffffffff82603ee4)
    at ../drivers/acpi/acpica/tbdata.c:524
#3  0xffffffff82c73137 in acpi_reallocate_root_table () at ../drivers/acpi/acpica/tbxface.c:179
#4  0xffffffff82c711c4 in acpi_early_init () at ../drivers/acpi/bus.c:1203
#5  0xffffffff82c2b421 in start_kernel () at ../init/main.c:1099
#6  0xffffffff8100015a in secondary_startup_64 () at ../arch/x86/kernel/head_64.S:358
#7  0x0000000000000000 in ?? ()

@oddlama
Copy link
Contributor

oddlama commented Dec 28, 2023

In case anyone is interested, what I could find in gdb is that the DSDT acpi table that is loaded from physical memory (i.e. provided by the hypervisor) is corrupt. At this point I have no idea how this is tied to memory amount.

details It hangs because the kernel accesses unmapped memory when calculating the acpi table checksum. The second initial acpi table is corrupted (look at the huge table length, and weird ascii name):

image
image

  • This corrupt table is the DSDT installed in acpi_tb_parse_fadt
  • More specifically it is loaded here from physical memory, which means it was provided like that by the hypervisor. So a bug in qemu (or possibly a misuse of some option).

image

@astro
Copy link
Owner

astro commented Dec 28, 2023

@oddlama Thank you for investigating! Are there any observed downsides to disabling ACPI?

astro added a commit that referenced this issue Dec 28, 2023
Fixes Github issue #171
@oddlama
Copy link
Contributor

oddlama commented Dec 28, 2023

I didn't notice anything on this VM, but I'm not sure. It might affect ACPI shutdown and other host to guest signaling. Maybe issues when passing through physical devices. Generally I'd say it would be a bad idea to do acpi=off. Since this seems to be something related to qemu directly, I'll see if I can raise an issue over there.

@mutantmell mutantmell changed the title QEMU hangs when mem is sufficiently high QEMU hangs when mem is exactly 2GB Dec 30, 2023
@mutantmell
Copy link
Author

mutantmell commented Dec 30, 2023

I found a couple other oddities:

  1. If I change the protocol of the nix store share, the VM boots just fine -- no change to ACPI, sandbox or memory-backend needed.
  2. If I increase the amount of memory to 3GM, the VM boots just fine -- no change to ACPI, sandbox or memory-backend needed. (this indirectly proves that it's not an issue with /tmp not having enough space)
  3. If I let the 2GB sit for a long time, it eventually outputs an error with the nix store mount

(Here's one such error log)

Dec 19 21:43:05 vmhost microvm@guest: Memory KASLR using RDRAND RDTSC...
Dec 19 21:43:06 vmhost microvm@guest: Poking KASLR using RDRAND RDTSC...
Dec 19 21:46:10 vmhost microvm@guest: [  185.197307] PCI: Fatal: No config space access function found
Dec 19 21:46:10 vmhost microvm@guest: [  185.199748] ACPI Error: AE_NO_ACPI_TABLES, While loading namespace from ACPI tables (20220331/tbxfload-68)
Dec 19 21:46:11 vmhost microvm@guest: [1B blob data]
Dec 19 21:46:11 vmhost microvm@guest: <<< NixOS Stage 1 >>>
Dec 19 21:46:11 vmhost microvm@guest:
Dec 19 21:46:11 vmhost microvm@guest: loading module virtio_mmio...
Dec 19 21:46:11 vmhost microvm@guest: loading module virtio_pci...
Dec 19 21:46:11 vmhost microvm@guest: loading module virtio_blk...
Dec 19 21:46:11 vmhost microvm@guest: loading module 9pnet_virtio...
Dec 19 21:46:11 vmhost microvm@guest: loading module 9p...
Dec 19 21:46:11 vmhost microvm@guest: loading module virtiofs...
Dec 19 21:46:11 vmhost microvm@guest: loading module dm_mod...
Dec 19 21:46:11 vmhost microvm@guest: running udev...
Dec 19 21:46:11 vmhost microvm@guest: Starting systemd-udevd version 254.6
Dec 19 21:46:11 vmhost microvm@guest: kbd_mode: KDSKBMODE: Inappropriate ioctl for device
Dec 19 21:46:11 vmhost microvm@guest: [36B blob data]
Dec 19 21:46:11 vmhost microvm@guest: mounting rootfs on /...
Dec 19 21:46:11 vmhost microvm@guest: mounting ro-store on /nix/.ro-store...
Dec 19 21:46:11 vmhost microvm@guest: mount: mounting ro-store on /mnt-root/nix/.ro-store failed: Invalid argument
Dec 19 21:46:11 vmhost microvm@guest:
Dec 19 21:46:11 vmhost microvm@guest: An error occurred in stage 1 of the boot process, which must mount the
Dec 19 21:46:11 vmhost microvm@guest: root filesystem on `/mnt-root' and then start stage 2.  Press one
Dec 19 21:46:11 vmhost microvm@guest: of the following keys:
Dec 19 21:46:11 vmhost microvm@guest:
Dec 19 21:46:11 vmhost microvm@guest:   r) to reboot immediately
Dec 19 21:46:11 vmhost microvm@guest:   *) to ignore the error and continue

(That's an older error that I haven't reproduced recently, but I still get errors like [FAILED] Failed to mount /sysroot/nix/.ro-store)

This is such an obscure/bizarre error that I'm inclined to agree with @oddlama that this is a strange edge-case in the interplay between qemu and virtiofs. I'll keep digging as I have time.

@mutantmell
Copy link
Author

Thank you for investigating! Are there any observed downsides to disabling ACPI?

Looks like disabling ACPI breaks one of my VMs:

qemu-system-x86_64: -device virtio-rng-pci: No 'PCI' bus found for device 'virtio-rng-pci'

Pinning microvm.nix to rev=17e7f0682378e77e0ed0ab5796260bd3beb9d513 does not show this behavior.

@astro
Copy link
Owner

astro commented Jan 1, 2024

I reverted disabling ACPI due to the PCI bus becoming unavailable which seems more severe than this problem. (#178)

@xokdvium
Copy link

I'm also able to reproduce this bug with a similar configuration with exactly 2048M of RAM. Boot log with loglevel=7 confirms that this is the exact same problem with ACPI:

[    0.090043] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.090791] Fallback order for Node 0: 0
[    0.090794] Built 1 zonelists, mobility grouping on.  Total pages: 515839
[    0.091844] Policy zone: DMA32
[    0.092260] mem auto-init: stack:all(zero), heap alloc:on, heap free:off
[    0.099534] Memory: 1996068K/2096756K available (16384K kernel code, 2366K rwdata, 10840K rodata, 3116K init, 4492K bss, 100428K reserved, 0K cma-reserved)
[    0.101575] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Poking KASLR using RDRAND RDTSC...
[    0.102347] ftrace: allocating 42830 entries in 168 pages
[    0.109310] ftrace: allocated 168 pages with 3 groups
[    0.110373] Dynamic Preempt: voluntary
[    0.110773] rcu: Preemptible hierarchical RCU implementation.
[    0.111171] rcu:     RCU event tracing is enabled.
[    0.111486] rcu:     RCU restricting CPUs from NR_CPUS=384 to nr_cpu_ids=2.
[    0.111953]  Trampoline variant of Tasks RCU enabled.
[    0.112304]  Rude variant of Tasks RCU enabled.
[    0.112620]  Tracing variant of Tasks RCU enabled.
[    0.112951] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[    0.113498] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.115989] NR_IRQS: 24832, nr_irqs: 512, preallocated irqs: 0
[    0.116459] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.117016] kfence: initialized - using 2097152 bytes for 255 objects at 0x(____ptrval____)-0x(____ptrval____)
[    0.117832] Console: colour *CGA 80x25
[    0.118113] printk: console [ttyS0] enabled
[    0.118113] printk: console [ttyS0] enabled
[    0.118709] printk: bootconsole [earlyser0] disabled
[    0.118709] printk: bootconsole [earlyser0] disabled
[    0.119427] ACPI: Core revision 20230628

astro added a commit that referenced this issue Oct 12, 2024
@MagicRB
Copy link

MagicRB commented Dec 24, 2024

I let it sit for 25 minutes and it did eventually boot after spinning at 100% cpu usage:

Physical KASLR using RDRAND RDTSC...
Virtual KASLR using RDRAND RDTSC...

Decompressing Linux... No EFI environment detected.
Parsing ELF... Performing relocations... done.
Booting the kernel (entry_offset: 0x0000000000000080).
[    0.000000] Linux version 6.6.63 (nixbld@localhost) (gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.43.1) #1-NixOS SMP PREEMPT_DYNAMIC Fri Nov 22 14:38:37 UTC 2024
[    0.000000] Command line: earlyprintk=ttyS0 console=ttyS0 reboot=t panic=-1 root=fstab loglevel=4 init=/nix/store/khcm6pkhcfdf4p8zy7sym4q08gfsz2yd-nixos-system-mullvad-24.11pre-git/init regInfo=/nix/>Dec 24 12:46:58 blowhole microvm@mullvad[3487722]: [    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007fffefff] usable
[    0.000000] BIOS-e820: [mem 0x000000007ffff000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] printk: bootconsole [earlyser0] enabled
Memory KASLR using RDRAND RDTSC...
Poking KASLR using RDRAND RDTSC...
[ 1521.925938] PM: Unable to read current time from RTC
[ 1521.933718] PCI: Fatal: No config space access function found
[ 1521.937219] ACPI Error: AE_NO_ACPI_TABLES, While loading namespace from ACPI tables (20230628/tbxfload-68)
[10B blob data]
Welcome to NixOS 24.11 (Vicuna) (Initrd)!

@MagicRB
Copy link

MagicRB commented Dec 24, 2024

Ah it failed later:

[FAILED] Failed to mount /sysroot/nix/.ro-store.                     
See 'systemctl status "sysroot-nix-.ro\\x2dstore.mount"' for details.
[65B blob data]                                                      
[65B blob data]                                                      
[68B blob data]                                                      
[66B blob data]                                                      

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants