ppc64le: Tests failed (2): <no-mmap-inval.t> <reg-fd-only.t> #1207

Open
vt-alt opened this issue Aug 17, 2024 · 19 comments

@vt-alt

vt-alt commented Aug 17, 2024

JFYI, two tests fail on ppc64 on Linux v6.6.46 for liburing-2.7. They pass or are skipped on x86_64.

+ make runtests
...
[00:01:36] Running test no-mmap-inval.t                                        Got -2, wanted -EFAULT
[00:01:36] Test no-mmap-inval.t failed with ret 1
...
[00:01:49] Running test reg-fd-only.t                                          ring setup failed
[00:01:49] test 8 failed
[00:01:49] Test reg-fd-only.t failed with ret 1

On x86_64:

[00:01:22] Running test no-mmap-inval.t                                        0 sec
...
[00:01:29] Running test reg-fd-only.t                                          Enable huge pages to test big rings
[00:01:29] Skipped
@vt-alt
Author

vt-alt commented Aug 19, 2024

Also, the same version suddenly failed a test on aarch64:

[00:00:40] Running test accept-non-empty.t                                     Test accept-non-empty.t failed with ret 78
...
[00:02:40] Tests failed (1): <accept-non-empty.t>

axboe added a commit that referenced this issue Aug 19, 2024
Test for IORING_FEAT_RECVSEND_BUNDLE before setting up the connection
thread, or we could be tearing down pthread data at inopportune moments
leading to odd behavior.

Link: #1207
Fixes: 184e6ec ("test/accept-non-empty: add accept IORING_CQE_F_SOCK_NONEMPTY test")
Signed-off-by: Jens Axboe <[email protected]>
@axboe
Owner

axboe commented Aug 19, 2024

Pushed a fix for accept-non-empty; that was a bug in the test.

For ppc64, the -ENOENT for no-mmap-inval is very (very) odd. For reg-fd-only, I'll push a commit to dump 'ret'. Can you try and re-run it? I'm wondering what it's returning. Maybe both are the same arch oddity and it'll be -ENOENT?!
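The tests print raw negative return values, following the io_uring convention of returning -errno on failure. On Linux, -2 is -ENOENT, -90 is -EMSGSIZE and -105 is -ENOBUFS, which covers every code reported in this thread. A minimal sketch of decoding such values with the standard strerror(); the helper name describe_ret is hypothetical and not part of liburing:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn an io_uring-style result (>= 0 on success,
 * -errno on failure) into text such as "-2 (No such file or directory)".
 */
static void describe_ret(int ret, char *buf, size_t len)
{
	if (ret >= 0)
		snprintf(buf, len, "%d (ok)", ret);
	else
		/* negate before decoding: the value is -errno */
		snprintf(buf, len, "%d (%s)", ret, strerror(-ret));
}
```

Calling describe_ret(-2, buf, sizeof(buf)) in the no-mmap-inval case would identify the result as ENOENT rather than the EFAULT the test expects.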

axboe added a commit that referenced this issue Aug 19, 2024
@vt-alt
Author

vt-alt commented Aug 19, 2024

Thanks. Now with ebd6c8f, tests fail only on ppc64le (and on i586, where I usually exclude sqpoll-sleep.t). The kernel has also changed to v6.6.47.

  • ppc64le:
[00:01:36] Running test no-mmap-inval.t                                        Got -2, wanted -EFAULT
[00:01:36] Test no-mmap-inval.t failed with ret 1
...
[00:01:50] Running test reg-fd-only.t                                          ring setup failed: -2
[00:01:50] test 8 failed
[00:01:50] Test reg-fd-only.t failed with ret 1
...
[00:01:52] Running test send-zerocopy.t                                        invalid cqe->res -90 expected 65536
[00:01:52] send failed fixed buf 0, conn 0, addr 1, cork 0
[00:01:52] test_inet_send() failed (defer_taskrun 0)
[00:01:52] Test send-zerocopy.t failed with ret 1
...
[00:02:12] Tests failed (3): <no-mmap-inval.t> <reg-fd-only.t> <send-zerocopy.t>
  • i586:
[00:02:01] Running test sqpoll-sleep.t                                         Test sqpoll-sleep.t failed with ret 1

@isilence
Collaborator

isilence commented Aug 19, 2024

> Thanks. Now with ebd6c8f, tests fail only on ppc64le (and on i586, where I usually exclude sqpoll-sleep.t). The kernel has also changed to v6.6.47.
>
> * ppc64le:
>
> ...
>
> [00:01:52] Running test send-zerocopy.t invalid cqe->res -90 expected 65536
> [00:01:52] send failed fixed buf 0, conn 0, addr 1, cork 0

It's UDP, for which we "expect" 65536 bytes in a single datagram, more than UDP usually supports. Looks like the test wasn't prepared for 16K pages.

if (!tcp && len > 4 * page_sz)
	continue; // skip test
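The -90 in the log is -EMSGSIZE. As a complement to the page-size check above, here is a sketch of the arithmetic behind "more than UDP usually supports": the 16-bit IPv4 total-length field covers the IP header (20 bytes without options), the 8-byte UDP header and the payload, while the 16-bit IPv6 payload-length field covers only the UDP header and payload (ignoring jumbograms). The macro and helper names are hypothetical, not taken from the liburing test:

```c
#include <stdbool.h>
#include <stddef.h>

#define IPV4_MIN_HDR	20	/* IPv4 header without options */
#define UDP_HDR		8	/* UDP header */
#define MAX_16BIT_LEN	65535	/* largest value of a 16-bit length field */

/* Largest UDP payload that fits in one datagram (no IPv6 jumbograms). */
static size_t max_udp_payload(bool ipv6)
{
	if (ipv6)
		return MAX_16BIT_LEN - UDP_HDR;			/* 65527 */
	return MAX_16BIT_LEN - IPV4_MIN_HDR - UDP_HDR;		/* 65507 */
}

/* Hypothetical helper: true if a UDP send of 'len' bytes cannot fit. */
static bool udp_len_too_big(size_t len, bool ipv6)
{
	return len > max_udp_payload(ipv6);
}
```

With either address family, a 65536-byte datagram exceeds the limit, so a UDP variant of the test would need to skip that size or cap the length.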

@vt-alt
Author

vt-alt commented Oct 31, 2024

For 2.8, the test failures on ppc64le on Linux 6.11.5 are:

[00:01:39] Running test no-mmap-inval.t                                        Got -2, wanted -EFAULT
[00:01:39] Test no-mmap-inval.t failed with ret 1

[00:02:01] Running test reg-fd-only.t                                          ring setup failed: -2
[00:02:01] test 8 failed
[00:02:01] Test reg-fd-only.t failed with ret 1

[00:02:03] Running test recvsend_bundle.t                                      failed recv cqe: -105
[00:02:03] test d failed
[00:02:03] TCP test case (classic=0) failed
[00:02:03] Test recvsend_bundle.t failed with ret 1

[00:03:11] Running test timeout.t                                              child failed 0
[00:03:11] test_timeout_link_cancel failed
[00:03:11] Test timeout.t failed with ret 1

[00:03:17] Tests failed (4): <no-mmap-inval.t> <reg-fd-only.t> <recvsend_bundle.t> <timeout.t>

Additionally, one test fails on i586:

[00:02:32] Running test sqpoll-sleep.t                                         Test sqpoll-sleep.t failed with ret 1

[00:02:49] Tests failed (1): <sqpoll-sleep.t>

Temporary build logs:
https://git.altlinux.org/tasks/361211/build/100/ppc64le/log
https://git.altlinux.org/tasks/361211/build/100/i586/log

On x86_64 and aarch64, with the same build environment and the same kernel version, the tests do not fail.

@vt-alt
Author

vt-alt commented Oct 31, 2024

I additionally tested on 6.12-rc5, and the lists of failed tests are almost the same as on 6.11.5; only recvsend_bundle-inc.t additionally fails on ppc64le.

[00:03:44] Test run complete, kernel: 6.12.0-6.12-alt0.rc5 #1 SMP Sun Oct 27 23:47:43 UTC 2024
[00:03:44] Tests failed (5): <no-mmap-inval.t> <reg-fd-only.t> <recvsend_bundle.t> <recvsend_bundle-inc.t> <timeout.t>
[00:03:15] Test run complete, kernel: 6.12.0-6.12-alt0.rc5 #1 SMP PREEMPT_DYNAMIC Sun Oct 27 23:46:51 UTC 2024
[00:03:15] Tests failed (1): <sqpoll-sleep.t>

Temporary build logs:
https://git.altlinux.org/tasks/361212/build/100/ppc64le/log
https://git.altlinux.org/tasks/361212/build/100/i586/log

@axboe
Owner

axboe commented Oct 31, 2024

Thanks for running these. I'll check x86, but I don't have any powerpc to test on... Oh, maybe this is a page size thing. What page size is your ppc box running?

@vt-alt
Author

vt-alt commented Oct 31, 2024

$ getconf PAGESIZE
65536
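getconf PAGESIZE reports the same value that sysconf(_SC_PAGESIZE) returns, so 65536 means 64 KiB pages on this box versus the 4 KiB typical on x86_64. A small sketch of how test code could size buffers from the runtime page size instead of assuming 4 KiB; the helpers page_size and round_up_to_page are hypothetical:

```c
#include <stddef.h>
#include <unistd.h>

/* Runtime page size: typically 4096 on x86_64, 65536 on this ppc64le box. */
static size_t page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	return ps > 0 ? (size_t)ps : 4096;	/* fallback if sysconf fails */
}

/* Hypothetical helper: round 'len' up to a whole number of pages. */
static size_t round_up_to_page(size_t len)
{
	size_t ps = page_size();

	return (len + ps - 1) / ps * ps;
}
```

For example, round_up_to_page(1) is 65536 here, so anything allocated "one page at a time" is 16 times larger than on a 4 KiB-page system.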

axboe added a commit that referenced this issue Oct 31, 2024
100 usec / 1 ms is a bit short, make it 100x larger in the hopes that
it'll fix the test case on some boxes.

Link: #1207
Signed-off-by: Jens Axboe <[email protected]>
@axboe
Owner

axboe commented Oct 31, 2024

Can you test sqpoll-sleep after the commit I just made?

@axboe
Owner

axboe commented Oct 31, 2024

Can you try and strace no-mmap-inval on ppc and attach it here? It should be using page-size-dependent code already.

axboe added a commit that referenced this issue Oct 31, 2024
This is most likely because there are no huge pages available, so just
skip the test in that case.

Link: #1207
Signed-off-by: Jens Axboe <[email protected]>
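A sketch of the "skip instead of fail" pattern this commit describes, assuming a test that needs a huge-page-backed buffer: probe with mmap(MAP_HUGETLB) and treat the errors that mean "no huge pages here" as a skip. ENOMEM is the usual result when no huge pages are reserved, and the strace later in this thread shows the ppc64le box returning ENOENT for such an mmap. The exit code 77 follows the automake "skipped" convention and is an assumption here, as are all helper names:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#define EXIT_SKIP 77	/* assumed "test skipped" exit code */

/* Errors treated as "huge pages unavailable" rather than a real failure. */
static bool hugetlb_unavailable(int err)
{
	return err == ENOMEM || err == ENOENT;
}

/*
 * Hypothetical probe: try a huge-page mapping of 'len' bytes (e.g. 2 MiB).
 * Returns 0 if it works, EXIT_SKIP if huge pages are unavailable, 1 otherwise.
 */
static int probe_hugepages(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED)
		return hugetlb_unavailable(errno) ? EXIT_SKIP : 1;
	munmap(p, len);
	return 0;
}
```

A test would call probe_hugepages(2 * 1024 * 1024) up front and return its result directly when it is nonzero, so missing huge pages show up as "Skipped" instead of a failure.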
axboe added a commit that referenced this issue Oct 31, 2024
We should loop until WIFEXITED() is true.

Link: #1207
Signed-off-by: Jens Axboe <[email protected]>
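A minimal sketch of waiting for a child the way this commit message describes, looping until WIFEXITED() is true. It also stops on WIFSIGNALED() so a killed child cannot make the loop spin forever, and retries waitpid() when it is interrupted (EINTR). The helper name wait_for_exit is hypothetical; this is not the code from test/timeout.c:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Wait for 'pid' and return its exit status, or -1 if it was killed by a
 * signal or could not be waited for.
 */
static int wait_for_exit(pid_t pid)
{
	int status;

	for (;;) {
		if (waitpid(pid, &status, 0) < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;		/* e.g. ECHILD */
		}
		if (WIFEXITED(status))
			return WEXITSTATUS(status);
		if (WIFSIGNALED(status))
			return -1;		/* child died from a signal */
		/* otherwise (e.g. a stopped, traced child): keep waiting */
	}
}
```

The parent then compares the returned exit status with what the child was expected to report, instead of inspecting a status word that may not describe a normal exit.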
@axboe
Owner

axboe commented Oct 31, 2024

Pushed some fixes, hopefully fixing some of them.

@vt-alt
Author

vt-alt commented Oct 31, 2024

Thanks. With the updates applied, up to 59c0cb3 (2024-10-30, "test/timeout: properly loop around waitpid() status"), i586 still has a failure:

[00:04:23] Tests failed (1): <sqpoll-sleep.t>

ppc64le:

[00:03:52] Tests failed (3): <no-mmap-inval.t> <recvsend_bundle.t> <recvsend_bundle-inc.t>
strace -v -f test/no-mmap-inval.t
[00:00:24] #1 SMP Sun Oct 2++ strace -v -f test/no-mmap-inval.t
[00:00:25] execve("test/no-mmap-inval.t", ["test/no-mmap-inval.t"], ["RPM_PYTHON_COMPILE_INCLUDE=/usr/"..., "RPM_FIXUP_TOPDIR=", "RPM_PYTHON3_SELF_PROV_PATH=", "RPM_SOURCE_DIR=/usr/src/RPM/SOUR"..., "RPM_PKG_CONTENTS_INDEX_BIN=/.hos"..., "RPM_PYTHON_LIB_PATH=", "RPM_PYTHON3_VERSION=unknown", "G_BROKEN_FILENAMES=1", "HISTSIZE=999", "HOSTNAME=localhost.localdomain", "RPM_DEBUGINFO_STRIPPED_TERMINATE"..., "RPM_PYTHON3_SITELIBDIR=/usr/lib6"..., "RPM_PYTHON=/usr/bin/python2.7", "RPM_PYTHON3_COMPILE_EXCLUDE=/usr"..., "RPM_PYTHON_COMPILE_DEEP=20", "RPM_PYTHON3_SITELIBDIR_NOARCH=/u"..., "RPM_PYTHON_COMPILE_SKIP_X=1", "RPM_PYTHON_COMPILE_EXCLUDE=/usr/"..., "RPM_LIB=lib64", "RPM_FIXUP_METHOD=binconfig pkgco"..., "RPM_LD_PRELOAD_python=/usr/lib64"..., "PWD=/usr/src/RPM/BUILD/liburing-"..., "RPM_CLEANUP_TOPDIR=", "SOURCE_DATE_EPOCH=1730337221", "LOGNAME=root", "RPM_PYTHON3_COMPILE_SKIP_X=1", "RPM_VERIFY_ELF_METHOD=strict", "RPM_FILES_TO_LD_PRELOAD_python= "..., "RPM_CLEANUP_SKIPLIST=", "RPM_ARCH=ppc64le", "RPM_PYTHON3_COMPILE_CLEAN=1", "RPM_DATADIR=/usr/share", "HOME=/usr/src", "RPM_FILES_TO_LD_PRELOAD_python3="..., "PERL_USE_UNSAFE_INC=1", "RPM_PERL_REQ_METHOD=normal", "RPM_PYTHON_REQ_METHOD=slight", "RPM_TARGET_ARCH=ppc64le", "RPM_PYTHON3=/usr/bin/python3", "RPM_FINDPROV_LIB_PATH=", "RPM_COMPRESS_TOPDIR=/usr", "TMPDIR=/tmp", "RPM_VERIFY_ELF_TOPDIR=", "RPM_CHECK_CONTENTS_METHOD=defaul"..., "RPM_FINDPROV_TOPDIR=", "RPM_PACKAGE_RELEASE=alt1.test.2", "RPM_DEBUGINFO_SKIPLIST=", "RPM_PYTHON_COMPILE_CLEAN=1", "RPM_OS=linux", "RPM_VERIFY_ELF_SKIPLIST=", "MAKEFLAGS=-w -O PAM_SO_SUFFIX=", "RPM_CHECK_CONTENTS_SKIPLIST=", "TERM=dumb", "RPM_PYTHON3_REQ_METHOD=slight", "USER=root", "RPM_PYTHON3_LIBDIR=/usr/lib64/py"..., "RPM_PYTHON3_PATH=/usr/lib64/pyth"..., "PAM_SO_SUFFIX=", "RPM_TARGET_OS=linux", "SHLVL=3", "RPM_BUILD_DIR=/usr/src/RPM/BUILD", "RPM_FINDREQ_TOPDIR=", "SCRIPT=/usr/src/tmp/vm.yp4UP1yb8"..., "RPM_FIXUP_SKIPLIST=", "PAM_NAME_SUFFIX=", "RPM_PYTHON2_PATH=/usr/lib64/pyth"..., 
"RPM_PYTHON3_COMPILE_DEEP=20", "RPM_OPT_FLAGS=-pipe -frecord-gcc"..., "RPM_PYTHON3_REQ_HIER=yes", "RPM_FINDREQ_SKIPLIST=/usr/share/"..., "RPM_PYTHON3_IMPORT_PATH=", "RPM_DOC_DIR=/usr/share/doc", "RPM_VERIFY_INFO_METHOD=normal", "RPM_PACKAGE_VERSION=2.8", "RPM_STRICT_INTERDEPS=sisyphus.36"..., "RPM_PYTHON3_COMPILE_INCLUDE=/usr"..., "RPM_PYTHON_MODULE_DECLARED=", "RPM_PYTHON_REQ_SKIP=", "RPM_LIBDIR=/usr/lib64", "RPM_LD_PRELOAD_python3=/usr/lib6"..., "RPM_PYTHON_COMPILE_METHOD=ALL", "RPM_PYTHON3_REQ_SKIP=", "PATH=/usr/src/bin:/usr/bin:/bin:"..., "RPM_CHECK_CONTENTS_TOPDIR=", "HISTFILESIZE=9999", "MAIL=/var/mail/builder", "RPM_FINDPROV_SKIPLIST=/usr/share"..., "RPM_FINDPACKAGE_PATH=", "RPM_COMPRESS_SKIPLIST=", "RPM_PACKAGE_NAME=liburing", "RPM_CLEANUP_METHOD=auto", "RPM_BUILD_ROOT=/usr/src/tmp/libu"..., "OLDPWD=/usr/src/RPM/BUILD/liburi"..., "RPM_COMPRESS_METHOD=auto", "RPM_PYTHON3_LIB_PATH=", "_=/usr/bin/strace"]) = 0
[00:00:25] brk(NULL)                               = 0x10016f70000
[00:00:25] access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
[00:00:25] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
[00:00:25] newfstatat(3, "", {st_dev=makedev(0, 0x14), st_ino=53631, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1031, st_gid=1031, st_blksize=196608, st_blocks=128, st_size=8599, st_atime=1730337271 /* 2024-10-31T01:14:31.422995818+0000 */, st_atime_nsec=422995818, st_mtime=1730337271 /* 2024-10-31T01:14:31.421995880+0000 */, st_mtime_nsec=421995880, st_ctime=1730337271 /* 2024-10-31T01:14:31.421995880+0000 */, st_ctime_nsec=421995880}, AT_EMPTY_PATH) = 0
[00:00:25] mmap(NULL, 8599, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fffaf200000
[00:00:25] close(3)                                = 0
[00:00:25] openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
[00:00:25] read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\25\0\1\0\0\0\240\243\2\0\0\0\0\0"..., 832) = 832
[00:00:25] newfstatat(3, "", {st_dev=makedev(0, 0x14), st_ino=2969, st_mode=S_IFREG|0755, st_nlink=1, st_uid=1031, st_gid=1031, st_blksize=196608, st_blocks=4864, st_size=2439000, st_atime=1730337245 /* 2024-10-31T01:14:05.859595745+0000 */, st_atime_nsec=859595745, st_mtime=1714366800 /* 2024-04-29T05:00:00+0000 */, st_mtime_nsec=0, st_ctime=1730337243 /* 2024-10-31T01:14:03.321754580+0000 */, st_ctime_nsec=321754580}, AT_EMPTY_PATH) = 0
[00:00:25] mmap(NULL, 2482960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fffaefa0000
[00:00:25] mmap(0x7fffaf1e0000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x240000) = 0x7fffaf1e0000
[00:00:25] close(3)                                = 0
[00:00:25] set_tid_address(0x7fffaf2a2e10)         = 176
[00:00:25] set_robust_list(0x7fffaf2a2e20, 24)     = 0
[00:00:25] rseq(0x7fffaf2a3460, 0x20, 0, 0xfe5000b) = 0
[00:00:25] mprotect(0x7fffaf1e0000, 65536, PROT_READ) = 0
[00:00:25] mprotect(0x100ad0000, 65536, PROT_READ) = 0
[00:00:25] mprotect(0x7fffaf290000, 65536, PROT_READ) = 0
[00:00:25] prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
[00:00:25] munmap(0x7fffaf200000, 8599)            = 0
[00:00:25] getrandom("\x57\x15\x16\x31\xd6\xa0\x58\xe8", 8, GRND_NONBLOCK) = 8
[00:00:25] brk(NULL)                               = 0x10016f70000
[00:00:25] brk(0x10016fb0000)                      = 0x10016fb0000
[00:00:25] mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7fffaf200000
[00:00:25] mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = -1 ENOENT (No such file or directory)
[00:00:25] munmap(0x7fffaf200000, 1)               = 0
[00:00:25] write(2, "Got -2, wanted -EFAULT\n", 23Got -2, wanted -EFAULT
[00:00:25] ) = 23
[00:00:25] exit_group(1)                           = ?
[00:00:25] +++ exited with 1 +++

Temporary build logs:
https://git.altlinux.org/tasks/361214/build/100/i586/log
https://git.altlinux.org/tasks/361214/build/100/ppc64le/log

axboe added a commit that referenced this issue Oct 31, 2024
See commit becdca8 for details.

Link: #1207
Signed-off-by: Jens Axboe <[email protected]>
@axboe
Owner

axboe commented Oct 31, 2024

The bundle one needs a bit more investigation. no-mmap-inval should now skip on ppc too. I'll check sqpoll-sleep on x86; that's very odd.

axboe added a commit that referenced this issue Nov 1, 2024
Rewrite this test to be a bit better:

- Read wakeup properly with IO_URING_READ_ONCE()
- Check if wakeup has been seen
- Check elapsed time before wakeup flag is seen
- Prepare and push a nop request first, to ensure the thread is up
  and running

Hopefully this will fix the quirks with this test.

Link: #1207
Signed-off-by: Jens Axboe <[email protected]>
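A rough, self-contained analogue of the rewritten check, under stated assumptions: instead of the SQPOLL thread setting a wakeup flag in the shared SQ ring, a forked child sets a flag in a MAP_SHARED anonymous mapping after a delay, and the parent polls it with a relaxed C11 atomic load (standing in for IO_URING_READ_ONCE()) while bounding the elapsed time with CLOCK_MONOTONIC. All names and timings are illustrative, not taken from test/sqpoll-sleep.c:

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static long long now_usec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Poll '*flag' until nonzero; return elapsed usec, or -1 after the timeout. */
static long long wait_for_flag(atomic_int *flag, long long timeout_usec)
{
	long long start = now_usec();

	for (;;) {
		/* fresh load from shared memory every iteration (READ_ONCE-style) */
		if (atomic_load_explicit(flag, memory_order_relaxed))
			return now_usec() - start;
		if (now_usec() - start > timeout_usec)
			return -1;
		usleep(100);
	}
}

/* Demo: a child sets the flag after 'delay_usec'; returns -2 on setup error. */
static long long demo(long long delay_usec, long long timeout_usec)
{
	atomic_int *flag = mmap(NULL, sizeof(*flag), PROT_READ | PROT_WRITE,
				MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	long long elapsed;
	pid_t pid;

	if (flag == MAP_FAILED)
		return -2;
	atomic_init(flag, 0);
	pid = fork();
	if (pid < 0) {
		munmap(flag, sizeof(*flag));
		return -2;
	}
	if (pid == 0) {
		usleep(delay_usec);
		atomic_store_explicit(flag, 1, memory_order_relaxed);
		_exit(0);
	}
	elapsed = wait_for_flag(flag, timeout_usec);
	waitpid(pid, NULL, 0);
	munmap(flag, sizeof(*flag));
	return elapsed;
}
```

The separate timeout path is what lets such a test distinguish "the flag never showed up" from "the flag showed up, but later than expected".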
@axboe
Owner

axboe commented Nov 1, 2024

Pushed another fix for sqpoll-sleep, can you give it a spin on x86?

@vt-alt
Author

vt-alt commented Nov 3, 2024

Thanks. With 0733494, all tests now pass on i586. On ppc64le:

[00:02:13] Running test recvsend_bundle.t                                      failed recv cqe: -105
[00:02:13] test d failed
[00:02:13] TCP test case (classic=0) failed
[00:02:13] Test recvsend_bundle.t failed with ret 1

[00:03:27] Tests failed (1): <recvsend_bundle.t>

@axboe
Owner

axboe commented Nov 3, 2024

Thanks, so we're down to just the bundle test. That one will probably linger for a while until I get access to a ppc (or similar) system. I'm pretty sure it's a test issue, so I'd say just ignore it for now.

axboe added a commit that referenced this issue Nov 3, 2024
@vt-alt
Author

vt-alt commented Nov 3, 2024

I see. Thanks for the help. I decided to try the latest commit, 37a3880, and ppc suddenly reported an additional failure:

[00:02:10] Running test recv-multishot.t                                       connect failed
[00:02:10] t_create_socket_pair failed: 4
[00:02:10] test stream=1 wait_each=1 recvmsg=0 early_error=4  defer=1 failed
[00:02:10] Test recv-multishot.t failed with ret 1

[00:03:25] Tests failed (2): <recv-multishot.t> <recvsend_bundle.t>

A repeated run didn't show the failure, so perhaps it's intermittent.

@vt-alt
Author

vt-alt commented Jan 8, 2025

JFYI: besides these old tests, which I just skip, a new test fails on all architectures (for liburing-2.9-rc1) and has not been reported yet:

[00:01:42] Running test read-inc-file.t                                        fail buffer check loop 0
[00:01:42] Test read-inc-file.t failed with ret 1

This is on Linux v6.12.8 (inside of kvm, with v6.12.6 on the host).

@axboe
Owner

axboe commented Jan 8, 2025

That's expected, it's fixed by:

https://git.kernel.dk/cgit/linux/commit/?h=io_uring-6.13&id=ed123c948d06688d10f3b10a7bce1d6fbfd1ed07

which is upstream but hasn't made it into stable just yet.
