NUMA Nodes Not Correctly Detected in TBB (Version 2022.0) #1588

Closed
GuyZilberman opened this issue Dec 30, 2024 · 9 comments

@GuyZilberman

Summary

TBB does not correctly recognize NUMA nodes on a system with 2 NUMA nodes. The tbb::info::numa_nodes() API returns a single invalid index -1, while the system clearly supports NUMA with proper NUMA node IDs.

Version

oneTBB version: 2022.0

Environment

  • Hardware:
    • Architecture: x86_64
    • CPU(s): 192 (2 sockets, 48 cores per socket, 2 threads per core)
    • NUMA node(s): 2
    • NUMA node0 CPU(s): 0-47,96-143
    • NUMA node1 CPU(s): 48-95,144-191
    • CPU Model Name: Intel(R) Xeon(R) Platinum 8468
    • CPU MHz: 3100.186

Output of lscpu | grep "NUMA node":

NUMA node(s):                    2 
NUMA node0 CPU(s):               0-47,96-143  
NUMA node1 CPU(s):               48-95,144-191
  • OS:
    • Name: Ubuntu 20.04.6 LTS
    • Kernel: 5.8.0-50-generic
  • Compiler:
    • nvcc:
      pliops@lab3010:~/guyzi/gpu_kv_api$ nvcc --version
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2024 NVIDIA Corporation
      Built on Tue_Oct_29_23:50:19_PDT_2024
      Cuda compilation tools, release 12.6, V12.6.85
      Build cuda_12.6.r12.6/compiler.35059454_0
      

Observed Behavior

The program outputs the following:

TBB version: 2022.0  
NUMA node indexes: -1

This indicates that TBB failed to detect the NUMA nodes correctly.

Expected Behavior

TBB should recognize and return the correct NUMA node indexes (0 and 1) via the tbb::info::numa_nodes() API.

Steps To Reproduce

Ensure NUMA is enabled on the system. Verify with the command lscpu | grep "NUMA node".
Use the following program to reproduce the issue:

#include <tbb/tbb.h>
#include <iostream>
#include <vector>

int main() {
    std::cout << "TBB version: " << TBB_VERSION_MAJOR << "." << TBB_VERSION_MINOR << std::endl;

    std::vector<tbb::numa_node_id> numa_indexes = tbb::info::numa_nodes();

    if (numa_indexes.empty()) {
        std::cerr << "No NUMA nodes detected. Ensure the system supports NUMA." << std::endl;
        return 1;
    }

    std::cout << "NUMA node indexes: ";
    for (const auto &index : numa_indexes) {
        std::cout << index << " ";
    }
    std::cout << std::endl;

    return 0;
}

Compile the program using the following command:

nvcc -ltbb test_program.cpp -o test_program

Run the compiled program:

./test_program

Observe the output. If NUMA detection fails, the program will print:

TBB version: 2022.0
NUMA node indexes: -1
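
Note that the failure in this report shows up as a single -1 entry rather than an empty list, so the `numa_indexes.empty()` check in the reproducer never fires. A minimal sketch of a stricter check follows. `numa_detection_failed` is a hypothetical helper, not part of oneTBB, and it takes plain `int` values on the assumption that `tbb::numa_node_id` is an integer type, as the printed output suggests.

```cpp
#include <vector>

// Hypothetical helper (not a oneTBB API): returns true when the list of
// NUMA node ids is empty or holds only the -1 placeholder seen in this
// report. Ids are passed as plain ints so the sketch compiles without oneTBB.
bool numa_detection_failed(const std::vector<int>& numa_indexes) {
    return numa_indexes.empty() ||
           (numa_indexes.size() == 1 && numa_indexes.front() == -1);
}
```

In the reproducer this could stand in for the `empty()` test, for example `if (numa_detection_failed(numa_indexes)) { ... }`, provided `tbb::numa_node_id` is an alias for `int` in the installed headers.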
@pavelkumbrasev
Contributor

Hi @GuyZilberman, it seems HWLOC is missing from the environment, so TBBBind cannot be loaded.
Could you please set the TBB_VERSION environment variable to check whether TBBBind is present? You can also double-check that HWLOC is available on your system - doc link (please take a look at the Check HWLOC* on the System part).

@GuyZilberman
Author

Hi @pavelkumbrasev,

Thank you for your response. I followed your suggestion to check the environment for TBBBIND and HWLOC. Here’s what I’ve found:

I set the TBB_VERSION environment variable and reran the program. The output confirmed that TBBBIND is unavailable, as shown below:

pliops@lab3010:~/guyzi/gpu_kv_api$ export TBB_VERSION=1
pliops@lab3010:~/guyzi/gpu_kv_api$ ./test_program 
TBBmalloc: SPECIFICATION VERSION        1.0
TBBmalloc: VERSION              2022.0.0
TBBmalloc: INTERFACE VERSION    12140
TBBmalloc: TBB_USE_DEBUG        0
TBBmalloc: TBB_USE_ASSERT       0
TBBmalloc: huge pages   not requested
TBB version: 2022.0
oneTBB: SPECIFICATION VERSION   1.0
oneTBB: VERSION         2022.0.0
oneTBB: INTERFACE VERSION       12140
oneTBB: TBB_USE_DEBUG   0
oneTBB: TBB_USE_ASSERT  0
oneTBB: TOOLS SUPPORT   disabled
oneTBB: TBBBIND UNAVAILABLE
NUMA node indexes: -1 
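
For automated checks (for example in CI), the same diagnosis can be made by scanning the captured `TBB_VERSION=1` output for the `TBBBIND UNAVAILABLE` line shown above. This is only a sketch: `tbbbind_unavailable` is a hypothetical helper, and it matches only the wording seen in this log, since the format of the line printed when TBBBind does load is not shown in this thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: scans text captured from a run with TBB_VERSION=1
// and reports whether any line contains both "TBBBIND" and "UNAVAILABLE",
// as in the log above.
bool tbbbind_unavailable(const std::string& version_output) {
    std::istringstream lines(version_output);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find("TBBBIND") != std::string::npos &&
            line.find("UNAVAILABLE") != std::string::npos) {
            return true;
        }
    }
    return false;
}
```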

I verified HWLOC on my system using hwloc-ls. It successfully identifies the NUMA nodes and CPU mappings:

Machine (754GB total)
  Package L#0
    NUMANode L#0 (P#0 376GB)
      ...
  Package L#1
    NUMANode L#1 (P#1 378GB)
      ...

Could you advise on what further steps I should take to resolve this issue?

@pavelkumbrasev
Contributor

What HWLOC version do you have? You can also add the path to the HWLOC libraries to LD_LIBRARY_PATH; that should help.

@GuyZilberman
Author

GuyZilberman commented Jan 2, 2025

HWLOC version:

pliops@lab3010:~/guyzi/gpu_kv_api$ hwloc-ls --version
hwloc-ls 2.1.0

I tried this:

pliops@lab3010:~/guyzi/gpu_kv_api$ ldconfig -p | grep hwloc
        libhwloc.so.15 (libc6,x86-64) => /lib/x86_64-linux-gnu/libhwloc.so.15
pliops@lab3010:~/guyzi/gpu_kv_api$ export LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

Unfortunately, that didn't work either. I'd love to hear more suggestions.

@pavelkumbrasev
Contributor

Hmm, at this point I'm not entirely sure what is wrong. Can you try updating HWLOC to version 2.5 or higher?

@GuyZilberman
Author

I updated to:

hwloc-ls 2.11.0

But I'm still getting:

oneTBB: TBBBIND UNAVAILABLE
NUMA node indexes: -1 

Was I perhaps supposed to use any flags to enable TBBBIND when I used cmake and make to build oneTBB?

@pavelkumbrasev
Contributor

Did you build TBB from source or download it from the releases?
Can you also show the folder with the oneTBB binaries (a list of files)?

@GuyZilberman
Author

This is how I built TBB:

git clone https://github.com/oneapi-src/oneTBB.git
cd oneTBB/
git checkout v2022.0.0
mkdir build
cd build
cmake ..
make -j
sudo make install

This is my list of binaries:

pliops@lab3010:~/guyzi/oneTBB/build/gnu_9.4_cxx11_64_relwithdebinfo$ ll
total 1376360
drwxrwxr-x 2 pliops pliops    12288 Dec 29 20:52 ./
drwxrwxr-x 7 pliops pliops     4096 Dec 29 20:52 ../
-rwxrwxr-x 1 pliops pliops  4722104 Dec 29 20:50 conformance_allocators*
-rwxrwxr-x 1 pliops pliops  4070808 Dec 29 20:50 conformance_arena_constraints*
-rwxrwxr-x 1 pliops pliops  7733856 Dec 29 20:50 conformance_async_node*
-rwxrwxr-x 1 pliops pliops  4182888 Dec 29 20:50 conformance_blocked_range*
-rwxrwxr-x 1 pliops pliops  4116056 Dec 29 20:50 conformance_blocked_range2d*
-rwxrwxr-x 1 pliops pliops  4217392 Dec 29 20:50 conformance_blocked_range3d*
-rwxrwxr-x 1 pliops pliops  5200104 Dec 29 20:50 conformance_blocked_rangeNd*
-rwxrwxr-x 1 pliops pliops  4947400 Dec 29 20:50 conformance_broadcast_node*
-rwxrwxr-x 1 pliops pliops  4658568 Dec 29 20:50 conformance_buffer_node*
-rwxrwxr-x 1 pliops pliops  4295176 Dec 29 20:50 conformance_collaborative_call_once*
-rwxrwxr-x 1 pliops pliops  7460216 Dec 29 20:50 conformance_combinable*
-rwxrwxr-x 1 pliops pliops  7723424 Dec 29 20:50 conformance_composite_node*
-rwxrwxr-x 1 pliops pliops 17241416 Dec 29 20:51 conformance_concurrent_hash_map*
-rwxrwxr-x 1 pliops pliops  5066336 Dec 29 20:50 conformance_concurrent_lru_cache*
-rwxrwxr-x 1 pliops pliops 28765400 Dec 29 20:51 conformance_concurrent_map*
-rwxrwxr-x 1 pliops pliops 12086152 Dec 29 20:51 conformance_concurrent_priority_queue*
-rwxrwxr-x 1 pliops pliops 15940456 Dec 29 20:51 conformance_concurrent_queue*
-rwxrwxr-x 1 pliops pliops 24687288 Dec 29 20:51 conformance_concurrent_set*
-rwxrwxr-x 1 pliops pliops 30351184 Dec 29 20:51 conformance_concurrent_unordered_map*
-rwxrwxr-x 1 pliops pliops 27862792 Dec 29 20:51 conformance_concurrent_unordered_set*
-rwxrwxr-x 1 pliops pliops 18600840 Dec 29 20:51 conformance_concurrent_vector*
-rwxrwxr-x 1 pliops pliops  6186520 Dec 29 20:50 conformance_continue_node*
-rwxrwxr-x 1 pliops pliops 24101984 Dec 29 20:51 conformance_enumerable_thread_specific*
-rwxrwxr-x 1 pliops pliops  7210112 Dec 29 20:50 conformance_function_node*
-rwxrwxr-x 1 pliops pliops  5065024 Dec 29 20:50 conformance_global_control*
-rwxrwxr-x 1 pliops pliops  7748584 Dec 29 20:50 conformance_graph*
-rwxrwxr-x 1 pliops pliops  6299336 Dec 29 20:50 conformance_indexer_node*
-rwxrwxr-x 1 pliops pliops  5692528 Dec 29 20:50 conformance_input_node*
-rwxrwxr-x 1 pliops pliops  8521736 Dec 29 20:50 conformance_join_node*
-rwxrwxr-x 1 pliops pliops  5888000 Dec 29 20:50 conformance_limiter_node*
-rwxrwxr-x 1 pliops pliops  7411064 Dec 29 20:50 conformance_multifunction_node*
-rwxrwxr-x 1 pliops pliops  7063568 Dec 29 20:50 conformance_mutex*
-rwxrwxr-x 1 pliops pliops  5706992 Dec 29 20:50 conformance_overwrite_node*
-rwxrwxr-x 1 pliops pliops  9216464 Dec 29 20:50 conformance_parallel_for*
-rwxrwxr-x 1 pliops pliops  7403096 Dec 29 20:50 conformance_parallel_for_each*
-rwxrwxr-x 1 pliops pliops 16448728 Dec 29 20:51 conformance_parallel_invoke*
-rwxrwxr-x 1 pliops pliops  5133096 Dec 29 20:50 conformance_parallel_pipeline*
-rwxrwxr-x 1 pliops pliops  8215496 Dec 29 20:50 conformance_parallel_reduce*
-rwxrwxr-x 1 pliops pliops  4528544 Dec 29 20:50 conformance_parallel_scan*
-rwxrwxr-x 1 pliops pliops  4214952 Dec 29 20:50 conformance_parallel_sort*
-rwxrwxr-x 1 pliops pliops  4816816 Dec 29 20:50 conformance_priority_queue_node*
-rwxrwxr-x 1 pliops pliops  4667528 Dec 29 20:50 conformance_queue_node*
-rwxrwxr-x 1 pliops pliops  3622608 Dec 29 20:50 conformance_resumable_tasks*
-rwxrwxr-x 1 pliops pliops  5667904 Dec 29 20:50 conformance_sequencer_node*
-rwxrwxr-x 1 pliops pliops  5722408 Dec 29 20:50 conformance_split_node*
-rwxrwxr-x 1 pliops pliops  4475768 Dec 29 20:50 conformance_task_arena*
-rwxrwxr-x 1 pliops pliops  4562016 Dec 29 20:50 conformance_task_group*
-rwxrwxr-x 1 pliops pliops  3450480 Dec 29 20:50 conformance_task_group_context*
-rwxrwxr-x 1 pliops pliops  3644728 Dec 29 20:50 conformance_tick_count*
-rwxrwxr-x 1 pliops pliops  3427928 Dec 29 20:50 conformance_version*
-rwxrwxr-x 1 pliops pliops  5651032 Dec 29 20:50 conformance_write_once_node*
-rwxrwxr-x 1 pliops pliops    54816 Dec 29 20:50 lib_test_malloc_atexit.so*
-rwxrwxr-x 1 pliops pliops    50640 Dec 29 20:50 lib_test_malloc_lib_unload.so*
-rwxrwxr-x 1 pliops pliops    52928 Dec 29 20:50 lib_test_malloc_used_by_lib.so*
lrwxrwxrwx 1 pliops pliops       12 Dec 29 20:50 libtbb.so -> libtbb.so.12*
lrwxrwxrwx 1 pliops pliops       15 Dec 29 20:50 libtbb.so.12 -> libtbb.so.12.14*
-rwxrwxr-x 1 pliops pliops  6204584 Dec 29 20:50 libtbb.so.12.14*
lrwxrwxrwx 1 pliops pliops       17 Dec 29 20:50 libtbbmalloc.so -> libtbbmalloc.so.2*
lrwxrwxrwx 1 pliops pliops       20 Dec 29 20:50 libtbbmalloc.so.2 -> libtbbmalloc.so.2.14*
-rwxrwxr-x 1 pliops pliops  1447128 Dec 29 20:50 libtbbmalloc.so.2.14*
lrwxrwxrwx 1 pliops pliops       23 Dec 29 20:50 libtbbmalloc_proxy.so -> libtbbmalloc_proxy.so.2*
lrwxrwxrwx 1 pliops pliops       26 Dec 29 20:50 libtbbmalloc_proxy.so.2 -> libtbbmalloc_proxy.so.2.14*
-rwxrwxr-x 1 pliops pliops   112496 Dec 29 20:50 libtbbmalloc_proxy.so.2.14*
-rwxrwxr-x 1 pliops pliops  4937536 Dec 29 20:50 test_allocators*
-rwxrwxr-x 1 pliops pliops  4243392 Dec 29 20:50 test_arena_constraints*
-rwxrwxr-x 1 pliops pliops  4664256 Dec 29 20:50 test_arena_priorities*
-rwxrwxr-x 1 pliops pliops 12373184 Dec 29 20:51 test_async_node*
-rwxrwxr-x 1 pliops pliops  3719568 Dec 29 20:50 test_blocked_range*
-rwxrwxr-x 1 pliops pliops  5245656 Dec 29 20:50 test_broadcast_node*
-rwxrwxr-x 1 pliops pliops  6571912 Dec 29 20:50 test_buffer_node*
-rwxrwxr-x 1 pliops pliops  5600512 Dec 29 20:50 test_collaborative_call_once*
-rwxrwxr-x 1 pliops pliops  7077464 Dec 29 20:50 test_composite_node*
-rwxrwxr-x 1 pliops pliops 32853680 Dec 29 20:51 test_concurrent_hash_map*
-rwxrwxr-x 1 pliops pliops  5339432 Dec 29 20:50 test_concurrent_lru_cache*
-rwxrwxr-x 1 pliops pliops 42835080 Dec 29 20:52 test_concurrent_map*
-rwxrwxr-x 1 pliops pliops  3714888 Dec 29 20:50 test_concurrent_monitor*
-rwxrwxr-x 1 pliops pliops  6979536 Dec 29 20:50 test_concurrent_priority_queue*
-rwxrwxr-x 1 pliops pliops  6137424 Dec 29 20:50 test_concurrent_queue*
-rwxrwxr-x 1 pliops pliops  3999760 Dec 29 20:50 test_concurrent_queue_whitebox*
-rwxrwxr-x 1 pliops pliops 40444408 Dec 29 20:52 test_concurrent_set*
-rwxrwxr-x 1 pliops pliops 51391320 Dec 29 20:52 test_concurrent_unordered_map*
-rwxrwxr-x 1 pliops pliops 46805376 Dec 29 20:52 test_concurrent_unordered_set*
-rwxrwxr-x 1 pliops pliops 13177656 Dec 29 20:51 test_concurrent_vector*
-rwxrwxr-x 1 pliops pliops  5838120 Dec 29 20:50 test_continue_node*
-rwxrwxr-x 1 pliops pliops  3719232 Dec 29 20:50 test_dynamic_link*
-rwxrwxr-x 1 pliops pliops 13133896 Dec 29 20:51 test_eh_algorithms*
-rwxrwxr-x 1 pliops pliops 58052496 Dec 29 20:52 test_eh_flow_graph*
-rwxrwxr-x 1 pliops pliops  3741264 Dec 29 20:50 test_eh_thread*
-rwxrwxr-x 1 pliops pliops  6179528 Dec 29 20:50 test_enumerable_thread_specific*
-rwxrwxr-x 1 pliops pliops  4486504 Dec 29 20:50 test_environment_whitebox*
-rwxrwxr-x 1 pliops pliops  5733896 Dec 29 20:50 test_flow_graph*
-rwxrwxr-x 1 pliops pliops  8212320 Dec 29 20:50 test_flow_graph_priorities*
-rwxrwxr-x 1 pliops pliops 12457176 Dec 29 20:51 test_flow_graph_whitebox*
-rwxrwxr-x 1 pliops pliops 11380312 Dec 29 20:51 test_function_node*
-rwxrwxr-x 1 pliops pliops  4565312 Dec 29 20:50 test_global_control*
-rwxrwxr-x 1 pliops pliops  3477272 Dec 29 20:50 test_handle_perror*
-rwxrwxr-x 1 pliops pliops  5212264 Dec 29 20:50 test_hw_concurrency*
-rwxrwxr-x 1 pliops pliops 15072704 Dec 29 20:51 test_indexer_node*
-rwxrwxr-x 1 pliops pliops  5183472 Dec 29 20:50 test_input_node*
-rwxrwxr-x 1 pliops pliops  5284216 Dec 29 20:50 test_intrusive_list*
-rwxrwxr-x 1 pliops pliops 47907784 Dec 29 20:52 test_join_node*
-rwxrwxr-x 1 pliops pliops 18369224 Dec 29 20:51 test_join_node_key_matching*
-rwxrwxr-x 1 pliops pliops 48618896 Dec 29 20:51 test_join_node_key_matching_n_args*
-rwxrwxr-x 1 pliops pliops 18996944 Dec 29 20:51 test_join_node_msg_key_matching*
-rwxrwxr-x 1 pliops pliops 51757640 Dec 29 20:51 test_join_node_msg_key_matching_n_args*
-rwxrwxr-x 1 pliops pliops 15951568 Dec 29 20:51 test_join_node_preview*
-rwxrwxr-x 1 pliops pliops  6449376 Dec 29 20:50 test_limiter_node*
-rwxrwxr-x 1 pliops pliops  3371816 Dec 29 20:50 test_malloc_atexit*
-rwxrwxr-x 1 pliops pliops  5170352 Dec 29 20:50 test_malloc_compliance*
-rwxrwxr-x 1 pliops pliops  3760936 Dec 29 20:50 test_malloc_init_shutdown*
-rwxrwxr-x 1 pliops pliops  3915904 Dec 29 20:50 test_malloc_lib_unload*
-rwxrwxr-x 1 pliops pliops  3679144 Dec 29 20:50 test_malloc_new_handler*
-rwxrwxr-x 1 pliops pliops  4027304 Dec 29 20:50 test_malloc_overload*
-rwxrwxr-x 1 pliops pliops  3354584 Dec 29 20:50 test_malloc_overload_disable*
-rwxrwxr-x 1 pliops pliops  5587968 Dec 29 20:50 test_malloc_pools*
-rwxrwxr-x 1 pliops pliops    23344 Dec 29 20:50 test_malloc_pure_c*
-rwxrwxr-x 1 pliops pliops  4001584 Dec 29 20:50 test_malloc_regression*
-rwxrwxr-x 1 pliops pliops  3531800 Dec 29 20:50 test_malloc_shutdown_hang*
-rwxrwxr-x 1 pliops pliops  3888616 Dec 29 20:50 test_malloc_used_by_lib*
-rwxrwxr-x 1 pliops pliops  6804720 Dec 29 20:50 test_malloc_whitebox*
-rwxrwxr-x 1 pliops pliops 13831880 Dec 29 20:51 test_multifunction_node*
-rwxrwxr-x 1 pliops pliops  6100040 Dec 29 20:50 test_mutex*
-rwxrwxr-x 1 pliops pliops  3872008 Dec 29 20:50 test_openmp*
-rwxrwxr-x 1 pliops pliops  6257488 Dec 29 20:50 test_overwrite_node*
-rwxrwxr-x 1 pliops pliops  6368808 Dec 29 20:50 test_parallel_for*
-rwxrwxr-x 1 pliops pliops 10542456 Dec 29 20:50 test_parallel_for_each*
-rwxrwxr-x 1 pliops pliops  5340320 Dec 29 20:50 test_parallel_invoke*
-rwxrwxr-x 1 pliops pliops 29571616 Dec 29 20:51 test_parallel_pipeline*
-rwxrwxr-x 1 pliops pliops  7570336 Dec 29 20:50 test_parallel_reduce*
-rwxrwxr-x 1 pliops pliops  5647744 Dec 29 20:50 test_parallel_scan*
-rwxrwxr-x 1 pliops pliops  7299152 Dec 29 20:50 test_parallel_sort*
-rwxrwxr-x 1 pliops pliops  4286472 Dec 29 20:50 test_partitioner*
-rwxrwxr-x 1 pliops pliops  7535368 Dec 29 20:50 test_priority_queue_node*
-rwxrwxr-x 1 pliops pliops  7605360 Dec 29 20:50 test_profiling*
-rwxrwxr-x 1 pliops pliops  8395992 Dec 29 20:51 test_queue_node*
-rwxrwxr-x 1 pliops pliops  5154632 Dec 29 20:50 test_resumable_tasks*
-rwxrwxr-x 1 pliops pliops  6422456 Dec 29 20:50 test_scalable_allocator*
-rwxrwxr-x 1 pliops pliops  4583576 Dec 29 20:50 test_scheduler_mix*
-rwxrwxr-x 1 pliops pliops  4091912 Dec 29 20:50 test_semaphore*
-rwxrwxr-x 1 pliops pliops  5254408 Dec 29 20:50 test_sequencer_node*
-rwxrwxr-x 1 pliops pliops 12557160 Dec 29 20:51 test_split_node*
-rwxrwxr-x 1 pliops pliops  6188248 Dec 29 20:50 test_tagged_msg*
-rwxrwxr-x 1 pliops pliops  6860232 Dec 29 20:50 test_task*
-rwxrwxr-x 1 pliops pliops 10552936 Dec 29 20:51 test_task_arena*
-rwxrwxr-x 1 pliops pliops  6221920 Dec 29 20:50 test_task_group*
-rwxrwxr-x 1 pliops pliops  3644072 Dec 29 20:50 test_tbb_fork*
-rwxrwxr-x 1 pliops pliops  4700368 Dec 29 20:50 test_tbb_header*
-rwxrwxr-x 1 pliops pliops  3757568 Dec 29 20:50 test_tick_count*
-rwxrwxr-x 1 pliops pliops  6262392 Dec 29 20:50 test_write_once_node*
-rw-rw-r-- 1 pliops pliops     1073 Dec 29 20:50 vars.sh

@GuyZilberman
Author

It seems I used the wrong installation commands earlier. I uninstalled the previous installation and reinstalled following the example provided here. This time, it worked!
