MIG deployment of Triton causes "CacheManager Init Failed. Error: -17" #7906

Open
LSC527 opened this issue Dec 25, 2024 · 0 comments

LSC527 commented Dec 25, 2024

Description
The same deployment behaves differently depending on whether the GPU has MIG enabled. With MIG, DCGM is unable to start:

CacheManager Init Failed. Error: -17
W1225 10:48:27.718944 4706 metrics.cc:811] "DCGM unable to start: DCGM initialization error"

Similar to #3506, but not caused by insufficient memory.
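
As a first check (not from the original logs), the MIG instances and DCGM's view of them can be listed with standard NVIDIA tooling; if dcgmi fails here as well, the problem is in DCGM's MIG handling rather than in Triton itself:

nvidia-smi -L        # lists GPUs and their MIG instances with UUIDs
dcgmi discovery -l   # lists the devices DCGM can see (requires the dcgmi CLI in the image)
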
Triton Information
nvcr.io/nvidia/tritonserver:24.11-py3

To Reproduce
GPUs w/ MIG

sudo docker run -it --rm --network=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -e NVIDIA_VISIBLE_DEVICES=0:0 nvcr.io/nvidia/tritonserver:24.11-py3 \
  tritonserver --model-repository {my_model_path}

(NVIDIA_VISIBLE_DEVICES=0:0 selects the first MIG instance on GPU 0.)

outputs:

I1225 10:48:25.952289 4706 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f24b8000000' with size 268435456"
I1225 10:48:25.954209 4706 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I1225 10:48:25.958281 4706 model_lifecycle.cc:473] "loading: onnx:1"
I1225 10:48:25.960593 4706 onnxruntime.cc:2875] "TRITONBACKEND_Initialize: onnxruntime"
I1225 10:48:25.960634 4706 onnxruntime.cc:2885] "Triton TRITONBACKEND API version: 1.19"
I1225 10:48:25.960657 4706 onnxruntime.cc:2891] "'onnxruntime' TRITONBACKEND API version: 1.19"
I1225 10:48:25.960665 4706 onnxruntime.cc:2921] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I1225 10:48:25.977518 4706 onnxruntime.cc:2986] "TRITONBACKEND_ModelInitialize: onnx (version 1)"
I1225 10:48:25.978169 4706 onnxruntime.cc:984] "skipping model configuration auto-complete for 'onnx': inputs and outputs already specified"
I1225 10:48:25.978790 4706 onnxruntime.cc:3051] "TRITONBACKEND_ModelInstanceInitialize: onnx_0_0 (GPU device 0)"
I1225 10:48:27.703699 4706 model_lifecycle.cc:849] "successfully loaded 'onnx'"
I1225 10:48:27.703793 4706 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1225 10:48:27.703839 4706 server.cc:631]
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                 |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6 |
|             |                                                                 | .000000","default-max-batch-size":"4"}}                                                                                |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+

I1225 10:48:27.703886 4706 server.cc:674]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| onnx  | 1       | READY  |
+-------+---------+--------+

CacheManager Init Failed. Error: -17
W1225 10:48:27.718944 4706 metrics.cc:811] "DCGM unable to start: DCGM initialization error"
I1225 10:48:27.719361 4706 metrics.cc:783] "Collecting CPU metrics"
I1225 10:48:27.719448 4706 tritonserver.cc:2598]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                               |
| server_version                   | 2.52.0                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tens |
|                                  | or_data parameters statistics trace logging                                                                                                                          |
| model_repository_path[0]         | {my_model_path}                                                                                                 |
| model_control_mode               | MODE_NONE                                                                                                                                                            |
| strict_model_config              | 0                                                                                                                                                                    |
| model_config_name                |                                                                                                                                                                      |
| rate_limit                       | OFF                                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                   |
| cache_enabled                    | 0                                                                                                                                                                    |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1225 10:48:27.723652 4706 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:8001"
I1225 10:48:27.723879 4706 http_server.cc:4729] "Started HTTPService at 0.0.0.0:8000"
I1225 10:48:27.764810 4706 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
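
Note that the server still comes up; only GPU metrics collection fails. As a possible workaround (a suggestion, not a fix), GPU metrics can be disabled so the DCGM path is skipped entirely; --allow-gpu-metrics is a standard tritonserver flag:

sudo docker run -it --rm --network=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -e NVIDIA_VISIBLE_DEVICES=0:0 nvcr.io/nvidia/tritonserver:24.11-py3 \
  tritonserver --model-repository {my_model_path} --allow-gpu-metrics=false

This keeps the HTTP/gRPC endpoints and CPU metrics, but no GPU telemetry is exported.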

GPUs w/o MIG

sudo docker run -it --rm --network=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -e NVIDIA_VISIBLE_DEVICES=0 nvcr.io/nvidia/tritonserver:24.11-py3 \
  tritonserver --model-repository {my_model_path}

outputs:

I1225 10:41:12.658976 138 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f0058000000' with size 268435456"
I1225 10:41:12.661708 138 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I1225 10:41:12.667006 138 model_lifecycle.cc:473] "loading: onnx:1"
I1225 10:41:12.671093 138 onnxruntime.cc:2875] "TRITONBACKEND_Initialize: onnxruntime"
I1225 10:41:12.671117 138 onnxruntime.cc:2885] "Triton TRITONBACKEND API version: 1.19"
I1225 10:41:12.671123 138 onnxruntime.cc:2891] "'onnxruntime' TRITONBACKEND API version: 1.19"
I1225 10:41:12.671127 138 onnxruntime.cc:2921] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I1225 10:41:12.688318 138 onnxruntime.cc:2986] "TRITONBACKEND_ModelInitialize: onnx (version 1)"
I1225 10:41:12.688871 138 onnxruntime.cc:984] "skipping model configuration auto-complete for 'onnx': inputs and outputs already specified"
I1225 10:41:12.689461 138 onnxruntime.cc:3051] "TRITONBACKEND_ModelInstanceInitialize: onnx_0_0 (GPU device 0)"
I1225 10:41:14.331226 138 model_lifecycle.cc:849] "successfully loaded 'onnx'"
I1225 10:41:14.331320 138 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1225 10:41:14.331363 138 server.cc:631]
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                 |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6 |
|             |                                                                 | .000000","default-max-batch-size":"4"}}                                                                                |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+

I1225 10:41:14.331410 138 server.cc:674]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
| onnx  | 1       | READY  |
+-------+---------+--------+

I1225 10:41:14.357465 138 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA A30"
I1225 10:41:14.365078 138 metrics.cc:783] "Collecting CPU metrics"
I1225 10:41:14.365165 138 tritonserver.cc:2598]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                               |
| server_version                   | 2.52.0                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tens |
|                                  | or_data parameters statistics trace logging                                                                                                                          |
| model_repository_path[0]         | {my_model_path}                                                                                                 |
| model_control_mode               | MODE_NONE                                                                                                                                                            |
| strict_model_config              | 0                                                                                                                                                                    |
| model_config_name                |                                                                                                                                                                      |
| rate_limit                       | OFF                                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                   |
| cache_enabled                    | 0                                                                                                                                                                    |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1225 10:41:14.369295 138 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:8001"
I1225 10:41:14.369542 138 http_server.cc:4729] "Started HTTPService at 0.0.0.0:8000"
I1225 10:41:14.410425 138 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"

Expected behavior
No DCGM error when running with MIG enabled; GPU metrics should start just as in the non-MIG case (the "Collecting metrics for GPU 0" line in the second log).
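
For triage it may help to include the driver and DCGM versions, since MIG telemetry support depends on both. These are standard commands (run inside the container; dcgmi only if the CLI is present in the image):

nvidia-smi --query-gpu=driver_version --format=csv,noheader
dcgmi --version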
