v4.5.0 is not compatible with torch>=2.*.*+cu121 #1806

Open
MahmoudAshraf97 opened this issue Oct 24, 2024 · 25 comments
Labels
enhancement New feature or request

Comments

@MahmoudAshraf97
Contributor

MahmoudAshraf97 commented Oct 24, 2024

Hello
The latest release works well with PyTorch builds that rely on the pip-installed CUDA packages, but when a torch build precompiled with CUDA binaries is installed, this error appears:

Unable to load any of {libcudnn_cnn.so.9.1.0, libcudnn_cnn.so.9.1, libcudnn_cnn.so.9, libcudnn_cnn.so}
Invalid handle. Cannot load symbol cudnnCreateConvolutionDescriptor

The only workaround so far is to downgrade to 4.4.0, even though cuDNN v9.1 is installed both via pip and bundled with PyTorch.
jhj0517/Whisper-WebUI#348

Update:
As per @BBC-Esq's research, ctranslate2>=4.5.0 uses cuDNN v9, which requires CUDA >= 12.3.
Since most issues stem from conflicting torch and ctranslate2 installations, these are the tested working combinations:

| Torch Version | CT2 Version |
| --- | --- |
| `2.*.*+cu121` | `<=4.4.0` |
| `2.*.*+cu124` | `>=4.5.0` |
| `>=2.4.0` | `>=4.5.0` |
| `<2.4.0` | `<4.5.0` |
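
The pairing above can be sketched as a small lookup. This is only an illustration: the function name `recommended_ct2` is hypothetical, and it assumes the last two rows apply to torch versions without a `+cuXYZ` local tag, which the table does not state explicitly.

```python
import re


def recommended_ct2(torch_version):
    """Return the CTranslate2 constraint that the table pairs with a torch version string."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:\+cu(\d+))?", torch_version.strip())
    if match is None:
        raise ValueError(f"unrecognised torch version: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    cuda_tag = match.group(4)
    if cuda_tag == "121":
        return "<=4.4.0"
    if cuda_tag == "124":
        return ">=4.5.0"
    if cuda_tag is None:
        # Assumption: untagged versions follow the ">=2.4.0" / "<2.4.0" rows.
        return ">=4.5.0" if (major, minor) >= (2, 4) else "<4.5.0"
    return None  # e.g. cu118 builds, which the table does not cover


print(recommended_ct2("2.5.0+cu121"))
```

A `None` result means the table says nothing about that build, not that the combination is broken.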

For Google Colab users, the quick solution as of 24/10/2024 is to downgrade to 4.4.0, since Colab uses torch==2.5.0+cu121.

MahmoudAshraf97 changed the title from "v4.5.0 is not compatible with torch==2.5.0+cu121" to "v4.5.0 is not compatible with torch>=2.4.0+cu121" on Oct 24, 2024
minhthuc2502 added the "enhancement" (New feature or request) label on Oct 24, 2024
@MahmoudAshraf97
Contributor Author

Update: it's compatible with torch==2.*+cu124, so it's only incompatible with the cu121 builds. I'll open a PR to solve this.

@BBC-Esq

BBC-Esq commented Oct 24, 2024

Thanks to me and my analysis here...I want full citation and credit please: ;-)

SYSTRAN/faster-whisper#1082 (comment)

@MahmoudAshraf97 let me know if you need to know how I prefer citations to my contributions made please. Thanks.

MahmoudAshraf97 changed the title from "v4.5.0 is not compatible with torch>=2.4.0+cu121" to "v4.5.0 is not compatible with torch>=2.*.*+cu121" on Oct 24, 2024
@MarkusGehrig

MarkusGehrig commented Oct 28, 2024

@BBC-Esq

Thank you for the list of compatible versions. I have the following versions installed in a Docker container, but I still get the cuDNN error. Am I missing something?

OS:
Debian GNU/Linux 12 (bookworm)

nvidia-smi:
NVIDIA RTX 4500 Ada Gene...
NVIDIA-SMI 560.35.03
Driver Version: 560.35.03
CUDA Version: 12.6

Name: torch
Version: 2.5.0

Name: ctranslate2
Version: 4.5.0

Name: nvidia-cudnn-cu12
Version: 9.1.0.70

Name: nvidia-cuda-cupti-cu12
Version: 12.4.127

Name: nvidia-cublas-cu12
Version: 12.4.5.8

Name: nvidia-cuda-runtime-cu12
Version: 12.4.127

@MahmoudAshraf97
Contributor Author

which cudnn error are you getting exactly?

@MarkusGehrig

@MahmoudAshraf97
Exactly the same as in #1806 (comment)

@minhthuc2502
Collaborator

minhthuc2502 commented Oct 28, 2024

@MarkusGehrig Can you try ldconfig -p | grep cudnn_cnn to verify the location of cudnn_cnn lib?

If you find the lib, check whether its directory is in the runtime search path. If not, try adding that directory to LD_LIBRARY_PATH and see what happens.
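
The same lookup can be approximated from Python. The sketch below asks the dynamic loader for each file name listed in the error message and reports the first one that loads; `first_loadable` is a hypothetical helper, and it only tells you what the current process can resolve, not what ctranslate2 does internally.

```python
import ctypes

# File names taken from the error message in this issue.
CANDIDATES = [
    "libcudnn_cnn.so.9.1.0",
    "libcudnn_cnn.so.9.1",
    "libcudnn_cnn.so.9",
    "libcudnn_cnn.so",
]


def first_loadable(names):
    """Return the first library name the dynamic loader can open, or None."""
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            continue  # not found on the search path, or failed to load
        return name
    return None


print(first_loadable(CANDIDATES))
```

If this prints `None`, none of the names is visible on the loader's search path for this process.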

@MarkusGehrig

Thanks for the quick response. But I still get the error.

Unable to load any of {libcudnn_cnn.so.9.1.0, libcudnn_cnn.so.9.1, libcudnn_cnn.so.9, libcudnn_cnn.so}
Invalid handle. Cannot load symbol cudnnCreateConvolutionDescriptor
Aborted (core dumped)

ldconfig -p | grep cudnn_cnn
libcudnn_cnn.so.9 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn.so.9
libcudnn_cnn.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn.so

printenv LD_LIBRARY_PATH
/lib/x86_64-linux-gnu/libcudnn_cnn.so.9

I also have tried to install it with pip, with no change.

@minhthuc2502
Collaborator

minhthuc2502 commented Oct 28, 2024

LD_LIBRARY_PATH takes directories, not individual files, so try setting only LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/x86_64-linux-gnu/

And what is the result of this: nm -gD /lib/x86_64-linux-gnu/libcudnn_cnn.so | grep cudnnCreateConvolutionDescriptor?
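
A rough Python equivalent of the `nm` check, in case `nm` is not available in the container. `has_symbol` is a hypothetical helper; it relies on `ctypes` raising `AttributeError` when a symbol cannot be resolved in the loaded library.

```python
import ctypes


def has_symbol(library, symbol):
    """Return True if `library` loads and exports `symbol`."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return False  # the library itself could not be loaded
    # Attribute lookup on a CDLL resolves the symbol and fails if it is missing.
    return hasattr(handle, symbol)


print(has_symbol("/lib/x86_64-linux-gnu/libcudnn_cnn.so", "cudnnCreateConvolutionDescriptor"))
```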

@MarkusGehrig

@minhthuc2502

It works like a charm now. The path needed to be set as LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/x86_64-linux-gnu/

Thanks!
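
The mistake fixed above (a file path where a directory was expected) is easy to check for. A minimal sketch, with the hypothetical helper `non_directory_entries`, that lists the entries of an `LD_LIBRARY_PATH`-style string that are not existing directories:

```python
import os


def non_directory_entries(value):
    """Return the colon-separated entries of `value` that are not existing directories."""
    return [entry for entry in value.split(":") if entry and not os.path.isdir(entry)]


# An empty list means every entry is a directory the loader can search.
print(non_directory_entries(os.environ.get("LD_LIBRARY_PATH", "")))
```

With the earlier value `/lib/x86_64-linux-gnu/libcudnn_cnn.so.9`, the entry would be reported because it names a file.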

@BBC-Esq

BBC-Esq commented Oct 28, 2024

For everybody's future reference, to investigate possible compatibility issues with other CUDA libraries or platforms in general, you can go here. In the lower left you can select the cuDNN version and it will bring up a page that explains all the nuances:

https://docs.nvidia.com/deeplearning/cudnn/v9.1.1/reference/support-matrix.html

This might help troubleshooting based on platform, etc.

Also, if you need to correlate what version of cuDNN you have you can go to either of these places:

https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/
https://pypi.org/project/nvidia-cudnn-cu12/#history

@BBC-Esq

BBC-Esq commented Oct 28, 2024

I noticed that this still says it's using CUDA 12.2 while at the same time using cuDNN 9.1? Prior to 4.5.0, it was cuDNN 8.8.0 and CUDA 12.2?

https://github.com/OpenNMT/CTranslate2/blob/master/python/tools/prepare_build_environment_windows.sh

I'm guessing that's because CUDA 12.2 is compatible with cuDNN 8.8.0 still? However, Ctranslate2 4.5.0+ requires cuDNN 9.1 and is only compatible with CUDA 12.4. (assuming you're also using torch, which only has builds for 12.1 and 12.4).

Not sure if it's redundant because I already posted on faster-whisper, but here is the compatibility outline just FYI:

Ctranslate2 3.24.0 - last to use cuDNN 8.1.1 with CUDA 11.2.2 by default
Ctranslate2 4.0.0 - first to use cuDNN 8.8.0 with CUDA 12.2 by default
Ctranslate2 4.5.0 - first to use cuDNN 9.1 with CUDA 12.2 by default

torch 2.5.0 - supports CUDA 11.8, 12.1, and 12.4
torch 2.4.1 - supports CUDA 11.8, 12.1, and 12.4
torch 2.4.0 - supports CUDA 11.8, 12.1, and 12.4
torch 2.3.1 - supports CUDA 11.8 and 12.1
torch 2.2.2 - supports CUDA 11.8 and 12.1

cuDNN 8.9.7 - supports CUDA 11 through 12.2
cuDNN 9.0.0 - supports CUDA 11 through 12.3
cuDNN 9.1.0 - supports CUDA 11 through 12.4
cuDNN 9.1.1 - supports CUDA 11 through 12.4
cuDNN 9.2.0 - supports CUDA 11 through 12.5
cuDNN 9.2.1 - supports CUDA 11 through 12.5
cuDNN 9.3.0 - supports CUDA 11 through 12.6
cuDNN 9.4.0 - supports CUDA 11 through 12.6
cuDNN 9.5.0 - supports CUDA 11 through 12.6

CORRECTION:

Beginning with cuDNN 9.1.0, I believe that cuDNN in general is forward compatible with CUDA 12.x. Here are my sources:

https://docs.nvidia.com/deeplearning/cudnn/v9.1.0/release-notes.html#cudnn-9-1-0

  • "When using the cuDNN static library, you had to use the same major.minor version of the CUDA Toolkit by which cuDNN was built to build your application. This restriction has been lifted for the 12.x build, which now allows you to use any minor version of CUDA Toolkit 12. The 11.x build still has this restriction, which is documented in the cuDNN Support Matrix."

https://docs.nvidia.com/deeplearning/cudnn/v9.1.0/reference/support-matrix.html

  • "The cuDNN build for CUDA 12.x is compatible with CUDA 12.x for all x, including future CUDA 12.x releases that ship after this cuDNN release. This applies to both the dynamic and static builds of cuDNN. The cuDNN build for CUDA 11.x is compatible with CUDA 11.x for all x, but only in the dynamic case. The static build of cuDNN for 11.x must be linked with CUDA 11.8, as denoted in the table above."

Regardless...right now...unless you're using ctranslate2 without torch, you're essentially limited to using CUDA 12.4 since torch doesn't make wheels for anything higher than 12.4 yet. In other words, even though cuDNN might be compatible with future versions of CUDA 12.x, it's my understanding that torch specifically names its wheels as being compatible with one specific CUDA version only - e.g. cu121 or cu124.

@minhthuc2502
Collaborator

minhthuc2502 commented Oct 28, 2024

The same probably applies to Google Colab users. @MahmoudAshraf97 could you check whether, with torch==2.5.0+cu121, you can find this lib at /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib/libcudnn_cnn.so.9? You can try adding this path to LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib/

FYI, cuDNN 9 should be compatible with all CUDA versions, see here. It's weird that it does not work with some torch versions. In my opinion, it could be a problem with finding the path.
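
Rather than hard-coding the Colab path, the pip-installed cuDNN directory can be searched for. A minimal sketch, assuming the `nvidia-cudnn-cu12` wheel unpacks into `<site-packages>/nvidia/cudnn/lib` as in the path above; `pip_cudnn_lib_dirs` is a hypothetical helper.

```python
import os
import site
import sysconfig


def pip_cudnn_lib_dirs(roots=None):
    """Return every `<root>/nvidia/cudnn/lib` directory that exists."""
    if roots is None:
        # Default to the interpreter's site-packages (dist-packages on Debian/Colab).
        getter = getattr(site, "getsitepackages", lambda: [])
        roots = list(getter()) + [sysconfig.get_paths()["purelib"]]
    found = []
    for root in roots:
        candidate = os.path.join(root, "nvidia", "cudnn", "lib")
        if os.path.isdir(candidate) and candidate not in found:
            found.append(candidate)
    return found


print(pip_cudnn_lib_dirs())
```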

@MahmoudAshraf97
Contributor Author

apparently there are two cudnn versions installed in colab

ldconfig -p  | grep cudnn_cnn

libcudnn_cnn_train.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8
libcudnn_cnn_train.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_train.so
libcudnn_cnn_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8
libcudnn_cnn_infer.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so

pip show nvidia-cudnn-cu12

Name: nvidia-cudnn-cu12
Version: 9.5.0.50
Summary: cuDNN runtime libraries
Home-page: https://developer.nvidia.com/cuda-zone
Author: Nvidia CUDA Installer Team
Author-email: [email protected]
License: NVIDIA Proprietary Software
Location: /usr/local/lib/python3.10/dist-packages
Requires: nvidia-cublas-cu12
Required-by:

Adding the pip cuDNN path to LD_LIBRARY_PATH will probably solve the problem.
Torch is using the v9 pip installation and ignoring v8, while ctranslate2 is looking for v9 in the v8 installation path.
Probably the best way forward is to make CT2 compatible with the pip CUDA libraries without needing to modify the path.
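
Because the loader reads `LD_LIBRARY_PATH` when a process starts, one way to try the suggestion without touching the shell profile is to launch the workload in a child process with an extended environment. A minimal sketch; `env_with_library_dir` is a hypothetical helper and the directory is the Colab path mentioned earlier in this thread.

```python
import os


def env_with_library_dir(env, directory):
    """Return a copy of `env` with `directory` appended to LD_LIBRARY_PATH (no duplicates)."""
    new_env = dict(env)
    parts = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if directory not in parts:
        parts.append(directory)
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


cudnn_dir = "/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib"
child_env = env_with_library_dir(os.environ, cudnn_dir)
print(child_env["LD_LIBRARY_PATH"])
```

The resulting dictionary can be passed as `env=child_env` to `subprocess.run` when starting the script that imports ctranslate2.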

@MahmoudAshraf97
Contributor Author

@BBC-Esq the problem here is not CUDA vs. cuDNN compatibility, because per the compatibility matrix we should not be having this issue. It's an environment setup issue that might involve a mismatch between the CUDA and cuDNN paths.
CT2 should support all CUDA 12 versions.
CT2 >=4.5 should support all cuDNN 9 versions along with CUDA 12; any difference in the minor or patch versions should not break compatibility.

@BBC-Esq

BBC-Esq commented Oct 28, 2024

> @BBC-Esq the problem here is not a cuda vs cudnn compatibility because as per the compatibility matrix we should not be having this issue, it's an environment setup issue that might involve a mismatch between cuda/cudnn paths
> CT2 should support all cuda 12 versions
> CT2 >=4.5 should support all cudnn 9 versions along with cuda 12, any difference in the minor or patch versions should not break compatibility

I partially understand I think...Will ponder further...In the meantime...you mentioned possibly using the pip CUDA libraries...

I spent the time to aggregate the versions of ALL CUDA-RELATED LIBRARIES on pypi.org by Nvidia release. Here are the versions of all such libraries organized by CUDA Toolkit official Release Number:

LONG LIST HERE
CUDA 12.6.2:

pip install nvidia-cuda-runtime-cu12==12.6.77
pip install nvidia-cublas-cu12==12.6.3.3
pip install nvidia-cuda-cupti-cu12==12.6.80
pip install nvidia-cuda-nvcc-cu12==12.6.77
pip install nvidia-cuda-nvrtc-cu12==12.6.77
pip install nvidia-cuda-sanitizer-api-cu12==12.6.77
pip install nvidia-cufft-cu12==11.3.0.4
pip install nvidia-curand-cu12==10.3.7.77
pip install nvidia-cusolver-cu12==11.7.1.2
pip install nvidia-cusparse-cu12==12.5.4.2
pip install nvidia-cuda-cuxxfilt-cu12==12.6.77
pip install nvidia-npp-cu12==12.3.1.54
pip install nvidia-nvfatbin-cu12==12.6.77
pip install nvidia-nvjitlink-cu12==12.6.77
pip install nvidia-nvjpeg-cu12==12.3.3.54
pip install nvidia-nvml-dev-cu12==12.6.77
pip install nvidia-nvtx-cu12==12.6.77
pip install nvidia-cuda-opencl-cu12==12.6.77

CUDA 12.6.1:

pip install nvidia-cuda-runtime-cu12==12.6.68
pip install nvidia-cublas-cu12==12.6.1.4
pip install nvidia-cuda-cupti-cu12==12.6.68
pip install nvidia-cuda-nvcc-cu12==12.6.68
pip install nvidia-cuda-nvrtc-cu12==12.6.68
pip install nvidia-cuda-sanitizer-api-cu12==12.6.68
pip install nvidia-cufft-cu12==11.2.6.59
pip install nvidia-curand-cu12==10.3.7.68
pip install nvidia-cusolver-cu12==11.6.4.69
pip install nvidia-cusparse-cu12==12.5.3.3
pip install nvidia-cuda-cuxxfilt-cu12==12.6.68
pip install nvidia-npp-cu12==12.3.1.54
pip install nvidia-nvfatbin-cu12==12.6.68
pip install nvidia-nvjitlink-cu12==12.6.68
pip install nvidia-nvjpeg-cu12==12.3.3.54
pip install nvidia-nvml-dev-cu12==12.6.68
pip install nvidia-nvtx-cu12==12.6.68
pip install nvidia-cuda-opencl-cu12==12.6.68

CUDA 12.6.0:

pip install nvidia-cuda-runtime-cu12==12.6.37
pip install nvidia-cublas-cu12==12.6.0.22
pip install nvidia-cuda-cupti-cu12==12.6.37
pip install nvidia-cuda-nvcc-cu12==12.6.20
pip install nvidia-cuda-nvrtc-cu12==12.6.20
pip install nvidia-cuda-sanitizer-api-cu12==12.6.34
pip install nvidia-cufft-cu12==11.2.6.28
pip install nvidia-curand-cu12==10.3.7.37
pip install nvidia-cusolver-cu12==11.6.4.38
pip install nvidia-cusparse-cu12==12.5.2.23
pip install nvidia-cuda-cuxxfilt-cu12==12.6.20
pip install nvidia-npp-cu12==12.3.1.23
pip install nvidia-nvfatbin-cu12==12.6.20
pip install nvidia-nvjitlink-cu12==12.6.20
pip install nvidia-nvjpeg-cu12==12.3.3.23
pip install nvidia-nvml-dev-cu12==12.6.37
pip install nvidia-nvtx-cu12==12.6.37
pip install nvidia-cuda-opencl-cu12==12.6.37

CUDA 12.5.1:

pip install nvidia-cuda-runtime-cu12==12.5.82
pip install nvidia-cublas-cu12==12.5.3.2
pip install nvidia-cuda-cupti-cu12==12.5.82
pip install nvidia-cuda-nvcc-cu12==12.5.82
pip install nvidia-cuda-nvrtc-cu12==12.5.82
pip install nvidia-cuda-sanitizer-api-cu12==12.5.81
pip install nvidia-cufft-cu12==11.2.3.61
pip install nvidia-curand-cu12==10.3.6.82
pip install nvidia-cusolver-cu12==11.6.3.83
pip install nvidia-cusparse-cu12==12.5.1.3
pip install nvidia-cuda-cuxxfilt-cu12==12.5.82
pip install nvidia-npp-cu12==12.3.0.159
pip install nvidia-nvfatbin-cu12==12.5.82
pip install nvidia-nvjitlink-cu12==12.5.82
pip install nvidia-nvjpeg-cu12==12.3.2.81
pip install nvidia-nvml-dev-cu12==12.5.82
pip install nvidia-nvtx-cu12==12.5.82
pip install nvidia-cuda-opencl-cu12==12.5.39

CUDA 12.5.0:

pip install nvidia-cuda-runtime-cu12==12.5.39
pip install nvidia-cublas-cu12==12.5.2.13
pip install nvidia-cuda-cupti-cu12==12.5.39
pip install nvidia-cuda-nvcc-cu12==12.5.40
pip install nvidia-cuda-nvrtc-cu12==12.5.40
pip install nvidia-cuda-sanitizer-api-cu12==12.5.39
pip install nvidia-cufft-cu12==11.2.3.18
pip install nvidia-curand-cu12==10.3.6.39
pip install nvidia-cusolver-cu12==11.6.2.40
pip install nvidia-cusparse-cu12==12.4.1.24
pip install nvidia-cuda-cuxxfilt-cu12==12.5.39
pip install nvidia-npp-cu12==12.3.0.116
pip install nvidia-nvfatbin-cu12==12.5.39
pip install nvidia-nvjitlink-cu12==12.5.40
pip install nvidia-nvjpeg-cu12==12.3.2.38
pip install nvidia-nvml-dev-cu12==12.5.39
pip install nvidia-nvtx-cu12==12.5.39
pip install nvidia-cuda-opencl-cu12==12.5.39

CUDA 12.4.1:

pip install nvidia-cuda-runtime-cu12==12.4.127
pip install nvidia-cublas-cu12==12.4.5.8
pip install nvidia-cuda-cupti-cu12==12.4.127
pip install nvidia-cuda-nvcc-cu12==12.4.131
pip install nvidia-cuda-nvrtc-cu12==12.4.127
pip install nvidia-cuda-sanitizer-api-cu12==12.4.127
pip install nvidia-cufft-cu12==11.2.1.3
pip install nvidia-curand-cu12==10.3.5.147
pip install nvidia-cusolver-cu12==11.6.1.9
pip install nvidia-cusparse-cu12==12.3.1.170
pip install nvidia-cuda-cuxxfilt-cu12==12.4.127
pip install nvidia-npp-cu12==12.2.5.30
pip install nvidia-nvfatbin-cu12==12.4.127
pip install nvidia-nvjitlink-cu12==12.4.127
pip install nvidia-nvjpeg-cu12==12.3.1.117
pip install nvidia-nvml-dev-cu12==12.4.127
pip install nvidia-nvtx-cu12==12.4.127
pip install nvidia-cuda-opencl-cu12==12.4.127

CUDA 12.4.0:

pip install nvidia-cuda-runtime-cu12==12.4.99
pip install nvidia-cublas-cu12==12.4.2.65
pip install nvidia-cuda-cupti-cu12==12.4.99
pip install nvidia-cuda-nvcc-cu12==12.4.99
pip install nvidia-cuda-nvrtc-cu12==12.4.99
pip install nvidia-cuda-sanitizer-api-cu12==12.4.99
pip install nvidia-cufft-cu12==11.2.0.44
pip install nvidia-curand-cu12==10.3.5.119
pip install nvidia-cusolver-cu12==11.6.0.99
pip install nvidia-cusparse-cu12==12.3.0.142
pip install nvidia-cuda-cuxxfilt-cu12==12.4.99
pip install nvidia-npp-cu12==12.2.5.2
pip install nvidia-nvfatbin-cu12==12.4.99
pip install nvidia-nvjitlink-cu12==12.4.99
pip install nvidia-nvjpeg-cu12==12.3.1.89
pip install nvidia-nvml-dev-cu12==12.4.99
pip install nvidia-nvtx-cu12==12.4.99
pip install nvidia-cuda-opencl-cu12==12.4.99
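
If these pins are used programmatically, they can be kept in a dictionary keyed by CUDA release and turned into requirement lines. The sketch below copies only two packages for two releases from the list above to stay short; `PINS` and `requirement_lines` are hypothetical names.

```python
# Subset of the pins listed above, keyed by CUDA Toolkit release.
PINS = {
    "12.4.1": {
        "nvidia-cuda-runtime-cu12": "12.4.127",
        "nvidia-cublas-cu12": "12.4.5.8",
    },
    "12.6.2": {
        "nvidia-cuda-runtime-cu12": "12.6.77",
        "nvidia-cublas-cu12": "12.6.3.3",
    },
}


def requirement_lines(release):
    """Return `name==version` lines for one CUDA release."""
    return [f"{name}=={version}" for name, version in PINS[release].items()]


print("\n".join(requirement_lines("12.4.1")))
```

Writing the lines to a requirements file keeps all packages of one release pinned together instead of mixing versions across releases.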

These libraries, while available in the .exe installer, are not available on pypi.org, but shouldn't be needed for ctranslate2 anyways:

cuda_cuobjdump
cuda_nvdisasm
cuda_nvprune
cuda_nvprof
cuda_nvvp
nsight_compute
nsight_vse
nvidia_driver (Windows Driver)

With that being said, you might also consider this script or a modification of it...I'm still customizing it for my own program, actually. Here's the script:

https://github.com/NVIDIA/build-system-archive-import-examples/blob/main/parse_redist.py

Anyhow...I am confident in this versioning...I downloaded all official releases as seen here:

image

I inspected all files that the installer extracts to a temporary directory...there is always a "version.json" file within a sub-folder named "CUDAToolkit" that contains the correlating versions. Here is a brief excerpt of one such file. The version numbers ALWAYS correlate to the versions list on pypi.org. Moreover, they ALWAYS correlate to "distributables" on Nvidia's website.

image

@BBC-Esq

BBC-Esq commented Oct 28, 2024

@MahmoudAshraf97 please note...if you use the pip method, you will NOT get binaries like nvcc.exe (just to give one example) that ARE installed when using the .exe installer. Using nvcc as an example, nvcc via pip gives you, within the bin folder, the following:

image

If you use the aforementioned download script, however, you will get:

image

This could make a difference if, for example, ctranslate2 relies on such files. I encountered this issue when trying to use the pip installation method with the triton library (which my program needs) because triton needs nvcc.exe...this is but one example.
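
A quick way to see whether a given binary is actually reachable after either installation method is to look it up on `PATH`. A minimal sketch using only the standard library; `find_tool` is a hypothetical helper, and finding the executable does not prove the rest of the toolkit is complete.

```python
import shutil


def find_tool(name="nvcc"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


print(find_tool("nvcc"))
```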

@BBC-Esq

BBC-Esq commented Oct 28, 2024

In the meantime, pursuant to our discussion, I've created a script that will download the appropriate CUDA Toolkit files (by version), or you can choose to download the cuDNN files. Remember, you must still set the appropriate PATH and other variables...and you must still make sure that the cuDNN version you're using is compatible with CUDA and/or Torch and/or Ctranslate2 and/or any other library you plan to use in your program.

You will only need to pip install pyside6 above and beyond the standard python libraries used. You should name this script download_cuda.py.
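
For orientation, the script reads a JSON manifest in which each component maps platform keys to an entry containing a `relative_path`. The sketch below walks such a structure in memory; the manifest contents and the `linux-x86_64` key are made-up illustration values, `relative_paths` is a hypothetical helper, and the nested per-variant layout that the script also handles is deliberately skipped.

```python
def relative_paths(manifest, platform):
    """Map component id -> archive path for one platform, skipping metadata entries."""
    paths = {}
    for component, entry in manifest.items():
        if not isinstance(entry, dict) or "name" not in entry:
            continue  # top-level metadata, not a component
        target = entry.get(platform)
        if isinstance(target, dict) and "relative_path" in target:
            paths[component] = target["relative_path"]
    return paths


# Made-up manifest, shaped like the structure the script indexes into.
example = {
    "release_date": "2024-01-01",
    "cuda_nvcc": {
        "name": "CUDA NVCC",
        "version": "0.0.0",
        "linux-x86_64": {"relative_path": "cuda_nvcc/linux-x86_64/example.tar.xz"},
    },
}
print(relative_paths(example, "linux-x86_64"))
```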

FULL SCRIPT HERE

```
__version__ = "0.5.0"
minimum = "3.8"

import sys
if sys.version_info < tuple(map(int, minimum.split("."))):
    print(
        "ERROR: script", __file__, "version", __version__, "requires Python %s or later" % minimum
    )
    sys.exit(1)

import argparse
import os
import stat
import json
import re
import shutil
import tarfile
import zipfile
from urllib.request import urlopen
from pathlib import Path
from PySide6.QtWidgets import (QApplication, QMainWindow, QWidget, QVBoxLayout,
QHBoxLayout, QLabel, QComboBox, QLineEdit, QPushButton,
QTextEdit, QFileDialog, QMessageBox)
from PySide6.QtCore import Qt, QThread, Signal
import subprocess

ARCHIVES = {}
DOMAIN = "https://developer.download.nvidia.com"

CUDA_RELEASES = {
"CUDA 12.4.0": "12.4.0",
"CUDA 12.4.1": "12.4.1",
"CUDA 12.5.0": "12.5.0",
"CUDA 12.5.1": "12.5.1",
"CUDA 12.6.0": "12.6.0",
"CUDA 12.6.1": "12.6.1",
"CUDA 12.6.2": "12.6.2"
}

CUDNN_RELEASES = [
"9.0.0",
"9.1.1",
"9.2.0",
"9.2.1",
"9.3.0",
"9.4.0",
"9.5.0",
"9.5.1"
]

PRODUCTS = {
"CUDA Toolkit": "cuda",
"cuDNN": "cudnn"
}

OPERATING_SYSTEMS = {
"Windows": "windows",
"Linux": "linux"
}

ARCHITECTURES = {
"x86_64": "x86_64",
"PPC64le (Linux only)": "ppc64le",
"SBSA (Linux only)": "sbsa",
"AARCH64 (Linux only)": "aarch64"
}

VARIANTS = {
"CUDA 11": "cuda11",
"CUDA 12": "cuda12"
}

COMPONENTS = {
    "All Components": None,
    "CUDA Runtime (cudart)": "cuda_cudart",
    "CXX Core Compute Libraries": "cuda_cccl",
    "CUDA Object Dump Tool": "cuda_cuobjdump",
    "CUDA Profiling Tools Interface": "cuda_cupti",
    "CUDA Demangler Tool": "cuda_cuxxfilt",
    "CUDA Demo Suite": "cuda_demo_suite",
    "CUDA Documentation": "cuda_documentation",
    "NVIDIA CUDA Compiler": "cuda_nvcc",
    # The two "CUDA Binary Utility" labels shared one key, so the first entry
    # was silently overwritten; each label is now unique.
    "CUDA Binary Utility (nvdisasm)": "cuda_nvdisasm",
    "NVIDIA Management Library Headers": "cuda_nvml_dev",
    "CUDA Profiler": "cuda_nvprof",
    "CUDA Binary Utility (nvprune)": "cuda_nvprune",
    "CUDA Runtime Compilation Library": "cuda_nvrtc",
    "CUDA Tools SDK": "cuda_nvtx",
    "NVIDIA Visual Profiler": "cuda_nvvp",
    "CUDA OpenCL": "cuda_opencl",
    "CUDA Profiler API": "cuda_profiler_api",
    "CUDA Compute Sanitizer API": "cuda_sanitizer_api",
    "CUDA BLAS Library": "libcublas",
    "CUDA FFT Library": "libcufft",
    "CUDA Random Number Generation Library": "libcurand",
    "CUDA Solver Library": "libcusolver",
    "CUDA Sparse Matrix Library": "libcusparse",
    "NVIDIA Performance Primitives Library": "libnpp",
    "NVIDIA Fatbin Utilities": "libnvfatbin",
    "NVIDIA JIT Linker Library": "libnvjitlink",
    "NVIDIA JPEG Library": "libnvjpeg",
    "Nsight Compute": "nsight_compute",
    "Nsight Systems": "nsight_systems",
    "Nsight Visual Studio Edition": "nsight_vse",
    "Visual Studio Integration": "visual_studio_integration"
}

def err(msg):
    print("ERROR: " + msg)
    sys.exit(1)

def fetch_file(full_path, filename):
    download = urlopen(full_path)
    if download.status != 200:
        print(" -> Failed: " + filename)
    else:
        print(":: Fetching: " + full_path)
        with open(filename, "wb") as file:
            file.write(download.read())
        print(" -> Wrote: " + filename)

def fix_permissions(directory):
    for root, dirs, files in os.walk(directory):
        for file in files:
            filename = os.path.join(root, file)
            octal = os.stat(filename)
            os.chmod(filename, octal.st_mode | stat.S_IWRITE)

def flatten_tree(src, dest, tag=None):
    if tag:
        dest = os.path.join(dest, tag)

    try:
        shutil.copytree(src, dest, symlinks=1, dirs_exist_ok=1, ignore_dangling_symlinks=1)
    except FileExistsError:
        pass
    shutil.rmtree(src)

def parse_artifact(
    parent,
    manifest,
    component,
    platform,
    retrieve=True,
    variant=None,
):
    if variant:
        full_path = parent + manifest[component][platform][variant]["relative_path"]
    else:
        full_path = parent + manifest[component][platform]["relative_path"]

    filename = os.path.basename(full_path)
    file_path = filename
    pwd = os.path.join(os.getcwd(), component, platform)

    if (
        retrieve
        and not os.path.exists(filename)
        and not os.path.exists(full_path)
        and not os.path.exists(parent + filename)
        # Joined with a separator so this matches the lookup in the last elif
        and not os.path.exists(os.path.join(pwd, filename))
    ):
        fetch_file(full_path, filename)
        file_path = filename
        ARCHIVES[platform].append(filename)
    elif os.path.exists(filename):
        print("  -> Found: " + filename)
        file_path = filename
        ARCHIVES[platform].append(filename)
    elif os.path.exists(full_path):
        file_path = full_path
        print("  -> Found: " + file_path)
        ARCHIVES[platform].append(file_path)
    elif os.path.exists(os.path.join(parent, filename)):
        file_path = os.path.join(parent, filename)
        print("  -> Found: " + file_path)
        ARCHIVES[platform].append(file_path)
    elif os.path.exists(os.path.join(pwd, filename)):
        file_path = os.path.join(pwd, filename)
        print("  -> Found: " + file_path)
        ARCHIVES[platform].append(file_path)
    else:
        print("Parent: " + os.path.join(pwd, filename))
        print("  -> Artifact: " + filename)

def fetch_action(
    parent, manifest, component_filter, platform_filter, cuda_filter, retrieve
):
    for component in manifest.keys():
        if not "name" in manifest[component]:
            continue

        if component_filter is not None and component != component_filter:
            continue

        print("\n" + manifest[component]["name"] + ": " + manifest[component]["version"])

        for platform in manifest[component].keys():
            if "variant" in platform:
                continue

            if not platform in ARCHIVES:
                ARCHIVES[platform] = []

            if not isinstance(manifest[component][platform], str):
                if (
                    platform_filter is not None
                    and platform != platform_filter
                    and platform != "source"
                ):
                    print("  -> Skipping platform: " + platform)
                    continue

                if not "relative_path" in manifest[component][platform]:
                    for variant in manifest[component][platform].keys():
                        if cuda_filter is not None and variant != cuda_filter:
                            print("  -> Skipping variant: " + variant)
                            continue

                        parse_artifact(
                            parent,
                            manifest,
                            component,
                            platform,
                            retrieve,
                            variant,
                        )
                else:
                    parse_artifact(
                        parent, manifest, component, platform, retrieve
                    )

def post_action(output_dir, collapse=True):
    if len(ARCHIVES) == 0:
        return

    print("\nArchives:")
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    for platform in ARCHIVES:
        for archive in ARCHIVES[platform]:
            try:
                binTag = archive.split("-")[3].split("_")[1]
                # print(platform, binTag)
            except IndexError:
                # Archive name has no variant tag in the expected position
                binTag = None

            if re.search(r"\.tar\.", archive):
                print(":: tar: " + archive)
                tarball = tarfile.open(archive)
                topdir = os.path.commonprefix(tarball.getnames())
                tarball.extractall()
                tarball.close()

                print("  -> Extracted: " + topdir + "/")
                fix_permissions(topdir)

                if collapse:
                    flatdir = os.path.join(output_dir, platform)
                    flatten_tree(topdir, flatdir, binTag)
                    print("  -> Flattened: " + flatdir + "/")

            elif re.search(r"\.zip", archive):
                print(":: zip: " + archive)
                # The with block closes the archive; no explicit close() is needed
                with zipfile.ZipFile(archive) as zippy:
                    topdir = os.path.commonprefix(zippy.namelist())
                    zippy.extractall()

                print("  -> Extracted: " + topdir)
                fix_permissions(topdir)

                if collapse:
                    flatdir = os.path.join(output_dir, platform)
                    flatten_tree(topdir, flatdir, binTag)
                    print("  -> Flattened: " + flatdir + "/")

    print("\nOutput: " + output_dir + "/")
    for item in sorted(os.listdir(output_dir)):
        if os.path.isdir(os.path.join(output_dir, item)):
            print(" - " + item + "/")
        elif os.path.isfile(os.path.join(output_dir, item)):
            print(" - " + item)

class DownloadWorker(QThread):
    finished = Signal(bool, str)

    def __init__(self, args):
        super().__init__()
        self.args = args

    def run(self):
        try:
            cmd = [
                sys.executable,
                sys.argv[0],
                "--download-only",
            ]

            for arg, value in vars(self.args).items():
                if value is not None:
                    cmd.extend([f"--{arg.replace('_', '-')}", str(value)])

            result = subprocess.run(
                cmd,
                check=True,
                capture_output=True,
                text=True
            )
            self.finished.emit(True, "")
        except subprocess.CalledProcessError as e:
            self.finished.emit(False, f"{str(e)}\nOutput: {e.output}")

class DownloaderGUI(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("NVIDIA Package Downloader")
        self.download_worker = None
        self.setup_ui()

    def setup_ui(self):
        central_widget = QWidget()
        self.setCentralWidget(central_widget)
        layout = QVBoxLayout(central_widget)
        layout.setSpacing(10)

        # Product selection
        product_layout = QHBoxLayout()
        product_label = QLabel("Product:")
        self.product_combo = QComboBox()
        self.product_combo.addItems(PRODUCTS.keys())
        self.product_combo.currentTextChanged.connect(self.on_product_change)
        product_layout.addWidget(product_label)
        product_layout.addWidget(self.product_combo)
        layout.addLayout(product_layout)

        # Release Label selection
        version_layout = QHBoxLayout()
        version_label = QLabel("Release Label:")
        self.version_combo = QComboBox()
        self.version_combo.addItems(CUDA_RELEASES.keys())
        version_layout.addWidget(version_label)
        version_layout.addWidget(self.version_combo)
        layout.addLayout(version_layout)

        # OS selection
        os_layout = QHBoxLayout()
        os_label = QLabel("Operating System:")
        self.os_combo = QComboBox()
        self.os_combo.addItems(OPERATING_SYSTEMS.keys())
        os_layout.addWidget(os_label)
        os_layout.addWidget(self.os_combo)
        layout.addLayout(os_layout)

        # Architecture selection
        arch_layout = QHBoxLayout()
        arch_label = QLabel("Architecture:")
        self.arch_combo = QComboBox()
        self.arch_combo.addItems(ARCHITECTURES.keys())
        arch_layout.addWidget(arch_label)
        arch_layout.addWidget(self.arch_combo)
        layout.addLayout(arch_layout)

        # Component selection
        comp_layout = QHBoxLayout()
        comp_label = QLabel("Component:")
        self.component_combo = QComboBox()
        # COMPONENTS already starts with "All Components", so it is not added separately
        self.component_combo.addItems(COMPONENTS.keys())
        comp_layout.addWidget(comp_label)
        comp_layout.addWidget(self.component_combo)
        layout.addLayout(comp_layout)

        # Variant selection
        variant_layout = QHBoxLayout()
        variant_label = QLabel("CUDA Variant:")
        self.variant_combo = QComboBox()
        self.variant_combo.addItems(VARIANTS.keys())
        self.variant_combo.setEnabled(False)
        variant_layout.addWidget(variant_label)
        variant_layout.addWidget(self.variant_combo)
        layout.addLayout(variant_layout)

        # Output directory selection
        output_layout = QHBoxLayout()
        output_label = QLabel("Output Directory:")
        self.output_entry = QLineEdit()
        browse_button = QPushButton("Browse")
        browse_button.clicked.connect(self.browse_output)
        output_layout.addWidget(output_label)
        output_layout.addWidget(self.output_entry)
        output_layout.addWidget(browse_button)
        layout.addLayout(output_layout)

        # Command preview
        preview_label = QLabel("Command Preview:")
        self.command_text = QTextEdit()
        self.command_text.setReadOnly(True)
        self.command_text.setMaximumHeight(100)
        layout.addWidget(preview_label)
        layout.addWidget(self.command_text)

        # Download button
        self.download_button = QPushButton("Download")
        self.download_button.clicked.connect(self.execute_download)
        layout.addWidget(self.download_button)

        # Refresh the command preview whenever any input changes
        self.product_combo.currentTextChanged.connect(self.update_command_preview)
        self.version_combo.currentTextChanged.connect(self.update_command_preview)
        self.os_combo.currentTextChanged.connect(self.update_command_preview)
        self.arch_combo.currentTextChanged.connect(self.update_command_preview)
        self.component_combo.currentTextChanged.connect(self.update_command_preview)
        self.variant_combo.currentTextChanged.connect(self.update_command_preview)
        self.output_entry.textChanged.connect(self.update_command_preview)

        self.setMinimumWidth(600)
        self.setMinimumHeight(500)

    def on_product_change(self, product_text):
        is_cudnn = PRODUCTS[product_text] == "cudnn"

        self.variant_combo.setEnabled(is_cudnn)
        if not is_cudnn:
            self.variant_combo.setCurrentIndex(-1)

        self.component_combo.setEnabled(not is_cudnn)
        if is_cudnn:
            self.component_combo.setCurrentIndex(-1)

        self.version_combo.blockSignals(True)
        self.version_combo.clear()
        if is_cudnn:
            self.version_combo.addItems(CUDNN_RELEASES)
        else:
            self.version_combo.addItems(CUDA_RELEASES.keys())
        self.version_combo.blockSignals(False)

        self.update_command_preview()

    def browse_output(self):
        directory = QFileDialog.getExistingDirectory(self, "Select Output Directory")
        if directory:
            self.output_entry.setText(directory)

def update_command_preview(self):
    command = ["python", "download_cuda.py"]
    
    product_text = self.product_combo.currentText()
    if product_text:
        product_key = PRODUCTS[product_text]
        command.extend(["--product", product_key])
    
    if self.version_combo.currentText():
        if PRODUCTS[self.product_combo.currentText()] == "cudnn":
            release_label = self.version_combo.currentText()
        else:
            release_label = CUDA_RELEASES.get(
                self.version_combo.currentText(),
                self.version_combo.currentText()
            )
        command.extend(["--label", release_label])
        
    if self.os_combo.currentText():
        os_key = OPERATING_SYSTEMS[self.os_combo.currentText()]
        command.extend(["--os", os_key])
    
    if self.arch_combo.currentText():
        arch_key = ARCHITECTURES[self.arch_combo.currentText()]
        command.extend(["--arch", arch_key])
    
    if (
        self.product_combo.currentText() != "cuDNN" and
        self.component_combo.currentText() != "All Components" and
        self.component_combo.currentText()
    ):
        component_key = COMPONENTS[self.component_combo.currentText()]
        command.extend(["--component", component_key])
    
    if self.variant_combo.isEnabled() and self.variant_combo.currentText():
        variant_key = VARIANTS[self.variant_combo.currentText()]
        command.extend(["--variant", variant_key])
    
    if self.output_entry.text():
        command.extend(["--output", self.output_entry.text()])
    
    self.command_text.setText(" ".join(command))


def execute_download(self):
    command = self.command_text.toPlainText().strip()
    if command:
        self.download_button.setEnabled(False)
        
        args = argparse.Namespace()
        args.product = PRODUCTS[self.product_combo.currentText()]
        
        if PRODUCTS[self.product_combo.currentText()] == "cudnn":
            args.label = self.version_combo.currentText()
        else:
            args.label = CUDA_RELEASES.get(
                self.version_combo.currentText(),
                self.version_combo.currentText()
            )
        
        args.os = OPERATING_SYSTEMS[self.os_combo.currentText()]
        args.arch = ARCHITECTURES[self.arch_combo.currentText()]
        
        if self.variant_combo.isEnabled() and self.variant_combo.currentText():
            args.variant = VARIANTS[self.variant_combo.currentText()]
        else:
            args.variant = None
        
        if (
            self.product_combo.currentText() != "cuDNN" and
            self.component_combo.currentText() != "All Components" and
            self.component_combo.currentText()
        ):
            args.component = COMPONENTS[self.component_combo.currentText()]
        else:
            args.component = None
        
        args.output = self.output_entry.text() if self.output_entry.text() else "flat"
        
        self.download_worker = DownloadWorker(args)
        self.download_worker.finished.connect(self.on_download_complete)
        self.download_worker.start()
    else:
        QMessageBox.warning(
            self, 
            "Warning", 
            "Please configure the download options first."
        )


def on_download_complete(self, success, error_message):
    self.download_button.setEnabled(True)
    if success:
        QMessageBox.information(self, "Success", "Download completed successfully!")
    else:
        QMessageBox.critical(self, "Error", f"Download failed: {error_message}")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--download-only", action="store_true", help=argparse.SUPPRESS)
    parser.add_argument("--product", help="Product name")
    parser.add_argument("--label", help="Release label version")
    parser.add_argument("--os", help="Operating System")
    parser.add_argument("--arch", help="Architecture")
    parser.add_argument("--component", help="Component name")
    parser.add_argument("--variant", help="Variant")
    parser.add_argument("--output", help="Output directory")

    args = parser.parse_args()

    if args.download_only:
        try:
            parent = f"{DOMAIN}/compute/{args.product}/redist/"
            manifest_uri = f"{parent}redistrib_{args.label}.json"

            manifest_response = urlopen(manifest_uri)
            manifest = json.loads(manifest_response.read())

            platform = f"{args.os}-{args.arch}"

            fetch_action(
                parent,
                manifest,
                args.component,
                platform,
                args.variant,
                True
            )

            post_action(args.output, True)

            sys.exit(0)
        except Exception as e:
            print(f"Error during download: {str(e)}", file=sys.stderr)
            sys.exit(1)
    else:
        app = QApplication(sys.argv)
        app.setStyle('Fusion')
        window = DownloaderGUI()
        window.show()
        sys.exit(app.exec())

if __name__ == "__main__":
    main()

</details>
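
A small note on the command preview in the script above: it joins the arguments with plain spaces, so an output directory that contains a space would not paste back into a shell as a single argument. Here is a minimal sketch of a quoted preview, assuming the same kind of argument list. `build_preview` is a hypothetical helper and not part of the script, and `shlex.join` applies POSIX-style quoting, which may not match what a Windows command prompt expects.

```python
import shlex


def build_preview(command):
    """Join an argument list into one POSIX shell-safe string for display."""
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(command)


# Example argument list; the output path is made up for illustration.
preview = build_preview(
    ["python", "download_cuda.py", "--output", "/tmp/my downloads"]
)
print(preview)
```

Since `execute_download` builds its arguments from the widgets rather than from the preview text, this would only change what is displayed.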

@BBC-Esq

BBC-Esq commented Oct 30, 2024

Official compatibility matrix that I found at:

https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix

| PyTorch version | Python | C++ | Stable CUDA | Experimental CUDA | Stable ROCm |
| --- | --- | --- | --- | --- | --- |
| 2.5 | >=3.9, <=3.12, (3.13 experimental) | C++17 | CUDA 11.8, CUDA 12.1, CUDA 12.4, CUDNN 9.1.0.70 | None | ROCm 6.2 |
| 2.4 | >=3.8, <=3.12 | C++17 | CUDA 11.8, CUDA 12.1, CUDNN 9.1.0.70 | CUDA 12.4, CUDNN 9.1.0.70 | ROCm 6.1 |
| 2.3 | >=3.8, <=3.11, (3.12 experimental) | C++17 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 | ROCm 6.0 |
| 2.2 | >=3.8, <=3.11, (3.12 experimental) | C++17 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 | ROCm 5.7 |
| 2.1 | >=3.8, <=3.11 | C++17 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 | ROCm 5.6 |
| 2.0 | >=3.8, <=3.11 | C++14 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 | ROCm 5.4 |
| 1.13 | >=3.7, <=3.10 | C++14 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 | ROCm 5.2 |
| 1.12 | >=3.7, <=3.10 | C++14 | CUDA 11.3, CUDNN 8.3.2.44 | CUDA 11.6, CUDNN 8.3.2.44 | ROCm 5.0 |
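
For anyone who wants to check this programmatically, here is a rough sketch that copies the "Stable CUDA" cuDNN versions from the matrix above into a lookup table and compares the major version against cuDNN 9, which the issue description says ctranslate2>=4.5.0 uses. The function names are made up for illustration. A matching major version is only a first sanity check, since the cu121 reports in this thread show it is not sufficient on its own.

```python
# cuDNN versions from the "Stable CUDA" column of the matrix above.
TORCH_STABLE_CUDNN = {
    (2, 5): "9.1.0.70",
    (2, 4): "9.1.0.70",
    (2, 3): "8.7.0.84",
    (2, 2): "8.7.0.84",
    (2, 1): "8.7.0.84",
    (2, 0): "8.5.0.96",
    (1, 13): "8.3.2.44",
    (1, 12): "8.3.2.44",
}


def torch_release(version):
    """Turn a version string such as '2.5.1+cu124' into (2, 5)."""
    base = version.split("+", 1)[0]  # drop the local tag, e.g. '+cu124'
    major, minor = base.split(".")[:2]
    return int(major), int(minor)


def bundled_cudnn_major(torch_version):
    """Return the cuDNN major version listed for this torch release, or None."""
    cudnn = TORCH_STABLE_CUDNN.get(torch_release(torch_version))
    return None if cudnn is None else int(cudnn.split(".")[0])


def same_cudnn_major(torch_version, required_major=9):
    """True if the listed cuDNN major version equals required_major."""
    return bundled_cudnn_major(torch_version) == required_major
```

For example, `same_cudnn_major("2.3.0")` is false because the table lists cuDNN 8.x for torch 2.3, while releases missing from the table return `None` from `bundled_cudnn_major`.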

@gstvoschaefer

Hello, I’m running WhisperX on Databricks and encountering the same error:

Unable to load any of {libcudnn_cnn.so.9.1.0, libcudnn_cnn.so.9.1, libcudnn_cnn.so.9, libcudnn_cnn.so}
Invalid handle. Cannot load symbol cudnnCreateConvolutionDescriptor

Databricks Cluster Runtime:
15.4.x-gpu-ml-scala2.12

Packages - version:

whisperx - 3.1.5

torch - 2.5.1+cu124

ctranslate2 - 4.5.0

nvidia-cublas-cu12 - 12.4.5.8
nvidia-cuda-cupti-cu12 - 12.4.127
nvidia-cuda-nvrtc-cu12 - 12.4.127
nvidia-cuda-runtime-cu12 - 12.4.127
nvidia-cudnn-cu12 - 9.1.0.70
nvidia-cufft-cu12 - 11.2.1.3
nvidia-curand-cu12 - 10.3.5.147
nvidia-cusolver-cu12 - 11.6.1.9
nvidia-cusparse-cu12 - 12.3.1.170
nvidia-ml-py - 12.555.43
nvidia-nccl-cu12 - 2.21.5
nvidia-nvjitlink-cu12 - 12.4.127
nvidia-nvtx-cu12 - 12.4.127

pip show nvidia-cudnn-cu12

Name: nvidia-cudnn-cu12
Version: 9.1.0.70
Summary: cuDNN runtime libraries
Home-page: https://developer.nvidia.com/cuda-zone
Author: Nvidia CUDA Installer Team
Author-email: [email protected]
License: NVIDIA Proprietary Software
Location: /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages
Requires: nvidia-cublas-cu12
Required-by: torch

!ldconfig -p | grep cudnn_cnn

	libcudnn_cnn_train.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8
	libcudnn_cnn_infer.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8

I’ve tried running:

!export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/x86_64-linux-gnu/

But there was no change. Could someone help me here? I’ve been trying to find a solution for almost a week.
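
Two things stand out in the steps above. A `!export ...` line in a notebook typically runs in its own subshell, so the variable does not reach the Python process. Also, the `ldconfig` output lists only cuDNN 8 libraries under `/lib/x86_64-linux-gnu/`, so that directory would not supply `libcudnn_cnn.so.9` anyway. Here is a rough sketch for locating the cuDNN libraries that the pip wheel installed. It assumes the wheel places them under `site-packages/nvidia/<package>/lib`, and both function names are hypothetical.

```python
import glob
import os
import sys


def find_cudnn_lib_dirs(search_roots=None):
    """Return nvidia/*/lib directories that contain libcudnn shared objects."""
    roots = sys.path if search_roots is None else search_roots
    found = []
    for root in roots:
        if not root or not os.path.isdir(root):
            continue
        # Assumed wheel layout: <root>/nvidia/<package>/lib/libcudnn*.so*
        pattern = os.path.join(root, "nvidia", "*", "lib")
        for lib_dir in sorted(glob.glob(pattern)):
            has_cudnn = glob.glob(os.path.join(lib_dir, "libcudnn*.so*"))
            if has_cudnn and lib_dir not in found:
                found.append(lib_dir)
    return found


def prepend_to_search_path(lib_dirs, current=""):
    """Build an LD_LIBRARY_PATH-style string with lib_dirs placed first."""
    existing = [p for p in current.split(os.pathsep) if p and p not in lib_dirs]
    return os.pathsep.join(list(lib_dirs) + existing)
```

The string from `prepend_to_search_path(find_cudnn_lib_dirs(), os.environ.get("LD_LIBRARY_PATH", ""))` would need to be set in the cluster or shell configuration before Python starts, since the loader generally reads `LD_LIBRARY_PATH` at process start.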

@BBC-Esq

BBC-Esq commented Oct 30, 2024

Here is an ongoing updated .txt that I've been working on if it helps...

RIDICULOUSLY LONG TXT FILE HERE

Following is the Release Compatibility Matrix for PyTorch releases:

| PyTorch version | Python | C++ | Stable CUDA | Experimental CUDA | Stable ROCm |
| --- | --- | --- | --- | --- | --- |
| 2.5 | >=3.9, <=3.12, (3.13 experimental) | C++17 | CUDA 11.8, CUDA 12.1, CUDA 12.4, CUDNN 9.1.0.70 | None | ROCm 6.2 |
| 2.4 | >=3.8, <=3.12 | C++17 | CUDA 11.8, CUDA 12.1, CUDNN 9.1.0.70 | CUDA 12.4, CUDNN 9.1.0.70 | ROCm 6.1 |
| 2.3 | >=3.8, <=3.11, (3.12 experimental) | C++17 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 | ROCm 6.0 |
| 2.2 | >=3.8, <=3.11, (3.12 experimental) | C++17 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 | ROCm 5.7 |
| 2.1 | >=3.8, <=3.11 | C++17 | CUDA 11.8, CUDNN 8.7.0.84 | CUDA 12.1, CUDNN 8.9.2.26 | ROCm 5.6 |
| 2.0 | >=3.8, <=3.11 | C++14 | CUDA 11.7, CUDNN 8.5.0.96 | CUDA 11.8, CUDNN 8.7.0.84 | ROCm 5.4 |
| 1.13 | >=3.7, <=3.10 | C++14 | CUDA 11.6, CUDNN 8.3.2.44 | CUDA 11.7, CUDNN 8.5.0.96 | ROCm 5.2 |
| 1.12 | >=3.7, <=3.10 | C++14 | CUDA 11.3, CUDNN 8.3.2.44 | CUDA 11.6, CUDNN 8.3.2.44 | ROCm 5.0 |

CUDNN:

8.9.7.29 = pip install nvidia-cudnn-cu12==8.9.7.29 # cuda 12.2 update 1 recommended for newer GPUs
9.1.0 = pip install nvidia-cudnn-cu12==9.1.0.70 # per nvidia, cuda 12.4 recommended for newer gpus
9.1.1 = pip install nvidia-cudnn-cu12==9.1.1.17 # per nvidia, cuda 12.4 recommended for newer gpus
9.2.0 = pip install nvidia-cudnn-cu12==9.2.0.82 # per nvidia, cuda 12.5 recommended for newer gpus
9.2.1 = pip install nvidia-cudnn-cu12==9.2.1.18
9.3.0 = pip install nvidia-cudnn-cu12==9.3.0.75
9.4.0 = pip install nvidia-cudnn-cu12==9.4.0.58
9.5.0 = pip install nvidia-cudnn-cu12==9.5.0.50
9.5.1 = pip install nvidia-cudnn-cu12==9.5.1.17
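
To see which of these packages is actually present in an environment before pinning anything, a quick check along these lines might help. It is only a sketch: the package list is an assumption picked from the names used in this thread, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Package names taken from this thread; adjust to whatever you care about.
PACKAGES = (
    "torch",
    "ctranslate2",
    "nvidia-cudnn-cu12",
    "nvidia-cublas-cu12",
    "nvidia-cuda-runtime-cu12",
)


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```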

CUDA 12.6.2:

pip install nvidia-cuda-runtime-cu12==12.6.77
pip install nvidia-cublas-cu12==12.6.3.3
pip install nvidia-cuda-nvcc-cu12==12.6.77
pip install nvidia-cuda-nvrtc-cu12==12.6.77

pip install nvidia-cuda-cupti-cu12==12.6.80
pip install nvidia-cuda-sanitizer-api-cu12==12.6.77
pip install nvidia-cufft-cu12==11.3.0.4
pip install nvidia-curand-cu12==10.3.7.77
pip install nvidia-cusolver-cu12==11.7.1.2
pip install nvidia-cusparse-cu12==12.5.4.2
pip install nvidia-cuda-cuxxfilt-cu12==12.6.77
pip install nvidia-npp-cu12==12.3.1.54
pip install nvidia-nvfatbin-cu12==12.6.77
pip install nvidia-nvjitlink-cu12==12.6.77
pip install nvidia-nvjpeg-cu12==12.3.3.54
pip install nvidia-nvml-dev-cu12==12.6.77
pip install nvidia-nvtx-cu12==12.6.77
pip install nvidia-cuda-opencl-cu12==12.6.77

CUDA 12.6.1:

pip install nvidia-cuda-runtime-cu12==12.6.68
pip install nvidia-cublas-cu12==12.6.1.4
pip install nvidia-cuda-nvcc-cu12==12.6.68
pip install nvidia-cuda-nvrtc-cu12==12.6.68

pip install nvidia-cuda-cupti-cu12==12.6.68
pip install nvidia-cuda-sanitizer-api-cu12==12.6.68
pip install nvidia-cufft-cu12==11.2.6.59
pip install nvidia-curand-cu12==10.3.7.68
pip install nvidia-cusolver-cu12==11.6.4.69
pip install nvidia-cusparse-cu12==12.5.3.3
pip install nvidia-cuda-cuxxfilt-cu12==12.6.68
pip install nvidia-npp-cu12==12.3.1.54
pip install nvidia-nvfatbin-cu12==12.6.68
pip install nvidia-nvjitlink-cu12==12.6.68
pip install nvidia-nvjpeg-cu12==12.3.3.54
pip install nvidia-nvml-dev-cu12==12.6.68
pip install nvidia-nvtx-cu12==12.6.68
pip install nvidia-cuda-opencl-cu12==12.6.68

CUDA 12.6.0:

pip install nvidia-cuda-runtime-cu12==12.6.37
pip install nvidia-cublas-cu12==12.6.0.22
pip install nvidia-cuda-nvcc-cu12==12.6.20
pip install nvidia-cuda-nvrtc-cu12==12.6.20

pip install nvidia-cuda-cupti-cu12==12.6.37
pip install nvidia-cuda-sanitizer-api-cu12==12.6.34
pip install nvidia-cufft-cu12==11.2.6.28
pip install nvidia-curand-cu12==10.3.7.37
pip install nvidia-cusolver-cu12==11.6.4.38
pip install nvidia-cusparse-cu12==12.5.2.23
pip install nvidia-cuda-cuxxfilt-cu12==12.6.20
pip install nvidia-npp-cu12==12.3.1.23
pip install nvidia-nvfatbin-cu12==12.6.20
pip install nvidia-nvjitlink-cu12==12.6.20
pip install nvidia-nvjpeg-cu12==12.3.3.23
pip install nvidia-nvml-dev-cu12==12.6.37
pip install nvidia-nvtx-cu12==12.6.37
pip install nvidia-cuda-opencl-cu12==12.6.37

CUDA 12.5.1:

pip install nvidia-cuda-runtime-cu12==12.5.82
pip install nvidia-cublas-cu12==12.5.3.2
pip install nvidia-cuda-nvcc-cu12==12.5.82
pip install nvidia-cuda-nvrtc-cu12==12.5.82

pip install nvidia-cuda-cupti-cu12==12.5.82
pip install nvidia-cuda-sanitizer-api-cu12==12.5.81
pip install nvidia-cufft-cu12==11.2.3.61
pip install nvidia-curand-cu12==10.3.6.82
pip install nvidia-cusolver-cu12==11.6.3.83
pip install nvidia-cusparse-cu12==12.5.1.3
pip install nvidia-cuda-cuxxfilt-cu12==12.5.82
pip install nvidia-npp-cu12==12.3.0.159
pip install nvidia-nvfatbin-cu12==12.5.82
pip install nvidia-nvjitlink-cu12==12.5.82
pip install nvidia-nvjpeg-cu12==12.3.2.81
pip install nvidia-nvml-dev-cu12==12.5.82
pip install nvidia-nvtx-cu12==12.5.82
pip install nvidia-cuda-opencl-cu12==12.5.39

CUDA 12.5.0:

pip install nvidia-cuda-runtime-cu12==12.5.39
pip install nvidia-cublas-cu12==12.5.2.13
pip install nvidia-cuda-nvcc-cu12==12.5.40
pip install nvidia-cuda-nvrtc-cu12==12.5.40

pip install nvidia-cuda-cupti-cu12==12.5.39
pip install nvidia-cuda-sanitizer-api-cu12==12.5.39
pip install nvidia-cufft-cu12==11.2.3.18
pip install nvidia-curand-cu12==10.3.6.39
pip install nvidia-cusolver-cu12==11.6.2.40
pip install nvidia-cusparse-cu12==12.4.1.24
pip install nvidia-cuda-cuxxfilt-cu12==12.5.39
pip install nvidia-npp-cu12==12.3.0.116
pip install nvidia-nvfatbin-cu12==12.5.39
pip install nvidia-nvjitlink-cu12==12.5.40
pip install nvidia-nvjpeg-cu12==12.3.2.38
pip install nvidia-nvml-dev-cu12==12.5.39
pip install nvidia-nvtx-cu12==12.5.39
pip install nvidia-cuda-opencl-cu12==12.5.39

CUDA 12.4.1:

pip install nvidia-cuda-runtime-cu12==12.4.127
pip install nvidia-cublas-cu12==12.4.5.8
pip install nvidia-cuda-nvcc-cu12==12.4.131
pip install nvidia-cuda-nvrtc-cu12==12.4.127

pip install nvidia-cuda-cupti-cu12==12.4.127
pip install nvidia-cuda-sanitizer-api-cu12==12.4.127
pip install nvidia-cufft-cu12==11.2.1.3
pip install nvidia-curand-cu12==10.3.5.147
pip install nvidia-cusolver-cu12==11.6.1.9
pip install nvidia-cusparse-cu12==12.3.1.170
pip install nvidia-cuda-cuxxfilt-cu12==12.4.127
pip install nvidia-npp-cu12==12.2.5.30
pip install nvidia-nvfatbin-cu12==12.4.127
pip install nvidia-nvjitlink-cu12==12.4.127
pip install nvidia-nvjpeg-cu12==12.3.1.117
pip install nvidia-nvml-dev-cu12==12.4.127
pip install nvidia-nvtx-cu12==12.4.127
pip install nvidia-cuda-opencl-cu12==12.4.127

CUDA 12.4.0:

pip install nvidia-cuda-runtime-cu12==12.4.99
pip install nvidia-cublas-cu12==12.4.2.65
pip install nvidia-cuda-nvcc-cu12==12.4.99
pip install nvidia-cuda-nvrtc-cu12==12.4.99

pip install nvidia-cuda-cupti-cu12==12.4.99
pip install nvidia-cuda-sanitizer-api-cu12==12.4.99
pip install nvidia-cufft-cu12==11.2.0.44
pip install nvidia-curand-cu12==10.3.5.119
pip install nvidia-cusolver-cu12==11.6.0.99
pip install nvidia-cusparse-cu12==12.3.0.142
pip install nvidia-cuda-cuxxfilt-cu12==12.4.99
pip install nvidia-npp-cu12==12.2.5.2
pip install nvidia-nvfatbin-cu12==12.4.99
pip install nvidia-nvjitlink-cu12==12.4.99
pip install nvidia-nvjpeg-cu12==12.3.1.89
pip install nvidia-nvml-dev-cu12==12.4.99
pip install nvidia-nvtx-cu12==12.4.99
pip install nvidia-cuda-opencl-cu12==12.4.99

CUDA 12.2.2:

pip install nvidia-cuda-runtime-cu12==12.2.140
pip install nvidia-cublas-cu12==12.2.5.6
pip install nvidia-cuda-nvcc-cu12==12.2.140
pip install nvidia-cuda-nvrtc-cu12==12.2.140

pip install nvidia-cuda-cupti-cu12==12.2.142
pip install nvidia-cuda-sanitizer-api-cu12==12.2.140
pip install nvidia-cufft-cu12==11.0.8.103
pip install nvidia-curand-cu12==10.3.3.141
pip install nvidia-cusolver-cu12==11.5.2.141
pip install nvidia-cusparse-cu12==12.1.2.141
pip install nvidia-cuda-cuxxfilt-cu12==12.2.140
pip install nvidia-npp-cu12==12.2.1.4
pip install nvidia-nvjitlink-cu12==12.2.140
pip install nvidia-nvjpeg-cu12==12.2.2.4
pip install nvidia-nvml-dev-cu12==12.2.140
pip install nvidia-nvtx-cu12==12.2.140
pip install nvidia-cuda-opencl-cu12==12.2.140

CUDA 12.1.1:

pip install nvidia-cuda-runtime-cu12==12.1.105
pip install nvidia-cublas-cu12==12.1.3.1
pip install nvidia-cuda-nvcc-cu12==12.1.105
pip install nvidia-cuda-nvrtc-cu12==12.1.105

pip install nvidia-cuda-cupti-cu12==12.1.105
pip install nvidia-cuda-sanitizer-api-cu12==12.1.105
pip install nvidia-cufft-cu12==11.0.2.54
pip install nvidia-curand-cu12==10.3.2.106
pip install nvidia-cusolver-cu12==11.4.5.107
pip install nvidia-cusparse-cu12==12.1.0.106
pip install nvidia-cuda-cuxxfilt-cu12==12.1.105
pip install nvidia-npp-cu12==12.1.0.40
pip install nvidia-nvfatbin-cu12==12.1.105
pip install nvidia-nvjitlink-cu12==12.1.105
pip install nvidia-nvjpeg-cu12==12.2.0.2
pip install nvidia-nvml-dev-cu12==12.1.105
pip install nvidia-nvtx-cu12==12.1.105
pip install nvidia-cuda-opencl-cu12==12.1.105

CUDA 12.1.0:

pip install nvidia-cuda-runtime-cu12==12.1.55
pip install nvidia-cublas-cu12==12.1.0.26
pip install nvidia-cuda-nvcc-cu12==12.1.66
pip install nvidia-cuda-nvrtc-cu12==12.1.55

pip install nvidia-cuda-cupti-cu12==12.1.62
pip install nvidia-cuda-sanitizer-api-cu12==12.1.55
pip install nvidia-cufft-cu12==11.0.2.4
pip install nvidia-curand-cu12==10.3.2.56
pip install nvidia-cusolver-cu12==11.4.4.55
pip install nvidia-cusparse-cu12==12.0.2.55
pip install nvidia-cuda-cuxxfilt-cu12==12.1.55
pip install nvidia-npp-cu12==12.0.2.50
pip install nvidia-nvfatbin-cu12==12.1.55
pip install nvidia-nvjitlink-cu12==12.1.55
pip install nvidia-nvjpeg-cu12==12.1.0.39
pip install nvidia-nvml-dev-cu12==12.1.55
pip install nvidia-nvtx-cu12==12.1.66
pip install nvidia-cuda-opencl-cu12==12.1.56

CUDA 11.8.0:

pip install nvidia-cuda-runtime-cu11==11.8.89
pip install nvidia-cublas-cu11==11.11.3.6
pip install nvidia-cuda-nvcc-cu11==11.8.89
pip install nvidia-cuda-nvrtc-cu11==11.8.89

pip install nvidia-cuda-cupti-cu11==11.8.87
pip install nvidia-cuda-sanitizer-api-cu11==11.8.86
pip install nvidia-cufft-cu11==10.9.0.58
pip install nvidia-curand-cu11==10.3.0.86
pip install nvidia-cusolver-cu11==11.4.1.48
pip install nvidia-cusparse-cu11==11.7.5.86
pip install nvidia-cuda-cuxxfilt-cu11==11.8.86
pip install nvidia-npp-cu11==11.8.0.86
pip install nvidia-nvfatbin-cu11==11.8.86
pip install nvidia-nvjitlink-cu11==11.8.86
pip install nvidia-nvjpeg-cu11==11.9.0.86
pip install nvidia-nvml-dev-cu11==11.8.86
pip install nvidia-nvtx-cu11==11.8.86
pip install nvidia-cuda-opencl-cu11==11.8.86


nvidia-npp-cu12:

Version 12.3.1.54 was shared between CUDA 12.6.2 and CUDA 12.6.1

nvidia-nvjpeg-cu12:

Version 12.3.3.54 was shared between CUDA 12.6.2 and CUDA 12.6.1

nvidia-cuda-opencl-cu12:

Version 12.5.39 was shared between CUDA 12.5.1 and CUDA 12.5.0

If your situation involves having to set PATH and what not, however, I'm afraid I can't help there. I'm on Windows...

ctranslate2==4.5.0 is currently built with cudnn release 9.1, which is the equivalent of pip install nvidia-cudnn-cu12==9.1.0.70 if you use pip...

Also, it's built with CUDA 12.2 in mind, which is seen here...

https://github.com/OpenNMT/CTranslate2/blob/master/python/tools/prepare_build_environment_linux.sh

Here is what AI says when I fed it your question and my matrices...

AI's Response

image

You'll have to analyze all of this because that's as far as I've gotten. Hope it helps...

@atyshka

atyshka commented Nov 12, 2024

I was having this issue with CUDA 12.4.1 and Torch 2.5.1. It was resolved by manually setting LD_LIBRARY_PATH, but why is that necessary? Under normal circumstances I would expect it to find the lib automatically.

@BBC-Esq

BBC-Esq commented Nov 12, 2024

Because now that torch bundles the CUDA .dll files, essentially, it is also setting paths to them in the "torch" source code somewhere. I scoured the source code a few weeks ago and found it...and the paths vary by which version of torch you install (i.e. what platform the wheel is for)...but essentially that's what's going on. That's as far as I got, however, since I had to turn to real world work things...
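
A rough sketch of what that kind of preloading can look like in general, not torch's or CTranslate2's actual code: load the shared libraries by full path with `ctypes` before the package that needs them is imported. On Linux, a later request for the same library name can then usually be satisfied by the copy that is already loaded. `preload_shared_libraries` and the default pattern are assumptions for illustration, and load order between the cuDNN sub-libraries may matter in practice.

```python
import ctypes
import glob
import os


def preload_shared_libraries(lib_dir, pattern="libcudnn*.so*"):
    """Try to load every matching library in lib_dir.

    Returns two lists of paths: (loaded, failed).
    """
    loaded, failed = [], []
    for path in sorted(glob.glob(os.path.join(lib_dir, pattern))):
        try:
            # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            loaded.append(path)
        except OSError:
            # Not loadable here (wrong platform, missing dependency, ...).
            failed.append(path)
    return loaded, failed
```

The call would have to run before `import ctranslate2`, with `lib_dir` pointing at the directory that holds the cuDNN 9 libraries. Anything that ends up in `failed` is worth inspecting rather than ignoring.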

@BBC-Esq

BBC-Esq commented Nov 12, 2024

Ctranslate2 is not the only one experiencing frustration with how Torch is doing things...

image

@BBC-Esq

BBC-Esq commented Dec 9, 2024

See my pull request here to try and address some compatibility issues and for generally useful info:

#1830
