
ROCM support for AMD GPUs #566

Open
TheMaddax opened this issue Nov 8, 2023 · 5 comments

@TheMaddax

I've been using whisper on my AMD 6600 XT, which worked well with ROCm support. However, whisperx does not seem to support AMD GPUs. Is there any chance ROCm support could be implemented in whisperx?

@kandeshvari

As I understand it, the reason is the ctranslate2 package. There is an open issue about AMD support, but with no progress :( OpenNMT/CTranslate2#1072

@radna0

radna0 commented Jul 1, 2024

Any update on this?

@arlo-phoenix

@radna0 You can test my CTranslate2 ROCm fork. See OpenNMT/CTranslate2#1072 (comment). It works very well on my RX6800 GPU and should probably work on any AMD GPU that supports ROCm.

@Medhatt21

@arlo-phoenix Worked. Thanks a lot. Is there a chance to publish a docker image of whisperx with rocm support?

@TibixDev

TibixDev commented Jan 5, 2025

I spent a few hours trying to get this to work, but unfortunately I couldn't quite get it to run. I feel like I'm close, but I have zero clue about the last error I hit. Still, I hope this is useful to someone, because I managed to solve a ton of other problems along the way.

The only thing you will need to replace is https://your.url/pytorch_model.bin with a URL where the VAD model is hosted: this whisperx version hardcodes an S3 URL that the author has since disabled, so download pytorch_model.bin from the archive and upload it to a web server / CDN of your own. Put your audio samples in a directory called audio, in the same folder as the compose file.
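One simple way to host the VAD model yourself (my suggestion, not part of the original instructions) is Python's built-in http.server; the sed patch further down can then point at http://your-host:8000/pytorch_model.bin instead of https://your.url/pytorch_model.bin:

```shell
# Serve the current directory (which contains the downloaded pytorch_model.bin)
# over plain HTTP on port 8000
python3 -m http.server 8000
```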

Dockerfile:

FROM rocm/pytorch:latest

ENV PYTORCH_ROCM_ARCH=gfx1032
ENV HSA_OVERRIDE_GFX_VERSION=10.3.0

SHELL ["/bin/bash", "-c"]
RUN conda init bash && \
    echo "conda activate py_3.9" >> ~/.bashrc

RUN apt-get update && apt-get install -y \
    ffmpeg \
    git \
    wget \
    libomp5 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /opt/whisperx

RUN git clone https://github.com/arlo-phoenix/CTranslate2-rocm.git --recurse-submodules && \
    cd CTranslate2-rocm && \
    source ~/.bashrc && \
    CLANG_CMAKE_CXX_COMPILER=clang++ \
    CXX=clang++ \
    HIPCXX="$(hipconfig -l)/clang" \
    HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build \
        -DWITH_MKL=OFF \
        -DWITH_HIP=ON \
        -DCMAKE_HIP_ARCHITECTURES=$PYTORCH_ROCM_ARCH \
        -DBUILD_TESTS=ON \
        -DWITH_CUDNN=ON && \
    cmake --build build -- -j$(nproc) && \
    cd build && \
    cmake --install . --prefix /opt/conda/envs/py_3.9 && \
    ldconfig

RUN source ~/.bashrc && \
    cd /opt/whisperx/CTranslate2-rocm/python && \
    pip install -r install_requirements.txt && \
    CPLUS_INCLUDE_PATH=/opt/conda/envs/py_3.9/include \
    LIBRARY_PATH=/opt/conda/envs/py_3.9/lib \
    python setup.py bdist_wheel && \
    pip install dist/*.whl

RUN source ~/.bashrc && \
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1 --force-reinstall && \
    pip install transformers pandas nltk pyannote.audio==3.1.1 faster-whisper==1.0.1 -U && \
    pip install whisperx==3.1.1 --no-deps

# Patch asr.py to add default options missing in this whisperx version
# (note: adjust the py_3.10 paths below if your base image's conda env
# is named differently, e.g. py_3.9)
RUN sed -i '/"suppress_numerals": False/a \    "max_new_tokens": None,\n    "clip_timestamps": None,\n    "hallucination_silence_threshold": None,' \
    /opt/conda/envs/py_3.10/lib/python3.10/site-packages/whisperx/asr.py

# Patch the vad.py file to update the VAD_SEGMENTATION_URL
RUN sed -i 's|https://whisperx.s3.eu-west-2.amazonaws.com/model_weights/segmentation/0b5b3216d60a2d32fc086b47ea8c67589aaeb26b7e07fcbe620d6d0b83e209ea/pytorch_model.bin|https://your.url/pytorch_model.bin|' \
    /opt/conda/envs/py_3.10/lib/python3.10/site-packages/whisperx/vad.py

# Patch the checksum validation in vad.py
RUN sed -i '/if hashlib.sha256(model_bytes).hexdigest() != VAD_SEGMENTATION_URL.split/,+3d' \
    /opt/conda/envs/py_3.10/lib/python3.10/site-packages/whisperx/vad.py

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/envs/py_3.9/lib/

# Create symlink for libiomp5
RUN ln -s /opt/rocm-6.3.1/lib/llvm/lib/libiomp5.so /usr/lib/libiomp5.so && \
    ldconfig

# Create an entry script
RUN printf '#!/bin/bash\nsource ~/.bashrc\nwhile true; do sleep 86400; done\n' > /entrypoint.sh && \
    chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

docker-compose.yml

version: '3.8'
services:
  whisperx:
    image: whisperx-rocm
    tty: true
    stdin_open: true
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    volumes:
      - ./audio:/audio
    entrypoint: ["/bin/bash", "-c"]
    command: ["source ~/.bashrc && while true; do sleep 1; done"]

After building the image and creating a container with compose, open a shell inside the container (I did it with lazydocker, but use whatever you prefer) and execute whisperx /audio/yourfile.wav.
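A minimal sketch of that workflow with plain docker commands (the image name comes from the compose file above; the container name is whatever compose assigns, so check it with docker ps first):

```shell
# Build the image under the name referenced by docker-compose.yml
docker build -t whisperx-rocm .

# Start the service in the background
docker compose up -d

# Open a shell in the running container
# (replace the name with the one "docker ps" shows)
docker exec -it whisperx-whisperx-1 bash

# Inside the container: transcribe a sample mounted via ./audio
whisperx /audio/yourfile.wav
```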

Here's the last error I got that I was unable to resolve:

  warnings.warn(
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/asteroid_filterbanks/param_sinc_fb.py:94: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at ../aten/src/ATen/Context.cpp:296.)
  ft_low = torch.matmul(low, self.n_)
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/bin/whisperx", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/whisperx/transcribe.py", line 176, in cli
    result = model.transcribe(audio, batch_size=batch_size, chunk_size=chunk_size, print_progress=print_progress)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/whisperx/asr.py", line 194, in transcribe
    language = language or self.detect_language(audio)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/whisperx/asr.py", line 252, in detect_language
    encoder_output = self.model.encode(segment)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/whisperx/asr.py", line 86, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: parallel_for failed: hipErrorInvalidDeviceFunction: invalid device function
(py_3.9) root@2bd0988e5bb6:/opt/whisperx# 

I tried many things, but due to a myriad of factors, I wasn't able to use a newer version of whisperx or faster-whisper. Still, I hope this helps someone and that work can continue from this base.
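For what it's worth, hipErrorInvalidDeviceFunction typically means the kernels were not compiled for the GPU's actual gfx target. A diagnostic sketch (assuming rocminfo is available inside the container, as it normally is in the rocm/pytorch images) is to compare the reported ISA against the PYTORCH_ROCM_ARCH / CMAKE_HIP_ARCHITECTURES values used in the build above:

```shell
# List the gfx targets the ROCm runtime reports for this machine;
# they should match PYTORCH_ROCM_ARCH (gfx1032 in the Dockerfile above)
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u

# Show the HSA override in effect, if any (10.3.0 in the Dockerfile above)
echo "$HSA_OVERRIDE_GFX_VERSION"
```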
