CANN Backend support #1606

Open
wants to merge 1 commit into
base: master

Conversation

3manifold

@3manifold 3manifold commented Jan 26, 2024

CANN Backend support

Introduction

CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI scenarios.
It provides multi-layer programming interfaces to help users quickly build AI applications and services based on the Ascend platform.

The CANN backend in CTranslate2 enables running AI models on the Ascend NPU, extending the existing CPU and CUDA workflows. More on the Ascend NPU and the CANN library can be found here.

Examples of projects that support CANN include ONNX Runtime & OpenCV.

resolves #1609

Notes

Implementation

The CANN backend implementation introduces Device::CANN, analogous to the existing CPU and CUDA devices.
The CANN workflow is enabled with -DWITH_CANN=ON in the cmake configuration (see examples/cann). As with CUDA, the CANN workflow can coexist with the CPU workflow.

The CANN workflow is accessible through the example (examples/cann/main.cc), the CLI, or the Python module.
Operators and primitives were implemented for CANN so that the end-to-end example in the CTranslate2 documentation runs successfully.
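
As a rough illustration of how a caller might choose among the three devices, the sketch below picks a device string from device counts. The helper name pick_device and the priority order (CANN, then CUDA, then CPU) are assumptions made for illustration and do not describe what device="auto" does internally; the "cann" device string and get_cann_device_count() are taken from the Python sample in this PR.

```python
def pick_device(cann_devices: int, cuda_devices: int = 0) -> str:
    """Return a device string based on how many accelerators are visible.

    Hypothetical helper: the preference order is an assumption,
    not the logic behind device="auto".
    """
    if cann_devices > 0:
        return "cann"
    if cuda_devices > 0:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    # Counts are passed in explicitly so the sketch runs without an NPU.
    print(pick_device(cann_devices=8))
    print(pick_device(cann_devices=0, cuda_devices=0))
```

With a CANN-enabled build, the counts would presumably come from ctranslate2.get_cann_device_count() and ctranslate2.get_cuda_device_count(), and the result would be passed as device= to ctranslate2.Translator.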

Tests

Tests were extended to cover Device::CANN and the respective DataType values. Additional tests were added for extra and edge cases. Gtest output: gtest_cann.log

Environment Setup

  • Download the CANN drivers from the AArch64.run category (the current implementation was developed with CANN 7.0.RC1.alpha001).
  • Build the image and run the container as described in docker/cann.

For details about how to set up the development environment and operating environment, see Development and Operating Environment Setup
and CANN Software Installation Guide.

Build CANN Python module

The CANN Python module is expected to be built using the respective Docker files. For testing and benchmarking, however, the script below offers a quicker way to build it.

#!/bin/bash

# Execute from the project root.
rm -rf build-release/
mkdir build-release && cd build-release || exit

# Configure a release build with the CANN backend enabled.
cmake -DWITH_CANN=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_CLI=OFF -DWITH_MKL=OFF -DOPENMP_RUNTIME=COMP -DCMAKE_PREFIX_PATH="/opt/OpenBLAS" -DWITH_OPENBLAS=ON -DWITH_RUY=ON ..

# Build and install the C++ library, then return to the project root.
VERBOSE=1 make -j"$(nproc)" install && cd ..

# Build the Python wheel for aarch64 and replace any installed ctranslate2.
export CIBW_ARCHS=aarch64
python3 -m pip uninstall --yes ctranslate2
python3 -m pip install -r python/install_requirements.txt
cd python && python3 setup.py bdist_wheel && cd ..
python3 -m pip install python/dist/ctranslate2*.whl

# Add /usr/local/lib to the dynamic loader search path.
export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
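
After the script finishes, a quick check that the wheel landed in the active interpreter can save a confusing debugging session. The following is a minimal sketch using only the standard library; the helper name module_available is hypothetical. It only confirms that the package is discoverable and does not load the native library, so it cannot detect a wrong LD_LIBRARY_PATH.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Does not import the package, so no shared library is loaded here.
    print("ctranslate2 discoverable:", module_available("ctranslate2"))
```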

Build CANN C++ example

#!/bin/bash

# execute from project root

# first build ct2lib
rm -rf build-release/
mkdir build-release && cd build-release || exit

cmake -DWITH_CANN=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_CLI=OFF -DWITH_MKL=OFF -DOPENMP_RUNTIME=COMP -DCMAKE_PREFIX_PATH="/opt/OpenBLAS" -DWITH_OPENBLAS=ON -DWITH_RUY=ON ..

make -j"$(nproc)"

rm CMakeCache.txt

# then build cann_run
cmake -DCMAKE_BUILD_TYPE=Release ../examples/cann/

make -j"$(nproc)"
# ./cann_run <ende_ctranslate2_path>

Samples

Python

import ctranslate2

print("get_supported_compute_types for cann: ", ctranslate2.get_supported_compute_types("cann"))
print("get_cann_device_count: ", ctranslate2.get_cann_device_count())

translator = ctranslate2.Translator("/ctranslate2_docs/ende_ctranslate2/", device="auto")

results = translator.translate_batch([["▁H", "ello", "▁world", "!"]])
output_tokens = results[0].hypotheses[0]
print(output_tokens)

> python3 ct2python_example.py
get_supported_compute_types for cann:  {'int8_float16', 'int8_float32', 'int8', 'float32', 'bfloat16', 'int8_bfloat16', 'float16'}
get_cann_device_count:  8
['▁Hallo', '▁Welt', '!']
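
The translator returns subword pieces rather than plain text. Assuming the tokens are SentencePiece pieces, where "▁" (U+2581) marks a preceding space as the sample output suggests, they can be joined back into a sentence as sketched below. The helper name detokenize is hypothetical; a real pipeline would normally decode with the same SentencePiece model used for tokenization.

```python
def detokenize(tokens):
    """Join SentencePiece-style pieces into text.

    Assumes "\u2581" marks a word boundary; this is a simplification of
    what a SentencePiece model's own decoder does.
    """
    text = "".join(tokens).replace("\u2581", " ")
    return text.strip()


if __name__ == "__main__":
    print(detokenize(["\u2581Hallo", "\u2581Welt", "!"]))  # prints: Hallo Welt!
```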

C++

A C++ execution example can be found in examples/cann.

CLI

echo "▁H ello ▁world !" | ./ct2-translator --model "./ende_ctranslate2/"

root@90b230f7e68f /t/t/c/cli# echo  "▁H ello ▁world !" | ./ct2-translator --model "./ende_ctranslate2/"
▁Hallo ▁Welt !

Benchmark

We measured translation latency for a single batch using all 192 CPU cores versus 1 NPU device.
Specifically, we report 4 consecutive runs for each of two inputs, one of 4 tokens and one of 306 tokens. The NPU was faster
in every run.
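
The measurement script itself is not part of this description. As a minimal sketch of how such per-run latencies could be collected, the helper below times a callable several times and returns datetime.timedelta values, which print in the same h:mm:ss.ffffff form as the results reported here. The name time_runs and the use of time.perf_counter are assumptions, not the authors' harness.

```python
import time
from datetime import timedelta


def time_runs(fn, runs=4):
    """Call fn() `runs` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(timedelta(seconds=time.perf_counter() - start))
    return durations


if __name__ == "__main__":
    # Stand-in workload; in practice fn would wrap translator.translate_batch(...).
    for i, d in enumerate(time_runs(lambda: sum(range(100_000))), start=1):
        print(i, d)
```

Note that the first call on an accelerator often includes one-off initialization, so reporting each run separately, as done here, is more informative than a single average.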

Input tokens

4 tokens
{{"▁H", "ello", "▁world", "!"}}
306 tokens
{{"▁In", "▁this", "▁paper", ",", "▁we", "▁speed", "▁up", "▁the", "▁context", "▁extension", "▁of", "▁L", "LM", "s", ",", "▁in", "▁two", "▁aspects", ".", "▁Particularly", ",", "▁it", "▁can", "▁be", "▁implemented", "▁with", "▁only", "▁two", "▁lines", "▁of", "▁code", "▁in", "▁training", ",", "▁while", "▁being", "▁optional", "▁in", "▁in", "fer", "ence", ".", "▁Typical", "ly" , "▁training", "▁L", "LM", "s", "▁with", "▁long", "▁context", "▁sizes", "▁is" ,"▁comp", "ut", "ation", "ally", "▁expensive", "▁requiring", "▁extensive", "▁training", "▁hours", "▁and", "▁G", "PU", "▁resources", ".", "▁On", "▁the", "▁one", "▁hand", ",", "▁although", "▁den", "se", "▁global", "▁attention", "▁is", "▁needed", "▁during", "▁in", "fer", "ence", ",", "▁fine", "-", "tun", "ing", "▁the", "▁model", "▁can" ,"▁be", "▁effectively", "▁and", "▁efficiently", "▁done", "▁by", "▁spar", "se", "▁local", "▁attention", ".", "▁In", "▁this", "▁paper", ",", "▁we", "▁speed", "▁up", "▁the", "▁context", "▁extension", "▁of", "▁L", "LM", "s", ",", "▁in", "▁two", "▁aspects", ".", "▁Particularly", ",", "▁it", "▁can", "▁be", "▁implemented", "▁with", "▁only", "▁two", "▁lines", "▁of", "▁code", "▁in", "▁training", ",", "▁while", "▁being", "▁optional", "▁in", "▁in", "fer", "ence", ".", "▁Typical", "ly" , "▁training", "▁L", "LM", "s", "▁with", "▁long", "▁context", "▁sizes", "▁is" ,"▁comp", "ut", "ation", "ally", "▁expensive", "▁requiring", "▁extensive", "▁training", "▁hours", "▁and", "▁G", "PU", "▁resources", ".", "▁On", "▁the", "▁one", "▁hand", ",", "▁although", "▁den", "se", "▁global", "▁attention", "▁is", "▁needed", "▁during", "▁in", "fer", "ence", ",", "▁fine", "-", "tun", "ing", "▁the", "▁model", "▁can" ,"▁be", "▁effectively", "▁and", "▁efficiently", "▁done", "▁by", "▁spar", "se", "▁local", "▁attention", ".", "▁In", "▁this", "▁paper", ",", "▁we", "▁speed", "▁up", "▁the", "▁context", "▁extension", "▁of", "▁L", "LM", "s", ",", "▁in", "▁two", "▁aspects", ".", "▁Particularly", ",", "▁it", "▁can", "▁be", "▁implemented", 
"▁with", "▁only", "▁two", "▁lines", "▁of", "▁code", "▁in", "▁training", ",", "▁while", "▁being", "▁optional", "▁in", "▁in", "fer", "ence", ".", "▁Typical", "ly" , "▁training", "▁L", "LM", "s", "▁with", "▁long", "▁context", "▁sizes", "▁is" ,"▁comp", "ut", "ation", "ally", "▁expensive", "▁requiring", "▁extensive", "▁training", "▁hours", "▁and", "▁G", "PU", "▁resources", ".", "▁On", "▁the", "▁one", "▁hand", ",", "▁although", "▁den", "se", "▁global", "▁attention", "▁is", "▁needed", "▁during", "▁in", "fer", "ence", ",", "▁fine", "-", "tun", "ing", "▁the", "▁model", "▁can" ,"▁be", "▁effectively", "▁and", "▁efficiently", "▁done", "▁by", "▁spar", "se", "▁local", "▁attention", "."}}

Hardware

CPU: arm64 Kunpeng 920 Series @2.6GHz (192 cores - utilized all)
NPU: Ascend 910A AI Processor (8 devices - utilized 1)

Experiments


Translation latency per run (h:mm:ss.ffffff)

4 tokens
Run  CPU             CANN
1    0:00:00.098600  0:00:00.093737
2    0:00:00.098584  0:00:00.092929
3    0:00:00.131760  0:00:00.093115
4    0:00:00.109684  0:00:00.093026

306 tokens
Run  CPU             CANN
1    0:00:02.437300  0:00:02.283184
2    0:00:02.468804  0:00:02.018239
3    0:00:02.469789  0:00:01.877654
4    0:00:02.744319  0:00:02.080763
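
To make the comparison easier to read, the reported latencies can be converted to seconds and summarized. The sketch below copies the values from the results above and computes the mean latency per device and whether CANN was faster in every run. The helper names to_seconds and summarize are hypothetical, and four runs per input are too few for more than a rough comparison.

```python
# Latencies copied from the reported results, as h:mm:ss.ffffff strings.
RESULTS = {
    "4 tokens": {
        "cpu": ["0:00:00.098600", "0:00:00.098584", "0:00:00.131760", "0:00:00.109684"],
        "cann": ["0:00:00.093737", "0:00:00.092929", "0:00:00.093115", "0:00:00.093026"],
    },
    "306 tokens": {
        "cpu": ["0:00:02.437300", "0:00:02.468804", "0:00:02.469789", "0:00:02.744319"],
        "cann": ["0:00:02.283184", "0:00:02.018239", "0:00:01.877654", "0:00:02.080763"],
    },
}


def to_seconds(text):
    """Convert an h:mm:ss.ffffff string to seconds as a float."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize(results):
    """Return mean latency per device and whether CANN won every run."""
    summary = {}
    for label, runs in results.items():
        cpu = [to_seconds(t) for t in runs["cpu"]]
        cann = [to_seconds(t) for t in runs["cann"]]
        summary[label] = {
            "cpu_mean": sum(cpu) / len(cpu),
            "cann_mean": sum(cann) / len(cann),
            "cann_faster_in_all_runs": all(n < c for c, n in zip(cpu, cann)),
        }
    return summary


if __name__ == "__main__":
    for label, stats in summarize(RESULTS).items():
        print(label, stats)
```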

@3manifold 3manifold marked this pull request as draft January 26, 2024 14:23
@3manifold 3manifold force-pushed the ct2-cann branch 2 times, most recently from e7c01a1 to 8ce20f6 Compare January 29, 2024 09:46
@3manifold 3manifold marked this pull request as ready for review January 29, 2024 11:12
Co-authored-by: kandrio <[email protected]>
@fallbernana123456

fallbernana123456 commented Jun 4, 2024

cd python && python3 setup.py bdist_wheel && cd ..
    for segment in segments:
  File "/root/exit/envs/python39/lib/python3.9/site-packages/faster_whisper/transcribe.py", line 884, in restore_speech_timestamps
    for segment in segments:
  File "/root/exit/envs/python39/lib/python3.9/site-packages/faster_whisper/transcribe.py", line 396, in generate_segments
    encoder_output = self.encode(segment)
  File "/root/exit/envs/python39/lib/python3.9/site-packages/faster_whisper/transcribe.py", line 574, in encode
    return self.model.encode(features, to_cpu=True)
RuntimeError: not implemented in CANN

I have finished compiling, but this error is reported during use. Is there a good way to resolve it?

@3manifold
Author

3manifold commented Jun 13, 2024

cd python && python3 setup.py bdist_wheel && cd ..
    for segment in segments:
  File "/root/exit/envs/python39/lib/python3.9/site-packages/faster_whisper/transcribe.py", line 884, in restore_speech_timestamps
    for segment in segments:
  File "/root/exit/envs/python39/lib/python3.9/site-packages/faster_whisper/transcribe.py", line 396, in generate_segments
    encoder_output = self.encode(segment)
  File "/root/exit/envs/python39/lib/python3.9/site-packages/faster_whisper/transcribe.py", line 574, in encode
    return self.model.encode(features, to_cpu=True)
RuntimeError: not implemented in CANN

I have finished compiling, but this error is reported during use. Is there a good way to resolve it?

In order for faster-whisper to work, additional tensor operators have to be implemented in CANN. This is a task that's already completed from our side. Nevertheless, we didn't push it to GitHub yet due to change in priorities.
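
Until those operators are available, one possible workaround on the caller's side is to retry on the CPU when the backend reports a missing operator. The sketch below is only an illustration under that assumption: run_with_fallback is a hypothetical helper, it matches on the error text shown in the traceback above, and it does not make the unsupported model run on the NPU.

```python
def run_with_fallback(primary, fallback, marker="not implemented in CANN"):
    """Call primary(); if it raises the 'not implemented' RuntimeError, call fallback().

    Any other RuntimeError is re-raised unchanged.
    """
    try:
        return primary()
    except RuntimeError as err:
        if marker not in str(err):
            raise
        return fallback()


if __name__ == "__main__":
    def on_npu():
        # Stand-in for a call into a model loaded with device="cann".
        raise RuntimeError("not implemented in CANN")

    def on_cpu():
        # Stand-in for the same call on a model loaded with device="cpu".
        return "cpu result"

    print(run_with_fallback(on_npu, on_cpu))  # prints: cpu result
```

This trades speed for coverage: the fallback path needs a second copy of the model loaded on the CPU.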

@LIBIN-K

LIBIN-K commented Jul 1, 2024

I have finished compiling against CANN 7.0.0.beta1. Following the documentation example, I ran into this problem:

>>> import ctranslate2
>>> translator=ctranslate2.Translatro("ende_ctranslate2/", device="auto")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'ctranslate2' has no attribute 'Translatro'
>>> translator=ctranslate2.Translator("ende_ctranslate2/", device="auto")
>>> results = translator.translate_batch([["H@@", "ello", "world@@", "!"]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CANN failed with error 100024

Does the CANN backend support only CANN 7.0.RC1.alpha001?
Also, is there a good way to resolve this problem?

@NickPan7779

NickPan7779 commented Dec 30, 2024

In order for faster-whisper to work, additional tensor operators have to be implemented in CANN. This is a task that's already completed from our side. Nevertheless, we didn't push it to GitHub yet due to change in priorities.

@3manifold Could you upload the Whisper-related code that is not yet implemented for CANN to your ct2-cann branch? It would be a great help. Thank you.

Development

Successfully merging this pull request may close these issues.

[Feature] CANN Backend support
4 participants