- A sentence for text-to-speech
- An audio file and its transcript for voice cloning
The synthesized voice is written as a .wav file whose path is defined by SAVE_WAV_PATH in vall-e-x.py.
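To check the generated file, the sketch below reads its basic properties with Python's standard `wave` module. It assumes the output is integer PCM; a floating-point WAV would need a library such as soundfile instead. The function name `describe_wav` is hypothetical and is not part of vall-e-x.py.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample rate, sample width and duration of a PCM .wav file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_sec": frames / rate,
        }
```

For example, `describe_wav(SAVE_WAV_PATH)` (passing the path configured in vall-e-x.py) shows whether the synthesized audio has a plausible length.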
This model requires pyopenjtalk for g2p (grapheme-to-phoneme conversion).
pip3 install -r requirements.txt
The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that time.
To synthesize the sample sentence:
python3 vall-e-x.py
To specify the input sentence, put the sentence after the --input option. Use the --savepath option to change the name of the output file.
python3 vall-e-x.py --input "Hello world." --savepath SAVE_WAV_PATH
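The two options above can be pictured as an ordinary argparse parser. This is a hypothetical sketch that only mirrors the documented flag names; the defaults are placeholders and it is not the parser that vall-e-x.py actually uses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented options; defaults are placeholders.
    parser = argparse.ArgumentParser(description="VALL-E X text-to-speech (sketch)")
    parser.add_argument("-i", "--input", default="Hello world.",
                        help="sentence to synthesize")
    parser.add_argument("--savepath", default="output.wav",
                        help="path of the .wav file to write")
    return parser
```

Calling `build_parser().parse_args(["--input", "Hello world.", "--savepath", "out.wav"])` yields a namespace with `input` and `savepath` set accordingly.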
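To clone a voice, run with an audio prompt: pass the reference audio with --audio and its transcript with --transcript. In this example, the input sentence means "We are testing speech synthesis." and the transcript means "We have to buy water from Malaysia."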
python3 vall-e-x.py -i "音声合成のテストを行なっています。" --audio BASIC5000_0001.wav --transcript "水をマレーシアから買わなくてはならないのです" -e 1
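When calling the script from other Python code, the voice-cloning command can be assembled as an argument list. In this sketch, `build_command` and `run` are hypothetical helpers; the check that --audio and --transcript are given together is an assumption, and extra flags such as `-e 1` are forwarded unchanged without interpreting them.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_command(text: str, audio: Optional[str] = None,
                  transcript: Optional[str] = None,
                  savepath: Optional[str] = None,
                  extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble a vall-e-x.py argument list from the documented options."""
    if (audio is None) != (transcript is None):
        # Assumption: a prompt audio is only useful together with its transcript.
        raise ValueError("--audio and --transcript must be given together")
    cmd = [sys.executable, "vall-e-x.py", "-i", text]
    if audio is not None:
        cmd += ["--audio", audio, "--transcript", transcript]
    if savepath is not None:
        cmd += ["--savepath", savepath]
    cmd += list(extra_args)  # forwarded as-is, e.g. ["-e", "1"]
    return cmd


def run(cmd: List[str]) -> None:
    # Requires vall-e-x.py in the current working directory.
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the argument list easy to inspect before launching the (slow) model run.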
Framework: PyTorch 2.2.0.dev20230910
Model format: ONNX opset 15
- nar_decoder.onnx.prototxt
- nar_predict_layers.onnx.prototxt
- ar_audio_embedding.onnx.prototxt
- ar_language_embedding.onnx.prototxt
- ar_text_embedding.onnx.prototxt
- nar_audio_embedding.onnx.prototxt
- nar_audio_embedding_layers.onnx.prototxt
- nar_language_embedding.onnx.prototxt
- nar_text_embedding.onnx.prototxt
- ar_decoder.onnx.prototxt
- ar_decoder.opt.onnx.prototxt
- encodec.onnx.prototxt
- vocos.onnx.prototxt
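Because these files are fetched on the first run, one way to confirm the download completed is to check that each of them is present. This minimal sketch assumes, hypothetically, that every `.onnx.prototxt` listed above sits next to a `.onnx` file with the same stem in a single directory; that layout may not match where vall-e-x.py actually stores them, and `missing_model_files` is an illustrative helper.

```python
from pathlib import Path
from typing import List, Sequence

# Model stems taken from the prototxt list above.
MODEL_NAMES = [
    "nar_decoder", "nar_predict_layers", "ar_audio_embedding",
    "ar_language_embedding", "ar_text_embedding", "nar_audio_embedding",
    "nar_audio_embedding_layers", "nar_language_embedding",
    "nar_text_embedding", "ar_decoder", "ar_decoder.opt", "encodec", "vocos",
]


def missing_model_files(model_dir: str,
                        names: Sequence[str] = MODEL_NAMES) -> List[str]:
    """Return the names of expected .onnx / .onnx.prototxt files not found in model_dir."""
    missing = []
    for name in names:
        for suffix in (".onnx", ".onnx.prototxt"):
            path = Path(model_dir) / f"{name}{suffix}"
            if not path.is_file():
                missing.append(path.name)
    return missing
```

An empty result means every expected file is present; otherwise the returned names show which downloads to retry.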