Implement vggish #1061
audio_processing/vggish/vggish.py
Outdated
else:
    wav_data = librosa.load(input_path, sr=SAMPLE_RATE)[0]

samples = wav_data / 32768.0  # Convert to [-1.0, +1.0]
librosa already returns the waveform normalized to [-1.0, +1.0], so dividing by 32768.0 again appears to shrink it further.
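One possible way to address this is to rescale only when the samples are still int16 PCM and to pass float input through unchanged. The sketch below is an illustration, not the code from this PR: `to_float_waveform` is a hypothetical helper name, and it assumes the non-librosa branch yields int16 samples.

```python
import numpy as np


def to_float_waveform(wav_data):
    """Return samples as float32 in [-1.0, +1.0].

    Hypothetical helper: assumes integer input is 16-bit PCM and that
    float input (e.g. from librosa.load) is already normalized.
    """
    wav_data = np.asarray(wav_data)
    if wav_data.dtype == np.int16:
        # 16-bit PCM: scale by the int16 full-scale value.
        return wav_data.astype(np.float32) / 32768.0
    # Already floating point: do not divide a second time.
    return wav_data.astype(np.float32)
```

With this shape, the `librosa.load` branch and the int16 branch can share a single conversion call without double scaling.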
savepath = get_savepath(args.savepath, input_path)
logger.info(f'saved at : {savepath}')

np.save(savepath, result)
Could you save the features computed with torch as a numpy file and print the error against them?
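A minimal sketch of such a comparison, assuming the torch features were saved beforehand with `np.save` and loaded back as an array of the same shape as `result`. The function name `report_error` and the choice of max and mean absolute error are assumptions, not something specified in this review.

```python
import numpy as np


def report_error(reference, result):
    """Print and return the max and mean absolute error between two feature arrays.

    `reference` stands for the features computed with torch (for example
    loaded via np.load from a file saved earlier); `result` is the output
    being checked. Both names are illustrative.
    """
    reference = np.asarray(reference, dtype=np.float64)
    result = np.asarray(result, dtype=np.float64)
    abs_err = np.abs(reference - result)
    max_err = float(abs_err.max())
    mean_err = float(abs_err.mean())
    print(f"max abs error: {max_err:.6e}, mean abs error: {mean_err:.6e}")
    return max_err, mean_err
```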
@yuki399 Please look into addressing this one as well.
I have fixed the points raised.
# Conflicts: # README.md # scripts/download_all_models.sh
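@yuki399 Could you export the model with the input batch size as a dynamic shape, so that audio of different lengths can also be processed?
Also, when multiple files are passed to --input, it would be good to compute the embedding for each file and print the distances between the files.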
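The distance printout requested above could look roughly like the following. This is a sketch under assumptions the review does not state: each file's embedding is taken to be a `(num_frames, dim)` array, frames are mean-pooled into one vector per file, and Euclidean distance is used. `pairwise_distances` and the file names are hypothetical.

```python
import itertools

import numpy as np


def pairwise_distances(embeddings):
    """Return {(file_a, file_b): distance} for every pair of input files.

    embeddings: dict mapping a file path to an array of shape
    (num_frames, dim). Mean pooling and Euclidean distance are
    assumptions made for this sketch.
    """
    pooled = {
        path: np.asarray(emb, dtype=np.float64).mean(axis=0)
        for path, emb in embeddings.items()
    }
    distances = {}
    for a, b in itertools.combinations(pooled, 2):
        distances[(a, b)] = float(np.linalg.norm(pooled[a] - pooled[b]))
    return distances


if __name__ == "__main__":
    # Toy embeddings with different numbers of frames per file.
    demo = {
        "a.wav": [[0.0, 0.0], [0.0, 0.0]],
        "b.wav": [[3.0, 4.0]],
    }
    for (a, b), d in pairwise_distances(demo).items():
        print(f"{a} <-> {b}: {d:.4f}")
```

Pooling over frames first means files of different lengths still yield vectors of the same size, which fits the variable-length input requested in the same comment.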
#1057