At high transcoding speed, high input-fidelity statements are not transcoded #2696
Does anybody agree with the following?
Action 1: use the Go package.
Sample Go main function:
The Go bindings provide access to the method:
Go allows float PCM to be used directly:
Action 2: move to the turbo model of October 1, 2024.
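For Action 1, feeding float PCM directly means the samples must already be 32-bit floats. A minimal sketch of that conversion, assuming the source audio is raw little-endian signed 16-bit PCM (`pcm16ToFloat32` is a hypothetical helper name, not part of the whisper.cpp Go bindings):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcm16ToFloat32 converts little-endian signed 16-bit PCM bytes into
// float32 samples in the range [-1, 1). A trailing odd byte is ignored.
// Hypothetical helper for illustration only.
func pcm16ToFloat32(raw []byte) []float32 {
	n := len(raw) / 2
	out := make([]float32, n)
	for i := 0; i < n; i++ {
		s := int16(binary.LittleEndian.Uint16(raw[2*i:]))
		out[i] = float32(s) / 32768.0
	}
	return out
}

func main() {
	// Four samples: 0, 16384, -32768 and 32767 as s16le bytes.
	raw := []byte{0x00, 0x00, 0x00, 0x40, 0x00, 0x80, 0xff, 0x7f}
	fmt.Println(pcm16ToFloat32(raw))
}
```

The resulting slice is the kind of buffer a float-PCM entry point would consume; how it is handed to the bindings is left out here on purpose.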
I just noticed there is an option implemented for whisper.cpp/build/bin/main:
If the audio quality is better or the model is better, the model decides with a higher probability that a segment is not speech.
The option was added on 241223 (hash 153757a).
I will be trying this first against my audio…
--no-speech-thold doesn't work. It is generally broken, as of whisper.cpp 250104 hash ece3ff8:
Whisper upstream:
Chunk-length tampering is discouraged:
No serious solution:
Identifying noise as non-speech is difficult.
Sample complaint:
I tested different implementations of large-v3-turbo on Apple hardware: Mac mini (M1, Late 2020), 8 GB RAM, 256 GB storage, Gigabit Ethernet, macOS 15.2.
Noise handling is no better. Where transcoding works, the same bad result is simply obtained much faster: 4 h is transcribed in 54 seconds, at 263×.
How to get to Core ML large-v3-turbo on macOS 15.2 (whisper.cpp 250104, hash ece3ff8), as brew:
Once this line shows up in the log, you're in Core ML, the land of 263×:
whisper_init_state: loading Core ML model from '/opt/oth/whisper.cpp/models/ggml-large-v3-turbo-encoder.mlmodelc'
Q1. When whisper.cpp transcodes a 5-hour audio file that is mostly noise, it enters a fast mode, transcoding at 90×.
BUG: In this mode, 1–3-minute conversations are missed and leave no trace in the output.
>>> How can I avoid whisper missing short conversations?
For example, should the context be periodically destroyed?
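One way to experiment with periodically resetting the context is to cut the audio into overlapping windows and transcribe each window with a fresh context. The sketch below only computes the window boundaries; whether this actually avoids the missed conversations is untested and is an assumption. `splitWindows`, the 10-minute window and the 30-second overlap are hypothetical choices, and 16 kHz is the sample rate whisper.cpp expects.

```go
package main

import "fmt"

// window is one slice of the input, as sample offsets [Start, End).
type window struct{ Start, End int }

// splitWindows cuts total samples into windows of at most size samples.
// Consecutive windows share overlap samples, so a conversation that
// straddles a boundary appears whole in at least one window as long as
// it is shorter than the overlap. Hypothetical helper for illustration.
func splitWindows(total, size, overlap int) []window {
	if total <= 0 || size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	step := size - overlap
	var out []window
	for start := 0; start < total; start += step {
		end := start + size
		if end > total {
			end = total
		}
		out = append(out, window{Start: start, End: end})
		if end == total {
			break
		}
	}
	return out
}

func main() {
	const sampleRate = 16000
	total := 5 * 3600 * sampleRate // 5 hours of mono audio
	size := 10 * 60 * sampleRate   // assumed 10-minute windows
	overlap := 30 * sampleRate     // assumed 30-second overlap
	ws := splitWindows(total, size, overlap)
	fmt.Println("windows:", len(ws))
	fmt.Println("first:", ws[0], "last:", ws[len(ws)-1])
}
```

Overlap means some text is transcribed twice, so the outputs of adjacent windows would need de-duplication by timestamp.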
Q2. When the audio is 30 h or longer, the file size exceeds 4 GiB, which the 32-bit size fields of .wav cannot represent, and the result is empty files.
>>> How can I transcode large audio files or infinite streams?
ffmpeg has an -rf64 option for the RF64 format: https://en.wikipedia.org/wiki/RF64
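Where exactly a plain WAV file runs out of room depends on the sample format. The sketch below works out the longest duration that fits, assuming the classic RIFF/WAVE layout in which the data chunk size is an unsigned 32-bit field (`maxWavDuration` is a hypothetical helper; the small header overhead is ignored):

```go
package main

import (
	"fmt"
	"time"
)

// maxWavDuration returns the longest recording, in whole seconds, whose
// sample data fits in a 32-bit WAV data chunk size (at most 2^32-1 bytes).
// Hypothetical helper for illustration only.
func maxWavDuration(sampleRate, channels, bytesPerSample int) time.Duration {
	bytesPerSecond := int64(sampleRate) * int64(channels) * int64(bytesPerSample)
	if bytesPerSecond <= 0 {
		return 0
	}
	const maxData = int64(1)<<32 - 1
	return time.Duration(maxData/bytesPerSecond) * time.Second
}

func main() {
	fmt.Println("16 kHz mono s16:", maxWavDuration(16000, 1, 2))
	fmt.Println("16 kHz mono f32:", maxWavDuration(16000, 1, 4))
	fmt.Println("44.1 kHz stereo s16:", maxWavDuration(44100, 2, 2))
}
```

Recordings longer than these limits need RF64, a raw sample stream, or splitting into several files.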
Is there a better input format than WAV?
I would prefer feeding raw samples, either float or a specific PCM format.
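Raw samples sidestep the container size limit altogether, because a headerless stream has no length field. A minimal sketch of reading such a stream in fixed-size chunks, assuming little-endian 32-bit float mono samples (for example, ffmpeg's `f32le` raw output piped to stdin); `readFloat32Chunks` and `collect` are hypothetical helpers, not whisper.cpp API:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"math"
)

// readFloat32Chunks reads little-endian float32 samples from r and calls
// handle with up to chunkSamples samples at a time, so the whole stream
// never has to be held in memory. The slice passed to handle is reused
// between calls. A trailing partial sample is dropped.
func readFloat32Chunks(r io.Reader, chunkSamples int, handle func([]float32) error) error {
	if chunkSamples <= 0 {
		return fmt.Errorf("chunkSamples must be positive, got %d", chunkSamples)
	}
	buf := make([]byte, 4*chunkSamples)
	samples := make([]float32, chunkSamples)
	for {
		n, err := io.ReadFull(r, buf)
		count := n / 4
		for i := 0; i < count; i++ {
			samples[i] = math.Float32frombits(binary.LittleEndian.Uint32(buf[4*i:]))
		}
		if count > 0 {
			if herr := handle(samples[:count]); herr != nil {
				return herr
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // end of stream
		}
		if err != nil {
			return err
		}
	}
}

// collect runs readFloat32Chunks over an in-memory buffer and copies
// each chunk, purely to demonstrate the chunking.
func collect(raw []byte, chunkSamples int) ([][]float32, error) {
	var chunks [][]float32
	err := readFloat32Chunks(bytes.NewReader(raw), chunkSamples, func(s []float32) error {
		chunks = append(chunks, append([]float32(nil), s...))
		return nil
	})
	return chunks, err
}

func main() {
	// Encode five samples as f32le, then read them back two at a time.
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, []float32{0, 0.25, -0.5, 1, -1}); err != nil {
		panic(err)
	}
	chunks, err := collect(buf.Bytes(), 2)
	fmt.Println(chunks, err)
}
```

In a real pipeline the reader would be `os.Stdin` or a pipe from the decoder, and `handle` would pass each chunk on to the transcriber.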
Q3. >>> Is there some other way of improving the transcoding word yield, given the commands below?
The hardware is 2021 Apple M-series machines with 8 GiB RAM running macOS, with 2 parallel whisper instances.
A custom Go binding is being considered.
It is batch execution, so slow transcoding is not a problem.
Creating the audio stream:
whisper command: