Accept pipe to stream #2694

tamo · 2025-01-02T05:38:43Z

Now it is easy to test with raw PCM data.
Try cat pcmf32.raw | stream
(or pv -qL 64000 pcmf32.raw | stream to feed it at real-time speed)
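The 64000 passed to pv is presumably the real-time byte rate of the input. A minimal sketch of that arithmetic, assuming 16 kHz mono audio stored as 32-bit floats (the function names are made up for illustration):

```python
# Assumed format: 16 kHz mono, 32-bit float samples (f32le).
SAMPLE_RATE_HZ = 16_000   # value of WHISPER_SAMPLE_RATE
BYTES_PER_SAMPLE = 4      # sizeof(float)


def realtime_byte_rate(sample_rate_hz: int = SAMPLE_RATE_HZ,
                       bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Bytes per second needed to feed mono PCM at playback speed."""
    return sample_rate_hz * bytes_per_sample


def duration_seconds(n_bytes: int) -> float:
    """Playback length of a raw f32le buffer that is n_bytes long."""
    return n_bytes / realtime_byte_rate()


print(realtime_byte_rate())       # 64000
print(duration_seconds(704_000))  # 11.0
```

So a rate limit of 64000 bytes per second delivers one second of audio per second of wall-clock time.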

Note: I haven't tested WIN32 ifdefs.

You can make such data with
ffmpeg -i jfk.wav -f f32le -acodec pcm_f32le jfk.raw
because the WAV header length (44) is a multiple of sizeof float (4).
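One way to read the remark about the header length: a 44-byte header occupies exactly 11 float slots, so the samples that follow stay aligned on 4-byte boundaries. The sketch below illustrates that reading with synthetic bytes. It is not code from this PR, and `decode_f32le` is a hypothetical helper.

```python
import struct

WAV_HEADER_BYTES = 44  # header length cited in the text
FLOAT_BYTES = 4        # sizeof(float)


def decode_f32le(data: bytes) -> list[float]:
    """Decode little-endian 32-bit float PCM, dropping a trailing partial sample."""
    n = len(data) // FLOAT_BYTES
    return list(struct.unpack(f"<{n}f", data[: n * FLOAT_BYTES]))


# Synthetic stand-in for a file: 44 zero bytes, then three float samples.
payload = struct.pack("<3f", 0.25, -0.5, 1.0)
fake_wav = bytes(WAV_HEADER_BYTES) + payload

header_slots = WAV_HEADER_BYTES // FLOAT_BYTES
samples = decode_f32le(fake_wav)

print(WAV_HEADER_BYTES % FLOAT_BYTES)  # 0, so the header fills whole float slots
print(header_slots)                    # 11
print(samples[header_slots:])          # [0.25, -0.5, 1.0]
```

If the header length were not a multiple of 4, every sample after it would be decoded from the wrong byte offsets.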

I decided to ignore the data before [Start speaking]
because such premature data are not good
for remote-transcription systems like:

mic2pcm | ssh -C remote "stream | lines2googledocs"

or

mic2some | ssh -C remote "ffmpeg -loglevel fatal -i pipe:0 -tune zerolatency -af atempo=1.1 -f f32le -ar 16000 -acodec pcm_f32le pipe:1 | stream"

So if you want to do a strict test, run
stream --test-pipe --no-vt100 2>/dev/null < pcmf32.raw
to get nearly reproducible results.
For even stricter testing, use --no-timestamps as well.

cat jfk.raw | ./build/bin/stream -m models/ggml-large-v2.bin --step 2000 --test-pipe -no-vt100 2>/dev/null
( And so my fellow Americans...)
( And so my fellow Americans, ask...)
( And so my fellow Americans, ask not what your country will give you, but what your country will give you.)
[00:00:00.000 --> 00:00:30.000]   And so my fellow Americans, ask not what your country can do for you.

( Ask what you can do for your)
[00:00:02.360 --> 00:00:32.360]   Ask what you can do for your country.

VAD:

cat jfk.raw | ./build/bin/stream -m models/ggml-large-v2.bin --step -2000 --test-pipe -no-vt100 2>/dev/null

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans.

[00:00:00.000 --> 00:00:07.920]   Ask not what your country can do for you, ask what you can do for your country.

Without this change, `stream --save-audio` produces a somewhat choppy WAV file: `stream` calculates `t_diff` in milliseconds and combines audio pieces that are each about `step_ms` long. `WHISPER_SAMPLE_RATE / 1000` is only 16 samples, but surprisingly the human ear seems able to hear a gap that small as noise.
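To illustrate how millisecond granularity can leave small gaps, here is a rough sketch of the arithmetic. It is not the code in `stream`. The helper names are hypothetical, and it assumes the elapsed time is truncated to whole milliseconds before being converted back to a sample count.

```python
WHISPER_SAMPLE_RATE = 16_000
SAMPLES_PER_MS = WHISPER_SAMPLE_RATE // 1000  # 16


def samples_from_ms(t_diff_ms: int) -> int:
    """Sample count implied by a duration given in whole milliseconds."""
    return t_diff_ms * SAMPLES_PER_MS


def dropped_samples(true_samples: int) -> int:
    """Samples lost when the elapsed time is truncated to whole milliseconds."""
    t_diff_ms = true_samples * 1000 // WHISPER_SAMPLE_RATE
    return true_samples - samples_from_ms(t_diff_ms)


print(dropped_samples(32_000))  # 0
print(dropped_samples(32_015))  # 15
```

Under this assumption each piece can come up short by as many as 15 samples, just under one millisecond, at every join.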

Use one deque instead of two vectors (old and new). `old` and `new` are length variables now.

Basically:

- Get `step - new` samples every time. Then substitute `new = (around) step;`
- The new audio data is simply appended to the deque. (The deque size is limited to 30 seconds.)
- Pass `old + new` samples to whisper inference.
- If the data has been consumed, let `old = 0; new = 0;`
- If some of the data should be kept for the next iteration, `old = keep;`
- If you want to get only N samples next time, `new = step - N;`

In VAD mode, `stream --interim --step -3000` will:

- Get 3000 ms of audio.
- Run `vad_simple(step_ms)`.
- If nothing is detected, get 100 ms more audio and retry.
- If nothing is detected and 3000 ms have passed, go into the interim mode, where `n_segments - 1` segments will be confirmed (`old -= confirmed_t1`). If `n_segments == 1`, only show the first half of the result.

Misc:

- Increase the default `max_tokens` because 32 is too small for 10 seconds. (Some Japanese speech was garbled.)
- Write the WAV file as soon as the data is available.
- `no_timestamps` is the default even for VAD because it is more useful to show to the hard of hearing.
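The single-deque bookkeeping described above can be sketched as follows. This is an illustrative Python model, not the C++ in this PR. The class `StreamBuffer` and its method names are hypothetical, and counts are in samples rather than milliseconds.

```python
from collections import deque

SAMPLE_RATE = 16_000
MAX_SAMPLES = 30 * SAMPLE_RATE  # the deque is capped at 30 seconds


class StreamBuffer:
    """One deque of samples plus two lengths, n_old and n_new."""

    def __init__(self, step: int):
        self.step = step
        self.buf = deque(maxlen=MAX_SAMPLES)  # oldest samples fall off the front
        self.n_old = 0
        self.n_new = 0

    def wanted(self) -> int:
        """How many samples to fetch this iteration: step - new."""
        return max(self.step - self.n_new, 0)

    def append(self, samples) -> None:
        """Append freshly captured samples; new grows toward step."""
        samples = list(samples)
        self.buf.extend(samples)
        self.n_new += len(samples)

    def window(self) -> list:
        """The last old + new samples, i.e. what would go to inference."""
        n = min(self.n_old + self.n_new, len(self.buf))
        return list(self.buf)[len(self.buf) - n:]

    def consume(self, keep: int = 0) -> None:
        """After inference: old = keep (0 if everything was consumed), new = 0."""
        self.n_old = min(keep, len(self.buf))
        self.n_new = 0

    def request_only(self, n: int) -> None:
        """Fetch just n samples next time by setting new = step - n."""
        self.n_new = self.step - n


sb = StreamBuffer(step=4)
sb.append(range(sb.wanted()))         # fetch 4 samples: 0..3
print(sb.window())                    # [0, 1, 2, 3]
sb.consume(keep=2)                    # keep the last 2 samples as context
sb.append(range(4, 4 + sb.wanted()))  # fetch 4 more: 4..7
print(sb.window())                    # [2, 3, 4, 5, 6, 7]
```

Keeping only lengths means no samples are copied between an "old" and a "new" buffer. The window is always the tail of the one deque.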

Now it is easy to test with raw PCM data.
Try `cat pcmf32.raw | stream`
(or `pv -qL 64000 pcmf32.raw | stream` in realtime)

Note: I haven't tested WIN32 ifdefs.

You can make such data by `ffmpeg -i jfk.wav -f f32le -acodec pcm_f32le jfk.raw`
because wav header length (44) is a multiple of `sizeof float` (4)

I decided to ignore the data before `[Start speaking]`
because such premature data are not good
for remote-transcription systems like:

```
mic2pcm | ssh -C remote "stream | lines2googledocs"
```

or

```
mic2some | ssh -C remote "ffmpeg -loglevel fatal -i pipe:0 -tune zerolatency -af atempo=1.1 -f f32le -ar 16000 -acodec pcm_f32le pipe:1 | stream"
```

So if you want to do a strict test, remove the "ignore" part.
Otherwise quite a number of bytes will be ignored.

windows.h defines min unless NOMINMAX is defined

Run `stream --test-pipe --no-vt100 2>/dev/null < pcmf32.raw`
to get nearly reproducible results.
For even stricter testing, use `--no-timestamps` as well.

```
cat jfk.raw | ./build/bin/stream -m models/ggml-large-v2.bin --step 2000 --test-pipe -no-vt100 2>/dev/null
( And so my fellow Americans...)
( And so my fellow Americans, ask...)
( And so my fellow Americans, ask not what your country will give you, but what your country will give you.)
[00:00:00.000 --> 00:00:30.000]   And so my fellow Americans, ask not what your country can do for you.

( Ask what you can do for your)
[00:00:02.360 --> 00:00:32.360]   Ask what you can do for your country.
```

VAD:

```
cat jfk.raw | ./build/bin/stream -m models/ggml-large-v2.bin --step -2000 --test-pipe -no-vt100 2>/dev/null

[00:00:00.000 --> 00:00:03.000]   And so, my fellow Americans.

[00:00:00.000 --> 00:00:07.920]   Ask not what your country can do for you, ask what you can do for your country.
```

tamo added 10 commits January 2, 2025 00:18

Add headers for gcc c++

b27fc1f

Fix armv7-linux build

75099f9

Remove unused n_new_line

03b25dd

Fix windows build (include fcntl.h)

61222da

Fix inconsistency of ifdef

17c7600

Fix windows build

425d3ad

windows.h defines min unless NOMINMAX is defined

tamo mentioned this pull request Jan 3, 2025

Simplify stream's pcmf32 handling #2693

Closed

Run vad_simple on entire pcmf32, not on the last step

f99263e
