Whisper.cpp (PCM Fork)

Whisper.cpp with added stdin/PCM streaming support

open-source C/C++

Overview

A focused fork of whisper.cpp that adds whisper-stream-pcm: a stdin and pipe-friendly streaming binary for raw PCM audio. It is useful when audio is already coming from another process, service, named pipe, or agent runtime and you do not want SDL microphone capture in the loop.

Key additions

  • stdin and pipe input - read from stdin by default, or from a named pipe/file with --input.
  • Raw PCM formats - accepts little-endian s16 or f32 PCM.
  • No SDL dependency - designed for process pipelines instead of microphone device capture.
  • Optional VAD segmentation - use VAD mode for speech bursts, or fixed-step windows for continuous streams.

Input contract

Normalize audio before it reaches the binary: mono, 16 kHz, raw PCM. The tool does not decode compressed audio, parse WAV headers, resample, or mix channels. ffmpeg works well as the normalization step when the source is a file, stream, or device.
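When the audio originates in your own code rather than in ffmpeg, the same contract applies. The sketch below is a minimal Python illustration and not part of the fork: it packs float samples into mono, 16 kHz, little-endian s16 bytes. The helper names (floats_to_s16le, sine) and the choice to scale by 32767 are assumptions made for this example.

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz, the rate the input contract asks for


def floats_to_s16le(samples):
    """Pack floats in [-1.0, 1.0] as little-endian signed 16-bit PCM.

    Out-of-range values are clipped. Scaling by 32767 is one common
    convention; it is an assumption of this sketch.
    """
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def sine(freq_hz, duration_s, rate=SAMPLE_RATE):
    """Generate a mono test tone as a list of floats."""
    n = int(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


# One second of a 440 Hz tone: 16000 samples, 2 bytes each.
pcm = floats_to_s16le(sine(440.0, 1.0))
```

The resulting bytes can be written to stdin or a named pipe as-is, with --format s16 and --sample-rate 16000.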

Build target

git clone --branch stream-pcm https://github.com/rmorse/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release

Usage

Stream raw PCM (16 kHz, mono) into the tool on stdin, using fixed-step windows (no VAD):

./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500

Enable VAD-based segmentation (optional, recommended for speech bursts):

./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --format s16 --sample-rate 16000 --vad --vad-probe-ms 200 --vad-silence-ms 800 --vad-pre-roll-ms 300 --length 8000

You can also read from a named pipe (FIFO):

mkfifo /tmp/whisper.pcm
./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --input /tmp/whisper.pcm --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500
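A producer then has to write raw PCM into the FIFO. The following is a hedged Python sketch of such a writer. ensure_fifo and write_pcm are hypothetical helper names, and the code assumes a POSIX system where os.mkfifo is available. Opening a FIFO for writing blocks until a reader (here, whisper-stream-pcm) has opened the other end.

```python
import os
import stat


def ensure_fifo(path):
    """Create the named pipe if it does not exist yet (POSIX only).

    Returns True when the path is a FIFO afterwards.
    """
    if not os.path.exists(path):
        os.mkfifo(path)
    return stat.S_ISFIFO(os.stat(path).st_mode)


def write_pcm(path, chunks):
    """Write byte chunks to path and return the number of bytes written."""
    total = 0
    # On a FIFO, this open() blocks until a reader has opened the pipe.
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            out.flush()  # hand each chunk to the reader promptly
            total += len(chunk)
    return total
```

A typical call would be ensure_fifo("/tmp/whisper.pcm") followed by write_pcm("/tmp/whisper.pcm", chunks), where chunks is any iterable of raw PCM byte strings.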

Pipe a WAV file through ffmpeg, which strips the header and emits raw PCM (-re is optional and paces the input at realtime speed):

ffmpeg -re -i samples/jfk.wav -f s16le -ac 1 -ar 16000 - | \
  ./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500
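The producer does not have to be ffmpeg. As a rough sketch, assuming a Python parent process that already holds raw PCM bytes, the binary can be started with subprocess and fed through its stdin. build_command and stream_chunks are hypothetical helpers; the flags are only the ones shown in the commands above.

```python
import subprocess


def build_command(binary, model, fmt="s16", sample_rate=16000,
                  step=1000, length=10000, keep=500):
    """Assemble the argument list for a fixed-step (non-VAD) run."""
    return [
        binary, "-m", model,
        "--format", fmt,
        "--sample-rate", str(sample_rate),
        "--step", str(step),
        "--length", str(length),
        "--keep", str(keep),
    ]


def stream_chunks(cmd, chunks):
    """Start cmd and write each chunk of raw PCM bytes to its stdin.

    Returns the child's exit code. Closing stdin signals end of input.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
            proc.stdin.flush()
    finally:
        proc.stdin.close()
    return proc.wait()
```

For example, stream_chunks(build_command("./build/bin/whisper-stream-pcm", "./models/ggml-base.en.bin"), chunks) would mirror the pipeline above, with chunks supplied by your own audio source.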

Windows pipe example (from PowerShell, wrapped in cmd /c so that cmd handles the pipe):

cmd /c "ffmpeg -re -hide_banner -loglevel error -i samples\jfk.wav -f s16le -ac 1 -ar 16000 - | build\bin\Release\whisper-stream-pcm.exe -m models\ggml-base.en.bin --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500"

Notes

  • Input must be raw PCM, mono, 16 kHz. The tool does not resample.
  • Supported formats: f32 or s16 (little-endian).
  • Use --input - (default) for stdin.
  • --step must be > 0 unless --vad is enabled.
  • For VAD, --vad-probe-ms should be at least 200 ms; very small probes can fail to trigger.
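To relate the timing flags to the amount of data on the pipe, multiply sample rate, duration, and bytes per sample. The sketch below assumes that --step, --length, and --keep are given in milliseconds, as in the upstream whisper.cpp stream example; that unit is an assumption here, and ms_to_bytes is a hypothetical helper.

```python
# Bytes per sample for the two supported raw formats.
BYTES_PER_SAMPLE = {"s16": 2, "f32": 4}


def ms_to_bytes(duration_ms, sample_rate=16000, fmt="s16"):
    """Return how many bytes of mono PCM cover duration_ms milliseconds."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE[fmt]


step_bytes = ms_to_bytes(1000)    # 16000 samples * 2 bytes = 32000
probe_bytes = ms_to_bytes(200)    # 3200 samples * 2 bytes = 6400
```

With the values used above, a 1000 ms step of s16 audio at 16 kHz is 32000 bytes, and a 200 ms VAD probe is 6400 bytes; f32 input doubles both figures.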

Building

whisper-stream-pcm does not depend on SDL and builds with the default examples, so the cmake commands under Build target are sufficient as given.