Adds a feedforward dynamic range compressor with a brick-wall limiter
applied in the audio callback. Quiet speech gets +12 dB makeup gain,
loud bursts are attenuated 4:1 above -20 dBFS, and the output is
hard-limited at -1 dBFS so nothing clips. Enabled via --normalize/-n
on `cohere on` and `cohere transcribe`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lets the user pick an input device by index or name substring. Adds
`cohere devices` for listing. For devices that don't support 16kHz
natively (e.g. Sipeed MicArray hw at 48kHz), captures at the device's
native rate and resamples to 16kHz via scipy.signal.resample_poly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mic bumps produce transient spikes that pass VAD onset detection but
contain no real speech — the model hallucinates "thank you" from them.
Added MIN_SPEECH_SECONDS (0.3s) filter to discard segments where the
actual speech portion is too short.
Added a Jupyter notebook (notebooks/audio_debug.ipynb) for real-time
audio visualization: streams RMS + peak amplitude into a live Plotly
FigureWidget, then provides post-hoc waveform inspection, segment
playback, and side-by-side segment comparison.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce InputBackend protocol with WtypeBackend and PrintBackend,
and a command processor that translates spoken commands (enter, new line,
question mark, comma, etc.) into key presses and punctuation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Default is 0.3s for responsive typing. Configurable on both
`cohere on --pause` and `cohere transcribe --stream --pause`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace argparse CLI with Typer-based CLI supporting `cohere on/off/status`
commands. The daemon runs transcription in the background and types into the
focused Wayland window via wtype. Adds wtype to flake.nix and fixes the
hatchling build backend.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split monolithic transcribe.py into focused modules under
src/cohere_transcribe/ (model, vad, stream, cli), move tests into
tests/, add hatchling build system and CLI entry point, remove
unused shell.nix and main.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move LD_LIBRARY_PATH out of env block and include portaudio so
audio devices are discoverable at runtime. Add flake.lock and
a quick microphone test script.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove frame_buf accumulation: blocksize=FRAME_SIZE guarantees indata is
exactly FRAME_SIZE samples, so buffering was unnecessary. Use indata[:, 0].copy()
to avoid stale references from sounddevice's buffer reuse.
- Replace pre_roll list with collections.deque(maxlen=PRE_ROLL_FRAMES) to
eliminate manual bounds-checking (pop(0)) on every frame.
- Warn to stderr if the transcription worker thread outlives its 30s join timeout.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>