cohere-transcribe

Author	SHA1	Message	Date
tomatocream	58fa7526fb	feat: add --normalize compressor + limiter for input audio Adds a feedforward dynamic range compressor with a brick-wall limiter applied in the audio callback. Quiet speech gets +12 dB makeup gain, loud bursts are attenuated 4:1 above -20 dBFS, and the output is hard-limited at -1 dBFS so nothing clips. Enabled via --normalize/-n on `cohere on` and `cohere transcribe`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-06 22:56:21 +08:00
tomatocream	853b5523e5	feat: add --device flag and devices command for mic selection Lets the user pick an input device by index or name substring. Adds `cohere devices` for listing. For devices that don't support 16kHz natively (e.g. Sipeed MicArray hw at 48kHz), captures at the device's native rate and resamples to 16kHz via scipy.signal.resample_poly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-06 22:51:06 +08:00
tomatocream	c487ba8c08	feat: filter short audio segments (mic bumps) and add debug notebook Mic bumps produce transient spikes that pass VAD onset detection but contain no real speech — the model hallucinates "thank you" from them. Added MIN_SPEECH_SECONDS (0.3s) filter to discard segments where the actual speech portion is too short. Added a Jupyter notebook (notebooks/audio_debug.ipynb) for real-time audio visualization: streams RMS + peak amplitude into a live Plotly FigureWidget, then provides post-hoc waveform inspection, segment playback, and side-by-side segment comparison. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-01 16:16:31 +08:00
tomatocream	50f8d158c4	feat: add voice command processing and input backend interface Introduce InputBackend protocol with WtypeBackend and PrintBackend, and a command processor that translates spoken commands (enter, new line, question mark, comma, etc.) into key presses and punctuation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-30 21:37:20 +08:00
tomatocream	f083e424c9	feat: make silence pause duration configurable via --pause flag Default is 0.3s for responsive typing. Configurable on both `cohere on --pause` and `cohere transcribe --stream --pause`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-30 21:12:26 +08:00
tomatocream	92d8ba28d0	feat: add Typer CLI with daemon mode and wtype keyboard injection Replace argparse CLI with Typer-based CLI supporting `cohere on/off/status` commands. The daemon runs transcription in the background and types into the focused Wayland window via wtype. Adds wtype to flake.nix and fixes the hatchling build backend. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-30 21:09:32 +08:00
tomatocream	8d517b3ea8	refactor: restructure project into src layout with proper packaging Split monolithic transcribe.py into focused modules under src/cohere_transcribe/ (model, vad, stream, cli), move tests into tests/, add hatchling build system and CLI entry point, remove unused shell.nix and main.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-30 00:45:56 +08:00
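
Commit c487ba8c08 describes discarding segments whose speech portion is shorter than MIN_SPEECH_SECONDS (0.3s). A minimal sketch of that check follows, assuming 16 kHz audio as mentioned in commit 853b5523e5. The `Segment` class and both function names are hypothetical and used only for illustration; the repository's actual code is not shown in this log.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000        # Hz; the capture rate named in the log
MIN_SPEECH_SECONDS = 0.3    # threshold named in commit c487ba8c08


@dataclass
class Segment:
    """Hypothetical container: audio samples plus the count judged to be speech."""
    samples: list
    speech_samples: int


def has_enough_speech(segment, sample_rate=SAMPLE_RATE,
                      min_seconds=MIN_SPEECH_SECONDS):
    """Return True when the speech portion lasts at least min_seconds."""
    return segment.speech_samples / sample_rate >= min_seconds


def drop_short_segments(segments):
    """Keep only segments long enough to plausibly contain real speech."""
    return [s for s in segments if has_enough_speech(s)]


bump = Segment(samples=[0.0] * 1600, speech_samples=1600)    # 0.1 s transient
speech = Segment(samples=[0.0] * 8000, speech_samples=8000)  # 0.5 s utterance
kept = drop_short_segments([bump, speech])
```

Here `kept` contains only the 0.5 s segment, so the short transient never reaches the model.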

7 Commits
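
The gain curve described in commit 58fa7526fb (+12 dB makeup gain, 4:1 compression above -20 dBFS, hard limit at -1 dBFS) can be sketched as below. This is a simplified, memoryless version written in pure Python: it assumes float samples scaled so that 1.0 is full scale, and it omits the attack/release envelope smoothing that a real feedforward compressor would normally include. All names are hypothetical, and the repository's implementation may differ.

```python
import math

THRESHOLD_DB = -20.0   # compression starts above this level
RATIO = 4.0            # 4:1 compression above the threshold
MAKEUP_DB = 12.0       # gain added after compression
CEILING_DB = -1.0      # brick-wall limiter ceiling


def db_to_linear(db: float) -> float:
    """Convert a decibel value to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def compress_sample(x: float) -> float:
    """Apply the static compressor curve, then hard-limit the result."""
    if x == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(abs(x))
    # Above the threshold, each RATIO dB of input yields 1 dB of output.
    over_db = max(0.0, level_db - THRESHOLD_DB)
    gain_db = MAKEUP_DB - over_db * (1.0 - 1.0 / RATIO)
    y = x * db_to_linear(gain_db)
    # Brick-wall limiter: clamp to the ceiling in both polarities.
    ceiling = db_to_linear(CEILING_DB)
    return max(-ceiling, min(ceiling, y))


def normalize(samples):
    """Run every sample of a block through the compressor and limiter."""
    return [compress_sample(s) for s in samples]
```

With these constants, a sample at -40 dBFS (0.01) is below the threshold and only receives the makeup gain, while a sample at 0 dBFS (1.0) is 20 dB over the threshold, is reduced by 15 dB, and ends up at -3 dBFS after makeup gain. The limiter only engages for inputs that would otherwise exceed -1 dBFS.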