21 Commits

Author SHA1 Message Date
tomatocream 58fa7526fb feat: add --normalize compressor + limiter for input audio
Adds a feedforward dynamic range compressor with a brick-wall limiter
applied in the audio callback. Quiet speech gets +12 dB makeup gain,
loud bursts are attenuated 4:1 above -20 dBFS, and the output is
hard-limited at -1 dBFS so nothing clips. Enabled via --normalize/-n
on `cohere on` and `cohere transcribe`.
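
A minimal sketch of the chain described above, assuming a per-sample peak envelope follower and assumed attack/release times of 5 ms and 100 ms (neither is stated in the commit). The "limiter" here is a plain sample clamp at -1 dBFS with no lookahead, which is the simplest way to guarantee nothing exceeds the ceiling. `normalize_block` and its `state` dict are hypothetical names; the dict carries the envelope across callback blocks.

```python
import math

# Parameters from the commit message; attack/release times are assumptions.
THRESHOLD_DB = -20.0  # compression starts above -20 dBFS
RATIO = 4.0           # 4:1 above threshold
MAKEUP_DB = 12.0      # +12 dB makeup gain
CEILING_DB = -1.0     # hard ceiling at -1 dBFS


def db_to_lin(db: float) -> float:
    return 10.0 ** (db / 20.0)


def lin_to_db(x: float) -> float:
    return 20.0 * math.log10(max(x, 1e-10))  # floor avoids log10(0)


def gain_reduction_db(level_db: float) -> float:
    """Static 4:1 curve: dB of gain change (<= 0) for a detector level."""
    if level_db <= THRESHOLD_DB:
        return 0.0
    compressed = THRESHOLD_DB + (level_db - THRESHOLD_DB) / RATIO
    return compressed - level_db


def normalize_block(samples, state, rate=16000, attack_s=0.005, release_s=0.1):
    """Compress, apply makeup gain, then clamp one callback block.

    `state` is a dict that carries the envelope between blocks."""
    att = math.exp(-1.0 / (attack_s * rate))
    rel = math.exp(-1.0 / (release_s * rate))
    ceiling = db_to_lin(CEILING_DB)
    env = state.get("env", 0.0)
    out = []
    for x in samples:
        level = abs(x)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level  # peak envelope follower
        gain = db_to_lin(gain_reduction_db(lin_to_db(env)) + MAKEUP_DB)
        out.append(max(-ceiling, min(ceiling, x * gain)))  # hard ceiling
    state["env"] = env
    return out
```

For a sustained 0 dBFS input the curve gives -15 dB of reduction plus 12 dB of makeup, so the steady-state output settles at -3 dBFS; the clamp only catches the transient before the envelope catches up.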

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 22:56:21 +08:00
tomatocream 853b5523e5 feat: add --device flag and devices command for mic selection
Lets the user pick an input device by index or name substring. Adds
`cohere devices` for listing. For devices that don't support 16kHz
natively (e.g. Sipeed MicArray hw at 48kHz), captures at the device's
native rate and resamples to 16kHz via scipy.signal.resample_poly.
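
A sketch of the selection and resampling steps, assuming `devices` is a list of dicts shaped like the entries returned by `sounddevice.query_devices()` (keys such as "name" and "max_input_channels"). The helper names are hypothetical, and "first substring match wins" is an assumed tie-break. `scipy.signal.resample_poly(x, up, down)` is imported lazily so the device logic works without scipy.

```python
import math

TARGET_RATE = 16000


def pick_input_device(devices, spec):
    """Resolve `spec` (an index like "3" or a name substring like "sipeed")
    to an input-capable device index."""
    inputs = [i for i, d in enumerate(devices) if d["max_input_channels"] > 0]
    if spec.isdigit():
        idx = int(spec)
        if idx in inputs:
            return idx
        raise ValueError(f"device {idx} is not an input device")
    needle = spec.lower()
    matches = [i for i in inputs if needle in devices[i]["name"].lower()]
    if not matches:
        raise ValueError(f"no input device matching {spec!r}")
    return matches[0]  # assumption: first match wins


def resample_factors(native_rate, target_rate=TARGET_RATE):
    """(up, down) factors for resample_poly, reduced by their GCD."""
    native_rate = int(native_rate)  # default_samplerate is reported as a float
    g = math.gcd(native_rate, target_rate)
    return target_rate // g, native_rate // g


def to_target_rate(x, native_rate):
    """Resample a mono float array to TARGET_RATE (requires scipy)."""
    up, down = resample_factors(native_rate)
    if up == down:
        return x
    from scipy.signal import resample_poly  # lazy: only needed off-rate
    return resample_poly(x, up, down)
```

For a 48 kHz device the factors reduce to up=1, down=3, so `resample_poly` applies its anti-aliasing filter and keeps every third sample.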

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 22:51:06 +08:00
tomatocream c487ba8c08 feat: filter short audio segments (mic bumps) and add debug notebook
Mic bumps produce transient spikes that pass VAD onset detection but
contain no real speech — the model hallucinates "thank you" from them.
Added MIN_SPEECH_SECONDS (0.3s) filter to discard segments where the
actual speech portion is too short.
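
A sketch of the filter, assuming each segment carries one VAD decision per frame and a hypothetical FRAME_SIZE of 512 samples (32 ms at 16 kHz). Only frames marked as speech count, so pre-roll and trailing silence cannot push a mic bump over the threshold. The `>=` comparison is an assumption.

```python
SAMPLE_RATE = 16000
FRAME_SIZE = 512          # hypothetical: 32 ms frames at 16 kHz
MIN_SPEECH_SECONDS = 0.3  # from the commit message


def speech_seconds(voiced_flags, frame_size=FRAME_SIZE, rate=SAMPLE_RATE):
    """Seconds of frames the VAD marked as speech; pre-roll and trailing
    silence in the segment are not counted."""
    return sum(1 for v in voiced_flags if v) * frame_size / rate


def keep_segment(voiced_flags):
    """False for mic bumps: segments whose speech portion is under 0.3 s."""
    return speech_seconds(voiced_flags) >= MIN_SPEECH_SECONDS
```

With 32 ms frames a segment needs at least 10 voiced frames (0.32 s) to survive; 9 frames (0.288 s) are discarded.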

Added a Jupyter notebook (notebooks/audio_debug.ipynb) for real-time
audio visualization: streams RMS + peak amplitude into a live Plotly
FigureWidget, then provides post-hoc waveform inspection, segment
playback, and side-by-side segment comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-01 16:16:31 +08:00
tomatocream a727899ee5 Initial commit: add CLAUDE.md and transcribe.py 2026-05-31 01:05:48 +08:00
tomatocream 50f8d158c4 feat: add voice command processing and input backend interface
Introduce InputBackend protocol with WtypeBackend and PrintBackend,
and a command processor that translates spoken commands (enter, new line,
question mark, comma, etc.) into key presses and punctuation.
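
A sketch of how such a processor could sit on top of the backend protocol. The method names `type_text`/`press_key`, the command table, and mapping both "enter" and "new line" to the Return key are all hypothetical; the commit only names the commands. Multi-word commands are matched greedily (longest first), punctuation attaches to the preceding word, and pending text is flushed before each key press so ordering is preserved.

```python
from typing import List, Protocol, Tuple


class InputBackend(Protocol):
    def type_text(self, text: str) -> None: ...
    def press_key(self, key: str) -> None: ...


class PrintBackend:
    """Records actions instead of injecting them (useful for testing)."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def type_text(self, text: str) -> None:
        self.events.append(("text", text))

    def press_key(self, key: str) -> None:
        self.events.append(("key", key))


# Hypothetical command table: spoken phrase -> (kind, value).
COMMANDS = {
    ("enter",): ("key", "Return"),
    ("new", "line"): ("key", "Return"),
    ("comma",): ("punct", ","),
    ("question", "mark"): ("punct", "?"),
}
MAX_LEN = max(len(p) for p in COMMANDS)


def process_transcript(transcript: str, backend: InputBackend) -> None:
    raw = transcript.split()
    norm = [w.strip(".,?!").lower() for w in raw]  # model may add punctuation
    buf = ""
    i = 0
    while i < len(raw):
        action, n = None, 0
        for n in range(min(MAX_LEN, len(raw) - i), 0, -1):  # longest first
            action = COMMANDS.get(tuple(norm[i:i + n]))
            if action is not None:
                break
        if action is None:
            buf += (" " if buf else "") + raw[i]  # ordinary word
            i += 1
            continue
        kind, value = action
        if kind == "punct":
            buf += value  # attach to the previous word, no space
        else:
            if buf:
                backend.type_text(buf)  # flush text before the key press
                buf = ""
            backend.press_key(value)
        i += n
    if buf:
        backend.type_text(buf)
```

Usage: `process_transcript("Hello comma world question mark enter", PrintBackend())` records the text "Hello, world?" followed by a Return key press.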

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 21:37:20 +08:00
tomatocream f083e424c9 feat: make silence pause duration configurable via --pause flag
Default is 0.3s for responsive typing. Configurable on both
`cohere on --pause` and `cohere transcribe --stream --pause`.
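
One way the pause could drive end-of-utterance detection, sketched under assumptions: the VAD emits one decision per frame, FRAME_SIZE is a hypothetical 512 samples, and `EndpointDetector` is an illustrative name. The pause in seconds becomes a count of consecutive silent frames, rounded up.

```python
import math

SAMPLE_RATE = 16000
FRAME_SIZE = 512  # hypothetical: 32 ms frames at 16 kHz


def silence_frames_for(pause_s, frame_size=FRAME_SIZE, rate=SAMPLE_RATE):
    """Consecutive silent frames needed to end an utterance."""
    return max(1, math.ceil(pause_s * rate / frame_size))


class EndpointDetector:
    def __init__(self, pause_s=0.3):
        self.needed = silence_frames_for(pause_s)
        self.silent_run = 0
        self.in_speech = False

    def update(self, is_speech):
        """Feed one VAD decision; returns True when the utterance just ended."""
        if is_speech:
            self.in_speech = True
            self.silent_run = 0
            return False
        if not self.in_speech:
            return False  # silence before any speech never ends anything
        self.silent_run += 1
        if self.silent_run >= self.needed:
            self.in_speech = False
            self.silent_run = 0
            return True
        return False
```

With the 0.3 s default this waits for 10 silent 32 ms frames; a longer `--pause` trades typing latency for fewer mid-sentence splits.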

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 21:12:26 +08:00
tomatocream 92d8ba28d0 feat: add Typer CLI with daemon mode and wtype keyboard injection
Replace argparse CLI with Typer-based CLI supporting `cohere on/off/status`
commands. The daemon runs transcription in the background and types into the
focused Wayland window via wtype. Adds wtype to flake.nix and fixes the
hatchling build backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 21:09:32 +08:00
tomatocream 8d517b3ea8 refactor: restructure project into src layout with proper packaging
Split monolithic transcribe.py into focused modules under
src/cohere_transcribe/ (model, vad, stream, cli), move tests into
tests/, add hatchling build system and CLI entry point, remove
unused shell.nix and main.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 00:45:56 +08:00
tomatocream cbea62b2a9 fix: add portaudio to LD_LIBRARY_PATH and add flake lockfile
Move LD_LIBRARY_PATH out of env block and include portaudio so
audio devices are discoverable at runtime. Add flake.lock and
a quick microphone test script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 00:42:36 +08:00
tomatocream 843ec534d1 fix: handle processor.decode returning a list of strings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 03:00:09 +08:00
tomatocream cf18335235 fix: simplify audio callback, use deque for pre-roll, add worker timeout warning
- Remove frame_buf accumulation: blocksize=FRAME_SIZE guarantees indata is
  exactly FRAME_SIZE samples, so buffering was unnecessary. Use indata[:, 0].copy()
  to avoid stale references from sounddevice's buffer reuse.
- Replace pre_roll list with collections.deque(maxlen=PRE_ROLL_FRAMES) to
  eliminate manual bounds-checking (pop(0)) on every frame.
- Warn to stderr if the transcription worker thread outlives its 30s join timeout.
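
A sketch of the callback and shutdown path these bullets describe. The callback follows the `sounddevice` stream callback signature `(indata, frames, time, status)`; PRE_ROLL_FRAMES = 8 and the name `join_worker` are hypothetical. A full `deque(maxlen=...)` discards its oldest item on append, which replaces the manual `pop(0)`.

```python
import sys
from collections import deque

PRE_ROLL_FRAMES = 8  # hypothetical value

pre_roll = deque(maxlen=PRE_ROLL_FRAMES)


def callback(indata, frames, time, status):
    """sounddevice stream callback. With blocksize=FRAME_SIZE, indata already
    holds exactly one frame, so no extra buffering is needed."""
    # copy() detaches the samples from the buffer sounddevice reuses;
    # a full deque drops its oldest frame automatically.
    pre_roll.append(indata[:, 0].copy())


def join_worker(worker, timeout=30.0):
    """Join the transcription thread; warn on stderr if it is still running."""
    worker.join(timeout)
    if worker.is_alive():
        print(f"warning: transcription worker still running after {timeout:.0f}s",
              file=sys.stderr)
        return False
    return True
```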

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:48:51 +08:00
tomatocream 747a4772b6 feat: implement live streaming transcription with VAD
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:46:13 +08:00
tomatocream d62fcdd1cd feat: add silence calibration and VAD state machine
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:45:09 +08:00
tomatocream 4605be5bc9 refactor: switch to argparse, add --stream and --lang flags
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:43:47 +08:00
tomatocream 6bff2875c5 Add implementation plan for live streaming transcription
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 02:42:00 +08:00
tomatocream e0911653fe Add design spec for live streaming microphone transcription
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 02:38:05 +08:00
tomatocream c055a8ffb9 Replace flake.nix with shell.nix for simpler NixOS dev environment 2026-05-26 01:59:15 +08:00
tomatocream 55a51a7668 Add flake.nix with portaudio + CUDA, microphone support in transcribe.py 2026-05-26 01:55:54 +08:00
tomatocream 8b88489a53 Simplify to audio file input (mic requires PortAudio on NixOS) 2026-05-26 01:49:52 +08:00
tomatocream 14abcb89f2 Add accelerate dependency 2026-05-26 01:38:10 +08:00
tomatocream 82fe21fe41 Add Cohere Transcribe demo with uv + Python 3.14 2026-05-26 01:35:10 +08:00