Commit Graph

17 Commits

Author SHA1 Message Date
tomatocream 50f8d158c4 feat: add voice command processing and input backend interface
Introduce InputBackend protocol with WtypeBackend and PrintBackend,
and a command processor that translates spoken commands (enter, new line,
question mark, comma, etc.) into key presses and punctuation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 21:37:20 +08:00
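
As a rough illustration of commit 50f8d158c4, the sketch below shows how a backend protocol and a command processor could fit together. Only the name `InputBackend` comes from the commit message. The method names (`type_text`, `press_key`), the `RecordingBackend` class (a stand-in for something like `PrintBackend`), the command tables, and `process_utterance` are assumptions made for illustration, not the repository's actual code.

```python
from typing import Protocol


class InputBackend(Protocol):
    """Anything that can type text and press named keys (method names assumed)."""

    def type_text(self, text: str) -> None: ...

    def press_key(self, key: str) -> None: ...


class RecordingBackend:
    """Hypothetical test backend: records actions instead of typing them."""

    def __init__(self) -> None:
        self.actions: list[tuple[str, str]] = []

    def type_text(self, text: str) -> None:
        self.actions.append(("text", text))

    def press_key(self, key: str) -> None:
        self.actions.append(("key", key))


# Assumed command tables; the real mapping is not shown in the commit message.
KEY_COMMANDS = {"enter": "Return", "new line": "Return"}
PUNCTUATION_COMMANDS = {"question mark": "?", "comma": ","}


def process_utterance(utterance: str, backend: InputBackend) -> None:
    """Send a spoken command as a key press or punctuation, else type it verbatim."""
    # Normalise case and drop trailing punctuation a transcriber might append.
    phrase = utterance.strip().lower().rstrip(".!?")
    if phrase in KEY_COMMANDS:
        backend.press_key(KEY_COMMANDS[phrase])
    elif phrase in PUNCTUATION_COMMANDS:
        backend.type_text(PUNCTUATION_COMMANDS[phrase])
    else:
        backend.type_text(utterance.strip())
```

Keeping the processor behind a protocol means the same logic can be exercised without a Wayland session by swapping in a backend that records or prints.
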
tomatocream f083e424c9 feat: make silence pause duration configurable via --pause flag
Default is 0.3s for responsive typing. Configurable on both
`cohere on --pause` and `cohere transcribe --stream --pause`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 21:12:26 +08:00
tomatocream 92d8ba28d0 feat: add Typer CLI with daemon mode and wtype keyboard injection
Replace argparse CLI with Typer-based CLI supporting `cohere on/off/status`
commands. The daemon runs transcription in the background and types into the
focused Wayland window via wtype. Adds wtype to flake.nix and fixes the
hatchling build backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 21:09:32 +08:00
tomatocream 8d517b3ea8 refactor: restructure project into src layout with proper packaging
Split monolithic transcribe.py into focused modules under
src/cohere_transcribe/ (model, vad, stream, cli), move tests into
tests/, add hatchling build system and CLI entry point, remove
unused shell.nix and main.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 00:45:56 +08:00
tomatocream cbea62b2a9 fix: add portaudio to LD_LIBRARY_PATH and add flake lockfile
Move LD_LIBRARY_PATH out of env block and include portaudio so
audio devices are discoverable at runtime. Add flake.lock and
a quick microphone test script.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-30 00:42:36 +08:00
tomatocream 843ec534d1 fix: handle processor.decode returning a list of strings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 03:00:09 +08:00
tomatocream cf18335235 fix: simplify audio callback, use deque for pre-roll, add worker timeout warning
- Remove frame_buf accumulation: blocksize=FRAME_SIZE guarantees indata is
  exactly FRAME_SIZE samples, so buffering was unnecessary. Use indata[:, 0].copy()
  to avoid stale references from sounddevice's buffer reuse.
- Replace pre_roll list with collections.deque(maxlen=PRE_ROLL_FRAMES) to
  eliminate manual bounds-checking (pop(0)) on every frame.
- Warn to stderr if the transcription worker thread outlives its 30s join timeout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:48:51 +08:00
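
The pre-roll change in commit cf18335235 relies on the fact that a `collections.deque` with `maxlen` set discards its oldest item automatically on append. A minimal sketch follows, with a hypothetical helper `keep_pre_roll` and made-up frame values. The real `PRE_ROLL_FRAMES` value and the callback code are not shown in the commit message.

```python
from collections import deque


def keep_pre_roll(frames, max_frames):
    """Return the last `max_frames` frames seen, oldest first."""
    # With maxlen set, append() drops the oldest entry once the deque is full,
    # so no manual pop(0) or length check is needed.
    pre_roll = deque(maxlen=max_frames)
    for frame in frames:
        pre_roll.append(frame)
    return list(pre_roll)
```

For example, `keep_pre_roll(range(5), 3)` keeps only the three most recent frames.
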
tomatocream 747a4772b6 feat: implement live streaming transcription with VAD
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:46:13 +08:00
tomatocream d62fcdd1cd feat: add silence calibration and VAD state machine
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:45:09 +08:00
tomatocream 4605be5bc9 refactor: switch to argparse, add --stream and --lang flags
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 02:43:47 +08:00
tomatocream 6bff2875c5 Add implementation plan for live streaming transcription
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 02:42:00 +08:00
tomatocream e0911653fe Add design spec for live streaming microphone transcription
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-29 02:38:05 +08:00
tomatocream c055a8ffb9 Replace flake.nix with shell.nix for simpler NixOS dev environment
2026-05-26 01:59:15 +08:00
tomatocream 55a51a7668 Add flake.nix with portaudio + CUDA, microphone support in transcribe.py
2026-05-26 01:55:54 +08:00
tomatocream 8b88489a53 Simplify to audio file input (mic requires PortAudio on NixOS)
2026-05-26 01:49:52 +08:00
tomatocream 14abcb89f2 Add accelerate dependency
2026-05-26 01:38:10 +08:00
tomatocream 82fe21fe41 Add Cohere Transcribe demo with uv + Python 3.14
2026-05-26 01:35:10 +08:00