2.9 KiB
2.9 KiB
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project
cohere-transcribe — live speech-to-text using the Cohere ASR model (CohereLabs/cohere-transcribe-03-2026) via HuggingFace Transformers. Captures microphone audio, runs voice activity detection (VAD) to segment speech, transcribes each segment, and either prints text or injects it into the focused window via wtype (Wayland).
Development Environment
Nix flake provides the dev shell (Python 3.14, portaudio, CUDA toolkit, wtype, uv). Direnv activates it automatically. Python deps managed by uv.
# Install/sync Python deps
uv sync
# Run the CLI (installed as entry point)
uv run cohere on # start daemon (background, types into focused window)
uv run cohere off # stop daemon
uv run cohere status
uv run cohere transcribe --stream # live transcribe to terminal
uv run cohere transcribe --mic 5 # record 5s then transcribe
uv run cohere transcribe file.wav # transcribe file
# Run mic tests
uv run python tests/test_mic.py
Architecture
Two modes share the same model/VAD pipeline but differ in output:
- Daemon mode (
cohere on): runs as a background process, transcribes speech segments and injects text into the focused window viawtype. State tracked in~/.local/state/cohere/state.json. The daemon is spawned by the CLI (cli.py) which launchesdaemon_main.pyas a detached subprocess. - Stream/one-shot mode (
cohere transcribe): runs in foreground, prints transcriptions to stdout with timestamps.
Key modules
model.py— model loading (load_model) and transcription (transcribe_audio). Single source of truth forMODEL_IDandSAMPLE_RATE(16kHz).vad.py— RMS-based voice activity detection withVADStateMachine. Calibrates ambient noise threshold at startup. Configurable silence duration triggers segment boundaries.stream.py— streaming transcription loop: audio callback feeds VAD, completed segments go to a transcription worker thread via queue.daemon.py— same streaming pattern asstream.pybut outputs viawtypeinstead of print. Also contains daemon lifecycle management (state file, PID tracking, start/stop).cli/cli.py— Typer CLI withon/off/status/transcribecommands.transcribe.py— original standalone script (not part of the package).
Data flow
Microphone → sounddevice.InputStream (50ms frames)
→ VADStateMachine.process_frame()
→ speech segment detected → Queue
→ transcription_worker thread → transcribe_audio()
→ output (wtype or print)
Conventions
- Package uses src layout (
src/cohere_transcribe/), built with hatchling. - Entry points:
cohereandcohere-transcribeboth map tocohere_transcribe.cli:main. - VAD constants are in
vad.py(frame size, pre-roll, silence limits, max segment length). - Daemon state lives at
~/.local/state/cohere/(state.json, daemon.log).