CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project

cohere-transcribe — live speech-to-text using the Cohere ASR model (CohereLabs/cohere-transcribe-03-2026) via HuggingFace Transformers. Captures microphone audio, runs voice activity detection (VAD) to segment speech, transcribes each segment, and either prints text or injects it into the focused window via wtype (Wayland).

Development Environment

Nix flake provides the dev shell (Python 3.14, portaudio, CUDA toolkit, wtype, uv). Direnv activates it automatically. Python deps managed by uv.

# Install/sync Python deps
uv sync

# Run the CLI (installed as entry point)
uv run cohere on          # start daemon (background, types into focused window)
uv run cohere off         # stop daemon
uv run cohere status
uv run cohere transcribe --stream   # live transcribe to terminal
uv run cohere transcribe --mic 5    # record 5s then transcribe
uv run cohere transcribe file.wav   # transcribe file

# Run mic tests
uv run python tests/test_mic.py

Architecture

Two modes share the same model/VAD pipeline but differ in output:

Daemon mode (cohere on): runs as a background process, transcribes speech segments and injects text into the focused window via wtype. State tracked in ~/.local/state/cohere/state.json. The daemon is spawned by the CLI (cli.py) which launches daemon_main.py as a detached subprocess.
Stream/one-shot mode (cohere transcribe): runs in foreground, prints transcriptions to stdout with timestamps.

Key modules

model.py — model loading (load_model) and transcription (transcribe_audio). Single source of truth for MODEL_ID and SAMPLE_RATE (16kHz).
vad.py — RMS-based voice activity detection with VADStateMachine. Calibrates ambient noise threshold at startup. Configurable silence duration triggers segment boundaries.
stream.py — streaming transcription loop: audio callback feeds VAD, completed segments go to a transcription worker thread via queue.
daemon.py — same streaming pattern as stream.py but outputs via wtype instead of print. Also contains daemon lifecycle management (state file, PID tracking, start/stop).
cli/cli.py — Typer CLI with on/off/status/transcribe commands.
transcribe.py — original standalone script (not part of the package).

Data flow

Microphone → sounddevice.InputStream (50ms frames)
  → VADStateMachine.process_frame()
  → speech segment detected → Queue
  → transcription_worker thread → transcribe_audio()
  → output (wtype or print)

Conventions

Package uses src layout (src/cohere_transcribe/), built with hatchling.
Entry points: cohere and cohere-transcribe both map to cohere_transcribe.cli:main.
VAD constants are in vad.py (frame size, pre-roll, silence limits, max segment length).
Daemon state lives at ~/.local/state/cohere/ (state.json, daemon.log).

2.9 KiB Raw Permalink Blame History