auto-reverse — Design
Date: 2026-05-31 Status: Approved (pending implementation plan)
Summary
auto-reverse is a conversational CLI that reverse-engineers a website's API by
combining an LLM-driven headed browser (so you can watch and take over) with an
embedded intercepting proxy that captures and documents real traffic in real
time. You state intent in plain language ("map the checkout flow"); Claude
pursues it through the browser; every real request is captured, deduplicated,
and turned into a growing OpenAPI spec + markdown as it happens; Claude reports
findings and you steer — back and forth — until the intent is covered.
It unifies two classic approaches into one tool:
- Approach 3 — an agent drives a real (headed) browser, so a React/SPA behaves normally and runtime API calls actually fire.
- Approach 5 — an in-process capture pipeline documents each new endpoint in real time as traffic flows.
The 3+5 split is preserved as an internal boundary: the Driver Agent only acts (browser) and queries/commands (flows, doc); capture + documentation run independently in the background and keep working even when a human drives.
Goals
- Conversational, intent-driven exploration (not a blind crawler, not fire-and-forget). "Sensible length" is Claude's judgment against the user's stated intent.
- Watchable: headed browser by default; human can grab control at any time (hybrid driving), with all traffic still captured.
- Bounded LLM cost: documentation cost scales with distinct endpoints, not request volume.
- Lossless capture: a raw archive retains everything even when the spec filters noise out.
Non-goals (v1)
- Robust automated login. Auth is a configurable, pluggable concern with a stubbed default (manual-pause strategy); deeper strategies come later.
- Defeating bot-detection / captcha / anti-automation.
- Documenting non-HTTP protocols (websockets/gRPC) — recorded to archive but not modeled in v1.
Approach decision
Chosen: single agent, with browser + recon + doc all exposed as tools
(approach "A" from brainstorming). One Claude tool-use loop is the brain;
capture and deterministic schema inference run in background threads; the LLM
enriches only new endpoint signatures. Rejected: two cooperating agents (more
cost/orchestration, harder to keep coherent in one chat) and fully-deterministic
docs (cheapest but mechanical descriptions — offered instead as a --no-llm-doc
flag).
Architecture
Single free-threaded process (Python 3.14, 3.14+freethreaded), four concurrent
roles sharing an in-memory flow store. Free-threading is what gives real
parallelism to the Python-side work (schema inference, doc generation, capture)
without GIL contention.
┌────────────────────────────────────────────────────────────────┐
│ auto-reverse (one process, free-threaded) │
│ │
│ [main thread] Chat REPL ◄──── you type intent / steer │
│ │ ▲ │
│ ▼ │ streamed replies + findings │
│ ┌─────────────────────────────┐ │
│ │ Driver Agent (Claude API, │ tools: │
│ │ tool-use loop) │ • browser.* (act) │
│ │ │ • flows.* (recon) │
│ └───┬─────────────┬───────────┘ • doc.* (document) │
│ │ browser.* │ flows.* / doc.* │
│ ▼ ▼ │
│ ┌─────────┐ ┌──────────────────────────────┐ │
│ │Playwright│ │ Flow Store (locked) │◄──┐ │
│ │ headed │ │ dedup by signature, samples │ │ push flows │
│ │ browser │ └───────────┬──────────────────┘ │ │
│ └────┬─────┘ │ new-signature events │ │
│ │ proxied ▼ │ │
│ │ ┌───────────────┐ │ │
│ │ │ Doc Worker │ genson schema│ │
│ │ │ [thread] │ + LLM enrich │ │
│ │ └───────┬────────┘ (new only) │ │
│ │ ▼ │ │
│ │ openapi.yaml + API.md │ │
│ ▼ │ │
│ ┌──────────────────────────────────────────┐ │ │
│ │ mitmproxy DumpMaster [thread, asyncio] │──────┘ │
│ │ + addon → raw archive (flows dump + HAR) │ │
│ └──────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
Thread roles
- Main thread — chat REPL + Driver Agent tool-use loop (synchronous, easy to reason about).
- mitmproxy thread — embedded DumpMaster on its own asyncio loop; a capture addon pushes each flow into the Flow Store and streams raw flows to disk.
- Doc Worker thread(s) — consume new-signature events, run deterministic schema inference, call the LLM only to enrich novel endpoints, write spec/markdown.
- Playwright's browser driver is a separate Node subprocess, so it sidesteps free-threaded C-extension concerns.
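The hand-off between the capture side and the Doc Worker can be pictured with a small sketch. This is only an illustration under stated assumptions: the names FlowStore, ingest, new_signatures and doc_worker are hypothetical, the sample cap of 5 is arbitrary, and the worker body is a stand-in for schema inference and LLM enrichment.

```python
import queue
import threading
from collections import defaultdict


class FlowStore:
    """Hypothetical lock-guarded store that dedups flows by signature."""

    def __init__(self, max_samples=5):
        self._lock = threading.Lock()
        self._samples = defaultdict(list)
        self._max_samples = max_samples
        # Doc Worker threads block on this queue for new-signature events.
        self.new_signatures = queue.Queue()

    def ingest(self, signature, sample):
        """Record one flow; emit an event only the first time a signature is seen."""
        with self._lock:
            is_new = signature not in self._samples
            bucket = self._samples[signature]
            if len(bucket) < self._max_samples:
                bucket.append(sample)  # bounded sample retention
        if is_new:
            self.new_signatures.put(signature)
        return is_new

    def samples(self, signature):
        """Return a copy of the retained samples for one signature."""
        with self._lock:
            return list(self._samples.get(signature, ()))


def doc_worker(store, documented, stop):
    """Consume new-signature events until the stop event is set."""
    while not stop.is_set():
        try:
            signature = store.new_signatures.get(timeout=0.1)
        except queue.Empty:
            continue
        documented.append(signature)  # stand-in for schema inference + enrichment
        store.new_signatures.task_done()
```

In this sketch the capture addon would call store.ingest(...) from the proxy thread, while one or more threads run doc_worker. The queue carries only signatures, so repeated traffic on a known endpoint never wakes the worker.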
Components (modules under src/auto_reverse/)
- cli.py — entrypoint, arg parsing, wires everything, starts threads.
- repl.py — chat loop, renders streamed agent output, handles steer/interrupt and the take-over keypress, dispatches / meta-commands locally.
- agent.py — Claude tool-use loop; owns the conversation.
- tools/browser.py, tools/flows.py, tools/doc.py — the three tool groups.
- browser.py — Playwright launch (headed, proxied), take-over/release.
- proxy.py — embedded mitmproxy master + capture addon.
- store.py — thread-safe Flow Store: signature dedup, sample retention, scope filtering.
- doc/schema.py — deterministic JSON Schema inference + merge.
- doc/openapi.py — incremental OpenAPI assembly.
- doc/markdown.py — human-readable API docs.
- doc/client.py — optional typed httpx client generation from the spec.
- config.py — config + the stubbed pluggable auth strategy.
Data flow
The intent → action → capture → doc cycle:
- You state intent in the REPL. It is added to the conversation; the Driver Agent takes the turn.
- Agent acts via browser.navigate/click/type etc. Each action returns a compact page snapshot (URL, accessibility-tree summary or trimmed DOM, visible interactive elements) — not raw HTML — so the agent reasons cheaply about the next step.
- Browser fires real requests through the proxy. The capture addon intercepts every flow regardless of who triggered it (agent or human).
- Flow Store ingests each flow: applies the scope filter, computes a signature, dedups, retains a bounded set of samples, and streams the raw flow to the archive on disk. New signatures emit an event.
- Doc Worker consumes new-signature events: infers/merges JSON Schema from samples (deterministic), and on first sighting of a signature calls the LLM once to name it, describe it, and group it. Writes openapi.yaml + API.md incrementally.
- Agent observes & reports: between actions it calls flows.search/flows.get to see what surfaced, then summarizes in chat (including noting filtered third-party calls).
- You steer: redirect, ask questions, approve, or take the mouse. The loop continues until the agent judges the intent covered, then it summarizes and awaits the next intent.
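The deterministic infer-and-merge step in the Doc Worker can be illustrated with a standard-library sketch. It is not genson and does not reproduce genson's output format; infer_schema and merge_schemas are hypothetical names, and the widening rules (a field stays required only if every sample has it, differing shapes become an anyOf union) are one plausible reading of "merge" in the cycle above.

```python
def infer_schema(value):
    """Infer a minimal JSON Schema fragment from one decoded JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            items = infer_schema(value[0])
            for element in value[1:]:
                items = merge_schemas(items, infer_schema(element))
            schema["items"] = items
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")


def merge_schemas(a, b):
    """Widen schema a so that it also accepts what schema b describes."""
    if a.get("type") == "object" and b.get("type") == "object":
        props = dict(a["properties"])
        for key, sub in b["properties"].items():
            props[key] = merge_schemas(props[key], sub) if key in props else sub
        # A field stays required only if every sample contained it.
        required = sorted(set(a["required"]) & set(b["required"]))
        return {"type": "object", "properties": props, "required": required}
    if a.get("type") == "array" and b.get("type") == "array":
        merged = {"type": "array"}
        if "items" in a and "items" in b:
            merged["items"] = merge_schemas(a["items"], b["items"])
        elif "items" in a or "items" in b:
            merged["items"] = a.get("items", b.get("items"))
        return merged
    if a == b:
        return a
    # Differing shapes become a union; flatten nested anyOf and drop duplicates.
    options = []
    for schema in (a, b):
        for option in schema.get("anyOf", [schema]):
            if option not in options:
                options.append(option)
    return {"anyOf": options}
```

Merging the samples {"id": 1, "name": "a"} and {"id": 2, "coupon": None} this way keeps id required and records name and coupon as optional properties.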
Dedup signature
signature = (method, host, path_template, response_status_class)
- path_template collapses variable segments via heuristics (numeric ids, UUIDs, hashes, long opaque tokens → {param}), e.g. /api/users/4812/orders/99 → /api/users/{id}/orders/{id}.
- Query params are recorded as parameters, not part of the signature.
- A repeated signature triggers no LLM call; its body/response are merged into the existing schema samples (widening the schema, capturing optional fields).
- Net effect: LLM doc cost scales with distinct endpoints, not request volume.
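A minimal sketch of how such a signature might be computed is shown below. The regular expressions, the length thresholds (16 hex characters, 20 opaque characters) and the function names are assumptions chosen for illustration, not tuned heuristics from the project; the {id} placeholder follows the example above.

```python
import re
from urllib.parse import urlsplit

# Assumed heuristics; thresholds are illustrative only.
_UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I)
_HEX = re.compile(r"[0-9a-f]{16,}", re.I)
_OPAQUE = re.compile(r"(?=.*\d)[A-Za-z0-9_-]{20,}")  # long token containing a digit


def is_variable(segment):
    """Guess whether one path segment is an identifier rather than a fixed name."""
    return (
        segment.isdigit()
        or _UUID.fullmatch(segment) is not None
        or _HEX.fullmatch(segment) is not None
        or _OPAQUE.fullmatch(segment) is not None
    )


def path_template(path):
    """Collapse variable-looking segments of a URL path into {id}."""
    return "/".join("{id}" if is_variable(s) else s for s in path.split("/"))


def status_class(status_code):
    """Map a status code such as 204 to its class string, here '2xx'."""
    return f"{status_code // 100}xx"


def signature(method, url, status_code):
    """Build the dedup key; the query string is deliberately left out."""
    parts = urlsplit(url)
    return (
        method.upper(),
        parts.hostname,
        path_template(parts.path),
        status_class(status_code),
    )
```

With these rules, GET requests to /api/users/4812/orders/99?page=2 and /api/users/7/orders/3 that both return a 2xx status map to the same key, so only the first would trigger an LLM call.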
Scope filtering
- Default in-scope: same-site / same-origin XHR/fetch/document requests to the target host(s). Static assets (.js/.css/.png/.woff…) dropped.
- Common third-party/analytics hosts (google-analytics, segment, stripe-js, sentry, doubleclick…) dropped by a default denylist but noted so the agent can mention them.
- Configurable allowlist/denylist of hosts + path globs in config.py; the agent can also be told in chat to include/exclude a host.
- Everything is still written to the raw archive even when filtered from the spec — filtering only affects what gets documented.
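The filtering rules above could be expressed as a small classifier, sketched here under assumptions: the host patterns in DEFAULT_DENYLIST are illustrative guesses rather than the project's actual list, STATIC_EXTENSIONS covers only the extensions named in the text, and the four labels are hypothetical. Every flow would still go to the raw archive whatever label it receives.

```python
import fnmatch
import posixpath
from urllib.parse import urlsplit

STATIC_EXTENSIONS = {".js", ".css", ".png", ".woff"}  # only those named in the text
DEFAULT_DENYLIST = ("*google-analytics.com", "*doubleclick.net")  # assumed patterns


def classify(url, target_hosts, denylist=DEFAULT_DENYLIST):
    """Return 'in-scope', 'static', 'third-party', or 'out-of-scope' for a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Denylisted hosts are dropped from the spec but remembered for the agent.
    if any(fnmatch.fnmatchcase(host, pattern) for pattern in denylist):
        return "third-party"
    if host not in target_hosts:
        return "out-of-scope"
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "static"
    return "in-scope"
```

Only flows labelled in-scope would reach the signature and documentation steps; the third-party label lets the agent mention filtered hosts in chat.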
CLI / REPL UX
Invocation:
auto-reverse <target-url> [options]
--out DIR output dir (default ./auto-reverse-out/<host>-<timestamp>/)
--proxy-port N mitmproxy listen port (default 8080)
--headless run browser headless (default: headed, so you can watch)
--profile DIR persistent browser profile (cookies persist across runs)
--gen-client after the session, generate a typed httpx client from openapi.yaml
--model NAME Claude model (default: claude-opus-4-8)
--scope HOST,... extra in-scope hosts (added to target)
--no-llm-doc deterministic docs only (zero doc-LLM cost)
--resume DIR reopen a previous session's store/spec and keep going
REPL — plain chat plus a few / meta-commands handled locally (not sent to the
LLM):
> map the checkout flow ← natural-language intent
/take hand browser control to you; capture keeps running; /done to return
/stop interrupt the agent's current pursuit (keeps session alive)
/flows [q] print discovered endpoints (optionally filter) — local, no LLM
/spec show current openapi.yaml path + endpoint count
/save flush spec/markdown/archive now
/help /quit
- Streaming: agent replies and tool-call narration stream live.
- Take-over mechanics: /take pauses the agent loop and surfaces the headed browser to the human; mitmproxy keeps capturing into the same store; /done resumes the agent, which first calls flows.search to catch up on what the human did, then continues.
- Interrupt: Ctrl-C / /stop cleanly interrupts mid-pursuit without killing the session or losing captured data.
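Local handling of the / meta-commands might look like the following sketch. The dispatch function, the Session class and its agent_paused flag are hypothetical names introduced for illustration; the real REPL would also wire in streaming and interrupt handling, which are omitted here.

```python
class Session:
    """Hypothetical session state shared by the REPL and the agent loop."""

    def __init__(self):
        self.agent_paused = False

    def take(self, _arg):
        # Capture keeps running; only the agent loop is paused.
        self.agent_paused = True
        return "browser is yours; type /done to hand it back"

    def done(self, _arg):
        self.agent_paused = False
        return "agent resumed"


def dispatch(line, handlers, send_to_agent):
    """Route one REPL line: slash commands run locally, the rest goes to the agent."""
    text = line.strip()
    if not text:
        return None
    if not text.startswith("/"):
        return send_to_agent(text)
    name, _, arg = text[1:].partition(" ")
    handler = handlers.get(name)
    if handler is None:
        return f"unknown command: /{name} (try /help)"
    return handler(arg.strip())
```

Because the lookup happens before anything reaches the model, commands such as /flows or /spec cost no LLM tokens, which matches the "handled locally" note above.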
Error handling
- Browser action fails (selector gone, navigation timeout): the tool returns a structured error + fresh snapshot; the agent re-plans rather than crashing. Bounded retries per action.
- LLM errors / rate limits: exponential backoff in the agent loop; doc-enrichment failures degrade gracefully to deterministic-only docs (the endpoint is still recorded with a mechanical description) and are retried later.
- Proxy/TLS: first run installs/uses the mitmproxy CA; undecryptable flows are logged to the archive and skipped for docs. Clear error if the port is in use.
- Crash safety: spec, markdown, and raw archive are written incrementally and flushed on every new signature and on exit (including Ctrl-C via a signal handler), so a mid-session crash never loses discovered endpoints. --resume reopens them.
- Free-threading caveat: the Flow Store is guarded by a lock; queues are thread-safe. If a required C-extension lacks free-threaded wheels, the README documents the fallback (run on the GIL-enabled interpreter).
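The backoff-then-degrade behaviour described for LLM errors can be sketched as follows. The function names, retry count and delay values are assumptions, and the broad except Exception is a simplification: real code would catch only the client library's rate-limit and transport errors. The enrich callable stands in for the actual LLM call.

```python
import random
import time


def call_with_backoff(func, retries=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call func(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:  # simplification: narrow this to retryable errors
            if attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


def describe_endpoint(signature, enrich, retries=4, sleep=time.sleep):
    """Try LLM enrichment; fall back to a mechanical description if it keeps failing."""
    try:
        text = call_with_backoff(lambda: enrich(signature), retries=retries, sleep=sleep)
        return {"description": text, "enriched": True}
    except Exception:
        method, host, path, status = signature
        # enriched=False marks the endpoint so enrichment can be retried later.
        return {"description": f"{method} {path} on {host} ({status})", "enriched": False}
```

Passing sleep as a parameter is a design choice for testability: unit tests can substitute a recorder and assert on the delays without actually waiting.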
Testing
- Unit (pure, no network/LLM):
  - store: signature templating (ids/UUIDs/hashes → {param}), dedup, sample merging, scope filter allow/deny.
  - doc/schema: schema inference + merge widening (optional fields, unions).
  - doc/openapi + doc/markdown: golden-file output from canned flows.
- Tool layer: browser/flows/doc tools tested against a Playwright-driven local fixture site (a tiny Flask/Starlette app with a few JSON endpoints) routed through a real embedded mitmproxy — verifies the full capture→store→doc path with zero external dependencies and no LLM.
- Agent loop: tested with a mocked Claude client returning scripted tool calls, asserting the intent→action→observe cycle and graceful error re-planning.
- End-to-end smoke: against the fixture site, assert a known endpoint lands in openapi.yaml with the correct method/path-template/schema.
- LLM-dependent enrichment is mocked in CI; a manual/optional live test is gated behind an env flag.
Key dependencies
- playwright — headed browser automation.
- mitmproxy — embedded intercepting proxy (DumpMaster + addon).
- anthropic — Claude API (tool-use loop + doc enrichment).
- genson (or equivalent) — deterministic JSON Schema inference.
- openapi-python-client (or equivalent) — optional --gen-client codegen.
- All must be validated for Python 3.14 free-threaded wheel availability during implementation; fallback documented if any are missing.
Decisions made during implementation
- Runtime: standard CPython 3.14 (not free-threaded). mitmproxy cannot run on 3.14t — its aioquic and mitmproxy-rs deps ship only Limited-API (abi3) wheels that the free-threaded build rejects, and source builds fail. The workload is entirely I/O-bound (asyncio proxy loop, agent loop on network/LLM, Playwright as a separate Node subprocess), so free-threading offered no practical benefit. The threading architecture described above is unchanged; threads run under the GIL, which is fine for I/O-bound work.
Open questions for implementation
- Exact compact-snapshot format the browser tools return to the agent (accessibility tree vs. trimmed DOM) — tune for token cost vs. usefulness.
- Path-template heuristics tuning (avoid over-collapsing legitimately distinct static paths).