docs: add auto-reverse design spec

Conversational CLI that reverse-engineers a website API: LLM-driven
headed browser (approach 3) + embedded mitmproxy capture/doc pipeline
(approach 5), unified as a single tool-use agent. Free-threaded
single-process architecture, intent-driven exploration, hybrid
human/agent control, bounded LLM cost via endpoint-signature dedup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 23:13:40 +08:00
parent adcd280bbd
commit 879dfc347d
@@ -0,0 +1,274 @@
# auto-reverse — Design
**Date:** 2026-05-31
**Status:** Approved (pending implementation plan)
## Summary
`auto-reverse` is a conversational CLI that reverse-engineers a website's API by
combining an LLM-driven headed browser (so you can watch and take over) with an
embedded intercepting proxy that captures and documents real traffic in real
time. You state intent in plain language ("map the checkout flow"); Claude
pursues it through the browser; every real request is captured, deduplicated,
and turned into a growing OpenAPI spec + markdown as it happens; Claude reports
findings and you steer — back and forth — until the intent is covered.
It unifies two classic approaches into one tool:
- **Approach 3** — an agent drives a real (headed) browser, so a React/SPA
behaves normally and runtime API calls actually fire.
- **Approach 5** — an in-process capture pipeline documents each new endpoint in
real time as traffic flows.
The 3+5 split is preserved as an *internal* boundary: the Driver Agent only
*acts* (browser) and *queries/commands* (flows, doc); capture + documentation
run independently in the background and keep working even when a human drives.
## Goals
- Conversational, intent-driven exploration (not a blind crawler, not
fire-and-forget). How long a pursuit runs is Claude's judgment, measured
against the user's stated intent.
- Watchable: headed browser by default; human can grab control at any time
(hybrid driving), with all traffic still captured.
- Bounded LLM cost: documentation cost scales with *distinct endpoints*, not
request volume.
- Lossless capture: a raw archive retains everything even when the spec filters
noise out.
## Non-goals (v1)
- Robust automated login. Auth is a configurable, pluggable concern with a
stubbed default (manual-pause strategy); deeper strategies come later.
- Defeating bot-detection / captcha / anti-automation.
- Documenting non-HTTP protocols (websockets/gRPC) — recorded to archive but not
modeled in v1.
## Approach decision
Chosen: **single agent, with browser + recon + doc all exposed as tools**
(approach "A" from brainstorming). One Claude tool-use loop is the brain;
capture and deterministic schema inference run in background threads; the LLM
enriches only *new* endpoint signatures. Rejected: two cooperating agents (more
cost/orchestration, harder to keep coherent in one chat) and fully-deterministic
docs (cheapest but mechanical descriptions — offered instead as a `--no-llm-doc`
flag).
## Architecture
Single free-threaded process (Python 3.14, `3.14+freethreaded`), four concurrent
roles sharing an in-memory flow store. Free-threading is what gives real
parallelism to the Python-side work (schema inference, doc generation, capture)
without GIL contention.
```
┌────────────────────────────────────────────────────────────────┐
│ auto-reverse (one process, free-threaded) │
│ │
│ [main thread] Chat REPL ◄──── you type intent / steer │
│ │ ▲ │
│ ▼ │ streamed replies + findings │
│ ┌─────────────────────────────┐ │
│ │ Driver Agent (Claude API, │ tools: │
│ │ tool-use loop) │ • browser.* (act) │
│ │ │ • flows.* (recon) │
│ └───┬─────────────┬───────────┘ • doc.* (document) │
│ │ browser.* │ flows.* / doc.* │
│ ▼ ▼ │
│ ┌─────────┐ ┌──────────────────────────────┐ │
│ │Playwright│ │ Flow Store (locked) │◄──┐ │
│ │ headed │ │ dedup by signature, samples │ │ push flows │
│ │ browser │ └───────────┬──────────────────┘ │ │
│ └────┬─────┘ │ new-signature events │ │
│ │ proxied ▼ │ │
│ │ ┌───────────────┐ │ │
│ │ │ Doc Worker │ genson schema│ │
│ │ │ [thread] │ + LLM enrich │ │
│ │ └───────┬────────┘ (new only) │ │
│ │ ▼ │ │
│ │ openapi.yaml + API.md │ │
│ ▼ │ │
│ ┌──────────────────────────────────────────┐ │ │
│ │ mitmproxy DumpMaster [thread, asyncio] │──────┘ │
│ │ + addon → raw archive (flows dump + HAR) │ │
│ └──────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
```
### Thread roles
- **Main thread** — chat REPL + Driver Agent tool-use loop (synchronous, easy to
reason about).
- **mitmproxy thread** — embedded `DumpMaster` on its own asyncio loop; a capture
addon pushes each flow into the Flow Store and streams raw flows to disk.
- **Doc Worker thread(s)** — consume *new-signature* events, run deterministic
schema inference, call the LLM only to enrich novel endpoints, write
spec/markdown.
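- Playwright's browser driver is a separate Node subprocess, so the browser-side
work runs outside the Python interpreter. The Python bindings' own
dependencies (e.g. `greenlet`) still need the free-threaded wheel check.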
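
The hand-off between the capture side and a Doc Worker thread could be sketched as below. This is only an illustration under assumptions: the names `doc_worker`, `run_demo`, and `STOP` are hypothetical, a plain `queue.Queue` stands in for whatever event channel the implementation picks, and the loop in `run_demo` replaces the real mitmproxy addon.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling the worker to exit


def doc_worker(events: queue.Queue, documented: list, lock: threading.Lock) -> None:
    """Consume new-signature events until the STOP sentinel arrives."""
    while True:
        event = events.get()
        if event is STOP:
            break
        # Real code would infer a schema and optionally call the LLM here.
        with lock:
            documented.append(event)


def run_demo() -> list:
    events: queue.Queue = queue.Queue()
    documented: list = []
    lock = threading.Lock()
    worker = threading.Thread(
        target=doc_worker, args=(events, documented, lock), daemon=True
    )
    worker.start()
    # Stand-in for the capture addon pushing events from the proxy thread.
    for sig in [
        ("GET", "example.test", "/api/items", "2xx"),
        ("POST", "example.test", "/api/cart", "2xx"),
    ]:
        events.put(sig)
    events.put(STOP)
    worker.join(timeout=5)
    return documented
```

Because the main thread only enqueues events, it never blocks on documentation work, which matches the intent that capture and docs run independently of whoever is driving.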
### Components (modules under `src/auto_reverse/`)
- `cli.py` — entrypoint, arg parsing, wires everything, starts threads.
- `repl.py` — chat loop, renders streamed agent output, handles steer/interrupt
and the take-over keypress, dispatches `/` meta-commands locally.
- `agent.py` — Claude tool-use loop; owns the conversation.
- `tools/browser.py`, `tools/flows.py`, `tools/doc.py` — the three tool groups.
- `browser.py` — Playwright launch (headed, proxied), take-over/release.
- `proxy.py` — embedded mitmproxy master + capture addon.
- `store.py` — thread-safe Flow Store: signature dedup, sample retention, scope
filtering.
- `doc/schema.py` — deterministic JSON Schema inference + merge.
- `doc/openapi.py` — incremental OpenAPI assembly.
- `doc/markdown.py` — human-readable API docs.
- `doc/client.py` — optional typed httpx client generation from the spec.
- `config.py` — config + the stubbed pluggable auth strategy.
## Data flow
The intent → action → capture → doc cycle:
1. **You state intent** in the REPL. It is added to the conversation; the Driver
Agent takes the turn.
2. **Agent acts** via `browser.navigate` / `click` / `type` etc. Each action
returns a *compact* page snapshot (URL, accessibility-tree summary or trimmed
DOM, visible interactive elements) — not raw HTML — so the agent reasons
cheaply about the next step.
3. **Browser fires real requests** through the proxy. The capture addon
intercepts every flow regardless of who triggered it (agent or human).
4. **Flow Store ingests** each flow: applies the scope filter, computes a
signature, dedups, retains a bounded set of samples, and streams the raw flow
to the archive on disk. New signatures emit an event.
5. **Doc Worker** consumes new-signature events: infers/merges JSON Schema from
samples (deterministic), and on *first* sighting of a signature calls the LLM
once to name it, describe it, and group it. Writes `openapi.yaml` + `API.md`
incrementally.
6. **Agent observes & reports**: between actions it calls `flows.search` /
`flows.get` to see what surfaced, then summarizes in chat (including noting
filtered third-party calls).
7. **You steer**: redirect, ask questions, approve, or take the mouse. The loop
continues until the agent judges the intent covered, then it summarizes and
awaits the next intent.
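
Step 4 of the cycle above (ingest, dedup, bounded samples, new-signature event) might look roughly like this. It is a sketch, not the planned `store.py`: the class shape, the `max_samples` parameter, and the `new_signatures` list (standing in for an event queue) are assumptions, and scope filtering and archive streaming are left out.

```python
import threading
from collections import defaultdict, deque


class FlowStore:
    """Hypothetical in-memory store: dedup by signature, keep bounded samples."""

    def __init__(self, max_samples: int = 5):
        self._lock = threading.Lock()
        # Each signature keeps only its most recent max_samples samples.
        self._samples = defaultdict(lambda: deque(maxlen=max_samples))
        self.new_signatures = []  # stand-in for the new-signature event queue

    def ingest(self, signature: tuple, sample: dict) -> bool:
        """Record a sample; return True if the signature was not seen before."""
        with self._lock:
            is_new = signature not in self._samples
            self._samples[signature].append(sample)
            if is_new:
                self.new_signatures.append(signature)
            return is_new

    def samples(self, signature: tuple) -> list:
        """Return a copy of the retained samples for one signature."""
        with self._lock:
            return list(self._samples.get(signature, ()))
```

Only the first `ingest` for a signature produces an event, so downstream LLM work is triggered once per endpoint however often it is called.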
### Dedup signature
```
signature = (method, host, path_template, response_status_class)
```
- `path_template` collapses variable segments via heuristics (numeric ids,
UUIDs, hashes, long opaque tokens → `{param}`), e.g.
`/api/users/4812/orders/99` → `/api/users/{id}/orders/{id}`.
- Query params are recorded as parameters, not part of the signature.
- A repeated signature triggers **no LLM call**; its request and response bodies
are merged into the existing schema samples (widening the schema, capturing
optional fields).
- Net effect: LLM doc cost scales with *distinct endpoints*, not request volume.
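
One way the signature could be computed is sketched below. The regular expression is an assumed set of heuristics (the length thresholds are guesses), the placeholder name `{id}` follows the example above, and `path_template` / `signature` are illustrative helper names rather than a fixed API.

```python
import re
from urllib.parse import urlsplit

# Assumed heuristics for "variable" path segments; thresholds are illustrative.
_VARIABLE_SEGMENT = re.compile(
    r"^(?:\d+"                       # numeric ids
    r"|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"   # UUIDs
    r"|[0-9a-fA-F]{16,}"             # hex hashes
    r"|[A-Za-z0-9_-]{24,})$"         # long opaque tokens
)


def path_template(path: str) -> str:
    """Replace variable-looking segments with a placeholder."""
    segments = path.split("/")
    return "/".join("{id}" if _VARIABLE_SEGMENT.match(s) else s for s in segments)


def signature(method: str, url: str, status: int) -> tuple:
    """Build (method, host, path_template, response_status_class)."""
    parts = urlsplit(url)
    # The query string is deliberately ignored: params are not part of the key.
    return (
        method.upper(),
        parts.hostname,
        path_template(parts.path),
        f"{status // 100}xx",
    )
```

With these rules, `signature("get", "https://shop.example.test/api/users/4812?expand=1", 200)` and the same call with id `17` and status `201` produce the same tuple. The long-token rule is the one most likely to over-collapse static paths, which is why tuning is listed as an open question.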
### Scope filtering
- Default in-scope: same-site / same-origin XHR/fetch/document requests to the
target host(s). Static assets (`.js/.css/.png/.woff`…) dropped.
- Common third-party/analytics hosts (google-analytics, segment, stripe-js,
sentry, doubleclick…) dropped by a default denylist but *noted* so the agent
can mention them.
- Configurable allowlist/denylist of hosts + path globs in `config.py`; the
agent can also be told in chat to include/exclude a host.
- Everything is still written to the **raw archive** even when filtered from the
spec — filtering only affects what gets documented.
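
The filtering rules above could be expressed as a small classifier. This is a sketch under assumptions: the function name `classify`, its three return labels, and the substring-based denylist match are all hypothetical, and the lists are abbreviated. The raw archive write would happen before this function is consulted.

```python
from urllib.parse import urlsplit

STATIC_SUFFIXES = (".js", ".css", ".png", ".woff")
# Substring match against the hostname; entries are illustrative only.
DENYLIST = ("google-analytics", "segment", "sentry", "doubleclick")


def classify(url: str, in_scope_hosts: set) -> str:
    """Return 'document', 'third_party', or 'drop' for a captured URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if any(marker in host for marker in DENYLIST):
        return "third_party"  # not documented, but noted for the agent
    if host not in in_scope_hosts:
        return "drop"
    if parts.path.lower().endswith(STATIC_SUFFIXES):
        return "drop"
    return "document"
```

Keeping `third_party` distinct from `drop` is what lets the agent mention filtered analytics calls in its summary without documenting them.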
## CLI / REPL UX
Invocation:
```
auto-reverse <target-url> [options]
--out DIR output dir (default ./auto-reverse-out/<host>-<timestamp>/)
--proxy-port N mitmproxy listen port (default 8080)
--headless run browser headless (default: headed, so you can watch)
--profile DIR persistent browser profile (cookies persist across runs)
--gen-client after the session, generate a typed httpx client from openapi.yaml
--model NAME Claude model (default: claude-opus-4-8)
--scope HOST,... extra in-scope hosts (added to target)
--no-llm-doc deterministic docs only (zero doc-LLM cost)
--resume DIR reopen a previous session's store/spec and keep going
```
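
As a rough sketch, the options above map onto the standard-library `argparse` module as follows. The helper names `build_parser` and `_host_list` are hypothetical, and the default for `--out` is left unset here because it depends on the host and timestamp at runtime.

```python
import argparse


def _host_list(value: str) -> list:
    """Split a comma-separated host list, ignoring empty entries."""
    return [h for h in value.split(",") if h]


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="auto-reverse")
    p.add_argument("target_url")
    p.add_argument("--out", metavar="DIR")
    p.add_argument("--proxy-port", type=int, default=8080, metavar="N")
    p.add_argument("--headless", action="store_true")
    p.add_argument("--profile", metavar="DIR")
    p.add_argument("--gen-client", action="store_true")
    p.add_argument("--model", default="claude-opus-4-8", metavar="NAME")
    p.add_argument("--scope", type=_host_list, default=[], metavar="HOST,...")
    p.add_argument("--no-llm-doc", action="store_true")
    p.add_argument("--resume", metavar="DIR")
    return p
```

For example, `build_parser().parse_args(["https://shop.example.test", "--scope", "api.example.test"])` yields `proxy_port == 8080` and `scope == ["api.example.test"]`.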
REPL — plain chat plus a few `/` meta-commands handled locally (not sent to the
LLM):
```
> map the checkout flow ← natural-language intent
/take hand browser control to you; capture keeps running; /done to return
/stop interrupt the agent's current pursuit (keeps session alive)
/flows [q] print discovered endpoints (optionally filter) — local, no LLM
/spec show current openapi.yaml path + endpoint count
/save flush spec/markdown/archive now
/help /quit
```
- **Streaming**: agent replies and tool-call narration stream live.
- **Take-over mechanics**: `/take` pauses the agent loop and surfaces the headed
browser to the human; mitmproxy keeps capturing into the same store; `/done`
resumes the agent, which first calls `flows.search` to catch up on what the
human did, then continues.
- **Interrupt**: Ctrl-C / `/stop` cleanly interrupts mid-pursuit without killing
the session or losing captured data.
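
Local dispatch of the meta-commands and the take-over pause could be modeled with two `threading.Event` flags, as in the sketch below. The `Session` class, its attribute names, and the `handle_line` return values are assumptions for illustration; the real REPL would also implement the bodies of `/flows`, `/spec`, `/save`, `/help`, and `/quit`.

```python
import threading


class Session:
    """Hypothetical session state shared by the REPL and the agent loop."""

    def __init__(self):
        self.agent_may_run = threading.Event()
        self.agent_may_run.set()  # the agent drives by default
        self.stop_requested = threading.Event()


def handle_line(session: Session, line: str) -> str:
    """Return 'local' if the line was handled here, 'agent' if it goes to the LLM."""
    stripped = line.strip()
    if not stripped:
        return "local"  # ignore empty input
    command = stripped.split(maxsplit=1)[0]
    if command == "/take":
        session.agent_may_run.clear()  # pause the agent; capture keeps running
    elif command == "/done":
        session.agent_may_run.set()  # the agent resumes and catches up
    elif command == "/stop":
        session.stop_requested.set()  # interrupt the current pursuit only
    elif command.startswith("/"):
        pass  # other meta-commands would be handled here
    else:
        return "agent"  # plain text is natural-language intent
    return "local"
```

The agent loop would call `session.agent_may_run.wait()` before each action, so `/take` takes effect at the next action boundary rather than mid-click.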
## Error handling
- **Browser action fails** (selector gone, navigation timeout): the tool returns
a structured error + fresh snapshot; the agent re-plans rather than crashing.
Bounded retries per action.
- **LLM errors / rate limits**: exponential backoff in the agent loop;
doc-enrichment failures degrade gracefully to deterministic-only docs (the
endpoint is still recorded with a mechanical description) and are retried
later.
- **Proxy/TLS**: first run installs/uses the mitmproxy CA; undecryptable flows
are logged to the archive and skipped for docs. Clear error if the port is in
use.
- **Crash safety**: spec, markdown, and raw archive are written incrementally and
flushed on every new signature and on exit (including Ctrl-C via a signal
handler), so a mid-session crash never loses discovered endpoints. `--resume`
reopens them.
- **Free-threading caveat**: the Flow Store is guarded by a lock; queues are
thread-safe. If a required C-extension lacks free-threaded wheels, the README
documents the fallback (run on the GIL-enabled interpreter).
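
The backoff-then-degrade behaviour for doc enrichment might be sketched as follows. The function name, the `call_llm` callable, the retry count, and the delays are all assumptions. A real implementation would catch the client library's specific error types rather than bare `Exception`.

```python
import time


def enrich_with_backoff(call_llm, endpoint: str, retries: int = 3,
                        base_delay: float = 0.5, sleep=time.sleep) -> dict:
    """Try LLM enrichment; fall back to a mechanical description on failure."""
    for attempt in range(retries):
        try:
            return {"description": call_llm(endpoint), "enriched": True}
        except Exception:
            if attempt < retries - 1:
                # Exponential backoff: 0.5s, then 1s with the defaults.
                sleep(base_delay * 2 ** attempt)
    # Degrade gracefully: the endpoint is still recorded and can be retried later.
    return {"description": f"Undocumented endpoint {endpoint}", "enriched": False}
```

Passing `sleep` as a parameter keeps the function testable without real delays. The `enriched` flag gives a later retry pass something to look for.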
## Testing
- **Unit (pure, no network/LLM):**
- `store`: signature templating (ids/UUIDs/hashes → `{param}`), dedup, sample
merging, scope filter allow/deny.
- `doc/schema`: schema inference + merge widening (optional fields, unions).
- `doc/openapi` + `doc/markdown`: golden-file output from canned flows.
- **Tool layer**: browser/flows/doc tools tested against a Playwright-driven
**local fixture site** (a tiny Flask/Starlette app with a few JSON endpoints)
routed through a real embedded mitmproxy — verifies the full
capture→store→doc path with zero external dependencies and no LLM.
- **Agent loop**: tested with a **mocked Claude client** returning scripted tool
calls, asserting the intent→action→observe cycle and graceful error re-planning.
- **End-to-end smoke**: against the fixture site, assert a known endpoint lands in
`openapi.yaml` with the correct method/path-template/schema.
- LLM-dependent enrichment is mocked in CI; a manual/optional live test is gated
behind an env flag.
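
To make the "merge widening" unit tests concrete, here is a deliberately tiny pure-Python stand-in for schema inference and merging. It is not `genson` and not the planned `doc/schema.py`. The functions `infer`, `merge`, and `_types` are hypothetical, arrays are kept opaque, and when an object meets a non-object the merged result keeps only the type names.

```python
def infer(value) -> dict:
    """Infer a minimal JSON-Schema-like dict from one sample value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer(v) for k, v in value.items()},
            "required": sorted(value),
        }
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    return {"type": "array"}  # lists are kept opaque in this sketch


def _types(schema: dict) -> list:
    t = schema["type"]
    return [t] if isinstance(t, str) else list(t)


def merge(a: dict, b: dict) -> dict:
    """Widen schema a so it also accepts samples described by b."""
    if a.get("type") == "object" and b.get("type") == "object":
        props = {}
        for key in set(a["properties"]) | set(b["properties"]):
            in_a, in_b = key in a["properties"], key in b["properties"]
            if in_a and in_b:
                props[key] = merge(a["properties"][key], b["properties"][key])
            else:
                props[key] = a["properties"][key] if in_a else b["properties"][key]
        # A field stays required only if every sample so far contained it.
        required = sorted(set(a["required"]) & set(b["required"]))
        return {"type": "object", "properties": props, "required": required}
    if a == b:
        return a
    return {"type": sorted(set(_types(a)) | set(_types(b)))}
```

A unit test can then merge two canned samples and assert that a field present in only one of them drops out of `required`, and that a field seen with two types becomes a union.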
## Key dependencies
- `playwright` — headed browser automation.
- `mitmproxy` — embedded intercepting proxy (`DumpMaster` + addon).
- `anthropic` — Claude API (tool-use loop + doc enrichment).
- `genson` (or equivalent) — deterministic JSON Schema inference.
- `openapi-python-client` (or equivalent) — optional `--gen-client` codegen.
- All must be validated for Python 3.14 free-threaded wheel availability during
implementation; fallback documented if any are missing.
## Open questions for implementation
- Confirm free-threaded wheel availability for mitmproxy / playwright on 3.14t;
decide fallback interpreter if needed.
- Exact compact-snapshot format the browser tools return to the agent
(accessibility tree vs. trimmed DOM) — tune for token cost vs. usefulness.
- Path-template heuristics tuning (avoid over-collapsing legitimately distinct
static paths).