docs: add auto-reverse design spec

Conversational CLI that reverse-engineers a website API: LLM-driven headed browser (approach 3) + embedded mitmproxy capture/doc pipeline (approach 5), unified as a single tool-use agent. Free-threaded single-process architecture, intent-driven exploration, hybrid human/agent control, bounded LLM cost via endpoint-signature dedup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# auto-reverse: Design

**Date:** 2026-05-31
**Status:** Approved (pending implementation plan)

## Summary

`auto-reverse` is a conversational CLI that reverse-engineers a website's API by
combining an LLM-driven headed browser (so you can watch and take over) with an
embedded intercepting proxy that captures and documents real traffic in real
time. You state intent in plain language ("map the checkout flow"); Claude
pursues it through the browser; every real request is captured, deduplicated,
and turned into a growing OpenAPI spec + markdown as it happens; Claude reports
findings and you steer, back and forth, until the intent is covered.

It unifies two classic approaches into one tool:

- **Approach 3**: an agent drives a real (headed) browser, so a React/SPA
  behaves normally and runtime API calls actually fire.
- **Approach 5**: an in-process capture pipeline documents each new endpoint in
  real time as traffic flows.

The 3+5 split is preserved as an *internal* boundary: the Driver Agent only
*acts* (browser) and *queries/commands* (flows, doc); capture + documentation
run independently in the background and keep working even when a human drives.

## Goals

- Conversational, intent-driven exploration (not a blind crawler, not
  fire-and-forget). How far to explore before stopping (a "sensible length") is
  Claude's judgment against the user's stated intent.
- Watchable: headed browser by default; the human can grab control at any time
  (hybrid driving), with all traffic still captured.
- Bounded LLM cost: documentation cost scales with *distinct endpoints*, not
  request volume.
- Lossless capture: a raw archive retains everything even when the spec filters
  noise out.

## Non-goals (v1)

- Robust automated login. Auth is a configurable, pluggable concern with a
  stubbed default (manual-pause strategy); deeper strategies come later.
- Defeating bot-detection / captcha / anti-automation.
- Documenting WebSocket and gRPC traffic: it is recorded to the archive but not
  modeled in v1.

## Approach decision

Chosen: **single agent, with browser + recon + doc all exposed as tools**
(approach "A" from brainstorming). One Claude tool-use loop is the brain;
capture and deterministic schema inference run in background threads; the LLM
enriches only *new* endpoint signatures. Rejected: two cooperating agents (more
cost/orchestration, harder to keep coherent in one chat) and fully deterministic
docs (cheapest but mechanical descriptions; offered instead as a `--no-llm-doc`
flag).

## Architecture

Single free-threaded process (Python 3.14, `3.14+freethreaded`), four concurrent
roles sharing an in-memory flow store. Free-threading is what gives real
parallelism to the Python-side work (schema inference, doc generation, capture)
without GIL contention.

```
┌──────────────────────────────────────────────────────────────────┐
│  auto-reverse  (one process, free-threaded)                      │
│                                                                  │
│  [main thread]  Chat REPL  ◄──── you type intent / steer         │
│        │  ▲                                                      │
│        ▼  │ streamed replies + findings                          │
│  ┌─────────────────────────────┐                                 │
│  │ Driver Agent (Claude API,   │   tools:                        │
│  │ tool-use loop)              │   • browser.*  (act)            │
│  │                             │   • flows.*    (recon)          │
│  └───┬─────────────┬───────────┘   • doc.*      (document)       │
│      │ browser.*   │ flows.* / doc.*                             │
│      ▼             ▼                                             │
│  ┌────────────┐  ┌──────────────────────────────┐                │
│  │ Playwright │  │ Flow Store (locked)          │◄──┐            │
│  │ headed     │  │ dedup by signature, samples  │   │ push flows │
│  │ browser    │  └───────────┬──────────────────┘   │            │
│  └─────┬──────┘              │ new-signature events │            │
│        │ proxied             ▼                      │            │
│        │             ┌───────────────┐              │            │
│        │             │  Doc Worker   │ genson schema│            │
│        │             │  [thread]     │ + LLM enrich │            │
│        │             └───────┬───────┘ (new only)   │            │
│        │                     ▼                      │            │
│        │           openapi.yaml + API.md            │            │
│        ▼                                            │            │
│  ┌──────────────────────────────────────────┐       │            │
│  │ mitmproxy DumpMaster  [thread, asyncio]  │───────┘            │
│  │ + addon → raw archive (flows dump + HAR) │                    │
│  └──────────────────────────────────────────┘                    │
└──────────────────────────────────────────────────────────────────┘
```

### Thread roles

- **Main thread**: chat REPL + Driver Agent tool-use loop (synchronous, easy to
  reason about).
- **mitmproxy thread**: embedded `DumpMaster` on its own asyncio loop; a capture
  addon pushes each flow into the Flow Store and streams raw flows to disk.
- **Doc Worker thread(s)**: consume *new-signature* events, run deterministic
  schema inference, call the LLM only to enrich novel endpoints, write
  spec/markdown.
- Playwright's browser driver is a separate Node subprocess, so the browser side
  sidesteps free-threaded C-extension concerns (the Python bindings themselves
  still need validating; see Open questions).

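The hand-off between the capture side and the Doc Worker can be pictured roughly as follows. This is a minimal sketch, not the planned `store.py`: the `FlowStore` class, its `ingest` method, the `new_signatures` queue, and the `None` shutdown sentinel are all hypothetical names chosen for illustration. It shows a lock-guarded store that dedups by signature, keeps a bounded number of samples, and emits an event only the first time a signature is seen.

```python
import queue
import threading
from collections import defaultdict

# (method, host, path_template, response_status_class)
Signature = tuple[str, str, str, str]


class FlowStore:
    """Hypothetical minimal store: dedup by signature, bounded samples."""

    def __init__(self, max_samples: int = 5) -> None:
        self._lock = threading.Lock()
        self._samples: dict[Signature, list[dict]] = defaultdict(list)
        self._max_samples = max_samples
        # Thread-safe channel carrying signatures seen for the first time.
        self.new_signatures: queue.Queue = queue.Queue()

    def ingest(self, signature: Signature, sample: dict) -> bool:
        """Record a sample; return True if the signature was new."""
        with self._lock:
            is_new = signature not in self._samples
            bucket = self._samples[signature]
            if len(bucket) < self._max_samples:
                bucket.append(sample)
        if is_new:
            self.new_signatures.put(signature)
        return is_new

    def samples(self, signature: Signature) -> list[dict]:
        """Return a copy of the retained samples for one signature."""
        with self._lock:
            return list(self._samples.get(signature, []))


def doc_worker(store: FlowStore, documented: list) -> None:
    """Consume new-signature events until a None sentinel arrives."""
    while True:
        signature = store.new_signatures.get()
        if signature is None:
            break
        # Placeholder for schema inference + optional LLM enrichment.
        documented.append(signature)
```

In this sketch the capture thread would call `store.ingest(...)` for every in-scope flow, while one or more threads run `doc_worker`; putting `None` on the queue stops a worker.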
### Components (modules under `src/auto_reverse/`)

- `cli.py`: entrypoint, arg parsing, wires everything, starts threads.
- `repl.py`: chat loop, renders streamed agent output, handles steer/interrupt
  and the take-over keypress, dispatches `/` meta-commands locally.
- `agent.py`: Claude tool-use loop; owns the conversation.
- `tools/browser.py`, `tools/flows.py`, `tools/doc.py`: the three tool groups.
- `browser.py`: Playwright launch (headed, proxied), take-over/release.
- `proxy.py`: embedded mitmproxy master + capture addon.
- `store.py`: thread-safe Flow Store: signature dedup, sample retention, scope
  filtering.
- `doc/schema.py`: deterministic JSON Schema inference + merge.
- `doc/openapi.py`: incremental OpenAPI assembly.
- `doc/markdown.py`: human-readable API docs.
- `doc/client.py`: optional typed httpx client generation from the spec.
- `config.py`: config + the stubbed pluggable auth strategy.

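One way the "stubbed pluggable auth strategy" in `config.py` might look is a small protocol plus the manual-pause default. This is only a sketch under assumptions: the names `AuthStrategy`, `ManualPauseAuth`, and `authenticate`, and the `skip` answer, are hypothetical and not taken from the spec.

```python
from typing import Callable, Protocol


class AuthStrategy(Protocol):
    """Hypothetical interface: anything that can get the session logged in."""

    def authenticate(self, target_url: str) -> bool:
        """Return True once the browser session is considered authenticated."""
        ...


class ManualPauseAuth:
    """Stub default: pause and let the human log in via the headed browser."""

    def __init__(self, prompt: Callable[[str], str] = input) -> None:
        # The prompt function is injectable so tests need no real stdin.
        self._prompt = prompt

    def authenticate(self, target_url: str) -> bool:
        answer = self._prompt(
            f"Log in to {target_url} in the browser, then press Enter "
            "(or type 'skip'): "
        )
        # Anything other than an explicit 'skip' counts as "logged in".
        return answer.strip().lower() != "skip"
```

Because `AuthStrategy` is a structural protocol, a later strategy (for example a scripted login) would only need to provide the same `authenticate` method.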
## Data flow

The intent → action → capture → doc cycle:

1. **You state intent** in the REPL. It is added to the conversation; the Driver
   Agent takes the turn.
2. **Agent acts** via `browser.navigate` / `click` / `type` etc. Each action
   returns a *compact* page snapshot (URL, accessibility-tree summary or trimmed
   DOM, visible interactive elements), not raw HTML, so the agent reasons
   cheaply about the next step.
3. **Browser fires real requests** through the proxy. The capture addon
   intercepts every flow regardless of who triggered it (agent or human),
   streams the raw flow to the archive on disk, and pushes it into the Flow
   Store.
4. **Flow Store ingests** each flow: applies the scope filter, computes a
   signature, dedups, and retains a bounded set of samples. New signatures emit
   an event.
5. **Doc Worker** consumes new-signature events: infers/merges JSON Schema from
   samples (deterministic), and on *first* sighting of a signature calls the LLM
   once to name it, describe it, and group it. Writes `openapi.yaml` + `API.md`
   incrementally.
6. **Agent observes & reports**: between actions it calls `flows.search` /
   `flows.get` to see what surfaced, then summarizes in chat (including noting
   filtered third-party calls).
7. **You steer**: redirect, ask questions, approve, or take the mouse. The loop
   continues until the agent judges the intent covered, then it summarizes and
   awaits the next intent.

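The deterministic "infer, then merge to widen" step in item 5 can be illustrated with a small standard-library sketch. The design names `genson` for this job; the code below is a simplified stand-in written only to show the idea, and the function names `infer` and `merge` are hypothetical. A field seen in only some samples drops out of `required`, and conflicting types collapse to a bare type union (sub-schema detail is discarded in that case, which a real implementation would likely want to keep).

```python
from typing import Any


def infer(value: Any) -> dict:
    """Infer a minimal JSON Schema fragment from one JSON value."""
    if isinstance(value, bool):  # bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema: dict = {"type": "array"}
        if value:
            items = infer(value[0])
            for element in value[1:]:
                items = merge(items, infer(element))
            schema["items"] = items
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer(item) for key, item in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")


def _type_names(schema: dict) -> list:
    declared = schema["type"]
    return declared if isinstance(declared, list) else [declared]


def merge(a: dict, b: dict) -> dict:
    """Widen schema `a` so it also accepts what `b` accepts."""
    if a["type"] != b["type"]:
        # Simplification: mixed types become a bare union of type names.
        return {"type": sorted(set(_type_names(a)) | set(_type_names(b)))}
    if a["type"] == "object":
        properties = {}
        for key in sorted(set(a["properties"]) | set(b["properties"])):
            if key in a["properties"] and key in b["properties"]:
                properties[key] = merge(a["properties"][key], b["properties"][key])
            else:
                properties[key] = a["properties"].get(key) or b["properties"][key]
        # Only fields present in every sample stay required.
        required = sorted(set(a["required"]) & set(b["required"]))
        return {"type": "object", "properties": properties, "required": required}
    if a["type"] == "array":
        if "items" in a and "items" in b:
            return {"type": "array", "items": merge(a["items"], b["items"])}
        return a if "items" in a else b
    return a
```

For example, merging the schemas of `{"id": 1, "name": "a"}` and `{"id": 2, "name": None, "tags": ["x"]}` keeps `id` and `name` required, widens `name` to string-or-null, and records `tags` as an optional array of strings.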
### Dedup signature

```
signature = (method, host, path_template, response_status_class)
```

- `path_template` collapses variable segments via heuristics (numeric ids,
  UUIDs, hashes, long opaque tokens → `{param}`), e.g.
  `/api/users/4812/orders/99` → `/api/users/{param}/orders/{param}`. The
  placeholders need distinct names when the template is written to the OpenAPI
  spec, which requires unique path parameter names.
- Query params are recorded as parameters, not part of the signature.
- A repeated signature triggers **no LLM call**; its body/response are merged
  into the existing schema samples (widening the schema, capturing optional
  fields).
- Net effect: LLM doc cost scales with *distinct endpoints*, not request volume.

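A rough sketch of how such a signature could be computed is shown below. The regular expression is an illustrative guess at the heuristics (their tuning is still an open question), the function names are hypothetical, and rendering the status class as `2xx`, `4xx`, and so on is an assumption about what `response_status_class` means.

```python
import re
from urllib.parse import urlsplit

# Hypothetical heuristics for "this path segment is a variable".
_VARIABLE_SEGMENT = re.compile(
    r"""^(?:
        \d+                                          # numeric id
      | [0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}  # UUID
      | [0-9a-f]{16,}                                # hex hash
      | (?=.*\d)[a-z0-9_-]{24,}                      # long token with a digit
    )$""",
    re.VERBOSE | re.IGNORECASE,
)


def path_template(path: str) -> str:
    """Replace variable-looking segments with the {param} placeholder."""
    segments = path.split("/")
    return "/".join(
        "{param}" if _VARIABLE_SEGMENT.match(segment) else segment
        for segment in segments
    )


def signature(method: str, url: str, status: int) -> tuple:
    """Build (method, host, path_template, response_status_class)."""
    parts = urlsplit(url)
    status_class = f"{status // 100}xx"  # assumed meaning of the status class
    # The query string is deliberately ignored: it is not part of the signature.
    return (method.upper(), parts.netloc.lower(), path_template(parts.path), status_class)
```

Two requests to `/api/users/4812` and `/api/users/7` with 200 and 201 responses would share one signature here, so only the first would trigger LLM enrichment.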
### Scope filtering

- Default in-scope: same-site / same-origin XHR/fetch/document requests to the
  target host(s). Static assets (`.js`, `.css`, `.png`, `.woff`, …) are dropped.
- Common third-party/analytics hosts (google-analytics, segment, stripe-js,
  sentry, doubleclick…) are dropped by a default denylist but *noted* so the
  agent can mention them.
- Configurable allowlist/denylist of hosts + path globs in `config.py`; the
  agent can also be told in chat to include/exclude a host.
- Everything is still written to the **raw archive** even when filtered from the
  spec; filtering only affects what gets documented.

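The filtering rules above could be expressed as a small classifier. This is a sketch only: the function name `classify`, its return labels, and the concrete host patterns in `DEFAULT_DENY_HOSTS` are illustrative assumptions, not the project's actual denylist.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

STATIC_SUFFIXES = (".js", ".css", ".png", ".woff")
# Illustrative glob patterns only; the real default denylist is not specified.
DEFAULT_DENY_HOSTS = (
    "*google-analytics.com",
    "*segment.io",
    "*sentry.io",
    "*doubleclick.net",
)


def classify(url: str, in_scope_hosts: set, deny_hosts: tuple = DEFAULT_DENY_HOSTS) -> str:
    """Return 'in-scope', 'static', 'third-party', or 'out-of-scope'."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if any(fnmatch(host, pattern) for pattern in deny_hosts):
        return "third-party"  # dropped from the spec, but noted for the agent
    if host not in in_scope_hosts:
        return "out-of-scope"
    if parts.path.lower().endswith(STATIC_SUFFIXES):
        return "static"
    return "in-scope"
```

Only flows classified as `in-scope` would be documented; every flow, whatever its label, would still go to the raw archive.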
## CLI / REPL UX

Invocation:

```
auto-reverse <target-url> [options]

  --out DIR          output dir (default ./auto-reverse-out/<host>-<timestamp>/)
  --proxy-port N     mitmproxy listen port (default 8080)
  --headless         run browser headless (default: headed, so you can watch)
  --profile DIR      persistent browser profile (cookies persist across runs)
  --gen-client       after the session, generate a typed httpx client from openapi.yaml
  --model NAME       Claude model (default: claude-opus-4-8)
  --scope HOST,...   extra in-scope hosts (added to target)
  --no-llm-doc       deterministic docs only (zero doc-LLM cost)
  --resume DIR       reopen a previous session's store/spec and keep going
```

REPL: plain chat plus a few `/` meta-commands handled locally (not sent to the
LLM):

```
> map the checkout flow     ← natural-language intent
/take        hand browser control to you; capture keeps running; /done to return
/stop        interrupt the agent's current pursuit (keeps session alive)
/flows [q]   print discovered endpoints (optionally filter); local, no LLM
/spec        show current openapi.yaml path + endpoint count
/save        flush spec/markdown/archive now
/help  /quit
```

- **Streaming**: agent replies and tool-call narration stream live.
- **Take-over mechanics**: `/take` pauses the agent loop and surfaces the headed
  browser to the human; mitmproxy keeps capturing into the same store; `/done`
  resumes the agent, which first calls `flows.search` to catch up on what the
  human did, then continues.
- **Interrupt**: Ctrl-C / `/stop` cleanly interrupts mid-pursuit without killing
  the session or losing captured data.

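The option list above maps fairly directly onto `argparse`. The sketch below is one possible shape for the parser in `cli.py`; the `build_parser` name is hypothetical, and parsing `--scope` by splitting on commas is an assumption based on the `HOST,...` placeholder.

```python
import argparse


def _host_list(value: str) -> list:
    """Split a comma-separated host list, ignoring empty entries."""
    return [host for host in value.split(",") if host]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="auto-reverse")
    parser.add_argument("target_url", metavar="<target-url>")
    parser.add_argument("--out", metavar="DIR", help="output dir")
    parser.add_argument("--proxy-port", type=int, default=8080, metavar="N",
                        help="mitmproxy listen port")
    parser.add_argument("--headless", action="store_true",
                        help="run browser headless (default: headed)")
    parser.add_argument("--profile", metavar="DIR",
                        help="persistent browser profile")
    parser.add_argument("--gen-client", action="store_true",
                        help="generate a typed httpx client after the session")
    parser.add_argument("--model", default="claude-opus-4-8", metavar="NAME",
                        help="Claude model")
    parser.add_argument("--scope", type=_host_list, default=[], metavar="HOST,...",
                        help="extra in-scope hosts (added to target)")
    parser.add_argument("--no-llm-doc", action="store_true",
                        help="deterministic docs only")
    parser.add_argument("--resume", metavar="DIR",
                        help="reopen a previous session and keep going")
    return parser
```

The default for `--out` is left as `None` here because it depends on the host and a timestamp, which would be resolved after parsing.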
## Error handling

- **Browser action fails** (selector gone, navigation timeout): the tool returns
  a structured error + fresh snapshot; the agent re-plans rather than crashing.
  Bounded retries per action.
- **LLM errors / rate limits**: exponential backoff in the agent loop;
  doc-enrichment failures degrade gracefully to deterministic-only docs (the
  endpoint is still recorded with a mechanical description) and are retried
  later.
- **Proxy/TLS**: first run installs/uses the mitmproxy CA; undecryptable flows
  are logged to the archive and skipped for docs. Clear error if the port is in
  use.
- **Crash safety**: spec, markdown, and raw archive are written incrementally and
  flushed on every new signature and on exit (including Ctrl-C via a signal
  handler), so a mid-session crash does not lose endpoints that were already
  documented. `--resume` reopens them.
- **Free-threading caveat**: the Flow Store is guarded by a lock; queues are
  thread-safe. If a required C-extension lacks free-threaded wheels, the README
  documents the fallback (run on the GIL-enabled interpreter).

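The exponential backoff mentioned for LLM errors might look like the helper below. It is a generic sketch, not tied to any client library: `TransientError` is a hypothetical stand-in for whatever rate-limit or temporary-failure exception the real client raises, and the retry count, delays, and jitter range are arbitrary illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or temporary API failure."""


def with_backoff(
    call: Callable[[], T],
    *,
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with capped exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted: let the caller degrade gracefully
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent workers.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

A Doc Worker could wrap its enrichment call in `with_backoff` and, if the final attempt still raises, fall back to the deterministic description and queue the endpoint for a later retry.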
## Testing

- **Unit (pure, no network/LLM):**
  - `store`: signature templating (ids/UUIDs/hashes → `{param}`), dedup, sample
    merging, scope filter allow/deny.
  - `doc/schema`: schema inference + merge widening (optional fields, unions).
  - `doc/openapi` + `doc/markdown`: golden-file output from canned flows.
- **Tool layer**: browser/flows/doc tools tested against a Playwright-driven
  **local fixture site** (a tiny Flask/Starlette app with a few JSON endpoints)
  routed through a real embedded mitmproxy. This verifies the full
  capture→store→doc path with zero external dependencies and no LLM.
- **Agent loop**: tested with a **mocked Claude client** returning scripted tool
  calls, asserting the intent→action→observe cycle and graceful error re-planning.
- **End-to-end smoke**: against the fixture site, assert a known endpoint lands in
  `openapi.yaml` with the correct method/path-template/schema.
- LLM-dependent enrichment is mocked in CI; a manual/optional live test is gated
  behind an env flag.

## Key dependencies

- `playwright`: headed browser automation.
- `mitmproxy`: embedded intercepting proxy (`DumpMaster` + addon).
- `anthropic`: Claude API (tool-use loop + doc enrichment).
- `genson` (or equivalent): deterministic JSON Schema inference.
- `openapi-python-client` (or equivalent): optional `--gen-client` codegen.
- All must be validated for Python 3.14 free-threaded wheel availability during
  implementation; fallback documented if any are missing.

## Open questions for implementation

- Confirm free-threaded wheel availability for mitmproxy / playwright on 3.14t;
  decide fallback interpreter if needed.
- Exact compact-snapshot format the browser tools return to the agent
  (accessibility tree vs. trimmed DOM): tune for token cost vs. usefulness.
- Path-template heuristics tuning (avoid over-collapsing legitimately distinct
  static paths).