diff --git a/docs/superpowers/specs/2026-05-31-auto-reverse-design.md b/docs/superpowers/specs/2026-05-31-auto-reverse-design.md
new file mode 100644
index 0000000..1fd6ef9
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-31-auto-reverse-design.md
@@ -0,0 +1,274 @@
+# auto-reverse — Design
+
+**Date:** 2026-05-31
+**Status:** Approved (pending implementation plan)
+
+## Summary
+
+`auto-reverse` is a conversational CLI that reverse-engineers a website's API by
+combining an LLM-driven headed browser (so you can watch and take over) with an
+embedded intercepting proxy that captures and documents real traffic in real
+time. You state intent in plain language ("map the checkout flow"); Claude
+pursues it through the browser; every real request is captured, deduplicated,
+and turned into a growing OpenAPI spec + markdown as it happens; Claude reports
+findings and you steer — back and forth — until the intent is covered.
+
+It unifies two classic approaches into one tool:
+
+- **Approach 3** — an agent drives a real (headed) browser, so a React/SPA
+  behaves normally and runtime API calls actually fire.
+- **Approach 5** — an in-process capture pipeline documents each new endpoint in
+  real time as traffic flows.
+
+The 3+5 split is preserved as an *internal* boundary: the Driver Agent only
+*acts* (browser) and *queries/commands* (flows, doc); capture + documentation
+run independently in the background and keep working even when a human drives.
+
+## Goals
+
+- Conversational, intent-driven exploration (not a blind crawler, not
+  fire-and-forget). "Sensible length" is Claude's judgment against the user's
+  stated intent.
+- Watchable: headed browser by default; human can grab control at any time
+  (hybrid driving), with all traffic still captured.
+- Bounded LLM cost: documentation cost scales with *distinct endpoints*, not
+  request volume.
+- Lossless capture: a raw archive retains everything even when the spec filters
+  noise out.
+
+## Non-goals (v1)
+
+- Robust automated login. Auth is a configurable, pluggable concern with a
+  stubbed default (manual-pause strategy); deeper strategies come later.
+- Defeating bot-detection / captcha / anti-automation.
+- Documenting non-HTTP protocols (websockets/gRPC) — recorded to archive but not
+  modeled in v1.
+
+## Approach decision
+
+Chosen: **single agent, with browser + recon + doc all exposed as tools**
+(approach "A" from brainstorming). One Claude tool-use loop is the brain;
+capture and deterministic schema inference run in background threads; the LLM
+enriches only *new* endpoint signatures. Rejected: two cooperating agents (more
+cost/orchestration, harder to keep coherent in one chat) and fully-deterministic
+docs (cheapest but mechanical descriptions — offered instead as a `--no-llm-doc`
+flag).
+
+## Architecture
+
+Single free-threaded process (Python 3.14, `3.14+freethreaded`), four concurrent
+roles sharing an in-memory flow store. Free-threading is what gives real
+parallelism to the Python-side work (schema inference, doc generation, capture)
+without GIL contention.
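The shared-store layout described above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: `FlowStore`, `capture_role`, `doc_worker_role`, and `run_demo` are hypothetical names, signatures are simplified to `(method, path)` tuples, and the capture thread replays canned flows instead of running mitmproxy.

```python
import queue
import threading


class FlowStore:
    """Hypothetical stand-in for the locked, in-memory flow store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._samples = {}  # signature -> list of sample bodies
        self.new_signatures = queue.Queue()  # events consumed by the doc worker

    def ingest(self, signature, body):
        """Record a flow; emit an event only the first time a signature is seen."""
        with self._lock:
            is_new = signature not in self._samples
            self._samples.setdefault(signature, []).append(body)
        if is_new:
            self.new_signatures.put(signature)
        return is_new

    def count(self):
        with self._lock:
            return len(self._samples)


def capture_role(store, flows):
    """Stands in for the proxy capture addon: push every flow into the store."""
    for signature, body in flows:
        store.ingest(signature, body)


def doc_worker_role(store, documented, stop):
    """Stands in for the Doc Worker: handle new-signature events until told to stop."""
    while True:
        signature = store.new_signatures.get()
        if signature is stop:
            break
        documented.append(signature)  # real code would infer a schema here


def run_demo():
    store = FlowStore()
    documented = []
    stop = object()  # sentinel that ends the worker loop
    flows = [
        (("GET", "/api/cart"), {"items": []}),
        (("GET", "/api/cart"), {"items": [1]}),  # repeat: no new event
        (("POST", "/api/checkout"), {"ok": True}),
    ]
    worker = threading.Thread(target=doc_worker_role, args=(store, documented, stop))
    capture = threading.Thread(target=capture_role, args=(store, flows))
    worker.start()
    capture.start()
    capture.join()
    store.new_signatures.put(stop)
    worker.join()
    return documented, store.count()
```

In this sketch the lock only guards the sample dictionary, while the thread-safe queue carries events between roles, so the worker never needs to hold the store lock while it does slow work.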
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│  auto-reverse (one process, free-threaded)                     │
+│                                                                │
+│  [main thread]  Chat REPL ◄──── you type intent / steer        │
+│                  │   ▲                                         │
+│                  ▼   │ streamed replies + findings             │
+│  ┌─────────────────────────────┐                               │
+│  │ Driver Agent (Claude API,   │  tools:                       │
+│  │ tool-use loop)              │    • browser.* (act)          │
+│  │                             │    • flows.*   (recon)        │
+│  └───┬─────────────┬───────────┘    • doc.*     (document)     │
+│      │ browser.*   │ flows.* / doc.*                           │
+│      ▼             ▼                                           │
+│ ┌──────────┐   ┌──────────────────────────────┐                │
+│ │Playwright│   │     Flow Store (locked)      │◄──┐            │
+│ │ headed   │   │ dedup by signature, samples  │   │ push flows │
+│ │ browser  │   └───────────┬──────────────────┘   │            │
+│ └────┬─────┘               │ new-signature events │            │
+│      │ proxied             ▼                      │            │
+│      │             ┌───────────────┐              │            │
+│      │             │  Doc Worker   │ genson schema│            │
+│      │             │  [thread]     │ + LLM enrich │            │
+│      │             └───────┬───────┘ (new only)   │            │
+│      │                     ▼                      │            │
+│      │           openapi.yaml + API.md            │            │
+│      ▼                                            │            │
+│ ┌──────────────────────────────────────────┐      │            │
+│ │ mitmproxy DumpMaster [thread, asyncio]   │──────┘            │
+│ │ + addon → raw archive (flows dump + HAR) │                   │
+│ └──────────────────────────────────────────┘                   │
+└────────────────────────────────────────────────────────────────┘
+```
+
+### Thread roles
+
+- **Main thread** — chat REPL + Driver Agent tool-use loop (synchronous, easy to
+  reason about).
+- **mitmproxy thread** — embedded `DumpMaster` on its own asyncio loop; a capture
+  addon pushes each flow into the Flow Store and streams raw flows to disk.
+- **Doc Worker thread(s)** — consume *new-signature* events, run deterministic
+  schema inference, call the LLM only to enrich novel endpoints, write
+  spec/markdown.
+- Playwright's browser driver is a separate Node subprocess, so it sidesteps
+  free-threaded C-extension concerns.
+
+### Components (modules under `src/auto_reverse/`)
+
+- `cli.py` — entrypoint, arg parsing, wires everything, starts threads.
+- `repl.py` — chat loop, renders streamed agent output, handles steer/interrupt
+  and the take-over keypress, dispatches `/` meta-commands locally.
+- `agent.py` — Claude tool-use loop; owns the conversation.
+- `tools/browser.py`, `tools/flows.py`, `tools/doc.py` — the three tool groups.
+- `browser.py` — Playwright launch (headed, proxied), take-over/release.
+- `proxy.py` — embedded mitmproxy master + capture addon.
+- `store.py` — thread-safe Flow Store: signature dedup, sample retention, scope
+  filtering.
+- `doc/schema.py` — deterministic JSON Schema inference + merge.
+- `doc/openapi.py` — incremental OpenAPI assembly.
+- `doc/markdown.py` — human-readable API docs.
+- `doc/client.py` — optional typed httpx client generation from the spec.
+- `config.py` — config + the stubbed pluggable auth strategy.
+
+## Data flow
+
+The intent → action → capture → doc cycle:
+
+1. **You state intent** in the REPL. It is added to the conversation; the Driver
+   Agent takes the turn.
+2. **Agent acts** via `browser.navigate` / `click` / `type` etc. Each action
+   returns a *compact* page snapshot (URL, accessibility-tree summary or trimmed
+   DOM, visible interactive elements) — not raw HTML — so the agent reasons
+   cheaply about the next step.
+3. **Browser fires real requests** through the proxy. The capture addon
+   intercepts every flow regardless of who triggered it (agent or human).
+4. **Flow Store ingests** each flow: applies the scope filter, computes a
+   signature, dedups, retains a bounded set of samples, and streams the raw flow
+   to the archive on disk. New signatures emit an event.
+5. **Doc Worker** consumes new-signature events: infers/merges JSON Schema from
+   samples (deterministic), and on *first* sighting of a signature calls the LLM
+   once to name it, describe it, and group it. Writes `openapi.yaml` + `API.md`
+   incrementally.
+6. **Agent observes & reports**: between actions it calls `flows.search` /
+   `flows.get` to see what surfaced, then summarizes in chat (including noting
+   filtered third-party calls).
+7. **You steer**: redirect, ask questions, approve, or take the mouse. The loop
+   continues until the agent judges the intent covered, then it summarizes and
+   awaits the next intent.
+
+### Dedup signature
+
+```
+signature = (method, host, path_template, response_status_class)
+```
+
+- `path_template` collapses variable segments via heuristics (numeric ids,
+  UUIDs, hashes, long opaque tokens → `{param}`), e.g.
+  `/api/users/4812/orders/99` → `/api/users/{id}/orders/{id}`.
+- Query params are recorded as parameters, not part of the signature.
+- A repeated signature triggers **no LLM call**; its body/response are merged
+  into the existing schema samples (widening the schema, capturing optional
+  fields).
+- Net effect: LLM doc cost scales with *distinct endpoints*, not request volume.
+
+### Scope filtering
+
+- Default in-scope: same-site / same-origin XHR/fetch/document requests to the
+  target host(s). Static assets (`.js/.css/.png/.woff`…) dropped.
+- Common third-party/analytics hosts (google-analytics, segment, stripe-js,
+  sentry, doubleclick…) dropped by a default denylist but *noted* so the agent
+  can mention them.
+- Configurable allowlist/denylist of hosts + path globs in `config.py`; the
+  agent can also be told in chat to include/exclude a host.
+- Everything is still written to the **raw archive** even when filtered from the
+  spec — filtering only affects what gets documented.
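The signature and scope rules above could be prototyped along these lines. This is a sketch under stated assumptions: the regexes, thresholds, suffix list, denylist fragments, and return labels are illustrative guesses rather than the tuned heuristics the spec leaves open, and every collapsed segment gets the generic `{param}` placeholder instead of a named one.

```python
import re
from urllib.parse import urlsplit

# Assumed heuristics for "variable" path segments; thresholds are guesses.
_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_HEX_HASH = re.compile(r"^[0-9a-fA-F]{16,}$")
_OPAQUE = re.compile(r"^[A-Za-z0-9_-]{24,}$")  # may over-collapse long static slugs

STATIC_SUFFIXES = (".js", ".css", ".png", ".woff")
DENYLIST_FRAGMENTS = ("google-analytics", "segment", "sentry", "doubleclick")


def path_template(path):
    """Replace segments that look like ids, UUIDs, hashes, or tokens with {param}."""
    segments = []
    for segment in path.split("/"):
        if any(p.match(segment) for p in (_NUMERIC, _UUID, _HEX_HASH, _OPAQUE)):
            segments.append("{param}")
        else:
            segments.append(segment)
    return "/".join(segments)


def signature(method, url, status):
    """Build (method, host, path_template, status class); the query is ignored."""
    parts = urlsplit(url)
    return (
        method.upper(),
        parts.hostname,
        path_template(parts.path),
        f"{status // 100}xx",
    )


def dedup(flows):
    """Count (method, url, status) flows per signature."""
    counts = {}
    for method, url, status in flows:
        sig = signature(method, url, status)
        counts[sig] = counts.get(sig, 0) + 1
    return counts


def scope_decision(url, target_hosts):
    """Classify a URL for documentation purposes; archiving is unaffected."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in target_hosts:
        if parts.path.lower().endswith(STATIC_SUFFIXES):
            return "dropped_static"
        return "in_scope"
    if any(fragment in host for fragment in DENYLIST_FRAGMENTS):
        return "noted_third_party"
    return "out_of_scope"
```

With these rules, two requests that differ only in their ids share one signature, so only the first would trigger an enrichment call. Substring matching on the denylist is deliberately crude here; a real filter would more likely match registered domains.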
+
+## CLI / REPL UX
+
+Invocation:
+
+```
+auto-reverse [options]
+
+  --out DIR         output dir (default ./auto-reverse-out/-/)
+  --proxy-port N    mitmproxy listen port (default 8080)
+  --headless        run browser headless (default: headed, so you can watch)
+  --profile DIR     persistent browser profile (cookies persist across runs)
+  --gen-client      after the session, generate a typed httpx client from openapi.yaml
+  --model NAME      Claude model (default: claude-opus-4-8)
+  --scope HOST,...  extra in-scope hosts (added to target)
+  --no-llm-doc      deterministic docs only (zero doc-LLM cost)
+  --resume DIR      reopen a previous session's store/spec and keep going
+```
+
+REPL — plain chat plus a few `/` meta-commands handled locally (not sent to the
+LLM):
+
+```
+> map the checkout flow   ← natural-language intent
+/take       hand browser control to you; capture keeps running; /done to return
+/stop       interrupt the agent's current pursuit (keeps session alive)
+/flows [q]  print discovered endpoints (optionally filter) — local, no LLM
+/spec       show current openapi.yaml path + endpoint count
+/save       flush spec/markdown/archive now
+/help  /quit
+```
+
+- **Streaming**: agent replies and tool-call narration stream live.
+- **Take-over mechanics**: `/take` pauses the agent loop and surfaces the headed
+  browser to the human; mitmproxy keeps capturing into the same store; `/done`
+  resumes the agent, which first calls `flows.search` to catch up on what the
+  human did, then continues.
+- **Interrupt**: Ctrl-C / `/stop` cleanly interrupts mid-pursuit without killing
+  the session or losing captured data.
+
+## Error handling
+
+- **Browser action fails** (selector gone, navigation timeout): the tool returns
+  a structured error + fresh snapshot; the agent re-plans rather than crashing.
+  Bounded retries per action.
+- **LLM errors / rate limits**: exponential backoff in the agent loop;
+  doc-enrichment failures degrade gracefully to deterministic-only docs (the
+  endpoint is still recorded with a mechanical description) and are retried
+  later.
+- **Proxy/TLS**: first run installs/uses the mitmproxy CA; undecryptable flows
+  are logged to the archive and skipped for docs. Clear error if the port is in
+  use.
+- **Crash safety**: spec, markdown, and raw archive are written incrementally and
+  flushed on every new signature and on exit (including Ctrl-C via a signal
+  handler), so a mid-session crash never loses discovered endpoints. `--resume`
+  reopens them.
+- **Free-threading caveat**: the Flow Store is guarded by a lock; queues are
+  thread-safe. If a required C-extension lacks free-threaded wheels, the README
+  documents the fallback (run on the GIL-enabled interpreter).
+
+## Testing
+
+- **Unit (pure, no network/LLM):**
+  - `store`: signature templating (ids/UUIDs/hashes → `{param}`), dedup, sample
+    merging, scope filter allow/deny.
+  - `doc/schema`: schema inference + merge widening (optional fields, unions).
+  - `doc/openapi` + `doc/markdown`: golden-file output from canned flows.
+- **Tool layer**: browser/flows/doc tools tested against a Playwright-driven
+  **local fixture site** (a tiny Flask/Starlette app with a few JSON endpoints)
+  routed through a real embedded mitmproxy — verifies the full
+  capture→store→doc path with zero external dependencies and no LLM.
+- **Agent loop**: tested with a **mocked Claude client** returning scripted tool
+  calls, asserting the intent→action→observe cycle and graceful error re-planning.
+- **End-to-end smoke**: against the fixture site, assert a known endpoint lands in
+  `openapi.yaml` with the correct method/path-template/schema.
+- LLM-dependent enrichment is mocked in CI; a manual/optional live test is gated
+  behind an env flag.
+
+## Key dependencies
+
+- `playwright` — headed browser automation.
+- `mitmproxy` — embedded intercepting proxy (`DumpMaster` + addon).
+- `anthropic` — Claude API (tool-use loop + doc enrichment).
+- `genson` (or equivalent) — deterministic JSON Schema inference.
+- `openapi-python-client` (or equivalent) — optional `--gen-client` codegen.
+- All must be validated for Python 3.14 free-threaded wheel availability during
+  implementation; fallback documented if any are missing.
+
+## Open questions for implementation
+
+- Confirm free-threaded wheel availability for mitmproxy / playwright on 3.14t;
+  decide fallback interpreter if needed.
+- Exact compact-snapshot format the browser tools return to the agent
+  (accessibility tree vs. trimmed DOM) — tune for token cost vs. usefulness.
+- Path-template heuristics tuning (avoid over-collapsing legitimately distinct
+  static paths).
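For the first open question, a startup probe along these lines could report which interpreter mode the process actually ended up in. It is a hedged sketch: `free_threading_status` and its return labels are hypothetical, and it relies on `sysconfig.get_config_var("Py_GIL_DISABLED")` plus `sys._is_gil_enabled()` (available since Python 3.13), falling back to "GIL on" when the probe is absent.

```python
import sys
import sysconfig


def free_threading_status():
    """Report whether this process is really running without the GIL.

    A free-threaded build can still re-enable the GIL at runtime, for example
    after importing a C extension that does not declare free-threading
    support, so both the build flag and the runtime state are checked.
    """
    build_supports = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    probe = getattr(sys, "_is_gil_enabled", None)  # missing before Python 3.13
    gil_enabled = probe() if probe is not None else True
    if build_supports and not gil_enabled:
        return "free-threaded"
    if build_supports:
        return "free-threaded build, GIL re-enabled"
    return "gil"


if __name__ == "__main__":
    print(f"interpreter mode: {free_threading_status()}")
```

Running the probe after the heavy imports (proxy, browser driver, schema library) rather than before them would show whether any dependency silently switched the GIL back on, which is the case the documented fallback needs to cover.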