auto-reverse — Design

Date: 2026-05-31
Status: Approved (pending implementation plan)

Summary

auto-reverse is a conversational CLI that reverse-engineers a website's API by combining an LLM-driven headed browser (so you can watch and take over) with an embedded intercepting proxy that captures and documents real traffic in real time. You state intent in plain language ("map the checkout flow"); Claude pursues it through the browser; every real request is captured, deduplicated, and turned into a growing OpenAPI spec + markdown as it happens; Claude reports findings and you steer — back and forth — until the intent is covered.

It unifies two classic approaches into one tool:

  • Approach 3 — an agent drives a real (headed) browser, so a React/SPA behaves normally and runtime API calls actually fire.
  • Approach 5 — an in-process capture pipeline documents each new endpoint in real time as traffic flows.

The 3+5 split is preserved as an internal boundary: the Driver Agent only acts (browser) and queries/commands (flows, doc); capture + documentation run independently in the background and keep working even when a human drives.

Goals

  • Conversational, intent-driven exploration (not a blind crawler, not fire-and-forget). "Sensible length" is Claude's judgment against the user's stated intent.
  • Watchable: headed browser by default; human can grab control at any time (hybrid driving), with all traffic still captured.
  • Bounded LLM cost: documentation cost scales with distinct endpoints, not request volume.
  • Lossless capture: a raw archive retains everything even when the spec filters noise out.

Non-goals (v1)

  • Robust automated login. Auth is a configurable, pluggable concern with a stubbed default (manual-pause strategy); deeper strategies come later.
  • Defeating bot-detection / captcha / anti-automation.
  • Documenting non-HTTP protocols (websockets/gRPC) — recorded to archive but not modeled in v1.

Approach decision

Chosen: single agent, with browser + recon + doc all exposed as tools (approach "A" from brainstorming). One Claude tool-use loop is the brain; capture and deterministic schema inference run in background threads; the LLM enriches only new endpoint signatures. Rejected: two cooperating agents (more cost/orchestration, harder to keep coherent in one chat) and fully-deterministic docs (cheapest but mechanical descriptions — offered instead as a --no-llm-doc flag).

Architecture

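Single process, four concurrent roles sharing an in-memory flow store. The design originally targeted free-threaded Python 3.14 (3.14t, 3.14+freethreaded) so that the Python-side work (schema inference, doc generation, capture) could run in parallel without GIL contention. The implementation runs on standard CPython 3.14 instead, for the reasons recorded under "Decisions made during implementation".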

┌────────────────────────────────────────────────────────────────┐
│  auto-reverse  (one process, free-threaded)                      │
│                                                                  │
│  [main thread]  Chat REPL  ◄──── you type intent / steer         │
│       │              ▲                                           │
│       ▼              │ streamed replies + findings               │
│  ┌─────────────────────────────┐                                │
│  │  Driver Agent (Claude API,  │   tools:                        │
│  │  tool-use loop)             │   • browser.*  (act)            │
│  │                             │   • flows.*    (recon)          │
│  └───┬─────────────┬───────────┘   • doc.*      (document)       │
│      │ browser.*   │ flows.* / doc.*                             │
│      ▼             ▼                                             │
│  ┌─────────┐   ┌──────────────────────────────┐                 │
│  │Playwright│  │  Flow Store (locked)          │◄──┐             │
│  │ headed   │  │  dedup by signature, samples  │   │ push flows  │
│  │ browser  │  └───────────┬──────────────────┘   │             │
│  └────┬─────┘              │ new-signature events  │             │
│       │ proxied            ▼                       │             │
│       │            ┌───────────────┐               │             │
│       │            │ Doc Worker     │  genson schema│            │
│       │            │ [thread]       │  + LLM enrich │            │
│       │            └───────┬────────┘   (new only)  │            │
│       │                    ▼                        │            │
│       │            openapi.yaml + API.md            │            │
│       ▼                                             │            │
│  ┌──────────────────────────────────────────┐      │            │
│  │ mitmproxy DumpMaster [thread, asyncio]     │──────┘            │
│  │ + addon  →  raw archive (flows dump + HAR) │                  │
│  └──────────────────────────────────────────┘                  │
└────────────────────────────────────────────────────────────────┘

Thread roles

  • Main thread — chat REPL + Driver Agent tool-use loop (synchronous, easy to reason about).
  • mitmproxy thread — embedded DumpMaster on its own asyncio loop; a capture addon pushes each flow into the Flow Store and streams raw flows to disk.
  • Doc Worker thread(s) — consume new-signature events, run deterministic schema inference, call the LLM only to enrich novel endpoints, write spec/markdown.
  • Playwright's browser driver is a separate Node subprocess, so it sidesteps free-threaded C-extension concerns.
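The hand-off between the proxy thread and a Doc Worker thread can be sketched as below. This is a minimal illustration, not the project's code: CaptureAddon, doc_worker, and the STOP sentinel are hypothetical names, and the flow object is a stand-in built with SimpleNamespace rather than a real mitmproxy flow. The only mitmproxy behaviour assumed is that an addon is a plain object whose response method is called once per completed HTTP flow.

```python
import queue
import threading
from types import SimpleNamespace


class CaptureAddon:
    """mitmproxy-style addon: `response` runs once per completed HTTP flow."""

    def __init__(self, events: queue.Queue) -> None:
        self.events = events

    def response(self, flow) -> None:
        # Extract only the fields the downstream roles need.
        self.events.put(
            (flow.request.method, flow.request.host,
             flow.request.path, flow.response.status_code)
        )


def doc_worker(events: queue.Queue, documented: list, stop: object) -> None:
    """Consume captured flows until the stop sentinel arrives."""
    while True:
        item = events.get()
        if item is stop:
            break
        documented.append(item)  # placeholder for schema inference + enrichment


STOP = object()
events: queue.Queue = queue.Queue()
documented: list = []
worker = threading.Thread(
    target=doc_worker, args=(events, documented, STOP), daemon=True
)
worker.start()

# Simulate the proxy thread delivering one flow (hypothetical host and path).
fake_flow = SimpleNamespace(
    request=SimpleNamespace(method="GET", host="shop.example", path="/api/cart"),
    response=SimpleNamespace(status_code=200),
)
CaptureAddon(events).response(fake_flow)
events.put(STOP)
worker.join(timeout=5)
```

queue.Queue does its own locking, so neither side needs an explicit lock for the hand-off itself.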

Components (modules under src/auto_reverse/)

  • cli.py — entrypoint, arg parsing, wires everything, starts threads.
  • repl.py — chat loop, renders streamed agent output, handles steer/interrupt and the take-over keypress, dispatches / meta-commands locally.
  • agent.py — Claude tool-use loop; owns the conversation.
  • tools/browser.py, tools/flows.py, tools/doc.py — the three tool groups.
  • browser.py — Playwright launch (headed, proxied), take-over/release.
  • proxy.py — embedded mitmproxy master + capture addon.
  • store.py — thread-safe Flow Store: signature dedup, sample retention, scope filtering.
  • doc/schema.py — deterministic JSON Schema inference + merge.
  • doc/openapi.py — incremental OpenAPI assembly.
  • doc/markdown.py — human-readable API docs.
  • doc/client.py — optional typed httpx client generation from the spec.
  • config.py — config + the stubbed pluggable auth strategy.

Data flow

The intent → action → capture → doc cycle:

  1. You state intent in the REPL. It is added to the conversation; the Driver Agent takes the turn.
  2. Agent acts via browser.navigate / click / type etc. Each action returns a compact page snapshot (URL, accessibility-tree summary or trimmed DOM, visible interactive elements) — not raw HTML — so the agent reasons cheaply about the next step.
  3. Browser fires real requests through the proxy. The capture addon intercepts every flow regardless of who triggered it (agent or human).
  4. Flow Store ingests each flow: applies the scope filter, computes a signature, dedups, retains a bounded set of samples, and streams the raw flow to the archive on disk. New signatures emit an event.
  5. Doc Worker consumes new-signature events: infers/merges JSON Schema from samples (deterministic), and on first sighting of a signature calls the LLM once to name it, describe it, and group it. Writes openapi.yaml + API.md incrementally.
  6. Agent observes & reports: between actions it calls flows.search / flows.get to see what surfaced, then summarizes in chat (including noting filtered third-party calls).
  7. You steer: redirect, ask questions, approve, or take the mouse. The loop continues until the agent judges the intent covered, then it summarizes and awaits the next intent.
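Step 4 of the cycle above (ingest, dedup, bounded samples, new-signature event) might look roughly like this. It is a sketch under assumptions: the class layout, the max_samples default, and the method names are illustrative, and scope filtering and archive streaming are left out to keep it short.

```python
import queue
import threading
from collections import defaultdict


class FlowStore:
    """Thread-safe store: dedup by signature, bounded samples, new-signature events."""

    def __init__(self, max_samples: int = 5) -> None:
        self._lock = threading.Lock()
        self._samples = defaultdict(list)
        self.new_signatures: queue.Queue = queue.Queue()
        self.max_samples = max_samples

    def ingest(self, signature: tuple, sample: dict) -> bool:
        """Record one flow; return True if the signature had not been seen before."""
        with self._lock:
            is_new = signature not in self._samples
            bucket = self._samples[signature]
            if len(bucket) < self.max_samples:
                bucket.append(sample)  # keep at most max_samples per signature
        if is_new:
            self.new_signatures.put(signature)  # wakes a Doc Worker
        return is_new

    def samples(self, signature: tuple) -> list:
        """Return a copy so callers never mutate shared state."""
        with self._lock:
            return list(self._samples.get(signature, []))


store = FlowStore(max_samples=2)
sig = ("GET", "shop.example", "/api/cart", "2xx")
first = store.ingest(sig, {"items": []})
second = store.ingest(sig, {"items": [1]})
third = store.ingest(sig, {"items": [1, 2]})
```

Only the first ingest emits an event, which is what keeps the LLM call count tied to distinct signatures rather than to request volume.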

Dedup signature

signature = (method, host, path_template, response_status_class)
  • path_template collapses variable segments via heuristics (numeric ids, UUIDs, hashes, long opaque tokens → {param}), e.g. /api/users/4812/orders/99 → /api/users/{id}/orders/{id}.
  • Query params are recorded as parameters, not part of the signature.
  • A repeated signature triggers no LLM call; its body/response are merged into the existing schema samples (widening the schema, capturing optional fields).
  • Net effect: LLM doc cost scales with distinct endpoints, not request volume.
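A minimal sketch of the signature computation follows. The regular expressions and length thresholds (16+ hex characters, 24+ opaque characters) are assumptions chosen for illustration, not tuned values, and every variable segment is rendered as {id} to match the example above.

```python
import re
from urllib.parse import urlsplit

_UUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
_HEX = re.compile(r"^[0-9a-fA-F]{16,}$")        # hashes (assumed threshold)
_OPAQUE = re.compile(r"^[A-Za-z0-9_\-]{24,}$")  # long opaque tokens (assumed threshold)


def _is_variable(segment: str) -> bool:
    """True if the path segment looks like an id rather than a static name."""
    return bool(
        segment.isdigit()
        or _UUID.match(segment)
        or _HEX.match(segment)
        or _OPAQUE.match(segment)
    )


def path_template(path: str) -> str:
    """Collapse variable-looking segments into a {id} placeholder."""
    return "/".join("{id}" if _is_variable(s) else s for s in path.split("/"))


def signature(method: str, url: str, status: int) -> tuple:
    """(method, host, path_template, response_status_class); the query is ignored."""
    parts = urlsplit(url)
    return (
        method.upper(),
        parts.hostname,
        path_template(parts.path),
        f"{status // 100}xx",
    )
```

With these rules, GET requests to /api/users/4812?expand=1 and /api/users/77 that return 200 and 201 share one signature. The opaque-token rule is the one most likely to over-collapse long static slugs, so it would need tuning.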

Scope filtering

  • Default in-scope: same-site / same-origin XHR/fetch/document requests to the target host(s). Static assets (.js/.css/.png/.woff…) dropped.
  • Common third-party/analytics hosts (google-analytics, segment, stripe-js, sentry, doubleclick…) dropped by a default denylist but noted so the agent can mention them.
  • Configurable allowlist/denylist of hosts + path globs in config.py; the agent can also be told in chat to include/exclude a host.
  • Everything is still written to the raw archive even when filtered from the spec — filtering only affects what gets documented.
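The filtering rules above could be expressed as a single classification function. This is a sketch: the glob patterns in DEFAULT_DENYLIST and the suffix tuple are illustrative guesses, not the project's actual defaults, and classify is a hypothetical name. Whatever the verdict, the flow would still be written to the raw archive.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Illustrative defaults only; the real lists would live in config.py.
STATIC_SUFFIXES = (".js", ".css", ".png", ".woff")
DEFAULT_DENYLIST = ("*google-analytics*", "*segment*", "*sentry*", "*doubleclick*")


def classify(url: str, in_scope_hosts: set, denylist=DEFAULT_DENYLIST) -> str:
    """Return 'document', 'static', 'third-party', or 'out-of-scope'."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if any(fnmatch(host, pattern) for pattern in denylist):
        return "third-party"  # dropped from the spec but noted for the agent
    if host not in in_scope_hosts:
        return "out-of-scope"
    if parts.path.lower().endswith(STATIC_SUFFIXES):
        return "static"
    return "document"
```

Returning a label instead of a boolean lets the store count third-party calls separately, which is what allows the agent to mention them in chat.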

CLI / REPL UX

Invocation:

auto-reverse <target-url> [options]

  --out DIR            output dir (default ./auto-reverse-out/<host>-<timestamp>/)
  --proxy-port N       mitmproxy listen port (default 8080)
  --headless           run browser headless (default: headed, so you can watch)
  --profile DIR        persistent browser profile (cookies persist across runs)
  --gen-client         after the session, generate a typed httpx client from openapi.yaml
  --model NAME         Claude model (default: claude-opus-4-8)
  --scope HOST,...     extra in-scope hosts (added to target)
  --no-llm-doc         deterministic docs only (zero doc-LLM cost)
  --resume DIR         reopen a previous session's store/spec and keep going
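
The options above map directly onto argparse. The sketch below assumes a build_parser helper (a hypothetical name) and leaves --out as None so the caller can derive the <host>-<timestamp> default at runtime.

```python
import argparse


def _host_list(value: str) -> list:
    """Split a comma-separated host list, ignoring empty entries."""
    return [host for host in value.split(",") if host]


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="auto-reverse")
    p.add_argument("target_url")
    p.add_argument("--out", metavar="DIR", default=None)  # default derived later
    p.add_argument("--proxy-port", type=int, default=8080, metavar="N")
    p.add_argument("--headless", action="store_true")
    p.add_argument("--profile", metavar="DIR")
    p.add_argument("--gen-client", action="store_true")
    p.add_argument("--model", default="claude-opus-4-8", metavar="NAME")
    p.add_argument("--scope", type=_host_list, default=[], metavar="HOST,...")
    p.add_argument("--no-llm-doc", action="store_true")
    p.add_argument("--resume", metavar="DIR")
    return p


args = build_parser().parse_args(
    ["https://shop.example", "--scope", "api.shop.example,cdn.shop.example", "--no-llm-doc"]
)
```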

REPL — plain chat plus a few / meta-commands handled locally (not sent to the LLM):

> map the checkout flow                ← natural-language intent
/take         hand browser control to you; capture keeps running; /done to return
/stop         interrupt the agent's current pursuit (keeps session alive)
/flows [q]    print discovered endpoints (optionally filter) — local, no LLM
/spec         show current openapi.yaml path + endpoint count
/save         flush spec/markdown/archive now
/help  /quit
  • Streaming: agent replies and tool-call narration stream live.
  • Take-over mechanics: /take pauses the agent loop and surfaces the headed browser to the human; mitmproxy keeps capturing into the same store; /done resumes the agent, which first calls flows.search to catch up on what the human did, then continues.
  • Interrupt: Ctrl-C / /stop cleanly interrupts mid-pursuit without killing the session or losing captured data.
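
Routing input between local meta-commands and the agent can be sketched as follows. The route function and its three-part return value are assumptions made for illustration; only the command names come from the list above.

```python
LOCAL_COMMANDS = {"/take", "/done", "/stop", "/flows", "/spec", "/save", "/help", "/quit"}


def route(line: str) -> tuple:
    """Classify one REPL line as ('local' | 'error' | 'agent', head, rest)."""
    text = line.strip()
    if text.startswith("/"):
        name, _, arg = text.partition(" ")
        if name in LOCAL_COMMANDS:
            return ("local", name, arg.strip())  # handled without an LLM call
        return ("error", name, "unknown command")
    return ("agent", text, "")  # natural-language intent goes to the Driver Agent
```

Rejecting unknown slash commands locally avoids spending an LLM turn on a typo such as /flow.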

Error handling

  • Browser action fails (selector gone, navigation timeout): the tool returns a structured error + fresh snapshot; the agent re-plans rather than crashing. Bounded retries per action.
  • LLM errors / rate limits: exponential backoff in the agent loop; doc-enrichment failures degrade gracefully to deterministic-only docs (the endpoint is still recorded with a mechanical description) and are retried later.
  • Proxy/TLS: first run installs/uses the mitmproxy CA; undecryptable flows are logged to the archive and skipped for docs. Clear error if the port is in use.
  • Crash safety: spec, markdown, and raw archive are written incrementally and flushed on every new signature and on exit (including Ctrl-C via a signal handler), so a mid-session crash never loses discovered endpoints. --resume reopens them.
  • Free-threading caveat: the Flow Store is guarded by a lock; queues are thread-safe. If a required C-extension lacks free-threaded wheels, the README documents the fallback (run on the GIL-enabled interpreter).
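
The backoff-then-degrade behaviour for doc enrichment might be sketched like this. enrich_with_backoff is a hypothetical helper; it catches Exception broadly for brevity, whereas real code would catch only the SDK's rate-limit and transient errors. The "retried later" step is not shown.

```python
import random
import time


def enrich_with_backoff(call, fallback, retries=4, base_delay=0.5, sleep=time.sleep):
    """Try call() with exponential backoff; return fallback() if every attempt fails."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                break  # out of attempts: degrade instead of raising
            # Full jitter: wait a random time up to base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return fallback()


attempts = []


def flaky_llm_call():
    """Stand-in for an LLM request that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate limited")
    return "LLM description"


def always_failing():
    raise RuntimeError("down")


no_sleep = lambda seconds: None  # skip real waiting in the example
recovered = enrich_with_backoff(flaky_llm_call, lambda: "mechanical description", sleep=no_sleep)
degraded = enrich_with_backoff(always_failing, lambda: "mechanical description", sleep=no_sleep)
```

Passing sleep as a parameter keeps the helper testable without real delays.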

Testing

  • Unit (pure, no network/LLM):
    • store: signature templating (ids/UUIDs/hashes → {param}), dedup, sample merging, scope filter allow/deny.
    • doc/schema: schema inference + merge widening (optional fields, unions).
    • doc/openapi + doc/markdown: golden-file output from canned flows.
  • Tool layer: browser/flows/doc tools tested against a Playwright-driven local fixture site (a tiny Flask/Starlette app with a few JSON endpoints) routed through a real embedded mitmproxy — verifies the full capture→store→doc path with zero external dependencies and no LLM.
  • Agent loop: tested with a mocked Claude client returning scripted tool calls, asserting the intent→action→observe cycle and graceful error re-planning.
  • End-to-end smoke: against the fixture site, assert a known endpoint lands in openapi.yaml with the correct method/path-template/schema.
  • LLM-dependent enrichment is mocked in CI; a manual/optional live test is gated behind an env flag.
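
As an illustration of the doc/schema unit tests, the sketch below checks merge widening. The infer and merge functions are a hand-rolled stand-in covering a small JSON Schema subset; they are not genson's API and are only meant to show the properties such a test would assert.

```python
def infer(value) -> dict:
    """Infer a tiny JSON Schema subset from one JSON-like value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer(v) for k, v in value.items()},
            "required": sorted(value),
        }
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        return {"type": "array"}  # item schemas omitted for brevity
    raise TypeError(f"not JSON-like: {type(value).__name__}")


def _types(schema: dict) -> set:
    t = schema.get("type", [])
    return set(t) if isinstance(t, list) else {t}


def merge(a: dict, b: dict) -> dict:
    """Widen two schemas: union the types, intersect the required fields."""
    types = _types(a) | _types(b)
    merged = {"type": sorted(types) if len(types) > 1 else next(iter(types))}
    if "object" in types:
        props_a, props_b = a.get("properties", {}), b.get("properties", {})
        merged["properties"] = {
            key: merge(props_a[key], props_b[key])
            if key in props_a and key in props_b
            else props_a.get(key) or props_b[key]
            for key in sorted(props_a.keys() | props_b.keys())
        }
        # A field stays required only if both schemas required it.
        merged["required"] = sorted(
            set(a.get("required", [])) & set(b.get("required", []))
        )
    return merged


def test_merge_widens_optional_fields_and_unions():
    merged = merge(infer({"id": 1, "coupon": "SAVE10"}), infer({"id": 2, "total": 9.5}))
    assert merged["required"] == ["id"]
    assert sorted(merged["properties"]) == ["coupon", "id", "total"]
    union = merge(infer({"id": 1}), infer({"id": "abc"}))
    assert union["properties"]["id"]["type"] == ["integer", "string"]
```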

Key dependencies

  • playwright — headed browser automation.
  • mitmproxy — embedded intercepting proxy (DumpMaster + addon).
  • anthropic — Claude API (tool-use loop + doc enrichment).
  • genson (or equivalent) — deterministic JSON Schema inference.
  • openapi-python-client (or equivalent) — optional --gen-client codegen.
  • All must be validated for Python 3.14 free-threaded wheel availability during implementation; fallback documented if any are missing.

Decisions made during implementation

  • Runtime: standard CPython 3.14 (not free-threaded). mitmproxy cannot run on 3.14t: its aioquic and mitmproxy-rs deps ship only Limited-API (abi3) wheels, which the free-threaded build rejects, and source builds fail. The workload is entirely I/O-bound (asyncio proxy loop, agent loop on network/LLM, Playwright as a separate Node subprocess), so free-threading offered no practical benefit. The threading architecture above is unchanged; threads run under the GIL, which is fine for I/O-bound work.
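
A startup check along these lines could detect an accidental launch on a free-threaded interpreter. describe_runtime is a hypothetical helper, and what the CLI does with the result (warn, refuse to start) is left open here.

```python
import sys
import sysconfig


def describe_runtime() -> dict:
    """Report whether this interpreter is a free-threaded build and whether the GIL is on."""
    # Py_GIL_DISABLED is 1 only on free-threaded builds; it is 0 or None elsewhere.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from CPython 3.13; assume the GIL is on when absent.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


runtime = describe_runtime()
```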

Open questions for implementation

  • Exact compact-snapshot format the browser tools return to the agent (accessibility tree vs. trimmed DOM) — tune for token cost vs. usefulness.
  • Path-template heuristics tuning (avoid over-collapsing legitimately distinct static paths).