feat: add litellm client adapter, JSONL flow detail, and sample output
@@ -0,0 +1,70 @@
# CLAUDE.md — auto-reverse

## Active Context

- Swapped `anthropic` SDK → `litellm` for multi-provider support (OpenRouter, mimo, etc.)
- Added `archive.jsonl` format to persist full request/response details
- Default model: `openrouter/xiaomi/mimo-v2.5-pro`
- Successfully reverse-engineered URA GLS APIs from `eservice.ura.gov.sg`

## Key Learnings

### JS Source Analysis Is The Real Breakthrough
The MITM proxy captures traffic, but the **JavaScript source files** contain the real API documentation:
- Auth flows (JSONP tokens, Bearer headers, cookie patterns)
- Endpoint URLs and query structures
- Field names and data types
- Service registries and feature flags

**Always fetch and analyze JS source files** after capturing traffic. The files to look for:
- `*Service.js` — service-specific logic and API calls
- `Env.js` or `*Config.js` — host URLs, environment settings
- `*API.js` — auth patterns, token management
- `*Controller.js` — orchestration, service registry
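
A minimal sketch of how the file-name patterns above could drive a first analysis pass. The function names and the URL-literal regex are illustrative assumptions, not part of the tool; the regex is a rough heuristic that will miss URLs built by string concatenation.

```python
import re
from fnmatch import fnmatchcase

# File-name globs taken from the list above.
PRIORITY_GLOBS = ("*Service.js", "Env.js", "*Config.js", "*API.js", "*Controller.js")

# Assumed heuristic: a quoted string that starts with http(s):// or "/".
URL_LITERAL = re.compile(r"""["'`]((?:https?://|/)[^"'`\s]{3,})["'`]""")


def is_priority_file(url: str) -> bool:
    """Return True when the last path segment matches one of the globs."""
    filename = url.split("?", 1)[0].rsplit("/", 1)[-1]
    return any(fnmatchcase(filename, glob) for glob in PRIORITY_GLOBS)


def extract_url_literals(js_source: str) -> list[str]:
    """Return unique URL-like string literals in order of first appearance."""
    seen: dict[str, None] = {}
    for match in URL_LITERAL.finditer(js_source):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Running `is_priority_file` over the script URLs seen by the proxy gives a short list to fetch first; `extract_url_literals` then surfaces candidate endpoints for manual review.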

### Archive Format Must Capture Everything
The original `archive.log` only saved `METHOD host/path status`, which is not enough to replay a request.
The new `archive.jsonl` captures:
- Full request headers (cookies, auth tokens, referer)
- Request/query parameters
- Response headers (set-cookie, content-type)
- Response body (full JSON)
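
One way a record with these fields could be serialized, one JSON object per line. The field names (`request_headers`, `response_body`, and so on) and helper names are assumptions made for illustration; the actual `archive.jsonl` schema may differ.

```python
import json


def build_record(method, url, query, request_headers,
                 status, response_headers, response_body) -> dict:
    """Collect one captured flow into a plain dict (illustrative field names)."""
    return {
        "method": method,
        "url": url,
        "query": dict(query),
        "request_headers": dict(request_headers),
        "status": status,
        "response_headers": dict(response_headers),
        "response_body": response_body,
    }


def to_jsonl_line(record: dict) -> str:
    """Serialize a record as a single line; json.dumps escapes embedded newlines."""
    return json.dumps(record, ensure_ascii=False) + "\n"


def append_record(path: str, record: dict) -> None:
    """Append one record to the archive file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(to_jsonl_line(record))


def parse_jsonl(text: str) -> list[dict]:
    """Parse archive text back into records, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

Appending a line per flow keeps the archive readable with standard line-oriented tools and lets a crashed run keep everything written so far.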

### LLM Agent Is Optional Overhead
For API reverse-engineering, the real value comes from:
1. **Proxy capture** (mitmproxy) — discovers endpoints
2. **JS source analysis** — reveals auth, structure, fields
3. **Standard API patterns** (ArcGIS REST, etc.) — enables replay

The LLM agent driving the browser adds cost and latency but wasn't essential for the URA workflow. Consider a "no-llm" mode that only captures and analyzes.

### Geo-blocking Awareness
Singapore government sites (URA, HDB, etc.) use an Azure Application Gateway WAF that blocks non-SG IPs. The tool should:
- Detect 403 responses from WAFs and report geo-blocking
- Use the browser's context (same proxy) to fetch APIs rather than direct `requests.get()` calls
- Document that the browser must be on an authorized IP
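
A rough sketch of the 403 detection step. The marker string is an assumption based on the default Application Gateway error page and should be checked against real captured responses; a 403 alone does not prove geo-blocking, so the message is worded as a hint.

```python
# Assumed marker: text commonly seen in the gateway's default 403 page.
WAF_MARKERS = ("microsoft-azure-application-gateway",)


def looks_like_waf_block(status: int, headers: dict[str, str], body: str) -> bool:
    """Heuristic: a 403 whose Server header or body names a known gateway."""
    if status != 403:
        return False
    server = next((v for k, v in headers.items() if k.lower() == "server"), "")
    haystack = f"{server}\n{body}".lower()
    return any(marker in haystack for marker in WAF_MARKERS)


def describe_block(url: str, status: int, headers: dict[str, str], body: str) -> str:
    """Return a short report line, or an empty string when nothing looks blocked."""
    if looks_like_waf_block(status, headers, body):
        return f"{url}: 403 from a WAF, possibly geo-blocked; retry from an authorized IP"
    return ""
```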

### ArcGIS REST Services Are Common
Government map sites often use ArcGIS REST services. Standard patterns:
```
GET /arcgis/rest/services/<name>/MapServer/<layer_id>?f=json  # metadata
GET /arcgis/rest/services/<name>/MapServer/<layer_id>/query?where=1=1&outFields=*&f=json  # data
GET /arcgis/rest/services/<name>/MapServer/export?...  # rendered map image
```
The tool should auto-detect ArcGIS endpoints and suggest these queries.
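
A possible shape for that auto-detection, assuming a captured URL is the only input. The regex and the function name `suggest_arcgis_queries` are illustrative; only `MapServer` paths are handled here.

```python
import re
from urllib.parse import urlencode, urlsplit, urlunsplit

# Matches ".../rest/services/<name>/MapServer" with an optional numeric layer id.
ARCGIS_PATH = re.compile(
    r"^(?P<service>.*/rest/services/.+?/MapServer)(?:/(?P<layer>\d+))?(?:/|$)",
    re.IGNORECASE,
)


def suggest_arcgis_queries(url: str) -> list[str]:
    """Return follow-up URLs for an ArcGIS MapServer URL, or [] if it is not one."""
    parts = urlsplit(url)
    match = ARCGIS_PATH.match(parts.path)
    if match is None:
        return []

    def build(path: str, params: dict) -> str:
        return urlunsplit((parts.scheme, parts.netloc, path, urlencode(params), ""))

    service = match.group("service")
    suggestions = [build(service, {"f": "json"})]  # service metadata
    layer = match.group("layer")
    if layer is not None:
        layer_path = f"{service}/{layer}"
        suggestions.append(build(layer_path, {"f": "json"}))  # layer metadata
        suggestions.append(
            build(f"{layer_path}/query",
                  {"where": "1=1", "outFields": "*", "f": "json"})  # layer data
        )
    return suggestions
```

The query string is percent-encoded by `urlencode`, so `where=1=1` appears as `where=1%3D1` in the suggested URL.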

### Auth Pattern Detection Needed
The tool should automatically detect and document:
- JSONP token endpoints (strip callback wrapper, extract JWT)
- Bearer token auth headers
- Cookie-based sessions
- AWS Signature auth (different from simple Bearer)
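
The checks above could start from something like the following sketch. The helper names and labels are hypothetical, the JWT regex relies on the common `eyJ` prefix of a base64url-encoded JSON header, and the AWS check assumes Signature Version 4, whose `Authorization` value begins with `AWS4-HMAC-SHA256`.

```python
import json
import re

# callbackName({...}); with an optional trailing semicolon.
JSONP_WRAPPER = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)
# Three base64url segments; the first usually starts with "eyJ".
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def strip_jsonp(body: str):
    """Remove a JSONP callback wrapper if present and parse the JSON payload."""
    match = JSONP_WRAPPER.match(body)
    return json.loads(match.group(1) if match else body)


def find_jwt(text: str):
    """Return the first JWT-looking token in text, or None."""
    match = JWT_PATTERN.search(text)
    return match.group(0) if match else None


def classify_auth(request_headers: dict[str, str]) -> list[str]:
    """Label the auth mechanisms visible in one request's headers."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    found = []
    authorization = headers.get("authorization", "")
    if authorization.startswith("AWS4-HMAC-SHA256"):
        found.append("aws-sigv4")
    elif authorization.lower().startswith("bearer "):
        found.append("bearer")
    if "cookie" in headers:
        found.append("cookie")
    return found
```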

## Improvements To Make

1. **Auto-analyze captured JS files** — extract API endpoints, auth patterns, headers
2. **Export replay scripts** — generate `requests.get()` or `curl` from captured flows
3. **ArcGIS-aware analysis** — detect MapServer endpoints, auto-query metadata
4. **Auth pattern detection** — notice JSONP tokens, Bearer headers, cookies
5. **Request/response diffing** — compare same endpoint with different params
6. **Skip LLM for simple sites** — proxy + JS analysis mode without agent
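
As a starting point for item 2, a sketch that turns one captured flow into a `curl` command. It assumes each record is a dict with `method`, `url`, `query`, and `request_headers` keys; those names are hypothetical and would need to match the real archive schema.

```python
import shlex
from urllib.parse import urlencode


def to_curl(record: dict) -> str:
    """Render a captured request as a shell-safe curl command."""
    url = record["url"]
    if record.get("query"):
        url = f"{url}?{urlencode(record['query'])}"
    parts = ["curl", "-X", record["method"], shlex.quote(url)]
    for name, value in record.get("request_headers", {}).items():
        # Each header is quoted as a whole so cookies and tokens survive the shell.
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    return " ".join(parts)
```

Headers are emitted in captured order; request bodies are left out of this sketch and would need a `--data` argument for non-GET flows.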