docs: add AGENTS.md for leetcode extractor and update active context

2026-06-01 02:08:45 +08:00
parent b4f25ab87b
commit 142f2469ec
2 changed files with 90 additions and 3 deletions
@@ -0,0 +1,85 @@
# AGENTS.md — leetcode/
## What This Is
An idempotent extractor that pulls the NeetCode roadmap dependency graph
and problem list from the live site (neetcode.io). Outputs structured
JSON, Graphviz DOT, and Emacs org-mode files.
## How It Works
NeetCode is an Angular SPA. The data we need is split across lazy-loaded
JS chunks:
1. **HTML** (`/roadmap`) — contains the `<script>` tags pointing to the
runtime and main bundle filenames (content-hashed).
2. **Runtime JS** — maps chunk IDs to content hashes:
`7669:"fc6133d290d8d0ad"`.
3. **Main bundle** (`main.*.js`) — contains all ~965 problems with
fields: `problem`, `pattern`, `link`, `difficulty`, `code`, flags
(`neetcode150`, `blind75`, `neetcode250`, `premium`).
4. **Chunk 7669** — contains the **graph nodes** (`id`, `name`,
`parentId[]`) and course-to-topic mappings. The `parentId` array
is the edge list — each entry points to a prerequisite topic.
The script (`extract.mjs`) resolves the hashed filenames at runtime,
downloads the chunks, and regex-extracts the data structures.
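
The resolution step above can be sketched roughly as follows. This is a minimal illustration, not the code in `extract.mjs`: the helper names `findBundleNames` and `findChunkHash`, the sample HTML, and the assumption that hashes are lowercase hex are all hypothetical.

```javascript
// Hypothetical helpers; the real extract.mjs may use different names and regexes.

// Find content-hashed bundle filenames such as "main.<hash>.js" in the HTML.
// ASSUMPTION: the hash is lowercase hex and the src attribute has no path prefix.
function findBundleNames(html) {
  const pick = (prefix) => {
    const m = html.match(new RegExp(`src="(${prefix}\\.[0-9a-f]+\\.js)"`));
    return m ? m[1] : null;
  };
  return { runtime: pick('runtime'), main: pick('main') };
}

// Look up a chunk's content hash in the runtime's id-to-hash map, which
// contains entries shaped like 7669:"fc6133d290d8d0ad".
function findChunkHash(runtimeJs, chunkId) {
  // \b keeps 7669 from matching inside a longer id such as 17669.
  const m = runtimeJs.match(new RegExp(`\\b${chunkId}:"([0-9a-f]+)"`));
  return m ? m[1] : null;
}

// Made-up inputs for illustration only.
const html =
  '<script src="runtime.0a1b2c3d.js" type="module"></script>' +
  '<script src="main.4e5f6a7b.js" type="module"></script>';
const runtimeJs = '({17669:"aaaa",7669:"fc6133d290d8d0ad"})[e]';

console.log(findBundleNames(html));
console.log(findChunkHash(runtimeJs, 7669));
```

Returning `null` on a miss lets the caller report which file could not be resolved instead of failing later on an undefined URL.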
## Running
```bash
node extract.mjs # writes to ./out/
node extract.mjs --stdout # prints full JSON to stdout
node extract.mjs --cache /tmp/nc # custom cache directory
```
Downloads are cached in `.cache/` by default (gitignored). Re-runs read
from the cache, so they are near-instant and produce byte-identical output.
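
As a rough sketch of how the two flags shown above might be handled, assuming a hand-rolled parser: `parseFlags` and its option names are hypothetical, and the real script may parse arguments differently.

```javascript
// Hypothetical parser; the defaults mirror the paths named in this document.
function parseFlags(argv) {
  const opts = { stdout: false, cacheDir: '.cache' };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === '--stdout') {
      opts.stdout = true;
    } else if (argv[i] === '--cache') {
      const dir = argv[i + 1];
      if (dir === undefined) throw new Error('--cache requires a directory');
      opts.cacheDir = dir;
      i++; // skip the value that was just consumed
    } else {
      throw new Error(`unknown argument: ${argv[i]}`);
    }
  }
  return opts;
}

// In a script this would be called as parseFlags(process.argv.slice(2)).
console.log(parseFlags(['--cache', '/tmp/nc']));
```

Rejecting unknown arguments is a design choice in this sketch; it surfaces typos such as `--cahce` instead of silently falling back to the default directory.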
## Output Files
| File | Contents |
|------|----------|
| `out/roadmap.json` | Full data: graph, all ~965 problems, courses |
| `out/roadmap-neetcode150.json` | NeetCode 150 only (199 problems) |
| `out/roadmap.dot` | Graphviz DOT (render with `dot -Tsvg`) |
| `out/roadmap.org` | Org-mode with `TODO` checklists, Python/C++ links |
| `neetcode-roadmap-graph.json` | Standalone edge list (manual copy) |
| `neetcode-roadmap.dot` | Standalone DOT (manual copy) |
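
For orientation, a DOT file like `out/roadmap.dot` could be produced from the graph nodes along these lines. This is a sketch under assumptions: `toDot` is a hypothetical name, the sample ids are made up, and the attributes the real script emits are not known here.

```javascript
// Hypothetical renderer. Node shape follows the `id`, `name`, `parentId[]`
// fields of the graph nodes; edges run from prerequisite to dependent topic.
function toDot(nodes) {
  // Quote and escape a value so it is a valid DOT string.
  const quote = (s) =>
    `"${String(s).replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;

  const lines = ['digraph roadmap {', '  rankdir=TB;'];
  for (const n of nodes) {
    lines.push(`  ${quote(n.id)} [label=${quote(n.name)}];`);
  }
  for (const n of nodes) {
    for (const parent of n.parentId) {
      lines.push(`  ${quote(parent)} -> ${quote(n.id)};`);
    }
  }
  lines.push('}');
  return lines.join('\n') + '\n';
}

// Made-up sample; real ids may be numbers or strings.
const sample = [
  { id: '1', name: 'Arrays & Hashing', parentId: [] },
  { id: '2', name: 'Two Pointers', parentId: ['1'] },
];
console.log(toDot(sample));
```

Emitting nodes and edges in input order keeps the output stable between runs, which matters if the file is expected to be byte-identical.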
## The Dependency Graph
18 topics, 21 edges, topologically ordered:
```
Arrays & Hashing
├── Two Pointers
│   ├── Sliding Window
│   ├── Linked List → Trees
│   └── Binary Search → Trees
│       ├── Tries
│       ├── Heap / Priority Queue → Intervals, Greedy, Advanced Graphs
│       └── Backtracking
│           ├── Graphs → Advanced Graphs, 2-D DP, Math & Geometry
│           └── 1-D Dynamic Programming → 2-D DP, Bit Manipulation
└── Stack
```
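
To illustrate how `parentId` arrays yield an edge list and a topological order, here is a minimal sketch using Kahn's algorithm. The function names are hypothetical, the sample uses topic names as ids for readability (the real ids are an assumption), and the extractor may order topics by some other method.

```javascript
// Each parentId entry becomes one [prerequisite, topic] edge.
function toEdges(nodes) {
  return nodes.flatMap((n) => n.parentId.map((p) => [p, n.id]));
}

// Kahn's algorithm: repeatedly emit topics whose prerequisites are all emitted.
// ASSUMPTION: every id listed in a parentId array also appears as a node.
function topoOrder(nodes) {
  const indegree = new Map(nodes.map((n) => [n.id, n.parentId.length]));
  const children = new Map(nodes.map((n) => [n.id, []]));
  for (const [from, to] of toEdges(nodes)) children.get(from).push(to);

  const ready = nodes.filter((n) => n.parentId.length === 0).map((n) => n.id);
  const order = [];
  while (ready.length > 0) {
    const id = ready.shift();
    order.push(id);
    for (const child of children.get(id)) {
      indegree.set(child, indegree.get(child) - 1);
      if (indegree.get(child) === 0) ready.push(child);
    }
  }
  if (order.length !== nodes.length) throw new Error('cycle in graph');
  return order;
}

// A five-topic slice of the roadmap, deliberately listed out of order.
const nodes = [
  { id: 'Trees', parentId: ['Linked List', 'Binary Search'] },
  { id: 'Arrays & Hashing', parentId: [] },
  { id: 'Two Pointers', parentId: ['Arrays & Hashing'] },
  { id: 'Linked List', parentId: ['Two Pointers'] },
  { id: 'Binary Search', parentId: ['Two Pointers'] },
];
console.log(toEdges(nodes).length);
console.log(topoOrder(nodes));
```

A topic with two prerequisites, such as Trees here, is emitted only after both have appeared, which is why it shows up under more than one branch in the tree above.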
## Org-Mode Format
Each topic is a `* TODO` heading with a `[/]` cookie for progress.
Problems are `- [ ] TODO` items with difficulty tags (`:easy:`,
`:medium:`, `:hard:`). Python and C++ solution links are nested
`- [ ] TODO` sub-items. LeetCode and video links are plain list items.
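
The layout described above might be rendered roughly like this. Treat it as a guess at the shape rather than the script's actual output: `renderTopic`, the `pythonUrl`, `cppUrl`, `leetcodeUrl`, and `videoUrl` field names, the nesting of the plain link items, and the example URLs are all hypothetical.

```javascript
// Org-mode link syntax: [[url][label]].
function orgLink(url, label) {
  return `[[${url}][${label}]]`;
}

// Hypothetical renderer for one topic. `problem` and `difficulty` are fields
// named in this document; the URL fields are invented for illustration.
function renderTopic(topic) {
  const lines = [`* TODO ${topic.name} [/]`];
  for (const p of topic.problems) {
    lines.push(`- [ ] TODO ${p.problem} :${p.difficulty.toLowerCase()}:`);
    lines.push(`  - [ ] TODO ${orgLink(p.pythonUrl, 'Python')}`);
    lines.push(`  - [ ] TODO ${orgLink(p.cppUrl, 'C++')}`);
    lines.push(`  - ${orgLink(p.leetcodeUrl, 'LeetCode')}`);
    lines.push(`  - ${orgLink(p.videoUrl, 'Video')}`);
  }
  return lines.join('\n') + '\n';
}

// Placeholder data; the URLs are not real.
const topic = {
  name: 'Arrays & Hashing',
  problems: [
    {
      problem: 'Two Sum',
      difficulty: 'Easy',
      pythonUrl: 'https://example.com/two-sum.py',
      cppUrl: 'https://example.com/two-sum.cpp',
      leetcodeUrl: 'https://example.com/leetcode/two-sum',
      videoUrl: 'https://example.com/video/two-sum',
    },
  ],
};
console.log(renderTopic(topic));
```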
## Updating
Re-run `node extract.mjs`. It fetches data from the site, reusing any
downloads already cached locally. If NeetCode changes its chunk
structure, the regexes in `extractGraphNodes()` and `extractProblems()`
will need updating.
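
To show the kind of regex that would need attention, here is a hedged sketch of node extraction. It is not the body of `extractGraphNodes()`: the function name `extractGraphNodesSketch`, the pattern, and the assumed minified shape `{id:"...",name:"...",parentId:[...]}` are all guesses for illustration.

```javascript
// ASSUMPTION: nodes appear in the chunk as object literals shaped like
//   {id:"0",name:"Arrays & Hashing",parentId:[]}
// The real minified output, and the real extractGraphNodes(), may differ.
function extractGraphNodesSketch(chunkJs) {
  const nodeRe = /\{id:"([^"]+)",name:"([^"]+)",parentId:\[([^\]]*)\]\}/g;
  const nodes = [];
  for (const m of chunkJs.matchAll(nodeRe)) {
    // m[3] is the raw array body, e.g. `"0","1"`, or empty for a root topic.
    const parentId =
      m[3] === '' ? [] : m[3].split(',').map((s) => s.replace(/"/g, ''));
    nodes.push({ id: m[1], name: m[2], parentId });
  }
  if (nodes.length === 0) {
    // Fail loudly so a format change is noticed instead of writing empty output.
    throw new Error('no graph nodes matched; the chunk format may have changed');
  }
  return nodes;
}

// Made-up chunk text for illustration.
const chunk =
  'x=[{id:"0",name:"Arrays & Hashing",parentId:[]},' +
  '{id:"1",name:"Two Pointers",parentId:["0"]}]';
console.log(extractGraphNodesSketch(chunk));
```

Throwing when nothing matches is the useful part of the sketch: a silent empty result would otherwise look like a successful run.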
## Dependencies
None. Uses only Node.js built-in modules (`fs`, `path`, `url`) and the
global `fetch`. Requires Node 18+ for native `fetch`.
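
A script could guard against older runtimes with a check like the one below. This is an optional sketch; `requireNode18` is a hypothetical helper and nothing here says `extract.mjs` performs such a check.

```javascript
// Hypothetical guard: throws a readable error on runtimes older than Node 18.
function requireNode18(nodeVersion = process.versions.node) {
  const major = Number.parseInt(nodeVersion.split('.')[0], 10);
  if (major < 18) {
    throw new Error(`Node 18+ is required for native fetch; found ${nodeVersion}`);
  }
  return major;
}

// Called with no argument, it checks the running process instead.
console.log(requireNode18('20.11.1'));
```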