docs: add AGENTS.md for leetcode extractor and update active context
@@ -0,0 +1,85 @@
# AGENTS.md — leetcode/

## What This Is

An idempotent extractor that pulls the NeetCode roadmap dependency graph
and problem list from the live site (neetcode.io). Outputs structured
JSON, Graphviz DOT, and Emacs org-mode files.

## How It Works

NeetCode is an Angular SPA. The data we need is split across the page
HTML and several JS files, some of them lazy-loaded chunks:

1. **HTML** (`/roadmap`): contains the `<script>` tags pointing to the
   runtime and main bundle filenames (content-hashed).
2. **Runtime JS**: maps chunk IDs to content hashes, e.g.
   `7669:"fc6133d290d8d0ad"`.
3. **Main bundle** (`main.*.js`): contains all ~965 problems with the
   fields `problem`, `pattern`, `link`, `difficulty`, `code`, and the
   flags `neetcode150`, `blind75`, `neetcode250`, `premium`.
4. **Chunk 7669**: contains the **graph nodes** (`id`, `name`,
   `parentId[]`) and course-to-topic mappings. The `parentId` array
   is the edge list: each entry points to a prerequisite topic.

The script (`extract.mjs`) resolves the hashed filenames on each run,
downloads the chunks, and regex-extracts the data structures.
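
The filename resolution described above can be sketched roughly as follows. This is a minimal illustration, not the code in `extract.mjs`: the helper names `findBundles` and `findChunkHash`, the sample HTML, and the sample runtime text are all assumptions, shaped only after the `main.*.js` pattern and the `7669:"fc6133d290d8d0ad"` entry mentioned above.

```javascript
// Sketch only: the HTML and runtime snippets below are made-up stand-ins,
// not real neetcode.io output, and both helpers are hypothetical.

// Find content-hashed bundle filenames such as "main.4e5f6a7b.js" in the HTML.
function findBundles(html) {
  const found = {};
  const scriptTag = /<script[^>]*\ssrc="((runtime|main)\.[0-9a-f]+\.js)"/g;
  for (const match of html.matchAll(scriptTag)) {
    found[match[2]] = match[1]; // key is "runtime" or "main"
  }
  return found;
}

// Look up the content hash for one chunk ID in the runtime's ID-to-hash map.
function findChunkHash(runtimeJs, chunkId) {
  const entry = new RegExp(`\\b${chunkId}:"([0-9a-f]+)"`);
  const match = runtimeJs.match(entry);
  if (!match) throw new Error(`chunk ${chunkId} not found in runtime map`);
  return match[1];
}

const sampleHtml =
  '<script src="runtime.0a1b2c3d.js" type="module"></script>' +
  '<script src="main.4e5f6a7b.js" type="module"></script>';
const sampleRuntime = '({1234:"aaaa0000bbbb1111",7669:"fc6133d290d8d0ad"})';

const bundles = findBundles(sampleHtml);
const chunkHash = findChunkHash(sampleRuntime, 7669);
console.log(bundles.main, chunkHash);
```

Throwing when the chunk ID is missing makes a site-side change visible immediately instead of producing an empty download.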

## Running

```bash
node extract.mjs                  # writes to ./out/
node extract.mjs --stdout         # prints full JSON to stdout
node extract.mjs --cache /tmp/nc  # custom cache directory
```

Downloads are cached in `.cache/` by default (gitignored). Re-runs read
from the cache, finish almost instantly,
and produce byte-identical output.
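
Byte-identical output depends on serializing the same data the same way every time. One possible way to get that, shown here as a sketch under assumptions, is to sort object keys before calling `JSON.stringify`. The helpers `sortKeys` and `stableStringify` are hypothetical and may not match how `extract.mjs` writes its files.

```javascript
// Sketch only: hypothetical helpers, not taken from extract.mjs.

// Return a copy of the value with object keys in sorted order, recursively.
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys); // array order is kept
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
}

// Serialize with sorted keys, fixed indentation, and a trailing newline.
function stableStringify(data) {
  return JSON.stringify(sortKeys(data), null, 2) + "\n";
}

const runA = stableStringify({ problem: "Two Sum", difficulty: "Easy" });
const runB = stableStringify({ difficulty: "Easy", problem: "Two Sum" });
console.log(runA === runB); // true
```

Array order is left untouched on purpose, since the order of problems and topics carries meaning.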

## Output Files

| File | Contents |
|------|----------|
| `out/roadmap.json` | Full data: graph, all 965 problems, courses |
| `out/roadmap-neetcode150.json` | NeetCode 150 only (199 problems) |
| `out/roadmap.dot` | Graphviz DOT (render with `dot -Tsvg`) |
| `out/roadmap.org` | Org-mode with `TODO` checklists, Python/C++ links |
| `neetcode-roadmap-graph.json` | Standalone edge list (manual copy) |
| `neetcode-roadmap.dot` | Standalone DOT (manual copy) |
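
To illustrate how the DOT file relates to the graph data, here is a minimal sketch that turns nodes carrying `id`, `name`, and `parentId` into Graphviz DOT. The function `toDot`, the `n<id>` node naming, the numeric sample ids, and the edge direction (prerequisite to dependent topic) are assumptions for illustration; the real `out/roadmap.dot` may be laid out differently.

```javascript
// Sketch only: sample nodes and the toDot helper are hypothetical.
const sampleNodes = [
  { id: 0, name: "Arrays & Hashing", parentId: [] },
  { id: 1, name: "Two Pointers", parentId: [0] },
  { id: 2, name: "Stack", parentId: [0] },
];

function toDot(nodes) {
  const lines = ["digraph roadmap {"];
  // One labelled node per topic; escape backslashes and quotes in labels.
  for (const node of nodes) {
    const label = node.name.replace(/["\\]/g, "\\$&");
    lines.push(`  n${node.id} [label="${label}"];`);
  }
  // One edge per parentId entry, pointing from prerequisite to topic.
  for (const node of nodes) {
    for (const parent of node.parentId) {
      lines.push(`  n${parent} -> n${node.id};`);
    }
  }
  lines.push("}");
  return lines.join("\n") + "\n";
}

const dot = toDot(sampleNodes);
console.log(dot);
```

Output of this shape can be rendered with the `dot -Tsvg` command listed in the table.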

## The Dependency Graph

18 topics, 21 edges, topologically ordered:

```
Arrays & Hashing
├── Two Pointers
│   ├── Sliding Window
│   ├── Linked List → Trees
│   └── Binary Search → Trees
│       ├── Tries
│       ├── Heap / Priority Queue → Intervals, Greedy, Advanced Graphs
│       └── Backtracking
│           ├── Graphs → Advanced Graphs, 2-D DP, Math & Geometry
│           └── 1-D Dynamic Programming → 2-D DP, Bit Manipulation
└── Stack
```
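
The graph above is described as topologically ordered. A minimal sketch of how such an order could be derived from `parentId` arrays uses Kahn's algorithm. The function `topoOrder`, the string ids, and the five-node sample (a slice of the tree above, deliberately shuffled) are illustrative assumptions, not the extractor's actual code.

```javascript
// Sketch only: hypothetical helper and sample data.
const shuffledNodes = [
  { id: "trees", name: "Trees", parentId: ["linked-list", "binary-search"] },
  { id: "binary-search", name: "Binary Search", parentId: ["two-pointers"] },
  { id: "arrays", name: "Arrays & Hashing", parentId: [] },
  { id: "linked-list", name: "Linked List", parentId: ["two-pointers"] },
  { id: "two-pointers", name: "Two Pointers", parentId: ["arrays"] },
];

// Kahn's algorithm: repeatedly emit nodes whose prerequisites are all emitted.
function topoOrder(nodes) {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const pending = new Map(nodes.map((n) => [n.id, n.parentId.length]));
  const children = new Map(nodes.map((n) => [n.id, []]));
  for (const n of nodes) {
    for (const parent of n.parentId) {
      if (!children.has(parent)) throw new Error(`unknown parent: ${parent}`);
      children.get(parent).push(n.id);
    }
  }
  const ready = nodes.filter((n) => n.parentId.length === 0).map((n) => n.id);
  const order = [];
  while (ready.length > 0) {
    const id = ready.shift();
    order.push(byId.get(id).name);
    for (const child of children.get(id)) {
      pending.set(child, pending.get(child) - 1);
      if (pending.get(child) === 0) ready.push(child);
    }
  }
  if (order.length !== nodes.length) throw new Error("graph contains a cycle");
  return order;
}

const topicOrder = topoOrder(shuffledNodes);
console.log(topicOrder.join(" > "));
```

Because Trees has two prerequisites, it is only emitted after both Linked List and Binary Search have appeared.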

## Org-Mode Format

Each topic is a `* TODO` heading with a `[/]` cookie for progress.
Problems are `- [ ] TODO` items with difficulty tags (`:easy:`,
`:medium:`, `:hard:`). Python and C++ solution links are nested
`- [ ] TODO` sub-items. LeetCode and video links are plain list items.
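
As a rough illustration of the layout described above, the sketch below renders one topic as org-mode text. Everything specific here is an assumption: the `renderTopic` helper, the `python`, `cpp`, and `video` field names, the example.com URLs, and the choice to nest the LeetCode and video links under each problem. The actual `out/roadmap.org` may differ in detail.

```javascript
// Sketch only: hypothetical helper, field names, and placeholder URLs.
const sampleTopic = {
  name: "Arrays & Hashing",
  problems: [
    {
      problem: "Two Sum",
      difficulty: "Easy",
      link: "https://example.com/leetcode/two-sum",
      video: "https://example.com/video/two-sum",
      python: "https://example.com/python/two-sum",
      cpp: "https://example.com/cpp/two-sum",
    },
  ],
};

function renderTopic(topic) {
  // Topic heading with a [/] progress cookie.
  const lines = [`* TODO ${topic.name} [/]`];
  for (const p of topic.problems) {
    // Problem item with a difficulty tag such as :easy:.
    lines.push(`- [ ] TODO ${p.problem} :${p.difficulty.toLowerCase()}:`);
    // Solution links as nested checkbox items.
    lines.push(`  - [ ] TODO [[${p.python}][Python]]`);
    lines.push(`  - [ ] TODO [[${p.cpp}][C++]]`);
    // LeetCode and video links as plain list items.
    lines.push(`  - [[${p.link}][LeetCode]]`);
    lines.push(`  - [[${p.video}][Video]]`);
  }
  return lines.join("\n") + "\n";
}

const orgText = renderTopic(sampleTopic);
console.log(orgText);
```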

## Updating

Just re-run `node extract.mjs`. It fetches fresh data from the site
(cached locally). If NeetCode changes their chunk structure, the
regexes in `extractGraphNodes()` and `extractProblems()` will need
updating.
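
To show why a change in chunk structure breaks extraction, here is a hedged sketch of regex-based node extraction that fails loudly when nothing matches. The function `extractGraphNodesSketch`, its regex, and the sample chunk text are invented for illustration; the real `extractGraphNodes()` and the real minified chunk may look quite different.

```javascript
// Sketch only: the chunk text and the pattern are illustrative assumptions.
function extractGraphNodesSketch(chunkJs) {
  const nodePattern = /\{id:"([^"]+)",name:"([^"]+)",parentId:\[([^\]]*)\]\}/g;
  const found = [];
  for (const [, id, name, parents] of chunkJs.matchAll(nodePattern)) {
    // Split the raw parentId list and strip the surrounding quotes.
    const parentId = parents
      ? parents.split(",").map((p) => p.replace(/"/g, ""))
      : [];
    found.push({ id, name, parentId });
  }
  if (found.length === 0) {
    // An empty result most likely means the site changed its chunk layout.
    throw new Error("no graph nodes matched; the chunk structure may have changed");
  }
  return found;
}

const sampleChunk =
  'x=[{id:"0",name:"Arrays & Hashing",parentId:[]},' +
  '{id:"1",name:"Two Pointers",parentId:["0"]}]';
const parsedNodes = extractGraphNodesSketch(sampleChunk);
console.log(parsedNodes);
```

Raising an error on zero matches is a design choice: it prevents a silent run that writes empty output files after a site update.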

## Dependencies

None. Uses only Node.js built-ins (`fs`, `path`, `url`, `fetch`).
Requires Node 18+ for native `fetch`.
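
A small sketch of how a script could check this requirement before doing any work. The helper `supportsNativeFetch` is hypothetical and is not claimed to exist in `extract.mjs`; it only combines the Node major version with a check that a global `fetch` function is present.

```javascript
// Sketch only: hypothetical guard, not taken from extract.mjs.
function supportsNativeFetch(nodeVersion, fetchImpl) {
  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  return major >= 18 && typeof fetchImpl === "function";
}

const runtimeOk = supportsNativeFetch(process.versions.node, globalThis.fetch);
if (!runtimeOk) {
  console.error("extract.mjs needs Node 18+ with native fetch");
}
```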