Tag every tool and add a Yahoo-style category directory #284

Open: obra wants to merge 8 commits into simonw:main from obra:tags-and-directory
@obra obra commented Jun 5, 2026

Tag every tool, then browse them like it's 1998

This adds a tag vocabulary to the doc-generation pipeline and builds a classic Yahoo-style category directory on top of it.

What's in here

1. Tags for every tool — two namespaces, stored in each committed *.docs.md as comments that don't disturb the existing description extraction:

<existing description>

<!-- topics: images-graphics, developer-tools -->
<!-- features: canvas, url-state, file-upload, web-serial -->
<!-- Generated from commit: HASH -->
  • topics = what the tool does (the directory categories)
  • features = browser capabilities it actually uses (Canvas, Clipboard, WebAssembly, Web Serial, …)
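
A minimal sketch of how comments in this shape could be read and written back. This is not the PR's tags_lib.py: `parse_tags`, `render_tags`, and the sample description are hypothetical names and data used only for illustration.

```python
import re

# Hypothetical helpers for illustration; the real tags_lib.py may differ.
# Matches only the two tag namespaces, so the "Generated from commit"
# comment is ignored.
TAG_COMMENT = re.compile(r"<!--\s*(topics|features):\s*(.*?)\s*-->")


def parse_tags(docs_md: str) -> dict[str, list[str]]:
    """Return {'topics': [...], 'features': [...]} from tag comments."""
    tags: dict[str, list[str]] = {"topics": [], "features": []}
    for namespace, raw in TAG_COMMENT.findall(docs_md):
        tags[namespace] = [t.strip() for t in raw.split(",") if t.strip()]
    return tags


def render_tags(tags: dict[str, list[str]]) -> str:
    """Render non-empty namespaces back into comment lines."""
    lines = []
    for namespace in ("topics", "features"):
        if tags.get(namespace):
            lines.append(f"<!-- {namespace}: {', '.join(tags[namespace])} -->")
    return "\n".join(lines)


# Made-up description followed by the comment layout shown above.
sample = """A made-up tool description.

<!-- topics: images-graphics, developer-tools -->
<!-- features: canvas, url-state, file-upload, web-serial -->
<!-- Generated from commit: HASH -->"""

parsed = parse_tags(sample)
```

Because the tags live in HTML comments, code that extracts the description from the text above them does not need to know the tags exist.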

All 215 tools are backfilled. tags.json is a shared, evolving vocabulary that's injected into the classification prompt so tags converge instead of sprawling.

2. The doc-generation pipeline self-tags new tools: write_docs.py's existing Haiku call now also emits topics/features (with the vocabulary injected), and re-tags any tool still missing tags. gather_links.py surfaces tags into tools.json. The Pages workflow commits tags.json alongside the generated docs so coined tags persist. No new tool needs manual tagging.
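
A rough sketch of the vocabulary-injection idea, assuming the vocabulary is a dict with `topics` and `features` lists. The prompt wording, the reply format, and the names `build_prompt` and `merge_vocabulary` are assumptions for illustration, not what write_docs.py actually sends or does. No model is called here.

```python
# Hypothetical sketch: the real prompt and tags.json layout may differ.
NAMESPACES = ("topics", "features")


def build_prompt(tool_html: str, vocabulary: dict[str, list[str]]) -> str:
    """Build a classification prompt that lists the known tags first."""
    topics = ", ".join(sorted(vocabulary["topics"]))
    features = ", ".join(sorted(vocabulary["features"]))
    return (
        "Classify this tool. Prefer existing tags; "
        "coin a new tag only if none fit.\n"
        f"Known topics: {topics}\n"
        f"Known features: {features}\n\n"
        f"{tool_html}"
    )


def merge_vocabulary(
    vocabulary: dict[str, list[str]], new_tags: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return a copy of the vocabulary with newly coined tags appended."""
    merged = {ns: list(vocabulary.get(ns, [])) for ns in NAMESPACES}
    for ns in NAMESPACES:
        for tag in new_tags.get(ns, []):
            if tag not in merged[ns]:
                merged[ns].append(tag)
    return merged


vocab = {"topics": ["developer-tools"], "features": ["canvas"]}
prompt = build_prompt("<html>...</html>", vocab)
updated = merge_vocabulary(
    vocab, {"topics": ["developer-tools"], "features": ["web-serial"]}
)
```

Writing the merged vocabulary back out after each run is what would let a coined tag show up as a "known" tag for the next tool.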

3. A two-level category hierarchy. tags.json groups the 19 subcategory topics under 5 top-level categories:

| Top-level | Subcategories |
| --- | --- |
| Development & APIs | Developer Tools · Code Sandboxes & REPLs · Web APIs & Services · CSS & Layout · Encoding & Security |
| Data, AI & Databases | AI & LLMs · Data & JSON · SQLite & Databases · Data Visualization |
| Images, Audio & Documents | Images & Graphics · Audio & Video · Documents & PDF |
| Text, Writing & Reference | Text & Writing · Accessibility · Reference & Education |
| Web, Social & Fun | Social & Feeds · Maps & Geography · Productivity · Games & Fun |

The original LLM pass dumped 104/215 tools into a developer-tools catch-all, so it was carved: two new subcategories (Code Sandboxes & REPLs, Web APIs & Services) were pulled out and the redundant developer-tools tag dropped from tools that already had a specific home — 104 → 14 residual, nothing else bloated.
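
The dedup rule described above can be sketched as follows. `drop_redundant_tag` is a hypothetical name, and this assumes the rule is simply "remove the catch-all when any other topic is present"; the actual carve also involved re-tagging tools into the two new subcategories, which is not shown.

```python
def drop_redundant_tag(
    topics: list[str], catch_all: str = "developer-tools"
) -> list[str]:
    """Drop the catch-all topic when the tool has a more specific home."""
    specific = [t for t in topics if t != catch_all]
    # Keep the catch-all only when it is the tool's sole topic.
    return specific if specific else list(topics)


carved = drop_redundant_tag(["developer-tools", "code-sandboxes"])
residual = drop_redundant_tag(["developer-tools"])
```

Tools for which the second case applies are the residual ones that stay in developer-tools.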

4. The directory (all generated from the tags, regenerated every build):

  • Homepage — a compact Yahoo "Browse by category" index: bold top-level categories with [count] superscripts and their subcategories as inline links.
  • Subcategory pages (/category-<slug>) — standalone drill-down pages listing each subcategory's tools with descriptions.
  • All-tools page (/categories) — the full hierarchy plus a "Browse by browser feature" section.
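
A small sketch of how the homepage index data could be derived from the tags. The hierarchy shape (top-level name mapped to `(slug, label)` pairs), the tool records, and the counting rule are all assumptions; directory.py and build_categories.py may structure this differently.

```python
def build_index(
    hierarchy: dict[str, list[tuple[str, str]]], tools: list[dict]
) -> list[dict]:
    """Return one entry per top-level category with counts and links."""
    index = []
    for category, subcats in hierarchy.items():
        slugs = {slug for slug, _ in subcats}
        # Assumption: a tool counts once per top-level category,
        # even if it carries several of that category's topics.
        total = sum(1 for tool in tools if slugs & set(tool["topics"]))
        links = [
            {
                "label": label,
                "href": f"/category-{slug}",
                "count": sum(1 for tool in tools if slug in tool["topics"]),
            }
            for slug, label in subcats
        ]
        index.append(
            {"category": category, "count": total, "subcategories": links}
        )
    return index


# Illustrative data only; the tool names are made up.
hierarchy = {
    "Development & APIs": [
        ("developer-tools", "Developer Tools"),
        ("code-sandboxes", "Code Sandboxes & REPLs"),
    ],
    "Images, Audio & Documents": [("images-graphics", "Images & Graphics")],
}
tools = [
    {"name": "tool-a", "topics": ["developer-tools", "code-sandboxes"]},
    {"name": "tool-b", "topics": ["images-graphics"]},
    {"name": "tool-c", "topics": ["code-sandboxes"]},
]
index = build_index(hierarchy, tools)
```

Since everything is computed from the tags at build time, the pages never need hand-maintained category lists.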

Shared chrome/CSS lives in a new page_template.py used by both the homepage and the category pages.

Screenshots

Homepage — Browse by category
[Screenshot: homepage "Browse by category" index]

/categories — All tools by category
[Screenshot: "All tools by category" page]

/category-developer-tools — a subcategory page
[Screenshot: subcategory page]

Browse by browser feature
[Screenshot: "Browse by browser feature" section]

Notes / caveats

  • I couldn't run the live LLM path locally (no ANTHROPIC_API_KEY / llm-anthropic in the build env here), so the 215-tool backfill was done by a separate model pass reading each tool's HTML; the pipeline changes were verified by unit tests and a dry run, but the Haiku TAGS: emission only runs in CI. It's self-healing: a tool that ever lands without tags gets re-tagged on the next build.
  • A few tools have a literal # Documentation summary (e.g. API Explorer, Icon Editor) — pre-existing *.docs.md data, not touched here; they'll self-correct when those docs regenerate.
  • New unit tests: test_tags_lib.py, test_write_docs.py, test_directory.py (22 tests).
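
The self-healing behaviour mentioned in the first caveat could be detected with a check like the one below. This is a sketch, not the pipeline's code: `needs_tagging` and `tools_to_retag` are hypothetical names, and it assumes a tool counts as tagged once it has at least one topic.

```python
import re

# Assumption: a non-empty "topics" comment means the tool is tagged.
# \w requires at least one tag character after the colon, so an empty
# "<!-- topics: -->" comment does not count.
HAS_TOPICS = re.compile(r"<!--\s*topics:\s*\w")


def needs_tagging(docs_md: str) -> bool:
    """True when the docs text has no non-empty topics comment."""
    return HAS_TOPICS.search(docs_md) is None


def tools_to_retag(docs_by_name: dict[str, str]) -> list[str]:
    """Return the names of docs files that should be re-tagged."""
    return sorted(
        name for name, text in docs_by_name.items() if needs_tagging(text)
    )


# Made-up file names and contents for illustration.
docs = {
    "a.docs.md": "Description.\n\n<!-- topics: developer-tools -->",
    "b.docs.md": "Description only.",
    "c.docs.md": "Description.\n\n<!-- topics: -->",
}
pending = tools_to_retag(docs)
```

Running a check like this on every build is what would make a missed tool get picked up the next time around.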

The generated pages (categories.html, category-*.html) are gitignored like index.html/colophon.html.


Provenance

Built with Claude Code 2.1.165 (harness) driving Claude Opus 4.8 (claude-opus-4-8[1m]), session 9b1d21ea-3d17-4022-8604-120ddc65d9e2, working with Jesse Vincent (@obra).

The full sequence of human prompts that produced this PR (verbatim)

the github action for this repo generates summaries for each of the tools in this repo. I'd like you to add tags for what each tool does and for which browser features it uses. then I'd like you to build a classical yahoo-style homepage browser for all the tools fed from the tags. then i'd like you to update the github action to add tags for newly created tools. when you're done, make me screenshots

(answering scoping questions)

Backfill: "I classify all 215 now" — note: "simon says there are only 215. be careful"
Browser features: "LLM-classified"
Homepage: "extend the homepage"

the tagging prompt should include all the current tags on any tool to make it easier to pick good tags

LFG

can you look at the devtools category and see if you can carve it cleanly?

(answering the carve question)

Dedup: "Drop redundant developer-tools"
"also, the top level was categories and subcategories"

carve them all into a hierarchy. (attached the 1998 Yahoo! homepage screenshot as the visual target)

browse by category is what we want. subcategories go to standalone pages. do it. no "All tools by category" on the homepage. But having an "All tools by category" page is good, too

where are the screenshots?

great. please make a PR. identify yourself by harness and model (and versions) and ideally session. include the screenshots and all the prompting that got us here.

obra added 8 commits June 5, 2026 15:11
- tags_lib.py: shared parse/render helpers for .docs.md tag comments
- tags.json: seed topic + feature vocabulary, injected into the LLM prompt
- write_docs.py: LLM call now emits topics/features, merges new tags
- gather_links.py: surfaces topics/features into tools.json
- unit tests for both parsers

Classified every tool's docs.md with topic tags (what it does) and
feature tags (browser capabilities used), drawn from the tags.json
vocabulary. Added web-serial to the vocabulary.

directory.py groups every tool by topic and by browser feature into a
classic directory layout; build_index.py injects it below the recent
section with retro styling and anchor-linked feature chips.

- Split the 104-tool developer-tools catch-all: new code-sandboxes (10)
  and apis-services (16) topics, and drop the redundant developer-tools
  tag from tools that already have a specific home (104 -> 14 residual).
- Restructure tags.json into top-level categories -> subcategories.
- directory.py now renders the two-level Yahoo-style hierarchy with a
  category nav and per-category banners.

Add a compact two-column category index (bold category names with inline
subcategory links and tool counts), with the full per-category tool
listings beneath it.

- Homepage now shows only the Yahoo-style 'Browse by category' index;
  each subcategory links to its own standalone page.
- New build_categories.py generates a 'All tools by category' page
  (full hierarchy + browse-by-feature) and one page per subcategory
  (breadcrumb + tools with descriptions).
- Extract shared chrome/directory CSS into page_template.py, used by
  both the homepage and the category pages (de-dups the styles).
- Wire build_categories.py into build.sh; gitignore the generated pages.