WIP: Add Root Archive technical design document #1273

Open
stevenle wants to merge 1 commit into main from claude/website-screenshot-archive-design-qxt5im

@stevenle

Member

Summary

This PR introduces a comprehensive technical design document for Root Archive, a new application in the monorepo that captures and stores screenshots of public websites over time, similar to the Internet Archive's Wayback Machine.

Key Changes

  • Added apps/root-archive/DESIGN.md — A detailed 444-line technical design document covering:
    • Architecture overview — Multi-service design with separate API (control plane) and Worker services on Cloud Run, using Cloud Tasks for queueing, Firestore for metadata, and Cloud Storage for image blobs
    • Domain model — Definitions of Targets, Capture Jobs, Page Captures, and Schedules with complete Firestore schema and data structures
    • GCP infrastructure choices — Rationale for Cloud Run (vs App Engine), Cloud Tasks (vs Pub/Sub), Firestore, and Cloud Storage
    • Screenshot worker design — Playwright-based implementation with per-request browser contexts, idempotency guarantees, and crash isolation
    • API surface — RESTful endpoints for job submission, capture history, and schedule management
    • Viewer UI — Minimal Root.js-based frontend for browsing captures and timelines
    • Security considerations — SSRF protection, resource exhaustion controls, politeness/robots.txt handling, and least-privilege IAM
    • Cost & scaling analysis — Dominant cost drivers and tuning knobs
    • Alternatives considered — Justification for key technology choices
    • Observability — Structured logging, metrics, tracing, and dead-letter handling
    • Rollout milestones — Five-phase delivery plan from skeleton to hardening
    • Open questions — Auth model, public vs private serving, retention, and future extensions
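
As an illustration of the SSRF protection listed under security considerations, here is a minimal sketch of a pre-navigation check a worker might run. The helper names `isBlockedIPv4` and `isAllowedTarget` are hypothetical and not taken from DESIGN.md, and the sketch assumes the caller has already resolved the hostname to an IPv4 address. It does not cover IPv6, redirects, or DNS rebinding, which a real guard would also need to handle.

```typescript
// Hypothetical helpers for illustration only; not the design document's code.

// Returns true when an IPv4 literal is in a range a capture worker should refuse.
// Anything that is not a well-formed dotted quad is rejected (fail closed).
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split('.');
  if (parts.length !== 4) return true;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return true;
  const [a, b] = octets;
  return (
    a === 0 || // "this network"
    a === 10 || // RFC 1918 private range
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // carrier-grade NAT shared space
    (a === 169 && b === 254) || // link-local, includes 169.254.169.254 metadata
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 private range
    (a === 192 && b === 168) || // RFC 1918 private range
    a >= 224 // multicast and reserved
  );
}

// Accepts only http(s) URLs whose already-resolved address is public.
function isAllowedTarget(rawUrl: string, resolvedIp: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  return !isBlockedIPv4(resolvedIp);
}
```

A worker would typically apply a check like this to every navigation and redirect hop, connecting to the address it validated rather than resolving the hostname a second time.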

Notable Details

  • The design reuses existing repo patterns: Firestore (like root-cms) and Cloud Storage (like apps/root-services)
  • Proposes content-hash deduplication to avoid storing duplicate screenshots of unchanged pages
  • Includes security hardening against SSRF and abuse: resource exhaustion controls, politeness/robots.txt handling, and least-privilege IAM
  • Leaves room for future multi-tenancy and advanced features (recursive crawl, WARC replay, diff view)
  • Document is marked as Draft / RFC for community feedback before implementation begins
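
The content-hash deduplication mentioned above could look roughly like the following sketch. Everything here is an assumption for illustration: the `CaptureStore` class, the `StoredBlob` shape, the `screenshots/<hash>.png` path layout, and the choice of SHA-256 are not taken from DESIGN.md, and an in-memory map stands in for Firestore and Cloud Storage.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical types and names for illustration only.
interface StoredBlob {
  hash: string; // hex SHA-256 of the image bytes
  objectPath: string; // where the blob would live in the bucket
}

class CaptureStore {
  // Stand-in for a metadata lookup keyed by content hash.
  private blobsByHash = new Map<string, StoredBlob>();
  // Counts simulated uploads so the dedup effect is observable.
  uploads = 0;

  // Stores the image only if no blob with the same bytes exists yet.
  save(image: Uint8Array): StoredBlob {
    const hash = createHash('sha256').update(image).digest('hex');
    const existing = this.blobsByHash.get(hash);
    if (existing) return existing; // unchanged page: reuse the stored blob

    const blob: StoredBlob = { hash, objectPath: `screenshots/${hash}.png` };
    this.uploads += 1; // a real worker would upload to Cloud Storage here
    this.blobsByHash.set(hash, blob);
    return blob;
  }
}

// Usage: two identical captures share one blob, a different one gets its own.
const demoStore = new CaptureStore();
const first = demoStore.save(new Uint8Array([1, 2, 3]));
const second = demoStore.save(new Uint8Array([1, 2, 3]));
const third = demoStore.save(new Uint8Array([4, 5, 6]));
```

Hashing raw bytes only catches byte-identical screenshots, so small rendering differences between captures would still produce separate blobs under this approach.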

https://claude.ai/code/session_014BthJ31Z8k3PH5eEjAtvGk

@stevenle changed the title from "Add Root Archive technical design document" to "WIP: Add Root Archive technical design document" on Jun 25, 2026