claude-code-chat-browser: Add parse/export/search performance benchmarks and CI artifacts #74

@clean6378-max-it

Description

Calendar Day

Thursday, June 12, 2026

Planned Effort

3 story points — sprint item #7 (Medium): Performance benchmarks for parse/export path

One issue → one PR.

Depends on: Mon–Wed week-2 work merged or rebased (stable CI matrix + parser hardening). Independent of export-warning response shape — benchmarks measure throughput, not HTTP headers.

Out of scope: perf regression gates in CI, caching architecture changes, frontend benchmarks.

Problem

No benchmarks/ or perf/ directory and no performance tests exist. The app re-parses JSONL from disk on session detail (api/sessions.py), search (api/search.py), and bulk export (api/export_api.py, via run_bulk_export). Long sessions (thousands of lines) and large bulk exports have no latency or memory baselines, so regressions in the parse boundary pipeline go undetected.

Goal

Establish repeatable, local performance measurements with:

  1. pytest-benchmark harness under tests/benchmarks/
  2. Synthetic corpora including a 5,000+ line session file
  3. tracemalloc peak-memory check on large parse
  4. Non-gating CI job that uploads benchmark JSON artifacts
  5. benchmarks/README.md documenting local runs

Scope

1. Dependencies and layout

Touch points: requirements-dev.txt, tests/benchmarks/, benchmarks/README.md, optional benchmarks/baselines.json, pyproject.toml (pytest marker)

  • Add pytest-benchmark>=4.0.0 to dev dependencies.
  • Create tests/benchmarks/ with modules for parse, export, search, and memory.

2. Synthetic fixtures

Build on patterns in tests/conftest.py and tests/fixtures/session_with_tools.jsonl:

| Fixture       | Size                         | Purpose                             |
| ------------- | ---------------------------- | ----------------------------------- |
| small         | ~10 JSONL lines              | Fast sanity bench                   |
| medium        | ~500 lines                   | Typical long session                |
| large         | ≥ 5,000 lines                | Memory pressure + worst-case parse  |
| export corpus | 10 / 50 / 100 session files  | Bulk export scaling                 |
| search corpus | multi-session project tree   | Full linear scan search             |

Large file may be generated at test session scope (tmp_path_factory) rather than committed, as long as generation always produces ≥ 5,000 lines.
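
A minimal sketch of how the large fixture could be generated rather than committed. The record shape and the helper names (write_synthetic_session, count_lines) are hypothetical; the real generator should mirror the records in tests/fixtures/session_with_tools.jsonl.

```python
import json
from pathlib import Path


def write_synthetic_session(path: Path, n_lines: int) -> Path:
    """Write n_lines JSONL records that alternate user/assistant turns.

    The record layout is an assumption for illustration only.
    """
    with path.open("w", encoding="utf-8") as fh:
        for i in range(n_lines):
            role = "user" if i % 2 == 0 else "assistant"
            record = {
                "type": role,
                "uuid": f"msg-{i:06d}",
                "message": {
                    "role": role,
                    "content": f"synthetic turn {i} " + "x" * 200,
                },
            }
            # One JSON object per line, newline-terminated.
            fh.write(json.dumps(record) + "\n")
    return path


def count_lines(path: Path) -> int:
    """Count lines without loading the whole file into memory."""
    with path.open(encoding="utf-8") as fh:
        return sum(1 for _ in fh)
```

A session-scoped fixture could call write_synthetic_session(tmp_path_factory.mktemp("corpus") / "large.jsonl", 5000) and assert count_lines(...) >= 5000 before returning the path, so the size guarantee is checked on every run.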

3. Benchmark scenarios

| Scenario                                  | Target                                                                     | Tool                                    |
| ----------------------------------------- | -------------------------------------------------------------------------- | --------------------------------------- |
| Single-session parse (small/medium/large) | utils/jsonl_parser.parse_session                                           | pytest-benchmark                        |
| Bulk export (10 / 50 / 100 sessions)      | utils.export_engine.run_bulk_export + NoopSink                             | pytest-benchmark                        |
| Search across corpus                      | GET /api/search via Flask test client or equivalent loop in api/search.py  | pytest-benchmark                        |
| Large-parse memory                        | parse_session on large file                                                | tracemalloc assert (regular pytest test) |

Use @pytest.mark.benchmark on timing tests. Parametrize export counts with distinct benchmark ids.

4. Memory ceiling

  • Wrap large-file parse_session in tracemalloc.start() / get_traced_memory().
  • Assert that peak allocated memory stays below 10× the on-disk file size; if the ceiling is adjusted, document the reason in the test.
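
The memory check above could be sketched with the standard library alone. measure_peak and assert_peak_under_ceiling are hypothetical helper names, and how parse_session is called (for example, with a file path) is an assumption.

```python
import tracemalloc
from typing import Callable, TypeVar

T = TypeVar("T")


def measure_peak(fn: Callable[[], T]) -> tuple[T, int]:
    """Run fn() under tracemalloc and return (result, peak_bytes)."""
    tracemalloc.start()
    try:
        result = fn()
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing so later tests are not slowed down.
        tracemalloc.stop()
    return result, peak


def assert_peak_under_ceiling(peak_bytes: int, file_size: int, factor: int = 10) -> None:
    """Fail if peak traced memory reaches factor x the on-disk size."""
    ceiling = factor * file_size
    assert peak_bytes < ceiling, (
        f"peak {peak_bytes} B exceeds {factor}x file size ({ceiling} B)"
    )
```

In the real test this might read: parsed, peak = measure_peak(lambda: parse_session(path)), followed by assert_peak_under_ceiling(peak, path.stat().st_size). Note that tracemalloc only counts allocations made through Python's allocators, so memory used inside C extensions that bypass them is not included.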

5. CI (informational only)

Touch points: .github/workflows/ci.yml

Add benchmarks job on ubuntu-latest:

pytest tests/benchmarks/ \
  --benchmark-only \
  --benchmark-json=benchmark-results.json \
  -o addopts=

  • Upload benchmark-results.json via actions/upload-artifact.
  • No --benchmark-compare fail gate — baselines stabilize first.
  • Run with -o addopts= to disable coverage overhead from pyproject.toml addopts.
  • test_parse_memory.py runs in main pytest job (not --benchmark-only).
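
Since the artifact is informational, a small script for reading it may help reviewers. This sketch assumes the JSON layout pytest-benchmark writes as I understand it (a top-level "benchmarks" list whose entries carry "name" and a "stats" object with "mean" and "stddev" in seconds); summarize and format_rows are hypothetical names.

```python
import json
from pathlib import Path


def summarize(results_path: Path) -> list[tuple[str, float, float]]:
    """Return (name, mean_seconds, stddev_seconds) rows, slowest first."""
    data = json.loads(results_path.read_text(encoding="utf-8"))
    rows = [
        (b["name"], b["stats"]["mean"], b["stats"]["stddev"])
        for b in data.get("benchmarks", [])
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)


def format_rows(rows: list[tuple[str, float, float]]) -> str:
    """Render rows as fixed-width text with times in milliseconds."""
    return "\n".join(
        f"{name:<40} {mean * 1e3:10.3f} ms  +/- {sd * 1e3:.3f} ms"
        for name, mean, sd in rows
    )
```

Running print(format_rows(summarize(Path("benchmark-results.json")))) on a downloaded artifact gives a quick slowest-first view without any pass/fail gating.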

6. Documentation

  • benchmarks/README.md — local commands, scenario table, CI artifact note, how to refresh baselines.json.
  • One-line link from CONTRIBUTING.md testing section.

Acceptance Criteria

  • Benchmark module exists under tests/benchmarks/ using pytest-benchmark
  • Benchmarks cover: single-session parse (small, medium, large), bulk export (10, 50, 100 sessions), search across synthetic corpus
  • Large-file fixture defined with ≥ 5,000 JSONL lines
  • Benchmark results captured in CI as artifacts (not pass/fail perf gates)
  • Memory usage measured for large-file parse (tracemalloc); peak under documented ceiling
  • benchmarks/README.md or CONTRIBUTING section explains local runs
  • pytest -q, mypy, ruff check . pass in main CI jobs
  • PR approved by at least 1 reviewer
