claude-code-chat-browser: Add parse/export/search performance benchmarks and CI artifacts #74

## Calendar Day

Thursday, June 12, 2026

## Planned Effort

**3 story points** — sprint item **#7** (Medium): Performance benchmarks for parse/export path

**One issue → one PR.**

**Depends on:** Mon–Wed week-2 work merged or rebased (stable CI matrix + parser hardening). Independent of export-warning response shape — benchmarks measure throughput, not HTTP headers.

**Out of scope:** perf regression gates in CI, caching architecture changes, frontend benchmarks.

## Problem

No `benchmarks/`, `perf/`, or performance tests exist. The app **re-parses JSONL from disk** on session detail (`api/sessions.py`), search (`api/search.py`), and bulk export (`api/export_api.py` → `run_bulk_export`). Long sessions (thousands of lines) and large bulk exports have **no latency or memory baselines**, so regressions in the parse boundary pipeline go undetected.

## Goal

Establish **repeatable, local** performance measurements with:

1. `pytest-benchmark` harness under `tests/benchmarks/`
2. Synthetic corpora including a **5,000+ line** session file
3. **tracemalloc** peak-memory check on large parse
4. **Non-gating** CI job that uploads benchmark JSON artifacts
5. `benchmarks/README.md` documenting local runs

## Scope

### 1. Dependencies and layout

**Touch points:** `requirements-dev.txt`, `tests/benchmarks/`, `benchmarks/README.md`, optional `benchmarks/baselines.json`, `pyproject.toml` (pytest marker)

- Add `pytest-benchmark>=4.0.0` to dev dependencies.
- Create `tests/benchmarks/` with modules for parse, export, search, and memory.

### 2. Synthetic fixtures

Build on patterns in `tests/conftest.py` and `tests/fixtures/session_with_tools.jsonl`:

| Fixture | Size | Purpose |
|---------|------|---------|
| small | ~10 JSONL lines | Fast sanity bench |
| medium | ~500 lines | Typical long session |
| large | **≥ 5,000 lines** | Memory pressure + worst-case parse |
| export corpus | 10 / 50 / 100 session files | Bulk export scaling |
| search corpus | multi-session project tree | Full linear scan search |

Large file may be **generated at test session scope** (`tmp_path_factory`) rather than committed, as long as generation always produces ≥ 5,000 lines.
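A minimal sketch of how the synthetic corpora could be generated at test time. The record shape and the helper names (`write_synthetic_session`, `build_export_corpus`, `count_lines`) are hypothetical stand-ins, not the project's real session schema or existing helpers.

```python
import json
from pathlib import Path


def write_synthetic_session(path: Path, n_lines: int) -> Path:
    """Write n_lines JSONL records alternating user/assistant turns.

    The record fields are an assumption for illustration only.
    """
    with path.open("w", encoding="utf-8") as fh:
        for i in range(n_lines):
            record = {
                "type": "user" if i % 2 == 0 else "assistant",
                "uuid": f"msg-{i:06d}",
                "message": {"content": f"synthetic message {i}"},
            }
            fh.write(json.dumps(record) + "\n")
    return path


def build_export_corpus(
    root: Path, n_sessions: int, lines_per_session: int = 50
) -> list[Path]:
    """Create n_sessions small session files under root."""
    root.mkdir(parents=True, exist_ok=True)
    return [
        write_synthetic_session(root / f"session-{i:03d}.jsonl", lines_per_session)
        for i in range(n_sessions)
    ]


def count_lines(path: Path) -> int:
    """Count lines so a fixture can verify the 5,000-line floor."""
    with path.open(encoding="utf-8") as fh:
        return sum(1 for _ in fh)
```

In the suite, a `@pytest.fixture(scope="session")` could call `write_synthetic_session(tmp_path_factory.mktemp("bench") / "large.jsonl", 5_000)` and assert `count_lines(...) >= 5_000` before returning the path, so the size guarantee is checked on every run.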

### 3. Benchmark scenarios

| Scenario | Target | Tool |
|----------|--------|------|
| Single-session parse (small/medium/large) | `utils.jsonl_parser.parse_session` | `pytest-benchmark` |
| Bulk export (10 / 50 / 100 sessions) | `utils.export_engine.run_bulk_export` + `NoopSink` | `pytest-benchmark` |
| Search across corpus | `GET /api/search` via Flask test client **or** equivalent loop in `api/search.py` | `pytest-benchmark` |
| Large-parse memory | `parse_session` on large file | `tracemalloc` assert (regular pytest test) |

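Timing tests take the `benchmark` fixture, which is what `--benchmark-only` selects on; use `@pytest.mark.benchmark(group=...)` to group results by scenario. Parametrize export counts with distinct benchmark `id`s.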

### 4. Memory ceiling

- Wrap large-file `parse_session` in `tracemalloc.start()` / `get_traced_memory()`.
- Assert peak allocated memory **< 10×** on-disk file size (document in test if ceiling adjusted).
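The two steps above can be sketched with the standard library alone. The helper names `measure_peak_bytes` and `within_ceiling` are hypothetical; the real test would pass `parse_session` and the large fixture path as `func` and its argument.

```python
import tracemalloc
from typing import Any, Callable


def measure_peak_bytes(func: Callable[..., Any], *args: Any) -> tuple[Any, int]:
    """Run func(*args) under tracemalloc and return (result, peak_bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) in bytes; the result
        # is still alive here, so the peak includes the parsed output.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def within_ceiling(peak_bytes: int, file_size_bytes: int, factor: int = 10) -> bool:
    """True when peak allocation is strictly below factor x on-disk size."""
    return peak_bytes < factor * file_size_bytes
```

A test would then read the file size with `path.stat().st_size` and assert `within_ceiling(peak, size)`. Note that `tracemalloc` only counts allocations made through Python's allocators, so the figure is a lower bound on process memory rather than a full RSS measurement.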

### 5. CI (informational only)

**Touch points:** `.github/workflows/ci.yml`

Add **`benchmarks`** job on `ubuntu-latest`:

```yaml
- name: Run benchmarks
  run: |
    pytest tests/benchmarks/ \
      --benchmark-only \
      --benchmark-json=benchmark-results.json \
      -o addopts=
```

- Upload `benchmark-results.json` via `actions/upload-artifact`.
- **No** `--benchmark-compare` fail gate — baselines stabilize first.
- Run with `-o addopts=` to disable coverage overhead from `pyproject.toml` `addopts`.
- `test_parse_memory.py` runs in main `pytest` job (not `--benchmark-only`).
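Since the job is informational, someone reading the artifact may want a quick summary rather than raw JSON. The sketch below assumes the `--benchmark-json` layout of a top-level `benchmarks` list whose entries carry `name` and a `stats` mapping with `mean` and `stddev` in seconds; verify this against an actual artifact before relying on it. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_results(path: Path) -> dict:
    """Load a benchmark JSON artifact from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def summarize(results: dict) -> list[tuple[str, float, float]]:
    """Return (name, mean_ms, stddev_ms) rows, slowest first.

    Assumes each entry has "name" and "stats" with "mean"/"stddev" in seconds.
    """
    rows = [
        (b["name"], b["stats"]["mean"] * 1000.0, b["stats"]["stddev"] * 1000.0)
        for b in results.get("benchmarks", [])
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Running `summarize(load_results(Path("benchmark-results.json")))` on a downloaded artifact gives a list that is easy to paste into a PR comment while baselines are still stabilizing.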

### 6. Documentation

- `benchmarks/README.md` — local commands, scenario table, CI artifact note, how to refresh `baselines.json`.
- One-line link from `CONTRIBUTING.md` testing section.

## Acceptance Criteria

- [ ] Benchmark module exists under `tests/benchmarks/` using `pytest-benchmark`
- [ ] Benchmarks cover: single-session parse (small, medium, large), bulk export (10, 50, 100 sessions), search across synthetic corpus
- [ ] Large-file fixture defined with ≥ 5,000 JSONL lines
- [ ] Benchmark results captured in CI as artifacts (not pass/fail perf gates)
- [ ] Memory usage measured for large-file parse (`tracemalloc`); peak under documented ceiling
- [ ] `benchmarks/README.md` or CONTRIBUTING section explains local runs
- [ ] `pytest -q`, `mypy`, `ruff check .` pass in main CI jobs
- [ ] PR approved by at least 1 reviewer
