test: add parse/export/search performance benchmarks and CI artifacts by clean6378-max-it · Pull Request #76 · cppalliance/claude-code-chat-browser

clean6378-max-it · 2026-06-11T15:53:05Z

Closes #74

Summary

Add pytest-benchmark harness under tests/benchmarks/ for parse, bulk export, and search paths
Generate synthetic corpora at session scope (10 / 500 / 5000+ line JSONL; 10 / 50 / 100 session export trees; 50-session search corpus)
Add tracemalloc memory ceiling test for large-file parse (peak < 10× on-disk file size)
Add non-gating benchmarks CI job that uploads benchmark-results.json as an artifact
Document local runs in benchmarks/README.md and link from CONTRIBUTING.md
Skip timing benchmarks in normal pytest via --benchmark-skip; memory test runs in main CI

Sprint items

Closes Chen W2 #7

Test plan

pytest tests/benchmarks/ --benchmark-only -o addopts= -v
pytest tests/benchmarks/test_parse_memory.py -v
pytest -q green locally
mypy -p api -p utils -p models green
ruff check . and ruff format --check . green
CI benchmarks job completes and benchmark-results artifact is downloadable

Summary by CodeRabbit

Tests
- Added performance benchmarks for parsing, search, and bulk export across multiple dataset sizes.
- Added a memory-usage test, fixtures to generate synthetic corpora, and a test client for search benchmarking.
Documentation
- Added a benchmarking guide with local run instructions and baseline refresh notes.
- Updated contributing docs with benchmark execution commands.
Chores
- CI now runs benchmarks and uploads a benchmark-results artifact.
- Added pytest-benchmark as a dev dependency and included a baseline snapshot.

coderabbitai · 2026-06-11T15:53:18Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

This PR adds pytest-benchmark benchmarks (parse, export, search), fixtures to synthesize JSONL corpora (small/medium/large and multi-session export/search corpora), a tracemalloc memory ceiling test, CI job uploading benchmark JSON, and documentation for local runs and baselines.

Changes

Performance Benchmark Suite

Layer / File(s)	Summary
Dependencies and pytest configuration `requirements-dev.txt`, `pyproject.toml`, `benchmarks/baselines.json`	Added `pytest-benchmark==5.2.3`, registered the `benchmark` pytest marker, updated `addopts` to skip benchmarks by default, and added an initial `benchmarks/baselines.json`.
Test fixture infrastructure `tests/benchmarks/conftest.py`	Built synthetic JSONL corpus generation and fixtures: `write_jsonl`, `seed_search_corpus`, small/medium/large parse files, parametrized `export_corpus`, and `bench_client_search_corpus` Flask test client.
Parse benchmarks `tests/benchmarks/test_parse_bench.py`, `tests/benchmarks/test_parse_memory.py`	Implemented three parse throughput benchmarks (small/medium/large) using pytest-benchmark and a tracemalloc memory ceiling test asserting peak < 10× on-disk file size.
Export and search benchmarks `tests/benchmarks/test_export_bench.py`, `tests/benchmarks/test_search_bench.py`	Added parametrized bulk export benchmarks (10, 50, 100 sessions) measuring `run_bulk_export` and a full-corpus search benchmark via Flask test client GET `/api/search`.
CI integration and documentation `.github/workflows/ci.yml`, `CONTRIBUTING.md`, `benchmarks/README.md`	Configured CI `benchmarks` job that runs `pytest --benchmark-only`, emits `benchmark-results.json`, uploads it as an artifact, added a contributing-guide test command, and added README with run and baseline instructions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

timon0305
wpak-ai

"🐰 Hop hop, I ran the tests with care,
JSONL forests everywhere,
Benchmarks hum, baselines wait,
Tracemalloc guards memory's gate,
Little rabbit claps — performance fair! 🥕"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding parse/export/search performance benchmarks and CI artifacts.
Linked Issues check	✅ Passed	The PR fulfills all core objectives from issue `#74`: pytest-benchmark harness with parse/export/search tests, synthetic fixtures (small ~10, medium ~500, large ≥5000 lines), tracemalloc memory test, non-gating CI job with artifact upload, and documentation.
Out of Scope Changes check	✅ Passed	All changes directly support benchmark infrastructure. Minor updates to pyproject.toml (ruff glob fix to `tests/**/*.py`) and requirements-dev.txt (pytest-benchmark dependency) are necessary supporting changes within scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@requirements-dev.txt`:
- Line 8: Update verification for the pytest-benchmark pin: confirm that the
project's benchmarks and test suite are compatible with pytest-benchmark 5.x by
running CI/tests with pytest-benchmark==5.2.3 (or latest) and reviewing 5.x
release notes for breaking changes; if incompatible or unsafe, pin
requirements-dev.txt to a specific compatible version (e.g.,
pytest-benchmark==4.0.0) or add an upper bound (e.g.,
pytest-benchmark>=4.0.0,<5.0.0) and document the decision in the PR, referencing
the package name pytest-benchmark when making the change.

In `@tests/benchmarks/conftest.py`:
- Line 26: This is a static-analysis false positive: keep using json.dumps when
writing to the file but annotate the call site (the f.write(json.dumps(entry,
separators=(",", ":")) + "\n") expression) with a short inline comment
explaining that jsonify is a Flask helper for HTTP responses and not for file
serialization, and add a suppression directive for the static analyzer (e.g., an
ast-grep/linters-ignore comment) on that line so the hint is not raised again.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f2f5f062-13bd-46db-a7f0-dd9ac150905b

📥 Commits

Reviewing files that changed from the base of the PR and between 8083f3e and 6283017.

📒 Files selected for processing (12)

.github/workflows/ci.yml
CONTRIBUTING.md
benchmarks/README.md
benchmarks/baselines.json
pyproject.toml
requirements-dev.txt
tests/benchmarks/__init__.py
tests/benchmarks/conftest.py
tests/benchmarks/test_export_bench.py
tests/benchmarks/test_parse_bench.py
tests/benchmarks/test_parse_memory.py
tests/benchmarks/test_search_bench.py

coderabbitai

🧹 Nitpick comments (1)

tests/benchmarks/conftest.py (1)
26-30: 💤 Low value

Documentation is good; suppression format note.

The inline comment clearly documents why json.dumps is correct for file I/O (not Flask's HTTP jsonify), which addresses the past review feedback well. However, the linters-ignore directive on line 28 is not a recognized suppression format for ast-grep, so the [info] hint may still appear. Since the code is correct and well-documented, this is acceptable—the comment's documentation value is more important than suppressing the false-positive hint.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/conftest.py` around lines 26 - 30, The inline suppression
"linters-ignore: prefer-jsonify" is not a recognized ast-grep format and should
be removed; keep the explanatory comment and the json.dumps call that builds the
serialized variable, and if you need to silence the false positive instead add
the repository's supported suppression syntax (use the project's documented
ast-grep/flake8/pytest suppression keyword) next to the json.dumps line or in a
nearby comment so the linter recognizes it.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 575ac0e3-08ed-4c88-ab95-b08edf66daae

📥 Commits

Reviewing files that changed from the base of the PR and between 6283017 and e38b910.

📒 Files selected for processing (2)

requirements-dev.txt
tests/benchmarks/conftest.py

…tures Pin pytest-benchmark to 5.2.3 after verifying compatibility with pytest 9.0 and the benchmark suite. Annotate benchmark JSONL serialization to document that json.dumps is intentional for file I/O, not Flask jsonify.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/benchmarks/conftest.py (1)

16-31: ⚡ Quick win

Parse the template once outside the loop.

The json.loads(TEMPLATE_LINE) call on line 20 is executed for every iteration, re-parsing the same JSON string up to 5,000 times for the large fixture. Parsing once and using copy.deepcopy() in the loop would eliminate this overhead and improve fixture generation performance.

♻️ Proposed refactor

+import copy
 import json
 from pathlib import Path
 
 ...
 
 def write_jsonl(path: Path, line_count: int) -> Path:
     """Write a JSONL session file with *line_count* rows derived from the template fixture."""
+    template = json.loads(TEMPLATE_LINE)
     with path.open("w", encoding="utf-8") as f:
         for i in range(line_count):
-            entry = json.loads(TEMPLATE_LINE)
+            entry = copy.deepcopy(template)
             entry["timestamp"] = f"2026-06-12T10:{i % 60:02d}:00Z"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/benchmarks/conftest.py` around lines 16 - 31, The write_jsonl function
currently calls json.loads(TEMPLATE_LINE) inside the for-loop causing repeated
parsing; parse TEMPLATE_LINE once before the loop (e.g., template =
json.loads(TEMPLATE_LINE)) and inside the loop use copy.deepcopy(template) to
get a fresh entry, then modify entry["timestamp"] and the message content as
before; ensure copy is imported (from copy import deepcopy or import copy) and
replace json.loads(TEMPLATE_LINE) references with deepcopy(template) to
eliminate the repeated parse.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/benchmarks/conftest.py`:
- Around line 34-45: The seed_search_corpus function creates the bench-project
directory with project.mkdir(parents=True) which can fail if the directory
already exists; update the call in seed_search_corpus to include exist_ok=True
(i.e., project.mkdir(parents=True, exist_ok=True)) so the function is robust to
repeated runs or pre-existing directories.

---

Nitpick comments:
In `@tests/benchmarks/conftest.py`:
- Around line 16-31: The write_jsonl function currently calls
json.loads(TEMPLATE_LINE) inside the for-loop causing repeated parsing; parse
TEMPLATE_LINE once before the loop (e.g., template = json.loads(TEMPLATE_LINE))
and inside the loop use copy.deepcopy(template) to get a fresh entry, then
modify entry["timestamp"] and the message content as before; ensure copy is
imported (from copy import deepcopy or import copy) and replace
json.loads(TEMPLATE_LINE) references with deepcopy(template) to eliminate the
repeated parse.
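The two conftest suggestions in this review (parse the template once and deep-copy it per row, and create the project directory with `exist_ok=True`) can be sketched together as follows. This is a minimal sketch under stated assumptions: the `TEMPLATE_LINE` value here is a made-up one-line record rather than the real fixture content, and `seed_project_dir` is a hypothetical helper standing in for the directory setup inside `seed_search_corpus`.

```python
import copy
import json
from pathlib import Path

# Hypothetical template; the real suite reads this line from a fixture file.
TEMPLATE_LINE = '{"type": "user", "timestamp": "", "message": {"content": "hello"}}'


def write_jsonl(path: Path, line_count: int) -> Path:
    """Write line_count JSONL rows derived from the template."""
    template = json.loads(TEMPLATE_LINE)  # parsed once, outside the loop
    with path.open("w", encoding="utf-8") as f:
        for i in range(line_count):
            # deepcopy so nested dicts are not shared between rows
            entry = copy.deepcopy(template)
            entry["timestamp"] = f"2026-06-12T10:{i % 60:02d}:00Z"
            f.write(json.dumps(entry, separators=(",", ":")) + "\n")
    return path


def seed_project_dir(root: Path) -> Path:
    """Create the bench-project directory; safe to call more than once."""
    project = root / "bench-project"
    project.mkdir(parents=True, exist_ok=True)
    return project
```

Calling `seed_project_dir` twice on the same root does not raise, which is the robustness property the inline comment asks for.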

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a9d7b84-0ef6-4ab8-93a8-4ddfb51b4e73

📥 Commits

Reviewing files that changed from the base of the PR and between e38b910 and 68ba23c.

📒 Files selected for processing (12)

.github/workflows/ci.yml
CONTRIBUTING.md
benchmarks/README.md
benchmarks/baselines.json
pyproject.toml
requirements-dev.txt
tests/benchmarks/__init__.py
tests/benchmarks/conftest.py
tests/benchmarks/test_export_bench.py
tests/benchmarks/test_parse_bench.py
tests/benchmarks/test_parse_memory.py
tests/benchmarks/test_search_bench.py

✅ Files skipped from review due to trivial changes (2)

benchmarks/README.md
benchmarks/baselines.json

🚧 Files skipped from review as they are similar to previous changes (8)

.github/workflows/ci.yml
pyproject.toml
requirements-dev.txt
tests/benchmarks/test_parse_bench.py
tests/benchmarks/test_search_bench.py
tests/benchmarks/test_export_bench.py
tests/benchmarks/test_parse_memory.py
CONTRIBUTING.md

Parse the JSONL template once per write_jsonl call and deepcopy entries in the loop. Use exist_ok=True when creating bench-project in seed_search_corpus.

clean6378-max-it · 2026-06-11T21:12:17Z

@CodeRabbit please review this PR in more depth

coderabbitai · 2026-06-11T21:14:26Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== tests/benchmarks/conftest.py ===" && cat tests/benchmarks/conftest.py
echo ""
echo "=== tests/benchmarks/test_parse_bench.py ===" && cat tests/benchmarks/test_parse_bench.py
echo ""
echo "=== tests/benchmarks/test_export_bench.py ===" && cat tests/benchmarks/test_export_bench.py
echo ""
echo "=== tests/benchmarks/test_search_bench.py ===" && cat tests/benchmarks/test_search_bench.py
echo ""
echo "=== tests/benchmarks/test_parse_memory.py ===" && cat tests/benchmarks/test_parse_memory.py
echo ""
echo "=== pyproject.toml ===" && cat pyproject.toml
echo ""
echo "=== .github/workflows/ci.yml ===" && cat .github/workflows/ci.yml
echo ""
echo "=== benchmarks/README.md ===" && cat benchmarks/README.md
echo ""
echo "=== requirements-dev.txt ===" && cat requirements-dev.txt
echo ""
echo "=== CONTRIBUTING.md ===" && cat CONTRIBUTING.md
echo ""
echo "=== benchmarks/baselines.json ===" && cat benchmarks/baselines.json

Length of output: 23586

🏁 Script executed:

#!/bin/bash
# Check the fixtures directory and what session_with_tools.jsonl looks like
echo "=== Fixtures directory ===" && ls tests/fixtures/ 2>/dev/null || ls fixtures/ 2>/dev/null
echo ""
echo "=== session_with_tools.jsonl (first 3 lines) ===" && head -3 tests/fixtures/session_with_tools.jsonl 2>/dev/null || head -3 fixtures/session_with_tools.jsonl 2>/dev/null
echo ""
# Check existing utils for parse_session and run_bulk_export signatures
echo "=== parse_session function ===" 
ast-grep --pattern 'def parse_session($_) {$$$}'
echo ""
echo "=== run_bulk_export function ===" 
ast-grep --pattern 'def run_bulk_export($$$) {$$$}'
echo ""
# Check for NoopSink
echo "=== NoopSink ===" && rg -n "NoopSink" --type py -C 3
echo ""
# Check create_app signature
echo "=== create_app ===" && rg -n "def create_app" --type py -C 5

Length of output: 5766

@clean6378-max-it — here is an in-depth review of PR #76.

Deep Review: PR `#76` — Parse/Export/Search Performance Benchmarks

`tests/benchmarks/conftest.py`

1. Module-level file I/O is fragile (medium)

TEMPLATE_LINE is read at import time, before any test runs:

TEMPLATE_LINE = (FIXTURES / "session_with_tools.jsonl").read_text(...).splitlines()[0]

If the fixture file is missing or the path changes, the read raises FileNotFoundError while conftest.py is being imported, so pytest aborts collection of every benchmark with a conftest import error instead of a failure tied to a specific test. Move this inside write_jsonl or into a @pytest.fixture(scope="session") so the failure surfaces as a clear pytest error at the point of use.
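One way to defer the read is a small cached loader that is only called when a corpus is actually generated. The sketch below is an assumption about how that could look, not the project's code: `load_template_line` is a hypothetical name, and it could equally be wrapped in a session-scoped fixture.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_template_line(fixture_path: Path) -> str:
    """Return the first line of the template fixture, read on first use."""
    if not fixture_path.is_file():
        # Fail with a message that names the missing file explicitly.
        raise FileNotFoundError(f"benchmark template fixture not found: {fixture_path}")
    lines = fixture_path.read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError(f"benchmark template fixture is empty: {fixture_path}")
    return lines[0]
```

Because `lru_cache` does not cache raised exceptions, a later call succeeds once the file exists, and repeated successful calls read the file only once.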

2. `export_corpus` fixture scope vs. parametrized size (medium)

export_corpus is function-scoped (default @pytest.fixture). For the 100-session parametrized case, which writes 100 files of 20 lines each (2,000 JSONL lines in total), pytest recreates the corpus directory on every test invocation. Upgrade to scope="session" with tmp_path_factory to match the parse fixtures:

@pytest.fixture(scope="session", params=[10, 50, 100], ids=["sessions-10", "sessions-50", "sessions-100"])
def export_corpus(tmp_path_factory: pytest.TempPathFactory, request: pytest.FixtureRequest) -> Path:
    count = request.param
    root = tmp_path_factory.mktemp(f"export_corpus_{count}")
    project = root / "bench-project"
    project.mkdir()
    for i in range(count):
        write_jsonl(project / f"session_{i:04d}.jsonl", 20)
    return project

Note: with indirect=True in the test, the parametrize values still flow through request.param correctly.

3. `bench_client_search_corpus` should also be session-scoped (low-medium)

While only one test currently uses this fixture, keeping it function-scoped means a future second test pays the full 50-session + Flask-app setup cost again. Session scope costs nothing here.

4. Timestamp cycling in `write_jsonl` (low)

entry["timestamp"] = f"2026-06-12T10:{i % 60:02d}:00Z"

For line_count=5000 this produces 83 full cycles with duplicate timestamps. If parse_session deduplicates or sorts on timestamp, the large-file benchmark exercises artificially compressed data. Use a more unique scheme:

minute = (i // 60) % 60
second = i % 60
entry["timestamp"] = f"2026-06-12T{10 + i // 3600:02d}:{minute:02d}:{second:02d}Z"
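The effect of the cycling scheme can be checked with a few lines of standard-library code. The sketch below compares the minute-cycling format string quoted above with a `datetime`-based alternative that adds one second per row; the alternative is an illustration, not necessarily the scheme the project adopted, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BASE = datetime(2026, 6, 12, 10, 0, 0, tzinfo=timezone.utc)


def cycling_timestamp(i: int) -> str:
    """The scheme under review: only the minute field varies, modulo 60."""
    return f"2026-06-12T10:{i % 60:02d}:00Z"


def unique_timestamp(i: int) -> str:
    """Alternative: advance one second per row so values never repeat."""
    return (BASE + timedelta(seconds=i)).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    n = 5000
    print(len({cycling_timestamp(i) for i in range(n)}))  # 60 distinct values
    print(len({unique_timestamp(i) for i in range(n)}))   # 5000 distinct values
```

Using `timedelta` also avoids hand-rolled hour and minute arithmetic, so the format stays valid if the corpus grows past a day's worth of rows.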

`tests/benchmarks/test_parse_memory.py`

5. `tracemalloc.clear_traces()` missing — contamination risk (high)

tracemalloc.start()
try:
    parse_session(str(path))
    _, peak = tracemalloc.get_traced_memory()
finally:
    tracemalloc.stop()

If any prior test (including a pytest plugin or coverage tool) called tracemalloc.start() without stopping, calling it again here is a no-op. Worse, get_traced_memory() will return the peak since that earlier start, inflating the number. Fix:

tracemalloc.start()
tracemalloc.clear_traces()   # ← add this
try:
    parse_session(str(path))
    _, peak = tracemalloc.get_traced_memory()
finally:
    tracemalloc.stop()
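A slightly more defensive variant is sketched below. It is an illustration under assumptions, not the code merged in this PR: `measure_peak_bytes` is a hypothetical helper, it uses `tracemalloc.reset_peak()` (available since Python 3.9) rather than `clear_traces()`, and it only stops tracing if it was the one that started it.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def measure_peak_bytes(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Run func and return (result, peak traced bytes above the starting level)."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    # Record the current level, then discard any peak left over from earlier code.
    baseline, _ = tracemalloc.get_traced_memory()
    tracemalloc.reset_peak()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Leave tracing on if something else had enabled it before this call.
        if not was_tracing:
            tracemalloc.stop()
    return result, max(0, peak - baseline)
```

A memory ceiling test could then compare the returned peak with `10 * path.stat().st_size`, which mirrors the 10× on-disk bound described in the PR summary.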

6. No assertion on `parse_session` return value (low)

If parse_session silently returns an empty result (e.g., due to a schema mismatch in generated data), the memory test still passes while proving nothing. Add a minimal sanity check:

result = parse_session(str(path))
assert result is not None and len(result) > 0, "parse_session returned empty — corpus data invalid"

`pyproject.toml`

7. `per-file-ignores` glob doesn't cover benchmark subdirectory (medium)

[tool.ruff.lint.per-file-ignores]
"tests/*.py" = ["E402"]

The glob tests/*.py matches files directly in tests/, but not tests/benchmarks/conftest.py or any file in subdirectories. Ruff will enforce E402 (import-not-at-top) on the benchmark files if they ever need a sys.path prepend. Fix:

"tests/**/*.py" = ["E402"]

Or add an explicit entry:

"tests/benchmarks/*.py" = ["E402"]

`.github/workflows/ci.yml` — `benchmarks` job

8. Artifact upload skipped when benchmarks fail (medium)

- name: Run benchmarks
  run: >
    pytest tests/benchmarks/ --benchmark-only ...

- uses: actions/upload-artifact@...
  with:
    name: benchmark-results
    path: benchmark-results.json

If the pytest step exits non-zero (import error, fixture crash, etc.), the upload step is skipped entirely. Add if: always() to preserve the partial JSON for debugging:

- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
  if: always()
  with:
    name: benchmark-results
    path: benchmark-results.json

9. Job fails loudly but is labelled "informational" — consider `continue-on-error` (low)

The job name says "informational" and the README says there is no gate, yet a benchmark crash will mark the CI run red. Since it has no needs: gate either, it may block merge if added to branch protection later. Consider:

benchmarks:
  name: Performance benchmarks (informational)
  runs-on: ubuntu-latest
  continue-on-error: true   # ← prevents a flaky benchmark from blocking merges

10. `benchmarks` job has no dependency on `pytest` passing (low)

The benchmarks job starts in parallel with all other jobs and could spend CI minutes on a commit that already fails unit tests. Adding needs: [pytest] would avoid that waste:

benchmarks:
  needs: [pytest]

`benchmarks/README.md`

11. README placement vs. actual test location (low)

The README lives at benchmarks/README.md but all test code lives under tests/benchmarks/. The top of the README should make that explicit so new contributors don't hunt for benchmark files:

> Test files live under `tests/benchmarks/`; this directory holds only documentation and the informational `baselines.json` snapshot.

Minor / Nits

Location	Issue
All test files	`benchmark` fixture parameter has no type hint. Consider `from pytest_benchmark.fixture import BenchmarkFixture` and annotating as `benchmark: BenchmarkFixture`.
`conftest.py` L33	`entry.setdefault("message", {})` is never a non-dict for the given fixture; the `isinstance(msg, dict)` guard is redundant, but harmless.
`baselines.json`	Committing an all-null snapshot is fine as a placeholder, but `"updated": null, "machine": null` could be dropped and documented as optional fields to reduce confusion.
`test_export_bench.py`	The inner `_run` closure captures `export_corpus` from the outer scope. Works correctly; just worth a comment since it's non-obvious.

Summary table

#	Severity	File	Issue
1	Medium	`conftest.py`	Module-level file I/O — fragile on missing fixture
2	Medium	`conftest.py`	`export_corpus` function-scoped, should be session
3	Low	`conftest.py`	`bench_client_search_corpus` function-scoped
4	Low	`conftest.py`	Timestamp cycles for large files
5	High	`test_parse_memory.py`	Missing `tracemalloc.clear_traces()` — contamination risk
6	Low	`test_parse_memory.py`	No assertion on `parse_session` return value
7	Medium	`pyproject.toml`	`per-file-ignores` glob misses `tests/benchmarks/`
8	Medium	`ci.yml`	Artifact upload not guarded by `if: always()`
9	Low	`ci.yml`	No `continue-on-error: true` for informational job
10	Low	`ci.yml`	No `needs: [pytest]` guard
11	Low	`benchmarks/README.md`	Doesn't state actual test file location

The most actionable fixes are #5 (tracemalloc contamination), #2 (fixture scope for corpus creation), #7 (ruff glob), and #8 (CI artifact upload guard). The rest are low-risk polish items.

clean6378-max-it · 2026-06-11T21:27:01Z

Items 5, 6, 7, and 11 are low priority. The rest are optional; no need to fix.

@CodeRabbit

coderabbitai · 2026-06-11T21:27:17Z

Tip

For best results, initiate chat on the files or code changes.

@clean6378-max-it — Understood, thanks for clarifying! All noted:

#5, #6, #7, #11 — acknowledged as low priority; can be revisited in a follow-up if needed.
#1, #2, #3, #4, #8, #9, #10 — optional polish items, won't be pursued in this PR.

The core benchmark harness, CI job, fixtures, and documentation are all in good shape. Happy to help if anything else comes up. 🐇

Reset tracemalloc peak before measuring large-file parse, assert non-empty message count, extend E402 per-file-ignores to tests/**, and clarify README that benchmark tests live under tests/benchmarks/.

Drop unnecessary actions: write on benchmarks CI job; assert explicit search hit count on list response; document 10x memory ceiling and v1 template limitations in README; note NoopSink export rounds are stateless.

coderabbitai Bot reviewed Jun 11, 2026

Comment thread requirements-dev.txt Outdated

Comment thread tests/benchmarks/conftest.py Outdated

coderabbitai Bot reviewed Jun 11, 2026

clean6378-max-it added 2 commits June 12, 2026 00:54

test: add parse/export/search performance benchmarks and CI artifacts

bfdd7c3

clean6378-max-it force-pushed the chore/parse-export-benchmarks branch from e38b910 to 68ba23c Compare June 11, 2026 16:56

coderabbitai Bot reviewed Jun 11, 2026

Comment thread tests/benchmarks/conftest.py

refactor(benchmarks): harden corpus helpers in conftest

c1d3030

Parse the JSONL template once per write_jsonl call and deepcopy entries in the loop. Use exist_ok=True when creating bench-project in seed_search_corpus.

clean6378-max-it requested a review from timon0305 June 11, 2026 20:45

fix(benchmarks): harden memory test and ruff test glob

ea3b1ca

Reset tracemalloc peak before measuring large-file parse, assert non-empty message count, extend E402 per-file-ignores to tests/**, and clarify README that benchmark tests live under tests/benchmarks/.

timon0305 requested changes Jun 12, 2026

Comment thread .github/workflows/ci.yml Outdated

Comment thread tests/benchmarks/test_search_bench.py Outdated

Comment thread tests/benchmarks/test_parse_memory.py

Comment thread tests/benchmarks/test_export_bench.py

Comment thread tests/benchmarks/conftest.py

fix(benchmarks): address PR #76 review feedback from @timon0305

800244a

Drop unnecessary actions: write on benchmarks CI job; assert explicit search hit count on list response; document 10x memory ceiling and v1 template limitations in README; note NoopSink export rounds are stateless.

clean6378-max-it requested a review from timon0305 June 12, 2026 13:39

timon0305 approved these changes Jun 12, 2026

timon0305 requested a review from wpak-ai June 12, 2026 13:46

wpak-ai approved these changes Jun 12, 2026

wpak-ai merged commit 0435b2d into master Jun 12, 2026
16 checks passed

wpak-ai deleted the chore/parse-export-benchmarks branch June 12, 2026 15:52
