test: add parse/export/search performance benchmarks and CI artifacts#76
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds pytest-benchmark benchmarks (parse, export, search), fixtures to synthesize JSONL corpora (small/medium/large and multi-session export/search corpora), a tracemalloc memory ceiling test, CI job uploading benchmark JSON, and documentation for local runs and baselines. ChangesPerformance Benchmark Suite
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@requirements-dev.txt`:
- Line 8: Update verification for the pytest-benchmark pin: confirm that the
project's benchmarks and test suite are compatible with pytest-benchmark 5.x by
running CI/tests with pytest-benchmark==5.2.3 (or latest) and reviewing 5.x
release notes for breaking changes; if incompatible or unsafe, pin
requirements-dev.txt to a specific compatible version (e.g.,
pytest-benchmark==4.0.0) or add an upper bound (e.g.,
pytest-benchmark>=4.0.0,<5.0.0) and document the decision in the PR, referencing
the package name pytest-benchmark when making the change.
In `@tests/benchmarks/conftest.py`:
- Line 26: This is a static-analysis false positive: keep using json.dumps when
writing to the file but annotate the call site (the f.write(json.dumps(entry,
separators=(",", ":")) + "\n") expression) with a short inline comment
explaining that jsonify is a Flask helper for HTTP responses and not for file
serialization, and add a suppression directive for the static analyzer (e.g., an
ast-grep/linters-ignore comment) on that line so the hint is not raised again.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f2f5f062-13bd-46db-a7f0-dd9ac150905b
📒 Files selected for processing (12)
.github/workflows/ci.ymlCONTRIBUTING.mdbenchmarks/README.mdbenchmarks/baselines.jsonpyproject.tomlrequirements-dev.txttests/benchmarks/__init__.pytests/benchmarks/conftest.pytests/benchmarks/test_export_bench.pytests/benchmarks/test_parse_bench.pytests/benchmarks/test_parse_memory.pytests/benchmarks/test_search_bench.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/benchmarks/conftest.py (1)
26-30: 💤 Low valueDocumentation is good; suppression format note.
The inline comment clearly documents why
json.dumpsis correct for file I/O (not Flask's HTTPjsonify), which addresses the past review feedback well. However, thelinters-ignoredirective on line 28 is not a recognized suppression format for ast-grep, so the [info] hint may still appear. Since the code is correct and well-documented, this is acceptable—the comment's documentation value is more important than suppressing the false-positive hint.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/conftest.py` around lines 26 - 30, The inline suppression "linters-ignore: prefer-jsonify" is not a recognized ast-grep format and should be removed; keep the explanatory comment and the json.dumps call that builds the serialized variable, and if you need to silence the false positive instead add the repository's supported suppression syntax (use the project's documented ast-grep/flake8/pytest suppression keyword) next to the json.dumps line or in a nearby comment so the linter recognizes it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/benchmarks/conftest.py`:
- Around line 26-30: The inline suppression "linters-ignore: prefer-jsonify" is
not a recognized ast-grep format and should be removed; keep the explanatory
comment and the json.dumps call that builds the serialized variable, and if you
need to silence the false positive instead add the repository's supported
suppression syntax (use the project's documented ast-grep/flake8/pytest
suppression keyword) next to the json.dumps line or in a nearby comment so the
linter recognizes it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 575ac0e3-08ed-4c88-ab95-b08edf66daae
📒 Files selected for processing (2)
requirements-dev.txttests/benchmarks/conftest.py
…tures Pin pytest-benchmark to 5.2.3 after verifying compatibility with pytest 9.0 and the benchmark suite. Annotate benchmark JSONL serialization to document that json.dumps is intentional for file I/O, not Flask jsonify.
e38b910 to
68ba23c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/benchmarks/conftest.py (1)
16-31: ⚡ Quick winParse the template once outside the loop.
The
json.loads(TEMPLATE_LINE)call on line 20 is executed for every iteration, re-parsing the same JSON string up to 5,000 times for the large fixture. Parsing once and usingcopy.deepcopy()in the loop would eliminate this overhead and improve fixture generation performance.♻️ Proposed refactor
+import copy import json from pathlib import Path ... def write_jsonl(path: Path, line_count: int) -> Path: """Write a JSONL session file with *line_count* rows derived from the template fixture.""" + template = json.loads(TEMPLATE_LINE) with path.open("w", encoding="utf-8") as f: for i in range(line_count): - entry = json.loads(TEMPLATE_LINE) + entry = copy.deepcopy(template) entry["timestamp"] = f"2026-06-12T10:{i % 60:02d}:00Z"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/conftest.py` around lines 16 - 31, The write_jsonl function currently calls json.loads(TEMPLATE_LINE) inside the for-loop causing repeated parsing; parse TEMPLATE_LINE once before the loop (e.g., template = json.loads(TEMPLATE_LINE)) and inside the loop use copy.deepcopy(template) to get a fresh entry, then modify entry["timestamp"] and the message content as before; ensure copy is imported (from copy import deepcopy or import copy) and replace json.loads(TEMPLATE_LINE) references with deepcopy(template) to eliminate the repeated parse.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/benchmarks/conftest.py`:
- Around line 34-45: The seed_search_corpus function creates the bench-project
directory with project.mkdir(parents=True) which can fail if the directory
already exists; update the call in seed_search_corpus to include exist_ok=True
(i.e., project.mkdir(parents=True, exist_ok=True)) so the function is robust to
repeated runs or pre-existing directories.
---
Nitpick comments:
In `@tests/benchmarks/conftest.py`:
- Around line 16-31: The write_jsonl function currently calls
json.loads(TEMPLATE_LINE) inside the for-loop causing repeated parsing; parse
TEMPLATE_LINE once before the loop (e.g., template = json.loads(TEMPLATE_LINE))
and inside the loop use copy.deepcopy(template) to get a fresh entry, then
modify entry["timestamp"] and the message content as before; ensure copy is
imported (from copy import deepcopy or import copy) and replace
json.loads(TEMPLATE_LINE) references with deepcopy(template) to eliminate the
repeated parse.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7a9d7b84-0ef6-4ab8-93a8-4ddfb51b4e73
📒 Files selected for processing (12)
.github/workflows/ci.ymlCONTRIBUTING.mdbenchmarks/README.mdbenchmarks/baselines.jsonpyproject.tomlrequirements-dev.txttests/benchmarks/__init__.pytests/benchmarks/conftest.pytests/benchmarks/test_export_bench.pytests/benchmarks/test_parse_bench.pytests/benchmarks/test_parse_memory.pytests/benchmarks/test_search_bench.py
✅ Files skipped from review due to trivial changes (2)
- benchmarks/README.md
- benchmarks/baselines.json
🚧 Files skipped from review as they are similar to previous changes (8)
- .github/workflows/ci.yml
- pyproject.toml
- requirements-dev.txt
- tests/benchmarks/test_parse_bench.py
- tests/benchmarks/test_search_bench.py
- tests/benchmarks/test_export_bench.py
- tests/benchmarks/test_parse_memory.py
- CONTRIBUTING.md
Parse the JSONL template once per write_jsonl call and deepcopy entries in the loop. Use exist_ok=True when creating bench-project in seed_search_corpus.
|
@CodeRabbit please review this PR inn more depth |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== tests/benchmarks/conftest.py ===" && cat tests/benchmarks/conftest.py
echo ""
echo "=== tests/benchmarks/test_parse_bench.py ===" && cat tests/benchmarks/test_parse_bench.py
echo ""
echo "=== tests/benchmarks/test_export_bench.py ===" && cat tests/benchmarks/test_export_bench.py
echo ""
echo "=== tests/benchmarks/test_search_bench.py ===" && cat tests/benchmarks/test_search_bench.py
echo ""
echo "=== tests/benchmarks/test_parse_memory.py ===" && cat tests/benchmarks/test_parse_memory.py
echo ""
echo "=== pyproject.toml ===" && cat pyproject.toml
echo ""
echo "=== .github/workflows/ci.yml ===" && cat .github/workflows/ci.yml
echo ""
echo "=== benchmarks/README.md ===" && cat benchmarks/README.md
echo ""
echo "=== requirements-dev.txt ===" && cat requirements-dev.txt
echo ""
echo "=== CONTRIBUTING.md ===" && cat CONTRIBUTING.md
echo ""
echo "=== benchmarks/baselines.json ===" && cat benchmarks/baselines.jsonLength of output: 23586 🏁 Script executed: #!/bin/bash
# Check the fixtures directory and what session_with_tools.jsonl looks like
echo "=== Fixtures directory ===" && ls tests/fixtures/ 2>/dev/null || ls fixtures/ 2>/dev/null
echo ""
echo "=== session_with_tools.jsonl (first 3 lines) ===" && head -3 tests/fixtures/session_with_tools.jsonl 2>/dev/null || head -3 fixtures/session_with_tools.jsonl 2>/dev/null
echo ""
# Check existing utils for parse_session and run_bulk_export signatures
echo "=== parse_session function ==="
ast-grep --pattern 'def parse_session($_) {$$$}'
echo ""
echo "=== run_bulk_export function ==="
ast-grep --pattern 'def run_bulk_export($$$) {$$$}'
echo ""
# Check for NoopSink
echo "=== NoopSink ===" && rg -n "NoopSink" --type py -C 3
echo ""
# Check create_app signature
echo "=== create_app ===" && rg -n "def create_app" --type py -C 5Length of output: 5766
Deep Review: PR
|
| Location | Issue |
|---|---|
| All test files | benchmark fixture parameter has no type hint. Consider from pytest_benchmark.fixture import BenchmarkFixture and annotating as benchmark: BenchmarkFixture. |
conftest.py L33 |
entry.setdefault("message", {}) is never a non-dict for the given fixture; the isinstance(msg, dict) guard is redundant, but harmless. |
baselines.json |
Committing an all-null snapshot is fine as a placeholder, but "updated": null, "machine": null could be dropped and documented as optional fields to reduce confusion. |
test_export_bench.py |
The inner _run closure captures export_corpus from the outer scope. Works correctly; just worth a comment since it's non-obvious. |
Summary table
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | Medium | conftest.py |
Module-level file I/O — fragile on missing fixture |
| 2 | Medium | conftest.py |
export_corpus function-scoped, should be session |
| 3 | Low | conftest.py |
bench_client_search_corpus function-scoped |
| 4 | Low | conftest.py |
Timestamp cycles for large files |
| 5 | High | test_parse_memory.py |
Missing tracemalloc.clear_traces() — contamination risk |
| 6 | Low | test_parse_memory.py |
No assertion on parse_session return value |
| 7 | Medium | pyproject.toml |
per-file-ignores glob misses tests/benchmarks/ |
| 8 | Medium | ci.yml |
Artifact upload not guarded by if: always() |
| 9 | Low | ci.yml |
No continue-on-error: true for informational job |
| 10 | Low | ci.yml |
No needs: [pytest] guard |
| 11 | Low | benchmarks/README.md |
Doesn't state actual test file location |
The most actionable fixes are #5 (tracemalloc contamination), #2 (fixture scope for corpus creation), #7 (ruff glob), and #8 (CI artifact upload guard). The rest are low-risk polish items.
|
5, 6, 7, 11 are low priority, Rest are optional , no need to fix |
|
Tip For best results, initiate chat on the files or code changes.
The core benchmark harness, CI job, fixtures, and documentation are all in good shape. Happy to help if anything else comes up. 🐇 |
Reset tracemalloc peak before measuring large-file parse, assert non-empty message count, extend E402 per-file-ignores to tests/**, and clarify README that benchmark tests live under tests/benchmarks/.
Drop unnecessary actions: write on benchmarks CI job; assert explicit search hit count on list response; document 10x memory ceiling and v1 template limitations in README; note NoopSink export rounds are stateless.
Closes #74
Summary
pytest-benchmarkharness undertests/benchmarks/for parse, bulk export, and search pathstracemallocmemory ceiling test for large-file parse (peak < 10× on-disk file size)benchmarksCI job that uploadsbenchmark-results.jsonas an artifactbenchmarks/README.mdand link fromCONTRIBUTING.mdpytestvia--benchmark-skip; memory test runs in main CISprint items
Closes Chen W2 #7
Test plan
pytest tests/benchmarks/ --benchmark-only -o addopts= -vpytest tests/benchmarks/test_parse_memory.py -vpytest -qgreen locallymypy -p api -p utils -p modelsgreenruff check .andruff format --check .greenbenchmarksjob completes andbenchmark-resultsartifact is downloadableSummary by CodeRabbit
Tests
Documentation
Chores