Skip to content

fix(export): surface bulk partial failures via X-Export-Warnings header#73

Merged
wpak-ai merged 5 commits into
masterfrom
fix/export-partial-failure-warnings
Jun 10, 2026
Merged

fix(export): surface bulk partial failures via X-Export-Warnings header#73
wpak-ai merged 5 commits into
masterfrom
fix/export-partial-failure-warnings

Conversation

@clean6378-max-it

@clean6378-max-it clean6378-max-it commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

closes #69

Summary

Surfaces per-session bulk export failures to HTTP clients instead of silently omitting failed sessions from the ZIP. Keeps the stable 200 + ZIP body contract from Week 1; partial failures are signaled via X-Export-Warnings.

Problem

POST /api/export caught per-session parse/export errors, logged them, and returned HTTP 200 with a ZIP missing failed sessions. BulkExportResult.failure_count was populated but never exposed to the client.

Changes

  • utils/export_engine.py — ExportFailure dataclass; BulkExportResult.failures; failure_code_for_exception() maps parse vs post-parse errors to PARSE_ERROR / INTERNAL_ERROR
  • api/export_api.py — response routing:
    • All succeed → 200 ZIP (no header)
    • Partial fail → 200 ZIP + X-Export-Warnings JSON header
    • All candidates fail → 422 + EXPORT_ALL_FAILED + failures in JSON body
    • Nothing to export → 422 + EXPORT_NOTHING_TO_EXPORT (unchanged)
  • api/error_codes.py — EXPORT_ALL_FAILED
  • docs/api-reference.md — documents header shape and new error code
  • tests/test_export_api_bulk.py — all-succeed, partial-fail, all-fail, export-state bookmark tests

X-Export-Warnings shape:

[{"session_id": "session_def456", "code": "PARSE_ERROR", "message": "human-readable detail"}]

Export state bookmark (_write_state) only includes successfully exported session IDs — failed sessions are not advanced in incremental mode.

Acceptance criteria

  • Machine-readable list of skipped sessions on partial export
  • All-fail returns error status, not 200
  • Partial export: 200 ZIP + X-Export-Warnings
  • Structured ErrorCode values used
  • docs/api-reference.md updated
  • Tests: all-succeed, partial-fail, all-fail, state excludes failures
  • pytest, mypy, ruff check ., ruff format --check . pass locally

Test plan

pytest tests/test_export_api_bulk.py -v
pytest -q
mypy -p api -p utils -p models
ruff check .
ruff format --check .

Manual smoke: POST /api/export against a corpus with one corrupt .jsonl → inspect X-Export-Warnings header.

Out of scope

  • CLI export warning surfacing (HTTP only)
  • TypedDict parse boundary (Tuesday PR 1)
  • ZIP layout changes beyond existing manifest.jsonl

Dependencies

Sprint item #2 · 5 pt · branch fix/export-partial-failure-warnings

Summary by CodeRabbit

  • New Features

    • Export now reports structured failures: partial exports return HTTP 200 with a bounded/truncated X-Export-Warnings header plus export-warnings.json inside the ZIP; if every session fails, endpoint returns HTTP 422 with EXPORT_ALL_FAILED and a failures payload.
    • Error codes centralized and standardized across the API for stable machine-readable codes.
  • Documentation

    • API reference updated to document EXPORT_ALL_FAILED, partial-export behavior, header payload shape, and in‑ZIP warnings file.
  • Tests

    • Expanded bulk-export tests (success, partial fail, all-fail), header truncation/size fallback, state persistence excluding failed sessions, and CLI export exit behavior.

Accumulate structured ExportFailure entries in BulkExportResult, return
422 EXPORT_ALL_FAILED when every candidate fails, and attach X-Export-Warnings
on partial 200 ZIP responses without changing the stable ZIP body contract.
@clean6378-max-it clean6378-max-it self-assigned this Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65dcaae9-2d63-467d-a873-fd0dfcf6da66

📥 Commits

Reviewing files that changed from the base of the PR and between 58b9581 and 3f00fd8.

📒 Files selected for processing (2)
  • tests/test_cli_export_exit_codes.py
  • utils/export_engine.py

📝 Walkthrough

Walkthrough

This PR captures per-session export failures as structured ExportFailure entries, exposes them via a bounded X-Export-Warnings header and export-warnings.json inside ZIPs for partial success, and returns ErrorCode.EXPORT_ALL_FAILED (HTTP 422) with failures when every attempted session fails; tests and docs updated accordingly.

Changes

Structured bulk-export failure reporting

Layer / File(s) Summary
Error codes and failure data model
models/error_codes.py, api/error_codes.py, utils/export_engine.py
Adds shared ErrorCode enum; imports ErrorCode; ExportFailure dataclass; failure_code_for_exception() and failure_message_for_code() helpers; BulkExportResult.failures list.
Bulk-export failure recording
utils/export_engine.py
_record_failure() now accepts phase and appends ExportFailure with computed ErrorCode; export-phase exceptions recorded with phase="export".
API failure serialization and response mapping
api/export_api.py
Adds JSON serialization helpers, header/ZIP filename and size constants, bounded header payload builder, writes export-warnings.json into the ZIP on partial success, sets X-Export-Warnings header when failures exist, and returns EXPORT_ALL_FAILED (422) with failures when no sessions were exported.
Response documentation
docs/api-reference.md
Documents EXPORT_ALL_FAILED (HTTP 422), clarifies partial-export behavior (200 + ZIP of successes, X-Export-Warnings header, and export-warnings.json in ZIP), and clarifies EXPORT_NOTHING_TO_EXPORT.
Comprehensive test coverage
tests/test_export_api_bulk.py, tests/test_cli_export_exit_codes.py
Adds tests: all succeed (200, no header), partial parse failure (200, header + ZIP warnings), all parse fail (422 with EXPORT_ALL_FAILED and failures list), header truncation/byte-size fallback tests, persistence/state tests ensuring failed sessions are excluded from saved state, and updates CLI-exit tests to use ExportFailure objects.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • timon0305
  • wpak-ai

Poem

🐰 When exports skip a session or two,
I note the why and bundle what’s true —
Warnings tucked inside the ZIP we send,
Headers whisper which ones failed to mend.
Hop, hop, clarity — the export’s new view!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: surfacing bulk partial export failures via the X-Export-Warnings HTTP header, which is the primary objective of this PR.
Linked Issues check ✅ Passed All coding objectives from issue #69 are met: structured failures accumulated in BulkExportResult, X-Export-Warnings header emitted for partial success, 422+EXPORT_ALL_FAILED for all-fail, export state excludes failed sessions, ErrorCode extended, docs updated, comprehensive tests added.
Out of Scope Changes check ✅ Passed All changes directly support the stated objectives: error code enum consolidation, failure tracking, response routing, documentation, and test coverage are all scope-aligned with issue #69.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/export-partial-failure-warnings

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/export_api.py`:
- Around line 180-182: The current code writes the entire failure_payload into
resp.headers["X-Export-Warnings"] which can exceed header size limits; change
this to a bounded header strategy: when result.failures is non-empty, compute a
compact summary (e.g., total_failures count, first N failure summaries or IDs)
and set resp.headers["X-Export-Warnings"] to a JSON object containing the count,
truncated=true flag, and the sampled entries instead of the full
failure_payload; put the full failure_payload into the response body/manifest
(or a downloadable URL) for clients that need details. Ensure you update the
logic that populates failure_payload and references to result.failures so the
header uses only the capped sample and includes truncation metadata.

In `@utils/export_engine.py`:
- Around line 289-294: The current ExportFailure uses raw exception text
(str(exc)) which can leak sensitive info; instead, map the exception to a
stable, sanitized message using the failure code / ErrorCode mapping and include
that in ExportFailure.message. Update the code around result.failures.append
(the ExportFailure constructor call where session_id=sid and
code=failure_code_for_exception(exc, phase=phase)) to call an existing or new
helper (e.g., failure_code_for_exception -> use its returned code with a lookup
like message_for_error_code(code) or a small function
sanitized_message_for_exception(exc, phase)) and pass that sanitized, stable
text as message rather than str(exc).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 266ab9dd-dd9b-4997-a7dd-a0c62cf82b3b

📥 Commits

Reviewing files that changed from the base of the PR and between 32e60f4 and 367ded9.

📒 Files selected for processing (5)
  • api/error_codes.py
  • api/export_api.py
  • docs/api-reference.md
  • tests/test_export_api_bulk.py
  • utils/export_engine.py

Comment thread api/export_api.py
Comment thread utils/export_engine.py
@clean6378-max-it

Copy link
Copy Markdown
Collaborator Author

@CodeRabbit , Can you reeviewe this PR fully?

@clean6378-max-it

Copy link
Copy Markdown
Collaborator Author

@CodeRabbit. check this PR?

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== utils/export_engine.py ==="
cat utils/export_engine.py

echo ""
echo "=== api/export_api.py ==="
cat api/export_api.py

echo ""
echo "=== api/error_codes.py ==="
cat api/error_codes.py

echo ""
echo "=== docs/api-reference.md ==="
cat docs/api-reference.md

echo ""
echo "=== tests/test_export_api_bulk.py ==="
cat tests/test_export_api_bulk.py

Length of output: 44546


🏁 Script executed:

#!/bin/bash
echo "=== conftest.py (root) ==="
fd -i "conftest.py" --exec cat {}

echo ""
echo "=== Check failure_count usage ==="
rg -n "failure_count" --type py

echo ""
echo "=== Check ExportFailure import usage ==="
rg -n "ExportFailure" --type py

echo ""
echo "=== Check _EXPORT_WARNINGS_HEADER_MAX constants ==="
rg -n "_EXPORT_WARNINGS_HEADER_MAX" --type py

echo ""
echo "=== Check str(item.code) / StrEnum behaviour ==="
rg -n "StrEnum" --type py

Length of output: 4396



@clean6378-max-it Sure! Here is a full review of PR #73.


PR #73 Review — fix(export): surface bulk partial failures via X-Export-Warnings header

Overall the PR is well-structured, correctly maps the acceptance criteria to code, and the tests cover the main happy/sad paths. A few items worth addressing before merge:


utils/export_engine.py

1. Redundant isinstance check in failure_code_for_exception (Medium)

json.JSONDecodeError is a subclass of ValueError, and ValueError is already in EXPORT_ERRORS. This means the first branch is dead code — the second branch would catch it with the same return value:

def failure_code_for_exception(exc: Exception, *, phase: str = "parse") -> ErrorCode:
    if phase == "export":
        return ErrorCode.INTERNAL_ERROR
    if isinstance(exc, json.JSONDecodeError):   # ← dead: ValueError below already covers it
        return ErrorCode.PARSE_ERROR
    if isinstance(exc, EXPORT_ERRORS):
        return ErrorCode.PARSE_ERROR
    return ErrorCode.INTERNAL_ERROR

The first isinstance branch can be removed without any behaviour change.

2. phase parameter should use Literal (Minor)

The phase parameter accepts any str silently falling back to INTERNAL_ERROR for unknown values. Tightening the type catches misuse at static-analysis time:

from typing import Literal

def failure_code_for_exception(
    exc: Exception,
    *,
    phase: Literal["parse", "export"] = "parse",
) -> ErrorCode:

3. failure_count is now always equal to len(result.failures) (Minor)

_record_failure atomically increments failure_count and appends to failures, so they can never diverge. failure_count is still consumed by scripts/export.py:415, so it cannot be removed without a CLI-side change — but it is worth noting the dual bookkeeping. A follow-up could replace result.failure_count with a @property delegating to len(self.failures).


api/export_api.py

4. _export_warnings_header_payload — single-entry overflow edge case (Minor)

The while-loop's exit condition is len(sample) > 1, so if a single serialized entry were to exceed _EXPORT_WARNINGS_HEADER_MAX_BYTES the function would still return an oversized payload:

while (
    len(json.dumps(payload, separators=(",", ":"))) > _EXPORT_WARNINGS_HEADER_MAX_BYTES
    and len(sample) > 1   # ← loop exits here; payload may still be > 8192 bytes
):

In practice each entry is ≈150 bytes so this cannot happen today, but a defensive hard-limit on the final payload would be safer:

result_json = json.dumps(payload, separators=(",", ":"))
if len(result_json) > _EXPORT_WARNINGS_HEADER_MAX_BYTES:
    # last-resort: omit failures list entirely, keep counts
    payload = {"total_failures": total, "truncated": True, "failures": []}
return payload

5. _serialize_export_failures return type annotation is too narrow (Minor)

The function is declared as returning list[dict[str, str]] but code is str(item.code) (a StrEnum) — which is indeed a str at runtime, so this is correct. However, if a non-string value is ever added to the dict in future, the annotation would be silently wrong. list[dict[str, object]] would be more honest and consistent with _export_warnings_header_payload's return type.


tests/test_export_api_bulk.py

6. No test for byte-level truncation in _export_warnings_header_payload (Minor)

The entry-count cap (_EXPORT_WARNINGS_HEADER_MAX_ENTRIES = 20) and the byte-cap while-loop are both untested. A unit test for _export_warnings_header_payload directly (not through HTTP) would guard against regressions:

from api.export_api import _export_warnings_header_payload
from utils.export_engine import ExportFailure
from api.error_codes import ErrorCode

def test_header_payload_truncates_at_entry_limit():
    failures = [
        ExportFailure(session_id=f"sess_{i:04d}", message="Failed to parse session", code=ErrorCode.PARSE_ERROR)
        for i in range(25)
    ]
    payload = _export_warnings_header_payload(failures)
    assert payload["total_failures"] == 25
    assert payload["truncated"] is True
    assert len(payload["failures"]) <= 20

7. No test for since="incremental" with partial failures (Minor)

All four new tests use since="all". A test verifying that the failed session is excluded from the incremental state bookmark (sessions dict) when since="incremental" would complete coverage of the bookmark semantics.


docs/api-reference.md

8. EXPORT_ALL_FAILED error body failures shape could be clearer (Minor)

The error table says:

failures (same shape as X-Export-Warnings entries)

This is technically correct (each item in both arrays has session_id, code, message) but "X-Export-Warnings entries" is ambiguous because the header value is a wrapper object ({total_failures, truncated, failures}), not a plain array. Consider wording it as:

failures — flat array of {"session_id", "code", "message"} objects (same item shape as the failures array in the X-Export-Warnings header)


Summary table

# Severity File Item
1 Medium utils/export_engine.py Redundant isinstance(exc, json.JSONDecodeError) branch
2 Minor utils/export_engine.py phase should use Literal["parse","export"]
3 Minor utils/export_engine.py failure_count vs len(failures) dual bookkeeping
4 Minor api/export_api.py Single-entry overflow not guarded in _export_warnings_header_payload
5 Minor api/export_api.py _serialize_export_failures return type
6 Minor tests/test_export_api_bulk.py No test for byte-level header truncation
7 Minor tests/test_export_api_bulk.py No test for since="incremental" with partial failures
8 Minor docs/api-reference.md EXPORT_ALL_FAILED failures shape wording ambiguous

The core logic (failure accumulation, response routing, ZIP embedding, state bookmark exclusion) is correct and the acceptance criteria from issue #69 are all met. Items 1 and 4 are the most worth fixing before merge; the rest can reasonably be follow-up work.

Remove redundant JSONDecodeError branch, tighten phase typing, add
last-resort header byte cap, and add truncation/incremental bookmark tests.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_export_api_bulk.py (1)

129-141: ⚡ Quick win

Tighten the assertion for entry-limit truncation.

The test name indicates it's specifically testing entry-limit truncation, and there's a separate test for byte-size overflow (lines 144-156). The assertion on line 141 should use == 20 instead of <= 20 to precisely verify that the entry limit is applied correctly. Using <= weakens the test and could miss regressions in the truncation logic.

✏️ Proposed fix
-    assert len(payload["failures"]) <= 20
+    assert len(payload["failures"]) == 20
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_export_api_bulk.py` around lines 129 - 141, The test
test_export_warnings_header_payload_truncates_at_entry_limit should assert the
exact entry-limit truncation; update the assertion that currently checks
len(payload["failures"]) <= 20 to require len(payload["failures"]) == 20 so the
test precisely verifies _export_warnings_header_payload enforces the 20-entry
cap; modify the assertion in the test function
test_export_warnings_header_payload_truncates_at_entry_limit accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_export_api_bulk.py`:
- Around line 129-141: The test
test_export_warnings_header_payload_truncates_at_entry_limit should assert the
exact entry-limit truncation; update the assertion that currently checks
len(payload["failures"]) <= 20 to require len(payload["failures"]) == 20 so the
test precisely verifies _export_warnings_header_payload enforces the 20-entry
cap; modify the assertion in the test function
test_export_warnings_header_payload_truncates_at_entry_limit accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec63f992-f2f4-478b-b00c-442aac2c9585

📥 Commits

Reviewing files that changed from the base of the PR and between 1029d31 and 2dc2d25.

📒 Files selected for processing (4)
  • api/export_api.py
  • docs/api-reference.md
  • tests/test_export_api_bulk.py
  • utils/export_engine.py
✅ Files skipped from review due to trivial changes (1)
  • docs/api-reference.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • utils/export_engine.py
  • api/export_api.py

Comment thread utils/export_engine.py Outdated
Comment thread utils/export_engine.py
Extract ErrorCode to models/error_codes.py so export_engine no longer
transitively imports Flask via api. Document export-phase exc handling.
@clean6378-max-it

Copy link
Copy Markdown
Collaborator Author

@CodeRabbit , all comments are addressed?

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== utils/export_engine.py ==="
cat utils/export_engine.py

echo ""
echo "=== api/export_api.py ==="
cat api/export_api.py

echo ""
echo "=== tests/test_export_api_bulk.py ==="
cat tests/test_export_api_bulk.py

echo ""
echo "=== docs/api-reference.md ==="
cat docs/api-reference.md

Length of output: 45857



@clean6378-max-it Almost — 7 out of 8 comments have been addressed. Here's the status:

# Severity Item Status
1 Medium Remove redundant isinstance(exc, json.JSONDecodeError) branch ✅ Fixed
2 Minor phase uses Literal["parse", "export"] ✅ Fixed
3 Minor failure_count vs len(failures) dual bookkeeping ⏳ Still open (follow-up)
4 Minor Last-resort byte-cap guard in _export_warnings_header_payload ✅ Fixed
5 Minor _serialize_export_failures return type broadened to list[dict[str, object]] ✅ Fixed
6 Minor Tests for entry-limit and byte-overflow truncation ✅ Fixed
7 Minor Test for since="incremental" with partial failures ✅ Fixed
8 Minor Docs wording for EXPORT_ALL_FAILED failures field ✅ Fixed

The one remaining open item is #3BulkExportResult.failure_count is still an independent counter that always equals len(result.failures). This was flagged as a minor follow-up in the original review (the suggestion was to replace it with a @property delegating to len(self.failures)). Since failure_count is also consumed by the CLI (scripts/export.py), addressing it requires a coordinated CLI-side change — it's fine to defer to a follow-up PR. All the acceptance-criteria items from issue #69 are met.

@clean6378-max-it clean6378-max-it requested a review from wpak-ai June 10, 2026 23:25
@wpak-ai wpak-ai merged commit 8083f3e into master Jun 10, 2026
11 checks passed
@wpak-ai wpak-ai deleted the fix/export-partial-failure-warnings branch June 10, 2026 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

claude-code-chat-browser: Surface bulk-export partial failures instead of silent ZIP skips

3 participants