Skip to content

bugsnag: chain-aware grouping key and chain title in error list#262

Open
morgan-wowk wants to merge 1 commit into
graphite-base/262from
bugsnag-error-chain
Open

bugsnag: chain-aware grouping key and chain title in error list#262
morgan-wowk wants to merge 1 commit into
graphite-base/262from
bugsnag-error-chain

Conversation

@morgan-wowk

@morgan-wowk morgan-wowk commented May 28, 2026

Copy link
Copy Markdown
Collaborator

normalize_error_chain walks __cause__/__context__ and joins each level's normalized string with " <- ". This replaces the single-exception normalize_error_message call used for the grouping key so that "LauncherError <- TimeoutError" and "LauncherError <- ApiException" land in separate, stable groups rather than collapsing into one group per root cause.

build_chain_title does the same walk but truncates each level to 80 chars and returns None for single exceptions. _before_notify uses it to override the root-cause errorClass that Bugsnag displays in the error list, giving reviewers the full raise path at a glance without needing to open the stacktrace.

morgan-wowk commented May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

normalize_error_chain walks __cause__/__context__ and joins each
level's normalized string with " <- ".  This replaces the single-
exception normalize_error_message call used for the grouping key so
that "LauncherError <- TimeoutError" and "LauncherError <- ApiException"
land in separate, stable groups rather than one per root cause.

build_chain_title does the same walk but truncates each level to 80
chars and returns None for single exceptions.  _before_notify uses it
to override the root-cause errorClass shown in the Bugsnag error list,
giving reviewers the full raise path at a glance.
@morgan-wowk morgan-wowk force-pushed the bugsnag-error-chain branch from afad13f to 563c451 Compare June 3, 2026 19:22
@morgan-wowk morgan-wowk changed the base branch from graphite-base/262 to bugsnag-grouping-strip-json June 3, 2026 19:22
@morgan-wowk morgan-wowk changed the base branch from bugsnag-grouping-strip-json to graphite-base/262 June 3, 2026 19:33
"""
parts: list[str] = []
seen: set[int] = set()
exc: BaseException | None = exception

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This is an AI-generated code review comment.

Duplicated chain-walking loop between normalize_error_chain and build_chain_title.

Both functions share identical traversal logic — same seen set, same len(parts) < 4 cap, same __cause__ or __context__ walk. The only divergence is the per-part truncation step and the None return for single exceptions.

Two places that have to stay in lockstep on every future change (max depth, BaseExceptionGroup support, additional cycle detection, etc.). Easy to drift.

Fix: extract one private helper:

def _walk_chain(exception: BaseException, max_depth: int = 4) -> list[BaseException]:
    excs: list[BaseException] = []
    seen: set[int] = set()
    exc: BaseException | None = exception
    while exc is not None and id(exc) not in seen and len(excs) < max_depth:
        seen.add(id(exc))
        excs.append(exc)
        exc = exc.__cause__ or (None if exc.__suppress_context__ else exc.__context__)
    return excs

Then normalize_error_chain is one comprehension and build_chain_title is one comprehension with the truncation step.

parts: list[str] = []
seen: set[int] = set()
exc: BaseException | None = exception
while exc is not None and id(exc) not in seen and len(parts) < 4:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This is an AI-generated code review comment.

Grouping key produced by normalize_error_chain is length-unbounded — Bugsnag may silently truncate.

parts.append(normalize_error_message(exception=exc))  # no per-part cap

This function is used as the Bugsnag custom grouping key. Each part has no length bound — only build_chain_title caps at _CHAIN_PART_MAX_LEN. With 4 chained exceptions each carrying a long _strip_generic-processed message, the grouping key can run into the thousands of characters. If Bugsnag silently truncates custom grouping keys beyond some limit, two chains that differ only past the truncation point group together — exactly the opposite of this PR's intent.

Fix: either (a) verify Bugsnag's actual custom-grouping-key length limit and document it in a comment, or (b) apply a generous per-part cap inside normalize_error_chain too — e.g. 200 chars, separate from the display-oriented 80-char cap. The cap is stable across chain orderings and protects against accidental key blow-up.

title = f"{prefix}: {chain_title}" if prefix else chain_title
event.errors[-1].error_class = title
except Exception:
_logger.debug("Could not set chain title on errorClass", exc_info=True)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This is an AI-generated code review comment.

DEBUG-level logging hides SDK API drift in the title override.

except Exception:
    _logger.debug("Could not set chain title on errorClass", exc_info=True)

The code is built on a load-bearing assumption that event.errors[-1].error_class is the field Bugsnag displays as the error-list title. If a future SDK changes that shape, the assignment will throw AttributeError, the chain title silently disappears from Bugsnag, and the DEBUG log will be invisible in any production logging config. This is the kind of integration drift that doesn't break unit tests but quietly degrades the product.

Fix: log at warning (or error) level. Still non-fatal — the original error still reports — but the next time the SDK shape drifts, you find out instead of guessing why the error list looks the way it used to.

Note: the same DEBUG pattern at the prefix-prepending block above (~lines 55-59) has the same issue; worth bumping that too, although it's pre-existing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants