Skip to content

bugsnag: propagate cloud_provider into contextual logging for queued executions#268

Open
morgan-wowk wants to merge 1 commit into
bugsnag/service-slicefrom
bugsnag/cloud-provider-context
Open

bugsnag: propagate cloud_provider into contextual logging for queued executions#268
morgan-wowk wants to merge 1 commit into
bugsnag/service-slicefrom
bugsnag/cloud-provider-context

Conversation

@morgan-wowk

Copy link
Copy Markdown
Collaborator

Adds CLOUD_PROVIDER_ANNOTATION_KEY to common_annotations so the annotation
key has a single definition in the OSS layer.

In the queued-execution processing span, reads the cloud_provider value from
task_spec annotations (already in memory) and adds it to the contextual
logging context when present. The existing _before_notify hook then includes
it automatically in the tangle_context tab on every Bugsnag event raised
during that execution's processing.

morgan-wowk commented May 28, 2026

Copy link
Copy Markdown
Collaborator Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

morgan-wowk commented May 28, 2026

Copy link
Copy Markdown
Collaborator Author

CC: @yuechao-qin

This will allow you to filter errors to Nebius or GKE.

@morgan-wowk morgan-wowk requested a review from yuechao-qin May 28, 2026 22:58
@morgan-wowk morgan-wowk force-pushed the bugsnag/cloud-provider-context branch from 3bb4cc1 to faed3ca Compare June 3, 2026 19:22
@morgan-wowk morgan-wowk force-pushed the bugsnag/service-slice branch from abe0fb0 to 4730f88 Compare June 3, 2026 19:22
@morgan-wowk morgan-wowk force-pushed the bugsnag/cloud-provider-context branch from faed3ca to a2974f3 Compare June 3, 2026 19:33
…executions

Adds CLOUD_PROVIDER_ANNOTATION_KEY to common_annotations so the annotation
key has a single definition in the OSS layer.

In the queued-execution processing span, reads the cloud_provider value from
task_spec annotations (already in memory) and adds it to the contextual
logging context when present. The existing _before_notify hook then includes
it automatically in the tangle_context tab on every Bugsnag event raised
during that execution's processing.
@morgan-wowk morgan-wowk force-pushed the bugsnag/cloud-provider-context branch from a2974f3 to 410f807 Compare June 3, 2026 19:35
ctx: dict[str, str] = {"execution_id": execution.id}
cloud_provider = (
(execution.task_spec or {})
.get("annotations", {})

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This is an AI-generated code review comment.

annotations: None (key present, value None) makes the chained .get crash.

cloud_provider = (
    (execution.task_spec or {})
    .get("annotations", {})           # returns None, NOT {}, if annotations is None
    .get(_CLOUD_PROVIDER_ANNOTATION_KEY)
)

.get("annotations", {}) only uses the default when the key is missing. If task_spec = {"annotations": None} (key present, value None — common in JSON-roundtripped data where keys are emitted even when unset), .get("annotations", {}) returns None and the trailing .get(...) raises AttributeError: 'NoneType' object has no attribute 'get'. The crash propagates out of execution_logging_context, breaking the with block setup at the call site and silently degrading log correlation for that execution.

Fix: chain or {} after the inner .get too:

cloud_provider = (
    ((execution.task_spec or {}).get("annotations") or {})
    .get(_CLOUD_PROVIDER_ANNOTATION_KEY)
)

from contextlib import contextmanager
from typing import Any, Optional

from .. import backend_types_sql as bts

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This is an AI-generated code review comment.

New instrumentation → core import is a circular-import hazard.

from .. import backend_types_sql as bts

The instrumentation package was previously dependency-free w.r.t. core model code — bugsnag_instrumentation.py, error_normalization.py, and the rest don't import from ... This is the first instrumentation → core import. If backend_types_sql ever wants to call back into instrumentation for logging (a plausible future change — e.g. logging inside an __post_init__ or validator), you get an import cycle.

The only runtime use of bts here is the type annotation on execution: bts.ExecutionNode. That doesn't need a real import.

Fix: guard with TYPE_CHECKING:

from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from .. import backend_types_sql as bts

bts.ExecutionNode becomes a forward reference resolved by the type checker, with zero runtime cost and no cycle risk.

_context_metadata.set(prev_metadata)


def execution_logging_context(execution: bts.ExecutionNode):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This is an AI-generated code review comment.

No unit tests for execution_logging_context.

The new function has several non-trivial branches: task_spec is None, task_spec without annotations, annotations without the cloud-provider key, the key set to a string, the key set to None, the key with an empty string. None of these is pinned by a test, and the orchestrator call site relies on the helper being correct.

Fix: add a handful of cases in tests/instrumentation/test_contextual_logging.py:

def test_execution_logging_context_no_annotations():
    exc = make_execution_node(id="e1", task_spec={})
    with contextual_logging.execution_logging_context(exc):
        assert contextual_logging.get_context_metadata("execution_id") == "e1"
        assert contextual_logging.get_context_metadata("cloud_provider") is None

def test_execution_logging_context_with_cloud_provider():
    exc = make_execution_node(
        id="e1",
        task_spec={"annotations": {"cloud-pipelines.net/orchestration/cloud_provider": "gcp"}},
    )
    with contextual_logging.execution_logging_context(exc):
        assert contextual_logging.get_context_metadata("cloud_provider") == "gcp"

def test_execution_logging_context_annotations_is_none():
    # Regression guard for None handling (see other comment).
    exc = make_execution_node(id="e1", task_spec={"annotations": None})
    with contextual_logging.execution_logging_context(exc):
        assert contextual_logging.get_context_metadata("cloud_provider") is None


from .. import backend_types_sql as bts

_CLOUD_PROVIDER_ANNOTATION_KEY = "cloud-pipelines.net/orchestration/cloud_provider"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This is an AI-generated code review comment.

Module docstring's metadata-key list should mention cloud_provider.

After this PR, cloud_provider starts appearing in logs for queued executions. The module docstring at the top of this file enumerates the common metadata keys (request_id, pipeline_run_id, execution_id, etc.) — but cloud_provider isn't listed. People grepping the docstring to learn "what metadata keys exist?" won't find it.

Fix: add one line to the docstring's "Common metadata keys" list:

- cloud_provider: From task_spec annotations, set by execution_logging_context for orchestrated runs.


from .. import backend_types_sql as bts

_CLOUD_PROVIDER_ANNOTATION_KEY = "cloud-pipelines.net/orchestration/cloud_provider"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nebius_launchers.py has this same exact constant. Would it make sense to create a shared config/model/etc file that both of them reference against? Or some way to not duplicate the variable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants