refactor(telemetry): align metrics/pricing init with tracing pattern#1252
refactor(telemetry): align metrics/pricing init with tracing pattern#1252ajbozarth wants to merge 2 commits into
Conversation
Wraps init logic in metrics.py and pricing.py in callable `_setup_*()` functions so tests can re-trigger init via a helper instead of `importlib.reload`. Drops the seven module-level env-read constants from metrics.py; env vars are now read at call time inside `_setup_meter_provider()` and hot-path record_* gates check `_meter is None`. Adds test/telemetry/conftest.py with shared `reset_*_state()` helpers that replace ~40 `importlib.reload(mellea.telemetry.*)` calls across all four telemetry test suites. Consolidates three drifted copies of `_reset_tracing_state()` left over from PR generative-computing#1181. Assisted-by: Claude Code Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
|
PR #1252 Code Review — Complete Findings I've completed a comprehensive review of PR #1252 (telemetry refactor) examining all of metrics.py, pricing.py, and tracing.py. Here are all the issues found: 🔴 CONFIRMED BUG — Issue #1 _meter not reset when metrics disabled (Line 319-341 in metrics.py) The PR changed hot-path gates from if not _METRICS_ENABLED: to if _meter is None:. This exposes a state sync bug: When _setup_metrics() disables metrics, it sets _metrics_enabled = False but fails to reset _meter = None, violating the invariant: _meter is not None ⟺ _metrics_enabled is True Why tests don't catch it: conftest.py pre-clears _meter = None (line 11) before calling _setup_metrics(), masking the bug. Fix: Add one line at line 329: Hardcoded lazy globals list has no safeguard (conftest.py lines 14-25) The reset function hardcodes 11 lazy instrument globals. If a new lazy global is added to metrics.py and forgotten in conftest:
This is a silent test isolation failure that's hard to debug. Fix: Auto-discover lazy instruments in conftest instead of hardcoding the list. Plugin registration won't recover if framework unavailable initially (metrics.py lines 273-294) If _HAS_PLUGIN_FRAMEWORK is False on first setup, _plugins_registered is never set to True. Combined with conftest intentionally not resetting this flag, plugins will never register even if the framework becomes available later. Fix: Either (a) always set flag on first call, or (b) use tri-state, or (c) document the assumption. 📋 PRE-EXISTING CLEANUP (Not PR issues)
Both are pre-existing code debt. Don't apply in this PR; file separate tech-debt issues. ✅ What Works Well
Approval Decision |
akihikokuroda
left a comment
There was a problem hiding this comment.
Please fix the issue 1. Thanks!
|
For reference: Executive SummaryThis PR successfully refactors Issues Found
ISSUE #1: CONFIRMED BUG —
|
… return Guarantees the invariant `<field> is not None ⟺ <module>_enabled is True` in `_setup_metrics()` and `_setup_tracing()`. Without this, calling either setup function with a populated state and metrics/tracing now disabled (e.g. via env-var flip) would leave a stale meter/tracer reference that the hot-path `if X is None` gates would treat as live. Assisted-by: Claude Code Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
|
Thanks for the thorough review. Walking through each item: Issue #1 — Issue #2 — Hardcoded lazy globals list in conftest: That fragility has to live somewhere — auto-discovery via Issue #3 — Plugin registration won't recover if framework unavailable: I think this analysis is inverted. Issues #4 & #5 — Counter/no-op duplication: These aren't issues — they're the right shape. Each lazy-init getter reads top-to-bottom in five seconds (name, description, unit, return); a registry/decorator would replace that with indirection that's harder to grep and harder to type-check. The three no-op classes are two lines each and match three distinct OTel protocols ( |
planetf1
left a comment
There was a problem hiding this comment.
Nice work — clean approach and the _meter_provider.get_meter() fix is the right call. A few small notes below; none are blockers.
| monkeypatch.setenv("MELLEA_METRICS_ENABLED", value) | ||
| importlib.reload(mellea.telemetry.metrics) | ||
| reset_metrics_state() | ||
| from mellea.telemetry.metrics import is_metrics_enabled |
There was a problem hiding this comment.
This (and test_metrics_disabled_with_falsy_values just below) leaves the module enabled on exit — monkeypatch restores the env vars but won't re-run reset_metrics_state(), so the built provider lingers until the next test's fixture rebuilds it. Works today, but it's an order-dependency waiting to bite. Either a final reset after the loop:
| from mellea.telemetry.metrics import is_metrics_enabled | |
| assert is_metrics_enabled(), f"Failed for value: {value}" | |
| reset_metrics_state() |
or adding clean_metrics_env to the signature so teardown handles it. Not a blocker.
There was a problem hiding this comment.
Sorry, comment off by a line
| metrics._meter_provider.shutdown() | ||
| metrics._meter_provider = None | ||
| metrics._meter = None | ||
| # Cached lazy instruments — bound to the old MeterProvider, must clear so |
There was a problem hiding this comment.
This nulls each lazy instrument global by hand, and right now it's an exact match for the twelve in metrics.py — nice. The only fragility is drift: a new _*_counter getter added later won't get cleared here, and it'd leak a stale instrument across resets without any test failing loudly. Might be worth a short note on the getter section in metrics.py (or here) cross-referencing this reset, so the next person adding an instrument knows to update both spots. Not blocking.
| metrics._requirement_checks_counter = None | ||
| metrics._requirement_failures_counter = None | ||
| metrics._tool_calls_counter = None | ||
| # _plugins_registered intentionally NOT reset — registry is process-global. |
There was a problem hiding this comment.
Could spell out the consequence here — the flag guards an idempotent register, and clearing it would make the next _setup_metrics() re-register already-registered plugins, which surfaces as spurious ValueError/UserWarning noise:
| # _plugins_registered intentionally NOT reset — registry is process-global. | |
| # _plugins_registered intentionally NOT reset: the registry is process-global, | |
| # and re-registering already-registered plugins raises ValueError (surfaced as | |
| # a UserWarning). The guard stays set so re-setup is a no-op for plugins. |
Pull Request
Issue
Fixes #
Description
Brings
mellea.telemetry.metricsandmellea.telemetry.pricingonto the same module-init pattern thatmellea.telemetry.tracingadopted in #1181. Eliminatesimportlib.reload(mellea.telemetry.*)from every telemetry test (~40 sites) and consolidates the three drifted copies of_reset_tracing_state()that PR #1181 left behind.No public API change. No production behavior change. Test side absorbed nearly all of the diff.
Why: after #1181,
tracing.pywas the only telemetry pillar with its init logic in a recallable_setup_tracing()function.metrics.pyandpricing.pystill captured env vars into module-level constants at the top of the module, which forced every test that toggled an env var toimportlib.reload(mellea.telemetry.metrics)to re-execute the module body. Each reload re-runs imports and side effects — slow, noisy, and fragile.Source-side changes:
metrics.py— Module-level env-read constants removed. The seven private_METRICS_*/_OTLP_METRICS_ENDPOINT/_SERVICE_NAME/_EXPORT_INTERVAL_MILLISconstants no longer exist; env-var reads live inside_setup_meter_provider(), matchingtracing.py. The hot-path gate in everyrecord_*function checks_meter is Nonedirectly (semantically equivalent to the old_METRICS_ENABLEDcheck). Also fixes a parallel of the tracing-side regression caught in fix: reload module for telemetry testing so all tests can run #805:_meteris now bound off the freshly built provider rather than via OTel's globalmetrics.get_meter(), sinceset_meter_provider()is one-shot per process and the global meter would otherwise fall back to the original (now shutdown) provider after a reset.pricing.py— Init logic wrapped in_setup_pricing(). Same shape as the others; no production behavior change.Minor cleanup: the redundant
if is_tracing_enabled():guard at the module-load_setup_tracing()call site is dropped (the function early-returns on its own), andis_metrics_enabled()/is_tracing_enabled()now cache their result likeis_pricing_enabled()already does.Test-side changes:
test/telemetry/conftest.pywith sharedreset_metrics_state(),reset_tracing_state(),reset_logging_state(),reset_pricing_state()helpers replacing allimportlib.reloadcalls. New regression testtest_meter_remains_functional_after_repeated_resetsexercises_setup_metrics()across multiple reset cycles and asserts recordings still land in the active reader._reset_tracing_state()consolidated.test_tracing.pyandtest_tracing_backend.pypreviously omitted the_tracer_provider.shutdown()call thattest_tracing_plugins.pyhad — the consolidated helper does the shutdown for all three. Strictly safer cleanup; no perceptible slowdown (idleBatchSpanProcessorworkers join in microseconds)._LITELLM_AVAILABLEand_OTEL_AVAILABLE: tests no longer manipulatesys.modulesto fake import failure. They now usemonkeypatch.setattr(...)directly on the module attribute. Same code paths exercised; module-load-timetry: import ...blocks are no longer reverified per-test.test_metrics.pyshifted from cached env captures (_METRICS_CONSOLE is True,_OTLP_METRICS_ENDPOINT == "...", etc.) to behavior on the constructedMeterProvider(resource attributes, attached metric readers, mock-patched exporter call args). Strictly stronger — proves env values were both read and applied.test/telemetry/__init__.pyadded sofrom test.telemetry.conftest import ...resolves under mypy. Same patterntest/plugins/__init__.pyalready uses.What didn't change:
is_metrics_enabled(),is_tracing_enabled(),is_pricing_enabled(),get_otlp_log_handler(), allrecord_*andcreate_*functions — same signatures, same semantics._plugins_registeredsemantics preserved (process-global, intentionally not reset by tests).Testing
2 e2e tests in
test_metrics_backend.py::test_huggingface_token_metrics_integrationfail on local — pre-existinghuggingface.py:1119torch tensor-indexing issue, unrelated to this PR. Not seen in-m "not e2e".Opened #1253
Attribution
Adding a new component, requirement, sampling strategy, or tool?
If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.
NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.