
feat(belief-state): add phase0 runtime measurement #228

Merged
drewstone merged 2 commits into main from feat/belief-phase0-measurement on Jun 6, 2026


@drewstone commented on Jun 6, 2026

Summary

  • add buildRuntimeBeliefPhase0Measurement() for joining runtime producer decisions, lifecycle evidence, labels, and run split metadata
  • emit completed BeliefDecisionPoint rows plus a BeliefDecisionResearchEvidencePacket and coverage summary
  • keep missing label and run joins diagnostic-only (counted, never fabricated) and keep off-policy evaluation (OPE) blocked when propensities are absent
  • keep the Phase 0 measurement test under tests/belief-state/ because it exercises cross-module join behavior rather than a single local unit
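The join described in the summary can be pictured with a minimal, self-contained sketch. Every type and function name here (`ProducerDecision`, `DecisionLabel`, `joinPhase0`, and so on) is a hypothetical simplification, not the actual `buildRuntimeBeliefPhase0Measurement()` API. The sketch only illustrates two behaviors named above: missing joins are counted rather than filled in, and OPE stays blocked when propensities are absent (assumed here to mean "any completed point lacks a behavior propensity").

```typescript
// Hypothetical, simplified shapes; the real belief-state types carry more fields.
type Split = "train" | "dev" | "holdout";
interface RunRecord { runId: string; split?: Split }
interface ProducerDecision { decisionId: string; runId: string; chosenAction?: string }
interface DecisionLabel { decisionId: string; reward: number; behaviorProb?: number }

interface DecisionPoint {
  decisionId: string;
  split?: Split;
  action: string;
  reward: number;
  behaviorProb?: number;
}

interface CoverageSummary {
  producerDecisionCount: number;
  missingRunRecordCount: number;
  missingLabelCount: number;
  completedCount: number;
  opeBlocked: boolean;
}

function joinPhase0(
  decisions: ProducerDecision[],
  runs: RunRecord[],
  labels: DecisionLabel[],
): { points: DecisionPoint[]; summary: CoverageSummary } {
  const runById = new Map<string, RunRecord>(runs.map((r) => [r.runId, r] as const));
  const labelById = new Map<string, DecisionLabel>(
    labels.map((l) => [l.decisionId, l] as const),
  );
  const points: DecisionPoint[] = [];
  let missingRunRecordCount = 0;
  let missingLabelCount = 0;

  for (const d of decisions) {
    const run = runById.get(d.runId);
    const label = labelById.get(d.decisionId);
    // Missing joins are only counted as diagnostics; no defaults are filled in.
    if (!run) missingRunRecordCount++;
    if (!label) missingLabelCount++;
    if (!run || !label || d.chosenAction === undefined) continue;
    points.push({
      decisionId: d.decisionId,
      split: run.split,
      action: d.chosenAction,
      reward: label.reward,
      behaviorProb: label.behaviorProb,
    });
  }

  // Assumption: OPE stays blocked unless every completed point has a logged propensity.
  const opeBlocked =
    points.length === 0 || points.some((p) => p.behaviorProb === undefined);

  return {
    points,
    summary: {
      producerDecisionCount: decisions.length,
      missingRunRecordCount,
      missingLabelCount,
      completedCount: points.length,
      opeBlocked,
    },
  };
}
```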

Verification

  • pnpm exec vitest run tests/belief-state/phase0-measurement.test.ts
  • pnpm exec vitest run src/belief-state
  • pnpm typecheck
  • pnpm lint
  • pnpm test
  • pnpm build
  • pnpm verify:package

Note: pnpm lint exits cleanly, reporting two pre-existing warnings outside this patch.

@tangletools

✅ No Blockers — ab2a6f7a

Readiness 86/100 · Confidence 75/100 · 6 findings (6 low)

| Metric | deepseek | glm | aggregate |
|---|---|---|---|
| Readiness | 86 | 86 | 86 |
| Confidence | 75 | 75 | 75 |
| Correctness | 86 | 86 | 86 |
| Security | 86 | 86 | 86 |
| Testing | 86 | 86 | 86 |
| Architecture | 86 | 86 | 86 |

Full multi-shot audit completed 3/3 planned shots over 5 changed files. Global verifier still owns final merge decision.

🟡 LOW split metadata requirement inconsistency with roadmap — .evolve/pursuits/2026-06-04-belief-state-agent-eval.md

Line 30: the next-action promotion requirements dropped 'split metadata' from the list (old: '>= 200 labeled decision points, split metadata, integrity checks...', new: '>= 200 labeled decision points, integrity checks...'). However, the roadmap at docs/research/belief-state-agent-eval-roadmap.md:228 still lists 'Every row has train/dev/holdout split' as a Phase 0 completion criterion. The code (phase0-measurement.ts:104,162) captures split metadata from RunRecord, but the research evidence gates (research-evidence.ts) do not currently enforce it. This may be intentional (split treated as a data property rather than a gate), but the inconsistency
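If the roadmap criterion 'Every row has train/dev/holdout split' were promoted to an enforced gate, the check could be as small as the sketch below. `PointWithSplit`, `findPointsMissingSplit`, and `splitGatePasses` are hypothetical names; nothing like them is confirmed to exist in research-evidence.ts.

```typescript
type SplitName = "train" | "dev" | "holdout";
interface PointWithSplit { decisionId: string; split?: SplitName }

// Hypothetical helper: ids of points with no split, so a report can list them.
function findPointsMissingSplit(points: PointWithSplit[]): string[] {
  return points.filter((p) => p.split === undefined).map((p) => p.decisionId);
}

// Hypothetical gate: passes only for a non-empty corpus where every row has a split.
function splitGatePasses(points: PointWithSplit[]): boolean {
  return points.length > 0 && findPointsMissingSplit(points).length === 0;
}
```

Returning the offending ids instead of a bare boolean keeps the gate diagnostic, in line with how the PR treats missing joins.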

🟡 LOW Test map row is aspirational, not verified — docs/research/belief-state-agent-eval-roadmap.md

Line 572 adds a test-map row for phase0-measurement.test.ts describing expected behavior ('joins runtime producer decisions...without fabricating missing joins or propensities'). This is a planning artifact — the test file exists in the PR's code changes (outside this shot's scope) but the roadmap assertion about what it tests is only as reliable as the test implementation. No action needed for a docs-only shot, but the global verifier should confirm the test file matches this description.

🟡 LOW Tests don't exercise label-to-point probability propagation — src/belief-state/phase0-measurement.test.ts

No test verifies that RuntimeBeliefDecisionLabel.behaviorProb or .targetProb propagate through to BeliefDecisionPoint.behaviorProb/.targetProb. The counterfactual test only asserts their absence. The underlying runtimeDecisionPointToBeliefDecisionPoint handles this correctly (tested in runtime-hooks.test.ts), but the Phase 0 integration path isn't covered. Add a test with labels carrying behaviorProb=0.3 and targetProb=0.5 and assert withBehaviorProb and withTargetProb summary counts.
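The assertion this finding proposes can be sketched without the real modules. `Label`, `Point`, `labelToPoint`, and `probabilityCoverage` are hypothetical stand-ins for RuntimeBeliefDecisionLabel, BeliefDecisionPoint, the conversion step, and the summary counts; an actual vitest case would instead call buildRuntimeBeliefPhase0Measurement() and read its coverage summary.

```typescript
// Hypothetical stand-ins for RuntimeBeliefDecisionLabel and BeliefDecisionPoint.
interface Label { decisionId: string; behaviorProb?: number; targetProb?: number }
interface Point { decisionId: string; behaviorProb?: number; targetProb?: number }

// Copies each probability only when the label carries it, so absent ones stay absent.
function labelToPoint(label: Label): Point {
  const point: Point = { decisionId: label.decisionId };
  if (label.behaviorProb !== undefined) point.behaviorProb = label.behaviorProb;
  if (label.targetProb !== undefined) point.targetProb = label.targetProb;
  return point;
}

// Counts analogous to the withBehaviorProb / withTargetProb summary fields.
function probabilityCoverage(points: Point[]) {
  return {
    withBehaviorProb: points.filter((p) => p.behaviorProb !== undefined).length,
    withTargetProb: points.filter((p) => p.targetProb !== undefined).length,
  };
}
```

With one label carrying behaviorProb=0.3 and targetProb=0.5 and a second label carrying neither, both counts should be 1, and the second point should still have no probabilities.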

🟡 LOW Loose options spread passes runtime fields to downstream packet builder — src/belief-state/phase0-measurement.ts

Lines 127-130: buildBeliefDecisionResearchEvidencePacket({ ...options, points }) spreads every BuildRuntimeBeliefPhase0MeasurementOptions field, including runs, decisions, events, and labels, into the packet builder, which passes them on to analyzeBeliefDecisionCorpus. JavaScript ignores unknown properties at runtime, so there is no functional bug today, but the spread invites future field-name collisions and makes the data flow harder to audit. Fix: destructure only the fields that BuildBeliefDecisionResearchEvidencePacketOptions accepts.
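A sketch of the suggested fix, under the assumption that the packet builder accepts a few packet-level options besides `points`. The option fields `minLabeledPoints` and `generatedAt`, and the helper `toPacketOptions`, are hypothetical placeholders for whatever BuildBeliefDecisionResearchEvidencePacketOptions actually declares.

```typescript
// Hypothetical option shapes; the real packet options may declare different fields.
interface PacketOptions {
  points: readonly unknown[];
  minLabeledPoints?: number; // hypothetical packet-level option
  generatedAt?: string; // hypothetical packet-level option
}

interface MeasurementOptions {
  runs: readonly unknown[];
  decisions: readonly unknown[];
  events: readonly unknown[];
  labels: readonly unknown[];
  minLabeledPoints?: number;
  generatedAt?: string;
}

// Picks packet fields explicitly instead of spreading `options`, so the runtime
// inputs (runs, decisions, events, labels) never reach the packet builder.
function toPacketOptions(
  options: MeasurementOptions,
  points: readonly unknown[],
): PacketOptions {
  const { minLabeledPoints, generatedAt } = options;
  return { points, minLabeledPoints, generatedAt };
}
```

Because the return value is an object literal typed as PacketOptions, TypeScript's excess-property check would flag a picked field that the packet options later stop declaring.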

🟡 LOW compactMetadata duplicated across phase0-measurement.ts and runtime-hooks.ts — src/belief-state/phase0-measurement.ts

Lines 175-178 define compactMetadata identically to runtime-hooks.ts:381-384. Same signature, same filter-from-entries logic. Should be extracted to a shared internal utility (e.g. ./internal/compact-metadata.ts) or re-exported from runtime-hooks. Minor DRY violation that increases maintenance surface.
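A minimal sketch of the shared helper the finding suggests, assuming "compact" means dropping entries whose value is `undefined` (the finding only says the two copies use the same filter-from-entries logic, so the exact predicate is an assumption). In a shared module such as ./internal/compact-metadata.ts it would be exported, and both phase0-measurement.ts and runtime-hooks.ts would import it from there.

```typescript
// Assumption: entries with an undefined value are dropped; null and other falsy values are kept.
function compactMetadata(
  metadata: Record<string, unknown>,
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(metadata).filter(([, value]) => value !== undefined),
  );
}
```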

🟡 LOW labelJoinRate conflates label join success with downstream validation failures — src/belief-state/phase0-measurement.ts

Line 157: labelJoinRate: ratio(points.length, producerDecisionCount). The numerator is points.length (points that passed ALL downstream validation via runtimeDecisionPointToBeliefDecisionPoint), not producerDecisionCount - missingRunRecordCount - missingLabelCount. If a label join succeeds but runtimeDecisionPointToBeliefDecisionPoint returns no point (e.g. chosenAction missing, unsupported kind), labelJoinRate drops without any diagnostic explaining the mismatch between missingLabelCount and the actual completed count. The runJoinRate on [line 156](https://github.com/tangle-network/agent-eval/blob/ab2a6f7ad415e3b4a6c0d76309be8a8750b7
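One way to separate the two signals, sketched with hypothetical counters: `labeledCount` and `fullyJoinedCount` are not confirmed summary fields, and this `ratio` (returning 0 for an empty denominator) is an assumption rather than the helper in phase0-measurement.ts. Tracking the joined counts directly also avoids double-counting a decision that misses both its run record and its label, which subtracting both missing counts could do if such decisions occur.

```typescript
// Hypothetical counters for illustration.
interface JoinCounts {
  producerDecisionCount: number;
  labeledCount: number; // decisions that found a label
  fullyJoinedCount: number; // decisions that found both a label and a run record
  completedCount: number; // decisions that became a BeliefDecisionPoint
}

// Assumption: an empty denominator yields 0 rather than NaN.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function joinDiagnostics(c: JoinCounts) {
  return {
    // Measures only the label join.
    labelJoinRate: ratio(c.labeledCount, c.producerDecisionCount),
    // Measures end-to-end completion, including downstream validation.
    completionRate: ratio(c.completedCount, c.producerDecisionCount),
    // Joined decisions that conversion rejected (e.g. missing chosenAction).
    conversionFailureCount: c.fullyJoinedCount - c.completedCount,
  };
}
```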


tangletools · 2026-06-06T13:55:29Z · trace

tangletools previously approved these changes Jun 6, 2026

@tangletools left a comment


✅ Approved — 6 non-blocking findings — ab2a6f7a

Full multi-shot audit completed 3/3 planned shots over 5 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary


tangletools · 2026-06-06T13:55:29Z · immutable trace


@tangletools left a comment


✅ Refreshed approval after new commits — bcf09d0f

A previous trusted approval on this PR was invalidated by new commits.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: stale_approval_refresh · 2026-06-06T17:28:50Z

@drewstone merged commit 4fbaa4f into main on Jun 6, 2026
1 check passed