The `fedify bench` command with its scenario and report schemas by dahlia · Pull Request #791 · fedify-dev/fedify

dahlia · 2026-06-05T07:37:09Z

Resolves #783, the second of the five benchmarking steps tracked in #744. It adds the client half of fedify bench to @fedify/cli: a load generator that exercises a Fedify server the way the fediverse does.

Generic HTTP load tools (autocannon, wrk, k6) cannot sign an inbox delivery, build a realistic ActivityStreams payload, or read a target's queue depth, so against a federation server they measure the wrong thing. The server half landed earlier in #782, which added benchmarkMode to FederationOptions together with the cooperative /.well-known/fedify/bench/stats and …/trigger endpoints. This PR is what drives that target.

The command acts as a synthetic remote actor. It generates keys and serves its own actor and key documents over loopback, then discovers the recipient's inbox the way a real peer would. Every delivery is signed with the same @fedify/fedify signer a real sender uses, so the crypto cost lands in the measured latency. It drives the load, reads the target's server-side metrics from the stats endpoint, and renders one report model as text, JSON, or Markdown.

What it includes:

- A scenario suite in YAML or JSON that declares the target, the actors to sign as, shared defaults, and a list of scenarios, each with an expect block of pass/fail thresholds that doubles as a CI gate.
- Two runners, inbox (the signed end-to-end delivery benchmark) and webfinger. The format and the schema can express the other scenario types from #744 (Performance benchmarking tools for Fedify federation workloads): actor, object, fanout, collection, failure, and mixed. A scenario whose type has no runner yet is rejected with a clear message rather than silently skipped.
- Open-loop (rate) and closed-loop (concurrency) load, with coordinated-omission correction so a stalled target shows up as latency instead of disappearing, plus constant or Poisson arrivals and an optional maxInFlight cap.
- Three signing strategies, chosen per scenario, that keep signing off the send critical path: pipeline (background signers fill a bounded buffer), jit, and presign.
- Supporting machinery: target safety gating, recipient discovery, the synthetic actor/key server, and an in-house log-linear latency histogram. The text, JSON, and Markdown outputs all derive from one report model, so they cannot drift apart.
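The open-loop model with coordinated-omission correction can be sketched in a few lines. This is a minimal illustration, not the CLI's `runLoad()`: the names `mulberry32()`, `arrivalTimes()`, and `correctedLatency()` are hypothetical, and mulberry32 is simply one well-known seeded PRNG, picked here so Poisson schedules are reproducible.

```typescript
// Hypothetical sketch; not the CLI's actual load generator.

/** mulberry32: a tiny seeded PRNG returning floats in [0, 1). */
export function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/**
 * Lazily yields scheduled send times (ms from start). Constant arrivals are
 * evenly spaced; Poisson arrivals use exponentially distributed gaps with
 * the same mean rate. Being a generator, it never materializes the schedule.
 */
export function* arrivalTimes(
  ratePerSec: number,
  durationMs: number,
  arrival: "constant" | "poisson",
  seed = 1,
): Generator<number> {
  const meanGapMs = 1000 / ratePerSec;
  const random = mulberry32(seed);
  let t = 0;
  while (t < durationMs) {
    yield t;
    t += arrival === "constant"
      ? meanGapMs
      : -Math.log(1 - random()) * meanGapMs; // 1 - u is in (0, 1], so log is finite
  }
}

/**
 * Coordinated-omission correction: measure from when the request was
 * scheduled, not when it was dispatched, so time spent queued behind a
 * stalled target (or a maxInFlight cap) counts as latency.
 */
export function correctedLatency(scheduledMs: number, completedMs: number): number {
  return completedMs - scheduledMs;
}
```

An open-loop driver would dispatch a request at each yielded time without waiting for earlier requests to finish, then record `correctedLatency(scheduled, completed)` for each one.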

Scenario format and JSON schema

The schema is dual-maintained. A frozen TypeScript literal embedded in the CLI is what the runtime validates against, using @cfworker/json-schema (pure JavaScript, so it survives deno compile); the committed schema/bench/scenario-v1.json and schema/bench/report-v1.json are the published copies. A test guard keeps the embedded and published forms byte-identical and refuses any edit to an already-published version, so a -v1 URL never changes meaning. The # yaml-language-server: line in a suite gives editors autocomplete and validation against the published URL.

Safety

A run proceeds without friction against a loopback or private target, or any target that advertises benchmark mode. A public target that does not advertise it is refused unless you pass --allow-unsafe-target, which is mandatory and never prompted in CI. The gate classifies the actual load destination, not only the declared target, so a loopback target paired with a public recipient (or an explicit public inbox:) cannot route load to production behind the gate's back. For the same reason, benchmark traffic does not follow redirects. Signed scenarios additionally need the synthetic actor server to be reachable from the target: a loopback target reaches it automatically, and a non-loopback target requires --advertise-host.
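The target-level rules above reduce to a small decision table. A hedged sketch: `gateDecision()` and its input shape are illustrative names, and the separate per-destination check on resolved inboxes is deliberately left out.

```typescript
// Hypothetical sketch of the target gate; tier names follow the PR text.
export type TargetTier = "loopback" | "private" | "public";

export interface GateInput {
  tier: TargetTier;
  advertisesBenchmarkMode: boolean;
  allowUnsafeTarget: boolean; // --allow-unsafe-target
  signed: boolean; // does the scenario need the synthetic actor server?
  advertiseHost?: string; // --advertise-host
}

export type GateResult = { ok: true } | { ok: false; reason: string };

export function gateDecision(input: GateInput): GateResult {
  // A public target that does not advertise benchmark mode needs explicit consent.
  if (
    input.tier === "public" && !input.advertisesBenchmarkMode &&
    !input.allowUnsafeTarget
  ) {
    return { ok: false, reason: "public target without benchmark mode; pass --allow-unsafe-target" };
  }
  // Signed scenarios need the target to reach the synthetic actor server.
  if (input.signed && input.tier !== "loopback" && input.advertiseHost == null) {
    return { ok: false, reason: "non-loopback target needs --advertise-host for signed scenarios" };
  }
  return { ok: true };
}
```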

Schema hosting

The schemas live at https://json-schema.fedify.dev/. This PR adds the static assets under schema/: the two JSON files, an index.html landing page, a contributor README.md, and _headers with netlify.toml that set CORS, long-lived immutable caching, and the application/schema+json content type. The hosting itself is configured on Netlify out of band; CI does not upload anything.

Testing and documentation

The benchmark test suite runs under both Node and Deno (about 240 tests), including an end-to-end inbox benchmark against a real benchmarkMode server that verifies the signatures, so the signed delivery path is exercised rather than mocked. docs/manual/benchmarking.md gains a client section covering the suite format, the actor and signing model, the output formats, and safety; CHANGES.md has an entry under version 2.3.0.

Add the skeleton for a new `fedify bench` subcommand in @fedify/cli that will run ActivityPub-specific load benchmarks against a cooperative Fedify target running in benchmark mode. This first step wires the command into the CLI without the engine:

- Define the Optique `benchCommand` with the suite-file argument and the --target, --format, --output, --dry-run, and --allow-unsafe-target options, plus a stub `runBench` that is fleshed out in later steps.
- Register the command in the runner and dispatcher, and add a `bench` section to the configuration schema.
- Add the `@cfworker/json-schema` (draft 2020-12 validator) and `yaml` dependencies used by the scenario format, to both deno.json and package.json.
- Cover argument parsing with tests.

fedify-dev#783 fedify-dev#744 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

Add a lightweight HdrHistogram-style log-linear histogram used by the benchmark engine to record latency samples and compute percentiles with bounded relative error. Values are bucketed by octave and split into linear sub-buckets, so the relative error stays roughly constant across the whole range. The structure is sparse, mergeable, and serializable, which lets percentiles from several runs be re-aggregated without coordinated-omission error and lets the report carry an optional serialized histogram. Sub-bucket indices are derived from the mantissa ratio to avoid denormal underflow, and non-positive samples (including -0) are normalized to the zero bucket. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
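The bucketing scheme described above can be sketched as follows. This is an illustration under assumptions, not the CLI's `LogLinearHistogram`: string keys (`"octave:sub"`) are used for readability, 64 sub-buckets per octave is an arbitrary choice, and `percentile()` returns the bucket's upper edge, so its relative error is at most `1 / subBuckets`.

```typescript
// Hypothetical sketch of a sparse, mergeable log-linear histogram.
export class LogLinearHistogram {
  /** Sparse map from bucket key ("octave:sub", or "zero") to sample count. */
  readonly buckets = new Map<string, number>();
  total = 0;

  constructor(readonly subBuckets = 64) {}

  record(value: number): void {
    const key = this.keyFor(value);
    this.buckets.set(key, (this.buckets.get(key) ?? 0) + 1);
    this.total++;
  }

  /** Merging adds counts; both histograms must use the same subBuckets. */
  merge(other: LogLinearHistogram): void {
    for (const [key, count] of other.buckets) {
      this.buckets.set(key, (this.buckets.get(key) ?? 0) + count);
    }
    this.total += other.total;
  }

  /** Upper edge of the bucket that holds the p-th percentile sample. */
  percentile(p: number): number {
    if (this.total === 0) return NaN;
    const rank = Math.max(1, Math.ceil((p / 100) * this.total));
    const ordered = [...this.buckets.entries()].sort(
      ([a], [b]) => this.upperEdge(a) - this.upperEdge(b),
    );
    let seen = 0;
    for (const [key, count] of ordered) {
      seen += count;
      if (seen >= rank) return this.upperEdge(key);
    }
    return this.upperEdge(ordered[ordered.length - 1][0]);
  }

  private keyFor(value: number): string {
    // Non-positive samples (including -0) and NaN fall into the zero bucket.
    if (!(value > 0)) return "zero";
    let octave = Math.floor(Math.log2(value));
    // Guard against floating-point rounding in log2 near powers of two.
    if (2 ** octave > value) octave--;
    else if (2 ** (octave + 1) <= value) octave++;
    // The mantissa ratio lies in [1, 2) and selects a linear sub-bucket.
    const ratio = value / 2 ** octave;
    const sub = Math.min(this.subBuckets - 1, Math.floor((ratio - 1) * this.subBuckets));
    return `${octave}:${sub}`;
  }

  private upperEdge(key: string): number {
    if (key === "zero") return 0;
    const [octave, sub] = key.split(":").map(Number);
    return 2 ** octave * (1 + (sub + 1) / this.subBuckets);
  }
}
```

Because every octave gets the same number of linear sub-buckets, a 5 ms sample and a 5 s sample are both resolved to within the same relative error.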

Add the small, pure building blocks the scenario format is built on:

- `asList()`: scalar-or-list coercion, so fields such as `recipient`, `seed`, `collection`, and `type` can accept either a single value or a list while the common single-value case stays terse.
- `parseSize()` / `resolveGenerate()`: typed payload-generation directives (e.g. `content: { generate: lorem, size: 2KB }`) that produce deterministic output of an exact byte size, with the size parser bounded to the safe-integer range.
- A logic-less GitHub-Actions-style `${{ ... }}` template engine (dotted-path resolution plus whitelisted helper calls). Lookups go through own properties only, with a denylist for prototype members, and unclosed delimiters, trailing text, and unbalanced quotes are rejected rather than silently mishandled, so the format cannot turn into a programming language.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
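The lookup rules for the template engine can be sketched like this. It is a minimal illustration assuming dotted paths only: the whitelisted helper calls (and therefore quoted arguments) are omitted, and `renderTemplate()` and `TemplateError` are hypothetical names.

```typescript
// Hypothetical sketch of a logic-less `${{ path }}` renderer.
export class TemplateError extends Error {}

// Prototype-reaching names are refused even if they happen to be own keys.
const DENYLIST = new Set(["__proto__", "prototype", "constructor"]);

function lookup(context: unknown, path: string): unknown {
  let current: unknown = context;
  for (const segment of path.split(".")) {
    // Identifier-only segments also reject trailing text like "host extra".
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(segment) || DENYLIST.has(segment)) {
      throw new TemplateError(`Invalid path segment: ${segment}`);
    }
    // Own properties only: inherited members such as toString are unknown.
    if (
      current === null || typeof current !== "object" ||
      !Object.prototype.hasOwnProperty.call(current, segment)
    ) {
      throw new TemplateError(`Unknown variable: ${path}`);
    }
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

export function renderTemplate(template: string, context: object): string {
  let out = "";
  let pos = 0;
  while (true) {
    const open = template.indexOf("${{", pos);
    if (open < 0) break;
    const close = template.indexOf("}}", open + 3);
    if (close < 0) throw new TemplateError("Unclosed ${{ delimiter");
    out += template.slice(pos, open);
    const expr = template.slice(open + 3, close).trim();
    const value = lookup(context, expr);
    if (typeof value !== "string" && typeof value !== "number" && typeof value !== "boolean") {
      throw new TemplateError(`Not a scalar: ${expr}`);
    }
    out += String(value);
    pos = close + 2;
  }
  return out + template.slice(pos);
}
```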

Define the `fedify bench` scenario suite format and its published JSON Schema (draft 2020-12). The format is a suite of `version`, `target`, `defaults`, `actors`, and `scenarios`, with an `expect` block per scenario, and it can express every scenario type discussed for the tool (inbox, webfinger, actor, object, fanout, collection, failure, mixed) even though only inbox and webfinger will have runners. Rather than a schema-first single source, the published JSON Schema and the TypeScript types are maintained as two artifacts kept identical by a drift guard. Runtime validation uses `@cfworker/json-schema`, and a validated value is narrowed with an `as unknown as` cast. Three cross-field rules live in the schema where an editor can flag them:

- exactly one HTTP request signature scheme per actor group (`contains` + `minContains`/`maxContains`);
- `rate` XOR `concurrency` in a load block (`oneOf`);
- the allowed `expect` metrics per scenario type (`if`/`then` + `propertyNames`).

The embedded schema object is the editing source; *schema/bench/* holds the hosted copy, regenerated by *scripts/generate-bench-schema.ts*. Four guards run as tests: structural/meta validation, example-fixture validation (valid and invalid fixtures covering every scenario type), drift between the embedded object and the published file, and git-based immutability of already-published version files.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

Add the normalization step that turns a schema-validated suite into the resolved form the engine runs:

- `parseDuration()` and `parseRate()` parse the human-friendly duration (`30s`) and rate (`200/s`) units into milliseconds and requests per second, rejecting non-positive and overflowing magnitudes.
- `normalizeSuite()` applies suite defaults, coerces the top-level scalar-or-list fields to arrays, resolves the target (with a `--target` override), and determines the open- or closed-loop load model, inheriting compatible fields such as `arrival` and `maxInFlight` from the defaults while a scenario's `rate`/`concurrency` selects the model. It also enforces the one cross-field rule the JSON Schema cannot express: the buffered signing modes (`pipeline`, `presign`) pre-sign requests, so they require the target's signature time window to be off; a time-windowed target must use `signing: jit`.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
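A minimal sketch of the two unit parsers. Only `30s` and `200/s` appear in the text above; the extra suffixes (`ms`, `m`, `h`, `/m`) and the exact error types are assumptions for illustration, not the CLI's documented grammar.

```typescript
// Hypothetical sketch; suffixes beyond "s" and "/s" are assumptions.
const DURATION_MS: Record<string, number> = { ms: 1, s: 1000, m: 60_000, h: 3_600_000 };
// Rates divide by the period length in seconds, which keeps "120/m" exact.
const RATE_PERIOD_SEC: Record<string, number> = { s: 1, m: 60 };

function positiveInRange(value: number, input: string): number {
  // Rejects zero, NaN, Infinity, and values past the safe-integer range.
  if (!Number.isFinite(value) || value <= 0 || value > Number.MAX_SAFE_INTEGER) {
    throw new RangeError(`Out of range: ${input}`);
  }
  return value;
}

/** "30s" -> 30000 (milliseconds). */
export function parseDuration(input: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(input.trim());
  if (match == null) throw new SyntaxError(`Invalid duration: ${input}`);
  return positiveInRange(Number(match[1]) * DURATION_MS[match[2]], input);
}

/** "200/s" -> 200 (requests per second). */
export function parseRate(input: string): number {
  const match = /^(\d+(?:\.\d+)?)\/(s|m)$/.exec(input.trim());
  if (match == null) throw new SyntaxError(`Invalid rate: ${input}`);
  return positiveInRange(Number(match[1]) / RATE_PERIOD_SEC[match[2]], input);
}
```

Negative values never reach the range check because the pattern has no sign, so `-5/s` fails as a syntax error while `0s` fails as out of range.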

Define the canonical benchmark report: the single result model from which the terminal, JSON, and Markdown renderers all derive, so the outputs can never drift apart. JSON is the canonical machine form, pinned by a published draft-2020-12 schema (schema/bench/report-v1.json). The model splits `client` and `server` numbers by nesting so it is clear which the load generator measured and which came from the target's stats endpoint, bakes the unit into numeric keys (latencyMs, drainMs), turns each expect assertion into an evaluated record, and carries first-class environment/target/configHash reproducibility metadata plus an optional serialized histogram. The report schema is registered alongside the scenario schema, so the existing structural, fixture, drift, and immutability guards now cover it too; a valid report fixture is added. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

Turn each scenario's `expect` block into evaluated records that gate a run. `parseAssertion()` parses a human assertion (">= 99%", "< 100ms", "< 2s", ">= 500/s", "== 0") into an operator and a machine-clean threshold: percentages become ratios, durations milliseconds, rates per second. `evaluateExpect()` looks each metric up by name (successRate, throughputPerSec, errors.4xx/5xx/total, latency.*, signatureVerification.*, queueDrain.*), checks that the assertion's unit is compatible with the metric's natural unit, and compares the values. Equality is tolerant for float metrics but exact for counts. A `fail`-severity assertion gates the build while `warn` only annotates, and a missing or unmeasured metric fails cleanly.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
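The assertion grammar implied by those examples can be sketched as below. The unit labels, the bare-number-as-count rule, and the `1e-9` relative tolerance are assumptions; the CLI's actual `parseAssertion()` may accept a different set of operators and units.

```typescript
// Hypothetical sketch of parsing and comparing `expect` assertions.
export type Operator = "<" | "<=" | ">" | ">=" | "==";
export type Unit = "ratio" | "ms" | "per-second" | "count";

export interface Assertion { op: Operator; threshold: number; unit: Unit }

export function parseAssertion(input: string): Assertion {
  const match = /^\s*(<=|>=|==|<|>)\s*(\d+(?:\.\d+)?)\s*(%|ms|s|\/s)?\s*$/.exec(input);
  if (match == null) throw new SyntaxError(`Invalid assertion: ${input}`);
  const [, rawOp, digits, suffix] = match;
  const op = rawOp as Operator;
  const n = Number(digits);
  switch (suffix) {
    case "%": return { op, threshold: n / 100, unit: "ratio" }; // 99% -> 0.99
    case "ms": return { op, threshold: n, unit: "ms" };
    case "s": return { op, threshold: n * 1000, unit: "ms" }; // 2s -> 2000 ms
    case "/s": return { op, threshold: n, unit: "per-second" };
    default: return { op, threshold: n, unit: "count" };
  }
}

/** Tolerant equality for float metrics, exact equality for counts. */
export function compare(actual: number, a: Assertion, isCount: boolean): boolean {
  switch (a.op) {
    case "<": return actual < a.threshold;
    case "<=": return actual <= a.threshold;
    case ">": return actual > a.threshold;
    case ">=": return actual >= a.threshold;
    case "==":
      return isCount
        ? actual === a.threshold
        : Math.abs(actual - a.threshold) <= 1e-9 * Math.max(1, Math.abs(a.threshold));
  }
}
```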

Assemble the canonical report from measured scenario data and render it in three forms from that single model:

- `buildScenarioResult()`/`buildReport()` turn resolved scenarios and their measurements into the report, evaluating each `expect` block, summarizing the load model, and computing the overall gate.
- `detectEnvironment()` and `configHash()` capture the reproducibility metadata (runtime, OS, CPU count, and a stable sha256 over the canonicalized configuration, honoring `toJSON()` so URLs hash by value).
- The JSON renderer is the canonical machine form (pinned by the report schema); the terminal-text and Markdown renderers derive from the same model.

A shared metric-unit registry keeps the evaluator and the renderers in agreement, so measured values display in the metric's own unit while an explicit assertion unit stays visible.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
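A stable configuration hash needs canonical key order and value-based handling of objects like `URL`, whose fields are prototype getters that `Object.keys()` does not see. A minimal sketch, assuming Node-compatible `node:crypto` (also available in Deno); `canonicalize()` is a hypothetical helper, not necessarily how `configHash()` is written.

```typescript
import { createHash } from "node:crypto";

const hasToJSON = (v: unknown): v is { toJSON(): unknown } =>
  typeof v === "object" && v !== null &&
  typeof (v as { toJSON?: unknown }).toJSON === "function";

/** Honors toJSON(), sorts object keys recursively, and keeps array order. */
function canonicalize(input: unknown): unknown {
  const value = hasToJSON(input) ? input.toJSON() : input; // URL -> href string
  if (Array.isArray(value)) return value.map(canonicalize);
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    // Object.fromEntries defines own properties, so a "__proto__" key is safe.
    return Object.fromEntries(
      Object.keys(record).sort().map((key) => [key, canonicalize(record[key])]),
    );
  }
  return value;
}

export function configHash(config: unknown): string {
  return createHash("sha256").update(JSON.stringify(canonicalize(config))).digest("hex");
}
```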

Add the client-side safety guard and the discovery that finds where to deliver:

- `classifyTarget()` sorts a target into loopback/private/public from its host (IP-literal aware, IPv4-mapped IPv6 decoded), conservatively treating anything it cannot confirm as public.
- `assertTargetAllowed()` lets loopback/private targets and any target advertising benchmark mode run without friction, and refuses only a public target that does not advertise benchmark mode unless --allow-unsafe-target is given (mandatory, with no interactive prompt); --dry-run bypasses the gate since it only inspects.
- `probeBenchmarkMode()` reads the cooperative `stats` endpoint to detect benchmark mode and the target's Fedify version, never throwing.
- `discoverInbox()` resolves a handle or actor URI to its personal and shared inbox the way a remote peer would, building private-address-allowing loaders for loopback targets, and `selectInbox()` picks the inbox for the scenario's mode.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
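A sketch of host classification in that conservative spirit. The ranges chosen (RFC 1918, IPv4 link-local, IPv6 unique-local and link-local) are assumptions about what "private" covers. Note that the WHATWG URL parser re-serializes `[::ffff:127.0.0.1]` as `[::ffff:7f00:1]`, so the mapped-address decoder has to handle the hex form too.

```typescript
import { isIP } from "node:net";

// Hypothetical sketch; the CLI's classifyTarget() may cover other ranges.
export type TargetTier = "loopback" | "private" | "public";

function classifyIPv4(address: string): TargetTier {
  const [a, b] = address.split(".").map(Number);
  if (a === 127) return "loopback";
  if (a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168)) return "private";
  if (a === 169 && b === 254) return "private"; // link-local (assumption)
  return "public";
}

/** Decodes ::ffff:a.b.c.d or its hex form ::ffff:hhhh:hhhh to dotted IPv4. */
function mappedIPv4(address: string): string | null {
  const dotted = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/i.exec(address);
  if (dotted != null) return dotted[1];
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/i.exec(address);
  if (hex == null) return null;
  const hi = parseInt(hex[1], 16);
  const lo = parseInt(hex[2], 16);
  return [hi >> 8, hi & 0xff, lo >> 8, lo & 0xff].join(".");
}

export function classifyTarget(url: URL): TargetTier {
  // URL.hostname keeps IPv6 literals in brackets; strip them for isIP().
  const host = url.hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host === "localhost") return "loopback";
  const family = isIP(host);
  if (family === 4) return classifyIPv4(host);
  if (family === 6) {
    const mapped = mappedIPv4(host);
    if (mapped != null) return classifyIPv4(mapped);
    if (host === "::1") return "loopback";
    // Compare the first hextet numerically: "fc::1" is 0x00fc, not fc00::/7.
    const first = parseInt(host.split(":")[0] || "0", 16);
    if ((first & 0xfe00) === 0xfc00) return "private"; // fc00::/7 unique local
    if ((first & 0xffc0) === 0xfe80) return "private"; // fe80::/10 link-local
    return "public";
  }
  // Any DNS name we cannot confirm as local is conservatively public.
  return "public";
}
```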

Stand up the benchmark's own synthetic remote peer. A suite author picks signature standards, and the key set is derived from them: HTTP request signatures and LD Signatures share one RSA key pair, while FEP-8b32 uses an Ed25519 pair. `buildFleet()` expands the actor groups into members with generated keys, and `spawnSyntheticServer()` serves each member as a normal ActivityPub actor document with an embedded `publicKey` and `assertionMethod` over plain loopback HTTP. The target dereferences a signature's keyId during verification, so serving exactly the document a real actor exposes lets verification resolve the key the same way; because the actor set is fixed, these key fetches stay on a cold path that the warm-up window excludes. A test confirms the served document parses back into a verifiable actor whose keys resolve.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

Sign inbox deliveries by reusing the @fedify/fedify signers, so the client pays a realistic crypto cost. `signInboxDelivery()` applies the FEP-8b32 object proof and the LD Signature to the document, then the HTTP request signature (cavage or rfc9421) to the final body. `createActivityIdMinter()` mints a unique activity id per request, satisfying Fedify's always-on inbox idempotency automatically. `createSigningPipeline()` keeps RSA signing off the send critical path with three lookahead modes: `jit`, `pipeline` (the default; background signers keep a bounded buffer filled, and buffer starvation flags the client as the bottleneck), and `presign`. The pipeline cannot hang on a stuck factory, drops transient sign failures, and fails fast on deterministic ones. Tests verify that the produced cavage and rfc9421 requests pass Fedify's own verifyRequest against synthetic-server keys.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
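The `pipeline` mode's core is a bounded producer/consumer buffer. A minimal sketch under stated assumptions: `BoundedBuffer` and `startSigners()` are hypothetical names, `sign` stands in for whatever produces a signed request, and the stuck-factory timeout plus the transient/deterministic failure handling mentioned above are omitted.

```typescript
// Hypothetical sketch of a bounded buffer filled by background signers.
export class BoundedBuffer<T> {
  private readonly items: T[] = [];
  private readonly takers: Array<(item: T) => void> = [];
  private readonly putters: Array<() => void> = [];
  /** take() calls that found the buffer empty: the client was the bottleneck. */
  starvations = 0;

  constructor(private readonly capacity: number) {}

  async put(item: T): Promise<void> {
    // Wait while full and nobody is waiting; re-check after every wake-up.
    while (this.takers.length === 0 && this.items.length >= this.capacity) {
      await new Promise<void>((resolve) => this.putters.push(resolve));
    }
    const taker = this.takers.shift();
    if (taker) taker(item); // hand straight to a starved consumer
    else this.items.push(item);
  }

  take(): Promise<T> {
    if (this.items.length > 0) {
      const item = this.items.shift() as T;
      this.putters.shift()?.(); // a slot just freed up
      return Promise.resolve(item);
    }
    this.starvations++;
    return new Promise((resolve) => this.takers.push(resolve));
  }
}

/** Runs `signers` background loops that sign and enqueue until aborted. */
export function startSigners<T>(
  buffer: BoundedBuffer<T>,
  sign: () => Promise<T>,
  signers: number,
  signal: AbortSignal,
): Promise<void[]> {
  const loop = async () => {
    while (!signal.aborted) await buffer.put(await sign());
  };
  return Promise.all(Array.from({ length: signers }, loop));
}
```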

Drive load and turn the raw samples into client-side metrics. `runLoad()` supports open-loop (a fixed arrival schedule, with latency measured from each request's scheduled time — the coordinated-omission correction — so a stalled target or maxInFlight backpressure shows up as latency rather than being omitted) and closed-loop (N virtual users). A fair slot-transferring semaphore enforces `maxInFlight` in both models and reports backpressure as the saturation signal; arrivals are a lazy generator (constant or seeded Poisson) and only in-flight dispatches are retained, so memory stays flat on long runs. `aggregateSamples()` excludes warm-up samples and produces request counts, success rate, throughput over the measured window, latency percentiles from the log-linear histogram, and errors grouped by kind, status, and reason. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
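A fair, slot-transferring semaphore can be sketched in a few lines. "Slot transferring" is read here as: `release()` hands its slot directly to the oldest waiter instead of freeing it, so a newly arriving caller cannot jump the queue. `FairSemaphore` and its `backpressured` counter are illustrative names, not the CLI's implementation.

```typescript
// Hypothetical sketch of a FIFO semaphore for maxInFlight.
export class FairSemaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];
  /** Number of acquisitions that had to wait: the saturation signal. */
  backpressured = 0;

  constructor(limit: number) {
    this.available = limit;
  }

  acquire(): Promise<void> {
    if (this.available > 0 && this.waiters.length === 0) {
      this.available--;
      return Promise.resolve();
    }
    this.backpressured++;
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // transfer the slot; `available` stays unchanged
    else this.available++;
  }
}
```

In the open-loop model, time a request spends in `acquire()` falls between its scheduled time and its completion, so the coordinated-omission-corrected latency already includes it.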

Wire the engine into runnable scenarios. The stats client reads the cooperative `stats` endpoint and projects the signature-verification histogram and queue depth into the report's server section, robust to malformed snapshots. The inbox runner discovers the recipient inbox, builds a signing factory over the synthetic fleet, drives the signing pipeline and load generator, aggregates the client metrics, and attaches the server metrics; the webfinger runner drives handle-resolution lookups. A registry dispatches by type and reports a clear error for the scenario types that the format expresses but this version does not run. `presign` signing now requires an open-loop load (a closed-loop run has no fixed request count to pre-sign). An end-to-end test stands up a real `benchmarkMode` Fedify federation and confirms signed inbox deliveries verify, the inbox listener runs, and server-side signature-verification metrics are read back. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

Implement `runBench`: load, validate, and normalize the suite (any configuration error logs a friendly message and exits 2), preflight the scenario runners so an unsupported type fails fast, classify and probe the target, and apply the safety gate. A `--dry-run` prints the plan and sends nothing. For a real run it builds the synthetic actor server once when a signed scenario needs it, runs each scenario, assembles the report, renders it to the chosen format (stdout or a file), and sets the exit code to 0 when the gate passes and 1 otherwise. The default exit sets `process.exitCode` so cleanup and output flushing finish first. Signed scenarios are refused against a public target, since the synthetic actor server is only reachable on the client's loopback. Dependencies are injectable, and tests cover the passing and failing gates, dry run, the unsafe-target and public-signed refusals, and an invalid suite. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

Wire the logic-less `${{ ... }}` template engine into the load pipeline: `renderSuiteTemplates()` expands templates in a parsed suite with a context exposing the target (host, hostname, port, origin, href, protocol) plus the default helpers, and `runBench` runs it between loading and validation. This is what makes `recipient: "http://${{ target.host }}/users/alice"` resolve to a concrete URL. The target comes from `--target` or the suite's own `target`, neither of which is templated. Tests cover rendering, and the end-to-end inbox run now uses templating.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
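The six context fields map directly onto standard `URL` properties. A sketch, assuming the context is built from the resolved target `URL`; `targetContext()` is a hypothetical name. The main subtlety is `host` versus `hostname`: only `host` carries a non-default port.

```typescript
// Hypothetical sketch of the `target` template context.
export function targetContext(target: URL) {
  return {
    target: {
      host: target.host, // includes a non-default port, e.g. "localhost:3000"
      hostname: target.hostname, // never includes the port
      port: target.port, // "" when the scheme's default port is used
      origin: target.origin, // scheme + host + non-default port
      href: target.href, // full serialized URL
      protocol: target.protocol, // includes the trailing ":", e.g. "http:"
    },
  };
}
```

With `--target http://localhost:3000`, `"http://${{ target.host }}/users/alice"` would render as `http://localhost:3000/users/alice`, whereas `target.hostname` would silently drop the port.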

Extend the benchmarking manual with the client side: a getting-started scenario suite, the actors and signature-standards model, `${{ }}` templating, open- and closed-loop load with the signing modes, the output formats and CI usage, the safety gate, and the http/loopback caveats. Add the @fedify/cli changelog entry for the new command. fedify-dev#783 fedify-dev#744 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

Address four behavioral gaps where the bench engine silently accepted options it did not actually apply:

- Reject `runs` greater than 1 during normalization. Repeated runs are not implemented yet, so accepting the field gave a single run while implying several.
- Fail a scenario that measured zero requests instead of letting every `expect` assertion pass vacuously, and reject a `warmup` that is not shorter than the `duration` (which would leave no measured window).
- Reject inbox `activity` options the runner cannot honor. The runner always delivers a `Create` carrying an embedded `Note`, so a non-`Create` activity type, a non-`Note` `object.type`, or `embedObject: false` is now refused up front through a new optional `validate()` on the runner, called during preflight. Scalar-or-list type fields are checked in full, not just their first element.
- Implement multi-recipient delivery in the inbox runner: every recipient's inbox is discovered, and deliveries (with the synthetic actors that sign them) are rotated across the recipients, modeling a server receiving from many peers into many local inboxes.

The scenario format and JSON Schema still express these options; only the inbox/webfinger runners constrain what they execute in this version.

fedify-dev#783 fedify-dev#744 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

A malformed `expect` assertion was only parsed while evaluating results, which happens after the entire benchmark load has been sent. Worse, the run loop has no catch around result building, so the resulting AssertionParseError escaped uncaught and crashed the command instead of failing as a configuration error. Add validateExpectBlock(), which parses every assertion in a scenario's `expect` block, and run it in the preflight step (alongside runner validation) before any probe or load. A typo in a CI gate now exits 2 without sending traffic, with a message naming the offending metric. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

The cooperative `stats` endpoint is cumulative and has no reset, but the inbox and webfinger runners read it once at the end, so the reported server numbers (and any signatureVerification.* expectations) folded in warm-up traffic and every earlier scenario in the suite. Client samples were already windowed; the server side was not, so the two disagreed. Take a server snapshot at the measured-window boundary and diff it against the end snapshot:

- stats-client.ts gains a raw `ServerSnapshot` (signature histogram and queue-depth gauge), `parseServerSnapshot`, `diffSnapshots` (subtracts bucket counts; the gauge is not cumulative, so the end value is kept), and `snapshotToMetrics`. `fetchServerSnapshot` returns `null` only on transport or parse failure; an available-but-empty snapshot is non-null, so an unavailable baseline is never mistaken for an empty one. Histogram subtraction requires identical bucket boundaries, and refuses (yields no signature metric) otherwise.
- runner.ts gains `withMeasuredWindowStart`, which gates every measured send on a one-shot boundary callback so the baseline is captured before any measured request reaches the target.
- The inbox and webfinger runners snapshot the baseline at the boundary and report server metrics only when both ends of the window were captured, instead of falling back to the cumulative snapshot.

A few warm-up requests still in flight at the boundary may be attributed to the window; a hard drain would distort the coordinated-omission client latency, so that bounded residue is accepted and documented.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
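The windowing rule (subtract counters, keep the gauge's end value, refuse mismatched boundaries) can be sketched over simplified shapes. The `Histogram` and `ServerSnapshot` fields below are assumptions that mirror the prose; the real `ServerSnapshot` type is not spelled out in this description.

```typescript
// Hypothetical shapes; the real snapshot type may differ.
export interface Histogram {
  boundaries: number[]; // bucket upper edges, ascending
  counts: number[]; // per-bucket counts accumulated since the server started
}

export interface ServerSnapshot {
  signatureVerification?: Histogram;
  queueDepth?: number; // a gauge, not a counter
}

function sameBuckets(a: Histogram, b: Histogram): boolean {
  return a.boundaries.length === b.boundaries.length &&
    a.boundaries.every((edge, i) => edge === b.boundaries[i]) &&
    a.counts.length === b.counts.length;
}

/** Measured window = end minus start for counters; gauges keep the end value. */
export function diffSnapshots(start: ServerSnapshot, end: ServerSnapshot): ServerSnapshot {
  const window: ServerSnapshot = { queueDepth: end.queueDepth };
  const a = start.signatureVerification;
  const b = end.signatureVerification;
  if (a != null && b != null && sameBuckets(a, b)) {
    window.signatureVerification = {
      boundaries: b.boundaries,
      counts: b.counts.map((count, i) => count - a.counts[i]),
    };
  }
  // Mismatched buckets: report no signature metric rather than a wrong one.
  return window;
}
```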

The scenario schema's `load` object required exactly one of `rate` or `concurrency`, so a block that set only `arrival` or `maxInFlight` and inherited its load model was rejected before normalization, even though `resolveLoad()` already supports such partial overrides (inheriting the model, or falling back to the default open-loop rate). Relax the constraint to forbid only `rate` and `concurrency` together, allowing either or neither. This lets a suite write, for example, `defaults: { load: { maxInFlight: 100 } }` or override just `arrival` on one scenario. The embedded schema literal and the published schema/bench/scenario-v1.json are regenerated together (the v1 file is new on this branch, so it is not yet immutable). fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
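A sketch of the relaxed constraint and of partial-override inheritance. The field names come from the text; `resolveLoadSketch()`, the `"constant"` default arrival, and the `defaultRate` parameter (whose real value is not stated here) are illustrative assumptions rather than the CLI's `resolveLoad()`.

```typescript
/** JSON Schema 2020-12 fragment: forbid rate and concurrency together only. */
export const loadNotBoth = { not: { required: ["rate", "concurrency"] } } as const;

export interface LoadBlock {
  rate?: number; // requests per second (open loop)
  concurrency?: number; // virtual users (closed loop)
  arrival?: "constant" | "poisson";
  maxInFlight?: number;
}

export type ResolvedLoad =
  | { model: "open"; rate: number; arrival: "constant" | "poisson"; maxInFlight?: number }
  | { model: "closed"; concurrency: number; maxInFlight?: number };

export function resolveLoadSketch(
  defaults: LoadBlock = {},
  scenario: LoadBlock = {},
  defaultRate = 10, // placeholder, not the CLI's real default
): ResolvedLoad {
  // maxInFlight applies to both models, so it is always inherited.
  const maxInFlight = scenario.maxInFlight ?? defaults.maxInFlight;
  // A scenario's own rate/concurrency selects the model; otherwise inherit it.
  const source = scenario.rate != null || scenario.concurrency != null ? scenario : defaults;
  if (source.concurrency != null) {
    return { model: "closed", concurrency: source.concurrency, maxInFlight };
  }
  return {
    model: "open",
    rate: source.rate ?? defaultRate,
    arrival: scenario.arrival ?? defaults.arrival ?? "constant",
    maxInFlight,
  };
}
```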

The synthetic actor/key server bound loopback and advertised `127.0.0.1` actor and key IDs, which the target dereferences to verify HTTP signatures. A same-machine (loopback) target reaches it, but a non-loopback target dereferences its own `127.0.0.1`, fails key lookup, and rejects every signed delivery. The command nonetheless allowed signed scenarios against private targets, so they failed silently. Add a `--advertise-host` option. When set, the synthetic server binds every interface (`0.0.0.0`, or `::` for an IPv6 host) and advertises the given host in its actor, key, and base URLs, so a non-loopback target can dereference them. `resolveAdvertiseHost()` validates the value as a bare host name, IPv4 address, or IPv6 literal (bracketing IPv6 for the URL authority and binding the matching family), rejecting a scheme, port, path, or other URL syntax with a clear configuration error. Signed scenarios are now refused (exit 2) when the target is non-loopback and no `--advertise-host` is given, instead of running and failing on the target. The documentation is updated accordingly. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
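A sketch of the `--advertise-host` check. The wildcard bind addresses come from the text above; the return shape, the host-name pattern, and the extra rejection of numeric-looking names such as `999.1.1.1` (which URL parsers treat as malformed IPv4) are assumptions.

```typescript
import { isIP } from "node:net";

// Hypothetical sketch; resolveAdvertiseHost()'s real return type may differ.
export interface AdvertiseHost {
  authority: string; // host as it appears in URLs (IPv6 bracketed)
  bindAddress: "0.0.0.0" | "::";
}

const HOST_NAME =
  /^(?=.{1,253}$)[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?(?:\.[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)*$/i;
// A last label that is a number would be parsed as (invalid) IPv4.
const ENDS_IN_NUMBER = /(?:^|\.)(?:\d+|0x[0-9a-f]*)$/i;

export function resolveAdvertiseHostSketch(value: string): AdvertiseHost {
  // Accept both "::1" and "[::1]" for IPv6 literals.
  const bare = value.startsWith("[") && value.endsWith("]") ? value.slice(1, -1) : value;
  if (isIP(bare) === 6) return { authority: `[${bare}]`, bindAddress: "::" };
  if (isIP(value) === 4) return { authority: value, bindAddress: "0.0.0.0" };
  if (HOST_NAME.test(value) && !ENDS_IN_NUMBER.test(value)) {
    return { authority: value, bindAddress: "0.0.0.0" };
  }
  // Anything else (scheme, port, path, userinfo) is a configuration error.
  throw new TypeError(`--advertise-host must be a bare host name or IP address: ${value}`);
}
```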

The `--user-agent` value was passed only to the document loader, so the benchmark's main requests — the runners' inbox POSTs and WebFinger GETs, the benchmark-mode probe, and the server stats reads — went out with the runtime's default User-Agent. A target that inspects, logs, or rate-limits by User-Agent saw the wrong value, so the option was silently ineffective for the traffic that matters. Wrap the fetch implementation once with withUserAgent(), so every benchmark request carries the configured User-Agent. A prebuilt request (the signed inbox delivery, a WebFinger GET) has the header set in place rather than recloned, leaving the already-signed body and digest untouched; the User-Agent is not part of the signed header set, so this does not affect verification. A User-Agent the caller already set is left as-is. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
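The wrapping approach can be sketched as a higher-order fetch. `FetchLike` is a local type introduced so the sketch does not depend on a particular runtime's `typeof fetch`; `withUserAgent()` here illustrates the described behavior and is not the CLI's source.

```typescript
// Hypothetical sketch of a User-Agent-stamping fetch wrapper.
export type FetchLike = (input: Request | URL | string, init?: RequestInit) => Promise<Response>;

export function withUserAgent(fetchImpl: FetchLike, userAgent: string): FetchLike {
  return (input, init) => {
    if (input instanceof Request && init == null) {
      // Set the header in place instead of re-cloning the request, so an
      // already-signed body and digest are left untouched.
      if (!input.headers.has("User-Agent")) input.headers.set("User-Agent", userAgent);
      return fetchImpl(input);
    }
    const headers = new Headers(
      init?.headers ?? (input instanceof Request ? input.headers : undefined),
    );
    // A User-Agent the caller already set wins.
    if (!headers.has("User-Agent")) headers.set("User-Agent", userAgent);
    return fetchImpl(input, { ...init, headers });
  };
}
```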

The text and Markdown renderers only surfaced server queue metrics when a drain-latency histogram was present, with the depth shown merely as a suffix to that line. The current stats reader supplies `queue.depthMax` without `drainMs`, so queue depth never appeared in the human-readable output even though it was in the JSON model; the Markdown form rendered no queue metrics at all. Render queue depth on its own:

- text: keep the combined drain line (now only when it has at least one percentile), otherwise print a standalone `Server queue depth max` line whenever a depth is reported.
- Markdown: add a queue drain p95 row when present and a queue depth max row whenever a depth is reported.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5

`new URL("localhost:3000")` parses as the `localhost:` scheme with an empty host, a common typo for a missing `http://`. Normalization accepted it, so `--dry-run` succeeded while a real run would misclassify the target or build an unsupported fetch URL. Targets carrying credentials (`http://user:pass@host`) were likewise accepted even though `fetch` rejects them. Reject, during normalization, any target whose protocol is not `http:` or `https:`, whose host is empty, or that carries embedded credentials, with a message pointing at the likely fix. The probe and runners only make bare HTTP(S) requests, so these never produce a working run. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
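The three rejection rules are easy to express with the standard `URL` parser. A sketch; `validateTarget()` and its messages are illustrative rather than the CLI's actual wording.

```typescript
// Hypothetical sketch of the normalization-time target check.
export function validateTarget(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new TypeError(`Invalid target URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    // new URL("localhost:3000") parses with protocol "localhost:" and no host.
    throw new TypeError(`Target must use http or https; did you mean http://${raw}?`);
  }
  if (url.host === "") throw new TypeError(`Target has no host: ${raw}`);
  if (url.username !== "" || url.password !== "") {
    throw new TypeError("Target must not embed credentials; fetch rejects them.");
  }
  return url;
}
```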

The safety gate classified only the suite `target`, but an `inbox` scenario's actual signed-load destination is the discovered inbox (or an explicit `inbox:` URL), which can differ from the target. A loopback `target` with a public `recipient`, or `inbox: https://prod.example/inbox`, would send benchmark POST load to a public inbox with no gate at all, bypassing the guard against accidentally benchmarking production. The synthetic-reachability rule was likewise only checked against the target tier, not the destination that actually verifies signatures. Gate each resolved inbox destination before any load reaches it:

- assertInboxDestinationAllowed() refuses a public destination unless it shares the gated target's origin while the target advertises benchmark mode (inheriting its gate), or --allow-unsafe-target is given; and refuses a non-loopback destination unless a reachable synthetic host was advertised (--advertise-host). Origins are compared (scheme, host, effective port), so an http inbox does not inherit an https target.
- The inbox runner calls an injected destination gate for each resolved inbox before sending; the orchestrator maps a refusal to exit 2.

Discovery (a read) still runs, but no benchmark load is sent to an ungated destination.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
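The per-destination rule can be sketched with `URL.origin`, which already normalizes the effective port (`https://a.example:443` and `https://a.example` share an origin). The classifier is injected to keep the sketch self-contained; `assertDestinationAllowed()` and the option names are illustrative.

```typescript
// Hypothetical sketch of the inbox destination gate.
type Tier = "loopback" | "private" | "public";

export interface DestinationGateOptions {
  target: URL;
  targetAdvertisesBenchmarkMode: boolean;
  allowUnsafeTarget: boolean; // --allow-unsafe-target
  advertiseHost?: string; // --advertise-host
  classifyTier: (url: URL) => Tier;
}

export function assertDestinationAllowed(inbox: URL, o: DestinationGateOptions): void {
  const tier = o.classifyTier(inbox);
  // Same scheme, host, and effective port as a benchmark-mode target.
  const inheritsTargetGate =
    inbox.origin === o.target.origin && o.targetAdvertisesBenchmarkMode;
  if (tier === "public" && !inheritsTargetGate && !o.allowUnsafeTarget) {
    throw new Error(`Refusing to send benchmark load to public inbox ${inbox.href}`);
  }
  // A non-loopback inbox must be able to dereference the synthetic actors.
  if (tier !== "loopback" && o.advertiseHost == null) {
    throw new Error(`Inbox ${inbox.href} cannot reach the synthetic actors; pass --advertise-host`);
  }
}
```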

The default fetch follows redirects, which let two safety checks be bypassed. A public target whose `stats` endpoint redirected to a host serving benchmark-mode JSON was marked as advertising benchmark mode, so the gate allowed load against it. And a gated loopback, private, or benchmark target that answered a WebFinger GET or a signed inbox POST with a 307/308 could carry that load to an ungated public service, slipping past the destination gate. Make every benchmark request non-following:

- The benchmark-mode probe and the server stats read use `redirect: "manual"`, so a redirect is treated as "not advertised" and "unavailable" respectively rather than trusted.
- `sendRequest` re-wraps any non-manual request as `redirect: "manual"` and records a redirect (opaque or 3xx) as a failed send, so no signed load reaches the redirect target; the WebFinger and inbox requests are built with `redirect: "manual"` so the common path needs no re-clone.

fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
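A sketch of a non-following send. `sendNoFollow()` and `SendOutcome` are illustrative names. Under the Fetch standard, browsers surface a manual redirect as an `opaqueredirect` response with status 0, while server-side runtimes usually return the 3xx response itself, so both cases are checked.

```typescript
// Hypothetical sketch of a send that records redirects as failures.
export type SendOutcome =
  | { ok: true; status: number }
  | { ok: false; reason: "redirect" | "http-error"; status: number };

export async function sendNoFollow(
  request: Request,
  fetchImpl: (request: Request) => Promise<Response> = fetch,
): Promise<SendOutcome> {
  // Requests already built with redirect: "manual" need no re-clone.
  const manual = request.redirect === "manual"
    ? request
    : new Request(request, { redirect: "manual" });
  const response = await fetchImpl(manual);
  if (response.type === "opaqueredirect" || (response.status >= 300 && response.status < 400)) {
    // Never follow: the redirect target has not passed the safety gate.
    return { ok: false, reason: "redirect", status: response.status };
  }
  if (!response.ok) return { ok: false, reason: "http-error", status: response.status };
  return { ok: true, status: response.status };
}
```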

coderabbitai · 2026-06-05T07:37:23Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 4d980e2 and 374b41e.

📒 Files selected for processing (13)

docs/manual/benchmarking.md
packages/cli/package.json
packages/cli/src/bench/actor/fleet.test.ts
packages/cli/src/bench/actor/fleet.ts
packages/cli/src/bench/result/build.test.ts
packages/cli/src/bench/result/build.ts
packages/cli/src/bench/result/expect/evaluate.ts
packages/cli/src/bench/signing/pipeline.test.ts
packages/cli/src/bench/signing/pipeline.ts
packages/cli/src/bench/template/generate.test.ts
packages/cli/src/bench/template/generate.ts
packages/cli/src/bench/template/template.test.ts
packages/cli/src/bench/template/template.ts

📝 Walkthrough


Adds a new fedify bench CLI: scenario schema/types/validation/normalization; load generator and clock; signing pipeline; inbox/webfinger runners and runner plumbing; discovery/probe and safety gates; metrics histogram/aggregation and stats client; synthetic actor server; report model and text/json/markdown renderers; published JSON Schemas, fixtures, and tests.

Changes

fedify bench end-to-end

| Layer / File(s) | Summary |
| --- | --- |
| CLI command and orchestration: `packages/cli/src/bench/command.ts`, `packages/cli/src/bench/action.ts`, `packages/cli/src/mod.ts`, `packages/cli/src/runner.ts` | Adds `benchCommand`, `runBench` orchestration, dependency injection (`RunBenchDeps`), dry-run plan rendering, and `withUserAgent`, and wires the subcommand into the CLI. |
| Scenario schema, types, validator, and normalization: `packages/cli/src/bench/scenario/*`, `packages/cli/src/bench/scenario/schema.ts`, `packages/cli/src/bench/scenario/types.ts` | Adds canonical scenario TypeScript types, an embedded scenario JSON Schema, `validateSuite`, `normalizeSuite`, unit/rate parsers, coercion helpers, and schema-driven validation with friendly error formatting. |
| Load generation and clock: `packages/cli/src/bench/load/*` | Implements arrival scheduling (constant/Poisson, seeded RNG), a monotonic `Clock` abstraction, and open-loop and closed-loop `runLoad` with coordinated-omission correction and a `maxInFlight` semaphore. |
| Signing pipeline and signer: `packages/cli/src/bench/signing/*`, `packages/cli/src/bench/signing/signer.ts` | Adds signing pipeline modes (jit/pipeline/presign), a buffered pipeline implementation, an activity-id minter, and `signInboxDelivery`, which applies document and request signatures per the configured standards. |
| Scenario runners and plumbing: `packages/cli/src/bench/scenarios/*`, `packages/cli/src/bench/scenarios/runner.ts` | Implements shared runner plumbing (`sendRequest`, measured-window start), `inboxRunner` and `webfingerRunner`, the runner registry, discovery logic, and related tests. |
| Safety gates and tiers: `packages/cli/src/bench/safety/*` | Adds target classification (loopback/private/public), `assertTargetAllowed`, `assertInboxDestinationAllowed`, `UnsafeTargetError`, and tests covering advertise-host and allow-unsafe-target behaviors. |
| Metrics, histogram, and stats client: `packages/cli/src/bench/metrics/*` | Implements `LogLinearHistogram`, sample aggregation (`aggregateSamples`), and server stats parsing, diffing, projection, and fetch helpers for the cooperative `/.well-known/fedify/bench/stats` endpoint. |
| Report model and renderers: `packages/cli/src/bench/result/`, `packages/cli/src/bench/render/` | Adds the canonical `BenchReport` model, builders (`buildScenarioResult`, `buildReport`, `configHash`, `detectEnvironment`), and renderers (`renderText`, `renderJson`, `renderMarkdown`), plus schema and renderer tests. |
| Synthetic server and actor docs: `packages/cli/src/bench/server/synthetic.ts` | Synthetic actor/key server implementation and advertise-host handling, with tests verifying served actor documents and advertised URL behavior. |
| Templates, helpers, generate directive: `packages/cli/src/bench/template/*` | Adds the `${{ ... }}` template engine, default helpers, a deterministic `generate` directive (lorem), and tests for templating and generation. |
| Published schemas, tooling, and docs: `schema/bench/*`, `packages/cli/scripts/generate-bench-schema.ts`, `schema/index.html`, `schema/_headers`, `schema/netlify.toml`, `schema/README.md` | Embeds and publishes the scenario/report schemas, a serializer and generation script, Netlify headers, hosting metadata and documentation, plus CI guards/tests to validate fixtures and immutability. |
| Fixtures and tests: `packages/cli/src/bench/__fixtures__/`, many `packages/cli/src/bench/**/*.test.ts` | Adds invalid/valid scenario fixtures, report fixtures, and an extensive test suite covering parsing, validation, scheduling, signing, pipeline behavior, runners, metrics, rendering, and safety gates. |
| Changelog and docs: `CHANGES.md`, `docs/manual/benchmarking.md` | Adds a changelog entry for `@fedify/cli` version 2.3.0 and a comprehensive `fedify bench` manual section with examples, safety notes, and schema links. |
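The load-generation row mentions Poisson arrivals and coordinated-omission correction. A minimal sketch of both ideas follows, assuming a seeded mulberry32 PRNG and a simulated client with a single in-flight slot; `arrivalOffsets` and `simulate` are hypothetical names, not the `runLoad` API:

```typescript
// Hypothetical sketch: mulberry32, arrivalOffsets, and simulate are
// illustrative names, not the @fedify/cli load generator API.
type ArrivalKind = "constant" | "poisson";

// mulberry32: a tiny seeded PRNG, so Poisson schedules are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // [0, 1)
  };
}

// Intended send times, in ms from the start, for `rate` requests/second.
function arrivalOffsets(
  rate: number,
  durationMs: number,
  kind: ArrivalKind,
  seed = 1,
): number[] {
  const meanGapMs = 1000 / rate;
  const random = mulberry32(seed);
  const offsets: number[] = [];
  let t = 0;
  while (t < durationMs) {
    offsets.push(t);
    // A Poisson process has exponentially distributed inter-arrival gaps.
    t += kind === "constant"
      ? meanGapMs
      : -Math.log(1 - random()) * meanGapMs;
  }
  return offsets;
}

// Simulates a client with one in-flight slot (maxInFlight = 1) against
// known service times. Naive latency starts at the actual send; corrected
// latency starts at the intended send, so time spent waiting behind a
// stalled request is counted instead of omitted.
function simulate(
  intendedMs: number[],
  serviceMs: number[],
): { naive: number[]; corrected: number[] } {
  const naive: number[] = [];
  const corrected: number[] = [];
  let slotFreeAt = 0;
  for (let i = 0; i < intendedMs.length; i++) {
    const sentAt = Math.max(intendedMs[i], slotFreeAt);
    const doneAt = sentAt + serviceMs[i];
    slotFreeAt = doneAt;
    naive.push(doneAt - sentAt);
    corrected.push(doneAt - intendedMs[i]);
  }
  return { naive, corrected };
}
```

With intended sends at 0, 100, 200, and 300 ms and a 250 ms stall on the second request, the naive view reports the two queued requests as 10 ms each, while the corrected view reports 160 ms and 70 ms, which is how a stalled target shows up as latency rather than disappearing.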

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant User
  participant CLI as fedify bench
  participant Target as Benchmark Target
  participant Stats as Target Stats Endpoint
  participant Synthetic as Synthetic Actor Server

  User->>CLI: run `fedify bench suite.yaml`
  CLI->>Stats: probe /.well-known/fedify/bench/stats (manual redirect)
  Stats-->>CLI: benchmarkMode + version?
  CLI->>Synthetic: spawn synthetic actor server (if signed scenarios)
  CLI->>Target: execute scenario traffic (signed requests)
  Target-->>CLI: responses (client latencies, statuses)
  CLI->>Stats: fetch baseline and end snapshots
  Stats-->>CLI: JSON snapshots
  CLI->>User: render report (text/json/markdown) and exit with pass/fail code
```
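The probe step in the diagram can be sketched as below. This assumes the stats endpoint answers with JSON carrying a boolean `benchmarkMode` and an optional `version` string, as the diagram hints; the exact contract is not reproduced here, and `probeStats`, `ProbeResult`, and `FetchLike` are hypothetical. In server-side runtimes such as Node.js and Deno, `redirect: "manual"` surfaces the 3xx response itself, so a redirect can be reported instead of followed:

```typescript
// Hypothetical sketch of the stats probe. The response shape
// ({ benchmarkMode?: boolean; version?: string }) is an assumption.
type FetchLike = (input: string | URL, init?: RequestInit) => Promise<Response>;

interface ProbeResult {
  ok: boolean;
  reason?: string;
  version?: string;
}

async function probeStats(
  target: URL,
  fetchImpl: FetchLike = fetch,
): Promise<ProbeResult> {
  const url = new URL("/.well-known/fedify/bench/stats", target);
  // Do not follow redirects: a redirect could point the run elsewhere.
  const res = await fetchImpl(url, {
    redirect: "manual",
    headers: { Accept: "application/json" },
  });
  if (res.status >= 300 && res.status < 400) {
    return { ok: false, reason: `stats endpoint redirected (${res.status})` };
  }
  if (!res.ok) {
    return { ok: false, reason: `stats endpoint returned ${res.status}` };
  }
  const body = (await res.json()) as {
    benchmarkMode?: unknown;
    version?: unknown;
  };
  if (body.benchmarkMode !== true) {
    return { ok: false, reason: "target is not running in benchmarkMode" };
  }
  return {
    ok: true,
    version: typeof body.version === "string" ? body.version : undefined,
  };
}
```

Injecting `fetchImpl` keeps the probe testable without a live target, in the same spirit as the dependency injection the walkthrough attributes to `runBench`.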

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Benchmarking: fedify bench engine, scenario format, and JSON schema hosting #783: Primary issue implementing the benchmarking engine, scenario format, and schema hosting — this PR implements the scoped objectives.
Performance benchmarking tools for Fedify federation workloads #744: Broader benchmarking design; many design decisions implemented here derive from that issue.
Add cooperative benchmark mode for federation targets #787: Cooperative stats endpoint contract related to the stats-client probe and parsing.

Possibly related PRs

fedify-dev/fedify#787: Shares the cooperative benchmark stats flow and endpoint JSON contract used by the new CLI stats client.

Suggested labels

type/feature, type/documentation, component/testing, component/signatures, examples

Suggested reviewers

sij411
2chanhaeng

dahlia · 2026-06-05T07:38:02Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d980e2bf4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

gemini-code-assist

Code Review

This pull request introduces the 'fedify bench' command to '@fedify/cli' for benchmarking federation workloads, supporting signed inbox deliveries and WebFinger lookups. The implementation includes scenario suite validation against a published JSON Schema, a logic-less template engine, a synthetic actor/key server, an open/closed-loop load generator, background signing pipelines, and multiple report renderers. Feedback suggests implementing depth limits in recursive functions ('renderValue' and 'canonicalJson') to prevent stack overflows from deeply nested inputs, as well as using lazy cloning in 'renderValue' to minimize cloning costs.
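The depth-limit suggestion can be illustrated with a canonical JSON serializer of the kind a config hash might walk. This is a minimal sketch assuming plain JSON values (as parsed from YAML or JSON), sorted object keys, and a hypothetical `MAX_DEPTH` of 64; it is not the `canonicalJson` in `build.ts`:

```typescript
// Hypothetical sketch: MAX_DEPTH and this signature are illustrative.
const MAX_DEPTH = 64;

// Serializes plain JSON values with object keys sorted, so equal configs
// produce identical strings (and therefore identical hashes).
function canonicalJson(value: unknown, depth = 0): string {
  if (depth > MAX_DEPTH) {
    // Fail with a clear error instead of overflowing the call stack.
    throw new RangeError(`Value is nested deeper than ${MAX_DEPTH} levels.`);
  }
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value); // strings, numbers, booleans, null
  }
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalJson(v, depth + 1)).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const members = Object.keys(obj)
    .filter((key) => obj[key] !== undefined) // JSON omits undefined members
    .sort()
    .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key], depth + 1)}`);
  return `{${members.join(",")}}`;
}
```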

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

codecov · 2026-06-05T07:48:04Z

Codecov Report

❌ Patch coverage is 93.04960% with 227 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/cli/src/bench/result/expect/evaluate.ts | 76.25% | 17 Missing and 16 partials ⚠️ |
| packages/cli/src/bench/action.ts | 87.32% | 15 Missing and 12 partials ⚠️ |
| packages/cli/src/bench/result/build.ts | 80.55% | 15 Missing and 6 partials ⚠️ |
| packages/cli/src/bench/metrics/stats-client.ts | 87.40% | 7 Missing and 10 partials ⚠️ |
| packages/cli/src/bench/metrics/histogram.ts | 87.50% | 5 Missing and 11 partials ⚠️ |
| packages/cli/src/bench/scenarios/inbox.ts | 89.40% | 10 Missing and 6 partials ⚠️ |
| packages/cli/src/bench/template/template.ts | 91.53% | 3 Missing and 8 partials ⚠️ |
| packages/cli/src/bench/render/text.ts | 91.34% | 1 Missing and 8 partials ⚠️ |
| packages/cli/src/bench/discovery/discover.ts | 84.00% | 4 Missing and 4 partials ⚠️ |
| packages/cli/src/bench/result/expect/metrics.ts | 75.75% | 3 Missing and 5 partials ⚠️ |

... and 16 more

| Files with missing lines | Coverage Δ |
| --- | --- |
| packages/cli/src/bench/actor/fleet.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/command.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/load/clock.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/load/generator.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/render/index.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/render/json.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/result/expect/assert.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/result/schema.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/safety/gate.ts | `100.00% <100.00%> (ø)` |
| packages/cli/src/bench/scenario/coerce.ts | `100.00% <100.00%> (ø)` |

... and 37 more

... and 3 files with indirect coverage changes

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/bench/actor/fleet.ts`:
- Around line 33-46: The function httpStandardOf currently uses
standards.find(...) so it silently accepts the first matching HTTP signature
when multiple are present; update httpStandardOf to collect all matches (e.g.,
standards.filter(...)), verify there is exactly one match, throw a TypeError if
count !== 1, and return that single match cast to HttpSignatureStandard; ensure
this behavior aligns with the FleetMember.httpStandard expectation and the
invalid fixture for two schemes.

In `@packages/cli/src/bench/result/expect/evaluate.ts`:
- Around line 197-209: The function sumErrors currently accepts min?: number,
max?: number but uses max! which is unsafe; change the signature for sumErrors
to require the min/max pair atomically (e.g., replace the two optional numeric
params with a single optional range object or tuple like range?: { min: number;
max: number } or range?: [number, number]) and update the implementation to read
range.min/range.max (or range[0]/range[1]) instead of min/max and remove
non-null assertions; then update all call sites that use sumErrors (calls that
currently pass both min and max should pass the new range value, and callers
that passed only one should be adjusted or left without the range) so the API
guarantees both bounds are provided together.

In `@packages/cli/src/bench/scenarios/inbox.ts`:
- Line 66: Remove the redundant validateActivity(scenario) invocation: the
existing validate() implementation already calls validateActivity(scenario) and
the scenario runner guarantees validate() is called before run(), so delete the
extra validateActivity(scenario) call (the one immediately after validate()) to
avoid duplicate validation and improve clarity; keep validate() and the run()
flow unchanged.

In `@packages/cli/src/bench/template/generate.ts`:
- Around line 111-116: Add an explicit upper bound for payload sizes (e.g.,
const MAX_PAYLOAD_BYTES = 100 * 1024 * 1024) and enforce it before attempting to
allocate strings: update parseSize() to validate that the parsed numeric size
does not exceed MAX_PAYLOAD_BYTES and throw a clear error (or clamp/return a
controlled failure) if it does; alternatively (or additionally) add the same
check at the start of generateLorem(size) to guard against huge repeats and
throw a RangeError with a descriptive message referencing the limit. Ensure
references to generateLorem() and parseSize() are updated to rely on this limit
so oversized inputs are rejected rather than causing memory exhaustion or
String.repeat RangeErrors.
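The proposed guard can be sketched as follows. `MAX_PAYLOAD_BYTES` uses the reviewer's example value; `generateLorem` and the `LOREM` text are illustrative, and the check sits where the string is allocated rather than in a unit parser:

```typescript
// Hypothetical sketch: names and the lorem text are illustrative.
const MAX_PAYLOAD_BYTES = 100 * 1024 * 1024; // reviewer's example: 100 MiB

const LOREM =
  "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do " +
  "eiusmod tempor incididunt ut labore et dolore magna aliqua. ";

// Returns exactly `size` bytes of deterministic filler text.
function generateLorem(size: number): string {
  if (!Number.isSafeInteger(size) || size < 0) {
    throw new RangeError(`Invalid payload size: ${size}.`);
  }
  // Reject oversized requests before String.prototype.repeat allocates.
  if (size > MAX_PAYLOAD_BYTES) {
    throw new RangeError(
      `Payload size ${size} bytes exceeds the ${MAX_PAYLOAD_BYTES}-byte limit.`,
    );
  }
  // LOREM is ASCII, so one character is one UTF-8 byte.
  const repeats = Math.ceil(size / LOREM.length);
  return LOREM.repeat(repeats).slice(0, size);
}
```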

In `@schema/README.md`:
- Line 9: In the README.md line containing the plain path
schema/bench/scenario-v1.json, wrap that file path in asterisks so it becomes
*schema/bench/scenario-v1.json* to comply with the markdown coding guideline for
**/*.md files; update the single occurrence of the string in the file
accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f64291df-e7d7-4523-833d-dffc7534da37

📥 Commits

Reviewing files that changed from the base of the PR and between e9ad1fa and 4d980e2.

⛔ Files ignored due to path filters (3)

deno.lock is excluded by !**/*.lock
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
schema/logo.svg is excluded by !**/*.svg

📒 Files selected for processing (104)

CHANGES.md
docs/manual/benchmarking.md
packages/cli/deno.json
packages/cli/package.json
packages/cli/scripts/generate-bench-schema.ts
packages/cli/src/bench/__fixtures__/invalid/bad-expect-metric.yaml
packages/cli/src/bench/__fixtures__/invalid/failure-missing-fault.yaml
packages/cli/src/bench/__fixtures__/invalid/missing-version.yaml
packages/cli/src/bench/__fixtures__/invalid/mixed-bad-metric.yaml
packages/cli/src/bench/__fixtures__/invalid/rate-and-concurrency.yaml
packages/cli/src/bench/__fixtures__/invalid/two-http-schemes.yaml
packages/cli/src/bench/__fixtures__/invalid/unknown-field.yaml
packages/cli/src/bench/__fixtures__/reports/inbox-report.json
packages/cli/src/bench/__fixtures__/scenarios/all-types.yaml
packages/cli/src/bench/__fixtures__/scenarios/ci-gate.json
packages/cli/src/bench/__fixtures__/scenarios/getting-started.yaml
packages/cli/src/bench/action.test.ts
packages/cli/src/bench/action.ts
packages/cli/src/bench/actor/documents.ts
packages/cli/src/bench/actor/fleet.ts
packages/cli/src/bench/actor/keys.ts
packages/cli/src/bench/command.test.ts
packages/cli/src/bench/command.ts
packages/cli/src/bench/discovery/discover.test.ts
packages/cli/src/bench/discovery/discover.ts
packages/cli/src/bench/discovery/probe.test.ts
packages/cli/src/bench/discovery/probe.ts
packages/cli/src/bench/load/arrival.test.ts
packages/cli/src/bench/load/arrival.ts
packages/cli/src/bench/load/clock.ts
packages/cli/src/bench/load/generator.test.ts
packages/cli/src/bench/load/generator.ts
packages/cli/src/bench/metrics/aggregate.test.ts
packages/cli/src/bench/metrics/aggregate.ts
packages/cli/src/bench/metrics/histogram.test.ts
packages/cli/src/bench/metrics/histogram.ts
packages/cli/src/bench/metrics/stats-client.test.ts
packages/cli/src/bench/metrics/stats-client.ts
packages/cli/src/bench/mod.ts
packages/cli/src/bench/render/format.ts
packages/cli/src/bench/render/index.ts
packages/cli/src/bench/render/json.ts
packages/cli/src/bench/render/markdown.ts
packages/cli/src/bench/render/render.test.ts
packages/cli/src/bench/render/text.ts
packages/cli/src/bench/result/build.test.ts
packages/cli/src/bench/result/build.ts
packages/cli/src/bench/result/expect/assert.test.ts
packages/cli/src/bench/result/expect/assert.ts
packages/cli/src/bench/result/expect/evaluate.test.ts
packages/cli/src/bench/result/expect/evaluate.ts
packages/cli/src/bench/result/expect/metrics.ts
packages/cli/src/bench/result/model.ts
packages/cli/src/bench/result/schema.ts
packages/cli/src/bench/safety/gate.test.ts
packages/cli/src/bench/safety/gate.ts
packages/cli/src/bench/safety/tiers.test.ts
packages/cli/src/bench/safety/tiers.ts
packages/cli/src/bench/scenario/coerce.test.ts
packages/cli/src/bench/scenario/coerce.ts
packages/cli/src/bench/scenario/errors.ts
packages/cli/src/bench/scenario/load.test.ts
packages/cli/src/bench/scenario/load.ts
packages/cli/src/bench/scenario/normalize.test.ts
packages/cli/src/bench/scenario/normalize.ts
packages/cli/src/bench/scenario/schema.ts
packages/cli/src/bench/scenario/types.ts
packages/cli/src/bench/scenario/units.test.ts
packages/cli/src/bench/scenario/units.ts
packages/cli/src/bench/scenario/validate.test.ts
packages/cli/src/bench/scenario/validate.ts
packages/cli/src/bench/scenarios/inbox.test.ts
packages/cli/src/bench/scenarios/inbox.ts
packages/cli/src/bench/scenarios/registry.test.ts
packages/cli/src/bench/scenarios/registry.ts
packages/cli/src/bench/scenarios/runner.test.ts
packages/cli/src/bench/scenarios/runner.ts
packages/cli/src/bench/scenarios/webfinger.test.ts
packages/cli/src/bench/scenarios/webfinger.ts
packages/cli/src/bench/schema-paths.ts
packages/cli/src/bench/schema.test.ts
packages/cli/src/bench/schemas.ts
packages/cli/src/bench/server/synthetic.test.ts
packages/cli/src/bench/server/synthetic.ts
packages/cli/src/bench/signing/activity-id.test.ts
packages/cli/src/bench/signing/activity-id.ts
packages/cli/src/bench/signing/pipeline.test.ts
packages/cli/src/bench/signing/pipeline.ts
packages/cli/src/bench/signing/signer.test.ts
packages/cli/src/bench/signing/signer.ts
packages/cli/src/bench/template/generate.test.ts
packages/cli/src/bench/template/generate.ts
packages/cli/src/bench/template/helpers.ts
packages/cli/src/bench/template/template.test.ts
packages/cli/src/bench/template/template.ts
packages/cli/src/config.ts
packages/cli/src/mod.ts
packages/cli/src/runner.ts
schema/README.md
schema/_headers
schema/bench/report-v1.json
schema/bench/scenario-v1.json
schema/index.html
schema/netlify.toml

The benchmark end-to-end tests do real RSA key generation, signed inbox delivery, and server round-trips, which take a few seconds under CI CPU contention. Bun applies a default per-test timeout of 5000 ms (node:test and deno test have none), and the cli package's `test:bun` script was the only one without a `--timeout` flag, so `runBench - passing gate exits 0…` and `runBench - failing gate exits 1` timed out on the Bun CI job while passing everywhere else. Run the cli Bun tests with `--timeout 60000`, matching the heaviest sibling packages (fedify, vocab, the database adapters).

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5

The benchmarking manual's "Templating" section used inline code spans containing `${{ … }}`. VitePress compiles each Markdown page as a Vue component, and Vue interpreted the `{{ … }}` inside the inline `<code>` as a mustache interpolation, producing invalid generated code (`_ctx.…`) and failing the VitePress build with a Rollup parse error. Fenced code blocks were unaffected because they render through the syntax highlighter. Rewrite the paragraph so it no longer puts double-brace delimiters in an inline code span: it describes the templating in prose and points to the `recipient` line in the example suite above, where the literal `${{ target.host }}` already appears inside a fenced YAML block. A full `vitepress build` now succeeds.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5

The previous fix avoided the VitePress build failure by rewording the "Templating" section to drop the inline `${{ … }}` code, since Vue compiled those braces inside inline code as a mustache interpolation. Use VitePress's own escape instead: wrap the paragraph in a `::: v-pre` container, where Vue leaves interpolation untouched, and restore the explicit inline `${{ … }}` and `${{ target.host }}` so the templating syntax is shown directly again. A full `vitepress build` succeeds and the rendered page contains the literal braces; `hongdown --check` stays happy with the container (unlike a raw inline `<code v-pre>`, which it reflows and breaks).

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5

In `presign` mode the whole run is meant to be signed before the timed window opens. The buffered producer, however, kept refilling as `next()` drained it, so background signers created replacement requests during the run: client-side crypto ran inside the timed window, which is exactly the cost presign mode exists to keep out. Cap background production at the pre-signed total so the signers stop once the run is signed; if an open-loop run overshoots its estimate (a few extra Poisson arrivals), the extra requests are signed on demand rather than triggering a continuous background refill.

fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
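The capped producer described in this commit can be sketched as follows, assuming an async `sign(index)` callback; `CappedSigningBuffer`, `fill()`, and the counters are hypothetical and much simpler than `pipeline.ts`:

```typescript
// Hypothetical sketch of a presign buffer whose background production is
// capped at the pre-signed total; not the real signing pipeline.
class CappedSigningBuffer<T> {
  #sign: (index: number) => Promise<T>;
  #cap: number; // pre-signed total (the run's estimated request count)
  #queue: T[] = [];
  #produced = 0;
  backgroundSigned = 0;
  onDemandSigned = 0;

  constructor(sign: (index: number) => Promise<T>, cap: number) {
    this.#sign = sign;
    this.#cap = cap;
  }

  // Background production: stops once `cap` requests have been signed,
  // however much next() has drained the queue since.
  async fill(): Promise<void> {
    while (this.#produced < this.#cap) {
      const index = this.#produced++;
      this.#queue.push(await this.#sign(index));
      this.backgroundSigned++;
    }
  }

  // Takes a pre-signed request; past the estimate, signs on demand
  // instead of restarting background refills.
  async next(): Promise<T> {
    const ready = this.#queue.shift();
    if (ready !== undefined) return ready;
    const index = this.#produced++;
    this.onDemandSigned++;
    return await this.#sign(index);
  }
}
```

Awaiting `fill()` before the timed window opens keeps all estimated signing out of the measurement; only overshoot arrivals pay the signing cost inside it.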

A scenario suite is parsed from user-supplied YAML or JSON, so its shape is untrusted: a pathologically deep tree or an enormous generated payload should fail with a clear error rather than overflow the stack or exhaust memory. Add a recursion-depth guard to the template renderer and to the config-hash walk, and separately cap a generated payload's size in `resolveGenerate`, where the string is actually allocated; `parseSize` stays a plain unit parser. The template renderer also keeps the original reference for subtrees it did not change, avoiding needless cloning.

fedify-dev#791 (comment)
fedify-dev#791 (comment)
fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
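Both template-renderer changes (a depth guard and keeping unchanged subtrees by reference) can be sketched together, assuming a tiny `${{ dotted.path }}` substitution; `renderValue`, `renderString`, `lookup`, and `MAX_TEMPLATE_DEPTH` are hypothetical and far simpler than the real engine:

```typescript
// Hypothetical sketch of a depth-guarded, lazily cloning renderer.
const MAX_TEMPLATE_DEPTH = 64;
type Context = Record<string, unknown>;

// Resolves a dotted path such as "target.host" against the context.
function lookup(ctx: Context, path: string): unknown {
  let cur: unknown = ctx;
  for (const key of path.split(".")) {
    if (cur === null || typeof cur !== "object") return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}

function renderString(s: string, ctx: Context): string {
  return s.replace(/\$\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, path: string) =>
    String(lookup(ctx, path) ?? ""));
}

function renderValue(value: unknown, ctx: Context, depth = 0): unknown {
  if (depth > MAX_TEMPLATE_DEPTH) {
    throw new RangeError(`Template nested deeper than ${MAX_TEMPLATE_DEPTH} levels.`);
  }
  if (typeof value === "string") return renderString(value, ctx);
  if (Array.isArray(value)) {
    let out: unknown[] | undefined; // copied only once a child changes
    for (let i = 0; i < value.length; i++) {
      const rendered = renderValue(value[i], ctx, depth + 1);
      if (rendered !== value[i]) {
        out ??= value.slice();
        out[i] = rendered;
      }
    }
    return out ?? value;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    let out: Record<string, unknown> | undefined;
    for (const [key, item] of Object.entries(obj)) {
      const rendered = renderValue(item, ctx, depth + 1);
      if (rendered !== item) {
        out ??= { ...obj };
        out[key] = rendered;
      }
    }
    return out ?? value;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Because unchanged strings compare equal by value, a subtree with no templates comes back as the very same reference, so only the path from the root to each substituted string is copied.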

`httpStandardOf` claimed in its error message that a group must declare exactly one HTTP request signature standard, but it used `find()`, so a group listing both `draft-cavage-http-signatures-12` and `rfc9421` silently took the first instead of being rejected. The JSON Schema already forbids this, so this is defense in depth for the runtime path. Collect all HTTP standards and throw when there is not exactly one, so the function honors its own contract even on unvalidated input.

fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
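The exactly-one check can be sketched like this, assuming standards are plain string identifiers. The two HTTP standard names come from the commit message; the types, the `HTTP_STANDARDS` list, and the non-HTTP identifier used in the example are hypothetical:

```typescript
// Hypothetical sketch of the exactly-one rule; not the fleet.ts code.
type HttpSignatureStandard = "draft-cavage-http-signatures-12" | "rfc9421";

const HTTP_STANDARDS: readonly string[] = [
  "draft-cavage-http-signatures-12",
  "rfc9421",
];

function httpStandardOf(standards: readonly string[]): HttpSignatureStandard {
  // filter() instead of find(): a second match must be an error,
  // not silently ignored.
  const matches = standards.filter((s) => HTTP_STANDARDS.includes(s));
  if (matches.length !== 1) {
    throw new TypeError(
      "A group must declare exactly one HTTP request signature standard; " +
        `found ${matches.length}.`,
    );
  }
  return matches[0] as HttpSignatureStandard;
}
```

For example, `httpStandardOf(["rfc9421", "object-integrity-proofs"])` returns `"rfc9421"`, while an empty list or a list with both HTTP standards throws a `TypeError`.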

`sumErrors` took optional `min` and `max` parameters but dereferenced `max` with a non-null assertion, so a future caller passing only `min` would crash. Replace the loose pair with one optional `{ min, max }` range so the bounds cannot be supplied half-way, and update the call sites.

fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
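The signature change can be sketched as follows. Only the optional `{ min, max }` range mirrors the commit; the shape of `errors` (a map from status code, or another error key, to a count) is an assumption for illustration:

```typescript
// Hypothetical sketch: the `errors` shape is assumed, not taken from
// evaluate.ts. Both bounds travel together, so no non-null assertion.
interface StatusRange {
  min: number;
  max: number;
}

function sumErrors(errors: Record<string, number>, range?: StatusRange): number {
  let total = 0;
  for (const [key, count] of Object.entries(errors)) {
    const status = Number(key); // NaN for non-numeric keys such as "timeout"
    // Without a range, every entry counts; with one, NaN keys fall outside.
    if (range == null || (status >= range.min && status <= range.max)) {
      total += count;
    }
  }
  return total;
}
```

A caller that wants only 5xx responses passes `{ min: 500, max: 599 }`; a caller that wants everything passes nothing, and a half-specified range is no longer expressible.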

dahlia · 2026-06-05T09:04:28Z

@codex review

dahlia · 2026-06-05T09:04:30Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the fedify bench command—a comprehensive benchmarking tool for Fedify federation workloads. It acts as a synthetic remote actor to drive ActivityPub-specific load (such as signed inbox deliveries and WebFinger lookups) against a target running in benchmark mode. The implementation includes scenario suite loading and validation against a published JSON Schema, synthetic actor and key generation, open-loop and closed-loop load generation with coordinated-omission correction, a sparse log-linear histogram for latency metrics, safety gates to prevent targeting public production servers, and multiple report renderers (text, JSON, and Markdown). No review comments were provided, so there is no feedback to address.
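A sparse log-linear histogram in the HDR-histogram style uses exact buckets for small values, then 16 linear sub-buckets per power of two, stored in a `Map` so empty buckets cost nothing. The following is a minimal sketch; the bucket layout and `SparseHistogram` are assumptions for illustration, not the `LogLinearHistogram` in `metrics/histogram.ts`:

```typescript
// Hypothetical sketch of sparse log-linear bucketing.
const SUB_BUCKET_BITS = 4;
const SUB_BUCKETS = 1 << SUB_BUCKET_BITS; // 16 linear steps per power of two
const MAX_VALUE = 0xffffffff; // Math.clz32 below handles 32-bit values only

// Values below 16 get exact buckets; larger values keep their top 5 bits,
// so a bucket is never wider than 1/16 of its lower bound.
function bucketIndex(value: number): number {
  if (value < SUB_BUCKETS) return value;
  const magnitude = 31 - Math.clz32(value); // floor(log2(value))
  const shift = magnitude - SUB_BUCKET_BITS;
  const sub = (value >>> shift) - SUB_BUCKETS; // 0..15
  return (shift + 1) * SUB_BUCKETS + sub;
}

function bucketLowerBound(index: number): number {
  if (index < SUB_BUCKETS) return index;
  const shift = Math.floor(index / SUB_BUCKETS) - 1;
  const sub = index % SUB_BUCKETS;
  return (SUB_BUCKETS + sub) * 2 ** shift;
}

class SparseHistogram {
  #counts = new Map<number, number>(); // only non-empty buckets are stored
  #total = 0;

  record(value: number): void {
    const v = Math.min(Math.max(0, Math.floor(value)), MAX_VALUE);
    const index = bucketIndex(v);
    this.#counts.set(index, (this.#counts.get(index) ?? 0) + 1);
    this.#total++;
  }

  get bucketCount(): number {
    return this.#counts.size;
  }

  // Lower bound of the bucket holding the q-quantile (0 < q <= 1).
  quantile(q: number): number {
    const rank = Math.ceil(q * this.#total);
    let seen = 0;
    const indices = Array.from(this.#counts.keys()).sort((a, b) => a - b);
    for (const index of indices) {
      seen += this.#counts.get(index) ?? 0;
      if (seen >= rank) return bucketLowerBound(index);
    }
    return 0;
  }
}
```

Recording latencies in integer microseconds keeps relative error under about 6% at every scale while storing only the buckets that were actually hit.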

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 374b41ecc5

chatgpt-codex-connector · 2026-06-05T09:10:18Z

```diff
+  if (mode != null && mode !== "shared" && mode !== "personal") {
+    return new URL(mode);
```


Validate explicit inbox URLs before running

When an inbox scenario uses a value other than shared or personal, the schema still accepts any string and preflight does not validate this field. A typo like inbox: shraed makes new URL(mode) throw after discovery and outside runBench's configuration-error handling, so the CLI crashes with an uncaught exception instead of exiting 2; non-HTTP absolute URLs can also get through to the send path as measured failures. Please reject malformed or non-http(s) explicit inbox URLs during validation/normalization before starting the run.
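The requested preflight check can be sketched as follows, assuming the `inbox` field is `"shared"`, `"personal"`, or an absolute URL, and that a dedicated error type lets the caller map the failure to exit code 2. `ConfigError` and `validateInboxMode` are hypothetical names:

```typescript
// Hypothetical sketch of inbox-field validation before the run starts.
class ConfigError extends Error {}

type InboxMode = "shared" | "personal" | URL;

function validateInboxMode(mode: string | undefined): InboxMode | undefined {
  if (mode == null || mode === "shared" || mode === "personal") return mode;
  let url: URL;
  try {
    url = new URL(mode); // throws on typos such as "shraed"
  } catch {
    throw new ConfigError(
      `inbox must be "shared", "personal", or an absolute http(s) URL; ` +
        `got ${JSON.stringify(mode)}.`,
    );
  }
  // Absolute but non-HTTP URLs (mailto:, ftp:, ...) are rejected too.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new ConfigError(`inbox URL must use http or https; got ${url.protocol}`);
  }
  return url;
}
```

Running this during validation or normalization turns a typo into a configuration error before discovery, instead of an uncaught exception mid-run.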

dahlia added 26 commits June 5, 2026 15:38

dahlia added this to the Fedify 2.3 milestone Jun 5, 2026

dahlia self-assigned this Jun 5, 2026

dahlia added the component/cli CLI tools related label Jun 5, 2026

dahlia added this to STF 2025–2026 Milestone 6: Observability and Monitoring (OpenTelemetry) Jun 5, 2026

chatgpt-codex-connector Bot reviewed Jun 5, 2026

Comment thread packages/cli/src/bench/signing/pipeline.ts

gemini-code-assist Bot reviewed Jun 5, 2026

Comment thread packages/cli/src/bench/template/template.ts

Comment thread packages/cli/src/bench/result/build.ts Outdated

coderabbitai Bot reviewed Jun 5, 2026

Comment thread packages/cli/src/bench/actor/fleet.ts

Comment thread packages/cli/src/bench/result/expect/evaluate.ts Outdated

Comment thread packages/cli/src/bench/scenarios/inbox.ts

Comment thread packages/cli/src/bench/template/generate.ts

Comment thread schema/README.md

dahlia added 7 commits June 5, 2026 16:54

gemini-code-assist Bot reviewed Jun 5, 2026

chatgpt-codex-connector Bot reviewed Jun 5, 2026

dahlia commented Jun 5, 2026

Scenario format and JSON schema

Safety

Schema hosting

Testing and documentation

coderabbitai Bot commented Jun 5, 2026 • edited

dahlia commented Jun 5, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

gemini-code-assist Bot left a comment

Code Review

codecov Bot commented Jun 5, 2026 • edited

Codecov Report

coderabbitai Bot left a comment

dahlia commented Jun 5, 2026

dahlia commented Jun 5, 2026

gemini-code-assist Bot left a comment

Code Review

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot Jun 5, 2026
