
The fedify bench command with its scenario and report schemas #791

Open
dahlia wants to merge 33 commits into fedify-dev:main from dahlia:feat/bench/cli-engine


@dahlia (Member) commented Jun 5, 2026

Resolves #783, the second of the five benchmarking steps tracked in #744. It adds the client half of fedify bench to @fedify/cli: a load generator that exercises a Fedify server the way the fediverse does.

Generic HTTP load tools (autocannon, wrk, k6) cannot sign an inbox delivery, build a realistic ActivityStreams payload, or read a target's queue depth, so against a federation server they measure the wrong thing. The server half landed earlier in #782, which added benchmarkMode to FederationOptions together with the cooperative /.well-known/fedify/bench/stats and …/trigger endpoints. This PR is what drives that target.

The command acts as a synthetic remote actor. It generates keys and serves its own actor and key documents over loopback, then discovers the recipient's inbox the way a real peer would. Every delivery is signed with the same @fedify/fedify signer a real sender uses, so the crypto cost lands in the measured latency. It drives the load, reads the target's server-side metrics from the stats endpoint, and renders one report model as text, JSON, or Markdown.

What it includes:

  • A scenario suite in YAML or JSON that declares the target, the actors to sign as, shared defaults, and a list of scenarios, each with an expect block of pass/fail thresholds that doubles as a CI gate.
  • Two runners, inbox (the signed end-to-end delivery benchmark) and webfinger. The format and the schema can express the other types from #744, "Performance benchmarking tools for Fedify federation workloads" (actor, object, fanout, collection, failure, mixed), but a scenario whose type has no runner yet is rejected with a clear message rather than silently skipped.
  • Open-loop (rate) and closed-loop (concurrency) load, with coordinated-omission correction so a stalled target shows up as latency instead of disappearing, plus constant or Poisson arrivals and an optional maxInFlight cap.
  • Three signing strategies kept off the send critical path, chosen per scenario: pipeline (background signers fill a bounded buffer), jit, and presign.
  • Supporting machinery: target safety gating, recipient discovery, the synthetic actor/key server, and an in-house log-linear latency histogram. The text, JSON, and Markdown outputs all derive from one report model, so they cannot drift apart.

Scenario format and JSON schema

The schema is dual-maintained. A frozen TypeScript literal embedded in the CLI is what the runtime validates against, using @cfworker/json-schema (pure JavaScript, so it survives deno compile); the committed schema/bench/scenario-v1.json and schema/bench/report-v1.json are the published copies. A test guard keeps the embedded and published forms byte-identical and refuses any edit to an already-published version, so a -v1 URL never changes meaning. The # yaml-language-server: line in a suite gives editors autocomplete and validation against the published URL.

Safety

A run proceeds without friction against a loopback or private target, or any target that advertises benchmark mode. A public target that does not advertise it is refused unless you pass --allow-unsafe-target, which is mandatory and never prompted in CI. The gate classifies the actual load destination, not only the declared target, so a loopback target paired with a public recipient (or an explicit public inbox:) cannot route load to production behind the gate's back. For the same reason, benchmark traffic does not follow redirects. Signed scenarios additionally need the synthetic actor server to be reachable from the target: a loopback target reaches it automatically, and a non-loopback target requires --advertise-host.

Schema hosting

The schemas live at https://json-schema.fedify.dev/. This PR adds the static assets under schema/: the two JSON files, an index.html landing page, a contributor README.md, and _headers with netlify.toml that set CORS, long-lived immutable caching, and the application/schema+json content type. The hosting itself is configured on Netlify out of band; CI does not upload anything.

Testing and documentation

The benchmark test suite runs under both Node and Deno (about 240 tests), including an end-to-end inbox benchmark against a real benchmarkMode server that verifies the signatures, so the signed delivery path is run rather than mocked. docs/manual/benchmarking.md gains a client section covering the suite format, the actor and signing model, the output formats, and safety; CHANGES.md has an entry under version 2.3.0.

dahlia added 26 commits June 5, 2026 15:38
Add the skeleton for a new `fedify bench` subcommand in @fedify/cli that
will run ActivityPub-specific load benchmarks against a cooperative
Fedify target running in benchmark mode.

This first step wires the command into the CLI without the engine:

 -  Define the Optique `benchCommand` with the suite-file argument and the
    --target, --format, --output, --dry-run, and --allow-unsafe-target
    options, plus a stub `runBench` that is fleshed out in later steps.
 -  Register the command in the runner and dispatcher, and add a `bench`
    section to the configuration schema.
 -  Add the `@cfworker/json-schema` (draft 2020-12 validator) and `yaml`
    dependencies used by the scenario format, to both deno.json and
    package.json.
 -  Cover argument parsing with tests.

fedify-dev#783
fedify-dev#744

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add a lightweight HdrHistogram-style log-linear histogram used by the
benchmark engine to record latency samples and compute percentiles with
bounded relative error.  Values are bucketed by octave and split into
linear sub-buckets, so the relative error stays roughly constant across
the whole range.  The structure is sparse, mergeable, and serializable,
which lets percentiles from several runs be re-aggregated without
coordinated-omission error and lets the report carry an optional
serialized histogram.

Sub-bucket indices are derived from the mantissa ratio to avoid denormal
underflow, and non-positive samples (including -0) are normalized to the
zero bucket.
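
The bucketing described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `SketchHistogram`, `bucketIndex`, `bucketUpperBound`, and the choice of 32 sub-buckets per octave are all assumptions made for the example.

```typescript
// Hypothetical sketch; names and SUB_BUCKETS are assumptions, not the PR's code.
const SUB_BUCKETS = 32; // linear sub-buckets per octave
const ZERO_BUCKET = Number.NEGATIVE_INFINITY; // key for non-positive samples

// Map a sample to a bucket index: octave * SUB_BUCKETS + sub-bucket.
function bucketIndex(value: number): number {
  if (!Number.isFinite(value)) throw new RangeError("sample must be finite");
  if (value <= 0) return ZERO_BUCKET; // also catches -0
  let octave = Math.floor(Math.log2(value));
  // Correct for rounding in Math.log2 near powers of two.
  if (2 ** octave > value) octave--;
  else if (2 ** (octave + 1) <= value) octave++;
  const ratio = value / 2 ** octave; // mantissa ratio, always in [1, 2)
  const sub = Math.min(SUB_BUCKETS - 1, Math.floor((ratio - 1) * SUB_BUCKETS));
  return octave * SUB_BUCKETS + sub;
}

function bucketUpperBound(index: number): number {
  if (index === ZERO_BUCKET) return 0;
  const octave = Math.floor(index / SUB_BUCKETS);
  const sub = index - octave * SUB_BUCKETS;
  return 2 ** octave * (1 + (sub + 1) / SUB_BUCKETS);
}

class SketchHistogram {
  private readonly counts = new Map<number, number>(); // sparse storage
  private total = 0;

  get count(): number {
    return this.total;
  }

  record(value: number): void {
    const index = bucketIndex(value);
    this.counts.set(index, (this.counts.get(index) ?? 0) + 1);
    this.total++;
  }

  // Merging adds bucket counts, so percentiles can be recomputed afterwards.
  merge(other: SketchHistogram): void {
    other.counts.forEach((count, index) => {
      this.counts.set(index, (this.counts.get(index) ?? 0) + count);
    });
    this.total += other.total;
  }

  // Returns the upper bound of the bucket that holds the p-th percentile.
  percentile(p: number): number {
    if (this.total === 0) return Number.NaN;
    const rank = Math.max(1, Math.ceil((p / 100) * this.total));
    const indices = Array.from(this.counts.keys())
      .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
    let seen = 0;
    for (const index of indices) {
      seen += this.counts.get(index) ?? 0;
      if (seen >= rank) return bucketUpperBound(index);
    }
    return bucketUpperBound(indices[indices.length - 1]);
  }
}
```

With 32 sub-buckets per octave, a reported percentile in this sketch overshoots the true sample by at most 1/32 (about 3.1%), whatever the magnitude of the sample.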

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add the small, pure building blocks the scenario format is built on:

 -  `asList()`: scalar-or-list coercion, so fields such as `recipient`,
    `seed`, `collection`, and `type` can accept either a single value or a
    list while the common single-value case stays terse.
 -  `parseSize()` / `resolveGenerate()`: typed payload-generation
    directives (e.g. `content: { generate: lorem, size: 2KB }`) that
    produce deterministic output of an exact byte size, with the size
    parser bounded to the safe-integer range.
 -  A logic-less GitHub-Actions-style `${{ ... }}` template engine
    (dotted-path resolution plus whitelisted helper calls).  Lookups go
    through own properties only, with a denylist for prototype members,
    and unclosed delimiters, trailing text, and unbalanced quotes are
    rejected rather than silently mishandled, so the format cannot turn
    into a programming language.
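
A rough sketch of the first two building blocks, under stated assumptions: the signatures are guesses, the unit table assumes binary multiples (1KB = 1024 bytes), and `generateLorem` is a hypothetical stand-in for whatever `resolveGenerate()` does for `generate: lorem`.

```typescript
// Hypothetical sketches; the PR's real signatures and units may differ.

// Scalar-or-list coercion: a single value becomes a one-element array.
function asList<T>(value: T | T[]): T[] {
  return Array.isArray(value) ? value : [value];
}

// Assumption: binary multiples (1KB = 1024 bytes).
const UNIT_BYTES: Record<string, number> = {
  B: 1,
  KB: 1024,
  MB: 1024 ** 2,
  GB: 1024 ** 3,
};

// Parse "2KB", "512", "1.5MB" into a byte count within the safe-integer range.
function parseSize(text: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)?$/i.exec(text.trim());
  if (match === null) throw new RangeError(`Invalid size: ${text}`);
  const unit = (match[2] ?? "B").toUpperCase();
  const bytes = Number(match[1]) * (UNIT_BYTES[unit] ?? 1);
  if (!Number.isSafeInteger(bytes)) {
    throw new RangeError(`Size is not a safe integer byte count: ${text}`);
  }
  return bytes;
}

// Deterministic ASCII filler of an exact byte size (one char = one byte).
function generateLorem(size: number): string {
  const seed = "lorem ipsum dolor sit amet ";
  return seed.repeat(Math.ceil(size / seed.length)).slice(0, size);
}
```

For example, `asList("alice")` and `asList(["alice"])` both yield a one-element array, which is what lets the suite format keep the single-value case terse.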

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Define the `fedify bench` scenario suite format and its published
JSON Schema (draft 2020-12).  The format is a suite of `version`,
`target`, `defaults`, `actors`, and `scenarios`, with an `expect` block
per scenario, and it can express every scenario type discussed for the
tool (inbox, webfinger, actor, object, fanout, collection, failure,
mixed) even though only inbox and webfinger will have runners.

Rather than a schema-first single source, the published JSON Schema and
the TypeScript types are maintained as two artifacts kept identical by a
drift guard.  Runtime validation uses `@cfworker/json-schema`, and a
validated value is narrowed with an `as unknown as` cast.  Three
cross-field rules live in the schema where an editor can flag them:

 -  exactly one HTTP request signature scheme per actor group
    (`contains` + `minContains`/`maxContains`);
 -  `rate` XOR `concurrency` in a load block (`oneOf`);
 -  the allowed `expect` metrics per scenario type (`if`/`then` +
    `propertyNames`).

The embedded schema object is the editing source; *schema/bench/*
holds the hosted copy, regenerated by *scripts/generate-bench-schema.ts*.
Four guards run as tests: structural/meta validation, example-fixture
validation (valid and invalid fixtures covering every scenario type),
drift between the embedded object and the published file, and git-based
immutability of already-published version files.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add the normalization step that turns a schema-validated suite into the
resolved form the engine runs:

 -  `parseDuration()` and `parseRate()` parse the human-friendly duration
    (`30s`) and rate (`200/s`) units into milliseconds and requests per
    second, rejecting non-positive and overflowing magnitudes.
 -  `normalizeSuite()` applies suite defaults, coerces the top-level
    scalar-or-list fields to arrays, resolves the target (with a
    `--target` override), and determines the open- or closed-loop load
    model, inheriting compatible fields such as `arrival` and
    `maxInFlight` from the defaults while a scenario's `rate`/
    `concurrency` selects the model.

It also enforces the one cross-field rule the JSON Schema cannot express:
the buffered signing modes (`pipeline`, `presign`) pre-sign requests, so
they require the target's signature time window to be off; a
time-windowed target must use `signing: jit`.
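
The unit parsing in the first bullet could look roughly like this. The sketch below is illustrative only: the set of accepted suffixes (`ms`, `s`, `m`, `h` for durations; `/s` and `/m` for rates) is an assumption, and only the `30s` and `200/s` forms are confirmed by the text above.

```typescript
// Hypothetical sketches; the accepted unit suffixes are assumptions.
const DURATION_MS: Record<string, number> = {
  ms: 1,
  s: 1000,
  m: 60000,
  h: 3600000,
};

// Reject non-positive and overflowing (non-finite) magnitudes.
function checkMagnitude(value: number, text: string): number {
  if (!Number.isFinite(value) || value <= 0) {
    throw new RangeError(`Out of range: ${text}`);
  }
  return value;
}

// "30s" -> 30000 (milliseconds)
function parseDuration(text: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(text.trim());
  if (match === null) throw new SyntaxError(`Invalid duration: ${text}`);
  const factor = DURATION_MS[match[2]] ?? Number.NaN;
  return checkMagnitude(Number(match[1]) * factor, text);
}

// "200/s" -> 200 (requests per second)
function parseRate(text: string): number {
  const match = /^(\d+(?:\.\d+)?)\/(s|m)$/.exec(text.trim());
  if (match === null) throw new SyntaxError(`Invalid rate: ${text}`);
  const perSecond = match[2] === "s" ? Number(match[1]) : Number(match[1]) / 60;
  return checkMagnitude(perSecond, text);
}
```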

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Define the canonical benchmark report: the single result model from
which the terminal, JSON, and Markdown renderers all derive, so the
outputs can never drift apart.  JSON is the canonical machine form,
pinned by a published draft-2020-12 schema (schema/bench/report-v1.json).

The model splits `client` and `server` numbers by nesting so it is clear
which the load generator measured and which came from the target's stats
endpoint, bakes the unit into numeric keys (latencyMs, drainMs), turns
each expect assertion into an evaluated record, and carries first-class
environment/target/configHash reproducibility metadata plus an optional
serialized histogram.

The report schema is registered alongside the scenario schema, so the
existing structural, fixture, drift, and immutability guards now cover it
too; a valid report fixture is added.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Turn each scenario's `expect` block into evaluated records that gate a
run.  `parseAssertion()` parses a human assertion (">= 99%", "< 100ms",
"< 2s", ">= 500/s", "== 0") into an operator and a machine-clean
threshold: percentages become ratios, durations milliseconds, rates per
second.  `evaluateExpect()` looks each metric up by name (successRate,
throughputPerSec, errors.4xx/5xx/total, latency.*, signatureVerification.*,
queueDrain.*), checks the assertion's unit is compatible with the
metric's natural unit, and compares.  Equality is tolerant for float
metrics but exact for counts.  A `fail`-severity assertion gates the
build while `warn` only annotates, and a missing or unmeasured metric
fails cleanly.
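
As an illustration of the parsing step, here is a minimal sketch that handles the five example strings quoted above. The `Assertion` shape and the `holds()` helper are assumptions for this example; it also compares exactly and leaves out the float tolerance, the unit-compatibility check, and severities that the commit describes.

```typescript
// Hypothetical sketch; types and helper names are assumptions.
type Operator = "<" | "<=" | ">" | ">=" | "==";
type Unit = "ratio" | "ms" | "perSecond" | "none";

interface Assertion {
  operator: Operator;
  threshold: number; // normalized: ratio, milliseconds, or per second
  unit: Unit;
}

function parseAssertion(text: string): Assertion {
  const match = /^(<=|>=|==|<|>)\s*(\d+(?:\.\d+)?)\s*(%|ms|s|\/s)?$/
    .exec(text.trim());
  if (match === null) throw new SyntaxError(`Invalid assertion: ${text}`);
  const operator = match[1] as Operator;
  const value = Number(match[2]);
  switch (match[3]) {
    case "%": // percentages become ratios
      return { operator, threshold: value / 100, unit: "ratio" };
    case "ms":
      return { operator, threshold: value, unit: "ms" };
    case "s": // seconds become milliseconds
      return { operator, threshold: value * 1000, unit: "ms" };
    case "/s":
      return { operator, threshold: value, unit: "perSecond" };
    default:
      return { operator, threshold: value, unit: "none" };
  }
}

// Exact comparison only; the real evaluator is described as tolerant
// for float metrics.
function holds(measured: number, assertion: Assertion): boolean {
  switch (assertion.operator) {
    case "<": return measured < assertion.threshold;
    case "<=": return measured <= assertion.threshold;
    case ">": return measured > assertion.threshold;
    case ">=": return measured >= assertion.threshold;
    case "==": return measured === assertion.threshold;
  }
}
```

Normalizing the threshold at parse time means the comparison never has to know which unit the author wrote, only which unit the metric uses.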

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Assemble the canonical report from measured scenario data and render it
in three forms from that single model:

 -  `buildScenarioResult()`/`buildReport()` turn resolved scenarios and
    their measurements into the report, evaluating each `expect` block,
    summarizing the load model, and computing the overall gate.
 -  `detectEnvironment()` and `configHash()` capture the reproducibility
    metadata (runtime, OS, CPU count, and a stable sha256 over the
    canonicalized configuration, honoring `toJSON()` so URLs hash by
    value).
 -  The JSON renderer is the canonical machine form (pinned by the
    report schema); the terminal-text and Markdown renderers derive from
    the same model.  A shared metric-unit registry keeps the evaluator
    and the renderers in agreement, so measured values display in the
    metric's own unit while an explicit assertion unit stays visible.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add the client-side safety guard and the discovery that finds where to
deliver:

 -  `classifyTarget()` sorts a target into loopback/private/public from
    its host (IP-literal aware, IPv4-mapped IPv6 decoded), conservatively
    treating anything it cannot confirm as public.
 -  `assertTargetAllowed()` lets loopback/private targets and any target
    advertising benchmark mode run without friction, and refuses only a
    public target that does not advertise benchmark mode unless
    --allow-unsafe-target is given (mandatory, with no interactive
    prompt); --dry-run bypasses the gate since it only inspects.
 -  `probeBenchmarkMode()` reads the cooperative `stats` endpoint to
    detect benchmark mode and the target's Fedify version, never throwing.
 -  `discoverInbox()` resolves a handle or actor URI to its personal and
    shared inbox the way a remote peer would, building
    private-address-allowing loaders for loopback targets, and
    `selectInbox()` picks the inbox for the scenario's mode.
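
The host classification in the first bullet can be sketched like this. It is a simplified illustration that takes a bare hostname: `classifyHost` and its helpers are hypothetical names, the private ranges covered (10/8, 172.16/12, 192.168/16, fc00::/7) and the treatment of `localhost` are assumptions, and anything unrecognized falls through to public.

```typescript
// Hypothetical sketch; the ranges and names are assumptions.
type Tier = "loopback" | "private" | "public";

function parseIPv4(text: string): number[] | null {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(text);
  if (match === null) return null;
  const octets = match.slice(1).map(Number);
  return octets.every((octet) => octet <= 255) ? octets : null;
}

function classifyIPv4(octets: number[]): Tier {
  const [a, b] = octets;
  if (a === 127) return "loopback";
  if (a === 10) return "private";
  if (a === 172 && b >= 16 && b <= 31) return "private";
  if (a === 192 && b === 168) return "private";
  return "public";
}

function classifyHost(hostname: string): Tier {
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  if (host === "localhost" || host.endsWith(".localhost")) return "loopback";
  const v4 = parseIPv4(host);
  if (v4 !== null) return classifyIPv4(v4);
  if (host === "::1") return "loopback";
  // IPv4-mapped IPv6, in dotted (::ffff:127.0.0.1) or hex (::ffff:7f00:1) form.
  const mapped = /^::ffff:(.+)$/.exec(host);
  if (mapped !== null) {
    const dotted = parseIPv4(mapped[1]);
    if (dotted !== null) return classifyIPv4(dotted);
    const hex = /^([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(mapped[1]);
    if (hex === null) return "public";
    const hi = parseInt(hex[1], 16);
    const lo = parseInt(hex[2], 16);
    return classifyIPv4([hi >> 8, hi & 255, lo >> 8, lo & 255]);
  }
  if (/^f[cd][0-9a-f]{2}:/.test(host)) return "private"; // fc00::/7
  return "public"; // conservative default for anything unconfirmed
}
```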

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Stand up the benchmark's own synthetic remote peer.  An author picks
signature standards and the key set is derived: HTTP request signatures
and LD Signatures share one RSA pair, FEP-8b32 uses an Ed25519 pair.
`buildFleet()` expands the actor groups into members with generated keys,
and `spawnSyntheticServer()` serves each member as a normal ActivityPub
actor document with an embedded `publicKey` and `assertionMethod` over
plain loopback HTTP.

The target dereferences a signature's keyId during verification, so
serving exactly the document a real actor exposes lets verification
resolve the key the same way; a fixed actor set keeps this on a cold path
a warm-up window excludes.  A test confirms the served document parses
back into a verifiable actor whose keys resolve.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Sign inbox deliveries reusing the @fedify/fedify signers so the client
pays realistic crypto cost.  `signInboxDelivery()` applies the FEP-8b32
object proof and the LD Signature to the document, then the HTTP request
signature (cavage or rfc9421) to the final body.
`createActivityIdMinter()` mints a unique activity id per request,
satisfying Fedify's always-on inbox idempotency automatically.

`createSigningPipeline()` keeps RSA signing off the send critical path
with three lookahead modes: `jit`, `pipeline` (default; background
signers keep a bounded buffer filled and buffer starvation surfaces the
client as the bottleneck), and `presign`.  The pipeline cannot hang on a
stuck factory, drops transient sign failures, and fails fast on
deterministic ones.  Tests verify the produced cavage and rfc9421
requests pass Fedify's own verifyRequest against synthetic-server keys.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Drive load and turn the raw samples into client-side metrics.
`runLoad()` supports open-loop (a fixed arrival schedule, with latency
measured from each request's scheduled time — the coordinated-omission
correction — so a stalled target or maxInFlight backpressure shows up as
latency rather than being omitted) and closed-loop (N virtual users).
A fair slot-transferring semaphore enforces `maxInFlight` in both models
and reports backpressure as the saturation signal; arrivals are a lazy
generator (constant or seeded Poisson) and only in-flight dispatches are
retained, so memory stays flat on long runs.
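
To make the coordinated-omission correction concrete, here is a small deterministic simulation rather than a real load generator. `simulateOpenLoop` is a hypothetical helper, and the target is modeled as serving one request at a time (as if `maxInFlight` were 1); the point is only the difference between measuring from the send time and measuring from the scheduled time.

```typescript
// Hypothetical simulation; not the PR's runLoad().
interface Sample {
  scheduledMs: number;
  naiveMs: number; // measured from the moment the request was actually sent
  correctedMs: number; // measured from the moment it was scheduled
}

function simulateOpenLoop(
  count: number,
  intervalMs: number,
  serviceMs: (index: number) => number,
): Sample[] {
  const samples: Sample[] = [];
  let freeAtMs = 0; // when the single in-flight slot becomes free
  for (let i = 0; i < count; i++) {
    const scheduledMs = i * intervalMs; // fixed arrival schedule
    const sentMs = Math.max(scheduledMs, freeAtMs); // delayed by backpressure
    const doneMs = sentMs + serviceMs(i);
    freeAtMs = doneMs;
    samples.push({
      scheduledMs,
      naiveMs: doneMs - sentMs,
      correctedMs: doneMs - scheduledMs,
    });
  }
  return samples;
}

// Ten requests, one every 10 ms; request 2 stalls for 50 ms, the rest take 1 ms.
const samples = simulateOpenLoop(10, 10, (i) => (i === 2 ? 50 : 1));
const naiveSlow = samples.filter((s) => s.naiveMs > 1).length;
const correctedSlow = samples.filter((s) => s.correctedMs > 1).length;
console.log({ naiveSlow, correctedSlow });
```

In this run the naive measurement sees a single slow request, while the corrected one also charges the five requests that queued behind the stall, which is the latency a real sender on that schedule would have experienced.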

`aggregateSamples()` excludes warm-up samples and produces request
counts, success rate, throughput over the measured window, latency
percentiles from the log-linear histogram, and errors grouped by kind,
status, and reason.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Wire the engine into runnable scenarios.  The stats client reads the
cooperative `stats` endpoint and projects the signature-verification
histogram and queue depth into the report's server section, robust to
malformed snapshots.  The inbox runner discovers the recipient inbox,
builds a signing factory over the synthetic fleet, drives the signing
pipeline and load generator, aggregates the client metrics, and attaches
the server metrics; the webfinger runner drives handle-resolution
lookups.  A registry dispatches by type and reports a clear error for the
scenario types that the format expresses but this version does not run.

`presign` signing now requires an open-loop load (a closed-loop run has
no fixed request count to pre-sign).  An end-to-end test stands up a real
`benchmarkMode` Fedify federation and confirms signed inbox deliveries
verify, the inbox listener runs, and server-side signature-verification
metrics are read back.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Implement `runBench`: load, validate, and normalize the suite (any
configuration error logs a friendly message and exits 2), preflight the
scenario runners so an unsupported type fails fast, classify and probe
the target, and apply the safety gate.  A `--dry-run` prints the plan and
sends nothing.  For a real run it builds the synthetic actor server once
when a signed scenario needs it, runs each scenario, assembles the
report, renders it to the chosen format (stdout or a file), and sets the
exit code to 0 when the gate passes and 1 otherwise.

The default exit sets `process.exitCode` so cleanup and output flushing
finish first.  Signed scenarios are refused against a public target,
since the synthetic actor server is only reachable on the client's
loopback.  Dependencies are injectable, and tests cover the passing and
failing gates, dry run, the unsafe-target and public-signed refusals, and
an invalid suite.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Wire the logic-less `${{ ... }}` template engine into the load pipeline:
`renderSuiteTemplates()` expands templates in a parsed suite with a
context exposing the target (host, hostname, port, origin, href,
protocol) plus the default helpers, and `runBench` runs it between
loading and validation.  This is what makes `recipient:
"http://${{ target.host }}/users/alice"` resolve to a concrete URL.

The target comes from `--target` or the suite's own `target`, neither of
which is templated.  Tests cover rendering and the end-to-end inbox run
now uses templating.
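
A stripped-down sketch of the `${{ ... }}` expansion, assuming a context shaped like `{ target: { host, ... } }`. `resolvePath` and `renderTemplate` are hypothetical names; helper calls are left out, and this version accepts only dotted paths that resolve to a string or number.

```typescript
// Hypothetical sketch; the real engine also supports whitelisted helpers.
const DENIED = new Set(["__proto__", "constructor", "prototype"]);

// Resolve a dotted path through own properties only.
function resolvePath(context: Record<string, unknown>, path: string): unknown {
  let current: unknown = context;
  for (const segment of path.split(".")) {
    if (DENIED.has(segment)) throw new RangeError(`Denied name: ${segment}`);
    if (
      typeof current !== "object" || current === null ||
      !Object.prototype.hasOwnProperty.call(current, segment)
    ) {
      throw new RangeError(`Unknown path: ${path}`);
    }
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

function renderTemplate(
  template: string,
  context: Record<string, unknown>,
): string {
  let output = "";
  let position = 0;
  while (true) {
    const open = template.indexOf("${{", position);
    if (open < 0) return output + template.slice(position);
    const close = template.indexOf("}}", open + 3);
    if (close < 0) throw new SyntaxError("Unclosed ${{ delimiter");
    const expression = template.slice(open + 3, close).trim();
    if (!/^[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*$/.test(expression)) {
      throw new SyntaxError(`Not a dotted path: ${expression}`);
    }
    const value = resolvePath(context, expression);
    if (typeof value !== "string" && typeof value !== "number") {
      throw new TypeError(`Not a scalar: ${expression}`);
    }
    output += template.slice(position, open) + String(value);
    position = close + 2;
  }
}
```

With a context of `{ target: { host: "localhost:3000" } }`, the recipient string quoted above renders to `http://localhost:3000/users/alice`.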

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Extend the benchmarking manual with the client side: a getting-started
scenario suite, the actors and signature-standards model, `${{ }}`
templating, open- and closed-loop load with the signing modes, the
output formats and CI usage, the safety gate, and the http/loopback
caveats.  Add the @fedify/cli changelog entry for the new command.

fedify-dev#783
fedify-dev#744

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Address four behavioral gaps where the bench engine silently accepted
options it did not actually apply:

 -  Reject `runs` greater than 1 during normalization.  Repeated runs are
    not implemented yet, so accepting the field gave a single run while
    implying several.

 -  Fail a scenario that measured zero requests instead of letting every
    `expect` assertion pass vacuously, and reject a `warmup` that is not
    shorter than the `duration` (which would leave no measured window).

 -  Reject inbox `activity` options the runner cannot honor.  The runner
    always delivers a `Create` carrying an embedded `Note`, so a
    non-`Create` activity type, a non-`Note` `object.type`, or
    `embedObject: false` is now refused up front through a new optional
    `validate()` on the runner, called during preflight.  Scalar-or-list
    type fields are checked in full, not just their first element.

 -  Implement multi-recipient delivery in the inbox runner: every
    recipient's inbox is discovered, and deliveries (with the synthetic
    actors that sign them) are rotated across the recipients, modeling a
    server receiving from many peers into many local inboxes.

The scenario format and JSON Schema still express these options; only the
inbox/webfinger runners constrain what they execute in this version.

fedify-dev#783
fedify-dev#744

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
A malformed `expect` assertion was only parsed while evaluating
results, which happens after the entire benchmark load has been sent.
Worse, the run loop has no catch around result building, so the
resulting AssertionParseError escaped uncaught and crashed the command
instead of failing as a configuration error.

Add validateExpectBlock(), which parses every assertion in a scenario's
`expect` block, and run it in the preflight step (alongside runner
validation) before any probe or load.  A typo in a CI gate now exits 2
without sending traffic, with a message naming the offending metric.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The cooperative `stats` endpoint is cumulative and has no reset, but the
inbox and webfinger runners read it once at the end, so the reported
server numbers (and any signatureVerification.* expectations) folded in
warm-up traffic and every earlier scenario in the suite.  Client samples
were already windowed; the server side was not, so the two disagreed.

Take a server snapshot at the measured-window boundary and diff it
against the end snapshot:

 -  stats-client.ts gains a raw `ServerSnapshot` (signature histogram and
    queue-depth gauge), `parseServerSnapshot`, `diffSnapshots` (subtracts
    bucket counts; the gauge is not cumulative, so the end value is kept),
    and `snapshotToMetrics`.  `fetchServerSnapshot` returns `null` only on
    transport or parse failure; an available-but-empty snapshot is
    non-null, so an unavailable baseline is never mistaken for an empty
    one.  Histogram subtraction requires identical bucket boundaries, and
    refuses (yields no signature metric) otherwise.

 -  runner.ts gains `withMeasuredWindowStart`, which gates every measured
    send on a one-shot boundary callback so the baseline is captured
    before any measured request reaches the target.

 -  The inbox and webfinger runners snapshot the baseline at the boundary
    and report server metrics only when both ends of the window were
    captured, instead of falling back to the cumulative snapshot.

A few warm-up requests still in flight at the boundary may be attributed
to the window; a hard drain would distort the coordinated-omission client
latency, so that bounded residue is accepted and documented.
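
The windowing arithmetic can be sketched as below. The snapshot shape (`boundaries` plus `counts`, and a nullable gauge) is an assumption for illustration and may not match the real `ServerSnapshot`; the behavior follows the description above: bucket counts are subtracted, the gauge keeps its end value, and mismatched boundaries yield no signature metric.

```typescript
// Hypothetical shapes; the real ServerSnapshot may differ.
interface HistogramSnapshot {
  boundaries: number[];
  counts: number[];
}

interface ServerSnapshot {
  signature: HistogramSnapshot | null;
  queueDepth: number | null; // gauge, not cumulative
}

function diffHistogram(
  start: HistogramSnapshot,
  end: HistogramSnapshot,
): HistogramSnapshot | null {
  const sameShape = start.boundaries.length === end.boundaries.length &&
    start.counts.length === end.counts.length &&
    start.boundaries.every((bound, i) => bound === end.boundaries[i]);
  if (!sameShape) return null; // refuse rather than subtract mismatched buckets
  return {
    boundaries: [...end.boundaries],
    counts: end.counts.map((count, i) => count - start.counts[i]),
  };
}

function diffSnapshots(
  start: ServerSnapshot | null,
  end: ServerSnapshot | null,
): ServerSnapshot | null {
  // Server metrics are reported only when both ends of the window exist.
  if (start === null || end === null) return null;
  const signature = start.signature !== null && end.signature !== null
    ? diffHistogram(start.signature, end.signature)
    : null;
  return { signature, queueDepth: end.queueDepth };
}
```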

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The scenario schema's `load` object required exactly one of `rate` or
`concurrency`, so a block that set only `arrival` or `maxInFlight` and
inherited its load model was rejected before normalization, even though
`resolveLoad()` already supports such partial overrides (inheriting the
model, or falling back to the default open-loop rate).

Relax the constraint to forbid only `rate` and `concurrency` together,
allowing either or neither.  This lets a suite write, for example,
`defaults: { load: { maxInFlight: 100 } }` or override just `arrival` on
one scenario.  The embedded schema literal and the published
schema/bench/scenario-v1.json are regenerated together (the v1 file is
new on this branch, so it is not yet immutable).
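
The difference between the two constraints can be illustrated with schema fragments written as TypeScript literals. The encodings below are assumptions (the relaxed rule could equally be expressed another way in the real schema), and the two predicates are hand-written stand-ins for a validator, included only to show which `load` blocks each rule accepts.

```typescript
// Hypothetical encodings; the published schema may express this differently.
type Load = Record<string, unknown>;

const has = (load: Load, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(load, key);

// Before: exactly one of the two branches must match.
const strictRule = {
  oneOf: [{ required: ["rate"] }, { required: ["concurrency"] }],
};

// After (one possible encoding): forbid only the combination.
const relaxedRule = { not: { required: ["rate", "concurrency"] } };

function passesStrict(load: Load): boolean {
  const matching = strictRule.oneOf.filter((branch) =>
    branch.required.every((key) => has(load, key))
  );
  return matching.length === 1;
}

function passesRelaxed(load: Load): boolean {
  return !relaxedRule.not.required.every((key) => has(load, key));
}
```

Under these stand-ins, `{ maxInFlight: 100 }` fails the strict rule but passes the relaxed one, while a block that sets both `rate` and `concurrency` fails both.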

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The synthetic actor/key server bound loopback and advertised
`127.0.0.1` actor and key IDs, which the target dereferences to verify
HTTP signatures.  A same-machine (loopback) target reaches it, but a
non-loopback target dereferences its own `127.0.0.1`, fails key lookup,
and rejects every signed delivery.  The command nonetheless allowed
signed scenarios against private targets, so they failed silently.

Add a `--advertise-host` option.  When set, the synthetic server binds
every interface (`0.0.0.0`, or `::` for an IPv6 host) and advertises the
given host in its actor, key, and base URLs, so a non-loopback target
can dereference them.  `resolveAdvertiseHost()` validates the value as a
bare host name, IPv4 address, or IPv6 literal (bracketing IPv6 for the
URL authority and binding the matching family), rejecting a scheme,
port, path, or other URL syntax with a clear configuration error.

Signed scenarios are now refused (exit 2) when the target is
non-loopback and no `--advertise-host` is given, instead of running and
failing on the target.  The documentation is updated accordingly.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The `--user-agent` value was passed only to the document loader, so the
benchmark's main requests — the runners' inbox POSTs and WebFinger GETs,
the benchmark-mode probe, and the server stats reads — went out with the
runtime's default User-Agent.  A target that inspects, logs, or
rate-limits by User-Agent saw the wrong value, so the option was
silently ineffective for the traffic that matters.

Wrap the fetch implementation once with withUserAgent(), so every
benchmark request carries the configured User-Agent.  A prebuilt request
(the signed inbox delivery, a WebFinger GET) has the header set in place
rather than recloned, leaving the already-signed body and digest
untouched; the User-Agent is not part of the signed header set, so this
does not affect verification.  A User-Agent the caller already set is
left as-is.
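
A minimal sketch of such a wrapper, assuming a fetch-like function that always receives a prebuilt `Request`. The `FetchLike` type and this exact signature are assumptions; the real `withUserAgent()` may also handle string and URL inputs.

```typescript
// Hypothetical sketch; the real withUserAgent() may have a wider signature.
type FetchLike = (request: Request) => Promise<Response>;

function withUserAgent(inner: FetchLike, userAgent: string): FetchLike {
  return (request: Request): Promise<Response> => {
    // Set the header in place so an already-signed body is not re-read or
    // cloned, and leave a User-Agent the caller set explicitly untouched.
    if (!request.headers.has("user-agent")) {
      request.headers.set("user-agent", userAgent);
    }
    return inner(request);
  };
}
```

Wrapping once at the top, for example `withUserAgent((request) => fetch(request), "my-bench/1.0")` with a made-up agent string, is what makes every request that flows through the wrapper carry the same header.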

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The text and Markdown renderers only surfaced server queue metrics when
a drain-latency histogram was present, with the depth shown merely as a
suffix to that line.  The current stats reader supplies
`queue.depthMax` without `drainMs`, so queue depth never appeared in the
human-readable output even though it was in the JSON model; the Markdown
form rendered no queue metrics at all.

Render queue depth on its own:

 -  text: keep the combined drain line (now only when it has at least one
    percentile), otherwise print a standalone `Server queue depth max`
    line whenever a depth is reported.
 -  Markdown: add a queue drain p95 row when present and a queue depth max
    row whenever a depth is reported.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
`new URL("localhost:3000")` parses as the `localhost:` scheme with an
empty host, a common typo for a missing `http://`.  Normalization
accepted it, so `--dry-run` succeeded while a real run would misclassify
the target or build an unsupported fetch URL.  Targets carrying
credentials (`http://user:pass@host`) were likewise accepted even though
`fetch` rejects them.

Reject, during normalization, any target whose protocol is not `http:`
or `https:`, whose host is empty, or that carries embedded credentials,
with a message pointing at the likely fix.  The probe and runners only
make bare HTTP(S) requests, so these never produce a working run.
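
The three checks can be sketched with the standard `URL` class. `assertUsableTarget` is a hypothetical name and the error messages are illustrative, not the ones this PR prints.

```typescript
// Hypothetical sketch of the target checks described above.
function assertUsableTarget(raw: string): URL {
  const url = new URL(raw); // throws on input that cannot be parsed at all
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    // "localhost:3000" parses as scheme "localhost:" with an empty host.
    throw new RangeError(
      `Unsupported target scheme ${url.protocol} (did you mean http://${raw}?)`,
    );
  }
  if (url.host === "") throw new RangeError(`Target has no host: ${raw}`);
  if (url.username !== "" || url.password !== "") {
    throw new RangeError("Target must not carry embedded credentials");
  }
  return url;
}
```

Checking `protocol` first is what catches the missing-scheme typo, since that input never reaches the point of having a host to inspect.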

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The safety gate classified only the suite `target`, but an `inbox`
scenario's actual signed-load destination is the discovered inbox (or an
explicit `inbox:` URL), which can differ from the target.  A loopback
`target` with a public `recipient`, or `inbox: https://prod.example/inbox`,
would send benchmark POST load to a public inbox with no gate at all,
bypassing the guard against accidentally benchmarking production.  The
synthetic-reachability rule was likewise only checked against the target
tier, not the destination that actually verifies signatures.

Gate each resolved inbox destination before any load reaches it:

 -  assertInboxDestinationAllowed() refuses a public destination unless it
    shares the gated target's origin while the target advertises benchmark
    mode (inheriting its gate), or --allow-unsafe-target is given; and
    refuses a non-loopback destination unless a reachable synthetic host
    was advertised (--advertise-host).  Origins are compared (scheme, host,
    effective port), so an http inbox does not inherit an https target.
 -  The inbox runner calls an injected destination gate for each resolved
    inbox before sending; the orchestrator maps a refusal to exit 2.

Discovery (a read) still runs, but no benchmark load is sent to an
ungated destination.
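
The decision rule in the first bullet can be written out as a pure function. This is a sketch under assumptions: `DestinationCheck` and `refusalReason` are hypothetical, and the destination tier is passed in rather than computed, so only the gate logic and the origin comparison are shown.

```typescript
// Hypothetical sketch of the destination gate's decision rule.
type Tier = "loopback" | "private" | "public";

interface DestinationCheck {
  destination: URL;
  destinationTier: Tier;
  target: URL;
  targetAdvertisesBenchmarkMode: boolean;
  allowUnsafeTarget: boolean; // --allow-unsafe-target
  advertiseHost: string | null; // --advertise-host
}

// Returns a refusal reason, or null when load may be sent.
function refusalReason(check: DestinationCheck): string | null {
  if (check.destinationTier === "public") {
    // URL.origin covers scheme, host, and effective port.
    const inheritsGate = check.targetAdvertisesBenchmarkMode &&
      check.destination.origin === check.target.origin;
    if (!inheritsGate && !check.allowUnsafeTarget) {
      return "public destination is not covered by the target's gate";
    }
  }
  if (check.destinationTier !== "loopback" && check.advertiseHost === null) {
    return "destination cannot reach the synthetic actor server";
  }
  return null;
}
```

Because `origin` includes the scheme, an `http://` inbox on the same host as an `https://` target is a different origin and does not inherit the target's gate.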

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The default fetch follows redirects, which let two safety checks be
bypassed.  A public target whose `stats` endpoint redirected to a host
serving benchmark-mode JSON was marked as advertising benchmark mode, so
the gate allowed load against it.  And a gated loopback, private, or
benchmark target that answered a WebFinger GET or a signed inbox POST
with a 307/308 could carry that load to an ungated public service,
slipping past the destination gate.

Make every benchmark request non-following:

 -  The benchmark-mode probe and the server stats read use
    `redirect: "manual"`, so a redirect is treated as "not advertised"
    and "unavailable" respectively rather than trusted.
 -  `sendRequest` re-wraps any non-manual request as `redirect: "manual"`
    and records a redirect (opaque or 3xx) as a failed send, so no signed
    load reaches the redirect target; the WebFinger and inbox requests are
    built with `redirect: "manual"` so the common path needs no re-clone.
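
A small sketch of the two pieces this relies on: building a request with `redirect: "manual"`, and classifying a response as a redirect. `buildManualRequest`, `isRedirect`, and the structural `ResponseLike` type are assumptions for the example; whether a manual-mode redirect surfaces as an opaque redirect or as a plain 3xx depends on the runtime, which is why both are checked.

```typescript
// Hypothetical helpers; not the PR's sendRequest().
interface ResponseLike {
  type: string;
  status: number;
}

// Build a request that will not follow redirects.
function buildManualRequest(url: string): Request {
  return new Request(url, { redirect: "manual" });
}

// A redirect is either an opaque-redirect response or a plain 3xx status.
function isRedirect(response: ResponseLike): boolean {
  return response.type === "opaqueredirect" ||
    (response.status >= 300 && response.status < 400);
}

// Record a redirect as a failed send instead of following it.
function classifySend(response: ResponseLike): "ok" | "redirect" | "error" {
  if (isRedirect(response)) return "redirect";
  return response.status >= 200 && response.status < 300 ? "ok" : "error";
}
```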

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
@coderabbitai (Bot) commented Jun 5, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 407951d0-4fb9-4b77-be23-0a6f44f8ad50

📥 Commits

Reviewing files that changed from the base of the PR and between 4d980e2 and 374b41e.

📒 Files selected for processing (13)
  • docs/manual/benchmarking.md
  • packages/cli/package.json
  • packages/cli/src/bench/actor/fleet.test.ts
  • packages/cli/src/bench/actor/fleet.ts
  • packages/cli/src/bench/result/build.test.ts
  • packages/cli/src/bench/result/build.ts
  • packages/cli/src/bench/result/expect/evaluate.ts
  • packages/cli/src/bench/signing/pipeline.test.ts
  • packages/cli/src/bench/signing/pipeline.ts
  • packages/cli/src/bench/template/generate.test.ts
  • packages/cli/src/bench/template/generate.ts
  • packages/cli/src/bench/template/template.test.ts
  • packages/cli/src/bench/template/template.ts

📝 Walkthrough


Adds a new fedify bench CLI: scenario schema/types/validation/normalization; load generator and clock; signing pipeline; inbox/webfinger runners and runner plumbing; discovery/probe and safety gates; metrics histogram/aggregation and stats client; synthetic actor server; report model and text/json/markdown renderers; published JSON Schemas, fixtures, and tests.

Changes

fedify bench end-to-end

| Layer | File(s) | Summary |
| --- | --- | --- |
| CLI command and orchestration | `packages/cli/src/bench/command.ts`, `packages/cli/src/bench/action.ts`, `packages/cli/src/mod.ts`, `packages/cli/src/runner.ts` | Adds benchCommand, runBench orchestration, deps injection (RunBenchDeps), dry-run plan rendering, withUserAgent, and wires the subcommand into the CLI. |
| Scenario schema, types, validator, and normalization | `packages/cli/src/bench/scenario/*`, `packages/cli/src/bench/scenario/schema.ts`, `packages/cli/src/bench/scenario/types.ts` | Adds canonical scenario TypeScript types, embedded scenario JSON Schema, validateSuite, normalizeSuite, unit/rate parsers, coercion helpers, and schema-driven validation with friendly error formatting. |
| Load generation and clock | `packages/cli/src/bench/load/*` | Implements arrival scheduling (constant/Poisson, seeded RNG), monotonic Clock abstraction, open-loop and closed-loop runLoad with coordinated-omission correction and maxInFlight semaphore. |
| Signing pipeline and signer | `packages/cli/src/bench/signing/*`, `packages/cli/src/bench/signing/signer.ts` | Adds signing pipeline modes (jit/pipeline/presign), buffered pipeline implementation, activity-id minter, and signInboxDelivery that applies document and request signatures per standards. |
| Scenario runners and plumbing | `packages/cli/src/bench/scenarios/*`, `packages/cli/src/bench/scenarios/runner.ts` | Implements shared runner plumbing (sendRequest, measured-window start), inboxRunner and webfingerRunner, runner registry, discovery logic, and related tests. |
| Safety gates and tiers | `packages/cli/src/bench/safety/*` | Adds target classification (loopback/private/public), assertTargetAllowed, assertInboxDestinationAllowed, UnsafeTargetError, and tests covering advertise-host and allow-unsafe-target behaviors. |
| Metrics, histogram, and stats client | `packages/cli/src/bench/metrics/*` | Implements LogLinearHistogram, sample aggregation (aggregateSamples), server stats parsing/diffing/projection and fetch helpers for cooperative /.well-known/fedify/bench/stats. |
| Report model and renderers | `packages/cli/src/bench/result/*`, `packages/cli/src/bench/render/*` | Adds canonical BenchReport model, builders (buildScenarioResult, buildReport, configHash, detectEnvironment) and renderers (renderText, renderJson, renderMarkdown) plus schema and renderer tests. |
| Synthetic server and actor docs | `packages/cli/src/bench/server/synthetic.ts` | Synthetic actor/key server implementation and advertise-host handling with tests verifying served actor documents and advertised URL behavior. |
| Templates, helpers, generate directive | `packages/cli/src/bench/template/*` | Adds `${{ ... }}` template engine, default helpers, deterministic generate directive (lorem), and tests for templating and generation. |
| Published schemas, tooling, and docs | `schema/bench/*`, `packages/cli/scripts/generate-bench-schema.ts`, `schema/index.html`, `schema/_headers`, `schema/netlify.toml`, `schema/README.md` | Embeds and publishes scenario/report schemas, serializer and generation script, Netlify headers, hosting metadata and documentation, plus CI guards/tests to validate fixtures and immutability. |
| Fixtures and tests | `packages/cli/src/bench/__fixtures__/*`, many `packages/cli/src/bench/*/*.test.ts` | Adds invalid/valid scenario fixtures, report fixtures, and an extensive test suite covering parsing, validation, scheduling, signing, pipeline behavior, runners, metrics, rendering, and safety gates. |
| Changelog and docs | `CHANGES.md`, `docs/manual/benchmarking.md` | Adds changelog entry for @fedify/cli Version 2.3.0 and a comprehensive fedify bench manual section with examples, safety notes, and schema links. |
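
The arrival scheduling named in the walkthrough above (constant or Poisson, with a seeded RNG) could look something like the sketch below. It is not taken from the PR: `mulberry32` is a well-known small PRNG picked here for illustration, and `arrivalOffsetsMs` and its parameters are hypothetical names. Poisson arrivals are modeled by drawing exponentially distributed gaps whose mean is the inverse of the rate.

```typescript
/** Small seeded PRNG (mulberry32); an illustrative choice, not the PR's RNG. */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

/** Returns intended send times (ms from window start) for an open-loop run. */
function arrivalOffsetsMs(
  kind: "constant" | "poisson",
  ratePerSec: number,
  durationMs: number,
  seed = 1,
): number[] {
  if (!(ratePerSec > 0)) throw new RangeError("rate must be positive");
  const meanGapMs = 1000 / ratePerSec;
  const random = mulberry32(seed);
  const offsets: number[] = [];
  let t = 0;
  while (t < durationMs) {
    offsets.push(t);
    // Constant: fixed gap. Poisson: exponential gap via inverse transform.
    t += kind === "constant"
      ? meanGapMs
      : -Math.log(1 - random()) * meanGapMs;
  }
  return offsets;
}
```

With a fixed seed the Poisson schedule is reproducible across runs. An open-loop generator that measures latency from these intended offsets, rather than from the moment a request actually left, is one common way to correct for coordinated omission; whether the PR does exactly that is not shown here.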

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as fedify bench
  participant Target as Benchmark Target
  participant Stats as Target Stats Endpoint
  participant Synthetic as Synthetic Actor Server

  User->>CLI: run `fedify bench suite.yaml`
  CLI->>Stats: probe /.well-known/fedify/bench/stats (manual redirect)
  Stats-->>CLI: benchmarkMode + version?
  CLI->>Synthetic: spawn synthetic actor server (if signed scenarios)
  CLI->>Target: execute scenario traffic (signed requests)
  Target-->>CLI: responses (client latencies, statuses)
  CLI->>Stats: fetch baseline and end snapshots
  Stats-->>CLI: JSON snapshots
  CLI->>User: render report (text/json/markdown) and exit with pass/fail code

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

  • fedify-dev/fedify#787: Shares the cooperative benchmark stats flow and endpoint JSON contract used by the new CLI stats client.

Suggested labels

type/feature, type/documentation, component/testing, component/signatures, examples

Suggested reviewers

  • sij411
  • 2chanhaeng

@dahlia
Member Author

dahlia commented Jun 5, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d980e2bf4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/cli/src/bench/signing/pipeline.ts
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the 'fedify bench' command to '@fedify/cli' for benchmarking federation workloads, supporting signed inbox deliveries and WebFinger lookups. The implementation includes scenario suite validation against a published JSON Schema, a logic-less template engine, a synthetic actor/key server, an open/closed-loop load generator, background signing pipelines, and multiple report renderers. Feedback suggests implementing depth limits in recursive functions ('renderValue' and 'canonicalJson') to prevent stack overflows from deeply nested inputs, as well as using lazy cloning in 'renderValue' to minimize cloning costs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread packages/cli/src/bench/template/template.ts
Comment thread packages/cli/src/bench/result/build.ts Outdated
@codecov

codecov Bot commented Jun 5, 2026

Codecov Report

❌ Patch coverage is 93.04960% with 227 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/cli/src/bench/result/expect/evaluate.ts | 76.25% | 17 Missing and 16 partials ⚠️ |
| packages/cli/src/bench/action.ts | 87.32% | 15 Missing and 12 partials ⚠️ |
| packages/cli/src/bench/result/build.ts | 80.55% | 15 Missing and 6 partials ⚠️ |
| packages/cli/src/bench/metrics/stats-client.ts | 87.40% | 7 Missing and 10 partials ⚠️ |
| packages/cli/src/bench/metrics/histogram.ts | 87.50% | 5 Missing and 11 partials ⚠️ |
| packages/cli/src/bench/scenarios/inbox.ts | 89.40% | 10 Missing and 6 partials ⚠️ |
| packages/cli/src/bench/template/template.ts | 91.53% | 3 Missing and 8 partials ⚠️ |
| packages/cli/src/bench/render/text.ts | 91.34% | 1 Missing and 8 partials ⚠️ |
| packages/cli/src/bench/discovery/discover.ts | 84.00% | 4 Missing and 4 partials ⚠️ |
| packages/cli/src/bench/result/expect/metrics.ts | 75.75% | 3 Missing and 5 partials ⚠️ |

... and 16 more
| Files with missing lines | Coverage Δ |
| --- | --- |
| packages/cli/src/bench/actor/fleet.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/command.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/load/clock.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/load/generator.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/render/index.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/render/json.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/result/expect/assert.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/result/schema.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/safety/gate.ts | 100.00% <100.00%> (ø) |
| packages/cli/src/bench/scenario/coerce.ts | 100.00% <100.00%> (ø) |

... and 37 more

... and 3 files with indirect coverage changes



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/bench/actor/fleet.ts`:
- Around line 33-46: The function httpStandardOf currently uses
standards.find(...) so it silently accepts the first matching HTTP signature
when multiple are present; update httpStandardOf to collect all matches (e.g.,
standards.filter(...)), verify there is exactly one match, throw a TypeError if
count !== 1, and return that single match cast to HttpSignatureStandard; ensure
this behavior aligns with the FleetMember.httpStandard expectation and the
invalid fixture for two schemes.

In `@packages/cli/src/bench/result/expect/evaluate.ts`:
- Around line 197-209: The function sumErrors currently accepts min?: number,
max?: number but uses max! which is unsafe; change the signature for sumErrors
to require the min/max pair atomically (e.g., replace the two optional numeric
params with a single optional range object or tuple like range?: { min: number;
max: number } or range?: [number, number]) and update the implementation to read
range.min/range.max (or range[0]/range[1]) instead of min/max and remove
non-null assertions; then update all call sites that use sumErrors (calls that
currently pass both min and max should pass the new range value, and callers
that passed only one should be adjusted or left without the range) so the API
guarantees both bounds are provided together.

In `@packages/cli/src/bench/scenarios/inbox.ts`:
- Line 66: Remove the redundant validateActivity(scenario) invocation: the
existing validate() implementation already calls validateActivity(scenario) and
the scenario runner guarantees validate() is called before run(), so delete the
extra validateActivity(scenario) call (the one immediately after validate()) to
avoid duplicate validation and improve clarity; keep validate() and the run()
flow unchanged.

In `@packages/cli/src/bench/template/generate.ts`:
- Around line 111-116: Add an explicit upper bound for payload sizes (e.g.,
const MAX_PAYLOAD_BYTES = 100 * 1024 * 1024) and enforce it before attempting to
allocate strings: update parseSize() to validate that the parsed numeric size
does not exceed MAX_PAYLOAD_BYTES and throw a clear error (or clamp/return a
controlled failure) if it does; alternatively (or additionally) add the same
check at the start of generateLorem(size) to guard against huge repeats and
throw a RangeError with a descriptive message referencing the limit. Ensure
references to generateLorem() and parseSize() are updated to rely on this limit
so oversized inputs are rejected rather than causing memory exhaustion or
String.repeat RangeErrors.

In `@schema/README.md`:
- Line 9: In the README.md line containing the plain path
schema/bench/scenario-v1.json, wrap that file path in asterisks so it becomes
*schema/bench/scenario-v1.json* to comply with the markdown coding guideline for
**/*.md files; update the single occurrence of the string in the file
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f64291df-e7d7-4523-833d-dffc7534da37

📥 Commits

Reviewing files that changed from the base of the PR and between e9ad1fa and 4d980e2.

⛔ Files ignored due to path filters (3)
  • deno.lock is excluded by !**/*.lock
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
  • schema/logo.svg is excluded by !**/*.svg
📒 Files selected for processing (104)
  • CHANGES.md
  • docs/manual/benchmarking.md
  • packages/cli/deno.json
  • packages/cli/package.json
  • packages/cli/scripts/generate-bench-schema.ts
  • packages/cli/src/bench/__fixtures__/invalid/bad-expect-metric.yaml
  • packages/cli/src/bench/__fixtures__/invalid/failure-missing-fault.yaml
  • packages/cli/src/bench/__fixtures__/invalid/missing-version.yaml
  • packages/cli/src/bench/__fixtures__/invalid/mixed-bad-metric.yaml
  • packages/cli/src/bench/__fixtures__/invalid/rate-and-concurrency.yaml
  • packages/cli/src/bench/__fixtures__/invalid/two-http-schemes.yaml
  • packages/cli/src/bench/__fixtures__/invalid/unknown-field.yaml
  • packages/cli/src/bench/__fixtures__/reports/inbox-report.json
  • packages/cli/src/bench/__fixtures__/scenarios/all-types.yaml
  • packages/cli/src/bench/__fixtures__/scenarios/ci-gate.json
  • packages/cli/src/bench/__fixtures__/scenarios/getting-started.yaml
  • packages/cli/src/bench/action.test.ts
  • packages/cli/src/bench/action.ts
  • packages/cli/src/bench/actor/documents.ts
  • packages/cli/src/bench/actor/fleet.ts
  • packages/cli/src/bench/actor/keys.ts
  • packages/cli/src/bench/command.test.ts
  • packages/cli/src/bench/command.ts
  • packages/cli/src/bench/discovery/discover.test.ts
  • packages/cli/src/bench/discovery/discover.ts
  • packages/cli/src/bench/discovery/probe.test.ts
  • packages/cli/src/bench/discovery/probe.ts
  • packages/cli/src/bench/load/arrival.test.ts
  • packages/cli/src/bench/load/arrival.ts
  • packages/cli/src/bench/load/clock.ts
  • packages/cli/src/bench/load/generator.test.ts
  • packages/cli/src/bench/load/generator.ts
  • packages/cli/src/bench/metrics/aggregate.test.ts
  • packages/cli/src/bench/metrics/aggregate.ts
  • packages/cli/src/bench/metrics/histogram.test.ts
  • packages/cli/src/bench/metrics/histogram.ts
  • packages/cli/src/bench/metrics/stats-client.test.ts
  • packages/cli/src/bench/metrics/stats-client.ts
  • packages/cli/src/bench/mod.ts
  • packages/cli/src/bench/render/format.ts
  • packages/cli/src/bench/render/index.ts
  • packages/cli/src/bench/render/json.ts
  • packages/cli/src/bench/render/markdown.ts
  • packages/cli/src/bench/render/render.test.ts
  • packages/cli/src/bench/render/text.ts
  • packages/cli/src/bench/result/build.test.ts
  • packages/cli/src/bench/result/build.ts
  • packages/cli/src/bench/result/expect/assert.test.ts
  • packages/cli/src/bench/result/expect/assert.ts
  • packages/cli/src/bench/result/expect/evaluate.test.ts
  • packages/cli/src/bench/result/expect/evaluate.ts
  • packages/cli/src/bench/result/expect/metrics.ts
  • packages/cli/src/bench/result/model.ts
  • packages/cli/src/bench/result/schema.ts
  • packages/cli/src/bench/safety/gate.test.ts
  • packages/cli/src/bench/safety/gate.ts
  • packages/cli/src/bench/safety/tiers.test.ts
  • packages/cli/src/bench/safety/tiers.ts
  • packages/cli/src/bench/scenario/coerce.test.ts
  • packages/cli/src/bench/scenario/coerce.ts
  • packages/cli/src/bench/scenario/errors.ts
  • packages/cli/src/bench/scenario/load.test.ts
  • packages/cli/src/bench/scenario/load.ts
  • packages/cli/src/bench/scenario/normalize.test.ts
  • packages/cli/src/bench/scenario/normalize.ts
  • packages/cli/src/bench/scenario/schema.ts
  • packages/cli/src/bench/scenario/types.ts
  • packages/cli/src/bench/scenario/units.test.ts
  • packages/cli/src/bench/scenario/units.ts
  • packages/cli/src/bench/scenario/validate.test.ts
  • packages/cli/src/bench/scenario/validate.ts
  • packages/cli/src/bench/scenarios/inbox.test.ts
  • packages/cli/src/bench/scenarios/inbox.ts
  • packages/cli/src/bench/scenarios/registry.test.ts
  • packages/cli/src/bench/scenarios/registry.ts
  • packages/cli/src/bench/scenarios/runner.test.ts
  • packages/cli/src/bench/scenarios/runner.ts
  • packages/cli/src/bench/scenarios/webfinger.test.ts
  • packages/cli/src/bench/scenarios/webfinger.ts
  • packages/cli/src/bench/schema-paths.ts
  • packages/cli/src/bench/schema.test.ts
  • packages/cli/src/bench/schemas.ts
  • packages/cli/src/bench/server/synthetic.test.ts
  • packages/cli/src/bench/server/synthetic.ts
  • packages/cli/src/bench/signing/activity-id.test.ts
  • packages/cli/src/bench/signing/activity-id.ts
  • packages/cli/src/bench/signing/pipeline.test.ts
  • packages/cli/src/bench/signing/pipeline.ts
  • packages/cli/src/bench/signing/signer.test.ts
  • packages/cli/src/bench/signing/signer.ts
  • packages/cli/src/bench/template/generate.test.ts
  • packages/cli/src/bench/template/generate.ts
  • packages/cli/src/bench/template/helpers.ts
  • packages/cli/src/bench/template/template.test.ts
  • packages/cli/src/bench/template/template.ts
  • packages/cli/src/config.ts
  • packages/cli/src/mod.ts
  • packages/cli/src/runner.ts
  • schema/README.md
  • schema/_headers
  • schema/bench/report-v1.json
  • schema/bench/scenario-v1.json
  • schema/index.html
  • schema/netlify.toml

Comment thread packages/cli/src/bench/actor/fleet.ts
Comment thread packages/cli/src/bench/result/expect/evaluate.ts Outdated
Comment thread packages/cli/src/bench/scenarios/inbox.ts
Comment thread packages/cli/src/bench/template/generate.ts
Comment thread schema/README.md
dahlia added 7 commits June 5, 2026 16:54
The benchmark end-to-end tests do real RSA key generation, signed inbox
delivery, and server round-trips, which take a few seconds under CI CPU
contention.  Bun applies a default per-test timeout of 5000 ms (node:test
and deno test have none), and the cli package's `test:bun` was the only
one without a `--timeout` flag, so `runBench - passing gate exits 0…` and
`runBench - failing gate exits 1` timed out on the Bun CI job while
passing everywhere else.

Run the cli Bun tests with `--timeout 60000`, matching the heaviest
sibling packages (fedify, vocab, the database adapters).

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The benchmarking manual's "Templating" section used inline code spans
containing `${{ … }}`.  VitePress compiles each Markdown page as a Vue
component, and Vue interpreted the `{{ … }}` inside the inline `<code>`
as a mustache interpolation, producing invalid generated code
(`_ctx.…`) and failing the VitePress build with a Rollup parse error.
Fenced code blocks were unaffected because they render through the
syntax highlighter.

Rewrite the paragraph so it no longer puts double-brace delimiters in an
inline code span: it describes the templating in prose and points to the
`recipient` line in the example suite above, where the literal
`${{ target.host }}` already appears inside a fenced YAML block.  A full
`vitepress build` now succeeds.

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The previous fix avoided the VitePress build failure by rewording the
"Templating" section to drop the inline `${{ … }}` code, since Vue
compiled those braces inside inline code as a mustache interpolation.

Use VitePress's own escape instead: wrap the paragraph in a `::: v-pre`
container, where Vue leaves interpolation untouched, and restore the
explicit inline `${{ … }}` and `${{ target.host }}` so the templating
syntax is shown directly again.  A full `vitepress build` succeeds and
the rendered page contains the literal braces; `hongdown --check` stays
happy with the container (unlike a raw inline `<code v-pre>`, which it
reflows and breaks).

fedify-dev#783

Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
In `presign` mode the whole run is meant to be signed before the timed
window opens.  The buffered producer kept refilling as `next()` drained
it, so background signers created replacement requests during the run,
doing crypto on the client and skewing the very cost presign isolates.

Cap background production at the pre-signed total so the signers stop
once the run is signed; if an open-loop run overshoots its estimate (a
few extra Poisson arrivals), those are signed on demand rather than
triggering a continuous background refill.

fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
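
The capped production described in the commit message above can be illustrated with a simplified model. This is a sketch under assumptions, not the PR's pipeline: `createPresignBuffer` and its `stats()` counters are hypothetical, and signing is shown as a synchronous callback for brevity, whereas real signing would be asynchronous.

```typescript
interface PresignBuffer<T> {
  next(): T;
  stats(): { presigned: number; onDemand: number };
}

/** Signs `total` items up front; never refills in the background afterwards. */
function createPresignBuffer<T>(
  total: number,
  sign: (index: number) => T,
): PresignBuffer<T> {
  const queue: T[] = [];
  let produced = 0;
  let onDemand = 0;
  // Sign the whole planned run before the timed window opens.
  while (produced < total) queue.push(sign(produced++));
  const presigned = produced;
  return {
    next(): T {
      if (queue.length > 0) return queue.shift() as T;
      // Overshoot (for example a few extra Poisson arrivals):
      // sign just this one on demand instead of restarting a refill.
      onDemand++;
      return sign(produced++);
    },
    stats: () => ({ presigned, onDemand }),
  };
}
```

Draining the buffer does not trigger any replacement signing, so client-side crypto during the measured window is limited to the overshoot.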
A scenario suite is parsed from user-supplied YAML or JSON, so its shape
is untrusted enough that a pathologically deep tree or an enormous
generated payload should fail with a clear error rather than overflow the
stack or exhaust memory.

Add a recursion-depth guard to the template renderer and to the
config-hash walk, and (separately) cap a generated payload's size in
`resolveGenerate`, where the string is actually allocated; `parseSize`
stays a plain unit parser.  The template renderer also keeps the original
reference for subtrees it did not change, avoiding needless cloning.

fedify-dev#791 (comment)
fedify-dev#791 (comment)
fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
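
A minimal sketch of the two renderer changes described in the commit message above, the recursion-depth guard and keeping the original reference for unchanged subtrees. The `MAX_DEPTH` value, the `renderString` callback, and this `renderValue` signature are assumptions for illustration; the PR's actual limit and signature are not shown on this page.

```typescript
const MAX_DEPTH = 64; // hypothetical limit, chosen only for this example

/** Renders every string in a tree, cloning only the nodes that change. */
function renderValue(
  value: unknown,
  renderString: (s: string) => string,
  depth = 0,
): unknown {
  if (depth > MAX_DEPTH) {
    throw new RangeError(`Template nesting exceeds ${MAX_DEPTH} levels.`);
  }
  if (typeof value === "string") return renderString(value);
  if (Array.isArray(value)) {
    let copy: unknown[] | undefined;
    for (let i = 0; i < value.length; i++) {
      const rendered = renderValue(value[i], renderString, depth + 1);
      if (rendered !== value[i]) {
        copy ??= value.slice(); // clone lazily, on the first change
        copy[i] = rendered;
      }
    }
    return copy ?? value;
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    let copy: Record<string, unknown> | undefined;
    for (const key of Object.keys(source)) {
      const rendered = renderValue(source[key], renderString, depth + 1);
      if (rendered !== source[key]) {
        copy ??= { ...source };
        copy[key] = rendered;
      }
    }
    return copy ?? source;
  }
  return value; // numbers, booleans, null pass through untouched
}
```

A pathologically deep input fails with a `RangeError` carrying a clear message instead of overflowing the stack, and a subtree without any template expression is returned by reference.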
httpStandardOf claimed in its error message that a group must declare
exactly one HTTP request signature standard, but it used find(), so a
group listing both draft-cavage-http-signatures-12 and rfc9421 silently
took the first instead of being rejected.  The JSON Schema already
forbids this, so this is defense in depth for the runtime path.

Collect all HTTP standards and throw when there is not exactly one, so
the function honors its own contract even on unvalidated input.

fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
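
The "exactly one" check described in the commit message above might look like this. It is a sketch: the two standard names come from the commit message, but the parameter type, the error wording, and the `"ld-signatures"` value in the usage note are assumptions rather than the PR's actual code.

```typescript
type HttpSignatureStandard = "draft-cavage-http-signatures-12" | "rfc9421";

function isHttpStandard(value: string): value is HttpSignatureStandard {
  return value === "draft-cavage-http-signatures-12" || value === "rfc9421";
}

/** Returns the single HTTP request signature standard a group declares. */
function httpStandardOf(standards: readonly string[]): HttpSignatureStandard {
  // filter() instead of find(): collect every match so duplicates are visible.
  const matches = standards.filter(isHttpStandard);
  if (matches.length !== 1) {
    throw new TypeError(
      "A group must declare exactly one HTTP request signature standard; " +
        `found ${matches.length}.`,
    );
  }
  return matches[0];
}
```

Under these assumptions, `httpStandardOf(["rfc9421", "ld-signatures"])` returns `"rfc9421"`, while a list naming both HTTP standards, or neither, throws.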
sumErrors took optional min and max parameters but dereferenced max with
a non-null assertion, so a future caller passing only min would crash.
Replace the loose pair with one optional { min, max } range so the bounds
cannot be supplied half-way, and update the call sites.

fedify-dev#791 (comment)

Assisted-by: Claude Code:claude-opus-4-8
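
The signature change described in the commit message above can be sketched as follows. What the real `sumErrors` iterates over is not visible on this page, so the `countsByStatus` map and the `StatusRange` name are assumptions; the point is only that one optional `{ min, max }` object removes the need for a non-null assertion.

```typescript
interface StatusRange {
  readonly min: number;
  readonly max: number;
}

/** Sums error counts, optionally restricted to an inclusive status range. */
function sumErrors(
  countsByStatus: ReadonlyMap<number, number>,
  range?: StatusRange,
): number {
  let total = 0;
  countsByStatus.forEach((count, status) => {
    // Both bounds arrive together, so neither needs a `!` assertion.
    if (range == null || (status >= range.min && status <= range.max)) {
      total += count;
    }
  });
  return total;
}
```

A caller can pass no range or a complete one, and the type checker rejects `{ min: 500 }` on its own.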
@dahlia
Member Author

dahlia commented Jun 5, 2026

@codex review

@dahlia
Member Author

dahlia commented Jun 5, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the fedify bench command—a comprehensive benchmarking tool for Fedify federation workloads. It acts as a synthetic remote actor to drive ActivityPub-specific load (such as signed inbox deliveries and WebFinger lookups) against a target running in benchmark mode. The implementation includes scenario suite loading and validation against a published JSON Schema, synthetic actor and key generation, open-loop and closed-loop load generation with coordinated-omission correction, a sparse log-linear histogram for latency metrics, safety gates to prevent targeting public production servers, and multiple report renderers (text, JSON, and Markdown). No review comments were provided, so there is no feedback to address.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 374b41ecc5


Comment on lines +117 to +118
if (mode != null && mode !== "shared" && mode !== "personal") {
return new URL(mode);


P2: Validate explicit inbox URLs before running

When an inbox scenario uses a value other than `shared` or `personal`, the schema still accepts any string and preflight does not validate this field. A typo like `inbox: shraed` makes `new URL(mode)` throw after discovery and outside `runBench`'s configuration-error handling, so the CLI crashes with an uncaught exception instead of exiting 2; non-HTTP absolute URLs can also get through to the send path as measured failures. Please reject malformed or non-http(s) explicit inbox URLs during validation/normalization before starting the run.
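
One way the requested check could be written is sketched below. `parseInboxMode` and the `InboxMode` type are hypothetical names, not code from the PR; the sketch only uses the standard `URL` constructor, which throws on input it cannot parse as an absolute URL.

```typescript
type InboxMode = "shared" | "personal" | URL;

/** Validates the `inbox` field up front so a typo fails as a config error. */
function parseInboxMode(value: string): InboxMode {
  if (value === "shared" || value === "personal") return value;
  let url: URL;
  try {
    url = new URL(value); // throws for non-URLs such as "shraed"
  } catch {
    throw new TypeError(
      `inbox must be "shared", "personal", or an absolute URL; got "${value}".`,
    );
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new TypeError(`inbox URL must use http or https; got "${url.protocol}".`);
  }
  return url;
}
```

Running a check like this during validation or normalization would let the CLI report the problem through its configuration-error path before any load is sent.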



Labels

component/cli CLI tools related

Development

Successfully merging this pull request may close these issues.

Benchmarking: fedify bench engine, scenario format, and JSON schema hosting

1 participant