The fedify bench command with its scenario and report schemas#791
The fedify bench command with its scenario and report schemas#791dahlia wants to merge 33 commits into
fedify bench command with its scenario and report schemas#791Conversation
Add the skeleton for a new `fedify bench` subcommand in @fedify/cli that
will run ActivityPub-specific load benchmarks against a cooperative
Fedify target running in benchmark mode.
This first step wires the command into the CLI without the engine:
- Define the Optique `benchCommand` with the suite-file argument and the
--target, --format, --output, --dry-run, and --allow-unsafe-target
options, plus a stub `runBench` that is fleshed out in later steps.
- Register the command in the runner and dispatcher, and add a `bench`
section to the configuration schema.
- Add the `@cfworker/json-schema` (draft 2020-12 validator) and `yaml`
dependencies used by the scenario format, to both deno.json and
package.json.
- Cover argument parsing with tests.
fedify-dev#783
fedify-dev#744
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add a lightweight HdrHistogram-style log-linear histogram used by the benchmark engine to record latency samples and compute percentiles with bounded relative error. Values are bucketed by octave and split into linear sub-buckets, so the relative error stays roughly constant across the whole range. The structure is sparse, mergeable, and serializable, which lets percentiles from several runs be re-aggregated without coordinated-omission error and lets the report carry an optional serialized histogram. Sub-bucket indices are derived from the mantissa ratio to avoid denormal underflow, and non-positive samples (including -0) are normalized to the zero bucket. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
Add the small, pure building blocks the scenario format is built on:
- `asList()`: scalar-or-list coercion, so fields such as `recipient`,
`seed`, `collection`, and `type` can accept either a single value or a
list while the common single-value case stays terse.
- `parseSize()` / `resolveGenerate()`: typed payload-generation
directives (e.g. `content: { generate: lorem, size: 2KB }`) that
produce deterministic output of an exact byte size, with the size
parser bounded to the safe-integer range.
- A logic-less GitHub-Actions-style `${{ ... }}` template engine
(dotted-path resolution plus whitelisted helper calls). Lookups go
through own properties only, with a denylist for prototype members,
and unclosed delimiters, trailing text, and unbalanced quotes are
rejected rather than silently mishandled, so the format cannot turn
into a programming language.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Define the `fedify bench` scenario suite format and its published
JSON Schema (draft 2020-12). The format is a suite of `version`,
`target`, `defaults`, `actors`, and `scenarios`, with an `expect` block
per scenario, and it can express every scenario type discussed for the
tool (inbox, webfinger, actor, object, fanout, collection, failure,
mixed) even though only inbox and webfinger will have runners.
Rather than a schema-first single source, the published JSON Schema and
the TypeScript types are maintained as two artifacts kept identical by a
drift guard. Runtime validation uses `@cfworker/json-schema`, and a
validated value is narrowed with an `as unknown as` cast. Three
cross-field rules live in the schema where an editor can flag them:
- exactly one HTTP request signature scheme per actor group
(`contains` + `minContains`/`maxContains`);
- `rate` XOR `concurrency` in a load block (`oneOf`);
- the allowed `expect` metrics per scenario type (`if`/`then` +
`propertyNames`).
The embedded schema object is the editing source; *schema/bench/*
holds the hosted copy, regenerated by *scripts/generate-bench-schema.ts*.
Four guards run as tests: structural/meta validation, example-fixture
validation (valid and invalid fixtures covering every scenario type),
drift between the embedded object and the published file, and git-based
immutability of already-published version files.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add the normalization step that turns a schema-validated suite into the
resolved form the engine runs:
- `parseDuration()` and `parseRate()` parse the human-friendly duration
(`30s`) and rate (`200/s`) units into milliseconds and requests per
second, rejecting non-positive and overflowing magnitudes.
- `normalizeSuite()` applies suite defaults, coerces the top-level
scalar-or-list fields to arrays, resolves the target (with a
`--target` override), and determines the open- or closed-loop load
model, inheriting compatible fields such as `arrival` and
`maxInFlight` from the defaults while a scenario's `rate`/
`concurrency` selects the model.
It also enforces the one cross-field rule the JSON Schema cannot express:
the buffered signing modes (`pipeline`, `presign`) pre-sign requests, so
they require the target's signature time window to be off; a
time-windowed target must use `signing: jit`.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Define the canonical benchmark report: the single result model from which the terminal, JSON, and Markdown renderers all derive, so the outputs can never drift apart. JSON is the canonical machine form, pinned by a published draft-2020-12 schema (schema/bench/report-v1.json). The model splits `client` and `server` numbers by nesting so it is clear which the load generator measured and which came from the target's stats endpoint, bakes the unit into numeric keys (latencyMs, drainMs), turns each expect assertion into an evaluated record, and carries first-class environment/target/configHash reproducibility metadata plus an optional serialized histogram. The report schema is registered alongside the scenario schema, so the existing structural, fixture, drift, and immutability guards now cover it too; a valid report fixture is added. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
Turn each scenario's `expect` block into evaluated records that gate a
run. `parseAssertion()` parses a human assertion (">= 99%", "< 100ms",
"< 2s", ">= 500/s", "== 0") into an operator and a machine-clean
threshold: percentages become ratios, durations milliseconds, rates per
second. `evaluateExpect()` looks each metric up by name (successRate,
throughputPerSec, errors.4xx/5xx/total, latency.*, signatureVerification.*,
queueDrain.*), checks the assertion's unit is compatible with the
metric's natural unit, and compares. Equality is tolerant for float
metrics but exact for counts. A `fail`-severity assertion gates the
build while `warn` only annotates, and a missing or unmeasured metric
fails cleanly.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Assemble the canonical report from measured scenario data and render it
in three forms from that single model:
- `buildScenarioResult()`/`buildReport()` turn resolved scenarios and
their measurements into the report, evaluating each `expect` block,
summarizing the load model, and computing the overall gate.
- `detectEnvironment()` and `configHash()` capture the reproducibility
metadata (runtime, OS, CPU count, and a stable sha256 over the
canonicalized configuration, honoring `toJSON()` so URLs hash by
value).
- The JSON renderer is the canonical machine form (pinned by the
report schema); the terminal-text and Markdown renderers derive from
the same model. A shared metric-unit registry keeps the evaluator
and the renderers in agreement, so measured values display in the
metric's own unit while an explicit assertion unit stays visible.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Add the client-side safety guard and the discovery that finds where to
deliver:
- `classifyTarget()` sorts a target into loopback/private/public from
its host (IP-literal aware, IPv4-mapped IPv6 decoded), conservatively
treating anything it cannot confirm as public.
- `assertTargetAllowed()` lets loopback/private targets and any target
advertising benchmark mode run without friction, and refuses only a
public target that does not advertise benchmark mode unless
--allow-unsafe-target is given (mandatory, with no interactive
prompt); --dry-run bypasses the gate since it only inspects.
- `probeBenchmarkMode()` reads the cooperative `stats` endpoint to
detect benchmark mode and the target's Fedify version, never throwing.
- `discoverInbox()` resolves a handle or actor URI to its personal and
shared inbox the way a remote peer would, building
private-address-allowing loaders for loopback targets, and
`selectInbox()` picks the inbox for the scenario's mode.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Stand up the benchmark's own synthetic remote peer. An author picks signature standards and the key set is derived: HTTP request signatures and LD Signatures share one RSA pair, FEP-8b32 uses an Ed25519 pair. `buildFleet()` expands the actor groups into members with generated keys, and `spawnSyntheticServer()` serves each member as a normal ActivityPub actor document with an embedded `publicKey` and `assertionMethod` over plain loopback HTTP. The target dereferences a signature's keyId during verification, so serving exactly the document a real actor exposes lets verification resolve the key the same way; a fixed actor set keeps this on a cold path a warm-up window excludes. A test confirms the served document parses back into a verifiable actor whose keys resolve. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
Sign inbox deliveries reusing the @fedify/fedify signers so the client pays realistic crypto cost. `signInboxDelivery()` applies the FEP-8b32 object proof and the LD Signature to the document, then the HTTP request signature (cavage or rfc9421) to the final body. `createActivityIdMinter()` mints a unique activity id per request, satisfying Fedify's always-on inbox idempotency automatically. `createSigningPipeline()` keeps RSA signing off the send critical path with three lookahead modes: `jit`, `pipeline` (default; background signers keep a bounded buffer filled and buffer starvation surfaces the client as the bottleneck), and `presign`. The pipeline cannot hang on a stuck factory, drops transient sign failures, and fails fast on deterministic ones. Tests verify the produced cavage and rfc9421 requests pass Fedify's own verifyRequest against synthetic-server keys. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
Drive load and turn the raw samples into client-side metrics. `runLoad()` supports open-loop (a fixed arrival schedule, with latency measured from each request's scheduled time — the coordinated-omission correction — so a stalled target or maxInFlight backpressure shows up as latency rather than being omitted) and closed-loop (N virtual users). A fair slot-transferring semaphore enforces `maxInFlight` in both models and reports backpressure as the saturation signal; arrivals are a lazy generator (constant or seeded Poisson) and only in-flight dispatches are retained, so memory stays flat on long runs. `aggregateSamples()` excludes warm-up samples and produces request counts, success rate, throughput over the measured window, latency percentiles from the log-linear histogram, and errors grouped by kind, status, and reason. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
Wire the engine into runnable scenarios. The stats client reads the cooperative `stats` endpoint and projects the signature-verification histogram and queue depth into the report's server section, robust to malformed snapshots. The inbox runner discovers the recipient inbox, builds a signing factory over the synthetic fleet, drives the signing pipeline and load generator, aggregates the client metrics, and attaches the server metrics; the webfinger runner drives handle-resolution lookups. A registry dispatches by type and reports a clear error for the scenario types that the format expresses but this version does not run. `presign` signing now requires an open-loop load (a closed-loop run has no fixed request count to pre-sign). An end-to-end test stands up a real `benchmarkMode` Fedify federation and confirms signed inbox deliveries verify, the inbox listener runs, and server-side signature-verification metrics are read back. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
Implement `runBench`: load, validate, and normalize the suite (any configuration error logs a friendly message and exits 2), preflight the scenario runners so an unsupported type fails fast, classify and probe the target, and apply the safety gate. A `--dry-run` prints the plan and sends nothing. For a real run it builds the synthetic actor server once when a signed scenario needs it, runs each scenario, assembles the report, renders it to the chosen format (stdout or a file), and sets the exit code to 0 when the gate passes and 1 otherwise. The default exit sets `process.exitCode` so cleanup and output flushing finish first. Signed scenarios are refused against a public target, since the synthetic actor server is only reachable on the client's loopback. Dependencies are injectable, and tests cover the passing and failing gates, dry run, the unsafe-target and public-signed refusals, and an invalid suite. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
Wire the logic-less `${{ ... }}` template engine into the load pipeline:
`renderSuiteTemplates()` expands templates in a parsed suite with a
context exposing the target (host, hostname, port, origin, href,
protocol) plus the default helpers, and `runBench` runs it between
loading and validation. This is what makes `recipient:
"http://${{ target.host }}/users/alice"` resolve to a concrete URL.
The target comes from `--target` or the suite's own `target`, neither of
which is templated. Tests cover rendering and the end-to-end inbox run
now uses templating.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Extend the benchmarking manual with the client side: a getting-started
scenario suite, the actors and signature-standards model, `${{ }}`
templating, open- and closed-loop load with the signing modes, the
output formats and CI usage, the safety gate, and the http/loopback
caveats. Add the @fedify/cli changelog entry for the new command.
fedify-dev#783
fedify-dev#744
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
Address four behavioral gaps where the bench engine silently accepted
options it did not actually apply:
- Reject `runs` greater than 1 during normalization. Repeated runs are
not implemented yet, so accepting the field gave a single run while
implying several.
- Fail a scenario that measured zero requests instead of letting every
`expect` assertion pass vacuously, and reject a `warmup` that is not
shorter than the `duration` (which would leave no measured window).
- Reject inbox `activity` options the runner cannot honor. The runner
always delivers a `Create` carrying an embedded `Note`, so a
non-`Create` activity type, a non-`Note` `object.type`, or
`embedObject: false` is now refused up front through a new optional
`validate()` on the runner, called during preflight. Scalar-or-list
type fields are checked in full, not just their first element.
- Implement multi-recipient delivery in the inbox runner: every
recipient's inbox is discovered, and deliveries (with the synthetic
actors that sign them) are rotated across the recipients, modeling a
server receiving from many peers into many local inboxes.
The scenario format and JSON Schema still express these options; only the
inbox/webfinger runners constrain what they execute in this version.
fedify-dev#783
fedify-dev#744
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
A malformed `expect` assertion was only parsed while evaluating results, which happens after the entire benchmark load has been sent. Worse, the run loop has no catch around result building, so the resulting AssertionParseError escaped uncaught and crashed the command instead of failing as a configuration error. Add validateExpectBlock(), which parses every assertion in a scenario's `expect` block, and run it in the preflight step (alongside runner validation) before any probe or load. A typo in a CI gate now exits 2 without sending traffic, with a message naming the offending metric. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
The cooperative `stats` endpoint is cumulative and has no reset, but the
inbox and webfinger runners read it once at the end, so the reported
server numbers (and any signatureVerification.* expectations) folded in
warm-up traffic and every earlier scenario in the suite. Client samples
were already windowed; the server side was not, so the two disagreed.
Take a server snapshot at the measured-window boundary and diff it
against the end snapshot:
- stats-client.ts gains a raw `ServerSnapshot` (signature histogram and
queue-depth gauge), `parseServerSnapshot`, `diffSnapshots` (subtracts
bucket counts; the gauge is not cumulative, so the end value is kept),
and `snapshotToMetrics`. `fetchServerSnapshot` returns `null` only on
transport or parse failure; an available-but-empty snapshot is
non-null, so an unavailable baseline is never mistaken for an empty
one. Histogram subtraction requires identical bucket boundaries, and
refuses (yields no signature metric) otherwise.
- runner.ts gains `withMeasuredWindowStart`, which gates every measured
send on a one-shot boundary callback so the baseline is captured
before any measured request reaches the target.
- The inbox and webfinger runners snapshot the baseline at the boundary
and report server metrics only when both ends of the window were
captured, instead of falling back to the cumulative snapshot.
A few warm-up requests still in flight at the boundary may be attributed
to the window; a hard drain would distort the coordinated-omission client
latency, so that bounded residue is accepted and documented.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The scenario schema's `load` object required exactly one of `rate` or
`concurrency`, so a block that set only `arrival` or `maxInFlight` and
inherited its load model was rejected before normalization, even though
`resolveLoad()` already supports such partial overrides (inheriting the
model, or falling back to the default open-loop rate).
Relax the constraint to forbid only `rate` and `concurrency` together,
allowing either or neither. This lets a suite write, for example,
`defaults: { load: { maxInFlight: 100 } }` or override just `arrival` on
one scenario. The embedded schema literal and the published
schema/bench/scenario-v1.json are regenerated together (the v1 file is
new on this branch, so it is not yet immutable).
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The synthetic actor/key server bound loopback and advertised `127.0.0.1` actor and key IDs, which the target dereferences to verify HTTP signatures. A same-machine (loopback) target reaches it, but a non-loopback target dereferences its own `127.0.0.1`, fails key lookup, and rejects every signed delivery. The command nonetheless allowed signed scenarios against private targets, so they failed silently. Add a `--advertise-host` option. When set, the synthetic server binds every interface (`0.0.0.0`, or `::` for an IPv6 host) and advertises the given host in its actor, key, and base URLs, so a non-loopback target can dereference them. `resolveAdvertiseHost()` validates the value as a bare host name, IPv4 address, or IPv6 literal (bracketing IPv6 for the URL authority and binding the matching family), rejecting a scheme, port, path, or other URL syntax with a clear configuration error. Signed scenarios are now refused (exit 2) when the target is non-loopback and no `--advertise-host` is given, instead of running and failing on the target. The documentation is updated accordingly. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
The `--user-agent` value was passed only to the document loader, so the benchmark's main requests — the runners' inbox POSTs and WebFinger GETs, the benchmark-mode probe, and the server stats reads — went out with the runtime's default User-Agent. A target that inspects, logs, or rate-limits by User-Agent saw the wrong value, so the option was silently ineffective for the traffic that matters. Wrap the fetch implementation once with withUserAgent(), so every benchmark request carries the configured User-Agent. A prebuilt request (the signed inbox delivery, a WebFinger GET) has the header set in place rather than recloned, leaving the already-signed body and digest untouched; the User-Agent is not part of the signed header set, so this does not affect verification. A User-Agent the caller already set is left as-is. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
The text and Markdown renderers only surfaced server queue metrics when
a drain-latency histogram was present, with the depth shown merely as a
suffix to that line. The current stats reader supplies
`queue.depthMax` without `drainMs`, so queue depth never appeared in the
human-readable output even though it was in the JSON model; the Markdown
form rendered no queue metrics at all.
Render queue depth on its own:
- text: keep the combined drain line (now only when it has at least one
percentile), otherwise print a standalone `Server queue depth max`
line whenever a depth is reported.
- Markdown: add a queue drain p95 row when present and a queue depth max
row whenever a depth is reported.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
`new URL("localhost:3000")` parses as the `localhost:` scheme with an
empty host, a common typo for a missing `http://`. Normalization
accepted it, so `--dry-run` succeeded while a real run would misclassify
the target or build an unsupported fetch URL. Targets carrying
credentials (`http://user:pass@host`) were likewise accepted even though
`fetch` rejects them.
Reject, during normalization, any target whose protocol is not `http:`
or `https:`, whose host is empty, or that carries embedded credentials,
with a message pointing at the likely fix. The probe and runners only
make bare HTTP(S) requests, so these never produce a working run.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The safety gate classified only the suite `target`, but an `inbox` scenario's actual signed-load destination is the discovered inbox (or an explicit `inbox:` URL), which can differ from the target. A loopback `target` with a public `recipient`, or `inbox: https://prod.example/inbox`, would send benchmark POST load to a public inbox with no gate at all, bypassing the guard against accidentally benchmarking production. The synthetic-reachability rule was likewise only checked against the target tier, not the destination that actually verifies signatures. Gate each resolved inbox destination before any load reaches it: - assertInboxDestinationAllowed() refuses a public destination unless it shares the gated target's origin while the target advertises benchmark mode (inheriting its gate), or --allow-unsafe-target is given; and refuses a non-loopback destination unless a reachable synthetic host was advertised (--advertise-host). Origins are compared (scheme, host, effective port), so an http inbox does not inherit an https target. - The inbox runner calls an injected destination gate for each resolved inbox before sending; the orchestrator maps a refusal to exit 2. Discovery (a read) still runs, but no benchmark load is sent to an ungated destination. fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
The default fetch follows redirects, which let two safety checks be
bypassed. A public target whose `stats` endpoint redirected to a host
serving benchmark-mode JSON was marked as advertising benchmark mode, so
the gate allowed load against it. And a gated loopback, private, or
benchmark target that answered a WebFinger GET or a signed inbox POST
with a 307/308 could carry that load to an ungated public service,
slipping past the destination gate.
Make every benchmark request non-following:
- The benchmark-mode probe and the server stats read use
`redirect: "manual"`, so a redirect is treated as "not advertised"
and "unavailable" respectively rather than trusted.
- `sendRequest` re-wraps any non-manual request as `redirect: "manual"`
and records a redirect (opaque or 3xx) as a failed send, so no signed
load reaches the redirect target; the WebFinger and inbox requests are
built with `redirect: "manual"` so the common path needs no re-clone.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (13)
📝 WalkthroughWalkthroughAdds a new Changesfedify bench end-to-end
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as fedify bench
participant Target as Benchmark Target
participant Stats as Target Stats Endpoint
participant Synthetic as Synthetic Actor Server
User->>CLI: run `fedify bench suite.yaml`
CLI->>Stats: probe /.well-known/fedify/bench/stats (manual redirect)
Stats-->>CLI: benchmarkMode + version?
CLI->>Synthetic: spawn synthetic actor server (if signed scenarios)
CLI->>Target: execute scenario traffic (signed requests)
Target-->>CLI: responses (client latencies, statuses)
CLI->>Stats: fetch baseline and end snapshots
Stats-->>CLI: JSON snapshots
CLI->>User: render report (text/json/markdown) and exit with pass/fail code
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d980e2bf4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request introduces the 'fedify bench' command to '@fedify/cli' for benchmarking federation workloads, supporting signed inbox deliveries and WebFinger lookups. The implementation includes scenario suite validation against a published JSON Schema, a logic-less template engine, a synthetic actor/key server, an open/closed-loop load generator, background signing pipelines, and multiple report renderers. Feedback suggests implementing depth limits in recursive functions ('renderValue' and 'canonicalJson') to prevent stack overflows from deeply nested inputs, as well as using lazy cloning in 'renderValue' to minimize cloning costs.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/src/bench/actor/fleet.ts`:
- Around line 33-46: The function httpStandardOf currently uses
standards.find(...) so it silently accepts the first matching HTTP signature
when multiple are present; update httpStandardOf to collect all matches (e.g.,
standards.filter(...)), verify there is exactly one match, throw a TypeError if
count !== 1, and return that single match cast to HttpSignatureStandard; ensure
this behavior aligns with the FleetMember.httpStandard expectation and the
invalid fixture for two schemes.
In `@packages/cli/src/bench/result/expect/evaluate.ts`:
- Around line 197-209: The function sumErrors currently accepts min?: number,
max?: number but uses max! which is unsafe; change the signature for sumErrors
to require the min/max pair atomically (e.g., replace the two optional numeric
params with a single optional range object or tuple like range?: { min: number;
max: number } or range?: [number, number]) and update the implementation to read
range.min/range.max (or range[0]/range[1]) instead of min/max and remove
non-null assertions; then update all call sites that use sumErrors (calls that
currently pass both min and max should pass the new range value, and callers
that passed only one should be adjusted or left without the range) so the API
guarantees both bounds are provided together.
In `@packages/cli/src/bench/scenarios/inbox.ts`:
- Line 66: Remove the redundant validateActivity(scenario) invocation: the
existing validate() implementation already calls validateActivity(scenario) and
the scenario runner guarantees validate() is called before run(), so delete the
extra validateActivity(scenario) call (the one immediately after validate()) to
avoid duplicate validation and improve clarity; keep validate() and the run()
flow unchanged.
In `@packages/cli/src/bench/template/generate.ts`:
- Around line 111-116: Add an explicit upper bound for payload sizes (e.g.,
const MAX_PAYLOAD_BYTES = 100 * 1024 * 1024) and enforce it before attempting to
allocate strings: update parseSize() to validate that the parsed numeric size
does not exceed MAX_PAYLOAD_BYTES and throw a clear error (or clamp/return a
controlled failure) if it does; alternatively (or additionally) add the same
check at the start of generateLorem(size) to guard against huge repeats and
throw a RangeError with a descriptive message referencing the limit. Ensure
references to generateLorem() and parseSize() are updated to rely on this limit
so oversized inputs are rejected rather than causing memory exhaustion or
String.repeat RangeErrors.
In `@schema/README.md`:
- Line 9: In the README.md line containing the plain path
schema/bench/scenario-v1.json, wrap that file path in asterisks so it becomes
*schema/bench/scenario-v1.json* to comply with the markdown coding guideline for
**/*.md files; update the single occurrence of the string in the file
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f64291df-e7d7-4523-833d-dffc7534da37
⛔ Files ignored due to path filters (3)
deno.lockis excluded by!**/*.lockpnpm-lock.yamlis excluded by!**/pnpm-lock.yamlschema/logo.svgis excluded by!**/*.svg
📒 Files selected for processing (104)
CHANGES.mddocs/manual/benchmarking.mdpackages/cli/deno.jsonpackages/cli/package.jsonpackages/cli/scripts/generate-bench-schema.tspackages/cli/src/bench/__fixtures__/invalid/bad-expect-metric.yamlpackages/cli/src/bench/__fixtures__/invalid/failure-missing-fault.yamlpackages/cli/src/bench/__fixtures__/invalid/missing-version.yamlpackages/cli/src/bench/__fixtures__/invalid/mixed-bad-metric.yamlpackages/cli/src/bench/__fixtures__/invalid/rate-and-concurrency.yamlpackages/cli/src/bench/__fixtures__/invalid/two-http-schemes.yamlpackages/cli/src/bench/__fixtures__/invalid/unknown-field.yamlpackages/cli/src/bench/__fixtures__/reports/inbox-report.jsonpackages/cli/src/bench/__fixtures__/scenarios/all-types.yamlpackages/cli/src/bench/__fixtures__/scenarios/ci-gate.jsonpackages/cli/src/bench/__fixtures__/scenarios/getting-started.yamlpackages/cli/src/bench/action.test.tspackages/cli/src/bench/action.tspackages/cli/src/bench/actor/documents.tspackages/cli/src/bench/actor/fleet.tspackages/cli/src/bench/actor/keys.tspackages/cli/src/bench/command.test.tspackages/cli/src/bench/command.tspackages/cli/src/bench/discovery/discover.test.tspackages/cli/src/bench/discovery/discover.tspackages/cli/src/bench/discovery/probe.test.tspackages/cli/src/bench/discovery/probe.tspackages/cli/src/bench/load/arrival.test.tspackages/cli/src/bench/load/arrival.tspackages/cli/src/bench/load/clock.tspackages/cli/src/bench/load/generator.test.tspackages/cli/src/bench/load/generator.tspackages/cli/src/bench/metrics/aggregate.test.tspackages/cli/src/bench/metrics/aggregate.tspackages/cli/src/bench/metrics/histogram.test.tspackages/cli/src/bench/metrics/histogram.tspackages/cli/src/bench/metrics/stats-client.test.tspackages/cli/src/bench/metrics/stats-client.tspackages/cli/src/bench/mod.tspackages/cli/src/bench/render/format.tspackages/cli/src/bench/render/index.tspackages/cli/src/bench/render/json.tspackages/cli/src/bench/render/markdown.tspackages/cli/src/bench/render/render.test.tspackages/cli/src/bench/render/text.tspackages/cli/src/bench/result/build.test.tspackages/cli/src/bench/result/build.tspackages/cli/src/bench/result/expect/assert.test.tspackages/cli/src/bench/result/expect/assert.tspackages/cli/src/bench/result/expect/evaluate.test.tspackages/cli/src/bench/result/expect/evaluate.tspackages/cli/src/bench/result/expect/metrics.tspackages/cli/src/bench/result/model.tspackages/cli/src/bench/result/schema.tspackages/cli/src/bench/safety/gate.test.tspackages/cli/src/bench/safety/gate.tspackages/cli/src/bench/safety/tiers.test.tspackages/cli/src/bench/safety/tiers.tspackages/cli/src/bench/scenario/coerce.test.tspackages/cli/src/bench/scenario/coerce.tspackages/cli/src/bench/scenario/errors.tspackages/cli/src/bench/scenario/load.test.tspackages/cli/src/bench/scenario/load.tspackages/cli/src/bench/scenario/normalize.test.tspackages/cli/src/bench/scenario/normalize.tspackages/cli/src/bench/scenario/schema.tspackages/cli/src/bench/scenario/types.tspackages/cli/src/bench/scenario/units.test.tspackages/cli/src/bench/scenario/units.tspackages/cli/src/bench/scenario/validate.test.tspackages/cli/src/bench/scenario/validate.tspackages/cli/src/bench/scenarios/inbox.test.tspackages/cli/src/bench/scenarios/inbox.tspackages/cli/src/bench/scenarios/registry.test.tspackages/cli/src/bench/scenarios/registry.tspackages/cli/src/bench/scenarios/runner.test.tspackages/cli/src/bench/scenarios/runner.tspackages/cli/src/bench/scenarios/webfinger.test.tspackages/cli/src/bench/scenarios/webfinger.tspackages/cli/src/bench/schema-paths.tspackages/cli/src/bench/schema.test.tspackages/cli/src/bench/schemas.tspackages/cli/src/bench/server/synthetic.test.tspackages/cli/src/bench/server/synthetic.tspackages/cli/src/bench/signing/activity-id.test.tspackages/cli/src/bench/signing/activity-id.tspackages/cli/src/bench/signing/pipeline.test.tspackages/cli/src/bench/signing/pipeline.tspackages/cli/src/bench/signing/signer.test.tspackages/cli/src/bench/signing/signer.tspackages/cli/src/bench/template/generate.test.tspackages/cli/src/bench/template/generate.tspackages/cli/src/bench/template/helpers.tspackages/cli/src/bench/template/template.test.tspackages/cli/src/bench/template/template.tspackages/cli/src/config.tspackages/cli/src/mod.tspackages/cli/src/runner.tsschema/README.mdschema/_headersschema/bench/report-v1.jsonschema/bench/scenario-v1.jsonschema/index.htmlschema/netlify.toml
The benchmark end-to-end tests do real RSA key generation, signed inbox delivery, and server round-trips, which take a few seconds under CI CPU contention. Bun applies a default per-test timeout of 5000 ms (node:test and deno test have none), and the cli package's `test:bun` was the only one without a `--timeout` flag, so `runBench - passing gate exits 0…` and `runBench - failing gate exits 1` timed out on the Bun CI job while passing everywhere else. Run the cli Bun tests with `--timeout 60000`, matching the heaviest sibling packages (fedify, vocab, the database adapters). fedify-dev#783 Assisted-by: Claude Code:claude-opus-4-8 Assisted-by: Codex:gpt-5.5
The benchmarking manual's "Templating" section used inline code spans
containing `${{ … }}`. VitePress compiles each Markdown page as a Vue
component, and Vue interpreted the `{{ … }}` inside the inline `<code>`
as a mustache interpolation, producing invalid generated code
(`_ctx.…`) and failing the VitePress build with a Rollup parse error.
Fenced code blocks were unaffected because they render through the
syntax highlighter.
Rewrite the paragraph so it no longer puts double-brace delimiters in an
inline code span: it describes the templating in prose and points to the
`recipient` line in the example suite above, where the literal
`${{ target.host }}` already appears inside a fenced YAML block. A full
`vitepress build` now succeeds.
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
The previous fix avoided the VitePress build failure by rewording the
"Templating" section to drop the inline `${{ … }}` code, since Vue
compiled those braces inside inline code as a mustache interpolation.
Use VitePress's own escape instead: wrap the paragraph in a `::: v-pre`
container, where Vue leaves interpolation untouched, and restore the
explicit inline `${{ … }}` and `${{ target.host }}` so the templating
syntax is shown directly again. A full `vitepress build` succeeds and
the rendered page contains the literal braces; `hongdown --check` stays
happy with the container (unlike a raw inline `<code v-pre>`, which it
reflows and breaks).
fedify-dev#783
Assisted-by: Claude Code:claude-opus-4-8
Assisted-by: Codex:gpt-5.5
In `presign` mode the whole run is meant to be signed before the timed window opens. The buffered producer kept refilling as `next()` drained it, so background signers created replacement requests during the run, doing crypto on the client and skewing the very cost presign isolates. Cap background production at the pre-signed total so the signers stop once the run is signed; if an open-loop run overshoots its estimate (a few extra Poisson arrivals), those are signed on demand rather than triggering a continuous background refill. fedify-dev#791 (comment) Assisted-by: Claude Code:claude-opus-4-8
A scenario suite is parsed from user-supplied YAML or JSON, so its shape is untrusted enough that a pathologically deep tree or an enormous generated payload should fail with a clear error rather than overflow the stack or exhaust memory. Add a recursion-depth guard to the template renderer and to the config-hash walk, and (separately) cap a generated payload's size in `resolveGenerate`, where the string is actually allocated; `parseSize` stays a plain unit parser. The template renderer also keeps the original reference for subtrees it did not change, avoiding needless cloning. fedify-dev#791 (comment) fedify-dev#791 (comment) fedify-dev#791 (comment) Assisted-by: Claude Code:claude-opus-4-8
httpStandardOf claimed in its error message that a group must declare exactly one HTTP request signature standard, but it used find(), so a group listing both draft-cavage-http-signatures-12 and rfc9421 silently took the first instead of being rejected. The JSON Schema already forbids this, so this is defense in depth for the runtime path. Collect all HTTP standards and throw when there is not exactly one, so the function honors its own contract even on unvalidated input. fedify-dev#791 (comment) Assisted-by: Claude Code:claude-opus-4-8
sumErrors took optional min and max parameters but dereferenced max with
a non-null assertion, so a future caller passing only min would crash.
Replace the loose pair with one optional { min, max } range so the bounds
cannot be supplied half-way, and update the call sites.
fedify-dev#791 (comment)
Assisted-by: Claude Code:claude-opus-4-8
|
@codex review |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces the fedify bench command—a comprehensive benchmarking tool for Fedify federation workloads. It acts as a synthetic remote actor to drive ActivityPub-specific load (such as signed inbox deliveries and WebFinger lookups) against a target running in benchmark mode. The implementation includes scenario suite loading and validation against a published JSON Schema, synthetic actor and key generation, open-loop and closed-loop load generation with coordinated-omission correction, a sparse log-linear histogram for latency metrics, safety gates to prevent targeting public production servers, and multiple report renderers (text, JSON, and Markdown). No review comments were provided, so there is no feedback to address.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 374b41ecc5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (mode != null && mode !== "shared" && mode !== "personal") { | ||
| return new URL(mode); |
There was a problem hiding this comment.
Validate explicit inbox URLs before running
When an inbox scenario uses a value other than shared or personal, the schema still accepts any string and preflight does not validate this field. A typo like inbox: shraed makes new URL(mode) throw after discovery and outside runBench's configuration-error handling, so the CLI crashes with an uncaught exception instead of exiting 2; non-HTTP absolute URLs can also get through to the send path as measured failures. Please reject malformed or non-http(s) explicit inbox URLs during validation/normalization before starting the run.
Useful? React with 👍 / 👎.
Resolves #783, the second of the five benchmarking steps tracked in #744. It adds the client half of
fedify benchto @fedify/cli: a load generator that exercises a Fedify server the way the fediverse does.Generic HTTP load tools (autocannon, wrk, k6) cannot sign an inbox delivery, build a realistic ActivityStreams payload, or read a target's queue depth, so against a federation server they measure the wrong thing. The server half landed earlier in #782, which added
benchmarkModetoFederationOptionstogether with the cooperative/.well-known/fedify/bench/statsand…/triggerendpoints. This PR is what drives that target.The command acts as a synthetic remote actor. It generates keys and serves its own actor and key documents over loopback, then discovers the recipient's inbox the way a real peer would. Every delivery is signed with the same
@fedify/fedifysigner a real sender uses, so the crypto cost lands in the measured latency. It drives the load, reads the target's server-side metrics from the stats endpoint, and renders one report model as text, JSON, or Markdown.What it includes:
target, theactorsto sign as, shareddefaults, and a list of scenarios, each with anexpectblock of pass/fail thresholds that doubles as a CI gate.inbox(the signed end-to-end delivery benchmark) andwebfinger. The format and the schema can express the other types from Performance benchmarking tools for Fedify federation workloads #744 (actor,object,fanout,collection,failure,mixed), but a scenario whose type has no runner yet is rejected with a clear message rather than silently skipped.rate) and closed-loop (concurrency) load, with coordinated-omission correction so a stalled target shows up as latency instead of disappearing, plus constant or Poisson arrivals and an optionalmaxInFlightcap.pipeline(background signers fill a bounded buffer),jit, andpresign.Scenario format and JSON schema
The schema is dual-maintained. A frozen TypeScript literal embedded in the CLI is what the runtime validates against, using
@cfworker/json-schema(pure JavaScript, so it survivesdeno compile); the committed schema/bench/scenario-v1.json and schema/bench/report-v1.json are the published copies. A test guard keeps the embedded and published forms byte-identical and refuses any edit to an already-published version, so a-v1URL never changes meaning. The# yaml-language-server:line in a suite gives editors autocomplete and validation against the published URL.Safety
A run proceeds without friction against a loopback or private target, or any target that advertises benchmark mode. A public target that does not advertise it is refused unless you pass
--allow-unsafe-target, which is mandatory and never prompted in CI. The gate classifies the actual load destination, not only the declaredtarget, so a loopback target paired with a public recipient (or an explicit publicinbox:) cannot route load to production behind the gate's back. For the same reason, benchmark traffic does not follow redirects. Signed scenarios additionally need the synthetic actor server to be reachable from the target: a loopback target reaches it automatically, and a non-loopback target requires--advertise-host.Schema hosting
The schemas live at
https://json-schema.fedify.dev/. This PR adds the static assets under schema/: the two JSON files, an index.html landing page, a contributor README.md, and _headers with netlify.toml that set CORS, long-lived immutable caching, and theapplication/schema+jsoncontent type. The hosting itself is configured on Netlify out of band; CI does not upload anything.Testing and documentation
The benchmark test suite runs under both Node and Deno (about 240 tests), including an end-to-end inbox benchmark against a real
benchmarkModeserver that verifies the signatures, so the signed delivery path is run rather than mocked. docs/manual/benchmarking.md gains a client section covering the suite format, the actor and signing model, the output formats, and safety; CHANGES.md has an entry under version 2.3.0.