fedify-dev · dahlia · Jun 4, 2026
diff --git a/CHANGES.md b/CHANGES.md
@@ -277,6 +277,19 @@ To be released.
 [#782]: https://github.com/fedify-dev/fedify/issues/782
 [#787]: https://github.com/fedify-dev/fedify/pull/787
 
+### @fedify/cli
+
+ -  Added the `fedify bench` command for benchmarking Fedify federation
+    workloads.  It acts as a synthetic remote actor that drives
+    ActivityPub-specific load (signed inbox deliveries and WebFinger lookups)
+    against a cooperative `benchmarkMode` target and reports latency,
+    throughput, success rate, and errors, reading server-side metrics from the
+    target's stats endpoint.  Benchmarks are described by a YAML or JSON
+    scenario suite validated against a published JSON Schema, with an `expect`
+    block per scenario that gates a run for CI.  [[#744], [#783]]
+
+[#783]: https://github.com/fedify-dev/fedify/issues/783
+
 ### @fedify/fixture
 
  -  Added `createTestMeterProvider()` and `TestMetricRecorder` helpers for

diff --git a/deno.lock b/deno.lock
diff --git a/docs/manual/benchmarking.md b/docs/manual/benchmarking.md
@@ -1,7 +1,8 @@
 ---
 description: >-
-  Fedify can expose cooperative benchmark endpoints for measuring federation
-  workloads without requiring an external metrics backend.
+  Fedify can run as a cooperative benchmark target, and the fedify bench command
+  drives ActivityPub-specific load against it to measure federation workloads
+  without requiring an external metrics backend.
 ---
 
 Benchmarking
@@ -80,6 +81,190 @@ const federation = createFederation<void>({
 ~~~~
 
 
+The `fedify bench` command
+--------------------------
+
+*This command is available since Fedify 2.3.0.*
+
+Once a target runs in benchmark mode, the `fedify bench` command drives
+ActivityPub-specific load against it and reports latency, throughput, success
+rate, and errors.  It acts as a synthetic remote actor: it generates keys,
+serves its own actor and key documents over loopback, and signs every inbox
+delivery with the same `@fedify/fedify` signer a real peer uses, so the measured
+crypto cost is real.
+
+> [!NOTE]
+> This version runs the `inbox` and `webfinger` scenario types.  The scenario
+> format can express the others (`actor`, `object`, `fanout`, `collection`,
+> `failure`, and `mixed`), but they are not executed yet.  Within the runnable
+> types, a few options the format accepts are also not implemented yet and are
+> rejected up front with a clear message:
+>
+>  -  `runs` greater than `1` (repeated runs).
+>  -  An `inbox` `activity` that is not a `Create` carrying an embedded `Note`;
+>     that is, a non-`Create` `type`, a non-`Note` `object.type`, or
+>     `embedObject: false`.
+>  -  A `warmup` that is not shorter than the `duration` (which would leave no
+>     measured window).
+
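The warm-up rule in the note above can be sketched as a small validation step. This is a minimal sketch, not `fedify bench`'s parser. It assumes two things: that the warm-up is counted inside `duration`, as the note implies, and that durations use `ms`, `s`, `m`, or `h` suffixes. The names `parseDuration` and `checkWarmup` are hypothetical.

```typescript
// Hypothetical unit table; the real scenario grammar may accept other units.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
};

// Parse a duration such as "30s", "500ms", or "2m" into milliseconds.
function parseDuration(text: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(text.trim());
  if (match === null) throw new Error(`Invalid duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Reject a warm-up that is not shorter than the duration. Assuming warm-up
// is part of the duration, such a run would leave no measured window.
function checkWarmup(duration: string, warmup: string): void {
  if (parseDuration(warmup) >= parseDuration(duration)) {
    throw new Error(
      `warmup (${warmup}) must be shorter than duration (${duration}); ` +
        "otherwise there is no measured window.",
    );
  }
}

checkWarmup("30s", "5s"); // accepted: 25 s of measured window remain
```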
+### A scenario suite
+
+A benchmark is described by a *suite* file in YAML (JSON works too, since YAML
+is a superset).  The suite declares the `target`, shared `defaults`, the
+`actors` to sign as, and a list of `scenarios`, each with an optional `expect`
+block of pass/fail thresholds:
+
+~~~~ yaml
+# yaml-language-server: $schema=https://json-schema.fedify.dev/bench/scenario-v1.json
+version: 1
+target: http://localhost:3000
+defaults:
+  duration: 30s
+  warmup: 5s            # excluded from results; also warms the key cache
+  load:
+    rate: 200/s         # open-loop; or closed-loop with `concurrency: 50`
+actors:
+- count: 3
+  signatureStandards: [draft-cavage-http-signatures-12, ld-signatures]
+scenarios:
+- name: inbox-shared
+  type: inbox
+  recipient: "http://${{ target.host }}/users/alice"
+  inbox: shared
+  activity:
+    type: Create
+    object:
+      type: Note
+      content: { generate: lorem, size: 2KB }
+  expect:
+    successRate: ">= 99%"
+    latency.p95: "< 100ms"
+~~~~
+
+Run it against the target and read the terminal report:
+
+~~~~ sh
+fedify bench scenario.yaml
+~~~~
+
+The `# yaml-language-server:` line gives editors autocomplete and validation
+against the [published schema].
+Override the file's target with `--target`, choose the output with
+`--format`/`--output`, and inspect a run without sending anything with
+`--dry-run`.
+
+An `inbox` scenario's `recipient` may be a single value or a list.  With a
+list, deliveries are rotated across the recipients (and across the synthetic
+`actors` signing them), modeling a server that receives from many peers into
+many local inboxes.
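Rotating deliveries across recipients and signing actors can be sketched with plain round-robin. This sketch assumes that delivery *i* goes to `recipients[i % R]` and is signed by `actors[i % A]`. The actual rotation order `fedify bench` uses may differ, and `planDeliveries` is a hypothetical name.

```typescript
interface Delivery {
  recipient: string;
  actor: string;
}

// Hypothetical round-robin plan: recipients and actors rotate independently.
function planDeliveries(
  recipients: readonly string[],
  actors: readonly string[],
  count: number,
): Delivery[] {
  if (recipients.length === 0 || actors.length === 0) {
    throw new Error("Need at least one recipient and one actor.");
  }
  const plan: Delivery[] = [];
  for (let i = 0; i < count; i++) {
    plan.push({
      recipient: recipients[i % recipients.length],
      actor: actors[i % actors.length],
    });
  }
  return plan;
}

// Two recipients and three actors: six deliveries cover every pair once.
const plan = planDeliveries(
  ["http://localhost:3000/users/alice", "http://localhost:3000/users/bob"],
  ["actor-0", "actor-1", "actor-2"],
  6,
);
```

With independent rotation, every recipient/actor pair appears only when the two list lengths share no common factor. With two recipients and four actors, for example, each recipient would only ever see half of the actors.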
+
+[published schema]: https://json-schema.fedify.dev/bench/scenario-v1.json
+
+### Actors
+
+You pick signature *standards*, not key algorithms; the key set is derived
+from them, because a Fedify actor is inherently multi-key.  An actor uses
+exactly one HTTP request signature scheme, plus any number of document
+signature schemes:
+
+| Standard                          | Layer        | Algorithm                  |
+| --------------------------------- | ------------ | -------------------------- |
+| `draft-cavage-http-signatures-12` | HTTP request | RSA                        |
+| `rfc9421`                         | HTTP request | RSA                        |
+| `ld-signatures`                   | document     | RSA (`RsaSignature2017`)   |
+| `fep8b32`                         | document     | Ed25519 (`eddsa-jcs-2022`) |
+
+`draft-cavage-http-signatures-12` and `rfc9421` are mutually exclusive (one HTTP
+scheme per actor).  Several actor groups with different standard sets model a
+heterogeneous fleet, which is what a server actually receives.
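The one-HTTP-scheme rule and the standard-to-key mapping in the table can be expressed as a validation step. This is a hypothetical sketch built from the table alone, not the CLI's validator. `deriveKeys` and `KeyAlgorithm` are illustrative names.

```typescript
// Signature standards from the table, grouped by layer.
const HTTP_SCHEMES = new Set(["draft-cavage-http-signatures-12", "rfc9421"]);
const DOCUMENT_SCHEMES = new Set(["ld-signatures", "fep8b32"]);

type KeyAlgorithm = "RSA" | "Ed25519";

// Validate an actor group's standards and derive the key algorithms it needs.
function deriveKeys(standards: readonly string[]): KeyAlgorithm[] {
  for (const standard of standards) {
    if (!HTTP_SCHEMES.has(standard) && !DOCUMENT_SCHEMES.has(standard)) {
      throw new Error(`Unknown signature standard: ${standard}`);
    }
  }
  const http = standards.filter((s) => HTTP_SCHEMES.has(s));
  if (http.length !== 1) {
    throw new Error(
      `An actor needs exactly one HTTP request signature scheme; got ${http.length}.`,
    );
  }
  // Both HTTP schemes (and ld-signatures) use RSA; only fep8b32 adds Ed25519.
  const keys = new Set<KeyAlgorithm>(["RSA"]);
  if (standards.includes("fep8b32")) keys.add("Ed25519");
  return [...keys];
}
```

For the example suite's `[draft-cavage-http-signatures-12, ld-signatures]`, this yields a single RSA key. Passing `[draft-cavage-http-signatures-12, rfc9421]` throws, which matches the rejected `two-http-schemes.yaml` fixture.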
+
+### Templating
+
+::: v-pre
+
+Values support GitHub-Actions-style `${{ … }}` templating, kept logic-less
+(references and whitelisted helper calls only).  For example
+`${{ target.host }}` expands to the target's host.  Generated payloads use typed
+directives such as `content: { generate: lorem, size: 2KB }` rather than string
+templates.  The tool generates actor URLs and activity IDs itself, so each
+request automatically gets a unique activity ID (which Fedify's always-on
+inbox idempotency requires).
+
+:::
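The reference half of the logic-less templating can be sketched as a dotted-path lookup over a context object. In this hypothetical sketch, the whitelisted helper calls are not modeled, and anything that is not a plain dotted reference is left untouched rather than evaluated. The `expand` function and the shape of the context are assumptions, not the CLI's internals.

```typescript
// A nested context of string values, e.g. { target: { host: "..." } }.
type Context = { [key: string]: string | Context };

// Replace each reference such as ${{ target.host }} with its context value.
function expand(template: string, context: Context): string {
  return template.replace(
    /\$\{\{\s*([A-Za-z_][\w.]*)\s*\}\}/g,
    (_match: string, path: string): string => {
      let value: string | Context | undefined = context;
      for (const key of path.split(".")) {
        value = typeof value === "object" ? value[key] : undefined;
      }
      if (typeof value !== "string") {
        throw new Error(`Unresolved reference: ${path}`);
      }
      return value;
    },
  );
}

// URL.host includes the port, so a target of http://localhost:3000 yields
// "localhost:3000" here.
const recipient = expand("http://${{ target.host }}/users/alice", {
  target: { host: new URL("http://localhost:3000").host },
});
```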
+
+### Load generation and signing
+
+Open-loop (`rate`) is the default and the realistic model for incoming
+federation traffic: requests are launched on schedule regardless of when earlier
+responses return, and each request's latency is measured from its scheduled
+time (the coordinated-omission correction), so a stalled target shows up as
+latency instead of being hidden.  Closed-loop (`concurrency`) runs a fixed
+number of virtual users.  Arrival is `constant` (default) or `poisson`, and
+`maxInFlight` caps concurrent in-flight requests.
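The open-loop schedule and the coordinated-omission correction can be sketched in a few lines. This is an illustrative sketch, not the tool's scheduler. `scheduleOffsets` and `correctedLatency` are hypothetical names. Poisson arrivals are modeled the standard way, with exponentially distributed gaps that have the same mean as the constant schedule.

```typescript
// Compute each request's scheduled start (ms from window start). The
// schedule does not depend on when earlier responses come back.
function scheduleOffsets(
  ratePerSecond: number,
  count: number,
  arrival: "constant" | "poisson",
  random: () => number = Math.random,
): number[] {
  const meanGapMs = 1000 / ratePerSecond;
  const offsets: number[] = [];
  let t = 0;
  for (let i = 0; i < count; i++) {
    offsets.push(t);
    // Exponential gap via inverse transform; 1 - random() lies in (0, 1].
    t += arrival === "constant"
      ? meanGapMs
      : -Math.log(1 - random()) * meanGapMs;
  }
  return offsets;
}

// Coordinated-omission correction: measure from the scheduled start, not
// from when the client actually managed to send the request.
function correctedLatency(scheduledMs: number, completedMs: number): number {
  return completedMs - scheduledMs;
}

// At 200/s with constant arrival, requests are scheduled every 5 ms.
const offsets = scheduleOffsets(200, 4, "constant");
```

If a stalled target delays a request scheduled at 100 ms so that it is sent at 250 ms and completes at 260 ms, `correctedLatency(100, 260)` reports 160 ms rather than the misleading 10 ms.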
+
+By default, signing is kept off the send critical path.  Choose the strategy
+per scenario with `signing`:
+
+ -  `pipeline` (default): background signers keep a bounded buffer of signed
+    requests filled, and buffer starvation is surfaced as a client-side
+    bottleneck rather than being attributed to the target.
+ -  `jit`: sign in the send path, for targets that enforce a strict time
+    window on signatures.
+ -  `presign`: pre-sign the estimated number of requests for the run before
+    the timed window starts (open-loop only; with Poisson arrivals, a few
+    extra requests may still be signed during the run).
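The `pipeline` strategy's bounded buffer and its starvation signal can be sketched synchronously. In this hypothetical sketch, `refill` stands in for the background signers, and `take` stands in for the send path, which falls back to signing inline when the buffer is empty. `PresignedBuffer` is an illustrative name, not part of the CLI.

```typescript
class PresignedBuffer<T> {
  readonly #items: T[] = [];
  readonly #capacity: number;
  // Number of times the send path found the buffer empty.
  starvations = 0;

  constructor(capacity: number) {
    this.#capacity = capacity;
  }

  // Background signer: sign until the buffer is full again.
  refill(sign: () => T): void {
    while (this.#items.length < this.#capacity) this.#items.push(sign());
  }

  // Send path: use a pre-signed item if one is ready. Otherwise signing is
  // not keeping up, so record the starvation and sign inline.
  take(sign: () => T): T {
    const item = this.#items.shift();
    if (item !== undefined) return item;
    this.starvations++;
    return sign();
  }
}

let signed = 0;
const sign = () => ++signed;
const buffer = new PresignedBuffer<number>(3);
buffer.refill(sign);
const taken = [buffer.take(sign), buffer.take(sign), buffer.take(sign), buffer.take(sign)];
```

After three buffered items are used up, the fourth `take` has to sign inline and increments `starvations`. A non-zero count is the kind of signal that points at the client, not the target, as the bottleneck.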
+
+### Output
+
+Choose the format with `--format text` (default), `json`, or `markdown`.
+`--output` only chooses the destination (a file instead of standard output)
+and does not infer the format, so pass both (for example
+`--format json --output report.json`).
+
+JSON is the canonical machine-readable form: it validates against the
+[report schema] and carries its own `$schema`.  The text and Markdown
+renderers derive from the same model, and the reports keep client-measured
+and server-reported numbers distinct.
+
+Both sides are scoped to the measured window.  Client latency excludes
+warm-up samples.  The server-reported numbers are the difference between
+a `stats` snapshot taken when the measured window opens and one taken when it
+closes, so they exclude every earlier scenario in the suite as well as the
+scenario's own warm-up traffic.  The only exception is warm-up requests still
+in flight at the boundary; that residue is bounded by the number of requests
+in flight at that moment.
+
+In GitHub Actions, append the Markdown report to the job summary:
+
+~~~~ sh
+fedify bench scenario.yaml --format markdown >> "$GITHUB_STEP_SUMMARY"
+~~~~
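The snapshot-difference scoping of server-reported numbers can be sketched for cumulative counters. The snapshot shape and counter names below are hypothetical. The real `stats` endpoint format is described in its own section, and this sketch only handles monotonic counters, not percentiles.

```typescript
// Hypothetical snapshot: cumulative counters keyed by name.
type Snapshot = Record<string, number>;

// Server-side totals for the measured window: closing snapshot minus
// opening snapshot, counter by counter.
function windowDelta(opened: Snapshot, closed: Snapshot): Snapshot {
  const delta: Snapshot = {};
  for (const [name, value] of Object.entries(closed)) {
    delta[name] = value - (opened[name] ?? 0);
  }
  return delta;
}

// Counts from earlier scenarios and this scenario's warm-up are already in
// the opening snapshot, so they cancel out.
const measured = windowDelta(
  { inboxReceived: 1200, inboxFailed: 3 },
  { inboxReceived: 7200, inboxFailed: 5 },
);
```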
+
+An `expect` gate that fails exits the command non-zero, so a suite doubles as a
+CI check.  Keep CI gates on robust signals such as success rate, error counts,
+and gross throughput or latency floors; precise latency-percentile regression
+belongs in a controlled environment, not a shared CI runner.
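An `expect` gate can be sketched as parsing each threshold string and comparing it against a measured value. This is a hypothetical evaluator that only understands the forms in the example suite (`>= 99%`, `< 100ms`). It assumes that success rates are measured as fractions and latencies in milliseconds. `parseThreshold` and `checkExpect` are illustrative names.

```typescript
type Op = "<" | "<=" | ">" | ">=";

const COMPARE: Record<Op, (a: number, b: number) => boolean> = {
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  ">=": (a, b) => a >= b,
};

// Parse ">= 99%" or "< 100ms"; percentages become fractions and seconds
// become milliseconds.
function parseThreshold(expr: string): { op: Op; value: number } {
  const m = /^(<=|>=|<|>)\s*(\d+(?:\.\d+)?)\s*(%|ms|s)?$/.exec(expr.trim());
  if (m === null) throw new Error(`Invalid threshold: ${expr}`);
  const n = Number(m[2]);
  const value = m[3] === "%" ? n / 100 : m[3] === "s" ? n * 1000 : n;
  return { op: m[1] as Op, value };
}

// Return one message per failed gate; an empty list means the run passes.
function checkExpect(
  expect: Record<string, string>,
  measured: Record<string, number>,
): string[] {
  const failures: string[] = [];
  for (const [metric, expr] of Object.entries(expect)) {
    const { op, value } = parseThreshold(expr);
    const actual: number | undefined = measured[metric];
    if (actual === undefined || !COMPARE[op](actual, value)) {
      failures.push(`${metric} ${expr} (got ${actual})`);
    }
  }
  return failures;
}

const gate = { successRate: ">= 99%", "latency.p95": "< 100ms" };
const failures = checkExpect(gate, { successRate: 0.995, "latency.p95": 120 });
// A CI wrapper would exit non-zero when failures is non-empty.
const exitCode = failures.length > 0 ? 1 : 0;
```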
+
+[report schema]: https://json-schema.fedify.dev/bench/report-v1.json
+
+### Safety
+
+`fedify bench` runs without friction against a loopback or private target, or
+any target that advertises benchmark mode.  A public target that does not
+advertise benchmark mode is refused unless you pass `--allow-unsafe-target`,
+which is mandatory (never prompted) in CI and any non-interactive context.  Use
+`--dry-run` to print the plan without sending anything.
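The target policy above can be sketched as a small decision function. This is a hypothetical sketch, not the CLI's implementation. It classifies only `localhost`, `[::1]`, IPv4 loopback, and RFC 1918 literals, and it does not resolve hostnames or model the interactive prompt. `isLoopbackOrPrivate` and `mayRun` are illustrative names.

```typescript
// Classify a URL hostname as loopback or private. Only literals are
// checked; a name that resolves to a private address is not.
function isLoopbackOrPrivate(hostname: string): boolean {
  if (hostname === "localhost" || hostname === "[::1]") return true;
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(hostname);
  if (m === null) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return a === 127 || a === 10 || (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168);
}

// Run freely against loopback/private targets or targets that advertise
// benchmark mode; otherwise require an explicit override.
function mayRun(
  target: URL,
  advertisesBenchmarkMode: boolean,
  allowUnsafeTarget: boolean,
): boolean {
  return isLoopbackOrPrivate(target.hostname) || advertisesBenchmarkMode ||
    allowUnsafeTarget;
}
```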
+
+### Local targets over HTTP
+
+An `inbox` recipient given as an `acct:` handle is resolved through WebFinger,
+which goes over HTTPS, so against a plain-HTTP loopback target, give the
+`recipient` as the actor's URI (for example
+`http://localhost:3000/users/alice`) instead.  The `webfinger` scenario is
+unaffected: it requests `/.well-known/webfinger` on the target directly, so it
+can benchmark `acct:` lookups over plain HTTP.
+
+Signed scenarios such as `inbox` make the target dereference the benchmark's
+synthetic actor server while verifying signatures, so that server must be
+reachable from the target.  A loopback target reaches it automatically (both
+run on the same machine).  For a non-loopback target, pass `--advertise-host`
+with an address the target can reach (for example the client's LAN IP); the
+synthetic server then binds every interface and advertises that host in the
+actor and key URLs.  Without it, a non-loopback signed scenario is refused
+(use a read scenario such as `webfinger`, which needs no synthetic server).
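The rules for the synthetic actor server can be sketched as a planning step run before a signed scenario starts. This is a hypothetical sketch, and its details are assumptions: the port, the use of `0.0.0.0` (which covers IPv4 interfaces only), and giving `--advertise-host` precedence even for a loopback target. `planSyntheticServer` is an illustrative name.

```typescript
interface SyntheticServerPlan {
  listenHostname: string; // interface the synthetic actor server binds
  actorBaseUrl: string; // base of the actor and key URLs the target fetches
}

function planSyntheticServer(
  targetIsLoopback: boolean,
  port: number,
  advertiseHost?: string,
): SyntheticServerPlan {
  if (advertiseHost !== undefined) {
    // Bind every (IPv4) interface and advertise the reachable host.
    return {
      listenHostname: "0.0.0.0",
      actorBaseUrl: `http://${advertiseHost}:${port}`,
    };
  }
  if (targetIsLoopback) {
    // Same machine: loopback is reachable from the target.
    return {
      listenHostname: "127.0.0.1",
      actorBaseUrl: `http://127.0.0.1:${port}`,
    };
  }
  throw new Error(
    "A signed scenario against a non-loopback target needs --advertise-host.",
  );
}
```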
+
+
 Benchmark stats endpoint
 ------------------------
 

diff --git a/packages/cli/deno.json b/packages/cli/deno.json
@@ -4,6 +4,7 @@
   "license": "MIT",
   "exports": "./src/mod.ts",
   "imports": {
+    "@cfworker/json-schema": "npm:@cfworker/json-schema@^4.1.1",
     "@hongminhee/localtunnel": "jsr:@hongminhee/localtunnel@^0.3.0",
     "@inquirer/prompts": "npm:@inquirer/prompts@^7.8.4",
     "@jimp/core": "npm:@jimp/core@^1.6.1",
@@ -20,6 +21,7 @@
     "smol-toml": "npm:smol-toml@^1.6.1",
     "srvx": "npm:srvx@^0.8.7",
     "valibot": "jsr:@valibot/valibot@^1.4.0",
+    "yaml": "npm:yaml@^2.9.0",
     "#kv": "./src/kv.node.ts"
   },
   "exclude": [
@@ -56,6 +58,7 @@
         "codegen"
       ]
     },
+    "generate-bench-schema": "deno run -A scripts/generate-bench-schema.ts",
     "test": {
       "command": "deno test --allow-all",
       "dependencies": [

diff --git a/packages/cli/package.json b/packages/cli/package.json
@@ -18,7 +18,7 @@
     "test": "node --test --experimental-transform-types 'src/**/*.test.ts' '!src/init/test/**'",
     "test-init": "deno task test-init",
     "pretest:bun": "pnpm build",
-    "test:bun": "bun test",
+    "test:bun": "bun test --timeout 60000",
     "run": "pnpm build && node --disable-warning=ExperimentalWarning dist/mod.js",
     "runi": "tsdown && node --disable-warning=ExperimentalWarning dist/mod.js",
     "run:bun": "pnpm build && bun dist/mod.js",
@@ -72,6 +72,7 @@
     }
   },
   "dependencies": {
+    "@cfworker/json-schema": "^4.1.1",
     "@fedify/fedify": "workspace:*",
     "@fedify/init": "workspace:*",
     "@fedify/relay": "workspace:*",
@@ -109,7 +110,8 @@
     "shiki": "^1.6.4",
     "smol-toml": "^1.6.1",
     "srvx": "^0.8.7",
-    "valibot": "^1.4.0"
+    "valibot": "^1.4.0",
+    "yaml": "^2.9.0"
   },
   "devDependencies": {
     "@types/bun": "catalog:",

diff --git a/packages/cli/scripts/generate-bench-schema.ts b/packages/cli/scripts/generate-bench-schema.ts
@@ -0,0 +1,28 @@
+/**
+ * Regenerates the published benchmark JSON Schema files under the repository's
+ * *schema/bench/* directory from the embedded schema objects.
+ *
+ * The embedded objects (under *packages/cli/src/bench/.../schema.ts*) are the
+ * editing source; the published *.json* files are the hosted copies.  A drift
+ * guard keeps the two identical, so run this script after editing an embedded
+ * schema.
+ *
+ * Usage: `deno run -A scripts/generate-bench-schema.ts`
+ * @module
+ */
+
+import { mkdir, writeFile } from "node:fs/promises";
+import { join } from "node:path";
+import { PUBLISHED_SCHEMAS } from "../src/bench/schemas.ts";
+import { SCHEMA_DIR, serializeSchema } from "../src/bench/schema-paths.ts";
+
+async function main(): Promise<void> {
+  await mkdir(SCHEMA_DIR, { recursive: true });
+  for (const { fileName, schema } of PUBLISHED_SCHEMAS) {
+    const path = join(SCHEMA_DIR, fileName);
+    await writeFile(path, serializeSchema(schema), { encoding: "utf-8" });
+    console.error(`Wrote ${path}`);
+  }
+}
+
+await main();
diff --git a/packages/cli/src/bench/__fixtures__/invalid/bad-expect-metric.yaml b/packages/cli/src/bench/__fixtures__/invalid/bad-expect-metric.yaml
@@ -0,0 +1,9 @@
+# signatureVerification.* is not a valid expect metric for a webfinger scenario.
+version: 1
+target: http://localhost:3000
+scenarios:
+  - name: webfinger-lookup
+    type: webfinger
+    recipient: "acct:alice@example.com"
+    expect:
+      signatureVerification.p95: "< 10ms"
diff --git a/packages/cli/src/bench/__fixtures__/invalid/failure-missing-fault.yaml b/packages/cli/src/bench/__fixtures__/invalid/failure-missing-fault.yaml
@@ -0,0 +1,6 @@
+# A failure scenario must declare at least one fault.
+version: 1
+target: http://localhost:3000
+scenarios:
+  - name: broken
+    type: failure
diff --git a/packages/cli/src/bench/__fixtures__/invalid/missing-version.yaml b/packages/cli/src/bench/__fixtures__/invalid/missing-version.yaml
@@ -0,0 +1,6 @@
+# The top-level version field is required.
+target: http://localhost:3000
+scenarios:
+  - name: inbox-shared
+    type: inbox
+    recipient: "acct:alice@example.com"
diff --git a/packages/cli/src/bench/__fixtures__/invalid/mixed-bad-metric.yaml b/packages/cli/src/bench/__fixtures__/invalid/mixed-bad-metric.yaml
@@ -0,0 +1,11 @@
+# "bogus.metric" is not a recognized metric for a mixed scenario's expect block.
+version: 1
+target: http://localhost:3000
+scenarios:
+  - name: blend
+    type: mixed
+    mix:
+      - { scenario: inbox-shared, weight: 80 }
+      - { scenario: webfinger-lookup, weight: 20 }
+    expect:
+      bogus.metric: ">= 1"
diff --git a/packages/cli/src/bench/__fixtures__/invalid/rate-and-concurrency.yaml b/packages/cli/src/bench/__fixtures__/invalid/rate-and-concurrency.yaml
@@ -0,0 +1,11 @@
+# A load block must specify rate XOR concurrency, not both.
+version: 1
+target: http://localhost:3000
+defaults:
+  load:
+    rate: 100/s
+    concurrency: 50
+scenarios:
+  - name: inbox-shared
+    type: inbox
+    recipient: "acct:alice@example.com"
diff --git a/packages/cli/src/bench/__fixtures__/invalid/two-http-schemes.yaml b/packages/cli/src/bench/__fixtures__/invalid/two-http-schemes.yaml
@@ -0,0 +1,9 @@
+# An actor group must have exactly one HTTP request signature scheme.
+version: 1
+target: http://localhost:3000
+actors:
+  - signatureStandards: [draft-cavage-http-signatures-12, rfc9421]
+scenarios:
+  - name: inbox-shared
+    type: inbox
+    recipient: "acct:alice@example.com"