HelmsDeep

HelmsDeep (HTTP Endpoint Load Measurement System, Determining Each Endpoint's Performance) drives a stepped ramp of concurrent users against a single NCATS Translator component and reports the maximum sustainable concurrency (the "knee"): the highest load at which the service still meets its latency and error SLO.

The Translator stack cascades ARS → ARAs → KPs, so a run targets exactly one layer at a time (testing a higher layer already loads everything beneath it). All three workflows are wired up: Retriever (KP), Shepherd (ARA), and the asynchronous ARS.

Install

pip install -e .          # Python >= 3.12; installs locust

Run a workflow

# Retriever (KP) — sync lookup queries, scalar parameters.tier per query
helmsdeep --targets kps \
    --host https://your-retriever-service.example.org \
    --csv-prefix run1

# Shepherd (ARA) — sync creative-mode (inferred) queries, cache bypassed
helmsdeep --targets aras \
    --host https://your-ara-service.example.org \
    --csv-prefix run1

# ARS — async submit/poll/merge of inferred queries (host = the ARS API base)
helmsdeep --targets ars \
    --host https://ars.ci.transltr.io/ars/api \
    --csv-prefix run1

# Pathfinder — its own heavier run type (ARA/ARS only); pins two endpoints and
# asks for connecting paths. Sync via the ARA, async via the ARS.
helmsdeep --targets aras_pathfinder \
    --host https://your-ara-service.example.org \
    --csv-prefix pf1
helmsdeep --targets ars_pathfinder \
    --host https://ars.ci.transltr.io/ars/api \
    --csv-prefix pf1

--targets selects the layer: kps (Retriever), aras (Shepherd), ars, or the Pathfinder run types aras_pathfinder / ars_pathfinder (ARA/ARS only).
--host is required — the base URL of the target service. For kps/aras the /query path is appended; for ars the host is the API base and the tool uses /submit then /messages/{pk}.
--csv-prefix is optional; it falls back to the LOCUST_CSV_PREFIX env var, then to trapi_run.
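
The prefix fallback described above (CLI flag, then LOCUST_CSV_PREFIX, then trapi_run) could be sketched as follows. This is only an illustration: resolve_csv_prefix and DEFAULT_PREFIX are hypothetical names, not taken from the HelmsDeep source.

```python
import os

DEFAULT_PREFIX = "trapi_run"  # final fallback named in the text above


def resolve_csv_prefix(cli_value=None, env=None):
    """Return the CLI value if given, else LOCUST_CSV_PREFIX, else the default.

    Hypothetical helper: `env` is injectable so the lookup can be exercised
    without touching the real process environment.
    """
    env = os.environ if env is None else env
    return cli_value or env.get("LOCUST_CSV_PREFIX") or DEFAULT_PREFIX
```

An empty string from either source is treated as "not set" in this sketch, which may or may not match the real tool.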

The load profile (users, spawn rate, duration) is driven by the StepLoad shape, not by CLI flags, so there is intentionally no -u/-r/-t. The ramp and knee threshold are set per target in helmsdeep/config.py (stages and p99_slo_ms), because cost profiles differ widely by layer; edit them there.
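
To illustrate what a stepped ramp means here, the sketch below maps elapsed run time to a target user count. It assumes each stage is a (users, duration_s) pair, which may not match the actual layout of stages in config.py; users_at and EXAMPLE_STAGES are hypothetical names. A Locust load shape does something similar inside its tick() method, which returns a (user_count, spawn_rate) tuple or None to end the run.

```python
def users_at(elapsed_s, stages):
    """Return the target user count at elapsed_s, or None once the ramp is over.

    `stages` is a list of (users, duration_s) pairs run back to back
    (an assumed layout, for illustration only).
    """
    end = 0.0
    for users, duration_s in stages:
        end += duration_s
        if elapsed_s < end:
            return users
    return None  # past the last stage: the run should stop


# Hypothetical ramp: 5 users for 60 s, then 10, then 20.
EXAMPLE_STAGES = [(5, 60), (10, 60), (20, 60)]
```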

Outputs

Written to the working directory by the standalone/master node:

<prefix>_stages.csv — one row per stage (overall metrics)
<prefix>_by_qtype.csv — one row per (stage, query type)
<prefix>_summary.json — config (including which target was measured), all stages, and the chosen knee
<prefix>_ars_health.csv — ARS only: per-stage health signals (see below)
a printed summary table ending in the headline max sustainable concurrency
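
As a rough illustration of how a knee can be chosen from per-stage metrics, the sketch below returns the highest user count whose stage still meets both a p99 latency limit and an error-rate limit. The field names (users, p99_ms, error_rate), the max_error_rate parameter, and the choice to stop at the first failing stage are all assumptions for this example; the actual selection logic and CSV columns in HelmsDeep may differ.

```python
def pick_knee(stages, p99_slo_ms, max_error_rate):
    """Return the highest user count whose stage meets both limits, or None.

    Stages are scanned in order of increasing load and the scan stops at the
    first failing stage, so a later stage that happens to pass is ignored
    (an assumed policy, not confirmed by the source).
    """
    knee = None
    for stage in sorted(stages, key=lambda s: s["users"]):
        meets_slo = (
            stage["p99_ms"] <= p99_slo_ms
            and stage["error_rate"] <= max_error_rate
        )
        if not meets_slo:
            break
        knee = stage["users"]
    return knee
```

Returning None when even the first stage fails makes "no sustainable load found" distinguishable from a knee at the lowest stage.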

ARS async workflow and health signals

The ars target is asynchronous: each logical query is POST /submit → poll GET /messages/{pk}?trace=y (every poll_interval_s, capped at max_poll_s, default 15 min) until status is Done/Error → fetch GET /messages/{merged_pk} and count fields.data.message.results. Latency is the wall-clock submit→terminal time; one measurement is recorded per logical query (the submit/poll/merge calls appear separately in Locust's own table but aren't double-counted).
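
The submit/poll/merge flow above can be sketched as a single function. This is a simplified illustration, not the tool's implementation: post and get are caller-supplied callables standing in for an HTTP client, and the JSON keys pk, status, and merged_version, as well as the "Timeout" label, are assumptions about the response shape. Only the paths and fields.data.message.results come from the description above.

```python
import time

TERMINAL = {"Done", "Error"}


def run_ars_query(post, get, query, poll_interval_s=5.0, max_poll_s=900.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Run one logical ARS query; return (status, result_count, latency_s).

    post(path, body) and get(path) must return parsed JSON (hypothetical
    stand-ins for a real HTTP client).
    """
    start = clock()
    pk = post("/submit", query)["pk"]  # assumed key for the message id
    while True:
        trace = get(f"/messages/{pk}?trace=y")
        status = trace.get("status")  # assumed key
        if status in TERMINAL:
            break
        if clock() - start >= max_poll_s:
            return "Timeout", 0, clock() - start  # label is an assumption
        sleep(poll_interval_s)
    latency_s = clock() - start  # submit -> terminal, one sample per query
    if status != "Done":
        return status, 0, latency_s
    merged_pk = trace["merged_version"]  # assumed key for the merged message
    merged = get(f"/messages/{merged_pk}")
    results = merged["fields"]["data"]["message"].get("results") or []
    return status, len(results), latency_s
```

Injecting clock and sleep keeps the polling loop testable without real waits; the merge fetch happens after the latency is taken, matching the submit-to-terminal definition above.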

Because the ARS is what real users hit, the run also captures health signals to flag silent downstream breakage, written to <prefix>_ars_health.csv and <prefix>_summary.json (ars_health + a human-readable red_flags list):

result-count variation (min/mean/max + coefficient of variation) across identical queries;
zero-result Done count — a Done with 0 results is treated as a failure (counts against the error rate and the knee) and flagged;
response size (merged-message bytes, mean/max);
result drop under load — flagged when the mean result count falls sharply as concurrency rises across stages.
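
Two of these signals are simple enough to sketch directly. The example below computes the result-count summary (including the coefficient of variation, i.e. standard deviation divided by mean) and a basic result-drop check. The function names, the use of the population standard deviation, and the 50% drop threshold are assumptions for illustration; the source does not say how "falls sharply" is defined.

```python
from statistics import mean, pstdev


def result_count_stats(counts):
    """Summarize result counts for identical queries within one stage."""
    if not counts:
        return {"min": 0, "mean": 0.0, "max": 0, "cv": 0.0, "zero_results": 0}
    mu = mean(counts)
    cv = pstdev(counts) / mu if mu else 0.0  # coefficient of variation
    return {
        "min": min(counts),
        "mean": mu,
        "max": max(counts),
        "cv": cv,
        "zero_results": sum(1 for c in counts if c == 0),
    }


def result_drop_flagged(stage_means, drop_fraction=0.5):
    """Flag when the last stage's mean result count falls below
    drop_fraction of the first stage's mean (threshold is an assumption)."""
    if len(stage_means) < 2 or stage_means[0] <= 0:
        return False
    return stage_means[-1] < drop_fraction * stage_means[0]
```

A coefficient of variation near zero means identical queries return a stable number of results; a large value suggests that some downstream component is answering intermittently.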

Tuning notes

Swap the CURIEs. The corpus in helmsdeep/trapi_corpus.py uses a few real MONDO/CHEBI entities; replace them with entities your target service actually knows about, or queries return empty and won't reflect real cost.
Tier is per query (Retriever only). Retriever exposes parameters.tier (0 or 1) to pick its backend graph. RETRIEVER_CORPUS pairs multi-hop shapes with tier 0 and single-hop shapes with tier 1. ARA queries carry no tier.
Shepherd/ARS send inferred + bypass_cache, mixing MVP1 and MVP2. SHEPHERD_CORPUS (also used for ars) holds creative-mode queries (knowledge_type: "inferred", bypass_cache: true) split evenly between two Translator templates, with entities varied per request to spread load and avoid cache-warming. <prefix>_by_qtype.csv breaks out latency per template/tier.
- MVP1 — "what treats disease X?" (chemical -[treats]-> disease): the pinned disease is sampled from size-tiered pools (heavy/medium/light), so cost tracks answer-set size. Heavy is a curated list of common disease hubs; the long-tail pool is ~1000 real MONDO CURIEs in curie_list.json.
- MVP2 — chemical⇄gene "affects" (biolink:affects, inferred, with object_aspect/object_direction qualifiers): both edge directions (chemical→gene and gene→chemical), with the gene (curated GENES pool) and qualifier combo varied per request.
These are far heavier than KP lookups, so the aras target ships a gentler ramp and a looser p99_slo_ms (see config.py). Tune the per-template weights and entity pools (HEAVY_DISEASES, GENES, the tiers) to your real traffic.
ARS reuses the Shepherd corpus (ARS_CORPUS = SHEPHERD_CORPUS): the same inferred queries that the ARS fans out to its ARAs. Its poll cadence and per-query timeout (poll_interval_s, max_poll_s) are tunable in config.py.
Pathfinder is its own run type (aras_pathfinder / ars_pathfinder, ARA/ARS only). PATHFINDER_CORPUS sends a single drug↔disease shape that pins two endpoints and asks for connecting paths (a paths map in the query_graph, not edges); the (chemical, disease) pair varies per request from a curated CHEM_DISEASE_PAIRS list — swap these for pairs your service knows, and keep them plausibly connected so paths come back non-empty (on ARS a zero-result Done is a failure). It's the heaviest query class, so its targets ship the gentlest ramps and loosest SLOs (and, for ars_pathfinder, the longest max_poll_s); tune them in config.py.
Adjust corpus weights in RETRIEVER_CORPUS / SHEPHERD_CORPUS to match your traffic mix.
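
To make the weighting idea concrete, here is a minimal sketch of weighted template selection followed by size-tiered entity selection for MVP1. Every name, weight, and identifier below is a placeholder invented for the example; the real pools and weights live in helmsdeep/trapi_corpus.py and may be structured differently.

```python
import random

# Hypothetical weights and pools, for illustration only.
TEMPLATE_WEIGHTS = {"mvp1": 1, "mvp2": 1}           # even split
TIER_WEIGHTS = {"heavy": 1, "medium": 2, "light": 3}
DISEASE_TIERS = {
    "heavy": ["MONDO:placeholder-heavy-1", "MONDO:placeholder-heavy-2"],
    "medium": ["MONDO:placeholder-medium-1"],
    "light": ["MONDO:placeholder-light-1", "MONDO:placeholder-light-2"],
}


def pick_query(rng):
    """Pick a template by weight; for mvp1 also pick a tier and a disease.

    Returns (template, tier, curie); tier and curie are None for templates
    that do not use the disease pools in this sketch.
    """
    names = list(TEMPLATE_WEIGHTS)
    template = rng.choices(
        names, weights=[TEMPLATE_WEIGHTS[n] for n in names], k=1
    )[0]
    if template != "mvp1":
        return template, None, None
    tiers = list(TIER_WEIGHTS)
    tier = rng.choices(tiers, weights=[TIER_WEIGHTS[t] for t in tiers], k=1)[0]
    return template, tier, rng.choice(DISEASE_TIERS[tier])
```

Passing in a seeded random.Random(seed) as rng makes a run's query mix reproducible, which helps when comparing two runs against the same service.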

Running the engine directly

You can also invoke the locustfile without the CLI (defaults to the kps target):

LOCUST_CSV_PREFIX=run1 \
locust -f helmsdeep/trapi_loadtest.py --headless \
    --host https://your-retriever-service.example.org

The output-file prefix comes from the LOCUST_CSV_PREFIX env var (locust has no --csv-prefix flag). Set LOADTEST_TARGET to choose a different (implemented) layer.
