FEAT add AgentThreatRulesScorer (ATR taxonomy scorer)#1893

Open
eeee2345 wants to merge 1 commit into microsoft:main from eeee2345:feat/atr-taxonomy-scorer

Conversation

Contributor

@eeee2345 eeee2345 commented Jun 2, 2026

Adds AgentThreatRulesScorer, the scorer half of #1702 (the dataset loader landed in #1715).

What it does

  • A deterministic TrueFalseScorer that evaluates text against the open Agent Threat Rules (ATR) ruleset via the pyatr engine.
  • Returns True when at least one rule at or above a configurable min_severity matches; attaches matched rule ids, ATR category, and max severity as score metadata.
  • Mirrors SubStringScorer's shape (true_false base, _score_piece_async, _build_identifier, min_severity validation).
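The threshold decision described above can be sketched without either PyRIT or pyatr. This is a minimal illustration only, not the code in this PR: `RuleMatch`, `score_matches`, the severity scale, and the rule ids in the usage note are all hypothetical stand-ins, and the real scorer presumably obtains its matches from the pyatr engine rather than from a hand-built list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed severity scale, lowest to highest. The actual ATR levels may differ.
SEVERITY_ORDER = ("info", "low", "medium", "high", "critical")


@dataclass(frozen=True)
class RuleMatch:
    """Hypothetical stand-in for one rule hit reported by a rule engine."""
    rule_id: str
    category: str
    severity: str


def _rank(severity: str) -> int:
    """Map a severity name to its position in SEVERITY_ORDER, validating it."""
    try:
        return SEVERITY_ORDER.index(severity.lower())
    except ValueError:
        raise ValueError(
            f"Unknown severity {severity!r}; expected one of {SEVERITY_ORDER}"
        ) from None


def score_matches(matches: Iterable[RuleMatch], min_severity: str = "low"):
    """Return (verdict, metadata) for a collection of rule matches.

    The verdict is True when at least one match is at or above min_severity.
    Metadata lists only the qualifying matches.
    """
    threshold = _rank(min_severity)  # also validates min_severity up front
    qualifying = [m for m in matches if _rank(m.severity) >= threshold]
    max_severity: Optional[str] = None
    if qualifying:
        max_severity = max(qualifying, key=lambda m: _rank(m.severity)).severity
    metadata = {
        "matched_rule_ids": [m.rule_id for m in qualifying],
        "categories": sorted({m.category for m in qualifying}),
        "max_severity": max_severity,
    }
    return bool(qualifying), metadata
```

For example, with two made-up hits, `RuleMatch("R-1", "prompt-injection", "medium")` and `RuleMatch("R-2", "tool-abuse", "high")`, a `min_severity` of `"high"` yields a True verdict that reports only `R-2`, while `"critical"` yields False. Validating `min_severity` before any matching means a misconfigured threshold fails immediately instead of silently never matching.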

Dependency

  • pyatr (>=0.2.6, which bundles the ATR ruleset) is an optional dependency, imported lazily with a clear ImportError. The unit test uses pytest.importorskip("pyatr"); if you'd like CI to exercise it, pyatr can be added to the test extras — happy to wire that into whichever group you prefer.
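The lazy-import pattern mentioned above might look roughly like the following. It is a generic sketch, not the code in this PR: `require_optional` and the wording of the error message are hypothetical, and only standard-library calls are used.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency on first use.

    Deferring the import keeps the package importable when the optional
    dependency is absent; the error surfaces only when the feature is used.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message, chaining the original error.
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f"{install_hint}"
        ) from exc
```

A scorer could call something like `require_optional("pyatr", "Install pyatr>=0.2.6 to use this scorer.")` inside its constructor rather than at module import time. On the test side, `pytest.importorskip("pyatr")` then skips the module cleanly when the dependency is missing, which matches the behaviour the description outlines.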

Pairs with the _AgentThreatRulesDataset loader: the dataset supplies ATR-derived adversarial prompts, and this scorer detects whether a response trips an ATR rule.

Add a deterministic TrueFalseScorer that evaluates text against the open
Agent Threat Rules (ATR) ruleset via the pyatr engine and returns True when
a rule at or above a configurable min_severity matches, attaching matched
rule ids / ATR category / max severity as score metadata. Mirrors
SubStringScorer; pyatr (>=0.2.6) is an optional dependency. Scorer half of

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@eeee2345 eeee2345 force-pushed the feat/atr-taxonomy-scorer branch from e55c2a6 to 4d55a7d Compare June 4, 2026 22:27