OpenAI: Topic relevance guardrail #126

Merged: AkhileshNegi merged 6 commits into main from feat/open-ai-topic-relevance on Jun 3, 2026

Conversation

@rkritika1508
Collaborator

@rkritika1508 rkritika1508 commented Jun 1, 2026

Summary

Target issue is #127

  • Adds TopicRelevanceOpenAI, a new topic relevance validator that calls litellm.completion() directly instead of routing through the Guardrails Hub LLMCritic wrapper. This gives tighter control over the prompt format, JSON parsing, and pass/fail threshold.
  • Builds the system prompt from the configured topic scope text and appends a JSON response instruction so the model returns {"scope_violation": <1|2|3>}.
  • Exposes a configurable threshold field (default 2): messages scoring ≥ threshold pass, so with the default only a score of 1 fails.
  • Wires the new validator into the full stack: discriminated union in guardrail_config.py, ValidatorType enum, validators.json, and _resolve_validator_configs in the guardrails route (DB-backed topic_relevance_config_id lookup works the same as for topic_relevance).
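
The prompt construction and threshold rule described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `JSON_INSTRUCTION`, `build_system_prompt`, and `passes` are hypothetical names, and the exact wording of the JSON instruction is an assumption.

```python
# Hypothetical wording; the PR description does not show the actual instruction text.
JSON_INSTRUCTION = 'Respond with JSON only, in the form {"scope_violation": <1|2|3>}.'


def build_system_prompt(scope_text: str) -> str:
    """Combine the configured topic scope with the JSON response instruction."""
    scope_text = scope_text.strip()
    if not scope_text:
        raise ValueError("topic scope text must not be empty")
    return f"{scope_text}\n\n{JSON_INSTRUCTION}"


def passes(score: int, threshold: int = 2) -> bool:
    """A message passes when its score is at or above the threshold."""
    return score >= threshold


prompt = build_system_prompt("Topics: maternal health, child nutrition.")
print(prompt)
print([passes(score) for score in (1, 2, 3)])
```

With the default threshold of 2, scores 2 and 3 pass and score 1 fails, which matches the behaviour stated in the summary.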

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, it is tested and has test cases.

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Walkthrough

This PR adds a new OpenAI-powered topic relevance validator alongside the existing non-OpenAI variant. The feature includes validator implementation with LLM-based scoring, configuration classes with API-key validation, integration into the guardrails API and schema with stored configuration loading, comprehensive test coverage, and a refactored multi-backend evaluation framework supporting both validators.

Changes

TopicRelevanceOpenAI Validator Feature

| Layer | File(s) | Summary |
|---|---|---|
| Settings, foundation types, and shared constants | backend/app/core/config.py, backend/app/core/enum.py, backend/app/core/constants.py, backend/app/core/validators/validators.json | New Settings fields DEFAULT_LLM_CALLABLE and TOPIC_RELEVANCE_OPENAI_THRESHOLD, new ValidatorType.TopicRelevanceOpenAI enum member, shared error constants EMPTY_MESSAGE_ERROR and TOPIC_OUT_OF_SCOPE_ERROR, and validator manifest entry for topic_relevance_openai. |
| LLM utility helpers | backend/app/core/validators/llm_utils.py | New supports_response_format(model) function to conditionally request OpenAI-style JSON responses based on litellm capability detection. |
| Refactor existing topic relevance validators | backend/app/core/validators/topic_relevance.py, backend/app/core/validators/config/topic_relevance_safety_validator_config.py | Both TopicRelevance and TopicRelevanceSafetyValidatorConfig now use centralized settings.DEFAULT_LLM_CALLABLE default; TopicRelevance switches response-format detection to supports_response_format() helper; _validate metadata parameter typed as Optional[dict]; empty-message and out-of-scope errors use shared constants. |
| TopicRelevanceOpenAI validator implementation | backend/app/core/validators/topic_relevance_openai.py | New TopicRelevanceOpenAI validator with system prompt validation, conditional JSON response format, litellm.completion calls, JSON parsing with Markdown robustness, scope_violation validation in {1,2,3}, threshold-based pass/fail, and comprehensive error handling for LLM failures and malformed responses. |
| OpenAI validator configuration class | backend/app/core/validators/config/topic_relevance_openai_safety_validator_config.py | New TopicRelevanceOpenAISafetyValidatorConfig with validator type, optional system prompt, LLM callable, threshold (constrained to ge=1/le=3), optional topic relevance config ID, and build() method validating OpenAI API key and instantiating TopicRelevanceOpenAI. |
| Schema and API route integration | backend/app/schemas/guardrail_config.py, backend/app/api/routes/guardrails.py | Register TopicRelevanceOpenAISafetyValidatorConfig in the guardrail schema's validator union; extend _resolve_validator_configs to fetch and populate stored topic relevance configuration for both validator types when topic_relevance_config_id is set, gating prompt_schema_version to non-OpenAI variant. |
| Comprehensive test coverage | backend/app/tests/test_llm_validators.py, backend/app/tests/test_validate_with_guard.py, backend/app/tests/validators/test_topic_relevance_openai.py | Config build tests (API-key validation, threshold forwarding), integration tests (CRUD config lookup with/without ID, inline configuration), validator tests (threshold behavior, input validation, LLM error handling, JSON parsing robustness, type validation, response format negotiation). |
| Multi-backend evaluation framework | backend/app/evaluation/topic_relevance/run.py | Refactor from single-config to DATASETS and BACKENDS registries; run_evaluation(dataset, backend) now instantiates backend-specific validator; metrics extraction includes scope_score from metadata; report generation uses backend-specific directories and guardrail labels with per-backend metadata; main() iterates all backend/dataset pairs. |
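
To illustrate the registry pattern in the evaluation framework row above, here is a small sketch of iterating over every backend/dataset pair. The names `DATASETS`, `BACKENDS`, `run_evaluation`, and `main` come from the summary; everything else (the stub validators, the dataset contents, the accuracy metric) is an assumption made so the example is self-contained, and the real run.py may be organised quite differently.

```python
from itertools import product
from typing import Callable


# Hypothetical stand-ins: each backend maps a name to a factory that returns
# a callable answering "is this message in scope?".
def make_keyword_validator() -> Callable[[str], bool]:
    return lambda text: "health" in text.lower()


def make_always_pass_validator() -> Callable[[str], bool]:
    return lambda text: True


BACKENDS = {
    "topic_relevance": make_keyword_validator,
    "topic_relevance_openai": make_always_pass_validator,
}

# Each dataset is a list of (message, expected_in_scope) pairs (invented data).
DATASETS = {
    "sample": [("Tips for child health", True), ("Best pizza in town", False)],
}


def run_evaluation(dataset: str, backend: str) -> dict:
    """Instantiate the backend-specific validator and score it on one dataset."""
    validator = BACKENDS[backend]()
    rows = DATASETS[dataset]
    correct = sum(validator(text) == expected for text, expected in rows)
    return {"dataset": dataset, "backend": backend, "accuracy": correct / len(rows)}


def main() -> list:
    """Evaluate every backend/dataset pair."""
    return [run_evaluation(d, b) for b, d in product(BACKENDS, DATASETS)]


reports = main()
for report in reports:
    print(report)
```

Keeping backends and datasets in separate registries means a new validator only needs one new entry in `BACKENDS` to be evaluated against every existing dataset.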

Sequence Diagram(s)

sequenceDiagram
  participant Client as Guard/Route
  participant Validator as TopicRelevanceOpenAI
  participant LLM as litellm.completion
  Validator->>Validator: Validate system_prompt and user input
  alt Invalid config or empty input
    Validator-->>Client: FailResult
  else Valid input
    Validator->>LLM: POST system_prompt + user_value
    LLM-->>Validator: message.content (JSON)
    Validator->>Validator: Parse JSON, extract scope_violation
    alt Parse success and scope_violation in {1,2,3}
      Validator->>Validator: Compare scope_violation >= threshold
      alt Passes threshold
        Validator-->>Client: PassResult with scope_score metadata
      else Below threshold
        Validator-->>Client: FailResult with scope_score metadata
      end
    else Parse fails or invalid type/range
      Validator-->>Client: FailResult with error message
    end
  end
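
The flow in the diagram can be approximated in plain Python as below. This is a sketch under stated assumptions, not the validator from the PR: `Result` stands in for the PassResult/FailResult pair, `complete` is a hypothetical injected callable that replaces the real `litellm.completion` call, and the error strings are invented. The regex assumes the expected flat JSON object and would not handle nested objects.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Result:
    """Stand-in for the PassResult / FailResult pair named in the diagram."""
    passed: bool
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def parse_scope_violation(content: str) -> int:
    """Extract scope_violation from a reply that may be wrapped in Markdown or prose."""
    match = re.search(r"\{.*?\}", content, re.DOTALL)  # first flat {...} block
    if match is None:
        raise ValueError("no JSON object found in LLM reply")
    score = json.loads(match.group(0)).get("scope_violation")
    # type(...) is int rejects booleans, which are a subclass of int.
    if type(score) is not int or score not in (1, 2, 3):
        raise ValueError(f"invalid scope_violation: {score!r}")
    return score


def validate(value: str, system_prompt: str,
             complete: Callable[[str, str], str], threshold: int = 2) -> Result:
    """Validate input, call the LLM, parse the score, and apply the threshold."""
    if not system_prompt.strip() or not value.strip():
        return Result(False, error="empty system prompt or message")
    try:
        score = parse_scope_violation(complete(system_prompt, value))
    except Exception as exc:  # LLM failure or malformed response
        return Result(False, error=f"topic relevance check failed: {exc}")
    passed = score >= threshold
    error = None if passed else "message is outside the configured scope"
    return Result(passed, error, {"scope_score": score})


fence = "`" * 3  # build a Markdown-fenced reply without literal fences in this example
reply = f'{fence}json\n{{"scope_violation": 3}}\n{fence}'
print(validate("Is rice good for toddlers?", "Scope: child nutrition.", lambda s, v: reply))
```

Injecting `complete` keeps the parsing and threshold logic testable without a network call or API key.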

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • ProjectTech4DevAI/kaapi-guardrails#74: Both PRs modify backend/app/evaluation/topic_relevance/run.py, refactoring/expanding run_evaluation and the topic-relevance evaluation runner logic to support topic-relevance validation across backends.
  • ProjectTech4DevAI/kaapi-guardrails#71: The PR extends the existing run_guardrails/_resolve_topic_relevance_scope flow by updating backend/app/api/routes/guardrails.py to also load and attach configuration for the new TopicRelevanceOpenAISafetyValidatorConfig when topic_relevance_config_id is set.

Suggested reviewers

  • dennyabrain
  • nishika26

Poem

🐰 A rabbit's ode to the OpenAI validator

A validator so fine, with scores one through three,
That calls to the LLM for what topics should be,
With thresholds and JSON and errors contained,
The scope is now guarded, the guardrails maintained! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.87% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding an OpenAI variant of the topic relevance guardrail validator. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Comment thread backend/app/core/validators/topic_relevance_openai.py Outdated
Comment thread backend/app/core/validators/topic_relevance_openai.py Outdated
Comment thread backend/app/evaluation/topic_relevance_openai/run.py Outdated
Comment thread backend/app/core/validators/topic_relevance_openai.py Outdated
@pritam-T4D

pritam-T4D commented Jun 1, 2026

@dennyabrain @rkritika1508 As per our discussion, I was thinking we should build this as a more general-purpose custom validator that is reusable across use cases (topic relevance, by inclusion or exclusion, being one of them). That way the user decides the prompt (the logic used to score) and the score criteria in one place, with full visibility. We have to build a custom validator anyway. Such a validator could also be used as an output guardrail and would allow almost full customization (the score, plus a fix value generated by the same LLM). So I would ask you to rethink this in the direction of a more general-purpose custom validator. I can see it being needed for one of TAP's output guardrail use cases.

@dennyabrain
Collaborator

@pritam-T4D I see the point of allowing the user to specify the entire prompt, but I am not sure we should allow the user to specify the scoring logic as well. The response that comes from the LLM needs to be understood by the validator, which must then return a result consistent with how validators work in kaapi-guardrails.

For instance, we might have instructions like this in the prompt we send to the LLM:

Score using:
3 = clearly within scope (directly matches a topic description) 
2 = partially related (tangentially related or implicitly within scope) 
1 = clearly outside scope (no relation to any listed topic)

We are then able to compare the result that comes from the LLM and take actions like 'for responses 1 and 2, respond with a fail' and 'for response 3, respond with a pass'.

But if a user overrides this instruction with something like

Score using:
ok = clearly within scope  
not_ok = clearly outside scope (no relation to any listed topic)

then the response from LLM will be ok or not_ok and the validator code will act unpredictably.
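
One way this failure could surface, sketched in Python: a validator that compares the LLM's score against a numeric threshold works for 1, 2, and 3 but breaks on a categorical label. `naive_check` is a hypothetical helper written for this illustration and is not taken from the PR.

```python
def naive_check(raw_score, threshold: int = 2) -> bool:
    """Compare whatever the LLM returned against a numeric threshold."""
    return raw_score >= threshold


print(naive_check(3))  # numeric scores compare as intended
print(naive_check(1))

try:
    naive_check("ok")  # a categorical label cannot be compared with an int
except TypeError as exc:
    print(f"categorical label breaks the comparison: {exc}")
```

In Python 3 the string-to-int comparison raises a TypeError, so the validator needs either a fixed numeric contract or explicit handling for each label the prompt may produce.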

If there is a need for such a versatile validator where user gets complete control over system prompt and also the scoring metric, I suggest we move this to a different validator. Something named more appropriately than TopicRelevance to avoid confusion.

@pritam-T4D

@dennyabrain I agree with your thought process as well. I was thinking we could build only the custom validator now, to avoid extra engineering effort, and use the same one for the current gender detection use case.
Also, as per the TAP email, their output validator requirements 4 (sensitive content) and 5 (toxicity bias) will need a custom validator.

Here is how I am thinking of the custom validator:

  • The user defines the prompt and the scoring logic (we can ask for numeric scores rather than categorical ones like yes/no or ok/not ok).
  • The user can also define the threshold (as a separate input, not in the LLM prompt).
  • If the user wants the LLM output fixed (on-fail action set to fix), they describe how to fix it in the prompt; for now the same LLM fixes the output using its reason for the score plus the user prompt.
  • The LLM then outputs the score, the reason, and the fix value as strict JSON.
  • We then apply: if score < threshold, fail; otherwise pass.
  • On fail, the fix value is returned; otherwise the original LLM output is returned.
  • If it is an input custom validator with on-fail = rephrase, only the original content is passed on, as happens now.
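
The proposal in the list above could look roughly like this. It is only a sketch of the idea being discussed, not an existing validator: `CustomVerdict`, `apply_custom_validator`, the JSON keys `score`, `reason`, and `fix`, and the `on_fail` values are all assumptions chosen to mirror the bullets.

```python
import json
from dataclasses import dataclass


@dataclass
class CustomVerdict:
    passed: bool
    output: str  # text handed on to the caller
    score: int
    reason: str


def apply_custom_validator(original: str, llm_reply: str, threshold: int,
                           on_fail: str = "fix") -> CustomVerdict:
    """Apply the user-defined threshold to a strict-JSON LLM verdict."""
    data = json.loads(llm_reply)
    score, reason = data["score"], data["reason"]
    if type(score) is not int:
        raise ValueError("score must be numeric, not categorical")
    if score >= threshold:
        return CustomVerdict(True, original, score, reason)
    # On failure, return the LLM's fix value only when on_fail is "fix";
    # for other actions (e.g. "rephrase") the original content is kept.
    fixed = data.get("fix") if on_fail == "fix" else None
    return CustomVerdict(False, fixed or original, score, reason)


reply = json.dumps({"score": 1, "reason": "contains an insult", "fix": "Please rephrase politely."})
print(apply_custom_validator("You are useless.", reply, threshold=2))
print(apply_custom_validator("You are useless.", reply, threshold=2, on_fail="rephrase"))
```

Keeping the threshold outside the prompt, as the second bullet suggests, lets the pass/fail rule change without re-prompting the model.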

Now this can take care of any customization for both input and output guardrail.

So I am OK if you want to make the above changes in the topic relevance validator, or build the custom validator as described above. Great if you do both. Please share suggestions on how the custom validator can be made more versatile, and tell me if I am missing anything.

Lmk.

@rkritika1508 rkritika1508 self-assigned this Jun 2, 2026
@rkritika1508 rkritika1508 added enhancement New feature or request ready-for-review labels Jun 2, 2026
@rkritika1508 rkritika1508 linked an issue Jun 2, 2026 that may be closed by this pull request

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
backend/app/core/validators/topic_relevance_openai.py (1)

78-85: 💤 Low value

Nit: build kwargs as a dict literal.

Ruff (C408) flags the dict(...) call; a literal avoids the function-call overhead and is more idiomatic.

♻️ Proposed change
-            kwargs = dict(
-                model=self.llm_callable,
-                messages=[
-                    {"role": "system", "content": self._system_prompt},
-                    {"role": "user", "content": value},
-                ],
-                max_tokens=50,
-            )
+            kwargs = {
+                "model": self.llm_callable,
+                "messages": [
+                    {"role": "system", "content": self._system_prompt},
+                    {"role": "user", "content": value},
+                ],
+                "max_tokens": 50,
+            }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/core/validators/topic_relevance_openai.py` around lines 78 - 85,
Replace the dict(...) call that builds kwargs with a dict literal to satisfy
Ruff C408 and avoid function-call overhead; locate the kwargs assignment in the
method where kwargs = dict(model=self.llm_callable, messages=[{"role": "system",
"content": self._system_prompt}, {"role": "user", "content": value}],
max_tokens=50) and rewrite it as a dictionary literal using the same keys
(model, messages, max_tokens) and values (self.llm_callable, the messages list
referencing self._system_prompt and value, and 50) so behavior remains
identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/api/routes/guardrails.py`:
- Around line 133-141: The TopicRelevanceOpenAISafetyValidatorConfig branch only
copies configuration and omits the stored prompt_schema_version, causing
DB-backed presets to lose their prompt template selection; update the code so
topic_relevance_crud.get(...) returns the prompt_schema_version (if present) and
assign it to validator.prompt_schema_version in the
TopicRelevanceOpenAISafetyValidatorConfig branch, and also ensure the OpenAI
config model/build path (where TopicRelevanceOpenAISafetyValidatorConfig is
constructed) accepts and propagates prompt_schema_version the same way as the
existing topic_relevance -> TopicRelevanceSafetyValidatorConfig flow does.

In `@backend/app/core/validators/topic_relevance_openai.py`:
- Around line 94-102: The JSON parsing must be hardened: before calling
json.loads on content (the variable used in the try block that populates data
and score and returns FailResult on error), extract/isolate the first JSON
object by stripping Markdown fences or surrounding prose (e.g., remove ```json
... ``` and/or use a regex to find the first {...} block), then call json.loads
on that extracted substring; also change the type check to require an exact int
(use type(score) is int) and then validate score in (1,2,3) to reject booleans.
Ensure any extraction/parsing errors continue to produce the FailResult with the
existing error_message pattern.

In `@backend/app/tests/validators/test_topic_relevance_openai.py`:
- Around line 86-98: The test and implementation allow threshold=1, which
effectively disables the guardrail because every valid score (1, 2, or 3) then
passes; change TopicRelevanceOpenAI (its __init__) to validate threshold and
reject values <= 1 (raise a ValueError or similar) so the passing threshold must
be strictly greater than 1, and update the test
test_custom_threshold_of_1_passes_on_score_1 to assert that constructing
TopicRelevanceOpenAI with threshold=1 fails (or that the constructor raises)
instead of expecting a PassResult; references: TopicRelevanceOpenAI, its
__init__, the _validate method, and the test function
test_custom_threshold_of_1_passes_on_score_1.

In `@backend/scripts/run_all_evaluations.sh`:
- Line 14: The runner list in run_all_evaluations.sh includes an entry for
"$EVAL_DIR/topic_relevance_openai/run.py" which doesn't exist and will abort the
script under set -euo pipefail; remove that runner line from
run_all_evaluations.sh (or add the missing runner file if you actually intend a
separate runner). Note that topic_relevance/run.py already implements BACKENDS
with name: "topic_relevance_openai" and its main() iterates over both backends
and writes outputs to outputs/topic_relevance_openai/*, so prefer removing the
bogus topic_relevance_openai runner entry to fix the orchestration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b0e4f79e-71f5-4146-8e85-6d66e5a84ba3

📥 Commits

Reviewing files that changed from the base of the PR and between b81dd64 and 19c1ca8.

📒 Files selected for processing (14)
  • backend/app/api/routes/guardrails.py
  • backend/app/core/config.py
  • backend/app/core/enum.py
  • backend/app/core/validators/config/topic_relevance_openai_safety_validator_config.py
  • backend/app/core/validators/config/topic_relevance_safety_validator_config.py
  • backend/app/core/validators/topic_relevance.py
  • backend/app/core/validators/topic_relevance_openai.py
  • backend/app/core/validators/validators.json
  • backend/app/evaluation/topic_relevance/run.py
  • backend/app/schemas/guardrail_config.py
  • backend/app/tests/test_llm_validators.py
  • backend/app/tests/test_validate_with_guard.py
  • backend/app/tests/validators/test_topic_relevance_openai.py
  • backend/scripts/run_all_evaluations.sh

Comment thread backend/app/api/routes/guardrails.py Outdated
Comment thread backend/app/core/validators/topic_relevance_openai.py
Comment thread backend/app/tests/validators/test_topic_relevance_openai.py
Comment thread backend/scripts/run_all_evaluations.sh Outdated
@AkhileshNegi AkhileshNegi changed the title added open ai topic relevance guardrail OpenAI: Topic relevance guardrail Jun 3, 2026
@AkhileshNegi AkhileshNegi self-requested a review June 3, 2026 03:40
@AkhileshNegi AkhileshNegi merged commit 0b87a30 into main Jun 3, 2026
1 of 2 checks passed
@AkhileshNegi AkhileshNegi deleted the feat/open-ai-topic-relevance branch June 3, 2026 03:41
@github-project-automation github-project-automation Bot moved this to Closed in Kaapi-dev Jun 3, 2026
Labels

enhancement New feature or request ready-for-review

Projects

Status: Closed

Development

Successfully merging this pull request may close these issues.

Validation: OpenAI query relevance checker

5 participants