OpenAI: Topic relevance guardrail by rkritika1508 · Pull Request #126 · ProjectTech4DevAI/kaapi-guardrails

rkritika1508 · 2026-06-01T07:23:59Z

Summary

Target issue is #127

Adds TopicRelevanceOpenAI, a new topic relevance validator that calls litellm.completion() directly instead of routing through the Guardrails Hub LLMCritic wrapper. This gives tighter control over the prompt format, JSON parsing, and pass/fail threshold.
Builds the system prompt from the configured topic scope text and appends a JSON response instruction so the model returns {"scope_violation": <1|2|3>}.
Exposes a configurable threshold field (default 2): messages scoring ≥ threshold pass, so with the default only score 1 fails.
Wires the new validator into the full stack: discriminated union in guardrail_config.py, ValidatorType enum, validators.json, and _resolve_validator_configs in the guardrails route (DB-backed topic_relevance_config_id lookup works the same as for topic_relevance).
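
The prompt construction and threshold rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `JSON_INSTRUCTION`, `build_system_prompt`, and `passes` are hypothetical names, and the exact wording of the appended instruction is an assumption. Only the `{"scope_violation": <1|2|3>}` shape and the default threshold of 2 come from the summary.

```python
# Hypothetical instruction text; the PR only states that the model is asked
# to return {"scope_violation": <1|2|3>}.
JSON_INSTRUCTION = 'Respond with JSON only, in the form {"scope_violation": <1|2|3>}.'


def build_system_prompt(topic_scope: str) -> str:
    """Combine the configured topic scope text with the JSON instruction."""
    scope = topic_scope.strip()
    if not scope:
        raise ValueError("topic scope text must not be empty")
    return f"{scope}\n\n{JSON_INSTRUCTION}"


def passes(score: int, threshold: int = 2) -> bool:
    """A message passes when its score is at or above the threshold."""
    return score >= threshold
```

With the default threshold, scores 2 and 3 pass and score 1 fails; setting the threshold to 3 would also fail partially related messages.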

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
If you've fixed a bug or added code, it is tested and has test cases.

coderabbitai · 2026-06-01T07:24:07Z

📝 Walkthrough

This PR adds a new OpenAI-powered topic relevance validator alongside the existing non-OpenAI variant. The feature includes validator implementation with LLM-based scoring, configuration classes with API-key validation, integration into the guardrails API and schema with stored configuration loading, comprehensive test coverage, and a refactored multi-backend evaluation framework supporting both validators.

Changes

TopicRelevanceOpenAI Validator Feature

| Layer / File(s) | Summary |
| --- | --- |
| Settings, foundation types, and shared constants `backend/app/core/config.py`, `backend/app/core/enum.py`, `backend/app/core/constants.py`, `backend/app/core/validators/validators.json` | New `Settings` fields `DEFAULT_LLM_CALLABLE` and `TOPIC_RELEVANCE_OPENAI_THRESHOLD`, new `ValidatorType.TopicRelevanceOpenAI` enum member, shared error constants `EMPTY_MESSAGE_ERROR` and `TOPIC_OUT_OF_SCOPE_ERROR`, and validator manifest entry for `topic_relevance_openai`. |
| LLM utility helpers `backend/app/core/validators/llm_utils.py` | New `supports_response_format(model)` function to conditionally request OpenAI-style JSON responses based on litellm capability detection. |
| Refactor existing topic relevance validators `backend/app/core/validators/topic_relevance.py`, `backend/app/core/validators/config/topic_relevance_safety_validator_config.py` | Both `TopicRelevance` and `TopicRelevanceSafetyValidatorConfig` now use centralized `settings.DEFAULT_LLM_CALLABLE` default; `TopicRelevance` switches response-format detection to `supports_response_format()` helper; `_validate` metadata parameter typed as `Optional[dict]`; empty-message and out-of-scope errors use shared constants. |
| TopicRelevanceOpenAI validator implementation `backend/app/core/validators/topic_relevance_openai.py` | New `TopicRelevanceOpenAI` validator with system prompt validation, conditional JSON response format, `litellm.completion` calls, JSON parsing with Markdown robustness, `scope_violation` validation in `{1,2,3}`, threshold-based pass/fail, and comprehensive error handling for LLM failures and malformed responses. |
| OpenAI validator configuration class `backend/app/core/validators/config/topic_relevance_openai_safety_validator_config.py` | New `TopicRelevanceOpenAISafetyValidatorConfig` with validator type, optional system prompt, LLM callable, threshold (constrained to `ge=1`/`le=3`), optional topic relevance config ID, and `build()` method validating OpenAI API key and instantiating `TopicRelevanceOpenAI`. |
| Schema and API route integration `backend/app/schemas/guardrail_config.py`, `backend/app/api/routes/guardrails.py` | Register `TopicRelevanceOpenAISafetyValidatorConfig` in the guardrail schema's validator union; extend `_resolve_validator_configs` to fetch and populate stored topic relevance configuration for both validator types when `topic_relevance_config_id` is set, gating `prompt_schema_version` to non-OpenAI variant. |
| Comprehensive test coverage `backend/app/tests/test_llm_validators.py`, `backend/app/tests/test_validate_with_guard.py`, `backend/app/tests/validators/test_topic_relevance_openai.py` | Config build tests (API-key validation, threshold forwarding), integration tests (CRUD config lookup with/without ID, inline configuration), validator tests (threshold behavior, input validation, LLM error handling, JSON parsing robustness, type validation, response format negotiation). |
| Multi-backend evaluation framework `backend/app/evaluation/topic_relevance/run.py` | Refactor from single-config to `DATASETS` and `BACKENDS` registries; `run_evaluation(dataset, backend)` now instantiates backend-specific validator; metrics extraction includes `scope_score` from metadata; report generation uses backend-specific directories and guardrail labels with per-backend metadata; `main()` iterates all backend/dataset pairs. |

Sequence Diagram(s)

sequenceDiagram
  participant Client as Guard/Route
  participant Validator as TopicRelevanceOpenAI
  participant LLM as litellm.completion
  Validator->>Validator: Validate system_prompt and user input
  alt Invalid config or empty input
    Validator-->>Client: FailResult
  else Valid input
    Validator->>LLM: POST system_prompt + user_value
    LLM-->>Validator: message.content (JSON)
    Validator->>Validator: Parse JSON, extract scope_violation
    alt Parse success and scope_violation in {1,2,3}
      Validator->>Validator: Compare scope_violation >= threshold
      alt Passes threshold
        Validator-->>Client: PassResult with scope_score metadata
      else Below threshold
        Validator-->>Client: FailResult with scope_score metadata
      end
    else Parse fails or invalid type/range
      Validator-->>Client: FailResult with error message
    end
  end
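
The flow in the diagram can be approximated with a short sketch. It is an assumption-laden illustration rather than the validator from the PR: `Result` is a hypothetical stand-in for the PassResult/FailResult pair, and `complete` is an injected callable standing in for the `litellm.completion` call so the example runs without network access.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Result:
    """Hypothetical stand-in for the PassResult/FailResult pair in the diagram."""
    passed: bool
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def validate(value: str, system_prompt: str,
             complete: Callable[[str, str], str], threshold: int = 2) -> Result:
    # Invalid config or empty input: fail before calling the model.
    if not system_prompt.strip() or not value.strip():
        return Result(False, error="empty system prompt or message")
    try:
        content = complete(system_prompt, value)  # stand-in for the LLM call
        score = json.loads(content)["scope_violation"]
    except Exception as exc:  # LLM failure or malformed response
        return Result(False, error=f"could not score message: {exc}")
    # Exact int check so that JSON true/false is rejected.
    if type(score) is not int or score not in (1, 2, 3):
        return Result(False, error=f"invalid scope_violation: {score!r}")
    meta = {"scope_score": score}
    if score >= threshold:
        return Result(True, metadata=meta)
    return Result(False, error="message is out of scope", metadata=meta)
```

Injecting `complete` keeps the decision logic testable with canned responses; a real implementation would build the message list and read `message.content` from the completion response.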

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

ProjectTech4DevAI/kaapi-guardrails#74: Both PRs modify backend/app/evaluation/topic_relevance/run.py, refactoring/expanding run_evaluation and the topic-relevance evaluation runner logic to support topic-relevance validation across backends.
ProjectTech4DevAI/kaapi-guardrails#71: The PR extends the existing run_guardrails/_resolve_topic_relevance_scope flow by updating backend/app/api/routes/guardrails.py to also load and attach configuration for the new TopicRelevanceOpenAISafetyValidatorConfig when topic_relevance_config_id is set.

Suggested reviewers

dennyabrain
nishika26

Poem

🐰 A rabbit's ode to the OpenAI validator

A validator so fine, with scores one through three,
That calls to the LLM for what topics should be,
With thresholds and JSON and errors contained,
The scope is now guarded, the guardrails maintained! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.87% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding an OpenAI variant of the topic relevance guardrail validator. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

pritam-T4D · 2026-06-01T15:24:43Z

@dennyabrain @rkritika1508 As per our discussion, I was thinking we should build this as a more general-purpose custom validator that is reusable across use cases (topic relevance, by inclusion or exclusion, being one of them). That way the user decides the prompt (the logic to score) and the score criteria in one place, with full visibility. We have to build a custom validator anyway. Such a validator could also be used as an output guardrail and would allow almost full customization (the score, and a fix value generated by the same LLM). So I would ask you to rethink this in the direction of a more general-purpose custom validator. I see this being needed for one of the TAP output guardrail use cases.

dennyabrain · 2026-06-02T04:04:20Z

@pritam-T4D I see the point of allowing the user to specify the entire prompt, but I am not sure we should allow the user to specify the scoring logic as well. The response that comes from the LLM needs to be understood by the validator, and the validator needs to return a response that is consistent with how validators work in kaapi-guardrails.

For instance, we might have instructions like this in the prompt we send to the LLM :

Score using:
3 = clearly within scope (directly matches a topic description) 
2 = partially related (tangentially related or implicitly within scope) 
1 = clearly outside scope (no relation to any listed topic)

We are then able to compare the result that comes from the LLM and take actions like 'for response 1 and 2, respond with a fail' and 'for response 3, respond with a pass'.

But if a user overrides this instruction with something like

Score using:
ok = clearly within scope  
not_ok = clearly outside scope (no relation to any listed topic)

then the response from LLM will be ok or not_ok and the validator code will act unpredictably.

If there is a need for such a versatile validator where user gets complete control over system prompt and also the scoring metric, I suggest we move this to a different validator. Something named more appropriately than TopicRelevance to avoid confusion.
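
To make the concern concrete, here is a small sketch in which the user supplies only the scope text while the validator owns the scoring rubric. The names `SCORING_RUBRIC`, `build_prompt`, and `interpret` are hypothetical, and the pass rule (only 3 passes) simply mirrors the example action in this comment rather than the default in the PR.

```python
# Rubric text taken from the example above; the validator appends it itself.
SCORING_RUBRIC = """Score using:
3 = clearly within scope (directly matches a topic description)
2 = partially related (tangentially related or implicitly within scope)
1 = clearly outside scope (no relation to any listed topic)"""


def build_prompt(user_scope: str) -> str:
    """The user controls the scope text; the rubric stays fixed."""
    return f"{user_scope.strip()}\n\n{SCORING_RUBRIC}"


def interpret(raw: str) -> bool:
    """Return True for a pass. Anything other than 1, 2, or 3 is rejected."""
    try:
        score = int(raw.strip())
    except ValueError:
        raise ValueError(f"unexpected label from LLM: {raw!r}") from None
    if score not in (1, 2, 3):
        raise ValueError(f"score out of range: {score}")
    return score == 3
```

If the rubric were overridden to produce `ok` or `not_ok`, `interpret` would raise instead of silently guessing, which is the predictable failure mode argued for here.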

pritam-T4D · 2026-06-02T07:06:34Z

@dennyabrain I agree with your reasoning. I was thinking we should build only the custom validator now to save engineering effort, and use the same one for the current gender detection use case.
And as per the TAP email, their output validator requirements 4 (sensitive content) and 5 (toxicity bias) will need a custom validator.

Here is how I am thinking of the custom validator:

The user defines the prompt and the scoring logic (we can require numeric scores rather than categorical ones like yes/no or ok/not ok).
The user also defines the threshold as a separate input, not in the LLM prompt.
If the user wants a failing output fixed (on-fail action set to fix), they describe how to fix it in the prompt; for now the same LLM produces the fix using its reason for scoring plus the user prompt.
The LLM then outputs the score, reason, and fix value in strict JSON.
We then fail if score < threshold, else pass.
On fail, the fix value is returned; otherwise the original LLM output is returned.
For an input custom validator with on fail = rephrase, only the original content will pass, as it does now.
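
The proposal above could look roughly like the sketch below. Everything here is hypothetical: the `Verdict` type, the `apply_custom_validator` function, the JSON keys `score`, `reason`, and `fix`, and the `on_fail` values are illustrative assumptions, not an existing kaapi-guardrails API.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    output: str  # text handed downstream
    reason: str


def apply_custom_validator(llm_json: str, original: str,
                           threshold: float, on_fail: str = "fix") -> Verdict:
    """Apply the threshold rule to a strict-JSON reply with score/reason/fix."""
    data = json.loads(llm_json)
    score = data["score"]
    # Require a numeric score; bool is excluded because it subclasses int.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be numeric")
    reason = str(data.get("reason", ""))
    if score >= threshold:
        return Verdict(True, original, reason)
    fix = data.get("fix")
    # Return the fix only when the on-fail action asks for it.
    if on_fail == "fix" and isinstance(fix, str) and fix:
        return Verdict(False, fix, reason)
    return Verdict(False, original, reason)
```

Keeping the threshold outside the prompt, as suggested, means the comparison stays in code and the prompt only has to produce a number.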

This can take care of any customization for both input and output guardrails.

So I am OK if you want to make the above changes in the topic relevance validator, or build the custom validator as described. Great if you do both. Please share suggestions on how the custom validator can be made versatile, and tell me if I am missing anything.

Lmk.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

backend/app/core/validators/topic_relevance_openai.py (1)

78-85: 💤 Low value

Nit: build kwargs as a dict literal.

Ruff (C408) flags the dict(...) call; a literal avoids the function-call overhead and is more idiomatic.

♻️ Proposed change

-            kwargs = dict(
-                model=self.llm_callable,
-                messages=[
-                    {"role": "system", "content": self._system_prompt},
-                    {"role": "user", "content": value},
-                ],
-                max_tokens=50,
-            )
+            kwargs = {
+                "model": self.llm_callable,
+                "messages": [
+                    {"role": "system", "content": self._system_prompt},
+                    {"role": "user", "content": value},
+                ],
+                "max_tokens": 50,
+            }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/core/validators/topic_relevance_openai.py` around lines 78 - 85,
Replace the dict(...) call that builds kwargs with a dict literal to satisfy
Ruff C408 and avoid function-call overhead; locate the kwargs assignment in the
method where kwargs = dict(model=self.llm_callable, messages=[{"role": "system",
"content": self._system_prompt}, {"role": "user", "content": value}],
max_tokens=50) and rewrite it as a dictionary literal using the same keys
(model, messages, max_tokens) and values (self.llm_callable, the messages list
referencing self._system_prompt and value, and 50) so behavior remains
identical.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/api/routes/guardrails.py`:
- Around line 133-141: The TopicRelevanceOpenAISafetyValidatorConfig branch only
copies configuration and omits the stored prompt_schema_version, causing
DB-backed presets to lose their prompt template selection; update the code so
topic_relevance_crud.get(...) returns the prompt_schema_version (if present) and
assign it to validator.prompt_schema_version in the
TopicRelevanceOpenAISafetyValidatorConfig branch, and also ensure the OpenAI
config model/build path (where TopicRelevanceOpenAISafetyValidatorConfig is
constructed) accepts and propagates prompt_schema_version the same way as the
existing topic_relevance -> TopicRelevanceSafetyValidatorConfig flow does.

In `@backend/app/core/validators/topic_relevance_openai.py`:
- Around line 94-102: The JSON parsing must be hardened: before calling
json.loads on content (the variable used in the try block that populates data
and score and returns FailResult on error), extract/isolate the first JSON
object by stripping Markdown fences or surrounding prose (e.g., remove ```json
... ``` and/or use a regex to find the first {...} block), then call json.loads
on that extracted substring; also change the type check to require an exact int
(use type(score) is int) and then validate score in (1,2,3) to reject booleans.
Ensure any extraction/parsing errors continue to produce the FailResult with the
existing error_message pattern.

In `@backend/app/tests/validators/test_topic_relevance_openai.py`:
- Around line 86-98: The test and implementation allow threshold=1 which
effectively disables the guardrail; change TopicRelevanceOpenAI (its __init__)
to validate threshold and reject values <= 1 (raise a ValueError or similar) so
the passing threshold must be strictly greater than 1, and update the test
test_custom_threshold_of_1_passes_on_score_1 to assert that constructing
TopicRelevanceOpenAI with threshold=1 fails (or that the constructor raises)
instead of expecting a PassResult; references: TopicRelevanceOpenAI, its
__init__, the _validate method, and the test function
test_custom_threshold_of_1_passes_on_score_1.

In `@backend/scripts/run_all_evaluations.sh`:
- Line 14: The runner list in run_all_evaluations.sh includes an entry for
"$EVAL_DIR/topic_relevance_openai/run.py" which doesn't exist and will abort the
script under set -euo pipefail; remove that runner line from
run_all_evaluations.sh (or add the missing runner file if you actually intend a
separate runner). Note that topic_relevance/run.py already implements BACKENDS
with name: "topic_relevance_openai" and its main() iterates over both backends
and writes outputs to outputs/topic_relevance_openai/*, so prefer removing the
bogus topic_relevance_openai runner entry to fix the orchestration.

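
The JSON-hardening comment for `topic_relevance_openai.py` (strip Markdown fences or surrounding prose, then require an exact int in 1 to 3) can be illustrated with the sketch below. It is one possible approach under stated assumptions, not the fix that was committed: `extract_score` is a hypothetical helper, and it uses `json.JSONDecoder.raw_decode` rather than a regex for the object itself.

```python
import json
import re

# Matches a Markdown code fence, optionally tagged "json".
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def extract_score(content: str) -> int:
    """Pull scope_violation out of a reply that may be fenced or wrapped in prose."""
    fenced = _FENCE.search(content)
    text = fenced.group(1) if fenced else content
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses the first JSON value and ignores trailing text.
    data, _ = json.JSONDecoder().raw_decode(text[start:])
    score = data.get("scope_violation")
    # type(...) is int rejects booleans, which isinstance(..., int) would accept.
    if type(score) is not int or score not in (1, 2, 3):
        raise ValueError(f"invalid scope_violation: {score!r}")
    return score
```

Every failure path raises `ValueError` (`json.JSONDecodeError` is a subclass), so a caller can map all of them to a single FailResult.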

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b0e4f79e-71f5-4146-8e85-6d66e5a84ba3

📥 Commits

Reviewing files that changed from the base of the PR and between b81dd64 and 19c1ca8.

📒 Files selected for processing (14)

backend/app/api/routes/guardrails.py
backend/app/core/config.py
backend/app/core/enum.py
backend/app/core/validators/config/topic_relevance_openai_safety_validator_config.py
backend/app/core/validators/config/topic_relevance_safety_validator_config.py
backend/app/core/validators/topic_relevance.py
backend/app/core/validators/topic_relevance_openai.py
backend/app/core/validators/validators.json
backend/app/evaluation/topic_relevance/run.py
backend/app/schemas/guardrail_config.py
backend/app/tests/test_llm_validators.py
backend/app/tests/test_validate_with_guard.py
backend/app/tests/validators/test_topic_relevance_openai.py
backend/scripts/run_all_evaluations.sh

added open ai topic relevance guardrail

0167471

dennyabrain reviewed Jun 1, 2026

Comment thread backend/app/core/validators/topic_relevance_openai.py Outdated

AkhileshNegi requested changes Jun 1, 2026

Comment thread backend/app/core/validators/topic_relevance_openai.py Outdated

Comment thread backend/app/evaluation/topic_relevance_openai/run.py Outdated

Comment thread backend/app/core/validators/topic_relevance_openai.py Outdated

rkritika1508 added 3 commits June 2, 2026 16:39

updates

0394bfc

added threshold to settings

a3ca650

Merge branch 'main' into feat/open-ai-topic-relevance

19c1ca8

rkritika1508 self-assigned this Jun 2, 2026

rkritika1508 added enhancement New feature or request ready-for-review labels Jun 2, 2026

rkritika1508 added this to Kaapi-dev Jun 2, 2026

rkritika1508 linked an issue Jun 2, 2026 that may be closed by this pull request

Validation: OpenAI query relevance checker #127

Closed

coderabbitai Bot reviewed Jun 2, 2026

Comment thread backend/app/api/routes/guardrails.py Outdated

Comment thread backend/app/core/validators/topic_relevance_openai.py

Comment thread backend/app/tests/validators/test_topic_relevance_openai.py

Comment thread backend/scripts/run_all_evaluations.sh Outdated

rkritika1508 and others added 2 commits June 2, 2026 17:12

resolved comments

7532f41

cleanup PR

80727ff

AkhileshNegi changed the title ~~added open ai topic relevance guardrail~~ OpenAI: Topic relevance guardrail Jun 3, 2026

AkhileshNegi self-requested a review June 3, 2026 03:40

AkhileshNegi approved these changes Jun 3, 2026

AkhileshNegi merged commit 0b87a30 into main Jun 3, 2026
1 of 2 checks passed

AkhileshNegi deleted the feat/open-ai-topic-relevance branch June 3, 2026 03:41

github-project-automation Bot moved this to Closed in Kaapi-dev Jun 3, 2026

coderabbitai Bot mentioned this pull request Jun 5, 2026

Prompt-driver OpenAI topic relevance validator #128

Merged

2 tasks

5 participants
