OpenAI: Topic relevance guardrail#126
Conversation
📝 WalkthroughWalkthroughThis PR adds a new OpenAI-powered topic relevance validator alongside the existing non-OpenAI variant. The feature includes validator implementation with LLM-based scoring, configuration classes with API-key validation, integration into the guardrails API and schema with stored configuration loading, comprehensive test coverage, and a refactored multi-backend evaluation framework supporting both validators. ChangesTopicRelevanceOpenAI Validator Feature
Sequence Diagram(s)sequenceDiagram
participant Client as Guard/Route
participant Validator as TopicRelevanceOpenAI
participant LLM as litellm.completion
Validator->>Validator: Validate system_prompt and user input
alt Invalid config or empty input
Validator-->>Client: FailResult
else Valid input
Validator->>LLM: POST system_prompt + user_value
LLM-->>Validator: message.content (JSON)
Validator->>Validator: Parse JSON, extract scope_violation
alt Parse success and scope_violation in {1,2,3}
Validator->>Validator: Compare scope_violation >= threshold
alt Passes threshold
Validator-->>Client: PassResult with scope_score metadata
else Below threshold
Validator-->>Client: FailResult with scope_score metadata
end
else Parse fails or invalid type/range
Validator-->>Client: FailResult with error message
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@dennyabrain @rkritika1508 I was thinking as per our discussion, we are building more as general purpose custom validator. That can be reusable across also (one of it topic relevance use case inclusion or exclusion). That way, we are letting user decide prompt (logic to score) and also score criteria everything at one place, with visibility also. Anyhow we have to also build custom validator. Such validator can be used even at output guardrail also and can provide ability with almost full customization (score, and also generate fix value by same llm). So will ask to rethink in that direction of more general purpose custom validator. I see for one of TAP use case of output guardrail, this will be needed. |
|
@pritam-T4D I see the point of allowing the user to specify the entire prompt. But not sure if we should allow the user the specify the scoring logic also. Because the response that comes from the LLM needs to be understood by the validator and a response that's consistent with how validators work in kaapi-guardrails needs to be returned. For instance, we might have instructions like this in the prompt we send to the LLM : We are then able to compare the result that comes from the LLM and do actions like 'for response 1 and 2, respond a fail' and 'for response 3 we respond with pass'. But if a user overrides this instruction with something like then the response from LLM will be If there is a need for such a versatile validator where user gets complete control over system prompt and also the scoring metric, I suggest we move this to a different validator. Something named more appropriately than TopicRelevance to avoid confusion. |
|
@dennyabrain Agree with your thought process also. I was thinking we rather build custom validator only now to avoid engineering efforts, and would have used same for current gender detection use case. How I am thinking of custom validator is,
Now this can take care of any customization for both input and output guardrail. So I am ok, if you want to do above changes in topic relevance validator, OR build custom validator as above. Great, if you do both. Please share suggestions if on custom validator on how it can be versatile and if I am missing anything. Lmk. |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
backend/app/core/validators/topic_relevance_openai.py (1)
78-85: 💤 Low valueNit: build
kwargsas a dict literal.Ruff (C408) flags the
dict(...)call; a literal avoids the function-call overhead and is more idiomatic.♻️ Proposed change
- kwargs = dict( - model=self.llm_callable, - messages=[ - {"role": "system", "content": self._system_prompt}, - {"role": "user", "content": value}, - ], - max_tokens=50, - ) + kwargs = { + "model": self.llm_callable, + "messages": [ + {"role": "system", "content": self._system_prompt}, + {"role": "user", "content": value}, + ], + "max_tokens": 50, + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/core/validators/topic_relevance_openai.py` around lines 78 - 85, Replace the dict(...) call that builds kwargs with a dict literal to satisfy Ruff C408 and avoid function-call overhead; locate the kwargs assignment in the method where kwargs = dict(model=self.llm_callable, messages=[{"role": "system", "content": self._system_prompt}, {"role": "user", "content": value}], max_tokens=50) and rewrite it as a dictionary literal using the same keys (model, messages, max_tokens) and values (self.llm_callable, the messages list referencing self._system_prompt and value, and 50) so behavior remains identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/api/routes/guardrails.py`:
- Around line 133-141: The TopicRelevanceOpenAISafetyValidatorConfig branch only
copies configuration and omits the stored prompt_schema_version, causing
DB-backed presets to lose their prompt template selection; update the code so
topic_relevance_crud.get(...) returns the prompt_schema_version (if present) and
assign it to validator.prompt_schema_version in the
TopicRelevanceOpenAISafetyValidatorConfig branch, and also ensure the OpenAI
config model/build path (where TopicRelevanceOpenAISafetyValidatorConfig is
constructed) accepts and propagates prompt_schema_version the same way as the
existing topic_relevance -> TopicRelevanceSafetyValidatorConfig flow does.
In `@backend/app/core/validators/topic_relevance_openai.py`:
- Around line 94-102: The JSON parsing must be hardened: before calling
json.loads on content (the variable used in the try block that populates data
and score and returns FailResult on error), extract/isolate the first JSON
object by stripping Markdown fences or surrounding prose (e.g., remove ```json
... ``` and/or use a regex to find the first {...} block), then call json.loads
on that extracted substring; also change the type check to require an exact int
(use type(score) is int) and then validate score in (1,2,3) to reject booleans.
Ensure any extraction/parsing errors continue to produce the FailResult with the
existing error_message pattern.
In `@backend/app/tests/validators/test_topic_relevance_openai.py`:
- Around line 86-98: The test and implementation allow threshold=1 which
effectively disables the guardrail; change TopicRelevanceOpenAI (its __init__)
to validate threshold and reject values >= 1 (raise a ValueError or similar) so
the passing threshold must be strictly less than 1, and update the test
test_custom_threshold_of_1_passes_on_score_1 to assert that constructing
TopicRelevanceOpenAI with threshold=1 fails (or that the constructor raises)
instead of expecting a PassResult; references: TopicRelevanceOpenAI, its
__init__, the _validate method, and the test function
test_custom_threshold_of_1_passes_on_score_1.
In `@backend/scripts/run_all_evaluations.sh`:
- Line 14: The runner list in run_all_evaluations.sh includes an entry for
"$EVAL_DIR/topic_relevance_openai/run.py" which doesn't exist and will abort the
script under set -euo pipefail; remove that runner line from
run_all_evaluations.sh (or add the missing runner file if you actually intend a
separate runner). Note that topic_relevance/run.py already implements BACKENDS
with name: "topic_relevance_openai" and its main() iterates over both backends
and writes outputs to outputs/topic_relevance_openai/*, so prefer removing the
bogus topic_relevance_openai runner entry to fix the orchestration.
---
Nitpick comments:
In `@backend/app/core/validators/topic_relevance_openai.py`:
- Around line 78-85: Replace the dict(...) call that builds kwargs with a dict
literal to satisfy Ruff C408 and avoid function-call overhead; locate the kwargs
assignment in the method where kwargs = dict(model=self.llm_callable,
messages=[{"role": "system", "content": self._system_prompt}, {"role": "user",
"content": value}], max_tokens=50) and rewrite it as a dictionary literal using
the same keys (model, messages, max_tokens) and values (self.llm_callable, the
messages list referencing self._system_prompt and value, and 50) so behavior
remains identical.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b0e4f79e-71f5-4146-8e85-6d66e5a84ba3
📒 Files selected for processing (14)
backend/app/api/routes/guardrails.pybackend/app/core/config.pybackend/app/core/enum.pybackend/app/core/validators/config/topic_relevance_openai_safety_validator_config.pybackend/app/core/validators/config/topic_relevance_safety_validator_config.pybackend/app/core/validators/topic_relevance.pybackend/app/core/validators/topic_relevance_openai.pybackend/app/core/validators/validators.jsonbackend/app/evaluation/topic_relevance/run.pybackend/app/schemas/guardrail_config.pybackend/app/tests/test_llm_validators.pybackend/app/tests/test_validate_with_guard.pybackend/app/tests/validators/test_topic_relevance_openai.pybackend/scripts/run_all_evaluations.sh
Summary
Target issue is #127
TopicRelevanceOpenAI, a new topic relevance validator that callslitellm.completion()directly instead of routing through the Guardrails HubLLMCriticwrapper. This gives tighter control over the prompt format, JSON parsing, and pass/fail threshold.{"scope_violation": <1|2|3>}.thresholdfield (default 2) — messages scoring ≥ threshold pass, score 1 fails.guardrail_config.py,ValidatorTypeenum,validators.json, and_resolve_validator_configsin the guardrails route (DB-backedtopic_relevance_config_idlookup works the same as fortopic_relevance).Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.