module cleanlab_codex.validator
Detect and remediate bad responses in RAG applications by integrating Codex as-a-Backup.
class Validator
method __init__
__init__(
codex_access_key: 'str',
tlm_api_key: 'Optional[str]' = None,
trustworthy_rag_config: 'Optional[dict[str, Any]]' = None,
bad_response_thresholds: 'Optional[dict[str, float]]' = None
)
Real-time detection and remediation of bad responses in RAG applications, powered by Cleanlab’s TrustworthyRAG and Codex.
This object combines Cleanlab’s TrustworthyRAG evaluation scores with configurable thresholds to detect potentially bad responses in your RAG application. When a bad response is detected, this Validator automatically attempts to remediate by retrieving an expert-provided answer from the Codex Project you’ve integrated with your RAG app. If no expert answer is available, the corresponding query is logged in the Codex Project for SMEs to answer.
For production, use the validate() method, which provides a complete validation workflow including both detection and remediation. A detect() method is separately available for you to test/tune detection configurations like score thresholds and TrustworthyRAG settings without triggering any Codex lookups that could otherwise affect the state of the corresponding Codex Project.
Args:
- codex_access_key (str): The access key for a Codex project. Used to retrieve expert-provided answers when bad responses are detected, or otherwise log the corresponding queries for SMEs to answer.
- tlm_api_key (str, optional): API key for accessing TrustworthyRAG. If not provided, this must be specified in trustworthy_rag_config.
- trustworthy_rag_config (dict[str, Any], optional): Optional initialization arguments for TrustworthyRAG, which is used to detect response issues. If not provided, a default configuration will be used. By default, this Validator uses the same default configurations as TrustworthyRAG, except:
  - Explanations are returned in logs for better debugging.
  - Only the response_helpfulness eval is run.
- bad_response_thresholds (dict[str, float], optional): Detection score thresholds used to flag whether a response is bad or not. Each key corresponds to an Eval from TrustworthyRAG, and the value indicates a threshold (between 0 and 1) below which Eval scores are treated as detected issues. A response is flagged as bad if any issues are detected. If not provided or only partially provided, default thresholds will be used for any missing metrics. Note that if a threshold is provided for a metric, that metric must correspond to an eval that is configured to run (with the exception of ‘trustworthiness’, which is always implicitly configured). You can configure arbitrary evals to run, and their thresholds will use default values unless explicitly set. See BadResponseThresholds for more details on the default values.
Raises:
- ValueError: If both tlm_api_key and api_key in trustworthy_rag_config are provided.
- ValueError: If any user-provided thresholds in bad_response_thresholds are for evaluation metrics that are not available in the current evaluation setup (except ‘trustworthiness’, which is always available).
- TypeError: If any threshold value is not a number.
- ValueError: If any threshold value is not between 0 and 1.
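Example:
A minimal sketch of constructing a Validator. The access key, API key, and threshold values below are placeholders; only bad_response_thresholds entries you want to override need to be passed (defaults are used otherwise).

```python
from cleanlab_codex.validator import Validator

# Placeholder credentials -- substitute your own Codex access key and TLM API key.
validator = Validator(
    codex_access_key="<codex-access-key>",
    tlm_api_key="<tlm-api-key>",
    # Flag a response as bad whenever either score falls below its threshold.
    bad_response_thresholds={
        "trustworthiness": 0.7,
        "response_helpfulness": 0.7,
    },
)
```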
method detect
detect(
query: 'str',
context: 'str',
response: 'str',
prompt: 'Optional[str]' = None,
form_prompt: 'Optional[Callable[[str, str], str]]' = None
) → tuple[ThresholdedTrustworthyRAGScore, bool]
Score response quality using TrustworthyRAG and flag bad responses based on configured thresholds.
Note:
Use this method instead of validate() to test/tune detection configurations like score thresholds and TrustworthyRAG settings. This detect() method will not affect your Codex Project, whereas validate() will log queries whose response was detected as bad into the Codex Project and is thus only suitable for production, not testing. Both this method and validate() rely on this same detection logic, so you can use this method to first optimize detections and then switch to using validate().
Args:
- query (str): The user query that was used to generate the response.
- context (str): The context that was retrieved from the RAG Knowledge Base and used to generate the response.
- response (str): A response from your LLM/RAG system.
- prompt (str, optional): Optional prompt representing the actual inputs (combining query, context, and system instructions into one string) to the LLM that generated the response.
- form_prompt (Callable[[str, str], str], optional): Optional function to format the prompt based on query and context. Cannot be provided together with prompt; provide one or the other. This function should take query and context as parameters and return a formatted prompt string. If not provided, a default prompt formatter will be used. To include a system prompt or any other special instructions for your LLM, incorporate them directly in your custom form_prompt() function definition.
Returns:
- tuple[ThresholdedTrustworthyRAGScore, bool]: A tuple containing:
  - ThresholdedTrustworthyRAGScore: Quality scores for different evaluation metrics like trustworthiness and response helpfulness. Each metric has a score between 0-1, along with a boolean flag, is_bad, indicating whether the score is below the corresponding threshold.
  - bool: True if the response is determined to be bad based on the evaluation scores and configured thresholds, False otherwise.
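Example:
A brief sketch of using detect() to tune thresholds offline, assuming the validator constructed above. The query/context/response strings are placeholders, and the example assumes each metric entry in the returned scores exposes its numeric value under "score" alongside the is_bad flag described above.

```python
# Hypothetical inputs from one RAG interaction (placeholders for illustration).
scores, is_bad = validator.detect(
    query="What is the refund window?",
    context="Refunds are accepted within 30 days of purchase.",
    response="You can request a refund within 30 days.",
)

print(is_bad)                                    # True if any eval score fell below its threshold
print(scores["response_helpfulness"]["score"])   # per-eval score between 0 and 1
print(scores["response_helpfulness"]["is_bad"])  # whether that score fell below its threshold
```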
method detect_async
detect_async(
query: 'str',
context: 'str',
response: 'str',
prompt: 'Optional[str]' = None,
form_prompt: 'Optional[Callable[[str, str], str]]' = None
) → tuple[ThresholdedTrustworthyRAGScore, bool]
Score response quality using TrustworthyRAG and flag bad responses based on configured thresholds.
Note:
Use this method instead of validate() to test/tune detection configurations like score thresholds and TrustworthyRAG settings. This detect_async() method will not affect your Codex Project, whereas validate() will log queries whose response was detected as bad into the Codex Project and is thus only suitable for production, not testing. Both this method and validate() rely on this same detection logic, so you can use this method to first optimize detections and then switch to using validate().
Args:
- query (str): The user query that was used to generate the response.
- context (str): The context that was retrieved from the RAG Knowledge Base and used to generate the response.
- response (str): A response from your LLM/RAG system.
- prompt (str, optional): Optional prompt representing the actual inputs (combining query, context, and system instructions into one string) to the LLM that generated the response.
- form_prompt (Callable[[str, str], str], optional): Optional function to format the prompt based on query and context. Cannot be provided together with prompt; provide one or the other. This function should take query and context as parameters and return a formatted prompt string. If not provided, a default prompt formatter will be used. To include a system prompt or any other special instructions for your LLM, incorporate them directly in your custom form_prompt() function definition.
Returns:
- tuple[ThresholdedTrustworthyRAGScore, bool]: A tuple containing:
  - ThresholdedTrustworthyRAGScore: Quality scores for different evaluation metrics like trustworthiness and response helpfulness. Each metric has a score between 0-1, along with a boolean flag, is_bad, indicating whether the score is below the corresponding threshold.
  - bool: True if the response is determined to be bad based on the evaluation scores and configured thresholds, False otherwise.
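Example:
The async variant follows the same calling convention; a minimal sketch assuming detect_async() is a coroutine that can be awaited from an async application (placeholder inputs as above).

```python
import asyncio

async def check_response() -> None:
    # Same arguments and return value as detect(), awaited to avoid blocking the event loop.
    scores, is_bad = await validator.detect_async(
        query="What is the refund window?",
        context="Refunds are accepted within 30 days of purchase.",
        response="You can request a refund within 30 days.",
    )
    print(is_bad)

asyncio.run(check_response())
```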
method validate
validate(
query: 'str',
context: 'str',
response: 'str',
prompt: 'Optional[str]' = None,
form_prompt: 'Optional[Callable[[str, str], str]]' = None,
metadata: 'Optional[dict[str, Any]]' = None,
log_results: 'bool' = True
) → dict[str, Any]
Evaluate whether the AI-generated response is bad, and if so, request an alternate expert answer. If no expert answer is available, this query is still logged for SMEs to answer.
Args:
query
(str): The user query that was used to generate the response.context
(str): The context that was retrieved from the RAG Knowledge Base and used to generate the response.response
(str): A reponse from your LLM/RAG system.prompt
(str, optional): Optional prompt representing the actual inputs (combining query, context, and system instructions into one string) to the LLM that generated the response.form_prompt
(Callable[[str, str], str], optional): Optional function to format the prompt based on query and context. Cannot be provided together with prompt, provide one or the other. This function should take query and context as parameters and return a formatted prompt string. If not provided, a default prompt formatter will be used. To include a system prompt or any other special instructions for your LLM, incorporate them directly in your custom form_prompt() function definition.
Returns:
- dict[str, Any]: A dictionary containing:
  - ‘expert_answer’: Alternate SME-provided answer from Codex if the response was flagged as bad and an answer was found in the Codex Project, or None otherwise.
  - ‘is_bad_response’: True if the response is flagged as potentially bad, False otherwise. When True, a Codex lookup is performed, which logs this query into the Codex Project for SMEs to answer.
  - Additional keys from a ThresholdedTrustworthyRAGScore dictionary: each corresponds to a TrustworthyRAG evaluation metric, and points to the score for this evaluation as well as a boolean is_bad flagging whether the score falls below the corresponding threshold.
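Example:
A sketch of the production workflow around validate(), using the validator constructed above and placeholder strings. The ‘expert_answer’ and ‘is_bad_response’ keys are documented above; the fallback message is illustrative.

```python
# Validate one RAG response before returning it to the user.
result = validator.validate(
    query="What is the refund window?",
    context="Refunds are accepted within 30 days of purchase.",
    response="You can request a refund within 30 days.",
)

if result["is_bad_response"]:
    # Prefer the SME-provided answer when Codex has one; otherwise fall back.
    # (The query has already been logged in the Codex Project for SMEs to answer.)
    final_response = result["expert_answer"] or "I'm not sure about this one -- escalating to a human."
else:
    final_response = "You can request a refund within 30 days."
```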
method validate_async
validate_async(
query: 'str',
context: 'str',
response: 'str',
prompt: 'Optional[str]' = None,
form_prompt: 'Optional[Callable[[str, str], str]]' = None,
metadata: 'Optional[dict[str, Any]]' = None,
log_results: 'bool' = True
) → dict[str, Any]
Evaluate whether the AI-generated response is bad, and if so, request an alternate expert answer. If no expert answer is available, this query is still logged for SMEs to answer.
Args:
- query (str): The user query that was used to generate the response.
- context (str): The context that was retrieved from the RAG Knowledge Base and used to generate the response.
- response (str): A response from your LLM/RAG system.
- prompt (str, optional): Optional prompt representing the actual inputs (combining query, context, and system instructions into one string) to the LLM that generated the response.
- form_prompt (Callable[[str, str], str], optional): Optional function to format the prompt based on query and context. Cannot be provided together with prompt; provide one or the other. This function should take query and context as parameters and return a formatted prompt string. If not provided, a default prompt formatter will be used. To include a system prompt or any other special instructions for your LLM, incorporate them directly in your custom form_prompt() function definition.
Returns:
- dict[str, Any]: A dictionary containing:
  - ‘expert_answer’: Alternate SME-provided answer from Codex if the response was flagged as bad and an answer was found in the Codex Project, or None otherwise.
  - ‘is_bad_response’: True if the response is flagged as potentially bad, False otherwise. When True, a Codex lookup is performed, which logs this query into the Codex Project for SMEs to answer.
  - Additional keys from a ThresholdedTrustworthyRAGScore dictionary: each corresponds to a TrustworthyRAG evaluation metric, and points to the score for this evaluation as well as a boolean is_bad flagging whether the score falls below the corresponding threshold.
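Example:
The async counterpart mirrors validate(); a minimal sketch assuming validate_async() is a coroutine.

```python
import asyncio

async def handle_query() -> dict:
    # Same arguments and return keys as validate(), awaitable from async servers.
    return await validator.validate_async(
        query="What is the refund window?",
        context="Refunds are accepted within 30 days of purchase.",
        response="You can request a refund within 30 days.",
    )

result = asyncio.run(handle_query())
```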
class BadResponseThresholds
Config for determining if a response is bad. Each key is an evaluation metric and the value is a threshold such that a response is considered bad whenever the corresponding evaluation score falls below the threshold.
Default Thresholds:
- trustworthiness: 0.7
- response_helpfulness: 0.7
- Any custom eval: 0.0 (if not explicitly specified in bad_response_thresholds). A threshold of 0.0 means the associated eval is not used to flag a response as bad unless a threshold is explicitly specified in bad_response_thresholds, but its scores are still reported.
property default_threshold
The default threshold to use when an evaluation metric’s threshold is not specified. This threshold is set to 0.0.
method get_threshold
get_threshold(eval_name: 'str') → float
Get threshold for an eval, if it exists.
For fields defined in the model, returns their value (which may be the field’s default). For custom evals not defined in the model, returns the default threshold value (see default_threshold).
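Example:
A minimal sketch, assuming BadResponseThresholds is a pydantic-style model (per the field/validator language above) that accepts threshold fields as keyword arguments.

```python
from cleanlab_codex.validator import BadResponseThresholds

# Unspecified fields keep their defaults; custom evals fall back to default_threshold (0.0).
thresholds = BadResponseThresholds(trustworthiness=0.8)

print(thresholds.get_threshold("trustworthiness"))       # 0.8 (explicitly set)
print(thresholds.get_threshold("response_helpfulness"))  # 0.7 (field default)
print(thresholds.get_threshold("my_custom_eval"))        # 0.0 (default_threshold for custom evals)
```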
classmethod validate_threshold
validate_threshold(v: 'Any') → float
Validate that all fields (including dynamic ones) are floats between 0 and 1.