module cleanlab_codex.response_validation
Validation functions for evaluating LLM responses and determining if they should be replaced with Codex-generated alternatives.
function is_bad_response
is_bad_response(
response: 'str',
context: 'Optional[str]' = None,
query: 'Optional[str]' = None,
config: 'Union[BadResponseDetectionConfig, Dict[str, Any]]' = BadResponseDetectionConfig(fallback_answer='Based on the available information, I cannot provide a complete answer to this question.', fallback_similarity_threshold=70, trustworthiness_threshold=0.5, format_prompt=default_format_prompt, unhelpfulness_confidence_threshold=None, tlm=None)
) → bool
Run a series of checks to determine if a response is bad. If any check detects an issue (i.e. fails), the function returns True, indicating the response is bad.

This function runs three possible validation checks:
1. Fallback check: Detects whether the response is too similar to a known fallback answer.
2. Untrustworthy check: Assesses response trustworthiness based on the given context and query.
3. Unhelpful check: Predicts whether the response adequately answers the query in a useful way.

Note: Each validation check runs conditionally based on whether the required arguments are provided. As soon as any validation check fails, the function returns True.
Args:
- response (str): The response to check.
- context (str, optional): Optional context/documents used for answering. Required for the untrustworthy check.
- query (str, optional): Optional user question. Required for the untrustworthy and unhelpful checks.
- config (BadResponseDetectionConfig, optional): Optional configuration parameters for validation checks. See BadResponseDetectionConfig for details. If not provided, default values are used.

Returns:
bool: True if any validation check fails, False if all pass.
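For illustration, a minimal sketch of running all three checks. The TLM import and construction below are assumptions (the example uses the cleanlab_tlm package; adapt to however your application obtains the TLM instance this module expects):

```python
from cleanlab_codex.response_validation import (
    BadResponseDetectionConfig,
    is_bad_response,
)
from cleanlab_tlm import TLM  # assumed TLM provider; substitute your own setup

tlm = TLM()  # assumption: default construction reads credentials from the environment

config = BadResponseDetectionConfig(
    trustworthiness_threshold=0.6,           # stricter than the 0.5 default
    unhelpfulness_confidence_threshold=0.8,  # confidence gate for the unhelpful check
    tlm=tlm,                                 # required for untrustworthy/unhelpful checks
)

# context, query, and a TLM are all provided, so all three checks can run.
if is_bad_response(
    response="Based on the available information, I cannot provide a complete answer.",
    context="Acme accepts returns within 30 days of purchase with a receipt.",
    query="What is Acme's return policy?",
    config=config,
):
    print("Response flagged as bad; consider a Codex-generated alternative.")
```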
function is_fallback_response
is_fallback_response(
response: 'str',
fallback_answer: 'str' = 'Based on the available information, I cannot provide a complete answer to this question.',
threshold: 'int' = 70
) → bool
Check if a response is too similar to a known fallback answer.
Uses fuzzy string matching to compare the response against a known fallback answer. Returns True if the response is similar enough to the fallback answer to be considered unhelpful.
Args:
- response (str): The response to check.
- fallback_answer (str): A known unhelpful/fallback response to compare against.
- threshold (int): Similarity threshold (0-100) above which a response is considered to match the fallback answer. Higher values require more similarity. The default of 70 means responses that are 70% or more similar are considered bad.

Returns:
bool: True if the response is too similar to the fallback answer, False otherwise.
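A short usage sketch; the example strings and expected outcomes are illustrative, not guaranteed outputs of the fuzzy matcher:

```python
from cleanlab_codex.response_validation import is_fallback_response

# Near-paraphrase of the default fallback answer -- fuzzy matching should flag it.
print(is_fallback_response(
    "I cannot provide a complete answer to this question based on the available information."
))  # expected: True

# A substantive answer should score well below the default threshold of 70.
print(is_fallback_response("Acme accepts returns within 30 days of purchase."))  # expected: False

# Compare against your own fallback string, tightening the match with a higher threshold.
is_fallback_response("Sorry, I don't know.", fallback_answer="Sorry, I don't know.", threshold=90)
```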
function is_untrustworthy_response
is_untrustworthy_response(
response: 'str',
context: 'str',
query: 'str',
tlm: 'TLM',
trustworthiness_threshold: 'float' = 0.5,
format_prompt: 'Callable[[str, str], str]' = default_format_prompt
) → bool
Check if a response is untrustworthy.
Uses TLM to evaluate whether a response is trustworthy given the context and query. Returns True if TLM’s trustworthiness score falls below the threshold, indicating the response may be incorrect or unreliable.
Args:
- response (str): The response from the assistant to check.
- context (str): The context information available for answering the query.
- query (str): The user’s question or request.
- tlm (TLM): The TLM model to use for evaluation.
- trustworthiness_threshold (float): Score threshold (0.0-1.0) under which a response is considered untrustworthy. Lower values allow less trustworthy responses. The default of 0.5 means responses with scores below 0.5 are considered untrustworthy.
- format_prompt (Callable[[str, str], str]): Function that takes (query, context) and returns a formatted prompt string. Provide the prompt-formatting function from your RAG application here so that the response is evaluated using the same prompt that was used to generate it.

Returns:
bool: True if the response is deemed untrustworthy by TLM, False otherwise.
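A sketch of this check, again assuming a TLM instance from the cleanlab_tlm package; my_rag_prompt is a hypothetical template standing in for whatever prompt your RAG app actually uses:

```python
from cleanlab_codex.response_validation import is_untrustworthy_response
from cleanlab_tlm import TLM  # assumed TLM provider

def my_rag_prompt(query: str, context: str) -> str:
    # Hypothetical template -- reuse the exact prompt your RAG app used at generation time.
    return f"Answer the question using only this context.\n\nContext: {context}\n\nQuestion: {query}"

tlm = TLM()
flagged = is_untrustworthy_response(
    response="Acme accepts returns within 90 days.",  # contradicts the context below
    context="Acme accepts returns within 30 days of purchase with a receipt.",
    query="What is Acme's return policy?",
    tlm=tlm,
    trustworthiness_threshold=0.5,
    format_prompt=my_rag_prompt,
)
print(flagged)  # expected: True, since the response conflicts with the context
```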
function is_unhelpful_response
is_unhelpful_response(
response: 'str',
query: 'str',
tlm: 'TLM',
trustworthiness_score_threshold: 'Optional[float]' = None
) → bool
Check if a response is unhelpful by asking TLM to evaluate it.
Uses TLM to evaluate whether a response is helpful by asking it to make a Yes/No judgment. The evaluation considers both TLM’s binary classification of helpfulness and its confidence score. Returns True only if TLM classifies the response as unhelpful AND is sufficiently confident in that assessment (when a threshold is provided).
Args:
- response (str): The response to check.
- query (str): User query used to evaluate whether the response is helpful.
- tlm (TLM): The TLM model to use for evaluation.
- trustworthiness_score_threshold (float, optional): Optional confidence threshold (0.0-1.0). If provided and TLM determines the response is unhelpful, the TLM confidence score must also exceed this threshold for the response to be considered truly unhelpful.

Returns:
bool: True if TLM determines the response is unhelpful with sufficient confidence, False otherwise.
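A minimal sketch, under the same assumed cleanlab_tlm setup as above:

```python
from cleanlab_codex.response_validation import is_unhelpful_response
from cleanlab_tlm import TLM  # assumed TLM provider

tlm = TLM()
flagged = is_unhelpful_response(
    response="I'm not sure. You could try searching the web for that.",
    query="How do I reset my Acme router to factory settings?",
    tlm=tlm,
    trustworthiness_score_threshold=0.75,  # only flag when TLM is confident in "unhelpful"
)
print(flagged)
```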
class BadResponseDetectionConfig
Configuration for bad response detection functions.
Used by the is_bad_response function, which passes these values to the corresponding downstream validation checks.
class BadResponseDetectionConfig(BaseModel):
    fallback_answer: 'str' = 'Based on the available information, I cannot provide a complete answer to this question.'
    fallback_similarity_threshold: 'int' = 70
    trustworthiness_threshold: 'float' = 0.5
    format_prompt: 'typing.Callable[[str, str], str]' = default_format_prompt
    unhelpfulness_confidence_threshold: 'typing.Optional[float]' = None
    tlm: 'typing.Optional[cleanlab_codex.types.tlm.TLM]' = None
Args:
- fallback_answer (str): Known unhelpful response to compare against.
- fallback_similarity_threshold (int): Fuzzy matching similarity threshold (0-100). Higher values mean responses must be more similar to fallback_answer to be considered bad.
- trustworthiness_threshold (float): Score threshold (0.0-1.0). Lower values allow less trustworthy responses.
- format_prompt (typing.Callable[[str, str], str]): Function to format (query, context) into a prompt string.
- unhelpfulness_confidence_threshold (typing.Optional[float]): Optional confidence threshold (0.0-1.0) for unhelpful classification.
- tlm (typing.Optional[cleanlab_codex.types.tlm.TLM]): TLM model to use for evaluation (required for the untrustworthiness and unhelpfulness checks).
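Since is_bad_response accepts either this model or a plain dict (per its Union[...] signature), both forms below should be equivalent; the dict is presumably validated into the model internally:

```python
from cleanlab_codex.response_validation import (
    BadResponseDetectionConfig,
    is_bad_response,
)

# Explicit model instance:
config = BadResponseDetectionConfig(fallback_similarity_threshold=80)

# Plain-dict form accepted by the signature's Union type:
config_dict = {"fallback_similarity_threshold": 80}

is_bad_response(response="Some response.", config=config)
is_bad_response(response="Some response.", config=config_dict)
```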
property model_extra
Get extra fields set during validation.
Returns:
A dictionary of extra fields, or None if config.extra is not set to "allow".
property model_fields_set
Returns the set of fields that have been explicitly set on this model instance.
Returns: A set of strings representing the fields that have been set, i.e. that were not filled from defaults.