module cleanlab_tlm.utils.chat_completions
Real-time evaluation of responses from OpenAI Chat Completions API.
If you are using OpenAI’s Chat Completions API, this module allows you to incorporate TLM trust scoring without any change to your existing code. It works for any OpenAI LLM model, as well as many other non-OpenAI LLMs that are also usable via the Chat Completions API (Gemini, DeepSeek, Llama, etc.).
class TLMChatCompletion
Represents a Trustworthy Language Model (TLM) instance specifically designed for evaluating OpenAI Chat Completions responses.
This class provides a TLM wrapper that can be used to evaluate the quality and trustworthiness of responses from any OpenAI model by passing in the inputs to OpenAI’s Chat Completions API and the ChatCompletion response object.
Args:
- quality_preset ({“base”, “low”, “medium”, “high”, “best”}, default = “medium”): an optional preset configuration to control the quality of TLM trustworthiness scores vs. latency/costs.
- api_key (str, optional): Cleanlab TLM API key. If not provided, will attempt to read from the CLEANLAB_API_KEY environment variable.
- options (TLMOptions, optional): a typed dict of configurations you can optionally specify. See detailed documentation under TLMOptions.
- timeout (float, optional): timeout (in seconds) to apply to each TLM evaluation.
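For orientation, here is a minimal instantiation sketch. The preset and timeout values are arbitrary examples, and the API key is assumed to be available via the CLEANLAB_API_KEY environment variable:

```python
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

# Reads the Cleanlab API key from the CLEANLAB_API_KEY environment variable
# because api_key is not passed explicitly.
tlm = TLMChatCompletion(
    quality_preset="medium",  # trade score quality against latency/cost
    timeout=60.0,             # seconds allowed per evaluation (illustrative value)
)
```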
method get_explanation
get_explanation(
response: Optional[ForwardRef('ChatCompletion')] = None,
tlm_result: Union[TLMScore, ForwardRef('ChatCompletion')],
**openai_kwargs: Any
) → str
Gets explanations for a given prompt-response pair with a given score.
This method provides detailed explanations from TLM about why a particular response received its trustworthiness score.
The tlm_result object will be mutated to include the explanation in its log.
Args:
- response (ChatCompletion, optional): The OpenAI ChatCompletion response object to evaluate.
- tlm_result (TLMScore | ChatCompletion): The result object from a previous TLM call.
- **openai_kwargs (Any): The original kwargs passed to OpenAI’s create() method; must include ‘messages’.
Returns:
str: Explanation for why TLM assigned the given trustworthiness score to the response.
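A sketch of typical usage, assuming result was returned by a prior score() call (documented below) on the same response and openai_kwargs; the variable names are illustrative:

```python
# openai_kwargs are the same kwargs passed to client.chat.completions.create(),
# and must include 'messages'.
explanation = tlm.get_explanation(
    response=response,
    tlm_result=result,  # also mutated in place: the explanation is added to its log
    **openai_kwargs,
)
print(explanation)
```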
method get_untrustworthy_fields
get_untrustworthy_fields(
response: Optional[ForwardRef('ChatCompletion')] = None,
tlm_result: Union[TLMScore, ForwardRef('ChatCompletion')],
threshold: float = 0.8,
display_details: bool = True
) → list[str]
Gets the fields of a structured output response that are considered untrustworthy by TLM. Only works for responses that are valid JSON objects (uses response_format to specify the output format). Prints detailed information about the untrustworthy fields if display_details is True.
Args:
- response (ChatCompletion): The OpenAI ChatCompletion response object to evaluate.
- tlm_result (TLMScore | ChatCompletion): The result object from a previous TLM call.
- threshold (float, default = 0.8): The threshold for considering a field untrustworthy.
- display_details (bool, default = True): Whether to display detailed information about the untrustworthy fields.
Returns:
list[str]: The fields of the response that are considered untrustworthy by TLM.
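A sketch assuming response came from a structured-output request (one made with response_format) and result from a prior score() call on it; the flagged field name in the comment is hypothetical:

```python
untrustworthy_fields = tlm.get_untrustworthy_fields(
    response=response,
    tlm_result=result,
    threshold=0.8,         # fields scoring below this are flagged
    display_details=True,  # also print per-field details
)
print(untrustworthy_fields)  # e.g. ["total_amount"] (hypothetical field name)
```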
method score
score(response: 'ChatCompletion', **openai_kwargs: Any) → TLMScore
Score the trustworthiness of an OpenAI ChatCompletion response.
Args:
- response (ChatCompletion): The OpenAI ChatCompletion response object to evaluate.
- **openai_kwargs (Any): The original kwargs passed to OpenAI’s create() method; must include ‘messages’.
Returns:
TLMScore: A dict containing the trustworthiness score and optional logs.
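A sketch of end-to-end usage with the OpenAI client; the model name and the trustworthiness_score key read from the result are assumptions for illustration:

```python
from openai import OpenAI
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

client = OpenAI()
tlm = TLMChatCompletion()

# Keep the kwargs you pass to create() so TLM can score against the same inputs.
openai_kwargs = {
    "model": "gpt-4o-mini",  # assumed model name; any Chat Completions model works
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
response = client.chat.completions.create(**openai_kwargs)

result = tlm.score(response=response, **openai_kwargs)
print(result["trustworthiness_score"])  # assumes TLMScore exposes this key
```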
method score_async
score_async(response: 'ChatCompletion', **openai_kwargs: Any) → TLMScore
Asynchronously score the trustworthiness of an OpenAI ChatCompletion response. This method is similar to the score() method but operates asynchronously, allowing for non-blocking concurrent operations.
Use this method if you want to score multiple ChatCompletion responses concurrently without blocking the execution of other operations.
Args:
- response (ChatCompletion): The OpenAI ChatCompletion response object to evaluate.
- **openai_kwargs (Any): The original kwargs passed to OpenAI’s create() method; must include ‘messages’.
Returns:
TLMScore: A dict containing the trustworthiness score and optional logs.
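A sketch of concurrent scoring with asyncio and the async OpenAI client; the model name and the result key are illustrative assumptions:

```python
import asyncio

from openai import AsyncOpenAI
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

client = AsyncOpenAI()
tlm = TLMChatCompletion()

async def generate_and_score(question: str):
    openai_kwargs = {
        "model": "gpt-4o-mini",  # assumed model name
        "messages": [{"role": "user", "content": question}],
    }
    response = await client.chat.completions.create(**openai_kwargs)
    return await tlm.score_async(response=response, **openai_kwargs)

async def main():
    questions = ["What is the capital of France?", "Who wrote Hamlet?"]
    results = await asyncio.gather(*(generate_and_score(q) for q in questions))
    for result in results:
        print(result["trustworthiness_score"])  # assumes TLMScore exposes this key

asyncio.run(main())
```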