module cleanlab_tlm.utils.chat_completions

Real-time evaluation of responses from OpenAI Chat Completions API.

If you are using OpenAI’s Chat Completions API, this module lets you incorporate TLM trust scoring without changing your existing code. It works with any OpenAI LLM, as well as the many non-OpenAI LLMs that are also served via the Chat Completions API (Gemini, DeepSeek, Llama, etc.).


class TLMChatCompletion

Represents a Trustworthy Language Model (TLM) instance specifically designed for evaluating OpenAI Chat Completions responses.

This class provides a TLM wrapper that can be used to evaluate the quality and trustworthiness of responses from any OpenAI model by passing in the inputs to OpenAI’s Chat Completions API and the ChatCompletion response object.

Args:

  • quality_preset ({“base”, “low”, “medium”, “high”, “best”}, default = “medium”): an optional preset configuration to control the quality of TLM trustworthiness scores vs. latency/costs.

  • api_key (str, optional): Cleanlab TLM API key. If not provided, will attempt to read from CLEANLAB_API_KEY environment variable.

  • options (TLMOptions, optional): a typed dict of configurations you can optionally specify. See detailed documentation under TLMOptions.

  • timeout (float, optional): timeout (in seconds) to apply to each TLM evaluation.
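Putting these constructor arguments together, a minimal setup might look like the sketch below (assuming the package is installed as `cleanlab-tlm` and `CLEANLAB_API_KEY` is set in your environment; otherwise pass `api_key=` explicitly):

```python
# Minimal construction sketch; assumes cleanlab-tlm is installed and
# CLEANLAB_API_KEY is set in the environment.
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

tlm = TLMChatCompletion(
    quality_preset="medium",  # trade off score quality vs. latency/cost
    timeout=60.0,             # seconds allowed per evaluation
)
```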


method get_explanation

get_explanation(
response: Optional[ForwardRef('ChatCompletion')] = None,
tlm_result: Union[TLMScore, ForwardRef('ChatCompletion')],
**openai_kwargs: Any
) → str

Gets an explanation for a given prompt-response pair and its trustworthiness score.

This method provides detailed explanations from TLM about why a particular response received its trustworthiness score.

The tlm_result object will be mutated to include the explanation in its log.

Args:

  • response (ChatCompletion, optional): The OpenAI ChatCompletion response object to evaluate
  • tlm_result (TLMScore | ChatCompletion): The result object from a previous TLM call
  • **openai_kwargs (Any): The original kwargs passed to OpenAI’s create() method, must include ‘messages’

Returns:

  • str: Explanation for why TLM assigned the given trustworthiness score to the response.
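The following sketch shows one way to request an explanation after scoring (it assumes `openai` and `cleanlab-tlm` are installed with `OPENAI_API_KEY` and `CLEANLAB_API_KEY` set; the model name and question are illustrative):

```python
# Sketch: score a response, then ask TLM to explain the score.
from openai import OpenAI
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

client = OpenAI()
tlm = TLMChatCompletion()

openai_kwargs = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Who wrote 'Moby-Dick'?"}],
}
response = client.chat.completions.create(**openai_kwargs)
result = tlm.score(response=response, **openai_kwargs)

# Why did TLM assign this trustworthiness score? Note that `result`
# is mutated to include the explanation in its log.
explanation = tlm.get_explanation(
    response=response, tlm_result=result, **openai_kwargs
)
print(explanation)
```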

method get_untrustworthy_fields

get_untrustworthy_fields(
response: Optional[ForwardRef('ChatCompletion')] = None,
tlm_result: Union[TLMScore, ForwardRef('ChatCompletion')],
threshold: float = 0.8,
display_details: bool = True
) → list[str]

Gets the fields of a structured-output response that TLM considers untrustworthy. Only works for responses that are valid JSON objects (i.e., requests that used response_format to specify the output structure). Prints detailed information about the untrustworthy fields if display_details is True.

Args:

  • response (ChatCompletion, optional): The OpenAI ChatCompletion response object to evaluate
  • tlm_result (TLMScore | ChatCompletion): The result object from a previous TLM call
  • threshold (float): The threshold for considering a field untrustworthy
  • display_details (bool): Whether to display detailed information about the untrustworthy fields

Returns:

  • list[str]: The fields of the response that are considered untrustworthy by TLM
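A hedged sketch of structured-output scoring is shown below. The Pydantic model, its field names, and the use of `chat.completions.parse` (available in recent OpenAI Python SDKs) are illustrative assumptions, not part of the TLM API:

```python
# Sketch: flag untrustworthy fields in a structured (JSON) response.
# Assumes openai, cleanlab-tlm, and pydantic are installed, with
# OPENAI_API_KEY and CLEANLAB_API_KEY set in the environment.
from openai import OpenAI
from pydantic import BaseModel
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

client = OpenAI()
tlm = TLMChatCompletion()

class Person(BaseModel):  # hypothetical output schema
    name: str
    birth_year: int

openai_kwargs = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Extract: 'Ada Lovelace, born 1815.'"}
    ],
    "response_format": Person,
}
response = client.chat.completions.parse(**openai_kwargs)
result = tlm.score(response=response, **openai_kwargs)

# Fields whose values scored below the 0.8 trustworthiness threshold.
bad_fields = tlm.get_untrustworthy_fields(
    response=response, tlm_result=result, threshold=0.8
)
```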

method score

score(response: 'ChatCompletion', **openai_kwargs: Any) → TLMScore

Score the trustworthiness of an OpenAI ChatCompletion response.

Args:

  • response (ChatCompletion): The OpenAI ChatCompletion response object to evaluate
  • **openai_kwargs (Any): The original kwargs passed to OpenAI’s create() method, must include ‘messages’

Returns:

  • TLMScore: A dict containing the trustworthiness score and optional logs
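A minimal end-to-end scoring sketch follows (assuming `openai` and `cleanlab-tlm` are installed, with `OPENAI_API_KEY` and `CLEANLAB_API_KEY` set; the model name is illustrative, and the `"trustworthiness_score"` key reflects the TLMScore dict described above):

```python
# Sketch: score an existing Chat Completions response with TLM.
from openai import OpenAI
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

client = OpenAI()
tlm = TLMChatCompletion()

openai_kwargs = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
response = client.chat.completions.create(**openai_kwargs)

# Pass the response object plus the same kwargs (must include 'messages').
result = tlm.score(response=response, **openai_kwargs)
print(result["trustworthiness_score"])  # a float; higher means more trustworthy
```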

method score_async

score_async(response: 'ChatCompletion', **openai_kwargs: Any) → TLMScore

Asynchronously score the trustworthiness of an OpenAI ChatCompletion response. This method is similar to the score() method but operates asynchronously, allowing for non-blocking concurrent operations.

Use this method if you want to score multiple ChatCompletion responses concurrently without blocking the execution of other operations.

Args:

  • response (ChatCompletion): The OpenAI ChatCompletion response object to evaluate
  • **openai_kwargs (Any): The original kwargs passed to OpenAI’s create() method, must include ‘messages’

Returns:

  • TLMScore: A dict containing the trustworthiness score and optional logs
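Concurrent scoring can be sketched with asyncio.gather as below (same setup assumptions as the score() example; the questions and model name are illustrative):

```python
# Sketch: score several responses concurrently with score_async.
import asyncio

from openai import OpenAI
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

client = OpenAI()
tlm = TLMChatCompletion()

requests = [
    {"model": "gpt-4o-mini",
     "messages": [{"role": "user", "content": q}]}
    for q in ["What is 2 + 2?", "Who painted the Mona Lisa?"]
]
responses = [client.chat.completions.create(**kw) for kw in requests]

async def score_all():
    # score_async lets the evaluations run concurrently, not serially.
    return await asyncio.gather(
        *(tlm.score_async(response=r, **kw)
          for r, kw in zip(responses, requests))
    )

results = asyncio.run(score_all())  # one TLMScore dict per response
```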