Using TLM in your VPC via OpenAI's Chat Completions API
This tutorial demonstrates how to integrate your VPC installation of Cleanlab's Trustworthy Language Model (TLM) into existing GenAI apps. You will learn how to assess the trustworthiness of LLM responses directly through the OpenAI client library, Azure's AI inference client, or Cleanlab's cleanlab-tlm client library.
API access to the TLM backend service
This demo assumes that you have access to the deployed TLM backend service at the URL http://example.customer.com:8080/api. You are welcome to expose the TLM service however you prefer, depending on the unique needs of your networking environment. Simply replace the base URL in the corresponding cell blocks below.
Please note that Google Colab does not have built-in support to access services on your local machine. This is because Colab runs in a virtual machine, so localhost refers to that VM, rather than your computer. If you would like to access TLM by port-forwarding to your local machine, you may do so by downloading the .ipynb file and running Jupyter locally, or by using a tunneling service like ngrok.
import os
os.environ["BASE_URL"] = "http://example.customer.com:8080/api"
Setup
The Python packages required for this tutorial can be installed using pip:
%pip install --upgrade openai azure-ai-inference cleanlab-tlm
from openai import OpenAI, AzureOpenAI
from cleanlab_tlm.utils.vpc.chat_completions import TLMChatCompletion
Overview of this tutorial
The workflows showcased below demonstrate how to incorporate trust scoring into your existing LLM code with minimal changes. We'll explore three workflows:
- Workflow 1 & 2: Use your own existing LLM infrastructure to generate responses, then use Cleanlab to score them
- Workflow 3: Use Cleanlab for both generating and scoring responses (response-generation can be from any LLM model supported in your VPC deployment)
Workflow 1: Score Responses from Existing LLM Calls
One way to use TLM if you're already using OpenAI's ChatCompletions API is to score any existing LLM call you've made. This works for LLMs beyond OpenAI models (many LLM providers like Gemini or DeepSeek also support OpenAI's Chat Completions API).
You can first generate LLM responses as usual using the OpenAI API (or any of your existing infrastructure):
openai_kwargs = {
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "logprobs": True,
    "top_logprobs": 3,
}
client = AzureOpenAI(
    api_version="<your-api-version>",
    azure_endpoint="<your-azure-endpoint>",
    api_key="<your-azure-api-key>",
)
response = client.chat.completions.create(**openai_kwargs)
response
We can then use TLM to score the generated response. For models that support log probabilities, including them allows TLM to return higher-quality scores.
Here, we first instantiate a TLMChatCompletion object. For more configurations, view the valid arguments below.
tlm = TLMChatCompletion(quality_preset="medium", options={"model": "azure/gpt-4.1-mini", "log": ["explanation"]})
score_result = tlm.score(
    response=response,
    **openai_kwargs
)
print(f"Response: {response.choices[0].message.content}")
print(f"TLM Score: {score_result['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {score_result['log']['explanation']}")
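Once you have a trustworthiness score, a common pattern is to gate what your application does with the response. The snippet below is a minimal sketch of that idea; the 0.8 threshold and the fallback message are illustrative choices, not values prescribed by Cleanlab.

```python
# Illustrative fallback used when a response's trust score is too low.
FALLBACK_MESSAGE = "Sorry, I cannot answer that confidently. Escalating to a human agent."

def apply_trust_gate(response_text, trustworthiness_score, threshold=0.8):
    """Return the LLM response only if its trust score clears the threshold."""
    if trustworthiness_score >= threshold:
        return response_text
    return FALLBACK_MESSAGE
```

In the workflow above, you would call something like `apply_trust_gate(response.choices[0].message.content, score_result["trustworthiness_score"])`; tune the threshold on a dataset from your own use-case.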
Using OpenAI client instead of AzureOpenAI (click to expand)
If you're using the OpenAI client instead of AzureOpenAI client, the only difference in your code from above would be that the client is instantiated differently:
client = OpenAI()
response = client.chat.completions.create(**openai_kwargs)
instead of:
client = AzureOpenAI(
    api_version="<your-api-version>",
    azure_endpoint="<your-azure-endpoint>",
    api_key="<your-azure-api-key>",
)
response = client.chat.completions.create(**openai_kwargs)
The code to score this response using TLM remains identical. Full code sample below:
openai_kwargs = {
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
}
client = OpenAI()
tlm = TLMChatCompletion(quality_preset="medium", options={"model": "azure/gpt-4.1-mini", "log": ["explanation"]})
response = client.chat.completions.create(**openai_kwargs)
score_result = tlm.score(
    response=response,
    **openai_kwargs
)
Workflow 2: Adding a Decorator to your LLM Call
For greater convenience, you can decorate your call to openai.chat.completions.create() with a decorator that appends the trust score as a key in the returned response. This workflow requires only a one-time setup, after which the rest of your existing code runs unchanged:
import functools

def add_trust_scoring(tlm_instance):
    """Decorator factory that creates a trust scoring decorator."""
    def trust_score_decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            response = fn(**kwargs)
            score_result = tlm_instance.score(response=response, **kwargs)
            response.tlm_metadata = score_result
            return response
        return wrapper
    return trust_score_decorator
Then, we decorate the OpenAI client so that your existing code automatically gets trust scores:
tlm = TLMChatCompletion(quality_preset="medium", options={"model": "azure/gpt-4.1-mini", "log": ["explanation"]})
client = AzureOpenAI(
    api_version="<your-api-version>",
    azure_endpoint="<your-azure-endpoint>",
    api_key="<your-azure-api-key>",
)
client.chat.completions.create = add_trust_scoring(tlm)(client.chat.completions.create)
response = client.chat.completions.create(**openai_kwargs)
response
print(f"Response: {response.choices[0].message.content}")
print(f"TLM Score: {response.tlm_metadata['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {response.tlm_metadata['log']['explanation']}")
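If you want to sanity-check the decorator's behavior without a live endpoint, you can exercise it against stub objects. The StubTLM class and stub_create function below are hypothetical stand-ins, and the decorator is restated from above so this cell is self-contained:

```python
import functools
from types import SimpleNamespace

def add_trust_scoring(tlm_instance):
    """Decorator factory that creates a trust scoring decorator (same as above)."""
    def trust_score_decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            response = fn(**kwargs)
            score_result = tlm_instance.score(response=response, **kwargs)
            response.tlm_metadata = score_result
            return response
        return wrapper
    return trust_score_decorator

class StubTLM:
    """Stand-in for TLMChatCompletion that returns a fixed score."""
    def score(self, response=None, **kwargs):
        return {"trustworthiness_score": 0.99}

def stub_create(**kwargs):
    """Stand-in for client.chat.completions.create, returning a response-shaped object."""
    return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content="Paris"))])

decorated_create = add_trust_scoring(StubTLM())(stub_create)
resp = decorated_create(model="stub-model", messages=[])
# resp now carries resp.tlm_metadata alongside the usual response fields
```

This also shows why the wrapper accepts keyword arguments only: the same kwargs passed to create() are forwarded to score() so TLM sees the exact prompt that produced the response.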
Using OpenAI client instead of AzureOpenAI (click to expand)
The only difference would again be that your client is instantiated differently:
client = OpenAI()
client.chat.completions.create = add_trust_scoring(tlm)(client.chat.completions.create)
instead of:
client = AzureOpenAI(
    api_version="<your-api-version>",
    azure_endpoint="<your-azure-endpoint>",
    api_key="<your-azure-api-key>",
)
client.chat.completions.create = add_trust_scoring(tlm)(client.chat.completions.create)
The code to score this response using TLM remains identical. Full code sample below:
import functools

def add_trust_scoring(tlm_instance):
    """Decorator factory that creates a trust scoring decorator."""
    def trust_score_decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            response = fn(**kwargs)
            score_result = tlm_instance.score(response=response, **kwargs)
            response.tlm_metadata = score_result
            return response
        return wrapper
    return trust_score_decorator
tlm = TLMChatCompletion(quality_preset="medium", options={"model": "azure/gpt-4.1-mini", "log": ["explanation"]})
client = OpenAI()
client.chat.completions.create = add_trust_scoring(tlm)(client.chat.completions.create)
response = client.chat.completions.create(**openai_kwargs)
The above workflows allow you to continue using your own LLM infrastructure to generate responses, and you simply add Cleanlab as an extra step to score their trustworthiness. Your core AI system remains the same as before, without changes to your existing code. Alternatively, you can avoid managing any LLM infrastructure via the workflow below, where Cleanlab manages the LLM calls to produce responses as well.
Workflow 3: Use Cleanlab to Generate and Score Responses
You can point your LLM client directly to Cleanlab's infrastructure. This approach generates responses using Cleanlab's backend while simultaneously providing trustworthiness scores.
OpenAI Client
First we demonstrate how to use the OpenAI client with TLM. Here, you can replace the base URL with your actual TLM service endpoint, and then use the chat.completions.create() method as you normally would.
If your existing code uses AzureOpenAI client instead of OpenAI client, simply make the following replacements in your code:
- from openai import AzureOpenAI -> from openai import OpenAI
- client = AzureOpenAI(...) -> client = OpenAI(...), using the arguments specified below
The rest of this section should work with your existing code, as the API interface and input/output types are the same between OpenAI and AzureOpenAI.
client = OpenAI(
    api_key="<your-api-key>",  # replace with your Azure OpenAI key
    base_url="http://example.customer.com:8080/api"  # replace with your TLM service URL
)
response = client.chat.completions.create(
    model="azure/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "quality_preset": "low",
        "options": {"log": ["explanation"]}
    }
)
response
The extra_body argument contains additional TLM configurations. For all supported inputs, view the valid arguments below.
print(f"Response: {response.choices[0].message.content}")
print(f"TLM Score: {response.tlm_metadata['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {response.tlm_metadata['log']['explanation']}")
Adding a decorator to pass in TLM configurations via extra_body
Here, we demonstrate how to decorate your call to openai.chat.completions.create() so that the extra_body argument is automatically added to all subsequent calls to the create() method. After this initial setup, the rest of your existing code requires zero changes.
import functools

def add_extra_body(tlm_kwargs):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            kwargs["extra_body"] = tlm_kwargs
            return fn(*args, **kwargs)
        return wrapper
    return decorator
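You can verify the injection behavior of this decorator without a live endpoint by wrapping a stub function that simply echoes its keyword arguments (echo_create is a hypothetical stand-in, and the decorator is restated so the check is self-contained):

```python
import functools

def add_extra_body(tlm_kwargs):
    # Same decorator as above, restated so this check is self-contained
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            kwargs["extra_body"] = tlm_kwargs
            return fn(*args, **kwargs)
        return wrapper
    return decorator

def echo_create(**kwargs):
    """Stand-in for create() that just echoes its keyword arguments."""
    return kwargs

wrapped = add_extra_body({"quality_preset": "low"})(echo_create)
call_kwargs = wrapped(model="gpt-4o-mini", messages=[])
# call_kwargs now includes the injected extra_body alongside the caller's kwargs
```

One design note: as written, the decorator overwrites any extra_body the caller passes explicitly; if you need per-call overrides, merge the dictionaries inside wrapper instead of assigning.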
Similar to above, we can decorate the OpenAI client. After this monkey-patch, the code below is functionally equivalent to the version above where we specified extra_body in each create() call -- this lets you use your existing code with minimal changes.
tlm_kwargs = {"quality_preset": "low", "options": {"log": ["explanation"]}}
client.chat.completions.create = add_extra_body(tlm_kwargs)(client.chat.completions.create)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
response
Azure AI Inference Client
You can also use the azure-ai-inference client by pointing it to the TLM service endpoint. It can be called in a similar way:
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
azure_client = ChatCompletionsClient(
    endpoint="http://example.customer.com:8080/api",  # replace with your TLM service URL
    credential=AzureKeyCredential("<your-api-key>"),  # replace with your Azure OpenAI key
)
response = azure_client.complete(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    model_extras={
        "quality_preset": "low",
        "options": {"log": ["explanation"]}
    }
)
response
Note that the extra TLM options are now passed in via the model_extras argument (instead of the extra_body argument used when invoking TLM through the OpenAI client).
Input Arguments to TLM
These are optional TLM configurations you can specify either when initializing the TLMChatCompletion object, or in the extra_body argument to the OpenAI API client.
- quality_preset (default = "medium"): a preset configuration to control the quality of TLM responses and trustworthiness scores vs. latency/costs. The "medium" preset produces more reliable trustworthiness scores than "low", while the "base" preset provides the lowest possible latency/cost. Higher presets have increased runtime and cost; reduce your preset if you see token-limit errors.
- options: a dictionary of configuration options for TLM. Inputs include:
  - model (default = "gpt-4.1-mini"): underlying base LLM to use (better models yield better results, faster models yield faster results). Note that if you are using the openai.chat.completions.create() API, you should provide the model name there instead of in this options dictionary.
  - log (default = []): specify additional logs or metadata that TLM should return. Valid options include:
    - explanation: get explanations of why a response is scored with low trustworthiness
  - model_provider: a dictionary specifying the endpoint that LLM requests are sent to. Valid keys include:
    - api_base: the base URL endpoint for the LLM service
    - api_key: the corresponding API key to authenticate with the endpoint specified in api_base
    - api-version: the API version to use
    - provider: the provider name; should be one of the providers supported by litellm (e.g. "azure", "openai", "anthropic", "cohere")
Using Custom Endpoints
The model_provider parameter allows you to specify custom API endpoints for your LLM services. This is particularly useful when you want to route requests through specific endpoints for each request. Below are examples showing how to configure TLM to work with different endpoints.
When using the TLMChatCompletion object (workflows 1 / 2):
tlm = TLMChatCompletion(
    quality_preset="medium",
    options={
        "model": "azure/gpt-4.1-mini",
        "log": ["explanation"],
        "model_provider": {
            "api_base": "<your-api-base>",
            "api_key": "<your-api-key>"
        }
    }
)
tlm.score(...)
When using the OpenAI / Azure AI Inference client (workflow 3):
response = client.chat.completions.create(
    model="azure/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "quality_preset": "low",
        "options": {
            "log": ["explanation"],
            "model_provider": {
                "api_base": "<your-api-base>",
                "api_key": "<your-api-key>"
            }
        }
    }
)
Getting Cheaper / Faster Results
The default TLM settings are not latency-optimized because they have to remain effective across all possible LLM use-cases. For your specific use-case, you can greatly improve latency without compromising results. Strategy: first run TLM with default settings to see what results look like over a dataset from your use-case; once results look promising, adjust the TLM preset/options/model to reduce latency for your application.
- You can stream in a response from any (fast) LLM you are using, and then use TLMChatCompletion.score to subsequently stream in the trustworthiness score for the response. If you run TLM with a lower quality_preset and a cheaper model, the additional cost/runtime of trustworthiness scoring can be only a fraction of the cost/runtime of producing the response with your own LLM.
- Reduce the quality_preset setting (e.g. to "low" or "base").
- Specify options to further reduce TLM runtimes, e.g. by changing model to a faster base LLM (such as gpt-4.1-nano).
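Putting these levers together, a latency-optimized configuration might look like the following sketch (whether gpt-4.1-nano is available depends on the models enabled in your VPC deployment):

```python
# Latency-optimized TLM settings: lowest-cost preset plus a faster base LLM.
# Validate score quality on a sample of your data before adopting these.
fast_tlm_kwargs = {
    "quality_preset": "base",
    "options": {"model": "gpt-4.1-nano"},
}
```

You would pass these as TLMChatCompletion(**fast_tlm_kwargs) in workflows 1/2, or as the extra_body dictionary in workflow 3.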
Running on Batches and Managing Rate Limits
When processing large datasets, here are some tips to handle rate limits and implement proper batching strategies:
Prevent hitting rate limits
- Process data in small batches (e.g. 10-50 requests at a time)
- Add sleep intervals between batches (e.g. time.sleep(1)) to stay under rate limits
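The batch-slicing step itself is simple; a small chunked helper (a generic utility sketch, not part of cleanlab-tlm) shows the idea:

```python
def chunked(seq, size):
    """Yield successive batches of at most `size` items from seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# 25 inputs split into batches of 10 -> batches of 10, 10, and 5 items
batches = list(chunked(list(range(25)), 10))
```

You would sleep between iterations of this generator when sending each batch to the TLM service.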
Handling errors
- Save partial results frequently to avoid losing progress
- Consider using a try/except block to catch errors, and implement retry logic when rate limits are hit
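The retry-with-backoff step on its own can be sketched as follows; the sleep function is injectable here purely so the logic can be exercised without waiting, and flaky is a hypothetical stand-in for a rate-limited LLM call:

```python
import time

def call_with_retries(fn, retries=3, backoff=2, sleep=time.sleep):
    """Call fn(); on failure, wait backoff**attempt seconds and retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(backoff ** attempt)

calls = {"count": 0}
def flaky():
    """Hypothetical call that fails twice (e.g. rate-limited) then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limit")
    return "ok"

result = call_with_retries(flaky, sleep=lambda seconds: None)  # skip real sleeping
```

The full helper functions below apply this same pattern to actual chat.completions.create calls, returning an error record instead of raising once retries are exhausted.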
Here are some sample helper functions that could help with batching:
import time
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed
client = OpenAI(
    api_key="<your-api-key>",  # replace with your Azure OpenAI key
    base_url="http://example.customer.com:8080/api"  # replace with your TLM service URL
)
def invoke_llm_with_retries(openai_kwargs, retries=3, backoff=2):
    attempt = 0
    while attempt <= retries:
        try:
            # the code to invoke the LLM goes here, feel free to modify
            response = client.chat.completions.create(**openai_kwargs)
            return {
                "response": response.choices[0].message.content,
                "trustworthiness_score": response.tlm_metadata["trustworthiness_score"],
                "raw_completion": response
            }
        except Exception as e:
            if attempt == retries:
                return {"error": str(e), "input": openai_kwargs}
            sleep_time = backoff ** attempt
            time.sleep(sleep_time)
            attempt += 1
def run_batch(batch_data, batch_size=20, max_threads=8, sleep_time=5):
    results = []
    for i in tqdm(range(0, len(batch_data), batch_size)):
        data = batch_data[i:i + batch_size]
        batch_results = [None] * len(data)
        with ThreadPoolExecutor(max_workers=max_threads) as executor:
            future_to_idx = {executor.submit(invoke_llm_with_retries, d): idx for idx, d in enumerate(data)}
            for future in as_completed(future_to_idx):
                idx = future_to_idx[future]
                batch_results[idx] = future.result()
        results.extend(batch_results)
        # sleep to prevent hitting rate limits
        if i + batch_size < len(batch_data):
            time.sleep(sleep_time)
    return results
sample_input = {
    "model": "gpt-4.1-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
}
sample_batch = [sample_input] * 10
run_batch(sample_batch)
More information about handling rate limits can be found in this OpenAI cookbook.