Integrate Cleanlab with OpenAI Agents

This tutorial demonstrates how to add real-time validation and trustworthiness scoring to AI Agents built with the OpenAI Agents SDK. With minimal changes to your existing OpenAI Agent code, you can detect problematic responses and automatically remediate them in real-time.

Setup

The Python packages required for this tutorial can be installed using pip:

%pip install --upgrade cleanlab-codex openai-agents tavily-python

This tutorial requires a Cleanlab API key. Get one here.

import os
os.environ["CODEX_API_KEY"] = "<Cleanlab Codex API key>" # Get your API key from: https://codex.cleanlab.ai/
os.environ["OPENAI_API_KEY"] = "<OpenAI API key>" # Get API key from: https://platform.openai.com/signup
os.environ["TAVILY_API_KEY"] = "<TAVILY API KEY>" # for using a web search tool (get your free API key from Tavily)
from cleanlab_codex.client import Client
from tavily import TavilyClient

Overview of this tutorial

This tutorial showcases using Cleanlab’s validation hooks to add real-time validation to OpenAI Agents.

We’ll demonstrate five key scenarios:

  1. Conversational Chat Response - Basic agent interaction with validation
  2. Tool Call Response - Agent response using tools with validation
  3. Bad AI Response - How Cleanlab detects and scores problematic responses
  4. Expert Answer Response - A deterministic remediation to problematic responses
  5. Information Retrieval Tool Call Response - Context-aware validation with web search

Create Cleanlab Project

To use the Cleanlab AI Platform for validation, we must first create a Project. Here we assume no (question, answer) pairs have been added to the Project yet.

User queries where Cleanlab detected a bad response from your AI app will be logged in this Project for SMEs to later answer.

# Create a Cleanlab project
client = Client()

project = client.create_project(
    name="OpenAI Agent with Cleanlab Validation Tutorial",
    description="Tutorial demonstrating validation of an OpenAI Agent with Cleanlab hooks"
)

Example Use Case: Bank Loan Customer Support

We’ll build a customer support agent for bank loans to demonstrate validation scenarios.

Let’s define tools representing different response quality levels:

  • a good tool that returns reasonable information
  • a bad tool that returns problematic information
  • a web search tool that provides additional context to the Agent

Note: The web search tool follows the example information retrieval function defined in the Strands Web Search tutorial.

Optional: Tool definitions for demonstration scenarios
from agents.tool import function_tool

# ============ Good Tool: Returns reasonable information ============
@function_tool
def get_payment_schedule(account_id: str) -> str:
    """Get payment schedule for an account."""
    payment_schedule = f"""Account {account_id} has:
Bi-weekly payment plan
Upcoming payment scheduled for next Friday
"""
    return payment_schedule

# ============ Bad Tool: Returns problematic information ============
@function_tool
def get_total_amount_owed(account_id: str) -> dict:
    """A tool that simulates fetching the total amount owed for a loan.
    **Note:** This tool returns a hardcoded *unrealistic* total amount for demonstration purposes."""
    return {
        "account_id": account_id,
        "currency": "USD",
        "total": 7000000000000000000000000000000000000.00,
    }

# ============ Web Search Tool: Provides context for the Agent ============
@function_tool
def web_search(
    query: str, time_range: str | None = None, include_domains: str | None = None
) -> str:
    """Perform a web search. Returns the search results as a string, with the title, url, and content of each result ranked by relevance.

    Args:
        query (str): The search query to be sent for the web search.
        time_range (str | None, optional): Limits results to content published within a specific timeframe.
            Valid values: 'd' (day - 24h), 'w' (week - 7d), 'm' (month - 30d), 'y' (year - 365d).
            Defaults to None.
        include_domains (list[str] | None, optional): A list of domains to restrict search results to.
            Only results from these domains will be returned. Defaults to None.

    Returns:
        formatted_results (str): The web search results
    """

    def format_search_results_for_agent(search_results: dict) -> str:
        """Format search results into a numbered context string for the agent."""
        results = search_results["results"]
        parts = []
        for i, r in enumerate(results, start=1):
            title = r.get("title", "").strip()
            content = r.get("content", "").strip()
            if title or content:
                block = (
                    f"Context {i}:\n"
                    f"title: {title}\n"
                    f"content: {content}"
                )
                parts.append(block)
        return "\n\n".join(parts)

    client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
    formatted_results = format_search_results_for_agent(
        client.search(
            query=query,
            max_results=2,
            time_range=time_range,
            include_domains=include_domains,
        )
    )
    return formatted_results

Session Setup and Context

OpenAI Agents support persistent conversation history through sessions. We’ll create a session and context to track our conversation and validation results.

Note: For Cleanlab integration, you can use any pre-defined Context class; just make sure to pass in session_id if you want conversations to be tracked in the UI.

import uuid
from dataclasses import dataclass
from agents.memory.sqlite_session import SQLiteSession

@dataclass
class ExampleContext:
    session_id: str | None  # Add this for session tracking in Cleanlab UI
    environment: str = "development"

# Create session for conversation history
session = SQLiteSession(session_id="cleanlab_loan_support_demo")

# Create your context with session info
example_context = ExampleContext(
    session_id=session.session_id,  # Get from your session
    environment="production"
)

OpenAI Agents Integration

To add response validation to your OpenAI Agent, add a Cleanlab hook that intercepts LLM responses, validates them in real time, and can replace bad responses with fallbacks or expert answers where appropriate.

Integration steps:

  1. Create your Agent (optionally Session, Context, …)
  2. Create the validation hook with CleanlabHook
  3. Use the hook when running the agent with Runner.run() like so:
    result = await Runner.run(
        starting_agent=agent,
        input=query,
        hooks=cleanlab_hook,
        session=session,
        context=example_context
    )

Context-Aware Validation for Information Retrieval

For agents with tools that retrieve information (e.g., RAG, web search, database queries), Cleanlab can use this retrieved content as context during validation. This enables more accurate evaluation by:

  • Checking if the AI response is grounded in the retrieved information
  • Measuring context sufficiency (whether enough information was retrieved)
  • Detecting hallucinations by comparing the response against actual context

To enable this, specify the names of your context-providing tools in the context_retrieval_tools parameter during hook initialization.

from agents import Agent, Runner

SYSTEM_PROMPT = "You are a customer service agent for bank loans. Be polite and concise in your responses. Always rely on the tool answers."

agent = Agent(
    name="BankLoanSupport",
    instructions=SYSTEM_PROMPT,
    tools=[get_payment_schedule, get_total_amount_owed, web_search],
    model="gpt-4o-mini",
)

### New code to add for Cleanlab API ###
from cleanlab_codex.experimental.openai_agents.cleanlab_hook import CleanlabHook

FALLBACK_RESPONSE = "Sorry I am unsure. You can try rephrasing your request."

cleanlab_hook = CleanlabHook(
    cleanlab_project=project,
    fallback_response=FALLBACK_RESPONSE,
    skip_validating_tool_calls=True,
    context_retrieval_tools=["get_payment_schedule", "get_total_amount_owed", "web_search"],  # Specify tool(s) that provide context
    validate_every_response=True
)

Scenario 1: Conversational Chat Response

Let’s start with a basic agent interaction without tools.

The Cleanlab hook validates the response in real-time.

Optional: Helper method to run the agent and print Cleanlab validation results

def display_openai_validation_results(final_output, initial_llm_response, validation_result, query):
    """Helper function to display OpenAI Agent validation results with consistent formatting"""
    print("-" * 30)
    print("Response Delivered to User:")
    print("-" * 30)
    print()
    print(final_output)
    print()
    print()
    print("=== Internal Trace (not shown to user) ===")
    print()

    if validation_result:
        # Group core detection metrics
        should_guardrail = validation_result.should_guardrail
        escalated_to_sme = validation_result.escalated_to_sme
        is_bad_response = validation_result.is_bad_response
        expert_answer_available = bool(validation_result.expert_answer)

        print("-" * 30)
        if should_guardrail or expert_answer_available:
            print("Original AI Response (not delivered to user):")
            print("-" * 30)
            print()
            print(initial_llm_response)
            print()
        else:
            print("Original AI Response:")
            print("-" * 30)
            print()
            print("[Same as \"Response Delivered to User\"]")
            print()

        print("-" * 30)
        print("Cleanlab Analysis:")
        print("-" * 30)
        print()

        print(f"Should Guardrail: {should_guardrail}")
        print(f"Escalated to SME: {escalated_to_sme}")
        print(f"Is Bad Response: {is_bad_response}")
        print(f"Expert Answer Available: {expert_answer_available}")

        # Show evaluation scores if available
        if getattr(validation_result, 'eval_scores', None) is not None:
            eval_scores = validation_result.eval_scores
            print()

            # Access trustworthiness score
            if 'trustworthiness' in eval_scores:
                trust_score = eval_scores['trustworthiness'].score
                print(f"Trustworthiness: {trust_score:.3f} (triggered_guardrail = {eval_scores['trustworthiness'].triggered_guardrail})")

            # Access response helpfulness score
            if 'response_helpfulness' in eval_scores:
                help_score = eval_scores['response_helpfulness'].score
                print(f"Response Helpfulness: {help_score:.3f} (triggered_guardrail = {eval_scores['response_helpfulness'].triggered_guardrail})")

            # Access context sufficiency score (for retrieval scenarios)
            if 'context_sufficiency' in eval_scores:
                context_score = eval_scores['context_sufficiency'].score
                print(f"Context Sufficiency: {context_score:.3f} (triggered_guardrail = {eval_scores['context_sufficiency'].triggered_guardrail})")

        # Show expert answer if available
        if expert_answer_available:
            print()
            print("-" * 30)
            print("Expert Answer Available:")
            print("-" * 30)
            print()
            print(validation_result.expert_answer)
            print()

        # Show validation status summary
        if should_guardrail or is_bad_response or expert_answer_available:
            print()
            if expert_answer_available:
                print("💡 EXPERT ANSWER AVAILABLE: Expert answer was available and delivered to user instead of Original AI Response")
            elif should_guardrail:
                print("⚠️ GUARDRAIL TRIGGERED: Original AI Response was blocked and a fallback response was delivered to user")
            if escalated_to_sme and (not expert_answer_available) and (not should_guardrail):
                print("🔄 ESCALATED: This case was flagged as problematic for subject matter expert review in the Cleanlab Project Interface")
        else:
            print()
            print("✅ VALIDATION PASSED: Original AI Response delivered to user")

    else:
        print("No validation results available")

def print_last_n_messages(message_history, n=3):
    """Pretty print the last n messages from the conversation history"""
    for msg in message_history[-n:]:
        msg_type = msg.get('type', 'unknown')
        role = msg.get('role', msg_type)
        arguments = msg.get('arguments', '')
        output = msg.get('output', arguments)
        if len(str(msg.get('content', ''))) > 100:
            content = str(msg.get('content', '')[0].get('text', ''))[:100] + "..."
        else:
            content = str(msg.get('content', output))
        print(f"- {role}: {content}")

async def run_with_validation(query: str):
    """Run agent and display formatted validation results"""
    result = await Runner.run(
        starting_agent=agent,
        input=query,
        hooks=cleanlab_hook,
        session=session,
        context=example_context
    )

    validation_result = getattr(example_context, 'latest_cleanlab_validation_result', None)
    initial_llm_response = getattr(example_context, 'latest_initial_response_text', None)
    display_openai_validation_results(result.final_output, initial_llm_response, validation_result, query)

    return result

run_result = await run_with_validation("What is a credit score?")
------------------------------
Response Delivered to User:
------------------------------

A credit score is a numerical representation of your creditworthiness, typically ranging from 300 to 850. It is based on your credit history and helps lenders determine the likelihood that you will repay borrowed money. Factors that influence your credit score include payment history, amounts owed, length of credit history, new credit, and types of credit used. A higher score generally indicates better creditworthiness, which can lead to better loan terms and interest rates.


=== Internal Trace (not shown to user) ===

------------------------------
Original AI Response:
------------------------------

[Same as "Response Delivered to User"]

------------------------------
Cleanlab Analysis:
------------------------------

Should Guardrail: False
Escalated to SME: False
Is Bad Response: False
Expert Answer Available: False

Trustworthiness: 0.937 (triggered_guardrail = False)
Response Helpfulness: 0.998 (triggered_guardrail = False)

✅ VALIDATION PASSED: Original AI Response delivered to user

Without Cleanlab: The agent would deliver its response directly to the user without any validation or safety checks.

With Cleanlab: The above response is automatically validated for trustworthiness and helpfulness before reaching the user. In this case, Cleanlab found the response trustworthy, so it allowed the original response to be delivered to the user.

Scenario 2: Tool Call Response

Now let’s test an agent interaction that uses tools. Cleanlab validation checks both tool usage and the final response.

run_result = await run_with_validation("What is the payment schedule for account ID 12345?")
------------------------------
Response Delivered to User:
------------------------------

The payment schedule for account ID 12345 is as follows:

- **Payment Plan:** Bi-weekly
- **Next Payment:** Scheduled for next Friday.

If you have any other questions, feel free to ask!


=== Internal Trace (not shown to user) ===

------------------------------
Original AI Response:
------------------------------

[Same as "Response Delivered to User"]

------------------------------
Cleanlab Analysis:
------------------------------

Should Guardrail: False
Escalated to SME: False
Is Bad Response: False
Expert Answer Available: False

Trustworthiness: 1.000 (triggered_guardrail = False)
Response Helpfulness: 0.997 (triggered_guardrail = False)

✅ VALIDATION PASSED: Original AI Response delivered to user

Without Cleanlab: The agent would deliver its tool-based response directly to the user without validation.

With Cleanlab: The response is validated even when tools are used. Cleanlab evaluated both which tools were called and the final response, found them highly trustworthy, and delivered the original response to the user.

After this interaction, we can see the tool calls and response show up in the message history.

# Show conversation history
message_history = await session.get_items()
print(f"\nConversation now has {len(message_history)} messages")
print("\nLast 4 messages:")
print_last_n_messages(message_history, n=4)

Conversation now has 6 messages

Last 4 messages:
- user: What is the payment schedule for account ID 12345?
- function_call: {"account_id":"12345"}
- function_call_output: Account 12345 has:
Bi-weekly payment plan
Upcoming payment scheduled for next Friday

- assistant: The payment schedule for account ID 12345 is as follows:

- **Payment Plan:** Bi-weekly
- **Next Pay...

Scenario 3: Bad AI Response

When an Agent calls an incorrect tool or summarizes problematic information returned from the tool call, Cleanlab automatically:

  1. Detects the problematic response with a low trustworthiness score
  2. Blocks it from reaching the user
  3. Substitutes a safe fallback response
  4. Logs the interaction for expert review

Let’s see this in action:

run_result = await run_with_validation("How much do I owe on my loan for account ID 12345?")
------------------------------
Response Delivered to User:
------------------------------

Sorry I am unsure. You can try rephrasing your request.


=== Internal Trace (not shown to user) ===

------------------------------
Original AI Response (not delivered to user):
------------------------------

You currently owe **$7,000,000,000,000,000,000,000,000,000,000,000,000** on your loan for account ID 12345. If you have any further questions or need assistance, please let me know!

------------------------------
Cleanlab Analysis:
------------------------------

Should Guardrail: True
Escalated to SME: True
Is Bad Response: True
Expert Answer Available: False

Trustworthiness: 0.076 (triggered_guardrail = True)
Response Helpfulness: 0.918 (triggered_guardrail = False)

⚠️ GUARDRAIL TRIGGERED: Original AI Response was blocked and a fallback response was delivered to user

Without Cleanlab: The user would receive the problematic response: “You currently owe $7,000,000,000,000,000,000,000,000,000,000,000,000 on your loan for account ID 12345…” - clearly an unrealistic and harmful amount that could confuse or alarm the user.

With Cleanlab: Cleanlab’s validation detects the unrealistic amount, assigns a very low trustworthiness score, blocks the problematic response, and instead delivers a configurable fallback response to the user.

After this interaction, we can see the conversation history is updated with the safe fallback response.

# Show updated conversation history
message_history = await session.get_items()
print(f"\nConversation now has {len(message_history)} messages")
print("\nLastest message:")
print_last_n_messages(message_history, n=4)

Conversation now has 10 messages

Last 4 messages:
- user: How much do I owe on my loan for account ID 12345?
- function_call: {"account_id":"12345"}
- function_call_output: {'account_id': '12345', 'currency': 'USD', 'total': 7e+36}
- assistant: Sorry I am unsure. You can try rephrasing your request....

Scenario 4: Expert Answer Response

After setting up the project in the Cleanlab UI, you can add expert answers to common queries; these answers are deterministically returned to the user in place of the Agent's response.

Consider the following user query:

run_result = await run_with_validation("How do I add a payee to my Mortgagelender loan? Give me specific steps.")
------------------------------
Response Delivered to User:
------------------------------

Sorry I am unsure. You can try rephrasing your request.


=== Internal Trace (not shown to user) ===

------------------------------
Original AI Response (not delivered to user):
------------------------------

To add a payee to your Mortgage lender loan, follow these general steps:

1. **Log Into Your Account:**
- Visit your Mortgage lender's website and log into your account using your credentials.

2. **Navigate to Payment Options:**
- Look for a section labeled "Payments" or "Manage Payees."

3. **Select "Add Payee":**
- Click on the option to add a new payee or manage payees.

4. **Enter Payee Information:**
- Input the required details for the new payee, such as name, address, and payment details.

5. **Verify Information:**
- Review the information you've entered to ensure it's accurate.

6. **Save Changes:**
- Click the "Save" or "Add Payee" button to finalize the addition.

7. **Confirmation:**
- You may receive a confirmation message or email indicating that the payee has been successfully added.

If you encounter any issues or need specific guidance for your lender, it's best to contact their customer service directly.

------------------------------
Cleanlab Analysis:
------------------------------

Should Guardrail: True
Escalated to SME: True
Is Bad Response: True
Expert Answer Available: False

Trustworthiness: 0.213 (triggered_guardrail = True)
Response Helpfulness: 0.997 (triggered_guardrail = False)

⚠️ GUARDRAIL TRIGGERED: Original AI Response was blocked and a fallback response was delivered to user

Since we did not give our Agent specific context on how to do this action, the steps outlined in the Original AI Response are hallucinated.

As expected, the Trustworthiness score is low and the (query, answer) pair is marked as an Issue in the Web UI.

Consider adding an expert answer in the Web UI for the question above, with the proper steps such as:

1. Open the Mortgagelender site or app and sign into your profile.
2. Go to the section where you handle billing or transfer details.
3. Look for an option to set up a new recipient for payments.
4. Fill in the recipient’s required details (name, account info, etc.).
5. Confirm the details and complete the setup.
6. Wait for a notice or email confirming the payee has been linked.
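
The Web UI is the documented way to add this expert answer. If your version of the cleanlab-codex SDK also exposes a programmatic remediation API, it might look roughly like the hypothetical sketch below; the method name and signature are assumptions, so verify against the cleanlab-codex reference before relying on it.

# Hypothetical sketch: adding the expert answer via the SDK instead of the Web UI.
# The add_remediation method name/signature is an assumption -- check your cleanlab-codex version.
expert_answer = """1. Open the Mortgagelender site or app and sign into your profile.
2. Go to the section where you handle billing or transfer details.
3. Look for an option to set up a new recipient for payments.
4. Fill in the recipient's required details (name, account info, etc.).
5. Confirm the details and complete the setup.
6. Wait for a notice or email confirming the payee has been linked."""

project.add_remediation(
    question="How do I add a payee to my Mortgagelender loan? Give me specific steps.",
    answer=expert_answer,
)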

Now, when we re-run the exact same query, the expert answer is used, immediately improving the accuracy of the Agent's responses.

run_result = await run_with_validation("How do I add a payee to my Mortgagelender loan? Give me specific steps.")
------------------------------
Response Delivered to User:
------------------------------

1. Open the Mortgagelender site or app and sign into your profile.

2. Go to the section where you handle billing or transfer details.

3. Look for an option to set up a new recipient for payments.

4. Fill in the recipient’s required details (name, account info, etc.).

5. Confirm the details and complete the setup.

6. Wait for a notice or email confirming the payee has been linked.


=== Internal Trace (not shown to user) ===

------------------------------
Original AI Response (not delivered to user):
------------------------------

While I don't have specific instructions for adding a payee to a Mortgagelender loan, typically the process involves the following steps:

1. **Log in to Your Account:** Visit the Mortgagelender website and log into your account.

2. **Navigate to Payments:** Look for a section related to payments or payees.

3. **Add Payee:** There should be an option to add a new payee or manage payees. Click on it.

4. **Input Payee Information:** Enter the required details for the new payee, such as name, address, and account number.

5. **Save Changes:** Review the information and save the changes.

6. **Confirm Addition:** You may receive a confirmation that the payee has been successfully added.

For specific instructions, please refer to Mortgagelender’s customer support or help section on their website. If you need any more assistance, feel free to ask!

------------------------------
Cleanlab Analysis:
------------------------------

Should Guardrail: True
Escalated to SME: True
Is Bad Response: True
Expert Answer Available: True

Trustworthiness: 0.366 (triggered_guardrail = True)
Response Helpfulness: 0.995 (triggered_guardrail = False)

------------------------------
Expert Answer Available:
------------------------------

1. Open the Mortgagelender site or app and sign into your profile.

2. Go to the section where you handle billing or transfer details.

3. Look for an option to set up a new recipient for payments.

4. Fill in the recipient’s required details (name, account info, etc.).

5. Confirm the details and complete the setup.

6. Wait for a notice or email confirming the payee has been linked.


💡 EXPERT ANSWER AVAILABLE: Expert answer was available and delivered to user instead of Original AI Response

Without Cleanlab: The agent would deliver potentially inaccurate step-by-step instructions for adding a payee, which could mislead users or waste their time with incorrect procedures.

With Cleanlab: When no expert answer exists, problematic responses are blocked and replaced with fallback messages. When an expert answer is available, it deterministically replaces the agent’s potentially inaccurate response. In both cases, the problematic interaction is tracked and escalated to SMEs.

Scenario 5: Information Retrieval Tool Call Response

Now let’s ask a question that requires our Agent to use web search, which we specified in our context_retrieval_tools list.

To flag answers based on their provided context, go to the “Evaluations” section of your Project UI, click the edit button on the search_failure Eval, and toggle the “should escalate” or “should guardrail” options at the bottom.

What happens with context-aware validation:

  • Tool results are automatically passed to Cleanlab as context
  • Cleanlab can evaluate whether the AI response is grounded in the retrieved information
  • You’ll see a “Retrieved Context” section in the Cleanlab Project UI showing what information was available for validation
run_result = await run_with_validation("What are current mortgage interest rates?")
------------------------------
Response Delivered to User:
------------------------------

As of today, the current average mortgage interest rates are:

- **30-Year Fixed Mortgage Rate:** 6.43%

Rates can fluctuate, so it's always a good idea to check regularly or consult with your lender for the most accurate information. If you have any more questions, feel free to ask!


=== Internal Trace (not shown to user) ===

------------------------------
Original AI Response:
------------------------------

[Same as "Response Delivered to User"]

------------------------------
Cleanlab Analysis:
------------------------------

Should Guardrail: False
Escalated to SME: False
Is Bad Response: False
Expert Answer Available: False

Trustworthiness: 0.953 (triggered_guardrail = False)
Response Helpfulness: 0.997 (triggered_guardrail = False)

✅ VALIDATION PASSED: Original AI Response delivered to user

Context is now automatically extracted from the web search tool result and passed to Cleanlab validation, improving evaluation accuracy for information retrieval scenarios.

# Show the last few messages to see web search tool call and context
message_history = await session.get_items()
print(f"\nFinal conversation has {len(message_history)} messages")
print("\nLast 3 messages (showing web search interaction):")
print_last_n_messages(message_history, n=3)

Final conversation has 18 messages

Last 3 messages (showing web search interaction):
- function_call: {"query":"current mortgage interest rates","time_range":"w","include_domains":null}
- function_call_output: Context 1:
title: Mortgage Rates
content: Following several weeks of decline, mortgage rates inched up this week. Housing market activity continues to hold up with purchase and refinance applications

Context 2:
title: Compare current mortgage rates for today
content: For today, Tuesday, September 30, 2025, the current average 30-year fixed mortgage interest rate is 6.43%. If you're looking to refinance your current mortgage
- assistant: As of today, the current average mortgage interest rates are:

- **30-Year Fixed Mortgage Rate:** 6....
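
To confirm that the retrieved context was factored into validation, you can also read the context_sufficiency score from the latest validation result stored on the context object. The small sketch below reuses the same attributes accessed in the helper function earlier in this tutorial; the score only appears for runs where a context-providing tool was used:

# Inspect the context sufficiency score from the most recent validation result
validation_result = getattr(example_context, 'latest_cleanlab_validation_result', None)
if validation_result and getattr(validation_result, 'eval_scores', None):
    if 'context_sufficiency' in validation_result.eval_scores:
        context_eval = validation_result.eval_scores['context_sufficiency']
        print(f"Context Sufficiency: {context_eval.score:.3f} (triggered_guardrail = {context_eval.triggered_guardrail})")
    else:
        print("No context_sufficiency score was returned for this run")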

How Cleanlab Validation Works

Cleanlab evaluates AI responses across multiple dimensions (trustworthiness, helpfulness, reasoning quality, etc.) and provides scores, guardrail decisions, and expert remediation.
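
Conceptually, the remediation behavior demonstrated in the scenarios above follows the pattern sketched below. This is a simplified illustration of the decision flow, not the hook's actual implementation:

# Simplified sketch of the remediation decision demonstrated in this tutorial
# (illustrative only -- CleanlabHook performs this internally):
def choose_delivered_response(original_response, validation_result, fallback_response):
    if validation_result.expert_answer:      # expert answer exists -> deterministic remediation
        return validation_result.expert_answer
    if validation_result.should_guardrail:   # guardrail triggered -> safe fallback
        return fallback_response
    return original_response                 # validation passed -> deliver original response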

For detailed information on Cleanlab’s validation methodology, see the Cleanlab documentation.

Summary

This tutorial demonstrated integrating Cleanlab validation with OpenAI Agents using validation hooks.

Key benefits:

  • Real-time validation during response generation
  • Automatic remediation with expert answers and fallbacks
  • Context-aware validation for retrieval-based agents
  • Session management for persistent conversation history
  • Minimal code changes to existing OpenAI Agent applications

Integration is simple:

  1. Create a CleanlabHook with your project
  2. Specify context_retrieval_tools for better validation
  3. Pass the hook to Runner.run() along with session and context
  4. Access validation results from the context object

The hook-based approach provides enterprise-grade safety with minimal code changes to your existing OpenAI Agent workflows.
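
As a condensed recap, here is the integration sketched end to end, reusing the agent, project, session, and example_context objects defined earlier in this tutorial:

from agents import Runner
from cleanlab_codex.experimental.openai_agents.cleanlab_hook import CleanlabHook

# 1. Create the validation hook for your Cleanlab Project
cleanlab_hook = CleanlabHook(
    cleanlab_project=project,
    fallback_response="Sorry I am unsure. You can try rephrasing your request.",
    context_retrieval_tools=["web_search"],  # tools whose results should be passed as context
    validate_every_response=True,
)

# 2. Run the agent with the hook (plus session and context for conversation tracking)
result = await Runner.run(
    starting_agent=agent,
    input="What is a credit score?",
    hooks=cleanlab_hook,
    session=session,
    context=example_context,
)

# 3. Access the validation result from the context object
validation_result = getattr(example_context, 'latest_cleanlab_validation_result', None)
print(result.final_output)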