Integrate Cleanlab with OpenAI Agents
This tutorial demonstrates how to add real-time validation and trustworthiness scoring to AI Agents built with the OpenAI Agents SDK. With minimal changes to your existing OpenAI Agent code, you can detect problematic responses and automatically remediate them in real-time.
Setup
The Python packages required for this tutorial can be installed using pip:
%pip install --upgrade cleanlab-codex openai-agents tavily-python
This tutorial requires a Cleanlab API key; you can get one at https://codex.cleanlab.ai/.
import os
os.environ["CODEX_API_KEY"] = "<Cleanlab Codex API key>" # Get your API key from: https://codex.cleanlab.ai/
os.environ["OPENAI_API_KEY"] = "<OpenAI API key>" # Get API key from: https://platform.openai.com/signup
os.environ["TAVILY_API_KEY"] = "<TAVILY API KEY>" # for using a web search tool (get your free API key from Tavily)
from cleanlab_codex.client import Client
from tavily import TavilyClient
Overview of this tutorial
This tutorial showcases using Cleanlab’s validation hooks to add real-time validation to OpenAI Agents.
We’ll demonstrate five key scenarios:
- Conversational Chat Response - Basic agent interaction with validation
- Tool Call Response - Agent response using tools with validation
- Bad AI Response - How Cleanlab detects and scores problematic responses
- Expert Answer Response - A deterministic remediation to problematic responses
- Information Retrieval Tool Call Response - Context-aware validation with web search
Create Cleanlab Project
To use the Cleanlab AI Platform for validation, we must first create a Project. Here we assume no (question, answer) pairs have been added to the Project yet.
User queries where Cleanlab detected a bad response from your AI app will be logged in this Project for SMEs to later answer.
# Create a Cleanlab project
client = Client()
project = client.create_project(
name="OpenAI Agent with Cleanlab Validation Tutorial",
description="Tutorial demonstrating validation of an OpenAI Agent with Cleanlab hooks"
)
Example Use Case: Bank Loan Customer Support
We’ll build a customer support agent for bank loans to demonstrate validation scenarios.
Let’s define tools representing different response quality levels:
- a good tool that returns reasonable information
- a bad tool that returns problematic information
- a web search tool that provides additional context to the Agent
Note: The web search tool follows the example information retrieval function defined in the Strands Web Search tutorial.
Optional: Tool definitions for demonstration scenarios
from agents.tool import function_tool
# ============ Good Tool: Returns reasonable information ============
@function_tool
def get_payment_schedule(account_id: str) -> str:
"""Get payment schedule for an account."""
payment_schedule = f"""Account {account_id} has:
Bi-weekly payment plan
Upcoming payment scheduled for next Friday
"""
return payment_schedule
# ============ Bad Tool: Returns problematic information ============
@function_tool
def get_total_amount_owed(account_id: str) -> dict:
"""A tool that simulates fetching the total amount owed for a loan.
**Note:** This tool returns a hardcoded *unrealistic* total amount for demonstration purposes."""
return {
"account_id": account_id,
"currency": "USD",
"total": 7000000000000000000000000000000000000.00,
}
# ============ Web Search Tool: Provides context for the Agent ============
@function_tool
def web_search(
    query: str, time_range: str | None = None, include_domains: list[str] | None = None
) -> str:
"""Perform a web search. Returns the search results as a string, with the title, url, and content of each result ranked by relevance.
Args:
query (str): The search query to be sent for the web search.
time_range (str | None, optional): Limits results to content published within a specific timeframe.
Valid values: 'd' (day - 24h), 'w' (week - 7d), 'm' (month - 30d), 'y' (year - 365d).
Defaults to None.
include_domains (list[str] | None, optional): A list of domains to restrict search results to.
Only results from these domains will be returned. Defaults to None.
Returns:
formatted_results (str): The web search results
"""
    def format_search_results_for_agent(search_results: dict) -> str:
"""Format search results into a numbered context string for the agent."""
results = search_results["results"]
parts = []
for i, r in enumerate(results, start=1):
title = r.get("title", "").strip()
content = r.get("content", "").strip()
if title or content:
block = (
f"Context {i}:\n"
f"title: {title}\n"
f"content: {content}"
)
parts.append(block)
return "\n\n".join(parts)
client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
formatted_results = format_search_results_for_agent(
client.search(
query=query,
max_results=2,
time_range=time_range,
include_domains=include_domains
)
)
return formatted_results
Session Setup and Context
OpenAI Agents support persistent conversation history through sessions. We’ll create a session and context to track our conversation and validation results.
Note: For Cleanlab integration, you can use any pre-defined Context class; just make sure to pass in session_id if you want conversations to be tracked in the UI.
import uuid
from dataclasses import dataclass
from agents.memory.sqlite_session import SQLiteSession
@dataclass
class ExampleContext:
session_id: str | None # Add this for session tracking in Cleanlab UI
environment: str = "development"
# Create session for conversation history
session = SQLiteSession(session_id="cleanlab_loan_support_demo")
# Create your context with session info
example_context = ExampleContext(
session_id=session.session_id, # Get from your session
environment="production"
)
OpenAI Agents Integration
To add response validation to your OpenAI agent, we add a Cleanlab hook that intercepts LLM responses, validates them in real-time, and can replace bad responses with fallbacks or expert answers where appropriate.
Integration steps:
- Create your Agent (optionally Session, Context, …)
- Create the validation hook with CleanlabHook
- Use the hook when running the agent with Runner.run() like so:
result = await Runner.run(
starting_agent=agent,
input=query,
hooks=cleanlab_hook,
session=session,
context=example_context
)
Context-Aware Validation for Information Retrieval
For agents with tools that retrieve information (e.g., RAG, web search, database queries), Cleanlab can use this retrieved content as context during validation. This enables more accurate evaluation by:
- Checking if the AI response is grounded in the retrieved information
- Measuring context sufficiency (whether enough information was retrieved)
- Detecting hallucinations by comparing the response against actual context
To enable this, specify the names of your context-providing tools in the context_retrieval_tools parameter during hook initialization.
from agents import Agent, Runner
SYSTEM_PROMPT = "You are a customer service agent for bank loans. Be polite and concise in your responses. Always rely on the tool answers."
agent = Agent(
name="BankLoanSupport",
instructions=SYSTEM_PROMPT,
tools=[get_payment_schedule, get_total_amount_owed, web_search],
model="gpt-4o-mini",
)
### New code to add for Cleanlab API ###
from cleanlab_codex.experimental.openai_agents.cleanlab_hook import CleanlabHook
FALLBACK_RESPONSE = "Sorry I am unsure. You can try rephrasing your request."
cleanlab_hook = CleanlabHook(
cleanlab_project=project,
fallback_response=FALLBACK_RESPONSE,
skip_validating_tool_calls=True,
context_retrieval_tools=["get_payment_schedule", "get_total_amount_owed", "web_search"], # Specify tool(s) that provide context
validate_every_response=True
)
Scenario 1: Conversational Chat Response
Let’s start with a basic agent interaction without tools.
The Cleanlab hook validates the response in real-time.
Optional: Helper method to run the agent and print Cleanlab validation results
def display_openai_validation_results(final_output, initial_llm_response, validation_result, query):
"""Helper function to display OpenAI Agent validation results with consistent formatting"""
print("-" * 30)
print("Response Delivered to User:")
print("-" * 30)
print()
print(final_output)
print()
print()
print("=== Internal Trace (not shown to user) ===")
print()
if validation_result:
# Group core detection metrics
should_guardrail = validation_result.should_guardrail
escalated_to_sme = validation_result.escalated_to_sme
is_bad_response = validation_result.is_bad_response
expert_answer_available = bool(validation_result.expert_answer)
print("-" * 30)
if should_guardrail or expert_answer_available:
print(f"Original AI Response (not delivered to user):")
print("-" * 30)
print()
print(initial_llm_response)
print()
else:
print(f"Original AI Response:")
print("-" * 30)
print()
print("[Same as \"Response Delivered to User\"]")
print()
print("-" * 30)
print("Cleanlab Analysis:")
print("-" * 30)
print()
print(f"Should Guardrail: {should_guardrail}")
print(f"Escalated to SME: {escalated_to_sme}")
print(f"Is Bad Response: {is_bad_response}")
print(f"Expert Answer Available: {expert_answer_available}")
# Show evaluation scores if available
if getattr(validation_result, 'eval_scores', None) is not None:
eval_scores = validation_result.eval_scores
print()
# Access trustworthiness score
if 'trustworthiness' in eval_scores:
trust_score = eval_scores['trustworthiness'].score
print(f"Trustworthiness: {trust_score:.3f} (triggered_guardrail = {eval_scores['trustworthiness'].triggered_guardrail})")
# Access response helpfulness score
if 'response_helpfulness' in eval_scores:
help_score = eval_scores['response_helpfulness'].score
print(f"Response Helpfulness: {help_score:.3f} (triggered_guardrail = {eval_scores['response_helpfulness'].triggered_guardrail})")
# Access context sufficiency score (for retrieval scenarios)
if 'context_sufficiency' in eval_scores:
context_score = eval_scores['context_sufficiency'].score
print(f"Context Sufficiency: {context_score:.3f} (triggered_guardrail = {eval_scores['context_sufficiency'].triggered_guardrail})")
# Show expert answer if available
if expert_answer_available:
print()
print("-" * 30)
print("Expert Answer Available:")
print("-" * 30)
print()
print(validation_result.expert_answer)
print()
# Show validation status summary
if should_guardrail or is_bad_response or expert_answer_available:
print()
if expert_answer_available:
print("💡 EXPERT ANSWER AVAILABLE: Expert answer was available and delivered to user instead of Original AI Response")
elif should_guardrail:
print("⚠️ GUARDRAIL TRIGGERED: Original AI Response was blocked and a fallback response was delivered to user")
if escalated_to_sme and (not expert_answer_available) and (not should_guardrail):
print("🔄 ESCALATED: This case was flagged as problematic for subject matter expert review in the Cleanlab Project Interface")
else:
print()
print("✅ VALIDATION PASSED: Original AI Response delivered to user")
else:
print("No validation results available")
def print_last_n_messages(message_history, n=3):
    """Pretty print the last n messages from the conversation history"""
    for msg in message_history[-n:]:
        msg_type = msg.get('type', 'unknown')
        role = msg.get('role', msg_type)
        arguments = msg.get('arguments', '')
        output = msg.get('output', arguments)
        content = msg.get('content', output)
        # Session items may store content as a list of blocks with a 'text' field
        if isinstance(content, list) and content and isinstance(content[0], dict):
            content = content[0].get('text', '')
        content = str(content)
        if len(content) > 100:
            content = content[:100] + "..."
        print(f"- {role}: {content}")
async def run_with_validation(query: str):
"""Run agent and display formatted validation results"""
result = await Runner.run(
starting_agent=agent,
input=query,
hooks=cleanlab_hook,
session=session,
context=example_context
)
validation_result = getattr(example_context, 'latest_cleanlab_validation_result', None)
initial_llm_response = getattr(example_context, 'latest_initial_response_text', None)
display_openai_validation_results(result.final_output, initial_llm_response, validation_result, query)
return result
run_result = await run_with_validation("What is a credit score?")
Without Cleanlab: The agent would deliver its response directly to the user without any validation or safety checks.
With Cleanlab: The above response is automatically validated for trustworthiness and helpfulness before reaching the user. In this case, Cleanlab found the response trustworthy, so it allowed the original response to be delivered to the user.
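Beyond printing, you can act on the validation result in your own application logic. Below is a minimal sketch that reads the trustworthiness score off the context object, the same way the display helper above does (it assumes the hook has already populated latest_cleanlab_validation_result on example_context):
# Read the latest validation result that the Cleanlab hook stored on the context
vr = getattr(example_context, "latest_cleanlab_validation_result", None)
if vr is not None and getattr(vr, "eval_scores", None):
    if "trustworthiness" in vr.eval_scores:
        trust_score = vr.eval_scores["trustworthiness"].score
        # Branch on the score however your application requires
        print(f"Trustworthiness of the last response: {trust_score:.3f}")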
Scenario 2: Tool Call Response
Now let’s test an agent interaction that uses tools. Cleanlab validation checks both tool usage and the final response.
run_result = await run_with_validation("What is the payment schedule for account ID 12345?")
Without Cleanlab: The agent would deliver its tool-based response directly to the user without validation.
With Cleanlab: The response is validated even when tools are used. Cleanlab evaluated both which tools were called and the final response, found them highly trustworthy, and delivered the original response to the user.
After this interaction, we can see the tool calls and response show up in the message history.
# Show conversation history
message_history = await session.get_items()
print(f"\nConversation now has {len(message_history)} messages")
print("\nLast 4 messages:")
print_last_n_messages(message_history, n=4)
Scenario 3: Bad AI Response
When an Agent calls an incorrect tool or summarizes problematic information returned from the tool call, Cleanlab automatically:
- Detects the problematic response with a low trustworthiness score
- Blocks it from reaching the user
- Substitutes a safe fallback response
- Logs the interaction for expert review
Let’s see this in action:
run_result = await run_with_validation("How much do I owe on my loan for account ID 12345?")
Without Cleanlab: The user would receive the problematic response: “The total amount owed on your loan for account ID 12345 is $7,000,000,000,000,000,000,000,000,000,000,000,000,000…” - clearly an unrealistic and harmful amount that could confuse or alarm the user.
With Cleanlab: Cleanlab’s validation detects the unrealistic amount, assigns a very low trustworthiness score, blocks the problematic response, and instead delivers a configurable fallback response to the user.
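You can also confirm programmatically that the guardrail fired. This sketch reuses the attributes the display helper reads, along with the FALLBACK_RESPONSE configured when the hook was created:
# Check whether the guardrail replaced the problematic response with the fallback
vr = getattr(example_context, "latest_cleanlab_validation_result", None)
if vr is not None and vr.should_guardrail and not vr.expert_answer:
    print("Guardrail triggered; response delivered to user:", run_result.final_output)
    print("Matches configured fallback:", run_result.final_output == FALLBACK_RESPONSE)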
After this interaction, we can see the conversation history is updated with the safe fallback response.
# Show updated conversation history
message_history = await session.get_items()
print(f"\nConversation now has {len(message_history)} messages")
print("\nLastest message:")
print_last_n_messages(message_history, n=4)
Scenario 4: Expert Answer Response
After setting up the project in the Cleanlab UI, you can add expert answers to common queries; these are deterministically returned to the user instead of the Agent response.
Consider the following user query:
run_result = await run_with_validation("How do I add a payee to my Mortgagelender loan? Give me specific steps.")
Since we did not give our Agent specific context on how to perform this action, the steps outlined in the Original AI Response are hallucinated.
As expected, the trustworthiness score is low, and the (query, answer) pair is marked as an Issue in the Web UI.
Consider adding an expert answer for the question above with the proper steps, for example:
1. Open the Mortgagelender site or app and sign into your profile.
2. Go to the section where you handle billing or transfer details.
3. Look for an option to set up a new recipient for payments.
4. Fill in the recipient’s required details (name, account info, etc.).
5. Confirm the details and complete the setup.
6. Wait for a notice or email confirming the payee has been linked.
Now, when we re-run the exact same query, the expert answer is used, immediately improving the accuracy of the Agent's responses.
run_result = await run_with_validation("How do I add a payee to my Mortgagelender loan? Give me specific steps.")
Without Cleanlab: The agent would deliver potentially inaccurate step-by-step instructions for adding a payee, which could mislead users or waste their time with incorrect procedures.
With Cleanlab: When no expert answer exists, problematic responses are blocked with fallback messages. When an expert answer is available, it deterministically replaces the agent’s potentially inaccurate response. Problematic cases are also tracked and escalated to SMEs for review.
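As before, you can verify the remediation programmatically; this sketch checks the expert_answer field that the display helper reads:
# If an expert answer was found, the hook delivers it instead of the Agent's own response
vr = getattr(example_context, "latest_cleanlab_validation_result", None)
if vr is not None and vr.expert_answer:
    print("Expert answer was delivered:", run_result.final_output == vr.expert_answer)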
Scenario 5: Information Retrieval Tool Call Response
Now let’s ask a question that requires our Agent to use web search, which we specified in our context_retrieval_tools list.
To flag answers based on their retrieved context, open the “Evaluations” section of your Project UI, click the edit button for the search_failure Eval, and toggle the “should escalate” or “should guardrail” options at the bottom.
What happens with context-aware validation:
- Tool results are automatically passed to Cleanlab as context
- Cleanlab can evaluate whether the AI response is grounded in the retrieved information
- You’ll see a “Retrieved Context” section in the Cleanlab Project UI showing what information was available for validation
run_result = await run_with_validation("What are current mortgage interest rates?")
Context is now automatically extracted from the web search tool result and passed to Cleanlab for validation, improving evaluation accuracy in information retrieval scenarios.
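Because web_search is listed in context_retrieval_tools, the validation result for this run also includes a context sufficiency score. A minimal sketch to inspect it, mirroring the helper above:
# Inspect the context sufficiency score computed from the retrieved web search results
vr = getattr(example_context, "latest_cleanlab_validation_result", None)
if vr is not None and getattr(vr, "eval_scores", None) and "context_sufficiency" in vr.eval_scores:
    cs = vr.eval_scores["context_sufficiency"]
    print(f"Context Sufficiency: {cs.score:.3f} (triggered_guardrail = {cs.triggered_guardrail})")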
# Show the last few messages to see web search tool call and context
message_history = await session.get_items()
print(f"\nFinal conversation has {len(message_history)} messages")
print("\nLast 3 messages (showing web search interaction):")
print_last_n_messages(message_history, n=3)
How Cleanlab Validation Works
Cleanlab evaluates AI responses across multiple dimensions (trustworthiness, helpfulness, reasoning quality, etc.) and provides scores, guardrail decisions, and expert remediation.
For detailed information on Cleanlab’s validation methodology, see the Cleanlab documentation.
Summary
This tutorial demonstrated integrating Cleanlab validation with OpenAI Agents using validation hooks.
Key benefits:
- Real-time validation during response generation
- Automatic remediation with expert answers and fallbacks
- Context-aware validation for retrieval-based agents
- Session management for persistent conversation history
- Minimal code changes to existing OpenAI Agent applications
Integration is simple (see the condensed sketch after this list):
- Create a CleanlabHook with your project
- Specify context_retrieval_tools for better validation
- Pass the hook to Runner.run() along with session and context
- Access validation results from the context object
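Putting it all together, here is a condensed end-to-end sketch that reuses the pieces defined throughout this tutorial. The project name, session id, and query are placeholders, and hook parameters not shown here (e.g., skip_validating_tool_calls, validate_every_response) are assumed to keep their defaults:
from agents import Agent, Runner
from agents.memory.sqlite_session import SQLiteSession
from cleanlab_codex.client import Client
from cleanlab_codex.experimental.openai_agents.cleanlab_hook import CleanlabHook

# 1. Create your Agent (plus Session and Context) -- reusing the tools and prompt from above
project = Client().create_project(name="My Validated Agent")  # placeholder name; description omitted
agent = Agent(
    name="BankLoanSupport",
    instructions=SYSTEM_PROMPT,
    tools=[get_payment_schedule, get_total_amount_owed, web_search],
    model="gpt-4o-mini",
)
session = SQLiteSession(session_id="my_session")  # placeholder session id
context = ExampleContext(session_id=session.session_id)

# 2. Create the validation hook with CleanlabHook
cleanlab_hook = CleanlabHook(
    cleanlab_project=project,
    fallback_response="Sorry I am unsure. You can try rephrasing your request.",
    context_retrieval_tools=["web_search"],  # other parameters shown earlier are optional here (assumption)
)

# 3. Pass the hook to Runner.run() along with session and context
result = await Runner.run(
    starting_agent=agent,
    input="What are current mortgage interest rates?",  # placeholder query
    hooks=cleanlab_hook,
    session=session,
    context=context,
)

# 4. Access validation results from the context object
validation_result = getattr(context, "latest_cleanlab_validation_result", None)
print(result.final_output)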
The hook-based approach provides enterprise-grade safety with minimal code changes to your existing OpenAI Agent workflows.