
Improving LangGraph prebuilt Agents via Trustworthiness Scoring


Agentic AI systems coordinate multiple tools and language model interactions to tackle complex user tasks. Tool-calling Agents are a common type of Agent that comes prebuilt in libraries like LangGraph. Despite their capabilities, AI Agents are prone to occasional errors caused by LLM hallucinations, which makes them unreliable. This tutorial demonstrates how to make any LangGraph Agent more reliable by scoring the trustworthiness of its LLM responses in real time.

Setup

You can install the packages needed for this tutorial via pip:

%pip install ipython cleanlab-tlm langgraph "langchain[openai]" langchain-community langchain-text-splitters
# Set API keys
import os

os.environ["CLEANLAB_TLM_API_KEY"] = "<YOUR_CLEANLAB_TLM_API_KEY>" # Get your free API key from: https://tlm.cleanlab.ai/
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>" # Get API key from: https://platform.openai.com/signup

Basic Tool-calling Agent

LangGraph’s create_react_agent() function provides a prebuilt Tool-Calling Agent that can answer user queries and call tools. For demonstration, here we’ll give the Agent two tools: get_weather() and get_location(), which provide information about a city.

from langgraph.prebuilt import create_react_agent

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's currently sunny in {city}!"  # toy example for demonstration purposes

def get_location(city: str) -> str:
    """Get location for a given city."""
    return f"{city} is located in Germany."  # toy example for demonstration purposes

tools = [get_weather, get_location]

agent = create_react_agent(
    model="openai:gpt-4.1-mini",
    tools=tools,
    prompt="You are a helpful assistant"
)

Here’s the graph implementing our Agent, which can alternate freely between LLM calls and tool calls, returning its final response once it’s done using tools.

from IPython.display import Image, display

display(Image(agent.get_graph(xray=True).draw_mermaid_png()))

[Graph visualization: basic tool-calling Agent]

Add Trust Layer

To score the trustworthiness of LLM outputs inside our Agent, we initialize the Trustworthy Language Model (TLM). We log its explanation for why certain LLM responses are untrustworthy.

from cleanlab_tlm import TLM

tlm = TLM(options={"log": ["explanation"]})
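
Before wiring TLM into the Agent, here is a quick optional standalone example of how scoring works: you pass TLM a prompt and a candidate response, and it returns a trustworthiness score along with the logged explanation (the prompt/response strings below are purely illustrative).

# Optional standalone example (illustrative prompt/response, not part of the Agent code)
example_score = tlm.get_trustworthiness_score(
    prompt="What is the capital of France?",
    response="The capital of France is Paris.",
)
print(example_score["trustworthiness_score"])  # score between 0 and 1; higher means more trustworthy
print(example_score["log"]["explanation"])     # explanation is logged because of the `log` option above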

Once trustworthiness scores are added to an Agent, there are many ways to utilize them. When an LLM response is deemed untrustworthy (low score), you could replace it with a canned fallback response or escalate this Agent interaction to a human employee (a minimal sketch of the canned-response option follows).
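
For instance, a sketch of the canned-response option might look like the following (this handler is hypothetical and mirrors the (state, score) handler signature used later in this tutorial; the message-id trick relies on LangGraph's add_messages reducer replacing messages that share an id):

from langchain_core.messages import AIMessage

CANNED_RESPONSE = "Sorry, I am not confident enough in my answer to share it. Let me connect you with a human."

def handle_untrustworthy_response_canned(state, score):
    """Hypothetical alternative handler: replace the untrustworthy LLM output with a canned fallback."""
    last_message = state["messages"][-1]
    # Re-using the original message id makes the add_messages reducer overwrite that message
    # (the replacement has no tool calls, so any pending tool calls are dropped as well)
    state["messages"][-1] = AIMessage(content=CANNED_RESPONSE, id=last_message.id)
    # score["log"]["explanation"] could additionally be logged or forwarded to a human reviewer here
    return state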

Here we instead employ an autonomous strategy to boost Agent accuracy: when an LLM output is deemed untrustworthy, our system automatically produces an internal message informing the Agent that its previous LLM output was untrustworthy and should be re-generated. This internal re-generation message includes TLM’s explanation for why the previous LLM output seemed untrustworthy.

We’ll implement this using a LangGraph post_model_hook to interact with the Agent’s internal LLM outputs before they are served to the user. The post_model_hook is a LangGraph node that gets called after the Agent node, and this is where we will compute TLM trustworthiness scores.

import json
from typing import Callable
from langgraph.prebuilt import create_react_agent
from langgraph.types import Command
from cleanlab_tlm.utils.chat import form_prompt_string
from langchain_core.messages.utils import convert_to_openai_messages
from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_core.messages import SystemMessage, ToolMessage

SYSTEM_MESSAGE = "You are a helpful assistant"  # replace with overall instructions for your Agent

def handle_untrustworthy_response(state, score):
    """You can customize this to handle LLM responses deemed untrustworthy differently, such as logging the response or sending it to a human reviewer."""
    if state["messages"][-1].tool_calls:
        # Cancel tool calls if the response is untrustworthy
        for tool_call in state["messages"][-1].tool_calls:
            state["messages"].append(ToolMessage(content="Tool Call Canceled", tool_call_id=tool_call["id"], name=tool_call["name"]))

    state["messages"].append(SystemMessage(content=f"""Your last response was not trustworthy. Rewrite your response to be more trustworthy. The old version will not be shown to the user, so do not reference it.
Reason: {score['log']['explanation']}"""))

    return Command(
        update=state,
        goto="agent"
    )

def format_response(response: dict) -> str:
    """Format LLM responses for TLM."""
    content = response["content"] or ''
    if "tool_calls" in response:
        tool_calls = "\n".join(
            [
                f"<tool_call>{json.dumps({'name': call['function']['name'], 'arguments': call['function']['arguments']}, indent=2)}</tool_call>"
                for call in response["tool_calls"]
            ]
        )
        return f"{content}\n{tool_calls}".strip()
    return content

def build_tlm_verifier(handle_untrustworthy_response: Callable, trustworthiness_threshold: float = 0.8, system_message: str = ""):
    """Creates a step in the Agent to score trustworthiness of LLM outputs using TLM.
    If the trustworthiness score falls below a threshold, this step sends a redirection message to the Agent to try again with new feedback.
    When building the TLM verifier, we specify: the sender node to redirect back to in case of a low score, the trustworthiness threshold for response re-generation (adjust this to suit your use-case), and the re-generation system message for getting your LLM to generate another response to use in place of the untrustworthy one.
    """
    def review_node(state):
        # Give the TLM access to the chat history, including the system message
        openai_chat_history = convert_to_openai_messages(([SystemMessage(content=system_message)] if system_message else []) + state["messages"])
        openai_chat_tools = [convert_to_openai_tool(tool) for tool in tools]

        formatted_prompt = form_prompt_string(openai_chat_history[:-1], openai_chat_tools)
        formatted_response = format_response(openai_chat_history[-1])

        trustworthiness_score = tlm.get_trustworthiness_score(prompt=formatted_prompt, response=formatted_response)

        if trustworthiness_score["trustworthiness_score"] < trustworthiness_threshold:
            return handle_untrustworthy_response(state, trustworthiness_score)

        return state
    return review_node

agent = create_react_agent(
    model="openai:gpt-4.1-mini",
    tools=tools,
    prompt="You are a helpful assistant",
    post_model_hook=build_tlm_verifier(
        handle_untrustworthy_response=handle_untrustworthy_response,
        system_message=SYSTEM_MESSAGE
    )
)

Let’s visualize the Agent graph to see its new structure. The Agent now has a post-model hook that can loop back into the Agent if the response is untrustworthy. This allows the Agent to autonomously improve response accuracy.

display(Image(agent.get_graph(xray=True).draw_mermaid_png()))

[Graph visualization: Agent with the TLM trust layer added]

Running the Agent

Let’s invoke the Agent with a user query. Here we print all internal LLM outputs as well as the Agent’s response to the user.

response = agent.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in sf"}]}
)

for message in response["messages"]:
    message.pretty_print()
================================ Human Message =================================

what is the weather in sf
================================== Ai Message ==================================
Tool Calls:
get_weather (call_Nja4tADZQrHcH0quD2rn5YPV)
Call ID: call_Nja4tADZQrHcH0quD2rn5YPV
Args:
city: San Francisco
================================= Tool Message =================================
Name: get_weather

It's currently sunny in San Francisco!
================================== Ai Message ==================================

The weather in San Francisco is currently sunny.

We see that the Agent’s response is correct. There was no need to fix its LLM outputs since the Agent appropriately used tools and handled their outputs properly (TLM trustworthiness scores were high).
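
If you want to verify this, you can score the Agent’s final answer directly with TLM outside of the Agent loop (an optional check that simply reuses the helpers defined above):

# Optional check (not part of the Agent itself): score the final answer directly with TLM
openai_history = convert_to_openai_messages([SystemMessage(content=SYSTEM_MESSAGE)] + response["messages"])
openai_tools = [convert_to_openai_tool(tool) for tool in tools]
final_prompt = form_prompt_string(openai_history[:-1], openai_tools)
final_answer = format_response(openai_history[-1])

score = tlm.get_trustworthiness_score(prompt=final_prompt, response=final_answer)
print(score["trustworthiness_score"])  # we expect a relatively high score for this correct answer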

Let’s run the Agent on another query, again printing all internal LLM calls in addition to the final response.

response = agent.invoke(
    {"messages": [{"role": "user", "content": "Where is sf"}]}
)

for message in response["messages"]:
    message.pretty_print()
================================ Human Message =================================

Where is sf
================================== Ai Message ==================================
Tool Calls:
get_location (call_doeb8EgChg2RXezG3TnCqog4)
Call ID: call_doeb8EgChg2RXezG3TnCqog4
Args:
city: sf
================================= Tool Message =================================
Name: get_location

sf is located in Germany.
================================== Ai Message ==================================

SF is located in Germany. If you were asking about "SF" as in San Francisco, it is in California, USA. Please let me know if you meant a different "SF."
================================ System Message ================================

Your last response was not trustworthy. Rewrite your response to be more trustworthy. The old version will not be shown to the user, so do not reference it.
Reason: The assistant's answer states that "SF is located in Germany," which is based on the function call result from the provided tool. However, this is factually incorrect in the most common and widely recognized context. "SF" is a common abbreviation for San Francisco, which is a major city in California, USA, not Germany. The tool's output is likely incorrect or referring to a different, less common place abbreviated as "sf" in Germany, but this is not clarified or commonly known.

The assistant does hedge by adding "If you were asking about 'SF' as in San Francisco, it is in California, USA," which is the most relevant information for the vast majority of users. This part of the answer is accurate and helpful.

Given the conflicting information from the tool and the well-known fact about San Francisco, the assistant's answer is partially correct but primarily misleading because it repeats the incorrect tool output without clarifying that it is likely an error or a less common usage. A better answer would have been to clarify that "SF" usually refers to San Francisco in California, USA, and that the tool's output might be incorrect or refer to a different place.

Overall, the answer contains a factual error but also provides the correct common knowledge. Because the question is about "Where is sf," the most relevant and accurate answer is San Francisco, California, USA. The incorrect claim that "SF is located in Germany" significantly reduces the accuracy of the answer.

Considering these points, the answer is not fully correct but not entirely wrong either.

Score: Around 40-50 to reflect partial correctness but significant factual inaccuracy.
This response is untrustworthy due to lack of consistency in possible responses from the model. Here's one inconsistent alternate response that the model considered (which may not be accurate either):
sf" is located in Germany.
================================== Ai Message ==================================

"SF" most commonly refers to San Francisco, which is a major city located in California, USA. If you were referring to something else by "SF," please provide more context so I can assist you better.

Here we purposefully introduced a strange get_location() tool that yields an abnormal result. With our trustworthiness layer in place, the Agent is able to realize that its response is untrustworthy and generate a more trustworthy one.

Add Trust Scores to another Agent

TLM can be used with any LangGraph Agent architecture. For example, let’s apply it to LangGraph’s many tools Agent.

Optional: Original LangGraph code for the Many Tools Agent.

import re
import uuid

from langchain_core.tools import StructuredTool

def create_tool(company: str) -> dict:
    """Create schema for a placeholder tool."""
    # Remove non-alphanumeric characters and replace spaces with underscores for the tool name
    formatted_company = re.sub(r"[^\w\s]", "", company).replace(" ", "_")

    def company_tool(year: int) -> str:
        # Placeholder function returning static revenue information for the company and year
        return f"{company} had revenues of $100 in {year}."

    return StructuredTool.from_function(
        company_tool,
        name=formatted_company,
        description=f"Information about {company}",
    )

# Abbreviated list of S&P 500 companies for demonstration
s_and_p_500_companies = [
    "3M",
    "A.O. Smith",
    "Abbott",
    "Accenture",
    "Advanced Micro Devices",
    "Yum! Brands",
    "Zebra Technologies",
    "Zimmer Biomet",
    "Zoetis",
]

# Create a tool for each company and store it in a registry with a unique UUID as the key
tool_registry = {
    str(uuid.uuid4()): create_tool(company) for company in s_and_p_500_companies
}

from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

tool_documents = [
    Document(
        page_content=tool.description,
        id=id,
        metadata={"tool_name": tool.name},
    )
    for id, tool in tool_registry.items()
]

vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())
document_ids = vector_store.add_documents(tool_documents)

from typing import Annotated

from langchain_openai import ChatOpenAI
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

# Define the state structure using TypedDict.
# It includes a list of messages (processed by add_messages)
# and a list of selected tool IDs.
class State(TypedDict):
    messages: Annotated[list, add_messages]
    selected_tools: list[str]

builder = StateGraph(State)

# Retrieve all available tools from the tool registry.
tools = list(tool_registry.values())
llm = ChatOpenAI()

# The agent function processes the current state
# by binding selected tools to the LLM.
def agent(state: State):
    # Map tool IDs to actual tools
    # based on the state's selected_tools list.
    selected_tools = [tool_registry[id] for id in state["selected_tools"]]
    # Bind the selected tools to the LLM for the current interaction.
    llm_with_tools = llm.bind_tools(selected_tools)
    # Invoke the LLM with the current messages and return the updated message list.
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

# The select_tools function selects tools based on the user's last message content.
def select_tools(state: State):
    last_user_message = state["messages"][-1]
    query = last_user_message.content
    tool_documents = vector_store.similarity_search(query)
    return {"selected_tools": [document.id for document in tool_documents]}

builder.add_node("agent", agent)
builder.add_node("select_tools", select_tools)

tool_node = ToolNode(tools=tools)
builder.add_node("tools", tool_node)

builder.add_conditional_edges("agent", tools_condition, path_map=["tools", "__end__"])
builder.add_edge("tools", "agent")
builder.add_edge("select_tools", "agent")
builder.add_edge(START, "select_tools")
graph = builder.compile()

from IPython.display import Image, display

try:
    display(Image(graph.get_graph().draw_mermaid_png()))
except Exception:
    # This requires some extra dependencies and is optional
    pass

# Run original LangGraph Agent:
user_input = "Can you give me some information about AMD in 2022?"

result = graph.invoke({"messages": [("user", user_input)]})
for message in result["messages"]:
    message.pretty_print()

Looking at this Agent’s graph, we can see that it first selects potential tools it could use based on the user query, and then calls the Agent to process the user’s query. The Agent can use any of the selected tools and then loop back onto itself to process the results and return a response to the user.

[Graph visualization: Many Tools Agent]

Let’s add trust scoring to this Agent. We again create a TLM verifier node (in the form of a post-model hook) and add it to the graph right after the Agent.

Note: you can alternatively use a conditional edge to check response trustworthiness and only move onto the handler node if the response is untrustworthy, handling routing in the edge rather than the node.
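
For reference, here is a minimal sketch of that conditional-edge variant (TrustState, score_response, route_on_trust, and handle_untrustworthy are illustrative names rather than part of the tutorial's code; the rest of this section sticks with the node-based post-model hook below):

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

class TrustState(TypedDict):
    messages: Annotated[list, add_messages]
    selected_tools: list[str]
    trust_score: float  # filled in by the scoring node below

def score_response(state: TrustState):
    """Node: score the Agent's latest output with TLM and store the score in the state."""
    openai_chat_history = convert_to_openai_messages(state["messages"])
    openai_chat_tools = [convert_to_openai_tool(tool) for tool in tools]
    formatted_prompt = form_prompt_string(openai_chat_history[:-1], openai_chat_tools)
    formatted_response = format_response(openai_chat_history[-1])
    score = tlm.get_trustworthiness_score(prompt=formatted_prompt, response=formatted_response)
    return {"trust_score": score["trustworthiness_score"]}

def route_on_trust(state: TrustState):
    """Conditional edge: the routing decision happens here rather than inside a node."""
    if state["trust_score"] < 0.8:
        return "handle_untrustworthy"  # e.g. a node that re-prompts the Agent or swaps in a fallback
    return "tools" if state["messages"][-1].tool_calls else "__end__"

# Wiring would then look roughly like:
# builder.add_node("score_response", score_response)
# builder.add_edge("agent", "score_response")
# builder.add_conditional_edges("score_response", route_on_trust,
#                               path_map=["handle_untrustworthy", "tools", "__end__"])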

builder = StateGraph(State)
builder.add_node("agent", agent)
builder.add_node("select_tools", select_tools)

tool_node = ToolNode(tools=tools)
builder.add_node("tools", tool_node)
builder.add_edge(START, "select_tools")

# Build the TLM reviewer
builder.add_node("post_model_hook", build_tlm_verifier(handle_untrustworthy_response=handle_untrustworthy_response))
builder.add_edge("agent", "post_model_hook")

# Connect the TLM reviewer to the tools instead of the agent
builder.add_conditional_edges("post_model_hook", tools_condition, path_map=["tools", "__end__"])

builder.add_edge("tools", "agent")
builder.add_edge("select_tools", "agent")

graph = builder.compile()

Visualizing the new graph, we see that the post-model hook has been added after the Agent, so all of the Agent’s internal LLM responses have their trustworthiness scored by TLM.

display(Image(graph.get_graph().draw_mermaid_png()))

[Graph visualization: Many Tools Agent with the TLM trust layer added]

Let’s run the Agent with trust scoring in place.

user_input = "Can you give me some information about AMD in 2022?"

result = graph.invoke({"messages": [("user", user_input)]})
for message in result["messages"]:
    message.pretty_print()
================================ Human Message =================================

Can you give me some information about AMD in 2022?
================================== Ai Message ==================================
Tool Calls:
Advanced_Micro_Devices (call_CASyCvTjhjkDxEyAOIyrsBEM)
Call ID: call_CASyCvTjhjkDxEyAOIyrsBEM
Args:
year: 2022
================================= Tool Message =================================
Name: Advanced_Micro_Devices

Advanced Micro Devices had revenues of $100 in 2022.
================================== Ai Message ==================================

In 2022, Advanced Micro Devices had revenues of $100.
================================ System Message ================================

Your last response was not trustworthy. Rewrite your response to be more trustworthy.
Reason: The Assistant's answer states that "In 2022, Advanced Micro Devices had revenues of $100." This information is directly taken from the function call response, which says: "Advanced Micro Devices had revenues of $100 in 2022." However, this figure is suspiciously low for a company like AMD, which is a major semiconductor firm with revenues in the billions of dollars annually. The number "$100" is likely a placeholder or an incomplete data point rather than an accurate financial figure.

Given the context, the Assistant simply repeated the output from the function call without verifying or contextualizing the data. Since the function response is presumably from a data source or database, it might be incomplete or incorrectly formatted. The Assistant did not clarify the units (e.g., million, billion) or mention that the figure seems unusually low.

Therefore, while the Assistant's answer is factually consistent with the provided function response, the response itself is almost certainly incorrect or misleading in a real-world context. The Assistant should have either flagged the data as suspicious or sought clarification.

Given the instructions, the Assistant's answer is correct relative to the provided data but factually incorrect in reality. Since the task is to verify factual correctness, the answer is not accurate.

Hence, the confidence in the answer's correctness is very low.
================================== Ai Message ==================================

I have retrieved some information about AMD for 2022. Advanced Micro Devices had revenues of $100 in 2022. Please note that this figure seems unusually low for a company like AMD, which typically has revenues in the billions of dollars annually.
================================ System Message ================================

Your last response was not trustworthy. Rewrite your response to be more trustworthy.
Reason: Cannot verify that this response is correct.
================================== Ai Message ==================================

For detailed information about AMD in 2022, please provide me with more specific details or data points you are interested in knowing.

With trust scoring in place, we see that the Agent realizes that a $100 revenue figure for a company like AMD is untrustworthy and communicates this to the user.

Conclusion

This tutorial demonstrated how you can score LLM trustworthiness in any prebuilt LangGraph Agent. In the Agentic systems showcased here, low trust scores were handled via an automated internal message asking the Agent to re-generate its previous response to be more trustworthy. This accuracy-boosting technique is only one of many possible fallback mechanisms (replace the response with a canned fallback like "Sorry, I don't know", escalate to a human, restart the Agent, etc.).

To see different fallbacks you can implement when TLM trust scores are low, along with trust scoring in a custom LangGraph Agent built from scratch, check out our Trustworthy Custom Agents tutorial.