Using TLM with OpenAI’s Responses API

This tutorial demonstrates how to score the trustworthiness of responses from the OpenAI Responses API. With minimal changes to your existing Responses API code, you can score the trustworthiness of every LLM response in real-time, even when relying on OpenAI tools like function calling, web search, and file search.

Setup

The Python packages required for this tutorial can be installed using pip:

%pip install --upgrade --quiet cleanlab-tlm openai trafilatura

This tutorial requires a TLM API key. Get one here.

import os
os.environ["CLEANLAB_TLM_API_KEY"] = "<Cleanlab TLM API key>" # Get your free API key from: https://tlm.cleanlab.ai/
os.environ["OPENAI_API_KEY"] = "<OpenAI API key>" # for using OpenAI client library

Let’s first initialize clients.

from openai import OpenAI
from cleanlab_tlm.utils.responses import TLMResponses

client = OpenAI()
tlm = TLMResponses(options={"log": ["explanation"]})

Usage

We’ll showcase different OpenAI Responses API workflows, and how you can score the trustworthiness of results in each workflow.

Workflow 1: Single Turn Q&A

Here is the standard OpenAI Responses code you’d write to call the LLM with a prompt and get a response.

openai_kwargs = dict(
    model="gpt-4.1-mini",
    input="What is the capital of France?",
)

response = client.responses.create(**openai_kwargs)

print("Response:", next(message for message in response.output if message.type == "message").content[0].text)
Response: The capital of France is Paris.

Score the trustworthiness of this response using the TLMResponses score() method, passing in the same OpenAI keyword arguments you used in the Responses call that generated the response.

tlm_result = tlm.score(response=response, **openai_kwargs)

print(f"TLM Score: {tlm_result['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {tlm_result['log']['explanation']}")
TLM Score: 0.9990
TLM Explanation: Did not find a reason to doubt trustworthiness.
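
In an application you might act on this score rather than just print it. Below is a minimal sketch of score-based gating; the 0.85 cutoff is an arbitrary value for illustration, not a Cleanlab recommendation.

# Hedged sketch: gate what reaches users based on the trust score.
# The 0.85 cutoff is an arbitrary example value; tune it on your own data.
TRUST_THRESHOLD = 0.85

if tlm_result["trustworthiness_score"] >= TRUST_THRESHOLD:
    print("Trusted: serve the LLM response as-is.")
else:
    print("Low trust: consider a fallback answer or human review.")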

Workflow 2: Multi-Turn Chat

In the following example, the user first asks the LLM about FIFA finalists, then follows up with a question about the Golden Ball winner in the same chat. Here we manage the messages variable to track the conversation history. Again, we can get a trust score for every LLM response, simply by passing TLMResponses the same arguments that were passed to OpenAI to generate that response.

print("Turn one:")

messages = [
    {
        "role": "user",
        "content": "Who were the finalists in 2022 FIFA World Cup?"
    }
]

openai_kwargs = dict(
    model="gpt-4.1-mini",
    input=messages,
)

response = client.responses.create(**openai_kwargs)

text_response = next(message for message in response.output if message.type == "message").content[0].text
print("Response:", text_response)

## Extra Cleanlab code ##
tlm_result = tlm.score(response=response, **openai_kwargs)
print(f"TLM Score: {tlm_result['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {tlm_result['log']['explanation']}")
## End of extra Cleanlab code ##

print("\n\nTurn two:")

messages.append({
    "role": "assistant",
    "content": text_response
})
messages.append({
    "role": "user",
    "content": "Who won Golden Ball?"
})

openai_kwargs = dict(
    model="gpt-4.1-mini",
    input=messages,
)

response = client.responses.create(**openai_kwargs)

print("Response:", next(message for message in response.output if message.type == "message").content[0].text)

## Extra Cleanlab code ##
tlm_result = tlm.score(response=response, **openai_kwargs)
print(f"TLM Score: {tlm_result['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {tlm_result['log']['explanation']}")
## End of extra Cleanlab code ##
Turn one:
Response: The finalists in the 2022 FIFA World Cup were Argentina and France.
TLM Score: 0.9907
TLM Explanation: Did not find a reason to doubt trustworthiness.


Turn two:
Response: The Golden Ball award at the 2022 FIFA World Cup was won by Lionel Messi of Argentina.
TLM Score: 0.9904
TLM Explanation: Did not find a reason to doubt trustworthiness.
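
If you repeat this create-then-score pattern on every turn, you could factor it into a small helper. The function below is a tutorial-only sketch, not part of the cleanlab_tlm or openai APIs.

def create_and_score(client, tlm, **openai_kwargs):
    """Call the OpenAI Responses API, then score the result with TLM (tutorial-only helper)."""
    response = client.responses.create(**openai_kwargs)
    tlm_result = tlm.score(response=response, **openai_kwargs)
    return response, tlm_result

# Example usage for a single turn:
# response, tlm_result = create_and_score(client, tlm, model="gpt-4.1-mini", input=messages)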

Workflow 3: Including Web Search Tool

The OpenAI Responses API provides LLMs access to a native web_search tool. Here, we force usage of the web search tool, demonstrating that web search-powered LLM responses can be trust-scored with the same TLM code as before. Note that when trust-scoring a response that uses web search, you will need the trafilatura package installed so the content of the cited web pages can be analyzed.

openai_kwargs = dict(
    model="gpt-4.1-mini",
    input="Who wrote pride and prejudice?",
    tools=[{"type": "web_search"}],
    tool_choice={"type": "web_search"},
)

response = client.responses.create(**openai_kwargs)

print("Response Text:", next(message for message in response.output if message.type == "message").content[0].text)
print("\nResponse Object:", response)
Response Text: "Pride and Prejudice" is a novel written by Jane Austen, first published in 1813. Austen, an English author, is renowned for her keen observations of social manners and relationships in the early 19th century. "Pride and Prejudice" is considered one of her most significant works, exploring themes of love, class, and societal expectations. ([britannica.com](https://www.britannica.com/topic/Pride-and-Prejudice?utm_source=openai)) 

Response Object: Response(id='resp_096511bf77ab1d520068ddae4afad4819faa3789b70c62c481', created_at=1759358539.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4.1-mini-2025-04-14', object='response', output=[ResponseFunctionWebSearch(id='ws_096511bf77ab1d520068ddae4b080c819fbfa935065492e28a', action=ActionSearch(query='Who wrote pride and prejudice?', type='search', sources=None), status='completed', type='web_search_call'), ResponseOutputMessage(id='msg_096511bf77ab1d520068ddae4c4818819fb0be010b8fcf2983', content=[ResponseOutputText(annotations=[AnnotationURLCitation(end_index=431, start_index=341, title='Pride and Prejudice | Summary, Characters, Author, Book, Movie, Quotes, & Facts | Britannica', type='url_citation', url='https://www.britannica.com/topic/Pride-and-Prejudice?utm_source=openai')], text='"Pride and Prejudice" is a novel written by Jane Austen, first published in 1813. Austen, an English author, is renowned for her keen observations of social manners and relationships in the early 19th century. "Pride and Prejudice" is considered one of her most significant works, exploring themes of love, class, and societal expectations. ([britannica.com](https://www.britannica.com/topic/Pride-and-Prejudice?utm_source=openai)) ', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice=ToolChoiceTypes(type='web_search_preview'), tools=[WebSearchTool(type='web_search', filters=None, search_context_size='medium', user_location=UserLocation(city=None, country='US', region=None, timezone=None, type='approximate'))], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), safety_identifier=None, service_tier='auto', status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=8498, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=100, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=8598), user=None, billing={'payer': 'developer'}, store=True, tlm_metadata={'trustworthiness_score': 0.999, 'log': {'explanation': 'Did not find a reason to doubt trustworthiness.'}})
tlm_result = tlm.score(response=response, **openai_kwargs)

print(f"TLM Score: {tlm_result['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {tlm_result['log']['explanation']}")
TLM Score: 0.9904
TLM Explanation: Did not find a reason to doubt trustworthiness.
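
Web search responses also carry URL citations in their message annotations, which you may want to surface alongside the trust score. Here is a short sketch based on the response object printed above.

# Sketch: list the URL citations attached to the web search answer.
message = next(m for m in response.output if m.type == "message")
for annotation in message.content[0].annotations:
    if annotation.type == "url_citation":
        print(f"Cited: {annotation.title} -> {annotation.url}")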

Workflow 4: Including File Search Tool (RAG)

OpenAI Responses can also run RAG, retrieving relevant data from a vector store that the LLM considers when generating its response. This involves using the native file_search tool to find relevant documents and passages that can inform the response.

Let’s download a sample PDF file containing OpenAI prices and upload it to an OpenAI vector store. This way, we can ask questions about OpenAI pricing using the file_search tool.

import requests

url = "https://storage.googleapis.com/files-hosting/openai-pricing.pdf"
pdf_path = "openai-pricing.pdf"

response = requests.get(url)
with open(pdf_path, "wb") as f:
    f.write(response.content)

print(f"Downloaded PDF to {pdf_path}")

file = client.files.create(file=open("openai-pricing.pdf", "rb"), purpose="user_data")
vector_store = client.vector_stores.create(name="knowledge_base")
client.vector_stores.files.create_and_poll(vector_store_id=vector_store.id, file_id=file.id)

print("Created vector store with ID:", vector_store.id)
Downloaded PDF to openai-pricing.pdf
Created vector store with ID: vs_68ddad3a883c8191b5586b074b349ac5

Now, we’re ready to send a test message to OpenAI Responses. Your request payload must include "include": ["file_search_call.results"] so that the file search results can be properly scored.

openai_kwargs = {
    "model": "gpt-4.1-mini",
    "input": "How much does GPT-5 cost?",
    "tools": [{"type": "file_search", "vector_store_ids": [vector_store.id]}],
    "include": ["file_search_call.results"],
    "tool_choice": {"type": "file_search"},
}

response = client.responses.create(**openai_kwargs)

print("Response:", next(message for message in response.output if message.type == "message").content[0].text)
print("\nResponse Object:", response)
Response: The cost of using GPT-5 via API pricing is as follows:

- For the main GPT-5 model:
  - Input tokens: $1.250 per 1 million tokens
  - Cached input tokens: $0.125 per 1 million tokens
  - Output tokens: $10.000 per 1 million tokens

- GPT-5 mini (a faster, cheaper version):
  - Input tokens: $0.250 per 1 million tokens
  - Cached input tokens: $0.025 per 1 million tokens
  - Output tokens: $2.000 per 1 million tokens

- GPT-5 nano (the fastest and cheapest version):
  - Input tokens: $0.050 per 1 million tokens
  - Cached input tokens: $0.005 per 1 million tokens
  - Output tokens: $0.400 per 1 million tokens

This pricing structure reflects usage costs based on tokens processed (both input and output) by the API. For more details, you can refer to the OpenAI pricing document provided.

Response Object: Response(id='resp_0eb45fdecd2d60c30068ddad3d6edc81a2bed8caf423a3bf28', created_at=1759358269.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4.1-mini-2025-04-14', object='response', output=[ResponseFileSearchToolCall(id='fs_0eb45fdecd2d60c30068ddad3de62c81a2b0fa1375e1d3af76', queries=['GPT-5 cost', 'How much does GPT-5 cost?'], status='completed', type='file_search_call', results=[Result(attributes={}, file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', score=0.9285, text='Pricing | OpenAI\n\n\nPricing below reflects standard processing rates. To optimize cost and performance for\n\ndifferent use cases, we also offer:\n\nBatch API : Save 50% on inputs and outputs with the Batch API and run tasks\n\nasynchronously over 24 hours.\n\nPriority processing : offers reliable, high-speed performance with the flexibility to\n\npay-as-you-go.\n\nGPT-5\n\nThe best model for coding and agentic tasks across industries\n\nPrice\n\nAPI Pricing\n\nContact sales\n\nFlagship models\n\nOur frontier models designed to spend more time thinking before\n\nproducing a response, making them ideal for complex, multi-step problems.\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 1/11\n\nhttps://platform.openai.com/docs/guides/batch\nhttps://openai.com/api-priority-processing/\nhttps://openai.com/contact-sales/\nhttps://openai.com/\n\n\nInput:\n\n$1.250 / 1M tokens\n\nCached input:\n\n$0.125 / 1M tokens\n\nOutput:\n\n$10.000 / 1M tokens\n\nGPT-5 mini\nA faster, cheaper version of GPT-5 for well-defined tasks\n\nPrice\n\nInput:\n\n$0.250 / 1M tokens\n\nCached input:\n\n$0.025 / 1M tokens\n\nOutput:\n\n$2.000 / 1M tokens\n\nGPT-5 nano\nThe fastest, cheapest version of GPT-5—great for summarization and\n\nclassification tasks\n\nPrice\n\nInput:\n\n$0.050 / 1M tokens\n\nCached input:\n\n$0.005 / 1M tokens\n\nOutput:\n\n$0.400 / 1M tokens\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 2/11\n\nhttps://openai.com/\n\n\nGPT-4.1\n\nFine-tuning price\n\nInput:\n\n$3.00 / 1M tokens\n\nCached input:\n\n$0.75 / 1M tokens\n\nOutput:\n\n$12.00 / 1M tokens\n\nTraining:\n\n$25.00 / 1M tokens\n\nGPT-4.1 mini\n\nFine-tuning price\n\nInput:\n\n$0.80 / 1M tokens\n\nCached input:\n\n$0.20 / 1M tokens\n\nOutput:\n\n$3.20 / 1M tokens\n\nTraining:\n\n$5.00 / 1M tokens\n\nFine-tuning our models\n\nCustomize our models to get even higher performance for your specific use cases.\n\nAsk ChatGPT\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 3/11\n\nhttps://openai.com/\n\n\nGPT-4.1 nano\n\nFine-tuning price\n\nInput:\n\n$0.20 / 1M tokens\n\nCached input:\n\n$0.05 / 1M tokens\n\nOutput:\n\n$0.80 / 1M tokens\n\nTraining:\n\n$1.50 / 1M tokens\n\no4-mini\n\nReinforcement fine-tuning price\n\nInput:\n\n$4.00 / 1M tokens\n\nCached input:\n\n$1.00 / 1M tokens\n\nOutput:\n\n$16.00 / 1M tokens\n\nTraining:\n\n$100.00 / training hour\n\nExplore detailed pricing\n\nOur APIs\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 4/11', vector_store_id='vs_68ddad3a883c8191b5586b074b349ac5'), Result(attributes={}, file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', score=0.8829, text="https://openai.com/api/pricing/ 6/11\n\nhttps://platform.openai.com/docs/pricing\nhttps://openai.com/\n\n\nChat Completions API is not priced separately. 
Tokens are billed at the chosen language model's \ninput and output rates.\n\nAssistants API\n\nBuild assistant-like experiences with our tools.\n\nPrice\n\nAssistants API is not priced separately. Tokens are billed at the chosen language model's input and \noutput rates.\n\nBuilt-in tools\n\nExtend model capabilities with built-in tools in the API Platform.\n\nCode Interpreter\n$0.03\n\nFile Search Storage\n$0.10 / GB of vector storage per day (first GB free)\n\nFile Search Tool Call (Responses API only)\n$2.50 / 1k tool calls\n\nWeb Search Tool Call\nThe tokens used for built-in tools are billed at the chosen model's per-token rates.\nWeb search content tokens: Search content tokens are tokens retrieved from the search index and\nfed to the model alongside your prompt to generate an answer. For gpt-4o and gpt-4.1 models,\nthese tokens are included in the $25/1K calls cost. For o3 and o4-mini models, search content\ntokens are charged at model rate.\n\nModels Search Content Cost\n\ngpt-4o, gpt-4o-mini, gpt-4.1, and\ngpt-4.1-mini-models*\n\nSearch content tokens free $25.00 / 1K calls\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 7/11\n\nhttps://openai.com/\n\n\nModels Search Content Cost\n\nGPT-5, GPT-5-mini, GPT-5-nano,\no3, o4-mini, o3-pro, and deep\nresearch models\n\nSearch content tokens billed\nat model rate\n\n$10.00 / 1K calls\nThe billing dashboard will report gpt-4.1 and gpt-4.1-mini search line items as ‘web search tool\n\ncalls | gpt-4o’ and ‘web search tool calls | gpt-4o-mini’\n\nGB refers to binary gigabytes of storage (also known as gibibyte), where 1GB is 2^30 bytes.\n\nExplore detailed pricing\n\nExplore our offerings for Enterprise customers: Priority processing , Scale Tier and\n\nReserved Capacity .\n\nWhich model should I use?\n\nFAQ\n\nWe recommend that developers use our large and mini GPT models for everyday tasks. Our large GPT\n\nmodels generally perform better on a wide range of tasks, while our mini GPT models are fast and\n\ninexpensive for simpler tasks.\n\nOur large and mini reasoning models are ideal for complex, multi-step tasks and STEM use cases that\n\nrequire deep thinking about tough problems. 
You can choose the mini reasoning model if you're looking for a\n\nfaster, more inexpensive option.\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 8/11\n\nhttps://platform.openai.com/docs/pricing\nhttps://openai.com/api-priority-processing/\nhttps://openai.com/api-scale-tier/\nhttps://openai.com/reserved-capacity/\nhttps://openai.com/\n\n\nDo you offer an enterprise package or SLAs?\n\nWill I be charged for API usage in the Playground?\n\nHow will I know how many tokens I’ve used each month?\n\nHow can I manage my spending on the API platform?\n\nIs access to the API included in ChatGPT Plus, Team, Enterprise or Edu?\n\nHow is pricing calculated for images?\n\nWe recommend experimenting with all of these models in the to explore which models provide", vector_store_id='vs_68ddad3a883c8191b5586b074b349ac5'), Result(attributes={}, file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', score=0.4536, text="https://platform.openai.com/docs/pricing\nhttps://openai.com/\n\n\nRealtime API\n\nBuild low-latency, multimodal experiences including speech-to-speech.\n\nText\n\nGPT-4o\n\n$5.00 / 1M input tokens $2.50 / 1M cached\n\ninput tokens\n\n$20.00 / 1M output tokens\n\nGPT-4o mini\n\n$0.60 / 1M input tokens $0.30 / 1M cached\n\ninput tokens\n\n$2.40 / 1M output tokens\n\nAudio\n\nImage Generation API\n\nPrecise, high-fidelity image generation and editing with our latest multimodal model.\n\nText\n\nGPT-image-1\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 5/11\n\nhttps://openai.com/\n\n\n$5.00 / 1M input tokens $1.25 / 1M cached\n\ninput tokens*\n\n-\n\nImage\n\nGPT-image-1\n\n$10.00 / 1M input tokens $2.50 / 1M cached\n\ninput tokens*\n\n$40.00 / 1M output tokens\n\nPrompts are billed similarly to other GPT models. Image outputs cost approximately $0.01 (low),\n\n$0.04 (medium), and $0.17 (high) for square images.\n\n*available via the Responses API\n\nFor detailed token usage by image quality and size, see the docs.\n\nResponses API\n\nOur newest API combining the simplicity of Chat Completions with the built-in tool use\n\nof Assistants.\n\nPrice\n\nResponses API is not priced separately. Tokens are billed at the chosen language model’s input \nand output rates.\n\nChat Completions API\n\nBuild text-based conversational experiences.\n\nPrice\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 6/11\n\nhttps://platform.openai.com/docs/pricing\nhttps://openai.com/\n\n\nChat Completions API is not priced separately. Tokens are billed at the chosen language model's \ninput and output rates.\n\nAssistants API\n\nBuild assistant-like experiences with our tools.\n\nPrice\n\nAssistants API is not priced separately. Tokens are billed at the chosen language model's input and \noutput rates.\n\nBuilt-in tools\n\nExtend model capabilities with built-in tools in the API Platform.\n\nCode Interpreter\n$0.03\n\nFile Search Storage\n$0.10 / GB of vector storage per day (first GB free)\n\nFile Search Tool Call (Responses API only)\n$2.50 / 1k tool calls\n\nWeb Search Tool Call\nThe tokens used for built-in tools are billed at the chosen model's per-token rates.\nWeb search content tokens: Search content tokens are tokens retrieved from the search index and\nfed to the model alongside your prompt to generate an answer. For gpt-4o and gpt-4.1 models,\nthese tokens are included in the $25/1K calls cost. 
For o3 and o4-mini models, search content\ntokens are charged at model rate.\n\nModels Search Content Cost\n\ngpt-4o, gpt-4o-mini, gpt-4.1, and\ngpt-4.1-mini-models*\n\nSearch content tokens free $25.00 / 1K calls\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 7/11\n\nhttps://openai.com/\n\n\nModels Search Content Cost\n\nGPT-5, GPT-5-mini, GPT-5-nano,\no3, o4-mini, o3-pro, and deep\nresearch models", vector_store_id='vs_68ddad3a883c8191b5586b074b349ac5'), Result(attributes={}, file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', score=0.3007, text="Search content tokens billed\nat model rate\n\n$10.00 / 1K calls\nThe billing dashboard will report gpt-4.1 and gpt-4.1-mini search line items as ‘web search tool\n\ncalls | gpt-4o’ and ‘web search tool calls | gpt-4o-mini’\n\nGB refers to binary gigabytes of storage (also known as gibibyte), where 1GB is 2^30 bytes.\n\nExplore detailed pricing\n\nExplore our offerings for Enterprise customers: Priority processing , Scale Tier and\n\nReserved Capacity .\n\nWhich model should I use?\n\nFAQ\n\nWe recommend that developers use our large and mini GPT models for everyday tasks. Our large GPT\n\nmodels generally perform better on a wide range of tasks, while our mini GPT models are fast and\n\ninexpensive for simpler tasks.\n\nOur large and mini reasoning models are ideal for complex, multi-step tasks and STEM use cases that\n\nrequire deep thinking about tough problems. You can choose the mini reasoning model if you're looking for a\n\nfaster, more inexpensive option.\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 8/11\n\nhttps://platform.openai.com/docs/pricing\nhttps://openai.com/api-priority-processing/\nhttps://openai.com/api-scale-tier/\nhttps://openai.com/reserved-capacity/\nhttps://openai.com/\n\n\nDo you offer an enterprise package or SLAs?\n\nWill I be charged for API usage in the Playground?\n\nHow will I know how many tokens I’ve used each month?\n\nHow can I manage my spending on the API platform?\n\nIs access to the API included in ChatGPT Plus, Team, Enterprise or Edu?\n\nHow is pricing calculated for images?\n\nWe recommend experimenting with all of these models in the to explore which models provide\n\nthe best price performance trade-off for your usage.\n\nPlayground \n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 9/11\n\nhttps://platform.openai.com/playground\nhttps://openai.com/\n\n\nOur Research\n\nResearch Index\n\nResearch Overview\n\nResearch Residency\n\nLatest\nAdvancements\n\nOpenAI o3\n\nChatGPT\n\nExplore ChatGPT\n\nTeam\n\nEnterprise\n\nEducation\n\nPricing\n\nFor Business\n\nBusiness Overview\n\nSolutions\n\nContact Sales\n\nCompany\n\nAbout Us\n\nTerms & Policies\n\nTerms of Use\n\nPrivacy Policy\n\nOther Policies\n\nStart creating with\nOpenAI’s powerful models.\n\nGet started Contact sales\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 
10/11\n\nhttps://openai.com/research/index/\nhttps://openai.com/research/\nhttps://openai.com/residency/\nhttps://openai.com/index/introducing-o3-and-o4-mini/\nhttps://chatgpt.com/overview?openaicom-did=16555033-6011-4e78-8929-6d7ac2440212&openaicom_referred=true\nhttps://openai.com/chatgpt/team/\nhttps://openai.com/chatgpt/enterprise/\nhttps://openai.com/chatgpt/education/\nhttps://openai.com/chatgpt/pricing/\nhttps://openai.com/business/\nhttps://openai.com/solutions/\nhttps://openai.com/contact-sales/\nhttps://openai.com/about/\nhttps://openai.com/policies/terms-of-use/\nhttps://openai.com/policies/privacy-policy/\nhttps://openai.com/policies/\nhttps://chatgpt.", vector_store_id='vs_68ddad3a883c8191b5586b074b349ac5'), Result(attributes={}, file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', score=0.2591, text='the best price performance trade-off for your usage.\n\nPlayground \n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 9/11\n\nhttps://platform.openai.com/playground\nhttps://openai.com/\n\n\nOur Research\n\nResearch Index\n\nResearch Overview\n\nResearch Residency\n\nLatest\nAdvancements\n\nOpenAI o3\n\nChatGPT\n\nExplore ChatGPT\n\nTeam\n\nEnterprise\n\nEducation\n\nPricing\n\nFor Business\n\nBusiness Overview\n\nSolutions\n\nContact Sales\n\nCompany\n\nAbout Us\n\nTerms & Policies\n\nTerms of Use\n\nPrivacy Policy\n\nOther Policies\n\nStart creating with\nOpenAI’s powerful models.\n\nGet started Contact sales\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 10/11\n\nhttps://openai.com/research/index/\nhttps://openai.com/research/\nhttps://openai.com/residency/\nhttps://openai.com/index/introducing-o3-and-o4-mini/\nhttps://chatgpt.com/overview?openaicom-did=16555033-6011-4e78-8929-6d7ac2440212&openaicom_referred=true\nhttps://openai.com/chatgpt/team/\nhttps://openai.com/chatgpt/enterprise/\nhttps://openai.com/chatgpt/education/\nhttps://openai.com/chatgpt/pricing/\nhttps://openai.com/business/\nhttps://openai.com/solutions/\nhttps://openai.com/contact-sales/\nhttps://openai.com/about/\nhttps://openai.com/policies/terms-of-use/\nhttps://openai.com/policies/privacy-policy/\nhttps://openai.com/policies/\nhttps://chatgpt.com/download?openaicom-did=16555033-6011-4e78-8929-6d7ac2440212&openaicom_referred=true\nhttps://auth0.openai.com/u/signup/identifier?state=hKFo2SBBUFVDN0w1ZjZEVmJEZWFpVmE1VmRwc21tZlZ4aVFrS6Fur3VuaXZlcnNhbC1sb2dpbqN0aWTZIEt6RzlGWUhZOEhhWllfd0dXMVlScUlramh0YkI5dkozo2NpZNkgRFJpdnNubTJNdTQyVDNLT3BxZHR3QjNOWXZpSFl6d0Q\nhttps://openai.com/contact-sales/\nhttps://openai.com/\n\n\nOpenAI o4-mini\n\nGPT-4o\n\nGPT-4o mini\n\nSora\n\nSafety\n\nSafety Approach\n\nSecurity & Privacy\n\nTrust & Transparency\n\nDownload\n\nSora\n\nSora Overview\n\nFeatures\n\nPricing\n\nSora log in\n\nAPI Platform\n\nPlatform Overview\n\nPricing\n\nAPI log in\n\nDocumentation\n\nDeveloper Forum\n\nOur Charter\n\nCareers\n\nBrand\n\nSupport\n\nHelp Center\n\nMore\n\nNews\n\nStories\n\nLivestreams\n\nPodcast\n\nOpenAI © 2015–2025\n\nManage Cookies\nEnglish United States\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 11/11\n\nhttps://openai.com/index/introducing-o3-and-o4-mini/\nhttps://openai.com/index/gpt-4o-system-card/\nhttps://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/\nhttps://openai.', vector_store_id='vs_68ddad3a883c8191b5586b074b349ac5'), Result(attributes={}, file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', score=0.2348, text='8/21/25, 8:18 PM Pricing | 
OpenAI\n\nhttps://openai.com/api/pricing/ 2/11\n\nhttps://openai.com/\n\n\nGPT-4.1\n\nFine-tuning price\n\nInput:\n\n$3.00 / 1M tokens\n\nCached input:\n\n$0.75 / 1M tokens\n\nOutput:\n\n$12.00 / 1M tokens\n\nTraining:\n\n$25.00 / 1M tokens\n\nGPT-4.1 mini\n\nFine-tuning price\n\nInput:\n\n$0.80 / 1M tokens\n\nCached input:\n\n$0.20 / 1M tokens\n\nOutput:\n\n$3.20 / 1M tokens\n\nTraining:\n\n$5.00 / 1M tokens\n\nFine-tuning our models\n\nCustomize our models to get even higher performance for your specific use cases.\n\nAsk ChatGPT\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 3/11\n\nhttps://openai.com/\n\n\nGPT-4.1 nano\n\nFine-tuning price\n\nInput:\n\n$0.20 / 1M tokens\n\nCached input:\n\n$0.05 / 1M tokens\n\nOutput:\n\n$0.80 / 1M tokens\n\nTraining:\n\n$1.50 / 1M tokens\n\no4-mini\n\nReinforcement fine-tuning price\n\nInput:\n\n$4.00 / 1M tokens\n\nCached input:\n\n$1.00 / 1M tokens\n\nOutput:\n\n$16.00 / 1M tokens\n\nTraining:\n\n$100.00 / training hour\n\nExplore detailed pricing\n\nOur APIs\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 4/11\n\nhttps://platform.openai.com/docs/pricing\nhttps://openai.com/\n\n\nRealtime API\n\nBuild low-latency, multimodal experiences including speech-to-speech.\n\nText\n\nGPT-4o\n\n$5.00 / 1M input tokens $2.50 / 1M cached\n\ninput tokens\n\n$20.00 / 1M output tokens\n\nGPT-4o mini\n\n$0.60 / 1M input tokens $0.30 / 1M cached\n\ninput tokens\n\n$2.40 / 1M output tokens\n\nAudio\n\nImage Generation API\n\nPrecise, high-fidelity image generation and editing with our latest multimodal model.\n\nText\n\nGPT-image-1\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 5/11\n\nhttps://openai.com/\n\n\n$5.00 / 1M input tokens $1.25 / 1M cached\n\ninput tokens*\n\n-\n\nImage\n\nGPT-image-1\n\n$10.00 / 1M input tokens $2.50 / 1M cached\n\ninput tokens*\n\n$40.00 / 1M output tokens\n\nPrompts are billed similarly to other GPT models. Image outputs cost approximately $0.01 (low),\n\n$0.04 (medium), and $0.17 (high) for square images.\n\n*available via the Responses API\n\nFor detailed token usage by image quality and size, see the docs.\n\nResponses API\n\nOur newest API combining the simplicity of Chat Completions with the built-in tool use\n\nof Assistants.\n\nPrice\n\nResponses API is not priced separately. 
Tokens are billed at the chosen language model’s input \nand output rates.\n\nChat Completions API\n\nBuild text-based conversational experiences.\n\nPrice', vector_store_id='vs_68ddad3a883c8191b5586b074b349ac5'), Result(attributes={}, file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', score=0.0701, text='openai.com/u/signup/identifier?state=hKFo2SBBUFVDN0w1ZjZEVmJEZWFpVmE1VmRwc21tZlZ4aVFrS6Fur3VuaXZlcnNhbC1sb2dpbqN0aWTZIEt6RzlGWUhZOEhhWllfd0dXMVlScUlramh0YkI5dkozo2NpZNkgRFJpdnNubTJNdTQyVDNLT3BxZHR3QjNOWXZpSFl6d0Q\nhttps://openai.com/contact-sales/\nhttps://openai.com/\n\n\nOpenAI o4-mini\n\nGPT-4o\n\nGPT-4o mini\n\nSora\n\nSafety\n\nSafety Approach\n\nSecurity & Privacy\n\nTrust & Transparency\n\nDownload\n\nSora\n\nSora Overview\n\nFeatures\n\nPricing\n\nSora log in\n\nAPI Platform\n\nPlatform Overview\n\nPricing\n\nAPI log in\n\nDocumentation\n\nDeveloper Forum\n\nOur Charter\n\nCareers\n\nBrand\n\nSupport\n\nHelp Center\n\nMore\n\nNews\n\nStories\n\nLivestreams\n\nPodcast\n\nOpenAI © 2015–2025\n\nManage Cookies\nEnglish United States\n\n8/21/25, 8:18 PM Pricing | OpenAI\n\nhttps://openai.com/api/pricing/ 11/11\n\nhttps://openai.com/index/introducing-o3-and-o4-mini/\nhttps://openai.com/index/gpt-4o-system-card/\nhttps://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/\nhttps://openai.com/index/sora-system-card/\nhttps://openai.com/safety/\nhttps://openai.com/security-and-privacy/\nhttps://openai.com/trust-and-transparency/\nhttps://chatgpt.com/download?openaicom-did=16555033-6011-4e78-8929-6d7ac2440212&openaicom_referred=true\nhttps://openai.com/sora/\nhttps://openai.com/sora/#features\nhttps://openai.com/sora/#pricing\nhttps://sora.com/\nhttps://openai.com/api/\nhttps://openai.com/api/pricing/\nhttps://platform.openai.com/login\nhttps://platform.openai.com/docs/overview\nhttps://community.openai.com/\nhttps://openai.com/charter/\nhttps://openai.com/careers/\nhttps://openai.com/brand/\nhttps://help.openai.com/\nhttps://openai.com/news/\nhttps://openai.com/stories/\nhttps://openai.com/live/\nhttps://openai.com/podcast/\nhttps://x.com/OpenAI\nhttps://www.youtube.com/OpenAI\nhttps://www.linkedin.com/company/openai\nhttps://github.com/openai\nhttps://www.instagram.com/openai/\nhttps://www.tiktok.com/@openai\nhttps://discord.gg/openai\nhttps://openai.com/', vector_store_id='vs_68ddad3a883c8191b5586b074b349ac5')]), ResponseOutputMessage(id='msg_0eb45fdecd2d60c30068ddad3f855081a295c2c317cec7389d', content=[ResponseOutputText(annotations=[AnnotationFileCitation(file_id='file-LAa4zxoHGrZNwJNhsWyBsY', filename='openai-pricing.pdf', index=794, type='file_citation')], text='The cost of using GPT-5 via API pricing is as follows:\n\n- For the main GPT-5 model:\n - Input tokens: $1.250 per 1 million tokens\n - Cached input tokens: $0.125 per 1 million tokens\n - Output tokens: $10.000 per 1 million tokens\n\n- GPT-5 mini (a faster, cheaper version):\n - Input tokens: $0.250 per 1 million tokens\n - Cached input tokens: $0.025 per 1 million tokens\n - Output tokens: $2.000 per 1 million tokens\n\n- GPT-5 nano (the fastest and cheapest version):\n - Input tokens: $0.050 per 1 million tokens\n - Cached input tokens: $0.005 per 1 million tokens\n - Output tokens: $0.400 per 1 million tokens\n\nThis pricing structure reflects usage costs based on tokens processed (both input and output) by the API. 
For more details, you can refer to the OpenAI pricing document provided.', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice=ToolChoiceTypes(type='file_search'), tools=[FileSearchTool(type='file_search', vector_store_ids=['vs_68ddad3a883c8191b5586b074b349ac5'], filters=None, max_num_results=20, ranking_options=RankingOptions(ranker='auto', score_threshold=0.0))], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), safety_identifier=None, service_tier='default', status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=6524, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=259, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=6783), user=None, billing={'payer': 'developer'}, store=True)

As long as file_search_call.results is included in your OpenAI request, you can score the trustworthiness of the file search-powered response using the same TLM code.

tlm_result = tlm.score(response=response, **openai_kwargs)

print(f"TLM Score: {tlm_result['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {tlm_result['log']['explanation']}")
TLM Score: 0.9904
TLM Explanation: Did not find a reason to doubt trustworthiness.
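
Optionally, once you are done experimenting, you can clean up the resources created for this workflow (assuming you no longer need them):

# Optional cleanup: delete the tutorial's vector store and uploaded file.
client.vector_stores.delete(vector_store.id)
client.files.delete(file.id)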

Make your existing code also produce trust scores (via decorator)

You can decorate your call to openai.responses.create() with a decorator that appends the trust score as a key in the returned response. This workflow requires only minimal initial setup; after that, zero changes are needed in the rest of your existing code!

import functools

def add_trust_scoring(tlm_instance):
    """Decorator factory that creates a trust scoring decorator."""
    def trust_score_decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            response = fn(**kwargs)
            score_result = tlm_instance.score(response=response, **kwargs)
            response.tlm_metadata = score_result
            return response
        return wrapper
    return trust_score_decorator

Then decorate your OpenAI Responses function like this:

client.responses.create = add_trust_scoring(tlm)(client.responses.create)

After you decorate OpenAI’s Responses function like this, all of your existing Responses API code will automatically compute trust scores as well (no changes needed anywhere else):

response = client.responses.create(input="What is the capital of France?", model="gpt-4.1-mini")

print(f"Response: {response.output[0].content[0].text}")
print(f"TLM Score: {response.tlm_metadata['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {response.tlm_metadata['log']['explanation']}")
Response: The capital of France is Paris.
TLM Score: 0.9990
TLM Explanation: Did not find a reason to doubt trustworthiness.
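
One caveat: if scoring ever fails (say, due to a transient network error), the exception would propagate into your application code. If you would rather fail open, a variant of the decorator could catch scoring errors and record them instead. This error-handling behavior is our own sketch, not something cleanlab_tlm provides.

def add_trust_scoring_safe(tlm_instance):
    """Like add_trust_scoring, but records scoring failures instead of raising (sketch)."""
    def trust_score_decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            response = fn(**kwargs)
            try:
                response.tlm_metadata = tlm_instance.score(response=response, **kwargs)
            except Exception as e:  # fail open: keep the response, record the error
                response.tlm_metadata = {"error": str(e)}
            return response
        return wrapper
    return trust_score_decorator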