Integrate Codex as-a-Tool into any RAG framework
To demonstrate how to integrate Codex with any RAG framework, we’ll consider a toy example RAG app built from scratch using OpenAI LLMs. You can translate the same ideas to any RAG framework, assuming basic familiarity with RAG and LLMs.
This tutorial presumes your RAG app can already perform tool calls. If you are unsure how to do RAG with tool calls, follow our tutorial: Adding Tool Calls to RAG.
Once you have a RAG app that supports tool calling, adding Codex as an additional tool takes minimal effort and can markedly improve responses from your AI application.
If you prefer to integrate Codex without adding tool calls to your application, check out our other integrations.
Let’s first install packages required for this tutorial.
%pip install --upgrade cleanlab_codex
Optional: Helper methods for basic RAG from prior tutorial (Adding Tool Calls to RAG)
import os
import json
from datetime import datetime
from openai import OpenAI
fallback_answer = "Based on the available information, I cannot provide a complete answer to this question." # desired RAG response when query cannot be answered
system_prompt_without_codex = f"""
Answer the user's Question based on the following possibly relevant Context. Follow these rules:
1. Never use phrases like "according to the context," "as the context states," etc. Treat the Context as your own knowledge, not something you are referencing.
2. Give a clear, short, and accurate answer. Explain complex terms if needed.
3. If the answer to the question requires today's date, use the following tool: get_todays_date.
4. If the Context doesn't adequately address the Question, say: "{fallback_answer}" only, nothing else.
Remember, your purpose is to provide information based on the Context, not to offer original advice.
"""
def get_todays_date(date_format: str) -> str:
    """A tool that returns today's date in the date format requested."""
    datetime_str = datetime.now().strftime(date_format)
    return datetime_str
todays_date_tool_json = {
    "type": "function",
    "function": {
        "name": "get_todays_date",
        "description": "A tool that returns today's date in the date format requested. Options for date_format parameter are: '%Y-%m-%d', '%d', '%m', '%Y'.",
        "parameters": {
            "type": "object",
            "properties": {
                "date_format": {
                    "type": "string",
                    "enum": ["%Y-%m-%d", "%d", "%m", "%Y"],
                    "default": "%Y-%m-%d",
                    "description": "The date format to return today's date in.",
                }
            },
            "required": ["date_format"],
        },
    },
}
tools_without_codex = [todays_date_tool_json]
def retrieve_context(user_question: str) -> str:
    """Toy retrieval that returns the same context for any user question. Replace this with actual retrieval in your RAG system."""
    contexts = """Simple Water Bottle - Amber (limited edition launched Jan 1st 2025)
A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.
Price: $24.99 \nDimensions: 10 inches height x 4 inches width"""
    return contexts
def form_prompt(user_question: str, retrieved_context: str) -> str:
    question_with_context = f"Context:\n{retrieved_context}\n\nUser Question:\n{user_question}"
    indented_question_with_context = "\n".join(f" {line}" for line in question_with_context.splitlines())  # this just indents the final prompt for readability in the tutorial
    return indented_question_with_context
def simulate_response_as_message(response: str) -> dict:
    """Commits the response to a conversation history to return back to the model."""
    return {"role": "assistant", "content": response}
def simulate_tool_call_as_message(tool_call_id: str, function_name: str, function_arguments: str) -> dict:
    """Commits the tool call to a conversation history to return back to the model."""
    tool_call_message = {
        "role": "assistant",
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            "function": {
                "arguments": function_arguments,
                "name": function_name,
            },
        }],
    }
    return tool_call_message
def simulate_tool_call_response_as_message(tool_call_id: str, function_response: str) -> dict:
    """Commits the result of the function call to a conversation history to return back to the model."""
    function_call_result_message = {
        "role": "tool",
        "content": function_response,
        "tool_call_id": tool_call_id,
    }
    return function_call_result_message
def stream_response(client, messages: list[dict], model: str, tools: list[dict]) -> dict:
    """Processes a streaming model response dynamically, handling any tool calls that were made.
    Params:
        messages: message history list in openai format
        model: model name
        tools: list of tools the model can call
    Returns:
        response: final response message in openai format
    """
    response_stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
        tools=tools,
        parallel_tool_calls=False,  # prevents OpenAI from making multiple tool calls in a single response
    )

    collected_messages = []
    final_tool_calls = {}

    for chunk in response_stream:
        if chunk.choices[0].delta.content:
            collected_messages.append(chunk.choices[0].delta.content)
        for tool_call in chunk.choices[0].delta.tool_calls or []:
            index = tool_call.index
            if index not in final_tool_calls:
                final_tool_calls[index] = tool_call
            final_tool_calls[index].function.arguments += tool_call.function.arguments

        if chunk.choices[0].finish_reason == "tool_calls":
            for tool_call in final_tool_calls.values():
                function_response = _handle_any_tool_call_for_stream_response(tool_call.function.name, json.loads(tool_call.function.arguments))
                print(f'[internal log] Called {tool_call.function.name} tool, with arguments: {tool_call.function.arguments}')
                print(f'[internal log] Tool response: {str(function_response)}')
                tool_call_response_message = simulate_tool_call_response_as_message(tool_call.id, function_response)

                # If the tool call resulted in an error, return the message instead of continuing the conversation
                if "error" in tool_call_response_message["content"]:
                    return tool_call_response_message

                response = [
                    simulate_tool_call_as_message(tool_call.id, tool_call.function.name, tool_call.function.arguments),
                    tool_call_response_message,
                ]

                # If needed, extend messages and re-call the stream response
                messages.extend(response)
                response = stream_response(client=client, messages=messages, model=model, tools=tools)  # This recursive call handles the case when a tool calls another tool until all tools are resolved and a final response is returned
        else:
            collected_messages = [m for m in collected_messages if m is not None]
            full_str_response = "".join(collected_messages)
            response = simulate_response_as_message(full_str_response)
    return response
def _handle_any_tool_call_for_stream_response(function_name: str, arguments: dict) -> str:
    """Handles any tool dynamically by calling the function by name and passing in the collected arguments.
    Returns a JSON string of the tool output.
    Returns an error message if the tool is not found, not callable, or called incorrectly.
    """
    try:
        tool_function = globals().get(function_name) or locals().get(function_name)
        if callable(tool_function):
            # Dynamically call the tool function with arguments
            tool_output = tool_function(**arguments)
            return json.dumps(tool_output)
        else:
            return json.dumps({
                "error": f"Tool '{function_name}' not found or not callable.",
                "arguments": arguments,
            })
    except Exception as e:
        return json.dumps({
            "error": f"Exception in handling tool '{function_name}': {str(e)}",
            "arguments": arguments,
        })
Example RAG App: Product Customer Support
Let’s revisit our RAG app built in the RAG With Tool Calls tutorial, which has the option to call a get_todays_date() tool. This example represents a customer support / e-commerce use-case where the Knowledge Base contains product listings like the Simple Water Bottle returned by the retrieve_context helper above.
The details of this toy RAG app are unimportant if you are already familiar with RAG and Tool Calling; otherwise, refer to the RAG With Tool Calls tutorial. That tutorial walks through the RAG method defined below, which uses the OpenAI LLM API for single-turn Q&A with token-streaming. To run this method, we instantiate our OpenAI client. Subsequently, we integrate Codex-as-a-Tool and demonstrate its benefits.
Optional: Helper RAG method from prior tutorial (Adding Tool Calls to RAG)
def rag(client, model: str, user_question: str, system_prompt: str, tools: list[dict]) -> str:
    retrieved_context = retrieve_context(user_question)
    question_with_context = form_prompt(user_question, retrieved_context)
    print(f"[internal log] Invoking LLM with prompt\n{question_with_context}\n\n")

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question_with_context},
    ]
    response_messages = stream_response(client=client, messages=messages, model=model, tools=tools)
    return f"\n[RAG response] {response_messages.get('content')}"
os.environ["OPENAI_API_KEY"] = "<YOUR-KEY-HERE>" # Replace with your OpenAI API key
model = "gpt-4o" # which LLM to use
client = OpenAI()
Create Codex Project
To use Codex, first create a Project.
Here we assume some common (question, answer) pairs about the Simple Water Bottle have already been added to a Codex Project. Learn how that was done via our tutorial: Populating Codex.
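If you are populating a Project yourself, the sketch below shows roughly how it can be done programmatically with the cleanlab_codex client. Treat it as a sketch only: the method names used here (create_project, add_entries, create_access_key) and their exact signatures are assumptions that may differ across cleanlab_codex versions, so follow the Populating Codex tutorial for the authoritative steps.

from cleanlab_codex import Client

codex_client = Client()  # assumes your Codex API key is already configured (e.g. via the CODEX_API_KEY environment variable)
project = codex_client.create_project(name="Product FAQs", description="Customer support questions")  # hypothetical project name/description
project.add_entries(
    entries=[
        {"question": "Can I return my Simple Water Bottle?", "answer": "<YOUR-RETURN-POLICY-ANSWER>"},  # placeholder answer provided by an SME
    ]
)
access_key = project.create_access_key("tutorial access key")  # the access key used below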
Our existing Codex Project already contains entries answering common questions about the Simple Water Bottle, including whether it can be returned.
access_key = "<YOUR-PROJECT-ACCESS-KEY>" # Obtain from your Project's settings page: https://codex.cleanlab.ai/
Integrate Codex as an additional tool
Integrating Codex into a RAG app that supports tool calling requires minimal code changes:
- Import Codex and add it into your list of tools.
- Update your system prompt to include instructions for calling Codex, as demonstrated below in system_prompt_with_codex.
After that, call your original RAG pipeline with these updated variables to start experiencing the benefits of Codex!
Note: This tutorial uses a Codex tool description in OpenAI format, provided via the to_openai_tool()
function. For certain non-OpenAI LLMs, you can import the Codex tool description in other provided formats as well, or manually write it yourself if no provided format is available. Check the Codex API Docs for other formats.
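For illustration only, a manually written description would mirror the structure of todays_date_tool_json above. In the sketch below, the tool name and parameter name are assumptions; the output of to_openai_tool() is the source of truth for your version of Codex.

# Sketch of a hand-written Codex tool description in OpenAI format (field values are assumptions)
manual_codex_tool_json = {
    "type": "function",
    "function": {
        "name": "consult_codex",  # should match codex_tool.tool_name in your setup
        "description": "Consults a database of SME-provided answers for questions the Context cannot answer.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {
                    "type": "string",
                    "description": "The user's question, rephrased as a self-contained query if needed.",
                }
            },
            "required": ["question"],
        },
    },
}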
from cleanlab_codex import CodexTool
codex_tool = CodexTool.from_access_key(access_key=access_key, fallback_answer=fallback_answer)
codex_tool_openai = codex_tool.to_openai_tool()
globals()[codex_tool.tool_name] = codex_tool.query # Optional step for convenience: make function to call the tool globally accessible
tools_with_codex = tools_without_codex + [codex_tool_openai] # Add Codex to the list of tools
# Update the RAG system prompt with instructions for handling Codex (adjust based on your needs)
system_prompt_with_codex = f"""
You are a helpful assistant designed to help users navigate a complex set of documents for question-answering tasks. Answer the user's Question based on the following possibly relevant Context and previous chat history using the tools provided if necessary. Follow these rules in order:
1. NEVER use phrases like 'according to the context,' 'as the context states,' etc. Treat the Context as your own knowledge, not something you are referencing.
2. Use only information from the provided Context. Your purpose is to provide information based on the Context, not to offer original advice.
3. Give a clear, short, and accurate answer. Explain complex terms if needed.
4. If the answer to the question requires today's date, use the following tool: get_todays_date. Return the date in the exact format the tool provides it.
5. If you remain unsure how to answer the user query, then use the {codex_tool.tool_name} tool to search for the answer. Always call {codex_tool.tool_name} whenever the provided Context does not answer the user query. Do not call {codex_tool.tool_name} if you already know the right answer or the necessary information is in the provided Context. Your query to {codex_tool.tool_name} should match the user's original query, unless minor clarification is needed to form a self-contained query. After you have called {codex_tool.tool_name}, determine whether its answer seems helpful, and if so, respond with this answer to the user. If the answer from {codex_tool.tool_name} does not seem helpful, then simply ignore it.
6. If you remain unsure how to answer the Question (even after using the {codex_tool.tool_name} tool and considering the provided Context), then only respond with: "{fallback_answer}".
"""
RAG with Codex in action
Integrating Codex as-a-Tool allows your RAG app to answer more questions than it was originally capable of.
Example 1
Let’s ask a question to our original RAG app (before Codex was integrated).
user_question = "Can I return my simple water bottle?"
response = rag(client, model=model, user_question=user_question,
               system_prompt=system_prompt_without_codex, tools=tools_without_codex
)
print(response)
The original RAG app is unable to answer, in this case because the required information is not in its Knowledge Base.
Let’s ask the same question to our RAG app with Codex added as an additional tool. Note that we use the updated system prompt and tool list when Codex is integrated in the RAG app.
response = rag(client, model=model, user_question=user_question,
               system_prompt=system_prompt_with_codex, tools=tools_with_codex
)
print(response)
As you can see, integrating Codex enables your RAG app to answer questions it originally struggled with, as long as a similar question was already answered in the corresponding Codex Project.
Example 2
Let’s ask another question to our RAG app with Codex integrated.
user_question = "How can I order the Simple Water Bottle in bulk?"
response = rag(client, model=model, user_question=user_question,
               system_prompt=system_prompt_with_codex, tools=tools_with_codex
)
print(response)
Our RAG app is unable to answer this question because there is no relevant information in its Knowledge Base, nor has a similar question been answered in the Codex Project (see the contents of the Codex Project above).
Codex automatically recognizes this question could not be answered and logs it into the Project, where it awaits an answer from an SME. Navigate to your Codex Project in the Web App where you (or an SME at your company) can enter the desired answer for this query.
As soon as an answer is provided in Codex, our RAG app will be able to answer all similar questions going forward (as seen for the previous query).
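If you want to inspect what the Codex tool itself returns for such an unanswered question, you can call it directly outside of the LLM, the same way the tool handler above invokes it. Depending on your cleanlab_codex version, this may return the fallback answer you configured (or None) until an SME provides an answer.

# Optional: query the Codex tool directly to inspect its raw output for this question
codex_answer = codex_tool.query(question=user_question)
print(codex_answer)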
Example 3
Let’s ask another query to our RAG app with Codex integrated. This is a query the original RAG app was able to correctly answer without Codex (since the relevant information exists in the Knowledge Base).
user_question = "How big is the water bottle?"
response = rag(client, model=model, user_question=user_question,
               system_prompt=system_prompt_with_codex, tools=tools_with_codex
)
print(response)
We see that the RAG app with Codex integrated is still able to correctly answer this query. Integrating Codex has no negative effect on questions your original RAG app could answer.
Next Steps
Now that Codex is integrated with your RAG app, you and SMEs can open the Codex Project and answer questions logged there to continuously improve your AI.
Adding Codex only improves your RAG app. As seen here, integrating Codex into your RAG app requires minimal extra code. Once integrated, the Codex Project automatically logs all user queries that your original RAG app handles poorly. Using a simple web interface, SMEs at your company can answer the highest priority questions in the Codex Project. As soon as an answer is entered in Codex, your RAG app will be able to properly handle all similar questions encountered in the future.
Codex is the fastest way for nontechnical SMEs to directly improve your RAG app. As the Developer, you simply integrate Codex once, and from then on, SMEs can continuously improve how your AI handles common user queries without needing your help. Codex works with any RAG architecture, so Developers can independently improve the RAG system in other ways with their new free time.
This tutorial demonstrated a single-turn Q&A app, but you can easily extend this code into a conversational app (multi-turn chat).
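For example, a minimal multi-turn loop could reuse the helpers above by carrying the message history across turns. This is only a sketch (it retrieves fresh context and re-forms the prompt on every turn, which you may want to refine for your application):

# Sketch of a multi-turn chat loop reusing the variables and helpers defined earlier
messages = [{"role": "system", "content": system_prompt_with_codex}]
while True:
    user_question = input("You: ")
    if user_question.lower() in {"quit", "exit"}:
        break
    retrieved_context = retrieve_context(user_question)
    messages.append({"role": "user", "content": form_prompt(user_question, retrieved_context)})
    response_message = stream_response(client=client, messages=messages, model=model, tools=tools_with_codex)
    messages.append(response_message)
    print(f"Assistant: {response_message.get('content')}")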
Need help, more capabilities, or other deployment options? Check the FAQ or email us at: support@cleanlab.ai