Integrate Codex as-a-Tool with LlamaIndex

Run in Google Colab

This tutorial assumes you have a LlamaIndex RAG app that supports tool calls. Learn how to add tool calls to any LlamaIndex application via our tutorial: RAG With Tool Calls in LlamaIndex.

Once you have a RAG app that supports tool calling, adding Codex as an additional Tool takes minimal effort but guarantees better responses from your AI application.

RAG Workflow

If you prefer to integrate Codex without adding tool calls to your application, check out our other integrations.

Let’s first install packages required for this tutorial.

%pip install --upgrade llama-index llama-index-llms-openai  # we used package-versions 0.12.10 and 0.4.1

%pip install cleanlab_codex  # we used package-version 1.0.0

import os
from llama_index.llms.openai import OpenAI

os.environ["OPENAI_API_KEY"] = "<YOU-KEY-HERE>"  # Replace with your OpenAI API key
model =  "gpt-4o"  # model used by RAG system (has to support tool calling)

llm = OpenAI(model=model)  # API key can be set via OPENAI_API_KEY environment variable or .env file

Optional: Helper methods for basic RAG from prior tutorial (Adding Tool Calls to RAG)

from datetime import datetime

def get_todays_date(date_format: str) -> str:
  "A tool that returns today's date in the date format requested. Options for date_format parameter are: '%Y-%m-%d', '%d', '%m', '%Y'."
  datetime_str = datetime.now().strftime(date_format)
  return datetime_str

fallback_answer = "Based on the available information, I cannot provide a complete answer to this question."

system_message_without_codex = f"""You are a helpful assistant designed to help users navigate a complex set of documents for question-answering tasks. Answer the user's Question based on the following possibly relevant Context and previous chat history using the tools provided if necessary. Follow these rules in order:
    1. NEVER use phrases like "according to the context", "as the context states", etc. Treat the Context as your own knowledge, not something you are referencing.
    2. Use only information from the provided Context.
    3. Give a clear, short, and accurate Answer. Explain complex terms if needed.
    4. If the answer to the question requires today's date, use the following tool: get_todays_date. Return the date in the exact format the tool provides it.
    5. If the Context doesn't adequately address the Question or you are unsure how to answer the Question, say: "{fallback_answer}" only, nothing else.

    Remember, your purpose is to provide information based on the Context, not to offer original advice.
"""

from llama_index.core import VectorStoreIndex, Document
from llama_index.core.llms import ChatMessage, ChatResponse
from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.tools import FunctionTool

# Ingest documents into a vector database, and set up a retriever
documents = [
    Document(text="Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) \n\nA water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish."),
    Document(text="Price: $24.99 \nDimensions: 10 inches height x 4 inches width"),
]
index = VectorStoreIndex.from_documents(documents) # Set up your own doc-store and vector database here
retriever = index.as_retriever(similarity_top_k=5)

class RAGApp:
    def __init__(self,
        llm: FunctionCallingLLM,
        tools: list[FunctionTool],
        retriever: BaseRetriever,
        messages: list[ChatMessage] | None = None,
    ):
        self.llm = llm
        self.tools = tools
        self._tools_map = {tool.metadata.name: tool for tool in tools}
        self.retriever = retriever
        self.chat_history = messages or []

    def __call__(self, user_query: str) -> ChatResponse:
        """Process user input: retrieve context to enrich query, get response (possibly using tools), update conversation."""
        self.chat_history.append(ChatMessage(role="user", content=user_query))
        context = self._retrieve_context(user_query)
        query_with_context = self._form_prompt(user_question=user_query, retrieved_context=context)
        response = self.handle_response_and_tools(query_with_context)
        self.chat_history.append(response.message)
        return response
    
    def _form_prompt(self, user_question: str, retrieved_context: str) -> str:
        question_with_context = f"Context:\n{retrieved_context}\n\nUser Question:\n{user_question}"
        # Below step is just formatting the final prompt for readability in the tutorial
        indented_question_with_context = "\n".join(f"  {line}" for line in question_with_context.splitlines())
        return indented_question_with_context

    def _retrieve_context(self, user_query: str) -> str:
        """Retrieves and formats context from documents matching the user query."""
        context_strings = [node.text for node in self.retriever.retrieve(user_query)]
        return "\n".join(context_strings)  # Basic context formatting for demo-purposes

    def handle_response_and_tools(self, query: str) -> ChatResponse:
        """Manages tool-calling conversation loop using transient message history.
        
        Creates temporary chat history to track tool interactions without affecting main conversation.
        Loops through tool calls and responses until completion, then returns final response to user.
        """
        # Create a temporary chat history for tool interactions
        temp_chat_history = self.chat_history.copy()
        print(f"[internal log] Invoking LLM text\n{query}\n\n")

        response = self.llm.chat_with_tools(
            tools=self.tools, 
            user_msg=query, 
            chat_history=temp_chat_history[:-1],
        )
        tool_calls = self.llm.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )

        while tool_calls:
            temp_chat_history.append(response.message)
            # If any tools are called, run with the tools until we hit the Alpha tool and return
            for tool_call in tool_calls:
                print(f'[internal log] Called {tool_call.tool_name} tool, with arguments: {tool_call.tool_kwargs}')
                tool = self._tools_map[tool_call.tool_name]
                tool_kwargs = tool_call.tool_kwargs
                tool_output = tool(**tool_kwargs)
                temp_chat_history.append(ChatMessage(role="tool", content=str(tool_output), additional_kwargs={"tool_call_id": tool_call.tool_id}))
                
                response = self.llm.chat_with_tools([tool], chat_history=temp_chat_history)
                print(f'[internal log] Tool response: {response.message.content}')
                tool_calls = self.llm.get_tool_calls_from_response(
                    response, error_on_no_tool_call=False
                )
        return response

Example: Customer Service for a New Product

Let’s revisit our RAG app built in the RAG with Tool Calls in LlamaIndex tutorial, which has the option to call a get_todays_date() tool. This example represents a customer support / e-commerce use-case where the Knowledge Base contains product listings like the following:

Simple water bottle product listing

The details of this example RAG app are unimportant if you are already familiar with RAG and Tool Calling, otherwise refer to the RAG with Tool Calls in LlamaIndex tutorial. That tutorial walks through the RAG app defined above. Subsequently, we integrate Codex-as-a-Tool and demonstrate its benefits.

Create Codex Project

To use Codex, first create a Project.

Here we assume some common (question, answer) pairs about the Simple Water Bottle have already been added to a Codex Project. To learn how that was done, see our tutorial: Populating Codex.

Our existing Codex Project contains the following entries:

Codex Knowledge Base Example

access_key = "<YOUR-PROJECT-ACCESS-KEY>"  # Obtain from your Project's settings page: https://codex.cleanlab.ai/

Add Codex as an additional tool

Integrating Codex into a RAG app that supports tool calling requires minimal code changes:

Import Codex and add it into your list of tools.
Update your system prompt to include instructions for calling Codex, as demonstrated below in: system_prompt_with_codex.

After that, call your original RAG pipeline with these updated variables to start experiencing the benefits of Codex!

# 1: Import CodexTool
from cleanlab_codex import CodexTool

codex_tool = CodexTool.from_access_key(access_key=access_key, fallback_answer=fallback_answer)
codex_tool_llama = codex_tool.to_llamaindex_tool()

globals()[codex_tool.tool_name] = codex_tool.query # Optional step for convenience: make function to call the tool globally accessible

# 2: Update the RAG system prompt with instructions for handling Codex (adjust based on your needs)
system_message_with_codex = f"""You are a helpful assistant designed to help users navigate a complex set of documents for question-answering tasks. Answer the user's Question based on the following possibly relevant Context and previous chat history using the tools provided if necessary. Follow these rules in order:
    1. NEVER use phrases like "according to the context", "as the context states", etc. Treat the Context as your own knowledge, not something you are referencing.
    2. Use only information from the provided Context.
    3. Give a clear, short, and accurate Answer. Explain complex terms if needed.
    4. If the answer to the question requires today's date, use the following tool: get_todays_date. Return the date in the exact format the tool provides it.
    5. When the Context does not answer the user's Question, call the `{codex_tool.tool_name}` tool.
        - Always use `{codex_tool.tool_name}` if the provided Context lacks the necessary information.
        - Your query to `{codex_tool.tool_name}` should closely match the user’s original Question, with only minor clarifications if needed.
        - Evaluate the response from `{codex_tool.tool_name}`. If the response is helpful, use it to answer the user’s Question. If the response is not helpful, ignore it.
    6. If you still cannot confidently answer the user's Question (even after using `{codex_tool.tool_name}` and other tools), say: "{fallback_answer}".
    
    Remember, your purpose is to provide information based on the Context and make effective use of `{codex_tool.tool_name}` when necessary, not to offer original advice.
"""

# 3: Initialize RAGApp with the CodexTool
llm = OpenAI(model=model)
chat_history = [
    ChatMessage(role="system", content=system_message_with_codex),  # Add Codex instructions here
]
tools = [
    FunctionTool.from_defaults(fn=get_todays_date),
    codex_tool_llama  # Add Codex to list of tools
]
rag_with_codex = RAGApp(llm=llm, tools=tools, retriever=retriever, messages=chat_history)

Optional: Initialize RAG App without CodexTool

chat_history = [
    ChatMessage(role="system", content=system_message_without_codex),  # Add Codex instructions here
]
tools = [
    FunctionTool.from_defaults(fn=get_todays_date),
    codex_tool_llama # Add Codex to list of tools
]
rag_without_codex = RAGApp(llm=llm, tools=tools, retriever=retriever, messages=chat_history)

RAG with Codex in action

Integrating Codex as-a-Tool allows your RAG app to answer more questions than it was originally capable of.

Example 1

Let’s ask a question to our original RAG app (before Codex was integrated).

response = rag_without_codex("Can I return my Simple Water Bottle?")
print(f'\n[RAG response] {response.message.content}')

[internal log] Invoking LLM text
Context:
Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) 

A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.
Price: $24.99 
Dimensions: 10 inches height x 4 inches width

User Question:
Can I return my Simple Water Bottle?



[RAG response] Based on the available information, I cannot provide a complete answer to this question.

The original RAG app is unable to answer, in this case because the required information is not in its Knowledge Base.

Let’s ask the same question to our RAG app with Codex added as an additional tool. Note that we use the updated system prompt and tool list when Codex is integrated in the RAG app.

response = rag_with_codex("Can I return my Simple Water Bottle?")
print(f'\n[RAG response] {response.message.content}')

[internal log] Invoking LLM text
Context:
Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) 

A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.
Price: $24.99 
Dimensions: 10 inches height x 4 inches width

User Question:
Can I return my Simple Water Bottle?


[internal log] Called consult_codex tool, with arguments: {'question': 'Can I return my Simple Water Bottle?'}
[internal log] Tool response: You can return your Simple Water Bottle within 30 days for a full refund. To start the return process, contact the support team.

[RAG response] You can return your Simple Water Bottle within 30 days for a full refund. To start the return process, contact the support team.

As you see, integrating Codex enables your RAG app to answer questions it originally strugged with, as long as a similar question was already answered in the corresponding Codex Project.

Example 2

Let’s ask another question to our RAG app with Codex integrated.

response = rag_with_codex("How exactly can I order the Simple Water Bottle in bulk?")
print(f'\n[RAG response] {response.message.content}')

[internal log] Invoking LLM text
Context:
Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) 

A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.
Price: $24.99 
Dimensions: 10 inches height x 4 inches width

User Question:
How exactly can I order the Simple Water Bottle in bulk?


[internal log] Called consult_codex tool, with arguments: {'question': 'How can I order the Simple Water Bottle in bulk?'}
[internal log] Tool response: Based on the available information, I cannot provide a complete answer to this question.

[RAG response] Based on the available information, I cannot provide a complete answer to this question.

Our RAG app is unable to answer this question because there is no relevant information in its Knowledge Base, nor has a similar question been answered in the Codex Project (see the contents of the Codex Project above).

Codex automatically recognizes this question could not be answered and logs it into the Project where it awaits an answer from a SME.

Codex Project with asked question that has not been answered yet

As soon as an answer is provided in Codex, our RAG app will be able to answer all similar questions going forward (as seen for the previous query).

Example 3

Let’s ask another query to our RAG app with Codex integrated. This is a query the original RAG app was able to correctly answer without Codex (since the relevant information exists in the Knowledge Base).

response = rag_with_codex("How big is the water bottle?")
print(f'\n[RAG response] {response.message.content}')

[internal log] Invoking LLM text
Context:
Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) 

A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.
Price: $24.99 
Dimensions: 10 inches height x 4 inches width

User Question:
How big is the water bottle?



[RAG response] The Simple Water Bottle is 10 inches in height and 4 inches in width.

We see that the RAG app with Codex integrated is still able to correctly answer this query. Integrating Codex has no negative effect on questions your original RAG app could answer.

Next Steps

Now that Codex is integrated with your RAG app, you and SMEs can open the Codex Project and answer questions logged there to continuously improve your AI.

Adding Codex only improves your RAG app. Once integrated, Codex automatically logs all user queries that your original RAG app handles poorly. Using a simple web interface, SMEs at your company can answer the highest priority questions in the Codex Project. As soon as an answer is entered in Codex, your RAG app will be able to properly handle all similar questions encountered in the future.

Codex is the fastest way for nontechnical SMEs to directly improve your AI application. As the Developer, you simply integrate Codex once, and from then on, SMEs can continuously improve how your AI handles common user queries without needing your help.

Need help, more capabilities, or other deployment options? Check the FAQ or email us at: support@cleanlab.ai

Example: Customer Service for a New Product​

Create Codex Project​

Add Codex as an additional tool​

RAG with Codex in action​

Example 1​

Example 2​

Example 3​

Next Steps​