Types of Codex Integrations
Options for how to connect Codex with your existing RAG application. Our Integration Tutorials show how to actually implement each of these options with different RAG frameworks (minimal code required).
Overview
Here is high-level pseudocode for our recommended Codex integrations (assumes you are familiar with RAG):
Standard RAG system:
```
context = KnowledgeBase.retrieve(query)
response = LLM.generate(prompt=query + context)
```
RAG system integrating Codex as-a-Tool:
```
context = KnowledgeBase.retrieve(query)
response_or_toolcall = LLM.generate(prompt=query + context, tools=[..., codex, ...])
```
RAG system integrating Codex as-a-Backup:
```
context = KnowledgeBase.retrieve(query)
response = LLM.generate(prompt=query + context)
if Codex.is_bad_response(response, ...):
    desired_response = Codex.get_answer(query)
    if desired_response is not None:
        response = desired_response
```
Choosing an Integration
Consider integrating Codex as-a-Tool if:
- Using a black-box RAG framework whose internals are hard to modify, and it supports tool calling.
- IDK responses / knowledge gaps are a major issue for your RAG app.
Consider integrating Codex as-a-Backup if:
- Your RAG app is not set up to support tool calling.
- You want Codex to fix not only IDK responses, but also incorrect responses from your current RAG app.
Other Considerations:
- Codex as-a-Backup may be easier to implement for single-turn Q&A vs. conversational applications.
- To control when / how often your RAG app relies on Codex: as-a-Tool integrations depend on system prompting, whereas as-a-Backup integrations give you programmatic control.
Other integration options are possible (e.g. integrating Codex as-a-Cache that is checked first before queries are passed to the RAG system). Contact us to learn more: support@cleanlab.ai
Codex as a Tool
Codex is integrated as one of the tools your LLM can call. Many LLMs and RAG applications support Tool Calling. The basic idea of the Codex as-a-Tool integration is to let your RAG app ‘ask for help’ when it is unsure how to answer a user query. This integration option is thus best for fixing IDK responses and knowledge gaps in your RAG app.
The best way to integrate Codex as-a-Tool can depend on your RAG architecture. Consider which RAG Architecture below is most relevant for you.
What are Tool Calls?
Tool Calling (a.k.a. Function Calling) makes LLMs more powerful by allowing the model to access external APIs like databases, calculators, or the internet. Adding tool calls enables broader workflows: instead of training the model to handle everything, let it delegate specific tasks to tools. For example, call a weather API instead of asking the model to guess the weather.
Advantages of Tool Calling include:
- Accuracy: Let tools handle tasks they’re specialized for, while the LLM focuses on reasoning and language tasks.
- Composability: It's easier to add or update tools than to retrain the entire model.
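For instance, here is a minimal sketch of declaring a weather tool to an OpenAI-style chat API. The `get_weather` tool schema below is hypothetical; any tool-calling LLM API follows a similar pattern:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical weather tool schema: lets the LLM delegate weather lookups
# to an external API instead of guessing.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model chose to call the tool, the response carries the call
# (tool name + JSON arguments) instead of a final text answer.
tool_calls = response.choices[0].message.tool_calls
```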
RAG Architectures - Retrieval first (classic RAG)
In classic RAG (such as LlamaIndex), the retrieval step always happens before tool calling. When a user asks a query, the RAG system retrieves relevant context from its Knowledge Base and then provides the query + context to an LLM. When generating a response for the user, this LLM can choose to make Tool Calls.
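In code, this flow with Codex added as one of the available tools might look roughly like the following sketch. The `retrieve_context` and `codex_get_answer` helpers are hypothetical placeholders for your actual retriever and Codex client:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical placeholders -- substitute your actual retriever and Codex client.
def retrieve_context(query: str) -> str: ...
def codex_get_answer(question: str) -> str | None: ...

codex_tool = {
    "type": "function",
    "function": {
        "name": "consult_codex",
        "description": "Ask for help when the provided context is insufficient to answer the question.",
        "parameters": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    },
}

def answer(query: str) -> str:
    context = retrieve_context(query)  # retrieval always happens first
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
        tools=[codex_tool],  # Codex exposed as one of the available tools
    )
    message = response.choices[0].message
    if message.tool_calls:  # the LLM decided the context was insufficient
        question = json.loads(message.tool_calls[0].function.arguments)["question"]
        codex_answer = codex_get_answer(question)
        if codex_answer is not None:
            return codex_answer  # an expert answer exists in Codex
        return "I don't know."   # the question is now logged in Codex for SMEs
    return message.content
```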
For a Codex as-a-Tool integration with such a RAG architecture, we can guarantee the system has considered the available knowledge before it decides to consult Codex for answers. This ensures that all Questions appearing in Codex are those for which the LLM did not think it could produce a good answer. The overall architecture with Codex integrated looks like this:
RAG Architectures - Retrieval as a tool (agentic RAG)
In agentic RAG (such as OpenAI Assistants), retrieval is itself treated as a tool. When a user asks a query, the RAG system decides whether to respond directly, run retrieval to search its Knowledge Base, or call other tools.
For a Codex as-a-Tool integration with such a RAG architecture, the LLM must decide between performing retrieval or consulting Codex when it does not know the answer. If you allow the LLM to consult Codex without first performing retrieval, then Codex will fill up with Questions that could’ve been answered by your Knowledge Base. To avoid this, modify your system prompt to instruct the Agent to always call the retrieval tool first, and only consult Codex afterward if the answer is still unclear.
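For instance, such an instruction might look like the following (a hypothetical prompt; adjust the tool names to match your agent's setup):

```python
# Hypothetical system prompt enforcing retrieval-before-Codex ordering;
# "search_knowledge_base" and "consult_codex" are placeholder tool names.
SYSTEM_PROMPT = (
    "For every user question, ALWAYS call the search_knowledge_base tool first. "
    "Only if the retrieved context does not contain the answer, call the "
    "consult_codex tool. Never call consult_codex before search_knowledge_base."
)
```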
The overall agentic RAG architecture with Codex integrated looks like this:
Codex as a Backup
When integrating Codex as-a-Tool, it can be nontrivial to control when Codex is called. Using a Codex as-a-Backup integration, you can control this programmatically.
In this integration, your RAG app first produces the response it would normally return to the user (without Codex). This response is then assessed via real-time Eval methods provided in Codex. If the response is detected to be problematic, Codex is consulted for an answer to the user’s query. If Codex is able to answer, the user receives this answer instead of the original RAG response.
The overall architecture of a Codex as-a-Backup integration looks like this:
The Codex API provides methods to automatically detect RAG responses that are untrustworthy (potentially incorrect) or unhelpful (such as saying ‘I don’t know’).
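Putting these pieces together, here is a minimal sketch of the as-a-Backup control flow. The helper functions are hypothetical placeholders; the Codex SDK's actual detection and query methods may be named differently:

```python
# Hypothetical placeholders -- substitute your retriever, LLM client, and the
# Codex SDK's actual detection/query methods.
def retrieve_context(query: str) -> str: ...
def generate_response(query: str, context: str) -> str: ...
def is_bad_response(response: str, *, query: str, context: str) -> bool: ...
def codex_get_answer(question: str) -> str | None: ...

def rag_with_codex_backup(query: str) -> str:
    context = retrieve_context(query)             # your existing retrieval step
    response = generate_response(query, context)  # your existing LLM call

    # Programmatic control point: only consult Codex when the original response
    # is detected as untrustworthy or unhelpful (e.g. an IDK response).
    if is_bad_response(response, query=query, context=context):
        expert_answer = codex_get_answer(query)   # unanswered queries get logged in Codex
        if expert_answer is not None:
            return expert_answer  # serve the Codex answer instead
    return response  # otherwise keep the original RAG response
```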