RAG with Tool Calls in LangChain

Run in Google Colab

This notebook covers the basics of building an agentic RAG application with LangChain (the same RAG app used in our Integrate Codex as-a-Tool with LangChain tutorial).

The LangChain framework implements agentic RAG, which treats Retrieval as a tool that can be called (rather than as a step hardcoded into every user interaction, as is done in standard RAG). For standard RAG (where retrieval is a hardcoded step), refer to our Adding Tool Calls to RAG tutorial.

Here’s a typical architecture for agentic RAG apps with tool calling:

RAG Workflow

Let’s first install required packages for this tutorial.

%pip install langchain-text-splitters langchain-community langgraph langchain-openai  # we used package-versions 0.3.5, 0.3.16, 1.59.7, 0.3.2
import os

os.environ["OPENAI_API_KEY"] = "<YOUR-KEY-HERE>" # Replace with your OpenAI API key
generation_model = "gpt-4o" # model used by RAG system (has to support tool calling)
embedding_model = "text-embedding-3-small" # any LangChain embeddings model

Example: Customer Service for a New Product

Consider a customer support / e-commerce RAG use-case where the Knowledge Base contains product listings like the following:

Image of a beautiful simple water bottle that is definitely worth more than the asking price

To keep this example minimal, we'll use a simple in-memory vector store containing a single document. The document holds the Context (product information) for the product above. You can swap this setup for any LangChain embeddings model and vector store.

Optional: Initialize vector store + add document

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize vector store
embeddings = OpenAIEmbeddings(model=embedding_model)
vector_store = InMemoryVectorStore(embeddings)

# Sample document to demonstrate Codex integration
product_page_content = """Simple Water Bottle - Amber (limited edition launched Jan 1st 2025)
A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.
Price: $24.99 \nDimensions: 10 inches height x 4 inches width"""
documents = [
    Document(
        id="simple_water_bottle.txt",
        page_content=product_page_content,
    ),
]

# Standard LangChain text splitting - use any splitter that fits your docs
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(documents)

# Add documents to your chosen vector store
_ = vector_store.add_documents(documents=all_splits)
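
Optional: to confirm the document was indexed, you can run a quick similarity search against the vector store. This uses the same similarity_search method our retriever tool will rely on later; the query string below is just an illustrative example.

# Quick sanity check that the product document is retrievable
results = vector_store.similarity_search("water bottle dimensions", k=1)
print(results[0].page_content)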

Create Chat App with Tool Calls

We now define a tool-calling RAG app.

from langchain_core.tools import BaseTool
from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
    BaseMessage,
)
from typing import List, Optional

class RAGApp:
    def __init__(
        self,
        llm: ChatOpenAI,
        tools: List[BaseTool],
        retriever: BaseTool,
        messages: Optional[List[BaseMessage]] = None
    ):
        """Initialize RAG application with provided components."""
        _tools = [retriever] + tools
        self.tools = {tool.name: tool for tool in _tools}
        self.llm = llm.bind_tools(_tools)
        self.messages: List[BaseMessage] = messages or []

    def chat(self, user_query: str) -> str:
        """Process user input and handle any necessary tool calls."""
        # Add user query to messages
        self.messages.append(HumanMessage(content=user_query))

        # Get initial response (may include tool calls)
        print(f"[internal log] Invoking LLM text\n{user_query}\n\n")
        response = self.llm.invoke(self.messages)
        self.messages.append(response)

        # Handle any tool calls
        while response.tool_calls:
            # Process each tool call
            for tool_call in response.tool_calls:
                # Get the appropriate tool
                tool = self.tools[tool_call["name"].lower()]

                # Call the tool and get result
                tool_name = tool_call["name"]
                tool_args = tool_call["args"]
                print(f"[internal log] Calling tool: {tool_name} with args: {tool_args}")
                tool_result = tool.invoke(tool_call)
                print(f"[internal log] Tool response: {str(tool_result)}")
                self.messages.append(tool_result)

            # Get next response after tool calls
            response = self.llm.invoke(self.messages)
            self.messages.append(response)

        return response.content
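
When the LLM decides to use a tool, response.tool_calls is a list of dictionaries describing each requested call. The loop above looks up the matching tool by name and passes the entire tool call to tool.invoke(), which returns a tool message that we append to the conversation before asking the LLM again. One entry in response.tool_calls looks roughly like the following sketch (the id value is made up for illustration):

# Illustrative shape of one entry in response.tool_calls (the "id" here is made up)
# {
#     "name": "retrieve",
#     "args": {"query": "water bottle dimensions"},
#     "id": "call_abc123",
#     "type": "tool_call",
# }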

Example tool: get_todays_date

Let’s define an example tool get_todays_date() that our RAG app can rely on. With LangChain, you don't need to hand-write function schemas, which makes adding tools much easier: write a normal Python function and the @tool decorator automatically reads its name and docstring, understands its parameters and type hints, and creates the LLM-friendly format for you.

from langchain_core.tools import tool
from datetime import datetime

@tool
def get_todays_date(date_format: str) -> str:
    "A tool that returns today's date in the date format requested. Options for date_format parameter are: '%Y-%m-%d', '%d', '%m', '%Y'."
    datetime_str = datetime.now().strftime(date_format)
    return datetime_str
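
To see what the @tool decorator generated, you can optionally inspect the tool's name, description, and argument schema, and invoke it directly. This is just a sanity check using standard LangChain tool attributes, not part of the RAG app itself.

# Optional: inspect the auto-generated tool schema and try the tool directly
print(get_todays_date.name)         # tool name shown to the LLM
print(get_todays_date.description)  # docstring shown to the LLM
print(get_todays_date.args)         # parameter schema inferred from the type hints
print(get_todays_date.invoke({"date_format": "%Y-%m-%d"}))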

Define Retriever Tool

In addition to our example tool, we need to explicitly provide the system with a retriever tool. It searches the vector store for relevant Context, which is required if we want our system to do any context retrieval. Let’s define it here.

@tool
def retrieve(query: str) -> str:
    """Search through available documents to find relevant information."""
    docs = vector_store.similarity_search(query, k=2)
    return "\n\n".join(doc.page_content for doc in docs)
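
As with the date tool, you can optionally invoke the retriever directly to verify it returns the product document before wiring it into the app (the query string is just an example):

# Optional: call the retriever tool directly to confirm it finds the product document
print(retrieve.invoke({"query": "water bottle dimensions"}))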

Update our LLM system prompt with tool call instructions

For the best performance, add instructions on when to use each tool into the system prompt that governs your LLM. Below, we simply added Steps 3 and 4 to a list of instructions that otherwise represents a typical RAG system prompt. In most RAG apps, you instruct the LLM on what fallback_answer to respond with when it does not know how to answer a user's query. Such fallback instructions help reduce hallucinations and give you more precise control over the AI.

fallback_answer = "Based on the available information, I cannot provide a complete answer to this question."

system_message = f"""
Answer the user's Question based on the following possibly relevant Context. Follow these rules:
1. Never use phrases like "according to the context," "as the context states," etc. Treat the Context as your own knowledge, not something you are referencing.
2. Give a clear, short, and accurate answer. Explain complex terms if needed.
3. You have access to the retrieve tool, to retrieve relevant information to the query as Context.
4. If the answer to the question requires today's date, use the following tool: get_todays_date.
5. If the Context doesn't adequately address the Question, say: "{fallback_answer}" only, nothing else.

Remember, your purpose is to provide information based on the Context, not to offer original advice.
"""

Initialize our RAG App

Finally, let’s set up our LLM that supports tool calling and initialize our RAG App. Any LangChain-compatible LLM can be used here, as long as it supports tool calling.

llm = ChatOpenAI(model=generation_model)

rag = RAGApp(
    llm=llm,
    tools=[get_todays_date],  # Add your tools here
    retriever=retrieve,
    messages=[SystemMessage(content=system_message)]
)
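
Any other LangChain chat model with tool-calling support could be substituted here. As a minimal sketch (assuming you have installed the corresponding integration package and set its API key), swapping in Anthropic's Claude might look like the commented-out lines below; the model name is illustrative only.

# Hypothetical alternative LLM (requires the langchain-anthropic package and ANTHROPIC_API_KEY):
# from langchain_anthropic import ChatAnthropic
# llm = ChatAnthropic(model="claude-3-5-sonnet-latest")
# rag = RAGApp(llm=llm, tools=[get_todays_date], retriever=retrieve, messages=[SystemMessage(content=system_message)])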

RAG in action

Let’s run our RAG application over different questions commonly asked by users about the Simple Water Bottle in our example.

Scenario 1: RAG can answer the question without tools

response = rag.chat("How big is the water bottle?")
print(f"\n[RAG response] {response}")
[internal log] Invoking LLM text
How big is the water bottle?


[internal log] Calling tool: retrieve with args: {'query': 'water bottle sizes'}
[internal log] Tool response: content='Simple Water Bottle - Amber (limited edition launched Jan 1st 2025)\nA water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.\nPrice: $24.99 \nDimensions: 10 inches height x 4 inches width' name='retrieve' tool_call_id='call_TAgNfEDdX4HP4SZyXThIHmqH'

[RAG response] The water bottle has dimensions of 10 inches in height and 4 inches in width.

Scenario 2: RAG can answer the question (using other tools)

response = rag.chat("Check today's date. Has the limited edition Amber water bottle already launched?")
print(f"\n[RAG response] {response}")
[internal log] Invoking LLM text
Check today's date. Has the limited edition Amber water bottle already launched?


[internal log] Calling tool: get_todays_date with args: {'date_format': '%Y-%m-%d'}
[internal log] Tool response: content='2025-02-19' name='get_todays_date' tool_call_id='call_2H9NxpHMrAa2CEzGzL49VUGu'

[RAG response] Yes, the limited edition Amber water bottle, which launched on January 1st, 2025, has already been released.

Scenario 3: RAG cannot answer the question

response = rag.chat("Can I return my simple water bottle?")
print(f"\n[RAG response] {response}")
[internal log] Invoking LLM text
Can I return my simple water bottle?


[internal log] Calling tool: retrieve with args: {'query': 'return policy for simple water bottle'}
[internal log] Tool response: content='Simple Water Bottle - Amber (limited edition launched Jan 1st 2025)\nA water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish.\nPrice: $24.99 \nDimensions: 10 inches height x 4 inches width' name='retrieve' tool_call_id='call_hB7aamBnybyUKCGjkstAfQmA'

[RAG response] Based on the available information, I cannot provide a complete answer to this question.

Note that the Context does not contain information about the return policy, and the get_todays_date tool would not help either. In this case, we want to return our fallback response to the user.

Next Steps

Adding tool calls to your RAG system expands the capabilities of what your AI can do and the types of questions it can answer.

Once you have a RAG app with tools set up, adding Codex as-a-Tool takes only a few lines of code. Codex enables your RAG app to answer questions it previously could not (like Scenario 3 above). Learn how via our tutorial: Integrate Codex as-a-Tool with LangChain.

Need help? Check the FAQ or email us at: support@cleanlab.ai