
RAG with Tool Calls in AWS Bedrock Knowledge Bases

Run in Google Colab

This tutorial covers the basics of building a conversational RAG application that supports tool calls, via the AWS Bedrock Knowledge Bases and Converse APIs. Here we demonstrate how to build the specific RAG app used in our Integrate Codex as-a-Tool with AWS Bedrock Knowledge Bases tutorial. Remember that Codex works with any RAG app, so you can easily translate these ideas to more complex RAG pipelines.

Here’s a typical architecture for RAG apps with tool calling:

RAG Workflow
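
Before diving into the implementation, here is a minimal pseudocode sketch of this workflow. All names below are placeholders for illustration; the concrete implementations using the Bedrock APIs follow later in this tutorial.

def rag_with_tools_sketch(user_query):
    # 1. Retrieve relevant context from the Knowledge Base (Retrieve API)
    context = retrieve_context(user_query)
    # 2. Ask the LLM to answer using that context, with tools available (Converse API)
    response = llm_generate(user_query, context)
    # 3. While the LLM requests a tool, run it and feed the result back
    while response_is_tool_call(response):
        tool_result = run_tool(response)
        response = llm_generate_with_tool_result(tool_result)
    # 4. Return the final text answer
    return response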

Let’s first install packages required for this tutorial and set up required AWS configurations.

%pip install -U boto3  # we used package-version 1.36.0

Optional: Set up AWS configurations


import os
import boto3
from botocore.client import Config

os.environ["AWS_ACCESS_KEY_ID"] = (
"<YOUR_AWS_ACCESS_KEY_ID>" # Your permament access key (not session access key)
)
os.environ["AWS_SECRET_ACCESS_KEY"] = (
"<YOUR_AWS_SECRET_ACCESS_KEY>" # Your permament secret access key (not session secret access key)
)
os.environ["MFA_DEVICE_ARN"] = (
"<YOUR_MFA_DEVICE_ARN>" # If your organization requires MFA, find this in AWS Console under: settings -> security credentials -> your mfa device
)
os.environ["AWS_REGION"] = "us-east-1" # Specify your AWS region

# Load environment variables
aws_access_key_id = os.getenv("AWS_ACCESS_KEY_ID")
aws_secret_access_key = os.getenv("AWS_SECRET_ACCESS_KEY")
region_name = os.getenv("AWS_REGION", "us-east-1") # Default to 'us-east-1' if not set
mfa_serial_number = os.getenv("MFA_DEVICE_ARN")

# Ensure required environment variables are set
if not all([aws_access_key_id, aws_secret_access_key, mfa_serial_number]):
    raise EnvironmentError(
        "Missing required environment variables. Ensure AWS_ACCESS_KEY_ID, "
        "AWS_SECRET_ACCESS_KEY, and MFA_DEVICE_ARN are set."
    )

# Enter MFA code in case your AWS organization requires it
mfa_token_code = input("Enter your MFA code: ")
print("MFA code entered: ", mfa_token_code)

sts_client = boto3.client(
    "sts",
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name,
)

try:
    # Request temporary credentials
    response = sts_client.get_session_token(
        DurationSeconds=3600 * 24,  # Valid for 24 hours
        SerialNumber=mfa_serial_number,
        TokenCode=mfa_token_code,
    )

    temp_credentials = response["Credentials"]
    temp_access_key = temp_credentials["AccessKeyId"]
    temp_secret_key = temp_credentials["SecretAccessKey"]
    temp_session_token = temp_credentials["SessionToken"]

    # Create a Bedrock Agent Runtime client
    client = boto3.client(
        "bedrock-agent-runtime",
        aws_access_key_id=temp_access_key,
        aws_secret_access_key=temp_secret_key,
        aws_session_token=temp_session_token,
        region_name=region_name,
    )
    print("Bedrock client successfully created.")
except Exception as e:
    print(f"Error creating Bedrock client: {e}")

Initialize Bedrock retrieval and generation clients.

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})

BEDROCK_RETRIEVE_CLIENT = boto3.client(
    "bedrock-agent-runtime",
    config=bedrock_config,
    aws_access_key_id=temp_access_key,
    aws_secret_access_key=temp_secret_key,
    aws_session_token=temp_session_token,
    region_name=region_name,
)

BEDROCK_GENERATION_CLIENT = boto3.client(
    service_name='bedrock-runtime',
    aws_access_key_id=temp_access_key,
    aws_secret_access_key=temp_secret_key,
    aws_session_token=temp_session_token,
    region_name=region_name,
)

Example RAG App: Product Customer Support

Consider a customer support / e-commerce RAG use-case where the Knowledge Base contains product listings like the following:

Simple water bottle product listing

Creating a Knowledge Base

To keep our example simple, we upload the product description to AWS S3 as a single file: simple_water_bottle.txt. This is the sole file our Knowledge Base will contain, but you can populate your actual Knowledge Base with many heterogeneous documents.
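
For instance, the file can be uploaded with boto3 along these lines (a sketch; the bucket name below is a placeholder, and your AWS credentials must already be configured):

import boto3

s3 = boto3.client("s3")

# Upload the product listing that will be the sole document in our Knowledge Base.
# "my-kb-source-bucket" is a placeholder; use your own S3 bucket name.
s3.upload_file(
    Filename="simple_water_bottle.txt",
    Bucket="my-kb-source-bucket",
    Key="simple_water_bottle.txt",
)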

To create a Knowledge Base using Amazon Bedrock, refer to the official documentation.

After you’ve created it, add your KNOWLEDGE_BASE_ID below.

KNOWLEDGE_BASE_ID = 'DASYAHIOKX'  # replace with your own Knowledge Base

Implement a standard RAG pipeline

A RAG pipeline has two key steps – retrieval and generation – which we implement using AWS Bedrock APIs. We’ll add tool calling support to the generation step.

Retrieval in AWS Knowledge Bases

We define helper methods for retrieving context from our Knowledge Base.

Optional: Helper methods for Retrieval in AWS Knowledge Bases


def retrieve(query, knowledgebase_id, numberOfResults=3):
    """Fetches relevant document chunks to query from Knowledge Base using AWS Bedrock Agent Runtime"""
    return BEDROCK_RETRIEVE_CLIENT.retrieve(
        retrievalQuery={
            'text': query
        },
        knowledgeBaseId=knowledgebase_id,
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults,
                'overrideSearchType': "HYBRID"
            }
        }
    )

def retrieve_and_get_contexts(query, kbId, numberOfResults=3, threshold=0.0):
    """Fetches relevant contexts and properly formats them for the subsequent LLM response generation step."""
    retrieval_results = retrieve(query, kbId, numberOfResults)
    contexts = []

    for retrievedResult in retrieval_results['retrievalResults']:
        if retrievedResult['score'] >= threshold:
            text = retrievedResult['content']['text']
            if text.startswith("Document 1: "):
                text = text[len("Document 1: "):]  # Remove prefix if present
            contexts.append(text)

    return contexts

SCORE_THRESHOLD = 0.3 # Similarity score threshold for retrieving context to use in our RAG app

Let’s run our retrieval with a query.

query = "What is the Simple Water Bottle?"

print(retrieve_and_get_contexts(query, KNOWLEDGE_BASE_ID)[0])
Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \nDimensions: 10 inches height x 4 inches width

Response generation with tool calling

To generate responses with an LLM that can also call tools, we pass the user query and retrieved context from our Knowledge Base into the AWS Converse API.

This API can either return a string response from the LLM or a tool call. If the output is a tool call, our method will keep prompting the Converse API until the LLM returns a string response after processing the result of tool call(s).

Optional: Helper methods for response generation with tool calling via AWS Converse API


import json

def form_prompt(user_question: str, contexts: list) -> str:
    """Forms the prompt to be used for querying the model."""
    context_strings = "\n\n".join([f"Context {i + 1}: {context}" for i, context in enumerate(contexts)])
    query_with_context = f"{context_strings}\n\nQUESTION:\n{user_question}"

    # Indent each line for readability in the internal logs
    indented_question_with_context = "\n".join(f" {line}" for line in query_with_context.splitlines())
    return indented_question_with_context

def generate_text(user_question: str, model: str, tools: list[dict], system_prompts: list, messages: list[dict], bedrock_client) -> list[dict]:
    """Generates text, dynamically handling tool use within Amazon Bedrock.
    Params:
        user_question: The original user question (used to overwrite a tool's 'question' argument, if present).
        model: Identifier for the Amazon Bedrock model.
        tools: Tool configuration (toolConfig dict) describing the tools the model can call.
        system_prompts: System prompt(s) in the Converse API format.
        messages: List of message history in the desired format.
        bedrock_client: Client to interact with the Bedrock API.
    Returns:
        messages: Final updated list of messages including tool interactions and responses.
    """

    # Initial call to the model
    response = bedrock_client.converse(
        modelId=model,
        messages=messages,
        toolConfig=tools,
        system=system_prompts,
    )

    output_message = response["output"]["message"]
    stop_reason = response["stopReason"]
    messages.append(output_message)

    while stop_reason == "tool_use":
        # Extract tool requests from the model response
        tool_requests = output_message.get("content", [])

        for tool_request in tool_requests:
            if "toolUse" in tool_request:
                tool = tool_request["toolUse"]
                tool_name = tool["name"]
                tool_input = tool["input"]
                tool_use_id = tool["toolUseId"]

                try:
                    # Pass the original user question through to the tool, in case the LLM rephrased it
                    if "question" in tool_input:
                        tool_input["question"] = user_question
                    print(f"[internal log] Requesting tool {tool_name}. with arguments: {tool_input}.")
                    tool_output_json = _handle_any_tool_call_for_stream_response(tool_name, tool_input)
                    tool_result = json.loads(tool_output_json)
                    print(f"[internal log] Tool response: {tool_result}")

                    # If the tool call resulted in an error
                    if "error" in tool_result:
                        tool_result_message = {
                            "role": "user",
                            "content": [{"toolResult": {
                                "toolUseId": tool_use_id,
                                "content": [{"text": tool_result["error"]}],
                                "status": "error"
                            }}]
                        }
                    else:
                        # Format the successful tool response
                        tool_result_message = {
                            "role": "user",
                            "content": [{"toolResult": {
                                "toolUseId": tool_use_id,
                                "content": [{"json": {"response": tool_result}}]
                            }}]
                        }

                except Exception as e:
                    # Handle unexpected exceptions during tool handling
                    tool_result_message = {
                        "role": "user",
                        "content": [{"toolResult": {
                            "toolUseId": tool_use_id,
                            "content": [{"text": f"Error processing tool: {str(e)}"}],
                            "status": "error"
                        }}]
                    }

                # Append the tool result to messages
                messages.append(tool_result_message)

        # Send the updated messages back to the model
        response = bedrock_client.converse(
            modelId=model,
            messages=messages,
            toolConfig=tools,
            system=system_prompts,
        )

        output_message = response["output"]["message"]
        stop_reason = response["stopReason"]
        messages.append(output_message)

    return messages

def _handle_any_tool_call_for_stream_response(function_name: str, arguments: dict) -> str:
    """Handles any tool dynamically by calling the function by name and passing in the collected arguments.
    Returns a JSON string of the tool output.
    Returns an error message if the tool is not found, not callable, or called incorrectly.
    """
    tool_function = globals().get(function_name)

    if callable(tool_function):
        try:
            # Dynamically call the tool function with arguments
            tool_output = tool_function(**arguments)
            return json.dumps(tool_output)
        except Exception as e:
            return json.dumps({
                "error": f"Exception while calling tool '{function_name}': {str(e)}",
                "arguments": arguments,
            })
    else:
        return json.dumps({
            "error": f"Tool '{function_name}' not found or not callable.",
            "arguments": arguments,
        })

Define single-turn RAG app

We integrate the above helper methods into a standard RAG app that can respond to any user query, calling tools as the LLM deems necessary. Our rag() method can be called multiple times in a conversation, as long as a messages variable is provided each time to track conversation history.

def rag(model: str, user_question: str, system_prompt: str, tools: list[dict], messages: list, knowledgebase_id: str) -> str:
    """Performs Retrieval-Augmented Generation using the provided model and tools.
    Params:
        model: Model name or ID.
        user_question: The user's question or query.
        system_prompt: System message to set context or behavior.
        tools: Tool configuration (toolConfig dict) describing the tools the model can call.
        messages: List of prior conversation history (pass an empty list for a new conversation).
        knowledgebase_id: Knowledge Base ID for retrieving contexts.
    Returns:
        Final response text generated by the model.
    """

    # Retrieve contexts based on the user query and Knowledge Base ID
    contexts = retrieve_and_get_contexts(user_question, knowledgebase_id, threshold=SCORE_THRESHOLD)
    query_with_context = form_prompt(user_question, contexts)
    print(f"[internal log] Invoking LLM with prompt + context\n{query_with_context}\n\n")

    # Construct the user message with the retrieved contexts
    user_message = {
        "role": "user",
        "content": [{"text": query_with_context}]
    }
    messages.append(user_message)
    system_prompts = [{"text": system_prompt}]

    # Call generate_text with the updated messages
    final_messages = generate_text(
        user_question=user_question,
        model=model,
        tools=tools,
        system_prompts=system_prompts,
        messages=messages,
        bedrock_client=BEDROCK_GENERATION_CLIENT,
    )

    # Extract and return the final response text
    return final_messages[-1]["content"][-1]["text"]

Example tool: get_todays_date

Let’s define an example tool, get_todays_date(), to use in our RAG system. We provide the corresponding function and instructions on how to use it in a JSON format required by the AWS Converse API.

from datetime import datetime

def get_todays_date(date_format: str) -> str:
    """A tool that returns today's date in the date format requested."""
    datetime_str = datetime.now().strftime(date_format)
    return datetime_str

todays_date_tool_json = {
    "toolSpec": {
        "name": "get_todays_date",
        "description": "A tool that returns today's date in the date format requested. Options are: '%Y-%m-%d', '%d', '%m', '%Y'.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "date_format": {
                        "type": "string",
                        "description": "The format that the tool requests the date in."
                    }
                },
                "required": [
                    "date_format"
                ]
            }
        }
    }
}
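
You can sanity-check the tool function directly before exposing it to the LLM:

print(get_todays_date("%Y-%m-%d"))  # prints today's date, e.g. '2025-02-13'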

System prompt with tool use instructions

For the best performance, add clear instructions on when to use the tool to the system prompt that governs your LLM. Below we simply add Step 4 to our list of instructions, which otherwise represents a typical RAG system prompt. In most RAG apps, you also instruct the LLM on what fallback answer to respond with when it does not know how to answer a user’s query. Such fallback instructions help you reduce hallucinations and more precisely control the AI.

fallback_answer = "Based on the available information, I cannot provide a complete answer to this question."

system_prompt = f"""You are a helpful assistant designed to help users navigate a complex set of documents for question-answering tasks. Answer the user's Question based on the following possibly relevant Context and previous chat history using the tools provided if necessary. Follow these rules in order:
1. NEVER use phrases like "according to the context," "as the context states," etc. Treat the Context as your own knowledge, not something you are referencing.
2. Use only information from the provided Context. Your purpose is to provide information based on the Context, not to offer original advice.
3. Give a clear, short, and accurate answer. Explain complex terms if needed.
4. If the answer to the question requires today's date, use the following tool: get_todays_date. Return the date in the exact format the tool provides it.
5. If you remain unsure how to answer the Question, then only respond with: "{fallback_answer}".

Remember, your purpose is to provide information based on the Context, not to offer original advice.
"""

Conversational RAG with tool calling

We track conversation history in a messages variable that is updated each time we call the rag() method to respond to a user query. Let’s also select an LLM for our RAG pipeline and specify which tools are available.

After that, we can chat with our RAG app! Here we try a few user queries to evaluate different scenarios.

messages = []
model = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0'

tool_config = {
"tools": [todays_date_tool_json]
}

Scenario 1: RAG can answer the question without tools

user_question = "How big is the water bottle?"

rag_response = rag(model=model, user_question=user_question, system_prompt=system_prompt, tools=tool_config, messages=messages, knowledgebase_id=KNOWLEDGE_BASE_ID)
print(f'[RAG response] {rag_response}')
[internal log] Invoking LLM with prompt + context
Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \nDimensions: 10 inches height x 4 inches width

QUESTION:
How big is the water bottle?


[RAG response] The Simple Water Bottle - Amber has the following dimensions:

10 inches in height
4 inches in width

These dimensions indicate that it's a fairly standard-sized water bottle, tall enough to hold a good amount of liquid while still being easy to carry and fit into most cup holders or bag pockets.

For this user query, the necessary information is available in the Knowledge Base (as part of the product description).

Scenario 2: RAG can answer the question using tools

user_question = "Has the limited edition Amber water bottle already launched?"

rag_response = rag(model=model, user_question=user_question, system_prompt=system_prompt, tools=tool_config, messages=messages, knowledgebase_id=KNOWLEDGE_BASE_ID)
print(f'[RAG response] {rag_response}')
[internal log] Invoking LLM with prompt + context
Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \nDimensions: 10 inches height x 4 inches width

QUESTION:
Has the limited edition Amber water bottle already launched?


[internal log] Requesting tool get_todays_date. with arguments: {'date_format': '%Y-%m-%d'}.
[internal log] Tool response: 2025-02-13
[RAG response] Based on the information provided and today's date, I can answer your question:

The limited edition Amber water bottle has already launched. The context states that it was launched on January 1st, 2025, and today's date is February 13, 2025. This means the water bottle has been available for about a month and a half.

For this user query, the LLM chose to call our get_todays_date tool to obtain necessary information. Note that a proper answer to this question also requires considering information from the Knowledge Base.

Scenario 3: RAG can answer the question considering conversation history

user_question = "What is the full name of it?"

rag_response = rag(model=model, user_question=user_question, system_prompt=system_prompt, tools=tool_config, messages=messages, knowledgebase_id=KNOWLEDGE_BASE_ID)
print(f'[RAG response] {rag_response}')
[internal log] Invoking LLM with prompt + context
Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \nDimensions: 10 inches height x 4 inches width

QUESTION:
What is the full name of it?


[RAG response] The full name of the product is:

Simple Water Bottle - Amber

This name encompasses both the product type and its specific color variant, which is described as a limited edition.

This user query only makes sense when the conversation history is taken into account.

Scenario 4: RAG cannot answer the question

user_question = "Can I return my simple water bottle?"

rag_response = rag(model=model, user_question=user_question, system_prompt=system_prompt, tools=tool_config, messages=messages, knowledgebase_id=KNOWLEDGE_BASE_ID)
print(f'[RAG response] {rag_response}')
[internal log] Invoking LLM with prompt + context
Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \nDimensions: 10 inches height x 4 inches width

QUESTION:
Can I return my simple water bottle?


[RAG response] Based on the available information, I cannot provide a complete answer to this question. The given context does not include any details about return policies or procedures for the Simple Water Bottle - Amber. To answer this question accurately, we would need additional information about the company's return policy or specific terms and conditions for this product.

Note that the Knowledge Base does not contain information about the return policy, and the get_todays_date tool would not help either. In this case, the best our RAG app can do is to return our fallback response to the user.

Optional: Review full message history (includes tool calls)


# For educational purposes, we passed `messages` into every RAG call and logged every step in this variable.

for message in messages:
    print(message)
{'role': 'user', 'content': [{'text': '  Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \\nDimensions: 10 inches height x 4 inches width\n  \n  QUESTION:\n  How big is the water bottle?'}]}
{'role': 'assistant', 'content': [{'text': "The Simple Water Bottle - Amber has the following dimensions:\n\n10 inches in height\n4 inches in width\n\nThese dimensions indicate that it's a fairly standard-sized water bottle, tall enough to hold a good amount of liquid while still being easy to carry and fit into most cup holders or bag pockets."}]}
{'role': 'user', 'content': [{'text': ' Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \\nDimensions: 10 inches height x 4 inches width\n \n QUESTION:\n Has the limited edition Amber water bottle already launched?'}]}
{'role': 'assistant', 'content': [{'text': "To answer this question accurately, I need to know today's date and compare it with the launch date of the Simple Water Bottle - Amber limited edition. Let me use the available tool to get today's date."}, {'toolUse': {'toolUseId': 'tooluse_yjXK7j33T7yCaR-cXHVsvQ', 'name': 'get_todays_date', 'input': {'date_format': '%Y-%m-%d'}}}]}
{'role': 'user', 'content': [{'toolResult': {'toolUseId': 'tooluse_yjXK7j33T7yCaR-cXHVsvQ', 'content': [{'json': {'response': '2025-02-13'}}]}}]}
{'role': 'assistant', 'content': [{'text': "Based on the information provided and today's date, I can answer your question:\n\nThe limited edition Amber water bottle has already launched. The context states that it was launched on January 1st, 2025, and today's date is February 13, 2025. This means the water bottle has been available for about a month and a half."}]}
{'role': 'user', 'content': [{'text': ' Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \\nDimensions: 10 inches height x 4 inches width\n \n QUESTION:\n What is the full name of it?'}]}
{'role': 'assistant', 'content': [{'text': 'The full name of the product is:\n\nSimple Water Bottle - Amber\n\nThis name encompasses both the product type and its specific color variant, which is described as a limited edition.'}]}
{'role': 'user', 'content': [{'text': ' Context 1: Simple Water Bottle - Amber (limited edition launched Jan 1st 2025) A water bottle designed with a perfect blend of functionality and aesthetics in mind. Crafted from high-quality, durable plastic with a sleek honey-colored finish. Price: $24.99 \\nDimensions: 10 inches height x 4 inches width\n \n QUESTION:\n Can I return my simple water bottle?'}]}
{'role': 'assistant', 'content': [{'text': "Based on the available information, I cannot provide a complete answer to this question. The given context does not include any details about return policies or procedures for the Simple Water Bottle - Amber. To answer this question accurately, we would need additional information about the company's return policy or specific terms and conditions for this product."}]}

Next Steps

Adding tool calls to your RAG system expands the capabilities of what your AI can do and the types of questions it can answer.
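
For example, making another tool available only requires defining its function and appending a corresponding toolSpec entry to the tool configuration. The get_product_price tool below is hypothetical, purely for illustration:

def get_product_price(product_name: str) -> str:
    """A hypothetical tool that looks up a product's price (stubbed here for illustration)."""
    prices = {"Simple Water Bottle - Amber": "$24.99"}
    return prices.get(product_name, "Price not found")

product_price_tool_json = {
    "toolSpec": {
        "name": "get_product_price",
        "description": "Returns the price of a product given its full product name.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "product_name": {
                        "type": "string",
                        "description": "The full name of the product."
                    }
                },
                "required": [
                    "product_name"
                ]
            }
        }
    }
}

tool_config = {"tools": [todays_date_tool_json, product_price_tool_json]}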

Once you have a RAG app with tools set up, adding Codex as-a-Tool takes only a few lines of code. Codex enables your RAG app to answer questions it previously could not (like Scenario 4 above). Learn how via our tutorial: Integrate Codex as-a-Tool with AWS Bedrock Knowledge Bases.

Need help? Check the FAQ or email us at: support@cleanlab.ai