Code Summary: Create Tools for your Agent

This code creates two tools for an agent. One tool performs a vector search and the other tool retrieves documents by their titles.

Import Packages

Update the imported packages in the main.py file with the following:

import key_param
from pymongo import MongoClient
from langchain.agents import tool
from typing import List
from langchain_openai import ChatOpenAI
import voyageai
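
The imports above rely on a local key_param module that holds the API keys. Its contents are not shown in this summary, so the following is only a minimal sketch of what such a module might look like, assuming the keys are read from environment variables. The two attribute names match how key_param is used in the code below; the environment variable names are assumptions.

```python
# key_param.py (hypothetical layout; only the two attribute names
# come from the code in this summary)
import os

# The environment variable names are assumptions; use whatever your setup defines.
# An empty string is the fallback so that importing the module never fails.
voyage_api_key = os.environ.get("VOYAGE_API_KEY", "")
openai_api_key = os.environ.get("OPENAI_API_KEY", "")
```

Reading keys from the environment keeps them out of source control; a module with hard-coded strings would work the same way from the point of view of main.py.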

Create a Helper Function

Create a helper function that generates an embedding for the user's query. The vector search tool uses this embedding to search the collection.

def generate_embedding(text: str) -> List[float]:
    """
    Generate a query embedding for a piece of text with the voyage-3-lite model.

    Args:
        text (str): The text to embed.

    Returns:
        List[float]: The embedding of the text.
    """

    # Create the Voyage AI client with the key stored in key_param
    embedding_model = voyageai.Client(api_key=key_param.voyage_api_key)

    # embed() takes a list of texts; wrap the single query and take the first result
    embedding = embedding_model.embed([text], model="voyage-3-lite", input_type="query").embeddings[0]

    return embedding

Vector Search Tool

The following creates a tool function that finds relevant documents in a MongoDB database using vector search. It converts the user query into an embedding, runs a semantic similarity search against the stored document vectors, and returns the bodies of the top 5 matching documents as context that the LLM uses to answer questions.

@tool 
def get_information_for_question_answering(user_query: str) -> str:
    """
    Retrieve relevant documents for a user query using vector search.

    Args:
        user_query (str): The user's query.

    Returns:
        str: The retrieved documents as a string.
    """

    query_embedding = generate_embedding(user_query)

    # init_mongodb() returns (client, vector search collection, full collection); take the second item
    vs_collection = init_mongodb()[1]
    
    pipeline = [
        {
            # Use vector search to find similar documents
            "$vectorSearch": {
                "index": "vector_index",  # Name of the vector index
                "path": "embedding",       # Field containing the embeddings
                "queryVector": query_embedding,  # The query embedding to compare against
                "numCandidates": 150,      # Consider 150 candidates (wider search)
                "limit": 5,                # Return only top 5 matches
            }
        },
        {
            # Project only the fields we need
            "$project": {
                "_id": 0,                  # Exclude document ID
                "body": 1,                 # Include the document body
                "score": {"$meta": "vectorSearchScore"},  # Include the similarity score
            }
        },
    ]
    
    results = vs_collection.aggregate(pipeline)

    # Join the document bodies; default to "" so a document without a body does not break join()
    context = "\n\n".join([doc.get("body", "") for doc in results])

    return context
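
To make the similarity search step concrete, the following is a minimal, self-contained sketch of a brute-force top-k search in plain Python. It is an illustration only, not how $vectorSearch is implemented: the stage performs an approximate nearest-neighbor search over numCandidates candidates, and the similarity function it uses depends on how the index is defined. Cosine similarity, the toy two-dimensional vectors, and the names cosine_similarity, top_k, and toy_docs are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: List[float], docs: List[Dict], k: int = 5) -> List[Tuple[float, str]]:
    """Score every document against the query and keep the k best."""
    scored = [(cosine_similarity(query, d["embedding"]), d["body"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy documents with made-up 2-dimensional embeddings
toy_docs = [
    {"body": "Backups overview", "embedding": [1.0, 0.0]},
    {"body": "Sharding guide", "embedding": [0.0, 1.0]},
    {"body": "Restore from snapshot", "embedding": [0.8, 0.6]},
]

ranked = top_k([1.0, 0.0], toy_docs, k=2)

# Same joining step as the tool: bodies separated by blank lines
context = "\n\n".join(body for _, body in ranked)
```

The brute-force version scores every document, which is fine for a handful of vectors but does not scale; that is the reason for using an indexed, approximate search in the database instead.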

Database Query Tool

The following creates a tool function that fetches a complete documentation page by title from MongoDB. It queries the database for an exact title match and returns the full document body, which the LLM uses to create a summary of the page. If no matching document is found, it returns a "Document not found" message.

@tool 
def get_page_content_for_summarization(user_query: str) -> str:
    """
    Retrieve the content of a documentation page for summarization.

    Args:
        user_query (str): The user's query (title of the documentation page).

    Returns:
        str: The content of the documentation page.
    """
    # init_mongodb() returns (client, vector search collection, full collection); take the third item
    full_collection = init_mongodb()[2]

    query = {"title": user_query}
    
    projection = {"_id": 0, "body": 1}
    
    document = full_collection.find_one(query, projection)
    
    if document:
        return document["body"]
    else:
        return "Document not found"
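
The exact-match behavior can be tried without a database. The sketch below uses a hypothetical in-memory stand-in, InMemoryCollection, that mimics only the small part of find_one this tool relies on (equality on top-level fields and an inclusion projection). It is not part of pymongo, and the page_body helper and sample document are assumptions made for illustration.

```python
from typing import Dict, List, Optional


class InMemoryCollection:
    """Hypothetical stand-in that mimics exact-match find_one on flat fields."""

    def __init__(self, docs: List[Dict]):
        self._docs = docs

    def find_one(self, query: Dict, projection: Optional[Dict] = None) -> Optional[Dict]:
        for doc in self._docs:
            # Every queried field must be equal; string comparison is case-sensitive
            if all(doc.get(key) == value for key, value in query.items()):
                if projection is None:
                    return dict(doc)
                # Keep only the fields the projection marks for inclusion
                return {k: doc[k] for k, include in projection.items() if include and k in doc}
        return None


def page_body(collection, title: str) -> str:
    """Same lookup and fallback logic as the tool, with the collection passed in."""
    document = collection.find_one({"title": title}, {"_id": 0, "body": 1})
    if document:
        return document["body"]
    return "Document not found"


pages = InMemoryCollection(
    [{"_id": 1, "title": "Create a MongoDB Deployment", "body": "Deployment steps."}]
)

found = page_body(pages, "Create a MongoDB Deployment")
missing = page_body(pages, "create a mongodb deployment")
```

Here found holds the page body, while missing holds the fallback message because the lowercase title is not an exact match. The practical consequence is that the LLM has to pass the page title exactly as it is stored.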

Update the Main Function

Update the main function to define the list of tools the agent will use, and to invoke each tool individually to check its output:

def main():
    """
    Main function to initialize and execute the graph.
    """
    # Initialize MongoDB connections
    mongodb_client, vs_collection, full_collection = init_mongodb()
    
    # Initialize the ChatOpenAI model with API key
    llm = ChatOpenAI(openai_api_key=key_param.openai_api_key, temperature=0, model="gpt-4o")
    
    tools = [
        get_information_for_question_answering,
        get_page_content_for_summarization
    ]

    answer = get_information_for_question_answering.invoke(
        "What are some best practices for data backups in MongoDB?"
    )
    print("Answer: " + answer)

    summary = get_page_content_for_summarization.invoke("Create a MongoDB Deployment")
    print("Summary: " + summary)
    

# Execute main function when script is run directly
if __name__ == "__main__":
    main()
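
The tools list is defined in main but not yet consumed by anything. As a rough illustration of what an agent does with such a list, the sketch below looks up a tool by name and calls it with one argument. It uses plain Python functions as stand-ins rather than the LangChain tool objects, and the names answer_tool, summary_tool, registry, and dispatch are hypothetical.

```python
from typing import Callable, Dict


def answer_tool(user_query: str) -> str:
    """Stand-in for get_information_for_question_answering."""
    return f"context for: {user_query}"


def summary_tool(user_query: str) -> str:
    """Stand-in for get_page_content_for_summarization."""
    return f"page body for: {user_query}"


# Map each tool's name to the callable, similar in spirit to a tools list
registry: Dict[str, Callable[[str], str]] = {
    fn.__name__: fn for fn in (answer_tool, summary_tool)
}


def dispatch(tool_name: str, argument: str) -> str:
    """Call the named tool, or report that it does not exist."""
    fn = registry.get(tool_name)
    if fn is None:
        return f"Unknown tool: {tool_name}"
    return fn(argument)
```

For example, dispatch("summary_tool", "Create a MongoDB Deployment") calls the summarization stand-in, while an unrecognized name returns an error string instead of raising.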