AI Agents with MongoDB / Introduction to AI Agents

8:33
Just like any other software, AI agents need the correct data, environment, and dependencies to function properly. Even though they may seem intelligent, AI agents don't work through magic. In this lesson, we'll examine the data our AI agent needs, install the packages needed for the environment, and create the basic scaffolding for the AI agent. To recap, our goal is to build an AI agent that can answer questions about MongoDB and summarize documentation pages.

Before writing a single line of code for any AI agent, it's essential to plan what your agent will be doing. There are a few things you need to consider. First, plan the tasks you want the agent to perform. Next, define the data your agent will use: its types, sources, and how it will be processed. Lastly, establish the decision-making logic outlining the agent's process and the data it uses for decisions. Keep in mind that careful planning results in efficient and effective AI agents that meet their objectives.

Okay. Let's start by identifying the kinds of tasks we want our agent to perform. We want our agent to be a MongoDB expert. To achieve this, it needs to be able to answer questions about MongoDB correctly. The agent will use vector search to semantically search the MongoDB documentation and use the returned information to answer questions. Next, we want the agent to summarize particular MongoDB documentation pages. For this, we'll extract the page title from the user's query and perform a find query to return the entire docs page. Then we'll use an LLM to summarize the returned page.

Now that we know which tasks our agent will perform, let's move on to identifying and collecting the right data. This will determine how effectively our agent can answer user questions. In general, this part of the process most likely requires collaboration across many teams. Since this is a demonstration, I have gone ahead and prepared the data for us. Let's take a moment to review it.
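Before we look at the data, it may help to see how the two tasks above map onto MongoDB retrieval operations. Here's a minimal sketch; the index name `vector_index`, the `title` field, and the numeric limits are illustrative assumptions, not details fixed by the lesson:

```python
# Sketch of the two retrieval patterns the agent will rely on.
# The index name "vector_index", the "title" field, and the limits
# are illustrative assumptions.

def vector_search_pipeline(query_embedding, limit=5):
    """Aggregation pipeline for semantic search over chunked documentation."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": limit * 10,  # candidate pool for the ANN search
                "limit": limit,
            }
        },
        {"$project": {"_id": 0, "body": 1}},  # the model only needs the chunk text
    ]


def full_page_filter(page_title):
    """Find filter that fetches one complete docs page for summarization."""
    return {"title": page_title}
```

The first pattern powers question answering over chunks; the second fetches a whole page by title so the LLM can summarize it.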
We'll be using two collections within our database: chunked_docs and full_docs. Let's break down the difference between these collections and how they'll help us with the tasks we outlined. The chunked_docs collection contains MongoDB documentation that has been split into smaller pieces, or chunks. We've chunked the data to ensure the generative model receives the targeted, relevant information it needs to answer a question. The body field contains the content of the documentation chunk that'll be used by the model to answer questions. Also, notice the embedding field at the end. That's a vector representation of the body field's content, which allows us to perform a vector search.

To leverage vector search, we'll need to create a vector search index on the embedding field. We're using the voyage-3-lite model, which outputs embeddings with 512 dimensions, but you can use any embedding model you like. Finally, we're using the cosine similarity function to measure similarity. With the vector search index in place, we can perform a vector search on chunked_docs.

Now what about the full_docs collection? The full_docs collection contains complete, unchunked documentation pages. While it's structured like chunked_docs, it doesn't split the content into smaller chunks, and it doesn't contain embeddings. This is important because the full_docs collection will be used when users want a summary of specific documentation pages. For example, if someone asks, "Summarize the View Database Access History page," our agent will query for that page by title in the full_docs collection. The full page will be shared with the LLM for summarization. Chunking these documents would hinder effective summarization, since we need all the content on the page to summarize it. With these two collections, our agent should have the necessary data to perform its tasks correctly. Now that we have the data, let's shift our attention to the packages we're going to use.
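The vector search index described above can be sketched in code. This assumes an Atlas deployment that supports vector search; the index name `vector_index` is an assumption, while the field path, dimensions, and similarity function come from the lesson. The pymongo import is deferred into the function so the sketch loads even without the driver installed:

```python
# Sketch: the Atlas Vector Search index from the lesson.
# Index name "vector_index" is an assumption; path, dimensions, and
# similarity match the chunked_docs collection described above.

VECTOR_INDEX_DEFINITION = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",    # field holding the voyage-3-lite vectors
            "numDimensions": 512,   # voyage-3-lite output size
            "similarity": "cosine", # similarity function from the lesson
        }
    ]
}


def create_vector_index(chunked_docs):
    """Create the vector search index on the chunked_docs collection."""
    from pymongo.operations import SearchIndexModel  # deferred: needs pymongo installed

    model = SearchIndexModel(
        definition=VECTOR_INDEX_DEFINITION,
        name="vector_index",
        type="vectorSearch",
    )
    chunked_docs.create_search_index(model=model)
```

You can also create the same index from the Atlas UI or CLI; the definition is what matters, not where you submit it.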
We will describe our project configuration using a pyproject.toml file, the standard format for describing Python projects. Under the dependencies array, we can see the list of packages. First up is langchain, the framework that gives the application its structure. Think of it as a backbone that supports all the language model capabilities. Then, to tap into the power of reasoning, we connect langchain with OpenAI's advanced large language models using the langchain-openai integration. This is where the thinking happens. To guide our agent's decision-making and actions, we use langgraph to create a dynamic workflow. This allows the agent to respond intelligently to different user inputs. And to give the agent a memory, we use langgraph-checkpoint-mongodb. This handy package saves the agent's state in a MongoDB database so it can remember past interactions and maintain context across sessions. For seamless interaction with our database, we rely on PyMongo, the official MongoDB driver for Python. Finally, to enable the agent to perform a vector search and find relevant information to answer a query, we use Voyage AI and its embedding models (voyageai). These models convert text into numerical representations that capture the semantic meaning of the query and relevant content. Together, these dependencies will allow our agent to take action. Keep in mind, you're not limited to these packages. You can use MongoDB with most popular frameworks and libraries for creating agents, or you can always build your own solution and use it with MongoDB.

Once we have all the packages installed, we can begin writing the code that will ultimately become our agent. To start, create a file named main.py and import the key_param file with your connection string and various API keys for the models. After that, import PyMongo (MongoClient) and OpenAI (ChatOpenAI). Now that we've imported a couple of packages, let's set them up. Create a function below the imports named init_mongodb.
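The dependency list described above might look like this in pyproject.toml; the project name, version, and absence of version pins are illustrative assumptions:

```toml
# pyproject.toml — dependencies from the lesson (name/version are assumptions)
[project]
name = "mongodb-ai-agent"
version = "0.1.0"
dependencies = [
    "langchain",                    # application framework / backbone
    "langchain-openai",             # OpenAI LLM integration
    "langgraph",                    # agent workflow graph
    "langgraph-checkpoint-mongodb", # persists agent state in MongoDB
    "pymongo",                      # official MongoDB Python driver
    "voyageai",                     # embedding models for vector search
]
```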
In the init_mongodb function, we're configuring a connection to MongoDB using MongoClient and the connection string stored in key_param.mongodb_uri. We're defining a database name, ai_agents, and getting references to two collections, chunked_docs and full_docs. The function returns three things: the MongoDB client itself, the chunked_docs collection where we'll perform vector search, and the full_docs collection where we'll execute a find query. These will be used by our agent to retrieve the information it needs to answer questions.

Finally, let's create the entry point for our application. For this, we'll create a new function named main at the bottom of our file. Inside this function, we'll call the init_mongodb function to establish connections to our database and collections. This gives our agent access to the information it needs. Next, we initialize the language model that will power our agent's reasoning capabilities. We're using OpenAI's GPT-4o model, but you can use whichever model you prefer. Keep in mind that the main function is still incomplete. We'll be adding more functionality to it in future lessons as we build out the agent's capabilities. This is just the starting point.

Awesome job. Let's take a moment to recap what we've covered. First, we planned the agent's data needs, focusing on chunked_docs for vector search and full_docs for summarization. After that, we reviewed the vector search index that will enable our agent to perform a vector search on the documents in the chunked_docs collection. Next, we reviewed the packages we'll be using to build our AI agent. And finally, we set up the basic structure of our application that we'll be using throughout this skill.
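The scaffolding described in this lesson might look like the sketch below. It assumes a key_param.py module holding mongodb_uri and an OpenAI API key, as described above; the third-party imports are deferred into the functions so the file's structure can be read (and the constants checked) without pymongo or langchain-openai installed:

```python
# main.py — agent scaffolding sketch (assumes a key_param.py module
# with mongodb_uri and openai_api_key, as described in the lesson).

DB_NAME = "ai_agents"
CHUNKED_COLLECTION = "chunked_docs"
FULL_COLLECTION = "full_docs"


def init_mongodb():
    """Connect to MongoDB and return the client plus the two collections."""
    import key_param  # hypothetical module holding your connection string
    from pymongo import MongoClient  # deferred: needs pymongo installed

    mongodb_client = MongoClient(key_param.mongodb_uri)
    database = mongodb_client[DB_NAME]
    chunked_docs = database[CHUNKED_COLLECTION]  # used for vector search
    full_docs = database[FULL_COLLECTION]        # used for find-by-title queries
    return mongodb_client, chunked_docs, full_docs


def main():
    """Entry point — still incomplete; later lessons add the agent logic."""
    import key_param
    from langchain_openai import ChatOpenAI  # deferred: needs langchain-openai

    mongodb_client, chunked_docs, full_docs = init_mongodb()
    llm = ChatOpenAI(model="gpt-4o", api_key=key_param.openai_api_key)
    # Future lessons: define tools, build the langgraph workflow,
    # and wire up the MongoDB checkpointer for memory.


if __name__ == "__main__":
    main()
```

This mirrors the structure walked through in the lesson; only the deferred-import placement and the key name `openai_api_key` are choices made for this sketch.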