AI Agents with MongoDB / Build the agent and add memory
Welcome back. In our previous lesson, we connected our custom MongoDB tools to an LLM, enabling it to answer questions and retrieve information. That was a great start, but let's make our agent more powerful. In this lesson, we'll explore LangGraph, a framework that helps us build sophisticated agents with decision-making abilities within the LangChain ecosystem.
We will build a MongoDB agent that can process queries through multiple steps, choosing the right tools at the right time to fulfill user requests.
So far, our implementation has been fairly straightforward, where each query is processed independently. While this works for simple tasks, real world scenarios with several steps require AI agents to remember past actions and context to be truly effective.
One way we can achieve this is using a well known data structure called a graph. Graphs offer a streamlined way to represent these multistep processes.
They are particularly good at representing complex relationships, with multiple nodes interconnected through multiple edges; incorporating conditional logic, which enables different paths based on conditions; modeling cyclical processes, with the added benefit of supporting loops and feedback mechanisms; and offering the flexibility to adapt quickly to change.
Finally, they provide visual clarity, making complex workflows easier to understand.
By using graphs, AI agents can maintain context across interactions and navigate multi step workflows intelligently, mirroring real world problem solving much more effectively.
LangGraph is a framework designed for constructing AI agents. It allows us to create a cyclical graph, a structure where information can flow in loops rather than in a single direction.
This is perfect for building agents that need to make complex decisions or engage in multi turn reasoning.
Within an application using LangGraph, state is implemented through a shared graph that represents the current snapshot of your application as it progresses through different steps of execution.
In a LangGraph application, state tracks the current information flowing through your graph workflow.
Memory, in the context of state, refers to storage that can either persist state for your current session or extend it across multiple conversations, giving your AI the ability to recall information over time. As we build our MongoDB agent, we'll see how these concepts work together in practice. Our agent will use state to track the current conversation flow and decision-making process.
State allows it to handle more complex queries by properly sequencing tool usage and reasoning steps. In an upcoming video, we'll also learn how we can persist our agent's state using MongoDB as a memory store. Let's begin building our agent with LangGraph by defining our state graph.
The state will contain the schema of the graph and reducer functions. The reducer functions will inform how the state will be updated. First, let's import all the modules we'll need for this in the main.py file. We won't go into each one of them now, but we'll get to see them in action throughout the lesson.
Now we'll define a GraphState class, which will serve as the state for our agent. The GraphState class inherits from TypedDict, which enforces a strict structure for the state by defining specific keys and their expected value types.
Next, we'll define a single field named messages. We're using messages because most modern LLMs expose an interface that accepts a list of messages as the input to chat models. You can add additional fields if you need to track more in the state. This field will include user inputs, AI agent outputs, and internal communications between the graph components.
The annotated type with add_messages tells LangGraph to append new messages to the existing list rather than replacing it, preserving conversation history. Without a reducer annotation, keys would simply overwrite previous values.
Our graph performs two key functions. First, nodes process the current state and generate updated states. Second, message updates are appended to the existing history rather than replacing prior messages, which allows our agent to maintain context throughout a conversation.
Now that we have the state defined, let's shift our attention to our graph. Graphs consist of nodes and edges. As a refresher, nodes are processing units that perform specific functions, and edges connect the nodes and define how information flows between them. The first node we'll implement is the agent node, the brain of our system.
Back in the main.py file, we'll create an agent node that takes the current state and the language model with tool bindings that we set up in the previous lesson. It then extracts the messages from the state, invokes the language model with these messages, and returns the result as a new message in the state.
This agent node serves as the decision maker, evaluating the conversation stored in the messages field to either provide a direct answer or determine which tool to use for further information.
Once done, the agent node will update the state, appending either a tool call or a direct response.
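A sketch of such an agent node follows. The function and parameter names here are assumptions; `llm_with_tools` stands in for the tool-bound model from the previous lesson:

```python
def agent_node(state: dict, llm_with_tools) -> dict:
    """Invoke the LLM on the conversation so far and append its reply."""
    messages = state["messages"]
    # The model either answers directly or emits tool_calls that the
    # router will pick up in the next step.
    response = llm_with_tools.invoke(messages)
    # Returning a one-element list: the add_messages reducer appends it
    # to the existing history rather than replacing it.
    return {"messages": [response]}
```

Because the node only returns the new message, the reducer on the `messages` field takes care of merging it into the conversation history.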
With the agent node set up, we can create the tool node. While the agent node decides on the necessary action, the tool node is responsible for executing the selected tools. Here, the tool node receives the current state and a dictionary that maps tool names to their functions.
The tool node function first extracts any pending tool calls issued by the agent node from the messages in the state. It then loops through them: for each tool call, it looks up the appropriate tool function, invokes it with the provided arguments, and creates a tool message with the result.
Finally, it returns the tool messages as part of the updated state.
Okay. We have the agent node and tool node in place. Next, we need a way to decide whether to route to the tool node or simply end the process.
To do this, we'll create a router function, which represents an edge in the graph. This function receives the state and examines the latest message to check if it contains any tool_calls. If it does, we route it to the tools node to execute those tools. If not, we've reached the end of the processing cycle and can return the final answer to the user.
This routing mechanism is what creates the cyclical nature of the graph. The agent can decide to use tools, the tools can be executed, and then the agent can evaluate the results and potentially use more tools, all within a single conversation turn. With the nodes and router function finished, we can now assemble the complete graph.
To do this, we'll define an init_graph function that creates and connects all the elements of the graph. We begin by creating a new StateGraph using our GraphState class as the template. This establishes the foundation for our agent's workflow.
Next, we add the two main nodes. The first is the agent node, which invokes the LLM to make decisions. The second is the tools node, which executes tools when needed. Each node is defined as a lambda function that calls the previously defined node functions with the appropriate parameters.
Now we need to connect these nodes with edges that define how information flows through our graph. We add an edge from the start point to the agent node, which means our workflow always begins with the agent analyzing the user's input. We also add an edge from the tools node back to the agent node, creating a cycle that allows the agent to process tool results and potentially use additional tools. Here's where things get interesting.
The conditional edge. This creates a dynamic decision point in the graph. Conditional edges allow the graph to take different paths based on the current state. In this case, after the agent processes the input, we call our route_tools function to walk through our decision-making logic.
If the reasoning engine decides to use a tool, route_tools returns "tools", and the flow continues to the tools node. If the reasoning engine has reached a final answer, route_tools returns END, and the graph execution stops. This dynamic routing is what enables our agent to handle complex queries that might require multiple tools or reasoning steps. Finally, we compile the graph, which becomes the complete agent, delivering answers through a series of interconnected steps.
Now that we've built the graph, we need a way to run it and see the results.
For this demo, we'll create a simple execute_graph function that receives our compiled graph and the user's input, and prints the state changes to the shell. We start by formatting the user's input so it can be passed to our agent as part of the messages field. Next, we start the agent, streaming the results. Each time a node in our graph executes, it produces an output containing the node name and the updated state after that node runs.
This output is a dictionary, where the keys are node names and the values are the state updates from each node. By looping through this streaming output, we can observe each step of the agent's reasoning process, from the initial analysis to the tool usage to the final step of answer generation.
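One way to sketch this function (the name `execute_graph` matches the lesson; printing format and the return value are assumptions):

```python
def execute_graph(graph, user_input: str) -> str:
    """Stream the graph's execution, print each node's state update,
    and return the content of the final message."""
    # Seed the state with the user's input as the first message.
    state = {"messages": [("user", user_input)]}
    final_update = None
    # stream() yields one dict per executed node, of the form
    # {node_name: state_update_from_that_node}.
    for output in graph.stream(state):
        for node_name, node_update in output.items():
            print(f"--- output from node: {node_name} ---")
            print(node_update)
            final_update = node_update
    # The last update holds the agent's final answer.
    answer = final_update["messages"][-1].content
    print(answer)
    return answer
```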
At the end, we extract the content of the final message to present the answer to the user. This approach gives us visibility into how our agent is working at each step, which is invaluable for understanding its decision-making processes and for debugging any issues that might arise.

Now that we have everything in place for our MongoDB agent, let's update our main function to run our agent. At the bottom of the main function, we create a dictionary of tools. After that, we initialize our graph, passing in the correct arguments. Finally, we execute our graph with two test queries. The first query, about backup best practices, tests our agent's ability to use vector search to find relevant information across the documentation.
The second query requests a specific page summary, testing our agent's ability to recognize this pattern and use the Direct Page Lookup Tool.
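The wiring in main might be sketched as follows. The tool names, query strings, and the `StubTool` stand-in are illustrative assumptions; in the course code, the tool objects and the tool-bound LLM come from the previous lesson:

```python
# Stand-ins so this sketch runs on its own; in the real code these are
# the MongoDB tools built in the previous lesson.
class StubTool:
    def __init__(self, name):
        self.name = name

tools = [StubTool("vector_search"), StubTool("get_page_content")]

# A dictionary mapping each tool's name to the tool itself, which the
# tool node uses to look up and execute pending tool calls.
tools_by_name = {tool.name: tool for tool in tools}

# With the functions from earlier in the lesson in scope, the rest of
# main would look roughly like:
# graph = init_graph(llm_with_tools, tools_by_name)
# execute_graph(graph, "What are the best practices for MongoDB backups?")
# execute_graph(graph, "Summarize the MongoDB deployment page.")
```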
When we run this code, we see the agent working through each step. I'm going to truncate the results since they contain a lot of information. Here, the agent node is deciding to use the information retrieval tool to answer the backup practices question. It's not generating content directly; it's requesting information first. Now the tool node is executing the vector search tool and returning relevant documentation about MongoDB backup practices that it found in our database.
Finally, the agent node processes the information retrieved by the tool and generates a comprehensive, formatted answer about MongoDB backup best practices.
For the second query about MongoDB deployment, the agent would follow a similar process but use the page content retrieval tool instead, demonstrating its ability to select the appropriate tool based on the query type. Awesome job. That probably felt like a lot, but we now have a working agent that can make decisions on its own. Before we go, let's quickly recap what we learned.
In this lesson, we learned how to use LangGraph to build a stateful agent with decision-making capabilities. By creating a cyclical graph structure with conditional routing, we've enabled the agent to perform multi-step reasoning. It can use tools as needed to gather information before providing thorough, accurate responses about MongoDB.
