Memory for AI Applications / Implementing Long-Term Memory in AI Applications

10:31
By now, you should be familiar with how to implement a short-term memory solution with checkpointers to keep our conversation state intact within a conversation. But that only helps us in a single conversation with our agent. Wouldn't it be nice to be able to start a new conversation about a different topic, but have the agent recognize your preferences from past interactions? That's what we're going to do today. In this video, we'll implement long-term memory using MongoDB. First, we'll set up a MongoDB vector store for long-term memory storage and retrieval. Next, we'll create memory tools that save and retrieve information across threads, giving our agents the ability to remember users across conversations.

For long-term memory, we need two things: a place to store memories and a way to find them by meaning. That's where MongoDB and Voyage AI come in. Our setup has three components: MongoDB for storage, Voyage AI for embeddings, and LangGraph's MongoDBStore to tie them together. Let's look at each one.

MongoDB vector search gives us native vector similarity search without needing a separate vector database. We can store both the structured memory metadata and the vector embeddings in one unified data platform. Voyage AI provides state-of-the-art embedding models optimized for semantic understanding. When we save a memory like "I love Italian food," Voyage AI converts it to a vector. Later, when the user asks "what kind of cuisine do I like?", we convert that query to a vector and find memories with similar meaning, even though the words are completely different. The MongoDBStore from LangGraph ties it all together. It handles embedding generation and storage automatically so we can focus on building the agent logic.

Together, these three components give us long-term memory with semantic retrieval. We're going to configure all three components in order: the Voyage AI embedding model, the vector search index, and the store connection.
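The semantic matching described above can be made concrete with a toy sketch. The three-dimensional vectors and dot-product scoring below are made-up stand-ins for what the embedding model and MongoDB vector search do at 1024 dimensions; only the ranking idea carries over.

```python
# Toy illustration of semantic retrieval: memories and queries become
# vectors, and similar meanings yield similar vectors. These tiny
# 3-dimensional vectors are invented for illustration; a real embedding
# model would produce 1024-dimensional vectors.

def dot_product(a, b):
    """Similarity score, matching the index's dotProduct setting."""
    return sum(x * y for x, y in zip(a, b))

# Pretend embeddings: "I love Italian food" and the cuisine question
# share meaning, so their (made-up) vectors point in similar directions.
memories = {
    "I love Italian food":  [0.9, 0.1, 0.2],
    "My name is Sarah":     [0.1, 0.9, 0.1],
    "I'm learning MongoDB": [0.2, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.3]  # "what kind of cuisine do I like?"

# Rank memories by similarity to the query, highest score first.
ranked = sorted(
    memories,
    key=lambda m: dot_product(memories[m], query_vector),
    reverse=True,
)
print(ranked[0])  # -> I love Italian food (despite sharing no words)
```

Even though the query and the memory use completely different words, their vectors score highest against each other, which is exactly why vector search can answer "what cuisine do I like?" from "I love Italian food."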
So by the end of this section, we'll have working memory tools. First, we need a few new packages. We import MongoDBStore and create_vector_index_config from LangGraph for storage and for configuring the vector index. VoyageAIEmbeddings handles our embedding model. The @tool decorator turns a function's docstring into the tool's description, helping the model decide when to call it and enabling us to define memory tools, while get_config gives us access to the user ID at runtime.

Next, we configure the embedding model and vector search. We're using Voyage AI's voyage-4 model, which produces 1024-dimensional vectors by default. The create_vector_index_config function configures our 1024-dimensional vector search index to use dotProduct similarity on the content field.

Next, we create the store itself. We connect to MongoDB, specify a database and collection for our memories, and create the MongoDBStore with our index configuration. MongoDB Atlas will use this configuration to enable semantic retrieval through vector search. With that, we have a connection to our MongoDB database and a place to keep our memories. But we need a way to tell our agent to store the memories and how to retrieve them. For that, we'll first need to create a tool that saves memories to the MongoDB store.

Inside our save_memory function, we use get_config to access the current user_id from the runtime configuration. The key line is store.put. We pass a namespace, a key, and a value as the three arguments. The namespace, ("user", user_id, "memories"), organizes memories by user. This tuple structure ensures each user's memories are isolated, meaning Sarah's memories won't mix with Mike's. The key is a unique identifier for each memory, and the value contains the actual content. When we call store.put, MongoDBStore automatically generates an embedding for the content and stores everything in MongoDB. Saving memories is only half the story. We also need a way to retrieve them.
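Pulled together, the setup and save tool described above might look like the wiring sketch below. Treat it as a sketch under assumptions, not a verified implementation: the exact import paths, the parameter names passed to create_vector_index_config and MongoDBStore, the from_conn_string constructor, and the "user_id" key in the runtime config are all assumptions to check against the package versions you have installed.

```python
# Wiring sketch for the long-term memory setup. The connection string,
# database/collection names, and several parameter names below are
# illustrative assumptions, not verified API details.
import uuid

from langchain_core.tools import tool
from langchain_voyageai import VoyageAIEmbeddings
from langgraph.config import get_config
from langgraph.store.mongodb import MongoDBStore, create_vector_index_config

# Voyage AI embedding model (1024-dimensional vectors by default).
embeddings = VoyageAIEmbeddings(model="voyage-4")

# Vector index config: 1024 dims, dotProduct similarity, embeddings
# generated from the "content" field of each stored memory.
index_config = create_vector_index_config(
    embed=embeddings,
    dims=1024,
    relevance_score_fn="dotProduct",
    fields=["content"],
)

# Connect to MongoDB and choose where memories live. "MONGODB_URI" is a
# placeholder for your Atlas connection string.
store = MongoDBStore.from_conn_string(
    conn_string="MONGODB_URI",
    db_name="ai_memory",
    collection_name="memories",
    index_config=index_config,
)

@tool
def save_memory(content: str) -> str:
    """Save an important fact about the user for future conversations."""
    # Pull the current user's ID from the runtime configuration.
    user_id = get_config()["configurable"]["user_id"]
    # The namespace tuple isolates each user's memories from everyone else's.
    namespace = ("user", user_id, "memories")
    # MongoDBStore embeds the content automatically when we put it.
    store.put(namespace, str(uuid.uuid4()), {"content": content})
    return f"Saved memory: {content}"
```

The three store.put arguments mirror the narration: the namespace scopes the memory to one user, the random UUID gives each memory a unique key, and the value dict carries the content that gets embedded.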
This tool takes a query string and searches for relevant memories. Again, we get the user ID from the configuration and construct the same namespace tuple. The magic happens in store.search. This method converts the query to a vector using our Voyage AI embedding model and then performs a vector search in MongoDB. It returns the top five most semantically similar memories. Next, if we find results, we format them as a bulleted list. If not, we return a message saying no relevant memories were found. The agent uses this information to personalize its responses.

Now let's create an agent that uses both short-term and long-term memory. In our example, the system prompt instructs the agent on how to use its memory tools. When a message arrives, it should first check for existing memories, then use them to personalize responses, and finally save any new information worth remembering. In the create_agent call, we pass our two memory tools in the tools parameter. We also include the checkpointer from our short-term memory setup. If you need a refresher on the checkpointer, feel free to revisit our video on implementing short-term memory. This gives the complete picture. The checkpointer handles conversation history within threads, while the memory tools handle persistent facts across threads.

The agent decides autonomously when to call these tools. If a user says, "My name is Sarah," the agent recognizes this as important information and calls save_memory. If a user asks, "Do you know my name?", the agent calls retrieve_memories to check. This is the power of combining tools with a well-crafted system prompt. The agent learns when and how to use its memory capabilities.

Let's see long-term memory in action. We'll start with Sarah in thread-1. Sarah introduces herself and mentions she's learning MongoDB. The agent recognizes this as important information and automatically calls save_memory to store it in her namespace. Now here's where long-term memory really shines.
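The result-formatting step of the retrieval tool is pure string handling and can be sketched in isolation. The helper below is a hypothetical stand-in for the tail end of retrieve_memories; in the real tool, the dicts would come back from store.search, shaped like the {"content": ...} values that save_memory stores.

```python
# Format search results the way retrieve_memories does: a bulleted list
# when there are hits, a fallback message when there are none. The dicts
# below mimic the {"content": ...} values saved by save_memory.

def format_memories(results: list[dict]) -> str:
    """Turn search results into text the agent can use in its reply."""
    if not results:
        return "No relevant memories found."
    return "\n".join(f"- {item['content']}" for item in results)

hits = [{"content": "My name is Sarah"}, {"content": "I'm learning MongoDB"}]
print(format_memories(hits))
# - My name is Sarah
# - I'm learning MongoDB

print(format_memories([]))
# No relevant memories found.
```

Returning a plain bulleted string keeps the tool output easy for the model to fold into a personalized reply, and the explicit "no relevant memories" message tells the agent that the search succeeded but came back empty.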
Let's start a completely new thread. In thread-2, Sarah asks if the agent remembers her name. The agent calls retrieve_memories, searches her namespace, and finds the memory we saved earlier. It responds correctly even though this is a brand-new conversation thread. This is the key difference from short-term memory. With only the checkpointer, thread-2 would have no knowledge of thread-1. But with long-term memory, facts persist across all threads. The agent can remember Sarah no matter which thread she's in.

Long-term memory works across threads, but what about across users? Let's introduce Mike. Mike introduces himself and his interests in thread-3. The agent saves his information to Mike's namespace, completely separate from Sarah's. Now let's test cross-thread retrieval for Mike. In thread-4, Mike asks what the agent knows about him. The agent searches his namespace and retrieves his memories correctly. Cross-thread memory works for Mike too.

But the critical test is whether we have achieved user isolation. Mike asks about Sarah. The agent searches, but it only searches Mike's namespace. Sarah's memories exist in the database, but Mike can't access them. The namespace structure ensures complete user isolation. This is important for production systems. Users trust that their personal information will remain private. The namespace tuple, ("user", user_id, "memories"), helps maintain this isolation.

Let's step back and visualize the complete memory architecture we've built. Short-term memory uses the checkpointer. It stores conversation history within a single thread. When Sarah talks in thread-1, the checkpointer saves each message exchange. If she leaves and comes back to thread-1, the conversation continues seamlessly. But thread-2 is a fresh start. The checkpointer keeps the different threads isolated. Long-term memory uses the store and memory tools. It stores facts across all threads.
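The user-isolation property comes entirely from keying every read and write by the namespace tuple. The tiny dict-backed class below is a hypothetical stand-in for the MongoDB store, with no vector search and no persistence, kept just small enough to demonstrate that one user's memories are unreachable from another user's namespace.

```python
# Minimal dict-backed stand-in for the memory store, keeping only the
# property under test: every put/search is scoped to a namespace tuple,
# so Sarah's memories are invisible from Mike's namespace.

class ToyStore:
    def __init__(self):
        self._items = {}  # maps (namespace, key) -> value

    def put(self, namespace: tuple, key: str, value: dict):
        self._items[(namespace, key)] = value

    def search(self, namespace: tuple) -> list[dict]:
        # Only items whose namespace matches exactly are ever returned.
        return [v for (ns, _), v in self._items.items() if ns == namespace]

store = ToyStore()
sarah_ns = ("user", "sarah", "memories")
mike_ns = ("user", "mike", "memories")

store.put(sarah_ns, "m1", {"content": "My name is Sarah"})
store.put(mike_ns, "m1", {"content": "My name is Mike"})

# Mike's search sees only Mike's namespace. Sarah's memory still exists
# in the store, but it is unreachable from here.
print(store.search(mike_ns))   # [{'content': 'My name is Mike'}]
print(store.search(sarah_ns))  # [{'content': 'My name is Sarah'}]
```

The real MongoDB store adds embeddings, vector search, and durable storage on top, but the isolation guarantee is the same: the namespace is part of every lookup, so there is no query path from Mike's tools into Sarah's data.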
When Sarah shares her name in thread-1, the save_memory tool stores it in her namespace. When she asks about it in thread-2, retrieve_memories finds it through vector search. Both systems use namespaces to ensure user isolation. Sarah's data stays in Sarah's namespace. Mike's data stays in Mike's namespace. Neither can access the other's memories. Together, we have a working memory system. The checkpointer handles the conversation flow, and the MongoDB store handles the persistent facts. One agent, two memory systems working in harmony.

This is a very simple example, but it highlights the kinds of capabilities we can give our agents. Whether we're building a single virtual agent or a multi-agent workflow, memory enables them to complete complex multi-step processes and continuously learn and adapt to their environment.

Fantastic work. In this lesson, we implemented long-term memory using MongoDB's vector search capabilities. We set up MongoDBStore with Voyage AI embeddings, enabling semantic retrieval. We created the save_memory and retrieve_memories tools that the agent uses autonomously. And we demonstrated cross-thread persistence: Sarah's name, saved in thread-1, was retrieved in thread-2. We also confirmed that our users' memories were kept in isolation.

We now have both short-term and long-term memory working together. The checkpointer maintains conversation flow within threads. The MongoDB store maintains persistent facts across threads. Together, they create agents that truly remember. You can now build AI systems that store user-specific facts in MongoDB and retrieve them across any conversation, no matter how many threads.