Memory for AI Applications / What is Memory in AI Applications?

Think about the last time you talked with a friend. You didn't need to reintroduce yourself every time you spoke or summarize what you discussed last time. Memory makes conversations feel natural and productive. Now imagine an AI assistant that forgets everything the moment the conversation ends. Every interaction is like starting back at square one. Frustrating, right?

In this video, we'll explore how memory works in AI applications, what a memory unit is, and why these concepts are so important for building intelligent agents. We'll look at the difference between what an LLM remembers during a single API call and what an agent can persist across sessions. By the end, you'll understand the difference between short term and long term memory and see how they transform stateless applications into agents that learn and adapt.

Before we get started, let's define what we mean by a memory unit. You can think of a memory unit as a small container that doesn't just store a piece of information, but also keeps track of when it was learned, how important or reliable it is, and how it connects to other memories. It's like a notebook with page markers and highlights, full of context, meaning, and reminders for later. Sometimes you'll hear these referred to as memory blocks. They help agents not only remember facts, but also understand how those facts fit together and when to use them.

Now, you might be asking yourself, don't LLMs already have memory? After all, when you chat with one, it seems to remember what you said earlier in the conversation. But that's not true memory. It's what we call a context window. A context window is the text the model can see during a single API call. It includes your current message, the LLM's prior outputs, and any previous messages you've decided to include. The model processes all of this together to generate a response. However, the moment that API call ends, the context window is gone.
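To make this concrete, here's a minimal sketch of how an application creates the *appearance* of memory by resending the whole conversation on every request. The `call_llm` function is a stand-in for any chat-completion API, not a real library call; it just reports how many messages the model would see in its context window.

```python
def call_llm(messages):
    # Stand-in for a real chat-completion API call. A real model would
    # generate text; here we just report the size of the context window.
    return f"(model saw {len(messages)} messages)"

history = []  # lives only in this process; lost on crash or restart

def send(user_text):
    history.append({"role": "user", "content": user_text})
    reply = call_llm(history)  # the ENTIRE history is resent on every call
    history.append({"role": "assistant", "content": reply})
    return reply

send("Hi, I'm Ada.")
send("What did I just tell you my name was?")
# The model can only "see" the first message because the app resent it.
```

Note that nothing here persists: if this process exits, `history` is gone, which is exactly the limitation described next.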
The model only appears to remember what you said previously because your application resends the entire conversation history with each new request. If your application crashes, restarts, or you start a new session, the conversation begins from zero. As you can imagine, this limits how useful or natural an LLM can be for users. Imagine a customer support chatbot that loses track of a user's issue every time the page refreshes. Not a great experience.

Agent memory systems solve this problem by persisting data beyond the context window. Instead of relying on the model to remember, we store the conversation history externally. This way, the agent can pick up right where it left off, even after a restart.

There are two common approaches to implementing this kind of persistence: file systems and databases. Let's take a quick look at each. The file system approach stores conversation transcripts and session logs as files on disk. While this approach is simple and easy to debug, file systems have limitations: they're vulnerable to corruption when multiple processes write at the same time, and they lack transaction guarantees. Efficiently querying conversation history also becomes more difficult as data grows. The database approach solves these problems with reliable storage that handles multiple users simultaneously and scales with your data. But that comes with a trade-off, as databases require additional infrastructure and hosting.

The main thing to remember is that the content of a context window is scoped to a single API call and discarded afterward, whereas agent memory is persistent.

Now that we understand the difference between context windows and persistent agent memory, let's look at the two main types of agent memory: short term and long term. Short term memory helps keep track of what's happening right now.
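The database approach to persistence can be sketched with Python's built-in `sqlite3` module. The schema and names below are illustrative assumptions for this example, not a standard: one row per message, keyed by a session ID.

```python
import sqlite3

# Illustrative sketch of database-backed conversation persistence.
# ":memory:" keeps the example self-contained; in a real agent you'd
# pass a file path (or a database server) so history survives restarts.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        session_id TEXT,
        role       TEXT,
        content    TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def save_message(session_id, role, content):
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content))
    conn.commit()  # transaction guarantee the file approach lacks

def load_history(session_id):
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY rowid",
        (session_id,))
    return [{"role": r, "content": c} for r, c in rows]

save_message("s1", "user", "My order never arrived.")
save_message("s1", "assistant", "Sorry to hear that, let me look into it.")
# After a restart (with a file-backed database), the agent reloads the
# same rows and resumes the conversation where it left off.
print(load_history("s1"))
```

Because each session's rows are isolated by `session_id`, the same store can serve many users at once, which is the scaling advantage mentioned above.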
In an AI system, that usually means storing recent information in a database, similar to how our own short term memory helps us remember what just happened. If we leave the room and come back, we still know who we're talking to and what we're talking about. Likewise, with our agent, we can pause the application, restart it, and resume our conversation right where we left off.

But just like in our own brains, some information is important enough to keep around longer. Long term memory persists across all sessions, enabling the agent to recall and use information over time. Information stored in long term memory is available days, weeks, or even months later, and across completely different conversations. Long term memory includes dynamic information gathered from interactions between a user and an agent, like conversation history, user preferences, and patterns observed in past interactions. These are the details that make an agent feel intelligent and personalized. For example, if a user tells an agent that they have a peanut allergy, that's worth remembering for future conversations. If they consistently ask questions about a particular topic, that pattern could inform future responses. Memory is what allows agents to build relationships with users over time.

Understanding these two memory types is great in theory, but how do you decide what information belongs where? Here's a practical framework you can use.

First, consider time. Ask yourself: does this information need to persist beyond the current session, and how long does it remain relevant? If the information is only useful during this conversation, that's short term. If it will still matter next week or next month, like a user's preferred language or writing style, that's long term.

Second, consider scope. Is this information specific to one conversation, or does it apply across all of a user's interactions?
Does it even extend to multiple users or agents? Conversation specific details go in short term memory. Patterns that apply broadly belong in long term memory.

Finally, consider influence. How does this information shape the agent's behavior? Contextual details that help the agent follow the immediate conversation, like what you're currently discussing, fit into short term memory. Foundational information that fundamentally changes how the agent should respond across future interactions, like your communication style preferences or domain expertise, belongs in long term memory.

One more thing to keep in mind: memory management is crucial. Like most data, memories aren't static. They update and change over time. Agents need to be able to revise, strengthen, or even forget certain memories as new information comes in. Good memory management keeps the agent's responses accurate, relevant, and up to date. While memory management is important, it goes beyond the scope of this skill. There are many strategies and techniques for handling updates, retention, and forgetting, but here we're focused on understanding the basics of memory types and why they matter to agents.

Now that we've covered the basics, let's see how this works in practice. Think of a customer support chatbot. During a conversation, a user could ask, "Can you repeat that?" or "What did you just say about refunds?" These references to the immediate conversation flow belong in short term memory; they're only meaningful right now, in this session. But if the system notices that this user always prefers detailed explanations over brief answers, that preference belongs in long term memory. The agent would then provide comprehensive responses in all future conversations, not just this one. This example shows how the same agent can use both types of memory in conjunction.
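The time/scope/influence framework can be sketched as a simple routing rule. The field names and the one-line decision logic below are assumptions made for this example, not a standard API; a real system might weigh the three signals rather than treat any one as decisive.

```python
from dataclasses import dataclass

@dataclass
class MemoryCandidate:
    content: str
    outlives_session: bool  # time: still relevant next week or month?
    applies_broadly: bool   # scope: beyond this one conversation?
    shapes_future: bool     # influence: changes future responses?

def route(candidate):
    """Route a candidate memory: any persistent signal means long term."""
    if (candidate.outlives_session
            or candidate.applies_broadly
            or candidate.shapes_future):
        return "long_term"
    return "short_term"

route(MemoryCandidate("User asked to repeat the refund policy",
                      False, False, False))   # -> "short_term"
route(MemoryCandidate("User prefers detailed explanations",
                      True, True, True))      # -> "long_term"
```

The two candidates mirror the chatbot example above: a "can you repeat that?" reference stays short term, while a stable preference is routed to long term storage.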
Short term memory keeps the current conversation coherent, while long term memory makes the agent smarter over time, allowing it to anticipate needs and personalize responses. This selective use of memory is what transforms a stateless AI application into an intelligent agent.

Great job. In this video, we defined what memory means in the context of AI applications and how it differs from LLM context windows. We classified memory into short term and long term, and introduced a framework based on time, scope, and influence to help you decide where information belongs. Finally, we saw real world examples of how agents use both memory types together.