RAG with MongoDB / Create a RAG Application
Welcome back. In the previous video, we built a retriever that took the user's query, vectorized it, and then used Atlas Vector Search to return relevant chunks of data.
Now, let's use those chunks to generate a response to the query and check if our RAG system works.
We're going to build on the code from the last lesson. If you recall, we set MongoDB Atlas as our vector store in LangChain.
Then we defined a function called query_data, which holds the retriever. In this video, we'll build the answer generation component inside the query_data function. Remember, all the code will be provided at the end, so you don't have to worry about coding along with me. Okay, let's do it.
Looking at the diagram of our RAG system, we've completed the first half. We've vectorized a query and used it to retrieve relevant data, both of which will be included in the prompt. Now we shift our focus to engineering the prompt and generating an answer based on that prompt. Sometimes this part of the RAG system is called the generator.
The main purpose of the generator is to come up with a response to the query based on the context provided by the retriever.
The two main components of the generator are the prompt and the generative model, which generates the answer. As a reminder, generative models are different from the embedding model we used to create vectors. Generative models are better suited for creating human-like responses, while embedding models are best suited for creating vector embeddings.
Now that we have some background knowledge, let's write the code. First, let's import the necessary packages starting with the generative model. We'll use GPT, but you should use the model that works best for you.
Next, we'll import the prompt template, runnable passthrough, and output parser from LangChain.
Don't worry if you're unfamiliar with these packages. I'll explain each of them as we use them.
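Here's what those imports might look like. This sketch assumes the langchain-openai integration package and LangChain's langchain_core module; adjust the first import to whichever model provider you choose.

```python
from langchain_openai import ChatOpenAI  # generative model (assumes langchain-openai is installed)
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
```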
Now let's put the prompt together. Below the retriever we'll define a template variable which we'll use to engineer our prompt. This template will be sent to the generative model. It contains instructions on how the model should respond. For example, we state that the model should use the provided context to answer the question, and we don't want the model to generate a response if there isn't any context provided. If we wanted to, we could also dictate the tone of voice for the model to use.
Additionally, you can describe aspects such as specificity, limitations, format, and much more. Feel free to be creative.
After that, we include parameters for the context and the question. Finally, let's instantiate the prompt by using the from_template method on the PromptTemplate class.
Then we'll pass in the template.
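A minimal sketch of what that might look like follows; the exact instruction wording is illustrative, not the course's verbatim prompt.

```python
# Instruct the model to rely only on the retrieved context
template = """Use the following context to answer the question at the end.
If no context is provided, say that you don't know; do not make up an answer.

{context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
```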
Now that we have our prompt, let's ensure that it receives the context and the question. To do this, create a dictionary named retrieve. The first field in this dictionary will be context, which holds a reference to the retriever.
Since the retriever returns a list of chunks and the template expects a string, we'll pipe in a lambda function that joins the list elements, separating them with two newlines.
This will result in our chunks looking like multiple paragraphs.
The last field in the dictionary will contain the user's question or query.
Here we'll use RunnablePassthrough, which allows us to pass the user's question through unchanged to the next stage. This will make more sense in a moment when we put everything together using the LangChain Expression Language.
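Putting the two fields together, the dictionary might look like this; retriever here is the Atlas Vector Search retriever we built in the last lesson.

```python
retrieve = {
    # Join the retrieved chunks into one string, separated by blank lines
    "context": retriever | (lambda docs: "\n\n".join(doc.page_content for doc in docs)),
    # Pass the user's question through unchanged
    "question": RunnablePassthrough(),
}
```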
We're getting close to the end. So far, we have our retriever and prompt in place. Now we need to send the prompt to the model. For this, we'll define a variable named llm, which holds a reference to the generative model we're using. We'll use the string output parser to parse the output from the LLM. This converts the model's raw output into a plain string that is easy to use in our code. With all the necessary components in place, let's chain them together to get our RAG system running.
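For example, assuming the OpenAI integration; the model name and the parse_output variable name are illustrative choices, not fixed by the course.

```python
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # any chat model works here
parse_output = StrOutputParser()  # extracts the response text as a plain string
```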
To do this, we'll use the LangChain Expression Language's pipe operator.
This works like the pipe operator in Linux, and it builds a pipeline similar to MongoDB's aggregation framework.
We chain all our components together to be run as a single unit. As data moves through the chain, it gets processed and pushed to the next stage.
Let's see what this looks like in our code. We'll define a variable called rag_chain, which will hold the chain we assemble. The first part of our chain is our retrieve variable, which includes the retriever along with the user's query.
Next, we pipe in the prompt, which will receive the context and question from the retrieve step.
Once the prompt is built, we pipe in the generative model that it will be sent to. The generative model will output a response, which needs to be parsed, so we pipe in the parser.
Lastly, we wrap our chain in parentheses, which signifies that this chain is a single unit. Now we use the invoke method on our chain and pass in the query. We'll store the response in a variable called answer. We'll finish the query_data function by returning the answer.
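Assembled, the end of the query_data function might look like this sketch, with query being the function's argument and the other pieces defined earlier in the function.

```python
def query_data(query: str) -> str:
    # ... retriever, retrieve, prompt, llm, and parse_output are defined above ...
    rag_chain = (
        retrieve
        | prompt
        | llm
        | parse_output
    )
    answer = rag_chain.invoke(query)
    return answer
```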
We've made it to the end. Let's test it now. Let's ask our RAG system, "What is the difference between a collection and a database in MongoDB?" After we run it, we see that in MongoDB, the main difference is that a database contains collections, which in turn contain documents.
This is cool. But let's test our prompt by asking it something unrelated to MongoDB to see what it does.
Remember, we directed the model to only answer questions if context was provided. To test this, let's ask, why is the sky blue? Well, apparently, our RAG system doesn't know the answer, which is exactly the behavior we wanted. Great work.
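For reference, the two test calls might look like this:

```python
# A question our sample data can answer
print(query_data("What is the difference between a collection and a database in MongoDB?"))

# A question with no relevant context in our vector store
print(query_data("Why is the sky blue?"))
```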
We've successfully put together a simple RAG system. Keep in mind that you're not required to use a certain framework to achieve this result. You can apply this logic any way you want as long as you have the data, a retriever, and a generator. Let's recap what we learned.
First, we learned that the main purpose of the generator is to respond to a query based on the context provided by the retriever.
The two main components of the generator are the prompt and the generative model, which creates the answer. We also learned that the generative model is different from an embedding model.
Finally, we built a generator and assembled our RAG system.
Great job creating your own RAG system. As a next step, I'd encourage you to experiment with your own data or with the prompt to see how the results might change.
