RAG with MongoDB / Identify RAG Architecture

Building a RAG system involves combining various components such as search solutions, data storage systems, and different types of models. We now have powerful tools that streamline the integration of these elements, making the development of AI-powered applications more straightforward. In this video, we'll check out some integrations and frameworks that can make it easier to put all these components together as you build a RAG system with Atlas Vector Search. Keep in mind that this is not going to be an exhaustive list. Atlas Vector Search is highly extensible and can meet you wherever your needs are. If fully managed services or frameworks aren't your thing, you can always build your own homegrown solution. With that said, let's get started.

Building any kind of application requires a substantial amount of infrastructure, and AI apps are no different. If you wanna skip all the setup, you can use a fully managed service like Amazon Bedrock. Amazon Bedrock makes it easy to use popular models while also providing the necessary security and privacy for your systems. To use Amazon Bedrock with MongoDB Atlas, we can set up Atlas as a knowledge base. A knowledge base is a repository that stores and organizes data so it's easy to retrieve the information that models use to generate responses. We just need to point Amazon Bedrock at our data source, such as an S3 bucket filled with PDFs. Bedrock consumes this data, creates chunks and embeddings, and securely stores everything in Atlas. Once this is set up, we create a Bedrock agent that uses the knowledge base to perform specific tasks. To learn more about how to use Amazon Bedrock with Atlas, check out the MongoDB docs.

Now, let's talk about some of the frameworks we can use. As a developer, chances are you've used frameworks before. Maybe you've harnessed a front-end framework like React, or if you've worked in back-end development, you've used something like Express. AI frameworks are similar.
They come bundled with a toolkit and functionalities designed to simplify the development process. Several of these frameworks have support for Atlas, including Microsoft Semantic Kernel, LlamaIndex, and LangChain. If you're interested in learning more about Semantic Kernel or LlamaIndex, check out the MongoDB docs or DevCenter tutorials.

In this unit, we're gonna use the LangChain framework to build our own RAG application. LangChain lets us easily connect the components of a RAG system, such as the retriever and answer generator, using chains. Like aggregation pipelines in MongoDB, these chains consist of a series of interconnected tasks that process and transform data step by step. We can connect multiple chains in a declarative manner using the LangChain Expression Language, or LCEL for short.

Now that we know a little about LangChain, let's set up our Python project by installing LangChain and some other packages we'll use. Keep in mind, LangChain has support for multiple languages. To set up our environment, let's use pip3 to install the following libraries. First up, we'll install langchain, langchain-community, and langchain-core. These packages give us access to LangChain's features. Next, we install langchain-openai so we can use models from OpenAI. After that, we'll add langchain-mongodb and pymongo to our toolkit so we can use Atlas as our data store and vector search solution. Finally, we'll install pypdf to make it easy to work with PDFs. Don't worry if you're feeling a little lost with all the packages we just installed. It'll all start to make sense once we begin putting the RAG system together.

Before we move on, let's take a moment to recap what we've learned. First, Atlas Vector Search can be used with many integrations and frameworks. Then, we learned that Amazon Bedrock is a fully managed service you can use to create a RAG system with Atlas Vector Search as a knowledge base.
On the other hand, you can use a framework such as Microsoft Semantic Kernel, LlamaIndex, or LangChain. If those don't work for you, you can always build your own solution. In this unit, we'll use the LangChain Python libraries, so we installed a few dependencies for this project. Now that we have everything installed, let's prepare our data in the next video.
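
The install step described in this video can be run as pip3 install langchain langchain-community langchain-core langchain-openai langchain-mongodb pymongo pypdf. As a rough sketch, the snippet below checks that each of those packages can be found afterwards. The REQUIRED mapping and the missing_packages helper are hypothetical names introduced here for illustration, and the import names assume the usual convention of replacing hyphens with underscores.

```python
from importlib.util import find_spec

# Import name -> pip package name, for the packages named in the video.
# (This mapping is an illustration, not part of any library.)
REQUIRED = {
    "langchain": "langchain",
    "langchain_community": "langchain-community",
    "langchain_core": "langchain-core",
    "langchain_openai": "langchain-openai",
    "langchain_mongodb": "langchain-mongodb",
    "pymongo": "pymongo",
    "pypdf": "pypdf",
}


def missing_packages(required):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages; try: pip3 install " + " ".join(missing))
    else:
        print("All packages are installed.")
```

Running the script after the install should report nothing missing; if it lists a package, re-run pip3 for that name.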
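
To make the idea of a chain a little more concrete, here is a minimal plain-Python sketch of steps connected with a pipe operator, in the spirit of LCEL. It does not use LangChain and is not how LangChain is implemented. The Step class, the toy retrieve, build_prompt, and fake_model functions, and the sample documents are all hypothetical stand-ins for a real retriever and answer generator.

```python
import re


class Step:
    """Wraps a function so steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Run this step first, then feed its output into the next step.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Toy "knowledge base"; a real system would query Atlas Vector Search.
DOCS = [
    "Atlas Vector Search stores and queries embeddings.",
    "Amazon Bedrock is a fully managed service.",
]


def tokens(text):
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question):
    # Keyword overlap stands in for vector similarity in this sketch.
    hits = [doc for doc in DOCS if tokens(question) & tokens(doc)]
    return {"question": question, "context": hits}


def build_prompt(inputs):
    context = " ".join(inputs["context"])
    return f"Context: {context}\nQuestion: {inputs['question']}"


def fake_model(prompt):
    # Stand-in for an LLM call: echoes the context line of the prompt.
    return "Answer based on -> " + prompt.splitlines()[0]


# Each step transforms the output of the previous one, like pipeline stages.
rag_chain = Step(retrieve) | Step(build_prompt) | Step(fake_model)

if __name__ == "__main__":
    print(rag_chain.invoke("What is Bedrock?"))
```

The step-by-step shape mirrors the aggregation pipeline comparison from the video: each stage receives the previous stage's output. In LangChain itself, real components are piped together with the same | operator.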