Data Ingestion for RAG Applications

Learn about the data ingestion pipeline for retrieval-augmented generation (RAG) applications.

Overview

In this Learning Byte, a MongoDB expert will introduce you to the data ingestion pipeline for retrieval-augmented generation (RAG) applications. 

You’ll learn the importance of quality data for RAG applications and some common sources of data. Next, you'll be introduced to the different stages of the pipeline with a focus on the first stage: collection and formatting.

We'll end with a hands-on demo of collecting and formatting tabular data from a PDF using PyMuPDF.
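
As a rough preview of what the collection and formatting stage can look like, here is a minimal sketch, not the code used in the demo. It assumes a recent PyMuPDF release that provides page.find_tables(); the helper names rows_to_records and extract_table_records are hypothetical and chosen only for illustration.

```python
from typing import Any


def rows_to_records(rows: list[list[Any]]) -> list[dict[str, str]]:
    """Turn a table given as rows (first row = header) into a list of dicts."""
    if not rows:
        return []
    # Clean header cells; fall back to a positional name for empty cells.
    header = [
        (str(cell).strip() if cell is not None else "") or f"column_{i}"
        for i, cell in enumerate(rows[0])
    ]
    records = []
    for row in rows[1:]:
        # Missing cells come back as None; normalize them to empty strings.
        values = ["" if cell is None else str(cell).strip() for cell in row]
        records.append(dict(zip(header, values)))
    return records


def extract_table_records(pdf_path: str) -> list[dict[str, str]]:
    """Collect every table PyMuPDF detects in the PDF as formatted records."""
    # PyMuPDF is imported lazily so rows_to_records works without it installed.
    import fitz

    records = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # find_tables() returns a finder whose .tables lists detected tables;
            # extract() yields each table as a list of rows.
            for table in page.find_tables().tables:
                records.extend(rows_to_records(table.extract()))
    return records
```

Calling extract_table_records("report.pdf") (a placeholder path) would return one dictionary per table row, keyed by the header cells. Records in that shape are straightforward to pass to the later stages of an ingestion pipeline.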

Chapters

  • Intro
  • RAG Data Ingestion Pipeline
  • Collection & Formatting
  • Demo
  • Summary