Fundamentals of Data Transformation / Common Aggregation Pipelines: $match, $group, $unwind, $project

Common Aggregation Pipelines: $match, $group, $unwind, $project

Below is a sample document from a sales collection and an aggregation pipeline. The pipeline uses the $match, $group, $unwind and $project stages on the sales collection to generate a report showing the total revenue generated from book sales for each genre of book during 2024.

Code

Sample document from the sales collection

{
    _id: ObjectId('650e1f291b8e4bcd7c9a9d54'),
    date: ISODate('2024-04-19T12:30:00.000Z'),
    customer: ObjectId('650e1f291b8e4bcd7c9a9d52'),
    books: [
      {
        _id: ObjectId('650e1f291b8e4bcd7c9a9c02'),
        title: 'The Little Prince',
        author: { name: 'Antoine de Saint-Exupéry', nationality: 'French' },
        original_language: 'French',
        date_of_original_publication: '1943',
        genre: "Fantasy, children's fiction",
        price: 18
      },
      {
        _id: ObjectId('650e1f291b8e4bcd7c9a9c03'),
        title: 'The Alchemist',
        author: { name: 'Paulo Coelho', nationality: 'Brazilian' },
        original_language: 'Portuguese',
        date_of_original_publication: '1988',
        genre: 'Fantasy',
        price: 21
      }
   ],
    promo_code: null,
    sales_tax: 0.07,
    total: 41.55
  }

Aggregation Pipeline

This aggregation pipeline generates a report showing the total revenue generated from book sales for each genre of book during 2024.

The $match stage returns all documents that have a date field with a value in the year 2024.

The $unwind stage deconstructs the books array, so that each book sold in every sales document will be represented by its own document in the output of this stage.

The $group stage groups documents by the genre and calculates how much revenue they generated with the $sum operator.

The $project stage changes the group _id field name to genre and includes totalRevenue in the output.

db.sales.aggregate([
    {
        $match: {
            date: {
                $gte: ISODate('2024-01-01T00:00:00.000Z'),
                $lt: ISODate('2025-01-01T00:00:00.000Z')
            }
        }
    },
    { $unwind: { path: '$books' } },
    {
        $group: {
            _id: '$books.genre',
            totalRevenue: { $sum: '$books.price' }
        }
    },
    {
        $project: {
            _id: 0,
            genre: '$_id',
            totalRevenue: 1
        }
    }
]);

Adding $out to the Aggregation Pipeline

To write the results of the aggregation pipeline to a collection, we can add the $out stage to the end of the pipeline. $out must be the final stage. In this example, the $out stage will write the results of our pipeline to a collection in the same database called analysis_report. If this collection doesn’t exist, then the $out stage will create a new collection. If the specified collection does exist, then all data in the collection will be dropped and replaced with the output of the aggregation pipeline.

db.sales.aggregate([
    {
        $match: {
            date: {
                $gte: ISODate('2024-01-01T00:00:00.000Z'),
                $lt: ISODate('2025-01-01T00:00:00.000Z')
            }
        }
    },
    { $unwind: { path: '$books' } },
    {
        $group: {
            _id: '$books.genre',
            totalRevenue: { $sum: '$books.price' }
        }
    },
    {
        $project: {
            _id: 0,
            genre: '$_id',
            totalRevenue: 1
        }
    },
    {
        $out: 'analysis_report'
    }
]);