Fundamentals of Data Transformation / Common Aggregation Pipelines: $match, $group, $unwind, $project
Common Aggregation Pipelines: $match
, $group
, $unwind
, $project
Below is a sample document from a sales
collection and an aggregation pipeline. The pipeline uses the $match
, $group
, $unwind
and $project
stages on the sales
collection to generate a report showing the total revenue generated from book sales for each genre of book during 2024.
Code
Sample document from the sales
collection
{
_id: ObjectId('650e1f291b8e4bcd7c9a9d54'),
date: ISODate('2024-04-19T12:30:00.000Z'),
customer: ObjectId('650e1f291b8e4bcd7c9a9d52'),
books: [
{
_id: ObjectId('650e1f291b8e4bcd7c9a9c02'),
title: 'The Little Prince',
author: { name: 'Antoine de Saint-Exupéry', nationality: 'French' },
original_language: 'French',
date_of_original_publication: '1943',
genre: "Fantasy, children's fiction",
price: 18
},
{
_id: ObjectId('650e1f291b8e4bcd7c9a9c03'),
title: 'The Alchemist',
author: { name: 'Paulo Coelho', nationality: 'Brazilian' },
original_language: 'Portuguese',
date_of_original_publication: '1988',
genre: 'Fantasy',
price: 21
}
],
promo_code: null,
sales_tax: 0.07,
total: 41.55
}
Aggregation Pipeline
This aggregation pipeline generates a report showing the total revenue generated from book sales for each genre of book during 2024.
The $match
stage returns all documents that have a date field with a value in the year 2024.
The $unwind
stage deconstructs the books
array, so that each book sold in every sales document will be represented by its own document in the output of this stage.
The $group
stage groups documents by the genre and calculates how much revenue they generated with the $sum
operator.
The $project
stage changes the group _id
field name to genre
and includes totalRevenue
in the output.
db.sales.aggregate([
{
$match: {
date: {
$gte: ISODate('2024-01-01T00:00:00.000Z'),
$lt: ISODate('2025-01-01T00:00:00.000Z')
}
}
},
{ $unwind: { path: '$books' } },
{
$group: {
_id: '$books.genre',
totalRevenue: { $sum: '$books.price' }
}
},
{
$project: {
_id: 0,
genre: '$_id',
totalRevenue: 1
}
}
]);
Adding $out
to the Aggregation Pipeline
To write the results of the aggregation pipeline to a collection, we can add the $out
stage to the end of the pipeline. $out
must be the final stage.
In this example, the $out
stage will write the results of our pipeline to a collection in the same database called analysis_report
. If this collection doesn’t exist, then the $out
stage will create a new collection. If the specified collection does exist, then all data in the collection will be dropped and replaced with the output of the aggregation pipeline.
db.sales.aggregate([
{
$match: {
date: {
$gte: ISODate('2024-01-01T00:00:00.000Z'),
$lt: ISODate('2025-01-01T00:00:00.000Z')
}
}
},
{ $unwind: { path: '$books' } },
{
$group: {
_id: '$books.genre',
totalRevenue: { $sum: '$books.price' }
}
},
{
$project: {
_id: 0,
genre: '$_id',
totalRevenue: 1
}
},
{
$out: 'analysis_report'
}
]);