Fundamentals of Data Transformation / Working with multiple collections using $lookup
Using $lookup and $set Stages
Below are sample documents from a sales collection and a customers collection, and an aggregation pipeline. The pipeline uses the $lookup, $unwind, $set, $match, and $project stages on the sales collection to join information from the customers collection to generate recommendations for customers based on past purchases.
Code
Sample document from the sales collection
{
_id: ObjectId('650e1f291b8e4bcd7c9a9d54'),
date: ISODate('2024-04-19T12:30:00.000Z'),
customer: ObjectId('650e1f291b8e4bcd7c9a9d52'),
books: [
{
_id: ObjectId('650e1f291b8e4bcd7c9a9c02'),
title: 'The Little Prince',
author: { name: 'Antoine de Saint-Exupéry', nationality: 'French' },
original_language: 'French',
date_of_original_publication: '1943',
genre: "Fantasy, children's fiction",
price: 18
},
{
_id: ObjectId('650e1f291b8e4bcd7c9a9c03'),
title: 'The Alchemist',
author: { name: 'Paulo Coelho', nationality: 'Brazilian' },
original_language: 'Portuguese',
date_of_original_publication: '1988',
genre: 'Fantasy',
price: 21
}
],
promo_code: null,
sales_tax: 0.07,
total: 41.55
}
Sample document from the customers collection
{
_id: ObjectId('650e1f291b8e4bcd7c9a9d45'),
name: { first: 'Liam', last: 'Miller' },
email: 'liam.miller@example.com',
address: { city: 'Denver', state: 'CO' },
sales: [
ObjectId('650e1f291b8e4bcd7c9a9d46'),
ObjectId('650e1f291b8e4bcd7c9a9d47'),
ObjectId('650e1f291b8e4bcd7c9a9d48'),
ObjectId('650e1f291b8e4bcd7c9a9d49')
]
}
Aggregation Pipeline
The output of this aggregation pipeline joins customer names and email addresses from a second collection, called customers, with our sales collection data to generate recommendations for customers.
The $lookup stage combines data from the customers collection with data from the sales collection.
The $unwind stage deconstructs the customer_details array, so that each customer in the array will be represented by its own document in the output of this stage.
The $set stage is an alias for the $addFields stage. It creates a new field named childrensBooks using $filter to extract books with the genre "Children's literature" from the books array.
The $match stage filters for any documents that have a null value for the childrensBooks field so that all documents in the output of this stage contain at least one book in the childrensBooks array.
The $project stage removes the _id field from the output documents, combines the customer’s first and last name into a single string, and includes the customer’s email and the childrensBooks field we created with our $set stage.
db.sales.aggregate([
{
$lookup: {
from: "customers",
localField: "customer",
foreignField: "_id",
as: "customer_details"
}
},
{
$unwind: {
path: "$customer_details"
}
},
{
$set: {
childrensBooks: {
$filter: {
input: "$books",
as: "book",
cond: {
$eq: ["$$book.genre", "Children's literature"]
}
}
}
}
},
{
$match: {
childrensBooks: {
$ne: []
}
}
},
{
$project: {
_id: 0,
customerName: {
$concat: [
"$customer_details.name.first",
" ",
"$customer_details.name.last"
]
},
email: "$customer_details.email",
childrensBooks: "$childrensBooks.title"
}
}
]);
Adding $out to the Aggregation Pipeline
To write the results of the aggregation pipeline to a collection, we can add the $out stage to the end of the pipeline. $out must be the final stage.
In this example, the $out stage will write the results of our pipeline to a collection in the same database called book_recommendations. If this collection doesn’t exist, then the $out stage will create a new collection. If the specified collection does exist, then all data in the collection will be dropped and replaced with the output of the aggregation pipeline.
db.sales.aggregate([
{
$lookup: {
from: "customers",
localField: "customer",
foreignField: "_id",
as: "customer_details"
}
},
{
$unwind: {
path: "$customer_details"
}
},
{
$set: {
childrensBooks: {
$filter: {
input: "$books",
as: "book",
cond: {
$eq: ["$$book.genre", "Children's literature"]
}
}
}
}
},
{
$match: {
childrensBooks: {
$ne: []
}
}
},
{
$project: {
_id: 0,
customerName: {
$concat: [
"$customer_details.name.first",
" ",
"$customer_details.name.last"
]
},
email: "$customer_details.email",
childrensBooks: "$childrensBooks.title"
}
},
{ $out: "book_recommendations" }
]);