Fundamentals of Data Transformation / Working with multiple collections using $lookup
Using $lookup
and $set
Stages
Below are sample documents from a sales
collection and a customers
collection, and an aggregation pipeline. The pipeline uses the $lookup
, $unwind
, $set
, $match
, and $project
stages on the sales
collection to join information from the customers
collection to generate recommendations for customers based on past purchases.
Code
Sample document from the sales
collection
{
_id: ObjectId('650e1f291b8e4bcd7c9a9d54'),
date: ISODate('2024-04-19T12:30:00.000Z'),
customer: ObjectId('650e1f291b8e4bcd7c9a9d52'),
books: [
{
_id: ObjectId('650e1f291b8e4bcd7c9a9c02'),
title: 'The Little Prince',
author: { name: 'Antoine de Saint-Exupéry', nationality: 'French' },
original_language: 'French',
date_of_original_publication: '1943',
genre: "Fantasy, children's fiction",
price: 18
},
{
_id: ObjectId('650e1f291b8e4bcd7c9a9c03'),
title: 'The Alchemist',
author: { name: 'Paulo Coelho', nationality: 'Brazilian' },
original_language: 'Portuguese',
date_of_original_publication: '1988',
genre: 'Fantasy',
price: 21
}
],
promo_code: null,
sales_tax: 0.07,
total: 41.55
}
Sample document from the customers
collection
{
_id: ObjectId('650e1f291b8e4bcd7c9a9d45'),
name: { first: 'Liam', last: 'Miller' },
email: 'liam.miller@example.com',
address: { city: 'Denver', state: 'CO' },
sales: [
ObjectId('650e1f291b8e4bcd7c9a9d46'),
ObjectId('650e1f291b8e4bcd7c9a9d47'),
ObjectId('650e1f291b8e4bcd7c9a9d48'),
ObjectId('650e1f291b8e4bcd7c9a9d49')
]
}
Aggregation Pipeline
The output of this aggregation pipeline joins customer names and email addresses from a second collection, called customers
, with our sales
collection data to generate recommendations for customers.
The $lookup
stage combines data from the customers
collection with data from the sales
collection.
The $unwind
stage deconstructs the customer_details
array, so that each customer in the array will be represented by its own document in the output of this stage.
The $set
stage is an alias for the $addFields
stage. It creates a new field named childrensBooks
using $filter
to extract books with the genre "Children's literature" from the books
array.
The $match
stage filters for any documents that have a null value for the childrensBooks
field so that all documents in the output of this stage contain at least one book in the childrensBooks
array.
The $project
stage removes the _id
field from the output documents, combines the customer’s first and last name into a single string, and includes the customer’s email and the childrensBooks
field we created with our $set
stage.
db.sales.aggregate([
{
$lookup: {
from: "customers",
localField: "customer",
foreignField: "_id",
as: "customer_details"
}
},
{
$unwind: {
path: "$customer_details"
}
},
{
$set: {
childrensBooks: {
$filter: {
input: "$books",
as: "book",
cond: {
$eq: ["$$book.genre", "Children's literature"]
}
}
}
}
},
{
$match: {
childrensBooks: {
$ne: []
}
}
},
{
$project: {
_id: 0,
customerName: {
$concat: [
"$customer_details.name.first",
" ",
"$customer_details.name.last"
]
},
email: "$customer_details.email",
childrensBooks: "$childrensBooks.title"
}
}
]);
Adding $out
to the Aggregation Pipeline
To write the results of the aggregation pipeline to a collection, we can add the $out
stage to the end of the pipeline. $out
must be the final stage.
In this example, the $out
stage will write the results of our pipeline to a collection in the same database called book_recommendations
. If this collection doesn’t exist, then the $out
stage will create a new collection. If the specified collection does exist, then all data in the collection will be dropped and replaced with the output of the aggregation pipeline.
db.sales.aggregate([
{
$lookup: {
from: "customers",
localField: "customer",
foreignField: "_id",
as: "customer_details"
}
},
{
$unwind: {
path: "$customer_details"
}
},
{
$set: {
childrensBooks: {
$filter: {
input: "$books",
as: "book",
cond: {
$eq: ["$$book.genre", "Children's literature"]
}
}
}
}
},
{
$match: {
childrensBooks: {
$ne: []
}
}
},
{
$project: {
_id: 0,
customerName: {
$concat: [
"$customer_details.name.first",
" ",
"$customer_details.name.last"
]
},
email: "$customer_details.email",
childrensBooks: "$childrensBooks.title"
}
},
{ $out: "book_recommendations" }
]);