Fundamentals of Data Transformation / Working with multiple collections using $lookup

Using $lookup and $set Stages

Below are sample documents from a sales collection and a customers collection, and an aggregation pipeline. The pipeline uses the $lookup, $unwind, $set, $match, and $project stages on the sales collection to join information from the customers collection to generate recommendations for customers based on past purchases.

Code

Sample document from the sales collection

{
    _id: ObjectId('650e1f291b8e4bcd7c9a9d54'),
    date: ISODate('2024-04-19T12:30:00.000Z'),
    customer: ObjectId('650e1f291b8e4bcd7c9a9d52'),
    books: [
      {
        _id: ObjectId('650e1f291b8e4bcd7c9a9c02'),
        title: 'The Little Prince',
        author: { name: 'Antoine de Saint-Exupéry', nationality: 'French' },
        original_language: 'French',
        date_of_original_publication: '1943',
        genre: "Fantasy, children's fiction",
        price: 18
      },
      {
        _id: ObjectId('650e1f291b8e4bcd7c9a9c03'),
        title: 'The Alchemist',
        author: { name: 'Paulo Coelho', nationality: 'Brazilian' },
        original_language: 'Portuguese',
        date_of_original_publication: '1988',
        genre: 'Fantasy',
        price: 21
      }
   ],
    promo_code: null,
    sales_tax: 0.07,
    total: 41.55
  }

Sample document from the customers collection

{
    _id: ObjectId('650e1f291b8e4bcd7c9a9d45'),
    name: { first: 'Liam', last: 'Miller' },
    email: 'liam.miller@example.com',
    address: { city: 'Denver', state: 'CO' },
    sales: [
      ObjectId('650e1f291b8e4bcd7c9a9d46'),
      ObjectId('650e1f291b8e4bcd7c9a9d47'),
      ObjectId('650e1f291b8e4bcd7c9a9d48'),
      ObjectId('650e1f291b8e4bcd7c9a9d49')
    ]
  }

Aggregation Pipeline

The output of this aggregation pipeline joins customer names and email addresses from a second collection, called customers, with our sales collection data to generate recommendations for customers.

The $lookup stage combines data from the customers collection with data from the sales collection.

The $unwind stage deconstructs the customer_details array, so that each customer in the array will be represented by its own document in the output of this stage.

The $set stage is an alias for the $addFields stage. It creates a new field named childrensBooks using $filter to extract books with the genre "Children's literature" from the books array.

The $match stage filters for any documents that have a null value for the childrensBooks field so that all documents in the output of this stage contain at least one book in the childrensBooks array.

The $project stage removes the _id field from the output documents, combines the customer’s first and last name into a single string, and includes the customer’s email and the childrensBooks field we created with our $set stage.

db.sales.aggregate([
    {
        $lookup: {
            from: "customers",
            localField: "customer",
            foreignField: "_id",
            as: "customer_details"
        }
    },
    {
        $unwind: {
            path: "$customer_details"
        }
    },
    {
        $set: {
            childrensBooks: {
                $filter: {
                    input: "$books",
                    as: "book",
                    cond: {
                        $eq: ["$$book.genre", "Children's literature"]
                    }
                }
            }
        }
    },
    {
        $match: {
            childrensBooks: {
                $ne: []
            }
        }
    },
    {
        $project: {
            _id: 0,
            customerName: {
                $concat: [
                    "$customer_details.name.first",
                    " ",
                    "$customer_details.name.last"
                ]
            },
            email: "$customer_details.email",
            childrensBooks: "$childrensBooks.title"
        }
    }
]);

Adding $out to the Aggregation Pipeline

To write the results of the aggregation pipeline to a collection, we can add the $out stage to the end of the pipeline. $out must be the final stage.

In this example, the $out stage will write the results of our pipeline to a collection in the same database called book_recommendations. If this collection doesn’t exist, then the $out stage will create a new collection. If the specified collection does exist, then all data in the collection will be dropped and replaced with the output of the aggregation pipeline.

db.sales.aggregate([
    {
        $lookup: {
            from: "customers",
            localField: "customer",
            foreignField: "_id",
            as: "customer_details"
        }
    },
    {
        $unwind: {
            path: "$customer_details"
        }
    },
    {
        $set: {
            childrensBooks: {
                $filter: {
                    input: "$books",
                    as: "book",
                    cond: {
                        $eq: ["$$book.genre", "Children's literature"]
                    }
                }
            }
        }
    },
    {
        $match: {
            childrensBooks: {
                $ne: []
            }
        }
    },
    {
        $project: {
            _id: 0,
            customerName: {
                $concat: [
                    "$customer_details.name.first",
                    " ",
                    "$customer_details.name.last"
                ]
            },
            email: "$customer_details.email",
            childrensBooks: "$childrensBooks.title"
        }
    },
    { $out: "book_recommendations" }
]);