Fundamentals of Data Transformation / Using $sort and $limit Stages

Picture this. You work for a company with an online store and need to quickly and easily see the ten most recent reviews for the products you sell. How can you use MongoDB's aggregation framework to do this? In this video, we'll use the $sort and $limit stages to help you focus on just the data you need.
Before we begin, let's briefly review these two stages. The $sort stage orders the documents in the pipeline by one or more specified fields, each in ascending or descending order. The $limit stage restricts the number of documents passed to the next stage, or to the final output, to the number we specify. $limit is especially helpful right after a $sort stage. Whenever possible, MongoDB's query optimizer combines the two stages, so the sort only has to keep track of the top documents up to the limit instead of holding and ordering every document in memory. This reduces memory use and improves performance.
We'll demonstrate how these stages work with a simple example. We'll be working with data from an online bookstore app, focusing on the reviews collection, which contains customer reviews and ratings of books. We want to find out which books have the highest average ratings based on reviews left after 2017, that is, from 2018 to today. Let's use MongoDB's aggregation framework to do this.
First, let's look at a document from the reviews collection. A review document contains a book ID that references the books collection and a customer ID that references the customers collection. It also contains a title, a rating, the review text, and a timestamp. In this example, we'll work with the book ID, rating, and timestamp fields.
Here's the plan. We'll use a $match stage to filter for reviews created after 2017. Then we'll use a $group stage to group documents by book ID, because a book will likely have more than one review. Once all the reviews for a book are grouped together, we can calculate each book's average rating with the $avg operator.
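To make the idea of combining $sort and $limit more concrete, here is a small Python analogy. It is not MongoDB's implementation; it only illustrates why a sort followed by a limit of k can get by with holding k items at a time. The function names are hypothetical, and the numbers stand in for rating values.

```python
import heapq
from typing import Iterable


def full_sort_then_limit(values: Iterable[float], k: int) -> list[float]:
    # Naive approach: hold and sort every value, then keep the first k.
    return sorted(values, reverse=True)[:k]


def top_k_bounded(values: Iterable[float], k: int) -> list[float]:
    # Combined approach (analogy only): never hold more than k values.
    if k <= 0:
        return []
    heap: list[float] = []  # min-heap; heap[0] is the smallest value kept so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the weakest retained value, so swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)


ratings = [3.5, 4.9, 2.0, 4.1, 5.0, 3.3, 4.7]
print(top_k_bounded(ratings, 3))  # [5.0, 4.9, 4.7]
```

Both functions return the same values. The difference is that the bounded version never holds more than k values at once, which mirrors how a combined $sort and $limit only tracks the top documents. Python's standard heapq.nlargest does the same thing in one call.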
Finally, we'll use the $sort stage to order these documents from the highest to the lowest value of the average rating field, and the $limit stage to cap the output at ten documents so we see only the top ten books by customer rating. Remember, when $sort is followed by $limit, MongoDB's query optimizer combines the two stages when possible. In our example, MongoDB only needs to keep track of ten documents while sorting, instead of holding potentially thousands or millions of documents in memory.
Now let's build the pipeline. Since we're looking for reviews left after 2017, we start with a $match stage on the timestamp field that uses the greater than operator, $gt, to find all reviews left in 2018 or later. Next, we group reviews by book ID. In the $group stage, we add a new field called average rating, whose value is the average customer rating, calculated by applying the $avg operator to the rating field. The result is one document per book ID, containing that book's average rating.
Now that we have the average rating for each book, we need the top ten. First, we sort the documents by the average rating field in descending order, so the documents with the highest average rating appear first. Then we add a $limit stage so the output contains only the ten documents with the highest average rating. Excellent. When we run this pipeline, we get the book IDs of the ten books with the highest average rating. As a next step, we could use the book ID with the $lookup stage to retrieve the related book details from the books collection for further analysis. Check out the lesson on $lookup to learn more about how to do this. Nice work. Let's recap what we covered in this lesson.
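The pipeline described in this walkthrough could be written for the Python driver, PyMongo, roughly as follows. This is a sketch under assumptions: the field names book_id, rating, and timestamp, and the output field name average_rating, are hypothetical, so adjust them to your schema. It uses $gte with January 1, 2018, which expresses "2018 or later" without depending on the last instant of 2017; a $gt filter works the same way given a suitable boundary date. It also adds _id as a secondary sort key, which is not part of the lesson: it makes the order of books with tied averages deterministic.

```python
from datetime import datetime
from typing import Any

# Hypothetical field names (book_id, rating, timestamp, average_rating);
# adjust them to match your own reviews collection.
def build_top_rated_pipeline(since: datetime, n: int = 10) -> list[dict[str, Any]]:
    """Return an aggregation pipeline for the n books with the highest average rating."""
    return [
        # Keep only reviews left on or after `since`.
        {"$match": {"timestamp": {"$gte": since}}},
        # One output document per book: _id is the book ID, and
        # average_rating is the mean of that book's ratings.
        {"$group": {"_id": "$book_id", "average_rating": {"$avg": "$rating"}}},
        # Highest average first (-1 = descending). The _id key is an added
        # tiebreaker (not in the lesson) so tied books come back in a stable order.
        {"$sort": {"average_rating": -1, "_id": 1}},
        # Keep only the first n documents; with $sort directly before it,
        # the query optimizer can combine the two stages.
        {"$limit": n},
    ]


def top_rated_books(reviews: Any, n: int = 10) -> list[dict[str, Any]]:
    """Run the pipeline on `reviews`, assumed to be a PyMongo Collection."""
    pipeline = build_top_rated_pipeline(datetime(2018, 1, 1), n)
    # Collection.aggregate returns a cursor; list() drains it.
    return list(reviews.aggregate(pipeline))
```

With a running deployment, you might call it as top_rated_books(MongoClient(uri)["bookstore"]["reviews"]), where uri and the "bookstore" database name are placeholders. Each result document holds the book ID in _id alongside average_rating.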
The $sort stage orders the documents in the pipeline by one or more specified fields, in either ascending or descending order. The $limit stage restricts the number of documents passed to the next stage, or to the final output, to the number we specify. When $sort is followed by $limit, MongoDB's query optimizer coalesces, or combines, the two stages whenever possible to improve performance. Finally, we walked through an example where we used $sort and $limit to find the ten books with the highest average customer rating. Great work. See you in the next video.