Fundamentals of Data Transformation / The Explain Plan

After you build your aggregation pipeline, you'll want to assess its performance. For example, how do you know if it uses indexes as expected? That's where an explain plan comes in. In this lesson, we'll discuss what an explain plan is, how it can teach you more about your pipeline's execution, and how it helps you identify opportunities for optimization. We'll do this by taking a look at an example in the MongoDB Shell and Compass.

When building aggregation pipelines, we want to identify inefficiencies, optimize queries by adjusting indexes, and restructure or refine pipelines for better performance. An explain plan is one of the primary tools we can use to accomplish this. An explain plan in MongoDB is a detailed report that gives us information about possible plans for executing a query, including the winning plan that MongoDB selects. By default, it includes details about each stage of the query execution process, like index usage, the path data takes through collections, and the resources required for execution. Explain plans provide this information not only for the winning plan, but also for the rejected plans.

MongoDB gives us a couple of options for viewing an explain plan. First, we can use the mongosh explain() method chained with the aggregate() method to return the explain plan. When we run it, it returns detailed information about how our pipeline is executed as JSON. The output contains a lot of information. To learn more about everything included in an explain plan, and how to adjust the level of detail it provides, visit MongoDB's documentation. For this video, the queryPlanner and executionStats fields are a helpful place to start. The queryPlanner field provides information about the winning plan selected by the query optimizer. In other words, it shows how MongoDB will execute a given query.
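As a rough sketch of the mongosh approach described above, the helper below chains explain() with aggregate() and returns the two fields this lesson focuses on. The function name explainAggregation and its parameters are assumptions made for illustration, and db is assumed to be the mongosh database handle. The shape of explain output varies with server version and pipeline, so the field lookups are a starting point, not a complete parser.

```javascript
// Minimal sketch: run an aggregation with explain and return the two
// fields discussed above. `db` is assumed to be the mongosh database
// handle; `collectionName` and `pipeline` are supplied by the caller.
function explainAggregation(db, collectionName, pipeline) {
  // "executionStats" verbosity asks MongoDB to run the winning plan and
  // report runtime statistics in addition to the planner output.
  const plan = db
    .getCollection(collectionName)
    .explain("executionStats")
    .aggregate(pipeline);

  // Depending on server version and pipeline, these fields may sit at the
  // top level or be nested under a stage entry, so a missing field is
  // returned as null rather than treated as an error.
  return {
    queryPlanner: plan.queryPlanner ?? null,
    executionStats: plan.executionStats ?? null,
  };
}
```

In mongosh, a call might look like explainAggregation(db, "books", pipeline), where pipeline is the array of stages you want to inspect.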
The executionStats field provides details about the execution of the winning plan, including execution times and the number of documents examined. It appears only when explain runs in the executionStats or allPlansExecution verbosity mode. By analyzing these fields, we can better understand the efficiency of queries, identify potential performance bottlenecks, and make informed decisions about query optimization and indexing strategies.

We can also use MongoDB Compass to view and interact with the explain plan. Compass provides UI features to help us gain quick insights from an explain plan. Let's take a look at an example to learn more.

For this example, we'll use an aggregation pipeline on the books collection to analyze trends and determine which book genre received the highest user ratings over the last five years. Let's look at this aggregation pipeline in Compass. First, we'll use the $match stage to find all book documents that were published within the last five years. Then we'll use $group to group documents by genre and calculate the average rating for each genre. And finally, we'll use the $sort stage to sort in descending order to show the top genres by average rating. In the sample output, we can see that documents include the genre and average rating, with the highest-rated genre, dystopian, appearing first.

To view the explain plan for this pipeline, we click the Explain button. On the explain plan page, we see the visual tree, which is a simplified way to view information from the queryPlanner and executionStats fields. If we want to read the full output, we can select Raw Output. There, we can view information about the plan for executing our pipeline under queryPlanner, and information about how the pipeline was actually executed under executionStats. If we want a high-level overview of how our query was executed, we can look at the Query Performance Summary section. Let's review this to see if we can improve our query. Here, it indicates that no index is available for our query, so MongoDB is performing a collection scan.
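The three-stage pipeline described above can be sketched as follows. The field names date_of_original_publication, genre, and rating, the output field averageRating, and the function name buildTopGenresPipeline are all assumptions, since the lesson does not spell them out. The sketch also assumes the publication date is stored as a date value, not a string.

```javascript
// Minimal sketch of the $match -> $group -> $sort pipeline described above.
// All field names are assumptions; adjust them to match your documents.
function buildTopGenresPipeline(now = new Date()) {
  // Cutoff date: five years before `now`, computed in UTC.
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 5);

  return [
    // Keep only books published within the last five years.
    { $match: { date_of_original_publication: { $gte: cutoff } } },
    // Group by genre and compute each genre's average rating.
    { $group: { _id: "$genre", averageRating: { $avg: "$rating" } } },
    // Show the highest average rating first.
    { $sort: { averageRating: -1 } },
  ];
}
```

In mongosh, db.books.aggregate(buildTopGenresPipeline()) would run the pipeline against a books collection with these field names.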
The collection scan tells us that we should add an index to improve our query performance. In this pipeline, the $group stage can't use an index, and our $sort stage can't either, since it's sorting data that has already been processed by the pipeline. However, our $match stage can use an index. Since we're filtering by publication date, we can add an index on the date of original publication field to support our query and avoid a costly collection scan. After adding the index, the explain plan visual tree shows an index scan, which means we're effectively using the index and examining fewer documents. Nice work.

Let's recap what we covered in this lesson. An explain plan in MongoDB is a detailed report that gives us information about the execution of a query. We can use an explain plan to learn more about index usage, the path data takes through collections, the resources required for execution, and more. You can view an explain plan by using the mongosh explain() method when you run your aggregation pipeline or by viewing the output in MongoDB Compass. Finally, we looked at an example in Compass and learned that the statistics and insight provided by an explain plan make it an important tool for testing and optimizing your aggregation pipeline.
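Outside Compass, the same before-and-after check can be sketched in mongosh. The first helper below creates the index on the publication date. The second and third walk a winning plan and report whether it contains a collection scan (COLLSCAN) or an index scan (IXSCAN). The helper names and the field name date_of_original_publication are assumptions. Where the winning plan sits in the explain output depends on server version and pipeline, so the caller is expected to pass in that subdocument.

```javascript
// Create a single-field ascending index on the publication date.
// `db` is assumed to be the mongosh database handle, and the field
// name is an assumption about the books documents.
function addPublicationDateIndex(db) {
  return db
    .getCollection("books")
    .createIndex({ date_of_original_publication: 1 });
}

// Recursively collect every string-valued `stage` property found in a
// plan document, for example "COLLSCAN" or "IXSCAN".
function collectStageNames(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectStageNames(item, found);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    for (const value of Object.values(node)) collectStageNames(value, found);
  }
  return found;
}

// True when the given winning plan contains a collection scan anywhere.
function usesCollectionScan(winningPlan) {
  return collectStageNames(winningPlan).includes("COLLSCAN");
}
```

Passing only the winning plan keeps stages from rejected plans out of the result. If usesCollectionScan returns true before the index is added and false afterwards, the $match stage is being served by the index.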