Fundamentals of Data Transformation / The Explain Plan

Using an Explain Plan

Below is the code used to run the explain() method with mongosh to return an explain plan. An explain plan is a report that gives us information about possible plans for executing a pipeline, including the winning plan that MongoDB selects.

Code

explain() Method

Use the explain() method, chained with the aggregate() method to return the explain plan in mongosh.

db.collection.explain().aggregate()

Aggregation Pipeline

The output of this aggregation pipeline tells us which book genre received the highest user ratings over the last five years. The explain() method will generate an explain plan for this aggregation pipeline.

The $match stage filters for books published after 2020.

The $group stage groups documents by genre and calculates the average rating for the genre using the $avg operator.

The $sort stage sorts the genre_average_rating field in descending order to show the top genres by average rating.

db.books.explain().aggregate(
    [
      {
        $match: {
          date_of_original_publication: {
            $gte: '2020'
          }
        }
      },
      {
        $group: {
          _id: '$genre',
          genre_average_rating: {
            $avg: '$average_rating'
          }
        }
      },
      { $sort: { genre_average_rating: -1 } }
    ]
  );

Here is some example output from such a query:

winningPlan: {
  isCached: false,
  queryPlan: {
    stage: 'GROUP',
    inputStage: {
      stage: 'COLLSCAN',
      filter: { date_of_original_publication: { '$gte': '2020' } },
      direction: 'forward'
    }
  }
}
…
rejectedPlans: []
…
{ '$sort': { sortKey: { genre_average_rating: -1 } } }
…

We can see the winningPlan performs a collection scan, and then filters all documents in the collection to pass along only those with a value of 2020 or greater in the “date_of_original_publication” field. In this case, there are no rejected plans because there are no indexes supporting this query (which is also why we see a collection scan). The sort is performed only after finding and executing the best plan to find and group the data.

If we add an index on date_of_original_publication, we can see the winningPlan changes:

…
winningPlan: {
  queryPlan: {
    stage: "GROUP",
    inputStage: {
      stage: "FETCH",
      inputStage: {
        stage: "IXSCAN",
        keyPattern: { date_of_original_publication: 1 },
        indexName: "date_of_original_publication_1",
        indexBounds: { date_of_original_publication: [ '["2020", {})' ] }
      }
    }
  }
}
…
{ '$sort': { sortKey: { genre_average_rating: -1 } } }
…

And we’re now leveraging an index for our group stage.