Indexing Design Fundamentals / Index Key Order

So we've identified the queries that support key workloads and examined them for common patterns and similarities. The next step in designing a good indexing strategy is determining how to structure compound indexes so that they optimally support those queries. In this video, we'll discuss how to order the fields in a compound index. Then we'll apply these concepts to our example bank messaging app. Let's get started.

Before we design our first compound index, it's important to understand that the order in which we arrange the fields in an index affects how well it supports a query.

First, the order of the index fields determines whether a compound index can support a query at all. MongoDB can only use a compound index to support a query if the query includes an index prefix. In MongoDB, an index prefix is a leading subset of the fields in a compound index. Let's look at an example. For a compound index on the fields a, b, and c, the index prefixes are a, and a and b. MongoDB can use this index to support queries on these field combinations: a; a and b; and a, b, and c. It cannot use it for queries that don't include a prefix, such as queries on b, on c, or on b and c.

The order of the index fields also determines how efficient an index is for a particular query. So what design considerations should we make? If we want both the filter and the sort of a query to be supported by an index, we should place equality fields first, followed by sort fields, and finally range fields. This field order is referred to as ESR (equality, sort, range), and it's a good place to start when thinking about how to order the fields in an index. The most important thing to remember is that equality matches come first, such as an equality match on a when a leads the index. To understand why this is so important, let's look at what happens when this condition isn't met. We'll take a single index and see how it supports different queries.
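To make the prefix rule and the ESR ordering concrete, here is a minimal Python sketch. It models a compound index only as an ordered list of field names; the helper names (index_prefixes, usable_prefix, esr_order) are hypothetical, and the model is a simplification, not MongoDB's query planner.

```python
def index_prefixes(index_fields):
    """Every leading subset of a compound index's fields."""
    return [tuple(index_fields[:n]) for n in range(1, len(index_fields) + 1)]


def usable_prefix(index_fields, query_fields):
    """Longest index prefix whose fields all appear in the query.

    An empty result means the query does not include an index prefix,
    so (in this simplified model) the index cannot support it.
    """
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break
        prefix.append(field)
    return tuple(prefix)


def esr_order(equality, sort, range_):
    """Suggest an index key order: equality fields, then sort fields, then range fields."""
    ordered = []
    for field in [*equality, *sort, *range_]:
        if field not in ordered:  # a field used for both sort and range appears once
            ordered.append(field)
    return ordered


index = ["a", "b", "c"]
print(index_prefixes(index))             # [('a',), ('a', 'b'), ('a', 'b', 'c')]
print(usable_prefix(index, {"a", "b"}))  # ('a', 'b')
print(usable_prefix(index, {"b", "c"}))  # ()
print(esr_order(["a"], ["b"], ["c"]))    # ['a', 'b', 'c']
```

Note that index_prefixes also lists the full key (a, b, c); the lesson counts only a, and a and b, as prefixes, but a query on all three fields is supported either way.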
The fields in this index are a, b, and c, and the list shows sample index values. Let's say we have a query with a range on a, an equality match on b, and a sort on c. With this query, the range comes first in our index order. If we use the index to perform the range on a, where a is less than three, we can see that the index values of b and c across the matching entries are not in sorted order. So to do the equality match on b, we have to go through each of these entries to see which values match. This means we can no longer use this index to efficiently do the equality match on b, nor can we use it to keep the sort on c from happening in memory.

Let's look at a different query with the same index. Here, we're doing an equality match on b, a range on c, and a sort on a. With this query, the sort now comes before the equality match in the index field order. When we sort first, we run into the same problem with the equality match on b and the range on c: the values for b and c in the index are not sorted, so we have to go through each entry to find matches in the correct range. This means that all index keys and all documents will be examined, usually resulting in very poor ratios of keys and documents examined to documents returned.

Now let's look at what happens when we follow the ESR order. Here, we're doing an equality match on a, a sort on b, and a range on c, so for this query the index order follows ESR. After we find the equality matches on a, b is still in order, so the index can keep the sort on b from happening in memory. We still have to check every entry within this group of index values to do the range on c. However, this is still much more efficient than a collection scan.

Okay. So equality must come first for an index to efficiently support a query. Does it matter whether sort or range comes next? Yes, because when a range comes before a sort in an index, the index can no longer support the sort operation. Take this example using the same index.
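The effect of key order can be reproduced with a plain sorted list of tuples standing in for an index on (a, b, c). This minimal Python sketch uses made-up sample values and a linear scan to pick out matching entries; it illustrates the order in which keys come back, not how MongoDB walks index bounds.

```python
# Hypothetical sample keys for an index on (a, b, c). Sorting the tuples
# orders them by a, then b, then c, the same way a compound index does.
index_keys = sorted([
    (1, 5, 2), (1, 2, 9), (2, 7, 1), (2, 1, 4), (3, 3, 3), (1, 8, 6), (2, 4, 5),
])


def is_sorted(values):
    return all(x <= y for x, y in zip(values, values[1:]))


# Range first (a < 3): b restarts for each value of a, so b is not sorted
# and an equality match on b has to inspect every entry in the range.
range_b = [b for a, b, c in index_keys if a < 3]

# Equality first (a == 2): within one value of a, b is still sorted,
# so the index can return results already ordered by b.
equality_b = [b for a, b, c in index_keys if a == 2]

# c is ordered only within each b, so a range on c still means checking
# each entry in the a == 2 group, but only that group.
equality_c = [c for a, b, c in index_keys if a == 2]

print(range_b, is_sorted(range_b))        # [2, 5, 8, 1, 4, 7] False
print(equality_b, is_sorted(equality_b))  # [1, 4, 7] True
print(equality_c, is_sorted(equality_c))  # [4, 5, 1] False
```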
With this query, we do an equality match on a, then a range on b for values less than three. But now c is no longer sorted. This means that the sort will happen in memory if it follows a range. So to keep sort operations from happening in memory, place sort fields before range fields in your index.

But it's also important to remember that ESR is just a guideline. The order you use ultimately depends on your queries, the needs of your application, and your business requirements. For example, there may be cases where it makes sense to use an equality, range, sort order, placing a highly selective range after the equality match and before the sort. In that case, you would sort only a small number of documents in memory.

Now that we know more about index field order and ESR, let's apply these concepts to our bank messaging app. These are the three queries that we need to create indexes for. They retrieve messages for a user's inbox using different filters. We identified similarities between the fields used to filter in the first two queries: both include user ID and timestamp, while the second query also includes a sort on the category field. Because of these similarities, it looks like we can support the first two queries with one index.

Based on our analysis of the queries, we know that our index will need three fields: user ID, timestamp, and category. We just need to determine their order. Both queries perform an equality match on user ID, so if we follow ESR, user ID should be placed first in our index, as part of the index prefix. Next, both queries perform a range operation on timestamp. The first query then sorts on timestamp, while the second query sorts on category. If we follow ESR and place category before timestamp, the index can no longer efficiently support the sort and range operations in the first query. So what do we do?
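Here is a minimal Python sketch of why a range placed before a sort forces an in-memory sort. It builds two toy "indexes" over the same made-up documents, one ordered (a, b, c) and one ordered (a, c, b), and runs a query with an equality match on a, a range on b less than three, and a sort on c. The helper names are hypothetical, and the filtering is a linear walk in index order.

```python
docs = [
    {"a": 1, "b": 1, "c": 9},
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 7},
    {"a": 1, "b": 4, "c": 1},
    {"a": 2, "b": 1, "c": 5},
    {"a": 2, "b": 3, "c": 2},
]


def build_index(documents, fields):
    """Return documents ordered by their key tuple, like a compound index on `fields`."""
    return sorted(documents, key=lambda d: tuple(d[f] for f in fields))


def matches(d):
    # Equality on a, range on b; the query also wants results sorted by c.
    return d["a"] == 1 and d["b"] < 3


def is_sorted(values):
    return all(x <= y for x, y in zip(values, values[1:]))


# Equality, Range, Sort: c comes out of the index unsorted, so it must be sorted in memory.
ers_c = [d["c"] for d in build_index(docs, ["a", "b", "c"]) if matches(d)]

# Equality, Sort, Range: c is already in order, so no in-memory sort is needed.
esr_c = [d["c"] for d in build_index(docs, ["a", "c", "b"]) if matches(d)]

print(ers_c, is_sorted(ers_c))  # [9, 3, 7] False
print(esr_c, is_sorted(esr_c))  # [3, 7, 9] True
```

The equality, range, sort trade-off is visible here too: with the (a, b, c) order only the three entries that survive the range need sorting in memory, and a highly selective range keeps that set small.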
In our case, we estimate that the range on timestamp in both queries is highly selective, meaning that it will narrow down the number of documents that have to be processed in memory. So we can place timestamp in the second position and category in the third. In the third position, category cannot be used to provide the sort, but it is still useful because the category values can be read quickly from the index before the in-memory sort. So while this index will perform better for the first query than for the second, we predict that the performance will be good enough for our application's needs. If the range on timestamp were not highly selective, we might need to reconsider our strategy and create two different indexes. This seems like the optimal order for our index fields, and we'll test it to be sure when we create the index later.

Let's recap what we learned in this video. When you're building compound indexes in MongoDB, the order of your index fields matters. Index key order determines whether an index supports your queries and how efficient the index is. To ensure that your index supports your queries, make sure that the index prefix aligns with your query patterns. And to improve the performance of your indexes, order your index fields following equality, sort, range, or ESR.
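The trade-off for the two inbox queries can be sketched in Python. The field names user_id, timestamp, and category, the ascending directions, and the helper provides_sort are assumptions for illustration. provides_sort uses a simplified rule (after the leading equality fields, the next index fields must be the sort fields, in order) and is not MongoDB's full sort-eligibility logic.

```python
# Hypothetical pymongo-style key spec for the chosen index (directions assumed
# ascending); with a live connection it could be passed to Collection.create_index.
INBOX_INDEX = [("user_id", 1), ("timestamp", 1), ("category", 1)]

# Simplified descriptions of the first two inbox queries. The range entry is
# informational only; provides_sort looks at equality and sort fields.
QUERIES = {
    "query 1": {"equality": ["user_id"], "range": ["timestamp"], "sort": ["timestamp"]},
    "query 2": {"equality": ["user_id"], "range": ["timestamp"], "sort": ["category"]},
}


def provides_sort(index_fields, equality, sort):
    """Simplified check: skip the leading equality fields, then the following
    index fields must match the sort fields in order."""
    i = 0
    while i < len(index_fields) and index_fields[i] in equality:
        i += 1
    return index_fields[i:i + len(sort)] == list(sort)


chosen = [field for field, _direction in INBOX_INDEX]  # timestamp before category
strict_esr = ["user_id", "category", "timestamp"]      # ESR for query 2

for label, index_fields in [("chosen", chosen), ("strict ESR", strict_esr)]:
    for qname, q in QUERIES.items():
        ok = provides_sort(index_fields, q["equality"], q["sort"])
        print(label, qname, "index sort" if ok else "in-memory sort")
# chosen query 1 index sort
# chosen query 2 in-memory sort
# strict ESR query 1 in-memory sort
# strict ESR query 2 index sort
```

Each order serves the sort of exactly one query, which is why the choice comes down to how selective the range on timestamp is.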