Indexing Design Fundamentals / Index Size
By now, you understand that the order of your index keys determines if the index will support a query and how efficient it is at finding relevant documents.
You also know that we can use the ESR guideline to help determine the order of fields.
But what happens if we have two or more fields performing an equality match in a query? Does it matter which comes first? We could be tempted to say that it doesn't because the order won't make the queries any faster or slower, but it will affect the size of your index.
In this video, we'll discuss how an index prefix and prefix compression can affect the size of an index. We'll also show you how partial indexes can help with resource utilization.
Then we'll apply what we learned to our bank messaging app.
Let's briefly recall where we are in the process of creating indexes for our bank messaging app. To support these three queries, we need to create two indexes.
All three queries are used to retrieve and filter messages for a user's inbox.
So far, we've designed one compound index to support the first two queries.
Now we need to design a second compound index to support the third query.
Query three displays all unread messages for one user in their inbox.
It helps us query for messages that haven't been seen by customers in order to comply with regulations on bank notifications.
This query uses an equality match on user ID and is read. We need to create a compound index on user ID and is read, but the query performs an equality match on both fields. No matter which order we choose, both options will index the same number of documents. The b plus tree for each option will have the same depth, which means that performance for both indexes will be the same.
So why does the order matter?
Well, the order of the equality match fields in the index won't affect query execution times, but it can affect the size of the index.
This is because of MongoDB's index format, which uses prefix compression.
MongoDB uses prefix compression to store index keys more efficiently by storing identical prefixes only once per page.
So the more identical prefixes that we have, the fewer we have to store. This means that if the index prefix includes a field with a low number of distinct values, otherwise known as a low cardinality field, then the index will actually be smaller.
A smaller index means more efficient resource usage, so we wanna take advantage of prefix compression whenever possible. Let's take a look at how this applies to the fields that we need to index for our third query.
User ID is an example of high cardinality because each user has a unique ID.
Is red, on the other hand, is low cardinality because it is a boolean and only has two possible values, true or false?
In this case, using is red as the first field will make our index much smaller because it is a low cardinality field. This is great, but there's actually one more thing that we can do to make this index even smaller.
In our case, we're only ever querying on is read false to find unread messages.
This means that we really only care about this sliver of values on the left side of our index.
Rather than indexing all documents in the messages collection, we can use a partial index to only index documents where is read is false. Partial indexes only index documents in a collection that need a specified filter expression.
They have lower storage requirements and can reduce the performance cost for index creation and maintenance.
By adding a partial filter expression when we create the index for this query, only documents where is read is false will be included in the index. This will drastically reduce the size of the index over time. Nice work. We've decided on indexes for all three of our queries. Now we need to create them. First, let's recap what we covered in this video.
A smaller index means more efficient resource usage, so we wanna keep our indexes as small as possible.
MongoDB uses index prefix compression to reduce the size of an index.
When our query patterns allow and we have more than one equality match, use a low cardinality field as the index prefix to further reduce the size of an index. Partial indexes only index documents in a collection that need a specified filter expression.
Use a partial index whenever possible to further reduce the size of an index.
