Indexing Design Fundamentals / Identifying Key Workloads
All queries running on a production system should be supported by an index, but indexes come at a cost.
We should only create the minimum amount of indexes needed to support our queries.
To make sure that we only create the indexes that we need, we must understand our key queries and query patterns for critical workloads. In this video, we'll identify and analyze key queries for an example app in order to determine how many indexes we need to create.
In our example, we're working on a messaging application that is used by a financial institution, Leafy Bank, for secure and seamless communication with their customers.
MongoDB stores several collections for this application.
For this demo, we'll focus on the messages collection.
This collection stores all communications exchanged between the bank and its customers.
A typical document in the messages collection looks like this. Currently, this collection doesn't have any indexes except for the default index on the underscore ID field. In this demonstration, we'll create a few of the indexes necessary to support queries on the messages collection.
The queries that we'll focus on are a part of a workload that retrieves and filters messages for a user's inbox.
Luckily, we have a framework to guide us through the process. To do this, we need to identify and analyze queries for key workloads, create the minimum number of indexes needed to support those queries, and monitor and maintain those indexes. We'll start by identifying and analyzing queries that retrieve messages for our users inbox.
At this stage, we want to identify common patterns and components shared across queries.
With this information, we can create the minimum number of indexes that we need. This step will help us avoid creating redundant indexes.
Ideally, identifying and analyzing queries will happen in a preproduction environment.
However, as your application evolves, the nature of data access and query patterns may change, requiring adjustments to queries and indexes. To identify if a MongoDB collection needs an index, start by monitoring for slow performing queries using MongoDB Query Profiler.
For more information, check out our skill badge on monitoring tools.
Let's take a look at the queries that we'll be working with in this demo. The first query is used to retrieve documents from the messages collection for a specific user identified by user ID, where the document time stamps fall within a specified date range.
The results are sorted in descending order based on the timestamp.
This ensures that the most recent entries in the given date range appear first.
This query pattern is commonly used for displaying recent activity or messages for a user within a certain time period.
The second query is used to find all messages for a user within a time range and sort those messages by category.
This feature helps users quickly find and manage important notifications related to specific categories, such as transactions, loans, or system updates. The final query filters documents by user ID and is read fields. This query is used to track which messages have been read by a user. It can also be used to serve regulatory or compliance purposes by providing an audit trail of message engagement. Now let's compare the queries to see if we can find any similarities.
By identifying similarities between our queries, we can determine the fewest number of indexes that will need to support them. Because in many cases, similar queries can be supported by the same index. All three queries filter documents based on the user ID field.
The first two queries also filter within a particular date range using the time stamp field, the first source by time stamp, and the second by category.
The third query filters using the is read field in addition to user ID. Based on these patterns, we can create only two compound indexes to support all three queries.
A compound index supports queries using two or more fields and can reference up to thirty two fields.
If you're coming from a relational database background, a compound index is similar to a composite index. Like a single field index, the leaf nodes of a compound index store field values and the record IDs that point to relevant documents in the collection.
Notice that combined values are sorted first by the first field or prefix and then by the second field.
Back to our example, we can use one compound index on user ID, time stamp, and category to support the first two queries.
In addition to this index, we'll also need to create a compound index on user ID and is read for the third query.
So we will be able to support all three queries with just two compound indexes. While compound indexes are not the only type of index provided by MongoDB, they're the most commonly used.
We'll explore creating the compound indexes for this scenario to learn how to apply best practices.
Let's recap what we learned in this video.
We introduced you to our example bank messaging app and explained why we need to create indexes on the messages collection. We took the first step in this process by identifying queries that can benefit from an index.
Finally, we analyze the queries to identify similarities.
This analysis helped us develop the minimum number of indexes that efficiently support all our query requirements.
