Indexing Design Fundamentals / Introduction to Indexes
Indexes are an essential part of query optimization in MongoDB.
But to understand why and how they fit into a strategy, we need to understand how they work. In this video, we'll discuss what an index is, why they are so efficient for finding and retrieving documents, and briefly discuss some of the types of indexes that are available. Let's start by defining an index. Indexes are special data structures that store a small portion of the collection's data in an ordered format.
This format is optimized for fast search.
But why should you use indexes?
Consider an orders collection without any indexes. If we want to find a document that contains a specific order ID, MongoDB has to search every document in the collection to find it. This is known as a collection scan. Why is this a problem?
Well, searching every document in a collection is time consuming.
Performing a collection scan will slow down read operations and negatively impact performance.
If we add an index on the order ID field, we can use the index to efficiently retrieve the documents we need instead of scanning the entire collection.
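To make the difference concrete, here's a minimal sketch in plain Python. It is a toy model, not how MongoDB is implemented: the `orders` data, the record IDs, and the helper names are all made up for illustration. The scan has to look at documents one by one, while the sorted index can be searched with a binary search.

```python
import bisect

# Hypothetical orders collection: record ID -> document (illustrative data only).
orders = {rid: {"order_id": oid, "item": f"item-{oid}"}
          for rid, oid in enumerate([5, 3, 8, 1, 4, 7, 2, 6])}

def collection_scan(order_id):
    """Check documents one by one; return (document, documents examined)."""
    examined = 0
    for doc in orders.values():
        examined += 1
        if doc["order_id"] == order_id:
            return doc, examined
    return None, examined

# Toy "index": (order_id, record_id) pairs kept in sorted order.
index = sorted((doc["order_id"], rid) for rid, doc in orders.items())
keys = [key for key, _ in index]

def index_lookup(order_id):
    """Binary-search the sorted keys, then follow the record ID to the document."""
    pos = bisect.bisect_left(keys, order_id)
    if pos < len(keys) and keys[pos] == order_id:
        return orders[index[pos][1]]
    return None

scan_doc, examined = collection_scan(6)
indexed_doc = index_lookup(6)
print(examined, scan_doc == indexed_doc)
```

In this toy data, order ID 6 happens to be the last document, so the scan examines all eight documents before it finds the one the index locates directly.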
Let's take a look at how indexes do this. When a B+ tree is used to build an index, it consists of internal nodes, whose keys help the database quickly traverse the index, and leaf nodes, which are the endpoints of the structure.
When an index is selected for a query, the database traverses the index starting at the root of the tree, then follows the pointers in the internal nodes down to the leaf nodes. The leaf nodes of the B+ tree contain keys that represent the index field value and record IDs for the actual document or documents that contain that value.
You can think of this record ID as a shortcut to where the document is stored. Once you find the leaf node, you can access the document.
In this example, the leaf node will include the value '3' and a record ID that points to the corresponding document.
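The traversal just described can be sketched with a tiny hand-built tree. This is a simplified illustration under assumed names (`Internal`, `Leaf`, `search`, and the `rid-…` strings are hypothetical), not MongoDB's storage engine: internal nodes hold only separator keys, and leaf nodes hold the indexed values together with their record IDs.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Leaf:
    keys: list        # indexed field values, in sorted order
    record_ids: list  # record_ids[i] is the record ID for keys[i]

@dataclass
class Internal:
    keys: list        # separator keys used only for navigation
    children: list    # len(children) == len(keys) + 1

def search(node, key):
    """Walk from the root down to a leaf, then return the matching record ID."""
    while isinstance(node, Internal):
        # Keys >= a separator live in the child to the right of that separator.
        node = node.children[bisect.bisect_right(node.keys, key)]
    for leaf_key, record_id in zip(node.keys, node.record_ids):
        if leaf_key == key:
            return record_id
    return None

# A two-level tree over order IDs 1 through 6.
root = Internal(keys=[3, 5], children=[
    Leaf([1, 2], ["rid-a", "rid-b"]),
    Leaf([3, 4], ["rid-c", "rid-d"]),
    Leaf([5, 6], ["rid-e", "rid-f"]),
])

print(search(root, 3))
```

Searching for 3 visits the root and one leaf, and comes back with the record ID stored next to the value 3, which is the shortcut to the document.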
So instead of reading all of the documents in the collection to find a document with order ID 3, the index points the query towards the one document it is searching for. This is an example of a simple equality match, but indexes can help optimize range and sort operations too. Say we're querying for order IDs that are less than 4. The index allows us to quickly collect record IDs from leaf nodes starting from 1 until we reach the value of 4.
Additionally, the record IDs will be sorted, in this case, in ascending order.
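Here's a small sketch of that range query, again as a toy model in Python with made-up record IDs. Because the index entries are kept in key order, finding everything below 4 means locating the first key that is not less than 4 and taking the entries before it.

```python
import bisect

# Toy index on order_id: (order_id, record_id) pairs in ascending key order.
index = [(1, "rid-d"), (2, "rid-g"), (3, "rid-b"), (4, "rid-e"), (5, "rid-a")]
keys = [key for key, _ in index]

def record_ids_less_than(bound):
    """Collect record IDs for all keys strictly below `bound`, in key order."""
    stop = bisect.bisect_left(keys, bound)  # position of the first key >= bound
    return [record_id for _, record_id in index[:stop]]

print(record_ids_less_than(4))
```

The record IDs come back in ascending order of order ID without any extra sorting step, because that is the order in which the index stores them.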
Or say we have a query that sorts all order IDs in ascending order.
Without an index, this sort would happen in memory.
With an index on the order ID field, data is stored in order. So MongoDB can retrieve the documents by scanning the index in either forward or reverse order, supporting both ascending and descending sort requests. Speaking of memory, in addition to being much faster, indexes can also help with resource usage.
Memory is a precious resource, so we want to use it as efficiently as possible.
When MongoDB performs a collection scan, every document from the orders collection is pulled into memory.
As the collection grows, this negatively impacts system performance.
With an index, we only need to pull the relevant section of the index and the document that matches our query into memory. That's a much more efficient use of our resources.
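To tie the sort and memory points together, here's a rough sketch in Python. The data and function names are hypothetical, and it only models the idea: sorting without an index loads every document before it can return anything, while an ordered index can be walked forward or backward and only the documents actually returned need to be fetched.

```python
# Hypothetical orders collection: record ID -> document.
orders = {"rid-a": {"order_id": 3}, "rid-b": {"order_id": 1}, "rid-c": {"order_id": 2}}

def in_memory_sort(descending=False):
    """Load every document, then sort; returns (order IDs, documents loaded)."""
    loaded = list(orders.values())
    loaded.sort(key=lambda doc: doc["order_id"], reverse=descending)
    return [doc["order_id"] for doc in loaded], len(loaded)

# Toy index on order_id, stored in ascending key order.
index = sorted((doc["order_id"], rid) for rid, doc in orders.items())

def index_sort(descending=False, limit=None):
    """Walk the index forward or in reverse, fetching only the documents needed."""
    entries = reversed(index) if descending else iter(index)
    result = []
    for _, rid in entries:
        if limit is not None and len(result) == limit:
            break
        result.append(orders[rid]["order_id"])
    return result

print(in_memory_sort())
print(index_sort(descending=True, limit=2))
```

Note that the same ascending index serves the descending request just by being read in reverse.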
The example that we've been following so far is of a single field index on the order ID field. But MongoDB offers other index types and properties to support different types of data and queries.
The most common indexes are single field indexes, which index on values for one field, and compound indexes, which index on two or more fields.
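As a rough illustration of a compound index, the sketch below keeps entries sorted by two fields, in order. The `customer_id` field and the data are assumptions made up for this example, and the code is a toy model rather than MongoDB's implementation. Entries are ordered by customer first and by order ID within each customer, so all of one customer's orders sit next to each other.

```python
import bisect

# Hypothetical documents; the list position stands in for the record ID.
orders = [
    {"customer_id": "c2", "order_id": 3},
    {"customer_id": "c1", "order_id": 5},
    {"customer_id": "c2", "order_id": 1},
    {"customer_id": "c1", "order_id": 2},
]

# Toy compound index on (customer_id, order_id), with the record ID attached.
compound = sorted((doc["customer_id"], doc["order_id"], rid)
                  for rid, doc in enumerate(orders))

def order_ids_for_customer(customer_id):
    """Return one customer's order IDs, already sorted by order_id."""
    lo = bisect.bisect_left(compound, (customer_id,))
    hi = bisect.bisect_left(compound, (customer_id, float("inf")))
    return [order_id for _, order_id, _ in compound[lo:hi]]

print(order_ids_for_customer("c1"))
```

A single-field index is the same idea with only one value in each key.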
Other types of indexes include multikey indexes, geospatial indexes, text indexes, hashed indexes, and wildcard indexes.
MongoDB also provides other index types that can be created by adjusting the options for an index. For example, a partial index only includes documents from a collection that meet specific filter criteria.
And we can set those criteria by adding a partial filter expression when we create the index.
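The idea behind a partial index can be sketched in a few lines of Python. This is a toy model with a hypothetical `status` field and a plain Python predicate standing in for the partial filter expression. It is not MongoDB syntax.

```python
# Hypothetical orders collection: record ID -> document.
orders = {
    "rid-a": {"order_id": 1, "status": "shipped"},
    "rid-b": {"order_id": 2, "status": "open"},
    "rid-c": {"order_id": 3, "status": "open"},
    "rid-d": {"order_id": 4, "status": "shipped"},
}

def build_index(collection, field, predicate=lambda doc: True):
    """Index `field`, but only for documents that satisfy `predicate`."""
    return sorted((doc[field], rid)
                  for rid, doc in collection.items() if predicate(doc))

full_index = build_index(orders, "order_id")
partial_index = build_index(orders, "order_id",
                            predicate=lambda doc: doc["status"] == "open")

print(len(full_index), partial_index)
```

The partial index holds entries only for the open orders, so it is smaller than the full index and there is less to maintain.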
Check out the MongoDB documentation for more information on each type of index.
But how do we create an index?
The Atlas Data Explorer, Atlas CLI, MongoDB Shell, and the MongoDB drivers all use the create index command when building an index. But before you start creating indexes for all of your queries, it's important to be aware of the associated cost.
First, while indexes improve the performance of read operations, they come at a cost to write operations.
Whenever a document is inserted, updated, or deleted, the database must atomically update corresponding index entries, adding time to write operations.
Indexes also consume additional disk space and memory, which can be considerable depending on the number and sizes of indexes.
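To see where the write cost comes from, here's a minimal sketch, assuming a toy `Collection` class invented for this example. It only counts index updates and doesn't model atomicity, disk, or memory. Every insert has to add an entry to each index on the collection, so the extra work grows with the number of indexes.

```python
import bisect

class Collection:
    """Toy collection that maintains one sorted index per indexed field."""

    def __init__(self, indexed_fields):
        self.docs = {}
        self.indexes = {name: [] for name in indexed_fields}
        self.index_writes = 0  # how many index entries have been written

    def insert(self, record_id, doc):
        self.docs[record_id] = doc
        # Each index needs its own entry, kept in sorted position.
        for name, entries in self.indexes.items():
            bisect.insort(entries, (doc[name], record_id))
            self.index_writes += 1

no_index = Collection(indexed_fields=[])
two_indexes = Collection(indexed_fields=["order_id", "customer_id"])

for rid, (oid, cid) in enumerate([(3, "c2"), (1, "c1"), (2, "c2")]):
    doc = {"order_id": oid, "customer_id": cid}
    no_index.insert(rid, dict(doc))
    two_indexes.insert(rid, dict(doc))

print(no_index.index_writes, two_indexes.index_writes)
```

Three inserts into the collection with two indexes cause six index updates, while the collection without indexes does none.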
To mitigate these costs, we can use the following framework to manage the lifecycle of our indexes: identify queries for key workloads that can benefit from an index, create the minimum number of indexes that support those workloads, and monitor and maintain indexes.
We'll show you how to apply this framework along with best practices by walking you through a real-world scenario involving a messaging application used by a bank to send secure messages to customers.
For now, let's quickly recap what we covered in this lesson.
Indexes are special data structures that store a small portion of a collection's data in an ordered format that is optimized for fast search.
Indexes are crucial for optimizing query performance in MongoDB because they allow the database to quickly locate and retrieve documents, eliminating the need to scan entire collections.
While indexes enhance read performance significantly, they also introduce overhead for write operations and can consume additional resources.
To manage indexes effectively, identify key workloads that can benefit from indexing, create only necessary indexes to support these workloads, and regularly monitor and adjust the indexing strategy to align with your needs.
