Schema Design Patterns and Antipatterns / Apply Schema Design Patterns
Data modeling with MongoDB often involves multiple documents within different collections. For instance, our bookstore app has collections for users, reviews, and books. But what if you need to get data from documents in all of those different collections? If you come from a relational database background, you're probably thinking about performing a join operation.
The equivalent in MongoDB is the $lookup aggregation operator. But using this can be resource intensive, much like a join. We can avoid joins altogether by applying the extended reference pattern to our data. In this video, we'll learn about the extended reference schema design pattern and how we can apply it to the bookstore app.
An extended reference is a reference that is rich enough to include everything a query needs, so we can avoid a join. You create an extended reference by embedding relevant data from documents in other collections into the main document. Its primary objectives are to reduce the latency of read operations, avoid round trips to the database, and avoid touching too many pieces of data. The result will be faster reads due to a reduced number of joins and lookups.
Let's see an example to get a better idea of how it works. In our app, we've decided to introduce a book review search feature. Users can search for reviews by book title, review title, and username. Books are stored in a separate collection from reviews.
We could query based on the product SKU of a book in the reviews collection, but we'd still be missing some key information like book title and product type. We could also use the $lookup aggregation operator to obtain this information. But if you recall, we anticipate that this operation will be run 200 times a second. Running that many lookups or joins could hurt the performance of the application.
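To make the cost concrete, here is a rough sketch of what the read-time $lookup approach might look like as a PyMongo-style pipeline. The field names (product.sku on reviews; sku, title and product_type on books) are assumptions for illustration, since the transcript doesn't show the exact schema.

```python
def reviews_with_book_info(sku):
    """Build a read-time pipeline that joins reviews to books on SKU.

    All field names here are assumed, not taken from the bookstore app.
    """
    return [
        # Find the reviews for one book by its SKU.
        {"$match": {"product.sku": sku}},
        # Join each review to its book document. This is the step that
        # would run on every search request.
        {"$lookup": {
            "from": "books",
            "localField": "product.sku",
            "foreignField": "sku",
            "as": "book",
        }},
        # $lookup returns an array; flatten it to a single subdocument.
        {"$unwind": "$book"},
        # Keep the review title plus the two book fields we were missing.
        {"$project": {
            "title": 1,
            "book_title": "$book.title",
            "product_type": "$book.product_type",
        }},
    ]


lookup_pipeline = reviews_with_book_info("SKU-001")
```

With a live connection, this list could be passed to something like db.reviews.aggregate(lookup_pipeline). At 200 requests a second, that means 200 joins a second, which is the work the extended reference pattern tries to remove.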
With that in mind, we can optimize for this requirement by embedding the information we need from the book document into the review document. In this case, we'll embed the book's title and product type into the review document, grouping them together with the SKU within our product subdocument. We'll also need to know which user wrote the review for the book. Again, we can use the extended reference pattern to help us optimize this.
Looking at the current structure of the user and review documents, we can include both the user ID and the user's name in the review document within the reviewer subdocument. Now when we query for reviews, all the information that we need is in one place. This is great, but it does create a potential issue: duplication. When deciding to use the extended reference pattern, think about how you can minimize duplication.
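As a minimal sketch of the resulting shape, the snippet below builds a review document with both extended references from plain Python dicts. The sample data and the field names (sku, product_type, user_id, and so on) are made up for illustration and are not the bookstore app's actual schema.

```python
def build_review(review, book, user):
    """Return a copy of `review` with extended references embedded.

    Only the fields the search feature needs are copied across;
    everything else stays in the source book and user documents.
    """
    return {
        **review,
        "product": {
            "sku": book["sku"],
            "title": book["title"],
            "type": book["product_type"],
        },
        "reviewer": {
            "user_id": user["_id"],
            "name": user["name"],
        },
    }


# Hypothetical source documents.
book = {"sku": "SKU-001", "title": "Dune", "product_type": "paperback", "price": 9.99}
user = {"_id": 42, "name": "Ada", "email": "ada@example.com"}
review = {"_id": 1, "title": "A classic", "rating": 5}

review_doc = build_review(review, book, user)
```

A search by book title, review title, or username can now be answered from the reviews collection alone. Note that price and email are deliberately left out, since the feature doesn't need them.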
The pattern works best if you select fields that don't change often. Also, try to only bring the fields you need. For example, the title of a book will not change much over time. We also need to consider how to keep duplicate data in sync with the source.
To manage duplication when a source field is updated, first identify the list of dependent extended references. In other words, what other fields need to be updated once the source gets updated? Next, do the extended references need to be updated immediately, or can they be updated at a later time? The simple answer for most cases is no, they don't need to be updated immediately.
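One way to picture these two steps is a small registry that maps each source field to the extended references that copy it, plus a helper that turns a source change into the update operations needed to bring the copies back in sync. This is only a sketch: the registry, the helper, and the field names are all hypothetical.

```python
# Hypothetical registry of dependent extended references:
# (source collection, source field) -> list of
# (dependent collection, field to match on, copied field to refresh)
DEPENDENTS = {
    ("books", "title"): [("reviews", "product.sku", "product.title")],
    ("books", "product_type"): [("reviews", "product.sku", "product.type")],
    ("users", "name"): [("reviews", "reviewer.user_id", "reviewer.name")],
}


def sync_operations(source_collection, field, key, new_value):
    """List the updates needed after a source field changes.

    Each entry is (collection name, filter document, update document).
    Nothing is sent to the database here, so the caller decides whether
    to apply the updates right away or defer them to a later batch job.
    """
    ops = []
    for collection, match_field, copied_field in DEPENDENTS.get((source_collection, field), []):
        ops.append((collection, {match_field: key}, {"$set": {copied_field: new_value}}))
    return ops


ops = sync_operations("books", "title", "SKU-001", "Dune (Deluxe Edition)")
```

With PyMongo, each entry could then be applied with db[collection].update_many(filter_doc, update_doc). Because the helper only describes the work, the same list can just as easily be queued and run later, which fits the common case where the copies don't need to change immediately.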
For example, changing the ranking of your best selling books doesn't require that you instantly update all the books that show a ranking in the app. What if we want to implement the extended reference pattern on an existing data set? We can use the aggregation framework. Assume we have the following review documents we want to transform.
First, we will update all documents in the reviews collection by bringing in relevant product information from the books collection using the $lookup stage. Once our new document with the extended reference is formed, we use the $merge stage to merge the output into the existing reviews collection. After running the pipeline and then running a find on the reviews collection, you can see that the review documents now contain product information as well. This aggregation pipeline gives you an idea of how you can apply the extended reference pattern to an existing data set.
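The pipeline shown in the video isn't reproduced in this transcript, so the following is a sketch of what such a pipeline could look like in PyMongo-style Python, not the exact one from the lesson. It assumes reviews already store product.sku and that books expose sku, title and product_type; those names are assumptions.

```python
def build_extended_reference_pipeline():
    """Build a one-off migration pipeline for the reviews collection.

    Field names are assumed; adjust them to the real schema.
    """
    return [
        # Bring in the matching book for each review.
        {"$lookup": {
            "from": "books",
            "localField": "product.sku",
            "foreignField": "sku",
            "as": "book",
        }},
        # Flatten the joined array. Reviews with no matching book are
        # dropped from the output and therefore left unchanged.
        {"$unwind": "$book"},
        # Copy only the fields we need into the product subdocument.
        {"$set": {
            "product.title": "$book.title",
            "product.type": "$book.product_type",
        }},
        # Remove the temporary joined document before writing back.
        {"$unset": "book"},
        # Write the result back into reviews, merging into each
        # existing document matched on _id.
        {"$merge": {
            "into": "reviews",
            "on": "_id",
            "whenMatched": "merge",
            "whenNotMatched": "discard",
        }},
    ]


pipeline = build_extended_reference_pipeline()
```

With a live connection, this could be run as db.reviews.aggregate(pipeline), followed by a find on reviews to check the new product fields. Writing a $merge back into the same collection that is being aggregated requires MongoDB 4.4 or later.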
We can use a similar approach to merge needed fields from the user collection. Awesome job. In this video, we learned that the extended reference pattern helps you avoid joining too many pieces of data in a query. To create an extended reference, embed the data from other documents into the main document.
While this does result in duplication, it has great benefits for performance. Next, we applied the extended reference pattern to the reviews in the bookstore application, and discussed how to apply the pattern to an existing data set. Finally, we covered how to minimize duplication when applying the extended reference pattern.
