Advanced Schema Patterns and Antipatterns / Identify Advanced Antipatterns
When modeling your data, it may seem intuitive to normalize it or split it into different pieces to optimize for space and reduce duplication. But separating data that is accessed together means that you have to use multiple queries on different collections or a MongoDB $lookup operation to retrieve that data. This can be expensive and negatively impact performance. When this happens frequently, we could be experiencing the data normalization anti-pattern, which is when our data model separates data that is accessed together into different collections.
In this video, we'll discuss the data normalization anti-pattern and how to solve it. Luckily, we can fix this anti-pattern by leveraging either the subset pattern or the extended reference pattern to keep the data that we need to access together in a single collection. Ultimately, the best solution is the one that fits our data and use case. Let's go back to our bookstore application.
When we originally modeled the data, we normalized it. Since books and reviews are separate entities in our model, we created separate collections for each. However, book and review data are often accessed together, so we use multiple queries or $lookup operations to access that data. While we may be tempted to fix this by embedding review data as an array within a book document, we should avoid this.
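To make the cost of the normalized model concrete, here is a minimal sketch in plain Python that uses in-memory lists as stand-ins for the two collections. The field names (`book_id`, `rating`, `text`) and the sample documents are assumptions for illustration and do not come from the lesson. The pipeline at the end shows roughly what the equivalent `$lookup` stage could look like under the same assumed field names.

```python
# Hypothetical in-memory stand-ins for the books and reviews collections.
books = [
    {"_id": 1, "title": "Example Book", "author": "A. Writer"},
]
reviews = [
    {"_id": 101, "book_id": 1, "rating": 5, "text": "Loved it."},
    {"_id": 102, "book_id": 1, "rating": 3, "text": "It was fine."},
    {"_id": 103, "book_id": 2, "rating": 4, "text": "A different book."},
]


def load_book_page(book_id):
    """Fetch a book and its reviews the normalized way: two separate lookups."""
    # Query 1: find the book document.
    book = next((b for b in books if b["_id"] == book_id), None)
    if book is None:
        return None
    # Query 2: find the matching review documents in the other collection.
    book_reviews = [r for r in reviews if r["book_id"] == book_id]
    return {"book": book, "reviews": book_reviews}


# A roughly equivalent aggregation pipeline, written as plain Python data.
# The collection and field names are the assumed ones used above.
lookup_pipeline = [
    {"$match": {"_id": 1}},
    {
        "$lookup": {
            "from": "reviews",
            "localField": "_id",
            "foreignField": "book_id",
            "as": "reviews",
        }
    },
]
```

Either way, every page load touches two collections, which is the overhead the patterns in this lesson try to avoid.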
A growing list of reviews will result in an unbounded array and a bloated document. So let's examine two possible solutions. Our first option is the subset pattern. This pattern is useful for documents with arrays that could become very long, like lists of reviews or comments.
We can apply this pattern to improve database performance for a book's home page, which includes all book details and a subset of reviews. To implement this pattern, we duplicate a subset of the review documents and store them in the corresponding book document in the books collection. For example, suppose we need to retrieve three book reviews along with the book data. We store those three reviews in an array in the book document, and every review also remains a separate document in the reviews collection.
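The subset pattern described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the lesson's implementation: the `recent_reviews` field name, the choice to keep the three newest reviews, and the `add_review` helper are all hypothetical. The lesson only says that a subset of three reviews is stored in the book document.

```python
MAX_EMBEDDED_REVIEWS = 3  # the book page in this example shows three reviews


def add_review(books, reviews, review):
    """Store a review in full, and keep a bounded copy list on the book.

    `books` is a dict keyed by book _id and `reviews` is a list; both are
    in-memory stand-ins for the two collections (an assumption).
    """
    # The complete review always goes to the reviews collection.
    reviews.append(review)

    # A duplicate goes into the book document's embedded array.
    book = books[review["book_id"]]
    embedded = book.setdefault("recent_reviews", [])
    embedded.append(dict(review))

    # Trim the array so it never grows past the limit (keeps the newest).
    del embedded[:-MAX_EMBEDDED_REVIEWS]


def build_push_update(review):
    """Update document that could do the same trimming on the server."""
    return {
        "$push": {
            "recent_reviews": {
                "$each": [review],
                "$slice": -MAX_EMBEDDED_REVIEWS,
            }
        }
    }
```

With this shape, reading the book document alone is enough to render the book page, while the embedded array stays bounded no matter how many reviews the book receives.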
We still keep book and review documents in separate collections. We duplicate some of our data, but we don't need to use multiple queries to access this data. But what if we are modeling our data for a reviews page that displays all reviews for a book? In this scenario, we need a few fields from a book document to display with review data.
We can use the extended reference pattern to embed the book data that we need in each review document. For example, we can store the book's title and author fields in a review document instead of embedding the entire book document. This level of duplication is OK because we are including just two fields from a book document. And the book's title and author won't change, so we won't need to update this information in the future.
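As a rough sketch of the extended reference pattern, the Python below builds a review document that carries a copy of the book's title and author. The `book` sub-document layout, the `rating` and `text` fields, and both helper functions are assumptions made for illustration. The lesson only specifies that the title and author fields are copied into each review.

```python
def build_review(book, review_id, rating, text):
    """Create a review that embeds only the book fields the page needs."""
    return {
        "_id": review_id,
        # Extended reference: the book's _id plus the two copied fields.
        "book": {
            "_id": book["_id"],
            "title": book["title"],
            "author": book["author"],
        },
        "rating": rating,
        "text": text,
    }


def reviews_page(reviews, book_id):
    """Return everything the reviews page needs from one collection."""
    return [r for r in reviews if r["book"]["_id"] == book_id]
```

Because each review already holds the title and author, the reviews page can be rendered from the reviews collection alone, and the full book document is still reachable through the stored `_id` when needed.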
Like the subset pattern solution, we keep book and review documents in separate collections. We duplicate some of our data, but we can now access the data we need with just one query. This solution helps us resolve the normalization anti-pattern. And as you learned, the best solution depends on the needs of your application.
Let's recap what you've learned. The data normalization anti-pattern occurs when data that is accessed together is separated into different collections. This results in costly $lookup operations or multiple queries to access this data. We covered two solutions to this anti-pattern: the subset pattern and the extended reference pattern.
Great job. See you in the next lesson.
