Schema Design Patterns and Antipatterns / Identify Antipatterns
You may recall that we modeled book reviews using an array in the book document. This one-to-many relationship works well for our bookstore application. But what happens when the number of reviews skyrockets? In this video, we'll discuss the unbounded array antipattern and what can be done to solve it.
When modeling data with MongoDB, we try to keep data that is accessed together stored together. We do this through embedding. And one way to embed is to use arrays. But sometimes when we use arrays, we end up creating unbounded arrays, which is a common antipattern.
MongoDB defines an unbounded array as a large, growing array with an unlimited number of elements. Unbounded arrays can strain application resources and put documents at risk of exceeding the BSON document size limit of 16 megabytes. As our array size increases, we can also experience a decrease in index performance.
A few things to keep in mind to avoid the unbounded array antipattern are: only store data together if it's queried together, an array should not grow without bounds, and high-cardinality arrays should not be embedded. However, as your application evolves, you still may end up with unbounded arrays.
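To get a feel for how an embedded array pushes a document toward the 16 megabyte limit, here is a minimal sketch in plain Python. The book document, its field names, and the 200-character review text are all hypothetical, and the size is estimated from the JSON encoding, which is only a rough stand-in for the real BSON size.

```python
import json

BSON_MAX_BYTES = 16 * 1024 * 1024  # the 16 megabyte BSON document size limit


def approx_size_bytes(document: dict) -> int:
    """Rough size estimate: length of the UTF-8 JSON encoding (not the true BSON size)."""
    return len(json.dumps(document).encode("utf-8"))


def make_book(review_count: int) -> dict:
    """Build a hypothetical book document with `review_count` embedded reviews."""
    return {
        "_id": 1,
        "title": "Example Book",
        "reviews": [
            {"reviewer": f"user{i}", "rating": 5, "text": "x" * 200}
            for i in range(review_count)
        ],
    }


small = approx_size_bytes(make_book(10))
large = approx_size_bytes(make_book(10_000))

# Average bytes added per embedded review in this made-up document.
per_review = (large - small) / (10_000 - 10)

# Estimated review count at which this document would reach the limit.
reviews_at_limit = int(BSON_MAX_BYTES / per_review)
```

The exact numbers depend entirely on how large each review is, but the shape of the problem is the same: the document grows linearly with every review, and nothing in the schema stops it.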
Here are a few ways to correct this antipattern. Let's use our bookstore app to examine the one-to-many relationship between books and reviews. When we first modeled our data for the bookstore app, we stored reviews as an array field in a book document to increase query efficiency. While this model might work at first, we can end up with a very large array as we begin storing more and more reviews, especially for our most popular books.
What can we do to fix this unbounded array? There are two schema design patterns that can help us avoid unbounded arrays while still keeping data that is accessed together stored together: the extended reference pattern and the subset pattern.
You may recall that the extended reference pattern allows us to embed relevant data from multiple documents in different collections into the main document. With the subset pattern, we can reduce document size by relocating data that isn't frequently accessed. Both patterns allow us to exercise control over the size of the array.
But remember, the pattern that you choose depends on your use case. In our bookstore app, we have decided to show only three reviews on a book's home page. We could use the extended reference pattern to eliminate the array. But this is a poor option for our use case.
To understand why, let's take a look at how we would implement this. First, we move all the review documents to their own collection, with an added field to embed book information. Next, we would eliminate the unbounded array field from the book document. This solves the problem of the unbounded array.
But now we've added significant duplication to the database. Most of the book data needs to be extended or duplicated in each review document since reviews are now the main entity. We've also introduced query complexity because now we must query the reviews collection to retrieve book data and reviews for a book's home page. Clearly, this pattern is not a good fit for our use case.
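The two steps above can be sketched with plain Python dictionaries and lists standing in for documents and collections. This is an illustration under assumptions, not the course's implementation: the field names (`book`, `book_id`, `reviewer`, and so on) and the helper functions are hypothetical.

```python
def apply_extended_reference(book: dict) -> tuple:
    """Split one book document into a book without reviews plus review documents.

    Every review document carries an embedded copy of the book's fields,
    which is where the duplication comes from.
    """
    book_copy = {key: value for key, value in book.items() if key not in ("_id", "reviews")}
    review_docs = [
        {**review, "book": {"book_id": book["_id"], **book_copy}}
        for review in book["reviews"]
    ]
    book_without_reviews = {key: value for key, value in book.items() if key != "reviews"}
    return book_without_reviews, review_docs


def home_page(reviews_collection: list, book_id: int, limit: int = 3) -> dict:
    """Build the home page from the reviews collection alone."""
    matching = [doc for doc in reviews_collection if doc["book"]["book_id"] == book_id]
    shown = matching[:limit]
    return {
        # Book data is read from the copy embedded in a review document.
        "book": shown[0]["book"] if shown else None,
        "reviews": [{k: v for k, v in doc.items() if k != "book"} for doc in shown],
    }


book = {
    "_id": 1,
    "title": "Example Book",
    "author": "A. Writer",
    "reviews": [{"reviewer": f"user{i}", "rating": 4, "text": "Good read"} for i in range(5)],
}
book_doc, reviews_collection = apply_extended_reference(book)
page = home_page(reviews_collection, book_id=1)
```

Notice that the title and author now appear once per review, and that the home page has to go through the reviews collection to get book data at all. Both costs grow with the number of reviews.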
How about the subset pattern? Again, we first separate book and review data into two collections. We then store a handful of reviews that are frequently accessed with the book data. For example, since we know that we only need to access three reviews to display on a book page, we can store those reviews in a book document instead of embedding all reviews for that book.
Then we can store all reviews in our reviews collection. While the subset pattern will create some data duplication, it will help us eliminate the unbounded array, avoid extra queries or $lookup operations, and keep frequently accessed information together. This is the best solution for our use case. But we need to remember that this may not be the right solution in every scenario.
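Here is a comparable sketch of the subset pattern, again using plain Python dictionaries and lists in place of real collections. The field names and helpers are hypothetical, and the choice of which three reviews to keep in the book document (the most recent ones, with the array ordered newest first) is an assumption made for this example.

```python
def apply_subset(book: dict, subset_size: int = 3) -> tuple:
    """Keep only the first `subset_size` reviews in the book; copy all reviews out."""
    all_reviews = [{**review, "book_id": book["_id"]} for review in book["reviews"]]
    book_with_subset = {**book, "reviews": book["reviews"][:subset_size]}
    return book_with_subset, all_reviews


def add_review(book_doc: dict, reviews_collection: list, review: dict,
               subset_size: int = 3) -> None:
    """Store a new review in full, and keep the embedded array bounded."""
    reviews_collection.append({**review, "book_id": book_doc["_id"]})
    # Newest review goes first; anything past `subset_size` is dropped
    # from the book document but remains in the reviews collection.
    book_doc["reviews"] = [review, *book_doc["reviews"]][:subset_size]


book = {
    "_id": 1,
    "title": "Example Book",
    "reviews": [{"reviewer": f"user{i}", "rating": 4, "text": "Good read"} for i in range(5)],
}
book_doc, reviews_collection = apply_subset(book)

new_review = {"reviewer": "user5", "rating": 5, "text": "Loved it"}
add_review(book_doc, reviews_collection, new_review)
```

The book's home page can now be rendered from `book_doc` alone, and the embedded array never holds more than three reviews no matter how many exist. In MongoDB itself, the `$push` update operator with its `$each`, `$position`, and `$slice` modifiers is one way to keep an embedded array capped like this.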
That's why it's important to understand the needs of your application and business before choosing a solution. Let's recap what you've learned in this video. The unbounded array antipattern occurs when a document contains a large, growing array with an unlimited number of elements. Here are a few things to keep in mind to avoid this antipattern.
Only store data together that is queried together. An array should not grow without bounds. And high cardinality arrays should not be embedded. Finally, we've identified the subset pattern and the extended reference pattern as solutions for avoiding unbounded arrays.
Nice job. See you in the next lesson.
