Schema Design Optimization / Optimize Your Schema
All of our schema design assumptions will be tested in the real world. After our app has been public for a while, we may find usage patterns that are not well covered by our initial design. For example, in our bookstore application, a user can add comments to any review. The team that originally designed the schema for the review documents estimated that the typical review would receive only a handful of comments.
As a result, they decided to embed comments within an array field in the review document. This model might have worked for most reviews, but what happens when someone leaves a controversial book review that receives hundreds or thousands of comments? Clearly, the original decision to embed comments in the review document won't work in this outlier case and could result in an anti-pattern. This is because the comments field of a review is an unbounded array that keeps growing with every new comment and may eventually exceed MongoDB's 16 MB document size limit.
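To make the growth problem concrete, here is a minimal sketch of a review document with embedded comments. The field names and sample values are assumptions made up for illustration, and the JSON length is only a rough stand-in for the real BSON size that MongoDB checks against its 16 MiB limit.

```python
import json

# MongoDB's maximum BSON document size (16 MiB).
MAX_DOC_BYTES = 16 * 1024 * 1024


def make_review(n_comments):
    """Build a hypothetical review document with n embedded comments."""
    return {
        "_id": 1,
        "book_id": 42,
        "text": "Loved the plot twists.",
        "comments": [
            {"user": f"user{i}", "text": "I disagree, and here is why..."}
            for i in range(n_comments)
        ],
    }


def approx_size(doc):
    """Approximate the document size via its JSON encoding (not exact BSON)."""
    return len(json.dumps(doc).encode("utf-8"))


typical = approx_size(make_review(5))      # a handful of comments
viral = approx_size(make_review(5000))     # a controversial review

print(f"typical review: ~{typical} bytes")
print(f"viral review:   ~{viral} bytes of a {MAX_DOC_BYTES}-byte budget")
```

The point is not the exact numbers but the shape: the document grows with every comment, and nothing in the model stops it.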
Luckily, the Outlier Pattern can help us solve this issue because it allows us to treat a few documents with unusual characteristics differently. The Outlier Pattern is useful when some documents in our database are sufficiently different that they require special handling, but optimizing our entire app for these edge cases would degrade the overall performance. This pattern is often seen when handling popular items on an e-commerce website or an influencer with millions of followers on a social media network. Let's take a closer look at how this pattern can help us with the reviews collection.
Instead of completely changing the model for all review documents to accommodate this edge case, we can apply the Outlier Pattern to identify the reviews that have hundreds or thousands of comments. Once this pattern is applied, we can implement a solution that handles review outliers differently from the rest of our data. You may recall we used the Bucket Pattern in a previous lesson to handle user reviews for book documents. We can also apply the Bucket Pattern to our outlier documents.
We won't model the application logic or how to implement the Bucket Pattern in this video, but we'll mention how to use it after modeling the Outlier Pattern. Let's use the Outlier Pattern to update review documents with an Outlier field. First, we need to define a threshold to identify outliers. In our case, let's use three comments.
Then, we add a new field called Outlier to the review document, which will be set to true when a review exceeds the three-comment threshold. Once a review document is flagged as an outlier, our app would store any additional comments for that review in a different collection. We can use updateMany to implement the Outlier Pattern for our existing reviews collection. First, we filter review documents to find those with more than three comments.
Then, we use the $set operator to add a new field to the review document called Outlier and set its value to true. Review documents that contain more than three comments are now marked as outliers. Keep in mind that what we have done so far will not change the current size of the comments array in reviews marked as outliers. The next time a review document receives a new comment, we could have the app move the excess comments.
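The flagging step described above can be sketched in Python, assuming a PyMongo-style collection object is passed in. This is one possible way to write it, not necessarily the query used in the lesson: the "comments.3 exists" filter is a common idiom for "more than three elements", the lowercase field spelling outlier is an assumption, and split_excess_comments is a hypothetical helper showing how an app might separate the excess comments before writing them elsewhere.

```python
OUTLIER_THRESHOLD = 3  # threshold chosen in this lesson


def outlier_filter(threshold=OUTLIER_THRESHOLD):
    # "comments.<n>" refers to the array element at zero-based index n, so
    # requiring index `threshold` to exist matches arrays that hold more
    # than `threshold` comments.
    return {f"comments.{threshold}": {"$exists": True}}


def flag_outliers(reviews, threshold=OUTLIER_THRESHOLD):
    # `reviews` is assumed to be a PyMongo collection (or any object with a
    # compatible update_many method). Returns how many documents changed.
    result = reviews.update_many(
        outlier_filter(threshold),
        {"$set": {"outlier": True}},  # field spelling is an assumption
    )
    return result.modified_count


def split_excess_comments(review, threshold=OUTLIER_THRESHOLD):
    # Hypothetical helper: keep the first `threshold` comments embedded and
    # turn the rest into standalone documents for a separate collection.
    kept = review["comments"][:threshold]
    overflow = [
        {"review_id": review["_id"], "comment": comment}
        for comment in review["comments"][threshold:]
    ]
    return kept, overflow
```

With a real connection, calling flag_outliers on the reviews collection would mark every review with more than three comments. The overflow documents returned by split_excess_comments could then be inserted into whatever collection the app uses for excess comments.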
We could also run a separate query to move them to a different collection. Finally, we could use the Bucket Pattern to handle outliers by storing any excess comments in buckets, but the solution you choose will depend on the unique needs of your application. Let's do a quick recap. The Outlier Pattern lets us handle documents with unusual characteristics separately so we don't adopt a data model that is suboptimal for the majority of our data set.
We also used the updateMany method to apply the Outlier Pattern to the reviews collection. Great job. See you in the next lesson.
