Quick question for you. What's the human population on Earth? It changes so fast that any number you say is wrong the moment you say it. Even numbers from national organizations are just a collection of different measures taken at different times.
Should we obsess about finding the exact population number? No. Because it's difficult to calculate, and the number we have is good enough. Perfect numbers are great but the operation to find them might cost more than it's worth.
In this video, we'll discuss the approximation schema design pattern. This pattern produces a statistically valid approximation instead of an exact number. We use this pattern when data is either difficult or expensive to calculate, and getting the exact number is not critical for our use case. It is also well-suited for working with big data.
The approximation pattern reduces writes, and in some cases, can help reduce contention on heavily updated documents. Using the bookstore app, let's reconsider the problem of maintaining a book's rating as new reviews are added. We could increment the review count and recalculate the average number of stars every time a new review is added for a book, like we did using the computed pattern in the previous video. That gives us absolute accuracy, but doubles the number of database writes, because every review now needs a write to the book document on top of the write that stores the review itself.
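To make the cost of the exact approach concrete, here is a minimal Python sketch of it. It uses plain in-memory objects as stand-ins for the database, and the names (`post_review_exact`, `review_count`, `stars_total`, `avg_stars`) are assumptions for illustration, not the bookstore app's actual code or schema.

```python
def post_review_exact(reviews: list, book: dict, stars: int) -> int:
    """Store a review and update the book's rating every time.

    Returns the number of simulated database writes performed.
    """
    # Write 1: store the review itself (stand-in for an insert).
    reviews.append({"book_id": book["_id"], "stars": stars})

    # Write 2: update the computed fields on the book document.
    book["review_count"] += 1
    book["stars_total"] += stars
    book["avg_stars"] = book["stars_total"] / book["review_count"]
    return 2


reviews = []
book = {"_id": "book-1", "review_count": 0, "stars_total": 0, "avg_stars": 0.0}

# Three reviews cost six writes: one insert and one book update each.
total_writes = sum(post_review_exact(reviews, book, s) for s in [5, 4, 3])
```

Every review touches the book document, which is what the approximation pattern sets out to avoid for popular books.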
When there are just a handful of reviews for a book, that extra cost is justified because we want our reviews to be accurate. What happens when a popular book has received tons of reviews? Each new review makes very little difference to the average number of stars. Does anyone really care that we list a million reviews rather than 1,000,001?
Probably not. If our app sees that a book has already received a significant number of reviews, then it could decide to only recalculate the book's rating periodically instead of on every new review. This can drastically reduce the number of database writes by sacrificing some accuracy. One way to achieve this is through the use of a random number generator in our app logic.
The app can generate a random number between 1 and 10 when a new review is posted, but only runs the computation when the random number is 10. At that point, the app can follow its normal review post logic to store the new review and recompute the book's rating, with one exception: we must extrapolate the new rating. Instead of increasing the review count by one, we increase it by 10.
And instead of simply using the new review's star rating, we must first multiply it by 10. This approximation reduces the number of writes to the book document by about 90% for the most frequently reviewed books. The resulting book rating is statistically valid. However, it is not 100% accurate.
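The random-sampling logic just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the bookstore app's implementation: the book is a plain dictionary standing in for a document, the field names and `apply_review_approx` are hypothetical, and the running `stars_total` is one assumed way to keep the average recomputable.

```python
import random

SAMPLE_RATE = 10  # update the book on roughly 1 in 10 reviews


def apply_review_approx(book: dict, stars: int, rng: random.Random) -> bool:
    """Update the book's rating only when the random draw equals SAMPLE_RATE.

    Returns True if the book document was written.
    """
    if rng.randint(1, SAMPLE_RATE) != SAMPLE_RATE:
        return False  # skip the book update for this review

    # Extrapolate: this one review stands in for SAMPLE_RATE reviews.
    book["review_count"] += SAMPLE_RATE
    book["stars_total"] += stars * SAMPLE_RATE
    book["avg_stars"] = book["stars_total"] / book["review_count"]
    return True


rng = random.Random(42)  # seeded only so the sketch is repeatable

# Hypothetical popular book that already has a million 4-star reviews.
book = {"review_count": 1_000_000, "stars_total": 4_000_000, "avg_stars": 4.0}

# Post 10,000 more 4-star reviews and count how many touched the book.
book_writes = sum(apply_review_approx(book, 4, rng) for _ in range(10_000))
```

Each review has a 1 in 10 chance of being applied with a weight of 10, so on average it contributes exactly one review to the count and its own star value to the total. That is why the stored numbers stay close to the true ones over many reviews, while `book_writes` should land near 1,000 instead of 10,000.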
The approximation pattern is implemented in our bookstore application logic and does not impact the document model. In the schema, you only have to plan for a field or fields that will carry the approximate value, just like we did in the computed pattern. Let's recap what you learned in this video. The approximation pattern produces a statistically valid approximation instead of an exact number.
We use it to reduce resource usage for data that does not need to be perfect. Remember that this pattern is implemented on the application side. It trades a slight reduction in accuracy in exchange for far better database performance by computing values only when it matters.
