Schema Design Optimization / Optimize Your Schema

As the popularity of our bookstore skyrockets, we decide to track user views per book to learn more about the behavior of our online customers. We want to model the relationship between a book and user views as one-to-many, but we expect the number of user views to be very large in some cases. Modeling these kinds of high-cardinality relationships can be tricky. Embedding is not always a good solution here, since it can result in very large documents and unbounded arrays. We may want to reference this data instead to avoid those pitfalls. However, referencing may lead to poor performance due to large indexes and increased query complexity. In this video, we'll discuss the bucket pattern, an approach that helps when neither pure referencing nor pure embedding is a great choice. The bucket pattern groups individual pieces of information into buckets so that document size is more predictable and optimized for our system.

In today's Internet of Things world, sensors generate a never-ending stream of small readings. Take, for instance, a wearable like a fitness band: its sensor readings are time series data, and the bucket pattern is often used to handle sensor data effectively. We could store each sensor reading as a separate document, but if we bucket the data in some meaningful way, we make it easier to organize and access. It's also common to use the bucket pattern together with the computed pattern to store pre-computed statistics in the buckets, which further improves performance and data access. When used properly, the bucket pattern can help us keep document size predictable, read only the data we need, reduce the overall number of documents in a collection, and improve index performance.

Let's look at the bucket pattern in action. We'll apply the pattern to track user views per book in our bookstore application. When a user views one of our books, we want to capture the book ID, timestamp, and user ID. But before we begin, we must understand how the data will be queried so that we can decide how granular our buckets should be. Our most important queries need to compute monthly values, so we'll group monthly views per book into buckets and store them in the Views collection.

Now let's apply this pattern to our bookstore app. First, we need to include a field that identifies the buckets. Since we are looking for monthly views per book, we will use a bucket ID containing a month field with a timestamp and a book ID field. Next, we add an array to the bucket document. We will call it Views, because it will hold the incoming user view data: a timestamp and a user ID for each view.

We should avoid creating an unbounded array in each bucket document, or we could end up with millions of views in a single month's document. To help us avoid that pitfall, we could use schema validation. We can also add logic to our application that sets a threshold for the number of views per bucket document, so that whenever a bucket reaches that threshold, a new bucket document is created. While we won't model either strategy here, both will help us ensure that bucket documents have a predictable size.

Once we've implemented the pattern, we can easily analyze our data. For example, we can query the Views collection to calculate the number of monthly views for a particular book: a simple find command with a filter on book ID and month, and a projection using the $size operator to count the views for that book in the specified month. We could instead apply the computed pattern to our view documents and maintain a set of pre-computed fields for commonly used application statistics.
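To make this concrete, here is a sketch of what one monthly bucket document might look like in mongosh. The lesson only names the pieces of information, not exact fields, so field names like book_id, month, views, user_id, and timestamp, as well as the sample values, are illustrative assumptions.

```javascript
// One bucket per book per month: the compound _id identifies the bucket,
// and the views array holds the individual view events for that month.
db.views.insertOne({
  _id: {
    book_id: 12345,                          // hypothetical book identifier
    month: ISODate("2024-03-01T00:00:00Z")   // bucket granularity: one month
  },
  views: [
    { user_id: "u428", timestamp: ISODate("2024-03-04T09:12:53Z") },
    { user_id: "u911", timestamp: ISODate("2024-03-04T10:02:11Z") }
  ]
})
```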
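Recording a view then becomes an upsert into the current month's bucket for that book. This is a minimal sketch; the threshold logic mentioned above (capping the views per bucket and rolling over to a new bucket document) would live in application code around this call and, as in the lesson, is not modeled here.

```javascript
// Record a single view, creating the month's bucket on first use.
// The filter uses the full _id subdocument so that, on upsert, the new
// bucket is created with the same compound _id (keep the field order
// consistent, since _id equality on subdocuments is order-sensitive).
db.views.updateOne(
  { _id: { book_id: 12345, month: ISODate("2024-03-01T00:00:00Z") } },
  { $push: { views: { user_id: "u777", timestamp: new Date() } } },
  { upsert: true }
)
```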
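The monthly count query described above might look like the following, assuming a MongoDB version (4.4 or later) that accepts aggregation expressions such as $size in find projections; the monthlyViews output field name is just an illustration.

```javascript
// Monthly views for one book: filter on the bucket key and project the
// size of the views array. Returns something like
// { _id: { book_id: ..., month: ... }, monthlyViews: <count> }.
db.views.findOne(
  { "_id.book_id": 12345, "_id.month": ISODate("2024-03-01T00:00:00Z") },
  { monthlyViews: { $size: "$views" } }
)
```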
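If we applied the computed pattern, a variant of the schema might maintain a running total in each bucket and read it directly, rather than computing $size over the array on every query. The viewCount field name is, again, an assumption.

```javascript
// On every view, increment a pre-computed monthly total alongside the push...
db.views.updateOne(
  { _id: { book_id: 12345, month: ISODate("2024-03-01T00:00:00Z") } },
  {
    $push: { views: { user_id: "u901", timestamp: new Date() } },
    $inc:  { viewCount: 1 }
  },
  { upsert: true }
)

// ...and the monthly statistic becomes a plain field read.
db.views.findOne(
  { "_id.book_id": 12345, "_id.month": ISODate("2024-03-01T00:00:00Z") },
  { viewCount: 1 }
)
```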
Great work. Let's recap what we covered in this video. The bucket pattern is a good alternative to fully embedding or referencing when modeling one-to-many relationships with high cardinality. We implement this pattern by first considering how the data will be queried and then choosing a level of granularity. Finally, we group large amounts of data into predictably sized buckets. Great job, and see you in the next lesson.