Schema Design Optimization / Optimize Your Schema

Our bookstore app keeps growing, and the review feature is becoming very popular. As a result, we're accumulating a lot of older reviews that generate little activity and are accessed infrequently. It's common to keep application data indefinitely, even when it is no longer actively used. We need a strategy for storing inactive data so that it doesn't impact performance. In this video, we'll discuss the Archive Pattern, which helps us move data that is rarely or never used out of the main database to improve performance and potentially reduce primary storage requirements.

Apps may need to retain older data for a number of reasons, including data compliance regulations. This can include data related to the banking and pharmaceutical industries, IoT measurements, or even old logs. But over time, that data can add up. Active collections with a large number of inactive documents can hurt performance in several ways. For instance, write operations must update all indexes on the affected collections, and large indexes can waste memory. Additionally, inactive documents take up space in your primary storage, which is typically a high-performance and expensive subsystem. This is where the Archive Pattern can help by moving the inactive documents out of the main database.

Let's walk through the general steps we would take to apply this pattern to the reviews collection in our bookstore app. We want to archive reviews that are not frequently used. To do this, we'll archive book reviews that are more than a year old and haven't received any votes from other users.

The simplest solution is to take a copy of the source document and write it to our archive. However, if the source document contains references, then recreating the entire document hierarchy could be challenging. For example, it could require queries to multiple systems. We could use the Extended Reference Pattern to enhance our archive document with the referenced data.
This way, we can get everything we need from our archive document. Our review document already contains everything we need, so we could simply make a copy. However, we decided to extend the user ID and product SKU references to include the associated names. This enhanced archive document simplifies queries.

Next, we need to select a storage solution for our data. The right solution will depend on your business needs. We highly recommend MongoDB Atlas Online Archive if your application is running on an Atlas cluster of tier M10 or above. Online Archive allows you to easily automate data tiering and keep your data accessible.

After we've selected our storage solution, the final step in implementing the Archive Pattern is to determine a schedule for archiving and deleting documents. We won't walk through that step in this video, but remember, it's better to archive and delete frequently to streamline the performance of your working set.

Let's recap. The Archive Pattern helps us move data that is rarely or never used out of the main database to keep your cluster operating at peak performance. When properly applied, it can help us reduce the costs of managing unused documents, satisfy regulatory requirements, and improve performance and scalability. Great job.
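
As a rough illustration of the archiving step described in this video, here is a minimal Python sketch. It is not the course's implementation: plain lists and dictionaries stand in for the collections, and every field name (`created_at`, `votes`, `user_id`, `product_sku`, `user_name`, `product_name`) as well as the 365-day threshold is an assumption made for the example. It selects reviews that are more than a year old with no votes, then copies each one into an archive document that also carries the user and product names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "more than a year old" is approximated as 365 days.
ARCHIVE_AGE = timedelta(days=365)


def is_archivable(review, now):
    """Return True when the review is over a year old and has no votes."""
    return (now - review["created_at"]) > ARCHIVE_AGE and review["votes"] == 0


def build_archive_document(review, user_names, product_names):
    """Copy the review and extend its references with the related names."""
    archived = dict(review)  # shallow copy; the source document is untouched
    archived["user_name"] = user_names[review["user_id"]]
    archived["product_name"] = product_names[review["product_sku"]]
    return archived


def archive_pass(reviews, user_names, product_names, now):
    """Split reviews into (archive documents, reviews that stay active)."""
    archived, active = [], []
    for review in reviews:
        if is_archivable(review, now):
            archived.append(
                build_archive_document(review, user_names, product_names)
            )
        else:
            active.append(review)
    return archived, active


# Hypothetical sample data, for illustration only.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old = datetime(2022, 1, 1, tzinfo=timezone.utc)
recent = datetime(2024, 5, 1, tzinfo=timezone.utc)
reviews = [
    {"_id": 1, "user_id": "u1", "product_sku": "b1", "votes": 0, "created_at": old},
    {"_id": 2, "user_id": "u1", "product_sku": "b1", "votes": 3, "created_at": old},
    {"_id": 3, "user_id": "u2", "product_sku": "b1", "votes": 0, "created_at": recent},
]
user_names = {"u1": "Ada", "u2": "Bo"}
product_names = {"b1": "Dune"}

archived, active = archive_pass(reviews, user_names, product_names, now)
```

In this sample, only the first review meets both conditions, so it is the only one returned in `archived`. In a real deployment, the archive documents would be written to the chosen storage solution and the originals deleted from the active collection on whatever schedule you settle on.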