Data Resilience: Atlas / Data Resilience Scenario
Data resilience is all about trade-offs: balancing the availability you need against what you're willing to invest. There's no one-size-fits-all solution, which is why MongoDB Atlas offers a comprehensive range of options to match your specific requirements.
But how do you determine what's essential and what's optional? In this video, we'll explore a real-world scenario: a fintech company designing its data resilience strategy with MongoDB Atlas.
We'll examine how to leverage Atlas features to meet each critical requirement and build a solution that's both reliable and cost-effective. Let's get started.
Leafy Bank is a fintech company specializing in online payment processing and investment services. They have a primary mission to safeguard customer data while maintaining uninterrupted service.
As a financial company, they operate in a highly regulated environment, so they must comply with industry standards and maintain customer trust. Understanding Leafy Bank's requirements is the first step to creating a strong data resilience strategy. So let's take a look at what those requirements are.
First, the ability to remain fully operational in the event of a full cloud provider region outage.
This ensures continuous service for customers even during major outages.
Second, all personally identifiable information, or PII, must be encrypted so that only authorized users can see the data.
This will require full end-to-end encryption, which covers data in transit, at rest, and in use.
Next, we know that Leafy Bank can tolerate losing no more than one minute of data and must restore operations within fifteen minutes to minimize business disruption.
This translates to a recovery point objective, or RPO, of one minute and a recovery time objective, or RTO, of fifteen minutes.
Finally, backups must be retained for seven years and be immutable: no one can edit or delete backup copies. These backups must be retained in multiple locations, including a long-term archive located off-site for regulatory compliance and disaster recovery.
We'll also need to balance performance and cost when determining our backup storage strategy. Now that we understand the requirements that we need to meet, we just need to configure Atlas appropriately and MongoDB will take care of the rest.
Let's start with the first requirement, remaining fully operational in the event of a full cloud provider region outage. By default, every Atlas cluster includes at least three nodes in a replica set spread across different zones in a provider region.
If your primary node goes down, MongoDB automatically fails over to a secondary node in a different zone to keep your application running. However, for protection against full region outages, we'll choose a 2-2-1 node configuration using multi-region data distribution. This will spread Leafy Bank's five nodes across three different cloud regions. We'll use a single cloud provider to simplify operations and control costs while still achieving geographic resilience through multi-region deployment.
With this topology, even if an entire region goes offline, the remaining nodes still form a majority, so Atlas can automatically fail over to one of the other regions and Leafy Bank's application can keep running without interruption for their customers. Of course, multi-region resilience comes with trade-offs in complexity and latency that need to be carefully considered.
To configure this cluster, we'll use the Atlas console. During setup, we'll select three cloud provider regions and specify the 2-2-1 node distribution. Once this is done, Atlas will make life easier by replicating data, managing failover, and keeping everything synchronized automatically.
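To see why a 2-2-1 split tolerates the loss of any one region, here is a minimal Python sketch of the majority arithmetic. It assumes every node is a voting member, and the region names are hypothetical placeholders; it only models the vote count and does not call Atlas.

```python
# Sketch: does a replica set keep a voting majority when one region fails?
# Region names are hypothetical; only the 2-2-1 split comes from the scenario.
# Assumes every node is a voting member.

def majority(total_voters: int) -> int:
    """Number of votes needed to elect a primary."""
    return total_voters // 2 + 1


def survives_region_loss(nodes_per_region: dict) -> dict:
    """For each region, report whether the surviving nodes still form a
    majority of the original voting members if that region goes offline."""
    total = sum(nodes_per_region.values())
    needed = majority(total)
    return {
        region: (total - count) >= needed
        for region, count in nodes_per_region.items()
    }


two_two_one = {"region-a": 2, "region-b": 2, "region-c": 1}
three_two = {"region-a": 3, "region-b": 2}  # two-region layout, for contrast

print(survives_region_loss(two_two_one))  # every region maps to True
print(survives_region_loss(three_two))    # losing region-a leaves 2 of 5 votes
```

With five voters, three votes are needed to elect a primary. The 2-2-1 layout always leaves at least three nodes standing, while the two-region 3-2 layout cannot elect a primary if the three-node region fails.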
That takes care of the first requirement. Now let's look at the second, encrypting all personally identifiable information or PII. MongoDB Atlas comes with strong encryption built in. All data traveling between your application and the database uses encryption in transit, and data stored on disk is automatically encrypted at rest.
But when you're handling sensitive information like PII, you need an extra layer of protection. To do this, we can use client-side field-level encryption, or CSFLE, with a remote key management system, or KMS. CSFLE is the industry standard for this use case because it ensures sensitive fields, like Social Security numbers used for identity verification, are encrypted before they even leave Leafy Bank's application. This means your PII stays protected everywhere: while traveling over the network, in database memory, in storage and backups, and even in system logs.
So while CSFLE adds complexity through key management requirements, it delivers end-to-end encryption that protects sensitive data. To implement CSFLE, Leafy Bank can start by identifying which fields need protection, such as Social Security numbers or account information, then set up encryption keys using a key management service. MongoDB provides drivers for popular programming languages that make it easy to mark these sensitive fields for automatic encryption.
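As a rough illustration of what marking fields for automatic encryption can look like, the Python sketch below builds a CSFLE-style schema map as a plain dictionary. The namespace `leafybank.customers`, the field names, and the placeholder key ID are all assumptions made for this example. In a real application the key ID would identify a data encryption key created through your KMS, and with PyMongo the resulting dictionary would be passed as the `schema_map` argument of `AutoEncryptionOpts`.

```python
import uuid

# Algorithm identifiers used by CSFLE's "encrypt" schema keyword.
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"  # allows equality queries
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"                # not queryable

# Placeholder only: a real key ID refers to a data encryption key in the key vault.
PLACEHOLDER_KEY_ID = uuid.UUID("00000000-0000-0000-0000-000000000000")


def build_schema_map(namespace, key_id, fields):
    """Build a schema map that marks each listed field for automatic encryption.

    `fields` maps a field name to a (bson_type, algorithm) pair.
    """
    properties = {
        name: {
            "encrypt": {
                "bsonType": bson_type,
                "keyId": [key_id],
                "algorithm": algorithm,
            }
        }
        for name, (bson_type, algorithm) in fields.items()
    }
    return {namespace: {"bsonType": "object", "properties": properties}}


# Hypothetical collection and field names for Leafy Bank.
schema_map = build_schema_map(
    "leafybank.customers",
    PLACEHOLDER_KEY_ID,
    {
        "ssn": ("string", DETERMINISTIC),      # looked up by exact value
        "accountNumber": ("string", RANDOM),   # stored, never queried directly
    },
)
print(sorted(schema_map["leafybank.customers"]["properties"]))
```

Deterministic encryption is chosen for `ssn` here on the assumption that the application looks customers up by exact value; random encryption is stronger but does not support queries on the field.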
Once configured, the encryption happens seamlessly in the application layer, so developers can work with encrypted data without changing how they query or store information.
Nice work. Let's move on to the third requirement, achieving an RPO of one minute and an RTO of fifteen minutes or less.
Atlas makes this possible with continuous cloud backups, which can be enabled while configuring a cluster. For Leafy Bank, we can adjust our backup policy settings to retain hourly snapshots for seven days, daily snapshots for fourteen days, weekly snapshots for eight weeks, and monthly snapshots for seven years, satisfying multiple requirements at once. Then, once we enable multi-region snapshot distribution under backup policy settings, Atlas stores backup copies in all three cluster regions. During a restore, Atlas uses the local copy in each region to get Leafy Bank back online in fifteen minutes or less.
We can also enable point-in-time recovery so the Leafy Bank team can restore to any specific moment within a seven-day window, giving them an RPO as low as zero minutes. Adding continuous cloud backups, multi-region distribution, and point-in-time recovery will increase the cost of the cluster, but will allow Leafy Bank to achieve the required RPO and RTO.
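The retention tiers and the seven-day restore window described above can be written down as data and sanity-checked. The Python sketch below is only an illustration: the field names in `POLICY_ITEMS` are assumed for this example rather than taken from the video, a year is approximated as 365 days, and nothing here talks to Atlas.

```python
from datetime import datetime, timedelta, timezone

# Retention tiers from the walkthrough. Field names are illustrative assumptions.
POLICY_ITEMS = [
    {"frequencyType": "hourly", "retentionUnit": "days", "retentionValue": 7},
    {"frequencyType": "daily", "retentionUnit": "days", "retentionValue": 14},
    {"frequencyType": "weekly", "retentionUnit": "weeks", "retentionValue": 8},
    {"frequencyType": "monthly", "retentionUnit": "years", "retentionValue": 7},
]

DAYS_PER_UNIT = {"days": 1, "weeks": 7, "years": 365}  # a year approximated as 365 days
PIT_WINDOW = timedelta(days=7)  # point-in-time restore window from the scenario


def retention_days(item):
    """Convert one policy item's retention to a number of days."""
    return item["retentionValue"] * DAYS_PER_UNIT[item["retentionUnit"]]


def longest_retention_days(items):
    """Longest retention across all tiers, in days."""
    return max(retention_days(item) for item in items)


def within_pit_window(target, now, window=PIT_WINDOW):
    """True if `target` is a past moment no older than the restore window."""
    return timedelta(0) <= now - target <= window


now = datetime(2030, 1, 15, tzinfo=timezone.utc)  # arbitrary fixed "current" time
print(longest_retention_days(POLICY_ITEMS) >= 7 * 365)   # seven-year requirement met
print(within_pit_window(now - timedelta(days=3), now))   # inside the window
print(within_pit_window(now - timedelta(days=8), now))   # outside the window
```

A check like this is useful mainly as documentation: it records in one place that the monthly tier is what satisfies the seven-year requirement, and that restores to an exact moment are limited to the last seven days.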
Now we have one final requirement to address. Backups in Atlas need to be immutable, be retained for seven years, and exist in multiple copies, including a long-term archive located off-site.
This presents a challenge. Retaining seven years of operational data in the live database would significantly impact performance and escalate costs as the dataset grows over time. The solution is a tiered data retention strategy using MongoDB Atlas Online Archive. This feature automatically moves older, infrequently accessed data to cost-effective cloud storage while keeping it accessible for compliance and auditing needs.
Here's how this works for Leafy Bank.
Operational data remains in the cluster for two years, ensuring fast backups and optimal application performance. Cold data, older than two years, which is still required for regulatory compliance but is rarely accessed, is automatically archived to lower cost storage using Atlas Online Archive. This operation prevents index bloat, maintains database performance, and reduces storage costs. Atlas Online Archive provides an economical solution for retaining the large volumes of historical data needed for compliance without compromising the performance of active workloads.
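The two-year cutoff can be pictured as a date-based rule. The Python sketch below simulates that rule locally on a few in-memory documents. The `createdAt` field name and the sample documents are assumptions, two years is approximated as 730 days, and in practice Atlas applies the archiving rule for you rather than your application code.

```python
from datetime import datetime, timedelta, timezone

# "Older than two years", approximated as 730 days for this sketch.
ARCHIVE_AFTER = timedelta(days=2 * 365)


def split_hot_cold(docs, now, date_field="createdAt"):
    """Partition documents into (hot, cold) by age of the given date field."""
    hot, cold = [], []
    for doc in docs:
        if now - doc[date_field] > ARCHIVE_AFTER:
            cold.append(doc)   # candidate for lower-cost archive storage
        else:
            hot.append(doc)    # stays in the live cluster
    return hot, cold


now = datetime(2030, 1, 1, tzinfo=timezone.utc)  # arbitrary fixed "current" time
transactions = [  # hypothetical sample documents
    {"_id": 1, "createdAt": datetime(2029, 6, 1, tzinfo=timezone.utc)},
    {"_id": 2, "createdAt": datetime(2027, 1, 1, tzinfo=timezone.utc)},
]

hot, cold = split_hot_cold(transactions, now)
print([d["_id"] for d in hot], [d["_id"] for d in cold])
```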
Now for immutability.
Atlas makes this simple with the backup compliance policy.
Once enabled at the project level, no user can delete backups or modify retention periods regardless of their role. It acts as a minimum backup schedule that all clusters must follow.
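One way to picture a minimum backup schedule is as a floor that each cluster's own retention settings are compared against. The Python sketch below uses the retention tiers from the backup policy discussed earlier, expressed in days. The function name and the dictionary layout are hypothetical, and this is a mental model, not how Atlas enforces the policy internally.

```python
# Minimum retention per snapshot tier, in days (a year approximated as 365 days).
MINIMUM_RETENTION_DAYS = {
    "hourly": 7,
    "daily": 14,
    "weekly": 8 * 7,
    "monthly": 7 * 365,
}


def policy_violations(cluster_retention_days, minimum=MINIMUM_RETENTION_DAYS):
    """Return the tiers where a cluster retains less than the minimum.

    A tier missing from the cluster's settings counts as zero days.
    """
    return sorted(
        tier
        for tier, min_days in minimum.items()
        if cluster_retention_days.get(tier, 0) < min_days
    )


compliant = dict(MINIMUM_RETENTION_DAYS)
too_short = {"hourly": 7, "daily": 14, "monthly": 365}  # no weekly tier, short monthly

print(policy_violations(compliant))
print(policy_violations(too_short))
```

A cluster may retain more than the minimum, so only shortfalls are reported.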
Leafy Bank can configure the same retention schedule: hourly for seven days, daily for fourteen days, weekly for eight weeks, and monthly for seven years. Once set, the policy can only be disabled by a specified contact through a verification process with MongoDB support, protecting against accidental deletion or cyberattacks.
Finally, for off-site archival, Atlas lets you export snapshots to your own S3 bucket in gzip-compressed JSON format. This will give Leafy Bank both long-term storage and complete control over their backup destiny.
Together, these features create a comprehensive data resilience strategy that keeps Leafy Bank and your applications running smoothly, securely, and compliantly, no matter what challenges arise.
Nice work. Let's quickly recap the resources and tools that we used to meet Leafy Bank's requirements in this video.
Multi-region data and snapshot distribution spreads your cluster and backup copies across three regions for automatic failover during outages.
Client-side field-level encryption, or CSFLE, protects PII by encrypting sensitive fields before they leave your application.
Continuous cloud backups deliver automated snapshots with point-in-time recovery to meet strict RPO and RTO goals. Finally, the backup compliance policy ensures backups are immutable and protected from deletion or modification by any user.
While Atlas automates much of the configuration for high availability and disaster recovery, a resilient architecture still requires intentional design before production.
By offloading the operational complexity of managing distributed clusters to Atlas, you can focus on the higher-level engineering tasks of optimizing security, compliance, and uptime.
To learn more about implementing these tools, visit MongoDB's documentation or explore other skill badges at MongoDB University.
