Cluster Reliability

Gain hands-on experience in diagnosing and resolving issues in MongoDB clusters. Learn how to troubleshoot replication, sharding, and performance issues to maintain database health and availability.

Upon completion of the Cluster Reliability skill and skill check, you will earn a Credly Badge that you can share with your network.


Learning Objectives

Identify and Respond to Cluster Failures

Assess cluster health and determine when recovery actions are necessary. Distinguish between temporary disruptions and critical data loss scenarios.

Troubleshoot Replication Issues

Diagnose and resolve replication lag, inconsistency, and failover issues in MongoDB clusters.

Troubleshoot Sharding Issues

Identify and fix issues related to sharded cluster configurations, chunk distribution, and balancer operations.

Aaron Becker | Technologist, Education

Aaron Becker is a Technical Trainer, Instructional Designer, and Training Manager who has worked in the tech sector for over 13 years. Before joining the Curriculum team at MongoDB, Aaron worked in DevOps at CircleCI, creating their first Certification course (CircleCI Associate Developer) and leading a team responsible for creating and managing the educational content for CircleCI Academy for external/customer training, as well as CircleCI University for internal team member training.

Prior to that, Aaron worked in data protection, redundancy, and security at Carbonite, where he headed up the Training team, created and delivered ILT training courses for Carbonite's Mid-Market and Enterprise level products, and assisted over 150 employees in earning Microsoft certifications.

Aaron enjoys writing, performing, recording, mixing and mastering music, playing video games, and writing biographical text in the third person.

Emily Pope | Lead Curriculum Designer

Emily Pope is a Lead Curriculum Designer at MongoDB. She loves learning and loves making it easy for others to learn how and when to use deeply technical products. Recently, she's been creating AI and vector search content for MongoDB University. Before that, she created learning experiences on databases, computer science, full stack development, and even clinical trial design and analysis. Emily holds an Ed.M. in International Education Policy from Harvard Graduate School of Education and began her career as an English teacher in Turkiye with the Fulbright program.

Emilio Scalise | Senior Technologist

Emilio is a multi-skilled IT specialist with extensive knowledge of system administration, databases, software development, network security, and cloud solutions. He is currently a Staff Technologist at MongoDB, producing internal and external learning materials. With over 8 years in the MongoDB Support organization, including five as a Staff Technical Support Engineer, he has developed considerable expertise in MongoDB's products and cloud services. In addition, Emilio is a certified MySQL DBA and is experienced in technical translation between English and Italian.

Manuel Fontan Garcia | Senior Technologist, Education

Manuel is a Senior Technologist on the Curriculum team at MongoDB. Previously, he was a Senior Technical Services Engineer in the Core team at MongoDB. In between, Manuel worked as a database reliability engineer at Slack for a little over 2 years and then at Cognite until he rejoined MongoDB. With over 15 years of experience in software development and distributed systems, he is naturally curious and holds a Telecommunications Engineering MSc from Vigo University (Spain) and a Free and Open Source Software MSc from Rey Juan Carlos University (Spain).

John McCambridge | Curriculum Engineer

John is a Curriculum Engineer on the University team at MongoDB. Before his work as a Curriculum Engineer, he was an instructor and teaching assistant for coding boot camps at UT (Austin) and UCLA. Additionally, he worked as a QA engineer for a startup called Coder and spent five years at Apple Inc. John is a passionate software engineer and educator who enjoys taking complex topics and making them digestible for the community.

Davenson Lombard | Senior Software Engineer

Davenson Lombard is a Senior Software Engineer at MongoDB on the Education Team. Prior to that, Davenson was a Technical Services Engineer at MongoDB and a Customer Success Architect at Confluent. Davenson holds a bachelor's degree in Electrical Engineering from Concordia University in Montreal.

Joel Lord | Lead Curriculum Engineer

Joel is a Lead Curriculum Engineer at MongoDB, focused on helping developers build better applications through accessible educational content. He started his career in software nearly 25 years ago and only paused briefly to pick up a B.Sc. in computational astrophysics from Université Laval. Since then, he’s worked across software development, developer advocacy, and technical education. Outside of work, he enjoys stargazing, homebrewing, and providing emotional support to his two cats, who frequently make guest appearances on Zoom.

Picture this. You have a popular, high-traffic application backed by a MongoDB cluster. Everything seems to be in place and running smoothly, and yet something's wrong.

Suddenly, read latency starts climbing. Your users start to complain, and the clock is ticking to get this fixed. Can you quickly identify the cause? Can you resolve this without downtime? Perhaps most importantly, what can you do to prevent it from happening again in the future?

Welcome to MongoDB's cluster remediation skill. This skill will empower system administrators like you with the skills and tools you need to identify, mitigate, and remediate issues you may encounter with your MongoDB clusters.

This ensures that your data remains secure, highly available, and resilient. My name is Aaron. I'm a curriculum engineer at MongoDB. In this skill, you'll benefit from MongoDB's expertise in helping customers implement best practices and optimize their clusters.

I'll be your guide as we explore strategies to keep your MongoDB systems resilient and performant. First, we'll discuss how to assess the overall health of your clusters so you can understand baseline performance.

Then, we'll discuss how to diagnose issues, alleviate their symptoms, and resolve their causes. With that foundation, we'll dive into some common issues you might encounter when managing your MongoDB cluster and show you how to fix them, starting with replication.

After that, we'll learn how to fix common sharding issues like unbalanced workloads and jumbo chunks. And sometimes, despite our best efforts, things go wrong and we need to think about recovery. We'll show you how to recover from backups to minimize downtime and data loss.

Finally, while we'll often use self-managed MongoDB instances in our demonstrations to help you understand these concepts deeply, we'll also show you some features of MongoDB Atlas that solve some of these issues automatically for you via our fully managed service.

Once you've learned this content, you'll be ready to apply your new skills by earning the cluster remediation skill badge. It's more than just a digital badge you can share on LinkedIn. It's proof that you know how to resolve issues that may arise with your cluster and how to prevent them from occurring in the future. Let's get started.