Data Resilience: Self-Managed / MongoDB Architecture for Resilience

Code Summary: MongoDB Architecture for Resilience

Connecting to and Checking Replica Set Status

The mongosh command connects to a MongoDB deployment using the replica set connection string. This example uses the SRV connection format (mongodb+srv://), which resolves the member hosts through DNS, so the underlying servers can change without clients having to update their connection strings. Once connected, the rs.status() command retrieves detailed information about the replica set, including the state of each member (such as PRIMARY, SECONDARY, or ARBITER), its health, its uptime, and the optimes from which replication lag can be derived.

mongosh "mongodb+srv://your-replica-set-connection-string"
rs.status()
{
  set: 'dsrm-shard_0',
  date: ISODate('2025-11-15T02:23:48.042Z'),
  myState: 2,
  . . .
  members: [
    {
      _id: 0,
      name: 'ip-of-node1:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
      . . .
    },
    {
      _id: 1,
      name: 'ip-of-node2:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      . . .
    },
    {
      _id: 2,
      name: 'ip-of-node3:27017',
      health: 1,
      state: 7,
      stateStr: 'ARBITER',
      . . .
    }
  ]
}
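
rs.status() does not print replication lag as a single number, but it can be estimated by comparing each Secondary's optimeDate with the Primary's. The following is a minimal sketch of that calculation, not an official utility. The function name replicationLag and the sample data are assumptions made for illustration; in mongosh you would pass the live result, as in replicationLag(rs.status()).

```javascript
// Sketch: estimate per-Secondary replication lag from an rs.status()-shaped document.
// replicationLag is a hypothetical helper name, not part of mongosh.
function replicationLag(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return null; // no Primary visible, e.g. during an election
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({
      name: m.name,
      // Subtracting two Date values yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Hypothetical sample data standing in for rs.status() outside mongosh.
const sampleStatus = {
  members: [
    { name: "node1:27017", stateStr: "PRIMARY", optimeDate: new Date("2025-11-15T02:23:48Z") },
    { name: "node2:27017", stateStr: "SECONDARY", optimeDate: new Date("2025-11-15T02:23:45Z") },
    { name: "node3:27017", stateStr: "ARBITER" }, // arbiters hold no data and are skipped
  ],
};

console.log(replicationLag(sampleStatus));
```

Arbiters are filtered out because they do not replicate data, so lag has no meaning for them.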

Identifying the Primary Node in a Replica Set

This command searches the members array returned by rs.status() for the member whose stateStr is PRIMARY and displays it. The output shows the Primary's hostname, state, health, and election details. If no Primary is currently elected, the expression returns undefined.

rs.status().members.find(m => m.stateStr === "PRIMARY")
{
 _id: 0,
 name: 'ip-172-31-7-29.us-east-2.compute.internal:27017',
 health: 1,
 state: 1,
 stateStr: 'PRIMARY',
 uptime: 2343,
 optime: { ts: Timestamp({ t: 1763404473, i: 1 }), t: Long('1') },
 optimeDate: ISODate('2025-11-17T18:34:33.000Z'),
 optimeWritten: { ts: Timestamp({ t: 1763404473, i: 1 }), t: Long('1') },
 optimeWrittenDate: ISODate('2025-11-17T18:34:33.000Z'),
 lastAppliedWallTime: ISODate('2025-11-17T18:34:33.284Z'),
 lastDurableWallTime: ISODate('2025-11-17T18:34:33.284Z'),
 lastWrittenWallTime: ISODate('2025-11-17T18:34:33.284Z'),
 syncSourceHost: '',
 syncSourceId: -1,
 infoMessage: '',
 electionTime: Timestamp({ t: 1763402147, i: 1 }),
 electionDate: ISODate('2025-11-17T17:55:47.000Z'),
 configVersion: 1,
 configTerm: 1,
 self: true,
 lastHeartbeatMessage: ''
}
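
Because find() returns undefined when no member is in the PRIMARY state, a script that relies on this lookup should handle that case explicitly. Here is a small sketch, assuming a hypothetical helper named describePrimary. The sample object reuses the name and electionDate values from the output above, and in mongosh you would call describePrimary(rs.status()).

```javascript
// Sketch: summarize the current Primary, or report that none is visible.
// describePrimary is a hypothetical helper name, not part of mongosh.
function describePrimary(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (primary === undefined) {
    return "no Primary visible (an election may be in progress)";
  }
  return `${primary.name} elected at ${primary.electionDate.toISOString()}`;
}

// Sample data standing in for rs.status() outside mongosh.
const statusWithPrimary = {
  members: [
    {
      name: "ip-172-31-7-29.us-east-2.compute.internal:27017",
      stateStr: "PRIMARY",
      electionDate: new Date("2025-11-17T17:55:47.000Z"),
    },
  ],
};
const statusWithoutPrimary = {
  members: [{ name: "node2:27017", stateStr: "SECONDARY" }],
};

console.log(describePrimary(statusWithPrimary));
console.log(describePrimary(statusWithoutPrimary));
```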

Simulating a Primary Node Failure

Run on the server that hosts the current Primary, this system command stops the mongod process, simulating a node failure. This allows administrators to test and observe the automated failover process in a replica set.

sudo systemctl stop mongod

Monitoring Failover and New Primary Election

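After the Primary fails, running rs.status() on a remaining node shows the result of the election. The excerpt below is the members entry for the new Primary, which was elected from the healthy Secondary members; it includes the node's state, election time, and configuration details. Note that the term recorded in optime and in configTerm has increased from 1 to 2, reflecting the new election.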

rs.status() 
  {
   _id: 1,
   name: 'ip-172-31-9-104.us-east-2.compute.internal:27017',
   health: 1,
   state: 1,
   stateStr: 'PRIMARY',
   uptime: 3153,
   optime: { ts: Timestamp({ t: 1763405284, i: 2 }), t: Long('2') },
   optimeDate: ISODate('2025-11-17T18:48:04.000Z'),
   optimeWritten: { ts: Timestamp({ t: 1763405284, i: 2 }), t: Long('2') },
   optimeWrittenDate: ISODate('2025-11-17T18:48:04.000Z'),
   lastAppliedWallTime: ISODate('2025-11-17T18:48:04.130Z'),
   lastDurableWallTime: ISODate('2025-11-17T18:48:04.130Z'),
   lastWrittenWallTime: ISODate('2025-11-17T18:48:04.130Z'),
   syncSourceHost: '',
   syncSourceId: -1,
   infoMessage: '',
   electionTime: Timestamp({ t: 1763405258, i: 2 }),
   electionDate: ISODate('2025-11-17T18:47:38.000Z'),
   configVersion: 1,
   configTerm: 2,
   self: true,
   lastHeartbeatMessage: ''
 },
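
Rather than re-running rs.status() by hand, the check can be repeated until a Primary appears. The following is a rough sketch of such a polling loop under stated assumptions: waitForPrimary, getStatus, and pause are hypothetical names, and the status source and the delay are passed in so the logic can run outside mongosh. In mongosh, one possible call would be waitForPrimary(() => rs.status(), 30, () => sleep(1000)).

```javascript
// Sketch: poll a status source until some member reports PRIMARY.
// waitForPrimary is a hypothetical helper name, not part of mongosh.
function waitForPrimary(getStatus, maxAttempts, pause) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let primary;
    try {
      primary = getStatus().members.find(m => m.stateStr === "PRIMARY");
    } catch (err) {
      primary = undefined; // treat a failed status call as "no Primary yet"
    }
    if (primary) return { name: primary.name, attempts: attempt };
    pause(); // wait before the next attempt
  }
  return null; // gave up without seeing a Primary
}

// Hypothetical snapshots standing in for successive rs.status() results.
const snapshots = [
  { members: [{ name: "node2:27017", stateStr: "SECONDARY" }, { name: "node3:27017", stateStr: "SECONDARY" }] },
  { members: [{ name: "node2:27017", stateStr: "PRIMARY" }, { name: "node3:27017", stateStr: "SECONDARY" }] },
];
let callCount = 0;
const fakeStatus = () => snapshots[Math.min(callCount++, snapshots.length - 1)];

const failoverResult = waitForPrimary(fakeStatus, 5, () => {});
console.log(failoverResult);
```

Catching errors inside the loop is a design choice for this sketch: while the old Primary is going away, a status call may fail, and the loop simply tries again.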

Restarting a Failed Node

This system command starts the MongoDB process again on the previously failed node. Once it is running, the node rejoins the replica set as a Secondary and catches up on the writes it missed by applying entries from the oplog.

sudo systemctl start mongod
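
To confirm that the restarted node has finished catching up, its entry in rs.status() can be checked for health 1 and a stateStr of SECONDARY. A minimal sketch follows, assuming a hypothetical helper named hasRejoined and illustrative member names. In mongosh you would pass rs.status() and the member's name as it appears in the members array.

```javascript
// Sketch: report whether a named member is back as a healthy Secondary.
// hasRejoined is a hypothetical helper name, not part of mongosh.
function hasRejoined(status, memberName) {
  const member = status.members.find(m => m.name === memberName);
  return member !== undefined && member.health === 1 && member.stateStr === "SECONDARY";
}

// Hypothetical snapshots: the node is still recovering, then fully rejoined.
const whileRecovering = {
  members: [
    { name: "node1:27017", health: 1, stateStr: "RECOVERING" },
    { name: "node2:27017", health: 1, stateStr: "PRIMARY" },
  ],
};
const afterRejoin = {
  members: [
    { name: "node1:27017", health: 1, stateStr: "SECONDARY" },
    { name: "node2:27017", health: 1, stateStr: "PRIMARY" },
  ],
};

console.log(hasRejoined(whileRecovering, "node1:27017"));
console.log(hasRejoined(afterRejoin, "node1:27017"));
```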