Code Summary: MongoDB Architecture for Resilience
Connecting to and Checking Replica Set Status
The mongosh command connects to a MongoDB replica set using its connection string. This example uses the SRV connection format (mongodb+srv://), which resolves the member hosts through DNS, so the underlying servers can change without clients having to update their connection strings. Once connected, rs.status() returns detailed information about the replica set, including each member's state (PRIMARY, SECONDARY, or ARBITER), health, and uptime, along with the optime fields from which replication lag can be derived.
mongosh "mongodb+srv://your-replica-set-connection-string"
rs.status()
{
set: 'dsrm-shard_0',
date: ISODate('2025-11-15T02:23:48.042Z'),
myState: 2,
. . .
members: [
{
_id: 0,
name: 'ip-of-node1:27017',
health: 1,
state: 1,
stateStr: 'PRIMARY',
. . .
},
{
_id: 1,
name: 'ip-of-node2:27017',
health: 1,
state: 2,
stateStr: 'SECONDARY',
. . .
},
{
_id: 2,
name: 'ip-of-node3:27017',
health: 1,
state: 7,
stateStr: 'ARBITER',
. . .
}
]
}
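Replication lag does not appear as a single field in this output, but it can be estimated by comparing each Secondary's optimeDate with the Primary's. The following is a minimal sketch of that calculation, not an official helper: the function name replicationLagSeconds and the sample document are hypothetical, and the code assumes every data-bearing member reports an optimeDate.

```javascript
// Sketch: estimate per-Secondary replication lag from an rs.status() document.
// Assumption: PRIMARY and SECONDARY members each report an optimeDate (a Date).
function replicationLagSeconds(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return null; // no Primary right now, for example mid-election
  return status.members
    .filter(m => m.stateStr === "SECONDARY") // arbiters hold no data, so skip them
    .map(m => ({
      name: m.name,
      // Subtracting two Dates yields milliseconds
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000
    }));
}

// Hypothetical sample so the sketch also runs outside mongosh
const sampleStatus = {
  members: [
    { name: "node1:27017", stateStr: "PRIMARY", optimeDate: new Date("2025-11-15T02:23:48Z") },
    { name: "node2:27017", stateStr: "SECONDARY", optimeDate: new Date("2025-11-15T02:23:46Z") },
    { name: "node3:27017", stateStr: "ARBITER" }
  ]
};
console.log(replicationLagSeconds(sampleStatus));
```

In mongosh, the same function can be called as replicationLagSeconds(rs.status()). The built-in rs.printSecondaryReplicationInfo() helper reports similar information without custom code.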
Identifying the Primary Node in a Replica Set
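This command filters the replica set members to find and display information about the current Primary node. The output shows the Primary's hostname, state, health metrics, and election details (electionTime and electionDate). If no member currently reports PRIMARY, for example during an election, the expression returns undefined.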
rs.status().members.find(m => m.stateStr === "PRIMARY")
{
_id: 0,
name: 'ip-172-31-7-29.us-east-2.compute.internal:27017',
health: 1,
state: 1,
stateStr: 'PRIMARY',
uptime: 2343,
optime: { ts: Timestamp({ t: 1763404473, i: 1 }), t: Long('1') },
optimeDate: ISODate('2025-11-17T18:34:33.000Z'),
optimeWritten: { ts: Timestamp({ t: 1763404473, i: 1 }), t: Long('1') },
optimeWrittenDate: ISODate('2025-11-17T18:34:33.000Z'),
lastAppliedWallTime: ISODate('2025-11-17T18:34:33.284Z'),
lastDurableWallTime: ISODate('2025-11-17T18:34:33.284Z'),
lastWrittenWallTime: ISODate('2025-11-17T18:34:33.284Z'),
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
electionTime: Timestamp({ t: 1763402147, i: 1 }),
electionDate: ISODate('2025-11-17T17:55:47.000Z'),
configVersion: 1,
configTerm: 1,
self: true,
lastHeartbeatMessage: ''
}
Simulating a Primary Node Failure
Run on the server that hosts the Primary, this systemd command stops the mongod process, simulating a node failure. This lets administrators test and observe the automated failover process in a replica set.
sudo systemctl stop mongod
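Before stopping the Primary, it can be worth confirming that the remaining members would still form a voting majority, since an election cannot succeed without one. The check below is a rough sketch under a stated assumption: every entry in members holds exactly one vote (the actual vote settings live in rs.conf(), which this sketch does not read). The function name canElectAfterLoss and the sample data are hypothetical.

```javascript
// Sketch: would the set still have a voting majority after losing `lost` members?
// Assumption: every member listed in status.members holds exactly one vote.
function canElectAfterLoss(status, lost) {
  const total = status.members.length;
  const healthy = status.members.filter(m => m.health === 1).length;
  const majority = Math.floor(total / 2) + 1; // strict majority of all voters
  return healthy - lost >= majority;
}

// Hypothetical three-member set, all healthy
const threeNodeStatus = {
  members: [
    { name: "node1:27017", health: 1, stateStr: "PRIMARY" },
    { name: "node2:27017", health: 1, stateStr: "SECONDARY" },
    { name: "node3:27017", health: 1, stateStr: "ARBITER" }
  ]
};
console.log(canElectAfterLoss(threeNodeStatus, 1)); // losing one member
console.log(canElectAfterLoss(threeNodeStatus, 2)); // losing two members
```

With three members the majority is two, so the set tolerates the loss of one member but not two.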
Monitoring Failover and New Primary Election
After a Primary failure, running rs.status() on a remaining node shows the result of the election. The excerpt below is the members entry for the new Primary, which was elected from the healthy Secondary members. It includes the node's state, election time, and configuration details. Note that the term (configTerm and the t field of optime) has advanced from 1 to 2.
rs.status()
{
_id: 1,
name: 'ip-172-31-9-104.us-east-2.compute.internal:27017',
health: 1,
state: 1,
stateStr: 'PRIMARY',
uptime: 3153,
optime: { ts: Timestamp({ t: 1763405284, i: 2 }), t: Long('2') },
optimeDate: ISODate('2025-11-17T18:48:04.000Z'),
optimeWritten: { ts: Timestamp({ t: 1763405284, i: 2 }), t: Long('2') },
optimeWrittenDate: ISODate('2025-11-17T18:48:04.000Z'),
lastAppliedWallTime: ISODate('2025-11-17T18:48:04.130Z'),
lastDurableWallTime: ISODate('2025-11-17T18:48:04.130Z'),
lastWrittenWallTime: ISODate('2025-11-17T18:48:04.130Z'),
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
electionTime: Timestamp({ t: 1763405258, i: 2 }),
electionDate: ISODate('2025-11-17T18:47:38.000Z'),
configVersion: 1,
configTerm: 2,
self: true,
lastHeartbeatMessage: ''
},
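Rather than re-running rs.status() by hand, the check can be polled until a member reports PRIMARY. The following is an illustrative sketch, not part of the original walkthrough: waitForPrimary, its parameters, and the fake poll data are hypothetical. The status source and the pause are passed in as functions so the same code runs in mongosh or in plain Node.

```javascript
// Sketch: poll until some member reports PRIMARY, or give up after maxAttempts.
// getStatus: function returning an rs.status()-shaped document
// pause:     function that waits between polls
function waitForPrimary(getStatus, pause, maxAttempts = 30) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const primary = getStatus().members.find(m => m.stateStr === "PRIMARY");
    if (primary) return { name: primary.name, attempts: attempt };
    pause();
  }
  return null; // no Primary seen within maxAttempts polls
}

// Hypothetical sequence of poll results: the election completes on the third poll
const polls = [
  { members: [{ name: "node2:27017", stateStr: "SECONDARY" }] },
  { members: [{ name: "node2:27017", stateStr: "SECONDARY" }] },
  { members: [{ name: "node2:27017", stateStr: "PRIMARY" }] }
];
let pollIndex = 0;
const failoverResult = waitForPrimary(
  () => polls[Math.min(pollIndex++, polls.length - 1)],
  () => {} // no real waiting in this demo
);
console.log(failoverResult);
```

In mongosh, on a surviving member, a call might look like waitForPrimary(() => rs.status(), () => sleep(1000)), which polls once per second.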
Restarting a Failed Node
This systemd command starts the mongod process again on the previously failed node. Once running, the node rejoins the replica set as a Secondary and catches up with the current Primary by applying the oplog entries it missed while offline.
sudo systemctl start mongod
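To confirm the restarted node has actually rejoined, its entry in rs.status() can be checked for a healthy SECONDARY state and a small gap behind the Primary. The sketch below is one possible check under stated assumptions: the function name hasRejoined, the 10-second default threshold, and the sample document are all hypothetical choices, not values from the original material.

```javascript
// Sketch: has `memberName` rejoined as a healthy, caught-up Secondary?
// Assumption: maxLagSeconds (default 10) is an arbitrary illustrative threshold.
function hasRejoined(status, memberName, maxLagSeconds = 10) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  const member = status.members.find(m => m.name === memberName);
  if (!primary || !member) return false;
  if (member.health !== 1 || member.stateStr !== "SECONDARY") return false;
  // Date subtraction yields milliseconds
  return (primary.optimeDate - member.optimeDate) / 1000 <= maxLagSeconds;
}

// Hypothetical status after the old Primary (node1) has been started again
const afterRestart = {
  members: [
    { name: "node1:27017", health: 1, stateStr: "SECONDARY", optimeDate: new Date("2025-11-17T18:50:00Z") },
    { name: "node2:27017", health: 1, stateStr: "PRIMARY", optimeDate: new Date("2025-11-17T18:50:03Z") }
  ]
};
console.log(hasRejoined(afterRestart, "node1:27017"));
```

In mongosh, pass rs.status() as the first argument and the member's name exactly as it appears in the members array. A node that is still in an intermediate state such as STARTUP2 or RECOVERING makes the check return false until it reaches SECONDARY.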