Code Recap: Troubleshooting Initial Sync and Failover Issues
Checking replica set health and initial sync status with rs.status()
rs.status() Method
Use the rs.status() method to return the replica set status from the point of view of the member where the method is run. This includes the state of each node:
STARTUP2 indicates that the node is undergoing the initial sync.
RECOVERING indicates that the node has finished the sync and is running checks prior to re-joining the replica set.
SECONDARY indicates that the node has successfully re-joined the replica set as a secondary node.
rs.status()
Example Output:
{
_id: 2,
name: 'atlas-zr1do5-shard-00-02.xwgj1.mongodb.net:27017',
health: 1,
state: 2,
stateStr: 'SECONDARY',
uptime: 333139,
optime: { ts: Timestamp({ t: 1747042492, i: 1 }), t: Long('16') },
optimeDate: ISODate('2025-05-12T09:34:52.000Z'),
optimeWritten: { ts: Timestamp({ t: 1747042492, i: 1 }), t: Long('16') },
optimeWrittenDate: ISODate('2025-05-12T09:34:52.000Z'),
lastAppliedWallTime: ISODate('2025-05-12T09:34:52.385Z'),
lastDurableWallTime: ISODate('2025-05-12T09:34:52.385Z'),
lastWrittenWallTime: ISODate('2025-05-12T09:34:52.385Z'),
syncSourceHost: '',
syncSourceId: -1,
infoMessage: '',
electionTime: Timestamp({ t: 1746709383, i: 1 }),
electionDate: ISODate('2025-05-08T13:03:03.000Z'),
configVersion: 1,
configTerm: 16,
self: true,
lastHeartbeatMessage: ''
}
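When a replica set has several members, it can help to reduce the rs.status() output to the fields that matter for sync progress. The following is a minimal sketch, not part of the original lesson: summarizeMembers and SYNC_STATE_NOTES are hypothetical names, and the code assumes the status document carries a members array whose entries look like the example above.

```javascript
// Notes for the three states described in this recap.
// Any other state is reported without a note.
const SYNC_STATE_NOTES = {
  STARTUP2: "initial sync in progress",
  RECOVERING: "sync finished, running checks before re-joining",
  SECONDARY: "re-joined the replica set as a secondary",
};

// Hypothetical helper: takes the document returned by rs.status()
// and returns one short summary object per member.
function summarizeMembers(status) {
  return (status.members || []).map((m) => ({
    name: m.name,
    stateStr: m.stateStr,
    healthy: m.health === 1,
    note: SYNC_STATE_NOTES[m.stateStr] || "",
  }));
}
```

In mongosh, the helper could be called as summarizeMembers(rs.status()) after pasting the definitions into the shell.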
When rs.status() is run while a node is undergoing an initial sync, the output includes initialSyncStatus information.
Example Output:
rs1 [direct: secondary] test> rs.status()
…
"initialSyncAttempts" : [
{
"durationMillis" : 59539,
"status" : "InvalidOptions: error fetching oplog during initial sync :: caused by :: Error while getting the next batch in the oplog fetcher :: caused by :: readConcern afterClusterTime value must not be greater than the current clusterTime. Requested clusterTime: { ts: Timestamp(0, 1) }; current clusterTime: { ts: Timestamp(0, 0) }",
"syncSource" : "m1.example.net:27017",
"rollBackId" : 1,
"operationsRetried" : 120,
"totalTimeUnreachableMillis" : 52601
}
]
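The status strings in initialSyncAttempts can be long, so a short summary per attempt may be easier to scan. The sketch below is an illustration rather than part of the lesson: summarizeInitialSyncAttempts is a hypothetical name, and the code assumes that initialSyncAttempts is nested under initialSyncStatus and that each status string begins with a name followed by a colon, as in the sample output above.

```javascript
// Hypothetical helper: takes the document returned by rs.status()
// and returns a compact summary of each recorded initial sync attempt.
function summarizeInitialSyncAttempts(status) {
  // Assumption: the attempts array lives under initialSyncStatus.
  const sync = status.initialSyncStatus || {};
  const attempts = sync.initialSyncAttempts || [];
  return attempts.map((a) => ({
    syncSource: a.syncSource,
    durationSeconds: a.durationMillis / 1000,
    // Text before the first colon, e.g. "InvalidOptions" in the sample.
    statusPrefix: String(a.status).split(":")[0].trim(),
    operationsRetried: a.operationsRetried,
    unreachableSeconds: a.totalTimeUnreachableMillis / 1000,
  }));
}
```

In mongosh, summarizeInitialSyncAttempts(rs.status()) would return an empty array on a node that reports no initial sync information.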