Code Recap: Mitigating and Recovering from Replication Lag Issues
Performing an initial sync on a stale node
The following procedure performs an initial sync on a node that has fallen outside the oplog window or otherwise needs to be resynchronized with the other nodes.
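As a rough way to judge whether a node has fallen outside the oplog window, its replication lag can be compared with the length of time the oplog covers. The sketch below is an illustration under assumptions, not an official check: it reads plain objects shaped like the output of rs.status() (members with name, stateStr, and optimeDate) and db.getReplicationInfo() (timeDiff, in seconds), and both function names are hypothetical.

```javascript
// Hypothetical helpers (not MongoDB APIs). They only read plain objects
// shaped like the output of rs.status() and db.getReplicationInfo().

// Seconds by which `memberName` trails the primary's last applied operation.
function replicationLagSeconds(status, memberName) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  const member = status.members.find((m) => m.name === memberName);
  if (!primary || !member) {
    throw new Error("primary or member not found in status output");
  }
  // Subtracting two Date objects yields milliseconds.
  return (primary.optimeDate - member.optimeDate) / 1000;
}

// True when the lag is longer than the time span the oplog covers,
// meaning the entries the node still needs may already be gone.
function hasExceededOplogWindow(lagSeconds, replInfo) {
  return lagSeconds > replInfo.timeDiff;
}
```

In mongosh on the primary, this could be called as hasExceededOplogWindow(replicationLagSeconds(rs.status(), "host:27017"), db.getReplicationInfo()), where "host:27017" is a placeholder for the member's name.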
db.shutdownServer() method
Connect to the node in question and run the db.shutdownServer() method against the admin database to stop its mongod process:
use admin
db.shutdownServer()
Back up diagnostic data
Back up the diagnostic.data subdirectory of your dbPath directory (or, optionally, back up the entire dbPath directory):
/var/lib/mongo/diagnostic.data
Delete data in dbPath and all subdirectories
Delete the entire contents of the dbPath directory and all subdirectories:
rm -r /var/lib/mongo/*
Restart the mongod process
To begin the initial sync, restart the mongod process with the same configuration (including the replica set name and dbPath) that the node used before:
mongod
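Once mongod is running again, the sync can be followed from another member of the replica set. The following is a minimal sketch, assuming a plain object shaped like rs.status() output: a member performing an initial sync reports the STARTUP2 state and becomes SECONDARY once it has caught up. The function name is hypothetical.

```javascript
// Hypothetical helper (not a MongoDB API): reports whether the named member
// has reached SECONDARY, given an object shaped like rs.status() output.
function initialSyncFinished(status, memberName) {
  const member = status.members.find((m) => m.name === memberName);
  if (!member) {
    throw new Error("member " + memberName + " not found in status output");
  }
  // Any other state (for example STARTUP2) means the node is not ready yet.
  return member.stateStr === "SECONDARY";
}
```

In mongosh this could be polled as initialSyncFinished(rs.status(), "host:27017"), with "host:27017" standing in for the resynced node's name.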
Resize the oplog with replSetResizeOplog
replSetResizeOplog admin command
Use the replSetResizeOplog admin command to resize the oplog, or change its minimum retention period, dynamically without restarting the mongod process. The size is specified in megabytes and minRetentionHours in hours.
db.adminCommand(
  {
    replSetResizeOplog: <int>,
    size: <double>,
    minRetentionHours: <double>
  }
)
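To make the syntax above concrete, here is a small sketch that builds the command document with example values. The helper name and the values (16000 MB, 24 hours) are illustrative assumptions. The bounds it checks (a 990 MB minimum size and a non-negative retention period) follow the MongoDB documentation as best understood here and should be confirmed for your server version.

```javascript
// Hypothetical helper (not a MongoDB API): builds the document passed to
// db.adminCommand() for replSetResizeOplog.
function buildResizeOplogCommand({ sizeMB, minRetentionHours } = {}) {
  const cmd = { replSetResizeOplog: 1 };
  if (sizeMB !== undefined) {
    // Assumed lower bound: the documented minimum oplog size is 990 MB.
    if (typeof sizeMB !== "number" || sizeMB < 990) {
      throw new RangeError("sizeMB must be a number of at least 990");
    }
    cmd.size = sizeMB;
  }
  if (minRetentionHours !== undefined) {
    // Fractions are allowed, e.g. 0.25 for 15 minutes.
    if (typeof minRetentionHours !== "number" || minRetentionHours < 0) {
      throw new RangeError("minRetentionHours must be a non-negative number");
    }
    cmd.minRetentionHours = minRetentionHours;
  }
  if (Object.keys(cmd).length === 1) {
    throw new Error("specify sizeMB, minRetentionHours, or both");
  }
  return cmd;
}
```

In mongosh, connected to the member being resized, the result could be passed along as db.adminCommand(buildResizeOplogCommand({ sizeMB: 16000, minRetentionHours: 24 })). The MongoDB documentation's own examples wrap the size in Double(...), so it may be necessary to set cmd.size = Double(cmd.size) before sending.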
Information to provide when contacting Support
When contacting Support, please provide:
- The name and configuration (replica set, sharded cluster, etc.) of the affected cluster
- The date and time of the incident
- The mongod.log files from the affected nodes
- The contents of the diagnostic.data subdirectory of the dbPath