Sharding Strategies / Training

What You'll Learn

Modify Your Sharding Strategy: Adapt your sharding strategy by resharding your collection with a new shard key, or by refining an existing shard key for evolving application demands.
Modify Your Sharding Strategy

Applications and their requirements are ever evolving. This means that strategies we put in place at the onset might not be the most advantageous later on. So, how we scale also needs to evolve. This is where resharding comes in.

Maybe your application has grown and you need to accommodate different workloads, or maybe the shard key you chose turned out to be suboptimal. In these cases, resharding can help you change the shard key of your collection and redistribute the data.

What is Resharding?

Resharding allows you to change the shard key of an existing sharded collection with minimal impact on running workloads.

In a collection resharding operation, shards act as donors, currently storing chunks for the sharded collection and recipients, storing new chunks for the sharded collection based on the shard keys and zones. A shard can be a donor and a recipient at the same time. The config server primary is always the resharding coordinator and starts each phase of the resharding operation.

Once initiated, the process is completed in four phases.

Phase 1: Copy Data
The first phase is when the current collection's data is copied into the new temporary resharded collection.

Phase 2: Build Indexes
In the second phase, each recipient shard (a shard receiving a portion of data) builds the required indexes. This includes all existing indexes and a new index matching the resharded collection's shard key pattern (if one doesn't already exist).

Phase 3: Catch Up on Write Operations
The third phase is the catch-up phase. During this phase, any writes performed on the source collection after resharding starts are applied to the new collection. It’s important to remember that resharding is an online process, meaning that the source collection remains accessible throughout the operation.

Phase 4: Commit to Cut Over

During the final phase, the resharding coordinator makes sure all the shards have the same up-to-date data, ensuring everything is consistent.

Once consistency is confirmed, the coordinator tells each donor and recipient shard to rename the temporary collection (where the new data was copied during resharding). This renamed collection now becomes the official new collection.

Finally, each donor shard deletes the old version of the collection, since it's no longer needed.

How to Reshard

You can begin the resharding process by running the sh.reshardCollection command in the MongoDB Shell. For this command, you must pass in the name of the database and collection to be resharded, along with the new shard key, as shown here. You can also run the resharding process as a database command and through a driver. Find out more in the documentation.

sh.reshardCollection( 
    "<database>.<collection>", 
    { <field1>: <1|"hashed">, ... }
)
Requirements to Reshard

Resharding requires additional storage space on each shard that will hold the collection's data. The available space must be at least twice the size of the collection being resharded, plus its total index size, divided by the number of shards.

storage_req = ((collection_storage_size + index_size)*2) / shard_count

There are several requirements that must be met before you can reshard a collection. Take a look at “About this Task” in the Reshard a Collection section of our documentation to make sure your application and collection meets the requirements.

View Documentation
Note
Major improvements in usability and speed were made to MongoDB's reshard capabilities in version 8.0. For this reason, we recommend upgrading your version before attempting to reshard.
Resharding vs. Refining a Shard Key
To reshard, your application needs to be able to tolerate writes being blocked for about two seconds. If that is a problem, you might be able to refine your shard key instead of resharding the entire collection.
Refining a Shard Key

Refining a shard key in MongoDB is the process of enhancing an existing shard key by appending additional fields to it. This is useful when the initial shard key is not granular enough, and therefore insufficient in distributing data evenly across shards or in supporting optimal query patterns.

Refining requires no downtime. The balancer will balance data according to the new shard key for the new data that is inserted after refining the shard key.

To learn more about refining your shard key, take a look at our documentation.

To Refine
  • When enhancements to the existing shard key can address issues
  • Good for workloads that cannot tolerate downtime
To Not Refine
  • If you want to change your sharding strategy or have completely new fields in the shard key, i.e. change from hashed to ranged sharding
  • If you need to move data across shards quickly, resharding is a better option for this. Starting with 8.0, you can reshard on the same shard key, allowing quick distribution of data across shards without downtime. You can also use it when adding or removing shards as needed to redistribute data. To do so, you would set forceRedistribution: true, Check out the documentation on Reshard to Shard to learn more.

In most cases, resharding your collection will be the best course of action to solve shard distribution issues. While refining a shard key is appropriate for situations where a minor adjustment can lead to improved data distribution, resharding allows you to redefine and fine-tune the sharding strategy. This generally allows for a better match to current workloads and scales for future needs. However, the ideal choice depends on the specific circumstances and needs of the application.

The following video walks through an example of resharding, using our LeafyBank app.

Select Play to learn more.
Video Thumbnail
4:13
Note
Another way you can modify your strategy is by unsharding a collection. The unshardCollection command unshards an existing sharded collection and moves the collection data onto a single shard. When you unshard a collection, the collection cannot be partitioned across multiple shards, and the shard key is removed. Learn more by visiting the documentation.
Key Points to Remember

Awesome work! You have now learned why and how to modify your sharding strategy by resharding a collection.

Here are some key points to remember as you monitor and optimize the performance of your own sharded collections:

  • Resharding: Use reshardCollection to redefine the shard key, redistributing data across shards.
  • Resharding Process: Disable the balancer and connect to a mongos router before running reshardCollection.
  • Choosing Refining vs. Resharding: Refining a shard key is ideal for incremental improvements, while resharding is necessary for fixing poor shard key choices.
Great job! Now that you know how to modify your sharding strategy, complete the lab on the next page to put your skills into action. Then take the short Skill Check to earn your badge.