Are you looking for an appropriate sharding strategy for your MongoDB database?
If yes, you’ve come to the right place.
I can’t stress enough how important it is to choose the right sharding strategy for your database.
A poor choice is hard to undo later, which is one reason why sharding is usually considered one of the last options when it comes to scaling a database.
In this post, I will take you through the two MongoDB sharding strategies and offer actionable advice on when you should choose a particular strategy.
However, if you have landed directly on this page, I recommend you take a few minutes to understand the overall MongoDB Sharding Architecture.
1 – MongoDB Shard Key
A sharding strategy is incomplete without the shard key.
But what is a shard key?
In MongoDB, a shard key can be of two types:
- Single indexed field
- Multiple fields covered by a compound index
The index is very important here. All sharded collections in MongoDB must have an index that supports the shard key. If you shard an empty collection, MongoDB will automatically create an index on the shard key. Otherwise, you have to create the index yourself before sharding the collection.
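To make the two shapes of a shard key concrete, here is a minimal sketch that builds the document you would hand to the `shardCollection` admin command. The helper name `build_shard_collection_command`, the namespace `shop.orders`, and the field names are hypothetical and only for illustration. The code builds plain dictionaries and does not connect to a cluster.

```python
from typing import Dict, List


def build_shard_collection_command(namespace: str, fields: List[str]) -> Dict:
    """Build a shardCollection command document for a ranged shard key.

    namespace: "<database>.<collection>" (hypothetical value in this sketch)
    fields:    shard key fields in order; one field gives a single-field key,
               several fields give a compound key
    """
    if not fields:
        raise ValueError("a shard key needs at least one field")
    # 1 means ascending order. The same key pattern is what the
    # supporting index on the collection has to cover.
    key = {name: 1 for name in fields}
    return {"shardCollection": namespace, "key": key}


# Single indexed field as the shard key
single = build_shard_collection_command("shop.orders", ["customer_id"])

# Multiple fields covered by a compound index
compound = build_shard_collection_command("shop.orders", ["customer_id", "order_date"])

print(single)
print(compound)
```

With a driver such as PyMongo, a document like this would typically be passed to the admin database's `command` method on a connection to a `mongos` router. I am assuming that setup here, so treat the snippet as a starting point and check it against your own deployment.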
As the name suggests, the shard key forms the basis for sharding a particular MongoDB collection.
What this means is that MongoDB divides the entire range of shard key values into smaller ranges of non-overlapping shard key values.
Each range is associated with a chunk, and MongoDB tries to distribute the chunks evenly across the shards.
The shard key directly impacts the effectiveness of the chunk distribution.
Therefore, choosing the right shard key is extremely important in the context of any sharding strategy. More on that in the post about choosing the right shard key.
2 – MongoDB Hashed Sharding Strategy
Let us now look at the sharding strategies available in MongoDB.
The first is the hashed sharding strategy.
In this strategy, MongoDB computes the hash of the shard key’s value. The chunk ranges are determined based on the hashed shard key’s value.
The great part is that MongoDB automatically computes the hashes when it resolves queries that use a hashed index. You don’t need to compute the hashes in your application program.
This frees up the developer to simply focus on business logic without worrying about the whole business of hashing.
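To get a feel for what happens behind the scenes, here is a small simulation of the idea. It is only a sketch: `toy_hash` is a stand-in I made up (the first 8 bytes of an MD5 digest read as a signed 64-bit integer) and is not MongoDB’s exact hash function, and the four equal-sized chunks are an assumption for the example.

```python
import hashlib
import struct
from bisect import bisect_right

# Assumption: the signed 64-bit hash space is cut into 4 equal chunks.
# Each split point is the inclusive lower bound of the next chunk.
SPLIT_POINTS = [-(2 ** 62), 0, 2 ** 62]


def toy_hash(value: int) -> int:
    """Stand-in hash: first 8 bytes of MD5 as a signed 64-bit integer."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return struct.unpack("<q", digest[:8])[0]


def hashed_chunk_for(value: int) -> int:
    """Return the index (0..3) of the chunk whose hash range holds the value."""
    return bisect_right(SPLIT_POINTS, toy_hash(value))


# The caller only ever supplies the raw key value.
# Hashing happens inside the lookup, not in the application code.
for key in (1, 2, 3):
    print(key, "-> hash", toy_hash(key), "-> chunk", hashed_chunk_for(key))
```

The point of the sketch is the division of labour: the application passes the plain key value, and the hash is derived on the database side before the chunk is located.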
3 – MongoDB Ranged Sharding Strategy
The second sharding strategy supported by MongoDB is the ranged sharding strategy.
Range-based sharding works by dividing the data into contiguous ranges determined by the shard key values.
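As an example, picture a collection split into three chunks, each covering a fixed range of shard key values.
Chunk 1 covers the range from the minimum key value up to 9. Chunk 2 takes care of 10 to 19. And Chunk 3 handles 20 up to the maximum key value. MongoDB treats the lower bound of each range as inclusive and the upper bound as exclusive, so Chunk 2 is really the range from 10 up to, but not including, 20.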
When a particular key value is queried by the application, MongoDB gets the data from the correct chunk.
Note that the range-based sharding strategy is the default strategy.
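The lookup for the three-chunk example above can be sketched with a sorted list of boundaries and a binary search. This is an illustration of the idea only, not how MongoDB implements routing internally, and the names `LOWER_BOUNDS` and `ranged_chunk_for` are hypothetical.

```python
from bisect import bisect_right

# Inclusive lower bounds of Chunk 2 and Chunk 3.
# Chunk 1 starts at the minimum key value, so it needs no entry.
LOWER_BOUNDS = [10, 20]
CHUNK_NAMES = ["Chunk 1", "Chunk 2", "Chunk 3"]


def ranged_chunk_for(key: int) -> str:
    """Return the chunk whose range [lower, upper) contains the key."""
    return CHUNK_NAMES[bisect_right(LOWER_BOUNDS, key)]


for key in (-5, 9, 10, 19, 20, 1000):
    print(key, "->", ranged_chunk_for(key))
# -5 -> Chunk 1
# 9 -> Chunk 1
# 10 -> Chunk 2
# 19 -> Chunk 2
# 20 -> Chunk 3
# 1000 -> Chunk 3
```

Notice that neighbouring key values end up in the same chunk. That is convenient for range queries, and it is also the root of the hot-spot problem with monotonically increasing keys.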
4 – Hashed vs Ranged Sharding Strategy in MongoDB
We have looked at both the sharding strategies MongoDB supports.
The question is – when to use which strategy?
Hashed keys are great for a uniform distribution of the data across different shards. If you have a key that increases monotonically (such as a numeric id, a timestamp, or the MongoDB ObjectId), you should go for the hashed sharding strategy.
For such keys, ranged sharding causes a single shard to receive the bulk of the writes (and usually the reads of recent data), which negates the benefits of sharding.
As an example, if you had three keys close to each other and you went for a ranged sharding strategy, all three would most likely land in the same chunk.
With a monotonically increasing shard key, every new value is larger than all the values that came before it.
This means that the chunk covering the upper end of the key range receives every new insert. Ultimately, that defeats the purpose of sharding in the first place.
In a hashed strategy, the same three keys would most likely have landed in different chunks, resulting in a more uniform distribution.
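Here is a rough simulation of that hot-spot effect. It rests on a few assumptions that I want to call out: the split points are frozen (a real cluster would keep splitting and rebalancing chunks), `toy_hash` is a stand-in based on MD5 rather than MongoDB’s own hash function, and the key values are invented.

```python
import hashlib
import struct
from bisect import bisect_right
from collections import Counter

# Assumption: ranged split points were set while keys 0..999 were stored.
RANGE_SPLITS = [250, 500, 750]
# Assumption: four equal slices of the signed 64-bit hash space.
HASH_SPLITS = [-(2 ** 62), 0, 2 ** 62]


def toy_hash(value: int) -> int:
    """Stand-in hash: first 8 bytes of MD5 as a signed 64-bit integer."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return struct.unpack("<q", digest[:8])[0]


def ranged_chunk(key: int) -> int:
    return bisect_right(RANGE_SPLITS, key)


def hashed_chunk(key: int) -> int:
    return bisect_right(HASH_SPLITS, toy_hash(key))


# New documents arrive with ever-increasing keys, like an auto-increment id.
new_keys = range(1000, 2000)

ranged_counts = Counter(ranged_chunk(k) for k in new_keys)
hashed_counts = Counter(hashed_chunk(k) for k in new_keys)

print("ranged:", dict(ranged_counts))  # every insert hits the last chunk
print("hashed:", dict(sorted(hashed_counts.items())))  # spread over all chunks
```

With ranged sharding, all 1,000 new keys fall into the last chunk. With the hashed variant, the same keys are spread across all four chunks in roughly equal shares.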
A range-based sharding strategy works best for shard keys that have a large cardinality and a low frequency for any single key value.
In other words, MongoDB should be able to split the data into a large number of chunks for such a key.
For example, a field like product price can act as a good key for a range-based sharding strategy if the products are spread evenly across the price range. However, it won’t be a good idea if there is a high frequency of products at a particular price point.
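Before committing to a field like price, you can do a quick sanity check on a sample of values. The sketch below measures the two properties mentioned above: cardinality (the number of distinct values) and the share of documents held by the most frequent value. The function name `key_profile` and the sample prices are made up for the example.

```python
from collections import Counter
from typing import Iterable, Tuple


def key_profile(values: Iterable[int]) -> Tuple[int, float]:
    """Return (cardinality, share of the most frequent value)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one value")
    return len(counts), max(counts.values()) / total


# Prices spread evenly: many distinct values, none dominating.
spread_prices = [5, 12, 19, 27, 33, 41, 48, 56, 62, 70]

# Prices clustered at one point: most products cost 10.
clustered_prices = [10, 10, 10, 10, 10, 10, 10, 25, 40, 70]

print(key_profile(spread_prices))     # (10, 0.1)
print(key_profile(clustered_prices))  # (4, 0.7)
```

In the second sample, 70% of the products share one price, so those documents cannot be spread across chunks by price alone, no matter how the ranges are drawn.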
Needless to say, range-based sharding is largely ineffective for monotonically increasing shard keys. In fact, it can create more issues than advantages.
MongoDB supports two sharding strategies – hashed and range-based.
Both strategies have their own advantages and disadvantages. They fit certain situations better and you have to choose a strategy based on the specific situation.
A lot of it also depends on the shard key that you are using. That must be one of the first decisions you make before getting started with sharding.
In the next post, I explain how you can choose the right shard key for MongoDB.