You have decided to shard your MongoDB database.

You have also selected a sharding strategy or, at least, understood the pros and cons of the available strategies.

The next logical step is to select a MongoDB shard key.

Selecting a MongoDB shard key is a very important step. Choosing the right shard key is critical for your application’s performance.

Moreover, there is rarely one obvious answer. Several factors affect the choice of a MongoDB shard key, and balancing them is vital.

In this post, I will go through six factors that guide the choice of an appropriate MongoDB shard key.

1 – What is a Shard Key in MongoDB?

The shard key determines how a collection’s documents are distributed across the shards.

It can be a single indexed field in your MongoDB collection or a combination of fields that are part of a compound index.

So, if you have a shard key field that can take values from 1 to 100, MongoDB divides the entire span of possible key values into ranges of shard key values. For example, you could have the ranges 1-25, 26-50, 51-75, and 76-100.

Here’s what it looks like:

[Figure: MongoDB Shard Key]

A few important points to note:

  • The ranges don’t overlap each other.
  • Each range is associated with a chunk.
  • The chunks are distributed evenly across the actual shards.
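To make the range-to-chunk mapping concrete, here is a minimal Python sketch of how a shard key value could be routed to a chunk and then to a shard. The four chunk ranges, the shard names, and the chunk-to-shard assignment are hypothetical; in a real cluster this metadata lives on the config servers, the ranges run from MinKey to MaxKey, and the balancer decides chunk placement.

```python
import bisect

# Hypothetical layout: shard key values run from 1 to 100 and are split into four chunks.
# Each chunk covers the half-open range [its lower bound, next lower bound), so ranges never overlap.
CHUNK_LOWER_BOUNDS = [1, 26, 51, 76]
KEY_SPACE_END = 101  # exclusive upper bound of the last chunk

# Hypothetical chunk-to-shard assignment; in a real cluster the balancer maintains this.
CHUNK_TO_SHARD = {0: "shard-A", 1: "shard-B", 2: "shard-A", 3: "shard-B"}


def chunk_for(key: int) -> int:
    """Return the index of the chunk whose range contains the shard key value."""
    if not CHUNK_LOWER_BOUNDS[0] <= key < KEY_SPACE_END:
        raise ValueError(f"shard key value {key} is outside the key space")
    # bisect_right counts the lower bounds <= key; the owning chunk is the last of those.
    return bisect.bisect_right(CHUNK_LOWER_BOUNDS, key) - 1


def shard_for(key: int) -> str:
    """Route a shard key value to the shard that owns its chunk."""
    return CHUNK_TO_SHARD[chunk_for(key)]


for key in (1, 25, 26, 77, 100):
    print(f"key {key:>3} -> chunk {chunk_for(key)} on {shard_for(key)}")
```

Because the ranges never overlap, every shard key value belongs to exactly one chunk, so a single binary search over the lower bounds is enough to find it.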

2 – Factors for MongoDB Shard Key Selection

So, how do you choose a good shard key in MongoDB?

You need to consider multiple factors when selecting an appropriate shard key.

It is not a simple if-else decision, because the factors influence one another. Moreover, some of them depend on forecasting the future, which is always tricky.

Let’s look at the key factors one by one.

2.1 – The Shard Key’s Cardinality

Cardinality is a concept that refers to the number of unique values in a set. For example, the cardinality of set [2, 4, 6] is 3.

It is better to choose a shard key with high cardinality.

The higher the cardinality of your MongoDB shard key, the more chunks MongoDB can create and the better the chances of horizontal scalability.

Low cardinality may result in uneven distribution of data.

For example, consider a collection that stores product information and is sharded on product category. If you have just 2 product categories, the cardinality of the key is just 2. Since all documents with the same shard key value must live in the same chunk, there can’t be more than 2 chunks, and therefore no more than 2 shards can hold data.
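A quick way to see the cardinality limit is to count distinct values. The sketch below assumes a hypothetical product collection with only two categories and compares it with a hypothetical per-product sku field; the helper names are illustrative, not part of any MongoDB API.

```python
def cardinality(values):
    """Number of distinct shard key values."""
    return len(set(values))


def usable_shards(values, num_shards):
    # All documents sharing one shard key value must live in the same chunk, so a key
    # can never produce more chunks (and thus more data-bearing shards) than its cardinality.
    return min(cardinality(values), num_shards)


# Hypothetical product collection: 1,000 products in just two categories.
products = [
    {"sku": i, "category": "electronics" if i % 2 == 0 else "books"}
    for i in range(1000)
]
categories = [p["category"] for p in products]
skus = [p["sku"] for p in products]

print("category:", cardinality(categories), "values ->", usable_shards(categories, 4), "of 4 shards")
print("sku:", cardinality(skus), "values ->", usable_shards(skus, 4), "of 4 shards")
```

With four shards available, the category key can keep only two of them busy, while the high-cardinality sku key can use all four.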

[Figure: MongoDB Shard Key Cardinality]

In the above illustration, the low cardinality means that the distribution of documents is skewed.

That’s not great for horizontal scalability and defeats the purpose of sharding.

2.2 – The Shard Key’s Frequency

Frequency represents the number of times a particular shard key value occurs in the data.

A uniform distribution of various values is extremely important for creating balanced shards.


If a small set of possible shard key values has a much higher frequency, then certain chunks will end up with a lot more records than other chunks.

In other words, some chunks bear a much greater load, creating bottlenecks, while other chunks stay nearly empty. This negates the benefits of horizontal scalability.

In our product collection example, this happens when just a few categories contain a very high number of products.

Check the illustration below:

[Figure: MongoDB Shard Key Frequency]
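The frequency check can be sketched the same way. Assuming a hypothetical collection in which one category holds most of the products, the helper below reports the most frequent shard key value and the share of documents it carries. That share is a lower bound on what the shard owning that value’s chunk must store.

```python
from collections import Counter


def hottest_value(values):
    """Return the most frequent shard key value and the share of documents it holds."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


# Hypothetical product categories: four distinct values, but one dominates.
categories = ["electronics"] * 700 + ["books"] * 100 + ["toys"] * 100 + ["garden"] * 100

value, share = hottest_value(categories)
# Every "electronics" document maps to the same chunk, so whichever shard owns that chunk
# holds at least this share of the collection, no matter how many shards you add.
print(f"{value}: {share:.0%} of documents")
```

Cardinality here is 4, which looks acceptable on its own; it is the skewed frequency that leaves one shard with most of the data.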

2.3 – The Growth of the Shard Key

The best way to evaluate this factor is to ask the following question:

Is the shard key growing or reducing monotonically?

If yes, a disproportionate share of new data may end up on a single shard.

  • With an ever-increasing shard key value, the shard that owns the chunk with the highest range receives the majority of new records.
  • Similarly, with a monotonically decreasing shard key value, the shard that owns the chunk with the lowest range receives the majority of new records.

Here’s what it looks like in the case of a monotonically increasing shard key value.

[Figure: How growth impacts MongoDB shard key selection]

Either way, one of the shards becomes highly imbalanced as you continue to store more and more records.

To prevent this, consider MongoDB’s hashed sharding strategy when the shard key changes monotonically.
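The following sketch contrasts ranged and hashed placement for monotonically increasing keys. It assumes three shards whose ranges were split while existing keys ran from 0 to 999, and it uses a truncated MD5 digest as a stand-in hash. MongoDB computes its own hash of the shard key value, so treat this as an illustration of the idea rather than MongoDB’s exact behavior.

```python
import bisect
import hashlib
from collections import Counter

NUM_SHARDS = 3
# Hypothetical ranged layout, split when existing keys ran from 0 to 999.
# The last range is open-ended (it extends to the maximum possible key).
RANGE_LOWER_BOUNDS = [0, 334, 667]


def ranged_shard(key: int) -> int:
    """Ranged sharding: pick the shard whose range contains the key itself."""
    return bisect.bisect_right(RANGE_LOWER_BOUNDS, key) - 1


def hashed_shard(key: int) -> int:
    """Hashed sharding: pick the shard whose range contains the key's hash.

    The truncated MD5 digest is a stand-in for illustration only.
    """
    digest = hashlib.md5(str(key).encode()).digest()
    h = int.from_bytes(digest[:8], "big")  # 0 <= h < 2**64
    return (h * NUM_SHARDS) >> 64          # split the hash space into equal ranges


# Monotonically increasing keys, e.g. an auto-incrementing order number.
new_keys = range(1000, 2000)

ranged_load = Counter(ranged_shard(k) for k in new_keys)
hashed_load = Counter(hashed_shard(k) for k in new_keys)
print("ranged:", dict(ranged_load))  # {2: 1000}: every insert hits the last shard
print("hashed:", dict(sorted(hashed_load.items())))
```

Consecutive keys land next to each other in key order but far apart in hash order, which is why hashing spreads the inserts. The trade-off is that range queries on the original key can no longer be routed to a single shard.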

2.4 – Query Patterns

A good shard key distributes data evenly across the sharded cluster and also facilitates common query patterns.

This requires you to analyze the type of queries that your application is going to send to the sharded cluster.

For example, if you have queries that frequently filter data based on a particular field, it would be better to have that field as the shard key.

Otherwise, the query has to be sent to every shard, which reduces overall query performance.

Such queries are known as scatter-gather queries, and they don’t scale well.
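A simplified model of the routing decision helps here. The sketch assumes a hypothetical orders collection, ranged-sharded on customer_id across three shards, and handles only equality filters; a real mongos router also targets range and $in conditions on the shard key.

```python
import bisect

# Hypothetical cluster: orders sharded on "customer_id" with three ranged shards.
SHARD_KEY = "customer_id"
RANGE_LOWER_BOUNDS = [0, 1000, 2000]
SHARDS = ["shard-0", "shard-1", "shard-2"]


def target_shards(query: dict) -> list:
    """Simplified router: which shards must see an equality-filter query?"""
    if SHARD_KEY in query:
        # The filter pins the shard key, so only the shard owning that range is asked.
        idx = bisect.bisect_right(RANGE_LOWER_BOUNDS, query[SHARD_KEY]) - 1
        return [SHARDS[idx]]
    # No shard key in the filter: broadcast to every shard and merge (scatter-gather).
    return list(SHARDS)


print(target_shards({"customer_id": 1500, "status": "shipped"}))  # ['shard-1']
print(target_shards({"status": "shipped"}))  # ['shard-0', 'shard-1', 'shard-2']
```

The second query costs the same on every shard regardless of how many documents match, which is why adding shards does little for scatter-gather workloads.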

2.5 – Write Distribution

Sharding is an extremely useful strategy for scaling the write operations to your MongoDB database.

However, it’s important to choose a shard key that distributes write operations evenly across shards. Otherwise, you’ll end up with hotspots and bottlenecks.

For example, let’s say you have a sharded MongoDB database containing customer data, and you use the customer’s ZIP code as the shard key. If you have a high concentration of customers in a specific ZIP code, then the shard containing that ZIP code will receive a disproportionate amount of write operations.

That shard becomes a hotspot, which degrades write performance for the entire cluster.
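The ZIP code hotspot is easy to reproduce with a toy write stream. In the sketch below, the ZIP-to-shard placement and the 80/10/10 split of writes are hypothetical numbers chosen only to show how a frequent value concentrates load.

```python
from collections import Counter

# Hypothetical placement: each ZIP code's chunk lives on one shard.
ZIP_TO_SHARD = {"10001": "shard-A", "94105": "shard-B", "60601": "shard-C"}

# Hypothetical write stream in which most new customers share one ZIP code.
writes = ["10001"] * 800 + ["94105"] * 100 + ["60601"] * 100

load = Counter(ZIP_TO_SHARD[zip_code] for zip_code in writes)
hottest, count = load.most_common(1)[0]
print(dict(load))
print(f"{hottest} receives {count / len(writes):.0%} of writes")  # shard-A receives 80% of writes
```

All writes for one ZIP code share a single shard key value, so splitting chunks or adding shards cannot move any of that load off shard-A.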

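Taking another example, order date might look like a reasonable shard key for an e-commerce database, since orders are spread out across time periods. However, order date increases monotonically, so with ranged sharding every new order lands in the chunk with the highest range, creating the single-shard hotspot described in section 2.3. Hashing the order date spreads these writes across shards.

Even then, if the date is stored at day granularity, a day like Black Friday concentrates a huge number of writes on a single shard key value. Again, it is mostly a balancing act.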

2.6 – Growth of Data

Choosing a shard key for the current situation is all well and good.

But what about making predictions about the future?

In my view, that’s quite hard and generally not possible to do on a regular basis. Of course, that does not mean we should not try to take future plans into account when deciding on a shard key.

Consider a hypothetical scenario where you initially shard the user collection based on the user’s country. This is not a bad choice as such: if users are spread fairly evenly across countries, it distributes reads and writes across shards and makes it easy to perform operations on a particular country’s user data.

However, what happens when your user base grows and you find that you must support more granular queries based on other fields, such as user age or account creation date?

With country as the shard key, those queries don’t include the shard key, so they become scatter-gather operations across the entire user base and cannot be performed efficiently.

Can you even avoid this problem?

Yes, to some extent, by choosing a shard key that scales with future growth. For example, a unique user ID has high cardinality and keeps distributing data evenly no matter how large the user base becomes. Keep in mind, though, that queries filtering only on other attributes, such as age, would still be scatter-gather.

Another alternative is a compound shard key, where you shard the collection on multiple fields. For example, you could shard the user collection on { age: 1, userId: 1 }. Queries that filter on age, the prefix of the key, can be routed only to the shards that hold the matching chunks, while the unique user ID keeps the overall cardinality high. Queries that filter only on user ID, however, would still be scatter-gather.
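The rule behind compound shard keys is that a query can be targeted only when it filters on the full shard key or on a prefix of it. Here is a minimal sketch of that rule for equality filters, assuming the hypothetical compound key { age: 1, userId: 1 }; the function names are illustrative.

```python
# Hypothetical compound shard key { age: 1, userId: 1 }, fields listed in key order.
SHARD_KEY_FIELDS = ["age", "userId"]


def shard_key_prefix(query: dict, key_fields=SHARD_KEY_FIELDS) -> int:
    """Count how many leading shard key fields the query filters on."""
    n = 0
    for field in key_fields:
        if field not in query:
            break
        n += 1
    return n


def routing(query: dict, key_fields=SHARD_KEY_FIELDS) -> str:
    """Simplified routing for equality filters on a compound shard key."""
    n = shard_key_prefix(query, key_fields)
    if n == len(key_fields):
        return "single shard"     # full key: exactly one chunk can match
    if n > 0:
        return "targeted"         # prefix only: the contiguous chunk range for that prefix
    return "scatter-gather"       # no prefix: every shard must be asked


print(routing({"age": 30, "userId": 42}))  # single shard
print(routing({"age": 30}))                # targeted
print(routing({"userId": 42}))             # scatter-gather
```

Field order matters: reversing the key to { userId: 1, age: 1 } would make user ID queries targeted and age-only queries scatter-gather.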

However, there are no easy choices. A compound shard key requires a matching compound index and makes the data distribution across shards harder to reason about and manage.

The point is that there is no one-size-fits-all solution.

That’s it

Choosing the right shard key for your MongoDB database can determine the effectiveness of your sharding effort.

It is also like an investment in the future.

As with investments, there is no single best answer, and you need to arrive at the right choice by understanding your application’s requirements.

Looking for More?

Are you struggling to grow in your current role and stay relevant in your organization?

Do you feel you are getting short-changed when it comes to promotions?

You might think that hard work is the answer.

And of course, hard work is important. But you also need a clear direction.

At Progressive Coder, my goal is to provide actionable information that can help you sharpen your skillset technically as well as non-technically so that you can become the best at your job and get the recognition you deserve.


Saurabh Dashora

Saurabh is a Software Architect with over 12 years of experience. He has worked on large-scale distributed systems across various domains and organizations. He is also a passionate Technical Writer and loves sharing knowledge in the community.
