Have you decided to explore sharding for your MongoDB database?

The first step is to get a grasp on the sharding architecture of MongoDB.

This is essential in order to design an efficient cluster so that you can reap the actual benefits of sharding.

After all, sharding is an involved process and you don’t want to go about it without proper understanding and planning.

In this post, I will take you through the key aspects of the sharding architecture of MongoDB.

How does Sharding in MongoDB work?

To use sharding in MongoDB, the first thing you need is a cluster.

A cluster is a group of interconnected servers (also known as nodes) that work together to provide a unified resource. By stacking servers in a cluster, you can do more with your system by scaling things horizontally.

In the case of MongoDB, this cluster is known as the sharded cluster.

A MongoDB sharded cluster has 3 important components:

  • Shard
  • Mongos Router
  • Config Servers

Here’s how the three components connect and interact with each other.

mongodb sharding architecture overview
An Overview of the MongoDB Sharding Architecture

Let’s look at each component in greater detail:

Shard

In the illustration, the green boxes are shards

A shard is a subset of the data which means that a group of shards holds the entire data for the cluster.

In MongoDB, each shard is deployed as a replica set. This means that each shard is also a cluster of MongoDB instances that implement replication and automated failover.

See the below illustration.

mongodb shards consist of replica sets
MongoDB Shard

Here, the upstream system is the Mongos router.

When operating your MongoDB cluster in the sharded mode, you should not make direct requests to a particular shard.

Since each shard only has some part of the data, the queries sent directly to the shard will return only a subset of the data. This is usually not what we want in our application.

All incoming requests to the shard at the application level should come through the Mongos router. We will talk more about it in the next section.

You should directly connect to a shard only when you need to perform some local administrative and maintenance operations.

Mongos Router

The Mongos router plays a key role within a MongoDB cluster.

Instead of querying the shard from your application server, you make a query to the mongos instance.

The mongos instance routes the incoming queries to the appropriate shard within the sharded MongoDB cluster.

This means that from the perspective of an application, mongos is the only interface to a sharded cluster. Think of it as a middleman that coordinates access to an important person.

The below illustration shows the role of the mongos router.

the role of mongos router in mongodb sharding
The Role of Mongos Router

Here are two important activities performed by the mongos instance.

  • Query routing and load balancing – The mongos router is the single entry point for client applications that want to connect to a sharded cluster. It routes the queries to the appropriate shards and distributes the workload to ensure high performance.
  • Metadata caching – The mongos tracks what data is present on which shard. It uses this information to route queries to the correct shard. However, it does not store any persistent state and instead relies on caching the metadata from the config servers. This makes it super-efficient in terms of resource utilization.

Config Servers

The last component in the MongoDB sharding architecture puzzle is the Config Server.

The job of a Config Server is to store the metadata for your MongoDB sharded cluster.

Think of this metadata as the index for your cluster. It answers key questions such as:

  • How the data is organized?
  • What all components are present in the cluster?

The metadata includes the list of chunks on each and every shard and the ranges that define the chunks.

As we discussed earlier, the mongos instance relies on the config server for all its information. It caches the information from the config server and uses it to route the read and write requests to appropriate shards.

If the config server is the central authority in a cluster, the mongos instance is like the enforcer.

That’s it

The MongoDB sharding architecture makes it easy to visualize what you need to have in order to implement sharding.

Though it is the first step in the process of sharding, I feel this is the most important one as it gives you the big-picture view of the overall sharding requirements.

In the next post, I will explain the sharding strategies supported by MongoDB and when you should go for a particular strategy.

Looking for More?

Do you feel you are getting short-changed when it comes to promotions?

Are you struggling to grow in your current role and stay relevant in your organization?

You might think that hard work is the answer.

And of course, hard work is important. But you also need a clear direction.

At Progressive Coder, I provide actionable information on sharpening your skillset technically as well as non-technically so that you can become the best at your job and get the recognition you deserve.

Subscribe now and get two emails per week (packed with important info) delivered right to your mailbox.

Categories: BlogMongoDB

Saurabh Dashora

Saurabh is a Software Architect with over 12 years of experience. He has worked on large-scale distributed systems across various domains and organizations. He is also a passionate Technical Writer and loves sharing knowledge in the community.

0 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *