In this post, we will look at MongoDB Schemas. We will concern ourselves with the concept of a schema in MongoDB and how we can design one for our collection.

If you are new to MongoDB, I recommend going through this detailed post on MongoDB Database Creation.

Important disclaimer: In this post, we will focus on the conceptual aspects. Recent versions of MongoDB also support document validation, which allows us to define strict rules about the schema of our collections. That will be covered in a future post.

1 – MongoDB Schemas

MongoDB is a schema-less database.

This is probably the first thing that comes to mind when we hear about schemas in MongoDB. If this statement is true, why are we even talking about MongoDB schemas?

Well, technically MongoDB is schema-less. In other words, MongoDB does not force you to define any schema for your collections. In fact, this is one of the main advantages MongoDB has over traditional SQL databases, where every table has a fixed layout.

However, in a typical application, there are often constraints and requirements. While the flexibility can be liberating, it is usually a good idea to have some sort of layout for your collection. In other words, it is good practice to have a schema for your collection.

Does this negate the benefits of MongoDB?

Not exactly. While designing databases, there is a broad spectrum of choices, from a completely chaotic approach to a completely rigid approach. See the illustration below.

Figure: MongoDB schema evolution. Moving from left to right signifies stronger rules about the schema.

MongoDB allows us to have the best of both worlds. We will look at examples to understand how.

2 – MongoDB Without Any Rules

In MongoDB, every collection is made up of documents. Each document is a set of key-value pairs.

Let us take the example of a library database. In this database, we have a books collection. We can insert two documents into the collection as below:

> db.books.insertOne({"bookName":"The Way of Kings", "authorName": "Brandon Sanderson"})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61cbcd2407216367e66c76f6")
}

> db.books.insertOne({"title":"The Name of the Wind", "creator": "Patrick Rothfuss"})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61cbcffb07216367e66c76f7")
}

> db.books.find().pretty()
{
	"_id" : ObjectId("61cbcd2407216367e66c76f6"),
	"bookName" : "The Way of Kings",
	"authorName" : "Brandon Sanderson"
}
{
	"_id" : ObjectId("61cbcffb07216367e66c76f7"),
	"title" : "The Name of the Wind",
	"creator" : "Patrick Rothfuss"
}

As you can see, the two documents have completely different schemas. Both store a book, but they go about it in completely different ways.

The important thing to note is that MongoDB does not care about any of this. It allows us to insert both documents without any warning or error.

However, at this point, you should have alarm bells ringing in your ears. What if we want to write an application to retrieve data from this collection? Which schema layout should we use? Also, how do we interpret the data? Is bookName the same as title? The larger the documents, the more such questions arise.

While the above approach works fine for storage, it becomes very difficult to actually use this data. And without using the data, it has no meaning. After all, we don’t just store data for the sake of it.
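
To make the problem concrete, here is a minimal sketch in plain JavaScript of the glue code an application might need in order to read both layouts. The normalizeBook helper and its output field names (name, author) are hypothetical. The mapping of title to bookName and creator to authorName is only a guess, which is exactly the issue described above.

```javascript
// Hypothetical helper: maps either document layout from the books
// collection onto one shape that the application can rely on.
function normalizeBook(doc) {
  return {
    // Assumption: "title" means the same thing as "bookName",
    // and "creator" means the same thing as "authorName".
    name: doc.bookName ?? doc.title ?? null,
    author: doc.authorName ?? doc.creator ?? null,
  };
}

// The two documents from the example above, without their _id fields.
const docs = [
  { bookName: "The Way of Kings", authorName: "Brandon Sanderson" },
  { title: "The Name of the Wind", creator: "Patrick Rothfuss" },
];

const books = docs.map(normalizeBook);
console.log(books);
```

Every new layout that sneaks into the collection would need another fallback in this helper, so the cost of the chaos is paid in application code.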

3 – A Middle Ground Approach

While the approach in the previous section was too chaotic, it is also not a good idea to completely ignore the flexibility of MongoDB.

Wasn’t flexibility one of the important reasons to use MongoDB in the first place?

Let us look at the below example:

> db.books.insertOne({"bookName":"The Way of Kings", "authorName": "Brandon Sanderson", "genre": "Fantasy"})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61cbcd2407216367e66c76f6")
}

> db.books.insertOne({"bookName":"The Name of the Wind", "authorName": "Patrick Rothfuss"})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61cbcffb07216367e66c76f7")
}

> db.books.find().pretty()
{
	"_id" : ObjectId("61cbcd2407216367e66c76f6"),
	"bookName" : "The Way of Kings",
	"authorName" : "Brandon Sanderson",
	"genre" : "Fantasy"
}
{
	"_id" : ObjectId("61cbcffb07216367e66c76f7"),
	"bookName" : "The Name of the Wind",
	"authorName" : "Patrick Rothfuss"
}

Again, we have the same two books. However, the first document has an additional field, genre, which is not present in the second document. Both documents still share two common fields: bookName and authorName.

An application developer can now understand our schema much more easily. The additional field can also be handled separately to extend the application. For example, we can offer filtering by genre for the documents that have a value for it.
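
The filtering idea can be sketched roughly as follows. This is plain JavaScript over an in-memory array rather than a live collection, and the filterByGenre function is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: returns the books whose optional "genre" field matches.
// Documents without a "genre" field are simply left out of the result.
function filterByGenre(docs, genre) {
  return docs.filter((doc) => doc.genre === genre);
}

// The two documents from the example above, without their _id fields.
const library = [
  { bookName: "The Way of Kings", authorName: "Brandon Sanderson", genre: "Fantasy" },
  { bookName: "The Name of the Wind", authorName: "Patrick Rothfuss" },
];

const fantasy = filterByGenre(library, "Fantasy");
console.log(fantasy.map((doc) => doc.bookName));
```

In the shell, a query such as db.books.find({"genre": "Fantasy"}) behaves in a similar way: it returns only the documents whose genre field equals "Fantasy" and skips the ones that do not have the field.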

As you can see, this approach helps us make sense of the data. At the same time, it is not as rigid as a typical SQL table, where every column exists for every row whether it contains a value or not.
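
One way an application could lean on this middle ground is to agree on a small set of core fields and treat everything else as optional. The sketch below is plain JavaScript and only an illustration: the REQUIRED_FIELDS list and the missingFields helper are hypothetical names, and the check runs in application code, not inside MongoDB.

```javascript
// Assumption: the application treats these two fields as the agreed core
// of every book document. Any other field is optional.
const REQUIRED_FIELDS = ["bookName", "authorName"];

// Hypothetical helper: lists the core fields that a document lacks
// (a field counts as present only if it holds a string).
function missingFields(doc) {
  return REQUIRED_FIELDS.filter((field) => typeof doc[field] !== "string");
}

const withGenre = {
  bookName: "The Way of Kings",
  authorName: "Brandon Sanderson",
  genre: "Fantasy", // optional extra field, does not affect the check
};
const otherLayout = { title: "The Name of the Wind", creator: "Patrick Rothfuss" };

console.log(missingFields(withGenre));   // nothing missing, extra field is fine
console.log(missingFields(otherLayout)); // both core fields are missing
```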

Conclusion

With this, we have covered the concept behind MongoDB schemas. We have looked at the pros and cons of flexibility while trying to find a middle ground. This will help us design better schemas for our collections.

If you have any comments or queries about this post, please mention them in the comments section below.


Saurabh Dashora

Saurabh is a Software Architect with over 12 years of experience. He has worked on large-scale distributed systems across various domains and organizations. He is also a passionate Technical Writer and loves sharing knowledge in the community.
