In this post, we will look at how to manage data relations in MongoDB.

Relations form a key part of database design. A good relation mapping can improve the performance of an application. It can also make your code more maintainable in the long run. While there is no absolutely right or wrong approach to managing relations, there are some good guidelines that can help model the data in a better way.

1 – Two Types of Relations in MongoDB

There are mainly two relationship types we can use – Embedded and Reference.

You can think of the embedded relationship as the de-normalized approach. In other words, we embed one document inside another so that all related information lives in a single document.

On the other hand, the reference relationship is normalized. Here, we split the data into different collections and link related documents using a reference id.

Both approaches are useful and have merit in different scenarios. Let’s look at them one by one.

2 – Embedded Relations in MongoDB

In an embedded relationship, we embed one document within another. In other words, one document will be the parent document while the other will be a child document.

For example, a person has an address. If we store the information of a person in a document, it might be a good idea to also store the address as an embedded document.

See the example below:

> db.person.insertOne({name: "Brad Harper", age: 40, address: {city: "New York", area: "Manhattan"}}) 
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61dd45985a2b9c9970f7bc62")
}
> db.person.find().pretty()
{
	"_id" : ObjectId("61dd45985a2b9c9970f7bc62"),
	"name" : "Brad Harper",
	"age" : 40,
	"address" : {
		"city" : "New York",
		"area" : "Manhattan"
	}
}

We used MongoDB's insertOne() method to insert a document and find() to display it. If interested, you can read in detail about MongoDB CRUD Operations.

Here, the address property is an embedded document. It has its own structure that is more or less independent of the person document.

In this case, we have implemented a one-to-one relationship. The person document is the parent document and the address is the child document.

We can also implement a one-to-many relationship in the embedded approach. See the example below:

> db.post.insertOne({title: "Title1", topic: "Topic1", comments: [{"text": "This is comment1"}, {"text": "This is comment2"}]})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61dd4c515a2b9c9970f7bc63")
}
> db.post.find().pretty()
{
	"_id" : ObjectId("61dd4c515a2b9c9970f7bc63"),
	"title" : "Title1",
	"topic" : "Topic1",
	"comments" : [
		{
			"text" : "This is comment1"
		},
		{
			"text" : "This is comment2"
		}
	]
}

Here, one blog post can have one or more comments. In other words, we have a one-to-many relationship. Also, a comment always belongs to a post. Therefore, it makes sense to store them together: the application can fetch a post and all its comments in one read, which is also convenient for display. The embedded approach works well in such scenarios.
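To make the "stored together" idea concrete, here is a minimal sketch in plain JavaScript. It does not talk to a database: the post object is just a stand-in for the document above, and addComment and renderPost are hypothetical helpers introduced only for illustration.

```javascript
// Plain-JavaScript stand-in for the post document shown above; no database involved.
const post = {
  title: "Title1",
  topic: "Topic1",
  comments: [{ text: "This is comment1" }, { text: "This is comment2" }],
};

// Hypothetical helper: return a copy of the post with one more embedded comment.
function addComment(doc, text) {
  return { ...doc, comments: [...doc.comments, { text }] };
}

// Hypothetical helper: build the lines of a page from a single document.
function renderPost(doc) {
  return [doc.title, ...doc.comments.map((c) => "- " + c.text)];
}

const updated = addComment(post, "This is comment3");
console.log(renderPost(updated).join("\n"));
```

Everything needed to display the post comes from one document. In the shell, appending to an embedded array is typically done with updateOne() and the $push operator.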

3 – Reference-based Relations in MongoDB

Let us now consider the reference approach.

As an example, one author can write many books, and every book can have one or more authors. This is a typical many-to-many relation.

Technically, this can also be handled using embedded documents. See below:

> db.books.insertOne({title: "Book1", genre: "History", authors: [{name: "Author1", age: "25"}, {name: "Author2", age: "30"}]})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61dd527c5a2b9c9970f7bc64")
}
> db.books.find().pretty()
{
	"_id" : ObjectId("61dd527c5a2b9c9970f7bc64"),
	"title" : "Book1",
	"genre" : "History",
	"authors" : [
		{
			"name" : "Author1",
			"age" : "25"
		},
		{
			"name" : "Author2",
			"age" : "30"
		}
	]
}

Here, the book is the parent document. Within it, the authors array stores the list of authors associated with the book. Now imagine a situation where a particular author is also involved in another book. In that case, the same author data will be part of that book document as well. In other words, there will be duplication of data. If we change the author information (such as age), it has to be updated in every document where that particular author appears.
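The cost of that duplication can be sketched in plain JavaScript. This is an in-memory illustration only, not shell code: the books array stands in for the collection, "Book3" is a hypothetical second book sharing Author1, and updateEmbeddedAuthorAge is a made-up helper.

```javascript
// In-memory stand-in for a books collection where author data is embedded.
// "Book3" is a hypothetical second book that shares Author1.
const books = [
  { title: "Book1", authors: [{ name: "Author1", age: "25" }, { name: "Author2", age: "30" }] },
  { title: "Book3", authors: [{ name: "Author1", age: "25" }] },
];

// Hypothetical helper: change an author's age everywhere it is embedded.
// Returns how many book documents had to be modified.
function updateEmbeddedAuthorAge(bookList, authorName, newAge) {
  let modified = 0;
  for (const book of bookList) {
    let touched = false;
    for (const author of book.authors) {
      if (author.name === authorName) {
        author.age = newAge;
        touched = true;
      }
    }
    if (touched) modified += 1;
  }
  return modified;
}

const modifiedCount = updateEmbeddedAuthorAge(books, "Author1", "26");
console.log(modifiedCount); // number of book documents rewritten for one author change
```

A single change to one author forces a write to every book that embeds that author, and missing any of them leaves the data inconsistent.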

Needless to say, this is not a great way to model the data. In such a case, the reference approach is better.

See the example below:

> db.authors.insertOne({_id: 1, name: "Author1", age: "25"})
{ "acknowledged" : true, "insertedId" : 1 }
> db.authors.insertOne({_id: 2, name: "Author2", age: "30"})
{ "acknowledged" : true, "insertedId" : 2 }
> db.books.insertOne({title: "Book2", genre: "History", authors: [{author_id: 1},{author_id: 2}]})
{
	"acknowledged" : true,
	"insertedId" : ObjectId("61dd58bf5a2b9c9970f7bc65")
}
> db.books.find().pretty()
{
	"_id" : ObjectId("61dd58bf5a2b9c9970f7bc65"),
	"title" : "Book2",
	"genre" : "History",
	"authors" : [
		{
			"author_id" : 1
		},
		{
			"author_id" : 2
		}
	]
}

Here, we store the author information in the authors collection using unique ids. Then, we reference those ids in the books collection. In this way, the author data is independent of the books collection, and a change to an author's details is made in one place without touching any book document. Also, there is no duplication of data in this case.
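Reading a book together with its full author details now takes an extra step, because the book only holds ids. The sketch below shows one way to resolve the references on the application side, in plain JavaScript with in-memory arrays standing in for the two collections. resolveAuthors is a hypothetical helper; on the server, the aggregation $lookup stage can perform a comparable join.

```javascript
// In-memory stand-ins for the authors and books collections shown above.
const authors = [
  { _id: 1, name: "Author1", age: "25" },
  { _id: 2, name: "Author2", age: "30" },
];
const book = {
  title: "Book2",
  genre: "History",
  authors: [{ author_id: 1 }, { author_id: 2 }],
};

// Hypothetical helper: replace each { author_id } reference with the full author document.
// A reference with no matching author resolves to null.
function resolveAuthors(bookDoc, authorList) {
  const byId = new Map(authorList.map((a) => [a._id, a]));
  return {
    ...bookDoc,
    authors: bookDoc.authors.map((ref) => byId.get(ref.author_id) ?? null),
  };
}

// Updating the author once is enough, because the book only holds the id.
authors[0].age = "26";
const resolved = resolveAuthors(book, authors);
console.log(resolved.authors.map((a) => a.name + " (" + a.age + ")"));
```

The update touches one author document, and every book that references that author sees the new value the next time it is resolved.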

4 – Decision Making Approach

Modelling relations in MongoDB can be a subjective matter. Here are a few things to keep in mind when deciding which approach to follow:

  • The embedded approach works well when the data in the parent and child documents is tightly coupled. It also depends on whether the application usually needs all the data together. For example, a blog post and its associated comments.
  • The embedded approach usually results in faster reads, since the related data comes back with the parent document in a single query.
  • MongoDB's document size limit is 16MB. The embedded approach can exceed this limit when the embedded data grows large. For example, if we have a city document with embedded documents containing details of all the citizens residing in the city.
  • Independent querying of data is an issue in the embedded approach. For example, in the case of author and book, if we need to analyze the author data or book data separately, it will be difficult when one is embedded in the other.
  • The reference approach works well when the entities are independent of each other. For example, we can have a city collection that allows us to manage the city data. Then, we have a citizen collection that stores the information about the citizens. They can be related using a reference id.
  • Querying in the reference approach is usually slower than in the embedded approach, since the related documents have to be fetched with additional queries or joined with a $lookup stage.
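The size limit from the list above is easy to turn into a rough back-of-the-envelope check. The sketch below is plain JavaScript arithmetic, not a measurement: the 200 bytes per embedded citizen is an assumption chosen purely for illustration, and maxEmbedded is a hypothetical helper.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Assumption for illustration only: each embedded citizen takes about 200 bytes.
const ASSUMED_BYTES_PER_CITIZEN = 200;

// Hypothetical helper: how many embedded sub-documents of a given size fit in one document.
// It ignores the parent document's own fields, so the real number is slightly lower.
function maxEmbedded(bytesPerItem, limit = MAX_DOC_BYTES) {
  return Math.floor(limit / bytesPerItem);
}

const citizenCapacity = maxEmbedded(ASSUMED_BYTES_PER_CITIZEN);
console.log(citizenCapacity);
```

Under this assumption, a city document with embedded citizens runs out of room before it reaches 100,000 residents, which is why an unbounded child list is usually a better fit for the reference approach.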

Ultimately, there are pros and cons to both approaches. There is no silver bullet that can determine the perfect relations in MongoDB documents. The choice depends on the business and functional requirements of the application.

Conclusion

With this, we have covered the concept of data relations in MongoDB. We went over the embedded approach and the reference approach. We also discussed examples of both, along with when one approach might be more suitable than the other.

If you have any comments or queries about the post, please mention them in the comments section below.


Saurabh Dashora

Saurabh is a Software Architect with over 12 years of experience. He has worked on large-scale distributed systems across various domains and organizations. He is also a passionate Technical Writer and loves sharing knowledge in the community.
