Embedded and Reference Documents: Understanding the Differences and Benefits

In this article, we will discuss embedded and reference documents in Mongoose and the details involved in each style when using Mongoose to model your database.

What is an Embedded Document?

Embedded documents are documents that are stored within another document. In a database context, this means that data from one document is included as a subdocument within another document.

Below are some examples of how embedded and reference documents can be used in MongoDB, a popular NoSQL database.

Embedded Document Example:

Suppose we have a collection of blog posts (Post schema), and each post can have comments, represented as an array of objects. We can embed the comment information within the post document.

{
  _id: ObjectId("612af888464c9132f35eeec5"),
  text: "NodeJs capabilities",
  comments: [{
    name: "F. Scott Fitzgerald",
    description: "Nice hints"
  }],
  date_posted: ISODate("1925-04-10T00:00:00Z")
}

With this kind of document, we can retrieve a post and all of its comments in one query, and we can add a new comment by calling the JavaScript array push() method on the post's comments array.
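As a rough illustration, the sketch below models the post as a plain JavaScript object rather than a Mongoose document, so it runs without a database. The addComment helper and the second commenter are hypothetical names introduced only for this example. With Mongoose, the same push() call would typically be followed by saving the post.

```javascript
// In-memory stand-in for one document from the posts collection.
// Plain objects are an assumption made so the sketch runs without a database.
const post = {
  _id: "612af888464c9132f35eeec5",
  text: "NodeJs capabilities",
  comments: [{ name: "F. Scott Fitzgerald", description: "Nice hints" }],
  date_posted: new Date("1925-04-10T00:00:00Z"),
};

// Hypothetical helper: append one comment to the embedded array
// and return the new number of comments.
function addComment(postDoc, name, description) {
  postDoc.comments.push({ name, description });
  return postDoc.comments.length;
}

// "Jane Doe" is a made-up commenter used only for illustration.
addComment(post, "Jane Doe", "Very helpful");

// Reading the post gives us its comments as well, with no second lookup.
console.log(post.comments.map((comment) => comment.name));
```

Because the comments travel with the post, there is nothing extra to fetch when the post is displayed.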

Some benefits of using embedded documents in this way are:

  • Improved performance: When we query for a post, we can retrieve the comments information along with it in a single database call.

  • Simplified data modeling: The comments information is directly associated with the post, so we don't need to create a separate collection for comments.

  • Improved data consistency: Each comment lives in exactly one place, inside its post, so changing a comment's text is a single update to one document, and MongoDB applies a write to a single document atomically.

Cons of Embedded Documents:

  • Increased document size: Because all related data is stored together, embedding large or frequently updated related data can lead to increased document size, which can negatively impact performance and scalability.

  • Reduced flexibility: Embedding related data in a single document can make it difficult to work with data in a more flexible way, especially in cases where the related data needs to be queried separately from the parent document.

  • Data duplication: If the same related data is embedded in every document that needs it, each copy has to be updated when that data changes, and any copy that is missed leaves the data inconsistent.
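The document-size drawback can be made concrete with a rough estimate. The sketch below is only an approximation: it measures the JSON text of a post, which is not the same as its stored BSON size, and compares it with MongoDB's 16 MB limit for a single document. The helper names (approxSizeBytes, buildPost) and the generated commenter names are assumptions made for this example.

```javascript
// MongoDB limits a single BSON document to 16 MB.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: approximate a document's size from its JSON text.
// This is an estimate only; the stored BSON size will differ.
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: build a post with the given number of embedded comments.
function buildPost(commentCount) {
  const comments = [];
  for (let i = 0; i < commentCount; i++) {
    comments.push({ name: `user${i}`, description: "Nice hints" });
  }
  return { text: "NodeJs capabilities", comments };
}

const smallPost = buildPost(10);
const largePost = buildPost(10000);

// The post grows with every embedded comment.
console.log("10 comments:", approxSizeBytes(smallPost), "bytes");
console.log("10000 comments:", approxSizeBytes(largePost), "bytes");
console.log("limit:", MAX_DOC_BYTES, "bytes");
```

Short comments stay far below the limit, but the whole document still has to be read and written together, so an unbounded embedded array is worth watching.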

What is a Reference Document?

Reference documents, on the other hand, are documents that reference data from another document. Rather than including the data within the document, a reference document includes a reference to the related document.

Reference Document Example:

Now let's suppose we have a collection of events and a collection of artists. Each artist is associated with one or more events they are performing at, and each event is associated with one or more artists. We can store a reference to the event document in the artist document, and a reference to the artist document in the event document, like this:

Event collection

{
    "_id": "6459a7ca1bb9d19aa0f44818",
    "name": "Beyond Wonderland",
    "type": "concert",
    "artists": [
        {
            "_id": "64494d7c7d5f9a6be6b87774",
            "name": [
                "AC Slater"
            ],
            "image": "https://res.cloudinary.com/abiodundev/image/upload/v1682525564/mqyf0eszjro9dtr8mrbq.webp"
        }
    ]
}

Here, the event uses its artists field to reference the artists who will perform at it; the field is an array of artist documents.

Artist collection

{
    "_id": "64494d7c7d5f9a6be6b87774",
    "name": [
        "AC Slater"
    ],
    "image": "https://res.cloudinary.com/abiodundev/image/upload/v1682525564/mqyf0eszjro9dtr8mrbq.webp",
    "events": [
        {
            "_id": "6459a7ca1bb9d19aa0f44818",
            "name": "Beyond Wonderland",
            "promoter": "Insomniac",
            "type": "concert",
            "artists": [
                "64494d7c7d5f9a6be6b87774",
                "64494dc27d5f9a6be6b87776"
            ],
            "lineup_link": "https://socal.beyondwonderland.com/lineup/"
        }
    ]
}

Here, the artist uses its events field to reference the events they will be performing at; the field is an array of event documents, and each event in turn lists its artists by _id.
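To show what following a reference involves, here is a minimal sketch in plain JavaScript, with a Map standing in for the artist collection. It assumes the event stores only artist _id values, as the inner artists array above does. The resolveArtists helper is hypothetical, and the second artist's name is a placeholder because the original data does not give it. Mongoose's populate() offers this kind of lookup against a real database.

```javascript
// In-memory stand-in for the artist collection, keyed by _id.
const artistsById = new Map([
  ["64494d7c7d5f9a6be6b87774", { _id: "64494d7c7d5f9a6be6b87774", name: ["AC Slater"] }],
  // Placeholder name: the source only gives this artist's _id.
  ["64494dc27d5f9a6be6b87776", { _id: "64494dc27d5f9a6be6b87776", name: ["Placeholder Artist"] }],
]);

// The event as stored: it holds references (IDs), not artist data.
const storedEvent = {
  _id: "6459a7ca1bb9d19aa0f44818",
  name: "Beyond Wonderland",
  type: "concert",
  artists: ["64494d7c7d5f9a6be6b87774", "64494dc27d5f9a6be6b87776"],
};

// Hypothetical helper: return a copy of the event in which each stored ID
// is replaced by the artist it points to (or null if none is found).
function resolveArtists(eventDoc, artists) {
  return {
    ...eventDoc,
    artists: eventDoc.artists.map((id) => artists.get(id) ?? null),
  };
}

const populatedEvent = resolveArtists(storedEvent, artistsById);
console.log(populatedEvent.artists.map((artist) => artist.name[0]));
```

The stored event is left untouched; the extra lookup is the price paid for keeping artist data in one place.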

Some benefits of using reference documents in this way are:

  • Improved data integrity: Each artist and each event is stored once, so when an artist's data is updated, the change is reflected in every event that references that artist, and the same holds for artists that reference an updated event.

  • Improved scalability: By keeping the event information in a separate collection, we can scale our application more easily by sharding the event collection independently of the artist collection.

  • Increased flexibility: We can easily perform queries that involve multiple collections, such as finding all artists performing in an event with a certain name.
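Two of these benefits can be sketched together in plain JavaScript: a query that spans both collections, and a single update that is visible through every reference. The collections are in-memory stand-ins, and the second event, the image file names, and the artistsAtEvent helper are all hypothetical.

```javascript
// In-memory stand-ins for the two collections.
const artistsById = new Map([
  ["64494d7c7d5f9a6be6b87774", {
    _id: "64494d7c7d5f9a6be6b87774",
    name: ["AC Slater"],
    image: "old.webp", // made-up file name
  }],
]);

const events = [
  { _id: "6459a7ca1bb9d19aa0f44818", name: "Beyond Wonderland", artists: ["64494d7c7d5f9a6be6b87774"] },
  // Hypothetical second event that references the same artist.
  { _id: "event-2", name: "Hypothetical Second Event", artists: ["64494d7c7d5f9a6be6b87774"] },
];

// Hypothetical helper: find the artists performing at events with a given name.
function artistsAtEvent(eventName) {
  return events
    .filter((event) => event.name === eventName)
    .flatMap((event) => event.artists)
    .map((id) => artistsById.get(id))
    .filter(Boolean); // skip IDs that match no artist
}

// One write to the artist document...
artistsById.get("64494d7c7d5f9a6be6b87774").image = "new.webp";

// ...is seen through every event that references that artist.
const seenFromFirst = artistsAtEvent("Beyond Wonderland")[0].image;
const seenFromSecond = artistsAtEvent("Hypothetical Second Event")[0].image;
console.log(seenFromFirst, seenFromSecond);
```

Had the artist been embedded in each event instead, the same change would have needed one update per event.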

Cons of Reference Documents:

  • Increased complexity: Using reference documents can increase the complexity of the data schema since it requires joins or additional queries to retrieve related data.

  • Reduced performance: Because related data is stored in separate documents, retrieving data from multiple documents can require multiple read operations, which can negatively impact performance.

  • Potential for data inconsistency: MongoDB does not enforce referential integrity, so if a referenced document is deleted, or the two sides of a two-way reference are not updated together, documents can be left pointing at data that no longer exists or no longer matches.
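One form this inconsistency can take is a reference whose target no longer exists. The database will not flag this, so the application has to check. The sketch below is a minimal in-memory illustration; the findDanglingRefs helper is hypothetical, and the second artist is assumed to have been deleted.

```javascript
// In-memory stand-in for the artist collection.
// Assumption: artist 64494dc27d5f9a6be6b87776 has been deleted.
const artistCollection = new Map([
  ["64494d7c7d5f9a6be6b87774", { _id: "64494d7c7d5f9a6be6b87774", name: ["AC Slater"] }],
]);

// The event still lists both artist IDs.
const eventDoc = {
  _id: "6459a7ca1bb9d19aa0f44818",
  name: "Beyond Wonderland",
  artists: ["64494d7c7d5f9a6be6b87774", "64494dc27d5f9a6be6b87776"],
};

// Hypothetical helper: return the IDs that point to no existing document.
function findDanglingRefs(ids, collection) {
  return ids.filter((id) => !collection.has(id));
}

const dangling = findDanglingRefs(eventDoc.artists, artistCollection);
console.log(dangling);
```

A check like this, or cleanup logic that runs whenever a document is deleted, is part of the extra management that references require.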

Conclusion

Embedded documents should be used only when the embedded data will stay small, such as comments or likes. For larger documents, we should use reference documents.

Thank you for reading! Feel free to ask me any questions.

I'd love to connect with you on Twitter | LinkedIn | GitHub

See you in my next blog article. Take care!!!