At Strategies we started using MongoDB for our development a number of years ago. In the previous article in this series I discussed two of the lessons we have learned while using MongoDB on our projects. This article continues with the next lesson.

Lesson 3 – Related Data, Reference or Embed?

In the previous article, Lesson 1 discussed the fact that MongoDB is not designed for Relational Data. However, it does have mechanisms for handling Related Data, and in cases where perfect Data Integrity is less important than speed it is still a better choice than slower Relational Database Management Systems (RDBMS) such as MySQL.

At Strategies we have chosen to adopt Zend Framework 2 for our bespoke websites, and often use Doctrine 2 as a Database Abstraction Layer (DBAL). Doctrine 2 supports the following 3 options for handling Related Data :-

  1. Embed Mapping – In this case all (or some) of the data from the foreign Document is embedded into the local Document.
  2. DBRef – In this case the foreign Document is referenced by storing its ID and collection in a DBRef field.
  3. Simple Reference – In this case the foreign Document is referenced by storing only the ID; the collection is stored in Doctrine’s metadata for the Document type.
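As a rough illustration of how the three options differ in what is actually stored, here is a minimal sketch using plain Python dictionaries in place of real MongoDB documents. The "customers" collection, the order and customer fields, and the string IDs are all hypothetical and chosen only for this example; the `$ref` and `$id` keys follow the usual DBRef convention.

```python
# Hypothetical foreign Document, stored in a "customers" collection.
customer = {"_id": "c1", "name": "Ada", "city": "Leeds"}

# 1. Embed Mapping: a copy of (some of) the customer's fields
#    lives inside the order itself.
order_embedded = {
    "_id": "o1",
    "total": 40,
    "customer": {"name": customer["name"], "city": customer["city"]},
}

# 2. DBRef: the order stores both the collection name and the ID.
order_dbref = {
    "_id": "o1",
    "total": 40,
    "customer": {"$ref": "customers", "$id": customer["_id"]},
}

# 3. Simple Reference: only the ID is stored. The collection name is
#    not in the document, so the mapping layer has to know it.
order_simple = {"_id": "o1", "total": 40, "customer": customer["_id"]}
```

Note that only the first form carries any of the customer's own data; the other two carry just enough to go and fetch it.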

Embedding is the method preferred by MongoDB. It is designed as a Document-Oriented database and therefore prefers that all data related to a Document is contained within it. The advantage here is speed: documents with embedded data can be queried simply by using dot notation, and there is no need for joins (which is good because MongoDB has no joins).
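To show what querying by dot notation means in practice, the following sketch imitates it in memory with plain Python. It is not the MongoDB driver: `get_path` and `find` are hypothetical helpers written for this example, and the order data is made up. The query dictionary has the same shape as a MongoDB filter such as `{"customer.city": "Leeds"}`.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'customer.city' through nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # the path does not exist in this document
        current = current[key]
    return current


def find(collection, query):
    """Return the documents whose dotted-path fields equal the query values."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == value for path, value in query.items())
    ]


# Hypothetical orders with the customer data embedded.
orders = [
    {"_id": "o1", "customer": {"name": "Ada", "city": "Leeds"}},
    {"_id": "o2", "customer": {"name": "Bob", "city": "York"}},
    {"_id": "o3", "customer": {"name": "Cy", "city": "Leeds"}},
]

# One query, no join: the related field is matched inside each order.
leeds_orders = find(orders, {"customer.city": "Leeds"})
```

Because the customer's city is stored inside every order, a single pass over the orders is enough to answer the question.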

The downsides of Embedded data are that the related data is very likely to be duplicated across many documents, and that it is difficult to keep all of that duplicated data up-to-date (if that is needed). If these two issues outweigh the benefit of performance, it is likely that you need to use an RDBMS rather than MongoDB.
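The cost of that duplication can be sketched with a small in-memory example. The data and the `update_embedded_city` helper are hypothetical and only meant to show that one change to the related data fans out into one write per document that embeds it.

```python
# Hypothetical orders, each carrying its own copy of the customer's city.
orders = [
    {"_id": "o1", "customer": {"id": "c1", "city": "Leeds"}},
    {"_id": "o2", "customer": {"id": "c2", "city": "York"}},
    {"_id": "o3", "customer": {"id": "c1", "city": "Leeds"}},
]


def update_embedded_city(orders, customer_id, new_city):
    """Rewrite every embedded copy of a customer's city; return the write count."""
    touched = 0
    for order in orders:
        if order["customer"]["id"] == customer_id:
            order["customer"]["city"] = new_city
            touched += 1
    return touched


# Customer c1 moves: every order that embeds c1 has to be rewritten.
touched = update_embedded_city(orders, "c1", "Bath")
```

If any one of those writes is missed, the copies disagree with each other, which is the data integrity trade-off described above.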

Both types of referencing carry the same advantages and disadvantages, which are essentially the opposite of those of Embedded data. Referenced data is trivial to keep up-to-date and there is no duplication; however, you are limited to querying documents only by the ID of the related data. Since there are no joins, if you need to query via another related field, an extra query will be required to find the ID of the related Document, which hurts performance.
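The extra query can be sketched as follows, again with plain Python lists standing in for collections. The collection contents and the `orders_for_city` helper are hypothetical; the point is only that a search on a related field needs two lookups when the order stores nothing but the ID.

```python
# Hypothetical collections: orders hold only the customer's ID.
customers = [
    {"_id": "c1", "name": "Ada", "city": "Leeds"},
    {"_id": "c2", "name": "Bob", "city": "York"},
]
orders = [
    {"_id": "o1", "customer": "c1"},
    {"_id": "o2", "customer": "c2"},
    {"_id": "o3", "customer": "c1"},
]


def orders_for_city(city):
    """Find orders by the customer's city using two separate lookups."""
    # Query 1: turn the related field into a set of customer IDs.
    customer_ids = {c["_id"] for c in customers if c["city"] == city}
    # Query 2: the orders can only be matched on the stored ID.
    return [o["_id"] for o in orders if o["customer"] in customer_ids]


leeds_order_ids = orders_for_city("Leeds")

# The upside: a change to the customer is a single write, and every
# order that references c1 sees it the next time it is resolved.
customers[0]["city"] = "Bath"
```

The same search against embedded data would need only the second step, at the price of the duplication discussed earlier.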

So, which to choose? The answer is very subjective. Depending on the needs and relationships in your system either option could be correct, and it is very likely that a system will need to use a mixture of Embedded and Referenced data. This is certainly the case at Strategies. In fact, in many cases we have found it optimal to use both References and Embedding for the same related data to get the best of both worlds (with an added overhead, of course).

When using Doctrine 2 for data mapping, it is very useful to use Doctrine’s supplied Object Select Form Elements which automatically load in Documents of the related type as the select options. However, I have found that these work best with Referenced data rather than embedded data. This meant that it was wise for us to use References for our related data.

However, we also make heavy use of ElasticSearch for our searching. ElasticSearch does not resolve DBRefs (or Simple References) when indexing, so any data that needs to be included in searches must be embedded for ElasticSearch to index it.

Our choice was therefore clear: for most of our related data we use DBRef for ease of administration with Doctrine Form Elements, and we also embed the Referenced data so that it is indexed by ElasticSearch.
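A document that combines the two approaches might look something like the sketch below. This is an illustration only, not our actual mapping: the `link_customer` helper, the "customers" collection and the `customer` and `customerData` field names are hypothetical, and the document is modelled as a plain Python dictionary.

```python
def link_customer(order, customer, embed_fields=("name", "city")):
    """Return a copy of the order holding both a DBRef and an embedded copy."""
    linked = dict(order)
    # The reference: what form elements and the mapping layer work with.
    linked["customer"] = {"$ref": "customers", "$id": customer["_id"]}
    # The embedded copy: plain data that a search index can see directly.
    linked["customerData"] = {field: customer[field] for field in embed_fields}
    return linked


customer = {"_id": "c1", "name": "Ada", "city": "Leeds", "phone": "0113"}
order = link_customer({"_id": "o1", "total": 40}, customer)
```

Only the fields that need to be searchable are copied, which keeps the amount of duplicated data (and therefore the amount that has to be kept in step with the reference) as small as possible.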

In a later article I will discuss the methods we use to keep the embedded data up-to-date with the References. Also look out for the final article in this series, where I will discuss the remaining lessons we have learned while using MongoDB.