Introduction
MongoDB, a renowned NoSQL database allowing high-performance, agile development practices through its non-relational datastore. Unlike SQL databases, MongoDB uses collections and documents instead of tables and rows. Despite its non-relational nature, MongoDB offers the $lookup
aggregation stage, which bears a resemblance to an SQL JOIN operation, and enables you to combine documents from different collections. In this article, we will delve into how to perform JOIN operations in MongoDB using the $lookup
operator, covering basic to advanced examples.
Getting Started with $lookup
In MongoDB, $lookup
is an aggregation pipeline stage that allows you to specify which collection you want to join with the current collection. This is one way that MongoDB offers to reconcile the non-relational nature of its database with the relational concept of JOIN operations. The $lookup
operator performs an equality match between a field from the documents of the collection being aggregated, often referred to as the “local” field, and a field from the documents of the collection being joined, or the “foreign” field.
Basic Syntax
{ $lookup: {
from: "foreignCollection",
localField: "localFieldName",
foreignField: "foreignFieldName",
as: "outputArrayName"
}}
Here’s a simple example:
// Assuming two collections: orders and customers
// orders document example: { _id: ObjectId("1"), customerId: "C1", total: 100 }
// customers document example: { _id: "C1", name: "John Doe" }
db.orders.aggregate([
{
$lookup: {
from:"customers",
localField:"customerId",
foreignField:"_id",
as: "customerDetails"
}
}
]);
The example above simulates a JOIN between the orders
and customers
collections, where the customerId
field from an order is matched against the _id
of a customer. The resulting documents would contain the original order data, along with a new array field named customerDetails
that includes the matching customer documents. The output might look like this:
{
_id: ObjectId("1"),
customerId: "C1",
total: 100,
customerDetails: [
{ _id: "C1", name: "John Doe" }
]
}
Multiplying Results
One thing to note is that similarly to SQL JOINs, using $lookup
could multiply the number of result documents when multiple matches are found in the foreign collection. For each matching document in the foreign collection, a new document will be added to the output array. If there are no matches, the output array will be empty.
An example could be if a customer has multiple phone numbers stored in a separate collection. The following aggregation would result in a document that features a customer with an array of all of their phone numbers:
// Assuming a separate collection: phone_numbers
// phone_numbers document example: { _id: ObjectId("2"), customerId: "C1", number: "+123456789" }
db.customers.aggregate([
{
$lookup: {
from: "phone_numbers",
localField: "_id",
foreignField: "customerId",
as: "phoneNumbers"
}
}
]);
Here, we would have each customer document followed by a phoneNumbers
array that includes all of their phone numbers.
Unwinding the Results
After performing $lookup
, you might end up with an array of joined documents for each input document. This can be unwound using the $unwind
stage, which is often helpful for flattening this array structure:
// Continuing from the previous examples, to flatten the customerDetails array:
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customerDetails"
}
},
{
$unwind: "$customerDetails"
}
]);
The $unwind
stage will generate a new document for each element in the specified array, copying all the original fields from the input document to each unwound document.
Join Conditions and Sub-pipelines
MongoDB’s $lookup
capability has evolved to support more complex join conditions and the use of sub-pipelines, beyond a simple equality match. The syntax for using a $lookup
with a sub-pipeline is as such:
{
$lookup: {
from: "foreignCollection",
let: { localVariable: "$localField" },
pipeline: [{
$match: {
$expr: { $eq: ["$foreignField", "$localVariable"] }
}
},
// Other aggregation stages can go here as needed
],
as: "outputArrayName"
}
}
This form allows you to specify a pipeline that the foreign collection’s documents must pass in order to be included in the output array. The example starts by defining localVariable
that uses the let
option to capture a local field’s value, which is then used in the sub-pipeline’s match expression.
Handling Null and Missing Fields
When a local or foreign field is null or does not exist, MongoDB will not output a joined result. Use $lookup
in combination with other aggregation stages, such as $match
or $ifNull
, to control how these scenarios are handled.
Consider an example where you want to include orders even if the customer is missing:
// Utilize $ifNull to provide a default value for potentially null fields
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customerId",
foreignField: "_id",
as: "customerDetails"
}
},
{
$project: {
_id: 1,
customerId: 1,
total: 1,
customerDetails: { $ifNull: ["$customerDetails", []] }
}
}
]);
In this aggregation, the $project
stage ensures that customerDetails
is an empty array when no customer information is available, ensuring the data’s consistency.
Advantages and Limitations
Using $lookup
can greatly enhance the power of MongoDB queries by allowing relational operations, but it also comes with certain limitations. There might be performance considerations when using $lookup
on large datasets, as MongoDB has to execute additional processing to join the data. It’s also important to remember the differences in a non-relational database context, such as the potential for duplicated data and the lack of enforcement for referential integrity that is common in SQL databases.
Conclusion
In this article, we explored the $lookup
stage in MongoDB’s aggregation framework, starting with simple equality-based joins and moving on to more complex examples involving sub-pipelines and handling null values. As seen through various examples, $lookup
provides versatility and power to MongoDB queries reminiscent of JOIN operations in relational databases.