
MongoDB: How to select distinct values from a collection (with examples)

Updated: February 3, 2024 By: Guest Contributor

Table Of Contents

1 Introduction

2 Basic Usage of Distinct

3 Using Distinct with a Query

4 Advanced Usage with Aggregation Pipeline

5 Combining Filters with Aggregation

6 Incorporating Sort and Project

7 Projection of Distinct Sub-Documents

8 Conclusion

Introduction

Working with databases frequently involves handling large datasets with potential duplicates. Such is the case with MongoDB, a NoSQL database popular for its flexibility and scalability. One common requirement for database operations is to retrieve unique instances of values for certain fields across your records (or documents, in MongoDB parlance). In this tutorial, we will explore the process and varied approaches to selecting distinct values from a collection in MongoDB.

Let’s say we have a MongoDB collection named ‘products’ whose documents contain repeated values, for example many products that share the same category. Our goal is to retrieve each of those values only once. To achieve this, MongoDB offers the distinct command, which returns the unique values of a specified field across a single collection as an array.

Basic Usage of Distinct

To start with a straightforward example, suppose we want to retrieve all the unique ‘category’ values from our ‘products’ collection. The fundamental distinct command would look like this:

db.products.distinct("category")

This command will return an array of unique categories from the products collection. Here is a sample output you might expect:

["Electronics", "Clothing", "Kitchenware"]
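To make the semantics concrete, here is a minimal plain-JavaScript sketch that mimics what distinct computes, run against an in-memory array rather than a real collection. The sample documents, the tags field, and the distinctValues helper are hypothetical and exist only for illustration. The one behavior modeled beyond simple de-duplication is that MongoDB treats each element of an array-valued field as a separate value.

```javascript
// Hypothetical sample documents standing in for the 'products' collection.
const sampleProducts = [
  { name: "Laptop", category: "Electronics", tags: ["work", "portable"] },
  { name: "T-shirt", category: "Clothing", tags: ["casual"] },
  { name: "Phone", category: "Electronics", tags: ["portable"] },
  { name: "Blender", category: "Kitchenware", tags: [] },
];

// Illustrative helper (not a MongoDB API): collect the unique values of one
// top-level field, treating each element of an array value separately.
// Only primitive values (strings, numbers) are compared correctly here.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (!(field in doc)) continue; // documents without the field add nothing
    const value = doc[field];
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) seen.add(item);
  }
  return [...seen]; // first-seen order; the real command's ordering may differ
}

const categories = distinctValues(sampleProducts, "category");
const tags = distinctValues(sampleProducts, "tags");
console.log(categories);
console.log(tags);
```

The sketch only models the result; on a real collection the server does this work and can use an index on the field.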

Using Distinct with a Query

Now, let’s take it a step further. What if you want to get distinct values conditioned by a certain filter? MongoDB’s distinct function also allows you to specify a query object to filter the documents before fetching the distinct values. Here’s an example:

db.products.distinct("category", { price: { $gt: 99 } })

In this case, we are asking MongoDB to provide a list of distinct categories for products priced over $99. Accordingly, the output would adapt to this condition:

["Electronics", "Kitchenware"]
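Conceptually, the query acts as a filter that runs before de-duplication. A rough sketch in plain JavaScript, where the sample documents and prices are invented and a predicate function stands in for the query document:

```javascript
// Hypothetical sample documents; names and prices are invented for illustration.
const pricedProducts = [
  { name: "Laptop", category: "Electronics", price: 1200 },
  { name: "T-shirt", category: "Clothing", price: 20 },
  { name: "Blender", category: "Kitchenware", price: 150 },
  { name: "Headphones", category: "Electronics", price: 99 },
];

// Stand-in for the query { price: { $gt: 99 } }: strictly greater than 99,
// so the product priced at exactly 99 does not match.
const pricedOver99 = (doc) => doc.price > 99;

// Filter first, then keep each remaining category once.
const filteredCategories = [
  ...new Set(pricedProducts.filter(pricedOver99).map((doc) => doc.category)),
];

console.log(filteredCategories);
```

Clothing disappears from the result because none of its documents pass the filter, not because the category itself was excluded.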

Advanced Usage with Aggregation Pipeline

The distinct command provides a quick and easy way to extract unique values, but it returns only the values themselves, so it cannot count, sort, or reshape them. For these scenarios, MongoDB’s aggregation pipeline comes into play: it allows you to string together a series of stages to process your data.

To illustrate this, imagine you want to know the distinct categories with an added requirement of knowing how many products fall under each. You’d need the $group stage combined with the $sum accumulator:

db.products.aggregate([
    {
        $group: {
            _id: "$category",
            count: { $sum: 1 }
        }
    }
])

The output now includes each unique category alongside the count of corresponding products:

[
    { "_id": "Electronics", "count": 15 },
    { "_id": "Clothing", "count": 25 },
    { "_id": "Kitchenware", "count": 8 }
]
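The grouping step can be pictured as building a counter keyed by category. The following is a minimal sketch in plain JavaScript over hypothetical in-memory documents; countByCategory is an illustrative helper, not part of MongoDB:

```javascript
// Hypothetical sample documents standing in for the 'products' collection.
const groupedSample = [
  { name: "Laptop", category: "Electronics" },
  { name: "Phone", category: "Electronics" },
  { name: "T-shirt", category: "Clothing" },
  { name: "Blender", category: "Kitchenware" },
];

// Illustrative equivalent of
//   { $group: { _id: "$category", count: { $sum: 1 } } }
// Each document adds 1 to the counter of its category.
function countByCategory(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.category, (counts.get(doc.category) || 0) + 1);
  }
  // One output document per distinct key, shaped like the $group output.
  return [...counts].map(([category, count]) => ({ _id: category, count }));
}

const groupedCounts = countByCategory(groupedSample);
console.log(groupedCounts);
```

Because each key appears exactly once in the output, the _id values form the same set that distinct would return, with the count attached.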

Combining Filters with Aggregation

If we extend our example to include a filtering step before grouping the data, we can add a $match stage to our pipeline:

db.products.aggregate([
    {
        $match: { price: { $gt: 99 } }
    },
    {
        $group: {
            _id: "$category",
            count: { $sum: 1 }
        }
    }
])

This returns the number of products in each category, counting only products priced above $99. Categories with no product above that price do not appear at all. The result might look like this:

[
    { "_id": "Electronics", "count": 10 },
    { "_id": "Kitchenware", "count": 5 }
]

Incorporating Sort and Project

Continuing from here, you may also want to sort these categories based on their product count or include/exclude specific fields from the results. To do so, you’ll add $sort and $project stages respectively to the pipeline:

db.products.aggregate([
    {
        $match: { price: { $gt: 99 } }
    },
    {
        $group: {
            _id: "$category",
            count: { $sum: 1 }
        }
    },
    {
        $sort: { count: -1 }
    },
    {
        $project: { category: "$_id", _id: 0, itemCount: "$count" }
    }
])

The categories are now presented in descending order based on the number of items they have above the specified price, and they have been reformatted for readability:

[
    { "category": "Electronics", "itemCount": 10 },
    { "category": "Kitchenware", "itemCount": 5 }
]
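The four stages map closely onto ordinary array operations. As a rough sketch, assuming a small hypothetical in-memory catalog instead of a real collection, the same flow in plain JavaScript looks like this:

```javascript
// Hypothetical sample documents; categories and prices are invented.
const catalog = [
  { category: "Electronics", price: 1200 },
  { category: "Kitchenware", price: 150 },
  { category: "Electronics", price: 300 },
  { category: "Clothing", price: 20 },
  { category: "Electronics", price: 50 },
];

// $match: keep only documents priced strictly above 99.
const matched = catalog.filter((doc) => doc.price > 99);

// $group: count the matching documents per category.
const perCategory = new Map();
for (const doc of matched) {
  perCategory.set(doc.category, (perCategory.get(doc.category) || 0) + 1);
}

const summary = [...perCategory]
  // $sort: descending by count (index 1 of each [category, count] pair).
  .sort((a, b) => b[1] - a[1])
  // $project: rename the fields for readability.
  .map(([category, itemCount]) => ({ category, itemCount }));

console.log(summary);
```

In this sample, Clothing has no document above the price threshold, so it is filtered out before grouping and never reaches the sorted output.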

Projection of Distinct Sub-Documents

For nested structures such as arrays of embedded documents, distinct values can be extracted with a combination of $unwind, $group, and $project. The $unwind stage emits one document per array element, $group collapses the elements that share a value, and $project reshapes the result.

Assuming every product has a reviews array of sub-documents, and we want the distinct user names across all reviews, the pipeline looks like this:

db.products.aggregate([
    {
        $unwind: "$reviews"
    },
    {
        $group: {
            _id: "$reviews.username",
            count: { $sum: 1 }
        }
    },
    {
        $project: { username: "$_id", _id: 0 }
    }
])

The output would be something like this:

[
    { "username": "johndoe" },
    { "username": "janedoe" }
]
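The effect of $unwind followed by grouping can be pictured as flattening the review arrays and then de-duplicating. The following is a minimal sketch in plain JavaScript; the sample documents and the rating field are hypothetical:

```javascript
// Hypothetical sample documents; each product embeds an array of reviews.
const reviewedProducts = [
  {
    name: "Laptop",
    reviews: [
      { username: "johndoe", rating: 5 },
      { username: "janedoe", rating: 4 },
    ],
  },
  { name: "Phone", reviews: [{ username: "johndoe", rating: 3 }] },
  { name: "Blender", reviews: [] },
];

// $unwind: one entry per review. A product with an empty array contributes
// nothing, which mirrors the default behavior of $unwind.
const unwoundReviews = reviewedProducts.flatMap((doc) => doc.reviews);

// $group on the username, then $project: keep each username once and wrap it
// in a document with a 'username' field.
const distinctUsers = [
  ...new Set(unwoundReviews.map((review) => review.username)),
].map((username) => ({ username }));

console.log(distinctUsers);
```

When only the names are needed, db.products.distinct("reviews.username") should return the same set as a flat array of strings, since distinct treats each array element as a separate value. The pipeline form becomes useful once you also want counts, sorting, or extra fields.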

Conclusion

Throughout this tutorial, we have explored a variety of strategies for selecting distinct values from a MongoDB collection. Whether you require a simple list of distinct field values or an aggregation of unique values after filtering and transformation, MongoDB offers flexible solutions to address your needs. We progressed from basic distinct queries to aggregation pipeline techniques, each with code examples and expected outputs to guide you through your own use cases.
