Querying for MongoDB Documents Where Every Sub-document Matches a Pattern

Imagine you’re implementing a feature to remove all “empty” documents from a MongoDB “events” collection. These documents aren’t completely empty. Each document will always have an array of checks, and each check will have an array of sources that satisfied that check.

For example, here are a few documents in our events collection:


{
    "checks": [
        {
            "sources": []
        },
        {
            "sources": []
        },
    ]
}


{
    "checks": [
        {
            "sources": [
                {
                    "_id": 1
                }
            ]
        },
        {
            "sources": []
        },
    ]
}

The first document is considered “empty”, because all of the sources arrays nested within each of the checks is empty ([]). The second document isn’t empty, because the sources array within the first check isn’t an empty array.

That’s Not How All Works!

That’s the situation I found myself in while working on a recent client project. Skimming (too quickly) through the MongoDB documentation, I decided that the $all query operator would be the perfect tool to solve my problem. After all, I was looking for documents where all of the sources arrays were empty.

My first stab at cleaning up these empty event documents looked something like this:


await Event.model.deleteMany({
    checks: {
        $all: [
            {
                $elemMatch: {
                    sources: { $eq: [] }
                }
            }
        ]
    }
});

My plan was to use the $all and $elemMatch MongoDB operators together to find and remove any event documents who’s checks sub-documents all contain empty ({ $eq: [] }) sources arrays.

Unfortunately, this is not how $all works.

The $all operator matches on any documents who’s sub-documents contain “all” of the elements listed within the $all operator. It does not match on documents where every sub-document matches the described element.

For example, a query like { foo: { $all: [ 1, 2, 3 ] } } will match on documents that have 1, 2, and 3 within their foo array. Their foo arrays can contain other elements, like 4, 5, or 6, but it must at least contain all of our specified values.

Applying this to our situation, we can see that our query will delete all documents that have at least one empty check, and not documents that have all empty checks.

This means we’re deleting event documents that aren’t empty!

We Don’t Even Need All

Armed with this new knowledge and guided by a failing test, I went to work trying to refactor my solution. After much tinkering, I came to the conclusion that the query I’m after can’t be directly expressed using the $all query operator. However, flipping the problem around, I came to a simpler solution that doesn’t make use of $all at all!

Instead of looking for documents where every sub-document in the checks array contains an empty sources array, let’s look for documents that have a non-empty sources array in any one of their checks. The result set we’re looking for is the inverse of this set of documents.

We can express this nice and neatly with MongoDB’s $not operator:


await Event.model.deleteMany({
    checks: {
        $not: {
            $elemMatch: {
                sources: { $ne: [] }
            }
        }
    }
});

As you can see, we’re removing any events who’s checks array does not contain any sub-documents with a non-empty sources array.

That’s it! With that, we’re able to clean up any truly “empty” events, while preserving any events with non-empty sources.