Imagine you’re implementing a feature to remove all “empty” documents from a MongoDB “events” collection. These documents aren’t completely empty. Each document will always have an array of checks
, and each check will have an array of sources
that satisfied that check.
For example, here are a few documents in our events collection:
{
"checks": [
{
"sources": []
},
{
"sources": []
},
]
}
{
"checks": [
{
"sources": [
{
"_id": 1
}
]
},
{
"sources": []
},
]
}
The first document is considered “empty”, because all of the sources
arrays nested within each of the checks
is empty ([]
). The second document isn’t empty, because the sources
array within the first check isn’t an empty array.
That’s Not How All Works!
That’s the situation I found myself in while working on a recent client project. Skimming (too quickly) through the MongoDB documentation, I decided that the $all
query operator would be the perfect tool to solve my problem. After all, I was looking for documents where all of the sources
arrays were empty.
My first stab at cleaning up these empty event documents looked something like this:
await Event.model.deleteMany({
checks: {
$all: [
{
$elemMatch: {
sources: { $eq: [] }
}
}
]
}
});
My plan was to use the $all
and $elemMatch
MongoDB operators together to find and remove any event documents who’s checks
sub-documents all contain empty ({ $eq: [] }
) sources
arrays.
Unfortunately, this is not how $all
works.
The $all
operator matches on any documents who’s sub-documents contain “all” of the elements listed within the $all
operator. It does not match on documents where every sub-document matches the described element.
For example, a query like { foo: { $all: [ 1, 2, 3 ] } }
will match on documents that have 1
, 2
, and 3
within their foo
array. Their foo
arrays can contain other elements, like 4
, 5
, or 6
, but it must at least contain all of our specified values.
Applying this to our situation, we can see that our query will delete all documents that have at least one empty check, and not documents that have all empty checks.
This means we’re deleting event documents that aren’t empty!
We Don’t Even Need All
Armed with this new knowledge and guided by a failing test, I went to work trying to refactor my solution. After much tinkering, I came to the conclusion that the query I’m after can’t be directly expressed using the $all
query operator. However, flipping the problem around, I came to a simpler solution that doesn’t make use of $all
at all!
Instead of looking for documents where every sub-document in the checks
array contains an empty sources
array, let’s look for documents that have a non-empty sources
array in any one of their checks
. The result set we’re looking for is the inverse of this set of documents.
We can express this nice and neatly with MongoDB’s $not
operator:
await Event.model.deleteMany({
checks: {
$not: {
$elemMatch: {
sources: { $ne: [] }
}
}
}
});
As you can see, we’re removing any events who’s checks
array does not contain any sub-documents with a non-empty sources
array.
That’s it! With that, we’re able to clean up any truly “empty” events, while preserving any events with non-empty sources
.