One year ago today, I wrote an article discussing NoSQL Injection and GraphQL. I praised GraphQL for eradicating the entire possibility of NoSQL Injection.

I claimed that because GraphQL forces you to flesh out the entirety of your schema before you ever write a query, it’s effectively impossible to succumb to the incomplete argument checking that leads to a NoSQL Injection vulnerability.

Put simply, this means that an input object will never have any room for wildcards, or potentially exploitable inputs. Partial checking of GraphQL arguments is impossible!

I was wrong.

NoSQL Injection is entirely possible when using GraphQL, and can creep into your application through the use of “custom scalar types”.

In this article, we’ll walk through how the relatively popular GraphQLJSON scalar type can open the door to NoSQL Injection in applications using MongoDB.

Custom Scalars

In my previous article, I explained that GraphQL requires that you define your entire application’s schema all the way down to its scalar leaves.

These scalars can be grouped and nested within objects, but ultimately every field sent down to the client, or passed in by the user is a field of a known type:

Scalars and Enums form the leaves in request and response trees; the intermediate levels are Object types, which define a set of fields, where each field is another type in the system, allowing the definition of arbitrary type hierarchies.

Normally, these scalars are simple primitives: String, Int, Float, or Boolean. However, sometimes these four primitive types aren’t enough to fully flesh out the input and output schema of a complex web application.

Custom scalar types to the rescue!

Your application can define a custom scalar type, along with the set of functionality required to serialize and deserialize that type into and out of a GraphQL request.

A common example of a custom type is the Date type, which can serialize JavaScript Date objects into strings to be returned as part of a GraphQL query, and parse date strings into JavaScript Date objects when provided as GraphQL inputs.
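To make the serialize/parse split concrete, here is a minimal sketch of what a Date scalar's configuration might look like. It assumes ISO 8601 strings as the wire format, and `dateScalarConfig` is a hypothetical name. The three functions mirror the `serialize`, `parseValue`, and `parseLiteral` options accepted by `GraphQLScalarType` in the reference `graphql` package. The config is kept as a plain object so it runs without dependencies.

```javascript
// A minimal sketch of a custom Date scalar. The object mirrors the options
// accepted by GraphQLScalarType in the `graphql` package; it is kept as a
// plain object here so it runs without any dependencies.
const dateScalarConfig = {
  name: "Date",
  description: "A date transported as an ISO 8601 string (an assumption)",

  // Outgoing: convert a JavaScript Date into a string for the response.
  serialize(value) {
    if (!(value instanceof Date)) {
      throw new TypeError("Date can only serialize Date objects");
    }
    return value.toISOString();
  },

  // Incoming via query variables: convert a date string into a Date.
  parseValue(value) {
    if (typeof value !== "string") {
      throw new TypeError("Date expects a string");
    }
    const date = new Date(value);
    if (Number.isNaN(date.getTime())) {
      throw new TypeError(`Invalid date string: ${value}`);
    }
    return date;
  },

  // Incoming inline in the query text: accept string literals only.
  // "StringValue" is the AST kind graphql-js uses for string literals.
  parseLiteral(ast) {
    if (ast.kind !== "StringValue") {
      throw new TypeError("Date expects a string literal");
    }
    return dateScalarConfig.parseValue(ast.value);
  },
};

const launch = dateScalarConfig.parseValue("2017-01-01T00:00:00.000Z");
console.log(dateScalarConfig.serialize(launch)); // 2017-01-01T00:00:00.000Z
```

Note that this scalar rejects anything that isn't a string. That strictness is exactly what a JSON scalar gives up.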

Searching with JSON Scalars

This is all well and good. Custom scalars obviously are a powerful tool for building out more advanced GraphQL schemas. Unfortunately, this tool can be abused.

Imagine we’re building a user search page. In our contrived example, the page lets users search for other users based on a variety of fields: username, full name, email address, etc…

Being able to search over multiple fields creates ambiguity, and ambiguity is hard to work with in GraphQL.

To make our lives easier, let’s accept the search criteria as a JSON object using the GraphQLJSON custom scalar type:


type Query {
    users(search: JSON!): [User]
}

Using Apollo and a Meteor-style MongoDB driver, we could write our users resolver like this:


{
    Query: {
        users: (_root, { search }, _context) => {
            return Users.find(search, {
                fields: {
                    username: 1, 
                    fullname: 1, 
                    email: 1
                }
            });
        }
    }
}

Great!

But now we want to paginate the results and allow the user to specify the number of results per page.

We could add skip and limit fields separately to our users query, but that would be too much work. We’ve already seen how well using the JSON type worked, so let’s use that again!


type Query {
    users(search: JSON!, options: JSON!): [User]
}

We’ve extended our users query to accept an options JSON object.


{
    Query: {
        users: (_root, { search, options }, _context) => {
            return Users.find(search, _.extend({
                fields: {
                    _id: 1,
                    username: 1, 
                    fullname: 1, 
                    email: 1
                }
            }, options));
        }
    }
}

And we’ve extended our users resolver to merge the client’s options, such as skip and limit, into the options object we pass to Users.find alongside our fields projection.

Now, for example, our client can make a query to search for users by username and email address, ten results at a time:


{
    users(search: "{\"username\": {\"$regex\": \"sue\"}, \"email\": {\"$regex\": \"sue\"}}",
          options: "{\"skip\": 0, \"limit\": 10}") {
        _id
        username
        fullname
        email
    }
}

This might return a few users with "sue" in both their username and their email address. MongoDB combines the two conditions with an implicit AND.

But there are problems here.

Imagine a curious or potentially malicious user making the following GraphQL query:


{
    users(search: "{\"email\": {\"$gte\": \"\"}}",
          options: "{\"skip\": 0, \"limit\": 10}") {
        _id
        username
        fullname
        email
    }
}

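The entire search JSON object is passed directly into the Users.find query, so the $gte operator reaches MongoDB intact. Every string compares greater than or equal to the empty string, so this query matches every user with a string email address, which is effectively every user in the collection. The limit only caps each page at ten; an attacker can step skip forward, or raise the limit, to collect everyone.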
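To see the $gte trick in isolation, here is a deliberately tiny, hypothetical in-memory matcher. `find` and `matchesCondition` are illustrative names, not MongoDB’s implementation, and they support only plain equality, `$gte`, and `$regex` on top-level fields.

```javascript
// A deliberately tiny, hypothetical stand-in for MongoDB query matching.
// It handles only top-level fields with plain equality, $gte, and $regex.
function matchesCondition(value, condition) {
  if (condition === null || typeof condition !== "object") {
    return value === condition; // a plain value means exact equality
  }
  // An object value is read as a set of operators, all of which must hold.
  return Object.entries(condition).every(([op, operand]) => {
    switch (op) {
      case "$gte":
        // Loosely mimics MongoDB: only compare values of the same type.
        return typeof value === typeof operand && value >= operand;
      case "$regex":
        return typeof value === "string" && new RegExp(operand).test(value);
      default:
        throw new Error(`Operator not supported in this sketch: ${op}`);
    }
  });
}

// Every top-level condition must match; MongoDB ANDs them implicitly.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([field, condition]) =>
      matchesCondition(doc[field], condition)
    )
  );
}

const users = [
  { username: "sue", email: "sue@example.com" },
  { username: "bob", email: "bob@example.com" },
  { username: "alice", email: "alice@example.com" },
];

// The intended search: both conditions must match, so only Sue comes back.
console.log(find(users, { username: { $regex: "sue" }, email: { $regex: "sue" } }).length); // 1

// The injected search: every string is >= "", so every user comes back.
console.log(find(users, { email: { $gte: "" } }).length); // 3
```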

Thankfully, a malicious user would only receive our users’ usernames, full names, and email addresses. Or would they?

The options JSON input could also be maliciously modified:


{
    users(search: "{\"email\": {\"$gte\": \"\"}}",
          options: "{\"fields\": {}}") {
        _id
        username
        fullname
        email
    }
}

By passing in their own fields object, an attacker can overwrite the projection specified by the server, and an empty projection tells MongoDB to return every field. Since these options also omit a limit, this combination of search and options returns all fields (specified in the GraphQL schema) for all users in the system.
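The override works because `_.extend` performs a shallow, left-to-right copy: a top-level `fields` key from the client replaces the server’s projection outright instead of merging with it. This sketch reproduces that with `Object.assign`, which behaves the same way for plain objects like these. `buildOptions` and `applyProjection` are hypothetical helpers, and `applyProjection` handles inclusion projections only.

```javascript
// Mirrors the vulnerable resolver: server defaults first, client options last.
// For plain objects, Object.assign behaves like _.extend: a shallow copy.
function buildOptions(clientOptions) {
  return Object.assign(
    {},
    { fields: { _id: 1, username: 1, fullname: 1, email: 1 } },
    clientOptions
  );
}

// Hypothetical helper that applies an inclusion-only projection.
// As in MongoDB, an empty projection returns every field.
function applyProjection(doc, fields) {
  const keys = Object.keys(fields);
  if (keys.length === 0) {
    return { ...doc };
  }
  return Object.fromEntries(
    keys.filter((key) => key in doc).map((key) => [key, doc[key]])
  );
}

const storedUser = {
  _id: "1",
  username: "sue",
  fullname: "Sue Example",
  email: "sue@example.com",
  passwordHash: "<hash>", // placeholder for a sensitive field
};

// Benign options: skip and limit are added, and the projection survives.
console.log(applyProjection(storedUser, buildOptions({ skip: 0, limit: 10 }).fields));

// Malicious options: the client's `fields` replaces ours wholesale.
console.log(applyProjection(storedUser, buildOptions({ fields: {} }).fields));
```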

These fields might include sensitive information like their hashed passwords, session tokens, purchase history, etc…

Fixing the Vulnerability

In this case, as in most cases, the solution is to be explicit about what we expect to receive from the client. Instead of accepting flexible search and options objects from the client, we’ll ask for each field individually:


type Query {
    users(fullname: String,
          username: String,
          email: String,
          skip: Int!,
          limit: Int!): [User]
}

By making the search fields (fullname, username, and email) optional, the querying user can omit any of the fields they don’t wish to search on.

Now we can update our resolver to account for this explicitness:


{
    Query: {
        users: (_root, { fullname, username, email, skip, limit }, _context) => {
            // Only include the search fields the client actually provided.
            let search = _.extend({}, fullname ? { fullname } : {},
                                      username ? { username } : {},
                                      email ? { email } : {});
            return Users.find(search, {
                // The projection is fixed on the server; the client can't override it.
                fields: {
                    _id: 1,
                    username: 1, 
                    fullname: 1, 
                    email: 1
                },
                skip,
                limit
            });
        }
    }
}

If any of fullname, username, or email is passed into the query, we add it to our search object. We can safely place this user-provided data into our query because GraphQL guarantees it’s a String at this point, and a plain string value can only produce an exact-match condition, never an operator like $gte.

Lastly, we set skip and limit on our MongoDB query to the values passed in by the client. Both are guaranteed to be Ints, and the fields projection is built entirely on the server, so the client can no longer override it.
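The resolver’s query-building logic can also be pulled out into a pure function, which makes it easy to unit test without a database. This is a sketch: `buildUserQuery` and `SEARCHABLE_FIELDS` are hypothetical names, and it swaps the chain of ternaries for a loop over an explicit whitelist of argument names.

```javascript
// Hypothetical whitelist of the searchable String arguments in the schema.
const SEARCHABLE_FIELDS = ["fullname", "username", "email"];

// Builds the selector and options for Users.find from already-typed args.
function buildUserQuery(args) {
  const search = {};
  for (const field of SEARCHABLE_FIELDS) {
    // Same truthiness check as the resolver: omitted or empty args are skipped.
    if (args[field]) {
      search[field] = args[field];
    }
  }
  return {
    search,
    options: {
      // Built entirely on the server; nothing in args can replace it.
      fields: { _id: 1, username: 1, fullname: 1, email: 1 },
      skip: args.skip,
      limit: args.limit,
    },
  };
}

const { search, options } = buildUserQuery({ email: "sue@example.com", skip: 0, limit: 10 });
console.log(search); // { email: 'sue@example.com' }
console.log(options.limit); // 10
```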

Final Thoughts

Custom scalar types, and the JSON scalar type specifically, aren’t all bad. As we discussed, they’re a powerful and important tool for building out your GraphQL schema.

However, when using JSON types, or any other sufficiently expressive custom scalar types, it’s important to remember to make assertions about the type and shape of user-provided data. If you’re assuming that the data passed in through a JSON field is a string, check that it’s a string.
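When a JSON argument really is the right tool, one way to make those assertions is a small guard that runs at the top of the resolver, before anything reaches Users.find. This is a minimal sketch, assuming the search object should contain only the three searchable fields with plain string values; `assertStringSearch` and `ALLOWED_SEARCH_KEYS` are hypothetical names.

```javascript
// Hypothetical set of keys a JSON search argument is allowed to contain.
const ALLOWED_SEARCH_KEYS = new Set(["fullname", "username", "email"]);

// Throws unless `search` is a plain object of whitelisted keys with string values.
function assertStringSearch(search) {
  if (search === null || typeof search !== "object" || Array.isArray(search)) {
    throw new TypeError("search must be a JSON object");
  }
  for (const [key, value] of Object.entries(search)) {
    // Rejects top-level operators such as $where or $or, and unknown fields.
    if (!ALLOWED_SEARCH_KEYS.has(key)) {
      throw new TypeError(`Unexpected search key: ${key}`);
    }
    // Rejects operator objects such as { $gte: "" } or { $regex: "..." }.
    if (typeof value !== "string") {
      throw new TypeError(`search.${key} must be a string`);
    }
  }
  return search;
}

console.log(assertStringSearch({ email: "sue@example.com" })); // { email: 'sue@example.com' }

try {
  assertStringSearch({ email: { $gte: "" } });
} catch (err) {
  console.log(err.message); // search.email must be a string
}
```

The same idea applies to the options argument: whitelist skip and limit, require them to be integers, and never let a fields key through.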

If a more primitive GraphQL type, like Int or String, fulfills the same functional requirements as a JSON type, use the primitive type, even at the cost of some verbosity.