One of the biggest drawbacks of an out-of-the-box GraphQL solution is its tendency to run headfirst into the notorious N+1 query problem. For example, consider the following GraphQL query:
```graphql
{
  patients {
    name
    bed {
      code
    }
  }
}
```
We’re trying to grab all of the `patients` in our system, and for each patient, we also want their associated `bed`.
While that seems simple enough, the resulting database queries are anything but. Using the most obvious resolvers, our GraphQL server would ultimately make N+1 queries, where N represents the number of patients in our system.
```javascript
const resolvers = {
  Query: {
    patients: (_root, _args, _context) => Patients.find({})
  },
  Patient: {
    bed: ({ bedId }, _args, _context) => Beds.findOne({ _id: bedId })
  }
};
```
Our application first queries for all patients (`Patients.find`), and then makes a `Beds.findOne` query for each patient it finds. Thus, we’ve made N (one bed query per patient) + 1 (the patients query) queries.
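To see exactly where the N+1 comes from, here’s a minimal runnable sketch. The `Patients` and `Beds` objects are hypothetical in-memory stand-ins for the real collections, instrumented to count how many “database” queries we issue; the resolver strategy mirrors the naive resolvers above.

```javascript
// Hypothetical in-memory stand-ins for the Patients and Beds collections,
// instrumented to count how many "database" queries we issue.
let queryCount = 0;

const patientRows = [
  { name: "Ada", bedId: 1 },
  { name: "Grace", bedId: 2 },
  { name: "Edsger", bedId: 3 }
];
const bedRows = [
  { _id: 1, code: "A1" },
  { _id: 2, code: "A2" },
  { _id: 3, code: "B1" }
];

const Patients = {
  find: () => {
    queryCount += 1;
    return Promise.resolve(patientRows);
  }
};
const Beds = {
  findOne: ({ _id }) => {
    queryCount += 1;
    return Promise.resolve(bedRows.find(bed => bed._id === _id));
  }
};

// The naive resolver strategy: one query for all patients,
// then one Beds.findOne query per patient.
const demo = Patients.find()
  .then(patients =>
    Promise.all(patients.map(patient => Beds.findOne({ _id: patient.bedId })))
  )
  .then(() => {
    console.log(queryCount); // 4: N (3 bed queries) + 1 (patients query)
    return queryCount;
  });
```

With three patients we issue four queries; with three thousand patients, three thousand and one.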
This is unfortunate.
We could easily write a traditional REST endpoint that fetches and returns this data to the client using exactly two queries and some post-query transformations:
```javascript
return Patients.find({}).then(patients => {
  return Beds.find({ _id: { $in: _.map(patients, 'bedId') } }).then(beds => {
    let bedsById = _.keyBy(beds, '_id');
    return patients.map(patient => {
      return _.extend({}, patient, {
        bed: bedsById[patient.bedId]
      });
    });
  });
});
```
Despite its elegance, the GraphQL solution’s inefficiency makes it a no-go for many real-world applications.
Thankfully, there’s a solution! 🎉
Facebook’s `dataloader` package is the solution to our GraphQL inefficiency problems.
> DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.
There are many fantastic resources for learning about DataLoader, and even on using DataLoader in an Apollo-based project. For that reason, we’ll skip some of the philosophical questions of how and why DataLoader works and dive right into wiring it into our Apollo server application.
All we need to get DataLoader working in our application is to create our “batch”, or “loader” functions and drop them into our GraphQL context for every GraphQL request received by our server:
```javascript
import loaders from "./loaders";

...

server.use('/graphql', function(req, res) {
  return graphqlExpress({
    schema,
    context: { loaders }
  })(req, res);
});
```
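One caveat worth noting: DataLoader caches every key it loads, so the `dataloader` documentation recommends creating fresh loader instances per request rather than sharing module-level ones, which can serve stale data across requests. A minimal sketch of that pattern, using a hypothetical `buildLoaders` factory (the `bedLoader` here is a Map-backed stub standing in for a real `DataLoader` instance):

```javascript
// Hypothetical factory: returns a fresh set of loaders, so each
// GraphQL request gets its own cache. The stub below mimics a
// DataLoader's per-instance memoization with a plain Map.
function buildLoaders() {
  const cache = new Map();
  return {
    bedLoader: {
      load(bedId) {
        // Within one request, repeated loads of the same id
        // return the same cached promise.
        if (!cache.has(bedId)) {
          cache.set(bedId, Promise.resolve({ _id: bedId }));
        }
        return cache.get(bedId);
      }
    }
  };
}

// In the Express wiring above, the factory would be called inside the
// request handler, e.g.:
//   context: { loaders: buildLoaders() }

const requestA = buildLoaders();
const requestB = buildLoaders();
console.log(requestA.bedLoader !== requestB.bedLoader); // true: fresh per request
```

The module-level `loaders` object above works for this example, but per-request construction is the safer default once data starts changing between requests.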
Continuing on with our current patient and bed example, we’ll only need a single loader to batch and cache our repeated queries against the `Beds` collection. Let’s call it `bedLoader` and add it to our `loaders.js` file:
```javascript
import DataLoader from "dataloader";

export const bedLoader = new DataLoader(bedIds => {
  // TODO: Implement bedLoader
});
```
Now that `bedLoader` is being injected into our GraphQL context, we can replace our resolvers’ calls to `Beds.findOne` with calls to `bedLoader.load`:
```javascript
const resolvers = {
  Patient: {
    bed: ({ bedId }, _args, { loaders }) => loaders.bedLoader.load(bedId)
  }
};
```
DataLoader will magically aggregate all of the `bedId` values that are passed into our calls to `bedLoader.load`, and pass them into our `bedLoader` DataLoader callback.
Our job is to write our loader function so that it executes a single query to fetch all of the required beds, and then returns them in order. That is, if `bedIds` is `[1, 2, 3]`, we need to return bed `1` first, bed `2` second, and bed `3` third. If we can’t find a bed, we need to return `undefined` in its place:
```javascript
export const bedLoader = new DataLoader(bedIds => {
  return Beds.find({ _id: { $in: bedIds } }).then(beds => {
    const bedsById = _.keyBy(beds, "_id");
    return bedIds.map(bedId => bedsById[bedId]);
  });
});
```
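To convince ourselves the batching actually happens, here’s a self-contained sketch. Since the real `dataloader` package isn’t needed to illustrate the idea, `makeLoader` below is a hypothetical, stripped-down stand-in for `new DataLoader(batchFn)` that collects keys and flushes them on the next tick, and `Beds` is an in-memory stub instrumented to count `find` calls:

```javascript
// In-memory stub for the Beds collection, counting how many
// find() queries actually run.
let findCalls = 0;
const bedRows = [
  { _id: 1, code: "A1" },
  { _id: 2, code: "A2" },
  { _id: 3, code: "B1" }
];
const Beds = {
  find: ({ _id: { $in } }) => {
    findCalls += 1;
    return Promise.resolve(bedRows.filter(bed => $in.includes(bed._id)));
  }
};

// Hypothetical stand-in for `new DataLoader(batchFn)`: queues keys
// loaded during the current tick, then calls batchFn once with all of them.
function makeLoader(batchFn) {
  let queue = [];
  return {
    load(key) {
      return new Promise((resolve, reject) => {
        queue.push({ key, resolve, reject });
        if (queue.length === 1) {
          process.nextTick(() => {
            const batch = queue;
            queue = [];
            batchFn(batch.map(item => item.key))
              .then(values => batch.forEach((item, i) => item.resolve(values[i])))
              .catch(err => batch.forEach(item => item.reject(err)));
          });
        }
      });
    }
  };
}

// The same batch function as our bedLoader, minus lodash.
const bedLoader = makeLoader(bedIds =>
  Beds.find({ _id: { $in: bedIds } }).then(beds => {
    const bedsById = new Map(beds.map(bed => [bed._id, bed]));
    return bedIds.map(bedId => bedsById.get(bedId));
  })
);

// Three loads in the same tick collapse into a single Beds.find query:
const batched = Promise.all([bedLoader.load(1), bedLoader.load(2), bedLoader.load(3)])
  .then(loaded => {
    console.log(findCalls);               // 1
    console.log(loaded.map(b => b.code)); // [ 'A1', 'A2', 'B1' ]
    return loaded;
  });
```

Three `load` calls, one query, and the beds come back in key order, just as the real DataLoader contract requires.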
That’s it!
Now our system will make a single query to grab all of the `patients` in our system. For every patient we find, our `bed` resolver will fire and pass that patient’s `bedId` into our `bedLoader` DataLoader. Our `bedLoader` DataLoader will gather all of our `bedId` values, make a single query against the `Beds` collection, and return the appropriate bed to the appropriate `bed` resolver.
Thanks to DataLoader, we can have the elegance of a GraphQL approach combined with the efficiency and customizability of the manual approach.