Understanding the N+1 problem with GraphQL

Does the N+1 problem ring a bell? You’ve probably heard about it already, in connection with GraphQL or with ORMs. It’s one of those traps that don’t show up in the code, that go unnoticed in development with a tiny dataset, and that suddenly make response times collapse in production. Before you can solve it, you first need to understand exactly how it shows up.

Where the name “N+1” comes from

The problem arises when fetching relational data: an initial query (the “+1”) is followed by a series of additional queries (the “N”), hence the name N + 1.

Be careful, though, not to reduce that “relational” to the database sense alone. More generally, the initial query gives access to a set of items, and each of those items then triggers its own query. The problem can therefore arise just as easily with HTTP APIs as with a database, hence the various names you come across: N+1 Queries or N+1 API Calls.

In other words, this isn’t a flaw specific to a particular technology: it’s a data-access pattern that reappears whenever you iterate over a collection and, for each item, go and fetch its associated data.
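To make that shape concrete, here is a minimal sketch with no database and no HTTP involved. Everything in it is hypothetical: `createFakeSource`, the `Author` and `Article` types and the sample data exist only for illustration. Each method call stands for one round-trip, and a counter records how many were made.

```typescript
type Author = { id: number; name: string };
type Article = { id: number; authorId: number; title: string };

// Hypothetical data source: each method call stands for one round-trip
// (an SQL query, an HTTP request, ...). The counter makes the cost visible.
function createFakeSource(authors: Author[], articles: Article[]) {
  const source = {
    calls: 0,
    // The initial call: fetches the whole collection at once.
    listAuthors(): Author[] {
      source.calls += 1;
      return authors;
    },
    // The per-item call: fetches the data associated with ONE author.
    listArticlesOf(authorId: number): Article[] {
      source.calls += 1;
      return articles.filter((a) => a.authorId === authorId);
    },
  };
  return source;
}

// The N+1 shape: one call for the collection, then one call per item.
function loadAuthorsWithArticles(source: ReturnType<typeof createFakeSource>) {
  return source.listAuthors().map((author) => ({
    ...author,
    articles: source.listArticlesOf(author.id),
  }));
}

const source = createFakeSource(
  [
    { id: 1, name: "Ada" },
    { id: 2, name: "Grace" },
    { id: 3, name: "Linus" },
  ],
  [
    { id: 10, authorId: 1, title: "First" },
    { id: 11, authorId: 1, title: "Second" },
    { id: 12, authorId: 3, title: "Third" },
  ],
);

const result = loadAuthorsWithArticles(source);
// source.calls is now 4: 1 call for the list + 3 calls, one per author.
```

Nothing in `loadAuthorsWithArticles` looks wrong when you read it, which is exactly why the pattern is so easy to miss.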

GraphQL, especially fertile ground

Take GraphQL, a tool with which the problem can show up very quickly.

A GraphQL server combines three things: a dedicated query language (a DSL), a schema that defines the structure of the accessible data, and a set of resolvers, the functions responsible for fetching the data for each element of the schema. It’s precisely that last building block, the resolver, that makes the N+1 problem so easy to trigger without realizing it.

An example schema: books and their reviews

Let’s imagine a system that gives access to books and their reviews. The GraphQL schema might look like this:

type Query {
  books: [Book]
}

type Book {
  id: ID!
  title: String!
  reviews: [Review]
}

type Review {
  id: ID!
  rating: Int!
}

Now let’s say we have a query that asks for “the list of all books, and for each book, its associated reviews”:

{
  books {
    reviews {
      rating
    }
  }
}

Nothing suspicious at first glance: the query is concise, readable, and expresses exactly the need. The whole trap lies in the way GraphQL is going to execute it.

Resolvers, where everything plays out

For GraphQL to be able to build the response, we need to define the resolvers that will fetch the data:

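// booksRepository and reviewsRepository are the application's data-access objects.
const resolvers = {
  Query: {
    // Root field: resolves the list of books.
    books: () => booksRepository.findAll(),
  },
  Book: {
    // Field of Book: receives the parent book and resolves its reviews.
    reviews: (book: Book) => reviewsRepository.findByBookId(book.id),
  },
};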

The books resolver is invoked only once, which triggers a single SQL query to fetch the entire set of books:

SELECT * FROM books;

So far, so good. That’s the “+1”.

The problem arises at the next step. To resolve the reviews field of each book, GraphQL invokes the reviews resolver once per book present in the result. If the first query returned N books, then the reviews resolver is called N times, which generates a cascade of SQL queries:

SELECT * FROM reviews WHERE book_id = 1;
SELECT * FROM reviews WHERE book_id = 2;
-- ...
SELECT * FROM reviews WHERE book_id = N;

Those are the “N” queries that come on top of the “+1”: one additional query per book.
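The cascade can be reproduced without a database or a GraphQL server. The sketch below is a simplified stand-in, not the real GraphQL engine: `executeBooksWithReviews` is a hypothetical function that mimics what the executor does for this one query, the repositories are in-memory fakes that only log the SQL they would run, and everything is synchronous for brevity (real resolvers usually return promises).

```typescript
type Book = { id: string; title: string };
type Review = { id: string; rating: number; bookId: string };

// Hypothetical in-memory data and a log of the SQL that would be executed.
const queryLog: string[] = [];
const books: Book[] = [
  { id: "1", title: "Book one" },
  { id: "2", title: "Book two" },
  { id: "3", title: "Book three" },
];
const reviews: Review[] = [
  { id: "r1", rating: 5, bookId: "1" },
  { id: "r2", rating: 3, bookId: "1" },
  { id: "r3", rating: 4, bookId: "2" },
];

const booksRepository = {
  findAll(): Book[] {
    queryLog.push("SELECT * FROM books");
    return books;
  },
};

const reviewsRepository = {
  findByBookId(bookId: string): Review[] {
    queryLog.push(`SELECT * FROM reviews WHERE book_id = ${bookId}`);
    return reviews.filter((r) => r.bookId === bookId);
  },
};

// Same resolver map as above.
const resolvers = {
  Query: {
    books: () => booksRepository.findAll(),
  },
  Book: {
    reviews: (book: Book) => reviewsRepository.findByBookId(book.id),
  },
};

// Simplified stand-in for the execution of `{ books { reviews { rating } } }`:
// resolve the root field once, then resolve `reviews` separately for each book.
function executeBooksWithReviews() {
  const parents = resolvers.Query.books();
  return parents.map((book) => ({
    reviews: resolvers.Book.reviews(book).map((r) => ({ rating: r.rating })),
  }));
}

const data = executeBooksWithReviews();
// queryLog now holds 4 entries: 1 for the books, then 1 per book (N = 3).
```

Note that the third book has no reviews at all, yet it still costs a query: the resolver cannot know that before asking.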

The bill adds up fast

Concretely, with 50 books in the database, we execute 50 + 1 = 51 SQL queries just to fetch the books and their reviews. And that’s in a case as simple as this one, with a trivial volume of data.

With a server running, the phenomenon jumps out in the logs: a single initial query, immediately followed by a burst of nearly identical queries that differ only in the book identifier.

Server running at http://127.0.0.1:4000
Query > "SELECT * FROM books"
Query > "SELECT * FROM reviews WHERE book_id = 1"
Query > "SELECT * FROM reviews WHERE book_id = 2"
Query > "SELECT * FROM reviews WHERE book_id = 3"
Query > "SELECT * FROM reviews WHERE book_id = 4"
Query > "SELECT * FROM reviews WHERE book_id = 5"
Query > "SELECT * FROM reviews WHERE book_id = 6"
Query > "SELECT * FROM reviews WHERE book_id = 7"
Query > "SELECT * FROM reviews WHERE book_id = 8"
Query > "SELECT * FROM reviews WHERE book_id = 9"
Query > "SELECT * FROM reviews WHERE book_id = 10"

It’s a disaster on several levels. Every extra query means a network round-trip to the database, a connection tied up, and latency piling up. Where a single well-designed query would suffice, we issue dozens, and that number grows linearly with the amount of data. Move up to 500 books, or 5,000, and response time blows up under a load that nothing justifies. The worst part is that all of this stays invisible as long as the test dataset is tiny: the problem only reveals itself in production, at the worst possible moment.
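To put rough numbers on that linear growth, here is a back-of-the-envelope sketch. The 2 ms round-trip is an assumed figure chosen purely for illustration, and `queryCount` and `sequentialWaitMs` are hypothetical helpers. The estimate treats the queries as running strictly one after another; a server that runs them concurrently will wait less in wall-clock time, but the number of queries hitting the database stays the same.

```typescript
// Number of queries issued for n parent items under the N+1 pattern.
function queryCount(n: number): number {
  return n + 1;
}

// Time spent waiting on the database if the queries run one after another.
function sequentialWaitMs(n: number, roundTripMs: number): number {
  return queryCount(n) * roundTripMs;
}

const roundTripMs = 2; // hypothetical latency, for illustration only

const estimates = [50, 500, 5000].map((n) => ({
  books: n,
  queries: queryCount(n),
  waitMs: sequentialWaitMs(n, roundTripMs),
}));
// 50 books   ->   51 queries,   102 ms
// 500 books  ->  501 queries,  1002 ms
// 5000 books -> 5001 queries, 10002 ms
```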

Conclusion

That’s how the N+1 problem manifests with GraphQL: an initial query that mechanically triggers N others, because of the way resolvers are invoked field by field, item by item. It’s not really a bug (the code is correct and does exactly what it’s asked to do), but a data-access pattern you have to learn to spot.

You have to be aware of this trap before you can adopt the right solutions. The best-known one in the GraphQL ecosystem is to batch and cache these calls, reducing the cascade to a handful of queries: that’s the role of the DataLoader pattern, which I cover in the next part of this series, Solving the N+1 problem with DataLoader.