Hey there! Glad you are here, and welcome to the GraphQL Pagination tutorial. The goal of this tutorial is to guide you through setting up pagination in your GraphQL server.
In GraphQL there are two major approaches: the first is offset-based pagination, and the second is cursor-based pagination.
As a Ninja once said, "行きましょ" ("let's go") 🙈.
Offset Pagination
Now that you have your blog up and running, you notice that you are querying all your posts at once. This is not efficient, especially once you go past 100 blog posts: the result set keeps growing over time, and the query takes longer and longer to execute.
As someone who knows SQL well, you will probably think of pagination to solve this. You do it by adding two clauses to your SQL query: OFFSET (1) and LIMIT (2). Now you are asking how to achieve this in your GraphQL server.
Allow me to show you an example: fetching 10 blog posts out of all your posts, starting from the 11th one. Your query will look like this:
query {
  posts(limit: 10, offset: 10) {
    title
    content
  }
}
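On the server side, the resolver behind such a query simply skips `offset` rows and returns at most `limit` rows. This is a minimal in-memory sketch, not the article's code: the `posts` array stands in for the posts table, and `postsResolver` is a hypothetical resolver name.

```javascript
// In-memory illustration of what `limit`/`offset` do. The `posts`
// array stands in for a posts table of 25 rows.
const posts = Array.from({ length: 25 }, (_, i) => ({
  id: i + 1,
  title: `Post ${i + 1}`,
}));

// Resolver-style function: skip `offset` rows, return up to `limit`.
function postsResolver(parent, { limit, offset }) {
  return posts.slice(offset, offset + limit);
}

const page = postsResolver(null, { limit: 10, offset: 10 });
// `page` holds the 11th through 20th posts.
```

In a real server the slice would be performed by the database via `LIMIT`/`OFFSET`, not in application memory.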
Until now everything seems to be working. This type of pagination works great if your data is static or small, and it gives the user a quick way to fetch the next page of data.
But this approach has downsides and issues that need to be addressed.
Performance
As mentioned before, on small datasets the offset approach works well and will not break the user experience.
But as the data in your dataset grows and you try to fetch the farthest results 😩, it will take longer than you can imagine (you may 😢).
But how? (you ask). Let's break it down easily; take this query as an example:
query {
  posts(limit: 10, offset: 10000) {
    title
    content
  }
}
The query fetches 10 posts with an offset of 10K. At first glance you might think that's it: your database engine will directly understand the offset and start reading from that row.
Well! That won't happen, because of the way a relational database engine works: it still has to read and discard all the skipped rows on the server, so a large OFFSET is inefficient.
You see?! It's bad, right? It's like wanting to start reading a book from a certain page, but still having to flip through and count every page yourself, with no index to help.
Don't worry, we got your back with a better solution 😎
Cursor pagination
This approach sets a pointer to a specific record in your table. For it to work at its finest, the cursor must be unique and sequential. This way, you always fetch the rows after a certain record, instead of relying on the position of rows inside your table.
Let's have an example to understand this more:
query {
  posts(limit: 10, publisherId: 1, cursor: 1) {
    title
    content
  }
}
The query skips every row whose id is less than or equal to the cursor value and returns the next 10 results. The cursor here refers to the id of our posts.
Your resolver will look something like this:
P.S.: We are using Sequelize as our ORM framework.
// `Op` comes from Sequelize: const { Op } = require('sequelize');
{
  async posts(parent, { limit, publisherId, cursor }) {
    return Posts.findAll({
      where: {
        publisherId,
        // Only rows strictly after the cursor (WHERE id > :cursor)
        id: {
          [Op.gt]: cursor
        }
      },
      limit
    });
  }
}
Cursor-based pagination is more performant than offsets because we can leverage database indexes on the columns in the WHERE clause that serve as cursors.
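To page through results, the client takes the id of the last row it received and sends it back as the next cursor. A minimal in-memory sketch of that round trip (the array stands in for the posts table, and the filter plays the role of `WHERE id > :cursor`; names are illustrative):

```javascript
// 25 fake posts with sequential ids.
const posts = Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }));

// Equivalent of: WHERE id > :cursor ORDER BY id LIMIT :limit
function fetchPage({ limit, cursor }) {
  return posts.filter((p) => p.id > cursor).slice(0, limit);
}

const first = fetchPage({ limit: 10, cursor: 0 });      // ids 1..10
const nextCursor = first[first.length - 1].id;          // 10
const second = fetchPage({ limit: 10, cursor: nextCursor }); // ids 11..20
```

Because the database only scans rows past the cursor, each page costs roughly the same regardless of how deep into the dataset you are.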
There is more to this solution: many libraries and frameworks out there present their own cursor-style pagination. We can't go over each of them, so we will stick to the one most used with React.
Relay style pagination
Relay is a framework for building data-driven React apps. Relay provides a standard mechanism to slice data and paginate the results.
query {
  posts(first: 10, after: "cursor") {
    edges {
      cursor
      node {
        id
        title
        content
      }
    }
    pageInfo {
      hasNextPage
    }
  }
}
Here, slicing is done using the first argument, similar to the limit we applied in the earlier queries. Pagination is done using the after argument, which passes a cursor and expects results after that cursor. The query also asks for hasNextPage, which tells the client whether there are more results, since cursor pagination has no concept of a total number of pages.
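One common way to resolve such a connection is to fetch `first + 1` rows from the database and use the extra row only to compute `hasNextPage`. This is a hypothetical sketch, not Relay's own code; the base64-encoded id cursor is a convention many servers use, not a requirement of the spec.

```javascript
// Build a Relay-style connection from raw rows. `rows` should contain
// up to `first + 1` items; the extra one only signals another page.
function toConnection(rows, first) {
  const hasNextPage = rows.length > first;
  const edges = rows.slice(0, first).map((node) => ({
    // Opaque cursor: base64-encoded id (a common convention).
    cursor: Buffer.from(String(node.id)).toString("base64"),
    node,
  }));
  return { edges, pageInfo: { hasNextPage } };
}

// Simulate fetching 11 rows for a `first: 10` request.
const rows = Array.from({ length: 11 }, (_, i) => ({ id: i + 1 }));
const connection = toConnection(rows, 10);
// 10 edges are returned, and the 11th row tells us hasNextPage is true.
```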
Recap
We learned two different pagination styles and walked through both of them. To recap our tutorial, let's list again what each approach offers.
Offset
Pros
- It gives you the ability to see the total number of pages and your progress through that total.
- It gives you the ability to jump to a specific page within the set.
- It’s easy to implement as long as there is an explicit ordering of the results from a query.
Cons
- Using LIMIT OFFSET doesn't scale well for large datasets. As the offset grows, the database still has to read up to offset + count rows from disk before discarding the first offset rows and returning only count rows.
- If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.
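The second point can be seen with a tiny in-memory example: if a new post is inserted between two page requests, offset-based page 2 re-serves a row the client already saw. A hypothetical sketch (numbers stand in for post ids, newest first):

```javascript
// Posts ordered newest-first.
let posts = [5, 4, 3, 2, 1];

// Client fetches page 1 (limit 2, offset 0).
const page1 = posts.slice(0, 2); // [5, 4]

// A new post (id 6) is published before the client asks for page 2.
posts = [6, ...posts]; // [6, 5, 4, 3, 2, 1]

// Client fetches page 2 (limit 2, offset 2).
const page2 = posts.slice(2, 4); // [4, 3] — id 4 shows up again
```

With a cursor (`id < 4` after page 1), the window would shift along with the insert and no duplicate would appear.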
Cursor
Pros
- This will scale well for large datasets. We're using a WHERE clause to fetch rows with index values after the last index from the previous page. This lets us leverage the index on the column, and the database doesn't have to read any rows we've already seen. We're also not returning the total number of pages or items in the set, so we avoid having to calculate the full result set on each request.
- The pagination process is stabilized. Instead of calculating from scratch on each request based on the total number of items, we're always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.
Cons
- The cursor must be based on a unique, sequential column (or columns) in the source table.
- There is no concept of the total number of pages or results in the set.
- The client can’t jump to a specific page.
Dictionary
- (1): OFFSET says to skip that many rows before beginning to return rows.
- (2): LIMIT is an optional clause of the SELECT statement that returns a subset of rows returned by the query.
With that said, I hope you enjoyed reading the article and understood the differences between both approaches 😄
Brought to you with ❤️
=====================
Github repo: https://github.com/JefferyHus/graphql-pagination
Top comments (11)
Thanks for this article El Housseine Jaafari!
I am implementing this for myself and using the ID as the cursor works just fine when sorting by time, but in my case I also want to sort by "rating" (a column in my database).
Would it be appropriate to make the cursor be in the shape of {rating}:{created_at} encoded as base64? For example: 30:1613828846856 for a rating of 30 with timestamp 1613828846856 (which would be MzA6MTYxMzgyODg0Njg1Ng== encoded as base64). I'm not sure if this is hacky, but it's the only kind of solution I can come up with for a SQL query which ends with ORDER BY rating, created_at.
Your case is not easy with a relational database. One solution is to create a view from your query, add an attribute that combines the rating and the ID, and then fetch data from that view; that way the pagination will work like a charm for you.
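Another option (besides the view) is a compound keyset cursor: encode both columns into one opaque cursor, as the commenter suggested, and compare them pairwise in the WHERE clause, i.e. `(rating > :r) OR (rating = :r AND created_at > :t)` for `ORDER BY rating, created_at`. A hypothetical in-memory sketch of the encoding and comparison (names and row shape are illustrative):

```javascript
// Encode/decode a {rating}:{createdAt} cursor as base64.
const encodeCursor = ({ rating, createdAt }) =>
  Buffer.from(`${rating}:${createdAt}`).toString("base64");

const decodeCursor = (cursor) => {
  const [rating, createdAt] = Buffer.from(cursor, "base64")
    .toString()
    .split(":");
  return { rating: Number(rating), createdAt: Number(createdAt) };
};

// Keyset equivalent of:
//   WHERE (rating, created_at) > (:rating, :createdAt)
//   ORDER BY rating, created_at LIMIT :limit
function fetchAfter(rows, cursor, limit) {
  const { rating, createdAt } = decodeCursor(cursor);
  return rows
    .filter(
      (r) =>
        r.rating > rating ||
        (r.rating === rating && r.createdAt > createdAt)
    )
    .slice(0, limit);
}
```

Unlike a single-column cursor, this stays correct when many rows share the same rating, because the timestamp breaks ties.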
Great article El Housseine!
How would the resolver's implementation look if the ids are not in sequence (ordered)?
Thanks for your comments and feedback :)
Your question is a good one. In the article's Cons section I mention that cursor-style pagination can't be achieved without a unique, sequential key. In most cases we have createdAt, which can be used as the cursor key when your primary key is a UUIDv4 or similar.
I see, makes sense!
The issue is that createdAt might not be unique so we could have items on a page re-displayed on the next page.
We could probably use something like a db row-number function (the index of each item in a query result), or (if not supported by the db engine) still use the id and fetch the X next values after the row that has that ID (probably not a simple query, though).
That would complicate things a lot. You can use a timestamp value in your createdAt, for example; I don't believe there will be more than one row inserted in the same microsecond, right? 🤔
Regarding timestamps, that would be a good solution indeed.
Actually, if supported, row index is pretty straightforward: you simply add it to your SELECT and you get an index for each row in the result; that would be the cursor value, I guess.
Example here for PostgreSQL: postgresqltutorial.com/postgresql-...
Do you see any issue with using ROW_NUMBER()?
PS: Just rambling on here; I'll probably use timestamps in my apps, as you've suggested ;)
The only issue I see with using ROW_NUMBER is that you will always have to select all the results from a table, assign a row number, and only then pull the results you want. This is exactly how limit & offset work, so it isn't going to be of any help if your database is growing fast. As I said, if you include microseconds in your timestamp value, there will mostly be no collisions.
You're right! You'd just have to have these micro-timestamps in your database and have end users tolerant of potential (even rare) collisions; otherwise, I guess you'd have to deal with less performant fetching strategies.
Thanks for sharing jeffery
sure thing mate :)