Hey there! Glad you are here, and welcome to the GraphQL Pagination tutorial. The goal of this tutorial is to guide you through setting up pagination in your GraphQL server.
In GraphQL there are two major approaches: the first is offset-based pagination, and the second is cursor-based pagination.
As a Ninja once said, "行きましょ" ("let's go") 🙈.
Offset Pagination
Now that you have your blog up and running, you notice that you are querying all your posts at once. This is not efficient, especially once you go past 100 blog posts: the result set keeps growing over time, and the query takes longer and longer to execute.
As someone who knows SQL well, you will probably think of pagination to solve this. You do it by adding two clauses to your SQL query: OFFSET (1) and LIMIT (2). Now you are asking how to achieve this in your GraphQL server.
Allow me to show you an example: fetching 10 blog posts out of all your posts, starting from the 11th one. Your query will look like this:
query {
  posts(limit: 10, offset: 10) {
    title
    content
  }
}
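On the server side, the resolver behind such a query simply skips `offset` rows and returns at most `limit` rows. This is a minimal in-memory sketch, not the article's code: the `posts` array stands in for the posts table, and `postsResolver` is a hypothetical resolver name.

```javascript
// In-memory illustration of what `limit`/`offset` do. The `posts`
// array stands in for a posts table of 25 rows.
const posts = Array.from({ length: 25 }, (_, i) => ({
  id: i + 1,
  title: `Post ${i + 1}`,
}));

// Resolver-style function: skip `offset` rows, return up to `limit`.
function postsResolver(parent, { limit, offset }) {
  return posts.slice(offset, offset + limit);
}

const page = postsResolver(null, { limit: 10, offset: 10 });
// `page` holds the 11th through 20th posts.
```

In a real server the slice would be performed by the database via `LIMIT`/`OFFSET`, not in application memory.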
Until now everything seems to be working. This type of pagination works great if your data is static or small, and it gives the user a quick way to fetch the next page of data.
But this approach has downsides and issues that need to be addressed.
Performance
As mentioned before, on small datasets the offset approach works well and will not break the user experience.
But as the data in your dataset grows and you try to fetch the farthest results 😩, it will take longer than you can imagine (you may 😢).
But how? (you ask). Let's break it down easily; take this query as an example:
query {
  posts(limit: 10, offset: 10000) {
    title
    content
  }
}
The query fetches 10 posts with an offset of 10K. At first glance you might think that's it: your database engine will directly understand the offset and start reading from that row.
Well! That won't happen, because of the way a relational database engine works: it still has to read and discard all the skipped rows on the server, so a large OFFSET is inefficient.
You see?! It's bad, right? It's like wanting to start reading a book from a certain page, but still having to flip through and count every page yourself, with no index to help.
Don't worry, we got your back with a better solution 😎
Cursor pagination
This approach sets a pointer to a specific record in your table. For it to work at its finest, the cursor must be unique and sequential. This way, you always fetch the rows after a certain record, instead of relying on the position of rows inside your table.
Let's have an example to understand this more:
query {
  posts(limit: 10, publisherId: 1, cursor: 1) {
    title
    content
  }
}
The query skips every row whose id is less than or equal to the cursor value and returns the next 10 results. The cursor here refers to the id of our posts.
Your resolver will look something like this:
P.S.: We are using Sequelize as our ORM framework.
// `Op` comes from Sequelize: const { Op } = require('sequelize');
{
  async posts(parent, { limit, publisherId, cursor }) {
    return Posts.findAll({
      where: {
        publisherId,
        // Only rows strictly after the cursor (WHERE id > :cursor)
        id: {
          [Op.gt]: cursor
        }
      },
      limit
    });
  }
}
Cursor-based pagination is more performant than offsets because we can leverage database indexes on the columns in the WHERE clause that serve as cursors.
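To page through results, the client takes the id of the last row it received and sends it back as the next cursor. A minimal in-memory sketch of that round trip (the array stands in for the posts table, and the filter plays the role of `WHERE id > :cursor`; names are illustrative):

```javascript
// 25 fake posts with sequential ids.
const posts = Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }));

// Equivalent of: WHERE id > :cursor ORDER BY id LIMIT :limit
function fetchPage({ limit, cursor }) {
  return posts.filter((p) => p.id > cursor).slice(0, limit);
}

const first = fetchPage({ limit: 10, cursor: 0 });      // ids 1..10
const nextCursor = first[first.length - 1].id;          // 10
const second = fetchPage({ limit: 10, cursor: nextCursor }); // ids 11..20
```

Because the database only scans rows past the cursor, each page costs roughly the same regardless of how deep into the dataset you are.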
There is more to this solution: many libraries and frameworks out there present their own cursor-style pagination. We can't go over each of them, so we will stick to the one most used with React.
Relay style pagination
Relay is a framework for building data-driven React apps. Relay provides a standard mechanism to slice data and paginate the results.
query {
  posts(first: 10, after: "cursor") {
    edges {
      cursor
      node {
        id
        title
        content
      }
    }
    pageInfo {
      hasNextPage
    }
  }
}
Here, slicing is done using the first argument, similar to the limit we applied in the earlier queries. Pagination is done using the after argument, which passes a cursor and expects results after that cursor. The query also asks for hasNextPage, which tells the client whether there are more results, since cursor pagination has no concept of a total number of pages.
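One common way to resolve such a connection is to fetch `first + 1` rows from the database and use the extra row only to compute `hasNextPage`. This is a hypothetical sketch, not Relay's own code; the base64-encoded id cursor is a convention many servers use, not a requirement of the spec.

```javascript
// Build a Relay-style connection from raw rows. `rows` should contain
// up to `first + 1` items; the extra one only signals another page.
function toConnection(rows, first) {
  const hasNextPage = rows.length > first;
  const edges = rows.slice(0, first).map((node) => ({
    // Opaque cursor: base64-encoded id (a common convention).
    cursor: Buffer.from(String(node.id)).toString("base64"),
    node,
  }));
  return { edges, pageInfo: { hasNextPage } };
}

// Simulate fetching 11 rows for a `first: 10` request.
const rows = Array.from({ length: 11 }, (_, i) => ({ id: i + 1 }));
const connection = toConnection(rows, 10);
// 10 edges are returned, and the 11th row tells us hasNextPage is true.
```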
Recap
We learned two different pagination styles and walked through both of them. To recap our tutorial, let's list again what each approach offers.
Offset
Pros
- It gives you the ability to see the total number of pages and your progress through that total.
- It gives you the ability to jump to a specific page within the set.
- It’s easy to implement as long as there is an explicit ordering of the results from a query.
Cons
- Using LIMIT OFFSET doesn't scale well for large datasets. As the offset grows, the database still has to read up to offset + count rows from disk before discarding the first offset rows and returning only count rows.
- If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.
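The second point can be seen with a tiny in-memory example: if a new post is inserted between two page requests, offset-based page 2 re-serves a row the client already saw. A hypothetical sketch (numbers stand in for post ids, newest first):

```javascript
// Posts ordered newest-first.
let posts = [5, 4, 3, 2, 1];

// Client fetches page 1 (limit 2, offset 0).
const page1 = posts.slice(0, 2); // [5, 4]

// A new post (id 6) is published before the client asks for page 2.
posts = [6, ...posts]; // [6, 5, 4, 3, 2, 1]

// Client fetches page 2 (limit 2, offset 2).
const page2 = posts.slice(2, 4); // [4, 3] — id 4 shows up again
```

With a cursor (`id < 4` after page 1), the window would shift along with the insert and no duplicate would appear.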
Cursor
Pros
- This will scale well for large datasets. We're using a WHERE clause to fetch rows with index values after the last index from the previous page. This lets us leverage the index on the column, and the database doesn't have to read any rows we've already seen. We're also not returning the total number of pages or items in the set, so we avoid having to calculate the full result set on each request.
- The pagination process is stabilized. Instead of calculating from scratch on each request based on the total number of items, we're always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.
Cons
- The cursor must be based on a unique, sequential column (or columns) in the source table.
- There is no concept of the total number of pages or results in the set.
- The client can’t jump to a specific page.
Dictionary
- (1): OFFSET says to skip that many rows before beginning to return rows.
- (2): LIMIT is an optional clause of the SELECT statement that returns a subset of rows returned by the query.
With that said, I hope you enjoyed reading the article and understood the differences between both approaches 😄
Brought to you with ❤️
=====================
Github repo: https://github.com/JefferyHus/graphql-pagination
Top comments (11)
Thanks for this article El Housseine Jaafari!
I am implementing this for myself and using the ID as the cursor works just fine when sorting by time, but in my case I also want to sort by "rating" (a column in my database).
Would it be appropriate to make the cursor be in the shape of {rating}:{created_at} encoded as base64? For example: 30:1613828846856 for a rating of 30 with timestamp 1613828846856 (which would be MzA6MTYxMzgyODg0Njg1Ng== encoded as base64). I'm not sure if this is hacky, but it's the only kind of solution I can come up with for a SQL query which ends with ORDER BY rating, created_at.
Your case is not easy with a relational database. One solution is to create a view from your query, add an attribute that combines the rating and the ID, and then fetch data from that view; that way the pagination will work like a charm for you.
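Another option (besides the view) is a compound keyset cursor: encode both columns into one opaque cursor, as the commenter suggested, and compare them pairwise in the WHERE clause, i.e. `(rating > :r) OR (rating = :r AND created_at > :t)` for `ORDER BY rating, created_at`. A hypothetical in-memory sketch of the encoding and comparison (names and row shape are illustrative):

```javascript
// Encode/decode a {rating}:{createdAt} cursor as base64.
const encodeCursor = ({ rating, createdAt }) =>
  Buffer.from(`${rating}:${createdAt}`).toString("base64");

const decodeCursor = (cursor) => {
  const [rating, createdAt] = Buffer.from(cursor, "base64")
    .toString()
    .split(":");
  return { rating: Number(rating), createdAt: Number(createdAt) };
};

// Keyset equivalent of:
//   WHERE (rating, created_at) > (:rating, :createdAt)
//   ORDER BY rating, created_at LIMIT :limit
function fetchAfter(rows, cursor, limit) {
  const { rating, createdAt } = decodeCursor(cursor);
  return rows
    .filter(
      (r) =>
        r.rating > rating ||
        (r.rating === rating && r.createdAt > createdAt)
    )
    .slice(0, limit);
}
```

Unlike a single-column cursor, this stays correct when many rows share the same rating, because the timestamp breaks ties.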
Great article El Housseine!
How would the resolver's implementation look if the ids are not in sequence (ordered)?
Thanks for your comments and feedback :)
Your question is a good one. In the article's Cons section I mention that cursor-style pagination can't be achieved without a unique, sequential key. In most cases we have createdAt, which can be used as the cursor key when your primary key is a UUIDv4 or similar.
I see, makes sense!
The issue is that createdAt might not be unique so we could have items on a page re-displayed on the next page.
We could probably use something like a db row-number function (the index of each item in a query result), or (if not supported by the db engine) still use the id and fetch the X next values after the row that has that ID (probably not a simple query, though).
That would complicate things a lot. You can use a timestamp value in your createdAt, for example; I don't believe there will be more than one row inserted in the same microsecond, right? 🤔
Regarding timestamps, that would be a good solution indeed.
Actually, if supported, row index is pretty straightforward: you simply add it to your SELECT and you get an index for each row in the result; that would be the cursor value, I guess.
Example here for PostgreSQL: postgresqltutorial.com/postgresql-...
Do you see any issue with using ROW_NUMBER()?
PS: Just rambling on here; I'll probably use timestamps in my apps, as you've suggested ;)
The only issue I see with using ROW_NUMBER is that you will always have to select all the results from a table, assign a row number, and only then pull the results you want. This is exactly how limit & offset work, so it isn't going to be of any help if your database is growing fast. As I said, if you include microseconds in your timestamp value, there will mostly be no collisions.
You're right! You'd just have to have these micro-timestamps in your database and have end users tolerant of potential (even rare) collisions; otherwise, I guess you'd have to deal with less performant fetching strategies.
Thanks for sharing jeffery
sure thing mate :)