Why query pagination by OFFSET might be a bad idea

SELECT * FROM user ORDER BY user_id LIMIT 10 OFFSET 200;

This SQL query basically tells the database to “Skip the first 200 rows of the user table and get me the 10 rows that come after that“. If you want your database to grow without crashing your application, you can’t avoid query pagination. That is, loading only a few records from the database into application memory as needed, rather than the whole table at once. If you skip it, you will regret that choice as the database grows.

OFFSET and LIMIT are very popular among application developers for implementing query pagination. OFFSET tells the database to skip some rows before returning the result, and LIMIT tells it how many rows to return after the skipping. We kind of take them for granted.

OFFSET is used extensively for one main reason: it is very convenient.

  1. Any database of your choice supports it (even though the query syntax might be different)
  2. You can just specify the number of rows to skip and the database will do the work for you.
  3. You can easily move back and forth between the pages of the result set if you know the page size (the LIMIT). That is, with pages numbered from 1, the formula to find the OFFSET value to be used in a query to fetch a particular page is
    • OFFSET = (PAGE NUMBER − 1) * LIMIT
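The offset arithmetic above can be sketched in a few lines (a minimal sketch assuming pages are numbered from 1, matching the page-10 example later in this article):

```python
def offset_for_page(page_number: int, limit: int) -> int:
    """Number of rows to skip so the query returns the given page."""
    return (page_number - 1) * limit

# Page 1 starts at the beginning; page 3 with a page size of 10 skips 20 rows;
# page 10 with a page size of 10 skips 90 rows.
print(offset_for_page(1, 10))   # 0
print(offset_for_page(3, 10))   # 20
print(offset_for_page(10, 10))  # 90
```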

But did you know that the same OFFSET clause that makes pagination so convenient for the application can hurt query performance? It is true.

The problem

OFFSET is nothing but a “Skip the rows” functionality. That is, you tell the database to skip some rows before returning the result. See this example.

SELECT * FROM user ORDER BY user_id LIMIT 10 OFFSET 1000;

In plain English, it tells the database to “Skip the first 1000 rows of the user table and get the 10 rows that come after that“.

And how will the database skip the 1000 rows? It will go through those 1000 rows while keeping a counter. When the counter hits 1001, it will start fetching the next 10 rows to produce your result.

Now, that is a problem. No matter how many rows you ask the database to skip, it still has to go through all of them to count them. If the OFFSET value is 1000000, the database will walk through 1000000 rows, counting them, before it starts fetching your data.

This gets worse as the value of OFFSET gets higher and higher.
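The behaviour above can be demonstrated with a toy `user` table in SQLite (a hypothetical schema standing in for the article's table; the engine still has to walk past every skipped row before returning results):

```python
import sqlite3

# In-memory database with a small user table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user (user_id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 1001)],
)

# Fetch "page 3" with a page size of 10: the engine walks the 20
# skipped rows before it starts returning results.
rows = conn.execute(
    "SELECT user_id FROM user ORDER BY user_id LIMIT 10 OFFSET 20"
).fetchall()
print([r[0] for r in rows])  # [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
```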

Look at the sample data given below

OFFSET value    Query duration (ms)
0               1
50              1
1000            10
10000           160
25000           450
50000           1000
100000          1800

Another problem with OFFSET is that the pages drift if insertions or deletions happen while you are moving through the pages. You would end up either processing the same record multiple times or not processing some records at all.

The solution

The solution to this problem is the “seek” method with an index. Simply put, sort the result set and specify the exact row up to which the database should skip, rather than the number of rows to skip. This is something you do with your query genius 😉 rather than with an option like OFFSET that the database provides. Let’s take an example.

Assume, in the queries we saw earlier, that the user_id column is indexed and auto-incremented, starting at 1. Now you can avoid the OFFSET clause by constructing your queries as follows.

The query for the first page would look like

SELECT * FROM user WHERE user_id > 0 ORDER BY user_id LIMIT 10;

Notice that we removed the OFFSET clause. This query gets you the first 10 user records sorted by user_id, and the user ids of the records would be 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.

The user_id of the last user on the first page is 10, so the query for the second page would look like

SELECT * FROM user WHERE user_id > 10 ORDER BY user_id LIMIT 10;

Notice that we are skipping all the records with user_id less than or equal to 10. This gets you the first 10 user records with user_id greater than 10, which is effectively your second page of the result. The user ids in this result set would be 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20.

Similarly, the query for the third page would look like

SELECT * FROM user WHERE user_id > 20 ORDER BY user_id LIMIT 10;

This gets you the first 10 user records with user_id greater than 20, which is effectively your third page of the result. The user ids in this result set would be 21, 22, 23, 24, 25, 26, 27, 28, 29 and 30.

and so on..
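The page-by-page walkthrough above can be sketched as a small helper in Python with SQLite (a hypothetical `user` table; the seek query resumes just after the last id seen on the previous page):

```python
import sqlite3

# Toy table with 35 users (hypothetical schema for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user (user_id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 36)],
)

def fetch_page(conn, last_seen_id, limit=10):
    """Seek pagination: start just after the previous page's last id."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT user_id FROM user WHERE user_id > ? "
            "ORDER BY user_id LIMIT ?",
            (last_seen_id, limit),
        )
    ]

page1 = fetch_page(conn, 0)          # ids 1..10
page2 = fetch_page(conn, page1[-1])  # ids 11..20
page3 = fetch_page(conn, page2[-1])  # ids 21..30
```

Note that the cursor for each page is simply the last id of the page before it, not a row count, which is why deletions cannot make the pages drift.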

What if the user with user_id 30 was deleted, so the last user on the third page now has user_id 31 instead of 30? Then you write the query for your fourth page as

SELECT * FROM user WHERE user_id > 31 ORDER BY user_id LIMIT 10;

This kind of seeking solves the problems with OFFSET in the following ways

  1. The database doesn’t have to go through the skipped records, because there is no counting involved. It jumps straight to the first record we need in the response.
  2. Since the field we use for pagination (user_id in this case) is indexed, the seek itself is fast.
  3. Since we are no longer depending on row counts for pagination, the pages do not drift even if insertions or deletions happen while we move through the pages.

See the comparison below

That’s nice. But what if I want to move back and forth between the pages?

One of the main advantages of the OFFSET query is the ease of navigating back and forth between pages, as long as you know the page number and the page size (the LIMIT). The seek method needs the id of the last item of the previous page to get to the next page, which is not handy when you want to jump from page 1 to page 10.

One way to solve this problem is to use OFFSET and LIMIT to find the id of the first item of the page, and then use that id with the seek method. For example, to fetch page 10 with a limit of 10, you can do it as follows.

Step 1: Get the user id of the first user on page 10, using OFFSET and LIMIT on the indexed user_id column

SELECT user_id FROM user ORDER BY user_id LIMIT 1 OFFSET 90;

Step 2: Use the resulting user id to fetch page 10 of the users query

SELECT * FROM user WHERE user_id >= ? ORDER BY user_id LIMIT 10;

The first query will be faster than using OFFSET in the actual data fetch, since it can be answered entirely from the index on user_id (a covering index): the database skips through compact index entries instead of full table rows.

So, overall, the combination of these two queries will be faster than using the OFFSET clause in the data-fetch query when the page number is large. However, it brings back the page-drifting problem, since we are again relying on row counts for pagination.
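The two steps above can be combined into one helper (a sketch with a hypothetical SQLite `user` table; pages are numbered from 1, so page 10 with a limit of 10 skips 90 rows in step 1):

```python
import sqlite3

# Toy table with 200 users (hypothetical schema for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO user (user_id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 201)],
)

def jump_to_page(conn, page_number, limit=10):
    # Step 1: cheap, index-only lookup of the first id on the page.
    row = conn.execute(
        "SELECT user_id FROM user ORDER BY user_id LIMIT 1 OFFSET ?",
        ((page_number - 1) * limit,),
    ).fetchone()
    if row is None:
        return []  # past the end of the result set
    # Step 2: seek from that id (inclusive) to fetch the whole page.
    return [
        r[0]
        for r in conn.execute(
            "SELECT user_id FROM user WHERE user_id >= ? "
            "ORDER BY user_id LIMIT ?",
            (row[0], limit),
        )
    ]

print(jump_to_page(conn, 10))  # ids 91 through 100
```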

Some things to watch out for while seeking

You have to make sure of the following things for the seek method to work properly

  1. The seek column (user_id in this case) must be indexed.
  2. Either use a seek column that has no duplicate values, or adjust your query to tolerate duplicates. (You may even use OFFSET here, to skip over duplicate values of the seek column that were already returned in previous pages. The query would not be as slow, since you are skipping through only the duplicate values of the seek column and not the entire column.)
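One common way to tolerate duplicates, sketched below, is to pair the non-unique seek column with the unique user_id as a tie-breaker, so the cursor is the (value, id) pair of the previous page's last row. The `country` column here is hypothetical, purely for illustration:

```python
import sqlite3

# Toy table where the seek column (country) has duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO user (user_id, country) VALUES (?, ?)",
    [(1, "DE"), (2, "DE"), (3, "FR"), (4, "FR"), (5, "FR"), (6, "IN")],
)

def next_page(conn, last_country, last_id, limit=2):
    # Rows strictly after the (country, user_id) pair of the previous
    # page's last row; a portable form of a row-value comparison.
    return conn.execute(
        "SELECT country, user_id FROM user "
        "WHERE country > ? OR (country = ? AND user_id > ?) "
        "ORDER BY country, user_id LIMIT ?",
        (last_country, last_country, last_id, limit),
    ).fetchall()

page1 = next_page(conn, "", 0)                        # [('DE', 1), ('DE', 2)]
page2 = next_page(conn, page1[-1][0], page1[-1][1])   # [('FR', 3), ('FR', 4)]
```

Because the tie-breaker makes every cursor unique, duplicate country values can never repeat across pages or be skipped.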

And finally, choose the method that makes the most sense

Even though OFFSET is slower, sometimes it is better to use it over seeking, largely because it is more convenient and easier to implement. You won’t even feel the performance difference for small OFFSET values. So if your result set is small, it makes more sense to go with OFFSET. The seek method is better when the result set is very large and you need to move through it page by page.

Know your database, analyse the situation and choose wisely.
