Category Archives: PostgreSQL

The hardest task to do well in every frontend/backend system – paging

What is the hardest code to write efficiently in pretty much every frontend and backend system? I think it is paging.

Nearly every system I’ve worked on as a full-stack developer has had some sort of reporting, whether it is searching and filtering or news-feed style item display, and all of these depend on paging. Whilst simple paging is trivial to implement, making it work quickly at scale is a very tricky challenge.

A simple approach (based on SQL, but it could be any backend data store) would be something like the following, where the <angle-bracket> parts stand in for the real table, filter, ordering and paging values:
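
    -- total number of matching rows, used to build the pager
    SELECT COUNT(*) FROM <table> WHERE <query>;

    -- the rows for the requested page
    SELECT *
      FROM <table>
     WHERE <query>
     ORDER BY <order>
     LIMIT <rows_per_page> OFFSET <page_number> * <rows_per_page>;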

It looks simple and it works in small-scale development systems. But there are a massive number of issues and inefficiencies with this approach:

  • Whilst you need to redo the items query for every page, the count query only needs to be done once for each <query>. Counting is a very expensive operation as it essentially has to fetch all the rows, unless there are some very well optimized keys on your table. This is especially the case if you are running over a table larger than a few megabytes on a mobile device, so you need to cache this count as much as possible.
  • Often, as the <order> logic has become more and more complex, I’ve found myself in a situation where some rows are equal in ordering and so they can move around as you switch between pages or redo the query. To avoid this, always ensure that the last item of the ordering is something which is well ordered, for example an integer primary key
  • What happens with a LIMIT/OFFSET type query if a row is inserted before the OFFSET? All the rows shift forwards by one, so when you scroll down on your infinite feed you get a duplicate, or the user goes to the next page and sees a repeat of the last entry of the previous page. What happens when a row prior to the OFFSET is deleted? A row between the pages is never seen by the user. In some databases it may be possible to use a long-lived cursor, but in the stateless backends typical of webapps this is often not possible. Wherever possible you should try to get a column in the <order> which is simply increasing or decreasing, and redo the previous query using a > against that so that you fetch all the rows from the point the user last saw, regardless of what changed above it. This is easy when sorting by primary key, difficult or even impossible otherwise (sketched just after this list)
  • How do you handle a laggy backend when the user presses the next button multiple times? Does it re-fire the request for the next page multiple times to the backend (consuming unnecessary resources), ignore any presses after the first (better), or does it proceed forward multiple pages, potentially running past the last page and onto blank pages? I’d usually opt for the second approach, limited so that it always stops on the last page, but it may vary depending on the use case (would users typically want to skip multiple pages quickly, or always page through one at a time?) and backend latency (ie is the data stored locally or remotely, and what is the connectivity like?)
  • Following on from the above, if you are firing multiple requests for pages, or for infinite scrolling, how do you cope with the fact that responses can come back out of order? Many implementations I’ve seen simply display the content of whichever response arrived last, but this is open to a number of bugs if you are talking to remote services over a wireless connection
  • How do you efficiently abort in-flight calls and database queries if the user changes page – on a large system this could be a significant performance gain
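
As a sketch of that keyset-style approach, assuming an items table ordered newest-first with an integer primary key as the tie-breaker (the table, columns and bind parameters here are just for illustration):

    -- first page
    SELECT id, title, created_at
      FROM items
     ORDER BY created_at DESC, id DESC
     LIMIT 50;

    -- next page: pass in the created_at and id of the last row the user saw
    SELECT id, title, created_at
      FROM items
     WHERE (created_at, id) < (:last_created_at, :last_id)
     ORDER BY created_at DESC, id DESC
     LIMIT 50;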

There are quite a large number of optimizations that can be done with paging code, whether this is with a remote backend or locally, for example:

  • Prioritize getting the first page of data to display, say “Showing results 1-50 of …” while the count is loading
  • Track historical data on how long it takes to generate counts or paging, and especially if running single-threaded (eg on a mobile device), delay the count until the main queries have completed and there is some idle time
  • Is it possible to parallelize the two queries against the database so that the query takes half the real time for the user? Can you use non-blocking IO to open two database connections, one for each query, and return a response when both have completed? Can you run two separate queries, one for the count and one for the items, and update your pager and results separately? For a few rows in a test system this won’t make any difference, but over millions of rows, especially if you need an accurate count, this can save seconds of time
  • Do you actually need a totally accurate count, or can you approximate it? Does your database backend support something like this for queries? This can save significant time and resources
  • Have you set up good indexes and verified that they are correctly being used at the scale that you will have in your live environment? Do you optimize well for the common use case?
  • You don’t always need to bother running a count query – if the number of items returned is less than the number requested, you know that you have finished paging and can calculate the total from that. This is true whether you are on the first or the 100th page.
  • Fetch the number of rows that you want to display plus one, so that you know whether there is a next page or not. If you get back fewer rows than you asked for, you know there won’t be a next page no matter what the total actually is (sketched just after this list)
  • If running on mobile devices or in HTML, limit the number of items that are displayed in an infinite scroller as this can be very memory intensive. Perhaps allow infinite scrolling over 500-1000 rows and then put a next button at the bottom of the page. Or you can do some trickery to replace the top items with a large blank box, but this can be very complex and difficult to do well
  • Don’t display a spinner as soon as the user starts infinite scrolling (you are starting to load more items before the user hits the bottom of the list, right?). Only display the spinner after a short time period, or when the user hits the very bottom of the list, to give visual feedback that more items are loading
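
As a sketch of the fetch-one-extra-row check from the points above, assuming a page size of 50 and the same <table>/<query>/<order> placeholders as before:

    -- ask for one more row than will be displayed
    SELECT *
      FROM <table>
     WHERE <query>
     ORDER BY <order>
     LIMIT 51 OFFSET <page_number> * 50;
    -- 51 rows returned: display 50 of them and enable the next button
    -- 50 or fewer returned: this is the last page, and the total is the offset plus the rows returned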

If this looks like a lot to think about, I don’t disagree. Simple, buggy and inefficient paging is easy to do, but fully-optimized, highly responsive complex paging is very tricky to do well. But if you want to scale your app you need to focus on this.

Running lots of postgres commands in parallel

Postgres is great, however one limitation is that you can only run one command at a time in the psql shell. Sometimes, when you are doing administrative work over multiple tables, for example (re)creating indexes or vacuuming, and you have a nice powerful box, you can run many of these commands in parallel for an easy speedup. Here’s an easy way to run lots of commands in parallel.

Firstly, create a text file (say commands.txt) with one command per line, for example (the table names below are just placeholders):
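
    -- placeholder maintenance commands; use your own tables and statements
    VACUUM ANALYZE table_one;
    VACUUM ANALYZE table_two;
    REINDEX TABLE table_three;
    REINDEX TABLE table_four;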

Then, ensure that you have your .pgpass file set up correctly so that you can just run psql [database] [user] without being prompted for a password.
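
Each ~/.pgpass line takes the form hostname:port:database:username:password (and the file must be chmod 600), so something along these lines with your own details:

    localhost:5432:mydb:myuser:mypassword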

Finally, run a command along the following lines (mydb and myuser here are placeholders for your database and user):
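
    # run each line of commands.txt through psql, up to 20 at a time
    xargs -d '\n' -I{} -P 20 psql -c "{}" mydb myuser < commands.txt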

-P 20 specifies the number of jobs to run in parallel so change this to what your server can cope with.

Finding the amount of space wasted in postgres tables

Because of the way that postgres handles transaction isolation (ie using MVCC), when you modify or delete a row in a table it marks it as deleted, and then frees the space at a later point in time using (auto)vacuum. However, unless you use the heavy-weight VACUUM FULL command (which takes an exclusive lock on the table and totally rewrites it, causing anything trying to access it to block until the command has finished) the space is never released back to the operating system. Normally this is not a problem – if you have a heavily used table with 20mb of data in it, it probably has 5-10mb of overhead from dead rows, reclaimed free space etc, which is acceptable. However there are a few situations where it is useful to know what exactly the overhead is:

  1. Sometimes if your table changes very quickly, is large, and your disks or autovacuum parameters are unable to keep up, it can end up growing massive. For example, we had a table that contained 3Gb of data but was taking up 45Gb because autovacuum couldn’t keep up with the frequency of changes in the table
  2. If you are using table partitioning to store historic data then to make the most use of space you want to see whether a VACUUM FULL would be advantageous to run or not. For example if you have a table that is recording data collected from each day, some days it may be mostly just inserts so doesn’t need vacuuming; other days it may have a number of changes made and so have quite a lot of free space that can be reclaimed. Additionally, VACUUM FULL optimizes the order of data in the table and the indexes making it more performant.

In the first case, looking at the output of a query along these lines against the pg_stat_user_tables statistics view
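
    -- dead vs live tuples per table, biggest offenders first
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_vacuum
      FROM pg_stat_user_tables
     ORDER BY n_dead_tup DESC
     LIMIT 20;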

(sorry, I can’t remember where I originally found this) should show you that there are a very large number of dead tuples waiting to be reclaimed (ie turned into free space) in the table.

However, if your disks were struggling at one point, but then you tweaked autovacuum so it reclaimed the dead tuples correctly (as in case 1 above), your table could now be 90% free space but there is no easy way to find this out within postgres.

Fortunately, there is an excellent extension called pgstattuple which allows you to find out the amount of free space within a table file that has been reclaimed but not yet released to the operating system. A query along the following lines, using its pgstattuple_approx() function, lists all tables which are over 100Mb in size, have more than 10Mb of free space, and are more than 20% free space (you can tweak these numbers – I just did it for our platform where our typical table size is 1Gb+):
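
    -- requires: CREATE EXTENSION pgstattuple;
    -- thresholds: over 100Mb total, more than 10Mb free and more than 20% free
    SELECT c.oid::regclass AS table_name,
           pg_size_pretty(a.table_len) AS total_size,
           pg_size_pretty(a.approx_free_space) AS free_space,
           round(a.approx_free_percent::numeric, 1) AS free_pct
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     CROSS JOIN LATERAL pgstattuple_approx(c.oid::regclass) a
     WHERE c.relkind = 'r'
       AND n.nspname NOT IN ('pg_catalog', 'information_schema')
       AND a.table_len > 100 * 1024 * 1024
       AND a.approx_free_space > 10 * 1024 * 1024
       AND a.approx_free_percent > 20
     ORDER BY a.approx_free_space DESC;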

This only uses an approximate count, however even so it can be a bit slow (it just took 10 minutes here) on a system with many tables and heavy IO. You can use this to find the tables that would most benefit from a VACUUM FULL command being run.

row_to_json() equivalent for jsonb types

With the advent of the awesome jsonb data type in Postgres 9.4, pretty much all of the json-related functions have a json_* and a jsonb_* variant. All of them, that is, except row_to_json(). I spent a bit of time but couldn’t find an answer online; in the end, after a bit of experimentation, I discovered that in Postgres 9.5 at least the to_jsonb() function does exactly the same thing, for example (my_table being whatever table you want rows from):
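
    -- the same output as row_to_json(t), but as jsonb
    SELECT to_jsonb(t) FROM my_table t;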

How we improved database performance 1000% with just one command

In my current contract we have a nice database server with 4 datacentre-quality SSDs arranged in RAID 5, a battery-backed (Dell PERC / megaraid) controller and very poor disk performance. As I was doing a dump of the database (with parallel processes, obviously), iostat was showing 100% disk utilization at 200-300 iops and 40-60mb/sec. Really not what it should be. After poking the raid card a lot with the MegaCli tool, I discovered that by changing the Linux IO scheduler to noop or deadline we suddenly got great performance – 50% disk utilization at 1300 iops and 200mb/sec read. So, an issue with the CFQ scheduler then, as everywhere seems to suggest? In fact not – it turns out the root of the problem is that, because of the raid controller, the kernel did not detect that the underlying disks were SSDs rather than HDDs. Just by tweaking this one setting, CFQ provides the same performance as the noop or deadline schedulers (in fact better for our postgres workload). So, here’s the kind of command we used to get a 1000% improvement in database performance (sdb here stands in for the block device behind the RAID volume; run it as root, and it needs re-applying after each reboot):
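
    # tell the Linux block layer that the device is an SSD (non-rotational)
    echo 0 > /sys/block/sdb/queue/rotational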