Because of the way PostgreSQL handles transaction isolation (i.e. using MVCC), when you modify or delete a row in a table the old row version is only marked as deleted, and its space is freed for reuse at a later point by (auto)vacuum. However, unless you use the heavyweight VACUUM FULL command (which takes an exclusive lock on the table and completely rewrites it, so anything trying to access the table blocks until the command finishes), that space is generally not returned to the operating system. Normally this is not a problem: a heavily used table with 20 MB of data probably has 5-10 MB of overhead from dead rows, reclaimed free space and so on, which is acceptable. However, there are a few situations where it is useful to know exactly what the overhead is:
- Sometimes, if your table is large and changes very quickly, and your disks or autovacuum parameters cannot keep up, it can end up growing massively. For example, we had a table that contained 3 GB of data but was taking up 45 GB because autovacuum couldn’t keep up with the frequency of changes in the table.
- If you are using table partitioning to store historic data, then to make the best use of space you want to know whether running VACUUM FULL would be worthwhile. For example, if you have a table recording the data collected each day, on some days it may see mostly inserts and so need no vacuuming; on other days it may see a number of changes and so contain quite a lot of free space that can be reclaimed. Additionally, VACUUM FULL rewrites the table compactly and rebuilds its indexes, which can make it more performant.
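The behaviour described at the top of this post is easy to reproduce on a scratch database. The following is a minimal sketch, assuming a throwaway table called bloat_demo (a hypothetical name, not from our platform); the sizes you see will depend on your PostgreSQL version and settings. Run the statements one at a time, since VACUUM cannot run inside a transaction block.

```sql
-- Hypothetical scratch table, used only for this demonstration.
CREATE TABLE bloat_demo (id bigint PRIMARY KEY, payload text);

INSERT INTO bloat_demo
SELECT g, repeat('x', 100)
FROM generate_series(1, 1000000) AS g;

SELECT pg_size_pretty(pg_relation_size('bloat_demo')) AS size_after_insert;

-- Delete the oldest 90% of the rows. The newest rows stay at the end of the
-- file, so a plain VACUUM has no empty trailing pages it could give back.
DELETE FROM bloat_demo WHERE id <= 900000;
VACUUM bloat_demo;

-- The dead rows are now reusable free space, but the file should be
-- about the same size as before.
SELECT pg_size_pretty(pg_relation_size('bloat_demo')) AS size_after_vacuum;

-- Rewriting the table is what actually returns the space to the OS.
VACUUM FULL bloat_demo;
SELECT pg_size_pretty(pg_relation_size('bloat_demo')) AS size_after_vacuum_full;

DROP TABLE bloat_demo;
```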
In the first case, looking at the output of a command like
SELECT psut.relname,
       to_char(psut.last_vacuum, 'YYYY-MM-DD HH24:MI') AS last_vacuum,
       to_char(psut.last_autovacuum, 'YYYY-MM-DD HH24:MI') AS last_autovacuum,
       pg_class.reltuples::bigint AS n_tup,
       psut.n_dead_tup::bigint AS dead_tup,
       CASE WHEN pg_class.reltuples > 0
            THEN (psut.n_dead_tup / pg_class.reltuples * 100)::int
            ELSE 0
       END AS perc_dead,
       CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
         + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric) * pg_class.reltuples) AS av_threshold,
       CASE WHEN CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
                 + (CAST(current_setting('autovacuum_vacuum_scale_factor') AS numeric) * pg_class.reltuples) < psut.n_dead_tup
            THEN '*'
            ELSE ''
       END AS expect_av
FROM pg_stat_user_tables psut
JOIN pg_class ON psut.relid = pg_class.oid
ORDER BY 5 DESC, 4 DESC;
(sorry, I can’t remember where I found this) should show you that there are a very large number of dead tuples waiting to be reclaimed (i.e. turned into free space) in the table.
However, if your disks were struggling at one point, but you then tweaked autovacuum so that it reclaimed the dead tuples correctly (as in case 1 above), your table could now be 90% free space, and there is no easy way to find this out within PostgreSQL itself.
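As an illustration of the kind of autovacuum tweak meant here, autovacuum can be made more aggressive for a single table through its storage parameters. This is only a sketch: busy_table is a hypothetical name and the values are illustrative, not the settings we used. Note also that the dead-tuple query above reads the global settings via current_setting(), so its av_threshold column will not reflect per-table overrides like these.

```sql
-- "busy_table" is a hypothetical table name; the values are examples only.
ALTER TABLE busy_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows are dead
    autovacuum_vacuum_threshold    = 1000   -- plus this fixed number of dead rows
);

-- Show the per-table storage parameters currently set (NULL if none).
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'busy_table';
```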
Fortunately, there is an excellent extension called pgstattuple which lets you find out how much free space within a table file has been reclaimed but not released to the operating system. The following query lists all tables that are over 100 MB in size and have more than 10 MB of free space and more than 20% free space (you can tweak these numbers; I just chose them for our platform, where the typical table size is 1 GB+):
SELECT table_schema,
       table_name,
       free_percent,
       pg_size_pretty( free_space ) AS space_free,
       pg_size_pretty( pg_relation_size( quoted_name ) ) AS total_size
FROM (
    SELECT table_schema,
           table_name,
           quoted_name,
           space_stats.approx_free_percent AS free_percent,
           space_stats.approx_free_space AS free_space
    FROM (
        SELECT *,
               quote_ident( table_schema ) || '.' || quote_ident( table_name ) AS quoted_name
        FROM information_schema.tables
        WHERE table_type = 'BASE TABLE'
          AND table_schema NOT IN ('information_schema', 'pg_catalog')
          AND pg_relation_size( quote_ident( table_schema ) || '.' || quote_ident( table_name ) ) > 100000000
    ) t,
    pgstattuple_approx( quoted_name ) AS space_stats
) t
WHERE free_percent > 20
  AND free_space > 10000000
ORDER BY free_space DESC;
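The query above assumes the extension is already installed in the current database. A minimal sketch of setting it up and checking a single table first, where my_table is a hypothetical name; you need sufficient privileges, and pgstattuple_approx requires PostgreSQL 9.5 or later:

```sql
-- pgstattuple ships with the PostgreSQL contrib modules.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- "my_table" is a hypothetical table name.
SELECT table_len,
       approx_tuple_percent,
       dead_tuple_percent,
       approx_free_space,
       approx_free_percent
FROM pgstattuple_approx('my_table'::regclass);
```

If you want exact figures for one table and can afford a full scan of it, the extension also provides pgstattuple('my_table'), which reports free_space and free_percent.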
This only uses an approximate count; even so, it can be a bit slow (it just took 10 minutes here) on a system with many tables and heavy I/O. You can use this to find the tables that would benefit most from a VACUUM FULL command being run.
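Once you have picked a candidate, a before-and-after size check shows how much space the rewrite actually recovered. This is a sketch with a hypothetical table name, my_table. Keep in mind that VACUUM FULL holds an exclusive lock for the whole run and writes a new copy of the table, so it needs enough spare disk space for that copy and is best run in a quiet period.

```sql
-- "my_table" is a hypothetical table name.
-- pg_total_relation_size includes indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('my_table')) AS size_before;

-- Rewrite the table, print progress details, and refresh planner statistics.
VACUUM (FULL, VERBOSE, ANALYZE) my_table;

SELECT pg_size_pretty(pg_total_relation_size('my_table')) AS size_after;
```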