In postgres (14), we have a table
pubsub_node_option which looks like:
Partitioned table "public.pubsub_node_option" Column │ Type │ Collation │ Nullable │ Default ────────┼────────┼───────────┼──────────┼───────── nodeid │ bigint │ │ │ name │ text │ │ not null │ val │ text │ │ not null │ Partition key: HASH (nodeid) Indexes: "partitioner" btree (nodeid) Number of partitions: 64 (Use \d+ to list them.)
The 64 partitions are named like
_p63. It has a few hundred Gb of data and we want to add a primary key to it with no downtime or locks (perhaps 1s is acceptable). The standard
ADD PRIMARY KEY etc commands will lock the table for the duration of the process which means we can’t use them.
In Postgres a primary key is just a unique index over a set of non-null columns. The documentation says that whilst the
SET NOT NULL command requires a full table scan (with exclusive lock),
ADD PRIMARY KEY has another option which takes an already-existing index, and if there is also
NOT NULL or a similar
CHECK constraint on all of the columns it can do it without needing any locks.
So the basic process looks like:
- Add a CHECK constraint on
- Create a unique index over
ALTER TABLE ... ADD PRIMARY KEY USING INDEX ...to set the PK.
Lets start with (1). If we just add a
CHECK constraint directly, postgres will take out an exclusive lock on the table which will freeze all updates for the duration. However there is a
NOT VALID setting which allows us to later run a validator in the background. So we can do:
ALTER TABLE pubsub_node_option ADD CONSTRAINT pubsub_node_option_nodeid_not_null CHECK (nodeid is not null) NOT VALID;
Then we can try to run the validator which shouldn’t take any locks:
ALTER TABLE pubsub_node_option VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
Unfortunately, at this point the database locks up. Presumably because this is a partitioned table.
So, what I figured out (I’m not sure it’s actually documented anywhere) is that you can run the validator on all partitions, and then it will run instantly without locking on the parent:
ALTER TABLE only pubsub_node_option_p0 VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null; ... ALTER TABLE only pubsub_node_option_p63 VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null; ALTER TABLE pubsub_node_option VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
So (1) is solved – we’ve shown postgres that we don’t have any
NULLs in that column.
Time for (2) the
UNIQUE KEY. An index can be created
CONCURRENTLY however this doesn’t work on partitioned tables. So we have to create it on each partition and then hook it in to the main table.
Firstly, we create the index on the main table:
create unique index pubsub_node_option_pkey on only pubsub_node_option (nodeid, name);
ON ONLY means it doesn’t happen on the child tables.
Then we create a new index in each of the children, but
CONCURRENTLY so no locking:
create unique index concurrently pubsub_node_option_p0_pkey on pubsub_node_option_p0 (nodeid, name); ... create unique index concurrently pubsub_node_option_p63_pkey on pubsub_node_option_p63 (nodeid, name);
Now this is done we can attach each of them in to the main primary key:
alter index pubsub_node_option_pkey attach partition pubsub_node_option_p0_pkey; ... alter index pubsub_node_option_pkey attach partition pubsub_node_option_p63_pkey;
Then, we should be able to do the final step of the process to convert the index + check constraint to a primary key (which is really only a bit of semantics for Postgres):
> alter table pubsub_node_option add primary key using index pubsub_node_option_pkey; ERROR: ALTER TABLE / ADD CONSTRAINT USING INDEX is not supported on partitioned tables
D’oh. But at least we can add to all of the individual tables:
alter table pubsub_node_option_p0 add primary key using index pubsub_node_option_p0_pkey; ... alter table pubsub_node_option_p63 add primary key using index pubsub_node_option_p63_pkey; ...
I still can’t see a way to add it to the main table though, even
ALTER TABLE ONLY pubsub_node_option comes up with the same error.
I’m not really sure where to go from here, however the reason for wanting a
PRIMARY KEY is because we want to use
pglogical to replicate data, and this requires “a PRIMARY KEY or other valid replica identity such as using an index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL”. So I think we should be OK with how far we have gotten for now.
Please leave a comment if you know how to complete the process!