Category Archives: Random

Most posts will be in this category I guess…

Adding a primary key to a partitioned table in postgres with zero downtime or locks

In postgres (14), we have a table pubsub_node_option which looks like:

  Partitioned table "public.pubsub_node_option"
 Column │  Type  │ Collation │ Nullable │ Default 
────────┼────────┼───────────┼──────────┼─────────
 nodeid │ bigint │           │          │ 
 name   │ text   │           │ not null │ 
 val    │ text   │           │ not null │ 
Partition key: HASH (nodeid)
Indexes:
    "partitioner" btree (nodeid)
Number of partitions: 64 (Use \d+ to list them.)

The 64 partitions are named pubsub_node_option_p0 through _p63. The table holds a few hundred GB of data and we want to add a primary key to it with no downtime or locks (perhaps 1s of locking is acceptable). The standard ADD PRIMARY KEY commands lock the table for the duration of the process, which means we can’t use them.

In Postgres a primary key is just a unique index over a set of non-null columns. The documentation says that whilst the SET NOT NULL command requires a full table scan (with an exclusive lock), ADD PRIMARY KEY has another form which takes an already-existing unique index, and if there is also a NOT NULL or equivalent CHECK constraint on all of the columns it can do this without needing any long-held locks.

So the basic process looks like:

  1. Add a CHECK constraint on nodeid
  2. Create a unique index over (nodeid, name)
  3. Use ALTER TABLE ... ADD PRIMARY KEY USING INDEX ... to set the PK.

Let’s start with (1). If we just add a CHECK constraint directly, postgres will take out an exclusive lock on the table which will freeze all updates for the duration. However, the constraint can be added with NOT VALID, which allows us to validate it later in the background. So we can do:

ALTER TABLE pubsub_node_option
  ADD CONSTRAINT pubsub_node_option_nodeid_not_null
  CHECK (nodeid is not null)
  NOT VALID;

Then we can try to run the validator which shouldn’t take any locks:

ALTER TABLE pubsub_node_option
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;

Unfortunately, at this point the database locks up, presumably because this is a partitioned table.

So, what I figured out (I’m not sure it’s actually documented anywhere) is that you can run the validator on each partition individually, and then it runs instantly on the parent without locking:

ALTER TABLE only pubsub_node_option_p0
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
...
ALTER TABLE only pubsub_node_option_p63
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
ALTER TABLE pubsub_node_option
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
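
Rather than typing out all 64 per-partition statements by hand, they can easily be scripted. Here is a rough sketch in Python – psycopg2 and the connection string are just placeholder choices, any way of running the statements will do:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection details
conn.autocommit = True  # run each ALTER in its own transaction
with conn.cursor() as cur:
    # Validate the NOT NULL check on every partition first...
    for p in range(64):
        cur.execute(
            f"ALTER TABLE ONLY pubsub_node_option_p{p} "
            "VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null"
        )
    # ...then on the parent, which should now be instant
    cur.execute(
        "ALTER TABLE pubsub_node_option "
        "VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null"
    )
conn.close()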

So (1) is solved – we’ve shown postgres that we don’t have any NULLs in that column.

Time for (2), the unique index. An index can be created CONCURRENTLY; however, this doesn’t work on partitioned tables, so we have to create it on each partition and then hook it into the parent table.

Firstly, we create the index on the main table:

create unique index
  pubsub_node_option_pkey
  on only pubsub_node_option
  (nodeid, name);

The ON ONLY means the index is created only on the parent and not on the child partitions (it stays invalid until a matching index has been attached for every partition).

Then we create a matching index on each of the children, CONCURRENTLY so there is no locking:

create unique index concurrently
  pubsub_node_option_p0_pkey
  on pubsub_node_option_p0 (nodeid, name);
...
create unique index concurrently
  pubsub_node_option_p63_pkey
  on pubsub_node_option_p63 (nodeid, name);

Now that this is done, we can attach each of them to the index on the parent:

alter index pubsub_node_option_pkey
  attach partition pubsub_node_option_p0_pkey;
...
alter index pubsub_node_option_pkey
  attach partition pubsub_node_option_p63_pkey;
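
The 64 CREATE INDEX CONCURRENTLY and ATTACH PARTITION statements lend themselves to the same kind of loop as before. A rough sketch (again, the connection details are placeholders; note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, hence autocommit):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection details
conn.autocommit = True  # CREATE INDEX CONCURRENTLY refuses to run inside a transaction
with conn.cursor() as cur:
    for p in range(64):
        # Build the per-partition unique index without blocking writes
        cur.execute(
            f"CREATE UNIQUE INDEX CONCURRENTLY pubsub_node_option_p{p}_pkey "
            f"ON pubsub_node_option_p{p} (nodeid, name)"
        )
        # Attach it to the (still invalid) parent index created with ON ONLY
        cur.execute(
            "ALTER INDEX pubsub_node_option_pkey "
            f"ATTACH PARTITION pubsub_node_option_p{p}_pkey"
        )
conn.close()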

Perfect.

Then, we should be able to do the final step of the process to convert the index + check constraint to a primary key (which is really only a bit of semantics for Postgres):

> alter table pubsub_node_option add primary key using index pubsub_node_option_pkey;
ERROR:  ALTER TABLE / ADD CONSTRAINT USING INDEX is not supported on partitioned tables

D’oh. But at least we can add it to each of the individual partitions:

alter table pubsub_node_option_p0
  add primary key
  using index pubsub_node_option_p0_pkey;
...
alter table pubsub_node_option_p63
  add primary key
  using index pubsub_node_option_p63_pkey;

I still can’t see a way to add it to the parent table though; even ALTER TABLE ONLY pubsub_node_option fails with the same error.

I’m not really sure where to go from here. However, the reason for wanting a PRIMARY KEY is that we want to use pglogical to replicate data, and this requires “a PRIMARY KEY or other valid replica identity such as using an index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL”. So I think we should be OK with how far we have gotten for now.

Please leave a comment if you know how to complete the process!

Easily running unison on different Ubuntu versions

Unison is a great file-synchronization tool, but it’s highly dependent on both the Unison version and the OCaml version being the same between client and server to enable syncing. I’ve wasted a lot of time over the years trying to backport versions to Ubuntu as I upgrade a laptop but not the server it backs up to, or vice-versa.

This seems like a great problem to solve with containers, and fortunately it seems like it’s quite easy.

Create a new unison container on the server – the Dockerfile should look like:

FROM ubuntu:22.04
RUN apt update && apt -y install unison

Then, build it:

docker build --network host -t unison .

And create /usr/bin/unison looking like:

#!/bin/bash
# Pass the caller's supplementary groups through to the container
EXTRA=""
for g in $(id -G); do
    EXTRA="$EXTRA --group-add $g"
done
exec docker run --rm \
    -v /home/:/home/ \
    -e USER="$USER" -e HOME="$HOME" \
    -u "$(id -u):$(id -g)" \
    $EXTRA \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    --network host \
    -i \
    unison \
    unison "$@"

It’s necessary to run on the host network so the container picks up the server’s hostname (unison is highly dependent on this), and to pass in the USER and HOME environment variables.
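
With the wrapper in place, nothing changes on the client side – running something like unison /home/me ssh://myserver//home/me (the paths and hostname here are just examples) transparently executes the containerised unison on the server.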

Then it seems to work just fine. Happy days!

Easily dumping all AWS SSM details

We have a lot of credentials and config mixed up in AWS SSM/Parameter Store over many different regions and profiles. We want to export all of these to a spreadsheet to allow some more junior team members to convert the non-secret ones into standard parameters which can be stored in our kubernetes git config (on a per-environment basis). The following script does this:

#!/bin/bash
# Export the parameter store for each profile, then filter out anything
# that looks like a secret and emit Name,Value pairs as CSV.
for profile in x y z; do
    awsv2 --profile "$profile" ssm get-parameters-by-path --path / --with-decryption > "paramstore-$profile.json"
done
cat *.json | jq -r '.Parameters[] | select(.Name+.Value |test("pass|private|secret|token|://.*:|certificate|pwd|cookie|key"; "i")|not) | [.Name, .Value] | @csv' > out.csv

Using Postgres NOTIFY with placeholders

When writing SQL you always want to use placeholders rather than escaping text yourself and risking an SQL injection attack.

Postgres provides great functionality for this such as:

SELECT * FROM table WHERE username = $1

Today I was trying to send arbitrary text to a channel via the very powerful NOTIFY command. However, every time I tried to use placeholders I was getting errors from Python’s asyncpg driver – which was just passing on Postgres error code 42601 (syntax error).

Eventually, looking through the docs I found this quote:

To send a notification you can also use the function pg_notify(text, text). The function takes the channel name as the first argument and the payload as the second. The function is much easier to use than the NOTIFY command if you need to work with non-constant channel names and payloads.

So, after wasting an hour trying all sorts of different quoting strategies, I was able to change NOTIFY $1, $2 into SELECT pg_notify($1, $2) and resolve the issue.
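
Since the errors were coming from asyncpg, here is roughly what the fix looks like when using asyncpg directly (the connection string and channel name are placeholders):

import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect("postgresql://user:pass@localhost/db")  # placeholder DSN
    # NOTIFY itself won't accept bind parameters, but pg_notify() will
    await conn.execute("SELECT pg_notify($1, $2)", "channel", "my text")
    await conn.close()

asyncio.run(main())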

I’m using SQLAlchemy by the way so it looks something like:

from sqlalchemy import text 
await session.execute(
  text("SELECT pg_notify(:channel, :data)")
    .bindparams(channel="channel", data="my text")
)

Perfect!

Hacking ElasticSearch python client to work with AWS OpenSearch

Because of various disagreements between AWS and Elastic, AWS released a fork called OpenSearch, and Elastic updated their clients to throw errors if you try to use them with this product.

This is obviously really annoying if you are using 3rd-party software which uses the Elastic libraries but need to run it against the AWS managed service. The following hack fixes this, for the Python elasticsearch client v7.17 at least, by disabling the unnecessary version check:

# Hack elasticsearch to work with AWS
from elasticsearch import Transport
Transport._do_verify_elasticsearch = lambda self, headers, timeout: None
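
For completeness, this is roughly how the patch might be wired in before the client is constructed – the endpoint and credentials below are placeholders:

from elasticsearch import Elasticsearch, Transport

# Disable the product check so the client will also talk to OpenSearch
Transport._do_verify_elasticsearch = lambda self, headers, timeout: None

es = Elasticsearch(
    "https://my-domain.eu-west-1.es.amazonaws.com:443",  # placeholder endpoint
    http_auth=("user", "password"),  # placeholder credentials
)
print(es.info())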

Screen corruption on KDE with Ubuntu 20.04

At some point in the past couple of weeks I guess my laptop updated some libraries, because when I had to reboot it yesterday I started seeing massive screen corruption in some applications. It manifested itself as horizontal white/black lines remaining on screen, particularly when selecting text, and was especially visible in konsole. I couldn’t see any package in particular which had updated recently, and I tried looking at a few different Xorg and kernel options but to no avail.

Eventually I looked at the KDE compositor settings. I noticed that the “Rendering Backend” was set to XRender, which as far as I understand it is very old. Updating it to ‘OpenGL 3.1’ instantly fixed the issue.

I’m leaving this mostly as a note to myself in case it happens again in the future, but at the same time perhaps it is a wider regression in the Ubuntu 20.04 KDE packages, in which case hopefully it will help someone else too.

Best wordle starter words

I recently, like pretty much everyone else, got into Wordle. One of the most important things in getting to the correct answer is finding the best first word or two to start with. The ideal first words should each use five of the most common letters, so that in the first 2 guesses you can test the top 10 characters.

My first (relatively uneducated) guesses, based on what I vaguely remembered about letter frequency in English, were ‘spear’ and ‘mount’ – 4 vowels and some of the most common consonants. However that’s pretty much a random guess, so I was wondering if we could figure out a better approach.

It’s pretty straightforward to look at the source code of Wordle, which contains two word lists. The first contains the 2315 5-letter words which can be the answer; the second contains a further 10,000 or so 5-letter words which are accepted as guesses.

So, I wrote a small script to analyse the frequency of letters in the list of possible answers, and then based on that filter the possible words to find the best starting (and subsequent) guesses which would work.

I’ve put the simple python script I used at the bottom of the article, but the output is:

Matching 5 new letters (39%) are: ['arose']
Matching 5 new letters (66%) are: ['unlit', 'until']
Matching 4 new letters (81%) are: ['duchy']
Matching 3 new letters (89%) are: ['pygmy']

What this means is that if you start with the word ‘arose’ and then ‘until’ (or ‘unlit’), even though that’s only 10 unique letters (38% of the alphabet), because they are the most frequent ones they cover two thirds (66%) of the letter occurrences in the possible answers.

In terms of overall letter frequency we get the following ordered list:

[('e', 1233), ('a', 979), ('r', 899), ('o', 754), ('t', 729), ('l', 719), ('i', 671), ('s', 669), ('n', 575), ('c', 477), ('u', 467), ('y', 425), ('d', 393), ('h', 389), ('p', 367), ('m', 316), ('g', 311), ('b', 281), ('f', 230), ('k', 210), ('w', 195), ('v', 153), ('z', 40), ('x', 37), ('q', 29), ('j', 27)]

The script I wrote is not perfect, but it’s at least a start at finding some optimum words:

import sys

# Read the word list (one word per line) given as the first argument
with open(sys.argv[1]) as fh:
    words = [l.strip() for l in fh]

# Count how often each letter appears across all the words
chars = {}
for char in ''.join(words):
    chars[char] = chars.get(char, 0) + 1
frequency = sorted(chars.keys(), key=lambda c: -chars[c])
print(sorted(chars.items(), key=lambda c: -c[1]))

total_freq = 0
while len(frequency) > 5:
    matching = words
    letters = []
    # Greedily take the most frequent unused letters for which at least one
    # word contains all of the letters picked so far
    for char in frequency:
        new_matching = [w for w in matching if char in w]
        if new_matching:
            matching = new_matching
            letters.append(char)
        if len(letters) == 5:
            break

    # Report the cumulative share of letter occurrences covered so far,
    # plus the candidate words containing all of the chosen letters
    total_freq += sum([chars[c] for c in letters])
    print("Matching %d new letters (%d%%) are: %r" % (len(letters), total_freq / sum(chars.values()) * 100, matching))
    # Remove the used letters and repeat for the next guess
    frequency = [c for c in frequency if c not in letters]
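
To reproduce the output above, run the script with the answers word list extracted from the Wordle source as its only argument, e.g. python3 wordle.py answers.txt (the file names here are just examples).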

The Potter and the Clay

Ruth’s china decorating class recently had an exhibition in one of the town’s local council buildings as part of the end-of-term celebrations. Ruth’s course began last September, but many of the women have been attending for a couple of years. Because Ruth only joined the course a few months back (yes, she snuck in despite usually having to wait till September for the new intake!), none of her work was exhibited (she has only been doing practice work to try out the skills needed). However, it was very exciting to see the quality of work that her coursemates are now able to produce. Hopefully one day she will also reach that level… we’ll see…

You can see some examples of the class’s work by clicking on the photo below.

/China+Exhibition/

Hidden Gems of Istanbul

We’ve been enjoying the sunshine and getting around to some of Istanbul’s lesser known (to us) sights. Our latest trip took us to Rustem Pasa Mosque and to the Eyup shrine, both breathtaking and intriguing in their own way. You can find pictures from our trip by clicking the photo below.

eyup/

The Rustem Pasa mosque was a jewel in the crown of Mimar Sinan (Sinan the Architect), whose work includes some of Istanbul’s finest historic sights. This tiny mosque is nestled on top of a higgledy-piggledy assortment of bazaar-like shops, and, slipping a veil over my head as we entered through a shadowy archway and up some spiralled stairs, it seemed that we could be going back to Sinan’s time. The meydan (courtyard) is small but prettily arranged with plants and some tiled walls that, although stunning in themselves, are really an aperitif to the dazzling mosque interior. Slipping off our shoes and entering the mosque, it feels as though one is entering an Aladdin’s cave; a true diamond in the rough exterior of Istanbul’s bazaar district. Every inch of wall was covered in exquisite tiles, and the height of the domed ceiling added a grandeur wholly unexpected in a place so small. Although we couldn’t stay long, I felt as though we should have stayed for hours just soaking in the beauty and paying tribute to the workmanship that had so lovingly crafted this gem-like place. Here are photos of the china tiles from Rustem Pasa Mosque, Eyup and Pierre Loti; click on the photo below to view the album.

/China_001/

From there we took the halic (Golden Horn) boat to Eyup. The boat journey itself presented a chance to admire Istanbul from a new vantage point, and although the Golden Horn is notoriously more industrial than the Bosphorus, it still has some sights along the way to redeem itself and give interest to the eye.

Eyup is a pretty, leafy little town that gives the illusion of spaciousness despite hugging the base of a clifflike cemetery. The place has great importance for more religious Turks as it has a shrine to Eyup, considered to be the standard bearer of their Prophet. It is therefore considered a holy place, and going there is deemed to bring blessing to the pilgrim. In fact, there is even a special area dedicated to circumcision, and families often take their young sons there as they believe that such rites of passage are entrance points for either blessings or curses into family life. Above the cemetery there is also a famous cafe, Pierre Loti, named after a French Turkophile writer who, it is supposed, frequented the cafe. Although the cafe only serves beverages, there are also a couple of eateries that profit from the commanding view back down the Golden Horn. We sat there and enjoyed our Menemen and Manti, traditional Turkish dishes, before taking the steep cable car back down to the town.

It was then time to visit the shrine ourselves, and, entering alongside many pilgrims, it was amazing to see the homage paid to the relics (including a stone said to bear Eyup’s footprint), hear the fervent prayers, and observe the respect as people walked backwards out of the place so as not to turn their backs on the shrine itself. The walls were again lavishly adorned with tiles that showed countless hours of painting and design.

Leaving the shrine, we then meandered through the town and along to another fine mosque, the Zalmahmut Pasa Mosque. Although we didn’t enter it, the courtyard provided a welcome rest amidst leafy pergolas, tea-drinking locals and one friendly, ‘lazing in the sun’ cat. What a treat to live in such a richly cultured and intricately woven tapestry of a city.