All posts by Mark

I'm a full-stack Linux consultant from the UK specializing in high-performance systems, DNS and databases. I have also written, and led teams producing, a number of web/mobile apps. I'm fluent in English and Turkish.

Adding a primary key to a partitioned table in postgres with zero downtime or locks

In Postgres (14), we have a table pubsub_node_option which looks like:

  Partitioned table "public.pubsub_node_option"
 Column │  Type  │ Collation │ Nullable │ Default 
────────┼────────┼───────────┼──────────┼─────────
 nodeid │ bigint │           │          │ 
 name   │ text   │           │ not null │ 
 val    │ text   │           │ not null │ 
Partition key: HASH (nodeid)
Indexes:
    "partitioner" btree (nodeid)
Number of partitions: 64 (Use \d+ to list them.)

The 64 partitions are named pubsub_node_option_p0 through _p63. The table holds a few hundred GB of data, and we want to add a primary key to it with no downtime or locks (perhaps 1s is acceptable). The standard ADD PRIMARY KEY command locks the table for the duration of the index build, which means we can’t use it.

In Postgres a primary key is just a unique index over a set of non-null columns. The documentation says that while the SET NOT NULL command requires a full table scan (with an exclusive lock), ADD PRIMARY KEY has another form which takes an already-existing unique index, and if there is also a NOT NULL or similar CHECK constraint on all of the columns, it can complete without needing any long-held locks.

So the basic process looks like:

  1. Add a CHECK constraint on nodeid
  2. Create a unique index over (nodeid, name)
  3. Use ALTER TABLE ... ADD PRIMARY KEY USING INDEX ... to set the PK.

Let’s start with (1). If we just add a CHECK constraint directly, Postgres will take out an exclusive lock on the table, which will freeze all updates for the duration. However, there is a NOT VALID option which adds the constraint instantly and allows us to run the validation later in the background. So we can do:

ALTER TABLE pubsub_node_option
  ADD CONSTRAINT pubsub_node_option_nodeid_not_null
  CHECK (nodeid is not null)
  NOT VALID;

Then we can try to run the validator, which shouldn’t take any locks:

ALTER TABLE pubsub_node_option
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;

Unfortunately, at this point the database locks up, presumably because this is a partitioned table.

So, what I figured out (I’m not sure it’s actually documented anywhere) is that you can run the validator on each partition individually, after which it runs instantly without locking on the parent:

ALTER TABLE only pubsub_node_option_p0
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
...
ALTER TABLE only pubsub_node_option_p63
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
ALTER TABLE pubsub_node_option
  VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null;
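
Rather than typing out the 64 per-partition statements by hand, a small bash sketch can run them (assuming psql connects to the right database by default):

for i in $(seq 0 63); do
    psql -c "ALTER TABLE only pubsub_node_option_p$i VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null"
done
psql -c "ALTER TABLE pubsub_node_option VALIDATE CONSTRAINT pubsub_node_option_nodeid_not_null"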

So (1) is solved – we’ve shown Postgres that we don’t have any NULLs in that column.

Time for (2), the unique index. An index can be created CONCURRENTLY; however, this doesn’t work on partitioned tables. So we have to create it on each partition and then hook it into the main table.

Firstly, we create the index on the main table:

create unique index
  pubsub_node_option_pkey
  on only pubsub_node_option
  (nodeid, name);

The ON ONLY means the index isn’t created on the child tables.

Then we create a new index on each of the children, using CONCURRENTLY so there’s no locking:

create unique index concurrently
  pubsub_node_option_p0_pkey
  on pubsub_node_option_p0 (nodeid, name);
...
create unique index concurrently
  pubsub_node_option_p63_pkey
  on pubsub_node_option_p63 (nodeid, name);

Now this is done, we can attach each of them to the parent index:

alter index pubsub_node_option_pkey
  attach partition pubsub_node_option_p0_pkey;
...
alter index pubsub_node_option_pkey
  attach partition pubsub_node_option_p63_pkey;
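
Again, this is easy to script – a sketch under the same psql assumptions, creating and then attaching each partition’s index in turn (CREATE INDEX CONCURRENTLY can’t run inside a transaction block, so each statement gets its own psql call):

for i in $(seq 0 63); do
    psql -c "create unique index concurrently pubsub_node_option_p${i}_pkey on pubsub_node_option_p$i (nodeid, name)"
    psql -c "alter index pubsub_node_option_pkey attach partition pubsub_node_option_p${i}_pkey"
done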

Perfect.
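
Before moving on it’s worth confirming the parent index is now valid – it starts out invalid and only becomes valid once an index from every partition has been attached:

psql -c "SELECT indisvalid FROM pg_index WHERE indexrelid = 'pubsub_node_option_pkey'::regclass"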

Then, we should be able to do the final step of the process to convert the index + check constraint to a primary key (which is really only a bit of semantics for Postgres):

> alter table pubsub_node_option add primary key using index pubsub_node_option_pkey;
ERROR:  ALTER TABLE / ADD CONSTRAINT USING INDEX is not supported on partitioned tables

D’oh. But at least we can add it to each of the individual partitions:

alter table pubsub_node_option_p0
  add primary key
  using index pubsub_node_option_p0_pkey;
...
alter table pubsub_node_option_p63
  add primary key
  using index pubsub_node_option_p63_pkey;
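
Scripted, following the same pattern as the earlier loops:

for i in $(seq 0 63); do
    psql -c "alter table pubsub_node_option_p$i add primary key using index pubsub_node_option_p${i}_pkey"
done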

I still can’t see a way to add it to the parent table though; even ALTER TABLE ONLY pubsub_node_option comes up with the same error.

I’m not really sure where to go from here. However, the reason for wanting a PRIMARY KEY is that we want to use pglogical to replicate data, and this requires “a PRIMARY KEY or other valid replica identity such as using an index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL”. Since every partition now has a real primary key, I think we should be OK with how far we have gotten for now.

Please leave a comment if you know how to complete the process!

Easily running unison on different Ubuntu versions

Unison is a great file-synchronization tool, but it’s highly dependent on both the unison version and the OCaml version being the same between client and server to enable syncing. I’ve wasted much time over the years trying to backport versions to Ubuntu as I upgrade a laptop but not the server it backs up to, or vice-versa.

This seems like a great problem to solve with containers, and fortunately it seems like it’s quite easy.

Create a new unison container on the server – the Dockerfile should look like:

FROM ubuntu:22.04
RUN apt update && apt -y install unison

Then, build it:

docker build --network host -t unison .

And create /usr/bin/unison looking like:

#!/bin/bash
# Pass through all of the caller's supplementary groups
EXTRA=""
for g in $(id -G); do
    EXTRA="$EXTRA --group-add $g"
done
exec docker run --rm \
    -v /home/:/home/ \
    -e USER="$USER" -e HOME="$HOME" \
    -u "$(id -u):$(id -g)" \
    $EXTRA \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    --network host \
    -i \
    unison \
    unison "$@"

It’s necessary to run on the host network so the container picks up the server’s hostname, as unison is highly dependent on this and on the environment variables that are passed in.
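
With this wrapper on both sides (or a matching native unison on the client), invocation is unchanged – for example, syncing a home directory to a hypothetical server called backupserver:

unison /home/mark ssh://backupserver//home/mark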

Then it seems to work just fine. Happy days!

Easily dumping all AWS SSM details

We have a lot of credentials and config mixed up in AWS SSM/Parameter Store across many different regions and profiles. We want to export all of these to a spreadsheet to allow some more junior team members to convert the non-secret ones into standard parameters which can be stored in our Kubernetes git config (on a per-environment basis). The following script does this:

#!/bin/bash
# Dump the full parameter store for each profile (x y z are placeholders)
for profile in x y z; do
    awsv2 --profile "$profile" ssm get-parameters-by-path --path / --recursive --with-decryption > "paramstore-$profile.json"
done
# Filter out anything that looks secret and emit name,value as CSV
cat *.json | jq -r '.Parameters[] | select(.Name+.Value |test("pass|private|secret|token|://.*:|certificate|pwd|cookie|key"; "i")|not) | [.Name, .Value] | @csv' > out.csv

Using Postgres NOTIFY with placeholders

When writing SQL you always want to use placeholders rather than escaping text yourself and risking an SQL injection attack.

Postgres provides great functionality for this such as:

SELECT * FROM table WHERE username = $1

Today I was trying to send arbitrary text to a channel via the very powerful NOTIFY command. However, every time I tried to use placeholders I was getting errors (from Python’s asyncpg driver, which passes through Postgres error code 42601, syntax_error).

Eventually, looking through the docs I found this quote:

To send a notification you can also use the function pg_notify(text, text). The function takes the channel name as the first argument and the payload as the second. The function is much easier to use than the NOTIFY command if you need to work with non-constant channel names and payloads.

So, after wasting an hour trying all sorts of different quoting strategies, I was able to change NOTIFY $1, $2 into SELECT pg_notify($1, $2) and resolve the issue. This makes sense in hindsight: NOTIFY is a utility command rather than a regular query, and utility commands don’t accept bind parameters.
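
You can see the distinction in plain psql too – NOTIFY only accepts constants, while pg_notify takes arbitrary expressions (a quick demo with a made-up channel name):

psql -c "NOTIFY mychannel, 'constants only here'"
psql -c "SELECT pg_notify('mychannel', 'generated at ' || now()::text)"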

I’m using SQLAlchemy by the way so it looks something like:

from sqlalchemy import text 
await session.execute(
  text("SELECT pg_notify(:channel, :data)")
    .bindparams(channel="channel", data="my text")
)

Perfect!

Hacking ElasticSearch python client to work with AWS OpenSearch

Because of various disagreements between AWS and Elastic, AWS released a fork of ElasticSearch called OpenSearch, and Elastic updated their clients to throw errors if you try to use them with this product.

This is obviously really annoying if you are using third-party software which uses the Elastic libraries but you are trying to run it against the AWS managed service. The following hack fixes this for the Python elasticsearch library v7.17 at least, by disabling the unnecessary version check:

# Hack elasticsearch to work with AWS
from elasticsearch import Transport
Transport._do_verify_elasticsearch = lambda self, headers, timeout: None

Screen corruption on KDE with Ubuntu 20.04

At some point in the past couple of weeks I guess my laptop updated some libraries, because when I had to reboot it yesterday I started seeing massive screen corruption in some applications. It manifested as horizontal white/black lines remaining on screen, particularly when selecting text, and was especially visible in Konsole. I couldn’t see any package in particular which had updated recently, and I tried a few different Xorg and kernel options, but to no avail.

Eventually I looked at the KDE compositor settings. I noticed that the “Rendering Backend” was set to XRender, which as far as I understand is very old. Updating it to ‘OpenGL 3.1’ instantly fixed the issue.

I’m leaving this mostly as a note to myself in case it happens again in the future, but perhaps it is a wider regression in the Ubuntu 20.04 KDE packages, in which case hopefully it helps someone else.

Best wordle starter words

I recently, like pretty much everyone else, got into Wordle. One of the most important things in getting the correct answer is finding the best first word or two to guide you towards it. The ideal first words should each use five different letters from among the most common ones, so that for example in the first 2 guesses you can test the top 10 characters.

My first (relatively uneducated) guesses, based on what I vaguely remembered about letter frequency in English, were ‘spear’ and ‘mount’ – 4 vowels and some of the most common consonants. However, that’s pretty much a random guess, so I was wondering if we could figure out a better approach.

It’s pretty straightforward to look at the source code of Wordle, which contains two word lists. The first one contains the 2315 5-letter words which can be the answer; the second contains a further 10,000 or so 5-letter words which are accepted as guesses.

So, I wrote a small script to analyse the frequency of letters in the list of possible answers, and then based on that filter the possible words to find the best starting (and subsequent) guesses which would work.

I’ve put the simple python script I used at the bottom of the article, but the output is:

Matching 5 new letters (39%) are: ['arose']
Matching 5 new letters (66%) are: ['unlit', 'until']
Matching 4 new letters (81%) are: ['duchy']
Matching 3 new letters (89%) are: ['pygmy']

What this means is that if you start with the word ‘arose’ and then ‘until’ (or ‘unlit’), even though that’s only 10 unique letters (38% of the alphabet), because they are the most frequent ones they cover two-thirds (66%) of all the letter occurrences in the possible answers.

In terms of overall letter frequency we get the following ordered list:

[('e', 1233), ('a', 979), ('r', 899), ('o', 754), ('t', 729), ('l', 719), ('i', 671), ('s', 669), ('n', 575), ('c', 477), ('u', 467), ('y', 425), ('d', 393), ('h', 389),
('p', 367), ('m', 316), ('g', 311), ('b', 281), ('f', 230), ('k', 210), ('w', 195), ('v', 153), ('z', 40), ('x', 37), ('q', 29), ('j', 27)]

The script I wrote is not perfect, but it’s at least a start at finding some optimal words:

import sys

# Load the word list (one word per line)
with open(sys.argv[1]) as fh:
    words = [l.strip() for l in fh]

# Count how often each letter appears across all words
chars = {}
for char in ''.join(words):
    chars[char] = chars.get(char, 0) + 1
frequency = sorted(chars.keys(), key=lambda c: -chars[c])
print(sorted(chars.items(), key=lambda c: -c[1]))

total_freq = 0
while len(frequency) > 5:
    # Greedily pick the most frequent unused letters that still
    # appear together in at least one word, up to 5 of them
    matching = words
    letters = []
    for char in frequency:
        new_matching = [w for w in matching if char in w]
        if new_matching:
            matching = new_matching
            letters.append(char)
        if len(letters) == 5:
            break

    # Report the cumulative share of letter occurrences covered so far
    total_freq += sum(chars[c] for c in letters)
    print("Matching %d new letters (%d%%) are: %r" % (len(letters), total_freq / sum(chars.values()) * 100, matching))
    frequency = [c for c in frequency if c not in letters]

Simple mitigation for the new DNS cache poisoning attack

As reported in many places, a new attack has been presented which can allow an attacker to poison entries in caching and forwarding DNS servers. The PDF is an interesting read and contains many different ideas which, chained together, lead to this attack. I believe the following firewall rule should defend against the attack on caching servers, with very little side effect, by preventing the sending of ICMP messages saying that a given UDP port was unreachable:

iptables -I OUTPUT -p icmp --icmp-type port-unreachable -m u32 --u32 '34 & 0xFF = 17' -j DROP

The u32 match inspects byte 37 of the packet – the protocol field of the original IP header quoted back inside the ICMP payload – so only port-unreachable messages generated in response to UDP (protocol 17) packets are dropped.

Running systemd inside a Centos 8 Docker container

Support for systemd in Docker has improved a lot since this 2016 article, but it’s still not obvious quite how to make it work. Why would you want this? Mostly for testing full-server deploys (for example, we test ansible deployments against various docker containers to ensure there are no bugs). Here’s a systemd-based CentOS 8 Dockerfile that also includes an ssh server:

FROM centos:8

# Set up base packages that are expected
RUN dnf -y install openssh-server crontabs NetworkManager firewalld selinux-policy

RUN systemctl mask dev-mqueue.mount dev-hugepages.mount \
     systemd-remount-fs.service sys-kernel-config.mount \
     sys-kernel-debug.mount sys-fs-fuse-connections.mount \
     graphical.target systemd-logind.service \
     NetworkManager.service systemd-hostnamed.service

STOPSIGNAL SIGRTMIN+3

# SSHd setup
EXPOSE 22
COPY docker.pub /root/.ssh/authorized_keys
RUN chmod 600 /etc/ssh/ssh_host* /root/.ssh/authorized_keys

CMD ["/sbin/init"]

You can then launch this like:

docker run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --tmpfs /run container-name
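
To check everything is working you can log in over ssh – a quick sketch, where the container name systemd-test and the IP address are hypothetical, and the key must match docker.pub:

docker run -d --name systemd-test -v /sys/fs/cgroup:/sys/fs/cgroup:ro --tmpfs /run container-name
docker inspect -f '{{.NetworkSettings.IPAddress}}' systemd-test   # prints e.g. 172.17.0.2
ssh root@172.17.0.2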

For the latest CentOS 7 you can use the following Dockerfile:

FROM centos:7

RUN yum -y install openssh-server NetworkManager firewalld && \
    systemctl disable NetworkManager && systemctl enable sshd

EXPOSE 22
COPY docker.pub /root/.ssh/authorized_keys
RUN chmod 600 /etc/ssh/ssh_host* /root/.ssh/authorized_keys

STOPSIGNAL SIGRTMIN+3

CMD ["/sbin/init"]

Complex network setup on Centos/Redhat 8 via Network Manager

Here are some notes, mostly for my future self, about how to set up bond + vlan + bridge networks in CentOS 8 in order to create a highly resilient virtual machine host server.

Firstly, create the bond interface and set the options:

nmcli con add type bond ifname bond0 bond.options "mode=802.3ad,miimon=100" ipv4.method disabled ipv6.method ignore connection.zone trusted

Note the 802.3ad option – this says we are going to be using the LACP protocol, which requires support from the switch or router; you can look at the different mode options in the kernel documentation.

Then, we add the required interfaces into the bond – in this case enp131s0f0 and enp131s0f1:

nmcli con add type ethernet ifname enp131s0f0 master bond0 slave-type bond connection.zone trusted
nmcli con add type ethernet ifname enp131s0f1 master bond0 slave-type bond connection.zone trusted
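
At this point you can verify that the bond has come up and LACP has negotiated with the peer – a quick check (not part of the setup itself):

cat /proc/net/bonding/bond0   # shows the bonding mode, LACP details and slave status
nmcli con show                # lists the connections we just created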

If you want to create a standard VLAN interface on top of the bond for VLAN tag 123, we can do this with just one more command:

nmcli con add type vlan ifname bond0.123 dev bond0 id 123 ipv4.method manual ipv4.addresses a.b.c.d/24 ipv4.gateway a.b.c.d ipv4.dns 8.8.8.8 connection.zone public

Alternatively if you want to set up a bridge on this interface:

nmcli con add ifname br.123 type bridge ipv4.method manual ipv4.addresses a.b.c.d/24 ipv4.gateway a.b.c.d ipv4.dns 8.8.8.8 \
        bridge.stp no bridge.forward-delay 0 connection.zone public
nmcli con add type vlan ifname bond0.123 dev bond0 id 123 master br.123 slave-type bridge connection.zone trusted

If you don’t want the default route or DNS, simply remove the ipv4.gateway and ipv4.dns options above.

If you want to create a bridge which doesn’t have any IP address on the host (it just bridges through to the VMs on the host), then replace all the ipv4 settings with: ipv4.method disabled ipv6.method ignore.
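
For example, an IP-less version of the bridge setup above would look like this (same hypothetical interface names as before):

nmcli con add ifname br.123 type bridge ipv4.method disabled ipv6.method ignore \
        bridge.stp no bridge.forward-delay 0 connection.zone trusted
nmcli con add type vlan ifname bond0.123 dev bond0 id 123 master br.123 slave-type bridge connection.zone trusted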