In one of my contracts we’re doing some very complex data storage and querying using jsonb in postgres. One common display in the system we’re replacing is basically something like:
SELECT sum( IF( val between 91 and 92, 1, 0 ) ) val_91,
       sum( IF( val between 92 and 93, 1, 0 ) ) val_92,
       sum( IF( val between 93 and 94, 1, 0 ) ) val_93,
       ...
GROUP BY somecol
Which is horrible and inefficient, and doesn’t let you know if anything occurred outside the range. As we’re already doing a lot with JSON, I was wondering if rather than creating new columns, we could return these counts in a JSON format. It turns out that with postgres this is really easy – here’s how:
First, we want to create a function that, given the existing aggregated state and a new key to increment, returns a jsonb object:
CREATE FUNCTION agg_count_values( state jsonb, key text ) RETURNS jsonb STRICT AS $$
    SELECT state || jsonb_build_object(
        key,
        -- extract the current value, or 0 if the key doesn't exist yet
        COALESCE( ( jsonb_extract_path_text( state, key ) )::int, 0 ) + 1
    );
$$ LANGUAGE SQL;
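As a quick sanity check, you can call the transition function directly with a hand-built state (the values here are just illustrative):

SELECT agg_count_values( '{"93": 2}'::jsonb, '93' );
-- => {"93": 3}

SELECT agg_count_values( '{}'::jsonb, '94' );
-- => {"94": 1}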
Then, you define the aggregate:
CREATE AGGREGATE count_values( text ) (
    sfunc    = agg_count_values,
    stype    = jsonb,
    initcond = '{}'
);
You may have noticed the STRICT on the function (which means NULLs are not allowed); the aggregate we created notices that and will never pass NULL values into the function, meaning we don’t need to special-case them.
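A small illustration of that behaviour, using a made-up inline VALUES list: the NULL input is simply skipped and doesn’t affect the counts.

SELECT count_values( v )
FROM ( VALUES ('93'), (NULL), ('93'), ('95') ) AS t(v);
-- => {"93": 2, "95": 1}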
If you then run the query, first casting the floating-point value to an integer and then to text for the JSON key:
SELECT count_values( val::int::text ) GROUP BY somecol;
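For a complete, runnable sketch of how it fits together, here’s a made-up example (the readings table, its columns and the sample data are hypothetical, not from the original system):

CREATE TABLE readings ( somecol text, val numeric );

INSERT INTO readings ( somecol, val ) VALUES
    ('sensor_a', 93.2), ('sensor_a', 93.7), ('sensor_a', 94.1),
    ('sensor_b', 95.0), ('sensor_b', 95.4);

SELECT somecol, count_values( val::int::text ) AS counts
FROM readings
GROUP BY somecol;
-- sensor_a | {"93": 1, "94": 2}
-- sensor_b | {"95": 2}

Note that casting numeric to int rounds to the nearest integer rather than truncating, so use floor(val)::int instead if you want the buckets to break on whole numbers.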
You get a JSON result back with the counts:
{"93": 1, "94": 2, "95": 1, "96": 5, "97": 2, "98": 4, "99": 6, "100": 6, "101": 7, "102": 5, "103": 5, ...} ...
Obviously your first port of call when faced with something like this is ‘can I just group by the value?’, but in this case each output row needs to be grouped by something else, so we needed this sub-aggregation of the values.
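For comparison, the group-by-the-value approach would look something like this (again against the hypothetical readings table from above); it gives you one row per bucket rather than one JSON object per group, which is why it didn’t fit the display we needed:

SELECT somecol, val::int AS bucket, count(*) AS n
FROM readings
GROUP BY somecol, val::int
ORDER BY somecol, bucket;
-- sensor_a | 93 | 1
-- sensor_a | 94 | 2
-- sensor_b | 95 | 2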
Happy days