All posts by Mark

I'm a full-stack Linux consultant from the UK specializing in high-performance systems, DNS and databases. I have also written, and led teams producing, a number of web/mobile apps. I'm fluent in English and Turkish.

How (not) to use PostgreSQL functional indexes

Working on a project where we want to store searchable MAC addresses (using LIKE, meaning we can’t use the built-in Postgres macaddr type) in a PostgreSQL table, I decided to create a unique index in order to only allow one row per modem in the table and also to speed up searches. However, we also want to ensure that we don’t get upper- and lower-case versions of the same MAC inserted. PostgreSQL to the rescue – you can create a functional index on the table:

create unique index on modem (( lower(mac) ));

Basically this means that the index stores the lower-cased value of whatever gets inserted and checks the unique constraint against that. It should also mean that something like:

select 1 from modem where mac = ?

should be indexed and hence work really quickly.

However, while profiling the application I noticed the select statements getting slower and slower the more rows we put into the table – the classic sign that a column is not properly indexed. Looking at the Postgres manual I remembered that a functional index only speeds up queries where you apply the function to the column; for example, something like the below would use the index:

select 1 from modem where lower(mac) = lower(?)

Functionally this was the same as what I was doing, but I didn’t want the hassle of remembering to apply that function everywhere we searched by mac. In the end I decided to drop the functional index, just use a standard unique index and store the MACs in lower case. I then added a CHECK constraint which also enforces that the value is lower-case hex:

alter table modem add constraint ensure_mac_correct CHECK (mac ~ '^[0-9a-f]{12}$');
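
Since the index is now on the raw column, the application has to normalise MACs itself before they hit the database. A rough sketch of what that looks like with plain DBI – the normalize_mac helper here is purely illustrative, not part of the project:

use DBI;

# Hypothetical helper: strip separators and lowercase, so the value
# satisfies the CHECK constraint (exactly 12 lower-case hex digits).
sub normalize_mac {
    my ($mac) = @_;
    $mac =~ s/[^0-9a-fA-F]//g;   # drop colons, dashes, dots
    return lc $mac;
}

my $dbh = DBI->connect('dbi:Pg:...', 'USER', 'PASSWORD', { RaiseError => 1 });
my $mac = normalize_mac('AA:BB:CC:DD:EE:FF');

# Insert and search both use the normalized form, so the plain unique
# index on mac is used directly and stays case-consistent.
$dbh->do('INSERT INTO modem (mac) VALUES (?)', undef, $mac);
my ($found) = $dbh->selectrow_array('SELECT 1 FROM modem WHERE mac = ?', undef, $mac);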

Did I ever mention how much I love Postgres? Take that, MySQL!

Easily dumping sample data from any database

Following on from my previous post on the wonders of perl’s DBIx::Class::Schema::Loader, after you have dumped the schema using either the perl scripts provided or the dbicdump command-line tool, you can easily dump the first few rows of each table in JSON using the following very simple perl script:

use Parts::Schema;
use JSON;
my $dbic = Parts::Schema->connect('dbi:Pg:...', USER, PASSWORD);
my %data;
for my $source ($dbic->sources) {
    $data{$source} = [
        map +{ $_->get_inflated_columns },
        $dbic->resultset($source)->search({}, { rows => 10 })
    ];
}
print encode_json( \%data );

To dump the whole of each table, simply remove the rows => 10 bit (although, as this loads each table into memory and keeps it there before dumping, I wouldn’t advise it for large tables!)
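
If you really do want everything from a big table, one option is to page through it and stream each chunk out as you go instead of holding the whole lot in memory. A rough sketch (the chunk size is arbitrary, and for deterministic paging you would also want an order_by):

use Parts::Schema;
use JSON;

my $dbic = Parts::Schema->connect('dbi:Pg:...', USER, PASSWORD);
for my $source ($dbic->sources) {
    my $page = 1;
    while (1) {
        # Fetch 1000 rows at a time instead of the whole table
        my @rows = $dbic->resultset($source)
                        ->search({}, { rows => 1000, page => $page++ })->all;
        last unless @rows;
        print encode_json( [ map +{ $_->get_inflated_columns }, @rows ] ), "\n";
    }
}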

Alternatively for a NoSQL database like MongoDB:

use MongoDB;
use JSON;
my $db = MongoDB::MongoClient->new(XXX)->get_database(XXX);
my %data;
$data{$_} = [ $db->get_collection($_)->query->limit(10)->all ] for $db->collection_names;
print encode_json( \%data );

Perl’s killer feature – DBIx::Class::Schema::Loader

After I tell people I’m a programmer and they ask ‘which language’, they’re usually pretty shocked when I tell them that for server-side I use Perl out of preference. I guess most people have seen some really bad scripting in perl patched together by successive non-developers at some point in their life and assume that’s what I mean, but that’s a story for another day.

Anyway, one of the main reasons I use perl as my language of choice is its excellent ORM (Object-Relational Mapper), DBIx::Class, which allows you to do a lot of complex database queries and joins without having to touch SQL. Fine, you can get a decent ORM in many different languages – PHP, Ruby, Python or Node.js for example – though DBIx::Class does seem to support a lot more databases than your typical backend. However, the killer feature is not DBIx::Class itself but a separate module, DBIx::Class::Schema::Loader.

Here’s the beauty of DBIx::Class::Schema::Loader: with a typical ORM you need to define every table’s schema and relationships in code before you can start using it fully. This module instead goes into the database and pulls out all of the tables, indexes and relationships, and creates the full set of classes, which you are then free to edit or update as you wish. The next time you run it, it will pull in any database schema updates while keeping the custom code you wrote. I’ve yet to see this functionality in any other language – please point it out to me if it exists.
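
To give a feel for what it generates, each table ends up as a Result class along these lines – the schema name, columns and relationship here are purely illustrative, not real Loader output:

package My::Schema::Result::Modem;
use strict;
use warnings;
use base 'DBIx::Class::Core';

__PACKAGE__->table('modem');
__PACKAGE__->add_columns(
    id  => { data_type => 'integer', is_auto_increment => 1 },
    mac => { data_type => 'varchar', size => 12 },
);
__PACKAGE__->set_primary_key('id');

# Relationships are picked up from foreign keys, e.g.
# __PACKAGE__->belongs_to(customer => 'My::Schema::Result::Customer', 'customer_id');

1;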

Now I know at this point there are many devs who like to define a database schema in code and then push it to the database so you can have versioning etc. (which, by the way, these modules also support if you want to work that way round). Whilst that might be OK for a new project, it’s a nightmare for an existing database or project – just think how many days of work it would be to build up ORM models of 100 tables manually. Also, in my experience, writing schemas in Ruby, Node.js or even DBIx::Class is pretty nasty and repetitive – you can do a lot better with SQL CREATE TABLE statements. Another issue is that it’s very difficult to define indexes properly – perhaps you can define an index using the ORM’s schema definition language, but I’d wager most ORMs don’t know how to create a partial or functional index. In Postgres, for example, you can easily specify complicated indexes such as:

CREATE INDEX people_names ON people ((first_name || ' ' || last_name));

or

CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
WHERE NOT (client_ip > inet '192.168.100.0' AND
           client_ip < inet '192.168.100.255');

Certainly once you go into the realms of complex geo-spatial data-types it’s impossible not to touch the database manually no matter what language or ORM you use.

As an example of using DBIx::Class::Schema::Loader to connect to a database and dump the whole schema, we simply have the following as a library (under dbic_lib/Parts/Schema.pm):

package Parts::Schema;
use v5.14;
use base 'DBIx::Class::Schema::Loader';

our $conn_info = [ 'dbi:Pg:dbname=...', 'USER', 'PASS' ];

sub connect { shift->SUPER::connect( @$conn_info ) }

1;

And as the dumper script:

use v5.14;
use FindBin::libs qw< base=dbic_lib use=1 >;
use Parts::Schema;

Parts::Schema->loader_options( dump_directory => 'lib' );
Parts::Schema->connect;

That was easy, wasn’t it? This works whether you have 2 tables or 200. So far I’ve used it with MySQL, PostgreSQL, MSSQL, Oracle and SQLite, and it works brilliantly with all of these thanks to the excellent driver support in DBI, which all of this is based on.

In a previous job we had an MSSQL data warehouse that we wanted to connect to in order to pull out various views defining customer lists, so we could email them. We used this module to refresh the schema dump on demand, reload it into the app and allow people using just a web browser to choose a view from the data warehouse, define columns or search criteria, put these into a template (also created by them on the frontend) and email it out. All this in about 30 lines of perl code for the database interface.

As another example, just this week I was given the task of figuring out a client’s database structure with over 150 tables, only about 5 of which even had a primary key. First thing: whip out the code above and get a full dump of all the tables into a perl class structure (by the way, it was an Oracle database – I’d never used that brand of SQL before, wouldn’t know which command-line Linux client to use, and it’s a bit strange that it doesn’t even have LIMIT – not a problem for DBIx::Class though). Within about 5 minutes I had a full dump of all tables including indexes and column types (and relationships, if any had been defined). Within about 10 more minutes I could write a script to dump complex data, based on the relationships between the tables, into a JSON format.
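
That relationship-based dump was essentially just a prefetch over the generated classes – something along these lines, with made-up source and relationship names:

use Parts::Schema;
use JSON;

my $dbic = Parts::Schema->connect('dbi:Oracle:...', USER, PASSWORD);

# Pull each order together with its line items in a single query
my @orders = $dbic->resultset('Order')->search({}, { prefetch => 'order_items', rows => 10 });

my @dump;
for my $order (@orders) {
    push @dump, {
        $order->get_inflated_columns,
        items => [ map +{ $_->get_inflated_columns }, $order->order_items ],
    };
}
print encode_json( \@dump );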

Please show me another language that has such great database-agnostic ORM support.

Update: DBIx::Class::Schema::Loader comes with a tool called dbicdump. If you just want to take a simple dump of the schema of the database above, rather than writing the dump script yourself, you can just run:

dbicdump Parts::Schema 'dbi:Pg:dbname=...' USER PASS
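
Once the classes have been dumped (by either route), using them is just ordinary DBIx::Class – for example, assuming the dump directory is on @INC and Modem is one of the generated sources (with an id column):

use FindBin::libs qw< base=lib use=1 >;
use Parts::Schema;

my $schema = Parts::Schema->connect('dbi:Pg:dbname=...', USER, PASS);

# The generated classes behave like any hand-written DBIx::Class schema
my $total  = $schema->resultset('Modem')->count;
my @recent = $schema->resultset('Modem')->search({}, { rows => 5, order_by => { -desc => 'id' } });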

Angular Smart Table and drop-down select

The reason I wrote my previous post explaining my difficulties with selects is that I am currently using the excellent Smart Table angular module to bring some interactivity to tables in a project I’m working on. Smart Table seems to just work, although I still find the st-table and st-safe-src attribute requirements a little strange.

One issue I have found is with filtering. The following does not filter properly (doesn’t filter anything), even though the option values seem to be set correctly:

<select st-search="field" st-input-event="change" st-delay=0 ng-model="dummy" class="form-control input-sm" ng-options="item.field as item.field for item in items track by item.field">
</select>

Instead, you have to write out the select using an ng-repeat which seems to do the trick nicely:

<select st-search="field" st-input-event="change" st-delay=0 ng-model="dummy" class="form-control input-sm">
    <option value="">
        All items
    </option>
    <option ng-repeat="item in items" value="{{ item.field }}">
        {{ item.field }}
    </option>
</select>

<select> element problem in AngularJS

So I just spent 10 minutes trying to figure out why the following wasn’t working in AngularJS:

<select ng-options="item.name for item in items">
</select>

I spent ages checking my syntax and that the data was correct. Finally I discovered the reason – <select> REQUIRES ng-model.

<select ng-model="dummy" ng-options="item.name for item in items">
</select>

works just fine…

Switching to specific KDE activities using shortcut keys

Today I’ve been playing around with an incredibly powerful feature of KDE – Activities. These basically let you switch between entire sets of applications and virtual desktops very easily. As a consultant I usually have several different clients that I work for at the same time, plus a set of programs I use personally; KDE Activities allows me to separate these and switch between them very easily. The default keypress to cycle around the different activities is Meta-Tab, but I wanted to be able to switch directly to a specific one, and this doesn’t have any built-in support in KDE. Fortunately, with Linux and KDE being so flexible, I found a solution in this thread from 2011. As I was implementing it I noticed a number of changes that have been made since, so here is an updated version of the instructions:

Firstly, get a list of activity IDs and their human-readable names from the console:

$ for i in $(qdbus org.kde.kactivitymanagerd /ActivityManager/Activities ListActivities); do echo -n "$i : "; qdbus org.kde.kactivitymanagerd /ActivityManager/Activities ActivityName $i; done
872ecadb-1388-41fe-89c5-4475c7fd137f : AAA
05104b7f-5df9-4ba2-80bf-0b3eca8a501d : BBB
a20e63e3-6ac6-4e0c-9f0f-2eebb3dfcc8b : CCC

Then open up the “Custom Shortcuts” program and create a new shortcut using “Edit -> New -> Global Shortcut -> D-Bus Command”. Fill out the D-Bus settings as follows:

Remote application: org.kde.kactivitymanagerd
Remote object: /ActivityManager/Activities
Function: SetCurrentActivity
Arguments: 872ecadb-1388-41fe-89c5-4475c7fd137f

And you’re done!
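
If you want to test the switch without setting up a shortcut first, you can invoke the same D-Bus method from a script – for example from perl with Net::DBus (purely illustrative; substitute your own activity ID):

use Net::DBus;

# Connect to the session bus and grab the activity manager object
my $activities = Net::DBus->session
    ->get_service('org.kde.kactivitymanagerd')
    ->get_object('/ActivityManager/Activities');

# The same call the shortcut makes, using an activity ID from the list above
$activities->SetCurrentActivity('872ecadb-1388-41fe-89c5-4475c7fd137f');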

Recovering from a strange mysql crash

So, yesterday my server started emailing some cron errors over. One particular script that runs every hour was throwing an error about 10 seconds into running, which appeared to be the server hanging up on the query. I thought I might have set some timeout too low, such that the server wouldn’t allow queries longer than 10 seconds (as this is a web server, nothing should take that long apart from a few analytics scripts that run overnight). Running the query by hand showed the same problem, so I started analysing its component parts to see which was taking so long. Then I looked at the timeouts that were set and noticed:

mysql> show status like "%time%";
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Uptime                     | 105   |
| Uptime_since_flush_status  | 105   |
+----------------------------+-------+

Hmmm that looks bad. Looking in dmesg confirms that mysql has not been hanging up – it’s actually been crashing!

[504510.549172] init: mysql main process (15317) terminated with status 1
[504510.549186] init: mysql main process ended, respawning

I ran mysqlcheck on the database in question and the server crashed again, even though I had been able to query the table fine and had even added an index before realizing there was some issue with it. So, rather than restore from a backup, I thought I’d just clone the table and replace the existing one with it:

mysql> create table t like client_songs;
mysql> insert t select * from client_songs;
mysql> check table t;

The new table looks fine, so let’s do a final sync (as the original is continually being inserted into):

mysql> insert t select * from client_songs cs where client_song_ts >= ( select max(client_song_ts) from t ) on duplicate key update select_count = cs.select_count, download_count = cs.download_count, rating = cs.rating, client_song_ts = cs.client_song_ts, print_count = cs.print_count;

Then put it live:

mysql> drop table client_songs;
mysql> rename table t to client_songs;
mysql> check table client_songs;

Everything is working again. I wish mysql (5.5.37 from Ubuntu 14.04 LTS) was more reliable – that’s why I tend to use postgres for new projects these days. It’s really strange that the table could be read fine but one particular query caused it to crash – probably a case of the particular index being used for that query being corrupted, but not the row data.

Getting the HP P1005 and associated LaserJet printers working with the Raspberry Pi

The standard hplip doesn’t seem to work very well on the Raspberry Pi with printers like the HP LaserJet 1000/1005/1018/1020/P1005/P1006/P1007/P1008/P1505, as they need a custom firmware downloaded to them before they can print anything. There seems to be an issue with the firmware download or page-formatting tool, which just makes the printer give a click sound and do nothing more. This is annoying because I wanted to use my Raspberry Pi as a file and print server – my all-in-one HP DeskJet F2420 set up really easily, but I struggled for a long time to make the LaserJet work correctly.

The basic way to make this work is to download and compile the foo2zjs open-source printer driver (as the apt sources don’t seem to be working properly, at least on my slightly older version of Raspbian). After downloading and compiling, you install it using the following commands:

rm /etc/udev/rules.d/*hpmud* # remove any existing hotplug from HP
./getweb P1005 # change depending on your printer
make install
make install-hotplug
make cups

For some reason the install script didn’t seem to correctly install the firmware on my system so I had to work around it:

mkdir -p /usr/share/foo2xqx/firmware/
mv sihp* /usr/share/foo2xqx/firmware/

If you want to test without having to use cups, try the following (after turning your printer off and on again):

foo2xqx-wrapper testpage.ps > testpage.xq # convert to wire-format
cat /usr/share/foo2xqx/firmware/sihpP1005.dl > /dev/usb/lp0 # bang firmware over to printer (wait 5 sec for it to install)
cp testpage.xq /dev/usb/lp0 # print the page (hopefully)

Depending on your printer you may need to use foo2zjs-wrapper instead of foo2xqx-wrapper.

Then you have proved it works, so you can set it up in CUPS nice and easily and use it as a remote printer!

A Facebook Share component for AngularJS

There are several Facebook share components available for AngularJS, however I needed something that could:

  • Show a count of the number of shares
  • Work within the translation infrastructure of my app (ie a custom template)
  • Handle different URL mappings for pages – whilst a typical Angular URL might be http://app.com/#/path/…, Facebook can’t scrape this unless you use the #! method or some such. Because of this we typically map to different links on the server, such as http://app.com/landing/path/…

The below code snippets work together to do all of this in a few lines:

angularApp.directive("ngFbShare", function($http) {
    return {
        restrict: 'A',
        templateUrl: 'views/components/fbshare.html',
        scope: {
            ngFbShare: "@"
        },
        link: function(scope, elem, attrs) {
            var link; // always the latest link relative to the base of the server. Use this for click tracking etc
            var update_share_link = function() {
                link = '/landing/' + scope.ngFbShare;
                scope.share_link = MY_FULL_ADDRESS + link;
            };
            update_share_link();
            scope.$watch('ngFbShare', update_share_link);

            scope.fb_total_count = 0;
            $http.get('http://api.facebook.com/method/links.getStats?format=json&urls=' + scope.share_link)
                .then(function(res) {
                    scope.fb_total_count = res.data[0].total_count;
                });
        }
    };
});

<a href="http://www.facebook.com/sharer/sharer.php?u={{ share_link }}" target="fb_share_window" class="btn btn-primary btn-fb">
    <i class="fa fa-fw fa-facebook"></i> Share
    <span class="count">{{ fb_total_count }}</span>
</a>