Category Archives: perl

Better Database Search Functionality in 4 Simple Steps

As Google and other search engines are so good at predictive search these days users (ie me) get very frustrated at poor search results on websites or input boxes. Something I try to use across my apps is a decent auto-complete search interface however so many websites are very poor at this either matching only the first part of the string or matching any part of the substring. Additionally sometimes they don’t handle differences in case properly, certainly they don’t usually work with different accenting marks or order particularly well (eg search for beyoglu doesn’t usually return results like Beyoğlu). So, here follows a simple suggestion and code design pattern about how to implement this properly in PostgreSQL (Also works in MySQL although the regex matching code is slightly different). You can then have great instant typeahead functionality for example using the excellent AngularJS Bootstrap Typeahead input. I’ve implemented this in Perl/DBIC but it is a pattern that can be easily applied to any language such as Ruby/Rails or NodeJS.

Whilst there are a number of different search options out there that can plug into existing databases such as ElasticSearch, Sphinx or MySQL/Postgres fulltext search these are often fiddly to set up and are more intended for natural fulltext than for simple phrases or keywords which is what I generally aim for. The below method is pretty quick and easy to set up and allows you full control over the rebuilds, stemming, common word removal etc which is especially important for multi-lingual sites. You can also easily switch between database servers without having to totally redo your search functionality using this method.

Step 1: Add Column to Database Tables

Firstly, for any table you wish to search create a searchdata column probably varchar, with the maximum length of the data you’ll want to be searching (eg article title, author etc combined). For example:

Step 2: Create Search Query Normalization Code

Then in your code create two routines to normalize any search text. Here is a (perl) example from my code:

The first function is purely for normalizing the search terms (firstly stripping accents using the excellent Text::Unidecode module, then killing any non-alphanumeric chars, ensuring only one space between words and no spaces beginning or end of the text), the latter function does the same but also removes any common words you don’t want indexed.

Step 3: Set Columns to Auto Update in Your ORM

In your ORM base-class (you are using an Object-Relational Mapper rather than plain SQL right?) create some functions to handle the auto-population of these fields when the rows get updated by your code. For Perl’s DBIx::Class users here’s the code you inject into your DBIC Result base class. The first function, _get_searchdata is the key one that takes a specified list of columns, normalizes them and returns the searchdata field. The other functions are for the manual refresh of the search data in the row, automatically updating search data on update and create respectively:

In any of your tables where you have added a searchdata column create a method that just returns what columns you want to add to searchdata:

Step 4: Search Queries and Ordering

Whenever a row is added or updated now you’ll have the normalized search text added (see below for a script to auto-populate if you have existing data). To do nice searches you can now execute the following SQL (for MySQL replace ~ with REGEXP operator):

This will match the text anywhere. If you want to only match words beginning with this you can use PostgreSQL’s zero-width start-of-word \m operator (in normal regexp language this would be roughly equivalent to \b although that matches beginning and end of words):

If you want to order results whereby those with beginning-of-string matches go first, then the rest are alphabetical you can do something like (note the !~ as false orders before true):

Well that’s a job well done! You can look at using some sort of index in the database to speed this up but to be honest for tables with less than 10k rows that’s probably not worth while. You’ll need to look at the trie type indexes that Postgres has, I don’t believe MySQL is able to index these sorts of searches.

The DBIC code for this last one:

Extra Step: Create a Reindex Script

You’ll also want to write some code to find any tables with searchdata and update them for initial population. Here’s the perl/dbic solution for this again but should be simple enough with any ORM (note the transaction commit per 100 updates which significantly improves performance on all database servers):

Facebook Graph API Page post changes

So about a month back it looks like facebook changed their graph API to prevent posting links to pages using the method we had always used which was simply a post to //feed with my access token with message and link parameters. Posting just a message was working fine still but when I tried to add a link in I was just getting access denied.

After spending an hour or two bashing my head against the wall I discovered that you had to first access a list of all your pages with your user access token, then from that you would figure out the page’s special access token, and only then could you post.

So the resulting (somewhat messy) perl code is like:

crypt() function potential insecurities with invalid salts

So, yesterday I discovered quite a serious vulnerability in an application using the crypt() function (it was in perl, but perl just calls through to the system library so this application design flaw may be found in anything using the crypt() function for authentication).

Firstly why use crypt() at all? It’s well known that the DES-style crypt is very weak, for example the salt is only 2 characters and it only takes the first 8 characters of the password and ignores everything after that. However the modern glibc implementation of crypt() include a number of very secure hashing functions prefixed with $, particularly $6$ which is hashed SHA-512 and I’d advise everyone to use.

Anyway, back to the issue at hand. Your standard crypt() password check would look like this (taken from perl’s Catalyst::Authentication::Credential::Password module):

ie you use as the salt the encrypted version of the password to encrypt the user specified password, and then check that against the crypted password itself. If they match it means the password was the same.

However in this application for certain pre-created accounts or accounts that had only been logged in to using an oauth mechanism (eg facebook login) the user’s password field was an empty string. This seems reasonable enough – crypt() of a blank password should always return something non-blank (unless you’re using an older version of mysql that has a crypt() inconsistency that I reported 2 years ago). eg

Unless that is, that you specify an invalid (or blank) salt:

D’oh. This basically means that if you are using a blank password column to specify no password login allowed in reality someone can log in with ANY password! So in the case of the app in question if you knew the registered email address of someone who had a precreated (but locked out) account, or the address that someone who logged in using oauth you could do a login with any password. I’m guessing that because this is not particularly widely known amongst developers there are probably a number of apps where this is possible today but no-one has tested it.

Some workarounds/mitigations:

  • Set your database password field to default to something that is non-blank (eg ‘a’) – even if crypt() classes the salt as invalid it will return blank which will not match this field.
  • Use something that overrides crypt() to auto-generate a salt if none is specified (eg Crypt::Password::Util’s crypt function). This won’t cover you on the case where the specified salt is invalid and so crypt() just returns a blank.
  • Assert that the user’s password field is not blank before allowing login (but you need to make sure you do this everywhere in your application).
  • For language designers: Either make sure that your crypt() function doesn’t ever return blank (throws an error on invalid output for example), OR that it automatically generates a valid salt regardless of whether or not a salt is specified (including the case where a salt is specified but is invalid).