Category Archives: perl

Ripping unfinalized DVDs from Linux

Recently we had some DVD’s with old family videos that had been recorded directly but never finalized, the device that recorded them then broke so it was impossible to finalize them, and it seemed to be pretty much impossible for anything else to read them. So, I figured out a way to recover them using linux…

First I installed the dvd+rw-tools package on ubuntu and used that to get various info and prove that the DVD itself was readable even though nothing on the system could see it as a video disk or filesystem.

$ dvd+rw-mediainfo /dev/sr0 
INQUIRY:                [HL-DT-ST][DVDRAM GTB0N    ][1.00]
 Mounted Media:         11h, DVD-R Sequential
 Media ID:              CMC MAG. AE1
 Current Write Speed:   8.0x1385=11080KB/s
 Write Speed #0:        8.0x1385=11080KB/s
 Write Speed #1:        4.0x1385=5540KB/s
 Speed Descriptor#0:    00/0 R@8.0x1385=11080KB/s W@8.0x1385=11080KB/s
 Speed Descriptor#1:    00/0 R@8.0x1385=11080KB/s W@4.0x1385=5540KB/s
 Media Book Type:       00h, DVD-ROM book [revision 0]
 Legacy lead-out at:    2298496*2KB=4707319808
 Media Book Type:       25h, DVD-R book [revision 5]
 Last border-out at:    8390653*2KB=17184057344
 Disc status:           appendable
 Number of Sessions:    1
 State of Last Session: incomplete
 "Next" Track:          1
 Number of Tracks:      19
 Track State:           reserved
 Track Start Address:   0*2KB
 Next Writable Address: 0*2KB
 Free Blocks:           2544*2KB
 Track Size:            2544*2KB
 Track State:           complete incremental
 Track Start Address:   2560*2KB
 Free Blocks:           0*2KB
 Track Size:            240*2KB
 Last Recorded Address: 2591*2KB

I then pulled the entire DVD into a file on the local computer for easier processing later. The command will produce lots of errors (as there are parts of the DVD that are not readable as they were never written to), but the output file (image.iso) will contain a full dump of the DVD eventually

dd if=/dev/sr0 of=image.iso bs=2048 conv=noerror,notrunc iflag=nonblock

I then put together a short perl script to search through this file for 1kb blocks beginning with the magic tag DVDVIDEO – these seemed to be the starts of individual chapters, which avconv (also called ffmpeg on some distributions) can then extract into proper video/audio.

use v5.16;
use strict;
use warnings;
my $off = 0;
my $file = $ARGV[0];
open my $fh, '<:bytes', $file or die;
my $buf;
my @pos;

# Search through each block for one beginning with the header text and store these in array of offsets - I think it's one for each track
while( my $len = read $fh, $buf, 1024 ) {
        die if $len != 1024;

        if( $buf =~ /^DVDVIDEO/ ) {
                push @pos, $off;
push @pos, $off;

my $chap = 0;
for( my $i = 0; $i < @pos - 1; $i++ ) {
        my $length = $pos[$i+1] - $pos[$i];
        next if $length < 1000;
        say "dd if=$file bs=1024 skip=$pos[$i] count=$length | avconv -i - -acodec copy -vcodec copy out$chap.mp4";

Save that as and then run it to extract them as files named like out0.mp4. Note that it won’t process tracks that are less than about 1Mb because there seemed to be a number of small sections like this which we wanted to skip over.

perl image.iso

Job done! Note that there are probably many different formats and layouts for unfinalized DVD’s, this may be just one of many but hopefully the principle remains the same.

Uploading videos to YouTube with perl

As part of a perl project recently I needed to generate a load of videos (using the excellent MLT framework) and upload them to YouTube. As the bulk of the project was already in perl, I needed to write a YouTube upload module for it. Here’s a rough overview of how to authenticate and upload a video (chunked rather than loading it all into memory at once) with perl. Note as well it was quite a struggle to get long-term auth tokens from the Google services – one slight mistake with a parameter and they only give temporary grants which last for an hour or two rather than indefinitely.

package Youtube::Upload;
use Moo;

use LWP::Authen::OAuth2;
use Path::Tiny 'path';
use URI::QueryParam;
use JSON::XS qw< encode_json decode_json >;
use HTTP::Message;

# API described at

has auth_id => is => 'ro', required => 1;
has auth_secret => is => 'ro', required => 1;
has redirect_uri => is => 'ro', required => 1;

# If you havn't used this for a while, blank these and re-run and you'll
# probably need to do some auth against google.
has auth_code => is => 'ro';
has auth_token => is => 'ro';

has auth => is => 'lazy', builder => \&_google_auth;

sub upload {
    my ($self, $details, $youtube_code, $video_file) = @_;

    die "No id to update, but also nothing to upload" if !$youtube_code && !$video_file;

    my %body = %$details;

    # Allow all embedding
    $body{status}{embeddable} = JSON::XS::true;

    my $magic_split = 'BlA123H123BLAH'; # A unique string...

    my $is_new = !defined $youtube_code;
    my ($content, %headers, $uri);
    if( !$is_new ) {
        $body{id} = $youtube_code;

        $content = encode_json(\%body);
        $headers{'Content-Type'} = 'application/json';
        $uri = URI->new( '' );
    } else {
        my $msg = HTTP::Message->new([
            'Content-Type' => 'multipart/related',

                'Content-Type' => 'application/json',
            ], encode_json(\%body) )

        my $video_msg = 
                    'Content-Type' => 'video/*',
        $msg->add_part( $video_msg );
        $content = $msg->as_string;
        (my $head, $content) = split /\r?\n\r?\n/, $content, 2;
        my ($k, $v) = split /:\s*/, $head, 2;
        $headers{$k} = $v;
        $uri = URI->new( '' );

    delete $body{id};
        part => join(',', keys %body), 

    my $res;
    if( $is_new ) {
        my @content = split /\Q$magic_split/, $content;
        die "Magic split failed" if @content != 2;

        my $content_fh = path($video_file)->openr;
        my $request = HTTP::Request->new( 'POST', $uri, HTTP::Headers->new( %headers ), sub {
            #warn "chunk uploaded";
            return '' if !@content;

            if( @content > 1 ) {
                return shift @content;
            } else {
                my $read = read $content_fh, my $data, 1_000_000;
                if( !$read ) {
                    return shift @content;
                return $data;
        } );
        $res = $self->auth->request( $request );
    } else {
        $res = $self->auth->put( $uri,
            Content => $content

    my $cont = $res->decoded_content;
    my $ret;
    if( !$res->is_success ) {
        if($res->code != 403) {   # not our video
            die "Response not success: $cont for " . ( $youtube_code || '' );
    } else {
        $ret = decode_json $cont;

    return ref($ret) ? $ret : {};

sub _google_auth {
    my ($self) = @_;

    my $auth = LWP::Authen::OAuth2->new(
        service_provider => 'Google',
        redirect_uri => $self->redirect_uri,
        client_type => "web server",

        client_id      => $self->auth_id,
        client_secret  => $self->auth_secret,

        save_tokens => sub {
            say "Save token string: $_[0]" if !$self->auth_token;

        token_string => $self->auth_token,

    # For debug:
    #$auth->user_agent->add_handler("request_send",  sub { shift->dump(maxlength => 10_000); return });
    #$auth->user_agent->add_handler("response_done", sub { shift->dump(maxlength => 10_000); return });

    if( !$self->auth_code ) {
        say $auth->authorization_url(
            scope=> '',
            # Need these two to get a refresh token
            approval_prompt => 'force',
            access_type => 'offline',

    $auth->request_tokens( code => $self->auth_code ) if !$self->auth_token;

    return $auth;

As per this bug report you need to hack LWP::Authen::OAuth2::AccessToken::Bearer to enable the chunked uploads to work, otherwise it throws an error.

The auth_id and auth_secret parameters are given by the google code console when you sign up for YouTube API access, and the redirect_uri should be where the web app would redirect to after displaying the Google oauth permission grant screen. The first time you run the script, you’ll need to save the auth_code / auth_token parts and use them in future when calling. Even though it should grant a new auth token each time the script is run you still need to present the original one from the look of things.

Debugging perl memory leaks

In perl you hardly ever have to think about memory management. It has a great garbage collector, and aside from the issue of it not usually passing free memory back to the OS (but rather using it for new objects) it doesn’t have much in the way of memory issues. One exception however is if you are creating objects with circular references and you forgot to weaken one of them meaning that there is a dependency cycle and so the memory cannot be garbage collected. Fortunately weakening is very easy with Moo or Moose

has back_ref => is => 'rw', isa => Object, weak_ref => 1;

If you start seeing memory leaks, it’s really simple to debug; just use the excellent Devel::Leak::Object at the top of your perl code:

use Devel::Leak::Object qw{ GLOBAL_bless };

Then, when your script exits it will print a list of all non-garbage collected objects still in existence. Look for a few with a high value and that’s where your memory leak is. Easy!

A source of random perl crashes

Recently we rolled out a new version of code to production and started observing a lot of segfaults being recorded to do with freeing memory:

Dec  5 14:11:06 XXX kernel: [268032.637417] perl[62948]: segfault at 7ffc3e26dff8 ip 00007f296a21f6ef sp 00007ffc3e26e000 error 6 in[7f296a1a3000+19f000]

Researching a bit further, I found that this was caused by some code which was meant to automatically save an object to the database when it was destroyed, if it had not already been saved:

# Catch objects being destroyed that have not been correctly flushed

Researching further, I discovered that the issue is to do with the way perl handles global destruction. During a normal object destroy phase this code will work just fine; however when the process is exiting and objects are being destroyed you don’t know what order they will be destroyed in. In this case I assume that the database object that we tried sending the object to had already been DESTROY/garbage collected which is what was causing the crash. Fortunately there’s quite a simple work-around:

# Catch objects being destroyed that have not been correctly flushed.
  if( ${^GLOBAL_PHASE} eq 'DESTRUCT' ) {
    warn "Unflushed objects cannot be flushed when process is exiting";
  } else {

If you want to support perl < 5.14 you should use the Devel::GlobalDestruction module. Moo provides this more easily via the DEMOLISH subroutine:

# Catch objects being destroyed that have not been correctly flushed.
  my ($self, $in_destruction) = @_;
  if( $in_destruction ) {
    warn "Unflushed objects cannot be flushed when process is exiting";
  } else {

We had another similar issue when using Log4perl – if you try to access it from within the DESTROY scope the logger object may already have been destroyed and the process baloons in memory until it is OOM’d. Again just a check if you are in global destruction from within your logging function will handle this just fine.

Easily dumping sample data from any database

Following on from my previous post on the wonders of perl’s DBIx::Class::Schema::Loader, after you have dumped the schema using either the perl scripts provided or the dbicdump commandline tool, you can easily dump the first few rows of each table in JSON using the following very simple perl script:

use Parts::Schema;
use JSON;
my $dbic = Parts::Schema->connect('dbi:Pg:...', USER, PASSWORD);
my %data;
for my $source ($dbic->sources) {
    my $rs =
    $data{$source} = [
        map +{ $_->get_inflated_columns }, $dbic->resultset($source)->search({}, { rows => 10 })
print encode_json( \%data );

To dump the whole table simply remove that rows => 10 bit (although as it loads each table into memory and keeps it there before dumping I wouldn’t advise this!)

Alternatively for a NoSQL database like MongoDB:

use MongoDB;
use JSON;
my $db =MongoDB::MongoClient->new(XXX)->get_database(XXX);
my %data;
$data{$_} = [ $db->get_collection($_)->query->limit(10)->all ] for $db->collection_names;
print encode_json( \%data )

Perl’s killer feature – DBIx::Class::Schema::Loader

After I tell people I’m a programmer and they ask ‘which language’, they’re usually pretty shocked when I tell them that for server-side I use Perl out of preference. I guess most people have seen some really bad scripting in perl patched together by successive non-developers at some point in their life and assume that’s what I mean, but that’s a story for another day.

Anyway, one of the main reasons I use perl as my language of choice is because of the excellent ORM, DBIx::Class which allows you to do a lot of complex database queries and joins without having to touch SQL. Fine, you can get a decent Object-Relational Mapper (ORM) in many different languages – PHP, ruby, python, node.js for example. DBIx::Class does seem to have a lot more databases supported than your typical backend though. However the killer feature is not DBIx::Class but a separate module, DBIx::Class::Schema::Loader.

Here’s the beauty of DBIx::Class::Schema::Loader – with a typical ORM you need to define every table’s schema and relationships in code before you can start using it fully, however this module goes into the database and pulls out all of the tables, indexes, relationships and creates the full set of classes which you are then free to edit or update as you wish. The next time you run it it will pull in any database schema updates and keep the custom stuff you wrote. I’m yet to see this functionality in any other language – please point it out to me if it exists.

Now I know at this point there are many devs who like to define a database schema in code and then push it to the database so you can have versioning etc (which by the way these modules also support if you want to use them that way round). Whilst that might be OK for a new project, it’s a nightmare for an existing database or project – just think about how many days work would it be to build up ORM models of 100 tables manually. Also in my experience writing schemas in Ruby, Node.JS or even DBIx::Class is pretty nasty and repetitive – you can do a lot better using SQL CREATE TABLE statements. Another issue is that it’s very difficult to define indexes properly – perhaps you can define an index using the ORM schema definition language, but I’d wager most ORMs don’t know much about how to create a partial or functional index for example in postgres you can easily specify complicated indecies for example:

CREATE INDEX people_names ON people ((first_name || ' ' || last_name));


CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
WHERE NOT (client_ip > inet '' AND
           client_ip < inet '');

Certainly once you go into the realms of complex geo-spatial data-types it’s impossible not to touch the database manually no matter what language or ORM you use.

An example of using DBIx::Class::Schema::Loader to connect to a database and dump all of the schema we simply have the following as a library (under dbic_lib/Parts/

package Parts::Schema;
use v5.14
use base 'DBIx::Class::Schema::Loader';

our $conn_info = [ 'dbi:Pg:dbname=...', $USER, $PASS ];

sub connect { shift->SUPER::connect( @$conn_info ) }


And as the dumper script:

use v5.14;
use FindBin::libs qw< base=dbic_lib use=1 >;
use Parts::Schema;

Parts::Schema->loader_options({ dump_directory => "lib" });

That was easy wasn’t it? This works whether you have 2 tables or 200. So far I’ve used it with MySQL, PostgreSQL, MSSQL, Oracle, SQLite and it works brilliantly in all of these thanks to the excellent driver support in DBI which all of this is based on.

In a previous job we had an MSSQL data-warehouse that we wanted to connect to and pull out various views that defined customer lists so could email them. We used this module to refresh the schema dump on demand, reload it into the app and allow people using just a web browser to choose a view from the data warehouse, define columns or search criteria put these into a template (also created by them on the frontend) and email out. All this in about 30 lines of perl code for the database interface.

As another example just this week I was given the task of figuring out a client’s database structure with over 150 tables only about 5 of which even had a primary key. First thing – whip out the code above and get a full dump of all the tables into a perl class structure (by the way it was an oracle database – I’ve never used that brand of SQL before, wouldn’t know what command-line linux client to use and it’s a bit strange it doesn’t even have LIMIT – not a problem for DBIx::Class though). Within about 5 minutes I had a full dump of all tables including indexes, column types (and relationships if any had been defined). Within about 10 more minutes I could write a script to dump complex data based on relationships between the tables into a JSON format.

Please show me another language that has such great database-agnostic ORM support.

Update: DBIx::Class::Schema::Loader comes with a tool called dbicdump. If you just want to take a simple dump of the schema of the database above rather than writing the dump script yourself you can just run:

dbicdump Parts::Schema 'dbi:Pg:dbname=...' USER PASS

Better Database Search Functionality in 4 Simple Steps

As Google and other search engines are so good at predictive search these days users (ie me) get very frustrated at poor search results on websites or input boxes. Something I try to use across my apps is a decent auto-complete search interface however so many websites are very poor at this either matching only the first part of the string or matching any part of the substring. Additionally sometimes they don’t handle differences in case properly, certainly they don’t usually work with different accenting marks or order particularly well (eg search for beyoglu doesn’t usually return results like Beyoğlu). So, here follows a simple suggestion and code design pattern about how to implement this properly in PostgreSQL (Also works in MySQL although the regex matching code is slightly different). You can then have great instant typeahead functionality for example using the excellent AngularJS Bootstrap Typeahead input. I’ve implemented this in Perl/DBIC but it is a pattern that can be easily applied to any language such as Ruby/Rails or NodeJS.

Whilst there are a number of different search options out there that can plug into existing databases such as ElasticSearch, Sphinx or MySQL/Postgres fulltext search these are often fiddly to set up and are more intended for natural fulltext than for simple phrases or keywords which is what I generally aim for. The below method is pretty quick and easy to set up and allows you full control over the rebuilds, stemming, common word removal etc which is especially important for multi-lingual sites. You can also easily switch between database servers without having to totally redo your search functionality using this method.

Step 1: Add Column to Database Tables

Firstly, for any table you wish to search create a searchdata column probably varchar, with the maximum length of the data you’ll want to be searching (eg article title, author etc combined). For example:

alter table article add searchdata varchar(255) not null default '';

Step 2: Create Search Query Normalization Code

Then in your code create two routines to normalize any search text. Here is a (perl) example from my code:

package OradanOraya::Search;
use strict;
use warnings;
use utf8;
use Text::Unidecode 'unidecode';

sub to_search_no_strip {
    my ($self, $txt) = @_;
    $txt = lc unidecode($txt);
    $txt =~ s/[^a-zA-Z0-9 ]/ /g;  # kill newlines or space runs
    $txt =~ s/\s+/ /g;  # kill newlines or space runs
    $txt =~ s/^\s+|\s+$//;
    return $txt;
sub to_search {
    my ($self, $txt) = @_;
    $txt = $self->to_search_no_strip($txt);

    # common words to strip out
    $txt =~ s/\b(?: bolum | hastane | doktor | doctor | doc | dr )\S*//xg;

    return $txt;

The first function is purely for normalizing the search terms (firstly stripping accents using the excellent Text::Unidecode module, then killing any non-alphanumeric chars, ensuring only one space between words and no spaces beginning or end of the text), the latter function does the same but also removes any common words you don’t want indexed.

Step 3: Set Columns to Auto Update in Your ORM

In your ORM base-class (you are using an Object-Relational Mapper rather than plain SQL right?) create some functions to handle the auto-population of these fields when the rows get updated by your code. For Perl’s DBIx::Class users here’s the code you inject into your DBIC Result base class. The first function, _get_searchdata is the key one that takes a specified list of columns, normalizes them and returns the searchdata field. The other functions are for the manual refresh of the search data in the row, automatically updating search data on update and create respectively:

sub _get_searchdata {
    my ($self) = @_;

    return My::Search->to_search( join ' ', map { $self->$_ || '' } $self->searchdata_columns )

sub refresh_searchdata {
    my ($self) = @_;
        searchdata => $self->_get_searchdata

sub set_column {
    my $self = shift;

    my $ret = $self->next::method( @_ );

    if( $self->can('searchdata') ) {
        # Note that we call the super-class set_column update method rather than ourselves otherwise we'd have an infinite loop
        $self->next::method( 'searchdata', $self->_get_searchdata );

    return $ret;

sub insert {
    my $self = shift;

    if( $self->can('searchdata') ) {
        $self->searchdata( $self->_get_searchdata );

    return $self->next::method( @_ );

In any of your tables where you have added a searchdata column create a method that just returns what columns you want to add to searchdata:
sub searchdata_columns { qw< title name > }

Step 4: Search Queries and Ordering

Whenever a row is added or updated now you’ll have the normalized search text added (see below for a script to auto-populate if you have existing data). To do nice searches you can now execute the following SQL (for MySQL replace ~ with REGEXP operator):

select * from article where searchdata ~ 'foo'

This will match the text anywhere. If you want to only match words beginning with this you can use PostgreSQL’s zero-width start-of-word \m operator (in normal regexp language this would be roughly equivalent to \b although that matches beginning and end of words):
select * from article where searchdata ~ '\mfoo'

If you want to order results whereby those with beginning-of-string matches go first, then the rest are alphabetical you can do something like (note the !~ as false orders before true):
FROM article
WHERE searchdata ~ '\mfoo'
ORDER BY searchdata !~ '^foo', searchdata

Well that’s a job well done! You can look at using some sort of index in the database to speed this up but to be honest for tables with less than 10k rows that’s probably not worth while. You’ll need to look at the trie type indexes that Postgres has, I don’t believe MySQL is able to index these sorts of searches.

The DBIC code for this last one:

my $search_str = quotemeta($fn->to_search( $p->{search} ));
  searchdata => { '~' => '\m' . $search_str }
}, {
  order_by => [
         \[ 'searchdata !~ ?', [ str => '^' . $search_str ] ],

Extra Step: Create a Reindex Script

You’ll also want to write some code to find any tables with searchdata and update them for initial population. Here’s the perl/dbic solution for this again but should be simple enough with any ORM (note the transaction commit per 100 updates which significantly improves performance on all database servers):

my $count = 1;
for my $table ($dbic->sources) {
    my $rs = $dbic->resultset($table);

    next unless $rs->result_source->has_column('searchdata');

    while( my $d = $rs->next ) {
        if( $count++ % 100 == 0 ) {
            print ".";

Facebook Graph API Page post changes

So about a month back it looks like facebook changed their graph API to prevent posting links to pages using the method we had always used which was simply a post to //feed with my access token with message and link parameters. Posting just a message was working fine still but when I tried to add a link in I was just getting access denied.

After spending an hour or two bashing my head against the wall I discovered that you had to first access a list of all your pages with your user access token, then from that you would figure out the page’s special access token, and only then could you post.

So the resulting (somewhat messy) perl code is like:

my $FB_GRAPH_BASE = '';
my $m = WWW::Mechanize->new;
my $res = $m->get( "$FB_GRAPH_BASE/me/accounts?access_token=$token" );
my $d = decode_json( $res->decoded_content )->{data};
my $page_token = (grep { $_->{id} eq $PAGE_ID } @$d)[0]->{access_token};

$res = $m->post( "$FB_GRAPH_BASE/$PAGE_ID/feed", {
    access_token => $page_token,
    message => $msg,
    link => $url,

crypt() function potential insecurities with invalid salts

So, yesterday I discovered quite a serious vulnerability in an application using the crypt() function (it was in perl, but perl just calls through to the system library so this application design flaw may be found in anything using the crypt() function for authentication).

Firstly why use crypt() at all? It’s well known that the DES-style crypt is very weak, for example the salt is only 2 characters and it only takes the first 8 characters of the password and ignores everything after that. However the modern glibc implementation of crypt() include a number of very secure hashing functions prefixed with $, particularly $6$ which is hashed SHA-512 and I’d advise everyone to use.

Anyway, back to the issue at hand. Your standard crypt() password check would look like this (taken from perl’s Catalyst::Authentication::Credential::Password module):

return $storedpassword eq crypt( $password, $storedpassword );

ie you use as the salt the encrypted version of the password to encrypt the user specified password, and then check that against the crypted password itself. If they match it means the password was the same.

However in this application for certain pre-created accounts or accounts that had only been logged in to using an oauth mechanism (eg facebook login) the user’s password field was an empty string. This seems reasonable enough – crypt() of a blank password should always return something non-blank (unless you’re using an older version of mysql that has a crypt() inconsistency that I reported 2 years ago). eg

$ perl -le 'print crypt("", "aa")'

Unless that is, that you specify an invalid (or blank) salt:
$ perl -le 'print crypt("any password", "a")'

$ perl -le 'print crypt("any password", "")'

D’oh. This basically means that if you are using a blank password column to specify no password login allowed in reality someone can log in with ANY password! So in the case of the app in question if you knew the registered email address of someone who had a precreated (but locked out) account, or the address that someone who logged in using oauth you could do a login with any password. I’m guessing that because this is not particularly widely known amongst developers there are probably a number of apps where this is possible today but no-one has tested it.

Some workarounds/mitigations:

  • Set your database password field to default to something that is non-blank (eg ‘a’) – even if crypt() classes the salt as invalid it will return blank which will not match this field.
  • Use something that overrides crypt() to auto-generate a salt if none is specified (eg Crypt::Password::Util’s crypt function). This won’t cover you on the case where the specified salt is invalid and so crypt() just returns a blank.
  • Assert that the user’s password field is not blank before allowing login (but you need to make sure you do this everywhere in your application).
  • For language designers: Either make sure that your crypt() function doesn’t ever return blank (throws an error on invalid output for example), OR that it automatically generates a valid salt regardless of whether or not a salt is specified (including the case where a salt is specified but is invalid).