Category Archives: perl

Uploading videos to YouTube with perl

As part of a perl project recently I needed to generate a load of videos (using the excellent MLT framework) and upload them to YouTube. As the bulk of the project was already in perl, I needed to write a YouTube upload module for it. Here’s a rough overview of how to authenticate and upload a video (chunked, rather than loading it all into memory at once) with perl. Note as well that it was quite a struggle to get long-term auth tokens out of the Google services – one slight mistake with a parameter and they only give temporary grants which last an hour or two rather than indefinitely.
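Below is a rough sketch of the upload side (the function name upload_video and the 1MB chunk size are my own illustrative choices, not necessarily what the original module used). It asks YouTube’s v3 API for a resumable upload session, then PUTs the file to the returned session URL via a code reference so that LWP streams it in chunks instead of slurping the whole file:

    use strict;
    use warnings;
    use JSON qw(encode_json);
    use HTTP::Request;

    my $CHUNK_SIZE = 1024 * 1024;    # 1MB per chunk

    # $oauth2 is an LWP::Authen::OAuth2 object - see below for how it is built
    sub upload_video {
        my ($oauth2, $file, $title) = @_;

        # Start a resumable upload session; YouTube replies with a session
        # URL in the Location header
        my $res = $oauth2->post(
            'https://www.googleapis.com/upload/youtube/v3/videos?uploadType=resumable&part=snippet,status',
            Content_Type => 'application/json',
            Content      => encode_json({
                snippet => { title => $title },
                status  => { privacyStatus => 'private' },
            }),
        );
        die 'Failed to start upload: ' . $res->status_line unless $res->is_success;
        my $upload_url = $res->header('Location');

        # Hand LWP a coderef so the file is streamed chunk by chunk - this
        # chunked request is what needs the Bearer patch mentioned below
        open my $fh, '<:raw', $file or die "Can't open $file: $!";
        my $req = HTTP::Request->new(PUT => $upload_url, [
            'Content-Length' => -s $file,
            'Content-Type'   => 'video/*',
        ]);
        $req->content(sub {
            my $read = read $fh, my $chunk, $CHUNK_SIZE;
            return $read ? $chunk : '';    # empty string signals end of body
        });

        $res = $oauth2->request($req);
        die 'Upload failed: ' . $res->status_line unless $res->is_success;
        return $res;
    }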

As per this bug report you need to hack LWP::Authen::OAuth2::AccessToken::Bearer to enable the chunked uploads to work, otherwise it throws an error.

The auth_id and auth_secret parameters are given by the Google API console when you sign up for YouTube API access, and redirect_uri should be wherever the web app would redirect to after displaying the Google OAuth permission grant screen. The first time you run the script you’ll need to save the auth_code / auth_token parts and present them on future calls – even though it grants a fresh auth token each time the script is run, from the look of things you still need to supply the original one.
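For reference, here’s a sketch of the token set-up that eventually produced the long-lived grants. The key detail is access_type=offline on the authorization URL – that’s the “one slight mistake” parameter; without it Google only hands out the short-lived tokens. (Variable names and the token persistence are illustrative.)

    use LWP::Authen::OAuth2;

    my $oauth2 = LWP::Authen::OAuth2->new(
        service_provider => 'Google',
        client_id        => $auth_id,        # from the Google API console
        client_secret    => $auth_secret,
        redirect_uri     => $redirect_uri,
        scope            => 'https://www.googleapis.com/auth/youtube.upload',

        # Called whenever tokens are granted/refreshed - stash the string
        # somewhere so later runs can present the original grant again
        save_tokens => sub {
            my ($token_string) = @_;
            # write $token_string to disk or the database here
        },
        token_string => $saved_token_string,   # undef on the very first run
    );

    # First run only: send the user here, then capture auth_code from the
    # redirect. access_type=offline is what gets you a long-term refresh
    # token; approval_prompt=force makes Google re-issue it.
    print $oauth2->authorization_url(
        access_type     => 'offline',
        approval_prompt => 'force',
    ), "\n";
    # ...and once the code comes back:
    # $oauth2->request_tokens(code => $auth_code);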

Debugging perl memory leaks

In perl you hardly ever have to think about memory management. It has a great garbage collector, and aside from not usually passing free memory back to the OS (it keeps it around for new objects instead) it doesn’t have much in the way of memory issues. One exception, however, is when you create objects with circular references and forget to weaken one of them: the dependency cycle means the reference counts never drop to zero, so the memory can never be garbage collected. Fortunately weakening is very easy with Moo or Moose.
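For example, with Moo (class and attribute names invented for illustration), the back-reference in a parent/child cycle just needs weak_ref => 1:

    package Tree::Node;
    use Moo;

    has children => (is => 'rw', default => sub { [] });

    # Weaken the back-reference so parent/child cycles can be collected
    has parent => (
        is       => 'rw',
        weak_ref => 1,
    );

    1;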

If you start seeing memory leaks, it’s really simple to debug; just use the excellent Devel::Leak::Object at the top of your perl code:
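The documented one-liner is all you need – the GLOBAL_bless option makes it track every object blessed anywhere in the process:

    use Devel::Leak::Object qw{ GLOBAL_bless };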

Then, when your script exits, it will print a list of all objects still in existence that were never garbage collected. Look for classes with a suspiciously high count and that’s where your memory leak is. Easy!

A source of random perl crashes

Recently we rolled out a new version of code to production and started observing a lot of segfaults being recorded, all to do with freeing memory.

Researching a bit further, I found that this was caused by some code which was meant to automatically save an object to the database when it was destroyed, if it had not already been saved:
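It looked something along these lines (a reconstruction – the is_saved flag is illustrative):

    sub DESTROY {
        my ($self) = @_;
        # Last-ditch flush: persist the object if nobody saved it explicitly
        $self->save unless $self->is_saved;
    }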

Researching further, I discovered that the issue is down to the way perl handles global destruction. During a normal object destroy phase this code works just fine; however, when the process is exiting and objects are being destroyed, you don’t know what order they will be destroyed in. In this case I assume the database object we were trying to send the object to had already been DESTROYed/garbage collected, which is what was causing the crash. Fortunately there’s quite a simple work-around:
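On perl 5.14 and above you can check ${^GLOBAL_PHASE} and simply skip the save while the interpreter is tearing everything down:

    sub DESTROY {
        my ($self) = @_;
        # During global destruction the database handle may already be
        # gone, so don't try to use it - this is what was segfaulting
        return if ${^GLOBAL_PHASE} eq 'DESTRUCT';
        $self->save unless $self->is_saved;
    }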

If you want to support perl < 5.14 you should use the Devel::GlobalDestruction module. Moo provides this more easily via the DEMOLISH subroutine:
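With Moo you define DEMOLISH instead of DESTROY, and it is passed a flag telling you whether you are in global destruction:

    package My::Record;
    use Moo;

    sub DEMOLISH {
        my ($self, $in_global_destruction) = @_;
        return if $in_global_destruction;
        $self->save unless $self->is_saved;
    }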

We had another similar issue when using Log4perl – if you try to access it from within the DESTROY scope, the logger object may already have been destroyed, and the process balloons in memory until it is OOM’d. Again, a check for global destruction from within your logging function handles this just fine.
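Something like this (log_msg and the logger accessor are made-up names for the sketch):

    sub log_msg {
        my ($self, $level, $message) = @_;
        # The Log4perl logger may itself have been destroyed by now
        return if ${^GLOBAL_PHASE} eq 'DESTRUCT';
        $self->logger->$level($message);
    }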

Easily dumping sample data from any database

Following on from my previous post on the wonders of perl’s DBIx::Class::Schema::Loader: after you have dumped the schema, using either the perl scripts provided there or the dbicdump command-line tool, you can easily dump the first few rows of each table in JSON using the following very simple perl script:
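A sketch of that script, assuming the schema was dumped as Parts::Schema per the previous post (DSN and credentials are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use lib 'dbic_lib';
    use Parts::Schema;
    use JSON;

    my ($user, $pass) = @ARGV;
    my $schema = Parts::Schema->connect('dbi:Pg:dbname=parts', $user, $pass);

    my %data;
    for my $source ($schema->sources) {
        my $rs = $schema->resultset($source)->search(undef, { rows => 10 });
        # Inflate rows to plain hashrefs so the JSON encoder can handle them
        $rs->result_class('DBIx::Class::ResultClass::HashRefInflator');
        $data{$source} = [ $rs->all ];
    }
    print JSON->new->pretty->encode(\%data);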

To dump the whole table, simply remove the rows => 10 bit (although as this loads each table into memory and keeps it there before dumping, I wouldn’t advise it on large tables!).

Alternatively for a NoSQL database like MongoDB:
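A rough equivalent using the MongoDB driver (connection string and database name are placeholders; allow_blessed/convert_blessed let the encoder cope with BSON types such as ObjectIds):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use MongoDB;
    use JSON;

    my $client = MongoDB->connect('mongodb://localhost');
    my $db     = $client->get_database('test');

    my %data;
    for my $name ($db->collection_names) {
        # First 10 documents of each collection
        $data{$name} = [ $db->get_collection($name)->find->limit(10)->all ];
    }
    print JSON->new->pretty->allow_blessed->convert_blessed->encode(\%data);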

Perl’s killer feature – DBIx::Class::Schema::Loader

After I tell people I’m a programmer and they ask ‘which language’, they’re usually pretty shocked when I tell them that for server-side work I use Perl out of preference. I guess most people have at some point seen some really bad perl scripting patched together by successive non-developers and assume that’s what I mean, but that’s a story for another day.

Anyway, one of the main reasons I use perl as my language of choice is its excellent ORM, DBIx::Class, which allows you to do a lot of complex database queries and joins without having to touch SQL. Fine, you can get a decent Object-Relational Mapper (ORM) in many different languages – PHP, Ruby, Python and Node.js for example – although DBIx::Class does seem to support rather more databases than your typical backend. The killer feature, however, is not DBIx::Class itself but a separate module, DBIx::Class::Schema::Loader.

Here’s the beauty of DBIx::Class::Schema::Loader: with a typical ORM you need to define every table’s schema and relationships in code before you can start using it fully. This module instead goes into the database, pulls out all of the tables, indexes and relationships, and creates the full set of classes, which you are then free to edit or update as you wish. The next time you run it, it pulls in any database schema updates while keeping the custom code you wrote. I’ve yet to see this functionality in any other language – please point it out to me if it exists.

Now, I know at this point there are many devs who like to define a database schema in code and then push it to the database so that you can have versioning etc. (which, by the way, these modules also support if you want to work that way round). Whilst that might be OK for a new project, it’s a nightmare for an existing database or project – just think how many days’ work it would be to build up ORM models of 100 tables manually. Also, in my experience, writing schemas in Ruby, Node.js or even DBIx::Class is pretty nasty and repetitive – you can do a lot better using SQL CREATE TABLE statements. Another issue is that it’s very difficult to define indexes properly – perhaps you can define a basic index using the ORM schema definition language, but I’d wager most ORMs don’t know how to create a partial or functional index, both of which postgres makes easy.
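A partial index, for instance (table and column names invented for illustration):

    -- Only index the rows the hot query actually touches
    CREATE INDEX orders_pending_idx
        ON orders (customer_id)
        WHERE status = 'pending';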

or
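a functional index on an expression (again, names invented):

    -- Lets case-insensitive lookups use the index
    CREATE UNIQUE INDEX users_email_lower_idx
        ON users (lower(email));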

Certainly once you get into the realms of complex geospatial data types it’s impossible not to touch the database manually, no matter what language or ORM you use.

As an example of using DBIx::Class::Schema::Loader to connect to a database and dump all of its schema, we simply have the following as a library (under dbic_lib/Parts/Schema.pm):
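A minimal version of that library might look like the following (loader options trimmed down for the example):

    package Parts::Schema;
    use strict;
    use warnings;
    use base 'DBIx::Class::Schema::Loader';

    # With dump_directory set, the loader writes the generated classes to
    # disk whenever the schema is loaded, rather than keeping them only
    # in memory
    __PACKAGE__->loader_options(
        naming         => 'current',
        dump_directory => 'dbic_lib',
    );

    1;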

And as the dumper script:
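The dumper script then just needs to connect – introspection and the dump to dbic_lib happen as the schema loads (DSN and credentials are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use lib 'dbic_lib';
    use Parts::Schema;

    my ($user, $pass) = @ARGV;

    # Connecting triggers the loader: it introspects the database and
    # writes one class per table under dbic_lib/Parts/Schema/Result/
    Parts::Schema->connect('dbi:Pg:dbname=parts', $user, $pass);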

That was easy, wasn’t it? This works whether you have 2 tables or 200. So far I’ve used it with MySQL, PostgreSQL, MSSQL, Oracle and SQLite, and it works brilliantly with all of them thanks to the excellent driver support in DBI, which all of this is based on.

In a previous job we had an MSSQL data warehouse that we wanted to connect to, to pull out various views defining customer lists so that we could email them. We used this module to refresh the schema dump on demand and reload it into the app, allowing people using just a web browser to choose a view from the data warehouse, define columns or search criteria, put these into a template (also created by them on the frontend) and email it out. All this in about 30 lines of perl code for the database interface.

As another example, just this week I was given the task of figuring out a client’s database structure: over 150 tables, only about 5 of which even had a primary key. First thing – whip out the code above and get a full dump of all the tables into a perl class structure. (By the way, it was an Oracle database – I’d never used that brand of SQL before, wouldn’t know which command-line linux client to use, and it’s a bit strange that it doesn’t even have LIMIT – not a problem for DBIx::Class though.) Within about 5 minutes I had a full dump of all tables including indexes, column types (and relationships, if any had been defined). Within about 10 more minutes I could write a script to dump complex data, based on the relationships between the tables, into a JSON format.

Please show me another language that has such great database-agnostic ORM support.

Update: DBIx::Class::Schema::Loader comes with a tool called dbicdump. If you just want to take a simple dump of the schema of a database, rather than writing the dump script yourself, you can just run:
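For example (DSN and credentials are placeholders):

    dbicdump -o dump_directory=./dbic_lib Parts::Schema 'dbi:Pg:dbname=parts' username password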