Automatically creating videos from pictures, music and subtitles

For one of my projects we have a number of albums and individual songs that we want to upload to YouTube, as many people use it to listen to music these days. We also want to create a separate collection of videos that show the song words (think hard-burning subtitles into a video). Obviously you can do this in video editing software, but it would be nice to be able to tweak all the videos afterwards without having to do much work.

Initially I tried using avconv/mencoder to generate the videos from the pictures with the following code: render the picture and music as a video, apply the subtitles, and finally apply the audio again without re-encoding it.

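    # Render the still image plus music as a video
    # ("$t" is the intermediate file that mencoder reads below)
    avconv -loop 1 -y \
            -i bgimg.jpg \
            -i "$mp3" \
            -shortest \
            -c:v libx264 -tune stillimage -pix_fmt yuv420p \
            -c:a mp3 \
            "$t"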

    # Apply subtitles
    mencoder -utf8 -ovc lavc -oac copy -o "$out" "$t" -sub "$sub"

    # Add in end track and overlay with mp3
    mencoder -audiofile "$mp3" -idx -ovc lavc -oac copy -o "final.avi" "$out" "$append"

Whilst this kind of works, it has a number of downsides. The big ones are that 1) it isn't flexible enough to, e.g., add another picture/slide at the end, and 2) it re-encodes the video/audio a number of times.

Then I remembered that the great Kdenlive video editing software is actually just a frontend to the brilliant MLT framework. This is basically a library plus commandline programs to do all sorts of video mixing with live or rendered output.

Using the melt commandline program you can test and generate tracks without having to worry about the XML format that it typically uses for the more advanced options. The final commands:

melt color:black out=5614 \
  t.jpg out=250 \
  -track \
    cdimage.jpg out=5614 \
  -transition composite geometry=0,0:100%x70% halign=1 \
  -consumer xml:basic.mlt

melt basic.mlt \
  -filter watermark:subtitles.mpl \
    composite.valign=b composite.halign=c producer.align=centre \
  -audio-track audio.mp3

If you want to do the video output you can add the following onto the last command:

-consumer avformat \
  target=out.mpg \
  mlt_profile=hdv_720_25p f=mpeg acodec=mp2 ab=96k vcodec=mpeg2video vb=1000k

Let's go through this a line at a time:

melt color:black out=5614

Generate black background for 5614 frames

  t.jpg out=250

Followed by t.jpg for 250 frames
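
The out= values are frame counts, so they depend on the frame rate. As a rough sketch, assuming the 25 fps of the hdv_720_25p profile used for the final render, a duration in milliseconds can be turned into a frame count with integer arithmetic. The ms_to_frames helper and the example durations are hypothetical, not taken from the original project. As far as I know melt treats out as the index of the last frame, so the exact length may differ by one frame.

```shell
# Convert a duration in milliseconds to a frame count, rounding to nearest.
# $1 = duration in ms, $2 = frames per second (25 is assumed in the calls below)
ms_to_frames() {
    echo $(( ($1 * $2 + 500) / 1000 ))
}

ms_to_frames 224560 25   # a 3:44.56 song   -> 5614
ms_to_frames 10000 25    # a 10 second slide -> 250
```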

  -track \
    cdimage.jpg out=5614

Generate a new track which is the cd image for the same length as the black track

  -transition composite geometry=0,0:100%x70% halign=1

Mix the two tracks so that the second one (i.e. the CD image) takes up 70% of the screen height and is centered horizontally at the top.
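
To make the geometry concrete, here is a small sketch of what 0,0:100%x70% works out to in pixels, assuming the 1280x720 frame of the hdv_720_25p profile. The pct_of helper is hypothetical, and how melt fits the image inside that rectangle (aspect ratio, alignment) is left to the composite transition itself.

```shell
# Percentage of a dimension in whole pixels (integer arithmetic).
# $1 = dimension in pixels, $2 = percentage
pct_of() {
    echo $(( $1 * $2 / 100 ))
}

width=1280    # assumed frame size: 720p
height=720
echo "cd image area: $(pct_of "$width" 100)x$(pct_of "$height" 70) at 0,0"
# prints: cd image area: 1280x504 at 0,0
```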

  -consumer xml:basic.mlt

Output to an xml file (in order to apply subtitles to the whole thing we need to do this intermediary stage)

melt basic.mlt

Start with the mixed video sequence defined in the xml file (which is just instructions, not a staged render)

  -filter watermark:subtitles.mpl
    composite.valign=b composite.halign=c producer.align=centre

Apply the watermark filter with a subtitle MPL file, aligned to the bottom and centered (it will auto-scale extra-wide lines to the width of the video). An MPL file is a plain-text list of frame=text entries, one per line.
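
As a purely illustrative sketch, a small MPL file could be created like this. The lyrics and frame numbers are made up, and using an empty entry to clear the text again is my assumption rather than something confirmed here.

```shell
# Hypothetical lyrics and frame numbers, for illustration only.
# Each line is frame=text; a tilde (~) marks a line break within the text.
cat > subtitles.mpl <<'EOF'
250=First line of the song
375=A longer line of words~wrapped onto a second row
500=
EOF
```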


The first part of each entry is the frame number and the second part is the text to be displayed. New lines within the text are marked with a tilde (~) character. Here is a simple Perl script to convert an SRT format subtitle file into this MPL format:

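use strict;
use warnings;
use Path::Tiny 'path';

# Arguments: frames per second, then the SRT file; MPL is written to stdout
my ($fps, $in) = @ARGV or die;
$in = (path $in)->slurp;
$in =~ s/\r//g;                  # normalise DOS line endings
my @parts = split /\n\n+/, $in;  # one block per subtitle
for my $part (@parts) {
    #print "$part\n\n";
    # Strip the index and timing lines, capturing both timestamps
    $part =~ s/^ \D* \d+ \n
        ([\d:,]+) \s --> \s ([\d:,]+) \n
        //x or next;
    my ($start, $end) = ($1, $2);
    # Convert hh:mm:ss,mmm into a frame number
    for( $start, $end ) {
        my ($h,$m,$s,$part_s) = split /[:.,]/;
        $_ = int( ( ( $h * 60 + $m ) * 60 + $s + $part_s / 1000 ) * $fps );
    }
    $part =~ s/\n+\z//;          # drop trailing newlines
    $part =~ s/\n/~/g;           # MPL marks line breaks with ~
    # Assumption: an empty entry at $end clears the text again
    print "$start=$part\n",
          "$end=\n";
}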
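
The core of the script is the timestamp arithmetic. As a quick way to check individual values, the same calculation can be sketched in shell with awk. The srt_time_to_frame name is hypothetical; it truncates to a whole frame in the same way as int() in the Perl.

```shell
# Convert an SRT timestamp (hh:mm:ss,mmm) to a frame number.
# $1 = timestamp, $2 = frames per second
srt_time_to_frame() {
    echo "$1" | awk -F'[:,.]' -v fps="$2" '{
        ms = (($1 * 60 + $2) * 60 + $3) * 1000 + $4   # total milliseconds
        print int(ms * fps / 1000)                    # truncate to a whole frame
    }'
}

srt_time_to_frame 00:01:23,500 25   # 83.5 s at 25 fps -> 2087
```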


Back to the melt commandline:

  -audio-track audio.mp3

Overlay the audio track

For the non-test output commandline parts:

-consumer avformat target=out.mpg

Output using libav

  mlt_profile=hdv_720_25p f=mpeg acodec=mp2 ab=96k vcodec=mpeg2video vb=1000k

Set the profile to 25fps 720p HD video, output as MPEG with MP2 audio at 96kbps and MPEG-2 video at 1000kbps.
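
Since the point of scripting this is being able to re-render everything after a tweak, a loop along the following lines could generate one render command per song. This is only a sketch: the songs/ and out/ directories, the per-song .mlt and .mpl file names and the render_cmd helper are all hypothetical, while the melt options are copied from the commands above. It prints the commands rather than running them, and it does not handle file names containing spaces.

```shell
# Print (not run) the render command for one song.
# $1 = project XML, $2 = MPL subtitles, $3 = audio file, $4 = output file
render_cmd() {
    printf 'melt %s -filter watermark:%s composite.valign=b composite.halign=c producer.align=centre -audio-track %s -consumer avformat target=%s mlt_profile=hdv_720_25p f=mpeg acodec=mp2 ab=96k vcodec=mpeg2video vb=1000k\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical layout: songs/NAME.mp3 with matching NAME.mlt and NAME.mpl
for mp3 in songs/*.mp3; do
    [ -e "$mp3" ] || continue          # skip when the glob matches nothing
    name=$(basename "$mp3" .mp3)
    render_cmd "songs/$name.mlt" "songs/$name.mpl" "$mp3" "out/$name.mpg"
done
```

Piping the output into sh would run the renders one after another once the printed commands look right.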
