Maximizing rsync throughput in 2 easy commands

rsync is an excellent tool for linux to copy files between different systems. However, it doesn’t yet have the ability to run multiple copy processes in parallel which means that if you are limited by the speed you can read filesystem metadata (ie list the files), or you have a very fast network connection and lots of cores on both servers you can significantly speed up copying files by running processes in parallel. For example, one process can copy files at perhaps 50MB/sec, however with a 16-core server, 1gbps network connection and a fast SSD array you can copy data at 1GB/sec (gigabytes). Here’s how:

Firstly, you need to get ssh set up so you can connect between the machines without using a password. Even if you are copying between two remote systems and you use ssh-agent key forwarding (which I highly recommend), this can become a significant bottleneck so it’s best to do the following and generate a new key on the source system:

Hit enter when it prompts for a passphrase so that the key is generated without needing a password to open it. This will create two files, rsync which is your private key and which you want to add to your authorized keys on the remote host using something like:

You should then be able to ssh without needing a password by doing:

Next, we need to go to the remote host and allow lots of ssh sessions to be opened at once; open up /etc/ssh/sshd_config on remote_host and append or change these lines:

Now, on your source host run the following command to ensure that rsync uses the ssh key you just created:

Now for the good stuff – first we need to mirror the directory structure but not the files:

And now we can do the copy in parallel (you might need to install the parallel command using something like apt-get install parallel):

This will copy 30 files in parallel, using compression. Play with the exact number, but 1.5 times the number of cores in your boxes should be enough. You can monitor the disk bandwidth with iostat -mx or the network throughput with a tool like iptraf. One of those, or the CPU usage should now be saturated, and your copy should be going as fast as is physically possible. You can re-run this afterwards to synchronise even quicker than a normal rsync, however you won’t be able to use it to delete files.

Leave a Reply

Your email address will not be published. Required fields are marked *