
How to upload your Gmvault backups to S3

Gmvault is a great tool by Guillaume Aubert that makes it easy to back up your Gmail emails to your local disk. It is cross-platform (Windows, Mac OS and Linux), supports incremental backups and gracefully handles errors that occur during the backup.

I had been looking for a tool to take automatic backups of my Gmail account for a while, and Gmvault fits the bill. What I also wanted was a way to store the backed-up files safely on Amazon S3 so I could recover them even if my local disk failed. This functionality is not included in Gmvault, but it was quite straightforward to code.

The Plan

Gmvault lets you either create a complete backup of all your emails or an incremental backup of just the last two months' emails (called a "quick sync"). Since I wanted the backup to run automatically every week, I decided to do one complete backup manually, compress, encrypt and upload it to S3, and then set up a script that does a quick sync and compresses, encrypts and uploads the newly backed-up emails to S3. This script now lives on a small Linux server of mine and is called every Monday morning by a cron job.
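For the schedule, a crontab entry along these lines does the job (the Ruby path and script location are just examples, adjust them to your machine):

# Run the Gmvault backup script every Monday at 07:00
0 7 * * 1 /usr/bin/ruby /home/backup/gmvault_backup.rb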

Encryption

The quickest and safest choice to encrypt a file seems to be OpenPGP. GnuPG is an OpenPGP implementation that is available on most Linux installations. You can find a quick introduction on how to encrypt and decrypt a file using GnuPG here.
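As a minimal example (the file names are placeholders), encrypting and decrypting a file with a passphrase looks like this:

# Encrypt backup.tar.gz with a passphrase (gpg will prompt for it)
gpg --symmetric --output backup.tar.gz.gpg backup.tar.gz

# Decrypt it again
gpg --decrypt --output backup.tar.gz backup.tar.gz.gpg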

Lots of Files

Gmvault stores the backup in a folder named "gmvault-db" in your home directory. At first I planned to simply sync the complete folder to S3, but when I saw that it contained more than 190,000 files after the first complete backup, I quickly gave up on that plan. It would have taken far too long and created unnecessary costs (S3 bills you per request). Since the files are clustered by month (one folder per month), I decided to compress each folder and upload the resulting archives instead. This has the additional benefit that after a quick sync I only need to upload the folders that changed, most likely only the current and maybe the previous month.
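Roughly, the relevant part of the layout looks like this (the month names are examples; the backup folder is created by the script below):

~/gmvault-db/
    db/
        2012-05/    # one folder per month with the individual email files
        2012-06/
    backup/         # compressed, encrypted archives end up here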

The Code

Without further ado, here is what I came up with. Let me know in the comments if you have any questions, suggestions or improvements!

require 'fileutils'
require 'aws-sdk' # aws-sdk v1, provides the AWS module used below

# Change these values according to your configuration
gmail_account = 'your.name@gmail.com'  # Gmail account that should be synced
s3_bucket     = 'gmvault-backup'       # S3 bucket that the backups should be stored in
password      = 'secret'               # The password used to encrypt the files

# Configure AWS
AWS.config \
  access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']

# Load bucket
s3_bucket = AWS::S3.new.buckets[s3_bucket]

# Helper method to execute commands
def execute(command)
  system(command).tap do |result|
    puts %(Error running command "#{command}") unless result
  end
end

# Save start time before running Gmvault so we know which files have changed
start = Time.now

# Do a quick sync (only last two months) and redirect STDOUT to /dev/null
command = "gmvault sync --type quick #{gmail_account} > /dev/null"
# Quit if sync fails
exit unless execute(command)

gmvault_folder = File.join(Dir.home, 'gmvault-db')
db_folders     = Dir[File.join(gmvault_folder, 'db', '*')]

# Create "backup" folder inside Gmvault folder which will store the backup files
FileUtils.mkdir_p File.join(gmvault_folder, 'backup')

# Pick only folders that have been changed during sync
db_folders_to_upload = db_folders.sort.select do |db_folder|
  File.mtime(db_folder) > start
end

# Compress, encrypt and upload each folder
db_folders_to_upload.each do |db_folder|
  backup_filename           = [db_folder.split('/').last, 'tar.gz'].join('.')
  backup_file               = File.join(gmvault_folder, 'backup', backup_filename)
  encrypted_backup_filename = [backup_filename, 'gpg'].join('.')
  encrypted_backup_file     = [backup_file, 'gpg'].join('.')

  commands = [
    # Compress db folder (e.g. ~/gmvault-db/db/2012-06) into single file (e.g. ~/gmvault-db/backup/2012-06.tar.gz)
    "tar cvzfP #{backup_file} #{db_folder} > /dev/null",

    # Encrypt the created file with the password
    "echo '#{password}' | gpg --symmetric --batch --yes --passphrase-fd 0 --output #{encrypted_backup_file} #{backup_file}"
  ]

  # Skip to next folder if any command fails
  next unless commands.all?(&method(:execute))

  # Upload to S3
  s3_bucket.objects[encrypted_backup_filename].write(file: encrypted_backup_file)
end
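For completeness, restoring a month from S3 is the reverse: download the .tar.gz.gpg file, decrypt it and unpack it. A rough sketch (file name and password are examples; the archive was created with absolute paths, hence tar's P flag):

# Decrypt the downloaded archive with the backup password
echo 'secret' | gpg --batch --yes --passphrase-fd 0 --output 2012-06.tar.gz --decrypt 2012-06.tar.gz.gpg

# Extract it back to its original location
tar xvzfP 2012-06.tar.gz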

Discuss this post on Hacker News

Ideas? Constructive criticism? Think I'm stupid? Let me know in the comments!