meso·pixel    

Mid May

Slow down and back that thing up

Losing data sucks Big Time. I’ve been bitten by data loss numerous times, both on my website (due to goofs/bad configuration) and on my personal computers (due to hardware failures). And now that I’m working on EC2, it’s becoming more and more apparent that having a good backup strategy is really important.

For example, Amazon’s S3 service has a reported durability of 99.999999999%[1], which means your chances of losing data are lower than your chances of winning the lottery. Awesome. However, EBS-backed EC2 instances like the one now running this site aren’t so lucky, with what Amazon describes as an annual EBS volume failure rate of up to 0.5%[2], and an EC2 uptime commitment of at least 99.95%[3]. Not so great. (But still much better than the failure rates of consumer hardware)
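To put those percentages into something more tangible, here’s the quick arithmetic, using nothing more than the figures quoted above:

    # Back-of-the-envelope numbers from Amazon's published figures ([2] and [3])
    ebs_annual_failure_rate = 0.005   # up to 0.5% of EBS volumes fail per year
    ec2_uptime = 0.9995               # the SLA commits to at least 99.95% uptime

    print(f"roughly 1 in {1 / ebs_annual_failure_rate:.0f} EBS volumes fails in a given year")
    print(f"up to {(1 - ec2_uptime) * 365 * 24:.1f} hours of EC2 downtime still within the SLA per year")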

Since I never gave it much thought when building the original site, my previous strategy, or lack thereof, was embarrassingly poor – mainly consisting of me manually downloading the site and its database once a year. The worst part was that all changes were made directly to the live site, which meant that anyone browsing it was affected every time I made a mistake or typo in the PHP. All this despite me using best practices in my day-to-day work... for shame!

Right now, my strategy for backing up this new site is as follows, starting at the AWS layer:

  • EBS-backed AMIs (Amazon Machine Images) created on major server configuration changes – this ensures that we have a properly configured server instance that can be spooled up (with slightly stale data and MySQL database) at any time (the first sketch after this list shows roughly what this step looks like)
  • S3/local backup of the MySQL database, website, and server config files, created on every change – this ensures that we have the most recent data, which can be restored onto an older AMI with a little manual work (see the second sketch after this list)
  • Git repository of website and server config files – this allows me to pull down a copy of the website for local editing when I don’t have internet access, along with all the other benefits of having a version control system
  • Development/Production environments in git, and on the server – this ensures that I can properly test new features before pushing them out for people to see (which is done through another script). The production environment is a separate branch from the dev environment, which means that we can revert any faulty pushes if necessary
  • Occasionally running a full restore process by taking the last AMI, and pulling the latest website/server config to ensure that the backups are actually valid – your backups are only good if they can actually be restored!
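To give a feel for the first bullet: the AMI step can be done from the AWS console, but it’s also scriptable. A minimal sketch of the idea in Python with boto3 – the instance ID, region, and AMI name below are placeholders, not what this site actually uses:

    import boto3
    from datetime import date

    # Create an EBS-backed AMI of the running instance after a major config change.
    ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region
    response = ec2.create_image(
        InstanceId="i-0123456789abcdef0",                # placeholder instance ID
        Name=f"site-backup-{date.today().isoformat()}",
        Description="AMI taken after a major server configuration change",
        NoReboot=True,                                   # skip rebooting the live server
    )
    print("Created AMI:", response["ImageId"])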
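Likewise, the per-change backup boils down to a mysqldump plus an archive of the website and config files pushed to S3. A rough sketch, with the bucket name, paths, and database name all made up for illustration:

    import datetime
    import subprocess
    import tarfile

    import boto3

    stamp = datetime.date.today().isoformat()
    dump_path = f"/tmp/site-{stamp}.sql"
    archive_path = f"/tmp/site-{stamp}.tar.gz"

    # 1. Dump the MySQL database (placeholder database name; credentials via ~/.my.cnf)
    with open(dump_path, "wb") as f:
        subprocess.run(["mysqldump", "sitedb"], stdout=f, check=True)

    # 2. Bundle the dump, website files, and server config into one archive
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(dump_path, arcname=f"site-{stamp}.sql")
        tar.add("/var/www/site", arcname="site")     # website files (placeholder path)
        tar.add("/etc/httpd", arcname="config")      # server config (placeholder path)

    # 3. Upload the archive to S3 (placeholder bucket name)
    boto3.client("s3").upload_file(archive_path, "my-site-backups", f"backups/site-{stamp}.tar.gz")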

The biggest gap in this strategy is that S3 itself is not accounted for at all, so the question becomes how comfortable I am with S3’s 99.999999999% durability, or whether it is worth duplicating that data in another S3 region (the probability of loss then becomes (1 − p)², where p is that durability figure) or saving it locally. And to be honest, those are pretty good numbers to begin with, so I’m actually OK with leaving the data on S3 as-is and performing semi-annual or annual backups. Otherwise, for the more frequent (and expected) failures of EC2 and EBS, I am fairly confident that I could restore the server in the event of a catastrophe. Now as for the personal files on my computers at home, well, that’s another strategy that I’m going to have to come up with soon! :)
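For concreteness, the back-of-the-envelope arithmetic behind that (1 − p)² figure, assuming the duplicate copy fails independently of the original:

    p = 0.99999999999                 # S3's advertised durability
    loss_single = 1 - p               # ~1e-11: chance of losing the only copy
    loss_duplicated = (1 - p) ** 2    # ~1e-22: chance of losing both independent copies

    print(loss_single, loss_duplicated)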

[1] Amazon S3 RRS
[2] Amazon EBS
[3] Amazon EC2 Service Level Agreement

Waterfall

Ah, the memories of MIDI

This is almost artistic, cruise ships from an aerial view

Auto-complete Bash history using arrow keys (probably the best Bash tip I know)

Pong

Remember Big Shiny Tunes and Much Dance? Good times.

Worst office fear: Rolling over your own toes with your computer chair.

Don't say Disney won't go to great lengths to optimize their animatronics...

Like horse racing but for nerds and biologists, Genetic Cars.

Monterey 2013 (4)