meso·pixel    

Mid May

Slow down and back that thing up

Losing data sucks Big Time. I’ve been bitten by data loss numerous times, both on my website (due to goofs/bad configuration) and on my personal computers (due to hardware failures). And now that I’m working on EC2, it’s becoming more and more apparent that having a good backup strategy is really important.

For example, Amazon’s S3 service has a reported durability of 99.999999999%[1], which means your chances of losing data are lower than your chances of winning the lottery. Awesome. However, EBS-backed EC2 instances like the one now running this site aren’t so lucky, with what Amazon describes as an annual EBS volume failure rate of up to 0.5%[2], and an EC2 uptime commitment of at least 99.95%[3]. Not so great. (But still much better than the failure rates of consumer hardware)
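To put those percentages into something more tangible, here’s the quick arithmetic, using nothing more than the figures quoted above:

    # Back-of-the-envelope numbers from Amazon's published figures ([2] and [3])
    ebs_annual_failure_rate = 0.005   # up to 0.5% of EBS volumes fail per year
    ec2_uptime = 0.9995               # the SLA commits to at least 99.95% uptime

    print(f"roughly 1 in {1 / ebs_annual_failure_rate:.0f} EBS volumes fails in a given year")
    print(f"up to {(1 - ec2_uptime) * 365 * 24:.1f} hours of EC2 downtime still within the SLA per year")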

Since I never gave it much thought when building the original site, my previous strategy, or lack thereof, was embarrassingly poor – mainly consisting of me manually downloading the site and its database once a year. The worst part was that all changes were made directly to the live site, which meant that anyone browsing it was affected every time I made a mistake or typo in the PHP. All this despite me using best practices in my day-to-day work... for shame!

Right now, my strategy for backing up this new site is as follows, starting at the AWS layer:

  • EBS-backed AMIs (Amazon Machine Images) created on major server configuration changes – this ensures that we have a properly configured server instance that can be spooled up (with slightly stale data and MySQL database) at any time (the first sketch after this list shows roughly what this step looks like)
  • S3/local backup of the MySQL database, website, and server config files, created on every change – this ensures that we have the most recent data, which can be restored onto an older AMI with a little manual work (see the second sketch after this list)
  • Git repository of website and server config files – this allows me to pull down a copy of the website for local editing when I don’t have internet access, along with all the other benefits of having a version control system
  • Development/Production environments in git, and on the server – this ensures that I can properly test new features before pushing them out for people to see (which is done through another script). The production environment is a separate branch from the dev environment, which means that we can revert any faulty pushes if necessary
  • Occasionally running a full restore process by taking the last AMI, and pulling the latest website/server config to ensure that the backups are actually valid – your backups are only good if they can actually be restored!
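To give a feel for the first bullet: the AMI step can be done from the AWS console, but it’s also scriptable. A minimal sketch of the idea in Python with boto3 – the instance ID, region, and AMI name below are placeholders, not what this site actually uses:

    import boto3
    from datetime import date

    # Create an EBS-backed AMI of the running instance after a major config change.
    ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region
    response = ec2.create_image(
        InstanceId="i-0123456789abcdef0",                # placeholder instance ID
        Name=f"site-backup-{date.today().isoformat()}",
        Description="AMI taken after a major server configuration change",
        NoReboot=True,                                   # skip rebooting the live server
    )
    print("Created AMI:", response["ImageId"])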
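Likewise, the per-change backup boils down to a mysqldump plus an archive of the website and config files pushed to S3. A rough sketch, with the bucket name, paths, and database name all made up for illustration:

    import datetime
    import subprocess
    import tarfile

    import boto3

    stamp = datetime.date.today().isoformat()
    dump_path = f"/tmp/site-{stamp}.sql"
    archive_path = f"/tmp/site-{stamp}.tar.gz"

    # 1. Dump the MySQL database (placeholder database name; credentials via ~/.my.cnf)
    with open(dump_path, "wb") as f:
        subprocess.run(["mysqldump", "sitedb"], stdout=f, check=True)

    # 2. Bundle the dump, website files, and server config into one archive
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(dump_path, arcname=f"site-{stamp}.sql")
        tar.add("/var/www/site", arcname="site")     # website files (placeholder path)
        tar.add("/etc/httpd", arcname="config")      # server config (placeholder path)

    # 3. Upload the archive to S3 (placeholder bucket name)
    boto3.client("s3").upload_file(archive_path, "my-site-backups", f"backups/site-{stamp}.tar.gz")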

The biggest gap in this strategy is that S3 itself is not accounted for at all, so the question becomes how comfortable I am with S3’s 99.999999999% durability, or whether it is worth duplicating that data in another S3 region (the probability of loss then becomes (1 − p)², where p is that durability figure) or saving it locally. And to be honest, those are pretty good numbers to begin with, so I’m actually OK with leaving the data on S3 as-is and performing semi-annual or annual backups. Otherwise, for the more frequent (and expected) failures of EC2 and EBS, I am fairly confident that I could restore the server in the event of a catastrophe. Now as for the personal files on my computers at home, well, that’s another strategy that I’m going to have to come up with soon! :)
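For concreteness, the back-of-the-envelope arithmetic behind that (1 − p)² figure, assuming the duplicate copy fails independently of the original:

    p = 0.99999999999                 # S3's advertised durability
    loss_single = 1 - p               # ~1e-11: chance of losing the only copy
    loss_duplicated = (1 - p) ** 2    # ~1e-22: chance of losing both independent copies

    print(loss_single, loss_duplicated)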

[1] Amazon S3 RRS
[2] Amazon EBS
[3] Amazon EC2 Service Level Agreement

Waterfall

Ah, the memories of MIDI

This is almost artistic, cruise ships from an aerial view

Auto-complete Bash history using arrow keys (probably the best Bash tip I know)

Pong

Remember Big Shiny Tunes and Much Dance? Good times.

Worst office fear: Rolling over your own toes with your computer chair.

Don't say Disney won't go to great lengths to optimize their animatronics...

Like horse racing but for nerds and biologists, Genetic Cars.

Monterey 2013 (4)