Lessons in Backups

I recently had some filesystem issues on my “dev” server at home. I decided it would be best to wipe the system, reinstall, and restore user data from backups. It was a good excuse to upgrade the OS anyway. I find that every time I do a restore I learn a couple of things.

Lessons from this time around:

  1. Make a backup of your backup scripts. I forgot to do that, so I’ll have to rewrite them. I guess I can add some improvements while I’m at it (like creating checksums and double-checking files). Really, I should create an SVN repository for my operations-related scripts.
  2. Back up /etc as well. It wasn’t a big deal this time, because the configuration on this machine fluctuates heavily and isn’t critical. But last time, I was lucky to grab a copy of /etc off a dying live server. I should add that to my backup scripts.
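
The checksum idea in the first lesson could look something like the sketch below. It assumes Python and an uncompressed backup that mirrors the source directory; the function names (`sha256_of`, `build_manifest`, `verify_backup`) are made up for illustration and are not the rewritten scripts themselves.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(source: Path, backup: Path) -> list[str]:
    """List relative paths that are missing from the backup or differ from the source."""
    expected = build_manifest(source)
    actual = build_manifest(backup)
    return sorted(
        name for name, digest in expected.items() if actual.get(name) != digest
    )
```

An empty list from `verify_backup` means every source file has an identical copy in the backup. Comparing against the live source only makes sense right after a backup run; for older snapshots, the manifest would have to be saved alongside the snapshot and checked against that instead.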

Amusingly, I stumbled across this post about Leafyhost’s recent issues, which led me to a post on Ars Technica’s forums chronicling much of Leafyhost’s last year. Ultimately, they failed to make proper backups, and many users did not have backups of their own. Reading through it reminded me of issues that others have had.

Let me share the other lessons from the past.

  1. Hard drives will fail. Plan on it. I’ve lost about a hard drive a year for the last couple of years. Because of automated backups, each has been less painful than the previous. You must have the attitude that your drives are strictly temporary storage. The only safe way to store your data for the long term is through redundancy.
  2. Reliability should trump performance. RAID 0 should not be used except for temp disk space, swap, or a replicated copy of data for performance reasons. The small gain you get from your app starting up in two fewer bounces on the dock will not make up for the time and data lost recovering from a failed drive.
  3. Backups should be automated. Because we are all lazy. It is easy for us to put it off, day after day. Or we forget. Or we’re busy. Whatever. It should be scripted and thrown into cron/task scheduler/launchd. A remote fileserver works best for automated backups (unless you have a tape changer) because it doesn’t require user interaction (“Insert the next disc.”).
  4. No one else cares about your data/site/business as much as you do. Each degree of separation introduces more apathy. Your webhost may be happy satisfying 99% of the users after a catastrophic failure. It may not make financial sense to appease the last 1%. Sucks to be you if you’re in the 1%. And their employees will care even less. If you don’t care enough to back up, how can you rely on others to do a good job?
  5. Double check your backups occasionally. Usually this involves a test restore. I’ve cheated and sacrificed disk space for transparency. My backups are not compressed, and are point-in-time copies of various directories. I can browse through them easily to see that the data is there, and can easily do partial restores. If you cannot easily check your backups, then you should perform test restores occasionally.
  6. Back up offsite. Currently my websites and the sites I host are backed up to my NAS at home. My home dev server backs up to the NAS at home also. I’m looking to set up an offsite NAS at mom’s and periodically sync them.
  7. Rotate your backups. Keep a couple of copies around just in case. Just in case your source files have been corrupted a little longer than you had realized. Or just in case your last backup was corrupted for whatever reason.
  8. Don’t store your backups in a proprietary format. You may not be able to recreate the same exact software/hardware environment when it comes time to use that backup. Maybe the software doesn’t run on the latest version of the OS. Maybe the latest drivers are incompatible. Whatever it is, Murphy’s Law dictates that it will come up. Save yourself the headache and reduce the complexity of the system.
  9. A note on the NAS: The NAS obviously uses hard drives. Which will fail. The NAS runs them in RAID 5. When one drive fails, it will need to be replaced and the array rebuilt. The rebuild will be disk intensive. The other drives will be of similar age (if not from the same batch) as the drive that failed. This means that there is a higher risk of a second failure during the rebuild, at a time when the array has no redundancy. I hope to have my offsite mirror up before I suffer a hard drive failure in my NAS.
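
Rotation (lesson 7) is easy to script once the point-in-time copies have predictable names. Here is a minimal sketch in Python, assuming each snapshot is a directory named by ISO date (for example `2024-01-31`) under a single backup root. That naming scheme and the function names are assumptions for illustration, not a description of my actual layout.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def parse_snapshot_date(name: str) -> Optional[date]:
    """Return the date encoded in a snapshot name, or None if it is not a date."""
    try:
        return date.fromisoformat(name)
    except ValueError:
        return None


def snapshots_to_prune(names: list[str], keep: int) -> list[str]:
    """Return the dated snapshot names that fall outside the newest `keep`."""
    if keep < 1:
        raise ValueError("keep at least one snapshot")
    dated = [(parse_snapshot_date(n), n) for n in names]
    valid = sorted((d, n) for d, n in dated if d is not None)
    # Everything except the last `keep` entries is old enough to remove.
    return [n for _, n in valid[:-keep]]


def prune(root: Path, keep: int) -> list[str]:
    """Delete old snapshot directories under root and return their names."""
    names = [p.name for p in root.iterdir() if p.is_dir()]
    doomed = snapshots_to_prune(names, keep)
    for name in doomed:
        shutil.rmtree(root / name)
    return doomed
```

Directories whose names do not parse as dates are left alone, so a stray folder in the backup root is never deleted by accident. Since `prune` deletes data, it is worth printing the result of `snapshots_to_prune` first and checking it by eye before letting cron run the real thing.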

Ultimately, my data is irreplaceable. My code, my digital photographs — these really are irreplaceable. They are my most valuable possessions, and no insurance/money will bring them back. When thought of in this light, you really shouldn’t be cheap or lazy with regard to your backups.
