Tuesday, February 3, 2009

How to Prevent Data Loss

Introduction
The release of FireScope last week was a huge success, judging from the feedback on Twitter and the tens of thousands of downloads.
In last week's issue Kevin expressed our wish to contribute to the web industry. If you've checked out FireScope and the SitePoint Reference redesign, we'd love to hear your feedback; if you can suggest any improvements to our reference content to make your job easier, we'd like to hear about that too. Email support@sitepoint.com with your thoughts.
It's quiz time again and we have a huge double prize up for grabs! All you have to do is read our recent article, Ajaxify Your Flex Application, and then take the quiz. You'll go into the draw to win a copy of Adobe CS4 Web Premium and Flex Builder 3 Pro.
In this issue Kevin talks about how every web service's worst nightmare has come true for a few sites recently. If your apps live in The Cloud, remember your parachute!

Summary
Introduction
Every Web Service's Worst Nightmare
How To Prevent Data Loss
New Technical Articles
Techy Forum Threads
More Techy Blog Entries

Every Web Service's Worst Nightmare
by Kevin Yank
"Earlier today social bookmarking site Ma.gnolia suffered what is undoubtedly the worst nightmare for any web startup: a massive, and possibly irreversible loss of data. The site is currently offline, and a note from company founder Larry Halff says that the problem will take days to evaluate."
That was the story reported by Josh Catone on the SitePoint News & Trends blog this past weekend. The note on the site remains unchanged today, and is far from encouraging:
"Early on the West-coast morning of Friday, January 30th, Ma.gnolia experienced every web service's worst nightmare: data corruption and loss. For Ma.gnolia, this means that the service is offline and members' bookmarks are unavailable, both through the website itself and the API. As I evaluate recovery options, I can't provide a certain timeline or prognosis as to when or to what degree Ma.gnolia or your bookmarks will return; only that this process will take days, not hours."
In his report, Josh spells it out: "It should be pretty clear to anyone that this is a very, very bad thing. Even if Ma.gnolia is to somehow recover all the user data it lost, there has very likely been irreparable damage done to their reputation and to the confidence that users have in their service. We expect that the cloud will go down from time to time, but we also expect that it will be back up quickly and with all of our data intact."
The team behind Ma.gnolia does seem to be focused on doing what is right. Their latest update on Twitter confirms this:
"Simultaneously working on tools to help members recover bookmarks from other sources on the web, if necessary."
From this we can infer that they're looking into pulling cached versions of users' Ma.gnolia public bookmarks pages from sources like the Google cache and the Web Archive. Prospects for a complete recovery in such a scenario are grim.
Unfortunately, other services have suffered a similar fate to Ma.gnolia. In Episode 6 of the SitePoint Podcast, we discussed blogging community Journalspace. It went off the air after six years, losing the painstakingly written content of its users overnight.
Servers go down from time to time, but it's every web service's responsibility to have a solid backup strategy in place. What happened to Ma.gnolia represents a failure (or an absence) of such a strategy.
Read on below for a few tips on how you can ensure your site's data is properly backed up.


How To Prevent Data Loss
by Kevin Yank
Following Josh's story about Ma.gnolia, a meaty comment thread discussing approaches to web server backups has sprung up.
First, if your site is hosted on a managed server, all but the cheapest web hosts provide automated backup services as part of the package. DreamHost, for example, backs up both the files on your server and your databases at least daily.

Even DreamHost, however, is quick to recommend that you keep a backup of your own:
"We recommend you always keep your OWN copy of your entire web site at a remote location as well, but we'll do our best to make sure that's never needed."
Keeping your own copy of your site's code is usually quite easy, since most development is done on such a copy anyway. Version control systems like Subversion and Git even provide a complete history of that code, and pushing that history to a second machine doubles as a backup.
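As a rough illustration, here's a minimal Python sketch that snapshots a site directory with Git from a scheduled job. The directory path and remote name are placeholders, not a prescription, so adjust them for your own setup.

import subprocess
from datetime import date

SITE_DIR = "/var/www/mysite"  # placeholder; point this at your site's working copy

def commit_site_snapshot():
    """Stage and commit everything in the site directory, then push the
    history to an off-server remote so it survives a dead disk."""
    subprocess.run(["git", "add", "-A"], cwd=SITE_DIR, check=True)
    # git commit exits non-zero when there is nothing new to commit,
    # so we don't treat a failure here as fatal
    subprocess.run(
        ["git", "commit", "-m", f"Site snapshot {date.today().isoformat()}"],
        cwd=SITE_DIR,
    )
    subprocess.run(["git", "push", "origin", "master"], cwd=SITE_DIR, check=True)

if __name__ == "__main__":
    commit_site_snapshot()

The tricky part is keeping a copy of your site's dynamic data, usually in the form of database records.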
If your site runs on MySQL, chances are your host provides a copy of phpMyAdmin (or you can set up your own). phpMyAdmin includes an easy-to-use database export feature: on its front page, click Export.

Select which database(s) to export, and make sure the export includes the data in those databases as well as their structure:

Finally, make sure to check the Save as file checkbox and select a compression type, so that the backup is delivered as a file download.
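If you'd rather skip the GUI, the same export can be produced from the command line with mysqldump, which includes both structure and data by default. Here's a minimal Python sketch; the credentials and database name are placeholders.

import gzip
import subprocess

# Placeholder credentials -- substitute your own
DB_USER = "backup_user"
DB_PASSWORD = "secret"
DB_NAME = "mysite"

def dump_database(outfile="mysite-backup.sql.gz"):
    """Dump one database's structure and data, gzip-compressed --
    roughly what phpMyAdmin's Export produces with compression on."""
    result = subprocess.run(
        ["mysqldump", f"--user={DB_USER}", f"--password={DB_PASSWORD}", DB_NAME],
        check=True,
        capture_output=True,
    )
    with gzip.open(outfile, "wb") as backup:
        backup.write(result.stdout)

if __name__ == "__main__":
    dump_database()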

For a small site, that will do the trick. Doing it every single day may be too tedious to be practical, however. That's when it's time to look into some of the automated solutions discussed in the comments thread.
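As a taste of what such automation can look like, here's a sketch extending the dump above with a dated filename and automatic pruning, suitable for running nightly. Everything here (paths, credentials, retention period) is an assumption to adapt, not the one true way.

import gzip
import subprocess
import time
from datetime import date
from pathlib import Path

# Placeholder settings -- adjust for your own server
DB_USER = "backup_user"
DB_PASSWORD = "secret"
DB_NAME = "mysite"
BACKUP_DIR = Path("/home/me/backups")
KEEP_DAYS = 30  # days of history to keep

def daily_backup():
    """Write a dated, compressed dump, then prune dumps older than
    KEEP_DAYS so you keep a rolling window of restore points."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    outfile = BACKUP_DIR / f"{DB_NAME}-{date.today().isoformat()}.sql.gz"
    result = subprocess.run(
        ["mysqldump", f"--user={DB_USER}", f"--password={DB_PASSWORD}",
         "--single-transaction", DB_NAME],
        check=True,
        capture_output=True,
    )
    with gzip.open(outfile, "wb") as backup:
        backup.write(result.stdout)
    cutoff = time.time() - KEEP_DAYS * 86400
    for old in BACKUP_DIR.glob(f"{DB_NAME}-*.sql.gz"):
        if old.stat().st_mtime < cutoff:
            old.unlink()

if __name__ == "__main__":
    daily_backup()

Scheduled from cron (for example, 0 3 * * * python /home/me/backup.py, with the path adjusted to suit), that gives you a month of daily restore points without any manual clicking.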
Whichever solution you use, make sure to give some thought to where and how you choose to store your backups. Commenter Tom Rutter has some wise words which I've reproduced below:
"A lot of people's backup strategies take care of some problems, but not all. Ideally, a backup strategy for any data not worth losing needs to be able to cope with:
1. Your building burns to the ground
2. You find out your data became corrupted or lost some time ago, and every backup you've made since then contains the damage
"For the first, you need off-site backup. This ensures that if an entire building is burgled, burned down, flooded, etc then the data is recoverable.
"For the second, you need some sort of history of backups. Incremental backups are good because they allow history but save space, though of course you'll need to think about how easy it is to restore from backup.
"Common backup myth: "RAID is backup"RAID is not backup. It provides the ability to replace a failed drive without taking down the system, but that is "availability", not backup. For example, it is not intended to protect against either of the two above scenarios. If the building burns down, it's all lost. If data is corrupted, it's all lost (instantly). A faulty disk controller or power supply, or a power surge which your power supply can't cope with, can ruin the entire RAID set. RAID is good if you need high availability, but even if you have RAID you still need backup."
Do you have a slick backup strategy to share? Want to see what the experts are doing? Stop by the blog post and check out the comments:

News & Trends Blog: Industry news for web professionals
by Josh Catone
Open Thread: How to Prevent Data Loss (13 comments)


See you next week for another issue of the Tech Times.
Andrew Tetlaw
techtimes@sitepoint.com
SitePoint Technical Editor
