Enterprise Storage Strategies

Deploying enterprise storage has never been more confusing, with a wide variety of technology choices available. On this blog, Nirvanix Director of Consulting, Stephen Foskett, presents proven strategies for building an internal storage service in the enterprise.

When Is A Copy A Backup?

Ocarina's Carter George continued the conversation on backups, asking whether the conventional backup paradigm is obsolete and whether file copies could serve the same purpose. As mentioned in our "What Is a Backup?" post, EMC's Scott Waterhouse recently posed the same question.

Putting Copies To The Test

George suggests a copy-based scenario: "Why not just move files that are candidates for being backed up to a separate tier of storage, keeping them as files in their native format, and organizing them in time coherent views?"
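George's scenario (native-format files on a separate tier, arranged into time-coherent views) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ocarina's design: a local `backup_root` directory stands in for the separate tier, `take_snapshot` is a hypothetical helper, and the technique is the hard-link snapshot approach popularized by rsync's `--link-dest` option. Unchanged files are hard-linked to the previous snapshot, so each timestamped directory is a complete, browsable tree while only changed files use new space.

```python
import filecmp
import os
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, backup_root: Path, stamp: Optional[str] = None) -> Path:
    """Copy `source` into backup_root/<stamp>, keeping every file in its native format.

    Hypothetical helper. Files identical to the newest existing snapshot are
    hard-linked to it instead of being copied again, so each snapshot is a
    complete tree but only new or changed files consume new space.
    """
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    backup_root.mkdir(parents=True, exist_ok=True)
    # Timestamp names sort chronologically, so the last one is the newest snapshot.
    existing = sorted(p for p in backup_root.iterdir() if p.is_dir())
    previous = existing[-1] if existing else None
    dest = backup_root / stamp
    dest.mkdir()  # raises FileExistsError rather than overwriting a snapshot

    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, target)  # unchanged: share storage with the previous snapshot
        else:
            shutil.copy2(src_file, target)  # new or changed: store a full copy
    return dest
```

Because every snapshot is an ordinary directory, restoring a file is just a copy, with no backup software involved. Note the trade-off: hard-linked snapshots share a single copy of each unchanged file on the same disk, so this guards against deletions and bad edits, not against losing or corrupting that disk.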

To decide whether this is truly a backup, let's apply our new rules for when a copy becomes a backup:

  1. A copy is, by definition, a copy of a set of data.
  2. George doesn't say whether these copies would be protected or offline, which worries the IT admin in me. Could they be overwritten or corrupted? Would they disappear along with the primary data set? They would sit on a separate tier, but I would definitely want them to be geographically distant and logically protected.
  3. The copy sounds like it would be suitable for restore or recovery of data. In fact, restore might be simplified by not encapsulating data in an alternate format.
  4. No management process with metrics, logging, or indexes is mentioned, although George does mention "a little search engine capability" to facilitate restores.
  5. George specifies that files would be organized into coherent point-in-time views. Presumably the unmentioned management system would provide these.
  6. Distinct copies would also not affect the performance or usability of the primary data set.
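Rules 4 and 5 call for search and for coherent point-in-time views. The sketch below assumes each snapshot is a complete directory tree under a backup root, named with a sortable UTC timestamp such as `2009-06-03T193600Z`; the helpers `snapshots`, `version_as_of`, and `search` are hypothetical. Because each snapshot is complete, the state as of a given time is simply the newest snapshot taken at or before that time, and a file missing from that snapshot had already been deleted.

```python
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def snapshots(backup_root: Path) -> List[Path]:
    """All snapshot directories, oldest first (names are assumed to be sortable timestamps)."""
    return sorted(p for p in backup_root.iterdir() if p.is_dir())


def version_as_of(backup_root: Path, rel_path: str, as_of: str) -> Optional[Path]:
    """Return rel_path as it existed at time `as_of`, or None if it did not exist then."""
    eligible = [s for s in snapshots(backup_root) if s.name <= as_of]
    if not eligible:
        return None  # no snapshot had been taken yet
    candidate = eligible[-1] / rel_path  # newest snapshot at or before as_of
    return candidate if candidate.is_file() else None


def search(backup_root: Path, pattern: str) -> Iterator[Tuple[str, str]]:
    """Yield (snapshot name, relative path) for every file whose name matches a glob."""
    for snap in snapshots(backup_root):
        for f in sorted(snap.rglob(pattern)):
            if f.is_file():
                yield snap.name, f.relative_to(snap).as_posix()
```

This "search engine" is deliberately crude (a filename glob over every snapshot); a real product would keep a persistent index rather than walking the tree on each query.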

So it is entirely possible to create a backup system, as suggested, that copies data in its native format to an alternate tier or location of storage. But simply copying data isn't enough on its own, as Waterhouse pointed out. A backup system must also include scheduling, reporting, indexing, and other management features to be useful.
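As a small illustration of the indexing and reporting piece, the following sketch writes a manifest for one snapshot directory (relative path, size, and SHA-256 digest of each file) and later checks the snapshot against it. The manifest name `MANIFEST.json` and the helpers `write_manifest` and `verify_manifest` are hypothetical; a real product would also need scheduling, alerting, and retention policies.

```python
import hashlib
import json
from pathlib import Path
from typing import List

MANIFEST = "MANIFEST.json"  # hypothetical index file stored inside each snapshot


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(snapshot: Path) -> Path:
    """Index every file in the snapshot: relative path -> size and SHA-256."""
    index_path = snapshot / MANIFEST
    entries = {}
    for f in sorted(snapshot.rglob("*")):
        if f.is_file() and f != index_path:
            entries[f.relative_to(snapshot).as_posix()] = {
                "size": f.stat().st_size,
                "sha256": sha256_of(f),
            }
    index_path.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return index_path


def verify_manifest(snapshot: Path) -> List[str]:
    """Report indexed files that are now missing or whose contents have changed."""
    entries = json.loads((snapshot / MANIFEST).read_text())
    problems = []
    for rel, meta in entries.items():
        f = snapshot / rel
        if not f.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(f) != meta["sha256"]:
            problems.append(f"corrupt: {rel}")
    return problems
```

Running the verification on a schedule and reporting any non-empty result is one simple way to catch silent corruption of the copies before a restore depends on them.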

The Benefits Of Copy-Based Backup

Especially intriguing is George's assertion that existing backup systems are antiquated and out of touch with the times. Continuous data protection (CDP) pioneers made similar statements half a decade ago. And the latest disk-to-disk backup trend certainly demonstrates that the old paradigm has not kept up with modern data volumes, backup windows, and performance demands. Consider as well the data protection APIs included in vSphere and the proliferation of VSS-aware applications on Windows. Backup is evolving away from the old "dump it to tape" concept.

Both George and Waterhouse also note the unique benefits of copy-based backup systems, and there are indeed many of them!

  1. Keeping data in its original format makes it much more useful. Restores can be quicker, data can be re-indexed, and there is less worry about future-proofing the solution since no proprietary encoding is involved.
  2. Disk-based backup targets keep data accessible, so there is less worry about unknown media failures.
  3. New applications can leverage online backups for other purposes, including archiving, compliance, testing, data mining, and such.
  4. Replication or cloud storage technology can move data off-site for better DR readiness. This is one area where disk-based backup seriously trounces boxes of tapes! There are fewer manual procedures, no tapes falling off the loading dock or truck, and no worries about misplacing them at the warehouse.
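The replication point is easy to sketch as well. This hypothetical `replicate` helper copies any snapshot directories that a second location does not yet have; here that location is a local path standing in for an offsite disk or a mounted cloud share. Each snapshot is copied under a temporary `.partial` name and renamed only once complete, so an interrupted run never leaves a half-copied snapshot that looks finished.

```python
import shutil
from pathlib import Path
from typing import List


def replicate(backup_root: Path, offsite_root: Path) -> List[str]:
    """Copy snapshot directories missing from offsite_root; return the names copied."""
    offsite_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for snap in sorted(p for p in backup_root.iterdir() if p.is_dir()):
        final = offsite_root / snap.name
        if final.exists():
            continue  # already replicated on an earlier run
        partial = offsite_root / (snap.name + ".partial")
        if partial.exists():
            shutil.rmtree(partial)  # leftover from an interrupted run
        shutil.copytree(snap, partial)
        partial.rename(final)  # publish only after the copy has finished
        copied.append(snap.name)
    return copied
```

One design caveat: `shutil.copytree` stores hard-linked files as independent copies, so if snapshots share unchanged files via hard links, a tool such as rsync with its `--hard-links` option would use far less space at the destination.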

Progress

So why isn't everyone abandoning tape? One reason is that folks are very conservative when it comes to peace-of-mind issues like data protection. They are waiting until this new paradigm is proven before they jump. But the success of disk-based backup, VTL, offsite storage, and online backup is definitely trending towards making copy-based backup a Best Practice!

How will we get there? Current backup solutions are evolving. Already, most include some form of disk-based backup, and VTL technology brings this to those that don't. Offsite data protection is the next big leap, and the killer app that will drive the final nail into the coffin of traditional backup. Once we're using disk more than tape, the backup products themselves will begin to change to better leverage the random-access online storage they now see.

Note that many systems have already discarded the yoke of old-school backup solutions. More and more vertical applications now rely on replicated copies rather than traditional backup for data protection. They perform the management themselves and take advantage of these benefits right off the shelf. Some even come bundled with offsite data protection capability right out of the box. Now that's progress!

Finally, note that copy-based backup has already become the dominant paradigm for small-office and home users. Although backup remains all too rare outside the data center, those who do choose to protect their data are likely using products like Apple's Time Machine, Maxtor's One-Touch, or EMC's Mozy. I personally rely on Time Machine for primary protection of my Macs, and it works exactly like Carter George's imaginary backup product! The only thing that would make it better is the ability to send occasional copies offsite. Although I have managed to add this capability myself by using the excellent rsync software with Nirvanix CloudNAS, that is a discussion for another day!

Comments

 

Sam Srinivas said:

To fully realize the benefits of a copy-based backup, you should be able to fully decouple yourself from the backup software you used to make that backup.

In other words, you should be able to say "Today I ditch vendor X and switch to Y," and the copy-based backup made by vendor X should still be useful.

This is certainly true of Time Machine and Maxtor's One-Touch, because their only dependency is on long-lived things such as Apple's file system or NTFS.

For example, with the One-Touch you have the current state of the file system available as a copy, with reverse increments for previous states **all stored as full files**. With little effort, you can recover the full state of a file at an earlier time without needing Maxtor's software. Many of the One-Touch competitors seem to do similar things, which is very good.

This vendor independence simply brings the basic advantage of open standards (which tend to be long-lived) to backup.

By the way, Mozy certainly does *not* fit this bill. The data might appear to you as just a file system when you explore it locally, but you don't have physical access to it as a regular file system: not only is the data physically owned by them, it's in some custom format that you have no visibility into.

Moving to another issue: as you say, a critical requirement of backup is offsite backup. With copy-based backup, offsite backup is reduced to just remote replication.

Ideally, you should own the medium that contains the offsite backup: either a standard file system you own (a disk you actually own someplace) or the new cloud equivalent of that, a standard file system overlaid on something like Amazon S3 that you can read with standard software, not just the backup vendor's.

In other words, S3 or its equivalent is just a low-level cloud storage medium that you interact with through a well-known, open file system overlay, not one provided only by the backup vendor.

One thing missing for remote replication is a standard way to encrypt the backup while preserving the advantages of an open, file-based approach. A hybrid way to do this might be file-by-file encryption using a standard such as 'crypt' before shipping data to the remote site. In other words, the remote data could be decrypted into a plain file system replica using standard open source decryption tools, with no lock-in to the backup vendor.

June 3, 2009 7:36 PM
 

sfoskett said:

Great comments, Sam! Indeed, the fact that many copy-based backups leave data in its native format is HUGE. This is true for everyone, from the home user to the enterprise with e-discovery.

June 3, 2009 8:19 PM
 

Back From The Pile: May 30, 2009 – Stephen Foskett, Pack Rat said:

Pingback from  Back From The Pile: May 30, 2009 – Stephen Foskett, Pack Rat

June 13, 2009 1:20 PM
