Ocarina's Carter George continued the conversation on backups, asking whether the conventional backup paradigm is obsolete and whether file copies could serve the same purpose. As mentioned in our "What Is a Backup?" post, EMC's Scott Waterhouse recently posed the same question.
Putting Copies To The Test
George suggests a copy-based scenario: "Why not just move files that are candidates for being backed up to a separate tier of storage, keeping them as files in their native format, and organizing them in time coherent views?"
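George's scenario is simple enough to sketch in a few dozen lines. The Python below is a minimal illustration under stated assumptions, not Ocarina's design or any shipping product. Each run copies files in their native format into a new timestamped directory. Files unchanged since the previous run are hard-linked rather than copied, the same trick rsync's --link-dest option uses, so every directory reads as a complete point-in-time view while only changed files consume new space. The function name take_snapshot and the directory layout are hypothetical. The sketch also assumes three things: the backup root sits outside the source tree, it lives on a filesystem that supports hard links, and snapshot labels sort chronologically by name.

```python
import filecmp
import os
import shutil
import time
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, backup_root: Path, label: Optional[str] = None) -> Path:
    """Hypothetical helper: copy every file under `source` into a new
    directory under `backup_root`, keeping files in their native format.

    Files identical to the previous snapshot are hard-linked instead of
    copied, so each snapshot is a full point-in-time view of the tree.
    Assumes backup_root is outside source and on one hard-link-capable
    filesystem, and that labels sort chronologically by name.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    # Most recent earlier snapshot (by name), if any.
    previous = sorted(p for p in backup_root.iterdir() if p.is_dir())
    last = previous[-1] if previous else None

    label = label or time.strftime("%Y%m%d-%H%M%S")
    dest_root = backup_root / label
    dest_root.mkdir()  # raises FileExistsError rather than overwrite a snapshot

    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dest = dest_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        prior = last / rel if last is not None else None
        if prior is not None and prior.is_file() and filecmp.cmp(src, prior, shallow=False):
            # Unchanged since the last snapshot: share its storage.
            os.link(prior, dest)
        else:
            # New or changed: store a fresh copy, preserving timestamps.
            shutil.copy2(src, dest)
    return dest_root
```

Restoring is just copying a file back out of whichever snapshot directory holds the wanted version. The hard links are a trade-off worth noting: every snapshot that shares an unchanged file shares the same bytes. A single corrupted block would therefore damage all of those versions at once, which is one more reason to want the copy logically protected and replicated elsewhere.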
To decide whether this is truly a backup, let's apply our new rules for when a copy becomes a backup:
- A copy is, by definition, a copy of a set of data.
- The copy is not described as protected or offline, which worries the IT admin in me. Could it be overwritten or corrupted? Would it disappear along with the primary data set? It is said to live on a separate tier, but I would definitely want it to be geographically distant and logically protected.
- The copy sounds like it would be suitable for restore or recovery of data. In fact, restore might be simplified by not encapsulating data in an alternate format.
- No mention is made of a management process, with metrics, logging, or indexes. But George does mention "a little search engine capability" to facilitate restores.
- George specifies that files would be organized in a coherent point-in-time view. This would presumably be handled by the unmentioned management system.
- Distinct copies would also not affect the performance or usability of the primary data set.
So it is entirely possible to build a backup system, as suggested, that copies data in its native format to an alternate tier or location of storage. But, as Waterhouse pointed out, simply copying data isn't enough on its own: a useful backup system must also include scheduling, reporting, indexing, and other management features.
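The management layer is where most of the real work lies. The Python below is a minimal sketch of just the indexing and reporting piece, under stated assumptions. It does three things. It writes a per-snapshot manifest of file sizes and SHA-256 checksums. It searches those manifests across snapshots by glob pattern, a rough stand-in for George's "little search engine capability". It verifies a snapshot against its manifest and reports missing or corrupted files. The sketch assumes each snapshot is a directory of plain files under a common backup root. The manifest.json name, its JSON layout, and the function names are all hypothetical rather than any product's format, and scheduling is left to an external tool such as cron.

```python
import fnmatch
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Tuple

MANIFEST = "manifest.json"  # hypothetical name for the per-snapshot index


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_snapshot(snapshot: Path) -> Dict[str, dict]:
    """Record size and checksum for every file in one snapshot directory."""
    manifest_path = snapshot / MANIFEST
    entries = {}
    for f in sorted(snapshot.rglob("*")):
        if f.is_file() and f != manifest_path:
            rel = f.relative_to(snapshot).as_posix()
            entries[rel] = {"size": f.stat().st_size, "sha256": sha256_of(f)}
    manifest_path.write_text(json.dumps(entries, indent=2))
    return entries


def search(backup_root: Path, pattern: str) -> List[Tuple[str, str]]:
    """Return (snapshot name, relative path) pairs matching a glob pattern."""
    hits = []
    for snap in sorted(p for p in backup_root.iterdir() if p.is_dir()):
        manifest = snap / MANIFEST
        if manifest.exists():
            for rel in json.loads(manifest.read_text()):
                if fnmatch.fnmatch(rel, pattern):
                    hits.append((snap.name, rel))
    return hits


def verify(snapshot: Path) -> List[str]:
    """Report files that are missing or no longer match their checksum."""
    problems = []
    for rel, meta in json.loads((snapshot / MANIFEST).read_text()).items():
        f = snapshot / rel
        if not f.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(f) != meta["sha256"]:
            problems.append(f"corrupt: {rel}")
    return problems
```

Running verify on a schedule and alerting on a non-empty result is one simple way to catch silent corruption in the backup copies before a restore is actually needed.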
The Benefits Of Copy-Based Backup
Especially intriguing is George's assertion that existing backup systems are antiquated and out of touch with the times. The continuous data protection (CDP) pioneers made similar statements half a decade ago. And the recent disk-to-disk backup trend certainly demonstrates that the old paradigm is out of step with modern data volumes, backup windows, and performance demands. Consider as well the data protection APIs included in vSphere and the proliferation of VSS-aware applications on Windows. Backup is evolving away from the "dump it to tape" concept of yore.
Both George and Waterhouse also note the unique benefits of copy-based backup systems, and there are indeed many of them!
- Keeping data in its original format makes it much more useful. Restores can be quicker, data can be re-indexed, and there is less worry about future-proofing the solution since no proprietary encoding is involved.
- Disk-based backup targets keep data accessible, so there is less worry about unknown media failures.
- New applications can leverage online backups for other purposes, including archiving, compliance, testing, data mining, and such.
- Replication or cloud storage technology can move data off-site for better DR readiness. This is one area where disk-based backup seriously trounces boxes of tapes! There are fewer manual procedures, no tapes falling off the loading dock or truck, and no worries about misplacing them at the warehouse.
So why isn't everyone abandoning tape? One reason is that folks are very conservative when it comes to peace-of-mind issues like data protection. They are waiting until this new paradigm is proven before they jump. But the success of disk-based backup, VTL, offsite storage, and online backup is definitely trending towards making copy-based backup a Best Practice!
How will we get there? Current backup solutions are evolving. Already, most include some form of disk-based backup, and VTL technology brings this to those that don't. Offsite data protection is the next big leap, and the killer app that will drive the final nail into the coffin of traditional backup. Once we're using disk more than tape, the backup products themselves will begin to change to better leverage the random-access online storage they now see.
Note that many systems have already discarded the yoke of old-school backup solutions. More and more vertical applications now rely on replicated copies rather than traditional backup for data protection. They perform the management themselves and take advantage of these benefits right off the shelf. Some even come bundled with offsite data protection capability right out of the box. Now that's progress!
Finally, note that copy-based backup has already become the dominant paradigm for small-office and home users. Although backup remains all too rare outside the data center, those who do choose to protect their data are likely using products like Apple's Time Machine, Maxtor's One-Touch, or EMC's Mozy. I personally rely on Time Machine for primary protection of my Macs, and it works exactly like Carter George's imaginary backup product! The only thing that would make it better is if it included the ability to send occasional copies offsite. Although I have managed to add this capability myself by leveraging the excellent rsync software with Nirvanix CloudNAS, that is a discussion for a different day!