Go read that headline again: W. Curtis "Mr. Backup" Preston points out on his blog that replication is not backup, and we can't disagree. Keeping alternative copies of data in multiple locations is a great idea, reducing the risk of data loss and potentially enabling enhanced access, but it's not a historical data protection (aka, backup) strategy. Backup requires management of multiple historic copies of a data set. Clearly, cloud storage in itself isn't backup.
Backup vs. Storage
SNIA defines "backup" thus:
- [Data Recovery] A collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible; also called a backup copy.
To be useful for recovery, a backup must be made by copying the source data image when it is in a consistent state.
- [Data Recovery] The act of creating a backup. See archive.
Backup has always been a challenge for corporate IT. It's not "in the critical path", affecting the daily activities of business users and customers, so it usually gets short shrift when it comes to financial and organizational support. Yet when data is lost or corrupted, restoring it quickly becomes job one for IT. I think Preston spells it out wonderfully in the first chapter of his (updated) seminal book, (UNIX) Backup and Recovery. Systems always fail, data is always lost, and having a good backup is the surest way to recover.
Storage industry folks have been suggesting that new technologies eliminate "traditional backups" ever since there has been an industry to speak of. Some of these technologies (RAID, replication, high availability, hash-based integrity checks) are great innovations in keeping online data alive, but they fall flat when it comes to data corruption. Others (mirroring, snapshots, versioning, CAS, CDP) are great at retaining multiple copies of data, but even these aren't true backup solutions. Good backup is much more than mere data protection: Backup must manage data, not just protect it. No basic storage technology will eliminate the need for a real backup solution.
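To make the corruption point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: `primary`, `replica`, and `backups` are plain in-memory stand-ins, not any real replication or backup product. It only shows why a mirror alone cannot undo a bad write, while a retained historic copy can.

```python
# Illustrative only: in-memory stand-ins, not a real storage or backup product.

primary = {"report.txt": "Q1 revenue: 100"}
replica = {}
backups = []  # list of (label, point-in-time copy of the data set)


def replicate():
    """Mirror the primary's current state, good or bad."""
    replica.clear()
    replica.update(primary)


def back_up(label):
    """Keep an independent historic copy of the data set."""
    backups.append((label, dict(primary)))


# Day 1: data is healthy.
replicate()
back_up("day-1")

# Day 2: the file is corrupted, then replication and backup run as usual.
primary["report.txt"] = "\x00\x00garbage"
replicate()
back_up("day-2")

# The replica faithfully copied the corruption...
print(replica == primary)  # True

# ...while the older backup still holds the good version.
label, snapshot = backups[0]
primary.update(snapshot)
print(primary["report.txt"])  # Q1 revenue: 100
```

Note that the day-2 backup holds the corrupted file too. Knowing which historic copy to restore from is exactly the management job that storage alone does not do.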
Skim through Preston's book (the index is online at Amazon!) and you'll see that merely creating and holding a copy of a given data set is just a small part of a real backup solution. These copies must be tracked, managed, and expired. Operating systems and applications must be integrated into the solution. Bare-metal recovery, disasters, and compliance must be considered. Storage folks ignore these hard-learned lessons at their peril, and any storage vendor who says backup is dead is revealing their ignorance or naïveté!
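As a rough illustration of "tracked, managed, and expired", the sketch below keeps a tiny catalog of backup copies and applies a retention period to it. The `BackupCopy` fields, the `expire` helper, the location labels, the dates, and the 60-day retention are all hypothetical choices made for this example. A real backup product tracks far more (media, application state, legal holds, and so on).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupCopy:
    dataset: str          # what was backed up
    taken_at: datetime    # when the copy was made
    location: str         # where the copy lives (hypothetical label)


def expire(catalog, now, retention):
    """Split the catalog into copies to keep and copies past retention."""
    keep, expired = [], []
    for copy in catalog:
        (expired if now - copy.taken_at > retention else keep).append(copy)
    return keep, expired


# Arbitrary example dates.
now = datetime(2024, 6, 30)
catalog = [
    BackupCopy("mail", datetime(2024, 3, 1), "cloud-a"),
    BackupCopy("mail", datetime(2024, 6, 1), "cloud-a"),
    BackupCopy("mail", datetime(2024, 6, 29), "cloud-b"),
]

keep, expired = expire(catalog, now, retention=timedelta(days=60))
print(len(keep), len(expired))  # 2 1
```

The point is that the catalog, not the storage underneath it, knows which copies exist, how old they are, and when they may be deleted.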
Cloud Storage For Data Protection
Although storage technology will never be a full answer to the data protection quandary, it has a lot to offer when it comes to assisting backup solutions. Disk technology has transformed the backup world in the last decade in the form of replication, snapshots, CDP, virtual tape libraries, and deduplication. These technologies give powerful new capabilities to the existing backup frameworks, overcoming the dismally limited tape cartridge approach of the olden days. A state-of-the-art backup solution now relies much more on disk-based storage systems than tape or optical capacity, and many use disks exclusively.
Cloud storage presents new opportunities to enable more effective and efficient backup solutions. Most cloud storage platforms can be very highly utilized, reducing system cost, and can be flexibly and non-disruptively expanded as capacity needs grow. But some cloud storage systems go way beyond this:
- One of the hallmarks of public cloud solutions is their physical distance from the systems that use them, decreasing the likelihood of data loss from a local disaster. Backing up to a site hundreds or thousands of miles away has long been a dream of IT, and cloud storage makes this possible and even cost-effective!
- A few cloud storage platforms offer integrated policy-based replication of data (ahem, Nirvanix), and this additional geographic distribution further reduces the risk of data loss in a disaster. It can also aid in recovery, since data can be available locally at remote locations!
- Like all disk-based backup targets, cloud storage is online and accessible, making restore operations quicker and easier. There is no need to wait for tapes to be recalled, delivered, located, and loaded when data is on random-access disk! But unlike local disk, public cloud storage can be accessible remotely as well, bringing this ease to distributed businesses and disaster recovery operations.
- Cloud storage systems can embed metadata with stored content, further accelerating restore operations for systems that can use it since indexes no longer have to be rebuilt. This also enables new archiving and content management features, elevating backup to serve a primary business need.
- One of the hallmarks of cloud storage platforms is their API-based programmability. Backup and archive management companies are discovering the ease and power of integrating programmable cloud storage right into their applications: Watch this space for announcements!
- Further storage smarts are being embedded into cloud systems, too. We have seen deduplication and compression (check out Nirvanix partner Ocarina!), data protection (partners Tarmin and Atempo), media transcoding, indexing, content distribution, and more.
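The metadata and programmability points in the list above can be sketched in a few lines. The `ObjectStore` class below is a hypothetical in-memory stand-in written for this example. It is not the API of Nirvanix or of any other real cloud storage service, and the keys and metadata fields are made up. It only shows how metadata stored alongside each object lets a restore locate the right copy without rebuilding a separate index.

```python
class ObjectStore:
    """In-memory stand-in for a cloud storage service (hypothetical API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata):
        """Store the content together with its descriptive metadata."""
        self._objects[key] = (data, dict(metadata))

    def find(self, **criteria):
        """Return keys whose metadata matches every given criterion."""
        return sorted(
            key
            for key, (_, meta) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )

    def get(self, key):
        """Return the stored content for a key."""
        return self._objects[key][0]


store = ObjectStore()
store.put("b/001", b"mail store, Monday", {"host": "mail01", "day": "mon"})
store.put("b/002", b"mail store, Tuesday", {"host": "mail01", "day": "tue"})
store.put("b/003", b"web root, Tuesday", {"host": "web01", "day": "tue"})

# A restore asks the store itself which object it needs.
keys = store.find(host="mail01", day="tue")
print(keys)                # ['b/002']
print(store.get(keys[0]))  # b'mail store, Tuesday'
```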
Backup Is Backup
Simply adding basic data protection techniques like snapshots or replication to a storage system doesn't make it a backup solution. Storage isn't backup, but backup is! Backup systems can leverage storage capabilities, but a backup management solution will always be required to get complete data protection.
Clearly, cloud storage isn't a backup solution. But as this unique combination of capabilities demonstrates, it's much more than simple storage capacity. Like so many storage technologies before it, cloud storage is an enabler for advanced backup solutions.