Enterprise Storage Strategies

Deploying enterprise storage has never been more confusing, with a wide variety of technology choices available. On this blog, Nirvanix Director of Consulting, Stephen Foskett, presents proven strategies for building an internal storage service in the enterprise.

January 2010 - Posts

  • Mr. Backup Is Right: (Cloud) Replication Is Not Backup, But Backup Is!

    Go read that headline again: W. Curtis "Mr. Backup" Preston points out on his blog that replication is not backup, and we can't disagree. Keeping alternative copies of data in multiple locations is a great idea, reducing the risk of data loss and potentially enabling enhanced access, but it's not a historical data protection (aka, backup) strategy. Backup requires management of multiple historic copies of a data set. Clearly, cloud storage in itself isn't backup.

    Backup vs. Storage

    SNIA defines "backup" thus:
    1. [Data Recovery] A collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible; also called a backup copy.
      To be useful for recovery, a backup must be made by copying the source data image when it is in a consistent state.
    2. [Data Recovery] The act of creating a backup. See archive.

    Backup has always been a challenge for corporate IT. It's not "in the critical path" of the daily activities of business users and customers, so it usually gets short shrift when it comes to financial and organizational support. Yet when data is lost or corrupted, restoring it quickly becomes job one for IT. I think Preston spells it out wonderfully in the first chapter of his (updated) seminal book, (UNIX) Backup and Recovery. Systems always fail, data is always lost, and having a good backup is the surest way to recover.

    Storage industry folks have been suggesting that new technologies eliminate "traditional backups" ever since there has been an industry to speak of. Some of these technologies (RAID, replication, high availability, hash-based integrity checks) are great innovations in keeping online data alive, but they fall flat when it comes to data corruption. Others (mirroring, snapshots, versioning, CAS, CDP) are great at retaining multiple copies of data, but even these aren't true backup solutions. Good backup is much more than mere data protection: Backup must manage data, not just protect it. No basic storage technology will eliminate a real backup solution.

    Skim through Preston's book (the index is online at Amazon!) and you'll see that merely creating and holding a copy of a given data set is just a small part of a real backup solution. These copies must be tracked, managed, and expired. Operating systems and applications must be integrated into the solution. Bare-metal recovery, disasters, and compliance must be considered. Storage folks ignore these hard-learned lessons at their peril, and any storage vendor who says backup is dead is revealing their ignorance or naïveté!
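
    To make "tracked, managed, and expired" concrete, here is a minimal sketch of the bookkeeping a backup catalog does on top of raw copies. It is an illustration only, not any vendor's or product's implementation: the names BackupCopy, BackupCatalog, record, expire, and restore_point are hypothetical, and the retention rule (drop copies older than a fixed window, but never the newest copy of a data set) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class BackupCopy:
    """One historic copy of a data set (hypothetical record layout)."""
    dataset: str
    taken_at: datetime
    location: str  # where the copy lives, e.g. a disk target or cloud container


@dataclass
class BackupCatalog:
    """Tracks copies and applies an assumed fixed retention window."""
    retention: timedelta
    copies: List[BackupCopy] = field(default_factory=list)

    def record(self, dataset: str, taken_at: datetime, location: str) -> BackupCopy:
        """Track a newly created copy."""
        copy = BackupCopy(dataset, taken_at, location)
        self.copies.append(copy)
        return copy

    def expire(self, now: datetime) -> List[BackupCopy]:
        """Remove copies older than the retention window.

        The newest copy of each data set is always kept, so a data set
        that has not been backed up recently is never left with nothing.
        """
        newest = {}
        for c in self.copies:
            if c.dataset not in newest or c.taken_at > newest[c.dataset].taken_at:
                newest[c.dataset] = c
        kept, expired = [], []
        for c in self.copies:
            too_old = now - c.taken_at > self.retention
            if too_old and newest[c.dataset] is not c:
                expired.append(c)
            else:
                kept.append(c)
        self.copies = kept
        return expired

    def restore_point(self, dataset: str, as_of: datetime) -> Optional[BackupCopy]:
        """Return the most recent tracked copy taken at or before as_of."""
        candidates = [c for c in self.copies
                      if c.dataset == dataset and c.taken_at <= as_of]
        return max(candidates, key=lambda c: c.taken_at, default=None)
```

    The restore_point lookup is the part replication alone cannot give you: a replica only ever answers "what does the data look like now", while a catalog can answer "what did it look like last Tuesday, before the corruption". A real backup product layers application integration, bare-metal recovery, and compliance rules on top of this kind of tracking.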

    Cloud Storage For Data Protection

    Although storage technology will never be a full answer to the data protection quandary, it has a lot to offer when it comes to assisting backup solutions. Disk technology has transformed the backup world in the last decade in the form of replication, snapshots, CDP, virtual tape libraries, and deduplication. These technologies give powerful new capabilities to the existing backup frameworks, overcoming the dismally limited tape cartridge approach of the olden days. A state-of-the-art backup solution now relies much more on disk-based storage systems than on tape or optical capacity, and many use disks exclusively.

    Cloud storage presents new opportunities to enable more effective and efficient backup solutions. Most cloud storage platforms can be very highly utilized, reducing system cost, and can be flexibly and non-disruptively expanded as capacity needs grow. But some cloud storage systems go way beyond this:

    • One of the hallmarks of public cloud solutions is their physical distance from the systems that use them, decreasing the likelihood of data loss from a local disaster. Backing up to a site hundreds or thousands of miles away has long been a dream of IT, and cloud storage makes this possible and even cost-effective!
    • A few cloud storage platforms offer integrated policy-based replication of data (ahem, Nirvanix), and this additional geographic distribution further reduces the risk of data loss in a disaster. It can also aid in recovery, since data can be available locally at remote locations!
    • Like all disk-based backup targets, cloud storage is online and accessible, making restore operations quicker and easier. There is no need to wait for tapes to be recalled, delivered, located, and loaded when data is on random-access disk! But unlike local disk, public cloud storage can be accessible remotely as well, bringing this ease to distributed businesses and disaster recovery operations.
    • Cloud storage systems can embed metadata with stored content, further accelerating restore operations for systems that can use it since indexes no longer have to be rebuilt. This also enables new archiving and content management features, elevating backup to serve a primary business need.
    • One of the hallmarks of cloud storage platforms is their API-based programmability. Backup and archive management companies are discovering the ease and power of integrating programmable cloud storage right into their applications: Watch this space for announcements!
    • Further storage smarts are being embedded into cloud systems, too. We have seen deduplication and compression (check out Nirvanix partner Ocarina!), data protection (partners Tarmin and Atempo), media transcoding, indexing, content distribution, and more.

    Backup Is Backup

    Simply adding basic data protection techniques like snapshots or replication to a storage system doesn't make it a backup solution. Storage isn't backup, but backup is! Backup systems can leverage storage capabilities, but a backup management solution will always be required to get complete data protection.

    Clearly, cloud storage isn't a backup solution. But as this unique combination of capabilities demonstrates, it's much more than simple storage capacity. Like so many storage technologies before it, cloud storage is an enabler for advanced backup solutions.

  • SSPs, cloud storage providers, and internal clouds: Zebras, Giraffes, and Horses

    My youngest daughter used to have trouble with her animals. Whenever she saw a giraffe, she would say "zebra" and whenever she saw a zebra she would say "giraffe!" Although an adult would never make that mistake, one can understand why a child would: She was new to these names, and they were entirely arbitrary words. Besides, both are quadrupeds with bizarre coloration and patterns. But my daughter definitely knew a horse when she saw one!

    Private Clouds and SSPs: Horses and Zebras

    Today's cloud storage world can be equally confusing to the uninitiated. Long-time IT folks remember the storage service providers (SSPs) of a decade ago and have watched as storage and server virtualization have gained prominence. When cloud storage began to get some press a couple of years ago, it was natural to try to fit it into the existing paradigms and understandable to fail to spot the differences.

    Internal IT systems have been on the road to virtualization for years. I recall being excited about the potential of server and storage virtualization over a decade ago. But storage service providers like StorageNetworks (where I worked in 2000) didn't use any of this fancy stuff. Rather, the SSPs of last decade were zebras, built of the same storage area network (SAN) storage systems used by their customers but clothed as a service-oriented business.

    As Hu Yoshida of HDS points out in his blog, SSPs were not a raging success. But they were not the colossal failure many assume - just ask the vast assortment of StorageNetworks alumni now in charge at places like HDS! True, all that conventional enterprise storage gear had trouble with multi-tenancy, but the real issue for SSPs was financial. They built out world-class storage networks (pardon the pun) and wrapped them in expensive home-built management and provisioning software. By the time they had a workable offering, the price tag had risen to levels that were hard to justify.

    Today's private storage clouds, as touted by HDS, NetApp, and others, are horses of a different color. Enterprise storage systems are much more flexible and sharable thanks to integrated virtualization and advanced management features. But they retain the traditional enterprise storage access mechanisms, construction, and cost. If you're looking for a horse, it's a blessing to find one: These "cloudy storage" solutions can be plugged in and used precisely because they are conventional. The kind of storage workload described in Hu's blog (massive I/O connected to virtual servers) is best served by these impressive but ordinary storage devices.

    Public Cloud is a Different Animal

    Not every workload needs a workhorse, however. Many have questioned how widespread the need is for the next generation of high-performance connectivity. Database systems remain the only really common high-I/O workload, though highly concentrated server virtualization systems will soon join this club. But these make up only a small percentage of overall IT server and storage deployments. The majority of applications demand low cost and high flexibility more than extreme performance.

    This is especially true of the types of applications using public cloud storage today. Regardless of industry vertical, every business would benefit from having their vast reference and archival datasets available online rather than moldering on tape. Intelligent cloud storage platforms are rapidly being integrated with the best data management and archiving applications to make this a reality. Already, companies like Nirvanix are hosting petabytes of archival data for the largest corporations and governmental entities. They chose public cloud storage over internal disk or tape because it was competitive on cost as well as being exceptionally available.

    The next generation of business applications will make use of the other big benefit of cloud storage: Collaboration. Forward-thinking businesses have already deployed applications with integrated data sharing using public cloud storage to enhance their technical support and customer service activities. A new wave of similar business-to-business collaboration tools is on the way. IT infrastructure folks might not have noticed this shift in focus by developers, but the revolution in collaborative software is about to strike.

    Although the casual observer might not discern it, public cloud storage solutions are as different from virtualized internal systems and the old SSP offerings as giraffes are from horses or zebras. Today's cloud storage providers are rejecting conventional enterprise storage devices in favor of software solutions based on commodity server hardware. Server and storage virtualization, Fibre Channel SANs, and even enterprise NAS are rare in the data centers of cloud storage providers. Instead, they have re-thought the challenge of protecting data and servicing customers and their solutions have the side effect of being much less expensive. Hardware cost is disappearing, and cloud providers are instead focusing on the "glass floor" of operations and management costs, as well as raising the bar on service and availability.

    Place Your Bets

    This new world will not erase the old, but cloud solutions have little use for traditional server and storage infrastructure approaches. There is a race "up the stack" as IT companies deploy platforms and services rather than merely offering faster versions of last year's server and storage equipment. This is the reason for VMware's acquisition of SpringSource (and perhaps Zimbra) and a series of investments on the part of EMC, Cisco, Dell, HP, IBM, and the rest. There will still be a market for the nuts and bolts products that support traditional IT systems, but these big players are betting that the action is elsewhere.

  • What Will 2010 Bring To Enterprise Storage?

    I'm loath to give predictions, preferring introspection and outright silliness. But the turn of the year is a time of optimism, so I will take my turn at the megaphone to dish out some ideas I believe will come to pass in the coming year.

    2010 will be a year of normalization (“righting the ship”) for enterprise IT: We will see a return to investment and building out new features after a year of financial panic. IT will begin again to focus on what it does well and continue to outsource everything else – including non-core applications. Without the threat of financial doom, IT folks will be willing to take more risks than in 2009, feeling that their jobs are no longer on the line.

    With regard to enterprise storage, I think a few trends are particularly interesting:

    1. Increasing virtualization drives higher I/O demands – VMware vSphere and Microsoft Hyper-V can now push the big I/O required by databases and other taxing applications, and these will be virtualized (finally) in 2010. This in turn will demand more storage I/O, so we’ll see increasing use of SAN arrays even at the low end of the market.
    2. Expansion of SANs for SMB – As smaller environments (and smaller apps within large environments) virtualize more, they’ll start looking for intelligent, higher-performance SAN storage. This means a bonanza for vendors of iSCSI and sub-$20k storage devices!
    3. Increasing use of archiving – Businesses of all sizes are interested in archiving for compliance and data management reasons, so the use of archiving software, hardware, and services will explode. I expect managed archiving services to be particularly interesting since this has never been a core focus of IT.
    4. Pushing up the stack – Every area of IT is “moving up the stack” with tighter application integration, and this will continue as new technologies come to market. I expect special-purpose storage solutions (software, hardware, and services) integrated with applications (like Exchange, SharePoint, SAP, etc) to be a real focus for 2010.
    5. The end of FC disks – Flash and automated tiering will combine with SAS to spell doom for traditional high-performance disk drives. We’ll see array vendors switch en masse to larger capacity drives with SAS and increasing amounts of cache RAM and flash in storage systems throughout 2010. 2011 and beyond might see the end of high-performance disks altogether as SSD becomes entrenched.
    6. The advent of extreme tiering – We’ll see flash, SAS, and cloud storage combined into super tiered storage systems, with a number of solutions appearing to cache, balance performance and capacity, and replicate data off-site. Virtualization will meld with cloud front-ends and automated tiering to become extreme tiering devices. This won’t be mainstream until 2011 at the earliest, but it’ll start happening this year.
    7. Still not the year of converged networks – Although Cisco, EMC, and the rest will push hard for 10 GbE, DCB, and FCoE, these will not make a significant impact on IT spend through 2010. But 10 GbE will be deployed successfully in high-I/O environments (see number 1). iSCSI will continue its quiet rise, though.

    What do you think 2010 will bring?