In his most recent post for Dedupe Matters, Tony Asaro makes some predictions around the growing impact deduplication will have in the datacenter for 2009. I was particularly struck by the following statement:
"The perceived risk of implementing D2D backup with dedupe is all but gone and the value is glaringly clear. Additionally, the bad economy makes the value proposition that much more compelling."
This notion of perception is important to flesh out, as it is indicative of another question: When making a purchase decision, are you led by the technology or the vendor?
Let's explore an example of a technology-driven solution being deployed in production datacenters. In my last post I mentioned Data Domain OpenStorage (OST) integration with NetBackup, noting that our joint customers are achieving high throughput in production environments. High throughput has historically been achievable only by using lots of disk spindles behind a tape emulation interface (VTL). Data Domain's SISL™ architecture does it with very few disks (which is critical to delivering the cost benefits of deduplication, as discussed here), and the OST interface allows us to further optimize performance at the protocol level. In addition, using the OST interface with Data Domain systems gives NetBackup the ability to manage the WAN-efficient replication process between Data Domain systems. Called "Optimized Duplication", this is a key feature that lets administrators leverage every copy of their data in the distributed environment.
Looking deeper, these customers are also adopting 10 Gbps Ethernet to connect the NBU media servers to the Data Domain systems (4 Gbps FC doesn't seem so fast anymore, does it?), deploying their media servers as a scalable tier of data movers (with dynamic load balancing), and otherwise seeking out and addressing the bottlenecks within their infrastructure. This kind of approach to systems architecture is typical of organizations interested in solving their challenges around storage and data protection, and it results in a best-of-breed solution.
The process tends to propagate, as these same customers ask themselves how they can apply the Data Domain system to solve other problems. Why limit yourself to backup, when you can use the same data to automatically refresh reporting and development instances of databases, or ensure that a DR site thousands of miles beyond the reach of block-array based replication technologies is updated daily? Why limit yourself to data protection, when you can use the same storage for cost-effective tier 2 storage as well?
Our customers doing this today know that the limits dissolve when they allow the technology to point them towards the future. Ask yourself, whose lead will you follow?
How many of the customers adopting 10 gig Ethernet are getting 10 Gbps? My best guess would be none. Who actually got 1 Gbps with the rollout of gig E? At least with FC you actually get the advertised 4 Gbps. Who is really getting 10 Gbps from 10 gig Ethernet? Remember, IP doesn't do a good job over short distances. It was designed for long distances, hence all of the overhead it carries. That overhead is not needed over short distances (as in backup), but it is still there. I have never seen a vendor that provides VTL in its hardware be as anti-VTL as Data Domain is. Why is that? Why is DD so anti-FC?
Posted by: Aaron Kristoff | 02/02/2009 at 11:45 AM
Aaron, thanks for engaging in the discussion. You raise a couple of fair questions. First, I do not mean to imply that 10 Gbps Ethernet is an inherently superior protocol to 4 Gbps Fibre Channel. But to be architecturally viable it doesn't have to be: even at relatively modest utilization, a 10 Gbps Ethernet link reaches throughput parity with a 4 Gbps FC link. Today, every major backup vendor provides native support for Data Domain systems as a NAS target, and some, such as Symantec with OpenStorage, go much further. As a whole, these interfaces are simple to manage, with none of the moving parts associated with tape. Our storage systems have all of these options (NAS/OST via Ethernet, VTL via FC), and we encourage our customers to investigate the questions of interface and transport independently, and to choose the solution that best meets their needs.
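A back-of-the-envelope sketch of that parity point may help. The figures below are nominal link rates taken as assumptions (4G FC signalling at 4.25 Gbaud with 8b/10b coding, 10GbE at a 10 Gbps data rate), not measurements from any customer environment, and Ethernet/IP/TCP framing overhead is deliberately ignored. The function names are made up for illustration.

```python
# Back-of-the-envelope link comparison.
# All figures are nominal, assumed values, not measured throughput.

FC4_LINE_RATE_GBAUD = 4.25          # nominal 4G Fibre Channel signalling rate
FC4_ENCODING_EFFICIENCY = 8 / 10    # 8b/10b line coding: 8 data bits per 10 line bits
TEN_GBE_DATA_RATE_GBPS = 10.0       # nominal 10GbE data rate, before protocol overhead


def fc4_data_rate_gbps() -> float:
    """Usable data rate of a 4G FC link after line coding (framing ignored)."""
    return FC4_LINE_RATE_GBAUD * FC4_ENCODING_EFFICIENCY


def parity_utilization() -> float:
    """Fraction of a 10GbE link needed to match a fully used 4G FC link."""
    return fc4_data_rate_gbps() / TEN_GBE_DATA_RATE_GBPS


if __name__ == "__main__":
    print(f"4G FC data rate: {fc4_data_rate_gbps():.2f} Gbps")
    print(f"10GbE utilization needed for parity: {parity_utilization():.0%}")
```

Under these assumptions, a 10GbE link running at roughly a third of its nominal rate carries as much as a saturated 4G FC link; real-world protocol overhead would push that fraction somewhat higher.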
Posted by: Daniel Budiansky | 02/02/2009 at 06:06 PM
Performance in the data center is a chain of links, and bandwidth is just one of them. Getting the data off the backup servers is a big one, and that is often the bottleneck. But it is all additive: when we can advance one aspect, it creates efficiencies that can be leveraged.
The argument of FC versus Ethernet will eventually dissipate, since FCoE will ultimately converge the two.
Posted by: Tony Asaro | 02/03/2009 at 08:10 AM
I don't think DD is anti-FC or anti-VTL. We're simply agnostic. All of that stuff is ultimately plumbing to move data from point A to point B. The goal is simplicity and seamless integration into existing infrastructure. True, FC is present in most large data centers, whereas 10GbE is still gaining a foothold. So now we're in a state of emerging transport parity. So long as the transport is not the bottleneck, it is invisible to the process. However, no one can argue that VTL wins over disk-as-disk in terms of simplicity. Therefore, when performance is equally transparent across transports and either infrastructure is available, simplicity becomes the deciding factor. I think the point of Daniel's blog was to raise awareness about the changing landscape of high-performance backup. Inclusion of a viable Ethernet competitor to FC is fair. And as Ethernet becomes more competitive with FC in the data center, perhaps it will prompt new innovation in the FC world as well in terms of simplicity and interoperability. When there is competition, the customer wins.
Posted by: Rich Colbert | 02/04/2009 at 12:17 PM