EMC CEO Joe Tucci recently claimed that EMC is the number one vendor in data dedupe. The claim might be more defensible if he hadn't also said that EMC was beating Data Domain in the market, which is not the case.
The first issue is that much of EMC's dedupe revenue comes from its Avamar backup software, which does not compete with Data Domain. An apples-to-apples comparison would put EMC's disk-to-disk (D2D) backup appliances, the EDL with data dedupe, directly against Data Domain, and on that basis EMC's number is nowhere close. Unfortunately, EMC can say whatever it wants because it doesn't have to prove it. Data Domain's success, on the other hand, is transparent and open for the world to scrutinize.
Why not add Centera revenue into the mix as well, since it does single instancing, a less efficient form of data dedupe? That would really boost EMC's numbers against Data Domain. After all, EMC adds Centera to its NAS numbers to claim the number one spot over NetApp, and no one takes that seriously either.
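To see why single instancing is less efficient, here is a toy Python sketch. It assumes fixed 4-byte chunks and SHA-256 fingerprints purely for illustration, and the function names are hypothetical; this is not how Centera, Avamar or Data Domain are actually implemented (real products use much larger, often variable-length segments). File-level single instancing only saves space when whole files are byte-identical, while chunk-level dedupe also catches the unchanged parts of files that differ slightly.

```python
import hashlib


def file_level_bytes(files):
    """Single-instance storage: keep one copy of each byte-identical file."""
    unique = {hashlib.sha256(data).digest(): len(data) for data in files}
    return sum(unique.values())


def chunk_level_bytes(files, chunk_size=4):
    """Fixed-size chunk dedupe: keep one copy of each identical chunk.

    chunk_size=4 is a hypothetical toy value chosen to keep the example small.
    """
    unique = {}
    for data in files:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


# Two nightly backups of the same file; only the last 4 bytes changed.
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBCCCCEEEE"
backups = [monday, tuesday]

print(sum(len(f) for f in backups))  # raw bytes written
print(file_level_bytes(backups))     # files differ, so both are kept whole
print(chunk_level_bytes(backups))    # unchanged chunks are stored only once
```

Because the two backups differ in one chunk, single instancing stores both files in full, while chunk-level dedupe stores only the changed chunk a second time.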
This matters because, just as NetApp established its leadership in NAS, Data Domain is doing the same for D2D dedupe backup, and its competitors will spread FUD to try to slow that down. So it is important to provide some anti-FUD every now and then.
Data Domain is the leader in D2D backup dedupe appliances, with the most mature and advanced products, the greatest number of implementations, the largest number of customers, the smartest and most experienced support organization, an extensive ecosystem of partners and resellers, the highest revenue numbers, and a very strong business model with great margins and real momentum in the market.
Well said, Tony. EMC has a track record of throwing freebies into a storage deal and then counting the entire price of the deal toward whatever segment the freebie belongs to. We'll never know how much of the $90 million in revenue is even remotely related to an actual, intentional dedupe purchase.
But why would EMC claim leadership over Data Domain in dedupe? Didn't NetApp turn on its free FAS dedupe license last year and already claim victory in the segment? Maybe Joe saw through the NetApp nonsense as easily as everyone will see through the EMC equivalent. Even deeply loyal EMC customers are well aware of the company's adeptness at financial engineering.
In sharp contrast, Data Domain announced major performance increases across its current product line last week. Because the architecture is CPU-centric, the improvement is free to all existing Data Domain customers through a simple software upgrade. To me that's a lot more valuable than a gold star for some accounting wizardry.
Posted by: Rich Colbert | 03/13/2009 at 01:10 PM
It is an important point that both EMC and NetApp have Data Domain on their radar. Steve Duplessie pointed out in a recent blog post that he felt Tucci made a mistake bringing up EMC's so-called leadership over Data Domain in dedupe. We do need to come up with a good name for this market, just as NetApp did for NAS. That would make the analysis easier and more defensible.
Posted by: Tony Asaro | 03/17/2009 at 10:59 AM
Data deduplication sounds like a wonderful productivity aid and is definitely a good move for email server database backups and, I am sure, numerous other applications as well.
However, from a mainframe disaster recovery perspective, I am having trouble understanding how dedupe is really as reliable and safe as having multiple actual copies of the data residing (and refreshed on a regular basis) at a secure site, even though saving time and money on tapes and ATLs (automated tape libraries) is a wonderful by-product. Data dedupe is definitely a wonderful development in storage capabilities, but it still has three possible points of failure that could affect regenerating the specific data needed in a real disaster, not to mention cases where auditors have a real need to verify that this is an acceptable method of backing up financial (and any other necessary) data.
I am admittedly a beginner on the dedupe learning curve, and I would appreciate a little (or a lot) more information about how Data Domain's different approach actually works before being even halfway serious about dividing all the financial eggs into three separate pieces and then trying to "put Humpty Dumpty back together" with only two pieces, or two and part of the third, so to speak.
Posted by: Ed Hensley | 03/26/2009 at 06:25 AM
Ed,
These are all great questions, and the answers really start to distinguish the wheat from the chaff in the dedupe segment
(or non-segment, as Brian Biles argues here: http://dedupethis.typepad.com/brianbilesblog/2009/03/dedupe-and-storage-are-not-commodities.html ).
There are a couple of things to think about with tape and tape libraries. It starts with the "Don't put all your eggs in one basket" line of thought. While that is very true, the corollary is "Do you really want your eggs scrambled across several thousand baskets?"
Tape is effectively RAID 0. When it comes time to restore, one bad tape spoils the lot and, on average, sets your recovery point back 7 days. If that's acceptable, then I'd argue that the data isn't really all that important in the first place. However, since in this case the data lives on a very high-end platform, I suspect that it is quite valuable.
Going back to the analogy, dedupe disk is basket number two. Replicated dedupe disk is basket number three. Three baskets across at least two different physical locations, with potentially dozens of recovery points stored in baskets two and three, is a pretty good combination of prudence and functionality.
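The RAID 0 comparison comes down to simple arithmetic: if a restore needs every tape in the set and each tape fails to read with probability p, the whole restore succeeds with probability (1 - p)^N. The Python sketch below assumes tape failures are independent, and both the function name and the 1% rate used with it are hypothetical, not measured figures.

```python
def restore_success_probability(num_tapes: int, per_tape_failure: float) -> float:
    """Probability that a restore needing every tape in the set succeeds.

    Like RAID 0 there is no redundancy: one unreadable tape fails the whole
    restore. Assumes tape failures are independent (a simplification).
    """
    if not 0.0 <= per_tape_failure <= 1.0:
        raise ValueError("per_tape_failure must be in [0, 1]")
    return (1.0 - per_tape_failure) ** num_tapes


# Hypothetical 1% per-tape failure rate, for illustration only.
for tapes in (1, 10, 50):
    print(tapes, round(restore_success_probability(tapes, 0.01), 3))
```

With that hypothetical 1% rate, a 10-tape restore set comes back cleanly only about 90% of the time, and a 50-tape set only about 60%, which is the sense in which more baskets of the RAID 0 kind make things worse, not better.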
http://forms.datadomain.com/go/datadomain/WS_WP_DR
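The same back-of-the-envelope arithmetic shows why three baskets help: the data is lost only if every copy fails at once. This Python sketch assumes independent failures (which is exactly why the copies should sit in at least two physical locations) and uses made-up failure rates; the function name and all the numbers are hypothetical.

```python
from math import prod
from typing import Sequence


def probability_data_lost(copy_failure_probs: Sequence[float]) -> float:
    """Return the chance that every copy of a backup is unrecoverable.

    The data survives if any one copy can be restored, so loss requires all
    copies to fail together. Assumes failures are independent, which is only
    approximately true and is why copies should live in different places.
    """
    for p in copy_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each failure probability must be in [0, 1]")
    return prod(copy_failure_probs)


# Hypothetical figures for illustration only, not measured rates.
tape_set = 1.0 - (1.0 - 0.01) ** 20  # basket one: 20-tape set, RAID 0 style
local_dedupe_disk = 0.02             # basket two
replicated_dedupe_disk = 0.02        # basket three, at a second site

print(probability_data_lost([tape_set]))
print(probability_data_lost([tape_set, local_dedupe_disk, replicated_dedupe_disk]))
```

Multiplying the failure probabilities is what makes extra independent copies so effective; copies that share a site, a power feed or a failure mode break that assumption.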
The key is for these storage systems to be as resilient and robust as possible. That is why Data Domain has an industry-leading Data Invulnerability Architecture to protect data on disk like no other storage system:
http://forms.datadomain.com/go/datadomain/WS_WP_DIA
So with Data Domain, the time and money savings you mention are wonderful by-products. From its inception, the goal of our product was to provide a better way to protect data, to overcome the shortcomings and fallibility of magnetic tape, and to provide a reasonable means of getting data offsite quickly without breaking the bank on excessive WAN bandwidth.
If you take "the bait" and download the white papers I linked to, then I'm sure you'll get a cordial call from one of our dedicated inside sales folks. Don't be alarmed. All they'll ask is whether you'd like us to connect you with a Data Domain account executive and systems engineer in your area.
Dedupe with Data Domain is very simple. However, there is a lot to learn about what's going on in the storage market in order to make an informed choice. We'd be glad to help and would appreciate the opportunity to talk further.
Posted by: Rich Colbert | 03/26/2009 at 01:59 PM