In a commodity market, there is very little differentiation between producers, so suppliers are chosen purely on price and distribution.
Dedupe storage products are starkly differentiated. The products could hardly be more distinct, though in the rush to learn why Data Domain has grown so fast, it's easy to misunderstand what's going on. Dedupe is not a commodity; it's not even a product category.
Bob Passmore at Gartner makes a persuasive case that nothing in storage is a commodity. Enterprise disks aren't commodities. Storage systems definitely aren't commodities; the lowest common denominator between them is not sufficiently capable or predictable for customers of any reasonable scale.
Deduplication is not a product category. It's not even a specific method. It's an effect: data size gets reduced by pooling redundancies. Depending on the vendor, deduplication implementations vary significantly in:
- Data reduction effects and predictability
- Delays and impact on I/O
- Applicability to applications and data types
- Resilience
- Overheads in hardware and competing system processes
- Throughput
- Replication flexibility and speed
- Packaging (slow backup software, inline storage, post-process kludge)
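To make the "effect" concrete, here is a minimal Python sketch of one possible method: fixed-size chunking with SHA-256 fingerprints. It is not any vendor's implementation; the function name `dedupe_sizes`, the 4 KiB chunk size, and the synthetic "backups" are all assumptions chosen for illustration. It shows how the same method gives very different data reduction depending on the input.

```python
import hashlib
import random

def dedupe_sizes(streams, chunk_size):
    """Return (logical_bytes, stored_bytes) under fixed-size chunking.

    Hypothetical helper: each distinct chunk is stored once,
    keyed by its SHA-256 fingerprint.
    """
    store = {}   # fingerprint -> chunk, kept only once
    logical = 0  # bytes written by the client
    for data in streams:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    stored = sum(len(c) for c in store.values())
    return logical, stored

# Synthetic 16 KiB "backup" (seeded so the run is repeatable).
rng = random.Random(42)
base = bytes(rng.getrandbits(8) for _ in range(16 * 1024))

# Case 1: the same backup written twice. Every chunk of the second copy is pooled.
identical = dedupe_sizes([base, base], 4096)

# Case 2: the second backup has one byte inserted at the front.
# Every fixed-size chunk boundary shifts, so nothing lines up with the first copy.
shifted = dedupe_sizes([base, b"\x00" + base], 4096)

print("identical:", identical)
print("shifted:  ", shifted)
```

In the first case half the logical bytes are stored; in the second, a one-byte insertion defeats this naive scheme entirely. Handling that kind of input is one of the places where real products diverge in method.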
Production dedupe systems rely on unusual optimizations that you can't buy off the shelf. Dedupe is not simple to get right, and if you start with the wrong architecture, it won't get fixed later. It consumes massive compute and disk-access overhead for hours at a time. It is sensitive to the data patterns in its input. It's also very sensitive to all the normal problems of storage systems: if there's data loss or corruption, a single block can affect hundreds of files. Vendors who are late to market will take shortcuts. Those shortcuts show up only as bad side effects, and under stress the problems become visible.
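The point about a single block affecting hundreds of files can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular product tracks references: the 4-byte chunk size, the file names, and the helpers `index_files` and `affected_files` are hypothetical.

```python
import hashlib

CHUNK = 4  # tiny chunk size, for illustration only

def fingerprint(chunk):
    return hashlib.sha256(chunk).hexdigest()

def index_files(files):
    """Map each chunk fingerprint to the set of files that reference it."""
    refs = {}
    for name, data in files.items():
        for i in range(0, len(data), CHUNK):
            refs.setdefault(fingerprint(data[i:i + CHUNK]), set()).add(name)
    return refs

def affected_files(refs, fp):
    """Files that become unreadable if the chunk with this fingerprint is lost."""
    return sorted(refs.get(fp, set()))

# 300 hypothetical documents sharing one header chunk, each with a unique body chunk.
files = {f"report_{n}.doc": b"HDR!" + f"{n:04d}".encode() for n in range(300)}
refs = index_files(files)

shared = affected_files(refs, fingerprint(b"HDR!"))
unique = affected_files(refs, fingerprint(b"0007"))

print(len(shared), "files depend on the shared chunk")
print(unique, "depends on its own body chunk")
```

Losing the shared chunk damages all 300 files, while losing a unique chunk damages one. The better the data reduction, the more files lean on each stored block, which is why resilience has to be designed in from the start.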
Dedupe has no particular packaging consistency. It has no particular interface for standards committees to debate. It's less like an "it" and more like a "them." This site shows just some of the divergent characteristics of some of the vendors.
Storage is not a commodity. Dedupe is not a commodity (it's not even a single category). Dedupe storage is far from being a commodity.