There is a common misconception that the principle known as Occam's Razor can be summed up as "all things being equal, the simplest answer is the right one." While that notion works wonders on a television show where important mysteries need to get resolved in 48 minutes or less, it's not really the point of this very important concept.
Occam's Razor in a modernized nutshell means "don't make things more complicated than they have to be." It's a powerful idea, and one that proves itself over and over again in the real world.
One example that comes to mind is Data Domain's Collection Replication. Collection replication is one of several forms of replication available when using Data Domain systems. The beauty of collection replication is the power of simplicity. Collection replication is an option when a customer wants to perform simple system-to-system mirroring, leveraging the efficiency of data deduplication to replicate vast quantities of data offsite for disaster recovery protection.
Data Domain takes advantage of that inherent simplicity by streamlining the replication process to the greatest extent possible. Because collection replication is aware that all new, unique (i.e. non-duplicate) data segments written to the local filesystem must be transferred, it is able to send that new data across the wire immediately. However, any data that is not new and unique does not need to be sent. In the instance of moves or deletes, collection replication sends across a minimal subset of deduplication-aware 'housekeeping instructions' to the remote side. Once data in the form of a file or virtual tape arrives at the remote system, it is immediately visible and immediately available. This is true even while replication is still in progress and there are still additional files or tapes inbound.
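To make the idea concrete, here is a minimal sketch of deduplication-aware replication in Python. It is not Data Domain's implementation: the class names (`Source`, `Replica`), the message tuples, the fixed 8-byte segment size, and the use of SHA-256 fingerprints are all assumptions chosen for illustration. It shows only the behavior described above: new, unique segments cross the wire, duplicates do not, and moves and deletes travel as small housekeeping instructions.

```python
import hashlib

SEGMENT_SIZE = 8  # hypothetical, tiny fixed-size segments for illustration only


def fingerprint(segment: bytes) -> str:
    """Identify a segment by its content hash (an assumed scheme)."""
    return hashlib.sha256(segment).hexdigest()


class Replica:
    """Hypothetical remote side: stores unique segments and per-file recipes."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes
        self.files = {}     # file name -> ordered list of fingerprints

    def apply(self, message):
        kind = message[0]
        if kind == "segment":
            _, fp, data = message
            self.segments[fp] = data
        elif kind == "file":
            _, name, recipe = message
            self.files[name] = recipe  # file becomes readable right away
        elif kind == "move":
            _, old, new = message
            self.files[new] = self.files.pop(old)
        elif kind == "delete":
            _, name = message
            del self.files[name]

    def read(self, name: str) -> bytes:
        return b"".join(self.segments[fp] for fp in self.files[name])


class Source:
    """Hypothetical local side: sends only segments it has never seen."""

    def __init__(self, replica: Replica):
        self.known = set()   # fingerprints already stored (and replicated)
        self.replica = replica
        self.bytes_sent = 0  # segment payload bytes that crossed the wire

    def write(self, name: str, data: bytes):
        recipe = []
        for i in range(0, len(data), SEGMENT_SIZE):
            seg = data[i:i + SEGMENT_SIZE]
            fp = fingerprint(seg)
            recipe.append(fp)
            if fp not in self.known:  # new and unique: must be transferred
                self.known.add(fp)
                self.bytes_sent += len(seg)
                self.replica.apply(("segment", fp, seg))
        self.replica.apply(("file", name, recipe))

    def move(self, old: str, new: str):
        self.replica.apply(("move", old, new))  # housekeeping, no data

    def delete(self, name: str):
        self.replica.apply(("delete", name))    # housekeeping, no data
```

In this sketch, writing a second file that shares a segment with the first sends only the segment that differs, and a move or delete adds nothing to `bytes_sent`.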
If a typical deduplication effect preemptively eliminates 98% of the data to be transferred, then collection replication ensures that the remaining 2% is moved as quickly as possible. And when compared to other Data Domain replication techniques, you can think of it this way: if multi-site replication functions like a network of highways, then collection replication is like a drag strip. If all you want to do is move deduplicated data from point A to point B, there isn't a faster way to do it. Period.
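The 98% figure can be turned into back-of-the-envelope arithmetic. The helper names below are hypothetical, and the calculation ignores protocol overhead, metadata, and compression, so treat it as a rough sketch rather than a sizing tool.

```python
def bytes_on_wire(logical_bytes: float, fraction_eliminated: float) -> float:
    """Data that still has to be transferred after deduplication."""
    if not 0.0 <= fraction_eliminated < 1.0:
        raise ValueError("fraction_eliminated must be in [0, 1)")
    return logical_bytes * (1.0 - fraction_eliminated)


def effective_throughput(link_rate: float, fraction_eliminated: float) -> float:
    """Logical data protected per unit time over a link of the given rate."""
    if not 0.0 <= fraction_eliminated < 1.0:
        raise ValueError("fraction_eliminated must be in [0, 1)")
    return link_rate / (1.0 - fraction_eliminated)


# Example: 10,000 GB of backups with 98% eliminated leaves roughly 200 GB
# to send, and each unit of link bandwidth protects roughly 50 units of
# logical data.
remaining_gb = bytes_on_wire(10_000, 0.98)
multiplier = effective_throughput(1.0, 0.98)
```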
There are other powerful use-cases as well. Collection replication is a fantastic option for nearline or archival storage when you want to replicate millions of small files without incurring overhead for the metadata associated with each individual file. There isn't another deduplication-aware replication technology available that has this capability.
Of course, sometimes a simple environment grows more complicated over the course of time. In that circumstance, Data Domain replication that started out as collection based can be converted to any of our other replication topologies quickly and easily, and in most cases without having to resend data that's already been stored at the replication destination.
Collection replication is an enterprise-class capability, and has a major advantage in the marketplace when it comes to serving the high-volume replication needs of the largest data centers. By comparison, neither IBM/Diligent nor Sepaton even has deduplication-aware replication, although they've both been promising it for some time. Other solutions, such as the joint EMC/Quantum product, the DL 3000 (formerly known as the DL3D 3000, a.k.a. the DXi7500), must first wait for their slower, post-process deduplication to complete, then for data replication to occur, and then for several other byzantine processes, including a 'namespace sync' and possibly some shell script, to execute before the data is available. Even then, the data on the remote side is painfully slow to read.
So why is this collection replication unique to Data Domain? The answer has to do with the architecture. First, a system has to perform true inline deduplication to achieve the effective replication throughput of Data Domain's collection replication. All net-new data stored on disk must be immediately known with certainty to be unique. That factor alone disqualifies most of the competition. Second and equally important, a system must have a log-structured file system in order to achieve the efficiency of Data Domain's collection replication. The log-structure of the Data Domain file system effectively queues up data for replication and ensures write-order integrity by design, without the need for complex layers of additional replication code. Those two factors combined ensure that collection replication will remain unique to Data Domain for the foreseeable future.
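The log-structure argument can be illustrated with a small Python sketch. This is an assumption-laden toy, not the Data Domain file system: `AppendOnlyLog` and `LogReplicator` are hypothetical names, and a real system would handle durability, cleaning, and failures. The point it shows is that when all writes are appends, the log itself is the replication queue, and shipping entries from a cursor preserves write order without extra ordering logic.

```python
class AppendOnlyLog:
    """Toy log-structured store: records are only ever appended."""

    def __init__(self):
        self._entries = []

    def append(self, record) -> int:
        """Append a record and return its position in the log."""
        self._entries.append(record)
        return len(self._entries) - 1

    def read_from(self, offset: int, limit: int):
        """Return up to `limit` records starting at `offset`, in write order."""
        return self._entries[offset:offset + limit]

    def __len__(self):
        return len(self._entries)


class LogReplicator:
    """Ships log entries to a replica strictly in the order they were written."""

    def __init__(self, source: AppendOnlyLog):
        self.source = source
        self.cursor = 0    # position of the next entry to ship
        self.replica = []  # stands in for the remote copy of the log

    def ship(self, limit: int) -> int:
        """Send the next batch; return how many entries were shipped."""
        batch = self.source.read_from(self.cursor, limit)
        self.replica.extend(batch)
        self.cursor += len(batch)
        return len(batch)

    def backlog(self) -> int:
        """Entries written locally but not yet replicated."""
        return len(self.source) - self.cursor
```

Because the replica is always a prefix of the source log, it never reflects a later write without the earlier ones it depends on, which is the write-order property described above.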

Your information on both IBM/Diligent and Quantum is incorrect.
First, the minor one. While IBM/Diligent does not have replication built into their VTL, their backups can be replicated to another VTL using storage-based replication. Since they're an inline vendor, the only thing that hits disk (and therefore gets replicated) is the deduplicated data. So while they don't have "dedupe aware" replication, they do have replication that takes advantage of dedupe.
EMC/Quantum has never had to wait for their dedupe process to complete before they replicate. They begin replicating as soon as they begin deduplicating. That's right in the manual. As to the "namespace sync," there was a process they USED to have to do once data had been fully replicated, but that process hasn't been necessary since their 1.1 code shipped, which was mid-Q1. (They do have a restore speed issue when reading from the replicated copy, which you did allude to.)
You are correct on SEPATON, but not for long, so be sure to play that tune as loud as you can for as long as you can.
And, I don't know how many times I have to say it, but post process is NOT slower than inline. What matters is the hardware you throw at the problem and the development team you put behind it. There are winners and losers on both sides of the fence. I watched a demo of a post-process system this week that would leave your fastest box in the dust. The question in the end is not whether it's inline or not; the questions are: how big is it, how fast is it, and how much does it cost?
Posted by: W. Curtis Preston | 05/08/2009 at 02:06 PM
A follow-up on my Diligent comment. The downside to the storage-based replication approach is that it's an all-or-nothing thing, similar to your container-based option. So they are working on VTL-based replication that will allow more granular control. That said, they do have a way to replicate systems today that takes advantage of dedupe.
Posted by: W. Curtis Preston | 05/08/2009 at 02:09 PM
Curtis, thanks for the comments.
I agree with the minor classification on the Diligent issue. There's no argument, however, that array-based replication is the entire 'set' of writes to disk, whereas Data Domain's collection replication is the 'subset' of writes to disk that are truly new and unique. More interesting are the operational implications. True, with Diligent the 3rd party array replication gets the data to the remote site, but what must be done in order to use that data? Unless something's changed, you have to reboot the remote system first, and you probably want to create a writable clone of your LUNs so you don't start from square one every time you touch the remote data. In the real world this is such a rare configuration that the question you present is more of an academic one. Yes, this configuration exists. But with a polling sample error rate of +/- 3 percent it may in fact not exist.
I noticed you didn't say that EMC/Quantum post-process is not slower than Data Domain's inline. (And I would be willing to entertain a mention of who you believe to have faster or equal post-process speed.) That would have directly addressed my message. I'm aware that we're talking about a rolling serial process when it comes to their architecture. However, I'm also aware that, from first byte backed up to last byte replicated for real-world data sets, EMC/Quantum replication doesn't come close to Data Domain's technology. This is something that can be measured externally and empirically. I'm also aware that the frequency of the namespace sync has been modified, but issues still persist with NAS data, which for EMC/Quantum is a corner case. They have NAS, but it's too slow to be useful. I'm glad we agree on the performance of the replicated copy. That's where two important things happen. First, if you're using the replica for DR, then you have to account for this performance deficit when expressing the RTO. There is an EMC blogger you might be familiar with who directly stated that this performance hit is at least a factor of 4x. Second, it makes the replica useless for remote tape creation. It may not seem like much of a difference where you make the tape, but many of our customers are eliminating tape operations in numerous locations, and centrally spinning tape from one site for all. You can't create this fan-in architecture with slow reads from the replica.
What you did say was that there are some presumably yet to be released products that leave us in the dust. To that I'm of two opinions. First is that it's nice to be the target. The great thing about the future is that everyone has the same runway between now and then. My second thought is that I've seen plenty of controlled demos and so-called validation reports that were not indicative of a product's performance in the real world with real customer data. When I see a shipping product in the real world that does post-process faster than inline, then I'll simply shrug my shoulders and wonder why they don't do inline with their super-fast deduplication technology.
And of course: SEPATON DOESN'T HAVE DEDUPE-AWARE REPLICATION! (loud enough?)
Curtis, I respect your attempt to bring balance to the force. But to put it quite simply (bad pun), from first byte backed up to the last byte safely replicated offsite, there isn't another horse in the race.
Posted by: Rich Colbert | 05/08/2009 at 08:37 PM