Rich Colbert

  • Rich Colbert is Director of Systems Engineering for Strategic Accounts for EMC Backup Recovery Systems. Rich interacts daily with the largest enterprise customers in an effort to ensure that the backup and recovery solutions provided by EMC meet the needs of the most demanding storage environments.


05/07/2009


Comments


Your information on both IBM/Diligent and Quantum is incorrect.

First, the minor one. While IBM/Diligent does not have replication built into their VTL, their backups can be replicated to another VTL using storage-based replication. Since they're an inline vendor, the only thing that hits disk (and therefore gets replicated) is the deduplicated data. So while they don't have "dedupe aware" replication, they do have replication that takes advantage of dedupe.

EMC/Quantum never has had to wait for their dedupe process to complete before they replicate. They begin replicating as soon as they begin deduplicating. That's right in the manual. As to the "namespace sync," there was a process they USED to have to do once data had been fully replicated, but that process hasn't been necessary since their 1.1 code shipped, which was mid-Q1. (They do have a restore speed issue when reading from the replicated copy, which you did allude to.)

You are correct on SEPATON, but not for long, so be sure to play that tune as loud as you can for as long as you can.

And, I don't know how many times I have to say it, but post-process is NOT slower than inline. What matters is the hardware you throw at the problem and the development team you put behind it. There are winners and losers on both sides of the fence. I watched a demo of a post-process system this week that would leave your fastest box in the dust. The question in the end is not whether it's inline or not; the questions are: how big is it, how fast is it, and how much does it cost?

A follow-up on my Diligent comment. The downside to the storage-based replication approach is that it's an all-or-nothing thing, similar to your container-based option. So they are working on VTL-based replication that will allow more granular control. That said, they do have a way to replicate systems today that takes advantage of dedupe.

Curtis, thanks for the comments.

I agree with the minor classification on the Diligent issue. There's no argument, however, that array-based replication is the entire 'set' of writes to disk, whereas Data Domain's collection replication is the 'subset' of writes to disk that are truly new and unique. More interesting are the operational implications. True, with Diligent the third-party array replication gets the data to the remote site, but what must be done in order to use that data? Unless something's changed, you have to reboot the remote system first, and you probably want to create a writable clone of your LUNs so you don't start from square one every time you touch the remote data. In the real world this is such a rare configuration that the question you present is more academic than practical. Yes, this configuration exists. But with a polling sample error rate of +/- 3 percent it may in fact not exist.
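To make the 'set' versus 'subset' distinction concrete, here is a minimal sketch of the general idea behind dedupe-aware replication: only chunks the remote side does not already hold cross the wire. It is an illustration under stated assumptions, not Data Domain's, Diligent's, or Quantum's actual implementation. The fixed 4-byte chunk size, the `replicate` function, and the in-memory `remote` dictionary are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny fixed-size chunks purely for illustration


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split a backup stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


def replicate(backup: bytes, remote_store: dict) -> int:
    """Send only chunks the remote store lacks; return the bytes sent."""
    sent = 0
    for chunk in chunks(backup):
        fp = fingerprint(chunk)
        if fp not in remote_store:  # new and unique: must be shipped
            remote_store[fp] = chunk
            sent += len(chunk)
    return sent


remote = {}                    # hypothetical remote chunk store
monday = b"AAAABBBBCCCCDDDD"   # first backup: four unique chunks
tuesday = b"AAAABBBBCCCCEEEE"  # second backup: only the last chunk changed

sent_monday = replicate(monday, remote)
sent_tuesday = replicate(tuesday, remote)
print(sent_monday, sent_tuesday)
```

In this toy run the second backup ships only its one changed chunk. Real systems differ in chunking strategy, fingerprint handling, and how the remote namespace is made usable, which is where the operational questions above come in.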

I noticed you didn't say that EMC/Quantum post-process is not slower than Data Domain's inline. (And I would be willing to entertain a mention of who you believe to have faster or equal post-process speed.) That would have directly addressed my message. I'm aware that we're talking about a rolling serial process when it comes to their architecture. However, I'm also aware that, from first byte backed up to last byte replicated for real-world data sets, EMC/Quantum replication doesn't come close to Data Domain's technology. This is something that can be measured externally and empirically. I'm also aware that the frequency of the namespace sync has been modified, but issues still persist with NAS data, which for EMC/Quantum is a corner case. They have NAS, but it's too slow to be useful.

I'm glad we agree on the performance of the replicated copy. That's where two important things happen. First, if you're using the replica for DR, then you have to account for this performance deficit when expressing the RTO. There is an EMC blogger you might be familiar with who directly stated that this performance hit is at least a factor of 4x. Second, it makes the replica useless for remote tape creation. It may not seem like much of a difference where you make the tape, but many of our customers are eliminating tape operations in numerous locations and centrally spinning tape from one site for all. You can't create this fan-in architecture with slow reads from the replica.

What you did say was that there are some presumably yet to be released products that leave us in the dust. To that I'm of two opinions. First is that it's nice to be the target. The great thing about the future is that everyone has the same runway between now and then. My second thought is that I've seen plenty of controlled demos and so-called validation reports that were not indicative of a product's performance in the real world with real customer data. When I see a shipping product in the real world that does post-process faster than inline, then I'll simply shrug my shoulders and wonder why they don't do inline with their super-fast deduplication technology.

And of course: SEPATON DOESN'T HAVE DEDUPE-AWARE REPLICATION! (loud enough?)

Curtis, I respect your attempt to bring balance to the force. But to put it quite simply (bad pun), from first byte backed up to the last byte safely replicated offsite, there isn't another horse in the race.


