• Daniel Budiansky


  • Daniel Budiansky is Enterprise Applications Technologist, EMC Backup Recovery Systems. Daniel has more than 12 years of experience in the IT industry and has implemented Data Domain deduplication storage systems into a broad scope of customer environments.


04/23/2009

If At First You Don't Succeed...

Comments


I completely agree that deduplicated replication is a very important feature, and I never cease to mention that when I talk to the folks at SEPATON. You are also correct that they don't have it TODAY, but I don't think you will be able to say that for very long. But feel free to say it as loud as you want for as long as you can.

The feature that THEY have that you are missing is global deduplication, which is also a very important feature for the enterprise, and I never cease to mention that when I talk to the folks at Data Domain. The lack of global deduplication will inevitably require a large enterprise to buy more dedupe systems and require them to do things that a customer using global dedupe would not have to do.

Let's look at your 5 PB customer. If we assume their 4 locations are evenly distributed, that's 1.25 PB per location. Suppose they do a weekly full backup, daily incrementals and store backups for 90 days (my default configuration unless customers need more or less). That's 13 full backups (1.25 * 13 = 16.25 PB), and 90 incrementals (1.25 * .10 * 90 = 11.25 PB), for a total of 27.5 PB of backups PER LOCATION. If we dedupe that at 20:1, you'll need 1.375 PB of disk to hold it. Since each DD690 can hold only 35 TB of usable capacity (48 TB raw), your customer will need 40 DD690s PER LOCATION to hold this many backups.
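The sizing arithmetic above can be checked with a short script. This is only a sketch: the function and parameter names are made up for illustration, 1 PB is treated as 1000 TB, and the 20:1 ratio and 35 TB usable capacity are the figures quoted in this comment rather than vendor specifications.

```python
import math

def appliances_needed(primary_pb, fulls_retained, incrementals_retained,
                      incremental_fraction, dedupe_ratio, usable_tb_per_appliance):
    """Return (logical_pb, physical_pb, appliance_count) for one location.

    All names here are hypothetical; 1 PB is assumed to be 1000 TB.
    """
    full_pb = primary_pb * fulls_retained
    incr_pb = primary_pb * incremental_fraction * incrementals_retained
    logical_pb = full_pb + incr_pb            # backup data before dedupe
    physical_pb = logical_pb / dedupe_ratio   # disk needed after dedupe
    count = math.ceil(physical_pb * 1000 / usable_tb_per_appliance)
    return logical_pb, physical_pb, count

# Figures from the comment: 1.25 PB per location, 13 fulls, 90 incrementals
# at 10%, 20:1 dedupe, 35 TB usable per appliance.
logical_pb, physical_pb, count = appliances_needed(1.25, 13, 90, 0.10, 20, 35)
print(logical_pb, physical_pb, count)
```

With those inputs the script gives 27.5 PB of backups, 1.375 PB of disk and 40 appliances per location. Dropping the dedupe ratio to 10:1 doubles the appliance count to 79, which is why the 20:1 assumption matters so much.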

Assuming a rolling weekly full backup and a nightly (10%) incremental of everything, they'll be backing up 178.6 TB of fulls (1250/7) and 125 TB of incrementals (1250/10) each night. That's 303.6 TB, or just over 7000 MB/s, assuming a 12-hour backup window. The good news is that this is only about 175 MB/s per system when you divide it by 40, so you won't be throughput bound.
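The throughput estimate can be reproduced the same way. Again, this is a sketch with hypothetical names, using decimal units (1 TB = 1,000,000 MB) and the 12-hour window and 40 systems assumed in this comment.

```python
def nightly_throughput(primary_tb, full_cycle_days, incremental_fraction,
                       window_hours, appliances):
    """Return (total_tb, total_mb_per_s, mb_per_s_per_appliance) for one night.

    Names are hypothetical; decimal units are assumed (1 TB = 1,000,000 MB).
    """
    fulls_tb = primary_tb / full_cycle_days        # rolling weekly fulls
    incr_tb = primary_tb * incremental_fraction    # nightly incrementals
    total_tb = fulls_tb + incr_tb
    total_mb_s = total_tb * 1_000_000 / (window_hours * 3600)
    return total_tb, total_mb_s, total_mb_s / appliances

# Figures from the comment: 1250 TB per location, 7-day full cycle,
# 10% incrementals, 12-hour window, 40 systems.
total_tb, total_mb_s, per_appliance_mb_s = nightly_throughput(1250, 7, 0.10, 12, 40)
print(round(total_tb, 1), round(total_mb_s), round(per_appliance_mb_s))
```

The aggregate rate comes out just above 7000 MB/s and the per-system rate at roughly 175 MB/s, in line with the figures above.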

BUT, your 5 PB customer will need to take each location's 1.25 PB and divide it into 40 equally sized backup sets to fully utilize the 40 systems that they bought. While they MAY (and I really do mean MAY) be able to do this when they first configure the system, things never stay the same, and they will need to constantly move backups between devices as some backups grow. This means that in reality, they will probably need more than 40 systems to deal with all of the back-and-forth. (They will also need more than 40 if they don't get 20:1 dedupe.)
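To make the balancing problem concrete, here is a toy sketch. Everything in it is hypothetical: the backup set names, their post-dedupe sizes, and the greedy "largest set onto the least-loaded system" placement are illustrations, not how any particular product or customer assigns backups.

```python
import heapq

def assign_backup_sets(set_sizes_tb, appliance_count):
    """Greedy placement: largest set first, onto the least-loaded appliance."""
    heap = [(0.0, i) for i in range(appliance_count)]
    heapq.heapify(heap)
    placement = {}
    for name, size in sorted(set_sizes_tb.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        placement[name] = idx
        heapq.heappush(heap, (load + size, idx))
    return placement

def overloaded(set_sizes_tb, placement, capacity_tb):
    """Return the appliances whose assigned sets exceed their capacity."""
    loads = {}
    for name, idx in placement.items():
        loads[idx] = loads.get(idx, 0) + set_sizes_tb[name]
    return sorted(idx for idx, load in loads.items() if load > capacity_tb)

# Hypothetical post-dedupe sizes in TB, two appliances of 35 TB each.
sizes = {"db": 30, "mail": 20, "files": 15, "vm": 5}
placement = assign_backup_sets(sizes, 2)
before = overloaded(sizes, placement, 35)

# One backup set grows; the original placement no longer fits.
grown = dict(sizes, mail=25)
after = overloaded(grown, placement, 35)
print(placement, before, after)
```

In this toy case the initial split fills both systems exactly, and a 5 TB growth in a single backup set pushes one system over its capacity, so something has to move. With 40 systems and many more backup sets, that rebalancing becomes an ongoing chore.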

If you had global dedupe, the customer would still need to buy 40 systems, but they wouldn't have to care about which system backed up which backup. They could just send all backups to any of the 40 systems and you would do your magic. That's why global dedupe is important for the enterprise.

Now that the Data Domain folks are thoroughly seething, let me throw you a bone or two.

First, while other vendors (like SEPATON) may have global dedupe, none of them have a supported config that can back up AND DEDUPE at 7000 MB/s. Ignoring higher ingest rates and looking only at dedupe rates, as I would when comparing to an inline model, SEPATON would still need five "systems," each consisting of five nodes. If I go with FalconStor's four-node system (the biggest I believe they've actually deployed), they would also need five "systems," each consisting of four nodes. So your 5 PB customer will still need to sub-divide their backups into equally sized chunks; it's just that they will need to split them into 5 chunks, not 40.

Also, while your limitation increases the operational difficulty and management cost for enterprise customers, SEPATON's lack of replication stops them cold if they plan to replicate. That is, until they ship deduplicated replication. Then I'm sure you will (rightfully so) change your story to "Well, they HAVE it, but we have thousands of customers using ours. How many customers do they have?"

May you live in interesting times.
