Published by Phillip Hines. Modified over 9 years ago.
1
EMC Data Domain: Data Protection and Deduplication
© Copyright 2010 EMC Corporation. All rights reserved.
2
Why backup?
Goals
– Backups are done for restores
Operational and disaster recovery
– Disaster recovery requires offsite backup
– Operational recovery requires onsite backup
– Need both onsite and offsite copies on disk
– Need quick restores: there is no time for moving physical assets
– Protection of personal data & intellectual property
3
Why So Much Interest in Data Deduplication?
Backup & Archive processes have been overwhelmed by information growth
Primary storage efficiency has become a necessity to cope with massive growth
ROI drives the compelling appeal of Dedupe
– Reduced Storage Capacities
– Lower Infrastructure Costs
– Improved SLAs
– Efficient Replication for Business Continuance/DR
Chart (survey results)
– Legend: Very important; In use; Evaluating / in near- to long-term plan; Not in plan
– Deduplication as one of the top 10 technology considerations: 59%, 24%
– Deploying deduplication: 55%, 21%
Source: TheInfoPro Wave 11 Storage Study, 2008
4
Why Do Enterprises Still Use Tape?
– Low upfront cost
– Tape can store the massive amount of redundant data created by backups
– Transportable for offsite DR
Diagram labels: TAPE; DISK; Backup Storage: 5x-10x Primary; Primary Storage
5
EMC Data Domain: Leadership and Innovation
Deduplication storage systems
– More than 12,000 systems installed
– More than 4,300 customers
– More than 2,600 PB under Data Domain protection worldwide
A history of industry firsts (timeline, 2003 to 2010)
– First Deduplication NAS
– First Deduplication Volume Replication
– Largest Deduplication Array
– First Deduplication Directory Replication
– First Deduplication Virtual Tape Library
– First Deduplication Nearline Storage
– Fastest Backup Controller
– Cascaded Replication
– First Deduplication Encryption
– First Distributed Processing
6
Data Domain – works with what you have
Diagram labels: Database; Archive; Backup; VMware
7
De-duplication principles
Unique segments (4KB-12KB); segment size varies “on-the-fly”
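The slide gives only the segment size range and says that the size varies on the fly. As a rough illustration of how variable-size segments can be cut and deduplicated, here is a minimal sketch. It assumes a generic content-defined chunking scheme (a simple shift-and-add hash with a boundary mask) and SHA-1 fingerprints; these choices, and every name in the code, are hypothetical and are not taken from the presentation or from the Data Domain product.

```python
import hashlib
import random

MIN_SEG = 4 * 1024     # lower bound quoted on the slide
MAX_SEG = 12 * 1024    # upper bound quoted on the slide
MASK = (1 << 12) - 1   # assumption: boundary test fires about once per 4 KiB


def split_segments(data):
    """Yield segments whose boundaries are chosen from the content itself."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add hash; the masked low 12 bits depend only on the
        # most recent 12 bytes, so boundaries realign after an insertion.
        h = ((h << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_SEG:
            continue
        if (h & MASK) == MASK or length >= MAX_SEG:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # final, possibly short, segment


def dedupe(streams):
    """Store each unique segment once; return (logical, physical) bytes."""
    store = {}  # fingerprint -> segment bytes
    logical = 0
    for stream in streams:
        for seg in split_segments(stream):
            logical += len(seg)
            store.setdefault(hashlib.sha1(seg).digest(), seg)
    physical = sum(len(seg) for seg in store.values())
    return logical, physical


def demo():
    """Back up 64 KiB of random data, then a copy with a small mid-file edit."""
    rng = random.Random(0)
    first = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
    second = first[:32 * 1024] + b"small edit" + first[32 * 1024:]
    return dedupe([first, second])
```

Calling `demo()` returns the logical and physical byte counts for the two backups. Because boundaries follow the content rather than fixed offsets, segments before the edit are reused as-is, which fixed-size blocks would only manage for data ahead of the first shifted byte.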
9
Data Deduplication: Technology Overview
Store more backups in a smaller footprint

Backup streams (each letter is one segment)
Friday Full Backup: A B C D A E F G
Mon Incremental: A B H
Tues Incremental: C B I
Weds Incremental: E G J
Thurs Incremental: A C K
Second Friday Full Backup: B C D E F L G H
Unique segments: A B C D E F G H I J K L

Backup                 Logical   Estimated reduction   Physical
Friday full            1 TB      2–4x                  250 GB
Monday incremental     100 GB    7–10x                 10 GB
Tuesday incremental    100 GB    7–10x                 10 GB
Wednesday incremental  100 GB    7–10x                 10 GB
Thursday incremental   100 GB    7–10x                 10 GB
Second Friday full     1 TB      50–60x                18 GB
TOTAL                  2.4 TB    7.8x                  308 GB
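The totals in the table follow from simple sums, which the short sketch below reproduces. It uses only the figures on the slide and assumes decimal units (1 TB = 1000 GB), which is what makes the totals come out to 2.4 TB and 308 GB; the variable and function names are illustrative.

```python
# (backup, logical GB, physical GB) as listed on the slide; 1 TB taken as 1000 GB
BACKUPS = [
    ("Friday full", 1000, 250),
    ("Monday incremental", 100, 10),
    ("Tuesday incremental", 100, 10),
    ("Wednesday incremental", 100, 10),
    ("Thursday incremental", 100, 10),
    ("Second Friday full", 1000, 18),
]


def totals(rows):
    """Return (logical GB, physical GB, overall reduction ratio)."""
    logical = sum(row[1] for row in rows)
    physical = sum(row[2] for row in rows)
    return logical, physical, logical / physical


def per_backup_ratios(rows):
    """Reduction ratio of each backup taken on its own."""
    return {name: logical / physical for name, logical, physical in rows}
```

`totals(BACKUPS)` gives 2400 GB logical and 308 GB physical, a ratio of about 7.8x. The per-backup ratios (4x, 10x, and roughly 56x) sit at or inside the estimated ranges in the table, and show why the second full backup contributes so little new data.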
10
Deduplication Dramatically Reduces Storage Capacity Requirements
Deduplication: 10–30 times less data stored versus fulls + incrementals with typical retention policies
Chart: data stored against weeks in use (1 to 20), comparing deduplication storage with traditional storage
11
Data Domain SISL™ Scalable Architecture: CPU-Centric
Chart: throughput (GB/sec.) against addressable capacity in TB, post-RAID (physical)
– DD200 (2004)
– DD880, 7/09: Industry’s Fastest Backup Storage Controller
– Distributed Processing for Single-controller Systems
– Multi-Controller Systems with Global Deduplication
– 2011 (est.)
Chart values as extracted (positions not recoverable): 1.25, 1.5, 0.04, 70, >PB, 5, 3
Data Domain Scale: 6-Year Improvement
– Throughput: ~90x
– Capacity: ~225x
12
Inline vs Post-Process Deduplication: Provisioning & Admin
Inline: Deduplication Before Storing
− Other activities unimpeded
− Predictable
− Simpler
Post Process: Deduplication After Storing
Process contention increases with the number of processes
− Copy to tape: Too slow to stream tape
− Recovery: SLA predictability
− Replication: Poor time-to-DR
− Deduplication itself if interleaved with backup or restore
More admin needed to fight these issues
At least 3x disk accesses to shared store
Diagram labels: Store; Dedupe; Restore; Replicate; Restore; Replicate? Undedupe?
13
Data Integrity: Data Invulnerability Architecture
Trust but verify: “hope” is not a strategy
Data verification
– Checksum
– Deduplication, write to disk
– Verify
Self-healing file system
– Cleaning expired data
– Defrag
– Verify
Other
– RAID 6
– NVRAM
– Snapshots
Diagram: File System, Global Compression, Local Compression, RAID; Generate Checksum, Verify Data
– Verify the file system metadata integrity
– Verify user data integrity
– Verify stripe integrity
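The checksum, write, verify sequence on this slide can be illustrated with a small sketch. It is a generic read-back check on an ordinary file, assuming SHA-256 as the checksum; it is not the product's implementation, and all function names are hypothetical. Note that a read issued right after a write may be served from the operating system's cache, so a real system would need to verify against the media itself.

```python
import hashlib
import os
import tempfile


def write_verified(path, payload):
    """Checksum the payload, write it, read it back, and compare."""
    expected = hashlib.sha256(payload).hexdigest()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to storage
    if not verify(path, expected):
        raise IOError("read-back checksum does not match what was written")
    return expected


def verify(path, expected):
    """Re-read the file and compare its checksum with the stored one."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected


def demo():
    """Return (check before corruption, check after corruption)."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "segment.bin")
        digest = write_verified(path, b"backup data")
        before = verify(path, digest)
        with open(path, "r+b") as f:
            f.write(b"X")  # simulate silent corruption of the first byte
        after = verify(path, digest)
    return before, after
```

`demo()` shows the point of keeping the checksum: the later re-check passes on intact data and fails once a single byte has changed, which is what a periodic verify pass would rely on.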
14
Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements
– 95–99% cross-site bandwidth reduction
– Supports hundreds of remote sites
Flexible replication
– One-to-many
– Many-to-one
– Bi-directional
– System-to-system
– Cascaded
Diagram: Source: remote sites (Data Domain systems; archive data, backup data; Home DB, Home DIR A) → WAN (1–5%) → Destination: data center hub (Data Domain DDX Array with DD880s)
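One way to see where a bandwidth reduction of this kind comes from is to send a segment across the WAN only when the destination does not already hold it. The sketch below assumes fixed 4 KiB segments and SHA-1 fingerprints for brevity, and it ignores the cost of exchanging fingerprints; the names are hypothetical and the numbers are made up for illustration, not measurements.

```python
import hashlib


def replicate(segments, dest_store):
    """Send only segments the destination lacks; return (bytes sent, logical bytes)."""
    sent = 0
    logical = 0
    for seg in segments:
        logical += len(seg)
        fingerprint = hashlib.sha1(seg).digest()
        if fingerprint not in dest_store:
            dest_store[fingerprint] = seg  # segment crosses the WAN once
            sent += len(seg)
    return sent, logical


def demo():
    """Replicate two nightly backups that differ in 2 of 100 segments."""
    dest_store = {}
    night1 = [bytes([i]) * 4096 for i in range(100)]
    night2 = night1[:98] + [b"x" * 4096, b"y" * 4096]
    first = replicate(night1, dest_store)
    second = replicate(night2, dest_store)
    reduction = 1 - second[0] / second[1]
    return first, second, reduction
```

In this toy run the first night sends everything (409600 bytes), while the second night sends only the two changed segments (8192 of 409600 bytes), a 98% reduction for that transfer.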
15
Industry’s Most Scalable Inline Deduplication Systems
Software options: DD Boost, DD Virtual Tape Library, DD Replicator, Retention Lock, and DD Encryption
Product families: DD140 Remote Office Appliance; DD600 Appliance Series; DD880 Global Deduplication Array (New); DDX Array Series (up to 16 controllers)

Model                       Speed (Other)  Speed (DD Boost)  Logical capacity  Raw capacity    Usable capacity
DD140                       450 GB/hr      490 GB/hr         17–43 TB          1.5 TB          0.86 TB
DD610                       675 GB/hr      1.3 TB/hr         75–195 TB         Up to 6 TB      Up to 3.98 TB
DD630                       1.1 TB/hr      2.1 TB/hr         165–420 TB        Up to 12 TB     Up to 8.4 TB
DD660                       2.0 TB/hr      2.7 TB/hr         0.520–1.31 PB     Up to 36 TB     Up to 26.1 TB
DD690                       2.7 TB/hr      3.9 TB/hr         0.710–1.7 PB      Up to 48 TB     Up to 35.3 TB
DD880                       5.4 TB/hr      8.8 TB/hr         2.8–7.1 PB        Up to 192 TB    Up to 142.5 TB
Global Deduplication Array  (not listed)   12.8 TB/hr        5.7–14.2 PB       Up to 384 TB    Up to 285 TB
DDX Array                   86.4 TB/hr     140 TB/hr         45.6–114 PB       Up to 3.07 PB   Up to 2.28 PB
16
Why Data Domain?
Less disk to resource, less to manage
– CPU-centric deduplication
– Inline
– Green
Simple, mature, and flexible
– Simple, mature appliance
– Nearline tier: any fabric, any software, backup or nearline applications
Resilience and disaster recovery
– Storage of last resort
– Cross-site global compression: data center or remote office