Enterprise Storage: Our Journey Thus Far. John D. Halamka, MD, CIO, Harvard Medical School and Beth Israel Deaconess Medical Center
Agenda: Exponential Growth – Issues & Resolution; Disk Performance – Issues & Resolution; File System Silos – Issues & Resolution
Exponential Growth “The Problem”
Exponential Growth: 69,060,583 Files! 81.5 Terabytes
Exponential Growth “The Resolution”
Exponential Growth (Resolved) [Chart: Cluster Size (Number of Nodes) versus Capacity (252 TB, 360 TB, 1.8 PB, 3.45 PB) and Rack Units] Gordon Hall Data Center; Markley Data Center. Nodes 77; Capacity 252 TB; Globally Coherent Cache 28 GB
Performance Bottlenecks “Disk Performance Issues”
Performance Bottlenecks
Research computational requirements change constantly:
– Orchestra Cluster contains 179 cluster nodes (810 CPU cores) today.
– Future growth: 50-75 nodes ( processor cores) each year.
– Stimulus grants could add a further 392 nodes (3,136 processors).
Storage Array Cache (The Problem): Cache on the storage arrays is not globally coherent, so a single array fills up its cache while disk reads and writes are delayed. Data must be moved manually to increase spindle performance.
Disk Spindle Contention (The Problem): Data is striped across the disk spindles to meet the current performance SLAs. As cluster jobs demand more reads and writes from the file system, the disk spindles that once delivered acceptable performance can no longer keep up.
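The growth figures on this slide can be turned into a rough node-count projection. The sketch below is illustrative only: it uses the node counts stated above, assumes growth is linear at 50 to 75 nodes per year, and leaves out yearly core counts because that figure is missing from the slide. The function name `projected_nodes` is hypothetical.

```python
# Figures taken from the slide.
CURRENT_NODES = 179
YEARLY_NODE_GROWTH = (50, 75)   # low and high estimate per year
STIMULUS_NODES = 392
STIMULUS_PROCESSORS = 3136


def projected_nodes(years, include_stimulus=False):
    """Return (low, high) node counts after `years`, assuming linear growth."""
    low, high = YEARLY_NODE_GROWTH
    extra = STIMULUS_NODES if include_stimulus else 0
    return (CURRENT_NODES + low * years + extra,
            CURRENT_NODES + high * years + extra)


# Processors per node implied by the stimulus-grant figures.
stimulus_processors_per_node = STIMULUS_PROCESSORS // STIMULUS_NODES

if __name__ == "__main__":
    print(projected_nodes(3))
    print(projected_nodes(3, include_stimulus=True))
    print(stimulus_processors_per_node)
```

Every added compute node issues its own reads and writes, so a fixed set of disk spindles sees load rise with the node count.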
Performance Bottlenecks “The Resolution”
AutoBalance: Automated data balancing across nodes [Diagram labels: EMPTY, FULL, BALANCED]
AutoBalance automatically migrates data to newly added storage nodes while the system is online and in production. It requires no manual intervention, no reconfiguration, and no server, client mount point, or application changes.
HMS performance requirements increase: Orchestra Cluster adds nodes and CPU cores.
HMS performance solution: add storage cluster nodes.
– Storage capacity is increased
– Storage processing capacity is increased
– Globally coherent cache is increased
Performance Bottlenecks Resolved
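The idea of balancing data onto a newly added node can be sketched as below. This is not the storage vendor's actual algorithm, which the slides do not describe. It assumes the goal is an equal fill ratio on every node, and the node sizes and the function name `rebalance` are hypothetical.

```python
def rebalance(used_tb, capacity_tb):
    """Compute per-node targets so that every node ends at the same fill ratio.

    Returns (targets, moves). A negative move means data leaves that node,
    a positive move means data arrives on it.
    """
    fill_ratio = sum(used_tb) / sum(capacity_tb)
    targets = [cap * fill_ratio for cap in capacity_tb]
    moves = [target - used for target, used in zip(targets, used_tb)]
    return targets, moves


if __name__ == "__main__":
    # Hypothetical cluster: three nearly full 36 TB nodes plus one empty new node.
    used = [30.0, 30.0, 30.0, 0.0]
    capacity = [36.0, 36.0, 36.0, 36.0]
    targets, moves = rebalance(used, capacity)
    print(targets)
    print(moves)
```

In this example the three full nodes each give up data and the new node absorbs it, while the total amount stored stays the same.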
File System Silos “The Problem”
File System Silos Storage is assigned to individual Data Movers, creating “Silos” of storage. Storage is provisioned in 2 TB File Systems. This provides maximum flexibility in the event backups fail. 233 File Systems! CPU and Memory are not globally coherent and shared, creating “Silos” of CPU and Memory.
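A small sketch of why silos hurt: free space scattered across many 2 TB file systems cannot hold a dataset larger than the free space in any single one, even when the total free space is large. The file system count and size come from the slide. The free-space values and function names are hypothetical.

```python
FILE_SYSTEMS = 233
FILE_SYSTEM_SIZE_TB = 2

# Total provisioned capacity implied by the slide's figures.
total_provisioned_tb = FILE_SYSTEMS * FILE_SYSTEM_SIZE_TB


def fits_in_any_silo(dataset_tb, free_per_fs_tb):
    """A siloed dataset must fit inside one file system's free space."""
    return any(free >= dataset_tb for free in free_per_fs_tb)


def fits_in_pool(dataset_tb, free_per_fs_tb):
    """In one pooled file system, only the total free space matters."""
    return sum(free_per_fs_tb) >= dataset_tb


if __name__ == "__main__":
    # Hypothetical state: every file system has 0.5 TB free.
    free = [0.5] * FILE_SYSTEMS
    print(total_provisioned_tb)
    print(fits_in_any_silo(1.0, free), fits_in_pool(1.0, free))
```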
File System Silos “The Resolution”
File System Silos (Resolution) Expandable to more than 3 Petabytes in 1 File System! All Cluster Nodes have balanced connections! Cluster can grow up to 96 nodes.
Traditional Backups “The Problem”
Traditional Backups
The Last 365 Days, Fulls-Cumulatives-Differentials: 2,043,628, Megabytes; 1, Terabytes; 1.9 Petabytes
The Last 365 Days, Tapes: 2073 Tapes; $103, Tape Costs!
Off-Site Tape Storage: $1, per month; $22, per year
Traditional Backups “The Resolution”
Replication Across Data Centers Resolved
Data Protection “No Problem”
Current & Future Data Protection

Current Data Protection (Research and Administrative)
Backup Strategy (Tape) | Retention (Days of Protection) | Description
Monthly Full Backups | 90 Days | 3 versions of the file are potentially recoverable, 1 for each month that the “Monthly Backup” was executed.
Weekly Cumulative | 90 Days | 12 versions of the file are potentially recoverable, 1 for each week that the “Weekly Cumulative” was executed.
Daily Incremental | 14 Days | 14 versions of the file are potentially recoverable, 1 for each day that the “Daily Incremental” was executed.
Checkpoints | 3 Days | 3 versions of the file are potentially recoverable, 1 for each day that the “Checkpoint” was executed.

Future Data Protection (Research Data)
Backup Strategy | Retention (Days of Protection) | Description
Replication | Infinite | 1 version of the file exists (in two locations) and represents the “live” copy of the file.
Checkpoints | 30 Days | 30 versions of the file are recoverable, 1 for each day that the “Checkpoint” was executed.

Future Data Protection (Administrative Data)
Folder Type | Backup Strategy
Home Directories | Monthly Full, Weekly Cumulative, Daily Incremental
Servers | Monthly Full, Weekly Cumulative, Daily Incremental
Database | Monthly Full, Weekly Cumulative, Daily Incremental
Microsoft Exchange | Monthly Full, Weekly Cumulative, Daily Incremental
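The version counts on this slide follow from dividing each retention window by the backup interval. The sketch below reproduces that arithmetic. It assumes a 30-day month for the monthly full backup, and the names `SCHEDULE` and `recoverable_versions` are hypothetical.

```python
# (retention in days, interval between backups in days)
# The 30-day month is an assumption; the other values come from the slide.
SCHEDULE = {
    "Monthly Full": (90, 30),
    "Weekly Cumulative": (90, 7),
    "Daily Incremental": (14, 1),
    "Checkpoints (current)": (3, 1),
    "Checkpoints (future)": (30, 1),
}


def recoverable_versions(retention_days, interval_days):
    """Number of complete backup intervals that fit in the retention window."""
    return retention_days // interval_days


if __name__ == "__main__":
    for name, (retention, interval) in SCHEDULE.items():
        print(name, recoverable_versions(retention, interval))
```

Replication is left out because it keeps only the live copy of each file, so it adds a second location rather than additional versions.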
Storage Project Timeline