1 Advanced Lustre® Infrastructure Monitoring (Resolving the Storage I/O Bottleneck and managing the beast) Torben Kling Petersen, PhD Principal Architect High Performance Computing

2 The Challenge

3 The REAL challenge
File system: Up/down, Slow, Fragmented, Capacity planning, HA (Fail-overs etc)
Hardware: Nodes crashing, Components breaking, FRUs, Disk rebuilds, Cables, ??
Software: Upgrades / patches, ??, Bugs, Clients, Quotas, Workload optimization
Other: Documentation, Scalability, Power consumption, Maintenance windows, Back-ups

4 Tightly integrated solutions
The Answer ?? Tightly integrated solutions:
Hardware, Software, Support
Extensive testing, Clear roadmaps, In-depth training
Even more extensive testing …..

5 ClusterStor Software Stack Overview
ClusterStor 6000 Embedded Application Server: Intel Sandy Bridge CPU, up to 4 DIMM slots; FDR & 40GbE front-end, SAS-2 (6G) back-end; SBB v2 form factor, PCIe Gen-3; embedded RAID & Lustre support
Software stack: ClusterStor Manager, Lustre File System (2.x), Data Protection Layer (RAID 6 / PD-RAID), Linux OS, Unified System Management (GEM-USM)
Embedded server modules, CS 6000 SSU

6 ClusterStor dashboard
Problems found

7 Hardware inventory ….

8 Hardware inventory ….

9 Finding problems ???

10 But things break …. Especially disk drives … What then ???
Most enterprise NL-SAS HDDs have an AFR of %, and some companies use S-ATA with a stated 3% AFR …

11 Let’s do some math …. Large systems use many HDDs to deliver both performance and capacity. NCSA BW uses 17,000+ HDDs for the main scratch FS. At 3% AFR this means 531 HDDs fail annually. That’s ~1.5 drives per day !!!! RAID 6 rebuild time under use is 24 – 36 hours. Bottom line: the scratch system would NEVER be fully operational, and there would constantly be a risk of losing additional drives, leading to data loss !!
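
As a sanity check, the arithmetic on this slide can be reproduced in a few lines of Python. A minimal sketch, assuming a 17,700-drive fleet (the slide only says "17,000+"; that figure is chosen because it reproduces the 531-failure number at 3% AFR):

```python
# Reproduces the failure arithmetic above. The 17,700-drive fleet size is an
# assumption ("17,000+" on the slide) chosen to match the 531-failure figure.

fleet_size = 17_700        # HDDs in the main scratch file system
afr = 0.03                 # 3% annual failure rate

failures_per_year = fleet_size * afr          # ~531
failures_per_day = failures_per_year / 365    # ~1.45

rebuild_hours = 36                            # upper end of "24 - 36 hours"
rebuilds_in_flight = failures_per_day * rebuild_hours / 24

print(f"failures/year: {failures_per_year:.0f}")
print(f"failures/day:  {failures_per_day:.2f}")
print(f"average concurrent rebuilds: {rebuilds_in_flight:.1f}")
```

With more than one rebuild in flight on average, the file system is effectively always rebuilding, which is the point the slide is making.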

12 Drive Technology/Reliability
Xyratex pre-tests all drives used in ClusterStor™ solutions. Each drive is subjected to hours of intense I/O. Reads and writes are performed to all sectors. Ambient temperature cycles between 40 °C and 5 °C. Any drive surviving goes on to additional testing. As a result, Xyratex disk drives deliver proven reliability with less than a 0.3% annual failure rate. Real Life Impact: On a large system such as NCSA BlueWaters with 17,000+ disk drives, this means a predicted failure of 50 drives per year. “Other vendors” publicly state a failure rate of 3%*, which (given an equivalent number of disk drives) means 500+ drive failures per year. With a fairly even distribution, the file system will ALWAYS be in a state of rebuild. In addition, as a file system with wide stripes will perform according to the slowest OST, the entire system will always run in degraded mode ….. *DDN, Keith Miller, LUG 2012

13 Annual Failure Rate of Xyratex Disks
Actual AFR Data (2012/13) Experienced by Xyratex-Sourced SAS Drives. Xyratex drive failure rate is less than half of the industry standard ! At 0.3%, the annual failure count would be 53 HDDs.
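
For reference, AFR is normally derived from field data as failures per drive-year of operation. A minimal sketch, where the failure count and drive-hours are illustrative placeholders rather than the actual 2012/13 Xyratex data:

```python
# AFR expressed the usual way: failures per drive-year (8,760 hours) of operation.
# The inputs below are illustrative placeholders, not the actual field data.

def annual_failure_rate(failures: int, drive_hours: float) -> float:
    """Failures per drive-year, returned as a fraction (0.003 == 0.3%)."""
    drive_years = drive_hours / 8760.0
    return failures / drive_years

# e.g. 53 failures across 17,700 drives that each ran for a full year:
afr = annual_failure_rate(failures=53, drive_hours=17_700 * 8760)
print(f"AFR = {afr:.2%}")        # -> AFR = 0.30%
```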

14 Evolution of HDD technology: Impacts System Rebuild Time
As growth in areal density slows (<25% per generation), disk drive manufacturers are having to increase the number of heads/platters per drive to continue to increase max capacity per drive y/y. 2TB drives today typically include just 5 heads and 3 platters; 6TB drives in 2014 will include a minimum of 12 heads and 6 platters. More components will inevitably result in an increase in disk drive failures in the field. Therefore, systems using 6TB drives must be able to handle the increase in the number of array rebuild events.
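
A rough way to see the rebuild-time impact of bigger drives: a traditional rebuild has to rewrite the entire replacement drive, so rebuild time scales linearly with capacity. The 30 MB/s effective rebuild rate below is an assumed figure for a rebuild running under production load, not a number from the slides:

```python
# Rebuild time for a traditional rebuild that rewrites the whole replacement
# drive. The 30 MB/s effective rate (rebuild under production load) is assumed.

def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float = 30.0) -> float:
    capacity_mb = capacity_tb * 1_000_000      # decimal TB -> MB, as drives are marketed
    return capacity_mb / rebuild_mb_per_s / 3600

for tb in (2, 4, 6):
    print(f"{tb} TB drive: ~{rebuild_hours(tb):.0f} h")
# 2 TB: ~19 h, 4 TB: ~37 h, 6 TB: ~56 h  (i.e. days for a 6 TB drive)
```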

15 Why Does HDD Reliability Matter?
The three key factors you must consider are drive reliability, drive size and the rebuild rate of your system. The scary fact is: new-generation, bigger drives will fail more often. Such drive failures have an even greater impact on file system performance and the risk of data loss when using bigger drives such as 6TB or larger !! The rebuild window is longer and the risk of data loss is greater. Traditional RAID technology will take up to days to rebuild a single failed 6TB drive. Therefore, Parity De-clustered RAID rebuild technology is essential for any HPC system.
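
One way to quantify the "bigger rebuild window, greater risk" point is to estimate the chance that another drive in the same RAID group fails while a rebuild is still running. The group size, AFR and rebuild window below are illustrative assumptions, and the simple exponential model ignores correlated failures, so real-world exposure is higher:

```python
import math

# Chance that another drive in the same RAID 6 group fails before the current
# rebuild finishes. Group size, AFR and rebuild window are assumptions; the
# exponential model ignores correlated failures, so real exposure is higher.

def p_additional_failure(surviving_drives: int, afr: float, rebuild_hours: float) -> float:
    per_drive_rate = afr / 8760.0                       # failures per drive-hour
    expected = surviving_drives * per_drive_rate * rebuild_hours
    return 1.0 - math.exp(-expected)

# 8+2 RAID 6 group (9 surviving drives), 3% AFR, ~56 h rebuild of a 6 TB drive:
print(f"{p_additional_failure(9, 0.03, 56):.2%} per rebuild event")
```

Per event the probability looks small, but at hundreds of rebuild events per year on a large system the exposure adds up, and every such window also means degraded performance.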

16

17 Parity Declustered RAID - Geometry
PD RAID geometry for an array is defined as P drives (N+K+A), for example 41 (8+2+2), where:
P is the total number of disks in the array
N is the number of data blocks per stripe
K is the number of parity blocks per stripe
A is the number of distributed spare disk drives
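
A small helper built around the P (N+K+A) notation defined above, using the 41 (8+2+2) example. The derived quantities and names are illustrative, not part of any ClusterStor API:

```python
from dataclasses import dataclass

# Helper around the P (N+K+A) notation above, using the 41 (8+2+2) example.
# Derived quantities and names are illustrative only.

@dataclass
class PDRaidGeometry:
    p: int   # total disks in the array
    n: int   # data blocks per stripe
    k: int   # parity blocks per stripe
    a: int   # distributed spare disk drives

    def __post_init__(self) -> None:
        if self.n + self.k > self.p - self.a:
            raise ValueError("stripe width (N+K) cannot exceed the non-spare drive count")

    @property
    def stripe_width(self) -> int:
        return self.n + self.k

    @property
    def data_fraction(self) -> float:
        # share of raw capacity left for user data after parity and distributed spares
        return (self.n / (self.n + self.k)) * ((self.p - self.a) / self.p)

geom = PDRaidGeometry(p=41, n=8, k=2, a=2)
print(geom.stripe_width, f"{geom.data_fraction:.1%}")   # -> 10 76.1%
```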

18 Grid RAID advantage: Rebuild speed increased by more than 3.5x
No SSDs, no NV-RAM, no accelerators ….. PD-RAID as it was meant to be … (Simulation of a 4 TB drive rebuild; time scale factor x1000)
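
A toy model of why declustering speeds rebuilds up: a traditional rebuild funnels every reconstructed block onto one spare drive, while a declustered rebuild spreads the writes over distributed spare space on all surviving drives. Capacity and drive count below follow the 41-drive / 4 TB example; the measured >3.5x figure is well below the theoretical fan-out, since real rebuilds also contend with client I/O and parity computation:

```python
# Toy comparison of rebuild write load. Capacity and drive count follow the
# 41-drive / 4 TB example; real speed-ups are far smaller than the fan-out.

CAPACITY_TB = 4.0
SURVIVING_DRIVES = 40          # 41-drive GridRAID array minus the failed drive

# Traditional RAID 6: one hot spare absorbs the entire reconstructed drive.
write_per_spare_tb = CAPACITY_TB

# Declustered RAID: reconstructed data lands in spare space spread across all
# surviving drives, so no single spindle becomes the bottleneck.
write_per_peer_tb = CAPACITY_TB / SURVIVING_DRIVES

print(f"traditional spare writes {write_per_spare_tb:.1f} TB; "
      f"each declustered peer writes {write_per_peer_tb:.2f} TB")
```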

19 Thank you ….

