1 Robust Storage Management On Desktop, in Machine Room, and Beyond
Presented by Xiaosong Ma
Computer Science and Mathematics, Oak Ridge National Laboratory
Department of Computer Science, North Carolina State University
Contributors: Sudharshan Vazhkudai, Stephen Scott

2 Storage availability challenge – data producer side
- More data will be produced by petaflop machines
- High-end computing (HEC) storage systems face severe performance and reliability challenges
- Storage failures are a significant contributor to system downtime
- There is a need to reduce I/O-related job failures and resubmissions

System       # CPUs   MTBF/I           Outage source
ASCI Q       8192     6.5 hrs          Storage, CPU
ASCI White   8192     40 hrs           Storage, CPU
Google       15000    20 reboots/day   Storage, memory

Microscopic view:
- Annual failure rates of 3 to 7% for disks, 3 to 16% for controllers, and up to 12% for SAN switches
- Ten times the rate expected from the disk vendor specification sheets

3 Storage availability challenge – data consumer side
- Data eventually get analyzed and visualized at desktop machines
- These are indispensable human-interaction devices
- They are highly limited in both storage capacity and performance
- Meanwhile, many nearby workstations have unused disk space and idle system resources
- Large scientific datasets can be striped across donated space
- Access can be much faster than using the local disk
- However, donated nodes can become unavailable at any time

4 Common point: the need to address availability of transient data
- Scientific data in HEC settings are different from general-purpose data
- They are produced and accessed in a distributed manner
- Supercomputers are neither the source nor the destination of the data
- Input data are pre-staged and read-only
- Output data are offloaded after the run, and are write-once during simulation
- Output data become read-only during analysis
- Data are analyzed with temporal and spatial locality
- Storage space is a precious shared resource
- New technology is needed to improve data availability

5 FreeLoader overview
Enabling trends:
- Unused storage: more than 50% of desktop storage is unused
- Immutable data: data are usually write-once, read-many, with remote source copies
- Connectivity: well-connected, secure LAN settings
FreeLoader aggregate storage cache:
- Scavenges O(GB) contributions from desktops
- Provides a parallel I/O environment across loosely connected workstations, aggregating I/O as well as network bandwidth
- NOT a file system, but a low-cost, local storage solution enabling client-side caching and locality
A sketch of the striping idea follows below.
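To make the scavenging idea concrete, here is a minimal Python sketch of striping a dataset round-robin across donated desktop space. The donor list, chunk size, and store_chunk() transport are hypothetical placeholders for illustration, not FreeLoader's actual interfaces.

```python
# Minimal sketch (not FreeLoader's actual code): round-robin striping of a
# dataset across donated desktop storage. Donor addresses, chunk size, and
# the store_chunk() transport are illustrative assumptions.
import os

CHUNK_SIZE = 8 * 1024 * 1024                              # 8 MB stripe unit (assumed)
DONORS = ["ws01:/cache", "ws02:/cache", "ws03:/cache"]    # scavenged donor space (assumed)

def store_chunk(donor, dataset, index, data):
    """Placeholder for the LAN transfer that places one chunk on a donor node."""
    print(f"chunk {index:4d} ({len(data)} bytes) -> {donor}/{dataset}")

def stripe_dataset(path):
    """Split a dataset into chunks and place them round-robin on donor nodes,
    returning the chunk-to-donor layout the cache would keep as metadata."""
    layout = []
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            donor = DONORS[index % len(DONORS)]
            store_chunk(donor, os.path.basename(path), index, chunk)
            layout.append(donor)
            index += 1
    return layout
```

The returned layout is the kind of per-chunk placement map needed later to reassemble the dataset, or to patch it when a donor disappears.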

6 Dealing with unreliable FreeLoader nodes
Collective downloading, a combination of two techniques:
- Utilizing multiple nodes to patch missing data
- Each node retrieving a long, contiguous file segment
- Local data shuffling to the desired striping pattern
Prefix caching:
- Only storing part of each dataset
- Overlapping tail-patching with serving the cached prefix
A sketch of both techniques follows below.
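Below is a rough Python sketch of the two techniques. The fetch_range() reader, stripe unit, and segment sizing are assumptions made for illustration; the slides do not specify these details.

```python
# Minimal sketch of collective downloading and prefix caching (not the
# FreeLoader implementation). fetch_range() stands in for a remote read of
# bytes [start, end) from the dataset's source copy.
from concurrent.futures import ThreadPoolExecutor

STRIPE = 1 * 1024 * 1024                       # cache striping unit (assumed)

def fetch_range(source_uri, start, end):
    """Placeholder: retrieve bytes [start, end) of the remote source copy."""
    return b"\0" * (end - start)

def collective_patch(source_uri, start, end, patch_nodes):
    """Each patching node pulls one long, contiguous segment in parallel; the
    segments are then shuffled locally into stripe-sized chunks."""
    n = len(patch_nodes)
    seg = (end - start + n - 1) // n
    ranges = [(start + i * seg, min(start + (i + 1) * seg, end)) for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(lambda r: fetch_range(source_uri, *r), ranges))
    data = b"".join(parts)
    # Local shuffling: carve the contiguous bytes back into cache stripes.
    return [data[off:off + STRIPE] for off in range(0, len(data), STRIPE)]

def serve(dataset_len, cached_prefix, source_uri, patch_nodes):
    """Prefix caching: stream the cached prefix immediately while the missing
    tail is patched from the source in the background (overlapped)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        tail = pool.submit(collective_patch, source_uri,
                           len(cached_prefix), dataset_len, patch_nodes)
        yield cached_prefix                    # client starts consuming right away
        for chunk in tail.result():            # tail arrives while prefix was served
            yield chunk
```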

7 Extending FreeLoader fault-tolerance to the machine-room scenario
- Node-local storage can be efficiently aggregated into temporary scratch space to store transient job data
- Augmenting parallel file system metadata:
  - Extend file system metadata to include recovery hints: information regarding the "source" and "sink" of the user's job data
  - Sample metadata items: URIs, credentials, etc.
  - Metadata automatically extracted from job scripts (see the sketch below)
- Enables elegant, automatic data recovery and offloading without manual intervention
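The following sketch illustrates how recovery hints might be harvested from a batch job script. The #STAGEIN/#STAGEOUT directive names and the metadata layout are assumptions for illustration, not the actual job-script syntax or metadata format used at ORNL.

```python
# Minimal sketch of harvesting recovery hints from a batch job script, assuming
# hypothetical "#STAGEIN"/"#STAGEOUT" directives. The directive names and the
# resulting metadata layout are illustrative.
import json, re

HINT = re.compile(r"^#(STAGEIN|STAGEOUT)\s+(\S+)\s+(\S+)", re.MULTILINE)

def extract_recovery_hints(job_script_text):
    """Return per-file recovery metadata: where staged input came from (source)
    and where result data should be offloaded to (sink)."""
    hints = {}
    for kind, remote_uri, scratch_path in HINT.findall(job_script_text):
        entry = hints.setdefault(scratch_path, {})
        entry["source" if kind == "STAGEIN" else "sink"] = remote_uri
    return hints

if __name__ == "__main__":
    script = """#PBS -l nodes=64
#STAGEIN  gsiftp://source/dataset   /scratch/job42/input.dat
#STAGEOUT gsiftp://mirror/results   /scratch/job42/output.dat
mpirun ./simulation
"""
    # These hints would be attached to the parallel file system's metadata for
    # the corresponding files, so recovery and offloading can run automatically.
    print(json.dumps(extract_recovery_hints(script), indent=2))
```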

8 Online data recovery
Motivation:
- Staged input data have natural redundancy and are immutable
- RAID does not help with I/O node failures
- Idea: patch data from the staging source
How?
- Enhance parallel file systems with "recovery metadata"
- Deploy multiple nodes for parallel patching (see the sketch below)
Preliminary results:
- Large remote I/O requests (256 MB) and local shuffling can be overlapped with client activity
- Data reconstruction from ORNL to PSC shows good scalability
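A minimal sketch of the patching step, assuming simple round-robin striping across I/O nodes and a hypothetical fetch_range() remote reader; the stripe size and placement rule are illustrative, not the actual parallel file system's layout.

```python
# Minimal sketch of online reconstruction after an I/O node failure, assuming
# round-robin striping and a hypothetical fetch_range() remote reader (e.g.,
# over GridFTP). Stripe size and placement rule are illustrative.
STRIPE = 4 * 1024 * 1024                       # parallel file system stripe unit (assumed)

def fetch_range(uri, start, end):
    """Placeholder: remote read of bytes [start, end) from the staging source."""
    return b"\0" * (end - start)

def lost_stripes(file_len, num_io_nodes, failed_node):
    """With round-robin placement, stripe i resides on node i % num_io_nodes."""
    total = (file_len + STRIPE - 1) // STRIPE
    return [i for i in range(total) if i % num_io_nodes == failed_node]

def reconstruct(recovery_metadata, file_len, num_io_nodes, failed_node):
    """Patch only the stripes lost with the failed node, using the source URI
    recorded in the file's recovery metadata (e.g., gsiftp://source/dataset)."""
    source = recovery_metadata["source"]
    for i in lost_stripes(file_len, num_io_nodes, failed_node):
        start = i * STRIPE
        data = fetch_range(source, start, min(start + STRIPE, file_len))
        yield i, data                          # written back to a replacement I/O node
```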

9 Eager offloading of result data
- Offloading result data is equally important for end-user visualization and interpretation
- Storage system failures and purging of scratch space can cause loss of result data
- Eager offloading:
  - Transparent data migration using the destination embedded in the metadata
  - Data offloading can be overlapped with computation
  - Can fail over to intermediate storage or archives
  - Needs coordination with the parallel file system and job management tools
A sketch of the offloading loop follows below.
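A minimal sketch of an eager-offloading loop that runs alongside the computation and fails over to an intermediate archive; the transfer() primitive, the queueing hook, and the fallback URI are assumptions made for illustration.

```python
# Minimal sketch of eager offloading with failover. The transfer() primitive,
# the enqueueing hook, and the fallback archive URI are illustrative.
import queue, threading

FALLBACK = "hsi://archive/failover"            # intermediate archive (assumed)

def transfer(dest_uri, path):
    """Placeholder for a wide-area transfer (GridFTP, scp, etc.)."""
    print(f"offloading {path} -> {dest_uri}")

def offload_worker(pending):
    """Runs alongside the computation: drain finished result files eagerly so
    they survive scratch purges or storage failures."""
    while True:
        item = pending.get()
        if item is None:                       # shutdown sentinel
            break
        path, sink = item
        try:
            transfer(sink, path)
        except OSError:
            transfer(FALLBACK, path)           # fail over to intermediate storage
        pending.task_done()

pending = queue.Queue()
threading.Thread(target=offload_worker, args=(pending,), daemon=True).start()

# The job (or file system hooks) enqueues result files as they are completed,
# with the sink taken from the file's recovery metadata:
pending.put(("/scratch/job42/output.dat", "gsiftp://mirror/results"))
pending.join()
```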

10 Summary: system overview
[Diagram: a supercomputer center (compute nodes, parallel file system, caches near the I/O nodes) connected to end-user/mirror sites, archives, and the data source. The source copy of the dataset is accessed at gsiftp://source/dataset; after a failure, staged data are reconstructed online from the source, and result data are offloaded to the end-user/mirror sites.]

11 Contacts
Xiaosong Ma
Assistant Professor, Department of Computer Science, North Carolina State University
Joint Faculty, Computer Science and Mathematics, Oak Ridge National Laboratory
(919) 513-7577, ma@csc.ncsu.edu

Sudharshan S. Vazhkudai
Computing & Computational Sciences Directorate, Computer Science and Mathematics, Oak Ridge National Laboratory
(865) 576-5547, vazhkudaiss@ornl.gov

