On the Role of Burst Buffers in Leadership-Class Storage Systems

On the Role of Burst Buffers in Leadership-Class Storage Systems Ning Liu, Jason Cope, Philip Carns, Christopher Carothers, Robert Ross, Gary Grider, Adam Crume, Carlos Maltzahn Contact: liun2@cs.rpi.edu, chrisc@cs.rpi.edu, rross@mcs.anl.gov

Why do we need a burst buffer? Modern HPC storage systems are provisioned to absorb peak I/O bursts, which leaves bandwidth wasted the rest of the time. This design cannot keep pace with growing data demands as supercomputers move toward the exascale era. For example, statistics from the ALCF Intrepid Blue Gene/P system show that 98.8% of the time the I/O system is at 33% of its bandwidth capacity or less, and 69.6% of the time it is at 5% of its bandwidth capacity or less. Aggregate I/O throughput over one-minute intervals on the ALCF BG/P [1]. [1] CODES: Enabling Co-Design of Multi-Layer Exascale Storage Architectures, project proposal.

Storage System of Intrepid (BG/P) Assuming the audience needs to know the details of the targeted storage system we modeled.

Software Stack on Intrepid Give the audience a brief view of the software stack on Intrepid; the burst buffer (BB) is added at the IOFSL layer.

Adding a Burst Buffer to the Storage System Model We build a parallel discrete-event leadership-class storage system model [2]. We propose a tier of solid-state disk (SSD) burst buffers integrated into the existing I/O nodes. [2] Modeling a Leadership-Scale Storage System, N. Liu, C. Carothers, J. Cope, P. Carns, R. Ross, A. Crume, C. Maltzahn, 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011).
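To build intuition for what the simulated burst buffer tier does, here is a back-of-the-envelope sketch in C of a single I/O node that absorbs an application write burst at SSD speed and drains it to the external file system in the background. It is not the CODES/ROSS model itself, and the burst size and bandwidth values are illustrative assumptions only.

#include <stdio.h>

int main(void)
{
    /* All values are illustrative assumptions, not measured Intrepid numbers. */
    double burst_gib    = 64.0;   /* application write burst per I/O node        */
    double ssd_bw_gibps = 1.30;   /* SSD burst buffer write bandwidth            */
    double pfs_bw_gibps = 0.35;   /* this node's share of external PFS bandwidth */

    double with_bb    = burst_gib / ssd_bw_gibps;  /* app waits for the SSD only   */
    double drain_time = burst_gib / pfs_bw_gibps;  /* buffer drains in background  */
    double without_bb = burst_gib / pfs_bw_gibps;  /* app waits for the PFS itself */

    printf("app-visible burst time with burst buffer:    %6.1f s\n", with_bb);
    printf("background drain time to external storage:   %6.1f s\n", drain_time);
    printf("app-visible burst time without burst buffer: %6.1f s\n", without_bb);
    return 0;
}

With these assumed numbers, the application sees roughly 49 seconds of write time instead of roughly 183 seconds, while the drain to external storage overlaps with the application's subsequent compute phase.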

Methodology Why Parallel Discrete-Event Simulation (DES)? Large-scale systems are difficult to understand, and analytical models are often constrained. Parallel DES offers a dramatically shorter model execution time (minutes instead of days, so analysis can be done right away), prediction of the performance of future "what-if" systems, and the potential for real-time decision support. Example models: national air space (NAS), ISP backbones, distributed content caches, and next-generation supercomputer systems.
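For readers new to the technique, the toy kernel below shows the core of a (sequential) discrete-event simulation: a pending-event list ordered by timestamp, a loop that consumes the earliest event, and handlers that may schedule new events further in the future. This is only an illustration of the execution model that ROSS parallelizes; it is not ROSS code, and the event types and the 3.5-time-unit service delay are made-up values.

#include <stdio.h>

#define MAX_EVENTS 1024

typedef struct { double ts; int kind; } event_t;

static event_t pending[MAX_EVENTS];   /* kept sorted by descending timestamp */
static int     npending = 0;

static void schedule(double ts, int kind)
{
    int i = npending++;
    while (i > 0 && pending[i - 1].ts < ts) { pending[i] = pending[i - 1]; i--; }
    pending[i].ts   = ts;
    pending[i].kind = kind;
}

int main(void)
{
    enum { WRITE_START, WRITE_DONE };
    schedule(0.0, WRITE_START);                    /* initial I/O request     */
    while (npending > 0) {
        event_t ev = pending[--npending];          /* earliest event is last  */
        if (ev.kind == WRITE_START) {
            printf("t=%6.2f  write request issued\n", ev.ts);
            schedule(ev.ts + 3.5, WRITE_DONE);     /* assumed service time    */
        } else {
            printf("t=%6.2f  write request completed\n", ev.ts);
        }
    }
    return 0;
}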

ROSS: Parallel Discrete Event Simulator Developed in ANSI C with a simple, lean API. Uses Jefferson's Time Warp event scheduling mechanism with reverse computation for rollback. The global virtual time algorithm exploits IBM Blue Gene's fast barrier and collective networks. ROSS main page: http://odin.cs.rpi.edu/ross/index.php/Main_Page
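The following sketch illustrates reverse computation, the rollback mechanism named above: each forward event handler has a paired reverse handler that exactly undoes its state changes, so Time Warp can roll a logical process back without storing full state copies. The state fields, message type, and handler signatures are simplified assumptions and do not reproduce the actual ROSS API.

#include <stdint.h>

typedef struct {
    uint64_t bytes_buffered;   /* bytes currently held in the burst buffer */
    uint64_t writes_absorbed;  /* count of absorbed write requests         */
} bb_state;

typedef struct { uint64_t size; } write_msg;

/* Forward handler: apply the effect of a write-arrival event. */
static void bb_write_forward(bb_state *s, const write_msg *m)
{
    s->bytes_buffered  += m->size;
    s->writes_absorbed += 1;
}

/* Reverse handler: exactly undo the forward handler on rollback. */
static void bb_write_reverse(bb_state *s, const write_msg *m)
{
    s->bytes_buffered  -= m->size;
    s->writes_absorbed -= 1;
}

int main(void)
{
    bb_state  s = { 0, 0 };
    write_msg m = { 4 * 1024 * 1024 };   /* one 4 MiB write, as in the IOR runs */
    bb_write_forward(&s, &m);            /* speculative execution               */
    bb_write_reverse(&s, &m);            /* rollback restores the prior state   */
    return (int)s.writes_absorbed;       /* 0: state fully restored             */
}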

Study of Bursty Applications TABLE I: Top four write-intensive jobs on Intrepid, December 2011. Say that these are the drivers of the simulation; explain them in detail as needed.

Burst Buffer Parallel DES Model

Burst Buffer Model Parameters
TABLE II: Summary of relevant SSD device parameters and technology available as of January 2012

Vendor     Size (TiB)  NAND  Write BW (GiB/s)  Read BW (GiB/s)  Write latency (μs)  Read latency (μs)
FusionIO   0.40        SLC   1.30              1.40             15                  47
FusionIO   1.20        MLC                                                          68
Intel      0.25              0.32              0.50             80                  65
Virident   0.30              1.10                               16
Virident   0.60                                                 19                  62

Burst buffer latency model:
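A plausible form of the burst buffer latency model, consistent with the per-device latency and bandwidth columns of Table II, is the linear cost model below; this is an assumption for illustration, not necessarily the exact equation used in the model.

\[
  T_{\mathrm{write}}(s) \approx L_{\mathrm{write}} + \frac{s}{B_{\mathrm{write}}},
  \qquad
  T_{\mathrm{read}}(s) \approx L_{\mathrm{read}} + \frac{s}{B_{\mathrm{read}}},
\]

where \(s\) is the request size, \(L\) the device access latency, and \(B\) the device bandwidth from Table II.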

Single I/O Workload Case We use the IOR benchmark (consecutive 4 MiB write requests) and capture only write time, excluding open/close time. The full storage system uses 128 file servers; the half storage system uses 64. Starting from this slide, we focus on the results. Simulated performance of IOR for various storage system and burst buffer configurations.
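As a quick sanity check on the request sizes used here (an illustrative calculation, not simulator output), the snippet below applies the linear latency model above to one 4 MiB IOR write using the FusionIO SLC parameters from Table II; the request takes about 3 ms, so the perceived bandwidth stays close to the device's 1.30 GiB/s.

#include <stdio.h>

int main(void)
{
    double size_bytes = 4.0 * (1 << 20);             /* one 4 MiB IOR write request           */
    double latency_s  = 15e-6;                       /* FusionIO SLC write latency (Table II) */
    double bw_bps     = 1.30 * (double)(1UL << 30);  /* 1.30 GiB/s write bandwidth (Table II) */

    double t_req = latency_s + size_bytes / bw_bps;            /* about 3.0 ms        */
    double perceived_gibps = size_bytes / t_req / (double)(1UL << 30);

    printf("per-request write time: %.2f ms\n", t_req * 1e3);
    printf("perceived bandwidth:    %.2f GiB/s\n", perceived_gibps);
    return 0;
}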

Investigating the Parameter Space. Simulated IOR performance for various burst buffer configurations and I/O workloads; simulated application runtime performance for various I/O and burst buffer configurations.

Mixed Workload Experiments (1) Full storage system with burst buffer disabled. (Compute Node View) Jobs' execution time = 5.5 hours. Full storage system with burst buffer enabled. (Compute Node View) Jobs' execution time = 4.4 hours.

Mixed Workload Experiments (2) Full storage system with burst buffer disabled. (Enterprise Storage View) Full storage system with burst buffer enabled. (Enterprise Storage View)

Mixed Workload Experiments (3) Full storage system with burst buffer enabled. (Compute Node View) Jobs' execution time = 4.4 hours. Full storage system with burst buffer enabled. (Enterprise Storage View)

Mixed Workload Experiments (4) Half storage system with burst buffer disabled. (Compute Node View) Jobs' execution time = 5.8 hours. Half storage system with burst buffer enabled. (Compute Node View) Jobs' execution time = 4.4 hours.

Mixed Workload Experiments (5) Full storage system with burst buffer enabled. (Compute Node View) Jobs' execution time = 4.4 hours. Half storage system with burst buffer enabled. (Compute Node View) Jobs' execution time = 4.4 hours.

Mixed Workload Experiments (6) Half storage system with burst buffer disabled. (Enterprise Storage View) Half storage system with burst buffer enabled. (Enterprise Storage View)

Mixed Workload Experiments (7) Half storage system with burst buffer enabled. (Compute Node View) Jobs' execution time = 4.4 hours. Half storage system with burst buffer enabled. (Enterprise Storage View)

Mixed Workload Experiments (8) Full storage system with burst buffer enabled. (Enterprise Storage View) Half storage system with burst buffer enabled. (Enterprise Storage View)

Summary and Future Work The burst buffer enables the following: it increases the application-perceived bandwidth, reduces the size of the storage system while still achieving the desired bandwidth, decreases application time to solution, increases storage system bandwidth utilization, and absorbs bursty I/O workloads. Future work: investigate the use of burst buffers at different tiers of the storage system; scale the simulation framework toward extreme scale, building on the current leadership-class storage system model; and use simulation to understand complex system components and facilitate their design.

Acknowledgements Special thanks to Dr. Jason Cope; special thanks to my advisor, Prof. Christopher Carothers, and thesis advisor Dr. Robert Ross; thanks to Philip Carns, Kevin Harms, Gary Grider, Carlos Maltzahn, and Adam Crume. Thank you, everyone, for being here today! Questions?