MosaStore - A Versatile Storage System

MosaStore - A Versatile Storage System
Lauro Costa, Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu, Emalayan Vairavanathan (and many others from UBC, ANL, ORNL)
Networked Systems Laboratory (NetSysLab), University of British Columbia

A golf course … a (nudist) beach (… and 199 days of rain each year).
Networked Systems Laboratory (NetSysLab), University of British Columbia

The Landscape: Storage System Middleware
Platforms: supercomputers, desktop grids, cloud computing.
Workloads: workflows, checkpointing, data analysis.
Diverse platform capabilities, diverse workload characteristics.
Challenge: design an efficient storage system middleware.

Motivation: underprovisioned storage systems on many HPC platforms (e.g., the BlueGene/P at ANL).
From the slide's figure: 160K cores at 2.5 GBps per node over the high-speed network; 2.5K I/O nodes at 850 MBps per 64 compute nodes; a 10 Gb/s switch complex; and GPFS with 24 servers delivering an aggregate I/O rate of 8 GBps, i.e., about 51 KBps per core.
The shared storage is a bottleneck, while there are underutilized resources close to the application.
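As a quick sanity check of the slide's arithmetic, the per-core share of the shared file system's bandwidth follows directly from the figures quoted above; a minimal sketch, assuming decimal units:

```python
# Back-of-the-envelope check of the per-core I/O share quoted on the slide,
# assuming decimal units (1 GBps = 1e9 B/s, 1 KBps = 1e3 B/s).
cores = 160_000        # total compute cores
gpfs_gbps = 8          # aggregate GPFS I/O rate
node_gbps = 2.5        # per-node link bandwidth

per_core_kbps = gpfs_gbps * 1e9 / cores / 1e3
print(f"shared-storage share per core: ~{per_core_kbps:.0f} KBps")   # ~50 (slide rounds to 51)

ratio = node_gbps * 1e9 / (per_core_kbps * 1e3)
print(f"a node's own link is ~{ratio:,.0f}x its per-core GPFS share")
```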

Solution: a temporary shared data store.
Same platform as on the previous slide (160K cores, 2.5K I/O nodes, the 10 Gb/s switch complex, and GPFS with 24 servers at 8 GBps, about 51 KBps per core), but the nodes dedicated to an application now also host a shared data store reachable at 2.5 GBps per node.
The storage system is coupled with the application's execution.
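The defining property here is that the storage's lifetime is coupled to the application's run. A minimal single-node stand-in for that lifecycle (stage on fast local storage, run, flush results, tear down) might look like the sketch below; the command and paths are hypothetical, and this is not MosaStore's actual deployment tooling.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def run_with_temporary_store(app_cmd, persistent_dir, fast_dir="/dev/shm"):
    """Couple a temporary store to one application run: create it on fast
    node-local storage, run the application against it, flush results to the
    persistent shared file system, then tear it down."""
    workdir = Path(tempfile.mkdtemp(dir=fast_dir))          # store is born with the run
    try:
        subprocess.run(app_cmd, cwd=workdir, check=True)    # app reads/writes locally
        shutil.copytree(workdir, persistent_dir, dirs_exist_ok=True)  # persist results
    finally:
        shutil.rmtree(workdir, ignore_errors=True)          # store dies with the run

# Hypothetical usage:
# run_with_temporary_store(["./dock6_stage1"], "/gpfs/project/results")
```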

Benefits (same platform figure as the previous slide): storage closer to the application, and the ability to specialize.

Evaluation: harnessing 'close to application' underutilized resources.
Workflow stages (DOCK6), the storage optimization applied, and the resulting speedup at 8K cores:
Stage 1 (read input, compute, and write temporary results): cache the input data; 1.06x.
Stage 2 (summarize, sort, and select): cache temporary files; 11.76x.
Stage 3 (archive): asynchronously flush results to GPFS; 1.51x.
Overall: 1.52x.
Exploiting the underutilized resources can critically improve storage system performance.
[Zhang et al., "Design and Evaluation of a Collective I/O Model for Loosely-coupled Petascale Programming", MTAGS '08]
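The third optimization above, asynchronously flushing results to GPFS, can be pictured as a small background copier that decouples the application's writes from the shared file system. The sketch below is an illustrative stand-in under assumed paths, not MosaStore's implementation.

```python
import queue
import shutil
import threading
from pathlib import Path

class AsyncFlusher:
    """Copy finished result files to the shared file system in the background,
    so the application never blocks on GPFS writes."""

    def __init__(self, shared_dir):
        self.shared_dir = Path(shared_dir)
        self.pending = queue.Queue()
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def flush_later(self, local_path):
        """Called by the application as soon as it finishes writing a file."""
        self.pending.put(Path(local_path))

    def _drain(self):
        while True:
            path = self.pending.get()
            if path is None:                       # sentinel: shutting down
                return
            shutil.copy2(path, self.shared_dir / path.name)

    def close(self):
        """Wait for all queued files to reach the shared file system."""
        self.pending.put(None)
        self.worker.join()

# Usage (hypothetical paths):
# flusher = AsyncFlusher("/gpfs/project/results")
# flusher.flush_later("/scratch/app/partial_0001.out")
# flusher.close()
```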

Evaluation: specialization.
MosaStore throughput at larger scale (pool of 35 nodes); experiment by Henry Monti (Virginia Tech) on a Cray XT4 cluster at ORNL.
Deduplication benefits a checkpointing workload: 3x higher throughput, 25-70% less storage space and network effort, and it scales to hundreds of clients.
Specialization can critically improve storage system performance.
[S. Al-Kiswany, M. Ripeanu, S. Vazhkudai, A. Gharaibeh, "stdchk: A Checkpoint Storage System for Desktop Grid Computing", ICDCS '08]
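The deduplication gain above rests on content addressing: split each checkpoint into chunks, hash every chunk, and store only chunks not seen before. The sketch below shows that core idea only; the fixed 1 MiB chunk size and SHA-1 choice are illustrative assumptions, not stdchk's exact design.

```python
import hashlib

CHUNK_SIZE = 1 << 20   # fixed 1 MiB chunks; an illustrative choice

def store_checkpoint(path, chunk_store):
    """Content-addressed write: hash each chunk, store only chunks not seen
    before, and record the checkpoint as a list of chunk digests (a recipe).
    Successive checkpoints of a slowly changing process image share most
    chunks, which is where the space and network savings come from."""
    recipe = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha1(chunk).hexdigest()
            if digest not in chunk_store:   # new data: must be stored
                chunk_store[digest] = chunk
            recipe.append(digest)           # duplicates: reference only
    return recipe

# Usage: chunk_store = {}; recipe = store_checkpoint("ckpt_0001.img", chunk_store)
```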

Summary so far.
MosaStore is a versatile storage architecture that exploits underutilized resources close to the application, and supports specialization and configurability.
The system is configured at deployment time, and its deployment lifetime is coupled with that of the target application.
[S. Al-Kiswany, A. Gharaibeh, M. Ripeanu, "The Case for a Versatile Storage System", HotStorage '09]

MosaStore storage system prototype. Goals: (1) an exploration platform, and (2) support for large-scale computational science research projects.
Versatile storage: a configurable and extensible storage system that can be specialized for a broad set of applications. [ICDCS '08, HotStorage '09]
StoreGPU: how can massively multicore processors be harnessed to support storage system operations? [HPDC '08, JoCC '09, IPCCC '09, HPDC '10]
Cross-layer optimizations (CMFS API): can one enable cross-layer optimizations? [HPDC HotTopics '08, CCGrid '12, WSLF '11]
Automating configuration choice: how do I choose a good configuration for my application? [ERSS '11, GRID '10]

Cross-layer communication through custom metadata (alongside the POSIX API).
Application to storage system: applications can present hints on the desired use of the data, e.g., desired replication levels, caching, data importance.
Storage system to application: the storage can expose storage-level attributes, e.g., file location characteristics and file health status.
Today, applications and storage systems treat data items uniformly; the opportunity is that additional information can enable differentiated treatment of data items.
Our use case: a workflow-aware file system.
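One lightweight way to carry such hints across the application/storage boundary is the POSIX extended-attribute interface. A minimal sketch follows, assuming Linux and an xattr-capable file system; the hint names (user.replication, user.lifetime, user.location) are purely hypothetical rather than MosaStore's actual vocabulary.

```python
import os

def tag_file(path, attr, value):
    """Attach an application-provided hint to a file as a POSIX extended
    attribute (requires Linux and an xattr-capable file system)."""
    os.setxattr(path, attr, value.encode())

def read_tag(path, attr):
    """Read back an attribute, e.g., one the storage layer exposed."""
    return os.getxattr(path, attr).decode()

# Hypothetical hint names, for illustration only:
# tag_file("out/intermediate.fits", "user.replication", "3")       # application -> storage
# tag_file("out/intermediate.fits", "user.lifetime", "temporary")  # application -> storage
# print(read_tag("out/intermediate.fits", "user.location"))        # storage -> application
```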

Workflow applications (example: the Montage workflow).
File-based communication; irregular and application-dependent data access; large numbers of processes; runs for weeks; generates large I/O volumes (100 TB cumulative).
Source: [Zhao et al., 2012]; 512 BG/P cores, GPFS intermediate file system.

I/O patterns in workflow applications: pipeline, broadcast, reduce, scatter, gather.
[Wozniak et al., "Case Studies in Storage Access by Loosely Coupled Petascale Applications", PDSW 2009]
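As a concrete (toy) example of the first pattern: a pipeline is a chain of tasks in which each task reads the file its predecessor wrote. The sketch below uses placeholder file names and trivial processing; it only illustrates the file-based communication the slides describe.

```python
from pathlib import Path

def stage(in_path, out_path, transform):
    """One workflow task: read the file its predecessor wrote, write a file
    for its successor. All communication is file-based."""
    Path(out_path).write_text(transform(Path(in_path).read_text()))

# A toy two-task pipeline (file names and 'processing' are placeholders):
Path("raw.txt").write_text("  pixel data  ")
stage("raw.txt", "projected.txt", str.upper)     # task 1 writes projected.txt
stage("projected.txt", "final.txt", str.strip)   # task 2 consumes only task 1's output
```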

Application: Montage.
Stage 5: reduce pattern. Stages 6, 7, 8: pipeline pattern. Stage 9: pipeline pattern. Stage 10: reduce pattern.

I/O patterns and the storage optimizations they call for:
Pipeline: locality-aware scheduling.
Broadcast: replication.
Reduce: data placement, locality-aware scheduling.
Scatter: block-level placement, locality-aware scheduling.
Gather: block-level co-placement, locality-aware scheduling.
Data-item-specific patterns and optimizations! Information needs to flow in both directions.
Idea: cross-layer communication to support this, as sketched below.
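A workflow runtime could encode this table directly as per-file hints passed down to the storage layer. The sketch below is illustrative only, and the hint names are assumptions rather than MosaStore's actual interface.

```python
# Illustrative mapping from a file's access pattern to the per-file hints a
# workflow runtime might pass down to storage.
PATTERN_HINTS = {
    "pipeline":  ["locality-aware-scheduling"],
    "broadcast": ["replication"],
    "reduce":    ["data-placement", "locality-aware-scheduling"],
    "scatter":   ["block-level-placement", "locality-aware-scheduling"],
    "gather":    ["block-level-co-placement", "locality-aware-scheduling"],
}

def hints_for(pattern):
    """Storage optimizations to request for a file accessed with this pattern."""
    return PATTERN_HINTS.get(pattern.lower(), [])

# Example: the scheduler knows a temporary file will be read only by the next
# task in the same pipeline, so it asks for locality-aware scheduling:
print(hints_for("Pipeline"))   # ['locality-aware-scheduling']
```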

A workflow-aware file system.
Thesis: cross-layer communication supported by file-level metadata is the key mechanism to enable a workflow-aware file system.
Progress so far: a promising evaluation of the potential gains (CCGrid '12).
Next step: build the system and evaluate it with applications (SC '12?).

MosaStore storage system prototype. Goals: (1) an exploration platform, and (2) support for large-scale computational science research projects.
Versatile storage: a configurable and extensible storage system that can be specialized for a broad set of applications. [ICDCS '08, HotStorage '09]
StoreGPU: harnessing massively multicore processors to support storage system operations. [HPDC '08, JoCC '09, IPCCC '09, HPDC '10]
Cross-layer optimizations (CMFS API): enable bidirectional cross-layer optimizations. [HPDC HotTopics '08, CCGrid '12, WSLF '11]
Automating configuration choice: how do I choose a good configuration for my application? [ERSS '11, GRID '10]

Thank you