A Workflow-Aware Storage System
Emalayan Vairavanathan, Samer Al-Kiswany, Lauro Beltrão Costa, Zhao Zhang, Daniel S. Katz, Michael Wilde, Matei Ripeanu

Workflow Example - ModFTDock
ModFTDock is a protein docking application: it simulates a more complex protein model from two known proteins. Applications: drug design, protein interaction prediction.

Background – ModFTDock on the Argonne BG/P
Diagram: application tasks run on the compute nodes, each with its own local storage, and communicate through files stored on a shared backend file system (e.g., GPFS, NFS); a workflow runtime engine drives the run. Workload characteristics: 1.2 M docking tasks, file-based communication, large I/O volume, and an aggregate I/O rate of 8 GB/s (roughly 51 KB/s per core).

Background – Backend Storage Bottleneck
Storage is one of the main bottlenecks for workflows. Chart: a Montage workflow run on 512 BG/P cores with a GPFS backend file system spends about 40% of its time on scheduling and idle. Source: [Zhao et al.]

Intermediate Storage Approach
Diagram: application tasks aggregate the local storage of the compute nodes into a shared intermediate storage layer exposed through a POSIX API; data is staged in from, and staged out to, the backend file system (e.g., GPFS, NFS), while the workflow runtime engine drives the tasks. Source: [Zhao et al.], MTAGS 2008.

Research Question
How can we improve storage performance for workflow applications?

I/O Patterns in Workflow Applications (Justin Wozniak et al., PDSW'09) and the storage optimization each one suggests:
  Pipeline: locality and location-aware scheduling
  Broadcast: replication
  Reduce: collocation and location-aware scheduling
  Scatter and gather: block-level data placement
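As a rough illustration, the mapping above can be written down as a small lookup. The following C sketch is purely illustrative; the enum and function names are hypothetical and not part of any of the systems cited.

#include <stdio.h>

/* Hypothetical names: the I/O patterns identified by Wozniak et al. and the
 * storage optimization each one benefits from, as listed on this slide. */
typedef enum {
    PAT_PIPELINE,
    PAT_BROADCAST,
    PAT_REDUCE,
    PAT_SCATTER_GATHER
} io_pattern_t;

static const char *optimization_for(io_pattern_t p) {
    switch (p) {
    case PAT_PIPELINE:       return "locality and location-aware scheduling";
    case PAT_BROADCAST:      return "replication";
    case PAT_REDUCE:         return "collocation and location-aware scheduling";
    case PAT_SCATTER_GATHER: return "block-level data placement";
    }
    return "unknown";
}

int main(void) {
    printf("pipeline -> %s\n", optimization_for(PAT_PIPELINE));
    return 0;
}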

I/O Patterns in ModFTDock
At a large run: 1.2 M Dock, Merge, and Score instances, with average file sizes between 100 KB and 75 MB.
Stage 1: broadcast pattern. Stage 2: reduce pattern. Stage 3: pipeline pattern.

Research Question
How can we improve storage performance for workflow applications?
Our answer: workflow-aware storage, i.e., optimizing the storage for the I/O patterns. Traditional approach: one size fits all. Our approach: file- and block-level optimizations.

Integrating with the Workflow Runtime Engine
Diagram: application tasks on the compute nodes aggregate their local storage into a shared workflow-aware storage layer accessed through a POSIX API; data is staged in/out against the backend file system (e.g., GPFS, NFS). The storage exposes storage hints to the workflow runtime engine (e.g., file location information), and the runtime engine passes application hints to the storage (e.g., indicating access patterns).
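A minimal sketch of how such bi-directional hints could be exchanged over a POSIX interface, assuming extended attributes are used as the carrier. The attribute names ("user.pattern", "user.location") and the file path are hypothetical; the slide only states that application and storage hints flow both ways.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void) {
    const char *path = "/intermediate/modftdock/out.dat";  /* hypothetical path */

    /* Application hint: tell the storage this file will be consumed in a
     * pipeline pattern, so keeping the data on the writer's node is preferable. */
    if (setxattr(path, "user.pattern", "pipeline", strlen("pipeline"), 0) != 0)
        perror("setxattr");

    /* Storage hint: ask where the file's data actually lives, so the workflow
     * runtime engine can schedule the consumer task on that node. */
    char where[256] = {0};
    ssize_t n = getxattr(path, "user.location", where, sizeof(where) - 1);
    if (n > 0)
        printf("file stored on: %.*s\n", (int)n, where);

    return 0;
}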

Outline
  Background
  I/O Patterns
  Workflow-Aware Storage System: Implementation
  Evaluation

Implementation: MosaStore
Each file is divided into fixed-size chunks, and the chunks are stored on the storage nodes. The manager maintains a block map for each file. The system is accessed through a POSIX interface. Diagram: MosaStore distributed storage architecture.
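A rough sketch of the metadata this slide implies, assuming a manager-side block map that records which storage node holds each fixed-size chunk. The structure, field names, and chunk size are illustrative assumptions, not MosaStore's actual implementation.

#include <stdint.h>

#define CHUNK_SIZE (1 << 20)   /* assumed fixed chunk size of 1 MB */

typedef struct {
    uint64_t chunk_id;          /* position of the chunk within the file      */
    uint32_t node_id;           /* storage node currently holding the chunk   */
} chunk_location_t;

typedef struct {
    char              path[256];  /* file name as seen through the POSIX API  */
    uint64_t          size;       /* file size in bytes                       */
    uint64_t          n_chunks;   /* ceil(size / CHUNK_SIZE)                  */
    chunk_location_t *block_map;  /* one entry per chunk, kept by the manager */
} file_metadata_t;

/* Translate a byte offset into the storage node that must be contacted. */
static uint32_t node_for_offset(const file_metadata_t *f, uint64_t offset) {
    return f->block_map[offset / CHUNK_SIZE].node_id;
}

int main(void) {
    chunk_location_t map[2] = { {0, 3}, {1, 7} };
    file_metadata_t f = { "out.dat", 2 * CHUNK_SIZE, 2, map };
    return (int)node_for_offset(&f, CHUNK_SIZE + 42);   /* second chunk -> node 7 */
}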

Implementation: Workflow-Aware Storage System
Diagram: workflow-aware storage architecture.

Implementation: Workflow-Aware Storage System
  Optimized data placement for the pipeline pattern: priority to local writes and reads.
  Optimized data placement for the reduce pattern: collocating files on a single storage node.
  Replication mechanism optimized for the broadcast pattern: parallel replication.
  Exposing file location to the workflow runtime engine.
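The list above suggests a simple placement decision driven by the access-pattern hint. The sketch below is a hypothetical illustration of that logic under the stated rules, with assumed names and an assumed round-robin fallback; it is not the system's actual placement code.

#include <stdint.h>

typedef enum { HINT_NONE, HINT_PIPELINE, HINT_REDUCE, HINT_BROADCAST } pattern_hint_t;

typedef struct {
    uint32_t writer_node;     /* node where the producing task runs          */
    uint32_t collocate_node;  /* node designated to gather all reduce inputs */
    uint32_t n_nodes;         /* total number of storage nodes               */
} placement_ctx_t;

/* Pick the primary storage node for a new chunk based on the pattern hint. */
static uint32_t place_chunk(const placement_ctx_t *ctx, pattern_hint_t hint,
                            uint64_t chunk_id) {
    switch (hint) {
    case HINT_PIPELINE:  return ctx->writer_node;      /* local write, local read  */
    case HINT_REDUCE:    return ctx->collocate_node;   /* all inputs on one node   */
    case HINT_BROADCAST: return ctx->writer_node;      /* write locally; parallel
                                                          replication then pushes
                                                          copies to other nodes    */
    default:             return (uint32_t)(chunk_id % ctx->n_nodes); /* assumed
                                                          round-robin fallback     */
    }
}

int main(void) {
    placement_ctx_t ctx = { .writer_node = 4, .collocate_node = 0, .n_nodes = 16 };
    return (int)place_chunk(&ctx, HINT_PIPELINE, 0);   /* -> node 4 (local) */
}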

Outline
  Background
  I/O Patterns
  Workflow-Aware Storage System: Implementation
  Evaluation

Evaluation - Baselines
Workflow-aware storage is compared against MosaStore, NFS, and node-local storage. Diagram: application tasks on the compute nodes share an intermediate storage layer (instantiated as MosaStore, NFS, local storage, or the workflow-aware storage) and stage data in/out against the backend file system (e.g., GPFS, NFS).

Evaluation - Platform
Cluster of 20 machines, each with an Intel Xeon 4-core 2.33-GHz CPU, 4 GB RAM, a 1-Gbps NIC, and RAID 1 on two 300-GB 7200-rpm SATA disks.
Backend storage: an NFS server with an Intel Xeon 2.33-GHz CPU, 8 GB RAM, a 1-Gbps NIC, and six SATA disks in a RAID 5 configuration.
The NFS server is better provisioned than the cluster nodes.

Evaluation – Benchmarks and Application
Synthetic benchmarks, plus a real application and workflow runtime engine: ModFTDock.
Workload file sizes per benchmark (Pipeline / Broadcast / Reduce):
  Small:  100 KB, 200 KB, 10 KB  /  100 KB, 1 KB  /  10 KB, 100 KB
  Medium: 100 MB, 200 MB, 1 MB   /  100 MB, 1 MB  /  10 MB, 200 MB
  Large:  1 GB, 2 GB, 10 MB      /  1 GB, 10 MB   /  100 MB, 2 GB

Synthetic Benchmark - Pipeline
Chart: average runtime for the medium workload. Optimization exercised: locality and location-aware scheduling.

Synthetic Benchmark - Reduce
Chart: average runtime for the medium workload. Optimization exercised: collocation and location-aware scheduling.

Synthetic Benchmark - Broadcast
Chart: average runtime for the medium workload. Optimization exercised: replication.

Not everything is perfect!
Chart: average runtime for the small workload (pipeline, broadcast, and reduce benchmarks).

Evaluation – ModFTDock
Diagram: the ModFTDock workflow. Chart: total application time on three different systems.

Evaluation – Highlights
WASS shows considerable performance gains on all benchmarks for the medium and large workloads (up to 18x faster than NFS and up to 2x faster than MosaStore).
ModFTDock is 20% faster on WASS than on MosaStore, and more than 2x faster than on NFS.
WASS performs worse on the small benchmarks due to metadata overheads and manager latency.

Summary
Problem: how can we improve storage performance for workflow applications?
Approach: a workflow-aware storage system (WASS) that moves from backend storage to intermediate storage and uses bi-directional communication via hints between the storage and the workflow runtime engine.
Future work: integrating more applications; large-scale evaluation.

THANK YOU
MosaStore: netsyslab.ece.ubc.ca/wiki/index.php/MosaStore
Networked Systems Laboratory: netsyslab.ece.ubc.ca