DOSAS: Mitigating the Resource Contention in Active Storage Systems
Chao Chen (1), Yong Chen (1) and Philip C. Roth (2)
(1) Texas Tech University  (2) Oak Ridge National Laboratory
Cluster-12

Outline
- Background
- Active Storage
- Motivation
- DOSAS (Dynamic Operation Scheduling Active Storage)
- Evaluation
- Conclusion and future work

Background
Applications from areas such as climate science and astrophysics are becoming more and more data intensive, reading and writing large amounts of data:
- FLASH (Buoyancy-Driven Turbulent Nuclear Burning): 75TB-300TB
- Climate science: 10TB-355TB
- GTC: 56TB per 100-hour run, generating 260GB every 120 seconds
- S3D: 90TB per 120-hour run

Background
Processing model in the current architecture:
- Data must be transferred from Storage Nodes to Compute Nodes over the network
- This is very time consuming
- I/O operations can dominate system performance
[Figure: a Compute Node running the application and analysis kernel issues an I/O request; the data travels from the Storage Node's disk over the network]

Active Storage
- Active Storage was proposed to mitigate this issue, and has attracted intensive attention
- It moves appropriate computations close to where the data is stored, as opposed to moving the data to the compute devices
- Network bandwidth cost is reduced
[Figure: the analysis kernel now runs on the Storage Node; only the result, not the raw data, crosses the network to the Compute Node]

Active Storage
Two examples of Active Storage:
Felix et al. proposed the first prototype, based on Lustre [1,2,3]:
- Implemented in kernel space first
- Improved in user space later
[Figure: Lustre active storage stack — NAL, OST, ASOBD, OBDfilter, ext3, ASDEV — with the Processing Component in user space]
1. Evan J. Felix, Kevin Fox, Kevin Regimbal, and Jarek Nieplocha. "Active Storage Processing in a Parallel File System". In 6th LCI International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, North Carolina.
2. Juan Piernas, Jarek Nieplocha, Evan J. Felix. "Evaluation of Active Storage Strategies for the Lustre Parallel File System". Proceedings of the Supercomputing'07 Conference.
3. Juan Piernas, Jarek Nieplocha. "Efficient Management of Complex Striped Files in Active Storage". Proc. Euro-Par.

Active Storage
Son et al. proposed another prototype, based on PVFS [4]:
- It provides a more sophisticated prototype based on MPI
- Users can register their own processing kernels
[Figure: clients and servers connected by an interconnection network; an Active Storage API sits alongside the Parallel File System API on both client and server, with processing kernels running next to the disk and GPU on each server]
4. Seung Woo Son, Samuel Lang, Philip Carns, Robert Ross, and Rajeev Thakur. "Enabling Active Storage on Parallel I/O Software Stacks". In 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST).

Performance Improvement of Active Storage
[Figure: performance of the SUM operation for the Traditional Storage (TS) and Active Storage (AS) schemes [4]; Active Storage achieves a 50.9% improvement]

Contention: A Problem for Active Storage
- A high performance computing system may run dozens, or even hundreds, of applications simultaneously
- The system needs to deliver good performance to each of the running applications
[Figure: storage servers (Server 1 .. Server m) each hold an I/O queue of interleaved active I/O (AI) and normal I/O (NI) requests issued by processes p1..pn of applications APP1..APPm, with m < n]

Contention: A Problem for Active Storage
- Offloading computation to storage nodes can improve performance, but offloading too much computation causes resource contention and degrades overall performance
- DOSAS is proposed to balance the performance gain against its overhead
- It coordinates compute nodes and storage nodes to complete Active I/O requests automatically, achieving the best system performance
- It enhances the MPI-IO library and is easy to use for application programmers
[Figure: performance degradation under heavy offloading]

DOSAS Architecture
1. Active Storage Client: Active API, Processing Kernels
2. Active Storage Server: Contention Estimator, Active I/O runtime, Processing Kernels
Processing Kernels: a collection of predefined analysis operations that are widely used in data-intensive applications (such as k-means and Gaussian filter)
[Figure: client applications issue Normal I/O through the Parallel File System API and Active I/O through the extended API; the Active Storage Server on each storage node hosts the Contention Estimator, the Active I/O runtime, and the Processing Kernels next to the disks]

Active Storage Client
- Runs on each compute node
- Serves as an interface through the enhanced MPI-IO interface (Active API)
- Assists the storage nodes in completing the I/O without the intervention of applications (Processing Kernels)

Active API
- An operation parameter is added to the MPI-IO functions to invoke the related analysis kernel
- A result structure is used for returning the result of the operation
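The transcript does not preserve the actual C signatures of the Active API, so the following is only an illustrative Python toy model of the idea: a read call extended with an `op` parameter that selects a server-side kernel, plus a result structure the client fills. The names `active_read`, `ActiveResult`, and the `KERNELS` registry are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ActiveResult:
    """Hypothetical result structure: for an active I/O, the kernel's
    (small) output is returned in `buf` instead of the raw file data."""
    buf: bytes = b""
    size: int = 0

# Hypothetical registry of predefined processing kernels (e.g. SUM).
KERNELS = {"sum": lambda data: sum(data).to_bytes(8, "little")}

def active_read(file_bytes, offset, count, op, result):
    """Toy analogue of an MPI-IO read extended with an operation
    parameter. With op=None it behaves like a normal read; otherwise
    the named kernel runs 'server side' and only its result is shipped."""
    chunk = file_bytes[offset:offset + count]
    if op is None:
        return chunk                      # normal I/O: ship the data
    result.buf = KERNELS[op](chunk)       # active I/O: ship only the result
    result.size = len(result.buf)
    return result.buf
```

In use, a normal read returns the requested bytes, while an active read with `op="sum"` returns only the 8-byte reduction result, which is the bandwidth saving the architecture is after.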

Active Storage Server
- Runs on each storage node
- Schedules the Active I/O requests between compute node and storage node (Contention Estimator)
- Collects the results (Active I/O runtime)
- Serves active I/O requests (Processing Kernels)

Contention Estimator
The task of the Contention Estimator is to schedule I/O requests between the compute node and the storage node. Its scheduling algorithm decides whether a request should run using Active Storage or not.
I/O queue: Active I/O, Normal I/O, Active I/O, Normal I/O, Active I/O, ...
Which Active I/O requests should be served, and which should be rejected?

Contention Estimator
Table 1. Notations
- n: the number of I/O requests in the I/O queue
- k: the number of active I/O requests in the I/O queue
- d_i: the requested data size of the i-th I/O request
- D_A: the total data size requested by active I/O requests (sum of d_i over all active I/Os)
- D_N: the total data size requested by normal I/O requests (sum of d_i over all normal I/Os)
- D: the total requested data size in the I/O queue, D = D_A + D_N
- S_{C,op}: the computation capability of each storage node for operation op
- C_{C,op}: the computation capability of each compute node for operation op
- f(x): the time needed to compute on x bytes of data
- g(x): the time needed to transfer x bytes of data from storage node to compute node
- h(x): the size of the result computed on x bytes of data by an active I/O
- bw: the bandwidth of the compute-storage network

Contention Estimator
Based on the above notations, the execution time of a given schedule can be estimated:
- All active I/Os served: T = f(D_A) + g(D_N) + g(h(D_A)), i.e. the time for serving the active I/O on the storage node, plus the time for transferring the data of the normal I/O, plus the time for transferring the result of the active I/O
- All active I/Os rejected: T = g(D) + f(D_A), i.e. all requested data is transferred and the active computation runs on the compute node
Here f(x) = x / S_{C,op} (on a storage node) or f(x) = x / C_{C,op} (on a compute node), and g(x) = x / bw.
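A numerical sketch of these two estimates, under the assumption that compute and transfer phases run sequentially (an illustrative model; the paper's exact formulas are not reproduced in the transcript). The `reduction` parameter stands in for h(x)/x, the output-to-input ratio of the kernel, and the example numbers come from the slides (118 MB/s network, 860 MB/s SUM processing rate):

```python
def time_all_served(D_A, D_N, S_op, bw, reduction=0.0):
    """All active I/Os accepted: f(D_A) on the storage node at rate
    S_op, then g(D_N + h(D_A)) to ship normal data plus the results."""
    return D_A / S_op + (D_N + reduction * D_A) / bw

def time_all_rejected(D_A, D_N, C_op, bw):
    """All active I/Os rejected: g(D) ships everything, then f(D_A)
    runs on the compute node at rate C_op."""
    return (D_A + D_N) / bw + D_A / C_op

# SUM workload, sizes in MB and rates in MB/s; the SUM result is tiny,
# so reduction ~ 0. Serving the active I/Os wins comfortably here.
served = time_all_served(512, 512, S_op=860, bw=118)     # ~4.9 s
rejected = time_all_rejected(512, 512, C_op=860, bw=118) # ~9.3 s
```

The gap (roughly 4.9 s vs 9.3 s) mirrors the ~50% improvement the SUM benchmark reports: the expensive term is the network transfer, and active storage removes D_A from it.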

Contention Estimator
The scheduling problem is modeled as a binary optimization problem: for each active I/O, the storage node has two choices, accept or reject; normal I/O is always processed normally.
Goal: minimize the total time over the 2^k combinations, where x_i = 1 if the i-th active I/O is accepted and x_i = 0 if it is rejected.
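A minimal brute-force solver over the 2^k accept/reject vectors can make the optimization concrete. It reuses the same sequential cost model sketched above; treat it as an illustration of the problem shape, not the paper's actual algorithm:

```python
from itertools import product

def best_schedule(active_sizes, D_N, S_op, C_op, bw, reduction=0.0):
    """Return the accept/reject vector x (x_i = 1 means the i-th active
    I/O runs on the storage node) that minimises the estimated time."""
    best_x, best_t = None, float("inf")
    for x in product((0, 1), repeat=len(active_sizes)):
        acc = sum(s for s, xi in zip(active_sizes, x) if xi)
        rej = sum(s for s, xi in zip(active_sizes, x) if not xi)
        t = (acc / S_op                                 # f on storage node
             + (D_N + rej + reduction * acc) / bw       # all transfers
             + rej / C_op)                              # f on compute node
        if t < best_t:
            best_x, best_t = x, t
    return best_x, best_t

# Fast kernel, slow network: accept every active I/O.
x, t = best_schedule([128, 128, 128], D_N=256, S_op=860, C_op=860, bw=118)
```

Exhaustive search is only viable for small k; it serves here to show that the optimum flips from "accept all" to "reject all" as the storage node's effective rate drops relative to the network.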

Active I/O runtime
- Executes the scheduling policy of the Contention Estimator
- Interacts with the Active Storage Client and the Processing Kernels, returning the result by filling the buf argument of the result structure

Evaluation
Experiment platform and evaluated operations:
- Platform: Discfarm Cluster at Texas Tech
- Network bandwidth: 118 MB/s
- # of I/O requests per node: 1, 2, 4, 8, 16, 32, 64
- Data size of each I/O: 128MB, 256MB, 512MB and 1GB
- Total data size: 8GB, 16GB, 32GB and 64GB
- Evaluated schemes: TS (traditional storage), AS (current active storage), DOSAS (proposed approach)
Operations:
- SUM: 1 addition operation per data item; processing rate 860 MB/s
- 2D Gaussian Filter: 9 multiplication operations, 9 addition operations and 1 divide operation per data item; processing rate 80 MB/s

Impact of Resource Contention
[Figure: execution time of SUM (processing rate 860 MB/s) under the AS and TS schemes with an increasing number of I/O requests, each requesting 128MB of data]
[Figure: execution time of the 2D Gaussian Filter (processing rate 80 MB/s) under the AS and TS schemes with an increasing number of I/O requests, each requesting 128MB of data]
Network bandwidth: 118 MB/s
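The crossover behind these plots can be reproduced with a back-of-envelope calculation from the slides' measured rates (a simplified model, not the paper's measurements): when the kernel's processing rate (860 MB/s for SUM) far exceeds the 118 MB/s network, active storage wins even under load, but the 80 MB/s Gaussian filter turns the shared storage node into the bottleneck once many requests contend for it.

```python
BW = 118.0  # MB/s, compute-storage network bandwidth (from the slides)

def as_time(mb, rate, n):
    """Active storage: n concurrent active requests share one storage
    node's processing kernel, so their compute times serialize."""
    return n * mb / rate

def ts_time(mb, rate, n):
    """Traditional storage: all data crosses the shared network link,
    then each request computes in parallel on its own compute node."""
    return n * mb / BW + mb / rate

# 64 requests of 128MB each, matching the contention experiment.
sum_as, sum_ts = as_time(128, 860, 64), ts_time(128, 860, 64)      # AS wins
gauss_as, gauss_ts = as_time(128, 80, 64), ts_time(128, 80, 64)    # AS loses
```

For SUM the active scheme needs roughly 10 s against roughly 70 s for TS, while for the Gaussian filter the contended storage node needs over 100 s against roughly 71 s for TS; at n = 1 the Gaussian filter still favors active storage, which is exactly why a dynamic accept/reject decision is needed.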

DOSAS Performance
[Figure: performance comparison of TS, AS and DOSAS, each I/O request 256MB of data]

Scheduling Algorithm
Evaluating the correctness of the scheduling algorithm:

Table 2. Partial Scheduling Algorithm Evaluation Results
Case | Algorithm Decision | Practice | Judgment
1    | Active | Active | TRUE
2    | Active | Active | TRUE
3    | Active | Normal | FALSE
4    | Normal | Normal | TRUE
5    | Normal | Normal | TRUE
6    | Normal | Normal | TRUE
7    | Normal | Normal | TRUE
8    | Active | Active | TRUE
9    | Active | Active | TRUE
10   | Active | Normal | FALSE
11   | Normal | Normal | TRUE
12   | Normal | Normal | TRUE
13   | Normal | Normal | TRUE
14   | Normal | Normal | TRUE
15   | Active | Active | TRUE
16   | Active | Active | TRUE
17   | Active | Normal | FALSE
18   | Normal | Normal | TRUE
19   | Normal | Normal | TRUE

Correctness: 95%

Bandwidth
[Figure: bandwidth comparison of TS, AS and DOSAS]

Conclusion and Future Work
This study:
- Demonstrated that resource contention has a great impact on the performance of active storage
- Introduced DOSAS to mitigate this challenging issue
- Carried out experimental tests; the results show that DOSAS outperforms existing active storage architectures
- Evaluated the impact of the computation complexity of operators
Future work:
- Near-future exascale systems are likely to exhibit even more serious resource contention issues
- Further research is required to address these challenges

Thank You
For more information, see the Cluster-12 paper.
The paper has been authored by a contractor of the U.S. Government under Contract No. DE-AC05-00OR. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.