CCGrid 2014: Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression. Tekin Bicer, Jian Yin, and Gagan Agrawal. Ohio State University / Pacific Northwest National Laboratory.

Presentation transcript:

Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression
Tekin Bicer, Jian Yin, and Gagan Agrawal
Ohio State University / Pacific Northwest National Laboratory
CCGrid 2014

Introduction
Increasing parallelism in HPC systems
– Large-scale scientific simulations and instruments
– I/O is not nearly as scalable
Example: PRACE-UPSCALE
– Decreasing mesh sizes, i.e. more computation and more data
– 2 TB per day today; expectation: 10-100 PB per day
– Management of the data limits performance
“Big Compute” opportunities → “Big Data” problems
– The output of large-scale applications slows down the simulation
– Storage, management, and transfer issues of “Big Data”
– Reading large data limits analysis performance
Proposed remedy: compression

Introduction (cont.)
Community focus: storing, managing, and moving scientific datasets
Compression can further help
– Less data to store and move: increased I/O throughput, better data transfer performance
– Increased data analysis and simulation performance
But...
– Can it really benefit application execution? There is a tradeoff between CPU utilization and I/O idle time.
– What about integration with scientific applications? How much effort is required from scientists to adapt their applications?

Scientific Data Management Libraries
Widely used by the community: PnetCDF (NetCDF), HDF5, ...
NetCDF format: portable, self-describing, space-efficient
High-performance parallel I/O through MPI-IO
– Optimizations: collective and independent calls
– Hints about the file system
No support for compression (a minimal uncompressed write sketch follows below)
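For context on what the unmodified library offers, here is a minimal sketch of an uncompressed collective write using the standard PnetCDF C API. The file, dimension, and variable names are illustrative, not taken from the talk, and error checking is omitted.

#include <mpi.h>
#include <pnetcdf.h>

/* Each rank writes its own slab of a 1-D float variable with a collective call. */
int main(int argc, char **argv) {
    int rank, nprocs, ncid, dimid, varid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const MPI_Offset local_len = 1024;               /* elements per rank (illustrative) */
    float buf[1024];
    for (int i = 0; i < 1024; i++) buf[i] = (float)rank;

    ncmpi_create(MPI_COMM_WORLD, "out.nc", NC_CLOBBER | NC_64BIT_DATA,
                 MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "x", local_len * nprocs, &dimid);
    ncmpi_def_var(ncid, "temperature", NC_FLOAT, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    MPI_Offset start = rank * local_len, count = local_len;
    ncmpi_put_vara_float_all(ncid, varid, &start, &count, buf);  /* collective write */

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}

Because the variable's type and shape are fixed in the header, every rank can compute its file offset up front; this is exactly the property that compression breaks, as the next slides explain.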

Parallel and Transparent Compression for PnetCDF
Parallel write operations (without compression)
– Offsets follow from the sizes of data types and variables
– Data item locations are known in advance
Parallel write operations with compression
– Variable-size chunks
– No a priori knowledge about the locations
– Many processes write at once

Parallel and Transparent Compression for PnetCDF (cont.)
Desired features while enabling compression:
Parallel compression and write
– Sparse and dense storage
Transparency
– Minimum effort from the application developer
– Integration with PnetCDF
Performance
– Different variables may require different compression
– Domain-specific compression algorithms

Outline
– Introduction
– Scientific Data Management Libraries
– PnetCDF Compression Approaches
– A Compression Methodology
– System Design
– Experimental Results
– Conclusion

Compression: Sparse Storage
– Chunks/splits are created
– The compression layer applies the user-provided algorithms
– Compressed splits are written at their original offset addresses
– I/O can still benefit: only compressed data goes through the I/O path
– No benefit for storage space, since the file layout is unchanged (see the sketch below)
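A hedged sketch of what the sparse-storage write path amounts to, as I read the slide; compress_split() is a hypothetical placeholder for the user-provided algorithm, and the real layer does this inside PnetCDF rather than directly on an MPI file handle.

#include <mpi.h>

/* Hypothetical placeholder for the user-provided compression algorithm. */
extern int compress_split(const char *in, int in_len, char *out);

/* Sparse storage: each process compresses its split and writes the compressed
 * bytes at the split's ORIGINAL file offset. No metadata exchange is needed,
 * but the file keeps its uncompressed size. */
void sparse_write(MPI_File fh, MPI_Offset orig_offset,
                  const char *split, int split_len, char *scratch)
{
    int comp_len = compress_split(split, split_len, scratch);
    /* Collective write; every process targets its precomputed original offset. */
    MPI_File_write_at_all(fh, orig_offset, scratch, comp_len, MPI_BYTE,
                          MPI_STATUS_IGNORE);
}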

Compression: Dense Storage
– Generated compressed splits are appended locally
– New offset addresses are calculated, which requires a metadata exchange
– All compressed data blocks are written using a collective call
– The generated file is smaller
– Advantages: both I/O throughput and storage space (see the sketch below)
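A hedged sketch of the offset calculation that dense storage needs; this shows one natural way to realize the metadata exchange (an exclusive prefix sum over compressed sizes), not necessarily the exact mechanism used in the paper.

#include <mpi.h>

/* Dense storage: compressed split sizes differ per process, so each process
 * derives its packed write position from an exclusive prefix sum over the
 * compressed sizes, then all processes write with a collective call. */
void dense_write(MPI_File fh, const char *comp_buf, long long comp_len)
{
    long long my_offset = 0;
    MPI_Exscan(&comp_len, &my_offset, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) my_offset = 0;   /* MPI_Exscan leaves rank 0's result undefined */

    MPI_File_write_at_all(fh, (MPI_Offset)my_offset, comp_buf,
                          (int)comp_len, MPI_BYTE, MPI_STATUS_IGNORE);
}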

Compression: Hybrid Method
– The developer provides an expected compression ratio and an error ratio
– Does not require a metadata exchange
– Error padding absorbs data that overflows its region
– The generated file is smaller
– Relies on user inputs
Offset calculation: Off' = Off x (1 / (comp_ratio - err_ratio))   (a worked example follows)
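To make the formula concrete, here is a worked example with made-up ratios (not numbers from the talk): with comp_ratio = 2.0 and err_ratio = 0.25, a split whose uncompressed offset is 1024 MB is written at 1024 x 1/(2.0 - 0.25) ≈ 585 MB. Perfect 2x compression would place it at 512 MB, so each split's region is roughly 14% larger than its expected compressed size; that slack is the error padding that absorbs splits which compress worse than expected.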

A Compression Methodology
Common properties of scientific datasets:
– They consist of floating-point numbers
– Neighboring values are closely related
Generic compression cannot perform well; domain-specific solutions can help
Approach: differential compression
– Predict the values of neighboring cells
– Store only the difference (a sketch follows below)
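The slide names the idea without code, so here is a minimal, hedged sketch of one way prediction-based float compression can look; the last-value predictor, one-byte header, and function name are my own simplifications, not the codec described in the talk.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Differential (prediction-based) compression sketch: predict each value as
 * the previous one, XOR the raw bit patterns, and store a one-byte header
 * (number of significant low-order bytes) plus only those bytes. Neighboring
 * values share sign/exponent/high-mantissa bits, so the XOR has many leading
 * zero bytes. The codec in the talk packs 5 bits for the prediction plus the
 * difference; this sketch only shows the general idea at byte granularity. */
size_t delta_compress_floats(const float *in, size_t n, uint8_t *out)
{
    uint32_t prev = 0;
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cur;
        memcpy(&cur, &in[i], sizeof cur);       /* reinterpret float bits */
        uint32_t diff = cur ^ prev;             /* zero where prediction was right */

        int nbytes = 0;                         /* bytes up to highest nonzero byte */
        for (uint32_t d = diff; d != 0; d >>= 8) nbytes++;

        out[pos++] = (uint8_t)nbytes;           /* header: how many bytes follow */
        for (int b = 0; b < nbytes; b++)
            out[pos++] = (uint8_t)(diff >> (8 * b));
        prev = cur;
    }
    return pos;                                 /* compressed size in bytes */
}

Decompression reverses the steps: read the byte count, rebuild the XOR difference, and recover the value as diff ^ prev, which also shows why the scheme is lossless.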

Example: GCRM Temperature Variable Compression
– Example: a temperature record; the values of neighboring cells are highly related
– X' table (after prediction) and X'' compressed values: shown as figures on the slide
– 5 bits for the prediction plus the difference
– Supports lossless and lossy compression
– Fast, with good compression ratios

PnetCDF Data Flow
1. Generated data is passed to the PnetCDF library
2. Variable information is gathered from the NetCDF header
3. Splits are compressed with the user-defined compression algorithm
4. Metadata information is exchanged
5. Parallel write operations
6. Synchronization and global view: the NetCDF header is updated

Outline
– Introduction
– Scientific Data Management Libraries
– PnetCDF Compression Approaches
– A Compression Methodology
– System Design
– Experimental Results
– Conclusion

Experimental Setup
Local cluster:
– Each node has 8 cores (Intel Xeon E5630, 2.53 GHz) and 12 GB memory
– InfiniBand network
– Lustre file system: 8 OSTs, 4 storage nodes, 1 metadata server
Microbenchmarks: 34 GB dataset
Two data analysis applications (AT, MATT): 136 GB dataset
Scientific simulation application (Mantevo Project MiniMD): 49 GB dataset

Exp: (Write) Microbenchmarks (results shown as figures on the slide)

Exp: (Read) Microbenchmarks (results shown as figures on the slide)

Exp: Simulation (MiniMD) (figures: application execution times and application write times)

Exp: Scientific Analysis (AT) (results shown as figures on the slide)

Conclusion
– Scientific data analysis and simulation applications deal with massive amounts of data
– Management of “Big Data”: I/O throughput affects performance; transparent compression with minimum integration effort is needed
– Proposed two compression methods
– Implemented a compression layer in PnetCDF, ported our proposed methods, and added a scientific data compression algorithm
– Evaluated our system
  – MiniMD: 22% performance improvement, 25.5% storage space savings
  – AT, MATT: 45.3% performance improvement, 47.8% storage space savings

Thanks

PnetCDF: Example Header (shown as a figure on the slide)

Exp: Microbenchmarks
– Dataset size: 34 GB (per timestep: 270 MB)
– Compressed size: 17.7 GB (per timestep: 142 MB)
– Chunk size: 32 MB
– Number of processes: 64
– Stripe count: 8
Figure: comparing write times with varying stripe sizes