Research in Middleware Systems for In-Situ Data Analytics and Instrument Data Analysis
Gagan Agrawal, The Ohio State University
(Joint work with Yi Wang, Yu Su, Tekin Bicer, and others)

Outline
- Middleware systems
- Work on in-situ analysis
- Analysis of instrument data
- Compression/summarization of streaming data
- Post-analysis using just the summary

Before introducing our method, let's first look at what in-situ analysis is and why we need it. By definition, in-situ analysis is the process of transforming data at run time: each time data is collected or simulated in memory, several kinds of transforms are applied before the data is written to disk or transferred over the network.

In Situ Analysis – Simulation Data
In-situ algorithms (algorithm/application level):
- No disk I/O
- Indexing, compression, visualization, statistical analysis, etc.
In-situ resource scheduling systems (platform/system level):
- Enhance resource utilization
- Simplify the management of analytics code
- GoldRush, GLEAN, DataSpaces, FlexIO, etc.
These two levels have been extensively studied, but are they seamlessly connected? Is there anything to do between them?

Opportunity Explore the Programming Model Level in In-Situ Environment Between application level and system level Hides all the parallelization complexities by simplified API A prominent example: MapReduce + In Situ

Challenges Hard to Adapt MR to In-Situ Environment 4 Mismatches MR is not designed for in-situ analytics 4 Mismatches Data Loading Mismatch Programming View Mismatch Memory Constraint Mismatch Programming Language Mismatch

System Overview In-Situ System = Shared-Memory System + Combination = Distributed System – Partitioning In-Situ System Distributed System Shared-Memory System Distributed System = Partitioning + Shared-Memory System + Combination In-Situ System = Shared-Memory System + Combination = Distributed System – Partitioning This is because the input data is automatically partitioned based on simulation code.

Two In-Situ Modes Space Sharing Mode: Enhances resource utilization when simulation reaches its scalability bottleneck Time Sharing Mode: Minimizes memory consumption

Smart vs. Spark To Make a Fair Comparison Bypass programming view mismatch Run on an 8-core node: multi-threaded but not distributed Bypass memory constraint mismatch Use a simulation emulator that consumes little memory Bypass programming language mismatch Rewrite the simulation in Java and only compare computation time 40 GB input and 0.5 GB per time-step K-Means Histogram 62X 92X

Smart vs. Low-Level Implementations Setup Smart: time sharing mode; Low-Level: OpenMP + MPI Apps: K-means and logistic regression 1 TB input on 8–64 nodes Programmability 55% and 69% parallel codes are either eliminated or converted into sequential code Performance Up to 9% extra overheads for k-means Nearly unnoticeable overheads for logistic regression Logistic Regression K-Means Up to 9% extra overheads for k-means: low-level implementation can store reduction objects in contiguous arrays, whereas Smart stores data in map structures, and extra serialization is required for each synchronization. Nearly unnoticeable overheads for logistic regression: only a single key-value pair created.

Tomography at Advanced Photon Source Data deluge is happening in almost all science domains. Light source facilities in US generate several TBs of data per day now and is poised to increased by two orders of magnitude in the next few years. Figure shows the tomography system installed at APS. A sample (inserted by a robot for high-throughput operation) is rotated while synchrotron radiation is passed through it and images are captured by a CCD camera. The current camera-storage bus allows data to be transferred to disk at 100 frames per second (fps), corresponding to 820 MB/s or 67.6 TB/day. Scientific discovery at experimental facilities such as light sources often limited by computational and computing capabilities. EuroPar’15

Tomographic Image Reconstruction Analysis of tomographic datasets is challenging Long image reconstruction/analysis time E.g. 12GB Data, 12 hours with 24 Cores Different reconstruction algorithms Longer computation times Input dataset < Output dataset 73MB vs. 476MB Parallelization using MATE+ Predecessor of Smart System Beam time at light source facilities is a precious commodity, booked months in advance and typically for only a relatively short period of time (a day or perhaps a week). The data need to be analyzed rapidly so that results from one experiment can guide selection of the next—or even influence the course of a single experiment. Otherwise, the users collect as much as they can in the given time – sometimes the data is not useful, sometimes they collect more data than they need. Iterative reconstruction algorithms are computationally intensive and the output data size of the reconstruction is greater than the input data size. EuroPar’15

Mapping to a MapReduce-like API Inputs IS: Assigned projection slices Recon: Reconstruction object dist: Subsetting distance Output Recon: Final reconstruction object /* (Partial) iteration i */ For each assigned projection slice, is, in IS { IR = GetOrderedRaySubset(is, i, dist); For each ray, ir, in rays IR { (k, off, val) = LocalRecon(ir, Recon(is)); ReconRep(k) = Reduce (ReconRep(k), off, val); } /* Combine updated replicas */ Recon = PartialCombination(ReconRep) /* Exchange and update adjacent slices*/ Recon = GlobalCombination(Recon) Recon[i] Node 0 Partial Combination() Local Recon(…) Recon Rep Thread m … Partial Combination Phase Node n Thread 0 Local Reconstruction Phase Global Combination Phase Iteration i Projs. Recon [i-1] Inputs Tomography datasets are stored by using a scientific data format such as HDF5 [6]. Before beginning reconstruction, our middleware reads metadata information from the input dataset and allocates the resources required for the output dataset. Our middleware eliminates these race conditions by creating a replica of the output object for each thread, which we refer to as ReconRep in Fig. 3(a). Local reconstruction corresponds to the mapping phase of the MapReduce processing structure. The user implements the reconstruction algorithms in the local reconstruction function (LocalRecon in Fig. 3(a)); our middleware applies this function to each assigned data chunk. Each data chunk can be a set of slices (for per-slice parallelization) or a subset of rays in a slice (for in-slice parallelization). After all the rays in the assigned data chunks are processed, the threads that are working on the same slices synchronize and combine their replicas by using a user-defined function EuroPar’15

In Situ Analysis
How do we decide what data to save? The analysis cannot take much time or memory:
- Simulations already consume most of the available memory
- Scientists cannot accept much slowdown for analytics
How can insights be obtained in situ?
- The method must be memory- and time-efficient
What representation should be used for the data stored on disk?
- Effective for analysis/visualization
- Disk/network efficient

Specific Issues
Bitmaps as data summarization (sketched below):
- Utilize spare compute power for data reduction
- Save memory usage, disk I/O, and network transfer time
In-situ data reduction: generating bitmaps in situ
- Bitmap generation is time-consuming
- Uncompressed bitmaps have a large memory cost
In-situ data analysis: time-step selection
- Can bitmaps support time-step selection?
- How efficient is time-step selection using bitmaps?
Offline analysis: keep only the bitmaps instead of the data
- What types of analysis can bitmaps support?
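As a concrete illustration of bitmap-based summarization, here is a minimal sketch that bins floating-point values and builds one bit vector per bin. It is illustrative only: production bitmap indices compress the bit vectors with schemes such as WAH, which is what keeps the memory cost noted above manageable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct BitmapIndex {
    double lo, hi;                            // value range covered
    std::vector<std::vector<uint64_t>> bins;  // one bit vector per bin

    BitmapIndex(double lo_, double hi_, std::size_t nbins,
                std::size_t nelems)
        : lo(lo_), hi(hi_),
          bins(nbins, std::vector<uint64_t>((nelems + 63) / 64, 0)) {}

    // Bin each value and set the element's bit in that bin's vector.
    void build(const std::vector<double>& data) {
        const std::size_t nbins = bins.size();
        for (std::size_t i = 0; i < data.size(); ++i) {
            double t = (data[i] - lo) / (hi - lo);  // normalize to [0,1)
            std::size_t b = t <= 0.0 ? 0
                          : t >= 1.0 ? nbins - 1
                          : static_cast<std::size_t>(t * nbins);
            bins[b][i / 64] |= uint64_t{1} << (i % 64);
        }
    }

    // Number of elements whose value falls in bin b (a popcount scan;
    // __builtin_popcountll is a GCC/Clang builtin -- use std::popcount
    // from <bit> under C++20).
    std::size_t count(std::size_t b) const {
        std::size_t n = 0;
        for (uint64_t w : bins[b]) n += __builtin_popcountll(w);
        return n;
    }
};
```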

Time-Step Selection

Efficiency Comparison for In-Situ Analysis on MIC
Setup: Heat3D simulation on an Intel MIC processor (more cores, lower bandwidth); select 25 out of 100 time steps; 1.6 GB per time step (200 x 1000 x 1000); selection metric: conditional entropy.
- Full data (original): huge data-writing time
- Bitmaps: good scalability of both bitmap generation and bitmap-based time-step selection, and much smaller data-writing time
- Overall: 0.81x to 3.28x speedup
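For reference, conditional entropy between two time steps can be computed directly from the bitmap bins: joint counts come from ANDing bit vectors, so the raw data is never re-read. The sketch below reuses the hypothetical BitmapIndex from the earlier sketch and is my own rendering, not necessarily the exact selection algorithm used here.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Joint count: elements falling in bin a of step A and bin b of step B,
// obtained by ANDing the two bit vectors and popcounting the result.
static std::size_t jointCount(const std::vector<uint64_t>& va,
                              const std::vector<uint64_t>& vb) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < va.size(); ++i)
        n += __builtin_popcountll(va[i] & vb[i]);
    return n;
}

// H(B | A) = -sum over (a, b) of p(a, b) * log2 p(b | a).
// A low value means time step B adds little information beyond A,
// making it a weak candidate for selection.
double conditionalEntropy(const BitmapIndex& A, const BitmapIndex& B,
                          std::size_t nelems) {
    double h = 0.0;
    for (std::size_t a = 0; a < A.bins.size(); ++a) {
        std::size_t ca = A.count(a);  // marginal count for bin a
        if (ca == 0) continue;
        for (std::size_t b = 0; b < B.bins.size(); ++b) {
            std::size_t cab = jointCount(A.bins[a], B.bins[b]);
            if (cab == 0) continue;
            double pab = double(cab) / double(nelems);   // p(a, b)
            double pbGivenA = double(cab) / double(ca);  // p(b | a)
            h -= pab * std::log2(pbGivenA);
        }
    }
    return h;
}
```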