RAMSES@Ohio-State Year 2 Updates
Outline
- Cost-Model-Based Optimization of Distributed Joins (under submission to SC 2016)
- Modeling In-Situ Analytics Using SKOPE and an Analytical Model (paper in preparation)
DistriPlan - Context
- Processing (scientific) array data:
  - Geographically distributed
  - Processed using structured operators
  - Stored in file data stores (NetCDF/HDF5/...)
- What we needed: an optimizer and an execution engine that can execute queries remotely and accumulate the results
- What complicates it: joins! Many practical operations come down to a join over values or dimensions
Motivation
- A join can be processed on one node or split across nodes (figure: candidate data-movement options between nodes)
- In total there are 2^n options: too many for current optimizers to enumerate!
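A minimal sketch of why the plan space explodes. The model below is an assumption for illustration (each of n operators is independently assigned to one of two sites); real DistriPlan plans also vary in join order and data-movement direction, so the true space is at least this large.

```python
from itertools import product

def enumerate_placements(n_operators, sites=("local", "remote")):
    """Enumerate every site assignment for n join operators.

    Illustrative only: even this simplified binary choice already
    yields len(sites) ** n_operators candidate plans, doubling with
    every additional operator.
    """
    return list(product(sites, repeat=n_operators))

# Three operators already give 2**3 = 8 candidate placements.
```

With n in the dozens, exhaustive enumeration is infeasible, which motivates the pruning and cost model that follow.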
Introducing DistriPlan
- A Cost-Based Optimizer (CBO) for a distributed array-data querying engine
- Introduces two new features:
  - Isomorphic plan pruning
  - A cost model for distributed data
DistriPlan (Isomorphism Pruning)
- Isomorphic plans are the same plan: they are counted once, saving engine processing
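A sketch of one way isomorphism pruning can work, assuming joins are commutative so that mirrored subtrees are equivalent. The canonicalization rule below (sort the two children of each join) is a simplified stand-in, not DistriPlan's actual algorithm.

```python
def canonical(plan):
    """Return a canonical form for a join-plan tree.

    A plan is either a leaf (a base array name) or a tuple
    ("join", left, right). Sorting the canonicalized children makes
    isomorphic (mirrored) plans compare equal.
    """
    if isinstance(plan, str):      # leaf: a base array
        return plan
    op, left, right = plan
    l, r = sorted((canonical(left), canonical(right)), key=repr)
    return (op, l, r)

def prune_isomorphic(plans):
    """Keep one representative per isomorphism class."""
    seen, kept = set(), []
    for p in plans:
        key = canonical(p)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept
```

Deduplicating on the canonical key is what lets the optimizer cost each equivalence class once instead of once per mirrored variant.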
DistriPlan (Cost Model)
- Each plan node's expected result size and evaluated cost (formulas on slide)
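A hedged sketch of the two quantities named above. The selectivity constant, the per-row CPU and network costs, and the plan representation are all illustrative assumptions, not DistriPlan's actual formulas; the structure (recursive cost plus a transfer term when a result crosses sites) is the general idea.

```python
def expected_rows(node):
    """Expected result size of a plan node (hypothetical model).

    Leaves carry known cardinalities; a join's output is estimated
    with a fixed selectivity factor -- a textbook estimate, not the
    paper's exact formula.
    """
    SELECTIVITY = 0.1
    if "rows" in node:                     # leaf
        return node["rows"]
    return SELECTIVITY * expected_rows(node["left"]) * expected_rows(node["right"])

def evaluated_cost(node, cpu_per_row=1e-8, net_per_row=1e-6):
    """Recursive cost: children's cost, plus local join work, plus
    shipping the result when the node's output crosses sites."""
    if "rows" in node:                     # leaf: no work modeled
        return 0.0
    left, right = node["left"], node["right"]
    cost = evaluated_cost(left) + evaluated_cost(right)
    cost += cpu_per_row * expected_rows(left) * expected_rows(right)
    if node.get("remote"):                 # result shipped over the WAN
        cost += net_per_row * expected_rows(node)
    return cost
```

The optimizer then simply keeps the plan with the minimum evaluated cost.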
Feasible Problem Scale
(figures: performance and feasible problem scale)
Performance
- CBO-optimized plans always perform at least as well as non-CBO plans
Performance
- Accurate parameters for the CBO equations are essential for correct cost evaluation
Summary
- Isomorphism pruning is essential
- The CBO improves performance tremendously
- So far we have only emulated wide-area networks
- Next: start using models developed by others?
Modeling In-Situ Analytics
- Recap: Smart is a MapReduce-like framework for developing in-situ analytics
- Modeling problem: predict the time a given analytics task will take
- Modeled a typical loop using SKOPE
- Developed an analytical model
Canonical Random Write Loop
A canonical random write loop has the following characteristics*:
- The reduction objects are updated only by associative and commutative operations
- There are no other loop-carried dependencies
- The element(s) of the reduction object updated in a particular iteration cannot be determined statically or with inexpensive runtime preprocessing
Structure of a typical canonical random write loop for a data mining task (shown on slide).
* Jin, Ruoming, and Gagan Agrawal. "Performance Prediction for Random Write Reductions: A Case Study in Modeling Shared Memory Programs." ACM SIGMETRICS Performance Evaluation Review 30.1 (2002).
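The loop structure referenced above can be sketched with a histogram reduction, a common data-mining kernel; this is an illustrative example, not the code from the slide. It satisfies all three characteristics: the update (`+= 1`) is associative and commutative, there are no other loop-carried dependencies, and the bin touched in each iteration depends on the data value, so it cannot be determined statically.

```python
def histogram(data, n_bins, lo, hi):
    """Canonical random write loop: which element of the reduction
    object is updated depends on the input value, so the write
    target is unknown until runtime."""
    bins = [0] * n_bins                               # the reduction object
    width = (hi - lo) / n_bins
    for x in data:                                    # no other loop-carried deps
        idx = min(int((x - lo) / width), n_bins - 1)  # random write target
        bins[idx] += 1                                # associative/commutative update
    return bins
```

Because the update is associative and commutative, the loop can be parallelized with per-thread replicas of `bins` that are merged at the end, which is exactly what makes this pattern amenable to MapReduce-style frameworks like Smart.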
A Closed-Form Expression Model for the Canonical Loop

Parameter          Description
S_elem             size of input elements (GB)
S_obj              size of reduction objects (KB)
N_iter             number of iterations
N_node             number of nodes
a, b, c, d, e, f   factors determined by regression
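The slide's closed-form expression itself is not reproduced above, so the sketch below only illustrates how the regression factors could be fit from measured runs. The feature set (per-node compute term, reduction-object term, node-count term, constant) is an assumption for the example, not the paper's actual expression.

```python
import numpy as np

def fit_model_factors(runs):
    """Fit factors for a linear-in-features performance model.

    `runs` is a list of (S_elem, S_obj, N_iter, N_node, time)
    tuples. The features below are illustrative assumptions.
    """
    X, y = [], []
    for s_elem, s_obj, n_iter, n_node, t in runs:
        X.append([s_elem * n_iter / n_node,   # per-node compute term
                  s_obj * n_iter,             # reduction-object term
                  n_node,                     # communication/sync term
                  1.0])                       # constant overhead
        y.append(t)
    factors, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return factors                            # fitted (a, b, c, d)
```

Given enough measured runs spanning the parameter ranges, ordinary least squares recovers the factors; predictions for unseen configurations then come from plugging new parameters into the same expression.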
Modeling Cache Performance
For a two-level cache hierarchy, the memory access time can be modeled as:

    T_mem = N_mem * (t_L1-hit + P_L1-miss * t_L2-hit + P_L2-miss * L_mem)

For the applications used in our experiments, the cache miss rate depends on the dataset size (D), the data access stride (s), the cache capacities (C1 and C2 for L1 and L2, respectively), and the block sizes (B1 and B2 for L1 and L2, respectively).
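The access-time formula above is direct to evaluate; the sketch below also includes a miss-rate estimate for a strided streaming access, which is an assumption for illustration (one miss per cache block touched once the working set exceeds capacity), not the paper's exact miss-rate model.

```python
def mem_access_time(n_mem, t_l1_hit, p_l1_miss, t_l2_hit, p_l2_miss, l_mem):
    """Two-level cache model from the slide:
    T_mem = N_mem * (t_L1-hit + P_L1-miss * t_L2-hit + P_L2-miss * L_mem)
    """
    return n_mem * (t_l1_hit + p_l1_miss * t_l2_hit + p_l2_miss * l_mem)

def strided_miss_rate(dataset_bytes, stride, capacity, block):
    """Illustrative miss-rate estimate (an assumption): if the
    working set fits in the cache, reuse makes misses negligible;
    otherwise roughly one miss per block, i.e. min(1, stride/block)."""
    if dataset_bytes <= capacity:
        return 0.0
    return min(1.0, stride / block)
```

Feeding `strided_miss_rate` results for each level into `mem_access_time` is the kind of composition SKOPE performs once the hardware parameters and skeleton-derived access pattern are known.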
Modeling Cache Performance
By defining the parameters in SKOPE's hardware model and collecting memory access information (reads, writes, access stride) from code skeletons, we can use SKOPE to predict cache performance.
(figure: cache performance model for a two-level cache hierarchy)
Modeling Page Fault Penalty
When the dataset size exceeds a threshold (~10 GB), the execution time grows super-linearly and the number of page faults increases markedly.
(figure: execution time and page faults of in-situ analytics with varying output dataset size per time step)
Modeling Page Fault Penalty
The page fault penalty can be modeled as:

    T_pagefault = N_mem * P_pagefault * L_pagefault

The page fault rate depends on the memory required by the application (M_req) and the available physical memory capacity (M):

    P_pagefault = (M_req - M) / M_req

Memory allocation information can be collected from code skeletons; the other parameters can be defined in hardware models.
Predicting Scalability of In-Situ Analytics (Smart)
(figure: predicted scalability of Smart, conducted on the TACC Stampede cluster using 4 nodes)
Predicting Scalability of In-Situ Analytics (Smart)
- For applications with better scalability, such as histogram and moving average, both the prediction framework and the analytical model make accurate predictions
- For k-means, which incurs more synchronization overhead, the extended SKOPE framework outperforms the analytical-model approach
Predicting Performance of In-Situ Analytics (Smart) over Large Datasets Involving Page Faults
(figure: comparison of predicted performance between the original and extended SKOPE, conducted on one node of the OSU RI cluster using 4 threads, memory capacity = 12 GB)
Proposed Work
- Predicting performance for stencil computation:
  - Abstracting the data access pattern of stencil computation
  - Modeling cache performance
  - Modeling the performance of optimization approaches such as tiling