Adaptive Data Refinement for Parallel Dynamic Programming Applications


Adaptive Data Refinement for Parallel Dynamic Programming Applications
Shanjiang Tang 1,2, Ce Yu 1, Bu-Sung Lee 2,3, Chao Sun 1, Jizhou Sun 1
1 Tianjin University, China   2 Nanyang Technological University, Singapore   3 HP Labs, Singapore
25th May 2012

Outline
- Background & Motivation
- Adaptive Data Refinement Approach
- Evaluation
- Conclusion
Tianjin University

Dynamic Programming (DP)
What is Dynamic Programming (DP)?
- DP is a popular algorithm design technique for solving many decision and optimization problems by decomposing the problem at hand into a sequence of interrelated decision or optimization steps that are solved one after another.
- Generally, if r represents the cost of a solution composed of sub-problems x1, x2, ..., xl, then r can be written as: r = g(f(x1), f(x2), f(x3), ..., f(xl)).
Examples:
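As a concrete instance of the recurrence above (our illustration, not taken from the slides), edit distance fills a DP matrix in which each cell is a subproblem f(x) and g combines neighbouring subproblems by taking a minimum:

```c
#include <string.h>

/* Minimal DP sketch: edit distance between two strings.
   Cell d[i][j] is a subproblem; the recurrence combines the three
   neighbouring subproblems, matching r = g(f(x1), ..., f(xl))
   with g = min. Assumes inputs of at most MAXN characters. */
#define MAXN 64

int edit_distance(const char *a, const char *b) {
    int m = (int)strlen(a), n = (int)strlen(b);
    int d[MAXN + 1][MAXN + 1];
    for (int i = 0; i <= m; i++) d[i][0] = i;   /* delete all of a */
    for (int j = 0; j <= n; j++) d[0][j] = j;   /* insert all of b */
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            int del = d[i - 1][j] + 1;
            int ins = d[i][j - 1] + 1;
            int sub = d[i - 1][j - 1] + cost;
            int best = del < ins ? del : ins;
            d[i][j] = best < sub ? best : sub;  /* g(...) = min */
        }
    }
    return d[m][n];
}
```

Each cell depends only on its up, left, and diagonal neighbours, which is exactly the dependency pattern the later slides parallelize.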

DP Applications: Bioinformatics
DP Algorithms in Bioinformatics

DP Parallelism: Block-based Partitioning vs DAG
a. Block divisions of DP matrix    b. DAG mapping
Figure 1: The one-to-one mapping between the data blocks of the DP matrix and its corresponding DAG
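The one-to-one block-to-DAG mapping of Figure 1 can be sketched as follows (a generic illustration with our own names, not EasyPDP's actual API): for the usual up/left/diagonal DP dependency pattern, each block (i, j) of the partitioned matrix becomes a DAG node whose in-degree is the number of predecessor blocks that exist:

```c
/* Hypothetical sketch: in-degree of the DAG node for block (i, j)
   of a block-partitioned DP matrix, assuming each block depends on
   its up, left, and diagonal neighbours when they exist. */
int block_indegree(int i, int j) {
    int deg = 0;
    if (i > 0) deg++;            /* up       */
    if (j > 0) deg++;            /* left     */
    if (i > 0 && j > 0) deg++;   /* diagonal */
    return deg;
}
```

A block becomes runnable once all its predecessors have finished; block (0, 0) has in-degree 0 and is ready immediately, which is what starts the wavefront.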

DP Parallelism: Programming Model
DAG Data Driven Model
S.J. Tang, C. Yu, et al., "EasyPDP: An Efficient Parallel Dynamic Programming Runtime System for Computational Biology," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 5, pp. 862-872, May 2012.

DP Parallelism Framework: EasyPDP
EasyPDP Framework
- A shared-memory implementation of the DAG Data Driven Model for DP applications
- Adopts a dynamic worker pool
- The current version works with C/C++ and uses Pthreads
- Source code (now open source) can be downloaded at:
http://easypdp.sourceforge.net/
http://cs.tju.edu.cn/orgs/hpclab/release/EasyPDP/

DP Parallelism: Workload Imbalance
Wavefront Computation
- Non-saturated Computing Domain (NCD)
- Saturated Computing Domain (SCD)
Workload Imbalance
- During the NCD there are not enough tasks to share evenly among the workers
- The NCD becomes serious when there are hundreds of workers, e.g., in a cluster environment
- Workloads vary in the SCD for irregular DP
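The NCD/SCD distinction can be made concrete with a small count (our illustration, not from the slides): on anti-diagonal k of an n x n block grid, the number of independent, simultaneously runnable blocks is min(k + 1, n, 2n - 1 - k). The early and late diagonals (the NCD) offer fewer ready blocks than there are workers, while the middle diagonals (the SCD) can saturate them:

```c
/* Number of independent blocks on anti-diagonal k (0 <= k <= 2n-2)
   of an n x n block grid under up/left/diagonal dependencies. */
int ready_blocks(int n, int k) {
    int r = k + 1;                       /* grows at the start  */
    if (r > n) r = n;                    /* capped by grid size */
    if (r > 2 * n - 1 - k) r = 2 * n - 1 - k;  /* shrinks at the end */
    return r;
}
```

With, say, 8 workers and a 4 x 4 block grid, no diagonal ever offers more than 4 ready blocks, so at least half the workers are always idle; this is the imbalance the adaptive refinement targets.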

Outline
- Background & Motivation
- Adaptive Data Refinement Approach
- Evaluation
- Conclusion

Adaptive Data Refinement Approach
Dynamic Data Repartitioning
- Re-partition a data block into smaller ones whenever workload imbalance is detected (i.e., there are idle workers)
Challenging Issues:
- How to preserve the data dependencies of repartitioned block data?
- How to detect workload imbalance at runtime?
- How many partitions are suitable for a data repartitioning?

Adaptive Data Refinement Approach
Multi-Level DAG Model

Adaptive Data Refinement Approach
Workload Imbalance Detection
- Based on the principle that load imbalance is assumed whenever there are idle workers before the whole computation finishes
- The system only needs to periodically check the status of the workers in the pool
Size of Partitions for a Data Block
- We give an empirical result by experiment: it works well to set the number of partitions to the square of the number of workers
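The two rules on this slide can be sketched as two small helpers (our formulation; the names and signatures are hypothetical, not EasyPDP's API):

```c
#include <stdbool.h>

/* Detection rule from the slide: imbalance is assumed whenever some
   workers are idle before the whole computation has finished. The
   runtime would call this from its periodic status poll. */
bool should_repartition(int total_workers, int busy_workers, bool finished) {
    return !finished && busy_workers < total_workers;
}

/* Empirical sizing rule from the slide: split each block dimension
   by a divider d equal to the number of workers w, which yields
   w * w partitions per repartitioned block. */
int refine_partitions(int workers) {
    return workers * workers;
}
```

For example, with 8 workers of which only 5 are busy mid-computation, a repartitioning would be triggered and a block would be split into 64 sub-blocks.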

Adaptive Data Refinement Approach
Mechanism

Outline
- Background & Motivation
- Adaptive Data Refinement Approach
- Evaluation
- Conclusion

Analysis of Detailed Execution

Varying Values of Partitions
Figure 8: Exploring the suitable number of partitions for repartitioning.
Findings: it works well to set the divider d to the number of workers.

Performance Evaluation
Figure 9: The improvement rate of the enhanced EasyPDP system with the adaptive data refinement mechanism over the former EasyPDP version with a fixed block size.

Outline
- Background & Motivation
- Adaptive Data Refinement Approach
- Evaluation
- Conclusion

Conclusion
Adaptive Data Refinement Mechanism
- Currently, we implement it in our EasyPDP framework; in the future, we will add it to our distributed framework (EasyHPS)
- The idea is not limited to DP applications; it can be used in any other application whose dependencies can be modeled as a DAG

Thanks!