Harp: Collective Communication on Hadoop Bingjing Zhang, Yang Ruan, Judy Qiu

Outline
Motivations
– Why do we bring collective communication to big data processing?
Collective Communication Abstractions
– Our approach to optimizing data movement
– Hierarchical data abstractions and the operations defined on top of them
MapCollective Programming Model
– Extended from the MapReduce model to support collective communication
– Two-level BSP parallelism
Harp Implementation
– A plugin on Hadoop
– Component layers and the job flow
Experiments
Conclusion

Motivation
[Figure: K-means clustering in (iterative) MapReduce vs. K-means clustering with collective communication. The MapReduce version repeats map (M: compute local point sums), shuffle, reduce (R: compute global centroids), and broadcast in every iteration; the collective version keeps the map tasks alive so they control the iterations, compute local point sums, and combine them with a single allreduce.]
More efficient and much simpler!
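To make the contrast concrete, here is a minimal, self-contained simulation of the MapCollective loop shape; all names are illustrative, and the in-place update stands in for the allreduce that would combine sums across workers:

```java
import java.util.Arrays;

/** Serial sketch of the K-means loop under the MapCollective model.
 *  Illustrative only; not Harp's API. */
public class KMeansLoopSketch {
    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.2, 0.8}, {8, 8}, {7.5, 8.5}};
        double[][] centroids = {{0, 0}, {10, 10}};
        for (int iter = 0; iter < 10; iter++) {
            // M: each map task computes local point sums and counts.
            double[][] sums = new double[centroids.length][2];
            int[] counts = new int[centroids.length];
            for (double[] p : points) {
                int c = nearest(p, centroids);
                sums[c][0] += p[0];
                sums[c][1] += p[1];
                counts[c]++;
            }
            // allreduce: with several workers, sums and counts would be
            // combined here by one collective call, replacing the
            // shuffle + reduce + broadcast of an iterative MapReduce job.
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] > 0) {
                    centroids[c][0] = sums[c][0] / counts[c];
                    centroids[c][1] = sums[c][1] / counts[c];
                }
            }
        }
        System.out.println(Arrays.deepToString(centroids));
    }

    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dx = p[0] - centroids[i][0], dy = p[1] - centroids[i][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }
}
```

Note that the control loop lives inside the task: one job runs all iterations, with one collective call per iteration.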

Large-Scale Data Analysis Applications
Iterative applications:
– Local data cached and reused between iterations
– Complicated computation steps
– Large intermediate data in communication
– Various communication patterns
Example domains: computer vision, complex networks, bioinformatics, deep learning

The Models of Contemporary Big Data Tools
[Figure: tools grouped by model — MapReduce model (Hadoop); DAG model (Dryad, DryadLINQ, Pig, Tez, Spark, Stratosphere/Flink); graph model (Giraph, Hama, GraphLab, GraphX); BSP/Collective model (Harp) — with annotations for query tools (Pig, Spark SQL, MRQL, Hive), streaming tools (Storm, S4, Samza, Spark Streaming), and tools for iterations/learning (Twister, HaLoop, Spark).]
Many of them have fixed communication patterns!

Contributions
[Figure: parallelism model and architecture. Parallelism model: the MapReduce model, where map tasks feed reduce tasks through shuffle, is extended to the MapCollective model, where map tasks cooperate directly through collective communication. Architecture: YARN as the resource manager; MapReduce V2 and Harp as application frameworks; MapReduce applications and MapCollective applications on top.]

Collective Communication Abstractions
Hierarchical data abstractions:
– Basic types: arrays, key-values, vertices, edges and messages
– Partitions: array partitions, key-value partitions, vertex partitions, edge partitions and message partitions
– Tables: array tables, key-value tables, vertex tables, edge tables and message tables
Collective communication operations:
– Broadcast, allgather, allreduce
– Regroup
– Send messages to vertices, send edges to vertices

Hierarchical Data Abstractions
[Figure: the abstraction hierarchy. Basic types (byte, int, long and double arrays; key-values; vertices, edges and messages) are wrapped in transferable partitions (array, key-value, vertex, edge and message partitions), which are grouped into tables (array, key-value, vertex, edge and message tables). Partitions support broadcast and send; tables support broadcast, allgather, allreduce, regroup, message-to-vertex and similar operations.]
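The sketch below mirrors this hierarchy in plain Java so the layering is explicit; the class and field names are ours, not Harp's actual types:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal mirror of the slide's hierarchy: basic type -> partition -> table.
 *  Illustrative only; Harp's own classes differ in detail. */
public class DataAbstractionSketch {
    /** A partition wraps a basic type (here a double array) with a partition ID. */
    static class ArrayPartition {
        final int id;
        final double[] data;
        ArrayPartition(int id, double[] data) { this.id = id; this.data = data; }
    }

    /** A table is the set of partitions a worker currently holds.
     *  Collective operations are defined at this level and move whole partitions. */
    static class ArrayTable {
        final Map<Integer, ArrayPartition> partitions = new HashMap<>();
        void add(ArrayPartition p) { partitions.put(p.id, p); }
    }

    public static void main(String[] args) {
        ArrayTable table = new ArrayTable();
        table.add(new ArrayPartition(0, new double[]{1.0, 2.0}));
        table.add(new ArrayPartition(1, new double[]{3.0, 4.0}));
        System.out.println("partitions held: " + table.partitions.keySet());
    }
}
```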

Example: regroup
[Figure: three processes each hold a table with some of partitions 0–4, with some partition IDs duplicated across processes; after regroup, every partition ID has a single destination process, so partitions with the same ID from different processes end up together.]
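The redistribution rule can be simulated in a few lines. In this sketch (our names and destination rule, not Harp's) each partition ID is sent to process `id % numProcesses`, so duplicated IDs land on the same process, where they can be combined:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation of regroup: partition IDs are reassigned to destination
 *  processes, co-locating partitions that share an ID. Illustrative only. */
public class RegroupSketch {
    public static void main(String[] args) {
        int numProcesses = 3;
        // Before regroup: the partition IDs each process happens to hold.
        int[][] before = {{0, 1}, {0, 1, 2}, {3, 4}};
        List<List<Integer>> after = new ArrayList<>();
        for (int p = 0; p < numProcesses; p++) after.add(new ArrayList<>());
        // Destination rule: partition id goes to process (id % numProcesses).
        for (int[] held : before)
            for (int id : held)
                after.get(id % numProcesses).add(id);
        for (int p = 0; p < numProcesses; p++)
            System.out.println("process " + p + " now holds partitions " + after.get(p));
    }
}
```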

Operations

Operation                   Data abstraction                     Algorithm
broadcast                   arrays, key-value pairs & vertices   chain
allgather                   arrays, key-value pairs & vertices   bucket
allreduce                   arrays, key-value pairs              bi-directional exchange; regroup-allgather
regroup                     arrays, key-value pairs & vertices   point-to-point direct sending
send messages to vertices   messages, vertices                   point-to-point direct sending
send edges to vertices      edges, vertices                      point-to-point direct sending
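To illustrate one of these algorithms, the toy simulation below walks a buffer down a chain of workers, the broadcast pattern named above; with large arrays each hop would be pipelined in chunks so all links stay busy at once. The structure is ours, not Harp's:

```java
import java.util.Arrays;

/** Toy serial simulation of a chain broadcast: worker i forwards the
 *  data to worker i+1. Illustrative only. */
public class ChainBroadcastSketch {
    public static void main(String[] args) {
        int numWorkers = 4;
        double[][] buffers = new double[numWorkers][];
        buffers[0] = new double[]{1.0, 2.0, 3.0}; // the root holds the data
        // Each hop stands in for one network send along the chain.
        for (int i = 0; i + 1 < numWorkers; i++)
            buffers[i + 1] = buffers[i].clone();
        for (int i = 0; i < numWorkers; i++)
            System.out.println("worker " + i + ": " + Arrays.toString(buffers[i]));
    }
}
```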

MapCollective Programming Model
Two-level BSP parallelism:
– Inter-node parallelism at the process level
– Intra-node parallelism at the thread level
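A minimal sketch of the two levels, assuming nothing about Harp's API: threads inside one process compute partial results and synchronize locally, and the combined local result would then enter a process-level collective (stubbed out here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Two-level BSP sketch: thread-level parallelism inside a process,
 *  followed by a (stubbed) process-level collective. Illustrative only. */
public class TwoLevelBspSketch {
    public static void main(String[] args) throws Exception {
        int numThreads = 4;
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        // Thread level: split the local work into ranges and compute in parallel.
        List<Future<Long>> partials = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            final long from = t * 250L, to = (t + 1) * 250L;
            partials.add(pool.submit(() -> sumRange(from, to)));
        }
        long localSum = 0;
        for (Future<Long> f : partials) localSum += f.get(); // intra-process barrier
        pool.shutdown();
        // Process level: in Harp this local result would now enter an
        // allreduce across all map tasks; here we just print it.
        System.out.println("local result ready for process-level allreduce: " + localSum);
    }

    static long sumRange(long from, long to) {
        long s = 0;
        for (long i = from; i < to; i++) s += i;
        return s;
    }
}
```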

The Harp Library
A plugin on Hadoop that provides the implementation of the collective communication abstractions and the MapCollective programming model.
Project Link –
Source Code Link –

Component Layers
[Figure: the layered architecture. Bottom: YARN (resource manager). Middle: MapReduce V2 and Harp (application frameworks); within Harp, the MapCollective interface and task management realize the MapCollective programming model, while the collective communication APIs, collective communication operators, array/key-value/graph data abstractions, hierarchical data types (tables and partitions) and a memory resource pool realize the collective communication abstractions. Top: MapReduce applications and MapCollective applications such as K-means, WDA-SMACOF and graph drawing.]

A MapCollective Job
[Figure: job flow. The client submits through the MapCollective Runner, which (1) records map task locations from the original MapReduce AppMaster. The YARN Resource Manager (I) launches the MapCollective AppMaster; its container allocator and container launcher (II) launch the tasks. Each task runs a CollectiveMapper with a setup / mapCollective / cleanup lifecycle: it (2) reads key-value pairs, (3) invokes collective communication APIs, and (4) writes output to HDFS.]
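The task-side lifecycle can be sketched as follows. The base class here is a stand-in defined locally for illustration; Harp's real CollectiveMapper has its own signatures and Hadoop context objects:

```java
/** Local stand-in for the lifecycle on this slide: setup -> mapCollective -> cleanup.
 *  Not Harp's actual CollectiveMapper API. */
abstract class LifecycleMapper {
    final void run() { setup(); mapCollective(); cleanup(); }
    protected void setup() {}                 // e.g. read configuration, cache data
    protected abstract void mapCollective();  // read key-values, compute, call collectives
    protected void cleanup() {}               // e.g. write output to HDFS
}

public class MapCollectiveTaskSketch extends LifecycleMapper {
    @Override
    protected void mapCollective() {
        // Here a real task would iterate over its key-value pairs and invoke
        // collective communication APIs (broadcast, allreduce, ...) between steps.
        System.out.println("computing and communicating...");
    }

    public static void main(String[] args) {
        new MapCollectiveTaskSketch().run();
    }
}
```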

Experiments
Applications:
– K-means clustering
– Force-directed graph drawing algorithm
– WDA-SMACOF
Test environment:
– Big Red II (Indiana University's Cray supercomputer)

K-means Clustering
[Figure: map tasks compute local point sums and allreduce the centroids.]

Force-directed Graph Drawing Algorithm
T. Fruchterman and E. Reingold, "Graph Drawing by Force-Directed Placement", Software: Practice & Experience 21 (11), 1991.
[Figure: map tasks allgather the positions of vertices in each iteration.]
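The need for allgather follows from the force model in the cited paper: with optimal vertex distance $k = C\sqrt{\text{area}/n}$, the attractive and repulsive forces between vertices at distance $d$ are

$$ f_a(d) = \frac{d^2}{k}, \qquad f_r(d) = -\frac{k^2}{d}. $$

Because the repulsive force couples every pair of vertices, each task needs the current positions of all vertices in every iteration, which is exactly what the allgather provides.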

WDA-SMACOF
Y. Ruan et al., "A Robust and Scalable Solution for Interpolative Multidimensional Scaling with Weighting", IEEE e-Science, 2013.
[Figure: map tasks allreduce the stress value, and use allgather and allreduce on intermediate results in the conjugate gradient process.]
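For context (this is the standard SMACOF formulation, not taken from the slide): the quantity being allreduced is the weighted stress of the embedding $X$,

$$ \sigma(X) = \sum_{i<j} w_{ij}\,\bigl(d_{ij}(X) - \delta_{ij}\bigr)^2, $$

where $\delta_{ij}$ are the target dissimilarities, $d_{ij}(X)$ the distances in the embedding and $w_{ij}$ the weights. The pairwise terms are distributed across tasks, so computing the global stress takes one allreduce per iteration, and each conjugate gradient step combines the partial results of a distributed matrix-vector product with allgather and allreduce.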

Conclusions
Harp is designed as a plugin so that it brings high performance to the Apache Big Data Stack and bridges the Hadoop ecosystem and HPC systems through a clear collective communication abstraction, which did not previously exist in the Hadoop ecosystem.
The experiments show that with Harp we can scale three applications to 128 nodes with 4096 CPUs on the Big Red II supercomputer, with close-to-linear speedup in most tests.