Towards a Collective Layer in the Big Data Stack
Thilina Gunarathne, Judy Qiu, Dennis Gannon

Introduction
– Three disruptions: Big Data, MapReduce, and Cloud Computing
– MapReduce to process "Big Data" in cloud or cluster environments
– Generalizing MapReduce and integrating it with HPC technologies

Introduction
– Splits MapReduce into a Map phase and a Collective communication phase
– Map-Collective communication primitives
  – Improve efficiency and usability
  – Map-AllGather, Map-AllReduce, MapReduceMergeBroadcast and Map-ReduceScatter patterns
  – Can be applied to multiple runtimes
– Prototype implementations for Hadoop and Twister4Azure
  – Up to 33% performance improvement for KMeansClustering
  – Up to 50% for Multi-Dimensional Scaling

Outline
– Introduction
– Background
– Collective communication primitives
  – Map-AllGather
  – Map-AllReduce
– Performance analysis
– Conclusion


Data Intensive Iterative Applications
– Growing class of applications
  – Clustering, data mining, machine learning & dimension reduction applications
  – Driven by the data deluge & emerging computation fields
  – Lots of scientific applications

    k ← 0; MAX ← maximum iterations
    δ[0] ← initial delta value
    while ( k < MAX || f(δ[k], δ[k-1]) )
        foreach datum in data
            β[datum] ← process(datum, δ[k])
        end foreach
        δ[k+1] ← combine(β[])
        k ← k+1
    end while
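Read as plain Python, the loop above looks roughly like the following minimal sketch. Here process, combine and converged are hypothetical application-specific placeholders (filled with toy definitions so the snippet runs), not part of any framework API:

    # Minimal, self-contained sketch of the iterative pattern above.
    MAX = 10  # maximum iterations

    def process(datum, delta):
        # Per-datum computation; in real applications this is the expensive,
        # data-intensive step (e.g. a distance or stress calculation).
        return datum * delta

    def combine(betas):
        # Combine per-datum results into the next loop-variant value.
        return sum(betas) / len(betas)

    def converged(new, old, tol=1e-6):
        # The f(delta[k], delta[k-1]) convergence test.
        return abs(new - old) < tol

    data = [0.2, 0.4, 0.6]
    delta = 1.0  # initial delta value
    for k in range(MAX):
        beta = [process(d, delta) for d in data]   # "map" over the data
        new_delta = combine(beta)                  # combine/"reduce" step
        if converged(new_delta, delta):
            break
        delta = new_delta                          # carried into the next iteration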

Data Intensive Iterative Applications
[Diagram: each iteration runs Compute, then Communication (Reduce/barrier), then a Broadcast into the new iteration; the loop-invariant data is the larger part, the loop-variant data the smaller]

Iterative MapReduce
– MapReduceMergeBroadcast: extends MapReduce to support additional broadcast (+ other) input data
  – Map(<key>, <value>, list_of<key,value>)
  – Reduce(<key>, list_of<value>, list_of<key,value>)
  – Merge(list_of<key, list_of<value>>, list_of<key,value>)
– Execution flow: Map → Combine → Shuffle → Sort → Reduce → Merge → Broadcast
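As an illustration only (not the actual Twister4Azure API), one MapReduceMergeBroadcast iteration could be sketched in Python as follows; mrmb_iteration, map_fn, reduce_fn and merge_fn are hypothetical names, and each user function receives the broadcast (loop-variant) data as an extra argument:

    from collections import defaultdict

    def mrmb_iteration(splits, map_fn, reduce_fn, merge_fn, broadcast):
        # Map: each input split emits (key, value) pairs; outputs are
        # grouped by key (the shuffle/sort step).
        grouped = defaultdict(list)
        for split in splits:
            for k, v in map_fn(split, broadcast):
                grouped[k].append(v)
        # Reduce: one call per key group, also seeing the broadcast data.
        reduce_outputs = [reduce_fn(k, vs, broadcast)
                          for k, vs in sorted(grouped.items())]
        # Merge: assemble all reduce outputs into the value that the
        # framework would broadcast into the next iteration.
        return merge_fn(reduce_outputs)

The framework would then broadcast the merge output to all workers and start the next iteration (or stop, if the merge output indicates convergence).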

Twister4Azure – Iterative MapReduce
– Decentralized iterative MapReduce architecture for clouds
  – Utilizes highly available and scalable cloud services
– Extends the MapReduce programming model
– Multi-level data caching
  – Cache-aware hybrid scheduling
– Multiple MapReduce applications per job
– Collective communication primitives
– Outperforms Hadoop in local clusters by 2 to 4 times
– Retains the MRRoles4Azure features: dynamic scheduling, load balancing, fault tolerance, monitoring, local testing/debugging

Outline
– Introduction
– Background
– Collective communication primitives
  – Map-AllGather
  – Map-AllReduce
– Performance analysis
– Conclusion

Collective Communication Primitives for Iterative MapReduce
– Introduces All-to-All collective communication primitives to MapReduce
– Supports common higher-level communication patterns

Collective Communication Primitives for Iterative MapReduce
– Performance
  – Optimized group communication
  – The framework can optimize these operations transparently to the users
– Poly-algorithm (polymorphic)
  – Avoids unnecessary barriers and other steps of traditional and iterative MapReduce
  – Scheduling using the primitives
– Ease of use
  – Users do not have to implement this logic manually
  – Preserves the Map & Reduce APIs
  – Easy to port applications using the more natural primitives

Goals
– Fit the MapReduce data and computational model
  – Multiple map task waves
  – Significant execution-time variation and inhomogeneous tasks
– Retain scalability
– Keep the programming model simple and easy to understand
– Maintain the same framework-managed fault tolerance
– Backward compatibility with the MapReduce model
  – Switching requires only flipping a configuration option

Map-AllGather Collective
– Traditional iterative MapReduce
  – The "reduce" step assembles the map task outputs in order
  – The "merge" step assembles the reduce task outputs
  – The assembled output is broadcast to all workers
– The Map-AllGather primitive
  – Broadcasts the map task outputs to all computational nodes
  – Assembles them in the recipient nodes
  – Schedules the next iteration or the application
– Eliminates the need for the reduce, merge and monolithic broadcast steps and unnecessary barriers
– Examples: MDS BCCalc, PageRank with in-link matrix (matrix-vector multiplication; see the sketch below)
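For the matrix-vector multiplication case, the pattern can be simulated in a few lines of Python (a single-process, power-iteration-style sketch; the row partitioning, normalization step and the map_task/all_gather names are illustrative assumptions, not the H-Collectives API):

    # Simulated Map-AllGather for iterative matrix-vector multiplication.
    # Each "map task" owns a block of rows; allgather concatenates the
    # partial products so every task starts the next iteration with the
    # full vector, with no reduce/merge/broadcast steps in between.
    import numpy as np

    def map_task(row_block, x):
        return row_block @ x             # partial product for these rows

    def all_gather(partials):
        return np.concatenate(partials)  # assemble outputs in task order

    rng = np.random.default_rng(0)
    A = rng.random((6, 6))
    x = np.ones(6)
    blocks = np.array_split(A, 3)        # three map tasks

    for _ in range(10):
        partials = [map_task(b, x) for b in blocks]  # map phase
        x = all_gather(partials)                     # AllGather phase
        x = x / np.linalg.norm(x)        # normalization keeps the toy iteration stable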

Map-AllGather Collective
[Figure: data flow of the Map-AllGather collective]

Map-AllReduce Collective
– Aggregates the results of the map tasks
  – Supports multiple keys and vector values
– Broadcasts the aggregated result
  – Used to decide the loop condition
  – Schedules the next iteration if needed
– Restricted to associative, commutative operations, e.g. Sum, Max, Min
– Examples: KMeans, PageRank, MDS stress calculation (see the sketch below)
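A simulated sketch of Map-AllReduce for a KMeans-style update (single-process illustration; the map_task/all_reduce names and the emit shape are assumptions, not the framework API). Each map task emits per-centroid partial sums and counts; the AllReduce Sum combines them element-wise, so every worker can compute the new centroids and test the loop condition:

    # Simulated Map-AllReduce: per-map-task partial results are combined
    # with an associative, commutative Sum and shared with all workers.
    import numpy as np

    def map_task(points, centroids):
        k, dim = centroids.shape
        sums, counts = np.zeros((k, dim)), np.zeros(k)
        for p in points:
            c = np.argmin(np.linalg.norm(centroids - p, axis=1))
            sums[c] += p                 # partial sum for nearest centroid
            counts[c] += 1
        return sums, counts

    def all_reduce(partials):
        # Element-wise Sum across all map task outputs.
        sums = np.sum([s for s, _ in partials], axis=0)
        counts = np.sum([c for _, c in partials], axis=0)
        return sums, counts

    rng = np.random.default_rng(1)
    data_splits = [rng.random((20, 2)) for _ in range(4)]  # 4 map tasks
    centroids = rng.random((3, 2))                          # k = 3

    for _ in range(10):
        partials = [map_task(split, centroids) for split in data_splits]
        sums, counts = all_reduce(partials)
        centroids = sums / np.maximum(counts, 1)[:, None]   # new centroids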

Map-AllReduce Collective
[Figure: the outputs of map tasks 1..N in iteration n flow through an Op (e.g. Sum) whose result feeds map tasks 1..N of iteration n+1]

Implementations
– H-Collectives: Map-Collectives for Apache Hadoop
  – Node-level data aggregation and caching (see the sketch below)
  – Speculative iteration scheduling
  – Uses Hadoop Mappers with only minimal changes
  – Supports dynamic task scheduling, multiple map task waves, and typical Hadoop fault tolerance and speculative execution
  – Netty NIO based implementation
– Map-Collectives for Twister4Azure iterative MapReduce
  – WCF based implementation
  – Instance-level data aggregation and caching
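The node-level aggregation idea can be pictured with a small sketch (the node layout, dict shapes and the Sum operation are illustrative assumptions): map outputs produced on the same node are pre-reduced locally, so only one value per node enters the network exchange.

    # Hedged sketch of node-level aggregation for an AllReduce-style Sum.
    import numpy as np

    def node_level_aggregate(map_outputs_by_node):
        # Pre-reduce within each node with the associative Sum operation.
        return {node: np.sum(outputs, axis=0)
                for node, outputs in map_outputs_by_node.items()}

    def all_reduce(node_partials):
        # Only len(node_partials) values cross the network, not one per task.
        return np.sum(list(node_partials.values()), axis=0)

    # Example: 2 nodes, 3 map tasks each, vector-valued outputs.
    outputs = {"node-0": [np.ones(4), np.ones(4), np.ones(4)],
               "node-1": [np.ones(4) * 2] * 3}
    print(all_reduce(node_level_aggregate(outputs)))   # [9. 9. 9. 9.]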

Mapping of MPI collectives onto Hadoop, H-Collectives and Twister4Azure:
– All-to-One (MPI Gather, Reduce): Hadoop and H-Collectives – shuffle-reduce*; Twister4Azure – shuffle-reduce-merge
– One-to-All (MPI Broadcast, Scatter**): Hadoop and H-Collectives – shuffle-reduce-distributedcache; Twister4Azure – merge-broadcast
– All-to-All (MPI AllGather, AllReduce, Reduce-Scatter): H-Collectives and Twister4Azure – Map-AllGather, Map-AllReduce and Map-ReduceScatter (future work); no Hadoop counterpart
– Synchronization (MPI Barrier): Hadoop – barrier between Map & Reduce; H-Collectives – barrier between Map & Reduce and between iterations; Twister4Azure – barrier between Map, Reduce, Merge and between iterations

Outline
– Introduction
– Background
– Collective communication primitives
  – Map-AllGather
  – Map-AllReduce
– Performance analysis
– Conclusion

KMeansClustering – Hadoop vs H-Collectives Map-AllReduce
500 centroids (clusters), 20 dimensions, 10 iterations
[Charts: weak scaling and strong scaling]

KMeansClustering – Twister4Azure vs T4A-Collectives Map-AllReduce
500 centroids (clusters), 20 dimensions, 10 iterations
[Charts: weak scaling and strong scaling]

Multi-Dimensional Scaling
[Charts: Hadoop MDS (BCCalc only) and Twister4Azure MDS]

Hadoop MDS Overheads
[Chart comparing Hadoop MapReduce MDS-BCCalc, H-Collectives AllGather MDS-BCCalc, and H-Collectives AllGather MDS-BCCalc without speculative scheduling]

Outline
– Introduction
– Background
– Collective communication primitives
  – Map-AllGather
  – Map-AllReduce
– Performance analysis
– Conclusion

Conclusions
– Map-Collectives: collective communication operations for MapReduce, inspired by MPI collectives
– Improve communication and computation performance
  – Enable highly optimized group communication across the workers
  – Remove unnecessary/redundant steps
  – Enable poly-algorithm approaches
– Improve usability
  – More natural patterns
  – Decrease the implementation burden
– A future where many MapReduce and iterative MapReduce frameworks support a common set of portable Map-Collectives
– Prototype implementations for Hadoop and Twister4Azure, with speedups of up to 33% (KMeansClustering) and 50% (Multi-Dimensional Scaling)

Future Work
– Map-ReduceScatter collective
  – Modeled after MPI ReduceScatter
  – e.g. for PageRank
– Explore ideal data models for the Map-Collectives model

Acknowledgements
– Prof. Geoffrey C. Fox for his many insights and feedback
– Present and past members of the SALSA group, Indiana University
– Microsoft for the Azure Cloud Academic Resources Allocation
– National Science Foundation CAREER Award OCI
– Persistent Systems for the fellowship

Thank You!

Backup Slides

Application Types
Slide from Geoffrey Fox, "Advances in Clouds and their application to Data Intensive problems", University of Southern California seminar, February

Runtime feature comparison (programming model, data storage, communication, scheduling & load balancing):
– Hadoop: MapReduce programming model; HDFS storage; TCP communication; data locality with rack-aware dynamic task scheduling through a global queue, natural load balancing
– Dryad [1]: DAG-based execution flows; Windows shared directories; shared files / TCP pipes / shared-memory FIFOs; data locality and network-topology-based run-time graph optimizations, static scheduling
– Twister [2]: iterative MapReduce; shared file system / local disks; Content Distribution Network / direct TCP; data-locality-based static scheduling
– MPI: variety of topologies; shared file systems; low-latency communication channels; scheduling by available processing capabilities / user controlled

Runtime feature comparison (failure handling, monitoring, language support, execution environment):
– Hadoop: re-execution of map and reduce tasks; web-based monitoring UI and API; Java, with executables supported via Hadoop Streaming, plus PigLatin; Linux clusters, Amazon Elastic MapReduce, FutureGrid
– Dryad [1]: re-execution of vertices; C# + LINQ (through DryadLINQ); Windows HPCS clusters
– Twister [2]: re-execution of iterations; API to monitor job progress; Java, with executables via Java wrappers; Linux clusters, FutureGrid
– MPI: program-level checkpointing; minimal support for task-level monitoring; C, C++, Fortran, Java, C#; Linux/Windows clusters

Iterative MapReduce Frameworks
– Twister [1]
  – Map → Reduce → Combine → Broadcast
  – Long-running map tasks (data in memory)
  – Centralized driver based, statically scheduled
– Daytona [3]
  – Iterative MapReduce on Azure using cloud services
  – Architecture similar to Twister
– Haloop [4]
  – On-disk caching; map/reduce input caching; reduce output caching
– iMapReduce [5]
  – Asynchronous iterations; one-to-one map & reduce mapping; automatically joins loop-variant and loop-invariant data