A Map-Reduce System with an Alternate API for Multi-Core Environments Wei Jiang, Vignesh T. Ravi and Gagan Agrawal.

Presentation transcript:

A Map-Reduce System with an Alternate API for Multi-Core Environments Wei Jiang, Vignesh T. Ravi and Gagan Agrawal

Outline  Introduction  MapReduce  Generalized Reduction  System Design and Implementation  Experiments  Related Work  Conclusion

October 24,  Growing need for analysis of large scale data  Scientific  Commercial  Data-intensive Supercomputing (DISC)  Map-Reduce has received a lot of attention  Database and Datamining communities  High performance computing community  E.g. this conference !! Motivation

October 24,  Positives:  Simple API  Functional language based  Very easy to learn  Support for fault-tolerance  Important for very large-scale clusters  Questions  Performance?  Comparison with other approaches  Suitability for different class of applications? Map-Reduce: Positives and Questions

Class of Data-Intensive Applications  Many different types of applications  Data-center-style applications  Data scans, sorting, indexing  More “compute-intensive” data-intensive applications  Machine learning, data mining, NLP  Map-Reduce / Hadoop widely used for this class  Standard database operations  SIGMOD 2009 paper compares Hadoop with databases and OLAP systems  What is Map-Reduce suitable for?  What are the alternatives?  MPI/OpenMP/Pthreads – too low-level?

This Paper  Proposes MATE (a Map-Reduce system with an AlternaTE API) based on Generalized Reduction  Phoenix implemented Map-Reduce for shared-memory systems  MATE adopts Generalized Reduction, first proposed in FREERIDE, which was developed at Ohio State  Observed API similarities and subtle differences between MapReduce and Generalized Reduction  Comparison for  Data mining applications  Compare performance and API  Understand performance overheads  Will an alternative API be better for “Map-Reduce”?

Map-Reduce Execution (diagram slide)

Phoenix Implementation  Based on the same principles as MapReduce  But targets shared-memory systems  Consists of a simple API that is visible to application programmers  Users define functions such as splitter, map, and reduce  An efficient runtime handles the low-level details  Parallelization  Resource management  Fault detection and recovery
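To make the programming model concrete, here is a rough C sketch of how a Phoenix-style shared-memory map-reduce program is organized: the user supplies splitter, map, and reduce callbacks and hands them to a scheduler that handles threading and intermediate data. All names below (split_t, phoenix_style_args_t, phoenix_style_scheduler, the callback typedefs) are illustrative stand-ins, not the exact Phoenix signatures.

    #include <stddef.h>

    /* Illustrative stand-ins for a Phoenix-style API; not the real Phoenix header. */
    typedef struct { void *data; size_t length; } split_t;

    typedef int  (*splitter_fn)(void *input, size_t req_units, split_t *out);
    typedef void (*map_fn)(split_t *split);                       /* emits <key, value> pairs  */
    typedef void (*reduce_fn)(void *key, void **vals, size_t n);  /* combines values for a key */

    typedef struct {
        void       *input_data;   /* the whole input              */
        size_t      data_size;    /* total input size in bytes    */
        size_t      unit_size;    /* size of one logical element  */
        splitter_fn splitter;     /* divides input into map tasks */
        map_fn      map;          /* per-split map function       */
        reduce_fn   reduce;       /* per-key reduce function      */
    } phoenix_style_args_t;

    /* Runtime entry point: spawns worker threads, runs map tasks, groups
     * intermediate pairs by key, then runs reduce tasks and merges results. */
    int phoenix_style_scheduler(phoenix_style_args_t *args);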

Phoenix Runtime (diagram slide)

Comparing Processing Structures  The reduction object represents the intermediate state of the execution  The reduce function is commutative and associative  Sorting and grouping overheads are eliminated with the reduction function/object
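To illustrate the generalized-reduction style with one of the applications used later, the sketch below folds k-means statistics directly into a programmer-managed reduction object instead of emitting intermediate <key, value> pairs. It is a minimal illustration of the idea, assuming 3-dimensional points and k = 100 as in the experiments, not code from the paper.

    #define K   100
    #define DIM 3

    /* Reduction object: the only shared state; updates are commutative and
     * associative, so per-thread copies can be merged (combined) at the end. */
    typedef struct {
        double sum[K][DIM];   /* per-cluster coordinate sums */
        long   count[K];      /* per-cluster point counts    */
    } reduction_object_t;

    /* Process one element and fold it into the reduction object: no map-side
     * buffering, sorting, or grouping of intermediate pairs is needed.       */
    static void kmeans_local_reduce(reduction_object_t *ro,
                                    const double point[DIM],
                                    const double centers[K][DIM])
    {
        int best = 0;
        double best_dist = 1e300;
        for (int c = 0; c < K; c++) {
            double d = 0.0;
            for (int j = 0; j < DIM; j++) {
                double diff = point[j] - centers[c][j];
                d += diff * diff;
            }
            if (d < best_dist) { best_dist = d; best = c; }
        }
        for (int j = 0; j < DIM; j++)
            ro->sum[best][j] += point[j];
        ro->count[best]++;
    }

Because each worker thread can own a private copy of the object and the copies can be merged pairwise afterwards (the full-replication scheme described later), the sorting and grouping overheads of the map/reduce path are avoided.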

Observations on Processing Structure  Map-Reduce is based on a functional idea  Does not maintain state  This can lead to overheads of managing intermediate results between map and reduce  Map could generate intermediate results of very large size  The MATE API is based on a programmer-managed reduction object  Not as ‘clean’  But avoids sorting of intermediate results  Can also help shared-memory parallelization  Enables better fault recovery

October 24,  Apriori pseudo-code using MATE An Example

October 24,  Apriori pseudo-code using MapReduce Example – Now with Phoenix

System Design and Implementation  Basic dataflow of MATE (a Map-Reduce system with an AlternaTE API) in the full-replication scheme  Data structures in the MATE scheduler  Used to communicate between the user code and the runtime  Three sets of functions in MATE  Internally used functions  MATE API provided by the runtime  MATE API to be defined by users  Implementation considerations

MATE Runtime Dataflow  Basic one-stage dataflow (full-replication scheme) (diagram slide)

Data Structures (I)  scheduler_args_t: basic fields
Input_data – input data pointer
Data_size – input dataset size
Data_type – input data type
Stage_num – computation-stage number (iteration number)
Splitter – pointer to the splitter function
Reduction – pointer to the reduction function
Finalize – pointer to the finalize function

Data Structures (II)  scheduler_args_t: optional fields for performance tuning
Unit_size – # of bytes for one element
L1_cache_size – # of bytes of L1 data cache
Model – shared-memory parallelization model
Num_reduction_workers – max # of reduction worker threads
Num_procs – max # of processor cores used
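Put together as a C struct, the basic and optional fields from the two tables above would look roughly like this. The field types, the lowercase names, and the opaque reduction_args_t declaration are assumptions inferred from the descriptions; the real MATE header may differ.

    #include <stddef.h>

    typedef struct reduction_args_s reduction_args_t;   /* runtime-defined (assumed opaque) */

    /* User-supplied callbacks; signatures as listed on the Functions (III) slide. */
    typedef int  (*splitter_t)(void *, int, reduction_args_t *);
    typedef void (*reduction_t)(reduction_args_t *);
    typedef void (*finalize_t)(void *);

    typedef struct {
        /* Basic fields */
        void        *input_data;             /* input data pointer                   */
        size_t       data_size;              /* input dataset size                   */
        int          data_type;              /* input data type                      */
        int          stage_num;              /* computation-stage (iteration) number */
        splitter_t   splitter;               /* pointer to splitter function         */
        reduction_t  reduction;              /* pointer to reduction function        */
        finalize_t   finalize;               /* pointer to finalize function         */

        /* Optional fields for performance tuning */
        size_t       unit_size;              /* # of bytes for one element           */
        size_t       L1_cache_size;          /* # of bytes of L1 data cache          */
        int          model;                  /* shared-memory parallelization model  */
        int          num_reduction_workers;  /* max # of reduction worker threads    */
        int          num_procs;              /* max # of processor cores used        */
    } scheduler_args_t;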

Functions (I)  Internally used functions, transparent to users (R = required; O = optional)
static inline void * schedule_tasks(thread_wrapper_arg_t *) – R
static void * combination_worker(void *) – R
static int array_splitter(void *, int, reduction_args_t *) – R
void clone_reduction_object(int num) – R
static inline int isCpuAvailable(unsigned long, int) – R

Functions (II)  APIs provided by the runtime (R = required; O = optional)
int mate_init(scheduler_args_t *args) – R
int mate_scheduler(void *args) – R
int mate_finalize(void *args) – O
void reduction_object_pre_init() – R
int reduction_object_alloc(int size) (returns the object id) – R
void reduction_object_post_init() – R
void accumulate(int id, int offset, void *value) – O
void reuse_reduction_object() – O
void * get_intermediate_result(int iter, int id, int offset) – O
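A hedged sketch of how a driver might string these runtime calls together for one pass of a computation. The call order and argument values are assumptions; the signatures are taken from the table above.

    /* Hedged driver sketch using the runtime API listed above. */
    int run_one_pass(scheduler_args_t *args, int object_size)
    {
        mate_init(args);                      /* hand the scheduler its arguments    */

        reduction_object_pre_init();          /* prepare reduction-object storage    */
        int obj_id = reduction_object_alloc(object_size);   /* returns the object id */
        reduction_object_post_init();

        mate_scheduler(args);                 /* split the input, run reduction workers */

        /* Inspect the combined result, e.g. for the first iteration and offset 0: */
        void *result = get_intermediate_result(0 /* iter */, obj_id, 0 /* offset */);
        (void) result;

        mate_finalize(args);                  /* optional cleanup */
        return 0;
    }

For iterative algorithms such as KMeans and Apriori, reuse_reduction_object() presumably lets later stages recycle the same object rather than reallocating it.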

Functions (III)  APIs defined by the user (R = required; O = optional)
int (*splitter_t)(void *, int, reduction_args_t *) – O
void (*reduction_t)(reduction_args_t *) – R
void (*combination_t)(void *) – O
void (*finalize_t)(void *) – O
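Wiring it together: a hedged sketch of how a user might register these callbacks before invoking the scheduler, assuming the scheduler_args_t layout sketched earlier and the apriori_reduction() function from the earlier example.

    /* Hedged sketch: registering user-defined functions with the scheduler. */
    void setup_args(scheduler_args_t *args, transaction_t *transactions, size_t bytes)
    {
        args->input_data = transactions;
        args->data_size  = bytes;
        args->unit_size  = sizeof(transaction_t);
        args->splitter   = NULL;               /* assume the runtime falls back to its
                                                  internal array splitter             */
        args->reduction  = apriori_reduction;  /* required */
        args->finalize   = NULL;               /* optional */
    }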

Implementation Considerations  Focus on the API differences in programming models  Data partitioning: dynamically assigns splits to worker threads, same as Phoenix  Buffer management: two temporary buffers  one for reduction objects  the other for combination results  Fault tolerance: re-executes failed tasks  Checkpointing may be a better solution, since the reduction object maintains the computation state
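To illustrate why checkpointing fits naturally here, a minimal sketch (not from the paper) of snapshotting the reduction object at a stage boundary:

    #include <stdio.h>
    #include <stddef.h>

    /* Because the reduction object holds the complete computation state, writing it
     * out at a stage boundary is enough to restart from that stage after a failure,
     * without re-executing earlier stages. Hypothetical helper, not part of MATE.   */
    int checkpoint_reduction_object(const void *object, size_t size, int stage)
    {
        char path[64];
        snprintf(path, sizeof(path), "mate_ckpt_stage_%d.bin", stage);

        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        size_t written = fwrite(object, 1, size, f);
        fclose(f);
        return (written == size) ? 0 : -1;
    }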

October 24,  For comparison, we used three applications  Data Mining: KMeans, PCA, Apriori  Also evaluated the single-node performance of Hadoop on KMeans and Apriori  Experiments on two multi-core platforms  8 cores on one WCI node (intel cpu)  16 cores on one AMD node (amd cpu) Experiments Design

Results: Data Mining (I)  K-Means: 400MB dataset, 3-dim points, k = 100, on one WCI node with 8 cores (chart: avg. time per iteration in seconds vs. # of threads; ~2.0x speedup)

Results: Data Mining (II)  K-Means: 400MB dataset, 3-dim points, k = 100, on one AMD node with 16 cores (chart: avg. time per iteration in seconds vs. # of threads; ~3.0x speedup)

Results: Data Mining (III)  PCA: 8000 x 1024 matrix, on one WCI node with 8 cores (chart: total time in seconds vs. # of threads; ~2.0x speedup)

Results: Data Mining (IV)  PCA: 8000 x 1024 matrix, on one AMD node with 16 cores (chart: total time in seconds vs. # of threads; ~2.0x speedup)

Results: Data Mining (V)  Apriori: 1,000,000 transactions, 3% support, on one WCI node with 8 cores (chart: avg. time per iteration in seconds vs. # of threads)

Results: Data Mining (VI)  Apriori: 1,000,000 transactions, 3% support, on one AMD node with 16 cores (chart: avg. time per iteration in seconds vs. # of threads)

Observations  The MATE system achieves reasonable to significant speedups for all three data mining applications  In most cases it outperforms Phoenix and Hadoop significantly; in only one case is it merely slightly better  The reduction object helps reduce the memory overhead of managing the large set of intermediate results between map and reduce

Related Work  Lots of work on improving and generalizing MapReduce’s API…  Industry: Dryad/DryadLINQ from Microsoft, Sawzall from Google, Pig/Map-Reduce-Merge from Yahoo!  Academia: CGL-MapReduce, Mars, MITHRA, Phoenix, Disco, Hive…  These efforts address MapReduce limitations  The one-input, two-stage data flow is extremely rigid  Only two high-level primitives

Conclusions  MapReduce is simple and robust in expressing parallelism  The two-stage computation style may cause performance losses for some subclasses of data-intensive applications  MATE provides an alternate API that is based on generalized reduction  This variation can reduce the overheads of data management and communication between map and reduce