CUDA Performance Study on Hadoop MapReduce Clusters Chen He Peng Du University of Nebraska-Lincoln.

Slides:



Advertisements
Similar presentations
OLAP Query Processing in Grids
Advertisements

Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
+ Accelerating Fully Homomorphic Encryption on GPUs Wei Wang, Yin Hu, Lianmu Chen, Xinming Huang, Berk Sunar ECE Dept., Worcester Polytechnic Institute.
Building a Distributed Full-Text Index for the Web S. Melnik, S. Raghavan, B.Yang, H. Garcia-Molina.
Motivation Desktop accelerators (like GPUs) form a powerful heterogeneous platform in conjunction with multi-core CPUs. To improve application performance.
SkewTune: Mitigating Skew in MapReduce Applications
Early Linpack Performance Benchmarking on IPE Mole-8.5 Fermi GPU Cluster Xianyi Zhang 1),2) and Yunquan Zhang 1),3) 1) Laboratory of Parallel Software.
Efficient Sparse Matrix-Matrix Multiplication on Heterogeneous High Performance Systems AACEC 2010 – Heraklion, Crete, Greece Jakob Siegel 1, Oreste Villa.
Exploiting Graphics Processors for High- performance IP Lookup in Software Routers Author: Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
A Scalable Heterogeneous Parallelization Framework for Iterative Local Searches Martin Burtscher 1 and Hassan Rabeti 2 1 Department of Computer Science,
Cyberinfrastructure for Scalable and High Performance Geospatial Computation Xuan Shi Graduate assistants supported by the CyberGIS grant Fei Ye (2011)
OpenFOAM on a GPU-based Heterogeneous Cluster
Presented by: Yash Gurung, ICFAI UNIVERSITY.Sikkim BUILDING of 3 R'sCLUSTER PARALLEL COMPUTER.
Acceleration of the Smith– Waterman algorithm using single and multiple graphics processors Author : Ali Khajeh-Saeed, Stephen Poole, J. Blair Perot. Publisher:
2009/04/07 Yun-Yang Ma.  Overview  What is CUDA ◦ Architecture ◦ Programming Model ◦ Memory Model  H.264 Motion Estimation on CUDA ◦ Method ◦ Experimental.
DCABES 2009 China University Of Geosciences 1 The Parallel Models of Coronal Polarization Brightness Calculation Jiang Wenqian.
Back-Projection on GPU: Improving the Performance Wenlay “Esther” Wei Advisor: Jeff Fessler Mentor: Yong Long April 29, 2010.
Parallel K-Means Clustering Based on MapReduce The Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences Weizhong Zhao, Huifang.
1 petaFLOPS+ in 10 racks TB2–TL system announcement Rev 1A.
1 Parallel Simulations of Underground Flow in Porous and Fractured Media H. Mustapha 1,2, A. Beaudoin 1, J. Erhel 1 and J.R. De Dreuzy IRISA – INRIA.
Big Data and Hadoop and DLRL Introduction to the DLRL Hadoop Cluster Sunshin Lee and Edward A. Fox DLRL, CS, Virginia Tech Feb. 18, 2015 presentation for.
Synergy.cs.vt.edu Power and Performance Characterization of Computational Kernels on the GPU Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng.
Jawwad A Shamsi Nouman Durrani Nadeem Kafi Systems Research Laboratories, FAST National University of Computer and Emerging Sciences, Karachi Novelties.
Motivation “Every three minutes a woman is diagnosed with Breast cancer” (American Cancer Society, “Detailed Guide: Breast Cancer,” 2006) Explore the use.
Performance Evaluation of Hybrid MPI/OpenMP Implementation of a Lattice Boltzmann Application on Multicore Systems Department of Computer Science and Engineering,
An approach for solving the Helmholtz Equation on heterogeneous platforms An approach for solving the Helmholtz Equation on heterogeneous platforms G.
Training Program on GPU Programming with CUDA 31 st July, 7 th Aug, 14 th Aug 2011 CUDA Teaching UoM.
COLLABORATIVE EXECUTION ENVIRONMENT FOR HETEROGENEOUS PARALLEL SYSTEMS Aleksandar Ili´c, Leonel Sousa 2010 IEEE International Symposium on Parallel & Distributed.
Predictive Runtime Code Scheduling for Heterogeneous Architectures 1.
Location-aware MapReduce in Virtual Cloud 2011 IEEE computer society International Conference on Parallel Processing Yifeng Geng1,2, Shimin Chen3, YongWei.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Accelerating Simulation of Agent-Based Models on Heterogeneous Architectures.
Yongzhi Wang, Jinpeng Wei VIAF: Verification-based Integrity Assurance Framework for MapReduce.
Zois Vasileios Α. Μ :4183 University of Patras Department of Computer Engineering & Informatics Diploma Thesis.
Spiros Papadimitriou Jimeng Sun IBM T.J. Watson Research Center Hawthorne, NY, USA Reporter: Nai-Hui, Ku.
TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS Iván Pérez Enrique Vallejo José Luis Bosque University of Cantabria TraceRep IWSG'15.
Advisor: Dr. Aamir Shafi Co-Advisor: Mr. Ali Sajjad Member: Dr. Hafiz Farooq Member: Mr. Tahir Azim Optimizing N-body Simulations for Multi-core Compute.
Y. Kotani · F. Ino · K. Hagihara Springer Science + Business Media B.V Reporter: 李長霖.
Research on Reconfigurable Computing Using Impulse C Carmen Li Shen Mentor: Dr. Russell Duren February 1, 2008.
A GPU Implementation of Inclusion-based Points-to Analysis Mario Méndez-Lojo (AMD) Martin Burtscher (Texas State University, USA) Keshav Pingali (U.T.
Taking the Complexity out of Cluster Computing Vendor Update HPC User Forum Arend Dittmer Director Product Management HPC April,
MRPGA : An Extension of MapReduce for Parallelizing Genetic Algorithm Reporter :古乃卉.
Eneryg Efficiency for MapReduce Workloads: An Indepth Study Boliang Feng Renmin University of China Dec 19.
Evaluation of Agent Teamwork High Performance Distributed Computing Middleware. Solomon Lane Agent Teamwork Research Assistant October 2006 – March 2007.
Offloading to the GPU: An Objective Approach
GreenSched: An Energy-Aware Hadoop Workflow Scheduler
Autonomic scheduling of tasks from data parallel patterns to CPU/GPU core mixes Published in: High Performance Computing and Simulation (HPCS), 2013 International.
Matchmaking: A New MapReduce Scheduling Technique
Compiler and Runtime Support for Enabling Generalized Reduction Computations on Heterogeneous Parallel Configurations Vignesh Ravi, Wenjing Ma, David Chiu.
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
ApproxHadoop Bringing Approximations to MapReduce Frameworks
Streaming Big Data with Self-Adjusting Computation Umut A. Acar, Yan Chen DDFP January 2014 SNU IDB Lab. Namyoon Kim.
Dynamic Scheduling Monte-Carlo Framework for Multi-Accelerator Heterogeneous Clusters Authors: Anson H.T. Tse, David B. Thomas, K.H. Tsoi, Wayne Luk Source:
Page 1 A Platform for Scalable One-pass Analytics using MapReduce Boduo Li, E. Mazur, Y. Diao, A. McGregor, P. Shenoy SIGMOD 2011 IDS Fall Seminar 2011.
Load Rebalancing for Distributed File Systems in Clouds.
S. Pardi Frascati, 2012 March GPGPU Evaluation – First experiences in Napoli Silvio Pardi.
Hybrid Parallel Implementation of The DG Method Advanced Computing Department/ CAAM 03/03/2016 N. Chaabane, B. Riviere, H. Calandra, M. Sekachev, S. Hamlaoui.
Matrix Multiplication in Hadoop
By: Joel Dominic and Carroll Wongchote 4/18/2012.
Accelerating K-Means Clustering with Parallel Implementations and GPU Computing Janki Bhimani Miriam Leeser Ningfang Mi
Exploiting Graphics Processors for High-performance IP Lookup in Software Routers Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu IEEE INFOCOM.
Brief introduction about “Grid at LNS”
By Chris immanuel, Heym Kumar, Sai janani, Susmitha
Accelerating MapReduce on a Coupled CPU-GPU Architecture
Linchuan Chen, Xin Huo and Gagan Agrawal
A Cloud System for Machine Learning Exploiting a Parallel Array DBMS
MapReduce: Data Distribution for Reduce
Presentation transcript:

CUDA Performance Study on Hadoop MapReduce Clusters Chen He Peng Du University of Nebraska-Lincoln CSE 930 Advanced Computer Fall 2010

Overview Introduction Methodology Evaluation Conclusions Future work

Hadoop MapReduce Introduction

CPU+GPU Architecture

Introduction Questions –Can we introduce CUDA into Hadoop MapReduce Clusters? Mechanism and implementation –Is this reasonable? Effects and Costs

Methodology Question-1:Can we introduce CUDA into Hadoop ?

Methodology Test cases –SDK programs Data intensive: Matrix Multiplication Computation intensive: Monte Carlo –MDMR Pure Java program Introduce Jcuda

Methodology Port CUDA programs onto Hadoop –GPU (CUDA-C) vs CPU (C) –Approach MapRed (processHadoopData & cudaCompute) Main (Hadoop Pipes) Scripts (runbase.sh, run- -CPU/GPU.sh) Input data generators

.c Methodology.c.cu CUDA MonteCarlo.cu void processHadoopData(..) void cudaCompute(..) MonteCarlo.cpp class Mapper class Reducer MapRed.cpp void generate(..) Input Generator.c data Hadoop DFS Hadoop-enabled MonteCarlo extracted generates

Methodology MDMR –Time Complexity by using CPU –We can simply employ GPU to parallel the n-squre portion to reduce the time complexity to linear

Evaluation Environment –Head: 2xAMD 2.2GHz, 4GB DDR400 RAM, 800GB HD –3 PCs (AMD 2.3G CPU, 2G DDR2-667 RAM, 400GB HD, 1Gbps Ethernet) –XFX 9400GT 64bit 512MB DDR3 ($20) –CUDA 3.2 Toolkit –Hadoop –ServerTech CWG-CDU power distribution unit (for the power consumption monitoring) Factors –Speedup –Power consumption –Cost

Evaluation Matrix Multiplication (Execution time)

Evaluation Matrix Multiplication (Power consumption)

Evaluation

MDMR –Execution time

Evaluation MDMR –Power consumption

Conclusions Introduced GPU into MapReduce cluster and obtained up to 20 times speedup. Reduced up to 19/20 power consumption with the current preliminary solution and work load. With as little as $20 per node, compared with upgrading CPUs and adding more nodes, deploying GPU on Hadoop has high cost-to-benefit ratio. Provided practical implementations for people wanting to construct MapReduce clusters with GPUs.

Future Work Port more CUDA programs onto Hadoop. Incorporate reducers into the experiments Support heterogeneous clusters which mixed GPU-nodes and non-GPU nodes.

Thank You! Questions?