Fault Tolerant Parallel Data-Intensive Algorithms
Mucahid Kutlu, Gagan Agrawal, Oguz Kurt
Department of Computer Science and Engineering, The Ohio State University
Department of Mathematics, The Ohio State University
HIPC'12, Pune, India

Outline
Motivation
Related Work
Our Goal
Data Intensive Algorithms
Our Approach
– Data Distribution
– Fault Tolerant Algorithms
– Recovery
Experiments
Conclusion

Why Is Fault Tolerance So Important?
Typical first year for a new cluster*:
– individual machine failures
– 1 PDU failure (~ machines suddenly disappear)
– 20 rack failures (40-80 machines disappear, 1-6 hours to get back)
– other failures because of overheating, maintenance, …
If your code runs on 1,000 computers for more than a day, there is a high probability that a failure will occur before it finishes.
* taken from Jeff Dean's talk at Google I/O
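As a rough back-of-the-envelope illustration (our own numbers, not from the talk): if each machine fails independently with probability p on a given day, a job running on N machines for d days sees at least one failure with probability

P(at least one failure) = 1 - (1 - p)^(N·d)

With a hypothetical p = 0.1% per machine per day, N = 1000 and d = 1 give 1 - 0.999^1000 ≈ 63%.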

Fault Tolerance So Far
MPI fault tolerance, which focuses on checkpointing [1], [2], [3]
– High overhead
MapReduce [4]
– Good at fault tolerance for data-intensive applications
Algorithm-based fault tolerance
– Most of it targets scientific computation such as linear algebra routines [5], [6] and iterative computations [7], including conjugate gradient [8]

Our Goal
Our main goal is to develop an algorithm-based fault-tolerance solution for data-intensive algorithms.
Target failure:
– We focus on hardware failures.
– We lose everything on the failed node.
– We do not recover the failed node; the computation continues without it.
In an iterative algorithm, when a failure occurs:
– The system should not restart the failed iteration from the beginning.
– The amount of lost data should be as small as possible.

Data Intensive Algorithms
We focused on two algorithms: K-Means and Apriori.
However, our approach can be generalized to all algorithms that share the following reduction processing structure.
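The structure itself appears as a figure on the slide and is not reproduced in this transcript. As a rough sketch of the kind of generalized reduction loop meant here (our own reconstruction in C; all type and function names are placeholders, not the paper's code):

#include <stddef.h>

/* Minimal placeholder types for the sketch; real element and
 * reduction-object layouts depend on the application. */
typedef struct { double *values; size_t dim; } Element;
typedef struct { Element *elements; size_t num_elements; } DataBlock;
typedef struct { double *accum; size_t size; } ReductionObject;

/* Application-specific hooks (assumed for illustration):
 * - compute_key: e.g. the nearest centroid for K-means, or a candidate
 *   itemset id for Apriori
 * - local_reduce: associative and commutative update of the reduction object
 * - global_reduce: combine partial results across processors,
 *   e.g. with MPI_Reduce / MPI_Allreduce */
extern int  compute_key(const Element *e);
extern void local_reduce(ReductionObject *r, int key, const Element *e);
extern void global_reduce(ReductionObject *r);

void reduction_loop(DataBlock *blocks, size_t num_blocks, ReductionObject *robj)
{
    for (size_t b = 0; b < num_blocks; b++)
        for (size_t i = 0; i < blocks[b].num_elements; i++) {
            int key = compute_key(&blocks[b].elements[i]);
            local_reduce(robj, key, &blocks[b].elements[i]);
        }
    global_reduce(robj);  /* the per-iteration global combination step */
}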

Our Approach
We use a master-slave approach.
Replication: divide the data to be replicated into parts and distribute them among different processors.
– The amount of lost data will be smaller.
Summarization: the slaves send results for parts of their data before they have processed all of it.
– We do not need to re-process data whose results have already been received (see the sketch below).
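A minimal sketch of the summarization idea (our own illustration, reusing the DataBlock placeholder from the sketch above; the Summary type, the process_block helper, and the message tag are assumptions, not the paper's code): each slave processes its blocks one at a time and ships the per-block summary to the master immediately, so a failure can only lose work whose summary has not yet arrived.

#include <mpi.h>

#define TAG_SUMMARY 100                                      /* hypothetical message tag */

typedef struct { int block_id; double data[64]; } Summary;   /* placeholder layout */
extern Summary process_block(const DataBlock *block);        /* local reduction over one block */

/* Slave side (sketch): summarize and send block by block instead of
 * reporting once at the end of the iteration. */
void process_and_summarize(DataBlock *blocks, int num_blocks,
                           int master_rank, MPI_Comm comm)
{
    for (int b = 0; b < num_blocks; b++) {
        Summary s = process_block(&blocks[b]);
        MPI_Send(&s, (int)sizeof(Summary), MPI_BYTE,
                 master_rank, TAG_SUMMARY, comm);
    }
}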

Data Distribution and Replication
[Figure: the master distributes data blocks D1–D8 across slaves P1–P4, and each block is additionally replicated on a different slave.]
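One simple way to realize such a distribution (a sketch under our own assumptions; the paper's exact placement policy may differ) is to assign primary copies round-robin and shift each replica by a fixed offset so that the copies of a block never land on the same slave:

#include <stdio.h>

/* Assign owners for each data block: owners[b*R + 0] is the primary,
 * owners[b*R + r] the r-th replica (sketch, not the paper's scheme). */
void assign_blocks(int num_blocks, int num_slaves, int R,
                   int *owners /* size num_blocks * R */)
{
    int offset = (num_slaves / R > 0) ? num_slaves / R : 1;
    for (int b = 0; b < num_blocks; b++)
        for (int r = 0; r < R; r++)
            owners[b * R + r] = (b + r * offset) % num_slaves;
}

int main(void)
{
    int owners[8 * 2];
    assign_blocks(8, 4, 2, owners);   /* 8 blocks, 4 slaves, 2 copies each */
    for (int b = 0; b < 8; b++)
        printf("D%d -> primary P%d, replica P%d\n",
               b + 1, owners[b * 2] + 1, owners[b * 2 + 1] + 1);
    return 0;
}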

K-means with No Failure
[Figure: the master and slaves P1–P7, each holding primary data and replicas; the slaves send the summary of each data portion to the master, which broadcasts the new centroids.]
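A master-side sketch of one such iteration (our reconstruction; the Summary bookkeeping and the helper functions are assumptions): the master collects one summary per data block from whichever slave sends it, records which blocks have reported, recomputes the centroids, and broadcasts them.

#include <mpi.h>
#include <stdlib.h>

extern void accumulate(double *sum, long *cnt, const Summary *s, int k, int dim);
extern void recompute_centroids(double *centroids, const double *sum,
                                const long *cnt, int k, int dim);

/* Master side of one K-means iteration (sketch). */
void master_iteration(int total_blocks, double *centroids, int k, int dim,
                      MPI_Comm comm)
{
    double *sum = calloc((size_t)k * dim, sizeof(double));
    long   *cnt = calloc((size_t)k, sizeof(long));

    for (int received = 0; received < total_blocks; received++) {
        Summary s;
        MPI_Status st;
        MPI_Recv(&s, (int)sizeof(Summary), MPI_BYTE, MPI_ANY_SOURCE,
                 TAG_SUMMARY, comm, &st);
        /* st.MPI_SOURCE and s.block_id record which block has reported;
         * this is exactly the bookkeeping the recovery step relies on. */
        accumulate(sum, cnt, &s, k, dim);
    }

    recompute_centroids(centroids, sum, cnt, k, dim);
    MPI_Bcast(centroids, k * dim, MPI_DOUBLE, /*root=*/0, comm);

    free(sum);
    free(cnt);
}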

Single Node Failure Recovery
Case 1: P1 fails after sending the result of D1.
– The master node notifies P3 to process D2 in this iteration and to process D1 starting from the next iteration.
[Figure: master and surviving slaves P2–P7 with their primary data and replicas after P1's failure.]

Multiple Node Failure Recovery
Case 2: P1, P2 and P3 fail; D1 and D2 are lost entirely.
– The master node notifies P7 to read the first data blocks (D1 and D2) from the storage cluster.
– The master node notifies P6 to process D5 and D6, P4 to process D3, and P5 to process D4.
– P7 reads D1 and D2.
[Figure: master, surviving slaves P4–P7, and the storage cluster after the failure of P1–P3.]
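A master-side sketch of the recovery decision covering both cases (our own reconstruction; the bookkeeping structure and the notify helpers are hypothetical): blocks whose summaries already arrived are skipped, lost blocks with a surviving replica are reassigned to that replica's holder, and blocks with no surviving copy are re-read from the storage cluster by a surviving slave.

#define R_MAX 4   /* hypothetical upper bound on copies per block */

typedef struct {
    int owners[R_MAX];     /* slave ranks holding a copy of this block */
    int num_owners;
    int summary_received;  /* 1 if this iteration's summary already arrived */
} BlockInfo;

extern void notify_process_block(int slave, int block);       /* reassign to a replica holder */
extern void notify_reload_from_storage(int slave, int block); /* fetch from storage cluster */
extern int  pick_alive_slave(const int *alive, int num_slaves);

/* alive[p] is 1 if slave p is still up. Ownership for later iterations is
 * updated in the same way (omitted here). */
void recover(BlockInfo *blocks, int num_blocks, const int *alive, int num_slaves)
{
    for (int b = 0; b < num_blocks; b++) {
        if (blocks[b].summary_received)
            continue;                      /* Case 1: this block's result is already in */
        int replacement = -1;
        for (int r = 0; r < blocks[b].num_owners; r++)
            if (alive[blocks[b].owners[r]]) { replacement = blocks[b].owners[r]; break; }
        if (replacement >= 0)
            notify_process_block(replacement, b);
        else                               /* Case 2: every copy lost */
            notify_reload_from_storage(pick_alive_slave(alive, num_slaves), b);
    }
}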

Experiments
We used the Glenn cluster at OSC for our experiments. Each node has:
– Dual-socket, quad-core 2.5 GHz Opterons
– 24 GB RAM
We allocated 16 slave nodes with 1 core each.
Implemented in C using the MPI library.
Generated datasets:
– K-Means: size 4.87 GB, 20 coordinates, maximum of 50 iterations
– Apriori: size 4.79 GB, 10 items, 1% support, maximum rule size 6 (847 rules)

Effect of Summarization
[Charts: changing the number of messages with different percentages; changing the number of data blocks with different numbers of failures.]

Effect of Replication & Scalability
[Charts: effect of replication for Apriori; scalability test for Apriori.]

Fault Tolerance in MapReduce
Data is replicated in the file system.
– We replicate the data across processors, not in the file system.
If a task fails, it is re-executed. Completed map tasks also need to be re-executed, since their results are stored on local disks.
Because of dynamic scheduling, MapReduce can achieve better parallelism after a failure.

Experiments with Hadoop
Experimental setup:
– Allocated 17 nodes, 1 core each.
– Used one node as the master and the rest as slaves.
– Used the default chunk size.
– No backup nodes were used.
– Replication was set to 3.
Each test was executed 5 times; we took the average of 3 runs after eliminating the maximum and minimum results.
For our system, we set R=3, S=4, and M=2.
We report total time, including I/O operations.

Experiments with Hadoop (Cont'd)
[Charts: a single failure occurring at different percentages of progress; multiple-failure test for Apriori.]

Conclusion
Summarization is effective for fault tolerance.
– We recover faster when we divide the data into more parts.
Dividing the data into small parts and distributing them with minimum intersection decreases the amount of data to be recovered under multiple failures.
Our system performs much better than Hadoop.

References
1. J. Hursey, J. M. Squyres, T. I. Mattox, and A. Lumsdaine. The design and implementation of checkpoint/restart process fault tolerance for Open MPI. In IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1-8, March 2007.
2. G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, and A. Selikhov. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Supercomputing, ACM/IEEE 2002 Conference, page 29, November 2002.
3. Camille Coti, Thomas Herault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, and Franck Cappello. Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06. ACM, 2006.
4. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137-150, 2004.
5. J. S. Plank, Youngbae Kim, and J. J. Dongarra. Algorithm-based diskless checkpointing for fault tolerant matrix operations. In Fault-Tolerant Computing, FTCS-25, Digest of Papers, Twenty-Fifth International Symposium on, pages 351-360, June 1995.
6. Teresa Davies, Christer Karlsson, Hui Liu, Chong Ding, and Zizhong Chen. High performance Linpack benchmark: a fault tolerant implementation without checkpointing. In Proceedings of the International Conference on Supercomputing, ICS '11, pages 162-171. ACM, 2011.
7. Zizhong Chen. Algorithm-based recovery for iterative methods without checkpointing. In Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, HPDC 2011, San Jose, CA, USA, June 8-11, 2011, pages 73-84.
8. Zizhong Chen and J. Dongarra. A scalable checkpoint encoding algorithm for diskless checkpointing. In IEEE High Assurance Systems Engineering Symposium (HASE), pages 71-79, December 2008.

Thank you for listening. Any questions?