Riza Suminto, Agung Laksono *, Anang Satria *, Thanh Do †, Haryadi Gunawi †*

Presentation transcript:

2 HotCloud ’15

• Users demand high dependability, reliability, and performance stability
• Amazon found that every 100 ms of latency cost them 1% in sales
• Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%

What Bugs Live in the Cloud? A Study of Issues in Cloud Systems, SoCC

Performance Bug; System Performance Verifier (SPV)

• Jobs take several times longer than usual to finish
• Improper speculative execution: JCH 1 & TPL 1 & FPL 2 & FTY 1
• Unnecessary repeated recovery: TPL 1 & TPL 4 & FTY 4 & TOP 1

Scenario: DLC A & TPL A & JCH A & FPL A & FTY A
• DLC A: maps read locally
• TPL A: mappers and reducers in different nodes
• JCH A: all-to-all
• FPL A: fault at map node
• FTY A: slow NIC
[Diagram: mappers M1, M2, M3 and the reducers, with a "slow!" label. All reducers slow!]
No straggler = no speculative execution (SpecExec)

☒ DLC A & TPL A & JCH A & FPL A & FTY A
[Diagram: mappers M1, M2, M3 and a datanode (DN). DLC B = read remote: straggler!]
☑ DLC A & TPL A & JCH A & FPL A & FTY A
[Diagram: mappers M1, M2, M3 and the reducers]

☒ DLC A & TPL A & JCH A & FPL A & FTY A
[Diagram: mappers M1, M2, M3 and the reducers. FPL B = slow reducer: straggler!]
☑ DLC A & TPL A & JCH A & FPL A & FTY A
[Diagram: mappers M1, M2, M3 and the reducers]
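The "No straggler = No SpecExec" case can be illustrated with a small sketch. It assumes a simplified relative-progress rule (a task counts as a straggler when it trails the mean progress by a fixed gap); the class name, the method name, and the 0.2 gap are illustrative assumptions, not Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified relative-progress straggler check (illustration only, not Hadoop code).
final class StragglerSketch {
    // Assumed threshold: a task is a straggler if it trails the mean progress by this much.
    static final double GAP = 0.2;

    // Returns the indices of tasks whose progress trails the average by at least GAP.
    static List<Integer> findStragglers(double[] progress) {
        double sum = 0.0;
        for (double p : progress) {
            sum += p;
        }
        double mean = progress.length == 0 ? 0.0 : sum / progress.length;
        List<Integer> stragglers = new ArrayList<>();
        for (int i = 0; i < progress.length; i++) {
            if (mean - progress[i] >= GAP) {
                stragglers.add(i);
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        // One slow reducer among healthy ones: it stands out and can be speculated.
        System.out.println(findStragglers(new double[] {0.9, 0.85, 0.3}));
        // A slow map node delays every reducer equally: no task stands out.
        System.out.println(findStragglers(new double[] {0.1, 0.1, 0.1}));
    }
}
```

Under this rule the second call finds no straggler, because every task is equally slow, so a detector that only compares tasks against each other never triggers speculation.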

Scenario: TPL A & TPL B & TOP A & FTY B
• TPL A: mappers and reducers in different nodes
• TPL B: mappers and reducers in different racks
• TOP A: large number of nodes per rack
• FTY B: slow inter-rack switch
[Diagram: mappers (M) and a reducer (R) across Rack 1 and Rack 2, with a "slow!" label]

• Untriggered speculative execution
    MR = JCH 1 & TPL 1 & FPL 2 & FTY 1
    MR = DSR 1 & DLC 1 & FPL 1 & FTY 1
    MR-5533 = FTY 2 & FPL 3 & TPL 3
    ……
• O(n) recovery
    MR-5251 = FTY 3 & FPL 3 & FTM 1
    MR-5060 = TPL 1 & TPL 3 & FTY 1 & FPL 2
    MR-1800 = TPL 1 & TPL 4 & FTY 4 & TOP 1
    ……
• Long lock contention
    MR-9191 = FTY 3 & FPL 3 & FTM 1
    MR-9292 = TPL 1 & TPL 3 & FTY 1 & FPL 2
    MR-9393 = TPL 1 & TPL 4 & FTY 4 & TOP 1
    ……

Scenario Type | Possible Conditions
DLC: Data Locality | (1) Read from remote disk, (2) read from local disk, ...
DSR: Data Source | (1) Some tasks read from same datanode, (2) all tasks read from different datanodes, …
JCH: Job Characteristic | Map-reduce is (1) many-to-all, (2) all-to-many, (3) large fan-in, (4) large fan-out, ...
JSZ: Job Size | (1) 1 GB jar file, (2) 1 MB jar file, ...
LSZ: Load Size | (1) Thousands of tasks, (2) small number of tasks, …
FTY: Fault Type | (1) Slow node/NIC, (2) node disconnect/packet drop, (3) disk error/out of space, (4) rack switch, …
FPL: Fault Placement | Slowdown fault injection at the (1) source datanode, (2) mapper, (3) reducer, …
FGR: Fault Granularity | (1) Single disk/NIC, (2) single node (dead node), (3) entire rack (network switch), …
FTM: Fault Timing | (1) During shuffling, (2) during 95% of task completion, …
TOP: Topology | (1) 30 nodes per rack, (2) 3 nodes per rack, …
TPL: Task Placement | (1) Mappers and reducers are in different nodes, (2) AM and reducers in different nodes, (3) mappers are in the same node, (4) most reducers placed in the same rack, ...
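To give a feel for how quickly the scenario space grows, the following sketch enumerates every combination of conditions across a few of the scenario types above. The class name, the method name, and the tags such as "DLC1" are illustrative assumptions; only three dimensions are used here, with the first few conditions listed for each.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Enumerates scenario combinations as a Cartesian product (illustration only).
final class ScenarioSpace {
    // Each inner list holds the tagged conditions of one scenario type.
    static List<List<String>> enumerate(List<List<String>> dimensions) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<String>());
        for (List<String> dim : dimensions) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : result) {
                for (String cond : dim) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(cond);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> dims = Arrays.asList(
            Arrays.asList("DLC1", "DLC2"),
            Arrays.asList("FTY1", "FTY2", "FTY3", "FTY4"),
            Arrays.asList("FPL1", "FPL2", "FPL3"));
        List<List<String>> scenarios = enumerate(dims);
        System.out.println(scenarios.size() + " scenarios");      // 2 * 4 * 3 = 24
        System.out.println(String.join(" & ", scenarios.get(0))); // DLC1 & FTY1 & FPL1
    }
}
```

Three small dimensions already give 24 scenarios; each additional scenario type multiplies the count again.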

• Benchmarking
• Hundreds of benchmarks for every scenario
• Injecting slowdowns and failures
• Takes days to weeks!

• Four goals in performance verification
• Fast
• Covers many deployment scenarios
• Runs in pre-deployment
• Directly checks implementation code
Formal modeling tools!

Target system (e.g., Hadoop code):

    public class JobInProgress {
      JobID jobId;
      TaskInProgress maps[];
      ...
      public HeartbeatResponse heartbeat(HeartbeatData hd) { ... }
    }

[Diagram: target system, SPV Compiler, auto-generated model (in Colored Petri Net), performance verification. The auto-generated model is 20X larger than the hand model.]

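[Diagram: Colored Petri Net with places "Tasks", "Node", and "Task to Run" and a transition "Schedule Task"; the arcs carry task, node, and assignment.]

Code on the "Schedule Task" transition:

    input(node,task);
    output(assignment);
    action let val (id,type) = task in (node,id,type)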
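Read as ordinary code, the "Schedule Task" transition is a small function from a node token and a task token to an assignment token. The sketch below mirrors that in Java; the class names and the use of strings for nodes and task types are assumptions made for illustration.

```java
// Java rendering of the "Schedule Task" transition (illustration only).
final class ScheduleTaskSketch {
    // Token on the "Tasks" place: (id, type).
    static final class Task {
        final String id;
        final String type;

        Task(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    // Token on the "Task to Run" place: (node, id, type).
    static final class Assignment {
        final String node;
        final String id;
        final String type;

        Assignment(String node, String id, String type) {
            this.node = node;
            this.id = id;
            this.type = type;
        }

        @Override
        public String toString() {
            return "(" + node + "," + id + "," + type + ")";
        }
    }

    // Firing the transition takes one node token and one task token
    // and produces one assignment token.
    static Assignment scheduleTask(String node, Task task) {
        return new Assignment(node, task.id, task.type);
    }

    public static void main(String[] args) {
        System.out.println(scheduleTask("A", new Task("T1", "map"))); // prints (A,T1,map)
    }
}
```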

[Diagram: CPN and Java]

 Java  SysJava  Data flattening  Code modularization  Annotation tagging  SysJava  Model compiler 18 HotCloud ’15

• Java system states = ArrayList, Map, Tree, …
• CPN states = multisets

Java:

    List runningJobs;
    public class JobInProgress { JobID jobId; TaskInProgress maps[]; ... }
    class TaskInProgress { TaskID id; double progress; ... }

CPN places:

    Job In Progress:   [(1)]
    Job Task Mapping:  [(1,a),(1,b)]
    Task In Progress:  [(a,10%),(b,15%)]
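Data flattening can be sketched as turning the nested Java objects into flat tuples, one collection per CPN place. The code below is a minimal illustration under assumed names (`FlattenSketch`, `flatten`) and uses plain lists of strings to stand in for multisets; it is not the SPV compiler's actual transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Flattens nested job/task objects into tuple lists, one per CPN place (illustration only).
final class FlattenSketch {
    static final class TaskInProgress {
        final String id;
        final double progress;

        TaskInProgress(String id, double progress) {
            this.id = id;
            this.progress = progress;
        }
    }

    static final class JobInProgress {
        final int jobId;
        final TaskInProgress[] maps;

        JobInProgress(int jobId, TaskInProgress[] maps) {
            this.jobId = jobId;
            this.maps = maps;
        }
    }

    // Lists stand in for the multisets held by the three places.
    final List<String> jobInProgress = new ArrayList<>();
    final List<String> jobTaskMapping = new ArrayList<>();
    final List<String> taskInProgress = new ArrayList<>();

    static FlattenSketch flatten(List<JobInProgress> runningJobs) {
        FlattenSketch places = new FlattenSketch();
        for (JobInProgress job : runningJobs) {
            places.jobInProgress.add("(" + job.jobId + ")");
            for (TaskInProgress tip : job.maps) {
                // The nested array becomes an explicit job-to-task relation.
                places.jobTaskMapping.add("(" + job.jobId + "," + tip.id + ")");
                places.taskInProgress.add("(" + tip.id + "," + Math.round(tip.progress * 100) + "%)");
            }
        }
        return places;
    }

    public static void main(String[] args) {
        JobInProgress job = new JobInProgress(1, new TaskInProgress[] {
            new TaskInProgress("a", 0.10), new TaskInProgress("b", 0.15)});
        FlattenSketch places = flatten(Arrays.asList(job));
        System.out.println(places.jobInProgress);
        System.out.println(places.jobTaskMapping);
        System.out.println(places.taskInProgress);
    }
}
```

The object reference from a job to its tasks disappears; the relation is carried by the shared identifiers in the tuples instead.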

Original function:

    private boolean processHeartbeat(TaskTrackerStatus trackerStats) {
      synchronized (taskTrackers) { ... }
      for (TaskStatus ts : trackerStats) {
        tasks.get(ts.id).updateStatus(ts);
      }
      ...
    }

Modular functions, control-flow logic:

    private void initCheck() {
      synchronized (taskTrackers) { ... }
    }
    private void updateStatuses(TaskTrackerStatus trackerStats) {
      for (TaskStatus ts : trackerStats) { ... }
    }

Modular functions, CRUD logic:

    private TaskInProgress getTask(TaskID id) {
      return tasks.get(id);
    }
    private void tipUpdate(TaskInProgress tip, TaskStatus ts) {
      tip.updateStatus(ts);
    }

• Assist compiler
• Annotation category:
    • Data Structure
    • I/O
    • CRUD & Process
    • Miscellaneous

    public HeartbeatResponse heartbeat(HeartbeatData hd) { ... }
    public class JobInProgress { JobID jobId; TaskInProgress maps[]; ... }
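The slides do not show the annotation syntax itself. As a rough sketch of how tags in the listed categories could guide a compiler, the code below defines two hypothetical annotations (`@DataStructure` and `@Process`, both invented for this example, as is the `HeartbeatHandler` class) and finds tagged methods through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation tags and a reflective scan for them (illustration only).
final class AnnotationSketch {
    // Hypothetical tag for classes that hold system state.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DataStructure {}

    // Hypothetical tag for methods that process events.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Process {}

    @DataStructure
    static final class JobInProgress {
        int jobId;
    }

    static final class HeartbeatHandler {
        @Process
        public String heartbeat(String hd) {
            return "ack:" + hd;
        }

        public String helper() {
            return "untagged";
        }
    }

    // Counts the methods of a class that carry the @Process tag.
    static int countTagged(Class<?> c) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Process.class)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countTagged(HeartbeatHandler.class));
        System.out.println(JobInProgress.class.isAnnotationPresent(DataStructure.class));
    }
}
```

A source-level compiler would more likely read such tags statically than through reflection; reflection is used here only to keep the example runnable.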

• SPV Compiler
• Executable XML
• Define configurations, assertions, and specifications
• Explore every non-deterministic choice
    • Task to node mapping

[Diagram: the "Tasks" place holds ("T1",map) and the "Node" place holds A and B. Firing "Schedule Task" with T1 on A puts (A,"T1",map) in "Task to Run"; firing it with T1 on B puts (B,"T1",map) there.]
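Exploring every non-deterministic choice means that, wherever a task could run on more than one node, each option becomes its own branch. The sketch below enumerates all complete task-to-node mappings by simple recursion; the class and method names are assumptions, and a real model checker would explore full system states rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Enumerates every task-to-node mapping, one branch per scheduling choice (illustration only).
final class ExploreSketch {
    // Returns one string per complete mapping, e.g. "T1 on A, T2 on B".
    static List<String> allMappings(List<String> tasks, List<String> nodes) {
        List<String> out = new ArrayList<>();
        explore(tasks, nodes, 0, "", out);
        return out;
    }

    private static void explore(List<String> tasks, List<String> nodes,
                                int next, String prefix, List<String> out) {
        if (next == tasks.size()) {
            out.add(prefix);
            return;
        }
        // Branch on every node the next task could be scheduled on.
        for (String node : nodes) {
            String step = tasks.get(next) + " on " + node;
            explore(tasks, nodes, next + 1,
                    prefix.isEmpty() ? step : prefix + ", " + step, out);
        }
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B");
        System.out.println(allMappings(Arrays.asList("T1"), nodes));              // [T1 on A, T1 on B]
        System.out.println(allMappings(Arrays.asList("T1", "T2"), nodes).size()); // 4
    }
}
```

The number of branches is the number of nodes raised to the number of tasks, which is why small configurations are used for model checking.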

• 5305 lines of code on top of WALA & Access/CPN
• Hadoop MapReduce 1.2.1, with 1067 lines of code changed
• 20x larger than hand-made model
• 34 scenarios, 30 assertion violations, 4 performance bugs
• 1.5 hours of model checking

Configuration | Value
Worker Node | Node A, B
Data Node | Node A, B, C
Tasks | 2 tasks
Fault Type | Slow Data Node
Fault Placement | Node C

• Is it time for pre-deployment detection of performance bugs?
• Bridging system code and formal methods
• Future of data-centric languages
• Beyond Hadoop
• Root cause anatomy of performance bugs
• Beyond performance bugs