Investigation of Data Locality in MapReduce


Investigation of Data Locality in MapReduce Zhenhua Guo, Geoffrey Fox, Mo Zhou

Outline
- Introduction
- Analysis of Data Locality
- Optimality of Data Locality
- Experiments
- Conclusions

MapReduce Execution Overview
- Input files are stored in the Google File System (GFS) and split into blocks; map tasks read the input data, ideally with data locality.
- Map output is stored locally; intermediate data are shuffled between map tasks and reduce tasks.
- Reduce output is stored in GFS.

Hadoop Implementation
Storage: HDFS
- Files are split into blocks.
- Each block has replicas.
- All blocks are managed by a central name node (metadata management, replication management, block placement, fault tolerance).
Compute: MapReduce
- A central job tracker acts as the task scheduler.
- Each worker node has map and reduce task slots.
- Tasks are scheduled to task slots; # of tasks <= # of slots.
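As a rough illustration of this layout, here is a minimal toy model (not the actual Hadoop code; the class, node names, and placement policy are made up for the example) of blocks with replicas tracked by a name-node-like table, and worker nodes with a fixed number of map slots:

```python
import random

class ToyCluster:
    """Toy model of the layout described above: files are split into
    blocks, each block gets `replication` replicas on distinct nodes,
    and each node exposes `slots_per_node` map slots."""

    def __init__(self, nodes, slots_per_node=2, replication=3):
        self.nodes = list(nodes)
        self.replication = replication
        self.free_slots = {n: slots_per_node for n in self.nodes}  # per-node map slots
        self.block_locations = {}                                  # name-node style metadata

    def place_block(self, block_id):
        # choose `replication` distinct nodes at random for the replicas
        replicas = random.sample(self.nodes, min(self.replication, len(self.nodes)))
        self.block_locations[block_id] = set(replicas)
        return replicas

cluster = ToyCluster(["node%d" % i for i in range(5)])
for b in range(4):
    print(b, sorted(cluster.place_block(b)))
```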

Data Locality
- "Distance" between compute and data.
- Different levels: node-level, rack-level, etc.
- For data-intensive computing, data locality is important (energy, network traffic).
Research goals
- Evaluate how system factors impact data locality and theoretically deduce their relationship.
- Analyze state-of-the-art scheduling algorithms in MapReduce.
- Propose a scheduling algorithm achieving optimal data locality.
This is mainly a theoretical study.

Outline
- Introduction
- Analysis of Data Locality
- Optimality of Data Locality
- Experiments
- Conclusions

The Goodness of Data Locality (1/3)
Theoretical deduction of the relationship between system factors and data locality.
Symbols:
- N: the number of nodes
- S: the number of map slots on each node
- I: the ratio of idle slots
- T: the number of tasks to be executed
- C: replication factor
- IS: the number of idle map slots (N * S * I)
- p(k, T): the probability that k out of T tasks can gain data locality
- goodness of data locality: the percentage of map tasks that gain node-level data locality
The goodness of data locality depends on the scheduling strategy, the distribution of input data, resource availability, etc.

The Goodness of Data Locality (2/3)
Hadoop scheduling is analyzed under the following assumptions:
- Data blocks are randomly placed across nodes.
- Idle slots are randomly chosen from all slots (idle vs. busy slots).
- k out of T tasks achieve data locality; tasks are split into two groups (data-local and non-local).

The Goodness of Data Locality (3/3)
For simplicity, assume:
- replication factor C is 1,
- the number of slots on each node S is 1,
- T tasks, N nodes, IS idle slots.
From this we derive p(k, T), the probability that k out of T tasks gain data locality, its expectation E(k), and hence the goodness of data locality as the expected fraction of local tasks, E(k) / T.
We are working on the general cases where C and S are not 1.
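As a rough, hedged illustration of this setting (not the paper's closed-form derivation), the following Monte-Carlo sketch estimates the goodness of data locality for the simplified case C = 1 and S = 1: blocks and idle slots are placed uniformly at random, and tasks are scheduled one by one, Hadoop-style, taking a data-local idle slot when one exists. All parameter values below are arbitrary examples.

```python
import random

def simulate_goodness(num_nodes, num_idle_slots, num_tasks, trials=20000):
    """Monte-Carlo estimate of the goodness of data locality for the
    simplified case C = 1 (one replica per block) and S = 1 (one map
    slot per node).  Blocks and idle slots are placed uniformly at
    random; tasks are scheduled one at a time, taking a data-local
    idle slot if one exists and an arbitrary idle slot otherwise."""
    local_total = scheduled_total = 0
    for _ in range(trials):
        # each task's single block lands on a random node
        block_node = [random.randrange(num_nodes) for _ in range(num_tasks)]
        # a random subset of nodes currently has an idle slot
        idle = set(random.sample(range(num_nodes), num_idle_slots))
        for node in block_node:
            if not idle:                      # no free slots left in this wave
                break
            if node in idle:                  # node-level data locality
                idle.remove(node)
                local_total += 1
            else:                             # fall back to a remote idle slot
                idle.pop()
            scheduled_total += 1
    return local_total / scheduled_total

# e.g. 100 nodes, 10% idle slots, 30 tasks
print(simulate_goodness(num_nodes=100, num_idle_slots=10, num_tasks=30))
```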

Outline
- Introduction and Motivation
- Analysis of Data Locality
- Optimality of Data Locality (Scheduling)
- Experiments
- Conclusions

Non-optimality of default Hadoop scheduling
Problem: given a set of tasks and a set of idle slots, assign tasks to idle slots.
Hadoop schedules tasks one by one:
- It considers one idle slot at a time.
- Given an idle slot, it schedules the task from the task queue that yields the "best" data locality.
- This achieves a local optimum; a global optimum is not guaranteed, because each task is scheduled without considering its impact on other tasks.
For example, if task T1 is local to both slots s1 and s2 while T2 is local only to s1, and T1 precedes T2 in the queue, then s1 takes T1 and T2 is forced onto a remote slot; the global optimum puts T2 on s1 and T1 on s2, as the sketch below illustrates.
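A minimal sketch of this greedy, per-slot behavior (a hypothetical reconstruction for illustration, not Hadoop's actual scheduler code); `local_nodes[i]` is the set of nodes holding task i's data and `slot_nodes[j]` is the node that owns idle slot j:

```python
def greedy_schedule(local_nodes, slot_nodes):
    """Per-slot greedy scheduling: each idle slot is offered in turn and
    takes the first queued task whose data is local to it, falling back
    to the head of the queue otherwise."""
    pending = list(range(len(local_nodes)))        # task queue, in order
    assignment = {}
    for slot, node in enumerate(slot_nodes):
        if not pending:
            break
        local = [t for t in pending if node in local_nodes[t]]
        task = local[0] if local else pending[0]   # locality-first, else FIFO
        pending.remove(task)
        assignment[task] = slot
    return assignment

# T1 is local to nodes A and B, T2 only to A; the idle slots live on A, then B.
# Greedy gives {0: 0, 1: 1}: T2 runs remotely on B.  The global optimum
# {0: 1, 1: 0} would make both tasks node-local.
print(greedy_schedule([{"A", "B"}, {"A"}], ["A", "B"]))
```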

Optimal Data Locality
- All idle slots need to be considered at once to achieve the global optimum.
- We propose an algorithm, lsap-sched, which yields optimal data locality.
- Reformulate the problem: use a cost matrix to capture data locality information.
- Find a similar mathematical problem: the Linear Sum Assignment Problem (LSAP).
- Convert the scheduling problem to LSAP (it does not map directly).
- Prove the optimality.

Optimal Data Locality – Reformulation
- m idle map slots {s1, ..., sm} and n tasks {T1, ..., Tn}.
- Construct a cost matrix C: cell Ci,j is the cost incurred if task Ti is assigned to idle slot sj — 0 if compute and data are co-located, 1 otherwise. The matrix reflects data locality.
- Represent a task assignment with a function Φ: given task i, Φ(i) is the slot it is assigned to.
- Cost sum: Csum(Φ) = Σi Ci,Φ(i). Find an assignment Φ that minimizes Csum.

Optimal Data Locality – LSAP
Linear Sum Assignment Problem: given n items and n workers, assigning an item to a worker incurs a known cost; each item is assigned to exactly one worker and each worker gets exactly one item. Find the assignment that minimizes the total cost.
LSAP requires the cost matrix C to be square, so a non-square C cannot be handed to it directly.
- Solution 1: shrink C to a square matrix by removing rows/columns. (Rejected: we would need to decide which rows/columns to purge without impacting optimality.)
- Solution 2: expand C to a square matrix. (Adopted.)
  - If n < m, create m - n dummy tasks with constant cost 1, apply LSAP, and filter out the assignments of the dummy tasks.
  - If n > m, create n - m dummy slots with constant cost 1, apply LSAP, and filter out the tasks assigned to dummy slots.
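A minimal sketch of this padding idea, assuming a 0/1 cost matrix and using SciPy's LSAP solver (an illustrative reconstruction, not the paper's lsap-sched implementation; the helper name and toy inputs are made up). SciPy can also handle rectangular cost matrices directly, but the explicit padding mirrors the construction on this slide:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_schedule(local_nodes, slot_nodes):
    """local_nodes[i]: set of nodes holding task i's input block(s);
    slot_nodes[j]: node owning idle slot j.  Cost is 0 for a data-local
    assignment and 1 otherwise; the matrix is padded to a square with
    constant cost 1, so dummy rows/columns add |n - m| to any solution."""
    n, m = len(local_nodes), len(slot_nodes)
    size = max(n, m)
    cost = np.ones((size, size))
    for i, nodes in enumerate(local_nodes):
        for j, node in enumerate(slot_nodes):
            cost[i, j] = 0 if node in nodes else 1
    rows, cols = linear_sum_assignment(cost)        # optimal square LSAP
    # keep only real tasks (i < n) assigned to real slots (j < m)
    return {i: j for i, j in zip(rows, cols) if i < n and j < m}

# Same toy instance as before: now T2 gets the slot on A and T1 the slot
# on B, so both tasks are node-local (the global optimum).
print(lsap_schedule([{"A", "B"}, {"A"}], ["A", "B"]))   # {0: 1, 1: 0}
```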

Optimal Data Locality – Proof
Do our transformations preserve optimality? Yes.
Assume LSAP algorithms give optimal assignments for square matrices.
Proof sketch (by contradiction):
- Let φ-lsap be the assignment function found by lsap-sched; its cost sum is Csum(φ-lsap).
- The total assignment cost given by the LSAP algorithm for the expanded square matrix is Csum(φ-lsap) + |n - m|. The key point is that the total assignment cost of the dummy tasks is |n - m| no matter where they are assigned.
- Assume φ-lsap is not optimal, i.e., another function φ-opt gives a smaller assignment cost: Csum(φ-opt) < Csum(φ-lsap).
- Extend φ-opt to the expanded matrix; its cost sum is Csum(φ-opt) + |n - m|.
- Csum(φ-opt) < Csum(φ-lsap) implies Csum(φ-opt) + |n - m| < Csum(φ-lsap) + |n - m|, so the solution given by the LSAP algorithm would not be optimal.
- This contradicts our assumption, so φ-lsap is optimal.

Outline
- Introduction
- Analysis of Data Locality
- Optimality of Data Locality
- Experiments
- Conclusions

Experiment – The Goodness of DL
Measure the relationship between system factors and data locality and verify our simulation. In each test, one factor is varied while the others are fixed.
System configuration (parameter: default value; range used when the factor is tested; environment in the Delay Scheduling paper):
- num. of nodes: 1000; [300, 5000], step 100; 1500
- num. of slots per node: 2; [1, 32], step 1
- num. of tasks: 300; 2^0, 2^1, ..., 2^13; 2^4, ..., 2^13
- ratio of idle slots: 0.1; [0.01, 1], step 0.02; 0.01
- replication factor: 3; [1, 20], step 1

Experiment – The Goodness of DL
[Figure: y-axis is the ratio of map tasks that achieve data locality (higher is better); panels show (a) number of tasks (normal and log scale), (b) number of slots per node, (c) replication factor, (d) ratio of idle slots, (e) number of nodes, (f) num. of idle slots / num. of tasks (redrawing a and e), (g) a real trace*, and (h) simulation results with a similar configuration, showing the relevance of the simulation.]
* M. Zaharia, et al., "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling," EuroSys 2010.

Experiment – lsap-sched
Measure the performance advantage of our proposed algorithm.
[Figure: y-axis is the data locality improvement (%) over native Hadoop, varying the number of tasks, the replication factor, and the number of nodes in turn.]
Setup: # of tasks <= # of idle slots; 100 nodes; 50 idle slots (ratio of idle slots: 50%); tasks are randomly generated. For (a) and (b), 1 slot per node; for (c), 4 slots per node. For (a), the replication factor is 5; for (c), it is 1.

Experiment – Measurement on FutureGrid
Measure the impact of data locality on job execution. We developed a random scheduler with a parameter "randomness":
- 0: degenerates to the default scheduler
- 1: randomly schedules all tasks
- 0.5: half random, half default scheduling
We compare efficiency on FutureGrid clusters: single-cluster vs. cross-cluster performance.
[Figure: (a) with a high-speed (1-10 Gbps) cross-cluster network; (b) with a drastically heterogeneous network (1-10 Mbps cross-cluster bandwidth via ViNe).]
Application: input-sink, an IO-intensive job that reads input data and throws them away.
Cross-cluster setup: 5 nodes in Hotel and 5 nodes in India; each node has 8 cores.
HPC-style setup: 10 nodes in Hotel (HDFS) and 10 nodes in India (MapReduce).
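A minimal sketch of the task-selection rule such a random scheduler might use (a hypothetical reconstruction for illustration; the function name and arguments are not from the paper):

```python
import random

def choose_task(task_queue, slot_node, local_nodes, randomness):
    """With probability `randomness` pick a task at random (randomness = 1
    reproduces fully random scheduling); otherwise apply the default
    locality-first rule (randomness = 0 degenerates to default Hadoop)."""
    if random.random() < randomness:
        return random.choice(task_queue)
    local = [t for t in task_queue if slot_node in local_nodes[t]]
    return local[0] if local else task_queue[0]

# e.g. half-random scheduling (randomness = 0.5) for a slot on node "A"
print(choose_task([0, 1], "A", [{"B"}, {"A"}], randomness=0.5))
```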

Conclusions
- Hadoop scheduling favors data locality.
- We deduced closed-form formulas depicting the relationship between system factors and data locality.
- Hadoop scheduling is not optimal; we propose a new algorithm yielding optimal data locality.
- We conducted experiments to demonstrate its effectiveness; more practical evaluation is part of future work.

Questions?

Backup slides

MapReduce Model
Input & output: a set of key/value pairs.
Two primitive operations:
- map: (k1, v1) → list(k2, v2)
- reduce: (k2, list(v2)) → list(k3, v3)
Each map operation processes one input key/value pair and produces a set of key/value pairs.
Each reduce operation merges all intermediate values (produced by map operations) for a particular key and produces the final key/value pairs.
Operations are organized into tasks:
- Map tasks: apply the map operation to a set of key/value pairs.
- Reduce tasks: apply the reduce operation to intermediate key/value pairs.
Each MapReduce job comprises a set of map tasks and (optional) reduce tasks.
The Google File System is used to store data; it is optimized for large files and write-once-read-many access patterns. HDFS is an open-source implementation.
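As a concrete illustration of these signatures, here is the classic word-count example written as plain Python functions plus a tiny driver that mimics the map/shuffle/reduce phases (a toy sketch only; a real job would run as Hadoop map and reduce tasks over HDFS blocks):

```python
from collections import defaultdict

def map_op(doc_id, text):
    """map: (k1, v1) -> list(k2, v2); emit (word, 1) for every word."""
    return [(word, 1) for word in text.split()]

def reduce_op(word, counts):
    """reduce: (k2, list(v2)) -> list(k3, v3); sum the counts per word."""
    return [(word, sum(counts))]

def run_job(documents):
    # map phase
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_op(doc_id, text))
    # shuffle phase: group intermediate values by key
    groups = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)
    # reduce phase
    output = []
    for word, counts in groups.items():
        output.extend(reduce_op(word, counts))
    return dict(output)

print(run_job({"d1": "the quick brown fox", "d2": "the lazy dog"}))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```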