MapReduce Scheduling in Cloud Computing
Prof. Jenn-Wei Lin, Department of Computer Science and Information Engineering, Fu Jen Catholic University, Taiwan
Outline
Introduction
  Cloud Computing
  MapReduce Scheduling
Proposed MapReduce Scheduling Scheme
Experience Sharing for My Short-Term Research at Iowa State University
What is Cloud Computing?
Cloud computing is a general term for a new class of network-based computing: using the network to provide hardware and software services to users, while hiding the complexity and details of the underlying infrastructure from users and applications. Access is through very simple interfaces: a graphical interface, an API (Application Programming Interface), or Web-based applications.
Delivery Models
SaaS: Software as a Service
PaaS: Platform as a Service
IaaS: Infrastructure as a Service
Hadoop MapReduce
Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware, in a reliable, fault-tolerant manner, over data stored in HDFS.
Typical applications:
Statistical analysis of large data sets
Sorting and aggregation of large data sets
Analysis of web access logs
MapReduce Framework
(Figure: the master node runs a JobTracker that accepts submitted jobs into a FIFO queue; each job's map tasks (Map 1 … Map m) and reduce tasks (Reduce 1 … Reduce n) are dispatched to worker nodes 1 … K, each running a TaskTracker.)
MapReduce Framework
(Figure: data flow from the input data through the map and reduce stages.)
MapReduce Program Example
WordCount Map Function (using the classic org.apache.hadoop.mapred API):

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public static class Map extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, one); // emit (word, 1) for each token
    }
  }
}

Reduce Function:

public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get(); // sum the counts for this word
    }
    output.collect(key, new IntWritable(sum));
  }
}
FIFO scheduler
The default approach to scheduling users' jobs in early versions of Hadoop: all jobs run in order of submission. Each job typically uses the whole cluster, so incoming jobs must wait their turn. Although a shared cluster offers great potential for providing large resources to many users, sharing resources fairly between users requires a better scheduler: production jobs need to complete in a timely manner, while users making smaller ad hoc queries should still get results back in a reasonable time.
(Refer to Hadoop: The Definitive Guide, 2nd edition, by Tom White, O'Reilly Media.)
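The FIFO policy above can be sketched as a simple job queue (a minimal sketch; the class and method names are illustrative, not Hadoop's actual scheduler API):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of FIFO job scheduling: jobs queue in submission order,
// and the job at the head of the queue gets the whole cluster.
class FifoScheduler {
    private final Queue<String> jobs = new ArrayDeque<>();

    void submit(String jobId) {
        jobs.add(jobId); // later jobs wait their turn
    }

    String nextJob() {
        return jobs.poll(); // null when no job is waiting
    }
}
```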
Fair scheduler
The Fair Scheduler aims to give all jobs an equal share of resources over time. When a single job is running, it uses the entire cluster; when other jobs are submitted, free task slots are given to the new jobs so that each job gets roughly the same amount of resources. Unlike the default Hadoop scheduler, which forms a queue of jobs, this lets short jobs finish in a reasonable time without starving long jobs: a short job belonging to one user completes in a reasonable time even while another user's long job is running, and the long job still makes progress.
Jobs are placed in pools, and by default each user gets their own pool, so a user who submits more jobs than another does not get more cluster resources on average. Custom pools can also be defined with guaranteed minimum capacities (in numbers of map and reduce slots) and per-pool weightings.
The Fair Scheduler supports preemption: if a pool has not received its fair share for a certain period of time, the scheduler kills tasks in pools running over capacity in order to give the slots to the pool running under capacity.
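The equal-share idea can be sketched as follows (illustrative only; the real Fair Scheduler additionally handles pools, minimum shares, weights, and preemption):

```java
// Sketch of fair-share slot allocation: the cluster's task slots are
// divided as evenly as possible among the currently running jobs.
class FairShare {
    static int[] shares(int totalSlots, int numJobs) {
        int[] shares = new int[numJobs];
        int base = totalSlots / numJobs;   // equal share for every job
        int extra = totalSlots % numJobs;  // leftover slots go to the first jobs
        for (int i = 0; i < numJobs; i++) {
            shares[i] = base + (i < extra ? 1 : 0);
        }
        return shares;
    }
}
```

With 10 slots and 3 jobs this yields shares of 4, 3, and 3; when a fourth job arrives, the shares are recomputed so the new job immediately receives its portion.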
Speculative execution (LATE Scheduler)
Consider heterogeneous environments. A key benefit of MapReduce is that it automatically handles failures, hiding the complexity of fault tolerance from the programmer. If a node crashes, MapReduce re-runs its tasks on a different machine. Equally importantly, if a node is available but is performing poorly (a condition we call a straggler), MapReduce runs a speculative copy of its task (also called a backup task) on another machine to finish the computation faster; this method is called speculative execution. Without speculative execution, a job would be as slow as the misbehaving task. Stragglers can arise for many reasons, including faulty hardware and misconfiguration. Google has noted that speculative execution can improve job response times by 44%.
(Refer to "Improving MapReduce Performance in Heterogeneous Environments.")
Speculative execution (LATE Scheduler)
When a node has an empty task slot, Hadoop chooses a task for it from one of three categories:
1. Any failed tasks are given highest priority. This is done to detect when a task fails repeatedly due to a bug and stop the job.
2. Non-running tasks are considered next. For maps, tasks with data local to the node are chosen first.
3. Finally, Hadoop looks for a task to execute speculatively.
Speculative execution (LATE Scheduler)
Hadoop’s scheduler starts speculative tasks based on a simple heuristic comparing each task’s progress to the average progress. To select speculative tasks, Hadoop monitors task progress using a progress score between 0 and 1. For a map task, the progress score is the fraction of input data read. For a reduce task, the execution is divided into three phases (copy phase, sort phase, and reduce phase), each of which accounts for 1/3 of the score.
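The progress-score rule above can be written out directly (a sketch; the method and variable names are ours, not Hadoop's):

```java
// Progress score in [0, 1], as described above.
class ProgressScore {
    // Map task: fraction of the input data read so far.
    static double mapScore(long bytesRead, long inputBytes) {
        return (double) bytesRead / inputBytes;
    }

    // Reduce task: three phases (copy, sort, reduce), each worth 1/3.
    // phasesDone is the number of completed phases (0..3);
    // phaseFraction is the progress within the current phase (0..1).
    static double reduceScore(int phasesDone, double phaseFraction) {
        return (phasesDone + phaseFraction) / 3.0;
    }
}
```

A reduce task that has finished copying and is 90% through sorting thus scores (1 + 0.9) / 3 ≈ 0.63; the scheduler compares such scores against the average to pick speculative candidates.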
Deadline-Constrained MapReduce Scheduling Based on Graph Modelling
Chien-Hung Chen¹, Jenn-Wei Lin², and Sy-Yen Kuo¹
¹Department of Electrical Engineering, National Taiwan University
²Department of Computer Science & Information Engineering, Fu Jen Catholic University
Outline
Introduction
Background
Proposed Scheduling Scheme
Performance Evaluation
Conclusion
Introduction
MapReduce is a software framework for processing data-intensive applications in a parallel manner in cloud computing systems. Many data-intensive jobs may be issued simultaneously in a cloud computing system. When users run important data-intensive jobs, they usually specify the expected deadlines of the jobs in their Service Level Agreements (SLAs) with the cloud provider.
Introduction
In this paper, we propose a new scheduler that uses bipartite graph modelling to integrate the following points in MapReduce scheduling:
Slot performance heterogeneity
Adaptive task deadline setting
Combining data locality and job deadline
Minimizing the number of deadline-over jobs
The proposed MapReduce scheduler is called the BGMRS. In the BGMRS, a weighted bipartite graph is first formed; based on this graph, we can obtain an optimal deadline-aware MapReduce scheduling strategy by solving the Minimum Weighted Bipartite Matching (MWBM) problem.
System Model
We refer to the Hadoop cluster architecture to design our MapReduce scheduling scheme. The Hadoop cluster architecture can be used to implement a cloud system platform consisting of a master node and multiple worker (slave) nodes. The master node runs a JobTracker process, which coordinates all the jobs on the cluster by placing their map and reduce tasks on worker nodes. Each worker node runs a TaskTracker process to control the execution of map and reduce tasks on the node; the TaskTracker also sends progress reports for its tasks to the JobTracker.
System Model
In MapReduce, the execution resources of a node are divided into a number of slots. Each slot holds a portion of a node's execution resources and runs one map or reduce task. Because nodes are heterogeneous, our system model specifically considers slots with heterogeneous performance.
Related Work (1/2)
L.-Y. Ho, J.-J. Wu, and P. Liu, “Optimal Algorithms for Cross-Rack Communication Optimization in MapReduce Framework,” in Proc. IEEE CLOUD, Jul. 2011, pp. 420–427.
The authors presented two optimal reduce-placement algorithms to mitigate network traffic among racks during a job's shuffle phase; reducing this data communication improves job performance. They assume an all-to-all communication model between the map and reduce tasks of a single job. The reduction of input data traffic to map tasks was not discussed.
Related Work (2/2)
X. Dong, Y. Wang, and H. Liao, “Scheduling Mixed Real-Time and Non-real-Time Applications in MapReduce Environment,” in Proc. IEEE ICPADS, Dec. 2011, pp. 9–16.
The authors focused on scheduling a mix of real-time and non-real-time jobs. To meet job deadlines, real-time jobs that cannot obtain resources within a given interval may preempt resources from non-real-time jobs. This work handles different types of MapReduce jobs well; however, the heterogeneity of computing resources is not discussed.
The DAMRS Problem
We investigate the Deadline-Aware MapReduce Scheduling (DAMRS) problem. The main objective of the DAMRS problem is to find an optimal task scheduling strategy that minimizes both the total number of deadline-over tasks and the total task execution time. Unlike the traditional MapReduce scheduling problem, the DAMRS problem considers tasks with different deadline requirements and slots with different execution performance.
Proposed Scheduling Scheme
The proposed scheme, the BGMRS, considers multiple deadline requirements from different MapReduce jobs and varying slot performance in the cloud computing system. The scheme consists of the following three steps:
Deadline partition
Bipartite graph formation
Scheduling problem transformation
Proposed Scheduling Scheme
The deadline partition step divides a job deadline into a map deadline and a reduce deadline. A job deadline is divided according to the job's current execution phase (ready, map, or reduce). The details are described in the following slides.
Proposed Scheduling Scheme
Ready phase
The job deadline is divided into an initial map deadline and an initial reduce deadline. The partition uses the estimated map execution time and the estimated reduce execution time to determine two partition ratios; each task deadline equals its partition ratio times the job deadline:
Initial map deadline = job deadline × estimated map time / (estimated map time + estimated reduce time)
Initial reduce deadline = job deadline × estimated reduce time / (estimated map time + estimated reduce time)
Proposed Scheduling Scheme
Map phase
The deadline partition in the map phase also determines two partition ratios, computed as in the ready phase but with two changes: the average map execution time observed so far is used instead of the estimated map execution time, and the remaining job deadline replaces the original job deadline:
Map deadline = remaining deadline × average map time / (average map time + estimated reduce time)
Reduce deadline = remaining deadline × estimated reduce time / (average map time + estimated reduce time)
Reduce phase
The remaining job deadline is assigned entirely to the reduce tasks.
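Assuming the deadline is split in proportion to the two execution-time estimates, as described above, the partition can be sketched as follows (variable names are ours, and this is a sketch of the idea, not the exact BGMRS formulas):

```java
// Split a (remaining) job deadline between the map and reduce stages in
// proportion to their estimated (or averaged) execution times.
// Returns {mapDeadline, reduceDeadline}.
class DeadlinePartition {
    static double[] split(double deadline, double mapTime, double reduceTime) {
        double mapRatio = mapTime / (mapTime + reduceTime);
        return new double[] { deadline * mapRatio, deadline * (1 - mapRatio) };
    }
}
```

For example, a 90-second deadline with a 60-second map estimate and a 30-second reduce estimate splits into a 60-second map deadline and a 30-second reduce deadline; in the map phase the same split is applied to whatever deadline remains.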
Proposed Scheduling Scheme
For the bipartite graph formation, consider an example in which three jobs run concurrently on the cloud system. Taking data location and slot performance into account, we first find the feasible slots of each map (reduce) task and the estimated execution time on each of those slots. Based on the given tasks and their feasible slots, a weighted bipartite graph can be formed: the edges represent the relationships between tasks and their feasible slots, and each edge's cost is set to the estimated execution time.
(Figures: an execution scenario of MapReduce jobs; the corresponding bipartite graph model.)
Proposed Scheduling Scheme
To reflect that reduce tasks have higher priority than map tasks in obtaining feasible slots, the costs of the map edges in the graph are re-labelled: if a slot has one or more reduce edges, the maximum cost among those reduce edges is added to the cost of each map edge of that slot. Finally, given the weighted bipartite graph, we can apply an existing algorithm for the minimum weighted bipartite matching (MWBM) problem to obtain the optimal task schedule.
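To show what the matching step optimizes, here is a toy brute-force solver for the minimum-weight assignment of tasks to distinct feasible slots. The BGMRS would use a polynomial-time MWBM algorithm (e.g. the Hungarian method); this exhaustive version is only for illustration on tiny inputs:

```java
// cost[t][s] is the estimated execution time of task t on slot s;
// Integer.MAX_VALUE marks a non-feasible (missing) edge.
class MinMatching {
    static long solve(int[][] cost) {
        return search(cost, new boolean[cost[0].length], 0, 0L);
    }

    // Try every assignment of the remaining tasks to unused feasible slots
    // and return the minimum total cost (Long.MAX_VALUE if none exists).
    private static long search(int[][] cost, boolean[] used, int task, long total) {
        if (task == cost.length) return total;
        long best = Long.MAX_VALUE;
        for (int slot = 0; slot < used.length; slot++) {
            if (!used[slot] && cost[task][slot] != Integer.MAX_VALUE) {
                used[slot] = true;
                best = Math.min(best, search(cost, used, task + 1, total + cost[task][slot]));
                used[slot] = false;
            }
        }
        return best;
    }
}
```

The re-labelling of map-edge costs described above changes only the entries of this cost matrix; the matching procedure itself is unchanged.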
Performance Evaluation
The simulations are performed using MATLAB. We assume that the cloud computing system has 400 servers, and the number of MapReduce jobs is set from 25 to 50. The slot performance settings are based on Amazon EC2.
The schemes from the related works are compared with our proposed scheme:
Optimal Reduce Placement (ORP): the FIFO and Fair schedulers are used to extend the ORP to handle multiple jobs, called the ORP_FIFO and the ORP_FAIR, respectively.
Approximately Uniform Minimum Degree of parallelism (AUMD)
Simulation metrics:
Normalized total job elapsed time
Deadline-over job ratio
Average excess ratio with respect to the job deadline
Simulation Results
In Fig. (a), the total job elapsed time of the BGMRS is about 24% of that of the other schemes on average. From Fig. (b), the other schemes again have longer total job elapsed times in comparison with our proposed scheme.
(Figure: normalized total job elapsed time; (a) 25 jobs, (b) 50 jobs.)
Simulation Results
Comparing the AUMD with the BGMRS, the BGMRS also significantly improves on the deadline-over job ratio of the AUMD; the improvement ratio is at least 75%.
(Figure: deadline-over job ratio; (a) 25 jobs, (b) 50 jobs.)
Simulation Results
The results show that the BGMRS has the smallest average excess ratio with respect to the job deadline.
(Figure: average excess ratio with respect to the job deadline; (a) 25 jobs, (b) 50 jobs.)
Conclusion
We used bipartite graph modelling to solve the DAMRS problem. Compared to previous MapReduce scheduling schemes, the BGMRS significantly improves both the total job elapsed time and the deadline-over job ratio. In the future, we will improve the computational time of the BGMRS in large-scale cloud computing systems, and we plan to implement the proposed scheme in a real-life cloud system.
Thank you for your attention!