Published by Adelia Alexander. Modified over 9 years ago.
1
Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs
Oct 7th, 2013
Database Lab.
Wonseok Choi
2
The day before the presentation
3
If I can't present this time, it's over!!!! Getting the credit will be impossible!!!! + the graduation exam!!
4
Can I finish preparing the presentation in time without dying?
5
Contents
1. Introduction
2. Profiler
3. What-if engine
4. Cost-based optimizer
5. Experimental evaluation
6. Conclusion
6
Introduction
MapReduce has emerged as a viable competitor to database systems in big data analytics.
Three components: Profiler, What-if Engine, Cost-based Optimizer
  Profiler: collects detailed statistical information from unmodified MapReduce programs.
  What-if Engine: performs fine-grained cost estimation.
  Cost-based Optimizer: optimizes configuration parameter settings.
7
Introduction
MapReduce job J
J = <p, d, r, c>
  p: MapReduce program
  d: input data, processed through the two functions map(k1, v1) and reduce(k2, list(v2))
  r: Cluster resources
  c: Configuration parameter settings
8
Introduction
Configuration parameter settings include:
  The number of map tasks
  The number of reduce tasks
  The amount of memory
  The settings for multiphase external sorting
  Whether the output data from the map (reduce) tasks should be compressed before being written to disk
  Whether a program-specified Combiner function should be used to preaggregate map outputs before their transfer to reduce tasks
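To make the list above concrete, here is a minimal sketch of one configuration c written as a Python dictionary. The property names follow Hadoop 1.x conventions; all values are made-up examples rather than recommendations, and `use_combiner` and `describe` are hypothetical names introduced only for illustration (a Combiner is normally set in the job code, not through a property).

```python
# Illustrative configuration c for one job.
# Property names are Hadoop 1.x style; every value is an invented example.
config = {
    "mapred.map.tasks": 64,                # hint for the number of map tasks
    "mapred.reduce.tasks": 16,             # number of reduce tasks
    "mapred.child.java.opts": "-Xmx512m",  # memory (JVM heap) available to each task
    "io.sort.mb": 100,                     # map-side sort buffer size in MB
    "io.sort.factor": 10,                  # streams merged at once during external sorting
    "mapred.compress.map.output": True,    # compress map output before writing to disk
    "mapred.output.compress": False,       # compress the final (reduce) output
    "use_combiner": True,                  # hypothetical key, not a Hadoop property
}


def describe(c):
    """Return one 'name=value' string per setting, sorted by name."""
    return [f"{name}={value}" for name, value in sorted(c.items())]


for line in describe(config):
    print(line)
```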
9
Introduction
10
Introduction
11
Introduction
Cost-based Optimization to Select Configuration Parameter Settings Automatically
perf = F(p, d, r, c)
  perf is some performance metric of interest for jobs.
  Optimizing the performance of program p for given input data d and cluster resources r requires finding configuration parameter settings c that give near-optimal values of perf.
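The relationship perf = F(p, d, r, c) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `toy_perf` is an invented formula standing in for F (it is not the cost model from the talk), perf is taken to be running time in seconds, and the configuration is reduced to two settings, the number of reducers and the sort buffer size.

```python
from itertools import product


def toy_perf(program, data_gb, nodes, config):
    """Hypothetical stand-in for F(p, d, r, c): estimated running time in seconds.

    The program argument is unused in this toy formula; it is kept only so the
    signature mirrors F(p, d, r, c).
    """
    reducers, sort_mb = config
    # Map side: more data is slower, more nodes and a larger sort buffer are faster.
    map_time = data_gb * 60.0 / nodes * (1.0 + 50.0 / sort_mb)
    # Reduce side: parallelism is capped at two reduce slots per node,
    # and every reducer adds a fixed startup overhead.
    reduce_time = data_gb * 40.0 / min(reducers, 2 * nodes) + 2.0 * reducers
    return map_time + reduce_time


def best_config(program, data_gb, nodes, candidates):
    """Return the candidate configuration with the lowest estimated running time."""
    return min(candidates, key=lambda c: toy_perf(program, data_gb, nodes, c))


# Candidate settings: (number of reducers, sort buffer in MB).
candidates = list(product([1, 4, 16, 64], [50, 100, 200]))
best = best_config("wordcount", 10, 8, candidates)
print(best, toy_perf("wordcount", 10, 8, best))
```

With p, d, and r held fixed, the only free argument of F is c, which is why the optimization reduces to a search over configuration settings.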
12
Introduction
MapReduce program optimization poses new challenges compared to conventional database query optimization:
  Black-box map and reduce functions
  Lack of schema and statistics about the input data
  Differences in plan spaces
Components: Profiler, What-if Engine, Cost-based Optimizer
13
Profiler
Phases of Map Task Execution: Read, Map, Collect, Spill, Merge
Phases of Reduce Task Execution: Shuffle, Merge, Reduce, Write
14
Profiler
Job Profile
A MapReduce job profile is a vector in which each field captures some unique aspect of dataflow or cost during job execution, at the task level or at the phase level within tasks.
The fields fall into four categories:
  Dataflow fields
  Cost fields
  Dataflow Statistics fields
  Cost Statistics fields
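One way to picture such a profile is as a record with four groups of fields. The sketch below is only an illustration: the class `JobProfile`, its methods, and the individual field names (`map_input_bytes`, `map_size_selectivity`, and so on) are hypothetical and do not reproduce the actual field list used by the Profiler. It also shows, under those assumptions, how statistics fields could be derived from measured dataflow and cost fields.

```python
from dataclasses import dataclass, field


@dataclass
class JobProfile:
    """Hypothetical container for the four categories of profile fields."""
    dataflow: dict = field(default_factory=dict)        # amounts of data through phases
    cost: dict = field(default_factory=dict)            # time spent in phases
    dataflow_stats: dict = field(default_factory=dict)  # ratios such as selectivities
    cost_stats: dict = field(default_factory=dict)      # per-unit costs

    def derive_statistics(self):
        """Compute example statistics fields from the measured fields."""
        df, c = self.dataflow, self.cost
        self.dataflow_stats["map_size_selectivity"] = (
            df["map_output_bytes"] / df["map_input_bytes"]
        )
        self.cost_stats["map_ms_per_input_byte"] = (
            c["map_phase_ms"] / df["map_input_bytes"]
        )

    def as_vector(self):
        """Flatten the profile into (category, name, value) triples in a stable order."""
        groups = (
            ("dataflow", self.dataflow),
            ("cost", self.cost),
            ("dataflow_stats", self.dataflow_stats),
            ("cost_stats", self.cost_stats),
        )
        return [(g, name, values[name]) for g, values in groups for name in sorted(values)]


profile = JobProfile(
    dataflow={"map_input_bytes": 1000, "map_output_bytes": 250},
    cost={"map_phase_ms": 500},
)
profile.derive_statistics()
for entry in profile.as_vector():
    print(entry)
```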
15
Profiler
Using Profiles to Analyze Job Behavior
16
Profiler
Generating Profiles via Measurement
Job profiles are generated in two distinct ways: by the Profiler, through measurement, and by the What-if Engine, through estimation.
Measurement involves:
  Monitoring through dynamic instrumentation
  From raw monitoring data to profile fields
  Task-level sampling to generate approximate profiles
17
What-if Engine
A what-if question has the following form:
  Given the profile of a job j = <p, d1, r1, c1> that runs a MapReduce program p over input data d1 and cluster resources r1 using configuration c1, what will the performance of program p be if p is run over input data d2 and cluster resources r2 using configuration c2? That is, how will job j' = <p, d2, r2, c2> perform?
The What-if Engine executes the following two steps to answer a what-if question:
  1. Estimating a virtual job profile for the hypothetical job j'.
  2. Using the virtual profile to simulate how j' will execute.
We will discuss these steps in turn.
18
What-if Engine
Estimating the Virtual Profile
  Estimating Dataflow and Cost fields
  Estimating Dataflow Statistics fields
  Estimating Cost Statistics fields
19
What-if Engine
Estimating Dataflow and Cost fields
  A detailed set of analytical (white-box) models estimates the Dataflow and Cost fields in the virtual job profile for j'.
Estimating Dataflow Statistics fields
  Dataflow proportionality assumption
Estimating Cost Statistics fields
  Cluster node homogeneity assumption
Simulating the Job Execution
  Task Scheduler Simulator
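The estimation step can be sketched roughly as follows. The sketch reads the two assumptions in the usual way: under dataflow proportionality, the Dataflow Statistics fields (such as selectivities) of the measured profile are reused unchanged for j', and under cluster node homogeneity, the Cost Statistics fields (per-unit costs) are reused unchanged. The function `estimate_virtual_profile`, the field names, and the two one-line formulas are hypothetical placeholders; the actual white-box models are far more detailed.

```python
def estimate_virtual_profile(measured, new_input_bytes):
    """Build a toy virtual profile for a hypothetical job j'.

    measured: dict with 'dataflow_stats' and 'cost_stats' sub-dicts taken from
    the profile of the job that actually ran.
    """
    # Dataflow proportionality assumption: selectivities carry over unchanged.
    dataflow_stats = dict(measured["dataflow_stats"])
    # Cluster node homogeneity assumption: per-unit costs carry over unchanged.
    cost_stats = dict(measured["cost_stats"])

    # Dataflow fields: computed from the new input size and the selectivities.
    map_output_bytes = new_input_bytes * dataflow_stats["map_size_selectivity"]
    dataflow = {
        "map_input_bytes": new_input_bytes,
        "map_output_bytes": map_output_bytes,
    }

    # Cost fields: computed from the dataflow fields and the per-unit costs.
    cost = {
        "read_ms": new_input_bytes * cost_stats["read_ms_per_byte"],
        "spill_ms": map_output_bytes * cost_stats["write_ms_per_byte"],
    }
    return {
        "dataflow": dataflow,
        "cost": cost,
        "dataflow_stats": dataflow_stats,
        "cost_stats": cost_stats,
    }


measured = {
    "dataflow_stats": {"map_size_selectivity": 0.5},
    "cost_stats": {"read_ms_per_byte": 0.25, "write_ms_per_byte": 0.5},
}
virtual = estimate_virtual_profile(measured, new_input_bytes=2000)
print(virtual["dataflow"], virtual["cost"])
```

The resulting virtual profile is what a simulator of task scheduling would then consume to estimate how j' executes.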
20
Cost-based Optimizer (CBO)
MapReduce program optimization can be defined as follows:
  Given a MapReduce program p to be run on input data d and cluster resources r, find the setting c_opt of configuration parameters that minimizes the cost model F, represented by the What-if Engine, over the full space S of configuration parameter settings.
The CBO addresses this problem by making what-if calls with settings c of the configuration parameters selected through an enumeration and search over S.
Once a job profile to input to the What-if Engine is available, the CBO uses a two-step process, discussed next.
21
Cost-based Optimizer (CBO)
Subspace Enumeration
A straightforward approach the CBO can take is to apply enumeration and search techniques to the full space of parameter settings S.
More efficient search techniques can be developed if the individual parameters in c can be grouped into clusters.
Equation 2 states that the globally optimal setting c_opt can be found using a divide-and-conquer approach by:
  breaking the higher-dimensional space S into the lower-dimensional subspaces S(i)
  considering an independent optimization problem in each smaller subspace
  composing the optimal parameter settings found per subspace to give the setting c_opt
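The divide-and-conquer idea can be sketched with a toy example. The assumptions are stated plainly: the parameters are split into a map-side cluster and a reduce-side cluster, the cost formulas `map_cost` and `reduce_cost` are invented, and the total cost is taken to be their sum, so that the two clusters are independent. None of the names or numbers come from the talk.

```python
from itertools import product

# Two hypothetical parameter clusters with a few candidate values each.
MAP_SPACE = {"io_sort_mb": [50, 100, 200], "compress_map_output": [False, True]}
REDUCE_SPACE = {"reducers": [4, 16, 64]}


def grid(space):
    """Enumerate every combination of values in a parameter space as a dict."""
    names = sorted(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


def map_cost(c):
    """Invented map-side cost; depends only on map-side parameters."""
    return 7500.0 / c["io_sort_mb"] + (12.0 if c["compress_map_output"] else 20.0)


def reduce_cost(c):
    """Invented reduce-side cost; depends only on reduce-side parameters."""
    return 400.0 / c["reducers"] + 2.0 * c["reducers"]


def total_cost(c):
    return map_cost(c) + reduce_cost(c)


def optimize(space, cost):
    """Exhaustively search one space for the setting with the lowest cost."""
    return min(grid(space), key=cost)


def clustered_optimum():
    """Optimize each subspace independently, then compose the results."""
    best = {}
    best.update(optimize(MAP_SPACE, map_cost))
    best.update(optimize(REDUCE_SPACE, reduce_cost))
    return best


def full_optimum():
    """Search the full space directly, for comparison."""
    return optimize({**MAP_SPACE, **REDUCE_SPACE}, total_cost)


print(clustered_optimum())
print(full_optimum())
```

In this toy setup both approaches return the same setting, but the clustered search evaluates 6 + 3 = 9 settings while the full search evaluates 6 × 3 = 18. The gap widens quickly as more parameters are added.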
22
Cost-based Optimizer (CBO)
Search Strategy within a Subspace
The second step searches within each enumerated subspace to find the optimal configuration in that subspace.
  Gridding (Equispaced or Random)
  Recursive Random Search (RRS)
    RRS provides probabilistic guarantees on how close the setting it finds is to the optimal setting.
    RRS is fairly robust to deviations of estimated costs from actual performance.
    RRS scales to a large number of dimensions.
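The general flavor of a recursive random search can be sketched as below. This is a simplified illustration written for this transcript, not the RRS algorithm as published and not the optimizer's implementation: it samples the space at random, then repeatedly shrinks the sampling box around the best point found, and restarts a few times. It carries none of the probabilistic guarantees mentioned above, and the function name, parameters, and default values are all assumptions.

```python
import random


def recursive_random_search(cost, bounds, samples=20, shrink=0.5,
                            rounds=6, restarts=3, seed=0):
    """Simplified random search with a shrinking sampling box.

    cost: function taking a list of floats and returning a number to minimize.
    bounds: list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)

    def sample(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    best_x, best_cost = None, float("inf")
    for _ in range(restarts):
        box = list(bounds)
        # Exploration: random samples over the whole space.
        center = min((sample(box) for _ in range(samples)), key=cost)
        for _ in range(rounds):
            # Exploitation: shrink the box around the current best point,
            # clipped so that it never leaves the original bounds.
            box = [
                (max(glo, x - shrink * (hi - lo) / 2.0),
                 min(ghi, x + shrink * (hi - lo) / 2.0))
                for x, (lo, hi), (glo, ghi) in zip(center, box, bounds)
            ]
            candidate = min((sample(box) for _ in range(samples)), key=cost)
            if cost(candidate) < cost(center):
                center = candidate
        if cost(center) < best_cost:
            best_x, best_cost = center, cost(center)
    return best_x, best_cost


def bowl(x):
    """Toy cost surface with its minimum at (3, -1)."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


bounds = [(-10.0, 10.0), (-10.0, 10.0)]
point, value = recursive_random_search(bowl, bounds)
print(point, value)
```

Equispaced gridding, by contrast, would evaluate the cost at fixed intervals along every dimension, so its number of evaluations grows exponentially with the number of parameters.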
23
Cost-based Optimizer (CBO)
There are two choices for subspace enumeration: Full or Clustered, which deal respectively with the full space or with smaller subspaces for map and reduce tasks.
There are three choices for search within a subspace: Gridding (Equispaced or Random) and RRS.
24
Experimental Evaluation
25
Experimental Evaluation
26
Experimental Evaluation
27
Experimental Evaluation
28
Experimental Evaluation
29
Experimental Evaluation
30
Discussion and Future work
Introduced a Cost-based Optimizer for simple to arbitrarily complex MapReduce programs.
Proposed a lightweight Profiler to collect detailed statistical information from unmodified MapReduce programs.
Proposed a What-if Engine for the fine-grained cost estimation needed by the Cost-based Optimizer.
Several new research challenges arise when we consider the full space of optimization opportunities provided by these higher-level systems.
31
Q & A
32
Nice! I'd say that went pretty wel…