Resource Specification Prediction Model Richard Huang joint work with Henri Casanova and Andrew Chien.

Resource Specification Prediction Model
Richard Huang (ryhuang@cs.ucsd.edu)
Joint work with Henri Casanova and Andrew Chien

Introduction
- Increasing deployment of clusters
  - Decreasing hardware prices
  - More choices of cluster vendors
  - Increasing availability of open-source cluster management tools (e.g., ROCKS)
- Advances in networking technology
  - 10-40 Gbps aggregate bandwidth
  - Optical fibres
- Large-scale distributed environments (LSDEs) can run large-scale, loosely synchronous applications such as scientific workflows
  - More resources mean bigger problems can be solved

Running Applications on LSDEs
- One key challenge in running scientific workflows is resource selection

What’s the problem?
- Several resource selection systems exist (such as VGES)
- How should one write the resource specification that these systems take as input?
- We know of no other work that addresses this problem

Solution depends on…
- Application (DAG) characteristics
- Type of scheduling algorithm employed
- Types of resources available

Assumptions
- Resources are plentiful
  - Therefore we can pick the right-size RC
- Resources are dedicated or support advance reservation, OR the underlying middleware (such as VGES) handles interfacing with batch queue systems
- Bandwidth is reasonably plentiful
  - We don't deal with network contention
- Performance models exist for the applications, so task runtimes are known

Resource Specification Prediction Model
- Empirical model that takes DAG characteristics and an optional utility function as input
  - Heuristic Prediction Model: predicts the best scheduling heuristic to use
  - Size Prediction Model: predicts the optimal resource collection (RC) size

Strategy in Formulating the Prediction Model
1. Determine the relevant DAG characteristics
2. Define what the best RC should be
3. Run a reference scheduling heuristic on an observation set of DAG configurations, varying the relevant DAG characteristics
4. From the observation-set results, derive a model that predicts the best RC size

Relevant DAG Characteristics
- DAG size
- Communication-to-computation ratio (CCR)
- Amount of parallelism
- Regularity among tasks from different DAG levels
- Other possible characteristics:
  - DAG height and average number of tasks per level (subsumed by the characteristics above)
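The slide lists the characteristics but does not say how they are computed. The sketch below covers only the measurable ones: size, height, average tasks per level, and CCR. It assumes a hypothetical input format in which per-task computation costs and per-edge communication costs are given as dicts. CCR is taken to be average edge communication cost divided by average task computation cost, a definition common in the DAG-scheduling literature and assumed here. Parallelism and regularity are left out because their exact definitions are not given.

```python
from collections import defaultdict


def dag_levels(tasks, edges):
    """Level of each task = length of the longest path from an entry task.

    Uses Kahn's topological sort, so a task is processed only after all of
    its parents already have a level.
    """
    parents = defaultdict(list)
    children = defaultdict(list)
    indegree = {t: 0 for t in tasks}
    for p, c in edges:
        parents[c].append(p)
        children[p].append(c)
        indegree[c] += 1
    ready = [t for t, d in indegree.items() if d == 0]
    level = {}
    while ready:
        t = ready.pop()
        level[t] = max((level[p] + 1 for p in parents[t]), default=0)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(level) != len(indegree):
        raise ValueError("task graph contains a cycle")
    return level


def dag_characteristics(comp_cost, comm_cost):
    """comp_cost: task -> computation cost; comm_cost: (parent, child) -> transfer cost.

    Assumed CCR definition: average edge communication cost divided by
    average task computation cost.
    """
    edges = list(comm_cost)
    level = dag_levels(comp_cost, edges)
    height = max(level.values()) + 1
    avg_comp = sum(comp_cost.values()) / len(comp_cost)
    avg_comm = sum(comm_cost.values()) / len(edges) if edges else 0.0
    return {
        "size": len(comp_cost),
        "height": height,
        "avg_tasks_per_level": len(comp_cost) / height,
        "ccr": avg_comm / avg_comp,
    }


# Diamond DAG: A -> {B, C} -> D, with made-up costs.
comp = {"A": 10, "B": 10, "C": 10, "D": 10}
comm = {("A", "B"): 2, ("A", "C"): 2, ("B", "D"): 2, ("C", "D"): 2}
print(dag_characteristics(comp, comm))
```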

Defining the Best RC Size
- Run an application on different numbers of hosts
- The best RC size is the point beyond which adding hosts no longer improves performance (the knee value)
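The knee can be sketched as a search over measured turnaround times. This is a minimal sketch under stated assumptions: measurements arrive as a dict from host count to turnaround time. The optional `threshold` parameter is a hypothetical way to express a "knee threshold". A size counts as the knee once no larger size beats it by more than that fraction. The exact threshold rule used in the study is not given on the slides.

```python
def knee_size(turnaround, threshold=0.0):
    """Smallest host count after which adding hosts stops paying off.

    turnaround: dict mapping host count -> measured turnaround time.
    threshold:  assumed tolerance (0.02 = 2%) between this size's turnaround
                and the best turnaround at any larger size.
    """
    if not turnaround:
        raise ValueError("need at least one measurement")
    sizes = sorted(turnaround)
    for i, n in enumerate(sizes):
        best_at_or_above = min(turnaround[m] for m in sizes[i:])
        # Knee: no larger RC beats this one by more than the threshold.
        if turnaround[n] - best_at_or_above <= threshold * turnaround[n]:
            return n
    return sizes[-1]  # only reachable with a negative threshold


# Made-up measurements: turnaround flattens out around 8-16 hosts.
times = {1: 100.0, 2: 55.0, 4: 30.0, 8: 20.0, 16: 19.9, 32: 20.5}
print(knee_size(times))        # 16
print(knee_size(times, 0.01))  # 8
```

With a zero threshold the function returns the smallest size that reaches the best turnaround. A small positive threshold trades a little performance for fewer hosts.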

Observation Set
DAG characteristic   Values
DAG size             100, 500, 1000, 5000, 10000
CCR                  0.01, 0.1, 0.3, 0.5, 0.8, 1.0
Parallelism          0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
Regularity           0.01, 0.1, 0.3, 0.5, 0.8, 1.0
- Instantiate 10 random DAGs for each DAG configuration
- Vary the number of tasks per level randomly while maintaining parallelism, regularity, and size
- The idea is to run scheduling heuristics on each DAG configuration and look for trends
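The observation set is the Cartesian product of the four value lists above: 5 × 6 × 7 × 6 = 1260 configurations. With 10 random DAGs per configuration, that gives 12,600 DAG instances. The sketch below enumerates them and gives each instance its own seed. The seeding scheme and the downstream DAG generator are hypothetical and not shown on the slides.

```python
import random
from itertools import product

# Parameter grid from the observation-set table.
DAG_SIZES = [100, 500, 1000, 5000, 10000]
CCRS = [0.01, 0.1, 0.3, 0.5, 0.8, 1.0]
PARALLELISM = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
REGULARITY = [0.01, 0.1, 0.3, 0.5, 0.8, 1.0]
DAGS_PER_CONFIG = 10


def observation_jobs(base_seed=0):
    """Yield one (size, ccr, parallelism, regularity, seed) job per random DAG.

    The per-instance seed lets a (hypothetical) DAG generator reproduce each
    instance; the generator itself is not part of this sketch.
    """
    rng = random.Random(base_seed)
    for config in product(DAG_SIZES, CCRS, PARALLELISM, REGULARITY):
        for _ in range(DAGS_PER_CONFIG):
            yield (*config, rng.randrange(2**32))


jobs = list(observation_jobs())
configs = {job[:4] for job in jobs}
print(len(configs), len(jobs))  # 1260 12600
```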

Size Prediction Model Formulation
- For better tractability, first consider only parallelism ($\alpha$) and regularity ($\beta$) for each (size, CCR) pair
- Knee size approximately doubles for every 0.1 increase in $\alpha$
- Knee size decreases as regularity increases
- Based on tables similar to the one below, hypothesize that the knee size can be modeled as $2^{a\alpha + b\beta + c}$

Sample observation-set knee values (rows: parallelism; columns: regularity; blank cells shown as -)
Parallelism \ Regularity   0.01   0.1   0.3   0.5   0.8   1.0
0.3                          34    32    22    18    14     -
0.4                          52    36    28    24    22    20
0.5                          80    62    58    50    56    42
0.6                         136   140   128   112    94   128
0.7                         328   312   280   248   212   196
0.8                         464   456   448     -   432     -
0.9                         496     -   440     -   432   392
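The hypothesized model can be restated in log form, and the two observed trends then indicate what to expect of the coefficients. This is a back-of-the-envelope derivation from the trends stated above, not fitted values from the study. "Doubling per 0.1 of $\alpha$" suggests $a \approx 10$, and "decreasing in $\beta$" implies $b < 0$.

```latex
\begin{align*}
\mathrm{knee}(\alpha,\beta) &\approx 2^{a\alpha + b\beta + c}
  \iff \log_2 \mathrm{knee}(\alpha,\beta) \approx a\alpha + b\beta + c \\
\frac{\mathrm{knee}(\alpha + 0.1,\beta)}{\mathrm{knee}(\alpha,\beta)} &= 2^{0.1a} \approx 2
  \;\Rightarrow\; a \approx 10 \\
\frac{\partial\,\mathrm{knee}}{\partial\beta} &= (\ln 2)\, b\, \mathrm{knee}(\alpha,\beta) < 0
  \;\Rightarrow\; b < 0
\end{align*}
```

The sample table follows the doubling trend only roughly. For example, at regularity 0.01 the knee goes 52, 80, 136, 328 as parallelism rises from 0.4 to 0.7. This is why $a$, $b$, and $c$ are fitted per (size, CCR) pair rather than fixed by hand. The log form is what makes the fit a linear one.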

Size Prediction Model Formulation
- Solve for a, b, and c for each (size, CCR) pair
- Use linear regression to fit a plane to the logarithm of the knee value
- Interpolate between the observed DAG sizes and CCR values
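A minimal sketch of the planar fit follows, in pure Python so that it needs no libraries. It solves ordinary least squares for $\log_2 \mathrm{knee} = a\alpha + b\beta + c$ through the 3×3 normal equations. The slides do not say how interpolation is done, so `interpolate_coeffs` assumes linear interpolation of each coefficient between the two nearest observed DAG sizes. Interpolating along the CCR axis would work the same way. All function names are hypothetical.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        x[i] = (a[i][3] - sum(a[i][k] * x[k] for k in range(i + 1, 3))) / a[i][i]
    return tuple(x)


def fit_plane(samples):
    """Least-squares fit of log2(knee) = a*alpha + b*beta + c.

    samples: list of (alpha, beta, knee). Returns (a, b, c).
    """
    xtx = [[0.0] * 3 for _ in range(3)]  # X^T X, with rows of X = [alpha, beta, 1]
    xty = [0.0] * 3                      # X^T y, with y = log2(knee)
    for alpha, beta, knee in samples:
        row = (alpha, beta, 1.0)
        y = math.log2(knee)
        for i in range(3):
            xty[i] += row[i] * y
            for j in range(3):
                xtx[i][j] += row[i] * row[j]
    return solve3(xtx, xty)


def predict_knee(coeffs, alpha, beta):
    a, b, c = coeffs
    return 2 ** (a * alpha + b * beta + c)


def interpolate_coeffs(size, fitted):
    """fitted: DAG size -> (a, b, c) for one CCR value.

    Assumption: coefficients vary linearly between neighboring observed sizes,
    and are clamped outside the observed range.
    """
    sizes = sorted(fitted)
    if size <= sizes[0]:
        return fitted[sizes[0]]
    if size >= sizes[-1]:
        return fitted[sizes[-1]]
    for lo, hi in zip(sizes, sizes[1:]):
        if lo <= size <= hi:
            t = (size - lo) / (hi - lo)
            return tuple(x + t * (y - x) for x, y in zip(fitted[lo], fitted[hi]))


# Synthetic knee values generated from known coefficients, to check the fit.
true_coeffs = (10.0, -1.5, 2.0)
samples = [(al, be, predict_knee(true_coeffs, al, be))
           for al in (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
           for be in (0.01, 0.1, 0.3, 0.5, 0.8, 1.0)]
print(fit_plane(samples))  # approximately (10.0, -1.5, 2.0)
```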

Model Validation
- Two workloads:
  - Randomly generated DAGs (covering a range of DAG characteristics)
  - Montage DAGs
- Performance metric: application turnaround time (scheduling time + makespan)
- Cost: derived from the Amazon Elastic Compute Cloud (EC2) price of $0.10 per hour for a 1.7 GHz processor
- Use a brute-force method to calculate the optimal RC size
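The validation metric and cost can be sketched directly from the slide. Turnaround is scheduling time plus makespan, and cost is priced at $0.10 per host-hour. The sketch makes two assumptions. First, billing is pro rata, with no rounding up to whole hours. Second, `simulate` is a hypothetical stand-in for scheduling the DAG on n hosts. The brute-force search tries every size and keeps the one with the lowest turnaround, breaking ties toward fewer hosts.

```python
PRICE_PER_HOST_HOUR = 0.10  # USD per host-hour, the EC2 price quoted on the slide


def turnaround(scheduling_time_s, makespan_s):
    """Application turnaround time = scheduling time + makespan (seconds)."""
    return scheduling_time_s + makespan_s


def cost(hosts, turnaround_s, price=PRICE_PER_HOST_HOUR):
    """Assumed pro-rata billing: each host is paid for the whole turnaround."""
    return hosts * (turnaround_s / 3600.0) * price


def brute_force_optimal(simulate, max_hosts):
    """Try every RC size 1..max_hosts; return (size, turnaround) with the lowest turnaround.

    simulate(n) -> (scheduling_time_s, makespan_s) is hypothetical. Only a
    strictly better turnaround replaces the current best, so ties go to the
    smaller RC.
    """
    best_n, best_t = None, float("inf")
    for n in range(1, max_hosts + 1):
        t = turnaround(*simulate(n))
        if t < best_t:
            best_n, best_t = n, t
    return best_n, best_t


def toy_simulation(n):
    # Hypothetical model: scheduling time grows with n, makespan shrinks as 1/n.
    return 2.0 * n, 36000.0 / n


n, t = brute_force_optimal(toy_simulation, 200)
print(n, round(t, 1), round(cost(n, t), 4))
```

Picking the first size with the minimum turnaround is equivalent to a knee with zero tolerance. The scheduling-time term is what makes very large RCs worse rather than merely more expensive.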

Randomly Generated DAGs
DAG size   Avg. predicted size diff.   Avg. performance degradation   Relative cost
Observation-set DAG sizes
100        9.59%                       0.18%                          -6.75%
500        11.49%                      0.22%                          -5.29%
1000       9.62%                       0.32%                          -4.32%
5000       13.27%                      0.77%                          -4.72%
Midpoint DAG sizes
300        13.41%                      0.34%                          -11.31%
750        11.85%                      0.29%                          -5.59%
3000       14.97%                      1.08%                          -9.98%
- Peak performance degradation was 15%, but average degradation was below 1% for every DAG size except 3000 (1.08%)
- The prediction model tended to predict smaller RC sizes than the optimal, which reduces cost
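The three columns above are averages of per-DAG relative differences between the predicted RC and the brute-force optimal RC. The slides do not give the exact aggregation. This sketch assumes plain arithmetic means, takes the size difference in absolute value, and keeps performance and cost signed, so a negative relative cost means cheaper than optimal. The record field names are hypothetical.

```python
from statistics import mean


def rel_diff(value, reference):
    """Signed relative difference, e.g. -0.05 means 5% below the reference."""
    return (value - reference) / reference


def summarize(records):
    """Aggregate per-DAG results into the three averages used in the table.

    Each record holds predicted and optimal RC size, turnaround time, and cost
    for one DAG (hypothetical field names).
    """
    return {
        "avg_size_diff": mean(abs(rel_diff(r["pred_size"], r["opt_size"])) for r in records),
        "avg_perf_degradation": mean(rel_diff(r["pred_time"], r["opt_time"]) for r in records),
        "relative_cost": mean(rel_diff(r["pred_cost"], r["opt_cost"]) for r in records),
    }


# Two made-up DAGs: one under-provisioned, one over-provisioned.
records = [
    {"pred_size": 90, "opt_size": 100, "pred_time": 101.0, "opt_time": 100.0,
     "pred_cost": 9.0, "opt_cost": 10.0},
    {"pred_size": 220, "opt_size": 200, "pred_time": 200.0, "opt_time": 200.0,
     "pred_cost": 22.0, "opt_cost": 20.0},
]
print(summarize(records))
```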

Comparison with Using DAG Width
- Sizing the RC by DAG width, to maximize parallelism, costs much more
- For smaller DAGs, performance degradation is similar to our model's
- As DAG size increases, performance degrades rapidly because the larger RC sizes make scheduling times longer
DAG size   Avg. performance degradation   Relative cost
Observation-set DAG sizes
100        0.50%                          144.8%
500        0.20%                          425.7%
1000       0.45%                          562.9%
5000       22.66%                         998.1%
Midpoint DAG sizes
300        0.30%                          219.2%
750        0.26%                          503.0%
3000       6.80%                          759.8%

Performance/Cost Tradeoff
- The user can specify an optional utility function
  - For example, accept 1% of performance degradation for every 10% reduction in cost
- In this example, a knee threshold of 2% provides the best utility
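The example utility can be sketched as a linear trade-off: each 10% of cost saved offsets 1% of performance degradation. The threshold with the highest utility is then chosen. The functional form and the names below are assumptions, since the slides state only the exchange rate. The numbers in the usage are made up purely to exercise the functions and are not results from the study.

```python
def utility(perf_degradation_pct, relative_cost_pct, pct_per_10pct_cost=1.0):
    """Hypothetical linear utility.

    Each 10% of cost saved is worth `pct_per_10pct_cost` percent of
    performance degradation. relative_cost_pct is negative when the RC is
    cheaper than the reference.
    """
    cost_savings_pct = -relative_cost_pct
    return cost_savings_pct / 10.0 * pct_per_10pct_cost - perf_degradation_pct


def best_threshold(results):
    """results: knee threshold -> (perf. degradation %, relative cost %)."""
    return max(results, key=lambda t: utility(*results[t]))


# Made-up numbers, purely to exercise the functions.
example = {0.5: (0.1, 5.0), 2.0: (0.5, -20.0), 10.0: (6.0, -40.0)}
print(best_threshold(example))  # 2.0
```

Higher thresholds save more but eventually cost too much performance. With this exchange rate, the middle threshold wins in the made-up data.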

Montage
- Different knee thresholds did not degrade performance much (at most 4.67%)
- Higher thresholds give better savings (fewer hosts)
            1629-task DAG             4469-task DAG
Threshold   Perf. degr.  Rel. cost    Perf. degr.  Rel. cost
0.1%        0.08%         11.2%       0.00%          0.0%
0.5%        0.04%          7.5%       0.00%         -2.4%
1.0%        0.01%          0.6%       0.00%         -4.0%
2.0%        0.89%        -13.5%       1.35%        -21.2%
5.0%        0.75%        -30.8%       1.81%        -30.4%
10.0%       4.18%        -48.2%       4.67%        -51.0%

Sensitivity Analysis
- The previous results all assume homogeneous clock rates and the reference scheduling heuristic
- We should examine how the model reacts to:
  - Different levels of clock-rate heterogeneity
  - Different scheduling heuristics

Impact of Clock Rate Heterogeneity

Impact of Scheduling Heuristics

Summary
- We devised an empirical model to predict a good RC size
- We showed that our model yields good application performance, often at lower cost than the optimal RC size
- Our model maintains good performance over a range of DAG configurations, levels of resource heterogeneity, and scheduling heuristics

Future Work
- Heuristic Prediction Model: predict which heuristic to use given an input DAG and an optional utility function
- How do we degrade gracefully when the resource selection system cannot return the desired resource collection?
- Translate the output of our model into input for different resource selection systems
