Workload Modeling and its Effect on Performance Evaluation


1 Workload Modeling and its Effect on Performance Evaluation
Dror Feitelson, Hebrew University. Thanks to participants and program committee; thanks to Monien; abuse hospitality – talk about agenda

2 Performance Evaluation
In system design: selection of algorithms, setting parameter values. In procurement decisions: value for money, meeting usage goals. For capacity planning. An important and basic activity.

3 The Good Old Days… The skies were blue
The simulation results were conclusive: our scheme was better than theirs. Focus on system design; widely different designs lead to conclusive results. Feitelson & Jette, JSSPP 1997

4 Their scheme was better than ours!
But in their papers, their scheme was better than ours! The literature is full of contradictory results.

5 How could they be so wrong?
This leads to the question of what causes the contradictions.

6 Performance evaluation depends on:
The system’s design (What we teach in algorithms and data structures) Its implementation (What we teach in programming courses) The workload to which it is subjected The metric used in the evaluation Interactions between these factors Next: our focus is the workloads.

7 Performance evaluation depends on:
The system’s design (What we teach in algorithms and data structures) Its implementation (What we teach in programming courses) The workload to which it is subjected The metric used in the evaluation Interactions between these factors

8 Outline for Today Three examples of how workloads affect performance evaluation Workload modeling Getting data Fitting, correlations, stationarity… Heavy tails, self similarity… Research agenda In the context of parallel job scheduling Job scheduling, not task scheduling

9 Example #1 Gang Scheduling and Job Size Distribution

10 Gang What?!? Time slicing parallel jobs with coordinated context switching Ousterhout matrix Ousterhout, ICDCS 1982

11 Gang What?!? Time slicing parallel jobs with coordinated context switching Ousterhout matrix Optimization: Alternative scheduling Ousterhout, ICDCS 1982

12 Packing Jobs Use a buddy system for allocating processors
Feitelson & Rudolph, Computer 1990

13 Packing Jobs Use a buddy system for allocating processors
Start with full system in one block

14 Packing Jobs Use a buddy system for allocating processors
To allocate repeatedly partition in two to get desired size

15 Packing Jobs Use a buddy system for allocating processors

16 Packing Jobs Use a buddy system for allocating processors
Or use existing partition
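
The buddy allocation described in the last few slides can be illustrated with a minimal sketch (our own code, not from the talk): the whole machine starts as one free block, and each request is satisfied by repeatedly splitting a block in two until it matches the (rounded-up) request.

```python
class BuddyAllocator:
    # Minimal buddy allocator for processor partitions: block sizes are
    # powers of two, and splitting always produces two equal "buddies".
    def __init__(self, total_procs):
        self.free = {total_procs: [0]}  # block size -> list of start offsets

    def alloc(self, size):
        req = 1
        while req < size:   # round the request up to a power of two
            req *= 2        # (this rounding is the internal fragmentation)
        fits = sorted(s for s, blocks in self.free.items() if s >= req and blocks)
        if not fits:
            return None     # no free block large enough
        s = fits[0]
        start = self.free[s].pop(0)
        while s > req:      # split in two until the block matches the request
            s //= 2
            self.free.setdefault(s, []).append(start + s)  # buddy stays free
        return (start, req)

    def release(self, start, size):
        # simplified: freed blocks are not merged back with their buddies
        self.free.setdefault(size, []).append(start)
```

For example, on a 16-node machine a request for 3 processors is rounded up to a 4-node partition; the rounding loss is the internal fragmentation the next slide asks about.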

17 The Question: The buddy system leads to internal fragmentation
But it also improves the chances of alternative scheduling, because processors are allocated in predefined groups Which effect dominates the other?

18 The Answer (part 1): Feitelson & Rudolph, JPDC 1996
The answer is a function of the workload, but it is not a full answer because the workload is unknown. Dashed lines: provable bounds. Feitelson & Rudolph, JPDC 1996

19 The Answer (part 2): Note logarithmic Y axis

20 The Answer (part 2):

21 The Answer (part 2):

22 The Answer (part 2): Many small jobs Many sequential jobs
Many power of two jobs Practically no jobs use full machine Conclusion: buddy system should work well

23 Verification Using Feitelson workload Feitelson, JSSPP 1996

24 Parallel Job Scheduling
Example #2 Parallel Job Scheduling and Job Scaling

25 Variable Partitioning
Each job gets a dedicated partition for the duration of its execution. Resembles 2D bin packing: packing large jobs first should lead to better performance (first-fit decreasing is a near-optimal packing heuristic). But what about the correlation of size and runtime?

26 Scaling Models Constant work Constant time Memory bound
Constant work – parallelism for speedup (Amdahl's Law); large first => SJF
Constant time – size and runtime are uncorrelated
Memory bound – large first => LJF; full-size jobs lead to blockout
The question is which model applies within the context of a single machine
Worley, SIAM JSSC 1990

27 “Scan” Algorithm Keep jobs in separate queues according to size (sizes are powers of 2) Serve the queues Round Robin, scheduling all jobs from each queue (they pack perfectly) Assuming constant work model, large jobs only block the machine for a short time But the memory bound model would lead to excessive queueing of small jobs Important point: schedule order determined by size Krueger et al., IEEE TPDS 1994

28 The Data

29 The Data

30 The Data

31 The Data Data: SDSC Paragon, 1995/6

32 The Data Data: SDSC Paragon, 1995/6
Partitions with equal numbers of jobs; many more small jobs. Data: SDSC Paragon, 1995/6

33 The Data Data: SDSC Paragon, 1995/6
Similar range, different shape; 80th percentile moves from <1m to several h. Data: SDSC Paragon, 1995/6

34 Conclusion Parallelism used for better results, not for faster results
Constant work model is unrealistic Memory bound model is reasonable Scan algorithm will probably not perform well in practice

35 User Runtime Estimation
Example #3 Backfilling and User Runtime Estimation

36 Backfilling Variable partitioning can suffer from external fragmentation Backfilling optimization: move jobs forward to fill in holes in the schedule Requires knowledge of expected job runtimes

37 Variants EASY backfilling Make reservation for first queued job
Conservative backfilling Make reservation for all queued jobs
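
The two variants differ only in how many reservations a backfilled job must respect. A hedged sketch of the EASY admission test (our formulation; the parameter names are illustrative, not from the talk):

```python
def can_backfill(job, now, free_now, reservation_time, free_at_reservation):
    # EASY rule sketch: a queued job may jump ahead of the head job if it
    # fits in the processors free right now AND does not delay the head
    # job's reservation -- either it finishes before the reservation, or
    # it uses only processors that remain free at the reservation time.
    size, estimated_runtime = job
    if size > free_now:
        return False
    return (now + estimated_runtime <= reservation_time
            or size <= free_at_reservation)
```

Conservative backfilling would apply the same test against every queued job's reservation, not just the first.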

38 User Runtime Estimates
Lower estimates improve the chance of backfilling and hence response time, but estimates that are too low run the risk of having the job killed. So estimates should be accurate, right?

39 They Aren’t Mu’alem & Feitelson, IEEE TPDS 2001
Short = failed; killed jobs typically exceeded their runtime estimate (~15%). Mu'alem & Feitelson, IEEE TPDS 2001

40 Surprising Consequences
Inaccurate estimates actually lead to improved performance Performance evaluation results may depend on the accuracy of runtime estimates Example: EASY vs. conservative Using different workloads And different metrics Will focus on second bullet

41 EASY vs. Conservative Using CTC SP2 workload

42 EASY vs. Conservative Using Jann workload model
Note: jann model of CTC

43 EASY vs. Conservative Using Feitelson workload model

44 Conflicting Results Explained
Jann uses accurate runtime estimates. This leads to a tighter schedule: EASY is not affected too much, but conservative manages less backfilling of long jobs because it respects more reservations. Relative measure: more backfilling by EASY = less by conservative.

45 Conservative is bad for the long jobs Good for short ones that are respected Conservative EASY

46 Conflicting Results Explained
Response time is sensitive to long jobs, which do better under EASY. Slowdown is sensitive to short jobs, which do better under conservative. None of this happens at CTC, because estimates there are so loose that backfilling can occur even under conservative.

47 Verification Run CTC workload with accurate estimates

48 But What About My Model? Simply does not have such small long jobs

49 Workload Data Sources

50 No Data Innovative unprecedented systems Use an educated guess
Examples: wireless systems, hand-held devices. Use an educated guess: self-similarity, heavy tails, Zipf distribution

51 Serendipitous Data Data may be collected for various reasons
Accounting logs Audit logs Debugging logs Just-so logs Can lead to wealth of information

52 NASA Ames iPSC/860 log 42050 jobs from Oct-Dec 1993
[Log excerpt with columns user, job, nodes, runtime, date, time; rows from 10/93, 10:13–10:31, showing user cmd, user nqs, and sysadmin pwd entries. The numeric fields were lost in transcription.]
Feitelson & Nitzberg, JSSPP 1995

53 Distribution of Job Sizes

54 Distribution of Job Sizes

55 Distribution of Resource Use

56 Distribution of Resource Use

57 Degree of Multiprogramming

58 System Utilization

59 Job Arrivals

60 Arriving Job Sizes

61 Distribution of Interarrival Times

62 Distribution of Runtimes

63 User Activity

64 Repeated Execution

65 Application Moldability
Of jobs run more than once

66 Distribution of Run Lengths

67 Predictability in Repeated Runs
For jobs run more than 5 times

68 Recurring Findings Many small and serial jobs Many power-of-two jobs
Weak correlation of job size and duration Job runtimes are bounded but have CV>1 Inaccurate user runtime estimates Non-stationary arrivals (daily/weekly cycle) Power-law user activity, run lengths

69 Instrumentation Passive: snoop without interfering
Active: modify the system Collecting the data interferes with system behavior Saving or downloading the data causes additional interference Partial solution: model the interference

70 Data Sanitation Strange things happen
Leaving them in is “safe” and “faithful” to the real data But it risks situations in which a non-representative situation dominates the evaluation results

71 Arrivals to SDSC SP2

72 Arrivals to LANL CM-5

73 Arrivals to CTC SP2

74 Arrivals to SDSC Paragon
What are they doing at 3:30 AM?

75 3:30 AM Nearly every day, a set of 16 jobs are run by the same user
Most probably the same set, as they typically have a similar pattern of runtimes Most probably these are administrative jobs that are executed automatically

76 Arrivals to CTC SP2

77 Arrivals to SDSC SP2

78 Arrivals to LANL CM-5

79 Arrivals to SDSC Paragon

80 Are These Outliers? These large activity outbreaks are easily distinguished from normal activity They last for several days to a few weeks They appear at intervals of several months to more than a year They are each caused by a single user! Therefore easy to remove

81

82 Two Aspects In workload modeling, should you include this in the model? In a general model, probably not Conduct separate evaluation for special conditions (e.g. DOS attack) In evaluations using raw workload data, there is a danger of bias due to unknown special circumstances

83 Automation The idea: cluster daily data based on various workload attributes; remove days that appear alone in a cluster; repeat.
The problem: strange behavior often spans multiple days.
Cirne & Berman, Wkshp Workload Charact. 2001

84 Workload Modeling

85 Statistical Modeling Identify attributes of the workload
Create empirical distribution of each attribute Fit empirical distribution to create model Synthetic workload is created by sampling from the model distributions
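
The sampling step can be sketched as follows; the distributions and parameters below are illustrative placeholders, not the ones fitted from any real log:

```python
import random

def synthetic_workload(n, rng=None):
    # Generate n synthetic jobs by sampling each modeled attribute
    # independently. All distribution choices here are assumptions
    # for illustration: exponential interarrivals, power-of-two sizes,
    # lognormal runtimes (skewed, CV > 1).
    rng = rng or random.Random(42)
    jobs, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(1 / 600)       # interarrival gap, mean 600 s
        size = 2 ** rng.randint(0, 5)       # power-of-two size, 1..32 nodes
        runtime = rng.lognormvariate(5, 2)  # right-skewed runtime, seconds
        jobs.append((t, size, runtime))
    return jobs
```

A real model would also have to capture correlations between attributes, which is exactly the issue the later slides on correlation address.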

86 Fitting by Moments Calculate model parameters to fit moments of empirical data Problem: does not fit the shape of the distribution

87 Jann et al, JSSPP 1997

88 Fitting by Moments Calculate model parameters to fit moments of empirical data Problem: does not fit the shape of the distribution Problem: very sensitive to extreme data values
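
For example, a gamma distribution can be fitted by matching the first two moments; the same toy data also shows how a single extreme value swings the fitted parameters:

```python
def fit_gamma_moments(data):
    # Method of moments for a gamma distribution:
    #   mean = k * theta,  variance = k * theta**2
    #   =>  k = mean**2 / var,  theta = var / mean
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean * mean / var, var / mean  # (shape k, scale theta)
```

Adding one outlier to a small sample changes both fitted parameters by orders of magnitude, which is the sensitivity problem noted above.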

89 Effect of Extreme Runtime Values
Change when top records omitted omit mean CV 0.01% -2.1% -29% 0.02% -3.0% -35% 0.04% -3.7% -39% 0.08% -4.6% 0.16% -5.7% -42% 0.31% -7.1% Downey & Feitelson, PER 1999

90 Alternative: Fit to Shape
Maximum likelihood: what distribution parameters were most likely to lead to the given observations Needs initial guess of functional form Phase type distributions Construct the desired shape Goodness of fit Kolmogorov-Smirnov: difference in CDFs Anderson-Darling: added emphasis on tail May need to sample observations
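
The Kolmogorov-Smirnov statistic mentioned above can be computed directly; a minimal sketch:

```python
def ks_statistic(data, cdf):
    # Kolmogorov-Smirnov statistic: the largest vertical distance between
    # the empirical CDF of the data and the model CDF.
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d
```

Because the statistic is driven by the bulk of the distribution, Anderson-Darling (which reweights toward the tails) is preferred when the tail matters, as the slide notes.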

91 Correlations Correlation can be measured by the correlation coefficient It can be modeled by a joint distribution function Both may not be very useful

92

93 Correlation Coefficient
system         CC
CTC SP2       -0.029
KTH SP2        0.011
SDSC SP2       0.145
LANL CM-5      0.211
SDSC Paragon   0.305

Gives low results for the correlation of runtime and size in parallel systems
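
The CC in the table is the standard Pearson correlation coefficient, which can be computed directly:

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length samples:
    # covariance divided by the product of the standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Being a linear measure, it can come out near zero even when size and runtime are related in a non-linear or distributional way, which is one reason it "gives low results" here.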

94 Distributions A restricted version of a joint distribution

95 Modeling Correlation Divide range of one attribute into sub-ranges
Create a separate model of other attribute for each sub-range Models can be independent, or model parameter can depend on sub-range

96 Stationarity Problem of daily/weekly activity cycle
Not important if unit of activity is very small (network packet) Very meaningful if unit of work is long (parallel job)

97 How to Modify the Load Multiply interarrivals or runtimes by a factor
Changes the effective length of the day Multiply machine size by a factor Modifies packing properties Add users

98 Stationarity Problem of daily/weekly activity cycle
Not important if unit of activity is very small (network packet) Very meaningful if unit of work is long (parallel job) Problem of new/old system Immature workload Leftover workload

99 Heavy Tails

100 Tail Types When a distribution has mean m, what is the distribution of samples that are larger than x? Light: expected to be smaller than x+m Memoryless: expected to be x+m Heavy: expected to be larger than x+m
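
This classification can be checked numerically via the mean excess E[X − x | X > x]. Below we use deterministic quantile grids instead of random samples (our construction, for reproducibility): for the memoryless exponential the overshoot stays near the mean regardless of x, while for a heavy-tailed Pareto it exceeds the mean and keeps growing with x.

```python
import math

def mean_excess(samples, x):
    # average overshoot above x:  E[X - x | X > x]
    tail = [s - x for s in samples if s > x]
    return sum(tail) / len(tail)

# deterministic "samples": the inverse CDF evaluated on a uniform midpoint grid
grid = [(i + 0.5) / 10000 for i in range(10000)]
expo = [-math.log(1 - u) for u in grid]         # exponential, mean m = 1
heavy = [(1 - u) ** (-1 / 1.5) for u in grid]   # Pareto, a = 1.5 (mean m = 3)
```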

101 Formal Definition Tail decays according to a power law
Test: log-log complementary distribution

102 Consequences Large deviations from the mean are realistic
Mass disparity: a small fraction of the samples is responsible for a large part of the total mass, while most samples together account for a negligible part of the mass. Crovella, JSSPP 2001

103 Unix File Sizes Survey, 1993

104 Unix File Sizes LLCD

105 Consequences Large deviations from the mean are realistic
Mass disparity: a small fraction of the samples is responsible for a large part of the total mass, while most samples together account for a negligible part of the mass. Infinite moments: for a ≤ 1 the mean is undefined; for a ≤ 2 the variance is undefined. Crovella, JSSPP 2001

106 Pareto Distribution With parameter a the density is proportional to x^(-(a+1)), i.e. Pr[X > x] = (x/k)^(-a) for x ≥ k
The expectation is then E[X] = ∫ x · a k^a x^(-(a+1)) dx, which diverges for a ≤ 1; with a = 1 the running sample mean grows like ln n, i.e. it grows with the number of samples
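
Pareto variates are easy to generate by inverse-transform sampling; a small sketch (k is the lower bound of the support):

```python
import random

def pareto_inv_cdf(u, a, k=1.0):
    # inverse CDF of Pareto(a, k):  Pr[X > x] = (x / k) ** -a  for x >= k
    return k * (1 - u) ** (-1.0 / a)

def pareto_sample(a, k=1.0, rng=random):
    # inverse-transform sampling: push a uniform variate through the inverse CDF
    return pareto_inv_cdf(rng.random(), a, k)
```

With a = 1 the theoretical mean is infinite, so the running average of such samples never settles, matching the point above about the mean growing with the number of samples.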

107 Pareto Samples

108 Pareto Samples

109 Pareto Samples

110 Effect of Samples from Tail
In simulation: A single sample may dominate results Example: response times of processes In analysis: Average long-term behavior may never happen in practice

111 Real Life Data samples are necessarily bounded
The question is how to generalize to the model distribution Arbitrary truncation Lognormal or phase-type distributions Something in between

112 Solution 1: Truncation Postulate an upper bound on the distribution
Question: where to put the upper bound Probably OK for qualitative analysis May be problematic for quantitative simulations

113 Solution 2: Model the Sample
Approximate the empirical distribution using a mixture of exponentials (e.g. phase-type distributions) In particular, exponential decay beyond highest sample In some cases, a lognormal distribution provides a good fit Good for mathematical analysis

114 Solution 3: Dynamic Place an upper bound on the distribution
Location of the bound depends on the total number of samples required. Note: the bound does not change during the simulation
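
A sketch of sampling from a Pareto distribution truncated at an upper bound xmax (our code; the slide's specific bound-placement formula is not preserved here, but one plausible rule is to put xmax where fewer than one of the N required samples is expected to exceed it, i.e. xmax = k · N^(1/a)):

```python
def truncated_pareto_inv_cdf(u, a, k, xmax):
    # Pareto(a, k) truncated at xmax: renormalize the CDF over [k, xmax]
    # so that u = 0 maps to k and u = 1 maps exactly to the bound.
    cdf_at_max = 1 - (k / xmax) ** a
    return k * (1 - u * cdf_at_max) ** (-1.0 / a)
```

Since the bound is fixed from the required sample count up front, it does not change during the simulation.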

115 Self Similarity

116 The Phenomenon The whole has the same structure as certain parts
Example: fractals

117

118 The Phenomenon The whole has the same structure as certain parts
Example: fractals In workloads: burstiness at many different time scales Note: relates to a time series

119 Job Arrivals to SDSC Paragon

120 Process Arrivals to SDSC Paragon

121 Long-Range Correlation
A burst of activity implies that values in the time series are correlated A burst covering a large time frame implies correlation over a long range This is contrary to assumptions about the independence of samples

122 Aggregation Replace each subsequence of m consecutive values by their mean If self-similar, the new series will have statistical properties that are similar to the original (i.e. bursty) If independent, will tend to average out

123 Poisson Arrivals

124 Tests Essentially based on the burstiness-retaining nature of aggregation Rescaled range (R/s) metric: the range (sum) of n samples as a function of n

125 R/s Metric

126 Tests Essentially based on the burstiness-retaining nature of aggregation Rescaled range (R/s) metric: the range (sum) of n samples as a function of n Variance-time metric: the variance of an aggregated time series as a function of the aggregation level

127 Variance Time Metric

128 Modeling Self Similarity
Generate workload by an on-off process: during on periods, generate work at a steady pace; during off periods, do nothing. On and off period lengths are heavy tailed. Multiplex many such sources. This leads to long-range correlation.
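
A rough sketch of this construction (our code; the exact period-length distribution is an assumption, here a Pareto-like tail with a = 1.5):

```python
import random

def on_off_source(steps, a=1.5, rng=None):
    # One source: alternate heavy-tailed on and off periods, emitting one
    # unit of work per step while on and nothing while off.
    rng = rng or random.Random(0)
    out, on = [], True
    while len(out) < steps:
        # Pareto-like period length >= 1 via inverse-transform sampling
        period = int((1 - rng.random()) ** (-1.0 / a))
        out.extend([1 if on else 0] * period)
        on = not on
    return out[:steps]

def multiplex(sources):
    # aggregate traffic: total work per step over independent sources
    return [sum(step) for step in zip(*sources)]
```

Summing many such sources yields a series whose burstiness survives aggregation, i.e. long-range correlation.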

129 Research Areas

130 Effect of Users Workload is generated by users
Human users do not behave like a random sampling process Feedback based on system performance Repetitive working patterns

131 Feedback User population is finite
Users back off when performance is inadequate Negative feedback Better system stability Need to explicitly model this behavior

132 Locality of Sampling Users display different levels of activity at different times At any given time, only a small subset of users is active

133 Active Users

134 Locality of Sampling Users display different levels of activity at different times At any given time, only a small subset of users is active These users repeatedly do the same thing Workload observed by system is not a random sample from long-term distribution

135 SDSC Paragon Data

136 SDSC Paragon Data

137 Growing Variability

138 SDSC Paragon Data

139 SDSC Paragon Data

140 Locality of Sampling The questions:
How does this affect the results of performance evaluation? Can this be exploited by the system, e.g. by a scheduler?

141 Hierarchical Workload Models
Model of user population Modify load by adding/deleting users Model of a single user’s activity Built-in self similarity using heavy-tailed on/off times Model of application behavior and internal structure Capture interaction with system attributes

142 A Small Problem We don’t have data for these models
Especially for user behavior such as feedback Need interaction with cognitive scientists And for distribution of application types and their parameters Need detailed instrumentation

143 Final Words…

144 We like to think that we design systems based on solid foundations…

145 But beware: the foundations might be unbased assumptions!

146 Computer Systems are Complex
We should have more “science” in computer science: Collect data rather than make assumptions Run experiments under different conditions Make measurements and observations Make predictions and verify them Share data and programs to promote good practices and ensure comparability Science = experimental science, like physics, chemistry, biology

147 Advice from the Experts
“Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house” -- Henri Poincaré

148 Advice from the Experts
“Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house” -- Henri Poincaré “Everything should be made as simple as possible, but not simpler” -- Albert Einstein

149 Acknowledgements Students: Ahuva Mu’alem, David Talby, Uri Lublin
Larry Rudolph / MIT Data in Parallel Workloads Archive Joefon Jann / IBM Allen Downey / Wellesley CTC SP2 log / Steven Hotovy SDSC Paragon log / Reagan Moore SDSC SP2 log / Victor Hazelwood LANL CM-5 log / Curt Canada NASA iPSC/860 log / Bill Nitzberg

