2
The Forgotten Factor: FACTS on Performance Evaluation and its Dependence on Workloads Dror Feitelson Hebrew University
3
Performance Evaluation
In system design
–Selection of algorithms
–Setting parameter values
In procurement decisions
–Value for money
–Meet usage goals
For capacity planning
4
The Good Old Days… The skies were blue The simulation results were conclusive Our scheme was better than theirs Feitelson & Jette, JSSPP 1997
5
But in their papers, their scheme was better than ours!
6
How could they be so wrong?
7
Performance evaluation depends on:
The system’s design (What we teach in algorithms and data structures)
Its implementation (What we teach in programming courses)
The workload to which it is subjected
The metric used in the evaluation
Interactions between these factors
9
Outline for Today
Three examples of how workloads affect performance evaluation
Workload modeling
Research agenda
In the context of parallel job scheduling
10
Example #1 Gang Scheduling and Job Size Distribution
11
Gang What?!? Time slicing parallel jobs with coordinated context switching Ousterhout matrix Ousterhout, ICDCS 1982
12
Gang What?!? Time slicing parallel jobs with coordinated context switching Ousterhout matrix Optimization: Alternative scheduling Ousterhout, ICDCS 1982
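To make the mechanism concrete, here is a minimal sketch (Python, with invented job names and sizes; not the original implementation) of an Ousterhout matrix: rows are time slots, columns are processors, and all threads of a job share one row so they are context-switched together. The "alternative scheduling" optimization would additionally run a job from another row whenever its processors happen to be idle in the current slot.

# Minimal sketch of an Ousterhout matrix for gang scheduling.
def place_job(matrix, job_id, size):
    """Place a job with `size` threads in the first row with enough free cells."""
    for row in matrix:
        free = [i for i, cell in enumerate(row) if cell is None]
        if len(free) >= size:
            for i in free[:size]:
                row[i] = job_id
            return True
    return False  # no time slot has enough free processors

processors, slots = 8, 3
matrix = [[None] * processors for _ in range(slots)]
for job, size in [("A", 4), ("B", 8), ("C", 2), ("D", 4)]:
    place_job(matrix, job, size)

# In each time slot the whole machine switches to the next row,
# so all threads of a gang run (or wait) together.
for slot, row in enumerate(matrix):
    print(f"slot {slot}: {row}")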
13
Packing Jobs Use a buddy system for allocating processors Feitelson & Rudolph, Computer 1990
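A rough sketch of how such an allocator might look (allocation only, no release or buddy merging; the details are my own simplification, not taken from the paper): a request is rounded up to a power of two and served by splitting the smallest free block that is large enough.

# Sketch of a buddy allocator over processors 0..P-1 (P a power of two).
def round_up_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

class BuddyAllocator:
    def __init__(self, total):
        self.free = {total: [0]}   # block size -> list of starting offsets

    def alloc(self, size):
        size = round_up_pow2(size)
        # find the smallest free block that fits
        avail = sorted(s for s in self.free if s >= size and self.free[s])
        if not avail:
            return None
        block = avail[0]
        start = self.free[block].pop()
        # split down to the requested size, keeping the buddies free
        while block > size:
            block //= 2
            self.free.setdefault(block, []).append(start + block)
        return (start, size)

alloc = BuddyAllocator(16)
print(alloc.alloc(5))   # rounded up to 8 processors -> (0, 8)
print(alloc.alloc(3))   # rounded up to 4 processors -> (8, 4)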
18
The Question:
The buddy system leads to internal fragmentation
But it also improves the chances of alternative scheduling, because processors are allocated in predefined groups
Which effect dominates the other?
19
The Answer (part 1): Feitelson & Rudolph, JPDC 1996
20
Proof of Utilization Bound A uniform distribution:
21
Proof of Utilization Bound Round up to next power of 2:
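A back-of-the-envelope version of this step, under the simplifying assumption that job sizes are uniform within each power-of-two range (the paper's actual proof is more careful):

\[
  s \in (2^{k-1}, 2^k] \ \text{uniform} \;\Rightarrow\;
  \mathbb{E}[s] = \frac{2^{k-1} + 2^k}{2} = \tfrac{3}{4}\cdot 2^k ,
  \qquad\text{so}\qquad
  \mathbb{E}\!\left[\frac{s}{\,2^{\lceil \log_2 s \rceil}}\right] \approx \frac{3}{4}.
\]

That is, rounding up to the next power of two by itself caps utilization at roughly 75%, before the recovery step on the next slide is taken into account.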
22
Proof of Utilization Bound Recover some fragmented space using selective disabling:
23
The Answer (part 2):
26
Many small jobs
Many sequential jobs
Many power-of-two jobs
Practically no jobs use the full machine
Conclusion: the buddy system should work well
27
Verification Feitelson, JSSPP 1996
28
Example #2 Parallel Job Scheduling and Job Scaling
29
Variable Partitioning
Each job gets a dedicated partition for the duration of its execution
Resembles 2D bin packing
Packing large jobs first should lead to better performance
But what about correlation of size and runtime?
30
“Scan” Algorithm
Keep jobs in separate queues according to size (sizes are powers of 2)
Serve the queues round robin, scheduling all jobs from each queue (they pack perfectly)
Assuming constant work model, large jobs only block the machine for a short time
Krueger et al., IEEE TPDS 1994
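A small sketch of this queueing structure (job names, sizes, and the machine size are illustrative assumptions, not taken from Krueger et al.):

# Sketch of the "Scan" idea: one queue per power-of-two size class,
# served round-robin; all jobs of a class pack the machine perfectly.
from collections import defaultdict, deque
from math import log2, ceil

queues = defaultdict(deque)          # size class -> FIFO of (job, runtime)

def submit(job, size, runtime):
    cls = 2 ** ceil(log2(size))      # round size up to a power of two
    queues[cls].append((job, runtime))

def scan():
    """Serve size classes in turn, draining each class before moving on."""
    schedule = []
    for cls in sorted(queues):       # one sweep over the size classes
        while queues[cls]:
            job, runtime = queues[cls].popleft()
            schedule.append((cls, job, runtime))
    return schedule

submit("a", 100, 10); submit("b", 32, 5); submit("c", 30, 5); submit("d", 2, 1)
for cls, job, runtime in scan():
    print(f"class {cls:4d}: job {job} for {runtime} time units")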
31
Scaling Models
Constant work
–Parallelism for speedup: Amdahl’s Law
–Large-first acts like SJF (large jobs finish sooner)
Constant time
–Size and runtime are uncorrelated
Memory bound
–Large-first acts like LJF (large jobs run longer)
–Full-size jobs lead to blockout
Worley, SIAM JSSC 1990
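To see why the choice of model matters, here is a toy sketch (the formulas, constants, and the memory-bound growth exponent are illustrative assumptions, not from Worley's analysis): under constant work larger jobs finish sooner, under constant time size does not matter, and under memory-bound scaling larger jobs run longer.

# Toy runtimes under the three scaling models (all numbers are made up).
def runtime(size, model, work=1000.0, serial_frac=0.05):
    if model == "constant-work":       # fixed total work, Amdahl-style speedup
        return work * (serial_frac + (1 - serial_frac) / size)
    if model == "constant-time":       # runtime independent of job size
        return work
    if model == "memory-bound":        # bigger job = bigger problem; assumed growth
        return work * size ** 0.5
    raise ValueError(model)

for size in (1, 16, 256):
    print(size, {m: round(runtime(size, m), 1)
                 for m in ("constant-work", "constant-time", "memory-bound")})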
32
The Data Data: SDSC Paragon, 1995/6
35
Conclusion
Parallelism used for better results, not for faster results
Constant work model is unrealistic
Memory bound model is reasonable
Scan algorithm will probably not perform well in practice
36
Example #3 Backfilling and User Runtime Estimation
37
Backfilling
Variable partitioning can suffer from external fragmentation
Backfilling optimization: move jobs forward to fill in holes in the schedule
Requires knowledge of expected job runtimes
38
Variants
EASY backfilling
–Make a reservation for the first queued job only
Conservative backfilling
–Make reservations for all queued jobs
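A simplified sketch of the EASY side of this difference (the data layout and the single-reservation check are my own simplifications; a real scheduler tracks much more): a reservation is computed for the head of the queue, and later jobs may start now only if they do not delay that reservation. Conservative backfilling would repeat the reservation step for every queued job.

# Simplified EASY backfilling check (a sketch, not a production scheduler).
# running: list of (estimated_end_time, size); queue: FIFO of (size, estimate).
def easy_backfill(now, free, running, queue):
    """Return the waiting jobs (beyond the head) that may be started now."""
    if not queue:
        return []
    head_size, _ = queue[0]

    # Reservation for the head job: earliest time enough processors are free,
    # assuming running jobs end at their estimated times.
    avail, shadow_time = free, now
    for end, size in sorted(running):
        if avail >= head_size:
            break
        avail += size
        shadow_time = end
    extra = avail - head_size          # processors to spare at the reservation

    started = []
    for size, est in list(queue)[1:]:
        if size > free:
            continue
        fits_before_shadow = now + est <= shadow_time
        if fits_before_shadow or size <= extra:
            started.append((size, est))
            free -= size
            if not fits_before_shadow:
                extra -= size          # this job eats into the reserved slack
    return started

# Example: 8 processors free now, head job needs 32.
print(easy_backfill(now=0, free=8,
                    running=[(30, 40), (60, 16)],
                    queue=[(32, 50), (8, 20), (4, 100)]))
# -> [(8, 20)]: it fits now and finishes before the head job's reservation.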
39
User Runtime Estimates
Lower estimates improve the chance of being backfilled, and hence of better response time
Estimates that are too low run the risk of having the job killed
So estimates should be accurate, right?
40
They Aren’t Mu’alem & Feitelson, IEEE TPDS 2001
41
Surprising Consequences
Inaccurate estimates actually lead to improved performance
Performance evaluation results may depend on the accuracy of runtime estimates
–Example: EASY vs. conservative
–Using different workloads
–And different metrics
42
EASY vs. Conservative Using CTC SP2 workload
43
EASY vs. Conservative Using Jann workload model
44
EASY vs. Conservative Using Feitelson workload model
45
Conflicting Results Explained
The Jann model uses accurate runtime estimates
This leads to a tighter schedule
EASY is not affected too much
Conservative manages less backfilling of long jobs, because it respects more reservations
46
Conservative is bad for the long jobs
Good for the short ones, whose reservations are respected
47
Conflicting Results Explained
Response time is sensitive to long jobs, which favor EASY
Slowdown is sensitive to short jobs, which favor conservative
None of this happens at CTC, because estimates are so loose that backfilling can occur even under conservative
48
Verification Run CTC workload with accurate estimates
49
But What About My Model?
It simply does not have such small, long jobs
50
Workload Modeling
51
No Data
Innovative unprecedented systems
–Wireless
–Hand-held
Use an educated guess
–Self similarity
–Heavy tails
–Zipf distribution
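If one does guess, it is easy to at least sample from the guessed distributions; a small sketch (all parameter values are arbitrary placeholders):

# Sampling from the "educated guess" distributions (arbitrary parameters).
import random

def pareto(alpha=1.5, x_min=1.0):
    """Heavy-tailed runtime/size: P(X > x) = (x_min / x)**alpha."""
    return x_min / random.random() ** (1.0 / alpha)

def zipf_rank(n=1000, s=1.0):
    """Zipf-like popularity over n items: item k chosen with weight 1/k**s."""
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    return random.choices(range(1, n + 1), weights=weights)[0]

print([round(pareto(), 1) for _ in range(5)])   # occasional huge values
print([zipf_rank() for _ in range(5)])          # low ranks dominate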
52
Serendipitous Data
Data may be collected for various reasons
–Accounting logs
–Audit logs
–Debugging logs
–Just-so logs
Can lead to a wealth of information
53
NASA Ames iPSC/860 log
42050 jobs from Oct–Dec 1993
user      job     nodes  runtime  date      time
user4     cmd8    32     70       11/10/93  10:13:17
user4     cmd8    32     70       11/10/93  10:19:30
user42    nqs450  32     3300     11/10/93  10:22:07
user41    cmd342  4      54       11/10/93  10:22:37
sysadmin  pwd     1      6        11/10/93  10:22:42
user4     cmd8    32     60       11/10/93  10:25:42
sysadmin  pwd     1      3        11/10/93  10:30:43
user41    cmd342  4      126      11/10/93  10:31:32
Feitelson & Nitzberg, JSSPP 1995
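A throwaway parser for records laid out like this is enough to start mining such serendipitous data (field meanings as in the header above; the aggregation is just an example):

# Throwaway parser for log records laid out as above
# (user, command, nodes, runtime in seconds, date, time).
from collections import Counter

def parse(lines):
    for line in lines:
        user, cmd, nodes, runtime, date, time = line.split()
        yield user, cmd, int(nodes), int(runtime), date, time

sample = [
    "user4 cmd8 32 70 11/10/93 10:13:17",
    "sysadmin pwd 1 6 11/10/93 10:22:42",
]
node_seconds = Counter()
for user, cmd, nodes, runtime, *_ in parse(sample):
    node_seconds[user] += nodes * runtime
print(node_seconds.most_common())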
54
Distribution of Job Sizes
56
Distribution of Resource Use
58
Degree of Multiprogramming
59
System Utilization
60
Job Arrivals
61
Arriving Job Sizes
62
Distribution of Interarrival Times
63
Distribution of Runtimes
64
Job Scaling
65
User Activity
66
Repeated Execution
67
Application Moldability
68
Distribution of Run Lengths
69
Predictability in Repeated Runs
70
Research Agenda
71
The Needs
New systems tend to be more complex
Differences tend to be finer
Evaluations require more detailed data
Getting more data requires more work
Important areas:
–Internal structure of applications
–User behavior
72
Generic Application Model
Iterations of
–Compute: granularity, memory working set / locality
–I/O: interprocess locality
–Communicate: pattern, volume
Option of phases with different patterns of iterations
(diagram: compute → I/O → communicate loop)
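One way such a model might be encoded in a simulator, sketched with invented field names and values:

# Sketch of the generic application model: phases, each repeating
# compute / I/O / communicate iterations with its own parameters.
from dataclasses import dataclass
from typing import List

@dataclass
class Phase:
    iterations: int
    compute: float        # granularity (time per compute step)
    working_set: float    # memory footprint touched per iteration
    io: float             # I/O volume per iteration
    comm_pattern: str     # e.g. "nearest-neighbour", "all-to-all"
    comm_volume: float    # data exchanged per iteration

@dataclass
class Application:
    size: int             # number of processes
    phases: List[Phase]

app = Application(size=64, phases=[
    Phase(1000, compute=0.01, working_set=256.0, io=0.0,
          comm_pattern="nearest-neighbour", comm_volume=0.5),
    Phase(10, compute=0.1, working_set=64.0, io=128.0,
          comm_pattern="all-to-all", comm_volume=8.0),
])
print(sum(p.iterations * p.compute for p in app.phases))  # total compute time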
73
Consequences
Model the interaction of the application with the system
–Support for communication pattern
–Availability of memory
Application attributes depend on the system
Effect of multi-resource schedulers
74
Missing Data
There has been some work on the characterization of specific applications
There has been no work on the distribution of application types in a complete workload
–Distribution of granularities
–Distribution of working set sizes
–Distribution of communication patterns
75
Effect of Users
Workload is generated by users
Human users do not behave like a random sampling process
–Feedback based on system performance
–Repetitive working patterns
76
Feedback
The user population is finite
Users back off when performance is inadequate
This negative feedback leads to better system stability
We need to explicitly model this behavior
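A toy version of such a feedback loop (the back-off rule and the crude response-time inflation are assumptions for illustration only): each simulated user waits longer before submitting again when the previous job responded poorly, so the offered load sheds itself when the system is saturated.

# Toy user-feedback model: users slow down when response times grow.
import random

def think_time(response_time, base=60.0, sensitivity=2.0):
    """Pause before the next submission; grows when response was poor."""
    return base + sensitivity * response_time

def session(n_jobs=5, load=0.8):
    """Total time a user needs to push n_jobs through the system."""
    t = 0.0
    for _ in range(n_jobs):
        runtime = random.expovariate(1 / 100.0)
        response = runtime / max(1e-6, 1 - load)   # crude load-dependent slowdown
        t += response + think_time(response)
    return t

random.seed(1)
print(round(session(load=0.5)), round(session(load=0.9)))
# Higher load -> longer responses -> users submit less often.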
77
Locality of Sampling
Users display different levels of activity at different times
At any given time, only a small subset of users is active
These users repeatedly do the same thing
The workload observed by the system is not a random sample from the long-term distribution
78
Final Words…
79
We like to think that we design systems based on solid foundations…
80
But beware: the foundations might be unfounded assumptions!
81
Computer Systems are Complex
We should have more “science” in computer science:
Run experiments under different conditions
Make measurements and observations
Make predictions and verify them
82
Acknowledgements
Students: Ahuva Mu’alem, David Talby, Uri Lublin
Larry Rudolph / MIT
Data in Parallel Workloads Archive
–Joefon Jann / IBM
–CTC SP2 log
–SDSC Paragon log
–SDSC SP2 log
–NASA iPSC/860 log