SERC Research Seminar Day August 18, 2007 Predictions for Parallel Applications and Systems Sathish Vadhiyar Grid Applications Research Laboratory (GARL)

1 SERC Research Seminar Day August 18, 2007 Predictions for Parallel Applications and Systems Sathish Vadhiyar Grid Applications Research Laboratory (GARL)

2 GARL Research
Grid Applications
– Climate Modeling
– Gene Mutations
Performance Modeling
Rescheduling
Others
– Prediction of queue wait times

3 GARL Research
Grid Applications
– Climate Modeling
– Gene Mutations
Performance Modeling
Rescheduling
Others
– Prediction of queue wait times

4 Rescheduling
The base is a parallel checkpointing library called SRS
Checkpointing: storing an application's state so that it can continue from that state after an interruption
Interruptions come from either a scheduler or system faults
SRS allows processor reconfiguration

5 [Figure labels: Application Progress, System 1, Storage, System 2]

6 Optimal Checkpoint Interval
Storing checkpoints periodically helps with fault tolerance
How periodic? What is the optimal checkpoint interval?
– More frequent checkpointing increases checkpoint overhead
– Less frequent checkpointing increases the time to recover from failures
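
The trade-off on this slide is often illustrated with Young's first-order approximation, which is not taken from this talk: for a checkpoint cost C and a mean time between failures M, the interval that roughly balances the two costs is sqrt(2 * C * M). The sketch below assumes exponentially distributed failures and C much smaller than M; the function name and the sample numbers are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the checkpoint interval.

    Assumes (not stated in the slides) exponentially distributed failures
    and a checkpoint cost that is small compared with the MTBF.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical numbers: a 60 s checkpoint on resources that fail about once a day.
interval_s = young_interval(60.0, 24 * 3600.0)
print(f"checkpoint roughly every {interval_s / 60.0:.1f} minutes")
```

A cheaper checkpoint or a less reliable system both shorten the interval, which matches the two opposing costs listed above.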

7 Illustration

8 Dynamic Determination of Optimal Checkpointing Intervals
Start the application on a set of resources
Predict the next failure on the set of resources
Checkpoint “just before” the next failure
The prediction has to be really accurate
But no prediction can be 100% accurate

9 Probability Distribution of Failures
Use a probability distribution of failures on the resources
Need to know: the next time of failure with x% certainty
But more certainty is also not good
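
As a rough illustration of asking for a failure time "with x% certainty", the sketch below assumes an exponential failure distribution with a known mean time between failures; the talk does not say which distribution is used, and the function name is hypothetical. It returns the time t for which the resources survive past t with the requested probability. One possible reading of the last bullet is visible in the output: demanding more certainty pulls t earlier, which would mean checkpointing more often.

```python
import math


def survival_time(mtbf_s: float, certainty: float) -> float:
    """Time t such that P(next failure > t) equals `certainty`.

    Assumes exponentially distributed times to failure (an assumption made
    for this sketch, not a claim about the method in the talk).
    """
    if mtbf_s <= 0:
        raise ValueError("MTBF must be positive")
    if not 0.0 < certainty < 1.0:
        raise ValueError("certainty must be strictly between 0 and 1")
    # Exponential survival function: P(T > t) = exp(-t / mtbf)
    return -mtbf_s * math.log(certainty)


mtbf_s = 24 * 3600.0  # hypothetical: one failure per day on average
for certainty in (0.50, 0.90, 0.99):
    t = survival_time(mtbf_s, certainty)
    print(f"{certainty:.0%} sure of no failure within {t / 60.0:.1f} minutes")
```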

10 Markov Chains
For parallel M-M checkpointing
In SRS, there is almost no system down phase
For sequential applications
In SRS, transition from state 0 can lead to many states

11 GARL Research
Grid Applications
– Climate Modeling
– Gene Mutations
Performance Modeling
Rescheduling
Others
– Prediction of queue wait times

12 Motivation for Queue Wait Times
A Grid consisting of a number of batch queues
A meta system that will:
– predict the wait times and execution times of jobs
– decide which queue is “most suitable” for the job
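
One simple way to read "most suitable" is the queue with the smallest predicted turnaround, that is, predicted wait time plus predicted execution time. The sketch below makes that assumption; the slides do not define the criterion, and the class, function, and queue names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueEstimate:
    """Predictions for one job on one batch queue (hypothetical structure)."""
    name: str
    predicted_wait_s: float
    predicted_run_s: float

    @property
    def turnaround_s(self) -> float:
        return self.predicted_wait_s + self.predicted_run_s


def most_suitable(estimates):
    """Pick the queue with the smallest predicted wait + execution time."""
    if not estimates:
        raise ValueError("no queues to choose from")
    return min(estimates, key=lambda q: q.turnaround_s)


estimates = [
    QueueEstimate("queue_a", predicted_wait_s=3600.0, predicted_run_s=1800.0),
    QueueEstimate("queue_b", predicted_wait_s=600.0, predicted_run_s=2400.0),
    QueueEstimate("queue_c", predicted_wait_s=60.0, predicted_run_s=7200.0),
]
best = most_suitable(estimates)
print(best.name, best.turnaround_s)
```

The choice is only as good as the two predictions behind it, which is why the quality of the wait-time predictor matters.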

13 What is a good predictor?
There are a number of prediction strategies
Evaluating a predictor’s goodness:
1. Mean Absolute Percentage Error (MAPE)
2. Upper bound for actual/predicted
3. Average of (actual - predicted) [absolute error]
4. Absolute error / actual wait time [relative error]
5. Average error / average queue wait time
6. Coefficient of correlation
Each of these metrics has flaws
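
Several of the listed metrics can be computed in a few lines. The sketch below covers metrics 1, 3, 5, and 6 using their textbook definitions, which may differ in detail from the definitions used in the talk; the function names and the sample wait times are hypothetical.

```python
import math


def mean_abs_error(actual, predicted):
    """Metric 3: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Metric 1: mean absolute percentage error.

    Undefined when an actual wait time is zero, which is one known flaw.
    """
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def error_over_mean_wait(actual, predicted):
    """Metric 5: average absolute error divided by average actual wait time."""
    return mean_abs_error(actual, predicted) / (sum(actual) / len(actual))


def pearson(actual, predicted):
    """Metric 6: Pearson coefficient of correlation."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical wait times in minutes.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 30.0]
print(mean_abs_error(actual, predicted))
print(mape(actual, predicted))
print(error_over_mean_wait(actual, predicted))
print(pearson(actual, predicted))
```

A predictor that always doubles the true wait time still gets a correlation of 1, a small example of why no single metric settles the question.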

14 Illustration
[Figure panels: Method 1, Method 2]
Metric 3 value of Method 1 < Metric 3 value of Method 2, i.e. Method 1 is better

15 Our goals
To define useful metrics that can clearly say whether a method is “good” or “bad”
Goodness of predictors
– In terms of absolute wait times
– In terms of execution times
– In terms of resource demand

16 Illustration: Prediction errors versus absolute wait times
[Plot: axes (A-P)/A % and wait times; curve f(x); marked points (x1, y1) and (x2, y2)]

17 Reality??

18 What we want to do…
Define metrics that can evaluate a method in the “absolute” sense, not the “comparative” sense
– As much as possible, look at a single graph and ask “Is this graph good?”
In some cases, this may simply not be possible
– Use comparisons
Evaluate the existing methods on these sets of metrics
Come up with a method that performs the best in terms of all of the defined metrics

19 GARL Research
Grid Applications
– Climate Modeling
– Gene Mutations
Performance Modeling
Rescheduling
Others
– Prediction of queue wait times

20 Motivation
Certain large computational phases of climate modeling (CCSM) are done only by some processors
Load balancing: offload work from these processors to other processors
– Increased processor utilization
– Decreased execution time
How much offloading?
– Need to predict workload based on previous computations

21 What is happening…
[Figure: Phase 1 and Phase 2 across Proc 0, Proc 1, Proc 2, Proc 3, Proc 4]

22 What should happen…
[Figure: Phase 1 and Phase 2 across Proc 0, Proc 1, Proc 2, Proc 3, Proc 4]
For this, we need to know the workload in phase 1
We predict the workload based on previous time steps
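
The slides do not say which predictor is used for the phase 1 workload. As a stand-in, the sketch below uses a moving average over the most recent time steps and then computes how much each busy processor would hand off so that every processor gets an equal share. The function names, the window size, and the numbers are hypothetical, and communication cost is ignored.

```python
def predict_next(history, window=3):
    """Predict the next time step's workload as the mean of the last `window` values."""
    if not history:
        raise ValueError("need at least one previous time step")
    recent = history[-window:]
    return sum(recent) / len(recent)


def offload_per_busy_proc(predicted_load, n_busy, n_total):
    """Work each busy processor should hand off for an even split.

    Assumes each of the n_busy processors carries `predicted_load` and the
    remaining processors are idle during this phase.
    """
    if not 0 < n_busy <= n_total:
        raise ValueError("need 0 < n_busy <= n_total")
    balanced_share = predicted_load * n_busy / n_total
    return predicted_load - balanced_share


# Hypothetical phase 1 workloads (seconds) on a busy processor in earlier time steps.
history = [10.0, 20.0, 30.0, 40.0]
predicted = predict_next(history, window=2)
print(predicted, offload_per_busy_proc(predicted, n_busy=2, n_total=5))
```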

23 Advantages

24 GARLians
Yadnyesh Joshi (M.Sc)
Karthikeyan Raman (M.Tech, jointly with Prof. Govindarajan)
H.A. Sanjay (Ph.D, jointly with Prof. Ravi Nanjundiah, CAOS)
Sivagama Sundari (Ph.D)
Ashish Srivatsava (Project Assistant)
Alumni
– 1 student intern from INSA, Lyon, France
– Summer interns
– Project assistants
– 2 M.Scs

25 Questions?
http://garl.serc.iisc.ernet.in

