Predictive Application-Performance Modeling in a Computational Grid Environment (HPDC '99)
Nirav Kapadia, José Fortes, Carla Brodley (ECE, Purdue)
Presented by Peter Dinda, CMU
2 Summary
Use locally-weighted memory-based learning (instance-based learning) to predict each application run's resource usage, based on parameters specified by an application expert and measurements of previous application runs.
Surprising result: simplest is best.
Implemented in the PUNCH system.
3 Outline
PUNCH
Resource usage and application parameters
Locally-weighted, memory-based learning
Synthetic datasets argue for a sophisticated approach
Algorithm optimizations in PUNCH
Datasets from a real application argue for a mind-numbingly simple approach
4 PUNCH
"Purdue University Network Computing Hub"
Web-based, batch-oriented system for accessing non-interactive tools
–Tool-specific forms guide the user in setting up a run: command-line parameters, input and output files
–PUNCH schedules the run on shared resources
–Extensively used: 500 users, 135K runs, mostly students taking ECE classes
Wide range of tools (over 40)
–Paper focuses on T-Supreme3, which simulates silicon fabrication
–Really bad ideas: batch-oriented Matlab
5 Resource Usage
PUNCH needs to know resource usage (CPU time) to schedule a run
Resource usage depends on application-specific parameters
–command-line and input-file parameters
Which ones? Specified by an application expert
–7 parameters for T-Supreme3
What is the relationship? Learn it on-line using locally-weighted memory-based learning
6 Locally-weighted Memory-based Learning
Each time you run the application, record the parameter values and the resource usage in a database
–Parameter values x -> resource usage y is the function to be learned
–Parameter values x define a point in the domain
Predict the resource usage y_q of a new run whose parameters are x_q, based on database records x_i -> y_i where the x_i are "close" to x_q
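To make the record-and-store step concrete, here is a minimal Python sketch of the run database the slide describes. The names (RunRecord, RunDatabase, add) are illustrative assumptions, not PUNCH's actual code.

```python
# Illustrative sketch only: RunRecord/RunDatabase are assumed names, not PUNCH's code.
from dataclasses import dataclass, field

@dataclass
class RunRecord:
    x: tuple   # parameter values for this run (a point in the domain)
    y: float   # observed resource usage (e.g., CPU seconds)

@dataclass
class RunDatabase:
    records: list = field(default_factory=list)

    def add(self, params, cpu_time):
        """Record one completed run: parameter values x -> resource usage y."""
        self.records.append(RunRecord(tuple(params), float(cpu_time)))
```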
7 Answering a Query
–Compute the distance d from the query point x_q to all points x_i in the database
–Select the subset of points within some distance (the neighborhood k_w)
–Transform the distances to the neighborhood points into weights using a kernel function K (Gaussian, say)
–Fit a local model that tries to minimize the weighted sum of squared errors for the neighborhood: linear regression, ad hoc, mind-numbingly simple, ...
–Apply the model to the query
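As a concrete illustration of these steps, the following sketch uses Euclidean distance, a Gaussian kernel, and a weighted average as the local model; it reuses the RunDatabase sketch above. The function name, the k_w default, and the bandwidth choice are assumptions of the sketch, not the paper's exact method.

```python
import math

def predict(db, x_q, k_w=16):
    """Sketch of answering a query at parameter vector x_q against a RunDatabase."""
    if not db.records:
        raise ValueError("no prior runs to predict from")

    # 1. Distance from the query point to every point in the database.
    by_dist = sorted(((math.dist(r.x, x_q), r) for r in db.records),
                     key=lambda t: t[0])

    # 2. Neighborhood: keep the k_w nearest points.
    neighborhood = by_dist[:k_w]

    # 3. Gaussian kernel turns distances into weights; the bandwidth h
    #    (distance to the farthest neighbor) is an assumption of this sketch.
    h = max(neighborhood[-1][0], 1e-9)
    weights = [math.exp(-(d / h) ** 2) for d, _ in neighborhood]

    # 4./5. Fit and apply the local model: here a weighted average of the
    #       neighbors' observed usage (a weighted linear regression could be
    #       fit instead, as the slide notes).
    return sum(w * r.y for w, (_, r) in zip(weights, neighborhood)) / sum(weights)
```

With these pieces, predicting a pending run is just predict(db, new_params); the linear-regression variant would replace the weighted average with a weighted least-squares fit over the neighborhood.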
8 PUNCH Approaches
I don't understand their distance metric
Kernel is 1.0 at the nearest neighbor and then Gaussian
1-Nearest-Neighbor
–Return the nearest neighbor's observed usage
3-Point Weighted Average
–Return the weighted average of the 3 nearest points
Linear regression
–16 nearest points for T-Supreme3
–Theoretically much better than the others
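The two simplest predictors are easy to state in code. This sketch reuses the RunDatabase and math import from above; the inverse-distance weighting in the 3-point average is an assumption, since the paper's exact weighting is not spelled out here.

```python
def predict_1nn(db, x_q):
    """1-Nearest-Neighbor: return the usage observed for the closest prior run."""
    return min(db.records, key=lambda r: math.dist(r.x, x_q)).y

def predict_3pt(db, x_q):
    """3-Point Weighted Average over the 3 nearest prior runs
    (inverse-distance weights are an assumption of this sketch)."""
    nearest = sorted(db.records, key=lambda r: math.dist(r.x, x_q))[:3]
    weights = [1.0 / (math.dist(r.x, x_q) + 1e-9) for r in nearest]
    return sum(w * r.y for w, r in zip(weights, nearest)) / sum(weights)
```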
9 Optimizations
2-level database
–Recent runs are preferred (not clear how)
–May help when the function is time-dependent, e.g., when all students are doing the same homework
–Significantly reduces query time
Instance editing
–Add new runs only if they are incorrectly predicted
–Remove runs that produce incorrect predictions
–Shrinks the database without losing information
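The "add only if mispredicted" half of instance editing can be sketched as a guard around the store step, reusing predict and RunDatabase from the earlier sketches. The relative-error tolerance below is an assumption for illustration, not PUNCH's actual policy.

```python
def record_run(db, x, y, tolerance=0.2):
    """Instance-editing sketch: store the new run only if the current model
    mispredicts it by more than `tolerance` (an assumed relative threshold)."""
    if db.records:
        y_hat = predict(db, x)
        if abs(y_hat - y) <= tolerance * max(abs(y), 1e-9):
            return  # predicted well enough: a redundant instance, don't store it
    db.add(x, y)
```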
10 Conclusions
LWMBL looks like a promising approach to resource-usage prediction in some cases
Needs a much more thorough study, though, even for this batch-oriented use
–"Simplest is best" is difficult to believe
Paper is a reasonable introduction to LWMBL for the grid community