Predictive Application-Performance Modeling in a Computational Grid Environment (HPDC '99). Nirav Kapadia, José Fortes, Carla Brodley, ECE, Purdue. Presented by Peter Dinda, CMU.

Presentation transcript:

1 Predictive Application-Performance Modeling in a Computational Grid Environment (HPDC '99). Nirav Kapadia, José Fortes, Carla Brodley, ECE, Purdue. Presented by Peter Dinda, CMU.

2 Summary
- Use locally-weighted, memory-based learning (instance-based learning) to predict each application run's resource usage based on parameters specified by an application expert and measurements of previous application runs
- Surprising result: simplest is best
- Implemented in the PUNCH system

3 Outline
- PUNCH
- Resource usage and application parameters
- Locally-weighted, memory-based learning
- Synthetic datasets argue for a sophisticated approach
- Algorithm optimizations in PUNCH
- Datasets from a real application argue for a mind-numbingly simple approach

4 PUNCH
- "Purdue University Network Computing Hub"
- Web-based, batch-oriented system for accessing non-interactive tools
  – Tool-specific forms guide the user in setting up a run: command-line parameters, input and output files
  – PUNCH schedules the run on shared resources
  – Extensively used: 500 users, 135K runs, mostly students taking ECE classes
- Wide range of tools (over 40)
  – Paper focuses on T-Suprem3, which simulates silicon fabrication
  – Really bad ideas: batch-oriented Matlab

5 Resource Usage
- PUNCH needs to know resource usage (CPU time) to schedule a run
- Resource usage depends on application-specific parameters
  – command-line and input-file parameters
- Which ones? Specified by an application expert
  – 7 parameters for T-Suprem3
- What is the relationship? Learn it on-line using locally-weighted, memory-based learning

6 Locally-weighted Memory-based Learning
- Each time you run the application, record the parameter values and the resource usage in a database (a minimal code sketch follows this slide)
  – Parameter values x -> resource usage y is the function to be learned
  – Parameter values x define a point in the domain
- Predict the resource usage y_q of a new run with parameters x_q from the database records x_i -> y_i whose x_i are "close" to x_q
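A minimal sketch of such a run database in Python, using my own names and structure (the slide does not describe PUNCH's actual data structures): each completed run contributes one (parameter vector, measured CPU time) record.

```python
from dataclasses import dataclass, field

@dataclass
class RunDatabase:
    """History of (parameter vector x, observed resource usage y) for one tool."""
    records: list = field(default_factory=list)  # list of (x, y) pairs

    def record_run(self, x, y):
        """Store the parameter values and measured CPU time of a completed run."""
        self.records.append((list(x), float(y)))


# Hypothetical example: three runs, each described by three expert-chosen
# parameters (names and values are made up for illustration).
db = RunDatabase()
db.record_run([20, 1000, 3], 12.4)
db.record_run([40, 2000, 3], 55.1)
db.record_run([40, 2000, 5], 80.7)
```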

7 Answering a Query (sketched in code below)
- Compute the distance d from the query point x_q to all points x_i in the database
- Select the subset of points within some distance (the neighborhood k_w)
- Transform the distances to the neighborhood points into weights using a kernel function K (a Gaussian, say)
- Fit a local model that tries to minimize the weighted sum of squared errors over the neighborhood (linear regression, ad hoc, mind-numbingly simple, ...)
- Apply the model to the query
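The query procedure translates fairly directly into code. The following is an illustrative Python/NumPy sketch, not the PUNCH implementation: the Euclidean distance, the Gaussian bandwidth, and the neighborhood size k are assumptions on my part.

```python
import numpy as np

def locally_weighted_predict(X, y, x_q, k=16, bandwidth=1.0):
    """Predict y_q for query point x_q from stored runs (X, y) using a
    Gaussian-weighted local linear model, following the steps on this slide."""
    X, y, x_q = np.asarray(X, float), np.asarray(y, float), np.asarray(x_q, float)

    # 1. Distance from the query point to every stored point.
    d = np.linalg.norm(X - x_q, axis=1)

    # 2. Keep the k nearest points as the neighborhood.
    nbrs = np.argsort(d)[:k]

    # 3. Turn distances into weights with a Gaussian kernel.
    w = np.exp(-(d[nbrs] / bandwidth) ** 2)

    # 4. Fit a local linear model by weighted least squares
    #    (minimizes the weighted sum of squared errors on the neighborhood).
    A = np.hstack([X[nbrs], np.ones((len(nbrs), 1))])  # add intercept column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[nbrs] * sw, rcond=None)

    # 5. Apply the local model at the query point.
    return float(np.append(x_q, 1.0) @ coef)
```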

8 PUNCH Approaches (code sketches follow)
- I don't understand their distance metric
- The kernel gives weight 1.0 to the nearest neighbor and is Gaussian beyond that
- 1-Nearest-Neighbor
  – Return the value of the nearest neighbor
- 3-Point Weighted Average
  – Return the weighted average of the 3 nearest points
- Linear regression
  – 16 nearest points for T-Suprem3
  – Theoretically much better than the others
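The three predictors could be sketched as below. This is again illustrative Python rather than PUNCH's code: since the slide says the distance metric is unclear, plain Euclidean distance is assumed, and the 3-point kernel follows the "weight 1.0 for the nearest point, Gaussian otherwise" rule mentioned above.

```python
import numpy as np

def _distances(X, x_q):
    return np.linalg.norm(np.asarray(X, float) - np.asarray(x_q, float), axis=1)

def predict_1nn(X, y, x_q):
    """Return the resource usage of the single nearest stored run."""
    return float(np.asarray(y, float)[np.argmin(_distances(X, x_q))])

def predict_3pt_weighted(X, y, x_q, bandwidth=1.0):
    """Weighted average of the 3 nearest runs; the nearest point gets weight 1.0,
    the others a Gaussian weight (assumed reading of the slide)."""
    d = _distances(X, x_q)
    nbrs = np.argsort(d)[:3]
    w = np.exp(-(d[nbrs] / bandwidth) ** 2)
    w[0] = 1.0  # nearest neighbor gets full weight
    return float(np.dot(w, np.asarray(y, float)[nbrs]) / w.sum())

def predict_local_linear(X, y, x_q, k=16):
    """Least-squares linear fit on the k nearest runs (k=16 for T-Suprem3)."""
    d = _distances(X, x_q)
    nbrs = np.argsort(d)[:k]
    A = np.hstack([np.asarray(X, float)[nbrs], np.ones((len(nbrs), 1))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, float)[nbrs], rcond=None)
    return float(np.append(np.asarray(x_q, float), 1.0) @ coef)
```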

9 Optimizations
- 2-level database
  – Recent runs are preferred (not clear how)
  – May help when the function is time-dependent, e.g., when all students are doing the same homework
  – Significantly reduces query time
- Instance editing (one possible reading is sketched below)
  – Add new runs only if they are incorrectly predicted
  – Remove runs that produce incorrect predictions
  – Shrinks the database without losing information
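One possible reading of the instance-editing rule in code, assuming "incorrectly predicted" means the relative error exceeds some tolerance (the slide does not give the actual criterion); removing misleading instances would use a similar error test.

```python
def maybe_add_run(records, predict, x, y_actual, tolerance=0.25):
    """Instance editing: store a new run only when the current database
    mispredicts it, so the database stops growing once predictions are good.

    `predict(records, x)` is any predictor over the stored (x, y) records;
    `tolerance` is an assumed relative-error threshold, not PUNCH's value.
    """
    if records:
        y_pred = predict(records, x)
        rel_error = abs(y_pred - y_actual) / max(abs(y_actual), 1e-9)
        if rel_error <= tolerance:
            return False  # predicted well enough; do not grow the database
    records.append((list(x), float(y_actual)))
    return True
```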

10 Conclusions
- LWMBL looks like a promising approach to resource-usage prediction in some cases
- Needs a much more thorough study, though, even for this batch-oriented use
  – "Simplest is best" is difficult to believe
- The paper is a reasonable introduction to LWMBL for the grid community