Meta-learning for Algorithm Recommendation

Meta-learning for Algorithm Recommendation
Background on Local Learning
Background on Algorithm Assessment
Algorithm Recommendation

Introduction
Local or instance-based learning divides into two simple steps:
1. Store all examples in the training set.
2. When a new example arrives, retrieve the stored examples most similar to it and look at their classes.
Disadvantages: classification cost may be high (which is why efficient indexing techniques are sought), and irrelevant attributes may increase the distance between “truly” similar examples.

K-nearest neighbor
To define how similar two examples are, we need a metric. We assume all examples are points in an n-dimensional space $\mathbb{R}^n$ and use the Euclidean distance. Let $X_i$ and $X_j$ be two examples; their distance is defined as
$d(X_i, X_j) = \sqrt{\sum_p (x_{ip} - x_{jp})^2}$
where $x_{ip}$ is the value of attribute p on example $X_i$.
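For concreteness, here is this distance as a small Python sketch (our own illustration; the function name euclidean_distance and the sample points are not from the slides):

```python
import numpy as np

def euclidean_distance(x_i: np.ndarray, x_j: np.ndarray) -> float:
    """Euclidean distance between two examples represented as n-dimensional vectors."""
    return float(np.sqrt(np.sum((x_i - x_j) ** 2)))

# Example: two points in R^3
print(euclidean_distance(np.array([1.0, 0.0, 2.0]), np.array([0.0, 0.0, 0.0])))  # ≈ 2.236
```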

K-nearest neighbor for discrete classes
Example (figure): a new example is classified by looking at its K = 4 nearest neighbors.

Voronoi Diagram
Decision surface induced by the 1-nearest neighbor rule: the decision surface is a combination of convex polyhedra surrounding each training example.

K-nearest neighbor for discrete classes
Algorithm (parameter k):
1. For each training example (X, C(X)), add the example to the training list.
2. When a new example $X_q$ arrives, assign it the class given by majority voting among its k nearest neighbors:
$C(X_q) = \arg\max_v \sum_{i=1}^{k} \delta(v, C(X_i))$
where the $X_i$ are the k nearest neighbors of $X_q$ and $\delta(a, b) = 1$ if $a = b$, 0 otherwise.
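A minimal sketch of the whole procedure in Python, assuming the training set is stored as NumPy arrays; the function name knn_classify and the toy data are our own:

```python
from collections import Counter
import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training examples."""
    # Euclidean distances from the query to every stored training example
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k closest examples
    nearest = np.argsort(distances)[:k]
    # Majority voting over their classes: argmax_v sum_i delta(v, C(X_i))
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny usage example with made-up data
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_classify(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> A
```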

Problems with k-nearest Neighbor
The distance between examples is based on all attributes. What if some attributes are irrelevant? Consider the curse of dimensionality: the larger the number of irrelevant attributes, the stronger their effect on the nearest-neighbor rule. One solution is to put weights on the attributes, which amounts to stretching or contracting the dimensions of the input space; ideally we would like to eliminate all irrelevant attributes.
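One way to picture the attribute-weighting idea is the sketch below; the weight vector, data values and function name are hypothetical, chosen only to show an irrelevant attribute being suppressed:

```python
import numpy as np

def weighted_distance(x_i, x_j, weights):
    """Euclidean distance with per-attribute weights; a weight near 0 effectively removes an attribute."""
    return float(np.sqrt(np.sum(weights * (x_i - x_j) ** 2)))

x_i = np.array([1.0, 5.0, 0.2])
x_j = np.array([1.1, -3.0, 0.3])
# Suppose the second attribute is irrelevant: give it zero weight
weights = np.array([1.0, 0.0, 1.0])
print(weighted_distance(x_i, x_j, weights))  # distance now ignores the irrelevant attribute
```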

Meta-learning for Algorithm Recommendation
Background on Local Learning
Background on Algorithm Assessment
Algorithm Recommendation

Resampling and K-Fold Cross-Validation
► The need for multiple training/validation sets: $\{T_i, V_i\}_i$ denotes the training/validation pair of fold i.
► K-fold cross-validation: divide the data X into K parts $X_i$, i = 1, ..., K. In fold i, part $X_i$ is the validation set $V_i$ and the remaining K-1 parts form the training set $T_i$.
► Any two training sets $T_i$ therefore share K-2 of the K parts.
(Slides in this section are based on lecture notes for E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004, v1.1.)
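A minimal K-fold cross-validation sketch, assuming scikit-learn is available; the data, the choice of classifier and K = 5 are arbitrary illustrations:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(100, 5)          # made-up data: 100 examples, 5 attributes
y = np.random.randint(0, 2, 100)    # made-up binary labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X[train_idx], y[train_idx])          # train on K-1 parts
    scores.append(model.score(X[val_idx], y[val_idx]))  # validate on the held-out part
print("mean validation accuracy:", np.mean(scores))
```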

5×2 Cross-Validation
► Five repetitions of 2-fold cross-validation (Dietterich, 1998): the data are split into two halves five times; each half is used once for training and once for validation, giving ten error estimates.
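A sketch of the 5×2 procedure; using scikit-learn's RepeatedKFold to generate the ten splits is our own convenience choice, not something prescribed by the slides:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)

# 5 repetitions of 2-fold CV -> 10 train/validation splits, as in Dietterich (1998)
rkf = RepeatedKFold(n_splits=2, n_repeats=5, random_state=0)
errors = []
for train_idx, val_idx in rkf.split(X):
    model = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    errors.append(1.0 - model.score(X[val_idx], y[val_idx]))
print("ten error estimates:", np.round(errors, 3))
```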

Bootstrapping
► Draw N instances from a dataset of size N with replacement.
► The probability that a given instance is never picked in the N draws is $(1 - \frac{1}{N})^N \approx e^{-1} \approx 0.368$; that is, only about 36.8% of the original instances stay out of the bootstrap sample (they are new to the trained model and can serve as validation data).
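A quick numerical check of that 36.8% figure (a sketch; the dataset size N is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
indices = np.arange(N)

# One bootstrap sample: N draws with replacement
sample = rng.choice(indices, size=N, replace=True)
left_out = N - len(np.unique(sample))
print("fraction never drawn:", left_out / N)       # close to 0.368
print("theoretical value:  ", (1 - 1 / N) ** N)    # ≈ e^-1 ≈ 0.368
```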

Measuring Error
► Error rate = # of errors / # of instances = (FN + FP) / N
► Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
► Precision = # of found positives / # of instances declared positive = TP / (TP + FP)
► Specificity = TN / (TN + FP)
► False alarm rate = FP / (FP + TN) = 1 - Specificity
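The same definitions as a small helper function (a sketch; the function name and the example counts are invented):

```python
def error_metrics(tp, fp, tn, fn):
    """Compute the error measures above from confusion-matrix counts."""
    n = tp + fp + tn + fn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),            # sensitivity / hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # = 1 - specificity
    }

print(error_metrics(tp=40, fp=10, tn=45, fn=5))
```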

ROC Curve
(Figure: the ROC curve plots the hit rate, TP / (TP + FN), against the false alarm rate, FP / (FP + TN), as the decision threshold is varied.)
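A sketch of how such a curve can be obtained from classifier scores, assuming scikit-learn's roc_curve; the labels and scores below are made up:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.5])  # e.g. predicted P(class = 1)

fpr, tpr, thresholds = roc_curve(y_true, scores)  # false alarm rate vs. hit rate
print("FPR:", np.round(fpr, 2))
print("TPR:", np.round(tpr, 2))
print("AUC:", round(auc(fpr, tpr), 3))
```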

Meta-learning for Algorithm Recommendation
Background on Local Learning
Background on Algorithm Assessment
Algorithm Recommendation

Algorithm Recommendation

How do we rank algorithms?
Algorithm: k-NN ranking for algorithm recommendation
Begin
  Characterize the new dataset.
  Identify the k datasets in the metadata that are most similar to the new dataset.
  Recommend a ranking for the new dataset based on performance information from its nearest neighbors.
End
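A sketch of the neighbor-identification step; the metafeature values, the extra dataset name and the function name are illustrative assumptions, and the L1 distance follows the example later in the deck:

```python
import numpy as np

def k_nearest_datasets(new_metafeatures, metadata, k=3):
    """Return the names of the k stored datasets closest to the new one (L1 distance on metafeatures)."""
    names = list(metadata)
    dists = [np.sum(np.abs(new_metafeatures - metadata[name])) for name in names]
    order = np.argsort(dists)[:k]
    return [names[i] for i in order]

# Hypothetical rescaled metafeatures (e.g. number of examples, number of attributes, class entropy)
metadata = {
    "byzantine": np.array([0.9, 0.4, 0.7]),
    "isolet":    np.array([0.8, 0.9, 0.6]),
    "pendigits": np.array([0.7, 0.3, 0.6]),
    "segment":   np.array([0.2, 0.2, 0.5]),
}
print(k_nearest_datasets(np.array([0.85, 0.5, 0.65]), metadata, k=3))
```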

K-nearest neighbor approach

For Prediction: Average Ranking
$\bar{R}(a) = \frac{1}{k} \sum_{i=1}^{k} R(a, i)$
where R(a, i) is the rank of algorithm a in the target ranking of neighbor i, and k is the number of target rankings, i.e. the number of neighbors.
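A sketch of the average-ranking aggregation; the rank values below are invented and only the mechanics mirror the formula above:

```python
def average_ranking(neighbor_ranks):
    """Aggregate neighbor rankings: mean rank per algorithm, then re-rank by that mean."""
    algorithms = list(next(iter(neighbor_ranks.values())))
    # R_bar(a) = (1/k) * sum_i R(a, i), with k the number of neighbors
    k = len(neighbor_ranks)
    mean_rank = {a: sum(r[a] for r in neighbor_ranks.values()) / k for a in algorithms}
    order = sorted(algorithms, key=lambda a: mean_rank[a])
    return {a: pos + 1 for pos, a in enumerate(order)}, mean_rank

# Hypothetical ranks of three algorithms on three neighbor datasets (1 = best)
neighbor_ranks = {
    "byzantine": {"bC5": 1, "IB1": 2, "Lt": 3},
    "isolet":    {"bC5": 2, "IB1": 1, "Lt": 3},
    "pendigits": {"bC5": 1, "IB1": 2, "Lt": 3},
}
predicted, mean_rank = average_ranking(neighbor_ranks)
print(mean_rank)   # mean ranks, e.g. bC5 ≈ 1.33
print(predicted)   # aggregated ranking: bC5 first, then IB1, then Lt
```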

Example: classification algorithms
bC5: Boosted decision trees (C5.0)
C5r: Decision tree-based rule set (C5.0)
C5t: Decision tree (C5.0)
IB1: 1-nearest neighbor (MLC++)
LD: Linear discriminant
Lt: Decision trees with linear combination of attributes
MLP: Multilayer perceptron (Clementine)
NB: Naïve Bayes
RBFN: Radial basis function network (Clementine)
RIP: Rule sets (RIPPER)

Example
The goal in this example is to predict the ranking of the algorithms on the letter dataset with, say, the 3-NN ranking method. First, the three nearest neighbors are identified based on the L1 distance in the space of metafeatures used; these neighbors are the datasets byzantine, isolet and pendigits. Their target rankings, the aggregated $\bar{R}_i$ scores, and the ranking obtained from them with the average ranking (AR) method are presented in the table below. This predicted ranking provides guidance concerning the experiments to be carried out: it recommends executing bC5 and IB1 before all the others, then Lt, and so on.

Example
Table: ranking predicted with 3-NN for the letter dataset, based on the datasets byzantine, isolet and pendigits, and the corresponding target ranking. Columns are the algorithms (bC5, C5r, C5t, MLP, RBFN, LD, Lt, IB1, NB, RIP); rows are the target rankings on byzantine, isolet and pendigits, the aggregated $\bar{R}_i$ scores, the predicted ranking, and the target ranking (numeric entries omitted).

Assessing Ranking Accuracy
We can measure the similarity between the predicted and the target ranking using Spearman's rank correlation coefficient:
$r_S = 1 - \frac{6 \sum_i (R'_i - R_i)^2}{n^3 - n}$
where $R'_i$ and $R_i$ are the predicted and target ranks of algorithm i and n is the number of algorithms. Values range from -1 (the rankings are exactly reversed) to 1 (the rankings agree perfectly).
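A small check of this formula in code; comparing against scipy.stats.spearmanr is our own addition, and the rank vectors are invented:

```python
import numpy as np
from scipy.stats import spearmanr

def spearman_rs(predicted_ranks, target_ranks):
    """Spearman's rank correlation: r_S = 1 - 6 * sum(d^2) / (n^3 - n)."""
    d = np.asarray(predicted_ranks) - np.asarray(target_ranks)
    n = len(d)
    return 1 - 6 * np.sum(d ** 2) / (n ** 3 - n)

predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target    = [2, 1, 3, 5, 4, 6, 7, 8, 10, 9]
print(spearman_rs(predicted, target))   # formula above, ≈ 0.964
rho, _pvalue = spearmanr(predicted, target)
print(rho)                              # should match (no tied ranks here)
```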

Example
Accuracy of the default ranking on the letter dataset (table; numeric entries omitted): columns are the algorithms (bC5, C5r, C5t, MLP, RBFN, LD, Lt, IB1, NB, RIP); rows are the default ranking, the target ranking, and the squared differences $(R'_i - R_i)^2$. The resulting correlation is $r_S = 0.879$.