
1 Meta-learning for Algorithm Recommendation. Outline: Background on Local Learning; Background on Algorithm Assessment; Algorithm Recommendation.

2 Introduction Local or instance-based learning proceeds in two simple steps: 1. Store all examples in the training set. 2. When a new example arrives, retrieve the stored examples most similar to it and look at their classes. Disadvantages: Classification cost may be high (hence the interest in efficient indexing techniques). Irrelevant attributes may inflate the distance between truly similar examples.

3 K-nearest neighbor To define how similar two examples are we need a metric. We assume all examples are points in an n-dimensional space Rⁿ and use the Euclidean distance. Let Xi and Xj be two examples. Their distance is defined as d(Xi, Xj) = √( Σp (xip − xjp)² ), where xip is the value of attribute p on example Xi.
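The distance just defined is straightforward to compute; a minimal sketch in Python (function name is our own choice):

```python
import math

def euclidean(x_i, x_j):
    """Euclidean distance d(Xi, Xj) = sqrt(sum_p (x_ip - x_jp)^2)
    between two examples given as attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
```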

4 K-nearest neighbor for discrete classes [Figure: a new example is assigned the majority class of its K = 4 nearest neighbors.]

5 Voronoi Diagram Decision surface induced by a 1-nearest neighbor classifier. The decision surface is a combination of convex polyhedra surrounding each training example.

6 K-nearest neighbor for discrete classes Algorithm (parameter k): 1. For each training example (X, C(X)), add the example to the training list. 2. When a new example Xq arrives, assign its class by majority voting on the k nearest neighbors of Xq: C(Xq) = argmax_v Σi δ(v, C(Xi)), where the sum runs over the k nearest neighbors Xi of Xq and δ(a, b) = 1 if a = b and 0 otherwise.
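The two steps above can be sketched directly; a minimal Python version of the majority-vote rule (function names are our own):

```python
import math
from collections import Counter

def euclidean(x_i, x_j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

def knn_classify(training, x_q, k):
    """training: list of (attribute_vector, class_label) pairs.
    Returns C(Xq) = argmax_v sum_i delta(v, C(Xi)) over the
    k nearest neighbors of x_q."""
    neighbors = sorted(training, key=lambda xc: euclidean(xc[0], x_q))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```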

7 Problems with k-nearest neighbor The distance between examples is based on all attributes; what if some attributes are irrelevant? Consider the curse of dimensionality: the larger the number of irrelevant attributes, the stronger their effect on the nearest-neighbor rule. One solution is to put weights on the attributes, which is like stretching or contracting the dimensions of the input space. Ideally we would like to eliminate all irrelevant attributes.

8 Meta-learning for Algorithm Recommendation. Outline: Background on Local Learning; Background on Algorithm Assessment; Algorithm Recommendation.

9 Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) Resampling and K-Fold Cross-Validation ► The need for multiple training/validation sets: {Ti, Vi}i are the training/validation sets of fold i. ► K-fold cross-validation: divide X into K parts Xi, i = 1,…,K; fold i uses Xi as the validation set Vi and the remaining K−1 parts as the training set Ti. ► Any two training sets Ti share K−2 of the K parts.
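The fold construction described above can be sketched as follows (a simple index-based split, not the textbook's own code):

```python
def kfold_indices(n, K):
    """Split indices 0..n-1 into K folds. Fold i uses fold i's indices
    as the validation set V_i and the other K-1 folds as the training
    set T_i, so any two training sets share K-2 folds."""
    folds = [list(range(i, n, K)) for i in range(K)]
    splits = []
    for i in range(K):
        val = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((train, val))
    return splits
```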

10 5×2 Cross-Validation ► Five repetitions of 2-fold cross-validation (Dietterich, 1998)

11 Bootstrapping ► Draw N instances from a dataset of size N with replacement. ► The probability that a given instance is not picked after N draws is (1 − 1/N)^N ≈ e⁻¹ ≈ 0.368; that is, about 36.8% of the instances never appear, so a bootstrap sample contains only about 63.2% of the distinct original instances.
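The probability above is easy to check numerically; a small sketch confirming that (1 − 1/N)^N approaches e⁻¹:

```python
import math

def prob_never_picked(N):
    """Probability that a given instance is not drawn in N draws
    with replacement from a dataset of size N."""
    return (1 - 1 / N) ** N

# As N grows this approaches e^-1 ~ 0.368, so a bootstrap sample
# contains about 63.2% of the distinct original instances.
```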

12 Measuring Error
► Error rate = # of errors / # of instances = (FN + FP) / N
► Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
► Precision = # of found positives / # of found = TP / (TP + FP)
► Specificity = TN / (TN + FP)
► False alarm rate = FP / (FP + TN) = 1 − specificity
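The five measures listed above follow directly from the confusion-matrix counts; a minimal helper (our own naming) that computes all of them at once:

```python
def error_metrics(TP, FP, TN, FN):
    """Compute the slide's error measures from confusion-matrix counts."""
    N = TP + FP + TN + FN
    return {
        "error_rate": (FN + FP) / N,
        "recall": TP / (TP + FN),            # sensitivity, hit rate
        "precision": TP / (TP + FP),
        "specificity": TN / (TN + FP),
        "false_alarm_rate": FP / (FP + TN),  # = 1 - specificity
    }
```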

13 ROC Curve [Figure: ROC curve plotting hit rate (TP rate) against false alarm rate (FP rate).]

14 Meta-learning for Algorithm Recommendation. Outline: Background on Local Learning; Background on Algorithm Assessment; Algorithm Recommendation.

15 Algorithm Recommendation

16 How do we rank algorithms? Algorithm: k-NN ranking for algorithm recommendation. Begin: 1. Characterize the new dataset T by computing its metafeatures. 2. Identify the k datasets in the metadata base that are most similar to T. 3. Recommend a ranking for T based on the performance information of those nearest neighbors. End.
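Step 2 above, finding the most similar datasets, can be sketched as a nearest-neighbor search on metafeature vectors. This is a sketch under our own assumptions (L1 distance as in the later example; dataset names and the `metadata` dict are hypothetical):

```python
def l1(u, v):
    """L1 (Manhattan) distance between two metafeature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def nearest_datasets(new_meta, metadata, k):
    """metadata: {dataset_name: metafeature_vector}. Returns the k
    dataset names whose metafeatures are closest to new_meta."""
    ranked = sorted(metadata, key=lambda name: l1(metadata[name], new_meta))
    return ranked[:k]
```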

17 K-nearest neighbor approach

18 For Prediction: Average Ranking R̄(a) = Σi R(a, i) / k, where k is the number of target rankings (i.e., the number of neighbors) and R(a, i) is the rank of algorithm a on neighbor i.

19 Example Classification algorithms:
bC5: Boosted decision trees (C5.0)
C5r: Decision tree-based rule set (C5.0)
C5t: Decision tree (C5.0)
IB1: 1-nearest neighbor (MLC++)
LD: Linear discriminant
Lt: Decision trees with linear combination of attributes
MLP: Multilayer perceptron (Clementine)
NB: Naïve Bayes
RBFN: Radial basis function network (Clementine)
RIP: Rule sets (RIPPER)

20 Example The goal in this example is to predict the ranking of the algorithms on the letter dataset with, say, the 3-NN ranking method. First, the three nearest neighbors are identified based on the L1 distance on the space of metafeatures used. These neighbors are the datasets byzantine, isolet and pendigits. The corresponding target rankings, the R̄i scores, and the ranking obtained by aggregating them with the AR method are presented in the table. This ranking provides guidance concerning the experiments to be carried out: it recommends executing bC5 and IB1 before all the others, then Lt, and so on.

21 Example Example of a ranking predicted with 3-NN for the letter dataset, based on the datasets byzantine, isolet and pendigits, and the corresponding target ranking:

Ranking     bC5  C5r  C5t  MLP  RBFN  LD   Lt   IB1  NB   RIP
byzantine   2    6    7    10   9     5    4    1    3    8
isolet      2    5    7    10   9     1    6    4    3    8
pendigits   2    4    6    7    10    8    3    1    9    5
R̄i          2.0  5.0  6.7  9.0  9.3   4.7  4.3  2.0  5.0  7.0
predicted   1    5    7    9    10    4    3    1    5    8
target      1    3    5    7    10    8    4    2    9    6

22 Assessing Ranking Accuracy We can measure the similarity between predicted and target rankings using Spearman's rank correlation coefficient: rS = 1 − 6 Σi (R′i − Ri)² / (n³ − n), where R′i and Ri are the predicted and target ranks of item i and n is the number of items. Values range from −1 (exactly reversed ranking) to 1 (perfect agreement), with 0 indicating no correlation.
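The coefficient is a one-liner to compute; a minimal sketch, checked against the default-ranking example on the letter dataset from the next slide:

```python
def spearman(predicted, target):
    """Spearman rank correlation r_s = 1 - 6 * sum d_i^2 / (n^3 - n)
    between two rankings of the same n items (no ties assumed)."""
    n = len(predicted)
    d2 = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return 1 - 6 * d2 / (n ** 3 - n)
```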

23 Example Accuracy of the default ranking on the letter dataset:

Ranking      bC5  C5r  C5t  MLP  RBFN  LD  Lt  IB1  NB  RIP
default      1    2    4    7    10    8   3   6    9   5
target       1    3    5    7    10    8   4   2    9   6
(R′i − Ri)²  0    1    1    0    0     0   1   16   0   1

Σ (R′i − Ri)² = 20, so rS = 1 − 6·20 / (10³ − 10) = 0.879

