Instance-Based Learning

Content
- Motivation: Eager Learning, Lazy Learning, Instance-Based Learning
- k-Nearest Neighbour Learning (kNN)
- Distance-Weighted k-NN
- Locally Weighted Regression (LWR)
- Case-Based Reasoning (CBR)
- Summary

Motivation: Eager Learning
THE LEARNING TASK: Approximate a target function f by a hypothesis h on the basis of training examples.
EAGER learning: As soon as the training examples and the hypothesis space H are available, the search for the best hypothesis begins.
Training phase: given training examples and the hypothesis space H, search for the best hypothesis h.
Processing phase: for every new instance x, return h(x).
Examples of eager learners: decision tree learning, neural networks.

Motivation: Lazy Algorithms
Training examples are simply stored ("sleeping").
Generalisation beyond these examples is postponed until new instances must be classified.
Every time a new query instance is encountered, its relationship to the previously stored examples is examined in order to compute the value of the target function for this new instance.

Motivation: Instance-Based Learning
Instance-based algorithms construct a new local approximation of the target function for every new instance.
Training phase: given a training sample, simply store the examples.
Processing phase: given a query instance x_q, search for the best local hypothesis f̂ and return f̂(x_q).
Examples: Nearest Neighbour algorithm, Distance-Weighted Nearest Neighbour, Locally Weighted Regression, ...

Motivation: Instance-Based Learning 2
How are the instances represented?
How can we measure the similarity of the instances?
How can the estimate f̂(x_q) be computed?

Nearest Neighbour Algorithm
IDEA: All instances correspond to points in the n-dimensional space ℝⁿ. Assign to the new instance the value of its nearest neighbouring instance.
REPRESENTATION: An instance x is described by the attribute vector ⟨a_1(x), a_2(x), ..., a_n(x)⟩, where a_r(x) denotes the value of the r-th attribute of x.
TARGET FUNCTION: discrete-valued or real-valued.

Nearest Neighbour Algorithm 2
HOW IS THE NEAREST NEIGHBOUR DEFINED? A metric serves as the similarity measure.
Minkowski norm: d(x_i, x_j) = (Σ_{r=1..n} |a_r(x_i) - a_r(x_j)|^p)^{1/p}, where p ≥ 1.
Euclidean distance (p = 2): d(x_i, x_j) = sqrt(Σ_{r=1..n} (a_r(x_i) - a_r(x_j))²).
This algorithm never forms an explicit general hypothesis about the target function f.
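To make the distance measure concrete, here is a minimal Python sketch (not part of the original slides; the function names are my own) of the Minkowski norm and its Euclidean special case, with instances represented as equal-length lists of numeric attribute values.

def minkowski_distance(x_i, x_j, p=2):
    # d(x_i, x_j) = (sum_r |a_r(x_i) - a_r(x_j)|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x_i, x_j)) ** (1.0 / p)

def euclidean_distance(x_i, x_j):
    # Euclidean distance is the special case p = 2
    return minkowski_distance(x_i, x_j, p=2)

# Example: two instances with three attributes each
print(euclidean_distance([1.0, 2.0, 0.0], [0.0, 2.0, 2.0]))  # ~2.236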

Nearest Neighbour Algorithm 3
HOW IS f̂(x_q) FORMED?
Discrete target function f: ℝⁿ → V, where V = {v_1, ..., v_s} is a set of s classes: f̂(x_q) ← f(x_n), where x_n is the training instance nearest to x_q.
Continuous target function f: ℝⁿ → ℝ: let x_n be the nearest neighbour of x_q ==> f̂(x_q) ← f(x_n).
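As an illustration of this rule, a small Python sketch of plain 1-NN (the naming is my own, not from the slides): the query is assigned the value of the single nearest stored example, which works for both discrete and real-valued target functions.

import math

def one_nearest_neighbour(training_data, x_q):
    # training_data: list of (attribute_list, target_value) pairs
    nearest = min(training_data, key=lambda ex: math.dist(ex[0], x_q))
    return nearest[1]  # assign f(x_n) of the nearest training instance x_n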

k-Nearest Neighbour
IDEA: If we choose k = 1, the algorithm assigns to f̂(x_q) the value f(x_n), where x_n is the training instance nearest to x_q. For larger values of k, the algorithm assigns the most common value among the k nearest training examples.
HOW CAN f̂(x_q) BE ESTABLISHED? f̂(x_q) ← argmax_{v∈V} Σ_{i=1..k} δ(v, f(x_i)), where δ(a, b) = 1 if a = b and δ(a, b) = 0 otherwise.
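A hedged Python sketch of this majority-vote rule (all names are assumptions for illustration): the k nearest training examples are found with the Euclidean distance and the most common class value among them is returned.

import math
from collections import Counter

def knn_classify(training_data, x_q, k=5):
    # training_data: list of (attribute_list, class_label) pairs
    neighbours = sorted(training_data, key=lambda ex: math.dist(ex[0], x_q))[:k]
    votes = Counter(label for _, label in neighbours)   # sum of delta(v, f(x_i)) per class v
    return votes.most_common(1)[0][0]                   # argmax over v in V

# Toy usage with a two-attribute training set
data = [([0, 0], 'A'), ([1, 0], 'A'), ([5, 5], 'B'), ([6, 5], 'B'), ([5, 6], 'B')]
print(knn_classify(data, [4.5, 5.0], k=3))              # -> 'B'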

k-Nearest Neighbour 2
Example: the same query point x_q can be classified positive by 1-NN but negative by 5-NN (majority vote of its five nearest neighbours).
Voronoi Diagram: the decision surface induced by a 1-Nearest Neighbour algorithm for a typical set of training examples. The convex cell surrounding each training example indicates the region of query points whose classification is completely determined by that training example.

k-Nearest Neighbour 3
REFINEMENT (Distance-Weighted k-NN): The votes of the neighbours are weighted according to their distance to the query point; the farther away a neighbour is, the smaller its influence.
f̂(x_q) ← argmax_{v∈V} Σ_{i=1..k} w_i δ(v, f(x_i)), where w_i = 1 / d(x_q, x_i)².
To accommodate the case where the query point x_q exactly matches one of the training instances x_i, so that the denominator d(x_q, x_i)² is zero, we assign f̂(x_q) ← f(x_i) in this case.
Distance-weighting for a real-valued target function: f̂(x_q) ← (Σ_{i=1..k} w_i f(x_i)) / (Σ_{i=1..k} w_i).
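The same idea as a hedged Python sketch (illustrative names only): each of the k neighbours votes with weight w_i = 1/d(x_q, x_i)^2, a query that exactly matches a training instance returns that instance's value directly, and a second function shows the weighted average used for real-valued targets.

import math
from collections import defaultdict

def weighted_knn_classify(training_data, x_q, k=5):
    neighbours = sorted(training_data, key=lambda ex: math.dist(ex[0], x_q))[:k]
    scores = defaultdict(float)
    for attrs, label in neighbours:
        d = math.dist(attrs, x_q)
        if d == 0.0:
            return label              # exact match: assign f(x_i) directly
        scores[label] += 1.0 / d**2   # w_i = 1 / d(x_q, x_i)^2
    return max(scores, key=scores.get)

def weighted_knn_regress(training_data, x_q, k=5):
    neighbours = sorted(training_data, key=lambda ex: math.dist(ex[0], x_q))[:k]
    numerator = denominator = 0.0
    for attrs, value in neighbours:
        d = math.dist(attrs, x_q)
        if d == 0.0:
            return value
        w = 1.0 / d**2
        numerator += w * value
        denominator += w
    return numerator / denominator    # sum(w_i * f(x_i)) / sum(w_i)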

Remarks on the k-Nearest Neighbour Algorithm
PROBLEM: The distance between two instances is computed over all attributes, so even irrelevant attributes influence the approximation.
EXAMPLE: n = 20 attributes, but only 2 of them are relevant.
SOLUTION: Weight each attribute differently when calculating the distance between two instances, i.e. stretch the relevant axes in Euclidean space: shorten the axes that correspond to less relevant attributes and lengthen the axes that correspond to more relevant attributes.
PROBLEM: How can the weight of each attribute be determined automatically? Cross-validation; leave-one-out.
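One way to make the tuning concrete: a minimal leave-one-out sketch in Python (an assumption for illustration; it reuses the knn_classify function sketched above and tunes only k, but the same loop can score candidate attribute weightings instead).

def leave_one_out_error(training_data, k):
    # Fraction of training examples misclassified when each is held out in turn
    errors = 0
    for i, (attrs, label) in enumerate(training_data):
        rest = training_data[:i] + training_data[i + 1:]
        if knn_classify(rest, attrs, k) != label:
            errors += 1
    return errors / len(training_data)

def choose_k(training_data, candidates=(1, 3, 5, 7, 9)):
    return min(candidates, key=lambda k: leave_one_out_error(training_data, k))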

Remarks on the k-Nearest Neighbour Algorithm 2
ADVANTAGES:
The training phase is very fast.
Can learn complex target functions.
Robust to noisy training data.
Quite effective when a sufficiently large set of training data is provided.
Under very general conditions, as the number of training examples grows, the error probability P of the 1-NN classifier is at most twice the error probability of the Bayes-optimal classifier.
DISADVANTAGES:
The algorithm delays all processing until a new query is received, so significant computation can be required per query; efficient memory indexing techniques are needed.
Classification is slow.
Sensitive to irrelevant attributes and to the curse of dimensionality.
BIAS: The inductive bias corresponds to the assumption that the classification of an instance is most similar to the classification of other instances that are nearby in Euclidean distance.

Locally Weighted Regression
IDEA: A generalisation of the Nearest Neighbour algorithm. It constructs an explicit approximation to f over a local region surrounding the query point x_q, using nearby or distance-weighted training examples to form this local approximation.
Local: the function is approximated based solely on the training data near the query point.
Weighted: the contribution of each training example is weighted by its distance from the query point.
Regression: means approximating a real-valued target function.

Locally Weighted Regression: Procedure
Given a new query instance x_q, construct an approximation f̂ that fits the training examples in the neighbourhood surrounding x_q.
How can f̂ be represented? As a linear function, a quadratic function, a multilayer neural network, ...
This approximation is used to compute f̂(x_q), the estimated target value for the query instance.
The description of f̂ may then be deleted, because a different local approximation is calculated for every distinct query instance.

Locally Weighted Linear Regression
A special case of LWR with simple computation.
LINEAR HYPOTHESIS SPACE: f̂(x) = w_0 + w_1 a_1(x) + ... + w_n a_n(x), where a_r(x) is the r-th attribute of x.
Define the error criterion E so as to emphasise the fit to the local training examples. Three possibilities:
E_1(x_q) = 1/2 Σ_{x ∈ k nearest nbrs of x_q} (f(x) - f̂(x))²  (minimise the squared error over just the k nearest neighbours)
E_2(x_q) = 1/2 Σ_{x ∈ D} (f(x) - f̂(x))² K(d(x_q, x))  (minimise the squared error over the entire set D, weighted by a kernel function K that decreases with the distance from x_q)
E_3(x_q) = 1/2 Σ_{x ∈ k nearest nbrs of x_q} (f(x) - f̂(x))² K(d(x_q, x))  (a combination of E_1 and E_2)

Locally Weighted Linear Regression 2
The third error criterion E_3 is a good approximation to the second one, and it has the advantage that its computational cost is independent of the total number of training examples.
If E_3 is chosen and the gradient descent rule is rederived (see neural networks), the following training rule is obtained:
Δw_r = η Σ_{x ∈ k nearest nbrs of x_q} K(d(x_q, x)) (f(x) - f̂(x)) a_r(x)
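To illustrate the procedure end to end, here is a hedged NumPy sketch of locally weighted linear regression (all names and the Gaussian kernel are my own choices, and the weighted least-squares fit is solved in closed form rather than with the gradient descent rule above).

import numpy as np

def lwlr_predict(X, y, x_q, k=20, tau=1.0):
    # X: (m, n) training attributes, y: (m,) training targets, x_q: (n,) query
    X1 = np.hstack([np.ones((len(X), 1)), X])             # prepend constant attribute a_0 = 1
    q1 = np.concatenate([[1.0], x_q])
    dists = np.linalg.norm(X - x_q, axis=1)
    idx = np.argsort(dists)[:k]                           # restrict to the k nearest neighbours
    kernel = np.exp(-(dists[idx] ** 2) / (2 * tau ** 2))  # K(d(x_q, x)) decreases with distance
    W = np.diag(kernel)
    A = X1[idx].T @ W @ X1[idx]                           # weighted normal equations
    b = X1[idx].T @ W @ y[idx]
    w = np.linalg.solve(A, b)
    return q1 @ w                                         # f_hat(x_q) = w . (1, a_1(x_q), ..., a_n(x_q))

# Example: recover sin(x) locally from noisy samples
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(lwlr_predict(X, y, np.array([3.0]), k=30, tau=0.5))  # close to sin(3.0) ~ 0.141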

Evaluation of Locally Weighted Regression
ADVANTAGES:
Pointwise approximation of a complex target function.
Earlier local approximations have no influence on new ones; each query is answered independently.
DISADVANTAGES:
The quality of the result depends on the choice of the form of f̂, the choice of the kernel function K, and the choice of the hypothesis space H.
Sensitive to irrelevant attributes.

Case-Based Reasoning (CBR)
Instance-based methods and locally weighted regression share three properties: they are lazy learners; they classify new query instances by analysing similar instances while ignoring instances that are very different from the query; and they represent instances as real-valued points in an n-dimensional Euclidean space.
CBR keeps the first two principles, but instances are represented by richer symbolic descriptions, and the methods used to retrieve similar instances are correspondingly more elaborate.

Case-Based Reasoning 2
The CBR cycle:
- Given: a new case (instance).
- Search for relevant cases in the case library.
- Select the best one of them.
- Derive a solution.
- Evaluate the found solution.
- Add the solved case to the case library.
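A minimal Python sketch of this cycle (everything here, including the attribute-overlap similarity, is an assumption for illustration and not part of the original slides): cases are stored as dictionaries, the most similar case is retrieved, its solution is adapted and evaluated, and the solved case is retained in the library.

def similarity(problem_a, problem_b):
    # Crude symbolic similarity: number of attribute-value pairs the problems share
    return sum(1 for key, val in problem_a.items() if problem_b.get(key) == val)

def solve_with_cbr(case_library, new_problem, adapt, evaluate):
    best = max(case_library, key=lambda c: similarity(c['problem'], new_problem))  # search/select
    candidate = adapt(best['solution'], new_problem)                               # derive a solution
    if evaluate(candidate, new_problem):                                           # evaluate it
        case_library.append({'problem': new_problem, 'solution': candidate})       # add solved case
    return candidate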

Case-Based Reasoning 3
HOW ARE THE INSTANCES REPRESENTED? By complex logical (relational) descriptions.
Example:
((user-complaint error53 on shutdown)
 (CPU-model Power PC)
 (operating-system Windows)
 (network connection PCIA)
 (memory 48meg)
 (installed-application Excel Netscape)
 (disk 1gig)
 (likely-causes ???))
HOW CAN THE SIMILARITY BE MEASURED? See the CADET example.

CADET
A prototypical example of a case-based reasoning system.
It assists in the conceptual design of simple mechanical devices, such as water faucets.
It uses a library containing approximately 75 previous designs and design fragments to suggest a conceptual design that meets the specifications of a new design problem.
Each stored instance is a pair ⟨qualitative function, mechanical structure⟩.
New design problem: the desired function is specified; the corresponding structure is sought.

CADET Example

CADET Example 2
CADET searches for subgraph isomorphisms between the two function graphs, so that parts of a stored case can be found to match parts of the design specification.
The system elaborates the original function specification graph in order to create functionally equivalent graphs that match still more cases.
It uses general knowledge about physical influences to create these elaborated function graphs, for example a rewrite rule in which x is a universally quantified variable.
Retrieved fragments are combined to obtain a new solution; this combination relies on knowledge-based reasoning.

Evaluation of CBR
ADVANTAGE: a step towards the formation of autonomously reasoning systems (???)
DISADVANTAGES:
Requires a hierarchical organisation and memory indexing of the case library.
Similarity measurement is purely syntactic.
Two neighbouring cases may be incompatible, which makes their combination impossible.
Evaluating the derived solution can be difficult.

Evaluation of Lazy Algorithms
DIFFERENCES TO EAGER LEARNING:
Computation time: shorter during the training phase, longer during classification.
Classification: the training examples are always retained; an instance-specific local approximation is computed for each query.
Generalisation accuracy: local approximations are computed; the inductive bias can take the query instance into account when deciding how to generalise beyond the training data.
PROBLEMS:
Labelling new instances efficiently.
Determining an appropriate distance measure.
Influence of irrelevant attributes.

Summary
Lazy learning: processing of the training examples is delayed until a new query instance must be labelled. The result is a set of local approximations.
k-Nearest Neighbour: an instance is a point in the n-dimensional Euclidean space. The target function value for a new query is estimated from the known values of the k nearest training examples.
Locally weighted regression: an explicit local approximation to the target function (constant, linear, quadratic, ...) is constructed for each query instance.
Case-based reasoning: instances are represented by complex logical descriptions, and a rich variety of methods has been proposed for mapping from the training examples to target function values for new instances.