Instance Based Learning: IB1 and IBK (find in the text; an early approach)

1-Nearest Neighbor
Basic distance function between attribute values:
–If real-valued, the absolute difference
–If nominal, d(v1, v2) = 1 if v1 ≠ v2, else 0
The distance between two instances is the Euclidean distance, i.e. the square root of the sum of squared per-attribute differences:
–sqrt( (x1 - y1)^2 + … + (xn - yn)^2 )
Real-valued attributes may be normalized so that all attributes contribute fairly to the distance.
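As an illustration, here is a minimal Python sketch of this mixed-attribute distance. The function names (attribute_distance, instance_distance) are hypothetical, not taken from the slides or from Weka's IB1/IBk, and no normalization is applied, so numeric attributes dominate unless they are rescaled first.

import math

def attribute_distance(v1, v2):
    # Absolute difference for numeric values, 0/1 mismatch for nominal values.
    if isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
        return abs(v1 - v2)
    return 0.0 if v1 == v2 else 1.0

def instance_distance(x, y):
    # Euclidean distance over all attributes of two equal-length instances.
    return math.sqrt(sum(attribute_distance(a, b) ** 2 for a, b in zip(x, y)))

# Two weather instances: (outlook, temperature, humidity, windy)
print(instance_distance(["sunny", 85, 85, "FALSE"], ["rainy", 70, 96, "FALSE"]))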

Prediction or classification
For an instance x, let y be the closest instance to x in the training set.
Predict the class of x to be the class of y.
On some data sets this is the best algorithm; in general, there is no single best learning algorithm.
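Continuing the sketch above, 1-NN prediction is just an argmin over the training set (predict_1nn is a hypothetical name, not the IB1 code itself; it reuses instance_distance from the previous sketch).

def predict_1nn(train, x):
    # train is a list of (instance, label) pairs; return the label of the nearest instance.
    nearest_instance, nearest_label = min(train, key=lambda pair: instance_distance(pair[0], x))
    return nearest_label

train = [(["sunny", 85, 85, "FALSE"], "no"), (["overcast", 83, 86, "FALSE"], "yes")]
print(predict_1nn(train, ["overcast", 80, 80, "TRUE"]))  # -> yes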

Voronoi Diagram

For each training point, draw the boundary of the region of points closest to it.
Each point's region of influence (its Voronoi cell) is convex.
With noisy data this can be bad: a single mislabeled example claims its whole cell.
/Delaunay.html – nice applet.

Problems and solutions
Noise
–Remove bad examples
–Use voting
Bad distance measure
–Use a probability class vector
Memory
–Remove unneeded examples

Voting schemes
K nearest neighbor
–Let the k closest neighbors vote (use an odd k to avoid ties)
Kernel K(x, y) – a similarity function
–Let every training instance vote, with weight decreasing according to K(x, y)
–Ex: K(x, y) = e^(-distance(x, y)^2)
–Ex: K(x, y) = the inner product of x and y
–Ex: K(x, y) = the inner product of f(x) and f(y), where f is some mapping of x and y into R^n
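A minimal sketch of distance-weighted voting with the Gaussian-style kernel from the first example above; it reuses instance_distance from the earlier sketch, and predict_weighted is a hypothetical name, not a Weka API.

import collections
import math

def predict_weighted(train, x, kernel=lambda d: math.exp(-d ** 2)):
    # Every training instance votes for its class, weighted by kernel(distance to x).
    votes = collections.defaultdict(float)
    for instance, label in train:
        votes[label] += kernel(instance_distance(instance, x))
    return max(votes, key=votes.get)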

Choosing the parameter k
–Divide the data into a training set and a test set
–Run multiple values of k on the training set
–Choose the k that does best on the test set

NOT!
This is a serious methodological error:
–You have used the test data to pick k.
–Common in commercial evaluations of systems
–Occasional even in academic papers

Fix: Internal Cross-Validation
This can be used for selecting any parameter.
–Divide the data into a training set and a test set.
–Now do 10-fold cross-validation on the training data to determine the appropriate value of k.
–Note: never touch the test data.
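A sketch of internal cross-validation for choosing k, assuming a simple majority-vote classifier built on instance_distance from above; the helper names (predict_knn, choose_k_by_internal_cv) and the candidate grid are illustrative, not prescribed by the slides.

import collections

def predict_knn(train, x, k):
    # Majority vote among the k nearest training instances.
    nearest = sorted(train, key=lambda pair: instance_distance(pair[0], x))[:k]
    return collections.Counter(label for _, label in nearest).most_common(1)[0][0]

def choose_k_by_internal_cv(train, candidate_ks=(1, 3, 5, 7, 9), folds=10):
    # Pick k by cross-validation on the training data only; the test set is never touched.
    def cv_accuracy(k):
        correct = 0
        for i in range(folds):
            held_out = train[i::folds]
            rest = [p for j, p in enumerate(train) if j % folds != i]
            correct += sum(predict_knn(rest, x, k) == y for x, y in held_out)
        return correct / len(train)
    return max(candidate_ks, key=cv_accuracy)

The chosen k is then evaluated exactly once on the held-out test set.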

Probability Class Vector (PCV)
Let A be an attribute with values v1, v2, …, vn, and suppose the classes are C1, C2, …, Ck.
The probability class vector for vi is the vector of class probabilities given that value, estimated from training-set frequencies:
–PCV(vi) = ( P(C1 | A = vi), …, P(Ck | A = vi) )
Distance(vi, vj) = the distance between the two probability class vectors.

Weather data
Attributes: outlook {sunny, overcast, rainy}; temperature (numeric); humidity (numeric); windy {TRUE, FALSE}; play {yes, no}
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

Distance(sunny, rainy) = ?
–PCV(sunny) = ( P(yes|sunny), P(no|sunny) ) = (2/5, 3/5)
–PCV(rainy) = (3/5, 2/5)
–Distance(sunny, rainy) = sqrt( (2/5 - 3/5)^2 + (3/5 - 2/5)^2 ) = (1/5)*sqrt(2)
Similarly:
–PCV(overcast) = (1, 0)
–Distance(sunny, overcast) = sqrt( (2/5 - 1)^2 + (3/5 - 0)^2 ) = (3/5)*sqrt(2)
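A short sketch that reproduces these numbers from the weather data above; pcv and pcv_distance are hypothetical helper names.

import math

# (outlook, play) pairs taken from the weather data above
data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
        ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
        ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
        ("overcast", "yes"), ("rainy", "no")]
classes = ["yes", "no"]

def pcv(value):
    # Class frequencies among the training instances that have this attribute value.
    labels = [play for outlook, play in data if outlook == value]
    return [labels.count(c) / len(labels) for c in classes]

def pcv_distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pcv(v1), pcv(v2))))

print(pcv("sunny"))                       # [0.4, 0.6]
print(pcv_distance("sunny", "rainy"))     # 0.2828... = (1/5)*sqrt(2)
print(pcv_distance("sunny", "overcast"))  # 0.8485... = (3/5)*sqrt(2)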

PCV
–If an attribute is irrelevant and v and v' are two of its values, then PCV(v) ≈ PCV(v'), so the distance between them is close to 0. This discounts irrelevant attributes.
–It also works for real-valued attributes after binning. Binning makes real values symbolic: simply break the range into k bins; k = 5 or 10 seems to work. Or use decision trees to choose the boundaries.
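A minimal equal-width binning sketch (the bin count and labels are illustrative, not prescribed by the slides):

def equal_width_bins(values, k=5):
    # Map each real value to one of k equal-width bins, returning symbolic labels.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    return ["bin%d" % min(int((v - lo) / width), k - 1) for v in values]

temperatures = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
print(equal_width_bins(temperatures))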

Regression by NN
–With 1-NN, use the value of the nearest example.
–With k-NN, interpolate (e.g. average) the values of the k nearest neighbors.
–Kernel methods work too: you avoid the choice of k, but hide it in the choice of kernel function.
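A minimal sketch of both variants for numeric targets, again reusing instance_distance from above; the function names are hypothetical.

import math

def predict_knn_regression(train, x, k):
    # Average the target values of the k nearest training instances.
    nearest = sorted(train, key=lambda pair: instance_distance(pair[0], x))[:k]
    return sum(y for _, y in nearest) / k

def predict_kernel_regression(train, x, kernel=lambda d: math.exp(-d ** 2)):
    # Weighted average of all target values, with weights given by the kernel.
    weights = [kernel(instance_distance(inst, x)) for inst, _ in train]
    return sum(w * y for w, (_, y) in zip(weights, train)) / sum(weights)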

Summary
–NN works for multi-class classification and for regression.
–Sometimes called the "poor man's neural net".
–With enough data, its error rate is at most twice the Bayes-optimal error rate.
–It can be misled by bad examples and bad features.
–It separates classes via piecewise linear boundaries.