Kansas State University, Department of Computing and Information Sciences
CIS 798: Intelligent Systems and Machine Learning
Lecture 25: Instance-Based Learning (IBL): k-Nearest Neighbor and Radial Basis Functions
Tuesday, November 23, 1999
William H. Hsu, Department of Computing and Information Sciences, KSU
Readings: Chapter 8, Mitchell

Lecture Outline
– Readings: Chapter 8, Mitchell
– Suggested Exercises: 8.3, Mitchell
– Next Week's Paper Review (Last One!)
  – "An Approach to Combining Explanation-Based and Neural Network Algorithms", Shavlik and Towell
  – Due Tuesday, 11/30/1999
– k-Nearest Neighbor (k-NN)
  – IBL framework
  – IBL and case-based reasoning
  – Prototypes
  – Distance-weighted k-NN
– Locally-Weighted Regression
– Radial-Basis Functions
– Lazy and Eager Learning
– Next Lecture (Tuesday, 11/30/1999): Rule Learning and Extraction

Instance-Based Learning (IBL)
– Intuitive Idea
  – Store all instances
  – Given: query instance x_q
  – Return: function (e.g., label) of closest instance in database of prototypes
  – Rationale: the instance closest to x_q tends to have a target value close to f(x_q); this assumption can fail for a deceptive hypothesis space or with too little data!
– Nearest Neighbor
  – First locate the nearest training example x_n to query x_q
  – Then estimate f̂(x_q) ← f(x_n)
– k-Nearest Neighbor (both rules sketched below)
  – Discrete-valued f: take a vote among the k nearest neighbors of x_q:
    f̂(x_q) ← argmax_{v ∈ V} Σ_{i=1..k} δ(v, f(x_i))
  – Continuous-valued f: average the k nearest target values:
    f̂(x_q) ← (Σ_{i=1..k} f(x_i)) / k
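A minimal Python sketch of both prediction rules, assuming numpy; the function name knn_predict and the toy data are illustrative, not from the lecture:

import numpy as np
from collections import Counter

def knn_predict(X, y, x_q, k=3, discrete=True):
    # Distances from query x_q to every stored instance (lazy: no training step).
    dists = np.linalg.norm(X - x_q, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k nearest neighbors
    if discrete:
        # Discrete-valued f: majority vote among the k nearest labels.
        return Counter(y[nearest]).most_common(1)[0][0]
    # Continuous-valued f: mean of the k nearest target values.
    return y[nearest].mean()

# Example: three 2-D instances with binary labels, query at the origin.
X = np.array([[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])
y = np.array([1, 1, 0])
print(knn_predict(X, y, np.array([0.0, 0.0]), k=3))   # votes 1, 1, 0 -> predicts 1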

When to Consider Nearest Neighbor
– Ideal Properties
  – Instances map to points in ℝ^n
  – Fewer than 20 attributes per instance
  – Lots of training data
– Advantages
  – Training is very fast
  – Can learn complex target functions
  – Doesn't lose information
– Disadvantages
  – Slow at query time
  – Easily fooled by irrelevant attributes

Voronoi Diagram
– [Figure: Voronoi (nearest-neighbor) diagram and its dual Delaunay triangulation over the labeled training instances, with a query instance x_q marked]
– Training data: labeled instances; query instance: x_q

k-NN and Bayesian Learning: Behavior in the Limit
– Consider: Probability Distribution over Labels
  – Let p denote the learning agent's belief in the distribution of labels
  – p(x) ≡ probability that instance x will be labeled 1 (positive) versus 0 (negative)
  – Objectivist view: as more evidence is collected, p approaches the "true probability"
– Nearest Neighbor
  – As the number of training examples → ∞, 1-NN approaches the behavior of the Gibbs algorithm
  – Gibbs: with probability p(x) predict 1, else 0
– k-Nearest Neighbor
  – As the number of training examples → ∞ and k grows large, k-NN approaches Bayes optimal
  – Bayes optimal: if p(x) > 0.5 then predict 1, else 0
– Recall: Property of Gibbs Algorithm
  – Expected error of Gibbs is no worse than twice that of Bayes optimal (in symbols below)
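In symbols, the Gibbs property is the bound

  E[error_Gibbs] ≤ 2 · E[error_Bayes optimal]

so in the limit 1-NN's expected error is at most twice the Bayes optimal error.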

Distance-Weighted k-NN
– Intuitive Idea
  – Might want to weight nearer neighbors more heavily
  – Rationale: instances closer to x_q tend to have target values closer to f(x_q); want the benefit of BOC over Gibbs (k-NN for large k over 1-NN)
– Distance-Weighted Function
  – Weights are inversely proportional to squared distance: w_i ≡ 1 / d(x_q, x_i)²
  – Discrete f: f̂(x_q) ← argmax_{v ∈ V} Σ_{i=1..k} w_i δ(v, f(x_i))
  – Continuous f: f̂(x_q) ← (Σ_{i=1..k} w_i f(x_i)) / (Σ_{i=1..k} w_i)
  – d(x_q, x_i) is Euclidean distance
  – NB: now it makes sense to use all training instances instead of just the k nearest ⇒ Shepard's method (sketched below)
– Jargon from Statistical Pattern Recognition
  – Regression: approximating a real-valued target function
  – Residual: the error f̂(x) − f(x)
  – Kernel function: function K such that w_i = K(d(x_q, x_i))
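Shepard's method in a few lines of Python (numpy assumed; shepard_predict is an illustrative name, and the exact-match guard is an added practical detail):

import numpy as np

def shepard_predict(X, y, x_q, eps=1e-12):
    # Shepard's method: use ALL training instances, weighted by inverse
    # squared Euclidean distance, w_i = 1 / d(x_q, x_i)^2.
    d2 = np.sum((X - x_q) ** 2, axis=1)
    if np.any(d2 < eps):                  # exact match: return its stored value
        return y[np.argmin(d2)]
    w = 1.0 / d2
    # Continuous-valued f: weighted average; for discrete f, take the
    # weighted vote per label instead.
    return np.dot(w, y) / w.sum()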

Curse of Dimensionality
– A Machine Learning Horror Story
  – Suppose instances are described by n attributes (x_1, x_2, …, x_n), e.g., n = 20, but only n′ << n are relevant, e.g., n′ = 2
  – Horrors! Real KDD problems usually are this bad or worse… (correlated attributes, etc.)
  – Curse of dimensionality: the nearest-neighbor learning algorithm is easily misled when n is large (i.e., high-dimensional X)
– Solution Approaches
  – Dimensionality-reducing transformations (e.g., SOM, PCA; see Lecture 15)
  – Attribute weighting and attribute subset selection (sketched below)
    – Stretch the jth axis by weight z_j, with (z_1, z_2, …, z_n) chosen to minimize prediction error
    – Use cross-validation to automatically choose the weights (z_1, z_2, …, z_n)
    – NB: setting z_j to 0 eliminates dimension j altogether
    – See [Moore and Lee, 1994; Kohavi and John, 1997]
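A sketch of cross-validated attribute weighting, assuming numpy and binary 0/1 labels; random search stands in here for the optimization methods cited above, and loo_error and choose_weights are hypothetical names:

import numpy as np

def loo_error(X, y, z, k=1):
    # Leave-one-out error of k-NN after stretching axis j by weight z_j.
    Xz, n, errors = X * z, len(X), 0
    for i in range(n):
        d = np.linalg.norm(Xz - Xz[i], axis=1)
        d[i] = np.inf                      # hold instance i out
        nearest = np.argsort(d)[:k]
        if round(y[nearest].mean()) != y[i]:
            errors += 1
    return errors / n

def choose_weights(X, y, trials=200, seed=0):
    # Random search over weight vectors; z_j = 0 drops dimension j entirely.
    rng = np.random.default_rng(seed)
    best_z = np.ones(X.shape[1])
    best_err = loo_error(X, y, best_z)
    for _ in range(trials):
        z = rng.uniform(0, 2, size=X.shape[1])
        err = loo_error(X, y, z)
        if err < best_err:
            best_z, best_err = z, err
    return best_z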

Locally Weighted Regression
– Global versus Local Methods
  – Global: consider all training examples when estimating f(x_q)
  – Local: consider only examples within a local neighborhood (e.g., the k nearest)
– Locally Weighted Regression
  – Local method
  – Weighted: the contribution of each training example is weighted by its distance from x_q
  – Regression: approximating a real-valued target function
– Intuitive Idea
  – k-NN forms a local approximation to f(x) for each query x_q
  – Why not form an explicit approximation f̂(x) for the region surrounding x_q?
  – Fit a parametric function f̂: e.g., linear or quadratic (piecewise approximation)
– Choices of Error to Minimize (see the sketch below)
  – Sum squared error (SSE) over the k nearest neighbors: E_1(x_q) ≡ ½ Σ_{x ∈ k-NN(x_q)} (f(x) − f̂(x))²
  – Distance-weighted SSE over all neighbors: E_2(x_q) ≡ ½ Σ_{x ∈ D} (f(x) − f̂(x))² K(d(x_q, x))
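A sketch of locally weighted linear regression minimizing the distance-weighted SSE, assuming numpy and a Gaussian kernel with bandwidth tau (both illustrative choices):

import numpy as np

def lwr_predict(X, y, x_q, tau=1.0):
    # Fit a local linear model around x_q, weighting each training example
    # by a Gaussian kernel of its distance from the query.
    d2 = np.sum((X - x_q) ** 2, axis=1)
    w = np.exp(-d2 / (2 * tau ** 2))          # kernel weights K(d(x_q, x_i))
    A = np.hstack([np.ones((len(X), 1)), X])  # add intercept column
    sw = np.sqrt(w)[:, None]
    # Weighted least squares: minimize sum_i w_i (f(x_i) - f_hat(x_i))^2.
    beta, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return beta[0] + x_q @ beta[1:]           # evaluate the local fit at x_q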

Radial Basis Function (RBF) Networks
– What Are RBF Networks?
  – Global approximation to the target function f, in terms of a linear combination of local approximations
  – Typical uses: image and signal classification
  – A different kind of artificial neural network (ANN)
  – Closely related to distance-weighted regression, but "eager" instead of "lazy"
– Activation Function (sketched below)
  – f̂(x) = w_0 + Σ_{u=1..k} w_u K_u(d(x_u, x)), where the a_i(x) are attributes describing instance x
  – Common choice for K_u: the Gaussian kernel function K_u(d(x_u, x)) = exp(−d²(x_u, x) / (2σ_u²))
– [Figure: two-layer network; input attributes a_1(x), a_2(x), …, a_n(x) feed the kernel units, whose outputs, plus a constant 1 for w_0, are linearly combined]
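The activation function as a few lines of Python (numpy assumed; rbf_output is an illustrative name):

import numpy as np

def rbf_output(x, centers, sigmas, w0, w):
    # f_hat(x) = w_0 + sum_u w_u * K_u(d(x_u, x)), with Gaussian kernels.
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances to prototypes x_u
    k = np.exp(-d2 / (2 * sigmas ** 2))       # kernel unit activations
    return w0 + np.dot(w, k)                  # linear output layer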

RBF Networks: Training
– Issue 1: Selecting Prototypes
  – What x_u should be used for each kernel function K_u(d(x_u, x))?
  – Possible prototype distributions: scatter uniformly throughout instance space, or use the training instances themselves (reflects the instance distribution)
– Issue 2: Training Weights
  – Here, assume Gaussian K_u
  – First, choose hyperparameters: guess the variance, and perhaps the mean, for each K_u (e.g., using EM)
  – Then, hold the K_u fixed and train the remaining parameters: the weights in the linear output layer, for which efficient linear fitting methods exist (see the sketch below)
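A sketch of this two-stage training, assuming numpy, one Gaussian kernel per training instance, and a shared hand-picked variance (the EM step for choosing means and variances is omitted; train_rbf is a hypothetical name):

import numpy as np

def train_rbf(X, y, sigma=1.0):
    # Issue 1: here the prototypes x_u are simply the training instances.
    centers = X
    # Issue 2: with the kernels held fixed, the output is linear in the
    # weights, so they can be fit efficiently by least squares.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * sigma ** 2))           # kernel activations
    Phi = np.hstack([np.ones((len(X), 1)), Phi])   # bias column for w_0
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, weights                        # weights[0] is w_0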

Case-Based Reasoning (CBR)
– Symbolic Analogue of Instance-Based Learning (IBL)
  – Can apply IBL even when instances are not points in ℝ^n (X ⊄ ℝ^n)
  – Need a different "distance" metric
  – Intuitive idea: use symbolic (e.g., syntactic) measures of similarity (a toy version is sketched below)
– Example
  – Declarative knowledge base
  – Representation: symbolic, logical descriptions

((user-complaint rundll-error-on-shutdown)
 (system-model thinkpad-600-E)
 (cpu-model mobile-pentium-2)
 (clock-speed 366)
 (network-connection PCMCIA-100-base-T)
 (memory 128-meg)
 (operating-system windows-98)
 (installed-applications office-97 MSIE-5)
 (disk-capacity 6-gigabytes))
(likely-cause ?)
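A toy syntactic similarity measure over attribute-value cases, in Python; real CBR systems use far richer structural matching, and case_similarity plus the example cases are illustrative only:

def case_similarity(case_a, case_b):
    # Syntactic similarity: fraction of shared attributes whose values agree.
    shared = set(case_a) & set(case_b)
    if not shared:
        return 0.0
    return sum(case_a[k] == case_b[k] for k in shared) / len(shared)

query = {"user-complaint": "rundll-error-on-shutdown",
         "operating-system": "windows-98", "memory": "128-meg"}
stored = {"user-complaint": "rundll-error-on-shutdown",
          "operating-system": "windows-98", "memory": "64-meg"}
print(case_similarity(query, stored))   # 2 of 3 shared attributes match -> 0.666...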

Case-Based Reasoning in CADET
– CADET: CBR System for Functional Decision Support [Sycara et al., 1992]
  – 75 stored examples of mechanical devices
  – Each training example: ⟨qualitative function, mechanical structure⟩
  – New query: desired function
  – Target value: mechanical structure for this function
– Distance Metric
  – Match qualitative functional descriptions
  – X ⊄ ℝ^n, so "distance" is not Euclidean even if it is quantitative

CADET: Example
– Stored Case: T-Junction Pipe
  – Diagrammatic knowledge: structure and function
  – [Figure: T-junction structure with inflows Q_1, T_1 and Q_2, T_2 joining into outflow Q_3, T_3, alongside its qualitative function graph; T = temperature, Q = water flow]
– Problem Specification: Water Faucet
  – Desired function: [Figure: qualitative function graph relating control signals C_t, C_f to hot and cold inputs Q_c, Q_h, T_c, T_h and to the mixed output Q_m, T_m]
  – Structure: ?

CADET: Properties
– Representation
  – Instances represented by rich structural descriptions
  – Multiple instances retrieved (and combined) to form a solution to a new problem
  – Tight coupling between case retrieval and the new problem
– Bottom Line
  – Simple matching of cases is useful for tasks such as answering help-desk queries (compare: technical support knowledge bases)
  – Retrieval issues for natural language queries: not so simple… (user modeling in web IR, interactive help)
  – Area of continuing research

Lazy and Eager Learning
– Lazy Learning
  – Wait for the query before generalizing
  – Examples of lazy learning algorithms: k-nearest neighbor (k-NN), case-based reasoning (CBR)
– Eager Learning
  – Generalize before seeing the query
  – Examples of eager learning algorithms: radial basis function (RBF) network training; ID3, backpropagation, simple (Naïve) Bayes, etc.
– Does It Matter?
  – An eager learner must commit to a single global approximation
  – A lazy learner can create many local approximations
  – If they use the same H, the lazy learner can effectively represent more complex functions
  – e.g., consider H ≡ linear functions

Terminology
– Instance-Based Learning (IBL): classification based on a distance measure
  – k-Nearest Neighbor (k-NN)
    – Voronoi diagram of order k: data structure that answers k-NN queries for x_q
    – Distance-weighted k-NN: weight the contribution of the k neighbors by their distance to x_q
  – Locally-weighted regression
    – Function approximation method that generalizes k-NN
    – Constructs an explicit approximation to the target function f(·) in the neighborhood of x_q
  – Radial-Basis Function (RBF) networks
    – Global approximation algorithm
    – Estimates a linear combination of local kernel functions
– Case-Based Reasoning (CBR)
  – Like IBL: lazy, classification based on similarity to prototypes
  – Unlike IBL: similarity measure not necessarily a distance metric
– Lazy and Eager Learning
  – Lazy methods: may consider the query instance x_q when generalizing over D
  – Eager methods: choose a global approximation h before x_q is observed

Summary Points
– Instance-Based Learning (IBL)
  – k-Nearest Neighbor (k-NN) algorithms
    – When to consider: few continuous-valued attributes (low dimensionality)
    – Variants: distance-weighted k-NN; k-NN with attribute subset selection
  – Locally-weighted regression: function approximation method that generalizes k-NN
  – Radial-Basis Function (RBF) networks
    – A different kind of artificial neural network (ANN)
    – Linear combination of local approximations ⇒ global approximation to f(·)
– Case-Based Reasoning (CBR) Case Study: CADET
  – Relation to IBL
  – CBR online resource page:
– Lazy and Eager Learning
– Next Week
  – Rule learning and extraction
  – Inductive logic programming (ILP)