INSTANCE-BASED LEARNING ALGORITHMS Presented by Yan T. Yang

Agenda
– Background: what is instance-based learning?
– Two simple algorithms
– Extensions [Aha, 1994]:
  – Feedback algorithm
  – Noise reduction
  – Irrelevant attribute elimination
  – Novel attribute adoption

Learning Paradigms
Cognitive psychology: how do people/animals/machines learn? (Jerome Bruner)
Two schools of thought [Bruner, Goodnow and Austin 1967]:
– Abstraction-based: form a generalized idea from the examples, then use it to classify new objects.

Learning Paradigms
Cognitive psychology: how do people/animals/machines learn? (Jerome Bruner)
Two schools of thought [Bruner, Goodnow and Austin 1967]:
– Abstraction-based examples: artificial neural networks, support vector machines, rule-based learners/decision trees ("if not animate… then not an animal").

Learning Paradigms
Cognitive psychology: how do people/animals/machines learn? (Jerome Bruner)
Two schools of thought [Bruner, Goodnow and Austin 1967]:
– Instance-based: store all (suitable) training examples, and compare new objects to the stored examples.

Comparison Between Two Paradigms
Abstraction based:
– Generalization: rules, discriminant planes or functions, trees
– Workload is during training time; little work during query time
Instance based:
– Store (suitable) examples as saved instances
– Workload is during query time; little work during training time

Instance-based Learning
Training set example [Aha, 1994]: attributes are "is enrolled", "has MS degree", and "is married"; three instances are labeled "PhD student", "not PhD student", and "PhD student" respectively. (The attribute values, shown as marks in the original slide, are not recoverable here.)

Instance-based Learning
Flow diagram (built up over three slides): Training Set → Instance-based Learning Algorithm → Concept Description, where the concept description comprises a similarity function and a classification function.

Instance-based Learning Algorithm
– Input: training set
– Output: concept description (similarity function, classification function)
– Optional: keep track of each concept description instance's correct and incorrect classification rates, via a concept description adder and a concept description remover.

Instance-based Learning Algorithm
Advantages and disadvantages [Mitchell, 1997]
Advantages:
– Training is very fast
– Can learn complex class memberships
– Does not lose information
Disadvantages:
– Slow at query time
– Easily fooled by irrelevant attributes

Instance-based Learning Algorithm
Example IBL1 (CD = concept description):
– Assign the class of the most similar CD instance to the new instance
– Nearest neighbor
– Save all training instances in the CD
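
A minimal sketch of IBL1 in Python (class and function names are illustrative, not from Aha's paper): store every training instance, and classify a query by the single most similar stored instance, using negative Euclidean distance as the similarity function.

    import math

    def similarity(x, y):
        # Negative Euclidean distance: larger value means more similar.
        return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    class IBL1:
        def __init__(self):
            self.cd = []  # concept description: list of (attributes, label)

        def train(self, instances):
            # IBL1 saves every training instance.
            self.cd.extend(instances)

        def classify(self, x):
            # Label of the most similar saved instance (nearest neighbor).
            _, label = max(self.cd, key=lambda inst: similarity(inst[0], x))
            return label

For example, after train([((0, 0), "A"), ((1, 1), "B")]), classify((0.2, 0.1)) returns "A", since (0, 0) is the nearer stored instance.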

Instance-based Learning Algorithm
Example IBL1:
– Assign the class of the most similar concept description instance to the new instance
– Nearest neighbor
– Save all training instances in the concept description
(Figure: Voronoi tessellation of the training data, showing the decision regions induced by nearest neighbor.)

Instance-based Learning Algorithm
Example IBL2 (CD = concept description):
– Similar to IBL1: nearest neighbor
– Save only the incorrectly classified training instances in the CD
– Intuition: misclassified instances nearly always lie on the boundary between two classes, so if these are saved, the instances far from the boundaries can easily be classified using the similarity function [Karadeniz, 1996]
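
A minimal sketch of IBL2, reusing the illustrative IBL1 class above: query-time behavior is unchanged, but training saves an instance only when the current concept description misclassifies it.

    class IBL2(IBL1):
        def train(self, instances):
            for x, label in instances:
                # Save the first instance unconditionally; after that, save
                # an instance only if the current CD misclassifies it.
                if not self.cd or self.classify(x) != label:
                    self.cd.append((x, label))

This keeps storage concentrated near class boundaries, which is exactly the intuition stated above.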

Criticisms
Mainly because nearest neighbor algorithms serve as the basis [Breiman, Friedman, Olshen and Stone, 1984]:
1. They are expensive due to their storage requirements
2. They are sensitive to the choice of the similarity function
3. They cannot easily work with missing attribute values
4. They cannot easily work with nominal attributes
5. They do not yield concise summaries of concepts

Criticisms
Mainly because nearest neighbor algorithms serve as the basis [Breiman, Friedman, Olshen and Stone, 1984]:
1. They are expensive due to their storage requirements
2. They are sensitive to the choice of the similarity function
3. They cannot easily work with missing attribute values
4. They cannot easily work with nominal attributes
5. They do not yield concise summaries of concepts
Responses [Aha, 1992]:
– IBL2 rectifies 1
– The extensions on the following slides rectify 1, 2 and 3
– [Stanfill and Waltz, 1986] rectifies 4
– [Salzberg, 1990] rectifies 5

Extension: Filtering Noisy Training Instances (IBL3)
Modifications:
1. Maintain classification records
2. Save only significantly good instances
3. Discard noisy saved instances (i.e. those with significantly poor classification performance)

Extension: Filtering Noisy Training Instances (IBL3) ComponentIBL2IBL3 Similarity FunctionEuclidean distance Classification Function Nearest acceptable neighbor Concept Description Updater - Save only misclassified instances -Use only significantly good saved instances -Remove significantly bad saved instances

Extension: Filtering Noisy Training Instances (IBL3)
Significantly good or bad: use statistical confidence intervals (CIs).
– Construct a CI for the current instance's classification accuracy.
– Construct a CI for its class's current observed relative frequency.
(Figure: an instance is significantly good when its classification accuracy CI lies entirely above the class frequency CI.)

Extension: Filtering Noisy Training Instances (IBL3)
Significantly good or bad: use statistical confidence intervals (CIs).
– Construct a CI for the current instance's classification accuracy.
– Construct a CI for its class's current observed relative frequency.
(Figure: an instance is significantly bad when its classification accuracy CI lies entirely below the class frequency CI.)

Extension: Filtering Noisy Training Instances (IBL3)
Significantly good or bad: use statistical confidence intervals (CIs).
– Construct a CI for the current instance's classification accuracy.
– Construct a CI for its class's current observed relative frequency.
[Hogg and Tanis, 1983] A sketch of these interval tests follows.
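
The sketch below shows one way these tests might be implemented. It assumes Wilson score confidence intervals for a proportion, with z values for roughly 90% (acceptance) and 75% (dropping) two-sided confidence following the settings reported in Aha's papers; the function names and exact thresholds are assumptions of this sketch, not a faithful reimplementation of IBL3.

    import math

    def wilson_interval(successes, n, z):
        # Wilson score confidence interval for a proportion successes/n.
        if n == 0:
            return 0.0, 1.0
        p = successes / n
        denom = 1.0 + z * z / n
        center = p + z * z / (2 * n)
        spread = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
        return (center - spread) / denom, (center + spread) / denom

    def significantly_good(correct, attempts, class_count, total, z=1.645):
        # Accept an instance when the lower bound of its accuracy CI lies
        # above the upper bound of its class's observed frequency CI.
        acc_low, _ = wilson_interval(correct, attempts, z)
        _, freq_high = wilson_interval(class_count, total, z)
        return acc_low > freq_high

    def significantly_bad(correct, attempts, class_count, total, z=1.15):
        # Drop an instance when the upper bound of its accuracy CI lies
        # below the lower bound of its class's observed frequency CI.
        _, acc_high = wilson_interval(correct, attempts, z)
        freq_low, _ = wilson_interval(class_count, total, z)
        return acc_high < freq_low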

Extension: Tolerate Irrelevant Attributes (IBL4)
– IBL1–IBL3 assume all attributes have equal relevance.
– Real world: some attributes are more discriminative than others.
– Irrelevant attributes cause poor performance.

Extension: Tolerate Irrelevant Attributes (IBL4)
– Regular similarity measure: (unweighted) Euclidean distance.
– IBL4's similarity measure: Euclidean distance weighted per attribute (the formula shown on the original slide is not recoverable here).
– The weights are concept-dependent: sim(animal, tiger, cat) > sim(pet, tiger, cat).

Extension: Tolerate Irrelevant Attributes (IBL4)
(Two slides stepping through IBL4's weighted Euclidean similarity measure; the formulas were shown as images and are not recoverable here. A simplified sketch follows.)
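
As a rough illustration of attribute weighting, the sketch below pairs a weighted Euclidean similarity with a toy feedback rule that strengthens attributes whose values agree on same-class comparisons and differ on different-class comparisons. The update rule, learning rate, and normalization assumption are illustrative stand-ins, not Aha's actual IBL4 update.

    import math

    def weighted_similarity(x, y, weights):
        # Weighted negative Euclidean distance; weights[i] scales attribute i.
        return -math.sqrt(sum(w * w * (a - b) ** 2
                              for w, a, b in zip(weights, x, y)))

    def update_weights(weights, x, neighbor, same_class, lr=0.05):
        # Toy feedback rule (attribute values assumed normalized to [0, 1]):
        # reward attributes whose values agree when the classes agree, and
        # whose values differ when the classes differ; penalize the rest.
        for i, (a, b) in enumerate(zip(x, neighbor)):
            agreement = 1.0 - abs(a - b) if same_class else abs(a - b)
            weights[i] = max(0.0, weights[i] + lr * (agreement - 0.5))
        return weights

Under repeated updates, an attribute that carries no signal for the concept hovers near agreement 0.5 and its weight decays toward zero, which is the behavior the slide's motivation calls for.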

Extension: Tolerate Novel Attributes (IBL5)
– IBL1–IBL4 assume all attributes are known a priori to the training process.
– Everyday situations: instances may not initially be described by all possible attributes.
– Missing values are a different issue; common treatments: 1) assign "don't know"; 2) assign the most probable value; 3) assign all possible values [Gams and Lavrac, 1987].

Extension: Tolerate Novel Attributes (IBL5)
Extension (IBL5): allow novel attributes to be introduced late in the training process (extra: handle missing values in a novel way).
(The slide contrasts IBL4's and IBL5's similarity measures; the Euclidean distance formulas were shown as images and are not recoverable here.)

Extension: Tolerate Novel Attributes (IBL5)
Extension (IBL5): allow novel attributes to be introduced late in the training process (extra: handle missing values in a novel way).
IBL5's similarity measure computes the (Euclidean) distance over only those attributes whose values are known in both instances, so a newly introduced attribute simply contributes nothing until both instances record a value for it. A sketch follows.
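
A minimal sketch of this idea, assuming instances are represented as attribute-to-value dictionaries (a representation choice of this sketch, not from the paper): only attributes present in both instances contribute to the distance, normalized by the number of shared attributes.

    import math

    def ibl5_similarity(x, y):
        # x and y are dicts mapping attribute name -> (normalized) value.
        shared = set(x) & set(y)  # attributes known in both instances
        if not shared:
            return float("-inf")  # nothing comparable yet
        d2 = sum((x[a] - y[a]) ** 2 for a in shared)
        # Normalize by the number of shared attributes so well-described
        # instances are not penalized for having more known attributes.
        return -math.sqrt(d2 / len(shared))

For example, ibl5_similarity({"enrolled": 1, "ms": 1}, {"enrolled": 1, "ms": 0, "married": 1}) compares only "enrolled" and "ms"; the novel attribute "married" is ignored until both instances carry it.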

Results
IB = instance-based learning (IBL). (The result tables from the original slides are not recoverable in this transcript.)

Thanks Q and A