Classification Algorithms: Covering, Nearest-Neighbour

Outline
- Covering algorithms
- Instance-based (Nearest-neighbour)

Covering algorithms
- Strategy for generating a rule set directly: for each class in turn, find a rule set that covers all instances in it (excluding instances not in the class)
- This is called a covering approach: at each stage a rule is identified that covers some of the instances

Example: generating a rule (figure: the entire dataset)

Example: generating a rule

A single rule describes a convex concept

Example: generating a rule
- Class 'b' is not convex: extra rules are needed for the "else" part
- A possible rule set for class 'b' is shown; more rules could be added for a "perfect" rule set

Rules vs. Trees
- The corresponding decision tree produces exactly the same predictions

A simple covering algorithm: PRISM
- Generates a rule by adding tests that maximize the rule's accuracy
- Each additional test reduces the rule's coverage

Selecting a test
- Goal: maximize accuracy
  - t: total number of instances covered by the rule
  - p: positive examples of the class covered by the rule
  - t – p: number of errors made by the rule
- Select the test that maximizes the ratio p/t
- We are finished when p/t = 1 or the set of instances cannot be split any further

Contact lens data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommendation
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None

Example: contact lens data
- Rule we seek:  if ? then recommendation = hard
- Possible tests:
    Age = Young                            2/8
    Age = Pre-presbyopic                   1/8
    Age = Presbyopic                       1/8
    Spectacle prescription = Myope         3/12
    Spectacle prescription = Hypermetrope  1/12
    Astigmatism = no                       0/12
    Astigmatism = yes                      4/12
    Tear production rate = Reduced         0/12
    Tear production rate = Normal          4/12
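
As an illustration (not part of the original slides), the following Python sketch shows how the p/t counts above could be computed from the contact lens table. The list-of-dicts layout and the function name are assumptions; only two of the 24 rows are filled in here.

```python
# Minimal sketch: counting p/t for every candidate test A = v.
data = [
    {"age": "young", "spectacle prescription": "myope", "astigmatism": "no",
     "tear production rate": "reduced", "recommendation": "none"},
    {"age": "young", "spectacle prescription": "myope", "astigmatism": "yes",
     "tear production rate": "normal", "recommendation": "hard"},
    # ... the remaining rows of the contact lens table go here
]

def candidate_tests(data, target_class, attributes):
    """Return {(attribute, value): (p, t)} for every candidate test A = v."""
    counts = {}
    for attr in attributes:
        for row in data:
            p, t = counts.get((attr, row[attr]), (0, 0))
            p += row["recommendation"] == target_class
            counts[(attr, row[attr])] = (p, t + 1)
    return counts

tests = candidate_tests(
    data, "hard",
    ["age", "spectacle prescription", "astigmatism", "tear production rate"])

# PRISM-style selection: highest accuracy p/t, ties broken by larger coverage p.
best = max(tests, key=lambda k: (tests[k][0] / tests[k][1], tests[k][0]))
```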

Modified rule and resulting data
- Rule with best test added (4/12):  if astigmatism = yes then recommendation = hard
- Instances covered by the modified rule:

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommendation
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None

Further refinement
- Current state:  if astigmatism = yes and ? then recommendation = hard
- Possible tests:
    Age = Young                            2/4
    Age = Pre-presbyopic                   1/4
    Age = Presbyopic                       1/4
    Spectacle prescription = Myope         3/6
    Spectacle prescription = Hypermetrope  1/6
    Tear production rate = Reduced         0/6
    Tear production rate = Normal          4/6

Modified rule and resulting data
- Rule with best test added:  if astigmatism = yes and tear production rate = normal then recommendation = hard
- Instances covered by the modified rule:

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommendation
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Normal                None

Further refinement
- Current state:  if astigmatism = yes and tear production rate = normal and ? then recommendation = hard
- Possible tests:
    Age = Young                            2/2
    Age = Pre-presbyopic                   1/2
    Age = Presbyopic                       1/2
    Spectacle prescription = Myope         3/3
    Spectacle prescription = Hypermetrope  1/3
- Tie between the first and the fourth test (both have accuracy 1); we choose the one with greater coverage

The result
- Final rule:  if astigmatism = yes and tear production rate = normal and spectacle prescription = myope then recommendation = hard
- Second rule for recommending "hard lenses" (built from the instances not covered by the first rule):  if age = young and astigmatism = yes and tear production rate = normal then recommendation = hard
- These two rules cover all "hard lenses" instances
- The process is repeated with the other two classes

Pseudo-code for PRISM

For each class C
    Initialize D to the instance set
    While D contains instances in class C
        Create a rule R with an empty left-hand side that predicts class C
        Until R is perfect (or there are no more attributes to use) do
            For each attribute A not mentioned in R, and each value v,
                Consider adding the condition A = v to the left-hand side of R
            Select A and v to maximize the accuracy p/t
                (break ties by choosing the condition with the largest p)
            Add A = v to R
        Remove the instances covered by R from D
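
The pseudo-code could be written, for example, as the following Python sketch. The instance representation (a list of dicts keyed by attribute name, with the class label under `class_attr`) and all names are illustrative assumptions, not the book's implementation.

```python
def prism(instances, class_attr):
    attributes = [a for a in instances[0] if a != class_attr]
    rules = []
    for c in {row[class_attr] for row in instances}:   # For each class C
        d = list(instances)                            # Initialize D to the instance set
        while any(row[class_attr] == c for row in d):  # While D contains instances in C
            conditions = {}                            # R: empty left-hand side
            covered = list(d)
            # Until R is perfect, or there are no more attributes to use
            while (any(row[class_attr] != c for row in covered)
                   and len(conditions) < len(attributes)):
                best, best_key = None, (-1.0, -1)
                for attr in attributes:                # attributes not yet mentioned in R
                    if attr in conditions:
                        continue
                    for value in {row[attr] for row in covered}:
                        subset = [row for row in covered if row[attr] == value]
                        p = sum(row[class_attr] == c for row in subset)
                        key = (p / len(subset), p)     # accuracy p/t, then p for ties
                        if key > best_key:
                            best_key, best = key, (attr, value)
                attr, value = best
                conditions[attr] = value               # Add A = v to R
                covered = [row for row in covered if row[attr] == value]
            rules.append((conditions, c))
            # Remove the instances covered by R from D
            d = [row for row in d
                 if not all(row[a] == v for a, v in conditions.items())]
    return rules
```

On the contact lens data above, a procedure like this grows each rule one test at a time, as traced in the preceding slides.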

Separate and conquer
- Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
  - First, a rule is identified
  - Then, all instances covered by the rule are separated out
  - Finally, the remaining instances are "conquered"
- Difference to divide-and-conquer methods: the subset covered by a rule doesn't need to be explored any further

Instance-based Classification: k-nearest neighbour

Instance-based representation
- Simplest form of learning: rote learning
  - Training instances are searched for the instance that most closely resembles the new instance
  - The instances themselves represent the knowledge
  - Also called instance-based learning
- The similarity function defines what's "learned"
- Instance-based learning is lazy learning
- Methods: nearest neighbor, k-nearest neighbor, ...
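
To make the "instances themselves represent the knowledge" point concrete, here is a purely illustrative sketch of a rote learner; the class and method names are assumptions, not from Weka or the book.

```python
class RoteLearner:
    def fit(self, instances, labels):
        # Training is trivial: just store the instances (lazy learning).
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, query, similarity):
        # All the work happens at prediction time: find the stored instance
        # that most closely resembles the query and return its label.
        _, label = max(self.memory, key=lambda item: similarity(item[0], query))
        return label
```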

1-NN example

The distance function
- Simplest case: one numeric attribute
  - Distance is the difference between the two attribute values involved (or a function thereof)
- Several numeric attributes: normally, Euclidean distance is used and attributes are normalized
- Nominal attributes: distance is set to 1 if values are different, 0 if they are equal
- Are all attributes equally important?
  - Weighting the attributes might be necessary

Instance-based learning
- Most instance-based schemes use Euclidean distance:
      d(a^(1), a^(2)) = sqrt( (a_1^(1) - a_1^(2))^2 + (a_2^(1) - a_2^(2))^2 + ... + (a_k^(1) - a_k^(2))^2 )
  where a^(1) and a^(2) are two instances with k attributes
- Taking the square root is not required when comparing distances
- Other popular metric: the city-block (Manhattan) metric, which adds the differences without squaring them
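
A small sketch of these two distance functions (illustrative only): numeric attributes contribute their difference and nominal attributes contribute 0 or 1, as described above.

```python
import math

def attribute_difference(x, y):
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return x - y                      # numeric attribute: plain difference
    return 0.0 if x == y else 1.0         # nominal attribute: 0 if equal, else 1

def euclidean_distance(a1, a2):
    return math.sqrt(sum(attribute_difference(x, y) ** 2 for x, y in zip(a1, a2)))

def manhattan_distance(a1, a2):
    # City-block metric: sum the absolute differences without squaring them
    return sum(abs(attribute_difference(x, y)) for x, y in zip(a1, a2))
```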

Nearest Neighbour & Voronoi diagram

Normalization and other issues
- Different attributes are measured on different scales and need to be normalized:
      a_i = (v_i - min v_i) / (max v_i - min v_i)
  where v_i is the actual value of attribute i
- Nominal attributes: distance is either 0 or 1
- Common policy for missing values: assumed to be maximally distant (given normalized attributes)
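
For example, min-max normalization of one numeric attribute column might look like this sketch (the function name is an assumption):

```python
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:                 # constant attribute: map every value to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```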

k-NN example
- k-NN approach: perform a majority vote over the k nearest neighbours to derive the label
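
A minimal sketch of the k-NN majority vote for numeric attribute vectors (illustrative only; the function name and data layout are assumptions, not a particular library's API).

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Majority vote over the k training instances nearest to `query`.
    `training` is a list of (attribute_vector, label) pairs."""
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# With k = 1 this reduces to the 1-NN scheme discussed next.
# knn_predict([([0.1, 0.9], "a"), ([0.8, 0.2], "b"), ([0.7, 0.3], "b")], [0.75, 0.25])
```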

Discussion of 1-NN
- Can be very accurate
- But: suffers from the curse of dimensionality
  - Every added dimension increases distances
  - Exponentially more training data is needed
- ... and slow: the simple version scans the entire training data to make a prediction
  - Better training set representations: kD-tree, ball tree, ...
- Assumes all attributes are equally important
  - Remedy: attribute selection or attribute weights
- Possible remedies against noisy instances:
  - Take a majority vote over the k nearest neighbours
  - Remove noisy instances from the dataset (difficult!)

Summary
- Simple methods frequently work well; they are robust against noise and errors
- Advanced methods, if properly used, can improve on simple methods
- No method is universally best