Classification. An Example (from Pattern Classification by Duda, Hart & Stork, Second Edition, 2001)


Classification

An Example (from Pattern Classification by Duda, Hart & Stork, Second Edition, 2001)

A fish-packing plant wants to automate the process of sorting incoming fish according to species. As a pilot project, it is decided to try to separate sea bass from salmon using optical sensing.

Features to explore for use in our classifier: length, lightness, width, position of the mouth, …

Preprocessing: images of different fish are isolated from one another and from the background. Feature extraction: the information about a single fish is then sent to a feature extractor, which measures certain “features” or “properties”. Classification: the values of these features are passed to a classifier, which evaluates the evidence presented and builds a model to discriminate between the two species.
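To make the three stages concrete, here is a minimal Python sketch of the pipeline. The helper bodies are placeholders assumed for illustration; they are not the book's implementation.

```python
# Minimal pipeline sketch; the helpers stand in for real image-processing
# and measurement code.

def preprocess(raw_image):
    """Isolate a single fish from the background and from neighbouring fish."""
    return raw_image  # placeholder: assume the image already shows one fish

def extract_features(fish_image):
    """Measure properties of one fish, e.g. its length and lightness."""
    return {"length": 17.0, "lightness": 3.2}  # placeholder measurements

def classify(features, l_star=16.0):
    """Apply a simple decision rule (here: the length threshold of the next slide)."""
    return "sea bass" if features["length"] >= l_star else "salmon"

print(classify(extract_features(preprocess("camera frame"))))  # -> sea bass
```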

Domain knowledge: a sea bass is generally longer than a salmon. Feature: length. Model: sea bass have some typical length, and this is greater than the typical length of a salmon.

Classification rule: if length >= l*, then sea bass; otherwise salmon. How do we choose l*? Use the length values of sample data (training data).
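A minimal sketch, with made-up length values, of how l* could be chosen from training data by counting the mistakes each candidate threshold makes:

```python
import numpy as np

# Hypothetical training lengths (arbitrary units); real values would come from
# measurements of labeled sample fish.
salmon_lengths   = np.array([12.0, 13.5, 14.2, 15.1, 16.0, 17.3])
sea_bass_lengths = np.array([14.8, 16.5, 17.9, 18.4, 19.2, 21.0])

def training_errors(l_star):
    """Mistakes of the rule: length >= l_star -> sea bass, otherwise salmon."""
    errors  = np.sum(salmon_lengths >= l_star)    # salmon called sea bass
    errors += np.sum(sea_bass_lengths < l_star)   # sea bass called salmon
    return errors

# Try candidate thresholds over the observed range and keep the best one.
candidates = np.linspace(salmon_lengths.min(), sea_bass_lengths.max(), 200)
l_star = min(candidates, key=training_errors)
print(f"chosen l* = {l_star:.2f}, training errors = {training_errors(l_star)}")
```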

Histograms of the length feature for the two categories. The marked threshold l* leads to the smallest number of errors on average. Still, we cannot reliably separate sea bass from salmon by length alone!

New Feature: Average lightness of the fish scales

Histograms of the lightness feature for the two categories. The marked threshold x* leads to the smallest number of errors on average. The two classes are much better separated!

Histograms of the lightness feature for the two categories. What if our actions are not equally costly? Suppose classifying a sea bass as salmon costs more: then we shift the threshold so as to reduce the number of sea bass classified as salmon.
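A sketch of the same threshold search when the two kinds of mistake are weighted differently. The lightness values and the cost ratio are assumptions for illustration (with these made-up numbers, sea bass happen to measure as lighter):

```python
import numpy as np

# Illustrative lightness values for labeled training fish (made-up numbers).
salmon_lightness   = np.array([1.2, 1.8, 2.1, 2.6, 3.0, 3.6])
sea_bass_lightness = np.array([3.3, 4.1, 4.8, 5.5, 6.2, 6.8])

# Assumed costs: calling a sea bass "salmon" is twice as expensive as the reverse.
COST_BASS_AS_SALMON = 2.0
COST_SALMON_AS_BASS = 1.0

def training_cost(x_star):
    """Total cost of the rule: lightness >= x_star -> sea bass, otherwise salmon."""
    cost  = COST_SALMON_AS_BASS * np.sum(salmon_lightness   >= x_star)
    cost += COST_BASS_AS_SALMON * np.sum(sea_bass_lightness <  x_star)
    return cost

candidates = np.linspace(1.0, 7.0, 300)
x_star = min(candidates, key=training_cost)
print(f"cost-sensitive threshold x* = {x_star:.2f}")
```

Raising the cost of the sea-bass-as-salmon mistake pushes x* toward the salmon side, so fewer sea bass end up below the threshold.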

In Summary: The overall task is to come up with a decision rule (i.e., a decision boundary) so as to minimize the cost (which is equal to the average number of mistakes for equally costly actions).

No single feature gives satisfactory results, so we must rely on more than one feature. We observe that sea bass are usually wider than salmon. Two features: lightness and width. Resulting fish representation: the feature vector x = (lightness, width).

Decision rule: classify the fish as a sea bass if its feature vector falls above the decision boundary shown, and as salmon otherwise. Should we be satisfied with this result?
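A minimal sketch of such a rule with a hand-picked linear boundary in the (lightness, width) plane; the weights are illustrative, not the boundary from the book's figure:

```python
import numpy as np

# Linear decision boundary w.x + b = 0 in the (lightness, width) plane;
# points on the positive side are called sea bass.
w = np.array([1.0, 2.0])   # illustrative coefficients for (lightness, width)
b = -12.0                  # illustrative offset

def classify(x):
    """x = (lightness, width); decide by which side of the boundary x lies on."""
    return "sea bass" if np.dot(w, x) + b > 0 else "salmon"

print(classify(np.array([5.0, 4.5])))  # -> sea bass
print(classify(np.array([2.0, 2.5])))  # -> salmon
```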

Options we have: Consider additional features:
– Which ones?
– Some features may be redundant (e.g., if eye color perfectly correlates with width, then we gain no information by adding eye color as a feature).
– It may be costly to obtain more features.
– Too many features may hurt performance!
Or: use a more complex model.

With a more complex model, all training data are perfectly separated. Should we be satisfied now? We must consider: which decisions will the classifier take on novel patterns, i.e. fish not yet seen? Will the classifier suggest the correct actions? This is the issue of GENERALIZATION.

Generalization: a good classifier should be able to generalize, i.e. perform well on unseen data. The classifier should capture the underlying characteristics of the categories. It should NOT be tuned to the specific (accidental) characteristics of the training data: training data in practice contain some noise.

As a consequence: We are better off with a slightly poorer performance on the training examples if this means that our classifier will have better performance on novel patterns.

The decision boundary shown may represent the optimal tradeoff between accuracy on the training set and accuracy on new patterns. How can we determine automatically when the optimal tradeoff has been reached?

[Figure: training error and generalization error as a function of model complexity; the optimal complexity is where the generalization error is smallest.] Tradeoff between performance on training and novel examples. Evaluation of the classifier on novel data is important to avoid overfitting.
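A minimal sketch of such an evaluation on held-out data, using synthetic two-feature samples and a simple nearest-class-mean classifier (all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-feature data for two classes (0 = salmon, 1 = sea bass).
X = np.vstack([rng.normal([2.0, 2.5], 0.6, size=(50, 2)),
               rng.normal([4.0, 4.0], 0.6, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hold out part of the data: fit on the training split, evaluate on the rest.
idx = rng.permutation(len(X))
train, test = idx[:70], idx[70:]

# Simple model: assign each point to the class with the nearest training mean.
means = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

def predict(points):
    dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

print("training accuracy:", (predict(X[train]) == y[train]).mean())
print("held-out accuracy:", (predict(X[test]) == y[test]).mean())
```

A large gap between the two accuracies is the practical symptom of overfitting.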

Summarizing our example:
– Classification is the task of recovering (approximating) the model that generated the patterns (generally expressed in terms of probability densities).
– Given a new vector of feature values, the classifier needs to determine the corresponding probability for each of the possible categories.

Performance evaluation:
– Error rate: the percentage of new patterns that are assigned to the wrong class.
– Risk: the total expected cost.
Can we estimate the lowest possible risk of any classifier, to see how close ours comes to this ideal? Bayesian Decision Theory considers the ideal case in which the probability structure underlying the classes is known perfectly. This is rarely true in practice, but it allows us to determine the optimal classifier, against which all other classifiers can be compared.
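For reference, the quantity Bayesian decision theory minimizes is the conditional risk of taking action αi given the measurement x, where λ(αi | ωj) is the cost of deciding αi when the true class is ωj:

```latex
R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)
```

The Bayes decision rule takes, for every x, the action with the smallest conditional risk; the resulting overall risk is the Bayes risk, the lowest risk any classifier can achieve.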

Class-conditional probability density functions p(x | ωj) represent the probability (density) of measuring a certain feature value x given that the pattern belongs to class ωj.
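Written out, with p(x | ωj) the class-conditional density and P(ωj) the prior, Bayes' formula gives the posterior probability the classifier needs:

```latex
P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)},
\qquad
p(x) = \sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j)
```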