Introduction to Pattern Recognition Chapter 1 (Duda et al.) CS479/679 Pattern Recognition Dr. George Bebis 1.

What is a Pattern? A pattern could be an object or an event. Examples: biometric patterns, hand gesture patterns.

What is a Pattern? (cont’d) Loan/Credit card applications – Income, number of dependents, mortgage amount for creditworthiness classification. Dating services – Age, hobbies, income for “desirability” classification. Web documents – Keyword-based descriptions (e.g., documents containing “football”, “NFL”) for document classification.

Pattern Class A collection of “similar” objects – two challenges: – Intra-class variability (e.g., the letter “T” in different typefaces) – Inter-class similarity (e.g., letters/numbers that look similar)

What is Pattern Recognition? Assign a pattern to one of several known categories (or classes). Example: gender classification.

What is Pattern Recognition? (cont’d) Example: character recognition.

What is Pattern Recognition? (cont’d) Example: speech recognition.

Modeling Pattern Classes Pattern classes are typically expressed in terms of a statistical model. – e.g., a probability density function (Gaussian). Example: gender classification (male, female).

Pattern Recognition Objectives Hypothesize the models that describe each pattern class (e.g., recover the process that generated the patterns). Given a novel pattern, choose the best-fitting model for it and then assign it to the pattern class associated with the model.
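
The two objectives above can be sketched in a few lines. This is only a minimal illustration, assuming a one-dimensional feature and one Gaussian model per class; the height values and the helper names (fit_gaussian, gaussian_pdf, classify) are hypothetical and do not come from the slides.

```python
import math


def fit_gaussian(samples):
    """Estimate the mean and variance of a 1-D Gaussian (maximum likelihood)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def classify(x, models):
    """Assign x to the class whose model fits it best (highest density)."""
    return max(models, key=lambda label: gaussian_pdf(x, *models[label]))


# Made-up training data: heights in cm for each class.
training = {
    "male": [178.0, 182.0, 175.0, 180.0, 185.0],
    "female": [162.0, 165.0, 160.0, 168.0, 170.0],
}

# Step 1: hypothesize a model (here a Gaussian) for each pattern class.
models = {label: fit_gaussian(values) for label, values in training.items()}

# Step 2: assign novel patterns to the class of the best-fitting model.
print(classify(183.0, models))
print(classify(161.0, models))
```

The sketch compares class-conditional densities only; class priors and misclassification costs are ignored here.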

Classification vs Clustering – Classification (recognition, supervised classification): the categories are known, e.g., Category “A” and Category “B”. – Clustering (unsupervised classification): the categories are created from the data.
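
The difference can be made concrete with a small sketch. It assumes one-dimensional data, a nearest-mean rule for classification, and a two-cluster k-means for clustering; these choices, the data, and the function names are illustrative assumptions rather than anything prescribed by the slides.

```python
def mean(values):
    return sum(values) / len(values)


def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


# Classification: the labels are known, so class means come from the labels.
labeled = {"A": [1.0, 1.2, 0.8], "B": [5.0, 5.3, 4.7]}
class_names = list(labeled)
class_means = [mean(labeled[name]) for name in class_names]
predicted = class_names[nearest(4.5, class_means)]


# Clustering: no labels are given, so the groups have to be discovered.
def two_means(points, iterations=10):
    centers = [min(points), max(points)]  # simple deterministic initialization
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


unlabeled = [1.0, 5.0, 1.2, 4.7, 0.8, 5.3]
centers, groups = two_means(unlabeled)

print(predicted)
print(centers, groups)
```

In the first half the category names exist before any data is seen; in the second half the two groups have no names until someone interprets them.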

Pattern Recognition Applications

Handwriting Recognition

License Plate Recognition

Face Detection Example of unbalanced classes (i.e., faces vs. non-faces)

Gender Classification Example of balanced classes (i.e., male vs. female)

Fingerprint Classification

Biometric Recognition

Land Cover Classification (from aerial or satellite images)

“Hot” Pattern Recognition Applications Recommendation systems – Amazon, Netflix. Targeted advertising.

The Netflix Prize Predict how much someone is going to enjoy a movie based on their movie preferences – $1M awarded in Sept Can software recommend movies to customers? – Not Rambo to Woody Allen fans – Not Saw VI if you’ve seen all previous Saw movies

Main Classification Approaches Notation: x is the input vector (pattern) and y is the class label (class). Generative – Model the joint probability, p(x, y) – Make predictions by using Bayes’ rule to calculate p(y|x) – Pick the most likely label y Discriminative – Model p(y|x) directly, or learn a direct map from inputs x to the class labels y. – Pick the most likely label y
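
The generative recipe can be illustrated as follows. This is a sketch under simple assumptions: a single discrete feature, a joint distribution estimated from counts, and made-up numbers; the names joint_counts, posterior and predict are hypothetical.

```python
# Hypothetical counts of (feature value x, class label y) pairs.
joint_counts = {
    ("low", "class_1"): 30,
    ("high", "class_1"): 10,
    ("low", "class_2"): 15,
    ("high", "class_2"): 45,
}

# Generative step: model the joint probability p(x, y).
total = sum(joint_counts.values())
joint = {pair: count / total for pair, count in joint_counts.items()}


def posterior(x):
    """Bayes' rule: p(y | x) = p(x, y) / sum over y' of p(x, y')."""
    evidence = sum(p for (xv, _), p in joint.items() if xv == x)
    return {y: p / evidence for (xv, y), p in joint.items() if xv == x}


def predict(x):
    """Pick the most likely label y for the input x."""
    post = posterior(x)
    return max(post, key=post.get)


print(posterior("low"))
print(predict("high"))
```

A discriminative method would skip the joint table and fit p(y|x), or a direct mapping from x to y, from the training pairs.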

Syntactic Pattern Recognition Approach Represent patterns in terms of simple primitives. Describe patterns using deterministic grammars or formal languages.

Complexity of PR – An Example Problem: Sorting incoming fish on a conveyor belt. Assumption: Two kinds of fish: (1) sea bass (2) salmon

Pre-processing (1) Image enhancement (2) Separating touching or occluding fish (3) Finding the boundary of each fish

Feature Extraction Assume a fisherman told us that a sea bass is generally longer than a salmon. We can use length as a feature and decide between sea bass and salmon according to a threshold on length. How can we choose the threshold?

“Length” Histograms Even though sea bass are longer than salmon on average, there are many examples of fish where this observation does not hold. Figure: length histograms with threshold l*.
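
One simple way to choose the threshold is to try every candidate and keep the one with the fewest training errors. The sketch below assumes the rule "length greater than the threshold means sea bass"; the lengths and the function name best_threshold are made up for illustration.

```python
def best_threshold(salmon, sea_bass):
    """Return (threshold, errors) minimizing training errors.

    Decision rule assumed here: length > threshold -> sea bass, else salmon.
    Candidates are the midpoints between consecutive observed lengths.
    """
    values = sorted(set(salmon + sea_bass))
    candidates = [(a + b) / 2.0 for a, b in zip(values, values[1:])]

    def errors(t):
        salmon_as_bass = sum(x > t for x in salmon)
        bass_as_salmon = sum(x <= t for x in sea_bass)
        return salmon_as_bass + bass_as_salmon

    best = min(candidates, key=errors)
    return best, errors(best)


# Made-up lengths in cm; the two classes overlap on purpose.
salmon = [30.0, 34.0, 38.0, 42.0, 50.0]
sea_bass = [36.0, 44.0, 48.0, 52.0, 56.0]

threshold, mistakes = best_threshold(salmon, sea_bass)
print(threshold, mistakes)
```

Because the two length distributions overlap, even the best threshold on this toy data still misclassifies some fish, which is the point the histograms make.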

“Average Lightness” Histograms Consider a different feature such as “average lightness”. It seems easier to choose the threshold x* but we still cannot make a perfect decision.

Cost of misclassifications There are two possible classification errors: (1) Deciding the fish was a sea bass when it was a salmon. (2) Deciding the fish was a salmon when it was a sea bass. Are both errors equally important?

Cost of misclassifications (cont’d) Suppose the fish packing company knows that: – Customers who buy salmon will object vigorously if they see sea bass in their cans. – Customers who buy sea bass will not be unhappy if they occasionally see some expensive salmon in their cans. How does this knowledge affect our decision?
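
One way to see the effect is to weight the two kinds of error differently and minimize total cost instead of the error count. The sketch below assumes a length threshold with the rule "length greater than the threshold means sea bass"; the data, the cost values and the function name min_cost_threshold are hypothetical.

```python
def min_cost_threshold(salmon, sea_bass, cost_bass_as_salmon, cost_salmon_as_bass):
    """Return (threshold, total cost) minimizing the weighted training cost.

    Decision rule assumed here: length > threshold -> sea bass, else salmon.
    """
    values = sorted(set(salmon + sea_bass))
    candidates = [(a + b) / 2.0 for a, b in zip(values, values[1:])]

    def cost(t):
        salmon_as_bass = sum(x > t for x in salmon)      # salmon ends up in a sea bass can
        bass_as_salmon = sum(x <= t for x in sea_bass)   # sea bass ends up in a salmon can
        return cost_salmon_as_bass * salmon_as_bass + cost_bass_as_salmon * bass_as_salmon

    best = min(candidates, key=cost)
    return best, cost(best)


# Made-up lengths in cm.
salmon = [30.0, 34.0, 38.0, 42.0, 50.0]
sea_bass = [36.0, 44.0, 48.0, 52.0, 56.0]

equal_costs = min_cost_threshold(salmon, sea_bass, 1.0, 1.0)
# Sea bass in a salmon can is assumed to be five times as costly.
unequal_costs = min_cost_threshold(salmon, sea_bass, 5.0, 1.0)
print(equal_costs, unequal_costs)
```

On this toy data the threshold moves toward shorter lengths once sea bass in salmon cans is penalized more heavily, so fewer sea bass are labeled as salmon at the price of more salmon labeled as sea bass.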

Multiple Features To improve recognition accuracy, we might have to use more than one feature at a time. – Single features might not yield the best performance. – Using combinations of features might yield better performance.

Multiple Features (cont’d) Partition the feature space into two regions by finding the decision boundary that minimizes the error.

How Many Features? Does adding more features always improve performance? – It might be difficult and computationally expensive to extract certain features. – Correlated features do not improve performance. – “Curse” of dimensionality …

Curse of Dimensionality Adding too many features can, paradoxically, lead to a worsening of performance. – Divide each of the input features into a number of intervals, so that the value of a feature can be specified approximately by saying in which interval it lies. – If each input feature is divided into M intervals, then the total number of cells is M^d (d: number of features), which grows exponentially with d. – Since each cell must contain at least one point, the amount of training data required also grows exponentially!
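
The growth is easy to tabulate. The snippet below is a small illustration, assuming M = 10 intervals per feature; the function name cells is hypothetical.

```python
def cells(m, d):
    """Number of cells when each of d features is split into m intervals."""
    return m ** d


# With at least one training point per cell, this is also a lower bound
# on the number of training samples needed.
for d in (1, 2, 3, 5, 10):
    print(d, cells(10, d))
```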

Model Complexity We can get perfect classification performance on the training data by choosing complex models. Complex models are tuned to the particular training samples rather than to the characteristics of the true model (overfitting). How well can the model generalize to unknown samples?

Generalization Generalization is defined as the ability of a classifier to produce correct results on novel patterns. How can we improve generalization performance? – More training examples (i.e., better model estimates). – Simpler models usually yield better performance. Figure: complex model vs. simpler model.

More on model complexity Regression example: Consider 10 sample points generated with some noise. The green curve is the true function that generated the data. Goal: approximate the true function from the sample points.

More on model complexity (cont’d) Polynomial curve fitting: polynomials having various orders, shown as red curves, fitted to the set of 10 sample points.

More on model complexity (cont’d) Polynomial curve fitting: 9th-order polynomials fitted to 15 and 100 sample points.
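
The regression example can be imitated with the standard library alone. The sketch below makes several assumptions that are not in the slides: the true function is taken to be sin(2*pi*x), the noise is Gaussian with standard deviation 0.2, and the 9th-order fit to 10 points is computed by Lagrange interpolation (for 10 distinct points the 9th-order least-squares polynomial passes through every point, so the two coincide). A straight line serves as the simple model.

```python
import math
import random


def true_function(x):
    return math.sin(2.0 * math.pi * x)


def fit_line(xs, ys):
    """Least-squares straight line (a deliberately simple model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def interpolate(xs, ys):
    """Polynomial of order len(xs) - 1 through all points (Lagrange form)."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(0)
train_x = [i / 9.0 for i in range(10)]
train_y = [true_function(x) + random.gauss(0.0, 0.2) for x in train_x]
# Novel points halfway between the training points, without noise.
test_x = [(i + 0.5) / 9.0 for i in range(9)]
test_y = [true_function(x) for x in test_x]

line = fit_line(train_x, train_y)
ninth = interpolate(train_x, train_y)

print("line:      train", mse(line, train_x, train_y), "test", mse(line, test_x, test_y))
print("9th order: train", mse(ninth, train_x, train_y), "test", mse(ninth, test_x, test_y))
```

The 9th-order polynomial reproduces the noisy training points exactly, so its training error is zero, while its error on the novel points is not. That gap between training and test error is the signature of overfitting described above.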

PR System – Two Phases: Training Phase and Test Phase

PR System (cont’d) Sensing: – Use a sensor (camera or microphone) – PR depends on bandwidth, resolution, sensitivity, distortion of the sensor. Pre-processing: – Removal of noise in data. – Segmentation (i.e., isolation of patterns of interest from background).

PR System (cont’d) Training/Test data – How do we know that we have collected an adequately large and representative set of examples for training/testing the system?

PR System (cont’d) Feature extraction: – Discriminative features – Invariant features (e.g., translation, rotation and scale) – How many should we use? – Are there ways to automatically learn which features are best?

PR System (cont’d) Missing features: – Certain features might be missing (e.g., due to occlusion). – How should we train the classifier with missing features? – How should the classifier make the best decision with missing features?

PR System (cont’d) Model learning and estimation: – Models more complex than necessary lead to overfitting (i.e., good performance on the training data but poor performance on novel data). – How can we adjust the complexity of the model (i.e., neither too complex nor too simple)?

PR System (cont’d) Classification: – Using features and learned models to assign a novel pattern to a category. – Performance can be improved using a "pool" of classifiers. – How should we build and combine multiple classifiers?
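
Majority voting is one common way to combine a pool of classifiers. The sketch below is only an illustration of that idea: the three threshold rules, their constants, and the (length, lightness) feature pair are made up, and the slides do not say which combination scheme is intended.

```python
from collections import Counter


def by_length(fish):
    return "sea bass" if fish[0] > 43.0 else "salmon"


def by_lightness(fish):
    return "sea bass" if fish[1] > 5.0 else "salmon"


def by_combination(fish):
    return "sea bass" if fish[0] + 4.0 * fish[1] > 60.0 else "salmon"


def majority_vote(classifiers, pattern):
    """Return the label predicted by most classifiers in the pool."""
    votes = Counter(clf(pattern) for clf in classifiers)
    return votes.most_common(1)[0][0]


pool = [by_length, by_lightness, by_combination]

# Each fish is a hypothetical (length, lightness) pair.
print(majority_vote(pool, (40.0, 6.5)))
print(majority_vote(pool, (50.0, 2.0)))
```

With three voters and two classes a tie cannot occur; larger pools or more classes need an explicit tie-breaking rule.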

PR System (cont’d) Post-processing: – Exploit context to improve performance. Example: “How m ch info mation are y u mi sing?”
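
A toy version of context-based post-processing is sketched below. It assumes the recognizer marks each unreadable character with "_" and that a small word list is available as context; the vocabulary, the placeholder convention and the function names are all hypothetical.

```python
# Hypothetical word list standing in for linguistic context.
VOCABULARY = ["how", "much", "information", "are", "you", "missing"]


def matches(pattern, word):
    """True if word agrees with pattern everywhere except at '_' placeholders."""
    return len(pattern) == len(word) and all(p in ("_", c) for p, c in zip(pattern, word))


def restore(pattern, vocabulary):
    """Fill in the placeholders when exactly one vocabulary word fits."""
    candidates = [w for w in vocabulary if matches(pattern, w)]
    return candidates[0] if len(candidates) == 1 else pattern


recognized = "how m_ch info_mation are y_u mi_sing"
corrected = " ".join(restore(word, VOCABULARY) for word in recognized.split())
print(corrected)
```

Words with no match, or with several matches, are left unchanged; resolving such ambiguity would require richer context than a plain word list.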

PR System (cont’d) Computational Complexity: – How does an algorithm scale with the number of features, patterns, and categories? – Consider tradeoffs between computational complexity and performance.

General Purpose PR Systems Humans have the ability to switch rapidly and seamlessly between different pattern recognition tasks. It is very difficult to design a system that is capable of performing a variety of classification tasks. – Different decision tasks may require different features. – Different features might yield different solutions. – Different tradeoffs exist for different tasks.