
Pattern Recognition. Introduction. Definitions.

Recognition process. The recognition process relates an input signal to stored concepts about the object: an object produces a signal, and recognition maps that signal to a conceptual representation of the object. Machine recognition likewise relates the signal to stored domain knowledge by means of signal measurement and search.

Definitions. Similar objects produce similar signals. A class is a set of similar objects. Patterns are collections of signals originating from similar objects. Pattern recognition is the process of identifying a signal as originating from a particular class of objects.

Pattern recognition steps. Measure the signal (capture and preprocessing) to obtain digital data; extract features to form a feature vector; recognize to produce a class label. A minimal sketch of this pipeline is given below.
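The following sketch shows the three stages as Python function stubs. The function names, the thresholding step, and the nearest-neighbor recognizer are illustrative assumptions, not part of the original slides; they only make the data flow (digital data, feature vector, class label) concrete.

import numpy as np

def measure_signal(raw_image):
    # Capture and preprocessing: here, simply threshold the image to a binary bitmap.
    return (np.asarray(raw_image, dtype=float) > 0.5).astype(np.uint8)

def extract_features(bitmap):
    # Flatten the bitmap into a feature vector.
    return bitmap.reshape(-1)

def recognize(feature_vector, templates, labels):
    # Assign the label of the closest stored template (nearest neighbor).
    distances = np.sum(np.abs(templates - feature_vector), axis=1)
    return labels[int(np.argmin(distances))]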

Training of the recognizer. In operational mode, the recognizer measures the input signal, searches the stored domain knowledge, and outputs a class label. In training mode, training signals are used to change the parameters of the recognition algorithm and the domain knowledge.

Types of training. Supervised training uses training samples with associated class labels (e.g., character images with corresponding labels). Unsupervised training uses unlabeled training samples (e.g., cluster character images and assign labels to the clusters later). Reinforcement training provides feedback during recognition to adjust system parameters (e.g., use word images to train a character recognizer: the word image is segmented, characters are recognized, the results are combined into a ranking of lexicon words, and the parameters are adjusted from that feedback).

Template Matching (1). The image is converted into a 12x12 bitmap.

Template Matching (2). The bitmap is represented by a 12x12 matrix or, equivalently, by a 144-dimensional vector with 0/1 coordinates.

Template Matching (3). Training samples are templates stored together with their class labels; the image to be recognized is converted into a template of the same form. Algorithm: compare the input template against every stored template and assign the class of the template that matches it (a sketch is given below).
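A minimal sketch of this matching step, assuming templates are stored as 0/1 vectors of length 144; using Hamming distance to pick the closest template when there is no exact match is an illustrative choice, not something stated on the slide.

import numpy as np

def template_match(input_vec, templates, labels):
    # input_vec: 144-dimensional 0/1 vector for the image to be recognized.
    # templates: (num_templates, 144) array of stored 0/1 templates.
    # labels:    class label of each stored template.
    diffs = np.count_nonzero(templates != input_vec, axis=1)  # Hamming distances
    best = int(np.argmin(diffs))  # an exact match gives distance 0
    return labels[best]

# Toy usage with length-4 templates instead of 144:
templates = np.array([[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]])
labels = ['A', 'B', 'C']
print(template_match(np.array([0, 1, 1, 1]), templates, labels))  # prints 'A' (tie with 'B' broken by first index)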

Template Matching (4). The number of templates to store is very large: in the worst case, one template per possible 12x12 binary image, i.e. up to 2^144. If fewer templates are stored, some images might not be recognized. Improvements: use fewer features, or use a better matching function.

Features. Features are numerically expressed properties of the signal. The set of features used for pattern recognition is called the feature vector, and the number of features used is the dimensionality of the feature vector. n-dimensional feature vectors can be represented as points in n-dimensional feature space. (Figure: feature vectors of two classes, Class 1 and Class 2, plotted as points in feature space.)

Guidelines for features. Use fewer features if possible: (1) this reduces the number of required training samples; (2) it improves the quality of the recognizing function. Use features that differentiate classes well: good features are, e.g., the elongation of the image or the presence of large loops or strokes; bad features are, e.g., the number of black pixels or the number of connected components.

Distance between feature vectors. Instead of looking for a template that exactly matches the input, look at how close the feature vectors are. Nearest neighbor classification algorithm: (1) find the template closest to the input pattern; (2) assign the pattern to the same class as that closest template.

Examples of distances in feature space (common choices are written out below).
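The distance formulas appeared as images on the original slide; the following are standard choices that fit this context, written in LaTeX. Which specific distances the slide actually showed is an assumption.

d_{\mathrm{Euclid}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\qquad
d_{\mathrm{city}}(x, y) = \sum_{i=1}^{n} |x_i - y_i|,
\qquad
d_{\mathrm{Mahal}}(x, y) = \sqrt{(x - y)^{\top} \Sigma^{-1} (x - y)},

where \Sigma is a covariance matrix estimated from the training data.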

K-nearest neighbor classifier. A modification of the nearest neighbor classifier: use the k nearest neighbors instead of one, and assign the pattern to the class most common among them (a sketch follows).
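A minimal k-NN sketch in NumPy, assuming Euclidean distance and majority voting (both are conventional choices, not stated on the slide); with k = 1 it reduces to the nearest neighbor classifier of the previous slide.

import numpy as np
from collections import Counter

def knn_classify(x, train_vectors, train_labels, k=3):
    # Euclidean distances from x to every training vector.
    dists = np.linalg.norm(train_vectors - x, axis=1)
    # Indices of the k closest training samples.
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]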

Clustering. Reduce the number of stored templates by keeping only the cluster centers. Clustering algorithms reveal the structure of classes in feature space and are used in unsupervised training.
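A minimal k-means sketch for finding cluster centers; the choice of k-means (rather than some other clustering algorithm) and the fixed iteration count are illustrative assumptions.

import numpy as np

def kmeans(points, k, num_iters=20, seed=0):
    points = np.asarray(points, dtype=float)      # (N, d) array of feature vectors
    rng = np.random.default_rng(seed)
    # Initialize centers with k randomly chosen training points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(num_iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return centers, assign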

Statistical pattern recognition. Treat patterns (feature vectors) as observations of a random variable (random vector). A random variable is characterized by its probability density function. (Figure: probability density function of a random variable together with a few observations.)

Bayes classification rule (1). Suppose we have two classes and we know the probability density functions of their feature vectors. How should a new pattern be classified?

Bayes classification rule (2). Bayes' formula expresses the posterior probability of a class in terms of the class-conditional density and the class prior; it is a consequence of the basic product rule of probability (both are written out below).
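The formulas appeared as images on the original slide; written in standard notation (the choice of symbols \omega_i for the classes and x for the feature vector is ours):

P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)},
\qquad
p(x) = \sum_{j} p(x \mid \omega_j)\, P(\omega_j),

which follows from the product rule

p(x, \omega_i) = p(x \mid \omega_i)\, P(\omega_i) = P(\omega_i \mid x)\, p(x).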

Bayes classification rule (3). Classify x into the class with the largest posterior probability. Using Bayes' formula, the rule can be rewritten in terms of class-conditional densities and priors (see below).
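In the same notation as above:

\text{decide } \omega_i \quad \text{if} \quad P(\omega_i \mid x) \ge P(\omega_j \mid x) \ \text{ for all } j,

or equivalently, since the denominator p(x) is the same for every class,

\text{decide } \omega_i \quad \text{if} \quad p(x \mid \omega_i)\, P(\omega_i) \ge p(x \mid \omega_j)\, P(\omega_j) \ \text{ for all } j.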

Estimating the probability density function. In applications, the probability density function of the class features is unknown. Solution: model the unknown probability density function of the class by some parametric function and determine its parameters from the training samples. Example: model the pdf as a Gaussian with unit (identity) covariance matrix and unknown mean (written out below).
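The model pdf, written out for a d-dimensional feature vector x with unknown mean \mu and identity covariance (the notation is ours):

p(x \mid \mu) = \frac{1}{(2\pi)^{d/2}} \exp\!\Big(-\tfrac{1}{2}\,\|x - \mu\|^2\Big).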

Maximum likelihood parameter estimation. What is the criterion for estimating the parameters? Maximum likelihood estimation: the parameter should maximize the likelihood of the observed training samples; equivalently, it should maximize the log-likelihood function (both are written out below).
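For independent training samples x_1, ..., x_N and a parameter \theta (the symbols are ours):

L(\theta) = \prod_{i=1}^{N} p(x_i \mid \theta),
\qquad
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta}\, \ell(\theta).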

ML estimate for a Gaussian pdf. To find an extremum of the log-likelihood with respect to \mu, we set its gradient to zero; the resulting estimate of the parameter is the sample mean of the training data (derivation below).
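For the unit-covariance Gaussian model above, the derivation is:

\ell(\mu) = \sum_{i=1}^{N} \log p(x_i \mid \mu)
          = -\frac{Nd}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{N} \|x_i - \mu\|^2,
\qquad
\nabla_{\mu}\,\ell(\mu) = \sum_{i=1}^{N} (x_i - \mu) = 0
\;\;\Longrightarrow\;\;
\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i.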

Mixture of Gaussian functions. For a mixture of Gaussians, no direct (closed-form) computation of the optimal parameter values is possible. Generic methods for finding extrema of nonlinear functions can be used: gradient descent, Newton's method, Lagrange multipliers. In practice, the expectation-maximization (EM) algorithm is usually used.
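The mixture density written in standard form (the symbols are ours); EM alternates between computing the responsibilities \gamma_{ik} below and re-estimating the component weights, means, and covariances from them:

p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}.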

Nonparametric pdf estimation. Histogram method: split the feature space into bins of width h and approximate p(x) by the fraction of the N training samples falling into the bin containing x, divided by the bin volume: p(x) ≈ k_bin / (N h^d), where k_bin is the number of samples in that bin and d is the dimensionality.

Nearest neighbor pdf estimation. Find the k nearest neighbors of x among the N training samples, and let V be the volume of the sphere containing these k samples. Then approximate the pdf by p(x) ≈ k / (N V).

Parzen windows. Each training point contributes one Parzen kernel function to the pdf construction; the estimate is the average of kernels of width h centered at the training points (a sketch is given below). It is important to choose a proper window width h.
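A minimal one-dimensional Parzen window (kernel density) sketch, assuming a Gaussian kernel; the kernel choice and the example bandwidth are illustrative assumptions.

import numpy as np

def parzen_pdf(x, samples, h):
    # Average of Gaussian kernels of width h centered at the training samples:
    # p(x) ~ (1 / (N h)) * sum_i K((x - x_i) / h)
    u = (x - samples) / h
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean() / h

# Example: estimate the density at x = 0.2 from a few 1-D training samples.
samples = np.array([0.0, 0.1, 0.3, 0.9])
print(parzen_pdf(0.2, samples, h=0.25))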

Parzen windows for cluster centers. Take the cluster centers as the centers of the Parzen kernel functions, and make the contribution of each cluster proportional to the number of training samples the cluster contains (i.e., weight each kernel by the fraction of samples assigned to that cluster).

Overfitting. When the number of trainable parameters is comparable to the number of training samples, the overfitting problem may appear: the approximation may work perfectly on the training data but poorly on the test data.