Document Analysis: Fundamentals of Pattern Recognition
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008

Outline
- Introduction
- Feature extraction and decision
- Role of training
- Feature selection
- Example: font recognition
- Bayesian decision theory
- Evaluation

Goals of Pattern Recognition
- Pattern recognition aims at discovering and identifying patterns in raw data
  - it consists of assigning symbols (class labels) to data (patterns)
  - it is based on a priori knowledge, often statistical information
- Pattern recognition is used for computer perception (image/sound analysis)
  - in a preliminary step, a sensor captures raw information
  - this information is then interpreted in order to make decisions
- Pattern recognition can be thought of as a methodical way of reducing the information so as to keep only the relevant meaning

Pattern Recognition Applications
- Pattern recognition is involved in many applications:
  - seismological surveys
  - speech recognition
  - scientific imagery (biology, health care, physics, ...)
  - satellite-based observation (military and civil applications, ...)
  - document analysis, with several components:
    - optical character recognition (OCR)
    - font identification
    - handwriting recognition (off-line)
    - graphics recognition
  - computer vision (3D scene analysis)
  - biometrics: person identification and authentication
  - ...
- Pattern recognition methodologies rely on other scientific domains: statistics, operations research, graph theory, artificial intelligence, ...

Origin of Difficulties
- Pattern recognition is mainly an information overload problem
- The difficulty arises from
  - the variability of objects belonging to the same class
  - distortion of the captured data (noise, degradations, ...)

Steps Involved in Pattern Recognition
- Pattern recognition is basically a two-stage process:
  - feature extraction, aiming at removing redundancy while keeping the significant information
  - classification, consisting in making a decision by associating a class label with the observation
(Diagram: observation → feature vector → class)

Role of Training
(Diagram: extraction of features, then decision into classes; training produces the models used by the decision stage)
- Classifiers (tools that perform classification tasks) are generally designed to be trained
- Each class is characterized by a model
- Models are built from representative training data (a minimal sketch of such a trained pipeline follows below)
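As a minimal illustration of this two-stage structure, here is a hedged sketch in Python; the feature extractor and the nearest-mean classifier are arbitrary placeholders chosen for illustration, not the methods used in the course, and all names and values are hypothetical:

```python
import numpy as np

def extract_features(observation: np.ndarray) -> np.ndarray:
    """Feature extraction: reduce the raw observation to a small feature vector."""
    return np.array([observation.mean(), observation.std()])   # illustrative features only

def classify(features: np.ndarray, models: dict) -> str:
    """Decision: assign the label of the nearest class model (mean feature vector)."""
    return min(models, key=lambda c: np.linalg.norm(features - models[c]))

# hypothetical class models, in practice built from representative training data
models = {"omega_1": np.array([0.2, 0.1]), "omega_2": np.array([0.8, 0.3])}
print(classify(extract_features(np.random.rand(32, 32)), models))
```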

Supervised vs. Unsupervised Training
- Two different situations may occur regarding the training material:
  - Supervised training is performed when the training samples are labeled with the class they belong to
    - each class $\omega_i$ is associated with a set of training samples $T_i = \{x_{i1}, x_{i2}, \dots, x_{iN_i}\}$, assumed to be statistically representative of the class
  - Unsupervised training is performed when the training samples are statistically representative but mixed over all classes: $T = \{x_1, x_2, \dots, x_n\}$

Feature Selection
- Features are selected according to the application
- Features should be chosen carefully by considering
  - discrimination power between classes
  - robustness to intra-class distortions and noise
  - global statistical independence (spread over the entire feature space)
  - "fast computation"
  - reasonable dimension (number of features)

Features for Character Recognition
- Given a binary image of a character, many features can be used for character recognition:
  - size, i.e., width and height of the bounding box
  - position of the baseline (if available)
  - weight (number of black pixels)
  - perimeter (length of the contours)
  - center of gravity
  - moments (second and third order in both directions)
  - distributions of horizontal and vertical runs
  - number of intersections with a (possibly random) set of lines
  - length and structure (singular points, holes) of the skeleton
  - ...
  - local features computed on sub-images
  - ...
A few of these features are computed in the sketch below.
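A minimal sketch (assuming a NumPy boolean array `img`, True = black pixel, with at least one black pixel; the function and feature names are hypothetical) of how a few of the simpler features above could be computed:

```python
import numpy as np

def character_features(img: np.ndarray) -> dict:
    """Compute a few simple features of a binary character image.

    img: 2-D boolean array, True = black (foreground) pixel; assumed non-empty.
    """
    ys, xs = np.nonzero(img)                      # coordinates of black pixels
    height = ys.max() - ys.min() + 1              # bounding-box height
    width = xs.max() - xs.min() + 1               # bounding-box width
    weight = int(img.sum())                       # number of black pixels
    cy, cx = ys.mean(), xs.mean()                 # center of gravity
    mu20 = ((xs - cx) ** 2).mean()                # second-order central moments
    mu02 = ((ys - cy) ** 2).mean()
    return {"width": int(width), "height": int(height), "weight": weight,
            "center": (cy, cx), "mu20": mu20, "mu02": mu02}
```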

Font Recognition: Goal
- Goal: recognize the fonts of synthetically generated isolated words
  - as binary (black & white) or grey-level images
  - at 300 dpi
- 12 standard font classes are considered:
  - 3 families: Arial, Courier New, Times New Roman
  - 4 styles: plain, italic, bold, bold italic
  - a single size: 12 pt

Font Recognition: Extracted Features
- Words are segmented with a surrounding white border of 1 pixel
- Some preprocessing steps are used:
  - horizontal projection profile (hp)
  - derivative of the horizontal projection profile (hpd)
- The following features are calculated (see the sketch below):
  - hp-mean (or density): mean of hp
  - hpd-stdev (or slanting): standard deviation of hpd
  - hr-mean: mean of horizontal runs (up to length 12)
  - hr-stdev: standard deviation of horizontal runs (up to length 12)
  - vr-mean: mean of vertical runs (up to length 12)
  - vr-stdev: standard deviation of vertical runs (up to length 12)
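The following is a hedged sketch of how such features might be computed, not the course's reference implementation; it assumes a NumPy boolean array `img` (True = black pixel) and simply drops runs longer than 12:

```python
import numpy as np

def run_lengths(line: np.ndarray, max_len: int = 12) -> list:
    """Lengths of consecutive runs of black pixels in a 1-D boolean array."""
    runs, count = [], 0
    for px in line:
        if px:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return [r for r in runs if r <= max_len]      # keep only runs up to max_len

def font_features(img: np.ndarray) -> dict:
    hp = img.sum(axis=1).astype(float)            # horizontal projection profile
    hpd = np.diff(hp)                             # its derivative
    h_runs = [r for row in img for r in run_lengths(row)]
    v_runs = [r for col in img.T for r in run_lengths(col)]
    return {"hp-mean": hp.mean(),                 # density
            "hpd-stdev": hpd.std(),               # slanting
            "hr-mean": np.mean(h_runs), "hr-stdev": np.std(h_runs),
            "vr-mean": np.mean(v_runs), "vr-stdev": np.std(v_runs)}
```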

Font Recognition: Illustration of Features
- The basic image processing features used are
  - the horizontal projection profile
  - the distribution of horizontal runs (from 1 to 11)
  - the distribution of vertical runs (from 1 to 11)

Font Recognition: Decision Boundaries on a Single Feature (1)
- Some single features are highly discriminant for some font sets
  - hpd-stdev discriminates roman and italic fonts
  - hr-mean discriminates normal and bold fonts

Font Recognition: Decision Boundaries on a Single Feature (2)
- Other features may partly discriminate font sets
  - hr-mean can partly discriminate Arial, Courier and Times

Font Recognition: Decision Boundaries on Multiple Features (1)
- By combining two features, font discrimination is improved
  - the pair (hpd-stdev, vr-stdev) discriminates roman and italic fonts
(Scatter plot with axes hpd-stdev and vr-stdev)

Font Recognition: Decision Boundaries on Multiple Features (2)
- Font family discrimination (Arial, Courier and Times) becomes possible by combining several pairs of features

Bayesian Decision Theory
- Bayesian decision theory assumes that all information contributing to the decision can be stated in the form of probabilities:
  - $P(\omega_i)$: the a priori probability (or prior) of each class $\omega_i$
  - $p(x|\omega_i)$: the class-conditional density function of the feature vector $x$, also called the likelihood of class $\omega_i$ with respect to $x$
- The goal is to determine the class $\omega_i$ for which the a posteriori probability (or posterior) $P(\omega_i|x)$ is the highest

Bayesian Rule
- Bayes' rule allows the a posteriori probability of each class to be calculated as a function of the priors and likelihoods:
  $$P(\omega_i|x) = \frac{p(x|\omega_i)\,P(\omega_i)}{p(x)}$$
  where $p(x)$ is called the evidence and can be considered as a normalization factor, i.e.,
  $$p(x) = \sum_j p(x|\omega_j)\,P(\omega_j)$$
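A small numerical sketch of Bayes' rule; the priors and likelihood values below are made-up illustration numbers:

```python
import numpy as np

priors = np.array([0.5, 0.5])            # P(w_i), hypothetical values
likelihoods = np.array([0.30, 0.05])     # p(x|w_i) for one observation x

evidence = np.sum(likelihoods * priors)  # p(x), the normalization factor
posteriors = likelihoods * priors / evidence
print(posteriors)                        # P(w_i|x), sums to 1
```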

Influence of Posterior Probabilities
- Example with a single feature: posterior probabilities in two different cases regarding the a priori probabilities
(Two plots of the posteriors $P(\omega_1|x)$ and $P(\omega_2|x)$: one with $P(\omega_1)=0.5$, $P(\omega_2)=0.5$, the other with $P(\omega_1)=0.1$, $P(\omega_2)=0.9$)

Probability of Error
- Given the feature $x$ of a sample, the probability of error for the decision $\alpha(x)=\omega_i$ is
  $$P(\mathrm{error}\,|\,x) = 1 - P(\omega_i|x)$$
- The overall probability of error is given by
  $$P(\mathrm{error}) = \int P(\mathrm{error}\,|\,x)\,p(x)\,dx$$

Optimal Decision Boundaries
- The minimal error is obtained by the decision $\alpha(x)=\omega_i$ with
  $$P(\omega_i|x) \ge P(\omega_j|x) \quad \text{for all } j$$
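A minimal sketch of this minimum-error (maximum a posteriori) decision, reusing the same kind of hypothetical arrays as in the Bayes-rule example above:

```python
import numpy as np

def decide(likelihoods: np.ndarray, priors: np.ndarray) -> int:
    """Return the index i of the class maximizing the posterior P(w_i|x)."""
    posteriors = likelihoods * priors       # proportional to P(w_i|x)
    return int(np.argmax(posteriors))       # the evidence p(x) does not change the argmax

print(decide(np.array([0.30, 0.05]), np.array([0.5, 0.5])))   # -> 0
print(decide(np.array([0.30, 0.05]), np.array([0.1, 0.9])))   # -> 1 (strong prior flips it)
```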

Decision Theory
- In the simplest case, a decision consists in assigning to an observation $x$ a class label $\omega_i = \alpha(x)$
- A natural extension consists in adding a "rejection class" $\omega_R$, so that $\alpha(x)$ may also take the value $\omega_R$
- In the most general case, the decision results in an action $\alpha_i = \alpha(x)$

Optimal Decision Theory
- Let us consider a loss function $\lambda(\alpha_i|\omega_j)$ defining the loss incurred by taking action $\alpha_i$ when the true state of nature is $\omega_j$; usually $\lambda(\alpha_i|\omega_i) = 0$ (no loss for a correct decision)
- The risk of taking action $\alpha_i$ for a particular sample $x$ is
  $$R(\alpha_i|x) = \sum_j \lambda(\alpha_i|\omega_j)\,P(\omega_j|x)$$
- The optimal decision consists in choosing the action $\alpha_i$ that minimizes this risk

Optimal Decision
- When $\lambda(\alpha_i|\omega_i) = 0$ and $\lambda(\alpha_i|\omega_j) = 1$ for all $j \neq i$, the optimal decision consists in minimizing the probability of error
- The minimal error is obtained by the decision $\alpha(x)=\omega_i$ with
  $$P(\omega_i|x) \ge P(\omega_j|x) \quad \text{for all } j$$
  or, equivalently,
  $$p(x|\omega_i)\,P(\omega_i) \ge p(x|\omega_j)\,P(\omega_j) \quad \text{for all } j$$
- In the case where all a priori probabilities are equal, this reduces to
  $$p(x|\omega_i) \ge p(x|\omega_j) \quad \text{for all } j$$

Minimum Risk for Two Classes
- Let $\lambda_{ij} = \lambda(\alpha_i|\omega_j)$ be the loss of action $\alpha_i$ when the true state is $\omega_j$
- The conditional risk of each decision is expressed as
  $$R(\alpha_1|x) = \lambda_{11} P(\omega_1|x) + \lambda_{12} P(\omega_2|x), \qquad R(\alpha_2|x) = \lambda_{21} P(\omega_1|x) + \lambda_{22} P(\omega_2|x)$$
- Then the optimal decision rule (decide $\omega_1$) becomes
  $$(\lambda_{21}-\lambda_{11})\,P(\omega_1|x) > (\lambda_{12}-\lambda_{22})\,P(\omega_2|x)$$
  or, equivalently,
  $$(\lambda_{21}-\lambda_{11})\,p(x|\omega_1)\,P(\omega_1) > (\lambda_{12}-\lambda_{22})\,p(x|\omega_2)\,P(\omega_2)$$
- In the case where $\lambda_{11} < \lambda_{21}$ and $\lambda_{22} < \lambda_{12}$, this can be written as a likelihood-ratio test:
  $$\frac{p(x|\omega_1)}{p(x|\omega_2)} > \frac{(\lambda_{12}-\lambda_{22})\,P(\omega_2)}{(\lambda_{21}-\lambda_{11})\,P(\omega_1)}$$
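A minimal sketch of the two-class minimum-risk decision; the loss matrix and posterior values are made-up illustration numbers:

```python
import numpy as np

# lam[i, j] = loss of taking action alpha_i when the true class is omega_j
lam = np.array([[0.0, 2.0],                  # hypothetical loss matrix
                [1.0, 0.0]])

def min_risk_decision(posteriors: np.ndarray) -> int:
    """Return the action index minimizing the conditional risk R(alpha_i|x)."""
    risks = lam @ posteriors                 # R(alpha_i|x) = sum_j lam_ij * P(w_j|x)
    return int(np.argmin(risks))

print(min_risk_decision(np.array([0.6, 0.4])))   # risks = [0.8, 0.6] -> action 1
```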

Discriminant Functions
- In the case of multiple classes, a pattern classifier can be specified by a set of discriminant functions $g_i(x)$ such that the decision $\omega_i$ corresponds to
  $$g_i(x) > g_j(x) \quad \text{for all } j \neq i$$
- Thus, a Bayesian classifier is naturally represented by $g_i(x) = P(\omega_i|x)$
- The choice of discriminant functions is not unique
  - $g_i(x)$ can be replaced by $f(g_i(x))$ for any monotonically increasing function $f$
- A minimum error-rate classifier can be obtained with
  $$g_i(x) = \ln p(x|\omega_i) + \ln P(\omega_i)$$
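A minimal sketch of the logarithmic discriminant $g_i(x) = \ln p(x|\omega_i) + \ln P(\omega_i)$, assuming (purely for illustration) one-dimensional Gaussian class-conditional densities with hypothetical parameters:

```python
import numpy as np

def g(x: float, mean: float, var: float, prior: float) -> float:
    """Discriminant g_i(x) = ln p(x|w_i) + ln P(w_i) for a 1-D Gaussian class."""
    log_likelihood = -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)
    return log_likelihood + np.log(prior)

# two hypothetical classes evaluated at x = 1.2
scores = [g(1.2, mean=0.0, var=1.0, prior=0.3),
          g(1.2, mean=2.0, var=1.0, prior=0.7)]
print(int(np.argmax(scores)))            # index of the decided class
```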

Bayesian Rule in Higher Dimensions
- The Bayes rule can easily be generalized to the multidimensional case, where the features are represented by a vector $\mathbf{x}$:
  $$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\,P(\omega_i)}{p(\mathbf{x})}$$
  where
  $$p(\mathbf{x}) = \sum_j p(\mathbf{x}|\omega_j)\,P(\omega_j)$$
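A sketch of the same posterior computation for feature vectors, assuming (again purely for illustration) multivariate Gaussian class-conditional densities with hypothetical parameters:

```python
import numpy as np

def gaussian_pdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Multivariate normal density p(x|w_i)."""
    d = len(x)
    diff = x - mean
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff))

x = np.array([0.5, 1.0])                          # observed feature vector
means = [np.zeros(2), np.ones(2)]                 # hypothetical class means
cov = np.eye(2)                                   # shared identity covariance
priors = np.array([0.4, 0.6])

likelihoods = np.array([gaussian_pdf(x, m, cov) for m in means])
posteriors = likelihoods * priors / np.sum(likelihoods * priors)
print(posteriors)                                 # P(w_i|x)
```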

Conclusion about Bayesian Decision
- Bayesian decision theory provides a theoretical framework for statistical pattern recognition
- This theory assumes the following probabilistic information to be known:
  - the number of classes
  - the a priori probability of each class
  - the class-conditional feature distribution of each class
- The remaining problem is how to estimate all of these quantities:
  - feature distributions are hard to estimate
  - priors are seldom known
  - even the number of classes is not always given

Performance Evaluation
- Performance evaluation is a very important issue in pattern recognition
  - it gives an objective measure of the performance
  - it allows different methods to be compared
- Performance evaluation requires correctly labeled test data
  - the test data should be different from the training data
  - one strategy consists in cyclically using 80% of the data for training and the remaining 20% for evaluation (see the sketch below)
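A minimal sketch of this cyclic 80%/20% strategy (i.e., 5-fold cross-validation); `train` and `evaluate` are hypothetical callables, and `samples`/`labels` are assumed to be NumPy arrays:

```python
import numpy as np

def cross_validate(samples, labels, train, evaluate, n_folds: int = 5) -> float:
    """Cyclically hold out 1/n_folds of the data for testing, train on the rest."""
    indices = np.arange(len(samples))
    scores = []
    for test_idx in np.array_split(indices, n_folds):
        train_idx = np.setdiff1d(indices, test_idx)
        model = train(samples[train_idx], labels[train_idx])        # 80% for training
        scores.append(evaluate(model, samples[test_idx], labels[test_idx]))  # 20% for evaluation
    return float(np.mean(scores))
```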

Performance Measures: Recognition / Error Rates
- Performance evaluation uses several measures:
  - the recognition rate is the ratio of the number of correct answers to the total number of answers
  - the error rate is the ratio of the number of incorrect answers to the total number of answers
  - the rejection rate is the ratio of the number of rejections to the total number of answers
- These rates satisfy: recognition rate = 1 - (rejection rate + error rate)
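A minimal sketch of these three rates, using a hypothetical `REJECT` label to mark rejected samples:

```python
REJECT = "reject"   # hypothetical label used for rejected samples

def rates(predictions, truths):
    """Return (recognition_rate, error_rate, rejection_rate)."""
    total = len(predictions)
    rejected = sum(p == REJECT for p in predictions)
    correct = sum(p == t for p, t in zip(predictions, truths) if p != REJECT)
    errors = total - rejected - correct
    return correct / total, errors / total, rejected / total

print(rates(["a", "b", REJECT, "a"], ["a", "a", "b", "a"]))  # (0.5, 0.25, 0.25)
```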

Performance Measures: Recall & Precision
- For binary decisions (a sample belongs to the class or not), two other measures are frequently used:
  - recall is the ratio of the number of correctly assigned samples to the size of the class
  - precision is the ratio of the number of correctly assigned samples to the number of assigned samples
- Recall and precision move in opposite directions
  - the equal error rate is sometimes considered the best trade-off
- Additionally, the harmonic mean of precision and recall, called the F-measure, is frequently used:
  $$F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
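A minimal sketch of recall, precision and F-measure for one class of interest; the sample data in the example call are made up:

```python
def recall_precision_f(predictions, truths, target):
    """Recall, precision and F-measure for the class `target`."""
    true_pos = sum(p == target and t == target for p, t in zip(predictions, truths))
    class_size = sum(t == target for t in truths)           # all samples of the class
    assigned = sum(p == target for p in predictions)        # all samples assigned to it
    recall = true_pos / class_size
    precision = true_pos / assigned
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure

print(recall_precision_f(["a", "a", "b", "a"], ["a", "b", "b", "b"], target="a"))
```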