Document Analysis: Fundamentals of Pattern Recognition
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008
Outline
Introduction
Feature extraction and decision
Role of training
Feature selection
Example: font recognition
Bayesian decision theory
Evaluation
Goals of Pattern Recognition
Pattern recognition aims at discovering and identifying patterns in raw data:
it consists of assigning symbols (class labels) to data (patterns)
it is based on a priori knowledge, often statistical information
Pattern recognition is used for computer perception (image/sound analysis):
in a preliminary step, a sensor captures raw information
this information is then interpreted to take decisions
Pattern recognition can be thought of as a methodical way of reducing information in order to keep only the relevant meaning
Pattern Recognition Applications
Pattern recognition is involved in many applications:
seismological survey
speech recognition
scientific imagery (biology, health care, physics, ...)
satellite-based observation (military and civil applications, ...)
document analysis, with several components:
optical character recognition (OCR)
font identification
handwriting recognition (off-line)
graphics recognition
computer vision (3D scene analysis)
biometrics: person identification and authentication
...
Pattern recognition methodologies rely on other scientific domains: statistics, operations research, graph theory, artificial intelligence, ...
Origin of Difficulties
Pattern recognition is mainly an information overload problem
The difficulty arises from:
variability of objects belonging to the same class
distortion of captured data (noise, degradations, ...)
Steps Involved in Pattern Recognition
Pattern recognition is basically a two-stage process:
Feature extraction, aiming at removing redundancy while keeping significant information
Classification, consisting in making a decision by associating a class label
observation → feature vector → class (see the sketch below)
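To make the two stages concrete, here is a minimal Python sketch; the extract_features and classify helpers, the features chosen, and the toy model vectors are purely illustrative and are not part of the course material.

```python
import numpy as np

def extract_features(observation: np.ndarray) -> np.ndarray:
    """Reduce a raw observation (here a binary image) to a small feature vector."""
    weight = observation.sum()            # number of black pixels
    height, width = observation.shape     # image size
    return np.array([weight, height, width], dtype=float)

def classify(features: np.ndarray, models: dict) -> str:
    """Assign the class whose model (here simply a mean vector) is closest."""
    return min(models, key=lambda c: np.linalg.norm(features - models[c]))

# observation -> feature vector -> class
models = {"a": np.array([120.0, 20.0, 18.0]), "b": np.array([320.0, 28.0, 20.0])}
observation = np.zeros((28, 20)); observation[4:24, 2:18] = 1
print(classify(extract_features(observation), models))   # prints "b"
```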
Role of Training
[Diagram: features are extracted from the data; a classifier uses per-class models, built during training, to take a decision]
Classifiers (tools that perform classification tasks) are generally designed to be trained
Each class is characterized by a model
Models are built with representative training data
Supervised vs. Unsupervised Training
Two different situations may occur regarding training material:
Supervised training is performed when the training samples are labeled with the class they belong to; each class ω_i is associated with a set of training samples T_i = {x_i1, x_i2, ..., x_iNi}, supposed to be statistically representative of the class
Unsupervised training is performed when the training samples T = {x_1, x_2, ..., x_n} are statistically representative but mixed over all classes
Feature Selection
Features are selected according to the application
Features should be chosen carefully by considering:
discrimination power between classes
robustness to intra-class distortions and noise
global statistical independence (spread over the entire feature space)
fast computation
reasonable dimension (number of features)
Features for Character Recognition
Given a binary image of a character, many features can be used for character recognition (a computation sketch follows this list):
Size, i.e., width and height of the bounding box
Position of the baseline (if available)
Weight (number of black pixels)
Perimeter (length of the contours)
Center of gravity
Moments (second and third order in both directions)
Distributions of horizontal and vertical runs
Number of intersections with a (possibly random) set of lines
Length and structure (singular points, holes) of the skeleton
...
Local features computed on sub-images
...
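As an illustration, a minimal Python sketch computing a few of the listed features (bounding box, weight, center of gravity, horizontal run lengths) from a binary image stored as a NumPy array; the function name and the toy image are assumptions made for this example only.

```python
import numpy as np

def character_features(img: np.ndarray) -> dict:
    """Compute a few of the listed features from a binary character image
    (1 = black/ink, 0 = white background)."""
    ys, xs = np.nonzero(img)
    height = int(ys.max() - ys.min() + 1)      # bounding-box height
    width = int(xs.max() - xs.min() + 1)       # bounding-box width
    weight = int(img.sum())                    # number of black pixels
    cy, cx = ys.mean(), xs.mean()              # center of gravity
    # distribution of horizontal run lengths (maximal runs of 1s per row)
    runs = []
    for row in img:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                runs.append(run); run = 0
        if run:
            runs.append(run)
    return {"height": height, "width": width, "weight": weight,
            "center_of_gravity": (cy, cx),
            "horizontal_runs_mean": float(np.mean(runs)) if runs else 0.0}

img = np.zeros((16, 12), dtype=int); img[2:14, 3:9] = 1   # toy "character"
print(character_features(img))
```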
Font Recognition: Goal
Goal: recognize the font of synthetically generated isolated words, rendered as binary (black & white) or grey-level images at 300 dpi
12 standard font classes are considered:
3 families: Arial, Courier New, Times New Roman
4 styles: Plain, Italic, Bold, Bold Italic
single size: 12 pt
Font Recognition: Extracted Features
Words are segmented with a surrounding white border of 1 pixel
Some preprocessing steps are used:
horizontal projection profile (hp)
derivative of the horizontal projection profile (hpd)
The following features are calculated (a computation sketch follows this list):
hp-mean (or density): mean of hp
hpd-stdev (or slanting): standard deviation of hpd
hr-mean: mean of horizontal runs (up to length 12)
hr-stdev: standard deviation of horizontal runs (up to length 12)
vr-mean: mean of vertical runs (up to length 12)
vr-stdev: standard deviation of vertical runs (up to length 12)
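One possible way to compute these six features from a binary word image is sketched below; the exact preprocessing of the original system is not specified here, and capping run lengths at 12 is only one plausible reading of "up to length 12".

```python
import numpy as np

def font_features(img: np.ndarray, max_run: int = 12) -> dict:
    """Projection-profile and run-length features for a binary word image
    (1 = black). A sketch only; the original preprocessing may differ."""
    hp = img.sum(axis=1)          # horizontal projection profile (black pixels per row)
    hpd = np.diff(hp)             # its discrete derivative

    def run_lengths(lines):
        runs = []
        for line in lines:
            run = 0
            for pixel in line:
                if pixel:
                    run += 1
                elif run:
                    runs.append(min(run, max_run)); run = 0
            if run:
                runs.append(min(run, max_run))
        return np.array(runs, dtype=float)

    hr = run_lengths(img)         # horizontal runs (along rows)
    vr = run_lengths(img.T)       # vertical runs (along columns)
    return {"hp-mean": hp.mean(), "hpd-stdev": hpd.std(),
            "hr-mean": hr.mean(), "hr-stdev": hr.std(),
            "vr-mean": vr.mean(), "vr-stdev": vr.std()}
```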
Font Recognition: Illustration of Features
The basic image processing features used are:
horizontal projection profile
distribution of horizontal runs (from 1 to 11)
distribution of vertical runs (from 1 to 11)
Font Recognition: Decision Boundaries on a Single Feature (1)
Some single features are highly discriminant for some font sets:
hpd-stdev discriminates roman and italic fonts
hr-mean discriminates normal and bold fonts
Font Recognition: Decision Boundaries on a Single Feature (2)
Other features may partly discriminate font sets:
hr-mean can partly discriminate Arial, Courier and Times
Font Recognition: Decision Boundaries on Multiple Features (1)
By combining two features, font discrimination is improved:
the pair (hpd-stdev, vr-stdev) discriminates roman and italic fonts
Font Recognition: Decision Boundaries on Multiple Features (2)
Font family discrimination (Arial, Courier and Times) becomes possible by combining several pairs of features
Bayesian Decision Theory
Bayesian decision making assumes that all information contributing to the decision can be stated in the form of probabilities:
P(ω_i): the a priori probability (or prior) of each class ω_i
p(x|ω_i): the class-conditional density function of the feature vector x, also called the likelihood of class ω_i with respect to x
The goal is to determine the class ω_i for which the a posteriori probability (or posterior) P(ω_i|x) is the highest
Bayesian Rule
Bayes' rule makes it possible to calculate the a posteriori probability of each class as a function of priors and likelihoods:
P(ω_i|x) = p(x|ω_i) P(ω_i) / p(x)
where p(x), called the evidence, can be considered as a normalization factor:
p(x) = Σ_j p(x|ω_j) P(ω_j)
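A tiny numeric illustration of the rule, with made-up priors and likelihoods:

```python
# Toy illustration of Bayes' rule (invented priors and likelihoods).
priors = {"w1": 0.1, "w2": 0.9}           # P(w_i)
likelihoods = {"w1": 0.6, "w2": 0.2}      # p(x | w_i) for one observed x

evidence = sum(likelihoods[w] * priors[w] for w in priors)           # p(x)
posteriors = {w: likelihoods[w] * priors[w] / evidence for w in priors}
print(posteriors)   # {'w1': 0.25, 'w2': 0.75} -> decide w2
```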
Influence of Posterior Probabilities
Example with a single feature: posterior probabilities P(ω_1|x) and P(ω_2|x) in two different cases regarding the a priori probabilities
[Figure, two panels: left with P(ω_1) = 0.5, P(ω_2) = 0.5; right with P(ω_1) = 0.1, P(ω_2) = 0.9]
Probability of Error
Given the feature value x of a sample, the probability of error for the decision α(x) = ω_i is
P(error|x) = 1 − P(ω_i|x)
The overall probability of error is given by
P(error) = ∫ P(error|x) p(x) dx
Optimal Decision Boundaries
The minimal error is obtained by the decision α(x) = ω_i with
P(ω_i|x) ≥ P(ω_j|x) for all j
i.e., by choosing the class with the highest posterior probability
Decision Theory
In the simplest case, a decision consists in assigning to an observation x a class label: α(x) = ω_i
A natural extension consists in adding a "rejection class" ω_R, so that the decision α(x) = ω_R is also possible
In the most general case, the decision results in an action: α(x) = α_i
Optimal Decision Theory
Let us consider a loss function λ(α_i|ω_j) defining the loss incurred by taking action α_i when the true state of nature is ω_j; usually λ(α_i|ω_i) = 0, i.e., a correct decision incurs no loss
The risk of taking action α_i for a particular sample x is
R(α_i|x) = Σ_j λ(α_i|ω_j) P(ω_j|x)
The optimal decision consists in choosing the action α_i that minimizes this risk (see the numeric sketch below)
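A small numeric sketch of the minimum-risk decision, with an invented loss matrix and posteriors; note how a costly error can flip the decision away from the most probable class:

```python
import numpy as np

# Minimum-risk decision for a toy two-class problem (made-up numbers).
posteriors = np.array([0.3, 0.7])       # P(w_1|x), P(w_2|x)
# loss[i, j] = loss of taking action a_i when the true class is w_j
loss = np.array([[0.0, 1.0],            # a_1: decide w_1
                 [5.0, 0.0]])           # a_2: decide w_2 (expensive if wrong)

risks = loss @ posteriors               # R(a_i|x) = sum_j loss[i, j] * P(w_j|x)
best_action = int(np.argmin(risks))
print(risks, "-> take action", best_action + 1)   # [0.7 1.5] -> take action 1
```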
Optimal Decision
When λ_ii = 0 and λ_ij = 1 for all j ≠ i (zero-one loss), the optimal decision consists in minimizing the probability of error
The minimal error is obtained by the decision α(x) = ω_i with
P(ω_i|x) ≥ P(ω_j|x) for all j
or equivalently
p(x|ω_i) P(ω_i) ≥ p(x|ω_j) P(ω_j) for all j
In the case where all a priori probabilities are equal, this reduces to
p(x|ω_i) ≥ p(x|ω_j) for all j
Minimum Risk for Two Classes
Let λ_ij = λ(α_i|ω_j) be the loss of taking action α_i when the true state is ω_j
The conditional risks of the two decisions are
R(α_1|x) = λ_11 P(ω_1|x) + λ_12 P(ω_2|x)
R(α_2|x) = λ_21 P(ω_1|x) + λ_22 P(ω_2|x)
The optimal decision rule then becomes: decide ω_1 if
(λ_21 − λ_11) P(ω_1|x) > (λ_12 − λ_22) P(ω_2|x)
or equivalently if
p(x|ω_1) / p(x|ω_2) > (λ_12 − λ_22) P(ω_2) / ((λ_21 − λ_11) P(ω_1))
In the case of λ_11 = λ_22 = 0, this simplifies to
p(x|ω_1) / p(x|ω_2) > λ_12 P(ω_2) / (λ_21 P(ω_1))
Discriminant Functions
In the case of multiple classes, a pattern classifier can be specified by a set of discriminant functions g_i(x), such that the decision ω_i corresponds to
g_i(x) ≥ g_j(x) for all j
Thus, a Bayesian classifier is naturally represented by g_i(x) = P(ω_i|x)
The choice of discriminant functions is not unique: g_i(x) can be replaced by f(g_i(x)) for any monotonically increasing function f
A minimum error-rate classifier can also be obtained with g_i(x) = ln p(x|ω_i) + ln P(ω_i) (see the sketch below)
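As a sketch, discriminant functions of this last form can be evaluated directly once class models are assumed; here one-dimensional Gaussian class-conditional densities with invented parameters stand in for whatever models a real system would estimate:

```python
import numpy as np

# Discriminant functions g_i(x) = ln p(x|w_i) + ln P(w_i) for a single feature,
# assuming Gaussian class-conditional densities (illustrative parameters only).
classes = {
    "w1": {"prior": 0.4, "mean": 0.0, "std": 1.0},
    "w2": {"prior": 0.6, "mean": 2.0, "std": 1.5},
}

def log_gaussian(x, mean, std):
    """Log of the univariate normal density."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def g(x, c):
    return log_gaussian(x, c["mean"], c["std"]) + np.log(c["prior"])

def decide(x):
    return max(classes, key=lambda name: g(x, classes[name]))

print(decide(0.3), decide(1.8))   # picks the class with the largest g_i(x): w1 w2
```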
Bayesian Rule in Higher Dimensions
The Bayesian rule can easily be generalized to the multidimensional case, where features are represented by a vector x:
P(ω_i|x) = p(x|ω_i) P(ω_i) / p(x)
where
p(x) = Σ_j p(x|ω_j) P(ω_j)
and p(x|ω_i) is now a multivariate density
Conclusion about Bayesian Decision
Bayesian decision theory provides a theoretical framework for statistical pattern recognition
This theory supposes the following probabilistic information to be known:
the number of classes
the a priori probability of each class
the class-conditional feature distribution of each class
The remaining problem is how to estimate all of this:
feature distributions are hard to estimate
priors are seldom known
even the number of classes is not always given
Performance Evaluation
Performance evaluation is a very important issue in pattern recognition:
it gives an objective measure of the performance
it allows different methods to be compared
Performance evaluation requires correctly labeled test data:
test data should be different from training data
a common strategy (cross-validation) consists in cyclically using 80% of the data for training and the remaining 20% for evaluation (see the sketch below)
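A minimal sketch of this cyclic 80/20 strategy (5-fold cross-validation); the train and evaluate callbacks are placeholders for whatever classifier is being assessed:

```python
import numpy as np

def cross_validate(samples, labels, train, evaluate, n_folds=5, seed=0):
    """Cyclically hold out 1/n_folds of the data (20% when n_folds=5) for testing.
    samples and labels are NumPy arrays; train/evaluate are user-supplied callbacks."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    folds = np.array_split(order, n_folds)
    scores = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = train(samples[train_idx], labels[train_idx])
        scores.append(evaluate(model, samples[test_idx], labels[test_idx]))
    return float(np.mean(scores))   # average score over the n_folds test splits
```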
Performance Measures: Recognition / Error Rates
Performance evaluation uses several measures:
recognition rate: ratio of the number of correct answers to the total number of answers
error rate: ratio of the number of incorrect answers to the total number of answers
rejection rate: ratio of the number of rejections to the total number of answers
These satisfy: recognition rate = 1 − (rejection rate + error rate)
Performance Measures: Recall & Precision
For binary decisions (a sample belongs to the class or not), two other measures are frequently used:
recall: ratio of the correctly assigned samples to the size of the class
precision: ratio of the correctly assigned samples to the number of assigned samples
Recall and precision typically change in opposite directions:
the equal error rate is sometimes considered to be the best trade-off
Additionally, the harmonic mean of precision and recall, called the F-measure, is frequently used (see the sketch below)
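A small Python sketch of these three measures, computed from sets of assigned and truly relevant samples (toy data):

```python
def precision_recall_f(assigned: set, relevant: set) -> tuple:
    """assigned: samples the classifier put in the class;
    relevant: samples that truly belong to the class."""
    correct = len(assigned & relevant)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)   # harmonic mean
    return precision, recall, f_measure

print(precision_recall_f({1, 2, 3, 4}, {2, 3, 5, 6, 7}))  # (0.5, 0.4, ~0.444)
```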