Published by Baldwin Britton Dixon. Modified over 8 years ago.
1
OCR: a survey
Csink László, 2009
2
2 Problems to Solve
– Recognize good quality printed text
– Recognize neatly written handprinted text
– Recognize omnifont machine-printed text
– Deal with degraded, bad quality documents
– Recognize unconstrained handwritten text
– Lower substitution error rates
– Lower rejection rates
3
3 OCR According to the Nature of Input
4
4 Feature Extraction
– A large number of feature extraction methods for OCR are available in the literature
– Which method suits which application?
5
5 A Typical OCR System
1. Gray-level scanning (300-600 dpi)
2. Preprocessing
   – Binarization using a global or locally adaptive method
   – Segmentation to isolate individual characters
   – (optional) Conversion to another character representation (e.g. skeleton or contour curve)
3. Feature extraction
4. Recognition using classifiers
5. Contextual verification or post-processing
6
6 Feature Extraction (Devijver and Kittler)
– Feature extraction = the problem of extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class pattern variability while enhancing the between-class pattern variability
– Extracted features must be invariant to the expected distortions and variations
– Curse of dimensionality: if the training set is small, the number of features cannot be high either
– Rule of thumb: number of training patterns ≈ 10 × (dimension of the feature vector)
7
7 Some issues
– Do the characters have known orientation and size?
– Are they handwritten, machine-printed or typed?
– Degree of degradation?
– If a character may be written in two ways (e.g. ‘a’ or ‘α’), it might be represented by two patterns
8
8 Variations of the same character
– Size invariance can be achieved by normalization, but normalization can cause discontinuities in the character
– Rotation invariance is important if characters may appear in any orientation (P or d?)
– Skew invariance is important for hand-printed text or multifont machine-printed text
9
9 Features Extracted from Grayscale Images
Goal: locate candidate characters. If the image is binarized, one may find the connected components of expected character size with a flood-fill type algorithm (4-way recursive method, 8-way recursive method, non-recursive scanline method, etc.; see http://www.codeproject.com/KB/GDI/QuickFill.aspx). The bounding box of each component is then found.
A grayscale method is typically used when recognition based on the binary representation fails. In that case localization may be difficult. Assuming that there is a standard size for a character, one may simply try all possible locations. In the best case, localization yields a subimage containing one character and no other objects.
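The connected-component step can be sketched as follows. This is a minimal illustration, assuming a binary image stored as a list of rows with 1 for black; it uses a simple queue-based 4-way fill rather than the scanline variant from the linked article, and the name `connected_components` is made up for this example.

```python
from collections import deque

def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 4-connected black regions.

    img is a list of rows; 1 = black (ink), 0 = white (background).
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Non-recursive flood fill starting from the seed pixel (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

page = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
boxes = connected_components(page)
```

Components whose bounding box is far from the expected character size could then be discarded before feature extraction.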
10
10 Template Matching (not often used in OCR systems for grayscale characters)
No feature extraction is used; the template character image itself is compared to the input character image using a dissimilarity measure D_j, where the character Z and the template T_j are of the same size and the summation is taken over all the M pixels of Z. The problem is to find the j for which D_j is minimal; Z is then identified with T_j.
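The exact form of D_j is not spelled out in the text above. A common choice, assumed here purely for illustration, is the sum of squared gray-level differences over all M pixels. The function names and the tiny 3×3 images below are hypothetical.

```python
def dissimilarity(z, t):
    """Sum of squared gray-level differences over all pixels (assumed measure)."""
    if len(z) != len(t) or any(len(a) != len(b) for a, b in zip(z, t)):
        raise ValueError("character and template must be the same size")
    return sum((zv - tv) ** 2 for zr, tr in zip(z, t) for zv, tv in zip(zr, tr))

def best_template(z, templates):
    """Return the index j that minimizes D_j, together with D_j."""
    scores = [dissimilarity(z, t) for t in templates]
    j = min(range(len(scores)), key=scores.__getitem__)
    return j, scores[j]

templates = [
    [[0, 255, 0], [0, 255, 0], [0, 255, 0]],    # vertical bar
    [[0, 0, 0], [255, 255, 255], [0, 0, 0]],    # horizontal bar
]
z = [[10, 240, 0], [0, 250, 20], [0, 255, 0]]   # noisy vertical bar
j, d = best_template(z, templates)
```

Because every pixel contributes directly to D_j, a uniform brightness change or a few noisy pixels shifts the score, which is the weakness the next slide lists.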
11
11 Limitations of Template Matching
– Characters and templates must be of the same size
– The method is not invariant to changes in illumination
– Very vulnerable to noise
In template matching, all pixels are used as features. It is a better idea to apply unitary (distance-preserving) transforms to the character images, which reduces the number of features while preserving most of the information about the character shape.
12
12 The Radon Transform
The Radon transform computes projections of an image matrix along specified directions. A projection of a two-dimensional function f(x,y) is a set of line integrals. The radon function computes the line integrals from multiple sources along parallel paths, or beams, in a certain direction. The beams are spaced 1 pixel unit apart. To represent an image, the radon function takes multiple parallel-beam projections of the image from different angles by rotating the source around the center of the image.
[Figure: a single projection at a specified rotation angle]
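A single parallel-beam projection can be approximated in a few lines. This is a rough sketch, not the radon function referred to above: each pixel is simply added to the nearest beam (no interpolation), and the name `projection` and the angle convention are assumptions of this example.

```python
import math

def projection(img, angle_deg):
    """Approximate one parallel-beam projection of img at the given angle.

    Each pixel value is added to the beam (bin) nearest to the pixel's signed
    distance from the image centre, measured along the direction (cos a, sin a).
    Beams are spaced 1 pixel apart.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    # Round away floating-point noise so 0 and 90 degrees give exact sums.
    ca, sa = round(math.cos(a), 12), round(math.sin(a), 12)
    half = int(math.ceil(math.hypot(h, w) / 2.0))
    bins = [0.0] * (2 * half + 1)
    for y in range(h):
        for x in range(w):
            t = (x - cx) * ca + (y - cy) * sa
            bins[int(math.floor(t + 0.5)) + half] += img[y][x]
    return bins

bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]   # vertical stroke
p0 = projection(bar, 0)     # column sums, centred in the bin array
p90 = projection(bar, 90)   # row sums, centred in the bin array
```

At 0 degrees the projection reduces to the column sums and at 90 degrees to the row sums; intermediate angles give the oblique projections.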
13
13 Projections to Various Axes
14
14
15
15 Zoning
Consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the average gray level in each part, yielding a feature vector of length 25.
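A minimal sketch of the zoning feature, assuming the subimage is a list of rows of gray levels already cropped to the bounding box. When the box size is not a multiple of 5, the zones here are only approximately equal; `zoning_features` is a name chosen for this example.

```python
def zoning_features(img, zones=5):
    """Average gray level in each of zones x zones cells, listed row by row."""
    h, w = len(img), len(img[0])
    if h < zones or w < zones:
        raise ValueError("bounding box is smaller than the zone grid")
    feats = []
    for zr in range(zones):
        r0, r1 = zr * h // zones, (zr + 1) * h // zones
        for zc in range(zones):
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cell = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell))
    return feats

# 10x10 test image in which every 2x2 zone has a constant gray level.
img = [[(r // 2) * 10 + (c // 2) for c in range(10)] for r in range(10)]
feats = zoning_features(img)
```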
16
16 Thinning
– Thinning is possible both for grayscale and for binary images
– Thinning = skeletonization of characters
– Advantage: few features, easy to extract
The informal definition of a skeleton is a line representation of an object that (i) is one pixel thick, (ii) runs through the "middle" of the object, and (iii) preserves the topology of the object.
17
17 When No Skeleton Exists
a) It is impossible to generate a one-pixel-wide skeleton that lies in the middle
b) No pixel can be left out while preserving connectedness
18
18 Possible Defects
Specific defects of the data may cause misrecognition:
– Small holes: loops in the skeleton
– Single-element irregularities: false tails
– Acute angles: false tails
19
19 How Thinning Works
– Most thinning algorithms rely on the erosion of the boundary while maintaining connectivity; see http://www.ph.tn.tudelft.nl/Courses/FIP/noframes/fip-Morpholo.html for mathematical morphology
– To avoid defects, preprocessing is desirable
– As an example, in a black and white application
   – very small holes are removed
   – black elements having fewer than 3 black neighbours and having connectivity 1 are removed
20
20 An Example of Noise Removal This pixel will be removed (N=1; has 1 black neighbour)
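The removal rule can be sketched as below. The slides do not define "connectivity", so this example assumes it means the number of white-to-black transitions met when walking once around the 8 neighbours (a crossing number); that definition, the single parallel pass, and the name `remove_spurs` are assumptions of this sketch.

```python
def remove_spurs(img):
    """Remove black pixels with fewer than 3 black neighbours and connectivity 1.

    img is a list of rows; 1 = black, 0 = white. Pixels outside the image
    count as white. All decisions are based on the original image.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            # The 8 neighbours in clockwise order, starting from north.
            ring = [px(y - 1, x), px(y - 1, x + 1), px(y, x + 1), px(y + 1, x + 1),
                    px(y + 1, x), px(y + 1, x - 1), px(y, x - 1), px(y - 1, x - 1)]
            black = sum(ring)
            crossings = sum(1 for i in range(8)
                            if ring[i] == 0 and ring[(i + 1) % 8] == 1)
            if black < 3 and crossings == 1:
                out[y][x] = 0
    return out

blob = [
    [0, 1, 0, 0, 0, 0],   # single pixel sticking out of the block below
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
cleaned = remove_spurs(blob)
```

Only the protruding pixel is removed here; the corners of the block each have 3 or more black neighbours and are kept.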
21
21 Generation of Feature Vectors Using Invariant Moments
Given a grayscale subimage Z containing a character candidate, the moments of order p+q are defined by
$m_{pq} = \sum_{i=1}^{M} Z(x_i, y_i)\, x_i^p\, y_i^q$
where the sum is taken over all M pixels of the subimage. The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:
$\mu_{pq} = \sum_{i=1}^{M} Z(x_i, y_i)\, (x_i - \bar{x})^p\, (y_i - \bar{y})^q$
where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$.
22
22 Hu’s (1962) Central Moments
– The η_pq are scale invariant
– The M_i are rotation invariant
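A small sketch of how such invariants can be computed. It assumes the standard definitions: raw moments m_pq as the sum of Z(x, y)·x^p·y^q over all pixels, central moments μ_pq taken about the center of gravity, normalized moments η_pq = μ_pq / μ_00^(1 + (p+q)/2), and Hu's first two invariants M_1 = η_20 + η_02 and M_2 = (η_20 − η_02)² + 4·η_11². The function names are made up for this example.

```python
def raw_moment(img, p, q):
    """m_pq: sum of Z(x, y) * x^p * y^q over all pixels."""
    return sum(v * x ** p * y ** q
               for y, row in enumerate(img) for x, v in enumerate(row))

def central_moment(img, p, q):
    """mu_pq: moment about the center of gravity (translation invariant)."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    return sum(v * (x - xc) ** p * (y - yc) ** q
               for y, row in enumerate(img) for x, v in enumerate(row))

def eta(img, p, q):
    """Normalized central moment (assumed standard normalization)."""
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** (1 + (p + q) / 2)

def hu_first_two(img):
    """First two of Hu's rotation-invariant moments."""
    e20, e02, e11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2

shape = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]             # an "L"
rotated = [list(r) for r in zip(*shape[::-1])]        # rotated by 90 degrees
shifted = [[0] * 5] + [[0, 0] + row for row in shape]  # translated copy
m_shape, m_rot, m_shift = (hu_first_two(s) for s in (shape, rotated, shifted))
```

On a pixel grid, rotation by a multiple of 90 degrees and integer translation leave the values unchanged up to rounding; other rotations and rescaling are only approximately invariant because of resampling.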
23
23 K-Nearest Neighbor Classification
Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3 it is classified to the second class, because there are 2 triangles and only 1 square inside the inner circle. If k = 5 it is classified to the first class (3 squares vs. 2 triangles inside the outer circle).
Disadvantage in practice: the distance from the green circle to all blue squares and to all red triangles has to be computed, which may take much time.
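The example can be reproduced with a brute-force k-NN sketch. The coordinates below are invented so that the counts match the description (2 triangles and 1 square among the 3 nearest, 3 squares and 2 triangles among the 5 nearest); they are not taken from the figure.

```python
import math
from collections import Counter

def knn_classify(sample, training, k):
    """Classify sample by majority vote among its k nearest training points.

    training is a list of ((x, y), label) pairs. Every distance is computed,
    which is the practical cost mentioned above.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 0.0), "triangle"), ((0.0, 1.2), "triangle"),
    ((-1.4, 0.0), "square"), ((0.0, -2.0), "square"), ((2.2, 0.0), "square"),
    ((4.0, 4.0), "triangle"), ((-5.0, 3.0), "square"),
]
green = (0.0, 0.0)
label_k3 = knn_classify(green, training, 3)
label_k5 = knn_classify(green, training, 5)
```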
24
24 From now on we will deal with binary (black and white) images only
25
25 Projection Histograms
These methods are typically used for
– segmenting characters, words and text lines
– detecting if a scanned text page is rotated
But they can also provide features for recognition:
– Using the same number of bins on each axis, and dividing by the total number of pixels, the features can be made scale independent
– Projection to the y-axis is slant invariant, but projection to the x-axis is not
– Histograms are very sensitive to rotation
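A sketch of scale-independent projection features, under the assumption that "total number of pixels" means the number of black pixels in the character. The helper names and the fixed bin count are choices made for this example.

```python
def rebin(counts, bins):
    """Accumulate a projection of arbitrary length into a fixed number of bins."""
    out = [0.0] * bins
    n = len(counts)
    for i, c in enumerate(counts):
        out[i * bins // n] += c
    return out

def projection_features(img, bins=4):
    """Return (y-axis histogram, x-axis histogram), each normalized by black pixel count."""
    rows = [sum(r) for r in img]          # projection onto the y-axis
    cols = [sum(c) for c in zip(*img)]    # projection onto the x-axis
    total = sum(rows)
    if total == 0:
        raise ValueError("image contains no black pixels")
    return ([v / total for v in rebin(rows, bins)],
            [v / total for v in rebin(cols, bins)])

small = [
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
# The same shape at twice the size: every pixel becomes a 2x2 block.
big = [[v for v in row for _ in (0, 1)] for row in small for _ in (0, 1)]
f_small = projection_features(small)
f_big = projection_features(big)
```

Both versions of the shape produce the same feature vectors, which illustrates the scale independence claimed above.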
26
26 Comparison of Histograms
It seems plausible to compare two histograms y_1 and y_2 bin by bin (where n is the number of bins). However, the dissimilarity computed from cumulative histograms is less sensitive to errors. Define the cumulative histogram Y as the running sum of the bins of y, and define the dissimilarity D on the cumulative histograms Y_1 and Y_2.
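The difference between the two comparisons can be seen in a small sketch. The exact formulas are not given in the text above; this example assumes the bin-by-bin measure is the sum of absolute differences and that D is the same sum taken over the cumulative histograms.

```python
from itertools import accumulate

def hist_distance(y1, y2):
    """Sum of absolute bin-by-bin differences (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(y1, y2))

def cumulative_distance(y1, y2):
    """The same measure applied to the cumulative histograms Y1 and Y2."""
    return sum(abs(a - b) for a, b in zip(accumulate(y1), accumulate(y2)))

y1 = [0, 4, 0, 0]
y2 = [0, 0, 4, 0]   # same peak, shifted by one bin
y3 = [0, 0, 0, 4]   # same peak, shifted by two bins

plain = (hist_distance(y1, y2), hist_distance(y1, y3))
cumul = (cumulative_distance(y1, y2), cumulative_distance(y1, y3))
```

The plain distance cannot tell a one-bin shift from a two-bin shift, while the cumulative distance grows with the size of the shift, so a small localization error costs less.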
27
27 Zoning for Binary Characters 1
Contour extraction or thinning may be unusable for self-touching characters. This kind of error often occurs in degraded machine-printed texts (generations of photocopying). The self-touching problem may be remedied by morphological opening.
28
28 Zoning for Binary Characters 2
Similarly to the grayscale case, we consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the number of black pixels in each part, yielding a feature vector of length 25.
29
29 Generation of Moments in the Binary Case
Given a binary subimage Z containing a character candidate, the moments of order p+q are defined by
$m_{pq} = \sum_{i} x_i^p\, y_i^q$
where the sum is taken over all black pixels of the subimage. The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity:
$\mu_{pq} = \sum_{i} (x_i - \bar{x})^p\, (y_i - \bar{y})^q$
where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$.
30
30 The Central Moments can be used similarly to the grayscale case
– The η_pq are scale invariant
– The M_i are rotation invariant
31
31 Contour Profiles The profiles may be outer profiles or inner profiles. To construct profiles, find the uppermost and lowermost pixels on the contour. The contour is split at these points. To obtain the outer profiles, for each y select the outermost x on each contour half. Profiles to the other axis can be constructed similarly.
32
32 Features Generated by Contour Profiles
– First differences of profiles: x'_L(y) = x_L(y+1) − x_L(y)
– Width: w(y) = x_R(y) − x_L(y)
– Ratio: height / max_y w(y)
– Location of minima and maxima of the profiles
– Location of peaks in the first differences (which may indicate discontinuities)
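A simplified sketch of the outer profiles and the features derived from them. Instead of tracing the contour and splitting it at the uppermost and lowermost points, this example takes the leftmost and rightmost black pixel in each row, which gives the same outer profiles for a single connected character; the function names are hypothetical.

```python
def outer_profiles(img):
    """Return (x_left, x_right) lists, one entry per row that contains ink."""
    x_left, x_right = [], []
    for row in img:
        xs = [x for x, v in enumerate(row) if v == 1]
        if xs:
            x_left.append(xs[0])
            x_right.append(xs[-1])
    return x_left, x_right

def profile_features(img):
    """First differences, width w(y), and height / max width of a binary character."""
    x_left, x_right = outer_profiles(img)
    width = [r - l for l, r in zip(x_left, x_right)]
    max_w = max(width)
    return {
        "d_left": [b - a for a, b in zip(x_left, x_left[1:])],
        "d_right": [b - a for a, b in zip(x_right, x_right[1:])],
        "width": width,
        # Undefined for a one-pixel-wide stroke, where every w(y) is 0.
        "ratio": len(width) / max_w if max_w > 0 else None,
    }

letter_l = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
features = profile_features(letter_l)
```

The jump of 3 in the right profile's first differences marks the discontinuity where the foot of the "L" begins.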
33
33 Zoning on Contour Curves 1 (Kimura & Shridhar)
A feature vector of size (4 × 4) × 4 is generated.
[Figure: enlarged zone]
34
34 Zoning on Contour Curves 2 (Takahashi)
Contour codes were extracted from inner contours (if any) as well as outer contours; the feature vector had dimension (4 × 6 × 6 × 6) × 4 × (2) (size × four directions × (inner and outer)).
35
35 Zoning on Contour Curves 3 (Cao)
When the contour curve is close to a zone border, small variations in the curve may lead to large variations in the feature vector.
Solution: fuzzy borders
36
36 Zoning of Skeletons
Features: length of the character graph in each zone (9 or 3). By dividing the length by the total length of the graph, size independence can be achieved.
Additional features: the presence or absence of junctions or endpoints
37
37 The Neural Network Approach for Digit Recognition
Le Cun et al.:
– Each character is scaled to a 16×16 grid
– Three intermediate hidden layers
– Training on a large set
– Advantage: feature extraction is automatic
– Disadvantage: we do not know how it works
– The output set (here 0-19) is small