Slide 1: Document Analysis: Document Image Processing
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008
Slide 2: Outline
- Image acquisition
- Image enhancement
- Foreground / background separation
  - Binarization
  - Color clustering
- Skew detection and correction
  - Skew estimation
  - Deskewing
- Text normalization
Slide 3: Image acquisition
Document images are acquired by
- drum scanners
- flatbed scanners
- high-resolution digital cameras
- specialized book scanners
or extracted from
- 3D scene images
- video sequences
Slide 4: Image quality
Various types of document images:
- binary images (fax)
- gray level images (256 levels)
- RGB images (24 bits or more)
at different resolutions:
- 200 dpi (low fax quality)
- 300-400 dpi (standard resolution for office automation), i.e. 8-15 Mpixels for A4 format
- 600 dpi or higher for special applications
Images may be degraded:
- distorted, non-planar
- noisy, with artifacts (e.g. JPEG compression)
Slide 5: Document image examples
Example images at 200 dpi and 400 dpi.
Slide 6: Overview of document image processing
Image preprocessing is an initial step of document analysis; it aims at preparing the image for further processing.
The most important initial steps are:
- image enhancement
- binarization, i.e. foreground / background separation
- skew correction
More specialized techniques are used locally:
- text size normalization
- slant correction
- ...
Slide 7: Image enhancement
Classical image filtering algorithms are applied
- to reduce or remove color information
- to enhance the contrast between foreground and background
- to correct irregular illumination
- to strengthen contours
- to smooth contours
- to remove salt-and-pepper noise
- to thin or thicken strokes
- ...
Image enhancement is often combined with segmentation or shape analysis (a small filtering sketch follows below).
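To make two of these filtering steps concrete, here is a minimal sketch (not from the course slides) that removes salt-and-pepper noise with a median filter and then stretches the contrast; the 3x3 window and the 1%/99% percentiles are assumed defaults.

```python
import numpy as np
from scipy.ndimage import median_filter

def enhance_gray(gray, noise_window=3, low_pct=1, high_pct=99):
    """Two typical enhancement steps: a median filter to remove salt-and-pepper
    noise, followed by a simple contrast stretch between two percentiles."""
    smoothed = median_filter(gray, size=noise_window)
    lo, hi = np.percentile(smoothed, [low_pct, high_pct])
    stretched = np.clip((smoothed - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return (stretched * 255).astype(np.uint8)
```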
Slide 8: Foreground / background separation
Document image analysis requires the separation between foreground (ink) and background (paper).
Foreground / background separation is trivial for simple document classes: binarization with an appropriate global threshold.
Problems arise in the following situations:
- non-uniform background (mixed colors and "reverse video")
- textured backgrounds
- halftoning artifacts
- non-uniformly illuminated documents
- degraded documents (bad inking, old paper, holes, ...)
- paper transparency, ink bleeding through from the other side
Slide 9: Binarization in presence of dithering
In case of dithering, a low-pass filter should first be used to smooth the background before thresholding (see the sketch below).
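As an illustration (not from the slides), a minimal sketch of this idea: a Gaussian low-pass filter followed by a global threshold. The sigma value and the fixed threshold of 128 are assumptions; in practice a data-driven threshold such as Otsu's would be preferable.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def binarize_dithered(gray, sigma=1.5, threshold=128):
    """Smooth the dithered background with a Gaussian low-pass filter,
    then apply a global threshold (foreground = darker than threshold)."""
    smoothed = gaussian_filter(gray.astype(np.float64), sigma=sigma)
    return smoothed < threshold
```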
Slide 10: Niblack's method
Niblack's method uses a local threshold

    T(x,y) = μ(x,y) - k · σ(x,y)

where
- μ(x,y) and σ(x,y) represent respectively the mean and standard deviation of the gray levels in an N x N neighborhood around pixel (x,y)
- k is a constant between 0 and 1 (suggested value 0.2)
- R is the range of gray levels
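A minimal NumPy/SciPy sketch of this kind of local thresholding (not taken from the course); the 15x15 window and the convention that pixels darker than T(x,y) are foreground are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_threshold(gray, window=15, k=0.2):
    """Local threshold T = mean - k * std over a window x window neighborhood."""
    gray = gray.astype(np.float64)
    mean = uniform_filter(gray, size=window)
    mean_sq = uniform_filter(gray * gray, size=window)
    std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
    return mean - k * std

def binarize_niblack(gray, window=15, k=0.2):
    """Foreground (ink) = pixels darker than the local threshold."""
    return gray < niblack_threshold(gray, window, k)
```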
Slide 11: Sauvola's method
Sauvola et al. have proposed a variant which assumes that the text is dark on a bright background:

    T(x,y) = μ(x,y) · (1 + k · (σ(x,y) / R - 1))

where R = 128 and k = 0.5.
Problems remain when this hypothesis does not hold (even after reversing).
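The corresponding sketch for Sauvola's formula, reusing the same local statistics (again, the 15x15 window is an assumed default):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_threshold(gray, window=15, k=0.5, R=128.0):
    """Local threshold T = mean * (1 + k * (std / R - 1))."""
    gray = gray.astype(np.float64)
    mean = uniform_filter(gray, size=window)
    mean_sq = uniform_filter(gray * gray, size=window)
    std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
    return mean * (1.0 + k * (std / R - 1.0))

def binarize_sauvola(gray, window=15, k=0.5, R=128.0):
    """Foreground (ink) = pixels darker than the local threshold."""
    return gray < sauvola_threshold(gray, window, k, R)
```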
Slide 12: Binarization in case of colored background
Example comparing binarization by global thresholding with Sauvola's method.
Slide 13: Comparison of binarization techniques
Example panels: original image, Fisher, Fisher (windowed), Yanowitz-Bruckstein, Niblack, Sauvola et al. (images from F. Lebourgeois, INSA, Lyon).
Slide 14: Color clustering
For richly colored documents such as
- checks, forms, ...
- geographic maps
- historical documents
- advertising
foreground / background separation is performed by color clustering.
Color clustering may be achieved automatically by
- k-means
- Gaussian mixtures
- ...
(a k-means sketch follows below)
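A minimal k-means sketch (not from the course) using scikit-learn; the choice of 3 clusters and the "darkest cluster = ink" heuristic are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_colors(rgb_image, n_clusters=3, seed=0):
    """Cluster pixel colors with k-means; returns a label image (one cluster id per pixel)."""
    h, w, _ = rgb_image.shape
    pixels = rgb_image.reshape(-1, 3).astype(np.float64)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(pixels)
    return labels.reshape(h, w)

def darkest_cluster_as_foreground(rgb_image, labels):
    """Heuristic: treat the cluster with the lowest mean luminance as ink."""
    luminance = rgb_image.mean(axis=2)
    means = [luminance[labels == c].mean() for c in range(labels.max() + 1)]
    return labels == int(np.argmin(means))
```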
Slide 15: Skew detection and correction
Most document image recognition algorithms need perfectly horizontally and vertically aligned text, but very often acquisition systems are not accurate enough.
Skew correction requires two steps:
- skew estimation (with a precision < 1 degree)
- image deskewing (rotation by a small angle)
For book reading systems, more sophisticated image correction algorithms are required because of page curvature.
Slide 16: Skew estimation
Many different methods have been proposed for skew estimation on printed documents:
- margin detection
  - by white stream analysis
  - by projection profile analysis
- Hough transforms
  - at pixel level
  - on centers of connected components
- linear regression on centers of connected components
Most methods can be applied on down-sampled images.
Skew detection for handwriting is more difficult, but less useful.
Slide 17: Projection profiles
Projection profiles are simple histograms accumulating foreground pixels along each row or each column (see the sketch below).
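A two-line NumPy sketch of the idea, assuming a binary image with foreground pixels set to True:

```python
import numpy as np

def projection_profiles(binary):
    """binary: 2D array, True/1 for foreground (ink) pixels."""
    horizontal = binary.sum(axis=1)  # one count per row    -> text line structure
    vertical = binary.sum(axis=0)    # one count per column -> column / margin structure
    return horizontal, vertical
```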
Slide 18: Hough transform
The Hough transform is a global transformation mapping the spatial domain (x, y) to a parameter space (ρ, θ); each pixel is accumulated on a beam of lines defined in polar coordinates, i.e.

    ρ = x · cos θ + y · sin θ
Slide 19: Skew estimation by Hough transform
The Hough transform allows estimating the skew angle: the angle θ whose accumulator contains the strongest peaks corresponds to the orientation of the text lines (see the sketch below).
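A small, self-contained sketch of this (not from the slides): foreground pixels vote in a (ρ, θ) accumulator restricted to near-horizontal angles, and the angle with the strongest peak is returned. The ±5° search range, the 0.1° step and the single-peak criterion are assumptions.

```python
import numpy as np

def estimate_skew_hough(binary, max_skew=5.0, step=0.1):
    """Estimate the skew angle (in degrees) of a binary document image by voting
    foreground pixels into a rho histogram for each candidate angle."""
    ys, xs = np.nonzero(binary)                    # foreground pixel coordinates
    diag = int(np.hypot(*binary.shape)) + 1        # offset so rho bins are non-negative
    best_skew, best_score = 0.0, -1.0
    for skew in np.arange(-max_skew, max_skew + step, step):
        theta = np.deg2rad(90.0 + skew)            # text lines are near-horizontal -> theta near 90 deg
        rho = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        counts = np.bincount(rho, minlength=2 * diag)
        score = counts.max()                       # strength of the strongest line at this angle
        if score > best_score:
            best_score, best_skew = score, skew
    return best_skew
```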
Slide 20: Deskewing of document images
Deskewing requires an image rotation:
- rotation of color or gray level images needs re-sampling (interpolation)
- rotation of binary images has several pitfalls:
  - it introduces distortions and noise
  - it is not reversible (except for Pythagorean angles)
Deskewing can also be approximated by combining two affine transforms, e.g. a horizontal and a vertical shear (see the sketch below).
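A minimal deskewing sketch for gray level images, using SciPy's rotation with bilinear re-sampling; the background padding value and the sign convention for the angle are assumptions.

```python
from scipy.ndimage import rotate

def deskew_gray(gray, skew_degrees, background=255):
    """Rotate a gray level image by -skew to undo the estimated skew.
    Bilinear interpolation (order=1) re-samples the image; the border exposed
    by the rotation is padded with the background value."""
    return rotate(gray, angle=-skew_degrees, reshape=False, order=1,
                  mode="constant", cval=background)
```

For binary documents, this supports the point made on the next slides: rotate the gray level image first, then binarize, rather than rotating the binary image pixel by pixel.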
Slide 21: Rotation of binary images
Pixel-based rotation of binary images introduces distortions; this artifact can be avoided by connected component replacement (repositioning each connected component as a whole instead of rotating individual pixels).
Slide 22: Rotation of binary images (2)
Better results are obtained by rotating the original gray level image (before binarization).
Slide 23: Normalization of character size
For text recognition, normalization of character sizes is often required.
Size normalization can be achieved
- by bounding boxes of isolated characters (see the sketch below)
- by baseline, ascenders and descenders
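A minimal bounding-box normalization sketch (not from the course); the 32x32 target size is an arbitrary assumption.

```python
import numpy as np
from PIL import Image

def normalize_character(binary_char, out_size=(32, 32)):
    """Crop a binary character image to its foreground bounding box and rescale it
    to a fixed size (bounding-box normalization). Assumes at least one foreground pixel."""
    ys, xs = np.nonzero(binary_char)
    crop = binary_char[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    img = Image.fromarray((crop * 255).astype(np.uint8))
    img = img.resize(out_size)
    return np.asarray(img) > 127
```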
Slide 24: Normalization techniques for handwriting
In case of handwriting, additional normalization may be applied:
- size normalization of ascenders and descenders
- slant correction
Slant estimation is performed by averaging the directions of the medians of straight, near-vertical segments.
Slide 25: Run Length Smearing Algorithm (RLSA)
The Run Length Smearing Algorithm (RLSA) consists in replacing white runs by black runs if their length is smaller than a given threshold; it can be applied horizontally or vertically.
RLSA is often useful for segmentation (see the sketch below).
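A straightforward sketch of horizontal RLSA (vertical RLSA is the same operation on the transposed image), assuming a boolean image with black/foreground pixels set to True:

```python
import numpy as np

def rlsa_horizontal(binary, threshold):
    """Horizontal RLSA: fill white (background) runs shorter than `threshold`
    between two black (foreground) pixels on the same row."""
    out = binary.copy()
    for row in out:                               # each row is a view, so out is modified in place
        cols = np.nonzero(row)[0]                 # column indices of black pixels
        for left, right in zip(cols[:-1], cols[1:]):
            if 1 < right - left <= threshold:     # white gap shorter than the threshold
                row[left:right] = True
    return out

def rlsa_vertical(binary, threshold):
    """Vertical RLSA applied via the transposed image."""
    return rlsa_horizontal(binary.T, threshold).T
```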