1 Text detection in images taken from a video stream for semantic indexing (Détection des textes dans les images issues d'un flux vidéo pour l'indexation sémantique)
  Christian Wolf. Thesis advisor: Jean-Michel Jolion
  Laboratoire d'Informatique en Images et Systèmes d'information (LIRIS), FRE 2672 CNRS, Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex
  5 December 2003
  http://rfv.insa-lyon.fr/~wolf
2 The framework of the thesis
  - 2 industrial contracts with France Télécom: ECAV I and ECAV II ("Enrichissement du Contenu Audio-Visuel", enrichment of audiovisual content).
  - Collaboration with the Language and Media Processing Laboratory, University of Maryland.
  - 2 research internships: 2001 (character segmentation) and 2002 (video indexing, TREC).
3 Indexing using text
  [Figure: keyword-based search. Diagram labels: indexing phase, key word, result. Example text occurrences shown in the frames: "Patrick Mayhew, Min. chargé de l'Irlande du Nord", "ISRAEL Jerusalem", "montage T.Nouel..."]
4 Plan
  - Introduction
  - Detection in still images
  - Detection in video sequences
  - Character segmentation
  - Experimental results
  - Conclusion
5 Videos vs. scanned documents
  - Temporal aspects
  - Complex and moving background
  - Artificial shadows
  - Low resolution
6 What is text? - character segmentation
  [Figure: examples of artificial text and of scene text.]
7 What is text? - texture
  Example: Gabor energy features on a text image.
  [Figure panels: original image; filter tuned to the example text; Gabor energy; thresholded Gabor energy.]
8 What is text? - contrast & geometry
  [Figure panels: example image; accumulated horizontal Sobel edges.]
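The "accumulated horizontal Sobel edges" panel can be illustrated with a minimal sketch. It assumes that "horizontal" means the derivative along the x axis (which responds to vertical character strokes) and that accumulation is a sum over a sliding horizontal window; the slide specifies neither, and the function names and window size are hypothetical.

```python
def horizontal_sobel(img):
    """Absolute Sobel response of the x-derivative; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            out[y][x] = abs(gx)
    return out


def accumulate_rows(edges, half_width):
    """Sum the edge magnitudes over a horizontal window of 2*half_width+1 pixels.

    The window is clipped at the image border (an assumption of this sketch).
    """
    h, w = len(edges), len(edges[0])
    acc = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo = max(0, x - half_width)
            hi = min(w, x + half_width + 1)
            acc[y][x] = sum(edges[y][lo:hi])
    return acc


# Text-like pattern: vertical stripes, two pixels wide.
stripes = [[0, 0, 255, 255, 0, 0, 255, 255] for _ in range(5)]
flat = [[128] * 8 for _ in range(5)]

text_map = accumulate_rows(horizontal_sobel(stripes), half_width=1)
flat_map = accumulate_rows(horizontal_sobel(flat), half_width=1)
```

Regions with many closely spaced vertical strokes give a high accumulated response, while flat regions give zero, which is the contrast property the slide points at.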
9 A text detection system for videos
  [Figure: processing chain. Stages shown: initial frame integration (averaging); detection per single frame; tracking, producing the text occurrences; suppression of false alarms; image enhancement (multiple frame integration); binarization; OCR. Example OCR output: "Soukaina Oufkir".]
10 Plan. Next section: Detection in still images
11 2 algorithms for still images
  Method 1:
    1. Calculate a text probability image according to a text model (1 value/pixel).
    2. Separate the probability values into 2 classes: find the optimal threshold.
    3. Post-processing.
  Method 2:
    1. Calculate a text feature image (N values/pixel).
    2. Classify each pixel in the feature image.
    3. Post-processing.
12 The local contrast method
  1. Calculate a text probability image according to a text model (1 value/pixel) (F. LeBourgeois).
  2. Separate the probability values into 2 classes: Fisher/Otsu.
  3. Post-processing:
     - Mathematical morphology
     - Geometrical constraints
     - Verification of special cases
     - Combination of rectangles
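The two-class separation step can be sketched with Otsu's criterion, which picks the threshold that maximizes the between-class variance of the histogram. This is a generic illustration on integer values in 0..255, not the thesis implementation; the function name and the sample data are assumptions.

```python
def otsu_threshold(values, levels=256):
    """Return t such that the classes {v <= t} and {v > t} maximize
    the between-class variance (Otsu's criterion)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of samples in the lower class
    sum0 = 0    # sum of the values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical probability image, flattened: 50 "non-text" and 50 "text" values.
probabilities = [10] * 50 + [200] * 50
t = otsu_threshold(probabilities)
text_mask = [v > t for v in probabilities]
```

Because a threshold is always returned, every image is split into two classes, which is consistent with the false alarms attributed to the assumption of text presence.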
13 Properties of the local contrast method
  + High detection accuracy (accurate localization).
  + Not very sensitive to the type of text.
  + Low computational complexity (very fast!).
  - False alarms, because the method assumes that text is present. Geometrical constraints are imposed in the post-processing step.
14 Method 2: why learning?
  + We hope to increase the precision of the detection algorithm (i.e. decrease the number of false alarms) by learning the characteristics of text.
  + More complex text models are very difficult to derive analytically.
  + The advent of support vector machine (SVM) learning and its ability to generalize even in high-dimensional spaces opened the door to complex decision functions and feature models.
  Drawback:
  - Specialization to a specific type of text (generalization?). Text exists in a wide variety of forms, fonts, sizes, orientations and deformations (especially scene text).
15 Geometrical features
  - Learning gray values and edge maps alone may not generalize enough.
  - Texture alone is not reliable, especially if the text is short.
  - Geometry is a valuable feature.
  - State of the art: enforce geometrical constraints in the post-processing step (mathematical morphology).
  - We propose the use of geometrical features very early in the detection process, i.e. not during post-processing.
16 Geometrical features: baseline
  Text consists of:
  - A high density of strokes in the direction of the text baseline.
  - A consistent baseline (a rectangular region with an upper and a lower border).
  Two detection philosophies:
  - Detection of the baseline directly, before detecting the text region.
  - Detection of the baseline as the boundary area of the detected text region, in order to refine the detection quality.
17 Estimation of the text rectangle height
  [Figure panels: original image; accumulated gradients.]
18 Features
  - Mode width (= rectangle height)
  - Mode height (= contrast)
  - Difference in height, left-right
  - Mode mean
  - Mode standard deviation
  - Difference in mode width
19 Learning with Support Vector Machines
  - Training image database: positive samples, negative samples.
  - Bootstrapping, cross-validation.
  - Classification step: a reduction of the computational complexity is necessary:
    - Sub-sampling of the pixels to classify (4x4).
    - Approximation of the SVM model by SVM regression.
20 Plan. Next section: Detection in video sequences
21 Tracking the text appearances
  - The list of rectangles detected for the current frame is integrated into a list containing the most recent rectangle of each text occurrence.
  - The integration is done using greedy search in the overlap matrix.
  [Figure: text occurrences plotted against frame number (time).]
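The greedy search in the overlap matrix can be sketched as follows. The slide does not define the overlap measure or the acceptance threshold, so this sketch assumes intersection over union and a hypothetical `min_overlap` of 0.5; rectangles are `(x0, y0, x1, y1)` tuples and all names are illustrative.

```python
def area(r):
    """Area of a rectangle (x0, y0, x1, y1); empty rectangles have area 0."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def overlap(a, b):
    """Intersection over union of two rectangles (assumed overlap measure)."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def greedy_match(tracked, detected, min_overlap=0.5):
    """Greedily pair tracked occurrences with the detections of the current frame.

    Returns (matches, new): matches maps a tracked index to a detected index,
    new lists the detections that start a new text occurrence.
    """
    pairs = [(overlap(t, d), i, j)
             for i, t in enumerate(tracked)
             for j, d in enumerate(detected)]
    pairs = [p for p in pairs if p[0] >= min_overlap]
    pairs.sort(reverse=True)            # best overlap first

    matches, used = {}, set()
    for _, i, j in pairs:
        if i not in matches and j not in used:
            matches[i] = j
            used.add(j)
    new = [j for j in range(len(detected)) if j not in used]
    return matches, new


tracked = [(0, 0, 10, 10), (20, 20, 40, 30)]
detected = [(21, 20, 41, 30), (0, 1, 10, 11), (100, 100, 110, 110)]
matches, new = greedy_match(tracked, detected)
```

A greedy pass is not guaranteed to find the globally best assignment, but it is simple and usually sufficient when text boxes move little between frames.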
22 Tracking: content verification
  - Verification of the text box contents: L2 comparison of a signature vector (vertical projection profile of the Sobel edges).
  - Text occurrences frequently appear at the same location without a significant temporal pause between them.
  [Figure panels: same text; different text; fading text.]
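A minimal sketch of the content check, assuming that the vertical projection profile is the column-wise sum of the edge magnitudes, that both boxes have the same width, and that the profile is normalized to unit length so that fading text still matches. The normalization, the `max_dist` threshold and the function names are assumptions of this sketch, not details given on the slide.

```python
import math


def vertical_projection(edges):
    """Signature vector: sum of the edge magnitudes in each column."""
    return [sum(column) for column in zip(*edges)]


def normalize(v):
    """Scale a vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_text(edges_a, edges_b, max_dist=0.5):
    """Compare the contents of two text boxes of equal width."""
    sig_a = normalize(vertical_projection(edges_a))
    sig_b = normalize(vertical_projection(edges_b))
    return l2_distance(sig_a, sig_b) <= max_dist


# Toy edge maps: the same stroke pattern, a faded copy, and a different pattern.
box = [[0, 9, 0, 9], [0, 9, 0, 9]]
faded = [[0, 3, 0, 3], [0, 3, 0, 3]]
other = [[9, 0, 0, 0], [9, 0, 0, 0]]
```

With these toy inputs, `same_text(box, faded)` accepts the faded copy and `same_text(box, other)` rejects the different stroke pattern.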
23 Enhancement
  - Multiple frame integration: averaging.
  - Super-resolution (interpolation): bi-linear interpolation, bi-cubic splines.
  [Figure: detected text occurrence.]
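Multiple frame integration by averaging can be sketched in a few lines. The sketch assumes that the text box has already been tracked, so that all frames are aligned and have the same size; the function name and the sample values are illustrative.

```python
def average_frames(frames):
    """Pixel-wise mean of aligned gray-level frames of identical size."""
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


# One row per frame: a stable text pixel (255) next to a changing background pixel.
frames = [[[255, 0]], [[255, 100]], [[255, 200]]]
enhanced = average_frames(frames)
```

Pixels that belong to static text keep their value, while a moving background is smoothed towards its mean, which increases the contrast between text and background.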
24 Plan. Next section: Character segmentation
25 Adaptive binarization
  - Niblack's adaptive method: [equation not recovered from the slide]
  - Sauvola's improvement: [equation not recovered from the slide]
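For reference, the two thresholds as they are commonly stated in the literature are T = m + k·s for Niblack and T = m·(1 + k·(s/R − 1)) for Sauvola et al., where m and s are the mean and standard deviation of the gray values in a local window. The sketch below uses these textbook forms; the notation on the original slide may differ, and the parameter values (k = -0.2, and k = 0.5 with R = 128) are commonly cited defaults taken here as assumptions.

```python
import math


def window_stats(img, y, x, half):
    """Mean and standard deviation in a (2*half+1)^2 window, clipped at the border."""
    values = [img[j][i]
              for j in range(max(0, y - half), min(len(img), y + half + 1))
              for i in range(max(0, x - half), min(len(img[0]), x + half + 1))]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std


def niblack_threshold(mean, std, k=-0.2):
    """Niblack: T = m + k * s."""
    return mean + k * std


def sauvola_threshold(mean, std, k=0.5, R=128.0):
    """Sauvola et al.: T = m * (1 + k * (s / R - 1))."""
    return mean * (1 + k * (std / R - 1))


# A flat, bright background window without any text.
background = [[200] * 3 for _ in range(3)]
m, s = window_stats(background, 1, 1, half=1)
t_niblack = niblack_threshold(m, s)   # equals the mean when s == 0
t_sauvola = sauvola_threshold(m, s)   # pulled well below the mean when s == 0
```

In a text-free window the Niblack threshold sits at the local mean, so small amounts of noise flip pixels to "text"; the Sauvola threshold drops well below the mean when the local contrast is low, which suppresses that noise.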
26 Our solution: contrast maximization
  Quantities defined on the slide (equations not recovered):
  - The contrast at the center of the image
  - The maximum local contrast
  - The contrast of the window
  We keep the following pixels: [equation not recovered from the slide]
  Threshold: [equation not recovered from the slide]
27 Character segmentation: examples
  [Figure panels: original image; Fisher/Otsu; Fisher/Otsu (windowed); Yanowitz-B.; Yanowitz-B. + post-processing; Niblack; Sauvola et al.; contrast maximization.]
28 Modeling text with a Markov random field
  - Binarization is posed as a Bayesian maximum a posteriori estimation problem using a Markov random field (MRF) model.
  - The prior models the prior knowledge on the spatial relationships in the image as an MRF.
  - The likelihood of the observation depends on the observation and noise model. In our case: Gaussian noise corrected by Niblack's threshold surface.
  - Collaboration with the Laboratory for Language and Media Processing, University of Maryland (David Doermann).
29 The prior knowledge
  - The clique energies (4x4) are learned and interpolated from training data.
  - Optimization of the energy function with simulated annealing.
  [Figure: the clique labelings of the repaired pixel before and after flipping it. All 16 cliques favor the change of the pixel.]
30 Plan. Next section: Experimental results
31 Evaluation measures
  - ICDAR: 1-1 matches; overlap information only.
  - CRISP: 1-1, 1-M and M-1 matches; thresholded matches; no overlap information.
  - AREA: 1-1, 1-M and M-1 matches; thresholded matches; overlap information.
  [Figure: detection and ground-truth rectangles.]
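The difference between overlap information and thresholded matches can be illustrated for a single ground-truth/detection pair. This is a generic area-based sketch: the slide defines neither the exact measures nor the thresholds, so the function names and the values 0.8 and 0.4 are hypothetical. Rectangles are `(x0, y0, x1, y1)` tuples.

```python
def area(r):
    """Area of a rectangle (x0, y0, x1, y1); empty rectangles have area 0."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def intersection_area(a, b):
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def area_recall_precision(ground_truth, detection):
    """Overlap information: share of the ground truth that is detected (recall)
    and share of the detection that lies on ground truth (precision)."""
    inter = intersection_area(ground_truth, detection)
    recall = inter / area(ground_truth) if area(ground_truth) else 0.0
    precision = inter / area(detection) if area(detection) else 0.0
    return recall, precision


def is_match(ground_truth, detection, min_recall=0.8, min_precision=0.4):
    """Thresholded match: the pair counts only if both ratios are high enough.

    The threshold values are hypothetical and chosen for illustration only.
    """
    recall, precision = area_recall_precision(ground_truth, detection)
    return recall >= min_recall and precision >= min_precision


gt = (0, 0, 10, 10)
too_tall = (0, 0, 10, 20)    # covers the text, but is twice as large
shifted = (5, 0, 15, 10)     # covers only half of the text
```

A measure that counts only thresholded matches scores `too_tall` like a perfect detection, whereas a measure that keeps the overlap information also reflects its area precision of 0.5.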
32 [Figure: example frames from the video files.]
  - AIM2: commercials
  - AIM3: news
  - AIM4: cartoons, news
  - AIM5: news
33 Detection in still images
  [Figure panels: local contrast; SVM learning.]
34 [Figure panels: local contrast; SVM learning.]
35 [Figure panels: local contrast; SVM learning.]
36 The influence of falling generality
  [Figure panels: local contrast; SVM learning.]
37 Detection in video sequences
38 OCR results
  Recognition by ABBYY FineReader 5.0.
  [Figure: OCR results for three binarization methods: local contrast based binarization; Sauvola et al.; MRF (Bayesian estimation using a Markov random field prior).]
39 TREC 2002
  [Figure: example queries "Dance", "Energy Gas", "Music", "Oil", "Airline", "Air plane".]
  The type of videos present in the collection does not favor the use of recognized text: text is only rarely present.
40 Conclusion
  - We developed a new system for detection, tracking, enhancement and binarization of text.
  - Detection performance is high due to the integration of several types of features at a very early stage. The learning method is less sensitive to textured noise in the image.
  - We proposed a new evaluation method which takes into account several measures of detection quality.
  - We derived a new binarization method adapted to the type of text found in videos.
  - 2 patents; 2 publications in international journals (+1 submitted); 3 publications in international conferences; 6 publications in national conferences.
41 Outlook
  - Possible improvement of the features (e.g. contrast normalization, non-linear texture filters).
  - Integration of different feature types (statistical, structural, ...).
  - Multi-orientation processing is not yet complete (new training set, implementation of the post-processing).
  - Adaptation of the tracking algorithm to general types of motion.
  - OCR on low-resolution grayscale images.
  - Use of a priori knowledge on text in order to decrease the number of false alarms.
  - Integration of the detected text into an indexing/browsing/segmentation framework.