Text detection in images taken from a video stream for semantic indexing. Laboratoire d'Informatique en Images et Systèmes d'information.

1 Text detection in images taken from a video stream for semantic indexing
Christian Wolf. Thesis advisor: Jean-Michel Jolion
Laboratoire d'Informatique en Images et Systèmes d'information (LIRIS), FRE 2672 CNRS
Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex
5 December 2003
http://rfv.insa-lyon.fr/~wolf

2 The framework of the thesis
- 2 industrial contracts with France Télécom: ECAV I and ECAV II (“Enrichissement du Contenu Audio-Visuel”, i.e. enrichment of audio-visual content)
- Collaboration with the Language and Media Processing Laboratory, University of Maryland, with 2 research internships:
  2001: character segmentation
  2002: video indexing (TREC)

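3 Indexing using text
[Figure: in the indexing phase, text found in the video frames (examples shown: “Patrick Mayhew”, “Min. chargé de l´irlande de Nord”, “ISRAEL Jerusalem”, “montage T.Nouel”, ...) is stored in an index; a keyword-based search then maps a key word to a result.]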

4 Plan
- Introduction
- Detection in still images
- Detection in video sequences
- Character segmentation
- Experimental results
- Conclusion

5 Videos vs. scanned documents
- Temporal aspects
- Complex and moving background
- Artificial shadows
- Low resolution

6 What is text? - character segmentation
[Figure: examples of artificial text and of scene text]

7 What is text? - texture
Example: Gabor energy features on a text image.
[Figure panels: original image; filter tuned to the example text; Gabor energy; thresholded Gabor energy]
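As a rough illustration of the Gabor energy feature named on this slide, the sketch below evaluates a quadrature pair of Gabor filters (even and odd) at one pixel and takes the magnitude of the two responses. The function name, the parameter values and the striped toy image are assumptions made for the example, not the filter bank used in the thesis.

```python
import math

def gabor_energy(img, cy, cx, wavelength, theta, sigma, radius):
    """Gabor energy at pixel (cy, cx): magnitude of the even/odd filter pair."""
    h, w = len(img), len(img[0])
    even = odd = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            if not (0 <= y < h and 0 <= x < w):
                continue  # pixels outside the image are ignored
            # Coordinate along the direction in which the carrier oscillates.
            u = dx * math.cos(theta) + dy * math.sin(theta)
            envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            phase = 2.0 * math.pi * u / wavelength
            even += img[y][x] * envelope * math.cos(phase)
            odd += img[y][x] * envelope * math.sin(phase)
    return math.hypot(even, odd)

# Toy "text" (assumption): vertical strokes, 2 pixels wide, repeating every 4 pixels.
stripes = [[255 if x % 4 in (2, 3) else 0 for x in range(21)] for _ in range(21)]

# Filter tuned to the stroke period and orientation, and the same filter rotated by 90 degrees.
tuned = gabor_energy(stripes, 10, 10, wavelength=4, theta=0.0, sigma=2.0, radius=6)
untuned = gabor_energy(stripes, 10, 10, wavelength=4, theta=math.pi / 2, sigma=2.0, radius=6)
```

On this toy image the tuned filter responds far more strongly than the rotated one, which is why thresholding the energy can separate stroke-like texture from the rest.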

8 What is text? - contrast & geometry
[Figure panels: example image; accumulated horizontal Sobel edges]
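The accumulated horizontal Sobel edges shown on this slide can be sketched as follows: compute the absolute horizontal Sobel derivative (which responds to vertical strokes), then sum it over a horizontal window. The window size, the border handling and the toy image are assumptions for illustration.

```python
def horizontal_sobel(img):
    """Absolute horizontal Sobel derivative; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            out[y][x] = abs(gx)
    return out

def accumulate_rows(grad, half_width):
    """Sum the gradient of each row over a window of 2 * half_width + 1 pixels."""
    h, w = len(grad), len(grad[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - half_width), min(w, x + half_width + 1)
            out[y][x] = sum(grad[y][lo:hi])
    return out

# Toy image (assumption): stroke-like stripes in rows 2..5, flat background elsewhere.
img = [[255 if (2 <= y <= 5 and x % 4 in (2, 3)) else 0 for x in range(12)]
       for y in range(8)]
acc = accumulate_rows(horizontal_sobel(img), half_width=3)
```

Rows crossing the stripes accumulate large values while background rows stay near zero, so the accumulated image acts as a per-pixel text measure.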

9 A text detection system for videos
[Diagram of the processing chain, with these blocks: initial frame integration (averaging); detection per single frame; tracking; text occurrences; suppression of false alarms; image enhancement (multiple frame integration); binarization; OCR (example output: “Soukaina Oufkir”)]

10 Plan: Introduction; Detection in still images; Detection in video sequences; Character segmentation; Experimental results; Conclusion

11 Two algorithms for still images
Method 1: calculate a text probability image according to a text model (1 value/pixel); separate the probability values into 2 classes (find the optimal threshold); post-processing.
Method 2: calculate a text feature image (N values/pixel); classify each pixel in the feature image; post-processing.

12 The local contrast method
- Calculate a text probability image according to a text model (1 value/pixel).
- Separate the probability values into 2 classes: Fisher/Otsu.
- Post-processing: mathematical morphology; geometrical constraints; verification of special cases; combination of rectangles.
Credit on the slide: F. LeBourgeois
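The Fisher/Otsu step separates the text probability values into two classes by choosing the threshold that maximizes the between-class variance. A minimal sketch, assuming the probability values have been quantized to integers in 0..255 (the function name and the sample data are illustrative):

```python
def otsu_threshold(values, levels=256):
    """Return t maximizing the between-class variance of {v <= t} and {v > t}."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * c for i, c in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0      # number of values in the lower class
    sum0 = 0    # sum of the values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, the criterion is undefined
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to the between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t

# Bimodal toy data (assumption): background near 10, text near 200.
probabilities = [10] * 50 + [200] * 50
t = otsu_threshold(probabilities)
text_mask = [v > t for v in probabilities]
```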

13 Properties of the local contrast method
+ High detection accuracy (accurate localization).
+ Not very sensitive to the type of text.
+ Low computational complexity (very fast!).
- False alarms due to the assumption of text presence.
- Geometrical constraints are imposed in the post-processing step.

14 Method 2: why learning?
+ We hope to increase the precision of the detection algorithm (i.e. decrease the number of false alarms) by learning the characteristics of text.
+ More complex text models are very difficult to derive analytically.
+ The discovery of support vector machine (SVM) learning and its ability to generalize even in high-dimensional spaces opened the door to complex decision functions and feature models.
Drawback:
- Specialization to a specific type of text (generalization)? Text exists in a wide variety of forms, fonts, sizes, orientations and deformations (especially scene text).

15 Geometrical features
- Learning gray values and edge maps alone may not generalize enough.
- Texture alone is not reliable, especially if the text is short.
- Geometry is a valuable feature.
- State of the art: enforce geometrical constraints in the post-processing step (mathematical morphology).
- We propose the use of geometrical features very early in the detection process, i.e. not during post-processing.

16 Geometrical features: baseline
Text consists of:
- A high density of strokes in the direction of the text baseline.
- A consistent baseline (a rectangular region with an upper and a lower border).
Two detection philosophies:
- Detection of the baseline directly, before detecting the text region.
- Detection of the baseline as the boundary area of the detected text region, in order to refine the detection quality.

17 Estimation of the text rectangle height
[Figure panels: original image; accumulated gradients]

18 Features
- Mode width (= rectangle height)
- Mode height (= contrast)
- Difference in height, left-right
- Mode mean
- Mode standard deviation
- Difference in mode width

19 Learning with Support Vector Machines
- Training image database: positive samples, negative samples.
- Bootstrapping, cross-validation.
- Classification step: a reduction of the computational complexity is necessary:
  sub-sampling of the pixels to classify (4x4);
  approximation of the SVM model by SVM regression.

20 Plan: Introduction; Detection in still images; Detection in video sequences (section starting here); Character segmentation; Experimental results; Conclusion

21 Tracking the text appearances
- Inputs: the list of rectangles detected for the current frame, and the list containing the most recent rectangle of each text occurrence.
- The integration is done using greedy search in the overlap matrix.
[Figure: text occurrences plotted against the frame number (time)]
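One way to read the greedy search in the overlap matrix: compute the overlap between every tracked rectangle and every rectangle detected in the current frame, then repeatedly accept the pair with the largest remaining overlap. In the sketch below the overlap measure (intersection over union), the threshold `min_overlap` and the rectangle format `(x0, y0, x1, y1)` are assumptions; the slide does not specify them.

```python
def iou(a, b):
    """Intersection over union of two rectangles given as (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def greedy_match(tracked, detected, min_overlap=0.5):
    """Pair tracked and detected rectangles, best overlap first.

    Returns (matches, new): matches is a list of (tracked_index, detected_index),
    new lists the detections that start a new text occurrence.
    """
    # The overlap matrix, flattened and sorted by decreasing overlap.
    pairs = sorted(((iou(t, d), i, j)
                    for i, t in enumerate(tracked)
                    for j, d in enumerate(detected)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, i, j in pairs:
        if score < min_overlap:
            break  # all remaining pairs overlap even less
        if i in used_t or j in used_d:
            continue  # each rectangle is matched at most once
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    new = [j for j in range(len(detected)) if j not in used_d]
    return matches, new

tracked = [(0, 0, 10, 10), (20, 20, 40, 30)]
detected = [(21, 20, 41, 30), (1, 0, 11, 10), (100, 100, 120, 110)]
matches, new = greedy_match(tracked, detected)
```

Here both tracked rectangles find their slightly shifted counterparts, and the third detection is reported as a new occurrence.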

22 Tracking: content verification
- Verification of the text box contents: L2 comparison of a signature vector (the vertical projection profile of the Sobel edges).
- Text occurrences frequently appear at the same location without a significant temporal pause between them.
[Figure: examples of same text, different text and fading text]
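The content check can be sketched as follows. The sketch assumes that the edge maps of the two boxes are already computed and have the same size, that the projection profile holds one sum per column, and that a fixed distance threshold decides the outcome; none of these details are given on the slide.

```python
import math

def projection_profile(edges):
    """Signature vector: the summed edge strength of each column."""
    return [sum(column) for column in zip(*edges)]

def l2_distance(p, q):
    """Euclidean (L2) distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def same_text(edges_a, edges_b, max_distance):
    """True if the signatures of the two boxes are close in the L2 sense."""
    pa, pb = projection_profile(edges_a), projection_profile(edges_b)
    if len(pa) != len(pb):
        return False  # boxes of different width are treated as different text
    return l2_distance(pa, pb) <= max_distance

# Toy edge maps (assumption): box_b is a noisy copy of box_a, box_c holds other strokes.
box_a = [[0, 9, 0, 9], [0, 9, 0, 9]]
box_b = [[0, 8, 0, 9], [0, 9, 1, 9]]
box_c = [[9, 0, 9, 0], [9, 0, 9, 0]]
```

With `max_distance=5`, `same_text(box_a, box_b, 5)` accepts the noisy copy and `same_text(box_a, box_c, 5)` rejects the different content.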

23 Enhancement
Multiple frame integration of a detected text occurrence:
- Averaging
- Super-resolution (interpolation): bi-linear interpolation, bi-cubic splines
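A minimal sketch of multiple frame integration, assuming the frames of one text occurrence are already aligned and of equal size: each frame is upsampled with bi-linear interpolation and the results are averaged pixel by pixel. The function names, the upsampling factor and the toy frames are illustrative assumptions; bi-cubic interpolation is not shown.

```python
def bilinear_upsample(img, factor):
    """Upsample a gray image (at least 2x2) by an integer factor."""
    h, w = len(img), len(img[0])
    out_h, out_w = (h - 1) * factor + 1, (w - 1) * factor + 1
    out = []
    for oy in range(out_h):
        fy = oy / factor
        y0 = min(int(fy), h - 2)
        ty = fy - y0
        row = []
        for ox in range(out_w):
            fx = ox / factor
            x0 = min(int(fx), w - 2)
            tx = fx - x0
            top = img[y0][x0] * (1 - tx) + img[y0][x0 + 1] * tx
            bottom = img[y0 + 1][x0] * (1 - tx) + img[y0 + 1][x0 + 1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out

def average_frames(frames):
    """Pixel-wise mean of equally sized frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]

# Three noisy observations of the same 2x2 patch (toy data).
frames = [[[90, 10], [0, 30]],
          [[110, 10], [0, 30]],
          [[100, 10], [0, 30]]]
enhanced = average_frames([bilinear_upsample(f, 2) for f in frames])
```

Averaging suppresses the frame-to-frame variation of the top-left pixel (90, 110, 100), while interpolation raises the resolution of the patch passed on to binarization.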

24 Plan: Introduction; Detection in still images; Detection in video sequences; Character segmentation (section starting here); Experimental results; Conclusion

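25 Adaptive binarization
Niblack's adaptive method: T = m + k·s, where m and s are the mean and the standard deviation of the gray values in a local window and k is a constant.
Sauvola's improvement: T = m·(1 + k·(s/R − 1)), where R is the dynamic range of the standard deviation.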
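The two adaptive thresholds can be sketched with the commonly published formulas, T = m + k·s for Niblack and T = m·(1 + k·(s/R − 1)) for Sauvola, where m and s are the mean and standard deviation in a local window. The default values of `k` and `R` and the window handling below are typical choices taken as assumptions, not values from the slides.

```python
import math

def window_stats(img, cy, cx, half):
    """Mean and standard deviation of the window centered on (cy, cx)."""
    vals = [img[y][x]
            for y in range(max(0, cy - half), min(len(img), cy + half + 1))
            for x in range(max(0, cx - half), min(len(img[0]), cx + half + 1))]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, math.sqrt(var)

def niblack_threshold(mean, std, k=-0.2):
    """Niblack: T = m + k * s (k < 0 for dark text on a bright background)."""
    return mean + k * std

def sauvola_threshold(mean, std, k=0.5, R=128.0):
    """Sauvola: T = m * (1 + k * (s / R - 1)); R is the range of the std. dev."""
    return mean * (1.0 + k * (std / R - 1.0))

def binarize(img, half=1, threshold=sauvola_threshold):
    """Label a pixel as text (1) if it is darker than its local threshold."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            mean, std = window_stats(img, y, x, half)
            row.append(1 if img[y][x] < threshold(mean, std) else 0)
        out.append(row)
    return out
```

In flat regions (s close to 0) Sauvola's threshold drops well below the local mean, which suppresses the background noise that Niblack's method tends to label as text.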

26 Our solution: contrast maximization
Quantities used:
- the contrast at the center of the image
- the maximum local contrast
- the contrast of the window
A rule on these contrasts selects the pixels that are kept and gives the threshold.

27 Character segmentation: examples
[Figure panels: original image; Fisher/Otsu; Fisher/Otsu (windowed); Yanowitz-B.; Yanowitz-B. + post-processing; Niblack; Sauvola et al.; contrast maximization]

28 Modeling text with a Markov random field
- Binarization is posed as a Bayesian maximum a posteriori (MAP) estimation problem using a Markov random field (MRF) model.
- The prior models the prior knowledge on the spatial relationships in the image as an MRF.
- The likelihood of the observation depends on the observation and noise model. In our case: Gaussian noise, corrected by Niblack's threshold surface.
Collaboration with the Laboratory for Language and Media Processing, University of Maryland (David Doermann).
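In generic notation, the MAP formulation reads as below. The symbols are assumptions introduced for this note: x is the binary label image, y the observed gray-level image, and U(x) the prior energy summed over the cliques C of the MRF (Gibbs form, temperature set to 1).

```latex
\hat{x} = \arg\max_{x} P(x \mid y) = \arg\max_{x} P(y \mid x)\, P(x),
\qquad
P(x) = \frac{1}{Z} \exp\bigl(-U(x)\bigr),
\quad
U(x) = \sum_{c \in C} V_c(x)
```

Maximizing the posterior is equivalent to minimizing the sum of the prior energy U(x) and the negative log-likelihood of the observation.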

29 The prior knowledge
- The clique energies (4x4) are learned and interpolated from training data.
- Optimization of the energy function with simulated annealing.
[Figure: the clique labelings of the repaired pixel before and after flipping it. All 16 cliques favor the change of the pixel.]

30 Plan: Introduction; Detection in still images; Detection in video sequences; Character segmentation; Experimental results (section starting here); Conclusion

31 Evaluation measures
ICDAR: 1-1 matches; overlap information only.
CRISP: 1-1, 1-M and M-1 matches; thresholded matches; no overlap information.
AREA: 1-1, 1-M and M-1 matches; thresholded matches; overlap information.
[Figure: detection and ground truth rectangles]
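The notions of overlap information and thresholded matches can be illustrated on a single pair of rectangles: area recall is the fraction of the ground truth rectangle covered by the detection, area precision is the fraction of the detection that lies on ground truth, and a thresholded match requires both to exceed fixed limits. The limits `min_recall` and `min_precision` and the function names are hypothetical; this sketch covers only the 1-1 case, not 1-M or M-1 matches.

```python
def area(r):
    """Area of a rectangle (x0, y0, x1, y1)."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def area_recall_precision(ground_truth, detection):
    """Overlap information for one ground truth / detection pair."""
    inter = intersection_area(ground_truth, detection)
    return inter / area(ground_truth), inter / area(detection)

def is_match(ground_truth, detection, min_recall=0.8, min_precision=0.4):
    """Thresholded 1-1 match; both limits are hypothetical values."""
    recall, precision = area_recall_precision(ground_truth, detection)
    return recall >= min_recall and precision >= min_precision

gt = (0, 0, 10, 10)
too_tall = (0, 0, 10, 20)   # covers the text but is twice as high
shifted = (5, 0, 15, 10)    # covers only half of the text
```

Here `too_tall` still counts as a match (recall 1.0, precision 0.5), whereas `shifted` does not (recall 0.5).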

32 Video sets
AIM2: commercials
AIM3: news
AIM4: cartoons, news
AIM5: news

33 Detection in still images
[Figure: results for the local contrast method and for SVM learning]

34 [Figure: results for the local contrast method and for SVM learning]

35 [Figure: results for the local contrast method and for SVM learning]

36 The influence of falling generality
[Figure panels: local contrast; SVM learning]

37 Detection in video sequences

38 OCR results
Recognition by ABBYY FineReader 5.0.
Binarization methods compared:
- Local contrast based binarization
- Sauvola et al.
- MRF: Bayesian estimation using a Markov random field prior

39 TREC 2002
Keywords shown on the slide: “Dance”, “Energy Gas”, “Music”, “Oil”, “Airline”, “Air plane”.
The type of videos present in the collection does not favor the use of recognized text: text is only rarely present.

40 Conclusion
- We developed a new system for detection, tracking, enhancement and binarization of text.
- Detection performance is high due to the integration of several types of features at a very early stage. The learning method is less sensitive to textured noise in the image.
- We proposed a new evaluation method which takes into account several measures of detection quality.
- We derived a new binarization method adapted to the type of text found in videos.
- 2 patents; 2 publications in international journals (+1 submitted); 3 publications in international conferences; 6 publications in national conferences.

41 Outlook
- Possible improvement of the features (e.g. contrast normalization, non-linear texture filters).
- Integration of different feature types (statistical, structural, ...).
- Multi-orientation processing is not yet complete (new training set, implementation of the post-processing).
- Adaptation of the tracking algorithm to general types of motion.
- OCR on low-resolution grayscale images.
- Use of a priori knowledge on text in order to decrease the number of false alarms.
- Integration of the detected text into an indexing/browsing/segmentation framework.
