
1/25 Detection and Extraction of Artificial Text for Semantic Indexing
Christian Wolf and Jean-Michel Jolion
Laboratoire Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA de Lyon, 69621 Villeurbanne cedex, France
Dagstuhl Seminar on Content-Based Image and Video Retrieval, January 9th, 2002
This presentation can be downloaded from: http://rfv.insa-lyon.fr/~wolf/presentations

2/25 Plan of the presentation
- Introduction (6 slides)
- Detection and tracking (3 slides)
- Enhancement and binarization of the text boxes (4 slides)
- Experiments and results (2 slides)
- Open problems (9 slides)
- Conclusion and outlook (1 slide)
Total: 25 slides.
This work resulted in a patent filed by France Télécom on May 23rd, 2001, under the reference FR 01 06776.

3/25 Content-based image retrieval
[Diagram: indexing phase; an example image is compared with a similarity function to produce the result]

4/25 Similarity measures
[Figure: example image pairs labelled "similar" and "not similar"]
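The slides do not say which similarity function is used. As an assumed illustration only, the sketch below ranks images by histogram intersection of normalized gray-level histograms (a classic low-level measure); the function names and the 16-bin histogram are hypothetical choices, not part of the presented system.

```python
import numpy as np

def gray_histogram(img, bins=16):
    """Normalized gray-level histogram used as a very simple image signature."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

def rank_by_similarity(query, database):
    """Return the database keys sorted from most to least similar to the query image."""
    q = gray_histogram(query)
    scores = {name: histogram_intersection(q, gray_histogram(img))
              for name, img in database.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Such a measure captures only low-level appearance, which is exactly why two images can be "similar" for the function while differing in meaning.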

5/25 Indexing using text: keyword-based search
[Figure: the indexing phase extracts text such as "Patrick Mayhew", "Min. chargé de l´irlande de Nord" (Minister in charge of Northern Ireland), "ISRAEL", "Jerusalem", "montage T.Nouel..."; a keyword query returns the result]
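A minimal sketch of keyword-based search over recognized text, assuming each text occurrence is keyed by a hypothetical (video, frame) pair and stored in an inverted index from words to occurrences; the key layout and function names are illustrative assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case word tokens (accented letters such as 'é' count as word characters)."""
    return re.findall(r"\w+", text.lower())

def build_index(occurrences):
    """Map each word to the set of (video, frame) keys whose recognized text contains it."""
    index = defaultdict(set)
    for key, text in occurrences.items():
        for word in tokenize(text):
            index[word].add(key)
    return index

def search(index, query):
    """Keys whose text contains every query word (AND semantics)."""
    sets = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*sets) if sets else set()
```

Usage: `search(build_index({("news1", 300): "France 3 Alpes"}), "alpes")` returns `{("news1", 300)}`. OCR errors directly become missed keywords, which is why recognition quality matters for indexing.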

6/25 Video properties
[Figure: text in video frames, annotated with sizes of 80 px, 12 px and 8 px]

7/25 Text extraction: general scheme
Video → detection of the text in single frames → tracking → image enhancement (multiple frame integration) → segmentation/binarization → OCR
Example OCR output: "EVENEMENT", "ACTU", "SPELEOS", "Gouffre Berger (Isére)", "aujourd'hui", "France 3 Alpes", "un spéléologue sauveteur"

8/25 Text detection by accumulation of horizontal gradients (LeBourgeois, 1997)
Justification: text forms a regular texture containing vertical edges that are aligned horizontally.
Post-processing by mathematical morphology.
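A rough sketch of the idea under stated assumptions: horizontal gradient magnitudes are summed over a horizontal window, the result is thresholded at a fraction of its maximum, and the mask is cleaned by a horizontal closing and a vertical opening. The window size, the relative threshold and the structuring-element sizes are hypothetical values, not those of LeBourgeois or of the authors.

```python
import numpy as np

def _run_filter(mask, size, reduce_op, pad_value):
    """Apply a 1-D structuring element of odd length `size` along axis 1."""
    r = size // 2
    padded = np.pad(mask, ((0, 0), (r, r)), constant_values=pad_value)
    out = padded[:, : mask.shape[1]].copy()
    for s in range(1, 2 * r + 1):
        out = reduce_op(out, padded[:, s : s + mask.shape[1]])
    return out

def dilate(mask, size):
    return _run_filter(mask, size, np.logical_or, False)

def erode(mask, size):
    return _run_filter(mask, size, np.logical_and, True)

def detect_text_mask(gray, half_width=6, rel_thresh=0.5, close_width=5, min_height=3):
    g = gray.astype(float)
    # Horizontal gradient magnitude: responds to vertical edges (character strokes).
    gx = np.zeros_like(g)
    gx[:, 1:] = np.abs(np.diff(g, axis=1))
    # Accumulate over a horizontal window of 2*half_width+1 pixels (running sum via cumsum).
    k = 2 * half_width + 1
    c = np.cumsum(np.pad(gx, ((0, 0), (half_width + 1, half_width))), axis=1)
    acc = c[:, k:] - c[:, :-k]
    if acc.max() == 0:
        return np.zeros(g.shape, dtype=bool)
    mask = acc > rel_thresh * acc.max()
    # Horizontal closing joins characters into lines; vertical opening (on the
    # transposed mask) removes structures thinner than min_height rows.
    mask = erode(dilate(mask, close_width), close_width)
    mask = dilate(erode(mask.T, min_height), min_height).T
    return mask
```

A single global threshold is a simplification; it keeps the sketch short but would need adapting to the contrast range of real video frames.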

9/25 Detection in video sequences
Detection per single frame → list of rectangles per frame → tracking (keeping track of text occurrences) → suppression of false alarms → image enhancement (multiple frame integration)
[Figure: text occurrences plotted against frame number (time)]
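One possible way to link the per-frame rectangles into text occurrences, sketched under assumptions: rectangles are (x0, y0, x1, y1) tuples, matching is greedy on intersection over union, and occurrences shorter than a minimum number of frames are discarded as false alarms. The thresholds and helper names are hypothetical, not taken from the presented system.

```python
def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def iou(a, b):
    """Intersection over union of two rectangles (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def track_text(frames, min_iou=0.5, min_length=3):
    """frames: one list of rectangles per frame.
    Returns tracks {"start": first frame, "rects": one rect per frame},
    keeping only tracks lasting at least min_length frames."""
    open_tracks, closed = [], []
    for t, rects in enumerate(frames):
        unused = list(rects)
        still_open = []
        for tr in open_tracks:
            last = tr["rects"][-1]
            best = max(unused, key=lambda r: iou(last, r), default=None)
            if best is not None and iou(last, best) >= min_iou:
                tr["rects"].append(best)
                unused.remove(best)
                still_open.append(tr)
            else:
                closed.append(tr)  # the occurrence ended in the previous frame
        still_open += [{"start": t, "rects": [r]} for r in unused]
        open_tracks = still_open
    closed += open_tracks
    return [tr for tr in closed if len(tr["rects"]) >= min_length]
```

The minimum-length filter exploits the temporal stability of artificial text: a rectangle detected in only one frame is treated as a false alarm.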

10/25 Image enhancement
- Super-resolution (interpolation)
- Multiple frame integration (averaging): multiple frames are integrated to create a single image of higher quality.
[Figure: interpolation masks M1, M2, M3, M4]
An additional weight is included in the interpolation scheme; it decreases the weights of temporal outlier pixels.
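A sketch of multiple frame integration under explicit assumptions: the frames of one text occurrence are already registered (static artificial text), each is enlarged by bilinear interpolation, and the temporal average down-weights pixels far from the temporal median with the weight 1 / (1 + (d/sigma)^2). The weight function, the median reference and `sigma` are illustrative choices, not the authors' exact weighting.

```python
import numpy as np

def upsample_bilinear(img, factor):
    """Bilinear interpolation of a 2-D array to `factor` times its size."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def integrate_frames(frames, factor=4, sigma=10.0):
    """Weighted temporal average of registered, upsampled frames.
    Pixels far from the temporal median (temporal outliers) get small weights."""
    stack = np.stack([upsample_bilinear(f.astype(float), factor) for f in frames])
    median = np.median(stack, axis=0)
    weights = 1.0 / (1.0 + ((stack - median) / sigma) ** 2)
    return (weights * stack).sum(axis=0) / weights.sum(axis=0)
```

With a plain mean, a single corrupted frame shifts the result noticeably; with the outlier weight its contribution almost vanishes.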

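11/25 Binarization
Niblack: $T = m + k \cdot s$
Sauvola et al.: $T = m \cdot \left(1 + k \cdot \left(\frac{s}{R} - 1\right)\right)$
where m is the mean of the window, s the standard deviation of the window, k a parameter, and R the dynamic range of the standard deviation (fixed to 128 for 8-bit gray values by Sauvola et al.).
Our method: $T = m - k \cdot \left(1 - \frac{s}{R}\right) \cdot (m - M)$
Terms annotated on the slide: the contrast in the center of the image; the maximum local contrast R (here the maximum of s over all windows of the image); the contrast of the window s; the minimum gray value of the image M.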
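A sketch of the three threshold surfaces, computing the local mean m and standard deviation s over a w × w window with an integral image: Niblack T = m + k·s, Sauvola et al. T = m·(1 + k·(s/R − 1)) with R = 128, and the authors' T = m − k·(1 − s/R)·(m − M) with R the maximum of s and M the image minimum. The window size and k values are commonly quoted defaults taken here as assumptions, and `wolf_jolion` is a hypothetical name. The sketch returns T only; which side of T counts as text depends on the text polarity.

```python
import numpy as np

def box_mean(img, w):
    """Mean over a w x w window (w odd) via an integral image; borders replicated."""
    r = w // 2
    p = np.pad(img.astype(float), r, mode="edge")
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, wd = img.shape
    s = c[w:w + h, w:w + wd] - c[:h, w:w + wd] - c[w:w + h, :wd] + c[:h, :wd]
    return s / (w * w)

def local_stats(img, w):
    """Local mean m and local standard deviation s."""
    m = box_mean(img, w)
    s = np.sqrt(np.maximum(box_mean(img.astype(float) ** 2, w) - m ** 2, 0.0))
    return m, s

def niblack(img, w=15, k=-0.2):
    m, s = local_stats(img, w)
    return m + k * s

def sauvola(img, w=15, k=0.5, R=128.0):
    m, s = local_stats(img, w)
    return m * (1 + k * (s / R - 1))

def wolf_jolion(img, w=15, k=0.5):
    m, s = local_stats(img, w)
    R = max(s.max(), 1e-12)   # maximum local contrast over all windows
    M = float(img.min())      # minimum gray value of the image
    return m - k * (1 - s / R) * (m - M)
```

Normalizing s by its image-wide maximum instead of a fixed constant makes the last threshold adapt to the contrast actually present in low-contrast video frames.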

12/25 Binarization methods: examples
[Figure: an original image binarized with Fisher, Fisher (windowed), Yanowitz-Bruckstein, Niblack, Sauvola et al., and our method]

13/25 Binarization using a priori knowledge
Bayesian MAP estimation using prior knowledge about the spatial relationships in the image, modeled as a Markov random field.
(In collaboration with David Doermann, Language and Media Processing Laboratory, University of Maryland.)
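To make the MAP formulation concrete, here is a toy sketch with assumed ingredients: a Gaussian likelihood with known class means and a shared sigma, a Potts/Ising prior penalizing each differing 4-neighbour by `beta`, and iterated conditional modes (ICM) with checkerboard updates as the optimizer. All of these are illustrative substitutions; the slide does not describe the actual model or optimization used.

```python
import numpy as np

def _sum4(a):
    """Sum of the 4-connected neighbours of every pixel (zero outside the image)."""
    p = np.pad(a, 1)
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]

def icm_binarize(y, mu0, mu1, sigma, beta=2.0, iterations=5):
    y = y.astype(float)
    # Data terms: negative Gaussian log-likelihood of each class (constants dropped).
    cost0 = (y - mu0) ** 2 / (2 * sigma ** 2)
    cost1 = (y - mu1) ** 2 / (2 * sigma ** 2)
    x = (cost1 < cost0).astype(int)          # maximum-likelihood initialization
    n_neighbours = _sum4(np.ones_like(x))
    parity = np.indices(y.shape).sum(axis=0) % 2
    for _ in range(iterations):
        for colour in (0, 1):                # same-colour pixels are never neighbours
            ones = _sum4(x)
            # Prior: beta for every neighbour whose label differs.
            e0 = cost0 + beta * ones
            e1 = cost1 + beta * (n_neighbours - ones)
            update = parity == colour
            x[update] = (e1 < e0)[update]
    return x
```

The prior removes isolated noisy pixels that a pixel-wise threshold would keep, while leaving corners of larger shapes intact.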

14/25 Test data: 5 different MPEG-1 videos with a resolution of 384×288, totaling 62 minutes (93,000 frames) and containing 413 text appearances.
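A quick consistency check of these figures: the frame count and duration give the frame rate of the material (25 frames/s is the PAL rate), and the number of text appearances per minute follows from the same duration.

```latex
\frac{93\,000\ \text{frames}}{62 \times 60\ \text{s}}
  = \frac{93\,000}{3\,720}\ \text{frames/s} = 25\ \text{frames/s},
\qquad
\frac{413\ \text{text appearances}}{62\ \text{min}} \approx 6.7\ \text{per minute}
```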

15/25 Detection and OCR results
[Figure: detection results in terms of true positives, false positives, true negatives and false negatives; OCR results classified by binarization method]
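The numerical results on this slide are not recoverable from the transcript. For reference, detection counts of this kind are commonly summarized as recall, precision and their harmonic mean; the sketch below is a generic helper (its name is hypothetical), and true negatives do not enter these measures.

```python
def detection_scores(tp, fp, fn):
    """Recall, precision and their harmonic mean (F-measure) from detection counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f
```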

16/25 Open questions
- Scene text (general orientations, deformations)
- Moving text

17/25 What is scene text?
[Figure: video frames, with the subsets of frames containing artificial text and frames containing scene text]
We do not have enough information about the importance of text in the target domain: how many frames contain text, and how many contain scene text?

18/25 Detection: from artificial text to scene text
Several constraints have to be removed when moving from artificial text to scene text:
- The constraints on temporal stability need to be abandoned or at least softened (no initial frame integration).
- Text can be aligned in any orientation (creation of an oriented feature in multiple directions, similar to invariant features).
- Contrast may be lower, because scene text is not designed to be read easily (is detection of unreadable text necessary?).

19/25 Text models
- Simple models (sets of edges or vertical strokes, ...)
  Pro: generalize well and respond to many kinds of text.
  Con: many false alarms.
- Complex models (templates, probabilistic models such as MRFs, ...)
  Pro: powerful, hence fewer false alarms.
  Con: do not generalize well.
Assumptions (on the font, size, style, contrast, color, length, etc.) are necessary but not sufficient.
Main problem: distinguishing characters from structures that resemble text under the chosen model.

20/25 Sven Dickinson: evolution of models

21/25 What is text?
Whatever model we choose, we cannot detect or recognize all kinds of text without solving the general image understanding problem. The best we can do is to include richer features in the detection process: a composite model for text.
- Structural analysis (e.g. detection and recognition of characters by strokes): very hard, and very unlikely to work for noisy images, low resolutions and difficult fonts.
- Statistical modeling of text features (e.g. by learning techniques). Problem: robust detection requires large neighborhoods, which lead to a combinatorial explosion.
For example: texture-based methods for small text; segmentation plus perceptual grouping, or structural methods, for large text.

22/25 Learning techniques: pros and cons
Bibliography:
- Learning the gray levels of the input image directly (Jung 2001)
- Learning features, i.e. coefficients of the Haar wavelet (Li and Doermann 2000) or edge strength (Lienhart 2000)
Pro: learning is an easy way to handle the complexity of text.
Con: text can appear in videos in many different fonts, sizes, styles, colors, orientations, etc.; learning all the different forms may not be feasible.

23/25 Color processing for detection?
[Figure: original image; Sobel on the grayscale image; Sobel on the L*u*v* image]
- Saturating or non-saturating distance?
- Reflection processing?
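A small sketch of why a color gradient can find edges that grayscale misses: two colors with the same gray value produce no grayscale Sobel response, while a per-channel Sobel does. Assumptions: channels are combined by taking the per-pixel maximum (one of several possible color distances), the gray conversion uses BT.601 luma weights, and the demonstration runs on RGB; converting to L*u*v* first would require a color-space conversion routine not shown here.

```python
import numpy as np

def sobel_magnitude(channel):
    """Sobel gradient magnitude of a 2-D array (borders replicated)."""
    p = np.pad(channel.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def color_sobel(image):
    """Per-pixel maximum of the Sobel magnitudes of the channels of an H x W x C image."""
    return np.max(np.stack([sobel_magnitude(image[..., c]) for c in range(image.shape[-1])]), axis=0)

def to_gray(rgb):
    """Luma with ITU-R BT.601 weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])
```

Usage: for a red/green boundary with equal luma, `sobel_magnitude(to_gray(img))` is essentially zero everywhere, whereas `color_sobel(img)` peaks along the boundary.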

24/25 Tracking of moving scene text
Do we detect the text in single frames (as for artificial text), or do we treat the flow as a whole?
- Single frames: multiple frame integration of moving text needs robust registration of the text boxes in different frames (e.g. a rough segmentation into text and background pixels, followed by registration of the text pixels only). Robust methods that can track objects in clutter are needed.
- Detection of moving objects, e.g. by optical flow or spatio-temporal methods.
- Mosaicing techniques can be employed for image enhancement.
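As one possible registration step for moving text boxes, the sketch below estimates an integer translation between two equally sized crops by phase correlation. This technique is our assumed choice, not one named on the slide; it presumes pure translation with circular wrap-around. The text-pixel restriction mentioned above could be approximated by multiplying both crops by a rough text mask before the call, which is left out here.

```python
import numpy as np

def estimate_shift(a, b):
    """Integer translation (dy, dx) such that b is approximately
    np.roll(a, (dy, dx), axis=(0, 1)), estimated by phase correlation."""
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fb * np.conj(fa)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep only the phase difference
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks beyond half the size correspond to negative shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)
```

Once the shift is known, the second crop can be aligned to the first and the frames integrated as for static text.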

25/25 Conclusion and outlook
- We developed a system for detection, tracking, enhancement and binarization of artificial text in videos.
- The total recognition rate for artificial text is surprisingly high given the quality of the text, but not yet good enough for indexing purposes.
- The remaining problems in text extraction seem to be typical of applications in visual information management: we went as far as we could with low-level features, but we cannot take the necessary step to semantic information. What is text? A possible definition: text is what a human or an OCR system can recognize as text.
- We have to include as much a priori knowledge as possible in the process.
