
1 Detection and Extraction of Artificial Text from Videos
Project France Télécom Research & Development 001B575
Laboratoire de Reconnaissance de Formes et Vision, Bât. Jules Verne, INSA, 69621 Villeurbanne CEDEX
10th July 2001
Christian Wolf and Jean-Michel Jolion
http://rfv.insa-lyon.fr/~{wolf,jolion}

2 Plan of the presentation
- Introduction
- Detection
- Image enhancement - multiple frame integration
- Binarisation of the text boxes
- Setup of the experiments
- Results
  - Detection
  - Binarisation
  - OCR
- Conclusion and outlook
Slides per section, in plan order: 6, 8, 3, 10, 11, 6, 2 (46 in total)
Navigation bar: Intro, Detection, Enhancement, Binarisation, Results, Experiments

3 Content-based image retrieval [Diagram: example image, indexing phase, similarity function, result]

4 Similarity measures [Figure: image pairs labelled "similar" and "not similar"]

5 Indexing using text: keyword-based search [Diagram: key word, indexing phase, result. Example text: "Patrick Mayhew", "Min. chargé de l'irlande de Nord", "ISRAEL", "Jerusalem", "montage T.Nouel..."]

6 Video properties [Figure annotations: 80 px, 12 px, 8 px]

7 Text extraction: general scheme
[Diagram stages: video; detection of the text in single frames; tracking; image enhancement (multiple frame integration); segmentation/binarisation; OCR]
Example OCR results: "EVENEMENT", "ACTU", "SPELEOS", "Gouffre Berger (Isére)", "aujourd'hui", "France 3 Alpes", "un spéléologue sauveteur"

8 Detection in single frames
Input: video. Output: list of rectangles.
- Calculation of the gradient
- Accumulation
- Binarisation
- Mathematical morphology
- Connected components analysis
- Verification of geometric constraints
- Combination of the rectangles
- Verification of special cases

9 Detection in single frames: examples

10 A filter for text detection
Accumulation of horizontal gradients. Justification: text forms a regular texture containing vertical edges which are aligned horizontally.
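The accumulation idea can be sketched as follows. This is a minimal illustration, not the filter used in the project: the central-difference gradient, the box-shaped accumulation window and the name `half_width` are all assumptions, since the slide's formula is not part of the transcript.

```python
import numpy as np

def accumulated_horizontal_gradient(gray, half_width=6):
    """Sum absolute horizontal gradients over a horizontal window.

    Assumed details (not from the slides): central differences for the
    gradient and a plain box window of 2 * half_width + 1 pixels.
    """
    gray = np.asarray(gray, dtype=np.float64)
    grad = np.zeros_like(gray)
    # Horizontal derivative: responds to vertical edges such as character strokes.
    grad[:, 1:-1] = np.abs(gray[:, 2:] - gray[:, :-2]) / 2.0
    kernel = np.ones(2 * half_width + 1)
    # Accumulate the gradient magnitude along each row.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, grad
    )
```

Regions with many horizontally aligned vertical edges give a high response, which is the property the slide uses to justify the filter.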

11 Mathematical morphology
- Close
- Deletion of small bridges between the components
- Dilate (special) to connect characters
- Erode (special) to connect characters
- Erode horizontally
- Dilate horizontally

12 Detection in video sequences
- Detection per single frame
- List of rectangles per frame
- Tracking - keeping track of text occurrences
- Suppression of false alarms
- Image enhancement - multiple frame integration
[Figure: text occurrences plotted against frame nr. (time)]

13 Integration of the rectangles into text occurrences
At every new frame, the detected rectangles must be matched with the stored text occurrences. The list of rectangles detected for the current frame is compared with a list containing the most recent rectangle of each text occurrence. The integration is done using overlap information (overlap matrix).
[Figure: text occurrences plotted against frame nr. (time)]
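A possible shape for the overlap matrix and the matching step is sketched below. The rectangle format `(x0, y0, x1, y1)`, the overlap measure (intersection divided by the smaller area) and the `min_overlap` threshold are assumptions made for illustration; the slides do not specify them.

```python
def area(rect):
    """Area of a rectangle given as (x0, y0, x1, y1); empty rectangles give 0."""
    x0, y0, x1, y1 = rect
    return max(0, x1 - x0) * max(0, y1 - y0)

def intersection(a, b):
    """Area of the intersection of two rectangles."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def overlap_matrix(detected, stored):
    """Row i, column j: overlap of detected rectangle i with stored rectangle j."""
    matrix = []
    for d in detected:
        row = []
        for s in stored:
            smaller = min(area(d), area(s))
            row.append(intersection(d, s) / smaller if smaller > 0 else 0.0)
        matrix.append(row)
    return matrix

def match_to_occurrences(detected, stored, min_overlap=0.5):
    """For each detected rectangle, index of the best stored occurrence or None."""
    matches = []
    for row in overlap_matrix(detected, stored):
        best = max(range(len(row)), key=row.__getitem__, default=None)
        ok = best is not None and row[best] >= min_overlap
        matches.append(best if ok else None)
    return matches
```

A rectangle that maps to `None` would start a new text occurrence; one that maps to an index would extend the existing occurrence.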

14 Suppression of false alarms: examples [Figure panels: all detections; after suppression of false alarms]

15 Image enhancement
Integration of multiple frames to create a single image of higher quality.
- Multiple frame integration: averaging
- Super-resolution (interpolation)
Notation: F_i = i-th image, M = mean image, V = standard deviation image. [Figure labels: M1, M2, M3, M4]
An additional weight is included in the interpolation scheme: robust bi-linear, robust bi-cubic.
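The mean image M and the standard deviation image V can be computed from the tracked frames as in the sketch below. It assumes the frames F_i are already aligned and of equal size; the additional robust weight mentioned on the slide is not reproduced here because its formula is not in the transcript.

```python
import numpy as np

def integrate_frames(frames):
    """Return (mean image M, standard deviation image V) over aligned frames."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    mean_image = stack.mean(axis=0)  # M: per-pixel temporal average
    std_image = stack.std(axis=0)    # V: per-pixel temporal standard deviation
    return mean_image, std_image
```

Pixels with a large value in V change a lot over time, which is the kind of information a robust weight could use to discount unstable pixels.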

16 Interpolation: examples [Figure panels: bi-linear interpolation; robust bi-linear interpolation; robust bi-cubic interpolation]

17 Interpolation: thresholded examples [Figure panels: bi-linear interpolation; robust bi-linear interpolation; robust bi-cubic interpolation]

18 Binarisation
Different binarisation algorithms have been implemented and evaluated:
- Fisher/Otsu and windowed Fisher/Otsu algorithm
- Yanowitz-Bruckstein
- Niblack, Sauvola
- Our adaptive version of Niblack/Sauvola's method

19 Binarisation methods
- Yanowitz-Bruckstein: the threshold surface is calculated from the edge information.
- Windowed Fisher, Niblack, Sauvola: the threshold surface is calculated from the statistics collected in a window which is shifted across the image.
[Figure: threshold surface]

20 Binarisation by Niblack
Niblack proposed a method which calculates a threshold surface by gliding a rectangular window over the image and calculating statistics on this window, where m = mean, s = standard deviation, k = parameter (k = -0.2).
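The slide's formula image is missing from the transcript; Niblack's threshold is commonly written as T = m + k·s, and the sketch below uses that form. The window half-size `half`, the border handling and the polarity (dark text, so pixels strictly below T are marked) are assumptions for illustration.

```python
import numpy as np

def local_mean_std(gray, half=7):
    """Mean and standard deviation in a (2*half+1)^2 window, clipped at borders."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    mean = np.empty((h, w))
    std = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            win = gray[max(0, y - half):y + half + 1,
                       max(0, x - half):x + half + 1]
            mean[y, x] = win.mean()
            std[y, x] = win.std()
    return mean, std

def niblack_text_mask(gray, k=-0.2, half=7):
    """True where a pixel is darker than the local threshold T = m + k * s."""
    mean, std = local_mean_std(gray, half)
    threshold = mean + k * std
    return np.asarray(gray, dtype=np.float64) < threshold
```

The brute-force window loop is deliberately simple and slow; it only serves to make the threshold surface explicit.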

21 Binarisation by Niblack: problems
Light textures in the background are a problem: they are treated as low-contrast text.

22 Binarisation: improvement by Sauvola
To overcome these problems, Sauvola et al. proposed an improved formula to calculate the threshold, where m = mean, s = standard deviation, k = parameter (k = 0.5), R = parameter (dynamic range of the standard deviation, R = 128). A reformulation shows that a hypothesis on the gray values of text and non-text is used to remove the noise produced by background textures.
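The formula images are not part of the transcript. Sauvola's threshold is commonly written in the first form below; the second line is a plain algebraic rearrangement of it, offered as one way to read the slide's remark and not necessarily the reformulation shown on the slide.

```latex
T = m \cdot \left( 1 + k \cdot \left( \frac{s}{R} - 1 \right) \right),
\qquad k = 0.5,\; R = 128

T = m - k \cdot \left( 1 - \frac{s}{R} \right) \cdot m
```

In the second form the threshold is lowered below the local mean by an amount proportional to m, i.e. to the distance of the mean from gray value 0. In flat regions (small s) the threshold therefore drops well below the background level, which suppresses background texture but presupposes that text pixels are close to 0.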

23 Binarisation by Sauvola, examples [Figure panels: original image; binarised using Niblack's method; binarised using Sauvola et al.'s method]

24 Improvement: Adaptive dynamic range
Fixing the dynamic range at R=128 may be adequate for document images, but not for text boxes taken from videos: the binarisation is not correct if the contrast of the image is lower. We therefore set the parameter R to the maximum standard deviation over all windows. To avoid two passes of the windowing algorithm, the mean and standard deviation can be stored in a table during the first pass and the threshold surface calculated from this data.
[Figure: results for Niblack, Sauvola (R=128) and adaptive R]
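A minimal sketch of the adaptive variant, assuming the commonly cited Sauvola form T = m·(1 + k·(s/R − 1)) and taking the tables of local means and standard deviations from the first pass as input. The function name and the handling of a completely flat image (R = 0) are assumptions.

```python
import numpy as np

def sauvola_threshold(mean, std, k=0.5, R=None):
    """Threshold surface from precomputed tables of local mean and std.

    R=None selects the adaptive variant: R is set to the maximum local
    standard deviation instead of the fixed value 128.
    """
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    if R is None:
        R = std.max()
    if R == 0:
        # Completely flat image: no contrast information, fall back to the mean.
        return mean.copy()
    return mean * (1.0 + k * (std / R - 1.0))
```

Because the tables are computed once, switching between a fixed and an adaptive R costs no additional pass over the image.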

25 Improvement: Shift of the image range
The strong hypothesis on the gray values (text pixels must be near zero) is not justified for some video text boxes.
[Figure: gray value histogram; results for Niblack, Sauvola (R=128) and adaptive R]

26 Improvement: Shift of the image range
A correction of the image's histogram resolves this problem.
[Figure panels: original image; corrected image; binarised, R adaptive]
The same effect can also be achieved by changing the threshold formula, where m = mean, s = standard deviation, k = parameter (k = 0.5), R = maximum of the standard deviations of all windows, M = minimum gray value of the text box.
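The modified formula itself is not in the transcript. One form that reproduces the histogram shift is obtained by applying the commonly cited Sauvola formula to the shifted gray values g − M and adding M back, which gives T = m + k·(s/R − 1)·(m − M). The sketch below assumes that derivation; the function names are illustrative.

```python
import numpy as np

def sauvola_threshold(mean, std, k=0.5, R=128.0):
    """Commonly cited Sauvola form: T = m * (1 + k * (s / R - 1))."""
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    return mean * (1.0 + k * (std / R - 1.0))

def shifted_threshold(mean, std, M, k=0.5, R=128.0):
    """Assumed modified form: T = m + k * (s / R - 1) * (m - M).

    M is the minimum gray value of the text box. With M = 0 this
    reduces to the Sauvola form above.
    """
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    return mean + k * (std / R - 1.0) * (mean - M)
```

Subtracting M from every pixel leaves the local standard deviation unchanged and lowers the local mean by M, which is why both routes yield the same threshold surface.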

27 Fast incremental calculation
Mean and variance can be calculated in one pass. At the beginning of each line, the full window is calculated and the variables a and b are kept. After each shift, a and b are updated incrementally by subtracting the column of pixels which left the window and adding the column which entered the window. Mean and standard deviation are stored in 2D tables, then the maximum R=max(s) is computed before calculating the threshold surface.
[Figure labels: L, R]
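The incremental update can be sketched as below. The definitions of a and b are not in the transcript; this sketch assumes a is the sum of the gray values in the window and b the sum of their squares. For simplicity it only evaluates windows that lie completely inside the image, so the output tables are smaller than the input.

```python
import numpy as np

def sliding_mean_std(gray, half=1):
    """Local mean and std tables using incremental window sums.

    Assumption: a = sum of gray values, b = sum of squared gray values.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    size = 2 * half + 1
    n = size * size
    out_h, out_w = h - 2 * half, w - 2 * half
    mean = np.empty((out_h, out_w))
    std = np.empty((out_h, out_w))
    for y in range(out_h):
        band = gray[y:y + size]
        col_sum = band.sum(axis=0)        # sum of each window column
        col_sq = (band ** 2).sum(axis=0)  # sum of squares of each window column
        # Full window at the beginning of the line.
        a = col_sum[:size].sum()
        b = col_sq[:size].sum()
        for x in range(out_w):
            if x > 0:
                # Add the column that entered, subtract the column that left.
                a += col_sum[x + size - 1] - col_sum[x - 1]
                b += col_sq[x + size - 1] - col_sq[x - 1]
            m = a / n
            mean[y, x] = m
            std[y, x] = np.sqrt(max(b / n - m * m, 0.0))
    return mean, std
```

Once both tables are filled, `R = std.max()` gives the adaptive dynamic range without a second pass over the pixels.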

28 The experiments
Description of the experiments:
- The videos used in the experiments
- Description of the evaluation process (OCR evaluation)
Results for:
- Text detection
- Binarisation
- OCR

29 Test videos
We performed experiments on 5 different MPEG 1 videos of resolution 384x288.

30 [Example frames] AIM2: commercials; AIM3: news; AIM4: cartoon, news; AIM5: news

31 Video example - France Télécom: ~22 minutes of video, ~33000 frames

32 The interface to the OCR software
Ideal situation: pass individual (binarised) text boxes to an OCR software which recognises the contents box after box.
In reality: we used standard commercial OCR software for our tests. This software has been designed to recognise scanned A4 or US letter pages and cannot directly process text boxes.
[Figure: A4 page]

33 OCR page - manual: an input image, ready for the OCR

34 OCR Output
051Q07Ô7 N*Verf 05JQ0707 PUBLICITE IPUBIIÏITE IPUBLICITE prenez prenez prenez boyard boyard boyard ^française ^française ^française FRANCE FRANCE FRANCE FRANCE FRANCE c'est plus musclé iï 'J fort fort fort fort fort.fort.fort.fort cotHfUet blé cotHfUet blé cQ#tfUet blé uutàfruuk On va beaucoup {&*$ loin avec Itineris. Partout Partout Partout Partout Partout I22h35 I22h35 I22h35 I22h35 I22h35 PUBLICITE \PUBLICITE \PUBLICITE >3h55l23h55l23h55l23h55l23h55l23h55 20h.50120h50 |20h50120h50 |20h50120h50,f ort boyard,f ort boyard 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J II II II II II II II II II gà dentsgà dents gà dents IIH r Lessive classique lljir Lessive classique I[HT Lessive classique le temps le temps le temps le temps le temps ^PUBLICITE ^PUBLICITE ^PUBLICITE I Par Amour du Goût. Il Par Amour du Goût. I en en en en en en en en en révolution révolution révolution

35 Post processing of OCR output
Post processed OCR output: 23h55 051Q07Ô7 PUBLICITE prenez boyard ^française FRANCE c'est plus musclé fort blé cotHfUet uutàfruuk On va beaucoup {&*$ loin avec Itineris. Partout I22h35 PUBLICITE \ >3h55l 20h.50,f ort boyard
Ground truth: dimanche 23h55 N Vert 05100707 Berlingo PUBLICITE prenez diffusion simultanée en stéréo sur boyard française FRANCE c'est plus musclé PUBLICITE fort Coral blé complet fruits On va beaucoup Plus loin avec Itineris. Bohême Partout 22h35 PUBLICITE 23h55 20h50 fort fort boyard

36 Automatic evaluation using markers
The manual processing of the OCR output (separating the output strings and finding the corresponding input box) is time-consuming and error-prone, especially when the quality of the OCR output is very poor. Automatic processing of the OCR output can be achieved by placing marker images between the text boxes. The marker boxes contain text which is easily recognised by the OCR software. In the results section we present results for both types of evaluation.

37 An input image with markers, ready for the OCR

38 OCR Evaluation
[Diagram of the evaluation process]
- OCR output (example): Tkenchar 037 'gfrançaise 'gfrançaise Tkenchar 038 Mpe pire de| fj^e pire de| fj^e pire de| Tkenchar 039
- Raw ground truth (example): @S Par Amour du Goût. @S en @S révolution @S la @S française @S le pire de @S 20H45
- Structure log (example): # Page 1: P 1 T 1 2 M 1 2 T 2 3 M 2 2 T 3 2
- Search output for individual text boxes: list of strings, each corresponding to the output for a text box, but possibly repeated multiple times
- Prepare ground truth: list of strings, each corresponding to the ground truth for a text box. Each string is repeated the same number of times as the corresponding text image in the OCR input image
- Evaluation: transformation cost, recall, precision

39 OCR Evaluation: Wagner & Fischer cost
A measure for the resemblance of two character strings. The cost to transform string A into string B is calculated. Basic transformation operations (substitution, insertion, deletion) are used, each of which corresponds to a certain cost. The cost function is minimised.
Example: "AirbagGtroônn" vs. "Airbag Citroën"
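The minimisation can be carried out with the standard Wagner-Fischer dynamic programme, sketched below. The individual operation costs shown on the slide are not in the transcript, so unit costs are assumed as defaults; the function name is illustrative.

```python
def edit_cost(a, b, sub_cost=1, ins_cost=1, del_cost=1):
    """Minimum cost to transform string a into string b.

    Unit costs are an assumption; pass other values to change the weighting.
    """
    # prev[j]: cost of transforming the processed prefix of a into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * del_cost]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + del_cost,                          # delete ca
                cur[j - 1] + ins_cost,                       # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost)  # keep or substitute
            ))
        prev = cur
    return prev[-1]
```

Calling `edit_cost("AirbagGtroônn", "Airbag Citroën")` would score the slide's example pair; only two rows of the cost table are kept in memory at any time.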

40 Detection results - INA videos: no suppression of false alarms

41 Binarisation methods: examples [Figure panels: original image; Fisher; Fisher (windowed); Yanowitz B.; Yanowitz B. + PP; Niblack; Sauvola et al.; our method]

42 Binarisation methods: examples [Figure panels: original image; Fisher; Fisher (windowed); Yanowitz B.; Yanowitz B. + PP; Niblack; Sauvola et al.; our method]

43 OCR results - classification by binarisation method
Results obtained using the manual evaluation method (no markers in the input page). 44 pages. Robust bi-cubic interpolation.

44 OCR results: interpolation methods
Results obtained using the automatic evaluation method (including markers in the input page). 97 pages. [Panels: robust bi-linear interpolation; robust bi-cubic interpolation]

45 Conclusion
- We developed a system for detection, tracking, enhancement and binarisation of text.
- A detection performance of 93.5% is obtained.
- We derived a new binarisation method adapted to the type of text found in videos.
- The total recognition rate is surprisingly high, given the quality of the text, but not yet good enough for indexing purposes.
- OCR integration problem: no software development kits for direct access to the recognition functions are available. A collaboration with an OCR company seems to be inevitable.

46 Outlook
Future work lies in extending the existing algorithms to text with more difficult properties, and in improving and studying the existing techniques in more depth:
- Scene text: the binarisation techniques developed in the last 30 years are aimed either at document images or at images from computer vision. The method we introduced in the framework of this project is an improvement on the work already presented, but the quality of the text is not yet satisfactory. The binarisation of scene text in particular will demand the development of new methods.
- Detection recall: we are convinced that the recall of the detection system can still be increased by further research, e.g. on the binarisation technique applied to the map of accumulated gradients.

