Lukáš Neumann and Jiří Matas
Centre for Machine Perception, Department of Cybernetics
Czech Technical University, Prague
Neumann, Matas, ICDAR 2015
Problem Introduction
Contributions:
1. Text Fragments – Generalization of character detection
2. Stroke Support Pixels
3. Text-line Resegmentation
Experiments
Conclusion
Text
◦ Anything that can be represented as a sequence of Unicode characters
Scene Text (Text in the Wild)
Typically short snippet(s) of text, arbitrary script and orientation, non-standard fonts, out-of-vocabulary words, complex backgrounds
Image/video taken by a camera
[Figure: examples labelled "Text in the wild" and "Other text"]
Region-based methods assume that one region (connected component) represents one character
We generalize this assumption by detecting arbitrary Text Fragments in a single pass
Text Fragment
◦ Part of a Character
◦ Character
◦ Group of Characters
◦ Word
Text Fragments in the majority of scripts and fonts share the “strokeness” property
This observation was popularized by the Stroke Width Transform [1], which uses it to detect individual characters
[1] B. Epshtein et al., “Detecting text in natural scenes with stroke width transform,” in CVPR
Text Fragment candidates are detected as MSERs over multiple scales and color projections
MSERs are classified as either
◦ Character (character or a character part)
◦ Multi-character (group of characters or words)
◦ Background
Characters and multi-characters are grouped into text lines with an efficient exhaustive search strategy [2]
Each text line is refined using a local text model
Character segmentations are recognized using an OCR module trained on synthetic data [3]
[2] L. Neumann, J. Matas, “Text localization in real-world images using efficiently pruned exhaustive search,” in ICDAR 2011
[3] L. Neumann, J. Matas, “On combining multiple segmentations in scene text recognition,” in ICDAR
Area $A$ of a stroke is approximately equal to the product of the stroke axis length $s_l$ and the stroke width $s_w$
Stroke area ratio $A_s / A$ is a very discriminative feature to eliminate non-text regions
A character can be “drawn” by a circular brush with a possibly changing diameter $d_i$, equal to the stroke width $s_w$, sweeping a curve $S$ (the stroke axis). The non-constant diameter models characters made of strokes of different width
[Figure: a stroke annotated with its width $s_w$, axis length $s_l$, brush diameters $d_i$ and stroke axis $S$]
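The brush model can be written out as a short worked relation. This is a sketch of one reading of the slide, not a formula taken from it: it assumes that $A_s$ denotes the estimated stroke area, that the axis $S$ is sampled at unit spacing, and that end caps and curvature are ignored.

```latex
% Constant brush diameter: a stroke is roughly a rectangle bent along its axis.
A \approx s_l \, s_w
% Varying brush diameter d_i at unit-spaced samples i along the axis S:
A_s \approx \sum_{i \in S} d_i
% With constant diameter d_i = s_w and |S| \approx s_l samples,
% the sum reduces to the product above:
\sum_{i \in S} s_w = |S| \, s_w \approx s_l \, s_w
```

Under these assumptions, a region that really is a stroke has $A_s / A$ close to 1, while a blob-like region of the same area has a much smaller ratio.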
The stroke is “in the mind of the writer” (it could easily be found in an online handwriting setup)
The Stroke Support Pixels (SSP) are a subset of pixels that lie on the stroke (unlike a skeleton, the subset does not have to be continuous)
The subset is found as local maxima in a region’s distance map
Stroke area discretization effects are compensated by weighting all SSPs in a 3x3 neighborhood
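The detection step described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the brute-force Euclidean distance map, the plateau rule (a pixel counts as a local maximum if no 8-neighbour has a larger distance), and the stroke-width estimate `2 * d - 1` are all assumptions made for illustration, and the 3x3 weighting mentioned on the slide is deliberately left out.

```python
import math

def distance_map(mask):
    """Euclidean distance from each region pixel to the nearest background pixel."""
    h, w = len(mask), len(mask[0])
    # Treat a one-pixel ring around the image as background (assumption),
    # so regions touching the image border still get finite distances.
    background = [(r, c) for r in range(-1, h + 1) for c in range(-1, w + 1)
                  if not (0 <= r < h and 0 <= c < w and mask[r][c])]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist

def stroke_support_pixels(mask):
    """Region pixels whose distance is not exceeded by any 8-neighbour."""
    dist = distance_map(mask)
    h, w = len(mask), len(mask[0])
    ssp = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            neighbours = [dist[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, h))
                          for cc in range(max(c - 1, 0), min(c + 2, w))
                          if (rr, cc) != (r, c)]
            if all(dist[r][c] >= d for d in neighbours):
                ssp.append((r, c, dist[r][c]))
    return ssp

def stroke_area_ratio(mask):
    """Estimated stroke area divided by region area (unweighted sketch)."""
    area = sum(1 for row in mask for v in row if v)
    # Assumption: local stroke width in pixels is 2*d - 1, because d is
    # measured between pixel centres.
    stroke_area = sum(2 * d - 1 for _, _, d in stroke_support_pixels(mask))
    return stroke_area / area

# A 3x9 bar behaves like a stroke; a 9x9 filled square does not.
bar = [[1] * 9 for _ in range(3)]
square = [[1] * 9 for _ in range(9)]
print(stroke_area_ratio(bar))     # roughly 0.78
print(stroke_area_ratio(square))  # roughly 0.11
```

The bar yields seven support pixels along its middle row, so most of its area is accounted for by the stroke estimate, whereas the square yields a single support pixel at its centre. The brute-force distance map is quadratic in the number of pixels and is only meant for small illustrative masks.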
Stroke Support Pixels are less sensitive to discretization effects and scale change than standard skeleton algorithms, and their detection is trivial
[Table columns: Character / Fragment, Multi-character, Background]
* The only item that is not rotation invariant; replaced in current work to achieve full rotation invariance
Key feature in the classification
Works for a wide variety of scripts and fonts
Example: MSERs 460
[Figure legend: Character, Multi-character, Non-character MSER]
Not all characters (or even their fragments or groups) are detected as MSERs
Characters that are detected can have many different segmentations (over-complete representation)
The detected Text Fragments are used to initialize an iterative hypotheses-verification process
For each text line, a local color model is iteratively updated using a standard graph cut framework
The graph cut is initialized using the stroke support pixels
Note that, unlike with MSERs, the segmentation is not limited to thresholding a scalar value
[Figure panels: Source Image, MSER detection, Initialization, Iteration #1, Iteration #2, Final iteration (#6); stroke support pixels shown in green]
After every iteration:
◦ the text box position is re-estimated
◦ connected components are classified (character, multi-character, non-character)
[Figure panels: Source Image, Text Fragment detection, Final Segmentation]
Latin (stencil), Hebrew Script
[Figure panels: Source Image, Text Fragment detection, Final Segmentation]
Indian (Kannada), “Latin”, Armenian Script
ICDAR 2013 Dataset – Text Localization
Table columns: pipeline, recall, precision, f
Table rows: Proposed method; Yin et al. [4]; TexStar (ICDAR’13 winner); our previous method [3]; Kim (ICDAR’11 winner)
[4] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, “Robust text detection in natural scene images,” TPAMI
TAXI, CARLING, D8LL, iMac, THE DOLLAR ARMS, PANTENE PROV
Arbitrary Text Fragments are detected in a single pass
An efficiently calculated “strokeness” feature is exploited to discriminate between Text Fragments and background clutter
Detected text lines are refined by re-segmentation in an iterative hypotheses-verification process that exploits local text line properties
Results are competitive with the state of the art
Online demo available at
Current and future work
◦ Rotation-invariant real-time character detector (~5 fps)
◦ OCR accuracy improvement
Thank you for your attention!