High-level Component Filtering for Robust Scene Text Detection

Name: High-level Component Filtering for Robust Scene Text Detection
Uploaded: 2017-08-27T18:15:21+00:00
Duration: PTM10S10
Channel: Barry Walsh
Description: High-level Component Filtering for Robust Scene Text Detection

High-level Component Filtering for Robust Scene Text Detection
Weilin Huang (黄韡林) Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences Multimedia Laboratory, The Chinese University of Hongkong

Outline
■ Introduction
  ♦ Connected Component and Sliding-Window Methods
  ♦ Stroke Width Transform (SWT)
  ♦ SWT based Text Detection
■ Stroke Feature Transform
  ♦ Colour Information on Text Stroke Detection
■ Text Covariance Descriptor (TCD)
  ♦ TCD for Component Filtering
  ♦ TCD for Text-line Filtering
■ Convolution Neural Network Induced MSER Trees
  ♦ Maximally Stable Extremal Regions (MSERs)
  ♦ CNN for Component Classification
  ♦ Component Splitting

I. Introduction: Text Detection Methods
■ Connected Component Methods
  ♦ Step 1: Separate text and non-text information at pixel level
  ♦ Step 2: Group text pixels to construct character components
  ♦ Advantages: fast to compute
  ♦ Limitations: not robust; erroneous components; many false alarms
  ♦ Examples: SWT, MSERs
■ Sliding-Window Methods
  ♦ Step 1: Train a text classifier
  ♦ Step 2: Scan a sliding sub-window through the image
  ♦ Advantages: high-level text classification
  ♦ Limitations: computationally costly; features are difficult to design
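The two sliding-window steps can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the window size, stride, threshold, and the `toy_classifier` stand-in (mean brightness) are all assumptions made for the example; a real system would plug in a trained text classifier.

```python
import numpy as np

def sliding_windows(image, win=(32, 32), stride=16):
    """Yield (row, col, patch) for every window that fits fully inside the image."""
    h, w = image.shape[:2]
    wh, ww = win
    for r in range(0, h - wh + 1, stride):
        for c in range(0, w - ww + 1, stride):
            yield r, c, image[r:r + wh, c:c + ww]

def scan(image, classifier, win=(32, 32), stride=16, threshold=0.5):
    """Return top-left corners of windows that the classifier scores above threshold."""
    return [(r, c) for r, c, patch in sliding_windows(image, win, stride)
            if classifier(patch) > threshold]

def toy_classifier(patch):
    """Hypothetical stand-in for a trained text classifier: mean brightness."""
    return float(patch.mean())

# Synthetic 64x64 image with one bright 32x32 block in the top-left corner.
image = np.zeros((64, 64))
image[0:32, 0:32] = 1.0
hits = scan(image, toy_classifier)
```

Every window is passed through the classifier, which is why the slide lists computational cost as the main limitation of this family of methods.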

I. Introduction: Stroke Width Transform (1)
■ SWT operator (low-level pixel filter)
  ♦ Canny edges
  ♦ Gradient orientation for ray tracking
  ♦ Compute stroke width between paired pixels
  ♦ Stroke width constraint: |Op - Oq| < λ
■ Example: SWT map
■ Problem 1: Erroneous connection
  ♦ Connecting multiple characters
  ♦ Separating single characters
■ Problem 2: Many non-text components
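The ray-tracking step can be illustrated with a small sketch. It assumes a boolean edge map (for example the output of a Canny detector) and a gradient direction supplied by the caller; the function name `stroke_width_along_ray` and the `max_steps` limit are hypothetical. The full SWT additionally checks that the gradient at the opposite edge pixel points roughly the other way, which is omitted here.

```python
import numpy as np

def stroke_width_along_ray(edges, p, direction, max_steps=50):
    """Walk from edge pixel p along `direction` until another edge pixel q is hit.

    Returns (q, width), where width is the Euclidean distance |p - q|,
    or None if the ray leaves the image or runs out of steps.
    """
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    pos = np.asarray(p, dtype=float)
    for _ in range(max_steps):
        pos = pos + d
        r, c = int(round(pos[0])), int(round(pos[1]))
        if not (0 <= r < edges.shape[0] and 0 <= c < edges.shape[1]):
            return None
        if edges[r, c] and (r, c) != tuple(p):
            return (r, c), float(np.hypot(r - p[0], c - p[1]))
    return None

# Synthetic vertical stroke whose two edges lie on columns 2 and 6.
edges = np.zeros((10, 10), dtype=bool)
edges[:, 2] = True
edges[:, 6] = True
hit = stroke_width_along_ray(edges, (5, 2), (0, 1))
```

In this toy case the ray starting at column 2 reaches the opposite edge at column 6, so the pair of pixels would be assigned a stroke width of 4.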

I. Introduction: SWT based Text Detection
■ Complete processing: SWT → component filtering → text components → grouping (GP) → grouped text lines → text-line (TL) filtering → final text lines
  ♦ Filters used: heuristic filtering; Random Forest classifier (heuristic and geometric features)
■ Our improvements: more powerful high-level filters
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, CVPR, 2012.

II. Stroke Feature Transform (SFT) (1)
■ SWT
  ♦ Stroke width constraint: |Op - Oq| < λ
  ♦ Output: stroke width map
■ Stroke Feature Transform (SFT)
  ♦ Stroke width constraint: |Op - Oq| < λ1
  ♦ Stroke color constraint: |Cp - Cq| < λ2
  ♦ Neighborhood coherency constraint
  ♦ Output: stroke width map, stroke color map
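To make the effect of the two thresholds concrete, here is a rough sketch that groups 4-connected pixels only when both the stroke width constraint |Op - Oq| < λ1 and the stroke color constraint |Cp - Cq| < λ2 hold. It is an illustration under assumptions, not the SFT operator itself: the function name `group_pixels`, the default thresholds, and the use of the largest per-channel difference as the color distance are all hypothetical, and the neighborhood coherency constraint is not modelled.

```python
import numpy as np
from collections import deque

def group_pixels(width_map, color_map, lam1=1.0, lam2=30.0):
    """Label 4-connected pixels whose stroke width differs by less than lam1
    and whose colour differs by less than lam2. Width 0 marks background.

    width_map: (h, w) float array; color_map: (h, w, 3) float array.
    """
    h, w = width_map.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for sr in range(h):
        for sc in range(w):
            if width_map[sr, sc] <= 0 or labels[sr, sc]:
                continue
            next_label += 1
            labels[sr, sc] = next_label
            queue = deque([(sr, sc)])
            while queue:  # breadth-first flood fill from the seed pixel
                r, c = queue.popleft()
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if not (0 <= nr < h and 0 <= nc < w):
                        continue
                    if width_map[nr, nc] <= 0 or labels[nr, nc]:
                        continue
                    same_width = abs(width_map[r, c] - width_map[nr, nc]) < lam1
                    same_color = np.abs(color_map[r, c] - color_map[nr, nc]).max() < lam2
                    if same_width and same_color:
                        labels[nr, nc] = next_label
                        queue.append((nr, nc))
    return labels

# Four pixels with identical stroke width: two red, then two blue.
width_map = np.full((1, 4), 3.0)
color_map = np.zeros((1, 4, 3))
color_map[0, :2, 0] = 255.0
color_map[0, 2:, 2] = 255.0
with_color = group_pixels(width_map, color_map)
width_only = group_pixels(width_map, color_map, lam2=1e9)
```

With the color constraint active the red and blue pixels fall into separate components; with it effectively disabled (a very large `lam2`) they merge into one, which is the kind of inter-component connection a width-only criterion cannot prevent.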

II. Stroke Feature Transform (SFT) (2)
■ SFT vs SWT
  ♦ Mitigates inter-component connections
  ♦ Enhances intra-component connections
  ♦ Better character candidate detection
  ♦ Higher recall

II. Stroke Feature Transform (SFT) (3)
■ Limitation: low-level operations are not robust
  ♦ Text-like outliers: bricks, windows, leaves, …
  ♦ Many false alarms
  ♦ Low precision
  ♦ Heuristic filters do not work well
  ♦ High-level, learning-based filtering is required

III. Text Covariance Descriptor (TCD) (1)
♦ Each pixel is represented by d features
♦ TCD is computed as: [equation not preserved in the transcript]
♦ U is a given region
♦ Multiple features are incorporated in a single matrix
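For orientation, region covariance descriptors are conventionally defined as the sample covariance of the per-pixel feature vectors. Assuming TCD follows that standard form, with f_i the d-dimensional feature vector of the i-th pixel in region U, n the number of pixels in U, and μ their mean (these symbols are chosen here and may differ from the slide's), it would read:

```latex
C_U = \frac{1}{n-1} \sum_{i=1}^{n} (f_i - \mu)(f_i - \mu)^{\top},
\qquad
\mu = \frac{1}{n} \sum_{i=1}^{n} f_i
```

The result is a symmetric d x d matrix regardless of how many pixels the region contains.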

■ TCD for components (9x9 covariance features)
  ♦ Pixel coordinates on the X- and Y-axes: encode spatial information
  ♦ Pixel intensities and RGB values: color uniformity
  ♦ Stroke width and distance values: stroke width/distance consistency
  ♦ Edge information from the Canny detector: stroke spatial layout
■ 9 features in total, giving a 9x9 matrix
■ Transformed into a 45-dimensional feature vector
■ Component confidence maps are obtained with an RF classifier
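The step from a 9x9 matrix to a 45-dimensional vector can be sketched as below. A symmetric 9x9 matrix has 9 * 10 / 2 = 45 distinct entries, so this example assumes the vector is formed from the upper triangle including the diagonal; the slide does not state the exact construction, and the function name `covariance_descriptor` and the random stand-in features are hypothetical.

```python
import numpy as np

def covariance_descriptor(features):
    """features: (n_pixels, d) array with one row of d features per pixel.

    Returns the d * (d + 1) / 2 upper-triangular entries (diagonal included)
    of the d x d sample covariance matrix.
    """
    cov = np.cov(features, rowvar=False)          # d x d, normalised by n - 1
    rows, cols = np.triu_indices(cov.shape[0])    # upper triangle incl. diagonal
    return cov[rows, cols]

# Stand-in for the 9 per-pixel features of one component (200 pixels).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(200, 9))
vec = covariance_descriptor(pixels)
```

A fixed-length vector of this kind is what allows components of different sizes to be fed to the same classifier, such as the RF classifier named on the slide.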

■ TCD for text lines
  ♦ Mean properties of component features: uniformity
  ♦ Coordinates of component centers: spatial information
  ♦ Heights of components: consistency
  ♦ Horizontal distances between components: text spatial layout
  ♦ 12x12 covariance features
  ♦ 16-bin HOG on edge pixels: oriented spatial features
  ♦ 16x16 covariance features
■ Text-line confidence maps are obtained with an RF classifier

■ Component and text-line confidence maps

■ Top: TCD for component; Middle: TCD for text-line; Bottom: detection

■ Results
■ Failure cases
W. Huang, Z. Lin, J. Yang and J. Wang, Text localization in natural images using stroke feature transform and text covariance descriptors, ICCV, 2013.

Convolution Neural Network Induced MSER Trees (1)
■ Maximally Stable Extremal Region (MSER) tree
L. Neumann and J. Matas, Text localization in real-world images using efficiently pruned exhaustive search, ICDAR, 2011.
■ MSER vs SWT
  ♦ Detects low-quality text → higher recall
  ♦ Generates more non-text components → lower precision
  ♦ Requires a more powerful classifier/filter

■ A two-layer convolutional neural network (CNN)
T. Wang, D. J. Wu, A. Coates and A. Y. Ng, End-to-end text recognition with convolutional neural networks, ICPR, 2012.

■ Training data: synthetic samples
■ Data transformation
  ♦ Fixed size of 32x32
  ♦ Horizontal warp
  ♦ Include additional image context

■ CNN confidence scores
  ♦ Pipeline: MSERs → CNN scores → component splitting → detection

■ Component splitting: a component is treated as erroneously connected when it has
  ♦ High aspect ratio
  ♦ Positive confidence score
  ♦ Leaf position in the MSER tree, or a confidence score greater than that of all its children
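The three criteria above can be expressed as a simple predicate over a tree node. This is a sketch under assumptions: the `Node` class, the `needs_splitting` name, and the aspect-ratio threshold `min_aspect=2.0` are hypothetical, since the slide does not give numeric values or a data structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One MSER tree node with its bounding-box size and CNN confidence score."""
    width: int
    height: int
    score: float
    children: List["Node"] = field(default_factory=list)

def needs_splitting(node, min_aspect=2.0):
    """True when the node meets all three criteria for an erroneously
    connected component: wide box, positive score, and no child scoring
    at least as high (trivially true for a leaf)."""
    wide = node.width / node.height > min_aspect
    positive = node.score > 0
    dominant = all(node.score > child.score for child in node.children)
    return wide and positive and dominant

wide_leaf = Node(width=60, height=20, score=0.8)
square_leaf = Node(width=20, height=20, score=0.8)
weak_parent = Node(width=60, height=20, score=0.3,
                   children=[Node(width=20, height=20, score=0.9)])
negative = Node(width=60, height=20, score=-0.5)
```

Here only `wide_leaf` would be selected: the square node fails the aspect-ratio test, the parent is outscored by its child, and the last node has a negative score.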

■ Comparisons with SFT-TCD

■ Results

■ Results on the ICDAR 2011 database
W. Huang, Y. Qiao, and X. Tang, Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees, ECCV, 2014.

The End Thank You!
