Published by Barry Walsh. Modified over 9 years ago.
1
High-level Component Filtering for Robust Scene Text Detection
Weilin Huang (黄韡林) Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences Multimedia Laboratory, The Chinese University of Hongkong
2
Outline
■ Introduction
♦ Connected Component and Sliding-Window Methods
♦ Stroke Width Transform (SWT)
♦ SWT based Text Detection
■ Stroke Feature Transform
♦ Colour Information on Text Stroke Detection
■ Text Covariance Descriptor (TCD)
♦ TCD for Component Filtering
♦ TCD for Text-line Filtering
■ Convolution Neural Network Induced MSER Trees
♦ Maximally Stable Extremal Regions (MSERs)
♦ CNN for Component Classification
♦ Component Splitting
3
I. Introduction: Text Detection Methods
■ Connected Component Methods
♦ Step 1: Separate text and non-text information at pixel level
♦ Step 2: Group text pixels to construct character components
♦ Advantages: fast to compute
♦ Limitations: not robust, erroneous components, many false alarms
♦ Examples: SWT, MSERs
■ Sliding-Window Methods
♦ Step 1: Train a text classifier
♦ Step 2: Scan a sliding sub-window through the image
♦ Advantages: high-level text classification
♦ Limitations: computationally costly, difficult feature design
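The two sliding-window steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the window size, stride, threshold, and the `mean_intensity` stand-in for a trained text classifier are all assumptions made for the example.

```python
from typing import Callable, Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (row, col, height, width)
Image = List[List[float]]


def sliding_windows(img_h: int, img_w: int, win: int, stride: int) -> Iterator[Window]:
    """Yield every win x win sub-window position at the given stride."""
    for r in range(0, img_h - win + 1, stride):
        for c in range(0, img_w - win + 1, stride):
            yield (r, c, win, win)


def detect(image: Image, classify: Callable[[Image], float],
           win: int = 4, stride: int = 2, threshold: float = 0.5) -> List[Window]:
    """Scan the image and keep windows whose classifier score exceeds the threshold."""
    h, w = len(image), len(image[0])
    hits = []
    for (r, c, wh, ww) in sliding_windows(h, w, win, stride):
        patch = [row[c:c + ww] for row in image[r:r + wh]]
        if classify(patch) > threshold:
            hits.append((r, c, wh, ww))
    return hits


def mean_intensity(patch: Image) -> float:
    """Hypothetical stand-in for a trained text classifier (Step 1)."""
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))
```

Every window is classified independently, which is why the slide lists computational cost as the main limitation: the number of classifier calls grows with image size, the number of scales, and the inverse of the stride.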
4
I. Introduction: Stroke Width Transform(1)
■ Example
■ SWT Operator
♦ Low-level pixel filter
♦ Canny edges
♦ Gradient orientation for ray tracking
♦ Compute stroke width between paired pixels
♦ Stroke width constraint: |Op - Oq| < λ
■ Problem 1: Erroneous connection
♦ Connecting multiple characters
♦ Separating single characters
■ Problem 2: Many non-text components
[Figure: SWT map]
5
I. Introduction: SWT based Text Detection
■ Complete Processing: SWT → Comp. filtering → Text components → GP → Grouped text lines → TL filtering → Final text lines
♦ Existing filters: Heuristic Filtering; Random Forest classifier (heuristic and geometric features)
♦ Our Improvements: More powerful high-level filters
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, CVPR, 2012.
6
II. Stroke Feature Transform (SFT) (1)
■ SWT
♦ Stroke width constraint: |Op - Oq| < λ
♦ Output: Stroke Width Map
■ Stroke Feature Transform (SFT)
♦ Stroke Width Constraint: |Op - Oq| < λ1
♦ Stroke Color Constraint: |Cp - Cq| < λ2
♦ Neighborhood Coherency Constraint
♦ Output: Stroke Width Map, Stroke Color Map
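As a rough illustration of how the two SFT thresholds act on a candidate pixel pair (p, q), the sketch below checks both constraints as they are written on the slide. It is not the authors' implementation: the function names, the use of Euclidean distance for colour, and the threshold values in the usage note are assumptions, and the slide does not say how the orientations O are normalised before they are compared.

```python
import math
from typing import Sequence


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def color_dist(c_p: Sequence[float], c_q: Sequence[float]) -> float:
    """Euclidean distance between two colour vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c_p, c_q)))


def accept_pair(o_p: float, o_q: float,
                c_p: Sequence[float], c_q: Sequence[float],
                lam1: float, lam2: float) -> bool:
    """Accept the pair only if both the width (orientation) and colour constraints hold."""
    width_ok = angle_diff(o_p, o_q) < lam1   # |Op - Oq| < lambda1
    color_ok = color_dist(c_p, c_q) < lam2   # |Cp - Cq| < lambda2
    return width_ok and color_ok
```

For example, with the illustrative thresholds `lam1 = math.pi / 6` and `lam2 = 30`, two pixels with similar orientation and similar red tones are paired, while the same pixels with clearly different colours are rejected. That colour check is what SWT alone lacks.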
7
II. Stroke Feature Transform (SFT) (2)
■ SFT vs SWT
♦ Mitigate inter-component connections
♦ Enhance intra-component connections
♦ Better character candidate detection
♦ Higher Recall
8
II. Stroke Feature Transform (SFT) (3)
■ Limitation: low-level operations are not robust
♦ Text-like outliers: bricks, windows, leaves, ...
♦ Many false alarms, low precision
♦ Heuristic filters do not work well
♦ High-level, learning-based filtering is required
9
III. Text Covariance Descriptor (TCD) (1)
Each pixel is represented by d features
U is a given region
TCD is computed as: [formula image not preserved in the transcript]
Multiple features are incorporated in a matrix
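For orientation, the standard region covariance descriptor has the form below; the slide's lost formula is presumably of this kind, but that is an assumption. The symbols are introduced here for illustration and are not taken from the slide: f_i is the d-dimensional feature vector of the i-th pixel in U, n is the number of pixels in U, and μ_U is the mean feature vector.

```latex
C_U = \frac{1}{n-1} \sum_{i=1}^{n} (f_i - \mu_U)(f_i - \mu_U)^{\top},
\qquad
\mu_U = \frac{1}{n} \sum_{i=1}^{n} f_i
```

C_U is a symmetric d x d matrix whose diagonal holds the variance of each feature over the region and whose off-diagonal entries hold the pairwise covariances.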
10
III. Text Covariance Descriptor (TCD) (2)
■ TCD for components (9x9 covariance features)
♦ Pixel coordinates on the X- and Y-axes: encode spatial information
♦ Pixel intensities and RGB values: color uniformity
♦ Stroke width and distance values: stroke width/distance consistency
♦ Edge information from the Canny detector: stroke spatial layout
■ 9 features in total, forming a 9 x 9 matrix
■ Transformed to a 45-dim feature vector
■ Component confidence maps obtained with an RF classifier
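The step from a 9 x 9 matrix to a 45-dim vector is consistent with keeping only the upper triangle of a symmetric matrix (9 · 10 / 2 = 45). The sketch below illustrates that reading in plain Python; the slide does not state how the vector is actually built, so the upper-triangle flattening, the sample covariance with an n − 1 divisor, and the function names are assumptions.

```python
from typing import List, Sequence


def covariance_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (divisor n - 1) of per-pixel feature vectors."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two pixels to estimate a covariance")
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    # Accumulate only the upper triangle, then mirror it.
    for f in features:
        for a in range(d):
            da = f[a] - mean[a]
            for b in range(a, d):
                cov[a][b] += da * (f[b] - mean[b])
    for a in range(d):
        for b in range(a, d):
            cov[a][b] /= (n - 1)
            cov[b][a] = cov[a][b]
    return cov


def upper_triangle(cov: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the upper triangle (diagonal included) into a vector."""
    d = len(cov)
    return [cov[a][b] for a in range(d) for b in range(a, d)]
```

With 9 features per pixel, `upper_triangle(covariance_matrix(pixels))` has length 45 regardless of how many pixels the component contains, which gives a classifier such as a random forest a fixed-size input for components of any size.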
11
III. Text Covariance Descriptor (TCD) (3)
■ TCD for text-lines: 12x12 covariance features
♦ Mean properties of component features: uniformity
♦ Coordinates of component centers: spatial information
♦ Heights of components: consistency
♦ Horizontal distances between components: text spatial layout
■ 16-bin HOG on edge pixels: 16x16 covariance features
♦ Oriented spatial features
■ Text-line confidence maps obtained with an RF classifier
12
III. Text Covariance Descriptor (TCD) (4)
■ Component and text-line confidence maps
13
III. Text Covariance Descriptor (TCD) (5)
■ Top: TCD for component; Middle: TCD for text-line; Bottom: detection
14
III. Text Covariance Descriptor (TCD) (6)
■ Results
■ Failure Cases
W. Huang, Z. Lin, J. Yang and J. Wang, Text localization in natural images using stroke feature transform and text covariance descriptors, ICCV, 2013.
15
Convolution Neural Network Induced MSER Trees (1)
■ Maximally Stable Extremal Region (MSER) Tree
L. Neumann and J. Matas. Text localization in real-world images using efficiently pruned exhaustive search, ICDAR, 2011.
■ MSER vs SWT
♦ Detects low-quality text: higher recall
♦ Generates more non-text components: lower precision
♦ Requires a more powerful classifier/filter
16
Convolution Neural Network Induced MSER Trees (2)
■ A Two-layer Convolutional Neural Network (CNN)
T. Wang, D. J. Wu, A. Coates and A. Y. Ng, End-to-end text recognition with convolutional neural networks, ICPR, 2012.
17
Convolution Neural Network Induced MSER Trees (3)
■ Training Data: Synthetic samples
■ Data Transformation
♦ Fixed size of 32x32
♦ Horizontal warp
♦ Include additional image context
18
Convolution Neural Network Induced MSER Trees (3)
■ CNN Confidence Scores
[Figure panels: MSERs, CNN Scores, Comp. Splitting, Detection]
19
Convolution Neural Network Induced MSER Trees (4)
■ Component Splitting
♦ Criteria for an erroneously connected component:
■ High aspect ratio
■ Positive conf. score
■ Leaf of the MSER tree, or conf. score > all children
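The three criteria above can be combined into a single predicate over MSER tree nodes. The following is a sketch under stated assumptions, not the authors' code: the `MserNode` type, the width/height definition of aspect ratio, and the `min_aspect_ratio` default of 2.0 are hypothetical, since the slide gives no concrete values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MserNode:
    """Hypothetical MSER tree node carrying a CNN confidence score."""
    width: int
    height: int
    score: float
    children: List["MserNode"] = field(default_factory=list)


def is_split_candidate(node: MserNode, min_aspect_ratio: float = 2.0) -> bool:
    """Return True if the node meets all three criteria listed on the slide."""
    high_aspect = node.width / node.height > min_aspect_ratio
    positive = node.score > 0
    # all() over an empty list is True, so a leaf passes this test automatically.
    beats_children = all(node.score > child.score for child in node.children)
    return high_aspect and positive and beats_children
```

A wide leaf with a positive score is selected, whereas a wide parent whose child scores higher is skipped, because the child is then the better candidate for a character.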
20
Convolution Neural Network Induced MSER Trees (5)
■ Comparisons with SFT-TCD
21
Convolution Neural Network Induced MSER Trees (6)
■ Results
22
Convolution Neural Network Induced MSER Trees (7)
■ Results on the ICDAR 2011 Database
W. Huang, Y. Qiao, and X. Tang, Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees, ECCV, 2014.
23
The End Thank You!