Published by Myrtle Mosley. Modified over 9 years ago.
1
End-to-End Text Recognition with Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department, Stanford University
* Denotes equal contribution
2
Scene Text Recognition Overview
Text “in the wild” is hard to recognize: there is a wide range of variation in backgrounds, textures, fonts, and lighting conditions.
Datasets shown: ICDAR 2003 Dataset (S. Lucas et al., 2003); Street View Text Dataset (K. Wang et al., 2011)
Tao Wang
3
Two-Stage Framework
Stage 1: Detection/Classification
Stage 2: High-level Inference
Example output: “HOTEL”
4
Works | Classification and detection | High-level inference
Weinman et al., 2008 | Appearance + Geometry | Semi-Markov CRF
K. Wang et al., 2011 | HOG + Random Ferns | Pictorial Structure
Mishra et al., 2012 | HOG + SVM with RBF Kernel | CRF + N-gram model
Neumann and Matas, 2012 | MSER + SVM with RBF Kernel | Exhaustive Graph Search
5
Approach | Classification and detection | High-level inference
Most other approaches | Hand-designed features + off-the-shelf classifier | Graph-based inference models
Our approach | Learnt features + 2-layer CNN | Simple off-the-shelf heuristics
6
State of the Art (SOTA) on Various Benchmarks
Detection/Classification: ICDAR 62-way cropped character classification (SOTA)
Lexicon: ICDAR and SVT cropped word recognition (SOTA on ICDAR)
End-to-end system after high-level inference: ICDAR and SVT end-to-end text recognition (SOTA)
7
Unsupervised Feature Learning
Contrast normalization + ZCA whitening, then K-means (Coates et al., 2011)
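The three steps named on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the patch size (8x8), the regularization constants, the number of centroids, and the random stand-in data are all assumptions, and plain Lloyd's K-means is used here, which may differ from the K-means variant used in the cited work.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    """Subtract each patch's mean and divide by its (regularized) standard deviation."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return centered / np.sqrt(patches.var(axis=1, keepdims=True) + eps)

def zca_whiten(patches, eps=0.1):
    """Decorrelate pixel dimensions with a ZCA transform; eps is an assumed regularizer."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ transform, mean, transform

def kmeans_filters(data, k, iters=10, seed=0):
    """Plain Lloyd's K-means; the learned centroids play the role of first-layer filters."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(data.shape[0], size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every patch to every centroid.
        dists = ((data ** 2).sum(axis=1)[:, None]
                 - 2.0 * data @ centroids.T
                 + (centroids ** 2).sum(axis=1)[None, :])
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

# Random stand-ins for 500 flattened 8x8 grayscale patches (hypothetical data).
rng = np.random.default_rng(0)
patches = rng.random((500, 64))
normalized = contrast_normalize(patches)
whitened, mean, transform = zca_whiten(normalized)
filters = kmeans_filters(whitened, k=16)
```

Each row of `filters` can be reshaped to 8x8 and used as a convolution filter; new patches would need the same normalization and the stored `mean` and `transform` applied before comparison with the centroids.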
8
Network Architecture
Figure: 1st layer (96): convolution + spatial pooling; 2nd layer (256): convolution + spatial pooling; L2-SVM classifier (√ Text, × Non-Text); trained with backpropagation.
~10K parameters for detection
~50K parameters for classification
Large representation but not enough data. Overfitting?
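A forward pass through two convolution + spatial pooling layers followed by a linear classifier with an L2-SVM (squared hinge) loss might look like the sketch below. It assumes that the labels 96 and 256 are filter counts; the 32x32 input window, the 8x8 and 2x2 filter sizes, the pooling sizes, the ReLU activation, and average pooling are illustrative assumptions that the slide does not specify, and the sizes are not meant to reproduce the parameter counts quoted above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer(x, filters):
    """Valid cross-correlation of x (C, H, W) with filters (F, C, kh, kw), then ReLU."""
    kh, kw = filters.shape[2:]
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))  # (C, H', W', kh, kw)
    response = np.einsum("chwij,fcij->fhw", windows, filters)
    return np.maximum(response, 0.0)  # ReLU is an assumption; the slide names no activation

def average_pool(x, size):
    """Average over non-overlapping size x size blocks of each feature map."""
    f, h, w = x.shape
    return x.reshape(f, h // size, size, w // size, size).mean(axis=(2, 4))

def l2_svm_loss(score, label):
    """Squared hinge loss for one example; label is +1 (text) or -1 (non-text)."""
    return max(0.0, 1.0 - label * score) ** 2

rng = np.random.default_rng(0)
image = rng.standard_normal((1, 32, 32))           # one grayscale window (assumed size)
w1 = rng.standard_normal((96, 1, 8, 8)) * 0.1      # 1st-layer filters (random stand-ins)
w2 = rng.standard_normal((256, 96, 2, 2)) * 0.1    # 2nd-layer filters (random stand-ins)
w_svm = rng.standard_normal(256 * 2 * 2) * 0.01    # linear classifier weights

h1 = average_pool(conv_layer(image, w1), 5)        # (96, 25, 25) -> (96, 5, 5)
h2 = average_pool(conv_layer(h1, w2), 2)           # (256, 4, 4) -> (256, 2, 2)
score = float(w_svm @ h2.ravel())                  # positive score means "text"
loss = l2_svm_loss(score, label=+1)
```

Training with backpropagation would differentiate `loss` with respect to `w_svm`, `w2`, and `w1`; that step is omitted here.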
9
Synthetic Data
Figure panels: Real Data; Unrealistic Synthetic Data; Synthetic (Java.Font + natural backgrounds + color statistics); Synthetic “hard negatives”
10
Detector Performance
11
Text Line Bounding boxes
Candidate spaces
12
Classifier Performance
62-way classification accuracy on ICDAR cropped characters
Figure: accuracy (%) on ICDAR-Sample characters; higher is better. Value shown: 83.9
13
14
Figure axes: Char Class; Sliding window position
15
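Word Recognition
Each lexicon word is scored by max ∑ (the best sum of character scores over alignments to sliding window positions).
Lexicon: …, MAKE, SERIES, ESTATE, POKER
Scores shown: -5.45, 7.82, -1.74, -9.02
Character alignment shown: S E R I E S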
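One way to read the "max ∑" scoring is as a dynamic program: for each lexicon word, find the strictly increasing sequence of sliding window positions that maximizes the summed character scores, then pick the word with the highest total. The sketch below follows that reading under stated assumptions: the score matrix layout (character classes by window positions), the class ordering in `ALPHABET`, and the toy scores are all hypothetical, and the authors' actual inference may include steps not modeled here.

```python
import string
import numpy as np

# 62 character classes; this particular ordering is an assumption.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits

def alignment_score(scores, word):
    """Best sum of character scores over strictly increasing window positions."""
    n_pos = scores.shape[1]
    # best[j]: best score for the prefix handled so far, using only the first j positions.
    best = np.zeros(n_pos + 1)
    for ch in word:
        row = scores[ALPHABET.index(ch)]
        new = np.full(n_pos + 1, -np.inf)
        for j in range(1, n_pos + 1):
            # Either leave position j-1 unused, or place this character there.
            new[j] = max(new[j - 1], best[j - 1] + row[j - 1])
        best = new
    return best[n_pos]

def recognize(scores, lexicon):
    """Return the lexicon word with the highest alignment score."""
    return max(lexicon, key=lambda word: alignment_score(scores, word))

# Toy score matrix: 8 window positions, low baseline, peaks spelling S E R I E S.
scores = np.full((len(ALPHABET), 8), -1.0)
for ch, pos in zip("SERIES", [0, 1, 2, 4, 5, 7]):
    scores[ALPHABET.index(ch), pos] = 2.0

lexicon = ["MAKE", "SERIES", "ESTATE", "POKER"]
print(recognize(scores, lexicon))  # SERIES
```

A word longer than the number of window positions gets a score of negative infinity, so it can never win against a word that fits.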
16
Cropped Word Recognition Accuracy
Figure: cropped-word benchmarks; higher is better
17
Candidate spaces generated by detector … …
18
19
End-to-end text recognition results
Figure: F-score on end-to-end benchmarks; higher is better
20
Sample Output Images from SVT
21
Sample Output Images from ICDAR-FULL
22
Hunspell
Raw recognized strings (e.g. PEOST, PEOSTEL) are passed to Hunspell, and its suggested words (e.g. POSE, POST, PEOPLE, PISTOL, …; POS, POST) serve as the LEXICON.
c: “confidence margin”
Our F-score: 0.38; Neumann and Matas, 2010: 0.40
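The idea of turning a noisy raw string into a small lexicon of suggested words can be illustrated with Python's standard library. Note that this sketch swaps in `difflib.get_close_matches` as a stand-in for Hunspell, so its suggestions will not match Hunspell's; the dictionary contents, the `n` and `cutoff` values, and the function name `suggest_words` are hypothetical, and the confidence margin c is not modeled.

```python
from difflib import get_close_matches

# Tiny stand-in dictionary (hypothetical); Hunspell would use a full language dictionary.
DICTIONARY = ["POSE", "POST", "POS", "PEOPLE", "PISTOL", "HOTEL"]

def suggest_words(raw, dictionary=DICTIONARY, n=3, cutoff=0.6):
    """Return up to n dictionary words similar to the raw string, most similar first."""
    return get_close_matches(raw, dictionary, n=n, cutoff=cutoff)

# The suggestions act as a per-word lexicon for lexicon-driven recognition.
lexicon = suggest_words("PEOST")
```

The resulting `lexicon` could then be handed to the same lexicon-driven word scoring used when a lexicon is given in advance.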
23
Conclusion
Learnt features + 2-layer CNN for character detection and classification
Simple heuristics to build an end-to-end scene text recognition system
State-of-the-art performance on
- ICDAR cropped character classification
- ICDAR cropped word recognition
- Lexicon-based end-to-end recognition on ICDAR and SVT
Extensible to a more general lexicon with an off-the-shelf spelling checker
24
Questions?