Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition. Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong, Department of Computer Science, The University of Hong Kong. Email: wliu@cs.hku.hk. AAAI-18.
Motivation of Char-Net *The original images are from Google Street View. Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong. “Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition”. AAAI-18.
Design of Char-Net. Distorted text in prior work ([Shi et al., CVPR 2016], [Liu et al., BMVC 2016]): Text Image -> Spatial Transformer -> Encoder (Convolutional Neural Network -> convolutional feature map -> Bi-Directional LSTMs) -> CTC-based Decoder or Attention-based Decoder.
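As a small illustration of the CTC-based decoder option named above, the following sketch shows the standard best-path (greedy) CTC decoding rule: collapse repeated labels, then drop blanks. The function name, the blank index 0 and the toy label sequence are assumptions made for this example and do not come from the slides.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then drop blanks (CTC best-path rule).

    frame_labels: per-frame argmax class indices (hypothetical input).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # The blank between the two runs of 3 keeps them as separate characters.
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 1, 1, 0]))  # [3, 3, 1]
```

An attention-based decoder instead emits one character per decoding step, so it needs no blank symbol or collapsing rule.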
Design of Char-Net. Spatial Transformer with TPS Transformation: a global and complicated transformation. Rotation: a local and simple transformation.
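To make the "local and simple" side of this contrast concrete, here is a minimal sketch of a rotation applied to 2-D points. A rotation is fully described by a single angle, whereas a TPS transformation is defined by a set of control points. The function names and the choice of rotating about a given center are assumptions for illustration; the slides do not specify how the rotation is parameterized.

```python
import math


def rotation_matrix(theta):
    """2x2 matrix for a counter-clockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rotate_points(points, theta, center=(0.0, 0.0)):
    """Rotate (x, y) points about `center` by theta radians."""
    (r00, r01), (r10, r11) = rotation_matrix(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + r00 * dx + r01 * dy, cy + r10 * dx + r11 * dy))
    return rotated


if __name__ == "__main__":
    # Rotating (1, 0) by 90 degrees about the origin gives roughly (0, 1).
    print(rotate_points([(1.0, 0.0)], math.pi / 2))
```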
Architecture of Char-Net. Components: Input Image, Word-Level Encoder, Character-Level Encoder, hyper-connections.
Hierarchical Attention Mechanism (HAM). Figure labels: Input Image, Traditional Attention.
HAM: Recurrent RoIWarp Layer. Components: Recurrent Localization Network, Grid Generator, Bilinear Sampler. Figure labels: Input Image, Traditional Attention. Annotated quantities: Character Location and Size; Traditional Attention Mechanism.
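The traditional attention mechanism named on this slide is usually a soft attention: scores over feature positions are normalized with a softmax and used to form a weighted sum of the features. The sketch below shows only that generic form. How the scores are computed from the decoder state is left out, and the function names and toy inputs are assumptions, since the slide's own formula did not survive extraction.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Return attention weights and the weighted sum of feature vectors.

    features: list of equal-length feature vectors, one per position.
    scores: one unnormalized relevance score per position (assumed given).
    """
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Equal scores give uniform weights, so the context is the mean feature.
    weights, context = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(weights, context)  # [0.5, 0.5] [0.5, 0.5]
```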
HAM: Recurrent RoIWarp Layer (Recurrent Localization Network, Grid Generator, Bilinear Sampler). The layer crops the variable-size character of interest and warps it to a fixed size. 1. Grid Generator: each point (u, v) is paired with a corresponding sampling point (u', v'). 2. Bilinear Sampler: reads the values at the sampling points.
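The crop-and-warp step can be sketched as a grid generator followed by a bilinear sampler. The slide's grid equation is missing from the extracted text, so the mapping used here is an assumption: an axis-aligned box with center (cx, cy) and size (w, h) whose output pixels are spread evenly across the box. The function names, the single-channel list-of-lists image and the clamping at the borders are likewise illustrative choices, not the authors' implementation.

```python
import math


def generate_grid(cx, cy, w, h, out_w, out_h):
    """Map each output pixel (u, v) to a sampling point (u', v').

    Assumed parameterization: output coordinates are normalized to [-1, 1]
    and scaled to a box of size (w, h) centered at (cx, cy).
    """
    grid = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            nu = 2.0 * u / (out_w - 1) - 1.0 if out_w > 1 else 0.0
            nv = 2.0 * v / (out_h - 1) - 1.0 if out_h > 1 else 0.0
            row.append((cx + nu * w / 2.0, cy + nv * h / 2.0))
        grid.append(row)
    return grid


def bilinear_sample(image, x, y):
    """Bilinearly interpolate image[row][col] at real-valued (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp the sampling point to the image bounds.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def roi_warp(image, cx, cy, w, h, out_w, out_h):
    """Crop the box (cx, cy, w, h) and warp it to out_h x out_w."""
    grid = generate_grid(cx, cy, w, h, out_w, out_h)
    return [[bilinear_sample(image, x, y) for (x, y) in row] for row in grid]


if __name__ == "__main__":
    # Toy 4x4 image with value x + 10 * y at column x, row y.
    image = [[x + 10 * y for x in range(4)] for y in range(4)]
    print(roi_warp(image, cx=1.5, cy=1.5, w=1.0, h=1.0, out_w=2, out_h=2))
```

Because the output size is fixed while (w, h) varies, characters of different sizes all end up as same-size patches. Bilinear interpolation also keeps the sampled values differentiable with respect to the box parameters, which is what lets such a layer be trained end to end.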
HAM: Character-Level Attention. Takes the form of the traditional attention mechanism. Essential for end-to-end training. Semi-supervised learning of character locations. Distortion of the whole text.
Experiment. Six public benchmarks: ICDAR-2003 (IC-03), Street View Text (SVT), IIIT5K, Street View Text Perspective (SVT-P), ICDAR Incidental Scene Text (IC-IST).
Experiment - Comparison with Previous Methods. Experimental setting: 37 classes (26 case-insensitive characters + 10 digits + eos). Training dataset: 8 million synthetic images (Jaderberg et al. 2014). Image size: 100 x 32.
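The 37-class label set can be written out directly. The sketch below builds it from 26 lower-case letters, 10 digits and an end-of-sequence token, and shows case-insensitive encoding and decoding. The index order, the `<eos>` spelling and the helper names are assumptions for illustration only.

```python
import string

EOS = "<eos>"  # hypothetical spelling of the end-of-sequence token
# 26 letters + 10 digits + eos = 37 classes (index order is an assumption).
CHARSET = list(string.ascii_lowercase) + list(string.digits) + [EOS]
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}


def encode(word):
    """Lower-case the word, map characters to indices, append eos."""
    return [CHAR_TO_INDEX[ch] for ch in word.lower()] + [CHAR_TO_INDEX[EOS]]


def decode(indices):
    """Map indices back to characters, stopping at the first eos."""
    chars = []
    for i in indices:
        if CHARSET[i] == EOS:
            break
        chars.append(CHARSET[i])
    return "".join(chars)


if __name__ == "__main__":
    print(len(CHARSET))             # 37
    print(encode("ab"))             # [0, 1, 36]
    print(decode(encode("Toast")))  # toast
```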
Experiment - General Scene Text Recognition. Experimental setting: 96 classes (26 upper-case letters + 26 lower-case letters + 10 digits + 33 punctuation marks + eos). Training datasets: 12 million synthetic images (Jaderberg et al. 2014 + Gupta et al. 2016). Image size: 100 x 100. Data augmentation.
Experiment - Qualitative Results. Character-Level Attention visualizations with predictions: toast, beyond, wishing.
Conclusion. A simple but efficient network for distorted text recognition. End-to-end trainable framework. Hierarchical attention mechanism. Character-level encoder.