Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong

Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition
Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong
Department of Computer Science, The University of Hong Kong
Email: wliu@cs.hku.hk
AAAI-18

Motivation of Char-Net
*The original images are from Google Street View.
Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong. “Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition”. AAAI-18.

Design of Char-Net
Distorted Text ([Shi et al, CVPR’2016], [Liu et al, BMVC’2016])
Diagram components: Text Image, Spatial Transformer, Encoder (Convolutional Neural Network, convolutional feature map, Bi-Directional LSTMs), CTC-based Decoder or Attention-based Decoder
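The slide names a CTC-based decoder as one of the two decoder options in prior work. As a rough illustration only, the standard CTC best-path rule (collapse repeated labels, then drop blanks) can be sketched as below. The function name, the blank index 0 and the example label sequence are assumptions made for this sketch and do not come from the slides.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeats, then drop blanks (CTC best-path rule)."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical per-frame argmax labels; index 0 is assumed to be the blank.
frames = [0, 3, 3, 0, 3, 1, 1, 0]
print(ctc_greedy_decode(frames))  # [3, 3, 1]
```

The blank between the two runs of label 3 is what lets the decoder output the same character twice in a row.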

Design of Char-Net
Spatial Transformer, TPS Transformation: Global and Complicated Transformation
Rotation: Local and Simple Transformation
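To make the contrast concrete: a rotation is described by a single angle, whereas a thin-plate-spline (TPS) warp is parameterized by a set of control points. The sketch below rotates a point about a centre. The function name and the choice of centre are illustrative assumptions, not the authors' implementation.

```python
import math


def rotate_point(x, y, angle_rad, cx=0.0, cy=0.0):
    """Rotate (x, y) about (cx, cy) by angle_rad using the 2D rotation matrix."""
    dx, dy = x - cx, y - cy
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # [x'; y'] = [cos -sin; sin cos] [dx; dy] + [cx; cy]
    return (cx + cos_a * dx - sin_a * dy, cy + sin_a * dx + cos_a * dy)


# Rotating the corners of a hypothetical character patch by 90 degrees
# about its centre (1, 1).
corners = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
rotated = [rotate_point(x, y, math.pi / 2, cx=1.0, cy=1.0) for x, y in corners]
print(rotated)
```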

Architecture of Char-Net
Diagram components: Input Image, Word-Level Encoder, Character-Level Encoder, hyper-connections

Hierarchical Attention Mechanism (HAM)
Diagram components: Input Image, Traditional Attention

HAM: Recurrent RoIWarp Layer
Diagram components: Input Image, Traditional Attention, Recurrent Localization Network, Grid Generator, Bilinear Sampler
Traditional Attention Mechanism: [equation not preserved in the transcript]
Character Location and Size: [equation not preserved in the transcript]
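The slide's formula for the traditional attention mechanism is not in the transcript. As a generic reminder of how soft attention is commonly computed (not the authors' exact formulation), the sketch below turns a list of scores into softmax weights and returns the weighted sum of feature vectors. In a real model the scores would come from a learned function of the decoder state; here they are passed in directly, which is an assumption of this sketch, as are all names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Return (weights, glimpse): softmax weights and the weighted feature sum."""
    weights = softmax(scores)
    dim = len(features[0])
    glimpse = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, glimpse


# Three hypothetical 2-D feature vectors with hand-picked scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, glimpse = attend(features, [0.0, 0.0, 0.0])
print(weights, glimpse)
```

With equal scores the weights are uniform, so the glimpse is simply the mean of the feature vectors.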

HAM: Recurrent RoIWarp Layer
Diagram components: Input Image, Recurrent Localization Network, Grid Generator, Bilinear Sampler
Crop and warp: the variable-size character of interest is cropped and warped to a fixed size.
1. Grid Generator: [equation not preserved in the transcript], where (u, v) is a point in [symbol missing] and (u’, v’) is its corresponding sampling point in [symbol missing].
2. Bilinear Sampler
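The two steps named on this slide, a grid generator followed by a bilinear sampler, can be sketched in plain Python as follows. Because the slide's equation is missing from the transcript, the mapping used here (a simple scale and shift from the fixed-size output grid into a box given as top-left corner plus width and height) is an assumption chosen for illustration, and all function names are hypothetical.

```python
def make_grid(box, out_h, out_w):
    """Map each output point (u, v) to a sampling point (u', v') inside box.

    box = (x0, y0, w, h) in input pixel coordinates (assumed parameterization).
    """
    x0, y0, w, h = box
    grid = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            # Normalize (u, v) to [0, 1], then scale and shift into the box.
            su = u / (out_w - 1) if out_w > 1 else 0.5
            sv = v / (out_h - 1) if out_h > 1 else 0.5
            row.append((x0 + su * (w - 1), y0 + sv * (h - 1)))
        grid.append(row)
    return grid


def bilinear_sample(image, x, y):
    """Bilinearly interpolate image (a list of rows) at real-valued (x, y)."""
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)   # clamp to the image bounds
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def roi_warp(image, box, out_h, out_w):
    """Crop box from image and resample it to a fixed out_h x out_w patch."""
    grid = make_grid(box, out_h, out_w)
    return [[bilinear_sample(image, x, y) for (x, y) in row] for row in grid]


# Toy 4x4 "feature map" whose value at (x, y) is x + 10 * y.
image = [[x + 10 * y for x in range(4)] for y in range(4)]
print(roi_warp(image, (1, 1, 2, 2), 3, 3))
```

Whatever the size of the box, the output patch always has the requested fixed size, which is the point of the crop-and-warp step.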

HAM: Character-Level Attention
Takes the form of the traditional attention mechanism
Essential for end-to-end training
Semi-supervised learning for character locations
Distortion of the whole text

Experiment
Six public benchmarks:
ICDAR-2003 (IC-03)
Street View Text (SVT)
IIIT5K
Street View Text Perspective (SVT-P)
ICDAR Incidental Scene Text (IC-IST)

Experiment - Comparison with Previous Methods
Experimental Setting:
37 classes (26 case-insensitive characters + 10 digits + eos)
Training datasets: 8-million synthetic images (Jaderberg et al. 2014)
Image size: 100 x 32
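The 37-class setting works out as 26 + 10 + 1 = 37. A minimal sketch of such a label set is given below. The token name `<eos>`, the index ordering and the `encode` helper are assumptions for illustration, since the slides do not specify them.

```python
import string

EOS = "<eos>"  # hypothetical name for the end-of-sequence token

# 26 case-insensitive letters + 10 digits + eos
charset_37 = list(string.ascii_lowercase) + list(string.digits) + [EOS]
char_to_index = {c: i for i, c in enumerate(charset_37)}


def encode(word):
    """Lower-case the word and map it to class indices, ending with eos."""
    return [char_to_index[c] for c in word.lower()] + [char_to_index[EOS]]


print(len(charset_37))  # 37
print(encode("Toast"))
```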

Experiment - General Scene Text Recognition
Experimental Setting:
96 classes (26 upper-case letters + 26 lower-case letters + 10 digits + 33 punctuations + eos)
Training datasets: 12-million synthetic images (Jaderberg et al. 2014 + Gupta et al. 2016)
Image size: 100 x 100
Data augmentation
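The class count here is 26 + 26 + 10 + 33 + 1 = 96. One label set that matches these numbers is the 95 printable ASCII characters (codes 32 to 126, including the space) plus an end-of-sequence token. Whether the authors used exactly this set is not stated in the slides, so the sketch below is only a consistency check under that assumption.

```python
EOS = "<eos>"  # hypothetical name for the end-of-sequence token

# Assumption: the character classes are the printable ASCII range 32..126.
printable = [chr(code) for code in range(32, 127)]
charset_96 = printable + [EOS]

letters_upper = sum(c.isupper() for c in printable)
letters_lower = sum(c.islower() for c in printable)
digits = sum(c.isdigit() for c in printable)
symbols = sum(not c.isalnum() for c in printable)  # punctuation and space

print(letters_upper, letters_lower, digits, symbols, len(charset_96))
```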

Experiment - Qualitative Results
[Figure: character-level attention for three example images, with predictions "toast", "beyond" and "wishing".]

Conclusion
A simple but efficient network for distorted text recognition
End-to-end trainable framework
Hierarchical attention mechanism
Character-level encoder