Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong

Presentation transcript:

Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition. Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong, Department of Computer Science, The University of Hong Kong. Email: wliu@cs.hku.hk. AAAI-18.

Motivation of Char-Net *The original images are from Google Street View. Wei Liu, Chaofeng Chen and Kwan-Yee K. Wong. “Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition”. AAAI-18.

Design of Char-Net. Distorted Text ([Shi et al., CVPR’2016], [Liu et al., BMVC’2016]). Diagram labels: Text Image; Spatial Transformer; Encoder (Convolutional Neural Network, Bi-Directional LSTMs); convolutional feature map; CTC-based Decoder or Attention-based Decoder.

Design of Char-Net. Spatial Transformer: TPS Transformation (Global and Complicated Transformation) versus Rotation (Local and Simple Transformation).

Architecture of Char-Net. Diagram labels: Input Image; Word-Level Encoder; Character-Level Encoder; hyper-connection (labelled twice).

Hierarchical Attention Mechanism (HAM). Diagram labels: Input Image; Traditional Attention.

HAM: Recurrent RoIWarp Layer. Components: Recurrent Localization Network, Grid Generator, Bilinear Sampler. Diagram labels: Input Image; Traditional Attention. Formula headings: Traditional Attention Mechanism; Character Location and Size [formulas not preserved].
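The formula under "Traditional Attention Mechanism" is not preserved here, so the following is only a minimal sketch of generic soft attention, not the authors' exact formulation. It assumes a dot-product score between a decoder state (the hypothetical `query`) and each feature vector; attention models for text recognition often use a learned additive score instead.

```python
import math

def soft_attention(features, query):
    """Return (weights, context) for dot-product soft attention.

    features: list of feature vectors, one per spatial location.
    query: decoder state vector of the same length (hypothetical name).
    """
    # Alignment score for each location: dot product with the query.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    # Softmax, subtracting the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the feature vectors.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context
```

The weights form a distribution over locations, so the context vector is a convex combination of the input features.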

HAM: Recurrent RoIWarp Layer. Crop and warp the variable-size character of interest to a fixed size. Components: Recurrent Localization Network, Grid Generator, Bilinear Sampler. 1. Grid Generator: [formula not preserved], where (u, v) is a point in […] and (u’, v’) is its corresponding sampling point in […]. 2. Bilinear Sampler.
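The crop-and-warp step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' formula, which is missing from the transcript: the character region is taken to be an axis-aligned box `(cx, cy, w, h)` in feature-map pixel coordinates, and the names `make_grid`, `bilinear_sample` and `roi_warp` are hypothetical.

```python
import math

def make_grid(box, out_h, out_w):
    """Grid generator: map each output point (u, v) to a sampling point (u', v').

    box = (cx, cy, w, h) is an assumed character box in feature-map pixels.
    """
    cx, cy, w, h = box
    grid = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            # Normalise (u, v) to (-0.5, 0.5), then scale and shift into the box.
            nu = (u + 0.5) / out_w - 0.5
            nv = (v + 0.5) / out_h - 0.5
            row.append((cx + nu * w, cy + nv * h))
        grid.append(row)
    return grid

def bilinear_sample(fmap, x, y):
    """Bilinear sampler: interpolate fmap (a list of rows) at real-valued (x, y)."""
    height, width = len(fmap), len(fmap[0])
    # Clamp to the valid range so border samples stay defined.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * fmap[y0][x0] + ax * fmap[y0][x1]
    bottom = (1 - ax) * fmap[y1][x0] + ax * fmap[y1][x1]
    return (1 - ay) * top + ay * bottom

def roi_warp(fmap, box, out_h, out_w):
    """Crop the box from fmap and warp it to a fixed out_h x out_w patch."""
    grid = make_grid(box, out_h, out_w)
    return [[bilinear_sample(fmap, x, y) for (x, y) in row] for row in grid]
```

Because the output size is fixed regardless of the box size, characters of different sizes all yield patches of the same shape, and the bilinear interpolation keeps the operation differentiable with respect to the box parameters.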

HAM: Character-Level Attention. Takes the form of the traditional attention mechanism; essential for end-to-end training; semi-supervised learning for character locations. Diagram labels: Input Image; Distortion of the whole text.

Experiment. Six public benchmarks: ICDAR-2003 (IC-03), Street View Text (SVT), IIIT5K, Street View Text Perspective (SVT-P), ICDAR Incidental Scene Text (IC-IST).

Experiment - Comparison with Previous Methods. Experimental Setting: 37 classes (26 case-insensitive characters + 10 digits + eos). Training datasets: 8-million synthetic images (Jaderberg et al. 2014). Image size: 100 x 32.
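The 37-class setting can be made concrete with a small label-encoding sketch. The token name `<eos>` and the ordering of the classes are assumptions made for illustration; the slides give only the counts.

```python
import string

EOS = "<eos>"  # hypothetical name for the end-of-sequence token

# 26 case-insensitive letters + 10 digits + eos = 37 classes.
CLASSES = list(string.ascii_lowercase) + list(string.digits) + [EOS]
CHAR_TO_ID = {c: i for i, c in enumerate(CLASSES)}

def encode(word):
    """Lower-case the word, map each character to a class id, append eos."""
    return [CHAR_TO_ID[c] for c in word.lower()] + [CHAR_TO_ID[EOS]]

def decode(ids):
    """Map class ids back to text, stopping at the first eos."""
    chars = []
    for i in ids:
        if CLASSES[i] == EOS:
            break
        chars.append(CLASSES[i])
    return "".join(chars)
```

Lower-casing before lookup is what makes the letters case-insensitive, so "Toast" and "toast" share one label sequence.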

Experiment - General Scene Text Recognition. Experimental Setting: 96 classes (26 upper-case letters + 26 lower-case letters + 10 digits + 33 punctuation marks + eos). Training datasets: 12-million synthetic images (Jaderberg et al. 2014 + Gupta et al. 2016). Image size: 100 x 100. Data augmentation.
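One way to arrive at 96 classes is sketched below. It assumes the 33 punctuation marks are the non-alphanumeric printable ASCII characters, including the space; the slides do not state which symbols are used, so this is a guess that happens to match the count.

```python
import string

EOS = "<eos>"  # hypothetical name for the end-of-sequence token

# Assumption: the character set is printable ASCII, space (0x20) through '~' (0x7E).
printable = [chr(code) for code in range(0x20, 0x7F)]

upper = [c for c in printable if c in string.ascii_uppercase]
lower = [c for c in printable if c in string.ascii_lowercase]
digits = [c for c in printable if c in string.digits]
# Everything else: the 32 characters in string.punctuation plus the space.
others = [c for c in printable if not c.isalnum()]

classes = upper + lower + digits + others + [EOS]
```

Under this assumption the four groups contain 26, 26, 10 and 33 characters, which together with eos gives the 96 classes quoted above.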

Experiment - Qualitative Results. Character-Level Attention examples, with predictions: toast, beyond, wishing.

Conclusion. A simple but efficient network for distorted text recognition: end-to-end trainable framework; hierarchical attention mechanism; character-level encoder.