DeepFont: Large-Scale Real-World Font Recognition from Images


DeepFont: Large-Scale Real-World Font Recognition from Images Zhangyang (Atlas) Wang Joint work with Jianchao Yang, Hailin Jin, Jon Brandt, Eli Shechtman, Aseem Agarwala, and Thomas Huang

Problem Definition
Ever seen a font in use and wanted to identify what it is?

Problem Definition
Font recognition: automatically recognize the font style (typeface, slope, weight, etc.) from real-world photos.
Why does it matter? It is a highly desirable feature for designers:
- Design library collection
- Design inspiration
- Text editing

Challenges
- An extremely large-scale recognition problem: over 100,000 fonts are claimed in the myfonts.com collection.
- Beyond object recognition: recognizing subtle design styles.
- Real-world training data is extremely difficult to collect, so we have to rely on synthetic training data.
- BIG mismatch between synthetic training and real-world testing.

Solution
Deep convolutional neural network?
- Effective at large-scale recognition
- Effective at fine-grained recognition
- Data-driven
Problem: huge mismatch between synthetic training and real-world testing.
- Data augmentation
- Decomposition-based deep CNN for domain adaptation

The AdobeVFR Dataset
Synthetic training set:
- 2,383 fonts from the Adobe Type Library (later extended to 4,052 classes)
- 1,000 synthetic English word images per font: ~2.4M training images
Real-world testing set:
- 4,383 labeled real-world images, covering 671 of the 2,383 fonts
The first large-scale benchmark for the task of visual font recognition, consisting of both synthetic and real-world text images. Also useful for fine-grained classification, domain adaptation, and understanding design styles.
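For intuition, rendering such synthetic word images could look like the minimal Pillow sketch below. The font path, word list, and layout parameters are illustrative assumptions, not the actual AdobeVFR rendering pipeline.

```python
# Minimal sketch of synthetic training-image rendering (assumption:
# font files and vocabulary are placeholders, not the AdobeVFR setup).
import random
from PIL import Image, ImageDraw, ImageFont

WORDS = ["design", "font", "style", "recognition"]  # placeholder vocabulary

def render_word_image(font_path, size=(300, 105), font_size=60):
    """Render one grayscale word image for a given font file."""
    img = Image.new("L", size, color=255)          # white background
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, font_size)
    word = random.choice(WORDS)
    draw.text((10, 15), word, fill=0, font=font)   # black text
    return img

# Usage: one image per call; repeat ~1000x per font class.
# img = render_word_image("fonts/SomeTypeface.otf")
```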

Deep Convolutional Neural Network
Start from a standard benchmark CNN structure?
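Purely as an illustration, a benchmark-style (AlexNet-like) stack for 105x105 grayscale crops might look like the following; the slides do not specify the exact configuration, so all layer sizes here are assumptions.

```python
# Illustrative AlexNet-style classifier for 105x105 grayscale crops
# (layer sizes are placeholder assumptions, not the paper's exact net).
import torch.nn as nn

deepfont_like = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=11, stride=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(256 * 12 * 12, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 2383),          # one logit per font class
)
```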

Domain Mismatch
Direct training on synthetic data and testing on real-world data (Top-5 accuracy):

            Synthetic   Real-World
  Training  99.16%      N/A
  Testing   98.97%      49.24%

We need domain adaptation to minimize the gap between synthetic training and real-world testing!

Data Augmentation
Common degradations:
- Noise, blur, warping, shading, compression artifacts, etc.
Special degradations:
- Aspect ratio squeezing: squeeze the image horizontally by a random ratio in [1.5, 3.5].
- Random character spacing: render training text images with random character spacing.
Inputs to the network: random 105x105 crops. (A sketch of this pipeline follows below.)
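A minimal sketch of this augmentation, assuming a 105-pixel-tall grayscale input like the rendered images above; the squeeze range and crop size follow the slides, while the blur and noise parameters are illustrative assumptions.

```python
# Sketch of the degradations described above (NumPy/Pillow; blur and
# noise strengths are assumptions, squeeze/crop values follow the slides).
import random
import numpy as np
from PIL import Image, ImageFilter

def augment(img: Image.Image) -> np.ndarray:
    # Common degradations (subset): Gaussian blur and additive noise.
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 2.5)))
    arr = np.asarray(img, dtype=np.float32)
    arr = np.clip(arr + np.random.normal(0, 3, arr.shape), 0, 255)

    # Special degradation: squeeze horizontally by a ratio in [1.5, 3.5].
    ratio = random.uniform(1.5, 3.5)
    h, w = arr.shape
    img = Image.fromarray(arr.astype(np.uint8)).resize((max(1, int(w / ratio)), h))

    # Network input: a random 105x105 crop (pad with white if too narrow).
    arr = np.asarray(img)
    h, w = arr.shape
    if w < 105:
        arr = np.pad(arr, ((0, 0), (0, 105 - w)), constant_values=255)
        w = 105
    x = random.randint(0, w - 105)
    return arr[:105, x:x + 105]
```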

Effects of Data Augmentation
- Synthetic 1-4: common degradations
- Synthetic 5-6: special degradations
- Synthetic 1-6: all degradations
[Figure: recognition accuracy under each setting; on the right, the MMD (Maximum Mean Discrepancy) between the network's responses to synthetic and real-world data.]
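MMD measures the distance between two sample distributions via kernel mean embeddings. A minimal sketch of a (biased) estimator with a Gaussian RBF kernel, one common choice; the bandwidth is an assumption, not a value from the paper:

```python
# Biased MMD^2 estimate between feature samples X (n,d) and Y (m,d);
# a way to quantify the synthetic/real response gap shown in the figure.
import numpy as np

def mmd_rbf(X, Y, sigma=1.0):
    def k(A, B):
        # Pairwise squared distances -> Gaussian kernel matrix.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```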

Beyond Data Augmentation
Problems:
- We cannot enumerate all possible degradations, e.g., backgrounds and font decorations.
- Augmentation may introduce degradation bias into training.
Can we instead design the learning algorithm to be robust to domain mismatch?
- The mismatch already appears in the low-level features.
- Tons of unlabeled real-world data are available.

Network Decomposition for Domain Adaptation
- Unsupervised cross-domain sub-network Cu (N layers)
- Supervised domain-specific sub-network Cs (7-N layers)

Network Decomposition for Domain Adaptation
1. Train sub-network Cu in an unsupervised way using stacked convolutional auto-encoders, on both synthetic data and unlabeled real-world data.
2. Fix sub-network Cu, then train sub-network Cs in a supervised way on the labeled synthetic data.
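A conceptual sketch of this two-stage procedure in PyTorch (the original work predates PyTorch). Layer shapes and optimizers are placeholder assumptions, and the autoencoder is trained jointly here for brevity, whereas the slides describe stacked (layer-wise) auto-encoders; only the Cu/Cs split mirrors the method.

```python
# Two-stage training sketch: unsupervised Cu, then supervised Cs.
import torch
import torch.nn as nn

Cu = nn.Sequential(                      # cross-domain sub-network (N=2 conv layers)
    nn.Conv2d(1, 64, 5), nn.ReLU(),
    nn.Conv2d(64, 128, 3), nn.ReLU(),
)
decoder = nn.Sequential(                 # mirror of Cu for autoencoder pretraining
    nn.ConvTranspose2d(128, 64, 3), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, 5),
)
Cs = nn.Sequential(                      # domain-specific sub-network + classifier
    nn.Conv2d(128, 256, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(256, 2383),                # one logit per font class
)

def stage1_step(x_mixed, opt):
    """Unsupervised: reconstruct synthetic + unlabeled real images."""
    loss = nn.functional.mse_loss(decoder(Cu(x_mixed)), x_mixed)
    opt.zero_grad(); loss.backward(); opt.step()

def stage2_step(x_syn, y_syn, opt):
    """Supervised: Cu frozen, train Cs on labeled synthetic data."""
    with torch.no_grad():
        feats = Cu(x_syn)
    loss = nn.functional.cross_entropy(Cs(feats), y_syn)
    opt.zero_grad(); loss.backward(); opt.step()

# opt1 = torch.optim.Adam(list(Cu.parameters()) + list(decoder.parameters()))
# opt2 = torch.optim.Adam(Cs.parameters())   # Cu stays fixed in stage 2
```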

Quantitative Evaluation
4,383 real-world test images collected from font forums.

  Model     Augmentation?  Decomposition?  Top-1    Top-5
  LFE       Y              N/A             42.56%   60.31%
  DeepFont  N              N               42.49%   49.24%
  DeepFont  Y              N               66.70%   79.22%
  DeepFont  Y              Y               71.42%   81.79%

Varying the layer number K of the unsupervised sub-network Cu:

  K         0        1        2        3        4        5
  Training  91.54%   90.12%   88.77%   87.46%   84.79%   82.12%
  Testing   79.28%   79.69%   81.79%   81.04%   77.48%   74.03%

Successful Examples

Failure Examples

Model Compression
- For a typical CNN, about 90% of the storage is taken up by the densely connected (fully connected) layers.
- Matrix factorization methods can compress the parameters of linear models by exploiting the nearly low-rank property of the parameter matrices.
[Figure: eigenvalue spectrum of the fc6 layer weight matrix in DeepFont; this densely connected layer alone takes up 85% of the total model size.]

Model Compression
- During training, we add a low-rank constraint (rank < k) on the fc6 layer.
- In practice, we apply very aggressive compression to all fc layers and obtain a mini-model of ~40 MB in storage, with a compression ratio > 18 and a top-5 performance loss of only ~3%.
Take-home points:
- FC layers can be highly redundant; compressing them aggressively MIGHT work well.
- Joint training-compression performs notably better than the two-stage approach. (A sketch of the factorization itself follows below.)
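For illustration, the factorization step could look like the truncated-SVD sketch below. This shows post-hoc factorization only; as noted above, the joint scheme additionally keeps the low-rank constraint in place during training.

```python
# Low-rank compression of a fully connected layer via truncated SVD.
import numpy as np

def compress_fc(W, k):
    """Factor an (out, in) weight matrix as W ~= U_k @ V_k with rank k."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k] * s[:k]          # (out, k)
    V_k = Vt[:k, :]                 # (k, in)
    return U_k, V_k                 # storage: k*(out+in) vs out*in

# e.g., a hypothetical 4096x4096 fc layer at k=128 stores ~16x fewer
# parameters: 128*(4096+4096) = 1.05M vs 4096*4096 = 16.8M.
```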

In Adobe Product: Recognize Fonts from Images

In Adobe Product: Photoshop Prototype

Text Editing Inside Photoshop

In Adobe Product: Discover Similarity between Fonts
Font inspiration, browsing, and organization

Thank you!
For more information:
- The full paper will be made available soon.
- The AdobeVFR Dataset will be released soon.