Tofik Ali, Partha Pratim Roy. Department of Computer Science and Engineering, Indian Institute of Technology Roorkee. CVIP-WM 2017, Paper ID 172. Word Spotting.

Presentation transcript:

Tofik Ali, Partha Pratim Roy. Department of Computer Science and Engineering, Indian Institute of Technology Roorkee. CVIP-WM 2017, Paper ID 172. Word Spotting based on Pyramidal Histogram of Characters Code for Handwritten Text Documents

Word Spotting: retrieval of words similar to a query word.
The query word takes one of two forms:
1. Query by example
2. Query by string
PHOC code: an encoded representation of a word.
Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(12), 2552–2566 (2014)

Phase 1: Word segmentation from handwritten text documents.
Phase 2: Transformation of word image/string into PHOC code.
Phase 3: Retrieval of words similar to the query word.

Phase 1: Word segmentation from handwritten text documents. Segmentation CNN (layer: operation, spatial size):
C1_1, C1_2: Conv 5x5 (32), 512x512
MP1: Max pool (2,2), 512x512 to 256x256
C2_1, C2_2: Conv 3x3 (64), 256x256
MP2: Max pool (2,2), 256x256 to 128x128
C3_1, C3_2: Conv 3x3 (128), 128x128
MP3: Max pool (2,2), 128x128 to 64x64
C4_1, C4_2: Conv 3x3 (128), 64x64
UP1: Up-sample (2,2), 64x64 to 128x128
C5_1, C5_2: Conv 3x3 (64), 128x128
UP2: Up-sample (2,2), 128x128 to 256x256
C6_1, C6_2: Conv 3x3 (32), 256x256
UP3: Up-sample (2,2), 256x256 to 512x512
C7_1: Conv 3x3 (1), 512x512
Merge layers: at 128x128, 256x256 and 512x512.
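The spatial sizes listed for the segmentation CNN can be checked with a short trace. This is only a sketch: it assumes the convolutions are padded so that they keep the spatial size (the slide shows 512x512 both before and after C1_1 and C1_2), and the names `trace_sizes` and `SEGMENTATION_OPS` are made up for illustration.

```python
def trace_sizes(size, ops):
    """Return the side length of the feature map after each operation.

    Assumption: 'conv' keeps the size (same padding), 'pool' halves it,
    'up' doubles it. Only square inputs are considered.
    """
    sizes = [size]
    for op in ops:
        if op == "pool":
            size //= 2
        elif op == "up":
            size *= 2
        elif op != "conv":
            raise ValueError(f"unknown operation: {op}")
        sizes.append(size)
    return sizes


# Operation order read off the slide: three conv/conv/pool stages,
# a bottleneck, then three up-sample stages and a final 1-channel conv.
SEGMENTATION_OPS = (
    ["conv", "conv", "pool"] * 3
    + ["conv", "conv"]
    + ["up", "conv", "conv"] * 2
    + ["up", "conv"]
)

if __name__ == "__main__":
    print(trace_sizes(512, SEGMENTATION_OPS))
```

The output map has the same 512x512 size as the input, which is what allows a per-pixel word/background prediction.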

Phase 1: Results of word segmentation. Figure panels: Original Document Image, Ground Truth, Output of Segmentation CNN, Final Bounding Box.

Phase 2: Transformation of word image/string into PHOC code. How is the PHOC code generated? Example word: "place". Figure panels: Level 1 PHOC code, Level 2 PHOC code, Level 3 PHOC code, 3-level PHOC code.
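A minimal sketch of how a 3-level PHOC code can be built from a string, following the region-overlap rule of Almazán et al.: at level L the word is split into L equal regions, and a character is assigned to a region when at least half of the character's extent falls inside it. The alphabet (lowercase a to z only) and the function name `phoc` are assumptions made for this example; the alphabet and code length used in the paper may differ.

```python
import string


def phoc(word, levels=(1, 2, 3), alphabet=string.ascii_lowercase):
    """Binary PHOC vector: one block of len(alphabet) bits per region."""
    word = word.lower()
    n = len(word)
    code = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1/(n*level) to avoid float rounding:
            # the region spans [region*n, (region+1)*n] and character k
            # spans [k*level, (k+1)*level].
            r_start, r_end = region * n, (region + 1) * n
            for k, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # characters outside the alphabet are skipped
                c_start, c_end = k * level, (k + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Assign when the overlap covers at least half the character.
                if 2 * overlap >= level:
                    bits[alphabet.index(ch)] = 1
            code.extend(bits)
    return code


if __name__ == "__main__":
    code = phoc("place")
    print(len(code), sum(code))
```

For "place" the middle character "a" straddles the boundary of the two level-2 regions, so it is set in both halves.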

Phase 2: PHOCNet, an existing deep CNN-based architecture. Sudholt, S., Fink, G.A.: PHOCNet: A deep convolutional neural network for word spotting in handwritten documents. In: 15th International Conference on Frontiers in Handwriting Recognition. pp. 277–282 (2016)

Phase 2: PHOCNet, receptive field at the last layer and minimum input size (spatial pyramidal max pooling layer).
Receptive field, layer by layer: 3x3, 5x5, 9x9, 13x13, 21x21, 29x29, 37x37, 45x45, 53x53, 61x61, 69x69, 77x77, 85x85.
Minimum input size, step by step: 3x3, 5x5, 7x7, 9x9, 11x11, 13x13, 15x15, 17x17, 19x19, 21x21, 42x42, 44x44, 46x46, 92x92, 94x94, 96x96.
Result: 85x85 receptive field, minimum input size 96x96.
Sudholt, S., Fink, G.A.: PHOCNet: A deep convolutional neural network for word spotting in handwritten documents. In: 15th International Conference on Frontiers in Handwriting Recognition. pp. 277–282 (2016)
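The receptive-field sequence shown for PHOCNet (3x3, 5x5, 9x9, 13x13, 21x21, ... 85x85) can be reproduced with a few lines. This is a sketch under two assumptions that are inferred from the slide numbers rather than stated on them: the network has thirteen 3x3 convolutions with a 2x2 max pool after the second and the fourth, and pooling is counted only as doubling the step between neighbouring units (its own 2x2 window is not added to the receptive field). The names `receptive_fields`, `PHOCNET` and `MODIFIED` are illustrative.

```python
def receptive_fields(layers):
    """Receptive field side length after every convolution.

    layers: sequence of ("conv", kernel_size) or ("pool", stride).
    Convention assumed here: a pooling layer only multiplies the spacing
    ("jump") between units; its window is not added to the field.
    """
    rf, jump = 1, 1
    fields = []
    for kind, size in layers:
        if kind == "conv":
            rf += (size - 1) * jump
            fields.append(rf)
        elif kind == "pool":
            jump *= size
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return fields


conv, pool = ("conv", 3), ("pool", 2)
# Layouts assumed from the slides: 2 + 2 + 9 convolutions for PHOCNet,
# 2 + 2 + 4 convolutions for the modified network.
PHOCNET = [conv] * 2 + [pool] + [conv] * 2 + [pool] + [conv] * 9
MODIFIED = [conv] * 2 + [pool] + [conv] * 2 + [pool] + [conv] * 4

if __name__ == "__main__":
    print(receptive_fields(PHOCNET))
    print(receptive_fields(MODIFIED))
```

Under the same convention, dropping five convolutions from the last stage gives the 45x45 receptive field quoted for the modified network.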

Phase 2: Modified PHOCNet. It requires less memory and time to train and test. Minimum size 48x48, 45x45 receptive field.
Changes:
Fixed the input word image height at 48 pixels
Removed 5 convolution layers to suit the image height
Introduced an overlapped spatial pyramidal max pooling layer
Architecture (layer: operation):
C1_1, C1_2: Conv 3x3 (64)
MP1: Max pool (2,2)
C2_1, C2_2: Conv 3x3 (128)
MP2: Max pool (2,2)
C3_1, C3_2: Conv 3x3 (256)
C4_1, C4_2: Conv 3x3 (512)
SPP1_1: SPN 1 x [1,2,4,7,11] (512 x 25 = 12800)
FCN_1: FCN (4096)
FCN_2: FCN (1024)
FCN_3: FCN (674)
Output: PHOC code ( )
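The 1 x [1,2,4,7,11] pooling layer turns a feature map of any width into a fixed-length vector: 1 + 2 + 4 + 7 + 11 = 25 bins per channel, so 512 channels give 12800 values. The sketch below illustrates the idea in plain Python. The slide does not say how the bins overlap, so the floor/ceil bin edges used here (adjacent bins share a column when the width is not divisible by the number of bins) are an assumption, as is the function name `pyramid_max_pool`.

```python
def pyramid_max_pool(fmap, levels=(1, 2, 4, 7, 11)):
    """Max-pool each channel over 1 x n horizontal bins for every n in levels.

    fmap: list of channels, each a list of rows (H x W) of numbers.
    Returns a flat list with sum(levels) values per channel.
    """
    pooled = []
    for channel in fmap:
        width = len(channel[0])
        for n in levels:
            for i in range(n):
                # Assumed bin edges: floor for the start, ceil for the end,
                # so neighbouring bins may overlap by one column.
                start = (i * width) // n
                end = ((i + 1) * width + n - 1) // n
                pooled.append(max(v for row in channel for v in row[start:end]))
    return pooled


if __name__ == "__main__":
    # Two channels, one row each, 11 columns wide.
    fmap = [[list(range(11))], [list(range(11, 0, -1))]]
    vec = pyramid_max_pool(fmap)
    print(len(vec))  # 25 values per channel
```

Because the output length depends only on the number of channels and the pyramid levels, word images of different widths can feed the same fully connected layers.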

Example "Can you examine". Figure panels: Document Image, Text, Segmented Words, Retrieved Words.

System overview (block diagram): Handwritten Documents; Segmentation using CNN; PHOC Code Generation; Query Word; Minimum Edit Distance Calculation; Information Retrieval as Word Spotting; PHOC Codes Repository with Indexing.
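The retrieval step can be illustrated with a minimum edit distance (Levenshtein) ranking. The slide does not state which sequences the distance is computed over, so this sketch simply works on any two sequences and is demonstrated on plain strings; the function names `edit_distance` and `spot` and the sample repository are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def spot(query, repository, top_k=3):
    """Return the top_k repository entries closest to the query."""
    return sorted(repository, key=lambda w: edit_distance(query, w))[:top_k]


if __name__ == "__main__":
    # Hypothetical repository of transcribed words.
    print(spot("place", ["plane", "peace", "apple", "place"], top_k=2))
```

Ranking by distance rather than requiring an exact match lets near-misses from the recognition stage still be retrieved.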