IIIT Hyderabad
Document Image Retrieval using Bag of Visual Words Model
Ravi Shekhar, CVIT, IIIT Hyderabad
Advisor: Prof. C. V. Jawahar
Motivation
- A large number of printed books are being digitized.
- Digital libraries such as the Universal Digital Library (UDL), the Digital Library of India (DLI) and Google Books hold these collections.
- Efficient and effective methods are needed for content-level access to these collections.
Process Overview
[Figure: retrieval pipeline with the stages Documents, Scanning, Processing, Index Database, Input Query, Matching and Retrieved Documents]
- Matching can be done at two levels: text and image.
Matching Approaches
- Recognition-based approach (text-level matching): Optical Character Recognition (OCR)
- Recognition-free approach (image-level matching): word spotting
Recognition-Based Approach
Optical Character Recognition (OCR):
- Binarization of the document image
- Segmentation using connected components at the line, word and character levels
- Character recognition using features such as patch and profile features
- Classification using an ANN or an SVM
Limitations of the Recognition-Based Approach
- Cuts (a character broken into pieces)
- Merges (adjacent characters touching each other)
- Variation in script
- Variation in font and typesetting
- Underlined and overwritten text
Recognition-Free Approach
Word spotting:
- Represent each word image using global (profile) features.
- Match features using distance measures such as L1 or L2.
- Compare word images of different sizes using Dynamic Time Warping (DTW).
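A minimal sketch of the DTW comparison described above, assuming each word image has already been reduced to a non-empty sequence of per-column profile feature vectors (feature extraction is not shown) and using the L1 distance as the local cost. The function names and the length normalization are illustrative choices, not the method of any specific word-spotting system.

```python
from typing import List, Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two column feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(seq_a: List[Sequence[float]], seq_b: List[Sequence[float]]) -> float:
    """Align two non-empty column-feature sequences of possibly different lengths.

    Returns the accumulated L1 cost of the best warping path divided by
    len(seq_a) + len(seq_b), so that long and short words are comparable.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: best accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = l1(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in seq_a only
                                 cost[i][j - 1],      # advance in seq_b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m] / (n + m)


# A word and a horizontally stretched copy of it align with zero cost.
print(dtw_distance([[0.0], [1.0], [2.0]],
                   [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]))  # 0.0
```

The quadratic table is what makes DTW-based word spotting slow on large collections, which is one of the motivations for the fixed-size BoVW representation.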
Why a Recognition-Free Approach?
- Robust OCRs are unavailable for many non-Latin languages.
- These languages have a rich heritage, and there is a need for content-level search.
- Word-spotting-based methods are too slow for a real-time system.
- Most existing retrieval methods are memory intensive.
- Scalability is an immediate challenge.
Word Image Retrieval using Bag of Visual Words
Bag of Visual Words (BoVW)
- The Bag of Words (BoW) representation is the most popular representation for text retrieval.
- Efficient BoW-based systems such as Lucene are publicly available.
- Bag of Visual Words (BoVW) performs very well for image and video retrieval.
- BoVW-based systems are flexible, powerful and scalable to billions of images.
BoVW Representation
- Word images are represented using a histogram of visual words.
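The histogram representation can be sketched as follows, assuming a codebook of visual-word centres is already available and that each local descriptor is hard-assigned to its nearest centre by squared L2 distance. The toy 2-D codebook stands in for real descriptors such as SIFT, and the L1 normalization is an assumption, not a detail taken from the slides.

```python
from typing import List, Sequence


def nearest_word(desc: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the visual word (codebook centre) closest to desc, squared L2."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))


def bovw_histogram(descriptors: List[Sequence[float]],
                   codebook: List[Sequence[float]],
                   normalize: bool = True) -> List[float]:
    """Count how often each visual word occurs in one word image."""
    hist = [0.0] * len(codebook)  # length = vocabulary size, whatever the image size
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist)
    if normalize and total > 0:
        hist = [h / total for h in hist]  # L1 normalisation (assumption)
    return hist


codebook = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 9.0], [0.0, 1.0], [1.0, 9.0]]
print(bovw_histogram(descriptors, codebook))  # [0.5, 0.25, 0.25]
```

Whatever the number of descriptors, the output length equals the vocabulary size, which is what gives BoVW its fixed-size representation.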
BoVW Representation: Codebook Generation
- A subset of the images is used.
- Clustering is done using Hierarchical K-Means (HKM).
- HKM is faster than flat K-Means both in building the codebook and in finding nearest neighbours.
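A minimal sketch of hierarchical k-means, assuming a fixed branch factor and depth, plain Lloyd iterations with a deterministic farthest-point initialisation (chosen here only so the example is reproducible), and greedy descent for quantization. All names and parameters are hypothetical. Descending the tree costs about branch × depth distance computations per descriptor instead of branch^depth for a flat vocabulary of the same size, which is the source of the speed-up; the leaf found this way is an approximate, not exact, nearest neighbour.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def sqdist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def kmeans(points: List[Vector], k: int, iters: int = 10) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centres))
        centres.append(list(far))
    assign: List[int] = []
    for _ in range(iters + 1):
        assign = [min(range(k), key=lambda c: sqdist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = mean(members)
    return centres, assign


class Node:
    def __init__(self, centre: Vector) -> None:
        self.centre = centre
        self.children: List["Node"] = []
        self.word_id: Optional[int] = None  # set only on leaves (visual words)


def build_hkm(points: List[Vector], branch: int, depth: int) -> Tuple[Node, int]:
    """Recursively split the data into `branch` clusters per level, `depth` levels."""
    root = Node(mean(points))
    n_words = 0

    def split(node: Node, pts: List[Vector], level: int) -> None:
        nonlocal n_words
        if level == depth or len(pts) < branch:
            node.word_id = n_words  # each leaf is one visual word
            n_words += 1
            return
        centres, assign = kmeans(pts, branch)
        for c in range(branch):
            members = [p for p, a in zip(pts, assign) if a == c]
            if members:
                child = Node(centres[c])
                node.children.append(child)
                split(child, members, level + 1)

    split(root, points, 0)
    return root, n_words


def quantize(root: Node, desc: Vector) -> int:
    """Greedy descent: pick the closest child at every level."""
    node = root
    while node.children:
        node = min(node.children, key=lambda ch: sqdist(desc, ch.centre))
    assert node.word_id is not None
    return node.word_id


points = [[0.0], [1.0], [10.0], [11.0], [100.0], [101.0], [110.0], [111.0]]
root, n_words = build_hkm(points, branch=2, depth=2)
print(n_words, quantize(root, [2.0]), quantize(root, [108.0]))  # 4 0 3
```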
BoVW-based Representation
[Figures: word images and their histograms of visual words, including degraded images with cuts and with merges]
Proposed Architecture
[Figure: proposed architecture]
Advantages of the BoVW-based Representation
- Fixed-size representation
- Robust against degradation [Figure: histograms for clean, cut and merged word images]
- Scalable to billions of images
- Language independent
Lost Geometry: Spatial Verification
- The histogram of visual words discards the spatial layout (geometry) of the visual words; spatial verification is used to take it into account.
[Figures: spatial verification examples on a clean word image]
Re-ranking
- SIFT-based re-ranking: the higher the total score, the better the match.
Experiments
[Table: books used in the experiments; columns Language, #Books, #Pages, #Words; rows Hindi, Malayalam, Telugu, Bangla, Hindi]
Quantitative Results: Performance Statistics
[Table: columns Language, #Images, #Queries, mAP after re-ranking, mAP after spatial verification; rows Hindi, Malayalam, Telugu, Bangla, Hindi]
Quantitative Results: mAP vs. Query Length
[Figure: mAP plotted against query length]
- The more characters in the query, the better the results.
Quantitative Results: Retrieval Time and Index Size
#Images   Retrieval Time   Index Size
25K       50 ms            28 MB
100K      209 ms           130 MB
0.5M      411 ms           550 MB
1M        700 ms           1.2 GB
Qualitative Results
[Figures: example queries and their retrieved results]
Qualitative Results: Noisy Images
[Figure: sample queries and retrieved results for noisy images on which a commercial OCR fails]
Enhancement over Bag of Visual Words based Word Image Retrieval
Query Expansion
- Observation: the top-ranked results are mostly correct.
- The top-k results are used to form a new query, which improves the precision of the retrieved list.
- Modified average query expansion: instead of giving equal weight to each of the top-k results, a rank-based weight of 1/2^rank is given.
- Improves mAP by 2%.
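A minimal sketch of the modified average query expansion, reading the rank-based weight as 1/2^rank. Two further choices are assumptions, not details from the slides: the original query histogram is treated as rank 0 (weight 1), and the weighted sum is divided by the total weight so the result is a weighted average. The function name and the default k are hypothetical.

```python
from typing import List


def expand_query(query_hist: List[float],
                 ranked_hists: List[List[float]],
                 k: int = 5) -> List[float]:
    """Rank-weighted average of the query histogram and its top-k results.

    Assumptions: the result at rank r (1-based) gets weight 1 / 2**r, the
    original query is treated as rank 0 (weight 1), and the sum is divided
    by the total weight.
    """
    expanded = list(query_hist)
    total_weight = 1.0
    for rank, hist in enumerate(ranked_hists[:k], start=1):
        w = 1.0 / 2 ** rank
        expanded = [e + w * h for e, h in zip(expanded, hist)]
        total_weight += w
    return [e / total_weight for e in expanded]


# Weights 1, 1/2, 1/4 for the query and the first two results.
print(expand_query([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]], k=2))  # ≈ [0.571, 0.429]
```

Compared with equal weights, the geometric decay keeps a wrong result at a low rank from pulling the expanded histogram too far from the original query.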
Query Expansion
[Figure: the query image's histogram is used to query the index, returning results at ranks 1 to 6; these results are combined with the query histogram into a refined, expanded query histogram, which queries the index again and turns the previous results into modified results]
Text Query Support
- The system was originally formulated in a query-by-example setting, but users would prefer a textual interface to a document image collection.
- We propose a novel and simple framework for text query support:
  - A small subset of the data with ground truth, covering all possible characters of a particular language, is used.
  - Visual words are learnt for each character and averaged across its different variations.
  - Given a textual query, its BoVW histogram is synthesized.
- Text query results are comparable to word image query results.
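A minimal sketch of the text query idea under stated assumptions: each annotated character sample is already available as a visual-word histogram, a character model is the average of its samples' histograms, and a query histogram is synthesized by adding the models of the query's characters. The summation rule and all names are hypothetical. Iterating a Python string yields Unicode code points; for Indic scripts one written character can span several code points, so a real system would need proper character units instead.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def learn_char_models(samples: List[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """Average the visual-word histograms of all annotated samples of each character."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for char, hist in samples:
        if char not in sums:
            sums[char] = [0.0] * len(hist)
        sums[char] = [s + h for s, h in zip(sums[char], hist)]
        counts[char] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def synthesize_query(text: str, models: Dict[str, List[float]], vocab_size: int) -> List[float]:
    """Build a BoVW histogram for a text query by adding its characters' average histograms."""
    hist = [0.0] * vocab_size
    for char in text:
        if char not in models:
            raise KeyError(f"no visual-word model for character {char!r}")
        hist = [h + m for h, m in zip(hist, models[char])]
    return hist


models = learn_char_models([("a", [2.0, 0.0, 0.0]),
                            ("a", [0.0, 0.0, 0.0]),
                            ("b", [0.0, 1.0, 1.0])])
print(synthesize_query("aba", models, 3))  # [2.0, 1.0, 1.0]
```

The synthesized histogram can then be sent to the same index as an image query, which is why text and image queries can share one retrieval back end.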
Text Query Support
- Query-by-example setting: input query image → histogram
- Text query support: input text query → synthesized text query histogram
Qualitative Results
[Figure: sample output for queries using different techniques]
Vector Quantization
- In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment.
[Figure: (a) input descriptor]
Problems with VQ:
- Visual word uncertainty: a single VW is chosen even when two or more VWs are possible.
- Visual word plausibility: a feature vector is mapped to a visual word even when the vocabulary has no suitable candidate.
Solution: soft assignment, i.e., map each feature vector to two or more possible VWs.
Soft Assignment
- Map each feature vector to two or more possible VWs.
Approaches to soft assignment:
- Distance based: equal weights, or weights based on the distance in feature space (e.g., a Gaussian of the distance). These do not minimize the reconstruction error.
- Learning an optimal reconstruction.
[Figure: input descriptor and neighbouring visual words]
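The distance-based variants listed above can be sketched as follows. This is a generic kernel-style soft assignment, not necessarily the exact weighting used in this work: each descriptor is spread over its r nearest visual words with either equal weights or Gaussian weights exp(-d²/2σ²), normalized to sum to one. The parameters r and sigma are hypothetical values that would need tuning.

```python
import math
from typing import List, Sequence


def soft_assign(desc: Sequence[float], codebook: List[Sequence[float]],
                r: int = 3, sigma: float = 1.0,
                equal_weight: bool = False) -> List[float]:
    """Spread one descriptor over its r nearest visual words (weights sum to 1)."""
    d2 = [sum((x - c) ** 2 for x, c in zip(desc, centre)) for centre in codebook]
    nearest = sorted(range(len(codebook)), key=lambda k: d2[k])[:r]
    if equal_weight:
        raw = {k: 1.0 for k in nearest}
    else:  # Gaussian kernel on the distance in feature space
        raw = {k: math.exp(-d2[k] / (2.0 * sigma ** 2)) for k in nearest}
    total = sum(raw.values())
    if total == 0.0:  # every kernel value underflowed: fall back to hard assignment
        raw, total = {nearest[0]: 1.0}, 1.0
    weights = [0.0] * len(codebook)
    for k, w in raw.items():
        weights[k] = w / total
    return weights


def soft_histogram(descriptors: List[Sequence[float]], codebook: List[Sequence[float]],
                   r: int = 3, sigma: float = 1.0,
                   equal_weight: bool = False) -> List[float]:
    """Accumulate the soft assignments of all descriptors of one word image."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        for k, w in enumerate(soft_assign(desc, codebook, r, sigma, equal_weight)):
            hist[k] += w
    return hist


codebook = [[0.0], [1.0], [10.0]]
print(soft_assign([0.5], codebook, r=2))  # [0.5, 0.5, 0.0]: equidistant words share it
```

A descriptor lying between two visual words now contributes to both, which addresses visual word uncertainty; the weights still ignore how well the chosen words reconstruct the descriptor.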
Locality-constrained Linear Coding (LLC)
- Similar patches should have similar codes.
- The locality of the visual words is used to describe a feature vector.
LLC coding process:
1. Find the K nearest neighbours (visual words) of x_i, denoted B.
2. Reconstruct x_i using B.
3. Replace the input x_i with the code obtained in the previous step, which is non-zero only for the entries of B.
[Figure: input descriptor]
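The three steps above can be sketched with the approximated LLC solution of Wang et al. (CVPR 2010): shift the K nearest bases to the descriptor, solve a small regularized linear system under a sum-to-one constraint, and scatter the weights into a code that is zero everywhere except at those K bases. The regularization beta × trace and the value of beta are assumptions; the pure-Python solver only keeps the example self-contained (numpy.linalg.solve would normally be used).

```python
from typing import List

Vector = List[float]


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def llc_code(x: Vector, codebook: List[Vector], knn: int = 3, beta: float = 1e-4) -> Vector:
    """Approximated LLC code of descriptor x over the codebook."""
    d2 = [sum((xi - bi) ** 2 for xi, bi in zip(x, b)) for b in codebook]
    idx = sorted(range(len(codebook)), key=lambda k: d2[k])[:knn]  # step 1: B
    z = [[bi - xi for bi, xi in zip(codebook[k], x)] for k in idx]   # bases shifted to x
    cov = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    tr = sum(cov[i][i] for i in range(len(idx)))
    for i in range(len(idx)):
        cov[i][i] += beta * tr                      # regularization (assumption)
    w = solve(cov, [1.0] * len(idx))                # step 2: reconstruct x from B
    s = sum(w)
    w = [wi / s for wi in w]                        # sum-to-one constraint
    code = [0.0] * len(codebook)                    # step 3: sparse code replaces x
    for k, wk in zip(idx, w):
        code[k] = wk
    return code


codebook = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
code = llc_code([0.5, 0.0], codebook, knn=2)
print([round(c, 6) for c in code])  # [0.5, 0.5, 0.0]
```

Unlike distance-based soft assignment, the weights here come from minimizing the reconstruction error of x over its local bases, so 0.5 × [0, 0] + 0.5 × [1, 0] reproduces the input exactly.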
Re-ranking
- SIFT-based re-ranking [1]
- Longest common subsequence (LCS) based re-ranking [2]: compute the length of the LCS of the visual words projected on the x-axis; the longer the LCS, the better the match.
[Figure: visual words V1, V2, V6, V4, V4, V8, V9 of a word image projected on the x-axis]
- Linear combination [2]:
  Final Score = λ * Index_Score + (1 - λ) * Re-ranking_Score, where λ is a weighting parameter.
1. Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS.
2. Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012.
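A minimal sketch of LCS re-ranking and the linear combination, assuming each word image's visual words come with the x-coordinate of their keypoints. Two choices are assumptions, not taken from the slides: the LCS length is divided by the longer sequence's length so the re-ranking score lies in [0, 1], and λ defaults to a hypothetical 0.5. The example sequence is the one in the figure (V1 V2 V6 V4 V4 V8 V9).

```python
from typing import List, Tuple


def word_sequence(keypoints: List[Tuple[float, int]]) -> List[int]:
    """Project visual words on the x-axis: order (x, visual_word_id) pairs by x."""
    return [vw for _, vw in sorted(keypoints)]


def lcs_length(a: List[int], b: List[int]) -> int:
    """Length of the longest common subsequence, O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ai in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ai == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def final_score(index_score: float, query_seq: List[int], cand_seq: List[int],
                lam: float = 0.5) -> float:
    """Final Score = lam * Index_Score + (1 - lam) * Re-ranking_Score."""
    rerank = lcs_length(query_seq, cand_seq) / max(len(query_seq), len(cand_seq), 1)
    return lam * index_score + (1 - lam) * rerank


query = word_sequence([(0.0, 1), (1.0, 2), (2.0, 6), (3.0, 4), (4.0, 4), (5.0, 8), (6.0, 9)])
candidate = [1, 2, 4, 4, 9]
print(lcs_length(query, candidate))  # 5
```

Because the LCS respects left-to-right order, it restores some of the geometry that the histogram discards, at a cost that is small for the short sequences of a single word image.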
Dataset Used
[Table: books used for the experiments; columns Book, #Pages, #Words; rows Telugu, Telugu, English]
Quantitative Results: LLC-Based Statistics (mAP)
[Table: columns Book, BoVW, BoVW + SIFT re-ranking, BoVW + LCS re-ranking, LLC, LLC + LCS re-ranking; rows Telugu, Telugu, English]
Quantitative Results: Text Query Statistics
Book           Method       mAP
Telugu-1716    Text query
Telugu-1718    Text query   0.90
English-1601   Text query   0.87
Patch Based Word Image Retrieval
- A feature is designed based on patches: each patch is represented using profile features.
Profile features:
- Projection profile: measures the ink distribution of the word image.
- Ink transition: measures the internal shape of the image.
- Upper word profile: distance from the upper boundary of the word image.
- Lower word profile: distance from the lower boundary of the word image.
Overview of Feature Calculation
- Input word image → calculate 4 profile features (projection profile, ink transition, upper word profile, lower word profile) → concatenate the 4 profile features → descriptor
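A minimal sketch of the four per-column profiles and their concatenation for one binary patch (1 = ink). The precise definitions are assumptions: ink transitions are counted as background-to-ink changes while scanning down a column, a column without ink gets the full patch height for both the upper and lower profiles, and the optional normalization divides by the patch height. The function names are hypothetical.

```python
from typing import List, Tuple

Profiles = Tuple[List[int], List[int], List[int], List[int]]


def column_profiles(patch: List[List[int]]) -> Profiles:
    """Projection, ink-transition, upper and lower profiles, one value per column."""
    h, w = len(patch), len(patch[0])
    proj, trans, upper, lower = [], [], [], []
    for c in range(w):
        col = [patch[r][c] for r in range(h)]
        proj.append(sum(col))                               # ink distribution
        trans.append(sum(1 for r in range(h)                # background -> ink changes
                         if col[r] == 1 and (r == 0 or col[r - 1] == 0)))
        ink_rows = [r for r in range(h) if col[r] == 1]
        upper.append(ink_rows[0] if ink_rows else h)        # distance from top boundary
        lower.append(h - 1 - ink_rows[-1] if ink_rows else h)  # distance from bottom
    return proj, trans, upper, lower


def patch_descriptor(patch: List[List[int]], normalize: bool = True) -> List[float]:
    """Concatenate the 4 profiles into one descriptor of length 4 * patch width."""
    h = len(patch)
    proj, trans, upper, lower = column_profiles(patch)
    desc = proj + trans + upper + lower
    return [v / h for v in desc] if normalize else [float(v) for v in desc]


patch = [[0, 1, 0],
         [1, 1, 0],
         [0, 0, 0],
         [1, 1, 0]]
print(patch_descriptor(patch, normalize=False))
# [2.0, 3.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, 4.0, 0.0, 0.0, 4.0]
```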
Fast Pre-Processing
[Figure: flowchart] For each input patch, form the corresponding patch vector and check whether it is present in the lookup table:
- Yes: retrieve the corresponding visual word from the table.
- No: find the corresponding visual word among V1, ..., Vk and update the lookup table.
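The lookup-table flow can be sketched as a small memoizing wrapper around the quantizer. The class name and the stand-in quantizer are hypothetical; the sketch assumes patch vectors are exactly repeatable (for example binarized patches), so that a tuple of their values can serve as a dictionary key. The speed-up depends on how often identical patch vectors recur in a collection.

```python
from typing import Callable, Dict, List, Tuple

PatchKey = Tuple[int, ...]


class PatchWordCache:
    """Lookup table from a patch vector to its visual word (hypothetical helper)."""

    def __init__(self, find_visual_word: Callable[[PatchKey], int]) -> None:
        self.find_visual_word = find_visual_word  # the expensive nearest-word search
        self.table: Dict[PatchKey, int] = {}
        self.misses = 0

    def lookup(self, patch_vector: List[int]) -> int:
        key = tuple(patch_vector)
        if key in self.table:                # Yes: retrieve the stored visual word
            return self.table[key]
        self.misses += 1
        word = self.find_visual_word(key)    # No: find the visual word ...
        self.table[key] = word               # ... and update the table
        return word


cache = PatchWordCache(lambda key: sum(key) % 3)  # stand-in for the real quantizer
print(cache.lookup([1, 0, 1]), cache.lookup([1, 0, 1]), cache.misses)  # 2 2 1
```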
Dataset Used
[Table: columns Book, #Pages, #Words; rows Telugu, English]
Quantitative Results: Baseline Statistics
Book           Method                       mAP
Telugu-1718    SIFT
Telugu-1718    Patch                        0.53
Telugu-1718    Patch feature
Telugu-1718    Patch feature with overlap   0.7214
Quantitative Results: Enhancements on Baseline Statistics
[Table: columns Enhancement Method, SIFT, Patch Feature; rows Query Expansion, Spatial Verification, LCS Re-ranking]
Quantitative Results: Results with Split Features
[Table: columns Book, SIFT, Patch Feature; rows Telugu, English]
Qualitative Results
[Figures: sample retrieval results]
Contributions
- Language-independent system: tested on 4 different languages.
- Scalable to huge datasets: tested on 1 million word images.
- Handles noisy document images: demonstrated performance on a dataset where a commercial OCR fails.
- Enhancements on the baseline results: query expansion, text query support and document-specific sparse coding.
- A document-specific descriptor is proposed.
Future Work
- Test on datasets with different fonts.
- Apply similar methods to handwritten and camera-based datasets.
- Learn character-level visual words automatically using annotated data.
- Multi-keyword support.
- Combine recognition-based and recognition-free methods.
- Improve the patch-based descriptor.
Related Publications
- Ravi Shekhar and C. V. Jawahar, “Word Image Retrieval using Bag of Visual Words”, in Proceedings of the 10th IAPR International Workshop on Document Analysis Systems (DAS).
- Praveen Krishnan, Ravi Shekhar and C. V. Jawahar, “Content Level Access to Digital Library of India Pages”, in Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP).
- Ravi Shekhar and C. V. Jawahar, “Document Specific Sparse Coding for Word Retrieval”, in Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.
Thanks!