Document Image Retrieval using Bag of Visual Words Model
Ravi Shekhar, CVIT, IIIT Hyderabad
Advisor: Prof. C.V. Jawahar

Motivation
- A large number of printed books are being digitized.
- Digital libraries include the Universal Digital Library (UDL), the Digital Library of India (DLI) and Google Books.
- An efficient and effective methodology is needed for content-level access.

Process Overview
(Figure: pipeline with the stages Scanning, Documents, Processing, Index, Database, Input Query, Matching and Retrieved Documents.)
- Matching can be done at two levels: “Text” and “Image”.

Matching Approaches
- Recognition-based approach (text-level matching): Optical Character Recognition (OCR)
- Recognition-free approach (image-level matching): Word Spotting

Recognition Based Approach: Optical Character Recognition (OCR)
- Binarization of the document
- Segmentation using connected components, at line, word and character level
- Character recognition using features such as patch and profile features
- Classification using an ANN or SVM

Limitations of Recognition Based Approach
- Cuts
- Merges
- Variation in script
- Variation in font and typesetting
- Underlined and overwritten text

Recognition Free Approach: Word Spotting
- Word images are represented using global (profile) features.
- Features are matched using distance measures such as L1 and L2.
- Word images of different sizes are compared using dynamic time warping (DTW).
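The DTW comparison mentioned above can be sketched as follows. This is a minimal illustration, assuming each word image has already been turned into a sequence of per-column feature vectors; the function name `dtw_distance` and the choice of L2 as the local cost are assumptions, not details given in the slides.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    def l2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = l2(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or a match step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The table has one cell per pair of columns, so the cost grows with the product of the two word widths. That is consistent with the later remark that word-spotting methods are slow for large collections.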

Why Recognition Free Approach?
- Robust OCRs are unavailable for many non-Latin languages.
- These languages have a rich heritage, and there is a need for content-level search.
- Word-spotting-based methods are too slow for a real-time system.
- Most existing retrieval methods are memory intensive.
- Scalability is an immediate challenge.

Word Image Retrieval using Bag of Visual Words

Bag of Visual Words (BoVW)
- Bag of Words (BoW) is the most popular representation for text retrieval.
- Efficient BoW-based systems such as Lucene are publicly available.
- Bag of Visual Words (BoVW) performs very well for image and video retrieval.
- A BoVW-based system is flexible, powerful and scalable to billions of images.

BoVW Representation
- Word images are represented using a histogram of visual words.

BoVW Representation: Codebook Generation
- A subset of the images is used.
- Clustering is done using Hierarchical K-Means (HKM).
- HKM is faster than K-Means both in building the tree and in finding nearest neighbours.
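A rough sketch of codebook generation and histogram construction is given below. It swaps in plain (flat) k-means for the Hierarchical K-Means named above, purely to keep the example short. The function names, the iteration count and the L1 normalisation of the histogram are illustrative assumptions.

```python
import random

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(centers, x):
    """Index of the visual word (cluster center) closest to descriptor x."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], x))

def kmeans(points, k, iters=20, seed=0):
    """Flat k-means; a stand-in for HKM, not the method used in the slides."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]
    return centers

def bovw_histogram(descriptors, centers):
    """Hard-assign each descriptor to one visual word and count occurrences."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]  # L1-normalised (assumption)
```

In a hierarchical variant, `nearest` would descend a tree of small k-means problems instead of scanning every center, which is where the speed-up in nearest-neighbour search comes from.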

BoVW based Representation
(Figures: histograms of visual words for a word image, for a word image with cuts, and for a word image with merges.)

Proposed Architecture

Advantages of BoVW based Representation
- Fixed-size representation
- Robust against degradation (figure: clean, cut and merged word images)
- Scalable to billions of images
- Language independent

Lost Geometry: Spatial Verification
(Figures: spatial verification examples on word images.)

Re-ranking
- SIFT-based re-ranking
- The higher the total score, the better the match.

Experiments: Books Used
Language     #Books   #Pages   #Words
Hindi        4        427      112677
Malayalam    6        610      108767
Telugu       5        742      131156
Bangla       3        363      124584
Hindi        32       3992     1008138

Quantitative Results: Performance Statistics (mAP)
Language     #Images   #Queries   mAP      mAP after Re-ranking   mAP after Spatial Verification
Hindi        112677    138        0.6808   0.7820                 0.7865
Malayalam    108767    101        0.6962   0.7991                 0.8188
Telugu       131156    131        0.6483   0.7328                 0.7495
Bangla       124584    125        0.7806   0.8766                 0.8947
Hindi        1008138   138        0.5895   0.7022                 0.7062

Quantitative Results: Performance Statistics (Prec@10)
Language     #Images   #Queries   Prec@10   Prec@10 after Re-ranking   Prec@10 after Spatial Verification
Hindi        112677    138        0.8437    0.8719                     0.8770
Malayalam    108767    101        0.7668    0.8328                     0.8581
Telugu       131156    131        0.8507    0.8668                     0.883
Bangla       124584    125        0.8498    0.9022                     0.9182
Hindi        1008138   138        0.8059    0.8509                     0.8543
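For readers unfamiliar with the two metrics in these tables, the sketch below shows one common way to compute them. The slides do not state which exact variant was used, so the definitions here (average precision normalised by the number of relevant items, Prec@10 as the fraction of relevant items in the first ten results) are assumptions, as are the function names.

```python
def precision_at_k(ranked_relevance, k=10):
    """Fraction of the first k results that are relevant (1) vs. not (0)."""
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance, num_relevant):
    """Mean of the precision values measured at each relevant result."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0

def mean_average_precision(queries):
    """queries: list of (ranked_relevance, num_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)
```

Here `ranked_relevance` is a list of 0/1 flags in ranked order, for example `[1, 0, 1]` when the first and third retrieved word images match the query word.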

Quantitative Results: mAP vs. Query Length
(Figure: plot of mAP against query length.)
- The more characters in the query, the better the results.

Quantitative Results: Retrieval Time and Index Size
#Images   Retrieval Time   Index Size
25K       50 ms            28 MB
100K      209 ms           130 MB
0.5M      411 ms           550 MB
1M        700 ms           1.2 GB

Qualitative Results
(Figures: queries and their retrieved results, including sample output for noisy images where commercial OCR fails.)

Enhancement over Bag of Visual Words based Word Image Retrieval

Query Expansion
- Observation: top-ranked results are correct.
- The top-k results are used to form a new query.
- This improves the precision of the retrieved list.
- Modified average query expansion: instead of giving equal weight to each of the top-k results, a rank-based weight (1/2^rank) is given.
- Improves mAP and Prec@10 by 2%.

Query Expansion
(Figures: the query image's histogram is used to query the index; the results at Rank 1 to Rank 6 are combined with the query histogram into a refined histogram. The expanded query histogram is then used to query the index again, turning the previous results into the modified results.)
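The rank-weighted expansion described above could look roughly like this. It is a sketch under assumptions: the weight of a result at a given rank is read as 1/2^rank, the original query is given weight 1, and the combined histogram is divided by the total weight. The function name `expand_query` and the default `top_k` are hypothetical.

```python
def expand_query(query_hist, ranked_hists, top_k=5):
    """Weighted average of the query histogram and its top-k result histograms."""
    expanded = list(query_hist)   # the query itself gets weight 1 (assumption)
    weight_sum = 1.0
    for rank, hist in enumerate(ranked_hists[:top_k], start=1):
        w = 1.0 / (2 ** rank)     # rank-based weight: 1/2, 1/4, 1/8, ...
        weight_sum += w
        for i, value in enumerate(hist):
            expanded[i] += w * value
    return [v / weight_sum for v in expanded]
```

Because the weights halve with each rank, a wrong result further down the list changes the expanded query much less than it would under equal weighting.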

Text Query Support
- The system was originally formulated in a “query by example” setting, but users would prefer a textual interface for a document image collection.
- We propose a novel and simple framework for text query support.
- A small subset of the data with ground truth is used, covering all possible characters in a particular language.
- Visual words are learnt specific to each character and averaged across its different variations.
- Given a textual query, we synthesize its BoVW histogram.
- Text query results are comparable to word image results.

Text Query Support
(Figures: in the query-by-example setting, an input query image is converted to a histogram; with text query support, an input text query is converted to a text query histogram.)
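One way to picture the synthesis step is shown below. The slides only say that per-character visual words are learnt and averaged and that a histogram is synthesized from the text. Summing the per-character histograms and normalising the result is an assumption made for illustration, and `char_histograms` is a hypothetical lookup built from the annotated subset.

```python
def synthesize_text_query(text, char_histograms):
    """Build a BoVW histogram for a text query from per-character histograms.

    char_histograms maps each character to its averaged visual-word histogram
    (hypothetical structure; all histograms share the same vocabulary size).
    """
    vocab_size = len(next(iter(char_histograms.values())))
    hist = [0.0] * vocab_size
    for ch in text:
        for i, value in enumerate(char_histograms[ch]):
            hist[i] += value
    norm = sum(hist) or 1.0
    return [h / norm for h in hist]
```

The synthesized histogram can then be sent to the same index as a histogram computed from a query image.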

Qualitative Results
(Figure: sample output for queries using different techniques.)

Vector Quantization
- In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment.
- Problems with VQ:
  - Visual word uncertainty: a single VW is chosen out of two or more possible candidates.
  - Visual word plausibility: a visual word is assigned even when there is no suitable candidate in the vocabulary.
- Solution: soft assignment, which maps each feature vector to two or more possible VWs.

Soft Assignment
- Map each feature vector to two or more possible VWs.
- Approaches to soft assignment:
  - Distance based: equal weight, weight based on distance in feature space, or Gaussian distance. Does not minimize reconstruction error.
  - Through learning: optimal reconstruction.
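As an illustration of the distance-based family, the sketch below spreads one descriptor over its k nearest visual words using a Gaussian kernel on the feature-space distance. The function name, the default `k`, and the bandwidth `sigma` are assumptions; the slides do not give parameter values.

```python
import math

def soft_assign(descriptor, centers, k=2, sigma=1.0):
    """Return {visual_word_index: weight} over the k nearest visual words."""
    d2 = [sum((a - b) ** 2 for a, b in zip(descriptor, c)) for c in centers]
    nearest = sorted(range(len(centers)), key=lambda j: d2[j])[:k]
    # Gaussian kernel: closer visual words receive larger weights.
    weights = {j: math.exp(-d2[j] / (2.0 * sigma ** 2)) for j in nearest}
    total = sum(weights.values())
    if total == 0.0:  # all kernels underflowed: fall back to equal weights
        return {j: 1.0 / len(nearest) for j in nearest}
    return {j: w / total for j, w in weights.items()}
```

The weights depend only on distances, not on how well the chosen visual words reconstruct the descriptor. That is the limitation the learning-based approach is meant to address.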

Locality-constrained Linear Coding (LLC)
- Similar patches should have similar codes.
- The locality of visual words is used to describe a feature vector.
- LLC coding process:
  1. Find the K nearest neighbours of x_i, denoted B.
  2. Reconstruct x_i using B.
  3. Replace the input x_i with the non-zero code obtained in the previous step.
(Figure: input descriptor and its neighbouring visual words.)
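The three coding steps can be sketched as below. This follows the approximate, k-nearest-neighbour form of LLC as it is commonly described: a small regularised least-squares problem with weights that sum to one. The regularisation constant `reg`, the helper `solve`, and the function names are assumptions and are not taken from the slides.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def llc_code(x, codebook, k=2, reg=1e-4):
    """Code x over its k nearest visual words; all other entries stay zero."""
    # Step 1: find the k nearest visual words (the local base B).
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    idx = sorted(range(len(codebook)), key=lambda j: d2[j])[:k]
    # Step 2: reconstruct x from B. Shift B so that x is the origin,
    # build the local covariance, and regularise it for stability.
    z = [[b - a for a, b in zip(x, codebook[j])] for j in idx]
    cov = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    trace = sum(cov[i][i] for i in range(k))
    for i in range(k):
        cov[i][i] += reg * trace + 1e-12
    w = solve(cov, [1.0] * k)
    total = sum(w)
    # Step 3: place the normalised weights at the neighbours' positions.
    code = [0.0] * len(codebook)
    for j, wj in zip(idx, w):
        code[j] = wj / total
    return code
```

For a descriptor halfway between two visual words, the code puts a weight of about 0.5 on each and zero elsewhere, so nearby descriptors get nearby codes.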

Re-ranking
- SIFT-based re-ranking [1]
- Longest common subsequence (LCS) based re-ranking [2]
  - Size of the LCS of visual words projected on the x-axis
  - The larger the size, the better the match
- Linear combination [2]: Final Score = λ * Index_Score + (1 - λ) * Re-ranking_Score, where λ is a weighting parameter.
(Figure: visual words V1, V2, V6, V4, V4, V8, V9 ordered by their x-coordinates.)
1. Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS 2012.
2. Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012.
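A small sketch of the LCS-based score and the linear combination follows. The keypoint format `(x, visual_word_id)`, the function names, and the default `lam=0.5` are assumptions for illustration. In practice both scores would need to be on comparable scales before they are combined.

```python
def visual_word_sequence(keypoints):
    """keypoints: (x, visual_word_id) pairs; order the words by x-coordinate."""
    return [vw for _, vw in sorted(keypoints, key=lambda kp: kp[0])]

def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def final_score(index_score, rerank_score, lam=0.5):
    """Final Score = lam * Index_Score + (1 - lam) * Re-ranking_Score."""
    return lam * index_score + (1.0 - lam) * rerank_score
```

Projecting onto the x-axis keeps the left-to-right order of visual words, which restores some of the geometry that the plain histogram discards.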

Dataset Used: Books Used for the Experiments
Book           #Pages   #Words
Telugu-1716    1204121 (page and word counts fused in the source)
Telugu-1718    100      21345
English-1601   363      113008

Quantitative Results: LLC Based Statistics (mAP)
Book           BoVW     BoVW + SIFT Re-ranking   BoVW + LCS Re-ranking   LLC      LLC + LCS Re-ranking
Telugu-1716    0.8173   0.8645                   0.9036                  0.91     0.95
Telugu-1718    0.7834   0.8861                   0.918                   0.92     0.96
English-1601   0.8015   0.8531                   0.92                    0.8765   0.9451

Quantitative Results: Text Query Based Statistics
Book           Method       mAP
Telugu-1716    Text Query   0.8413
Telugu-1718    Text Query   0.90
English-1601   Text Query   0.87

Patch Based Word Image Retrieval
- A feature is designed based on patches.
- Each patch is represented using profile features:
  - Projection profile: measures the ink distribution of the word image.
  - Ink transition: measures the internal shape of the image.
  - Upper word profile: distance from the upper boundary of the word image.
  - Lower word profile: distance from the lower boundary of the word image.

Overview of Feature Calculation
1. Take the input word image.
2. Calculate the 4 profile features: projection profile, ink transition, upper word profile and lower word profile.
3. Concatenate the 4 profile features to form the descriptor.
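The feature calculation can be sketched as follows, assuming a binarized patch given as rows of 0/1 values with 1 marking ink, and one value per column for each profile. The handling of empty columns and the function names are assumptions, since the slides do not spell these details out.

```python
def profile_features(image):
    """Per-column (projection, transitions, upper, lower) for a binary image."""
    height, width = len(image), len(image[0])
    features = []
    for col in range(width):
        column = [image[row][col] for row in range(height)]
        ink_rows = [r for r, v in enumerate(column) if v]
        # Projection profile: number of ink pixels in the column.
        projection = len(ink_rows)
        # Ink transitions: number of ink/background changes down the column.
        transitions = sum(1 for a, b in zip(column, column[1:]) if a != b)
        # Upper/lower profile: distance from the top/bottom edge to the ink.
        # Empty columns get the full height as their distance (assumption).
        upper = ink_rows[0] if ink_rows else height
        lower = height - 1 - ink_rows[-1] if ink_rows else height
        features.append((projection, transitions, upper, lower))
    return features

def patch_descriptor(image):
    """Concatenate the four profiles, one profile after another."""
    per_column = profile_features(image)
    return [col[i] for i in range(4) for col in per_column]
```

For a patch of width w this yields a descriptor of length 4w, which can then be quantized into visual words in the same way as SIFT descriptors.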

Fast Pre-Processing
(Figure: lookup table mapping patch vectors to visual words V1, V2, V3, ..., Vk.)
- Is the input patch's vector present in the lookup table?
  - Yes: retrieve the corresponding visual word.
  - No: find the corresponding visual word and update the lookup table.
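The lookup-table flow above amounts to caching the quantization result for each distinct patch vector. A minimal sketch, assuming patch vectors are exactly repeatable (for instance, integer-valued profile features) and using a hypothetical class name `VisualWordCache`:

```python
class VisualWordCache:
    """Cache patch vector -> visual word index to avoid repeated searches."""

    def __init__(self, centers):
        self.centers = centers
        self.table = {}   # lookup table: patch vector -> visual word index
        self.misses = 0   # number of full nearest-neighbour searches done

    def _nearest(self, vec):
        return min(
            range(len(self.centers)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, self.centers[j])),
        )

    def lookup(self, patch_vector):
        key = tuple(patch_vector)
        if key not in self.table:            # "No" branch: find and update
            self.misses += 1
            self.table[key] = self._nearest(key)
        return self.table[key]               # "Yes" branch: retrieve
```

Printed text repeats the same glyph fragments many times, so exact repeats of a patch vector are plausible. How often they occur in practice is not stated in the slides.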

Dataset Used
Book           #Pages   #Words
Telugu-1718    100      21345
English-1601   363      113008

Quantitative Results: Baseline Statistics
Book          Method                       mAP
Telugu-1718   SIFT                         0.7834
Telugu-1718   Patch                        0.53
Telugu-1718   Patch Feature                0.6183
Telugu-1718   Patch Feature with Overlap   0.7214

Quantitative Results: Enhancement on Baseline Statistics
Enhancement Method     SIFT     Patch Feature
Query Expansion        0.7920   0.75
Spatial Verification   0.8571   0.83
LCS Re-ranking         0.8798   0.8481

Quantitative Results: Results with Split Features
Book           SIFT   Patch Feature
Telugu-1718    0.94   0.954
English-1601   0.93   0.90

Qualitative Results

Contributions
- Language-independent system: tested on 4 different languages.
- Scalable to huge datasets: tested on 1 million word images.
- Handles noisy document images: demonstrated performance on a dataset where commercial OCR fails.
- Enhancements on baseline results: query expansion and text query support.
- Document-specific sparse coding.
- A document-specific descriptor is proposed.

Future Work
- Test on datasets with different fonts.
- Apply a similar method to handwritten and camera-based datasets.
- Learn character-level visual words automatically using annotated data.
- Multi-keyword support.
- Combine recognition-based and recognition-free methods.
- Improve the patch-based descriptor.

Related Publications
- Ravi Shekhar and C. V. Jawahar, “Word Image Retrieval using Bag of Visual Words”, In Proceedings of the 10th IAPR International Workshop on Document Analysis Systems (DAS), 2012.
- Praveen Krishnan, Ravi Shekhar and C. V. Jawahar, “Content Level Access to Digital Library of India Pages”, In Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 2012.
- Ravi Shekhar and C. V. Jawahar, “Document Specific Sparse Coding for Word Retrieval”, In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.

Thanks!
