Published by Harry Chandler. Modified over 9 years ago.
1
IIIT Hyderabad
Document Image Retrieval using Bag of Visual Words Model
Ravi Shekhar, CVIT, IIIT Hyderabad
Advisor: Prof. C.V. Jawahar
4
Motivation
- A large number of printed books are being digitized.
- Digital libraries include the Universal Digital Library (UDL), the Digital Library of India (DLI), and Google Books.
- An efficient and effective methodology is needed for content-level access.
[Figure: digital library database]
5
Process Overview
[Diagram with blocks: Documents, Scanning, Processing, Index Database, Input Query, Matching, Retrieved Documents]
Matching can be done at two levels: "Text" and "Image".
6
Matching Approaches
- Recognition-based approach (text-level matching): Optical Character Recognition (OCR)
- Recognition-free approach (image-level matching): word spotting
7
Recognition-Based Approach: Optical Character Recognition (OCR)
- Binarization of the document
- Segmentation using connected components, at line, word, and character level
- Character recognition using features such as patch and profile features
- Classification using an ANN or SVM
12
Limitations of Recognition-Based Approach
- Cuts
- Merges
- Variation in script
- Variation in font and typesetting
- Underlining and overwriting
15
Recognition-Free Approach: Word Spotting
- Word images are represented using global (profile) features.
- Features are matched using distance measures such as L1 and L2.
- Word images of different sizes are compared using Dynamic Time Warping (DTW).
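The DTW comparison mentioned on this slide can be illustrated with a minimal sketch. It assumes each word image has already been reduced to a 1-D profile sequence and uses the absolute difference as the local cost; the slides do not state which local cost or profile was used, so both choices and the sample sequences are assumptions for illustration only.

```python
def dtw_distance(a, b):
    """DTW cost between two 1-D feature sequences of possibly different length."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed L1 local cost
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]

# Hypothetical column profiles: the same shape at two widths, and a different shape.
narrow = [0, 2, 5, 2, 0]
wide = [0, 2, 2, 5, 5, 2, 0]
other = [5, 0, 0, 5, 0]
same_word_cost = dtw_distance(narrow, wide)
different_word_cost = dtw_distance(narrow, other)
```

The quadratic table filled per pair of word images is what makes exhaustive DTW matching slow on large collections, which is the scalability concern raised on the next slide.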
16
Why Recognition-Free Approach?
- Robust OCRs are unavailable for many non-Latin languages.
- These languages have a rich heritage, and there is a need for content-level search.
- Word-spotting-based methods are too slow for real-time systems.
- Most existing retrieval methods are memory intensive.
- Scalability is an immediate challenge.
17
Word Image Retrieval using Bag of Visual Words
18
Bag of Visual Words (BoVW)
- The Bag of Words (BoW) representation is the most popular representation for text retrieval.
- Efficient BoW-based systems such as Lucene are publicly available.
- BoVW performs excellently for image and video retrieval.
- A BoVW-based system is flexible, powerful, and scalable to billions of images.
19
BoVW Representation
Word images are represented using a histogram of visual words.
20
BoVW Representation: Codebook Generation
- A subset of the images is used.
- Clustering is done using Hierarchical K-Means (HKM).
- HKM is faster than K-Means both in building the tree and in finding nearest neighbours.
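As a rough illustration of how a word image becomes a histogram of visual words, the sketch below quantizes local descriptors against a small codebook. It uses a flat nearest-neighbour search rather than the HKM tree named on the slide, and the two-dimensional descriptors, the codebook, and the L1 normalization are all assumptions made to keep the example short.

```python
def nearest_word(descriptor, codebook):
    """Index of the closest codebook entry (flat search, not an HKM tree)."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))

def bovw_histogram(descriptors, codebook):
    """Count how often each visual word occurs, then L1-normalize (assumed)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical 2-D codebook and descriptors; real SIFT descriptors are 128-D.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
hist = bovw_histogram(descriptors, codebook)  # [0.25, 0.5, 0.25]
```

The histogram length depends only on the codebook size, not on the width of the word image, which is the fixed-size property listed among the advantages later in the deck.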
23
BoVW based Representation
[Figure: a word image and its histogram of visual words]
25
BoVW based Representation: Cuts
[Figure: a word image with cuts and its histogram of visual words]
27
BoVW based Representation: Merges
[Figure: a word image with merges and its histogram of visual words]
28
Proposed Architecture
[Figure: proposed architecture]
34
Advantages of BoVW based Representation
- Fixed-size representation
- Robust against degradation [Figure labels: Clean, Cuts, Merge]
- Scalable to billions of images
- Language independent
41
Lost Geometry: Spatial Verification
[Figures: spatial verification examples, including clean word images]
42
Re-ranking
- SIFT-based re-ranking
- The higher the total score, the better the match.
43
Experiments: Books Used
Language   #Books  #Pages  #Words
Hindi      4       427     112677
Malayalam  6       610     108767
Telugu     5       742     131156
Bangla     3       363     124584
Hindi      32      3992    1008138
44
Quantitative Results: Performance Statistics (mAP)
Language   #Images  #Queries  mAP     mAP after Re-ranking  mAP after Spatial Verification
Hindi      112677   138       0.6808  0.7820                0.7865
Malayalam  108767   101       0.6962  0.7991                0.8188
Telugu     131156   131       0.6483  0.7328                0.7495
Bangla     124584   125       0.7806  0.8766                0.8947
Hindi      1008138  138       0.5895  0.7022                0.7062
45
Quantitative Results: Performance Statistics (Prec@10)
Language   #Images  #Queries  Prec@10  Prec@10 after Re-ranking  Prec@10 after Spatial Verification
Hindi      112677   138       0.8437   0.8719                    0.8770
Malayalam  108767   101       0.7668   0.8328                    0.8581
Telugu     131156   131       0.8507   0.8668                    0.883
Bangla     124584   125       0.8498   0.9022                    0.9182
Hindi      1008138  138       0.8059   0.8509                    0.8543
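For readers unfamiliar with the two measures reported in these tables, here is a small sketch of Prec@k and mean average precision using their standard textbook definitions. The slides do not spell out the exact evaluation protocol, so treating these definitions as the ones used is an assumption, and the relevance list is made up.

```python
def precision_at_k(relevance, k=10):
    """Fraction of the top-k results that are relevant (relevance is a 0/1 list)."""
    return sum(relevance[:k]) / k

def average_precision(relevance, n_relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / n_relevant if n_relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (relevance list, number of relevant items) per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)

# Hypothetical ranked list for one query: 1 = correct word image, 0 = wrong one.
relevance = [1, 0, 1, 1, 0]
ap = average_precision(relevance, n_relevant=3)
p5 = precision_at_k(relevance, k=5)
```

Prec@10 only looks at the first page of results, while mAP rewards placing every correct instance early, which is why the two columns can move by different amounts after re-ranking.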
47
Quantitative Results: mAP vs. Query Length
[Figure: mAP vs. query length]
The more characters in the query, the better the results.
48
Quantitative Results: Retrieval Time and Index Size
#Images  Retrieval Time  Index Size
25K      50 ms           28 MB
100K     209 ms          130 MB
0.5M     411 ms          550 MB
1M       700 ms          1.2 GB
52
Qualitative Results
[Figures: queries and their retrieved results]
53
Qualitative Results
Sample output for noisy images where commercial OCR fails.
[Figure: queries and their retrieved results]
54
Enhancement over Bag of Visual Words based Word Image Retrieval
55
Query Expansion
- Observation: the top-ranked results are correct.
- The top-k results are used to form a new query.
- This improves the precision of the retrieved list.
- Modified average query expansion: instead of giving equal weight to every top-k result, a rank-based weight (1/2^rank) is given.
- Improves mAP and Prec@10 by 2%.
57
Query Expansion
[Diagram: the query image's histogram is used to query the index; the results at ranks 1 to 6 are combined with the query histogram into a refined (expanded) query histogram, which is used to query the index again, turning the previous results at ranks 1 to 6 into modified results.]
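The rank-weighted query expansion can be sketched as follows. This reads the weight on the slide as 1/2^rank, keeps the original query with weight 1, and L1-normalizes the result; all three points are assumptions, as are the toy histograms.

```python
def expand_query(query_hist, ranked_hists, k=5):
    """Blend the query histogram with its top-k results, weighted by 1 / 2**rank."""
    expanded = list(query_hist)  # assumed: the query itself keeps weight 1
    for rank, hist in enumerate(ranked_hists[:k], start=1):
        weight = 1.0 / (2 ** rank)  # rank 1 -> 1/2, rank 2 -> 1/4, ...
        for i, value in enumerate(hist):
            expanded[i] += weight * value
    total = sum(expanded)
    return [v / total for v in expanded] if total else expanded

# Hypothetical 3-bin histograms for the query and its two best matches.
query_hist = [1.0, 0.0, 0.0]
ranked_hists = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
expanded = expand_query(query_hist, ranked_hists, k=2)
```

Because the weights halve at every rank, a wrong result further down the top-k list perturbs the expanded query much less than it would under equal weighting.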
58
Text Query Support
- The method was originally formulated in a "query by example" setting, but users would prefer a textual interface for a document image collection.
- We propose a novel and simple framework for text query support.
- A small subset of the data with ground truth is used, covering all possible characters in a particular language.
- Visual words are learnt for each character and averaged across its different variations.
- Given a textual query, we synthesize its BoVW histogram.
- Text query results are comparable to word image results.
60
Text Query Support
[Diagram: in the query-by-example setting, an input query image is converted to a histogram; with text query support, an input text query is converted to a text query histogram.]
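One possible reading of the text query idea is sketched below: each character gets a BoVW histogram averaged over its annotated variations, and a text query is synthesized by adding up the histograms of its characters. The slides do not say how the per-character histograms are combined, so the summation, the normalization, and the `char_models` data are all hypothetical.

```python
def average_histograms(hists):
    """Average several BoVW histograms of the same character, bin by bin."""
    n = len(hists)
    return [sum(column) / n for column in zip(*hists)]

def synthesize_query(text, char_models):
    """Build a query histogram by summing per-character histograms (assumed rule)."""
    size = len(next(iter(char_models.values())))
    hist = [0.0] * size
    for ch in text:
        for i, value in enumerate(char_models[ch]):
            hist[i] += value
    total = sum(hist)
    return [v / total for v in hist] if total else hist

# Hypothetical character models learnt from two annotated variations each.
char_models = {
    "a": average_histograms([[2, 0, 0], [4, 0, 0]]),
    "b": average_histograms([[0, 1, 1], [0, 3, 1]]),
}
query_hist = synthesize_query("ab", char_models)
```

The synthesized histogram can then be sent to the same index as an image query, so the retrieval side of the system does not need to change.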
61
Qualitative Results
Sample output for queries using different techniques.
73
Vector Quantization
In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment.
[Figure: (a) input descriptor]
Problems with VQ:
- Visual word uncertainty: a single VW is chosen out of two or more possible candidates.
- Visual word plausibility: a visual word is assigned even when there is no suitable candidate in the vocabulary.
Solution: soft assignment. Map each feature vector to two or more possible VWs.
76
Soft Assignment
Map each feature vector to two or more possible VWs.
Approaches to soft assignment:
- Distance based
  - Equal weight
  - Based on distance in feature space
  - Gaussian distance
  - Does not minimize reconstruction error
- Through learning optimal reconstruction
[Figure: input descriptor]
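The Gaussian-distance variant of soft assignment might look like the sketch below, where a descriptor spreads its vote over its `r` nearest visual words with Gaussian weights. The kernel width `sigma`, the choice of `r`, and the toy codebook are assumptions; the slides only name the approach.

```python
import math

def soft_assign(descriptor, codebook, r=2, sigma=1.0):
    """Weights over the r nearest visual words, using a Gaussian of the distance."""
    nearest = sorted(
        (sum((x - y) ** 2 for x, y in zip(descriptor, word)), k)
        for k, word in enumerate(codebook)
    )[:r]
    weights = {k: math.exp(-d2 / (2 * sigma ** 2)) for d2, k in nearest}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}  # weights sum to 1

# Hypothetical codebook; the descriptor lies halfway between words 0 and 1.
codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
soft = soft_assign((0.5, 0.0), codebook, r=2)
```

Hard assignment would have to pick word 0 or word 1 arbitrarily for this descriptor, which is the "visual word uncertainty" problem described on the Vector Quantization slide.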
79
Locality-constrained Linear Coding (LLC)
- Similar patches should have similar codes.
- The locality of visual words is used to describe a feature vector.
LLC coding process:
- Find the K nearest neighbours of x_i, denoted as B.
- Reconstruct x_i using B.
- Replace the input x_i with the non-zero code obtained from the previous step.
[Figure: input descriptor]
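The three LLC steps can be sketched in plain Python as below. This follows the commonly used approximated LLC solution (least-squares reconstruction from the K nearest words with the weights constrained to sum to one); the fixed ridge term `reg`, the tiny linear solver, and the sample codebook are simplifying assumptions rather than details taken from the slides.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def llc_code(x, codebook, k=2, reg=1e-4):
    # Step 1: indices of the K nearest visual words of x (the local basis B).
    order = sorted(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])),
    )[:k]
    # Step 2: reconstruct x from B with weights that sum to one.
    shifted = [[b - a for a, b in zip(x, codebook[j])] for j in order]
    cov = [[sum(p * q for p, q in zip(u, v)) for v in shifted] for u in shifted]
    for i in range(len(order)):
        cov[i][i] += reg  # assumed small ridge term for numerical stability
    w = solve(cov, [1.0] * len(order))
    total = sum(w)
    # Step 3: the code is zero everywhere except at the K selected words.
    code = [0.0] * len(codebook)
    for j, wj in zip(order, w):
        code[j] = wj / total
    return code

# Hypothetical codebook: x = (0.5, 0) is 0.75 * word 0 + 0.25 * word 1.
codebook = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
code = llc_code((0.5, 0.0), codebook, k=2)
```

Unlike the distance-based soft assignment, the weights here are chosen to reconstruct the descriptor, which addresses the "does not minimize reconstruction error" drawback noted earlier.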
81
Re-ranking
- SIFT-based re-ranking [1]
- Longest common subsequence (LCS) based re-ranking [2]
  - Size of the LCS of visual words projected on the x-axis
  - The larger the size, the better the match.
- Linear combination [2]: Final Score = λ * Index_Score + (1 - λ) * Re-ranking_Score, where λ is a weighting parameter.
[Figure: visual words V1, V2, V6, V4, V4, V8, V9 plotted on x-y axes]
[1] Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS 2012.
[2] Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012.
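A compact sketch of LCS-based re-ranking and the linear score combination follows. Visual words are ordered by their x coordinate to form a sequence, the LCS length of two sequences is computed by dynamic programming, and the final score mixes the index score with the re-ranking score. Dividing the LCS length by the query length and the value of λ are assumptions, and the keypoints are invented (the query reuses the word ids from the figure).

```python
def word_sequence(keypoints):
    """Visual word ids ordered by x coordinate (projection onto the x-axis)."""
    return [word for x, word in sorted(keypoints)]

def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def final_score(index_score, rerank_score, lam=0.5):
    """Final Score = lam * Index_Score + (1 - lam) * Re-ranking_Score."""
    return lam * index_score + (1 - lam) * rerank_score

# Hypothetical (x position, visual word id) pairs for a query and a candidate.
query = [(0.0, 1), (0.5, 2), (1.0, 6), (1.5, 4), (2.0, 4), (2.5, 8), (3.0, 9)]
candidate = [(1.1, 4), (0.0, 1), (2.4, 8), (0.6, 6), (1.9, 7)]
q_seq, c_seq = word_sequence(query), word_sequence(candidate)
lcs = lcs_length(q_seq, c_seq)                    # common subsequence 1, 6, 4, 8
score = final_score(0.8, lcs / len(q_seq), lam=0.5)
```

Ordering by x restores some of the left-to-right geometry that the unordered histogram discards, at a cost proportional to the product of the two sequence lengths per candidate.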
82
Dataset Used: Books Used for the Experiments
Book          #Pages  #Words
Telugu-1716   120     4121
Telugu-1718   100     21345
English-1601  363     113008
83
Quantitative Results: LLC Based Statistics (mAP)
Book          BoVW    BoVW + SIFT Re-ranking  BoVW + LCS Re-ranking  LLC     LLC + LCS Re-ranking
Telugu-1716   0.8173  0.8645                  0.9036                 0.91    0.95
Telugu-1718   0.7834  0.8861                  0.918                  0.92    0.96
English-1601  0.8015  0.8531                  0.92                   0.8765  0.9451
84
Quantitative Results: Text Query Based Statistics
Book          Method      mAP
Telugu-1716   Text Query  0.8413
Telugu-1718   Text Query  0.90
English-1601  Text Query  0.87
98
Patch Based Word Image Retrieval
- A feature is designed based on patches.
- Each patch is represented using profile features.
Profile features:
- Projection profile: measures the ink distribution of the word image.
- Ink transition: measures the internal shape of the image.
- Upper word profile: distance from the upper boundary of the word image.
- Lower word profile: distance from the lower boundary of the word image.
99
Overview of Feature Calculation
[Diagram: input word image -> calculate the 4 profile features (projection profile, ink transition, upper word profile, lower word profile) -> concatenate the 4 profile features -> descriptor]
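The four profile features can be computed column by column on a binarized image, as in the sketch below. It assumes ink pixels are 1 and background pixels are 0, assigns the full height to the upper and lower profiles of empty columns, and concatenates the profiles in the order listed above; these conventions and the sample image are assumptions, since the slides only name the features.

```python
def profile_features(img):
    """img: list of rows, each a list of 0/1 pixels (1 = ink). Returns one descriptor."""
    height, width = len(img), len(img[0])
    per_column = []
    for c in range(width):
        col = [img[r][c] for r in range(height)]
        ink_rows = [r for r, v in enumerate(col) if v]
        projection = sum(col)  # number of ink pixels in the column
        transitions = sum(col[r] != col[r + 1] for r in range(height - 1))
        upper = ink_rows[0] if ink_rows else height  # distance from the top edge
        lower = height - 1 - ink_rows[-1] if ink_rows else height  # from the bottom
        per_column.append((projection, transitions, upper, lower))
    # Concatenate the four profiles: all projections, then transitions, and so on.
    return [column[i] for i in range(4) for column in per_column]

# Hypothetical 4 x 3 binary patch.
patch = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]
descriptor = profile_features(patch)
```

For a patch of width w the descriptor has 4 * w entries, so fixed-width patches give fixed-length vectors that can be clustered into visual words like any other local descriptor.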
100
Fast Pre-Processing
[Flowchart: input patch -> corresponding patch vector -> lookup table: is the patch vector present? Yes: retrieve the corresponding visual word. No: find the corresponding visual word among V1, V2, V3, ..., Vk and update the lookup table.]
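The lookup-table flow in this flowchart amounts to caching the visual word of every patch vector already seen. A minimal sketch, assuming patch vectors are small discrete tuples that repeat often and using a flat nearest-neighbour search for the "find" step; the class name `VisualWordCache` and the sample data are hypothetical.

```python
class VisualWordCache:
    """Memoizes the patch-vector -> visual-word mapping."""

    def __init__(self, codebook):
        self.codebook = codebook
        self.table = {}   # lookup table: patch vector -> visual word index
        self.misses = 0   # how many times the codebook had to be searched

    def lookup(self, patch_vector):
        key = tuple(patch_vector)
        if key not in self.table:
            # "No" branch: find the nearest visual word, then update the table.
            self.misses += 1
            self.table[key] = min(
                range(len(self.codebook)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(key, self.codebook[k])),
            )
        # "Yes" branch: retrieve the stored visual word.
        return self.table[key]

# Hypothetical codebook and a stream of patch vectors with repeats.
cache = VisualWordCache([(0, 0), (4, 4)])
words = [cache.lookup(v) for v in [(1, 0), (3, 4), (1, 0), (1, 0)]]
```

Only the first occurrence of each distinct patch vector pays for a codebook search, which pays off when the same binary patches recur many times within a book.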
101
Dataset Used
Book          #Pages  #Words
Telugu-1718   100     21345
English-1601  363     113008
102
Quantitative Results: Baseline Statistics
Book         Method                      mAP
Telugu-1718  SIFT                        0.7834
Telugu-1718  Patch                       0.53
Telugu-1718  Patch Feature               0.6183
Telugu-1718  Patch Feature with Overlap  0.7214
103
Quantitative Results: Enhancement on Baseline Statistics
Enhancement Method    SIFT    Patch Feature
Query Expansion       0.7920  0.75
Spatial Verification  0.8571  0.83
LCS Re-ranking        0.8798  0.8481
104
Quantitative Results: Results with Split Features
Book          SIFT  Patch Feature
Telugu-1718   0.94  0.954
English-1601  0.93  0.90
105
Qualitative Results
[Figure: qualitative results]
106
Contributions
- Language-independent system: tested on 4 different languages.
- Scalable to huge datasets: tested on 1 million word images.
- Handles noisy document images: demonstrated performance on a dataset where commercial OCR fails.
- Enhancements on baseline results: query expansion and text query support.
- Document-specific sparse coding.
- A document-specific descriptor is proposed.
107
Future Work
- Test on datasets with different fonts.
- Develop similar methods for handwritten and camera-based datasets.
- Learn character-level visual words automatically using annotated data.
- Multi-keyword support.
- Combine recognition-based and recognition-free methods.
- Improve the patch-based descriptor.
108
Related Publications
- Ravi Shekhar and C. V. Jawahar, "Word Image Retrieval using Bag of Visual Words", in Proceedings of the 10th IAPR International Workshop on Document Analysis Systems (DAS), 2012.
- Praveen Krishnan, Ravi Shekhar and C. V. Jawahar, "Content Level Access to Digital Library of India Pages", in Proceedings of the 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 2012.
- Ravi Shekhar and C. V. Jawahar, "Document Specific Sparse Coding for Word Retrieval", in Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.
109
Thanks!