
IIIT Hyderabad Thesis Presentation By Raman Jain (20052021) Towards Efficient Methods for Word Image Retrieval.


1 IIIT Hyderabad Thesis Presentation By Raman Jain (20052021) Towards Efficient Methods for Word Image Retrieval

2 Problem Statement
We aim to learn similarity measures for comparing word images. What makes two word images similar?

3 Feature Extraction and Representation
A sliding window is used for feature extraction. The following profile features are computed:
– Upper word profile
– Lower word profile
– Projection profile
– Background-to-ink transitions
[Figure: examples of the upper profile, lower profile, projection profile and background-ink transitions]
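The slide does not give the window width, the normalization or the exact feature definitions. The sketch below is a minimal version under these assumptions: a binarized image (1 = ink), a window one column wide that slides without overlap, and profiles normalized by image height. The function name `profile_features` and the parameter `win` are hypothetical.

```python
import numpy as np

def profile_features(img: np.ndarray, win: int = 1) -> np.ndarray:
    """Profile features of a binary word image (1 = ink, 0 = background).

    A window of `win` columns slides left to right without overlap
    (window width and normalization are assumptions). Returns an array of
    shape (n_windows, 4): [upper, lower, projection, transitions].
    """
    h, w = img.shape
    feats = []
    for start in range(0, w - win + 1, win):
        window = img[:, start:start + win]
        col = window.any(axis=1).astype(int)   # 1 where the row has ink in this window
        rows = np.flatnonzero(col)
        if rows.size:
            upper = rows[0] / h                 # top edge to first ink pixel
            lower = (h - 1 - rows[-1]) / h      # bottom edge to last ink pixel
        else:
            upper = lower = 1.0                 # empty window: no ink at all
        projection = window.sum() / (h * win)   # fraction of ink pixels
        # background-to-ink transitions, scanning the window top to bottom
        transitions = int(np.sum((col[:-1] == 0) & (col[1:] == 1)))
        feats.append([upper, lower, projection, transitions])
    return np.array(feats, dtype=float)
```

Each word image therefore becomes a variable-length sequence of 4-dimensional feature vectors, one per window.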

4 Dataset
Three English datasets are used to evaluate the learning schemes:
1. Calibrated Data (CD): generated by rendering text and passing it through a document degradation model.
2. Real Annotated Data (RD): words from 4 books (765 pages) with their ground truth.
3. Un-annotated Data (UD): 5,870,486 words from 61 scanned books, without ground truth. Because recall cannot be computed without ground truth, UD is used only for evaluating precision.

5 DTW vs. Fixed-Length Matching
Performance measures:
1. Precision: the fraction of retrieved results that are relevant, i.e. how well a system discards irrelevant results.
2. Recall: the fraction of relevant items that are retrieved, i.e. how well a system finds what the user wants.
3. Average Precision: the area under the precision-recall curve.
Each measure is averaged over multiple queries (mP, mR, mAP). Baseline results comparing DTW and Euclidean distance on the CD dataset:
Measure | DTW   | Euclidean
mP      | 0.653 | 0.598
mR      | 0.805 | 0.792
mAP     | 0.853 | 0.764
DTW is more accurate on all three measures, but it is much slower than fixed-length matching.
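The sketch below makes the speed difference concrete by comparing the two matchers on sequences of per-column feature vectors. DTW uses the standard recurrence with a squared-Euclidean local cost. For fixed-length matching, it assumes both sequences are linearly resampled to a common length (32 here, an arbitrary choice) before a single Euclidean distance is taken. The function names are hypothetical.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic DTW between feature sequences of shape (n, d) and (m, d).
    Fills an (n+1) x (m+1) table, so the cost is O(n*m) per pair."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = float(np.sum((a[i - 1] - b[j - 1]) ** 2))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def resample(seq: np.ndarray, length: int) -> np.ndarray:
    """Linearly interpolate an (n, d) sequence to (length, d) (assumed normalization)."""
    n = len(seq)
    src = np.linspace(0, n - 1, length)
    cols = [np.interp(src, np.arange(n), seq[:, k]) for k in range(seq.shape[1])]
    return np.stack(cols, axis=1)

def euclidean_distance(a: np.ndarray, b: np.ndarray, length: int = 32) -> float:
    """Fixed-length matching: resample both sequences, flatten, compare once."""
    return float(np.linalg.norm(resample(a, length).ravel() - resample(b, length).ravel()))
```

DTW fills an n × m table for every pair of images, while fixed-length matching is a single vector difference. That difference explains why DTW is slower, even though it tolerates stretched characters (a repeated column costs nothing under DTW).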

6 Learning Query Specific Classifier
Given a query word image, the goal is to retrieve all similar word images. We use a weighted Euclidean distance function to match word images and retrieve relevant ones. In this distance, each feature dimension is scaled by the corresponding entry of a weight vector w. During retrieval, w is updated in each iteration t using one of two update rules (Eq. 1 and Eq. 2).
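The exact update rules (Eq. 1 and Eq. 2) are not given here. The sketch below therefore only illustrates the structure of query-specific retrieval: rank with a weighted Euclidean distance, re-estimate `w`, and repeat. Its update treats the top-ranked results as relevant (pseudo-relevance feedback) and sets each weight to the inverse variance of that dimension over them. This is a hypothetical stand-in, not the thesis's rule, and the function names are also hypothetical.

```python
import numpy as np

def weighted_euclidean(q: np.ndarray, X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Distance from query q (d,) to every row of X (N, d), each dimension scaled by w."""
    return np.sqrt(((X - q) ** 2 * w).sum(axis=1))

def retrieve_with_feedback(q, X, iterations=3, top_k=5, eps=1e-6):
    """Iterative retrieval that re-learns w for this query.
    Hypothetical update (not Eq. 1 or Eq. 2): inverse variance of the top_k results."""
    w = np.ones(X.shape[1])
    for _ in range(iterations):
        order = np.argsort(weighted_euclidean(q, X, w))
        top = X[order[:top_k]]              # pseudo-feedback: assume top results are relevant
        w = 1.0 / (top.var(axis=0) + eps)   # consistent dimensions get larger weights
        w = w / w.sum() * len(w)            # rescale so the weights sum to d
    return np.argsort(weighted_euclidean(q, X, w)), w
```

Because `w` is recomputed from the current ranking, dimensions on which the presumed matches agree come to dominate the distance for this particular query.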

7 Results (mAP) on two datasets with 300 queries:
Dataset | No learning | QSC with Eq. 1 | QSC with Eq. 2
CD      | 0.764       | 0.946          | 0.944
RD      | 0.817       | 0.930          | 0.939

8 Learning by Extrapolating QSC
Training (a weight vector is learnt for each sub-word):
– Feature descriptor mapped to d dimensions
– Query-specific learning in closed form
– Disintegration into sub-word weight vectors
– Sub-word weight vectors mapped to constant-length vectors
Unseen query (a weight vector is generated by extrapolation and then used for retrieval):
– Query text
– Already learnt sub-word (letter) weight vectors
– Projected back to a new dimension based on the relative width of each letter
– Concatenated and mapped to a constant-length vector
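Below is a sketch of the extrapolation pipeline for an unseen query. It makes three assumptions: each learnt letter weight vector has a constant length, it holds one weight per column position (a simplification of a multi-feature descriptor), and the relative letter widths are known. Each letter's vector is stretched to a span proportional to its relative width. The pieces are then concatenated, and the result is resampled to the final length `d`. All names and the default `d` are hypothetical.

```python
import numpy as np

def resize_vector(v: np.ndarray, length: int) -> np.ndarray:
    """Linearly resample a 1-D vector to `length` entries."""
    return np.interp(np.linspace(0, len(v) - 1, length), np.arange(len(v)), v)

def extrapolate_weights(query_text, letter_weights, relative_width, d=64):
    """Build a weight vector for an unseen query word from learnt letter weights.

    letter_weights: dict letter -> constant-length weight vector (from training).
    relative_width: dict letter -> relative width of that letter (hypothetical values).
    d: length of the final, constant-length weight vector (assumed).
    """
    widths = np.array([relative_width[ch] for ch in query_text], dtype=float)
    # columns each letter occupies in the new word, proportional to its relative width
    spans = np.maximum(1, np.round(widths / widths.sum() * d)).astype(int)
    parts = [resize_vector(letter_weights[ch], n) for ch, n in zip(query_text, spans)]
    # concatenate the stretched letter vectors and map to a constant length
    return resize_vector(np.concatenate(parts), d)
```

No training images of the query word are needed. Only the letter vectors learnt earlier and the letters' relative widths are used.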

9 Extrapolation

10 Results
Comparative results of extrapolation on various datasets:
Dataset | Measure | DTW   | Euclidean | QSC with extrapolation
CD      | mAP     | 0.853 | 0.764     | 0.902
RD      | mAP     | 0.778 | 0.817     | 0.923
UD      | mP      | 0.890 | 0.915     | 0.955

11 Hindi Script and Word Formation
Vowels (v) and consonants (c) combine to form ligatures:
– क (c) + ई (v) = की (ka + ee = kee)
– त (c) + त (c) = त्त (tha + tha = ththa)
– क (c) + द (c) = क्द (ka + dha = kdha)
– स (c) + त (c) + र (c) + ई (v) = स्त्री (sa + tha + ra + ee = sthree)
Number of characters: 52
Number of ligatures: 1000
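The same combination rules appear in how Hindi text, such as a query string, is encoded in Unicode. A virama (U+094D) between consonants joins them into a conjunct, and a vowel following a consonant is written with its dependent vowel sign (ी rather than ई). The sketch below uses a hypothetical helper, `compose`, to show this. The visible ligature shape is chosen by the font, not by the encoding.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA: links two consonants into a conjunct

def compose(*parts: str) -> str:
    """Join Devanagari consonants and dependent vowel signs as Unicode encodes them.
    Consecutive consonants (category 'Lo') get a virama between them; a dependent
    vowel sign (a combining mark) attaches directly. Pass vowels as signs, not
    as independent letters."""
    out = ""
    for p in parts:
        if out and unicodedata.category(p) == "Lo" and unicodedata.category(out[-1]) == "Lo":
            out += VIRAMA
        out += p
    return out
```

For example, `compose("क", "ी")` gives "की", and `compose("स", "त", "र", "ी")` gives "स्त्री", whose encoding is स, virama, त, virama, र, ी.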

12 Hindi Recognition and Retrieval
B. B. Chaudhuri and U. Pal:
– OCR for Bangla and Hindi
– Satisfactory performance on clean documents
B. B. Chaudhuri and U. Pal, "An OCR System to Read Two Indian Language Scripts: Bangla and Devnagari (Hindi)", ICDAR 1997.

13 Avoiding Complete Recognition
Most modifiers appear either above the shirorekha (the horizontal headline that joins the characters of a word) or below the character. Removing the shirorekha is a common preprocessing step. Recognizing only the middle zone is comparatively simple, and it reduces the number of classes to around 119.
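The slides do not say how the shirorekha is located. A common heuristic, assumed here, is that it is the row with the most ink in the upper part of the word image. The sketch uses that row to separate the upper zone, where most modifiers sit, from the rest of the word. Separating the lower zone would need a further rule, which is not shown. Function names and parameters are hypothetical.

```python
import numpy as np

def find_shirorekha(img: np.ndarray, search_frac: float = 0.5) -> int:
    """Row index of the headline in a binary word image (1 = ink).
    Assumed heuristic: the row with the most ink within the top `search_frac` of the image."""
    h = img.shape[0]
    top = img[: max(1, int(h * search_frac))]
    return int(np.argmax(top.sum(axis=1)))

def split_zones(img: np.ndarray, band: int = 0):
    """Return (upper zone, remainder) with the headline rows removed.
    `band` extra rows on each side of the headline are also dropped (thick headlines)."""
    r = find_shirorekha(img)
    upper = img[: max(0, r - band)]   # modifiers above the headline
    rest = img[r + band + 1:]         # middle (and lower) zone
    return upper, rest
```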

14 Taking Advantage of Both
Recognition:
– Compact representation
– Efficient indexing and retrieval
Retrieval:
– Works with degraded words and complex scripts
– No need to segment words into characters

15 BLSTM Model
Bidirectional Long Short-Term Memory (BLSTM) is a recurrent neural network architecture, with applications in:
– Handwriting recognition
– Speech recognition

16 BLSTM Model
– An LSTM unit is a memory cell that can store a value for an arbitrary length of time.
– It contains gates that determine when the input is significant enough to remember, when the cell should keep remembering it, and when it should output the value.
– A BLSTM consists of two LSTM networks: one processes the input from beginning to end, and the other from end to beginning.
– We used 30 such nodes and 2 hidden layers.
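Below is a forward-pass sketch of one bidirectional layer built from standard LSTM cells with input, forget and output gates. The weights would come from training and are not part of this sketch. Reading "30 such nodes" as 30 units per direction is an assumption. Stacking two such layers gives the 2 hidden layers. Function names and the parameter layout are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(xs, W, U, b, hidden):
    """Run one LSTM over a sequence xs of shape (T, d_in).
    W: (4*hidden, d_in), U: (4*hidden, hidden), b: (4*hidden,), stacking the
    input gate, forget gate, output gate and candidate value in that order."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:hidden])                 # input gate: is this input worth storing?
        f = sigmoid(z[hidden:2 * hidden])       # forget gate: keep remembering the old value?
        o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: expose the stored value now?
        g = np.tanh(z[3 * hidden:])             # candidate value
        c = f * c + i * g                       # memory cell
        h = o * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)

def blstm_layer(xs, fwd_params, bwd_params, hidden):
    """One LSTM reads left to right, another right to left; their outputs are
    concatenated per time step, giving shape (T, 2*hidden)."""
    forward = lstm_pass(xs, *fwd_params, hidden)
    backward = lstm_pass(xs[::-1], *bwd_params, hidden)[::-1]
    return np.concatenate([forward, backward], axis=1)
```

For example, with `hidden=30` and 4 features per window, a first layer maps a (T, 4) sequence to (T, 60), and a second layer maps that to another (T, 60) sequence. Every time step therefore sees context from both ends of the word.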

17 BLSTM Model
From training examples, the BLSTM learns to map input sequences to output sequences. The input is a sequence of feature vectors. At each input sequence index t, the network outputs probabilities over the K classes.
[Figure: input sequence of feature vectors mapped to output probabilities]
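The slide does not say how the per-step output probabilities become a character sequence. A common choice with BLSTMs for handwriting recognition is a Connectionist Temporal Classification (CTC) output layer with one extra "blank" class. Assuming that setup, the simplest decoder takes the most probable class at each t, merges consecutive repeats and drops blanks. The function name and the blank index 0 are hypothetical.

```python
def best_path_decode(probs, blank=0):
    """Greedy (best-path) decoding of per-time-step class probabilities.
    probs: list of T lists, each with one probability per class (K classes + blank).
    Assumes a CTC-style output where index `blank` is the blank class."""
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]  # argmax at each t
    seq, prev = [], None
    for k in best:
        if k != prev and k != blank:   # merge repeats, then drop blanks
            seq.append(k)
        prev = k
    return seq
```

A blank between two identical labels keeps them as separate characters. Without a blank, the repeat would be merged.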

18 Matching and Retrieval
For each input word image, the BLSTM outputs a sequence of characters. Two images are compared by the edit distance between their output sequences.
[Figure: word1 and word2 are zoned and passed through the BLSTM; their outputs, c1 c2 c3 c4 and c1 c2 c3 c4 c2 c5, have edit distance 2]
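Below is a sketch of the matching step. It assumes plain Levenshtein distance with unit costs for insertion, deletion and substitution; the slide does not state the costs. The hypothetical `edit_distance` works on any sequences of symbols, such as lists of class labels.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences with unit costs,
    computed one row of the dynamic-programming table at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]
```

`edit_distance(["c1", "c2", "c3", "c4"], ["c1", "c2", "c3", "c4", "c2", "c5"])` returns 2, which matches the slide's example (two insertions).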

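19 Re-ranking
Retrieved results are re-ranked using the number of connected components (CCs) in the upper zone of the query and of each database image.
[Figure: two queries and several database images, each annotated with its number of CCs in the upper zone]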
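The slide does not give the exact re-ranking rule. One plausible reading, assumed here, moves database images whose upper-zone CC count matches the query's ahead of those that do not. Within each group, the edit-distance order is kept. Components are counted with 8-connectivity, which is also an assumption, and all names are hypothetical.

```python
from collections import deque

def count_components(zone):
    """Number of 8-connected ink components in a binary zone (rows of 0/1, 1 = ink)."""
    h = len(zone)
    w = len(zone[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if zone[r][c] and not seen[r][c]:
                count += 1                      # new component: flood-fill it
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and zone[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count

def rerank(results, query_cc):
    """results: list of (image_id, edit_distance, upper_cc).
    Hypothetical rule: matching CC count first, then ascending edit distance."""
    return sorted(results, key=lambda r: (r[2] != query_cc, r[1]))
```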

20 Overall Solution
– Query image → zoning → feature extraction → trained BLSTM NN → output character sequence
– Database images → zoning → feature extraction → trained BLSTM NN → output character sequences
– Query and database sequences → edit distance → re-ranking → ranked word images

21 Dataset
Book  | #Pages | #Lines | #Words
Book1 | 98     | 2463   | 27764
Book2 | 108    | 2590   | 28265
Book1 is used for training and validation; Book2 is used for testing retrieval performance.

22 Quantitative Results
Method                | mP (%) | mAP (%)
Euclidean             | 78.23  | 71.82
DTW                   | 84.64  | 77.39
BLSTM based           | 91.73  | 84.77
BLSTM with re-ranking | 93.26  | 89.02
mP: mean of precision at 50% recall over 100 queries. mAP: mean of average precision over 100 queries.
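The sketch below computes both measures for a single query. Its inputs are the 0/1 relevance of each ranked result and the total number of relevant images; mP and mAP are then the means of these values over the 100 queries. Precision at 50% recall is taken at the first rank where recall reaches 50%, without interpolation. AP is the usual non-interpolated estimate. The slides do not say whether the thesis interpolates, so both choices are assumptions, and the function names are hypothetical.

```python
def precision_at_recall(ranked_relevance, n_relevant, recall_level=0.5):
    """Precision at the first rank where recall reaches `recall_level`.
    ranked_relevance: list of 0/1 flags for the ranked results (1 = relevant)."""
    if n_relevant == 0:
        return 0.0
    hits = 0
    for rank, rel in enumerate(ranked_relevance, 1):
        hits += rel
        if hits / n_relevant >= recall_level:
            return hits / rank
    return 0.0  # recall level never reached

def average_precision(ranked_relevance, n_relevant):
    """Mean of the precision values at each relevant result (non-interpolated AP),
    a discrete estimate of the area under the precision-recall curve."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, 1):
        if rel:
            hits += 1
            total += hits / rank
    return total / n_relevant if n_relevant else 0.0
```

For the ranking [1, 0, 1, 0, 1, 0] with 3 relevant images, AP is (1 + 2/3 + 3/5) / 3, and precision at 50% recall is 2/3 (reached at rank 3).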

23 Quantitative Results
Results of the BLSTM-based method on in-vocabulary and out-of-vocabulary queries (100 each):
Queries           | mP (%) | mAP (%)
In-vocabulary     | 95.90  | 91.18
Out-of-vocabulary | 92.17  | 88.91

24 Qualitative Results
[Figure: query images and their retrieved results]

25 Publications
– Raman Jain, Volkmar Frinken, C. V. Jawahar, R. Manmatha. BLSTM Neural Network based Word Retrieval for Hindi Documents. In Proceedings of the IEEE International Conference on Document Analysis and Recognition (ICDAR), Beijing, China, 2011.
– Raman Jain, C. V. Jawahar. Towards More Effective Distance Functions for Word Image Matching. In Proceedings of the IAPR International Workshop on Document Analysis Systems (DAS), Boston, USA, 2010.


