IIIT Hyderabad Thesis Presentation By Raman Jain (20052021) Towards Efficient Methods for Word Image Retrieval.

Presentation transcript:


Problem Statement
The aim is to learn similarity measures for comparing word images.
[Figure: two word images, labeled "Similarity?"]

Feature Extraction and Representation
A sliding window is used for feature extraction.
Profile features:
– Upper word profile
– Lower word profile
– Projection profile
– Background-to-ink transitions
[Figure: upper profile, lower profile, projection profile, and background-to-ink transitions of a word image]
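
The four profile features can be sketched as below. This is a minimal illustration, not the thesis implementation: it assumes a binarized word image stored as rows of 0 (background) and 1 (ink), a sliding window one pixel wide, and the hypothetical function name `profile_features`. The exact definitions used in the thesis (normalization, window width) are not given on the slide.

```python
def profile_features(img):
    """Return one (upper, lower, projection, transitions) tuple per column.

    img: list of rows, each a list of 0 (background) or 1 (ink).
    Assumption: the sliding window is a single pixel column.
    """
    height, width = len(img), len(img[0])
    feats = []
    for x in range(width):
        col = [img[y][x] for y in range(height)]
        ink_rows = [y for y, v in enumerate(col) if v]
        # Upper profile: distance from the top edge to the first ink pixel.
        upper = ink_rows[0] if ink_rows else height
        # Lower profile: distance from the bottom edge to the last ink pixel.
        lower = height - 1 - ink_rows[-1] if ink_rows else height
        # Projection profile: number of ink pixels in the column.
        projection = sum(col)
        # Background-to-ink transitions, scanning from top to bottom.
        transitions = sum(
            1 for a, b in zip([0] + col[:-1], col) if a == 0 and b == 1
        )
        feats.append((upper, lower, projection, transitions))
    return feats


word = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]
features = profile_features(word)
```

Each word image becomes a sequence of four-dimensional vectors whose length equals the image width, which is why a sequence-aware or length-normalizing matcher is needed downstream.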

Dataset
Three types of English datasets are used to demonstrate the capabilities of the learning schemes.
1. Calibrated Data (CD): generated by rendering the text and passing it through a document degradation model.
2. Real Annotated Data (RD): a set of words from 4 books (765 pages) with their ground truth.
3. Un-annotated Data (UD): a dataset of 5,870,486 words from 61 scanned books without ground truth. Used only for evaluating precision.

DTW vs. Fixed-Length Matching
Performance measures:
1. Precision: the fraction of retrieved results that are relevant; measures how well a system discards irrelevant results while retrieving.
2. Recall: the fraction of relevant items that are retrieved; measures how well a system finds what the user wants.
3. Average Precision: measures the area under the precision-recall curve.
Baseline results comparing DTW and Euclidean matching on the CD dataset. The mean of each measure is computed over multiple queries.
[Table: columns Measure, DTW, Euclidean; rows mP, mR, mAP; values missing]
DTW is much slower than fixed-length matching.
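
The difference between the two matchers can be sketched as follows. This is an illustrative sketch under assumptions: the textbook DTW recurrence with Euclidean local cost and no band constraint, and a fixed-length matcher that linearly resamples each feature sequence to a hypothetical `length` before taking a plain Euclidean distance. The thesis may use different normalization or resampling.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame-to-frame cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def resample(seq, length):
    """Linearly interpolate a sequence of vectors to `length` frames."""
    if len(seq) == 1 or length == 1:
        return [list(seq[0]) for _ in range(length)]
    out = []
    for k in range(length):
        pos = k * (len(seq) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append([(1 - frac) * p + frac * q for p, q in zip(seq[lo], seq[hi])])
    return out


def fixed_length_distance(a, b, length=8):
    """Euclidean distance after mapping both sequences to the same length."""
    fa = [v for frame in resample(a, length) for v in frame]
    fb = [v for frame in resample(b, length) for v in frame]
    return math.dist(fa, fb)


a = [[0.0], [1.0], [2.0]]
b = [[0.0], [0.0], [1.0], [2.0]]  # same shape as a, with the first frame stretched
d_dtw = dtw_distance(a, b)
d_fixed = fixed_length_distance(a, b, length=4)
```

DTW fills an n-by-m table per comparison, while the fixed-length distance is linear in the vector length. That cost gap is the reason for the speed difference noted on the slide; the stretched example also shows DTW absorbing a local warp that uniform resampling does not.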

Learning Query Specific Classifier
Given a query word image, retrieve all similar word images.
We use a weighted Euclidean distance function for matching word images and retrieving relevant images:
[Equation missing: weighted Euclidean distance]
where w is a weight vector.
During retrieval, in each iteration t, the weight vector is updated using:
[Equation missing: weight update rule]
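
A weighted Euclidean ranking can be sketched as below. This assumes the common form in which each squared feature difference is scaled by its weight before summing; the slide does not show the exact formula or the weight update rule, so the weights are simply taken as given and the update step is not reproduced. The names `weighted_euclidean` and `retrieve` are hypothetical.

```python
import math


def weighted_euclidean(x, y, w):
    """Assumed form: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def retrieve(query, database, w):
    """Return database indices ordered from most to least similar to the query."""
    return sorted(
        range(len(database)),
        key=lambda i: weighted_euclidean(query, database[i], w),
    )


query = [0.0, 0.0]
database = [[3.0, 0.0], [0.0, 1.0]]
uniform = retrieve(query, database, [1.0, 1.0])     # plain Euclidean ordering
learned = retrieve(query, database, [0.01, 16.0])   # second feature dominates
```

With uniform weights the second item ranks first; after down-weighting the first feature the order flips, which is the effect a query-specific weight vector is meant to achieve.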

Results (mAP) on two datasets with 300 queries.
[Table: columns Dataset, No Learning, QSC with Eq. 1, QSC with Eq. 2; rows CD, RD; values missing]

Learning by Extrapolating QSC
Training pipeline (how a weight vector is learnt for each sub-word):
1. Feature descriptor mapped to d dimensions
2. Query-specific learning in closed form
3. Disintegration into sub-word weight vectors
4. Mapped to constant-length vectors
Extrapolation pipeline (how a weight vector is generated for an unseen query and later used for retrieval):
1. Query text
2. Already learnt sub-word (letter) weight vectors
3. Projected back to a new dimension based on the relative width of each letter
4. Concatenated and mapped to a constant-length vector
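
The extrapolation step for an unseen query can be sketched as follows. This is a rough illustration under several assumptions that the slide does not confirm: each letter already has a constant-length weight vector, resizing is done by linear interpolation, each letter receives a share of the output proportional to its relative width, and the dictionaries `letter_weights` and `letter_widths` are hypothetical inputs.

```python
def resample_1d(values, length):
    """Linearly interpolate a list of numbers to `length` entries."""
    if length == 1 or len(values) == 1:
        return [values[0]] * length
    out = []
    for k in range(length):
        pos = k * (len(values) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append((1 - frac) * values[lo] + frac * values[hi])
    return out


def extrapolate_weights(query_text, letter_weights, letter_widths, d=8):
    """Build a length-d weight vector for a query from per-letter weights."""
    total_width = sum(letter_widths[ch] for ch in query_text)
    parts = []
    for ch in query_text:
        # Project the letter's weights to a size matching its relative width.
        n = max(1, round(d * letter_widths[ch] / total_width))
        parts.extend(resample_1d(letter_weights[ch], n))
    # Concatenated vector mapped to the constant length d.
    return resample_1d(parts, d)


letter_weights = {"a": [1.0, 1.0, 1.0, 1.0], "b": [2.0, 2.0, 2.0, 2.0]}
letter_widths = {"a": 1.0, "b": 3.0}  # hypothetical relative widths
w = extrapolate_weights("ab", letter_weights, letter_widths, d=8)
```

In this toy case the narrow letter contributes a quarter of the final vector and the wide letter the remaining three quarters.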

Extrapolation

Results
Comparative results of extrapolation on various datasets.
[Table: columns Dataset, Measure, DTW, Euclidean, QSC with extrapolation; rows CD (mAP), RD (mAP), UD (mP); values missing]

Hindi Script and Word Formation
Characters are vowels (v) and consonants (c), which combine into ligatures:
क (c) + ई (v) = की (ka + ee = kee)
त (c) + त (c) = त्त (tha + tha = ththa)
क (c) + द (c) = क्द (ka + dha = kdha)
स (c) + त (c) + र (c) + ई (v) = स्त्री (sa + tha + ra + ee = sthree)
No. of characters: 52
No. of ligatures: 1000

Hindi Recognition and Retrieval
B. B. Chaudhuri and U. Pal
– OCR for Bangla and Hindi
– Satisfactory performance for clean documents
Reference: B. B. Chaudhuri and U. Pal, An OCR System to Read Two Indian Language Scripts: Bangla and Devnagari (Hindi), ICDAR 1997

Avoiding Complete Recognition
– Most of the modifiers appear either above the shirorekha or below the character.
– Shirorekha removal is common.
– Recognition of the middle zone is simple.
– The number of classes is reduced to around 119.
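
Zoning around the shirorekha can be sketched with a horizontal projection. This is a common heuristic rather than a method confirmed by the slide: it assumes a binarized word image (rows of 0/1), takes the shirorekha to be the band of rows around the maximum ink count, and uses a hypothetical 0.8 threshold. It separates only the upper zone from the rest and does not split off a lower zone.

```python
def split_zones(img, ratio=0.8):
    """Remove the shirorekha and return (upper_zone, remaining_rows).

    img: list of rows, each a list of 0 (background) or 1 (ink).
    ratio: hypothetical threshold relative to the peak row projection.
    """
    row_sums = [sum(row) for row in img]
    peak = max(row_sums)
    threshold = ratio * peak
    # Start at the densest row and grow the headline band up and down.
    top = bottom = row_sums.index(peak)
    while top > 0 and row_sums[top - 1] >= threshold:
        top -= 1
    while bottom < len(img) - 1 and row_sums[bottom + 1] >= threshold:
        bottom += 1
    upper_zone = img[:top]        # modifiers above the shirorekha
    remaining = img[bottom + 1:]  # middle zone plus anything below it
    return upper_zone, remaining


word = [
    [0, 0, 1, 0, 0],  # modifier above the headline
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],  # shirorekha
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
upper, rest = split_zones(word)
```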

Taking Advantage of Both
Recognition
– Compact representation
– Efficiency in indexing and retrieval
Retrieval
– Works with degraded words and complex scripts
– No need to segment into characters

BLSTM Model
Recurrent neural network
Applications in
– Handwriting recognition
– Speech recognition

BLSTM Model
LSTM: a smart network unit that can remember a value for an arbitrary length of time. It contains gates that determine when the input is significant enough to remember, when it should continue to remember the value, and when it should output the value.
BLSTM: two LSTM networks, one of which takes the input from beginning to end and the other from end to beginning.
We used 30 such nodes and 2 hidden layers.
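
The gating and the two reading directions can be sketched with a single scalar LSTM unit. This is a toy illustration of the standard LSTM equations with hand-picked, untrained parameters. It is not the network from the thesis, which used 30 nodes and 2 hidden layers; the parameter layout `(w_x, w_h, bias)` per gate is an assumption made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_pass(xs, params):
    """Run a one-unit LSTM over a scalar sequence and return its outputs.

    params maps a gate name to (w_x, w_h, bias).
    """
    h = c = 0.0
    outputs = []
    for x in xs:
        pre = {name: wx * x + wh * h + b for name, (wx, wh, b) in params.items()}
        i = sigmoid(pre["input"])    # is the input significant enough to store?
        f = sigmoid(pre["forget"])   # should the stored value be kept?
        o = sigmoid(pre["output"])   # should the stored value be emitted?
        g = math.tanh(pre["cell"])   # candidate value
        c = f * c + i * g            # memory cell update
        h = o * math.tanh(c)
        outputs.append(h)
    return outputs


def blstm(xs, params_fwd, params_bwd):
    """Pair a left-to-right pass with a right-to-left pass at each index."""
    forward = lstm_pass(xs, params_fwd)
    backward = lstm_pass(xs[::-1], params_bwd)[::-1]
    return list(zip(forward, backward))


params = {
    "input": (1.0, 0.5, 0.0),
    "forget": (0.5, 0.5, 1.0),
    "output": (1.0, 0.0, 0.0),
    "cell": (1.0, 0.5, 0.0),
}
out = blstm([0.2, -0.1, 0.4], params, params)
```

At every index the pair combines context from everything to the left and everything to the right of that position, which is the point of the bidirectional arrangement.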

BLSTM Model
From training examples, the BLSTM learns to map input sequences to output sequences.
Input: sequence of feature vectors
Output: probabilities
K -> number of classes
t -> input sequence index

Matching and Retrieval
The output of the BLSTM is a sequence of characters for each input word image.
Two images are compared using edit distance.
Example (each word image goes through zoning and the BLSTM):
word1 -> c1 c2 c3 c4
word2 -> c1 c2 c3 c4 c2 c5
Edit distance = 2
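
Comparing two output label sequences can be sketched with the standard Levenshtein distance (unit cost for insertion, deletion, and substitution). The slide does not state which edit-distance variant or costs were used, so unit costs are an assumption, and the label sequences below are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of labels."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def rank_by_edit_distance(query_labels, database_labels):
    """Return database indices sorted by edit distance to the query."""
    return sorted(
        range(len(database_labels)),
        key=lambda i: edit_distance(query_labels, database_labels[i]),
    )


word1 = ["c1", "c2", "c3", "c4"]
word2 = ["c1", "c2", "c3", "c4", "c2", "c5"]
distance = edit_distance(word1, word2)
order = rank_by_edit_distance(word1, [word2, word1, ["c9"]])
```

Because the comparison runs on short label sequences rather than on pixel-level features, matching a query against a large collection is cheap once the sequences are precomputed.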

Re-ranking
Connected components (CC) in the upper zone are used for re-ranking.
[Figure: upper zones of the query and database images with the number of CCs in each upper zone]
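
One way to use upper-zone connected components for re-ranking is sketched below. The slide does not say how the count enters the ranking, so the rule here is an assumption: candidates stay ordered by edit distance, and ties are broken by how closely their upper-zone CC count matches the query's. The use of 4-connectivity and the tuple layout are also assumptions.

```python
def count_components(zone):
    """Count 4-connected ink components in a binary image (rows of 0/1)."""
    if not zone:
        return 0
    height, width = len(zone), len(zone[0])
    seen = set()
    count = 0
    for y in range(height):
        for x in range(width):
            if zone[y][x] and (y, x) not in seen:
                count += 1
                seen.add((y, x))
                stack = [(y, x)]
                while stack:  # flood fill one component
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and zone[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


def rerank(query_cc, candidates):
    """candidates: list of (image_id, edit_distance, upper_zone_cc)."""
    return sorted(candidates, key=lambda c: (c[1], abs(c[2] - query_cc)))


upper_zone = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
]
n_cc = count_components(upper_zone)
ranked = rerank(1, [("a", 0, 3), ("b", 0, 1), ("c", 1, 1)])
```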

Overall Solution
Query image -> zoning -> feature extraction -> trained BLSTM NN -> output character sequence
Database images -> zoning -> feature extraction -> trained BLSTM NN -> output character sequence
Both character sequences -> edit distance -> re-ranking -> ranked word images

Dataset
[Table: columns Book, #Pages, #Lines, #Words; rows Book1, Book2; values missing]
Book1 is used for training and validation.
Book2 is used for testing the retrieval performance.

Quantitative Results
[Table: columns Method, mP, mAP; rows Euclidean, DTW, BLSTM based, BLSTM with re-ranking; values missing]
mP: mean of precision at 50% recall for 100 queries.
mAP: mean of average precision for 100 queries.
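
The two measures can be computed from ranked relevance lists as sketched below. This assumes binary relevance, the usual non-interpolated average precision, and precision read off at the first rank where recall reaches 50%; the thesis may use an interpolated variant. The function names are hypothetical.

```python
def average_precision(ranked_relevance, total_relevant):
    """Non-interpolated AP for one query; ranked_relevance holds 0/1 flags."""
    hits, ap = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, 1):
        if rel:
            hits += 1
            ap += hits / rank  # precision at each relevant result
    return ap / total_relevant if total_relevant else 0.0


def precision_at_recall(ranked_relevance, total_relevant, recall_level=0.5):
    """Precision at the first rank where recall reaches recall_level."""
    hits = 0
    for rank, rel in enumerate(ranked_relevance, 1):
        hits += bool(rel)
        if total_relevant and hits / total_relevant >= recall_level:
            return hits / rank
    return 0.0  # the requested recall level was never reached


def mean(values):
    return sum(values) / len(values)


# Two illustrative queries, each with two relevant images in the collection.
runs = [([1, 0, 1, 0], 2), ([0, 1, 0, 1], 2)]
m_ap = mean([average_precision(r, n) for r, n in runs])
m_p = mean([precision_at_recall(r, n) for r, n in runs])
```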

Quantitative Results
Results of the BLSTM-based method on in-vocabulary and out-of-vocabulary queries (100 each).
[Table: columns Queries, mP, mAP; rows In-vocabulary, Out-of-vocabulary; values missing]

Qualitative Results
[Figure: queries and their retrieved results]

Publications
– Raman Jain, Volkmar Frinken, C. V. Jawahar, R. Manmatha. BLSTM Neural Network based Word Retrieval for Hindi Documents. In Proceedings of the IEEE International Conference on Document Analysis and Recognition (ICDAR), Beijing, China.
– Raman Jain, C. V. Jawahar. Towards More Effective Distance Functions for Word Image Matching. In Proceedings of the IAPR Document Analysis System (DAS), Boston, U.S.
