Robust Recognition of Documents by Fusing Results of Word Clusters. Venkat Rasagna, Anand Kumar, C. V. Jawahar, R. Manmatha (IIIT Hyderabad, UMass Amherst).


Robust Recognition of Documents by Fusing Results of Word Clusters
Venkat Rasagna (1), Anand Kumar (1), C. V. Jawahar (1), R. Manmatha (2)
(1) Center for Visual Information Technology, IIIT Hyderabad
(2) Center for Intelligent Information Retrieval, UMass Amherst

Introduction
Recognition of books and collections.
Recognition of words is crucial to information retrieval.
Dictionaries and post-processors are not feasible in many languages.

Motivation
Most (Indian-language) OCRs recognize glyphs (components) and generate text from the class labels.
Word accuracies are far lower than component accuracies.
Word accuracy falls as the number of components in the word grows.
Using a language model for post-processing is challenging.
  – High entropy, large vocabulary (e.g. Telugu).
  – Language processing modules are still emerging.
[Figure: word accuracy plotted against component accuracy, and word accuracy plotted against number of components.]
[Example (Recognize, Parse): Component Accuracy = 9 / 12 = 75%; Word Accuracy = 25%. The average word length value is missing in the source.]
Is it possible to make use of multiple occurrences of the same word to improve OCR performance?
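As a rough illustration of the gap between the two measures, the sketch below computes component and word accuracy for a made-up set of four three-component words. The data is hypothetical and chosen only to reproduce the 9/12 and 25% figures; the slide's own example is not recoverable. The function names are illustrative, and the p^n estimate assumes that components fail independently, which the slides do not claim.

```python
from typing import Sequence, Tuple


def accuracies(truth: Sequence[Sequence[str]],
               ocr: Sequence[Sequence[str]]) -> Tuple[float, float]:
    """Return (component accuracy, word accuracy) for paired word lists."""
    total = correct = words_ok = 0
    for t, o in zip(truth, ocr):
        matches = sum(a == b for a, b in zip(t, o))
        total += len(t)
        correct += matches
        # A word counts as correct only if every component is correct.
        words_ok += int(len(t) == len(o) and matches == len(t))
    return correct / total, words_ok / len(truth)


def expected_word_accuracy(p: float, n: int) -> float:
    """Word accuracy for n components, ASSUMING independent errors."""
    return p ** n


# Hypothetical data: 4 words x 3 components, one error in three of the words.
truth = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
ocr = [["a", "b", "c"], ["d", "x", "f"], ["g", "h", "x"], ["x", "k", "l"]]

comp_acc, word_acc = accuracies(truth, ocr)
print(comp_acc, word_acc)                # 0.75 0.25
print(expected_word_accuracy(0.75, 3))   # 0.421875
```

Three scattered component errors are enough to spoil three of the four words, which is why a small gain in component accuracy can translate into a much larger gain in word accuracy.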

Overview
A text contains multiple occurrences of a word.
Words are degraded independently.
OCR output differs for the same word at different instances.
[Figure labels: Text, Cluster, OCR, OCR output, Goal.]

Related Work
[Figure: script samples in Malayalam, Bangla, Tamil, and Hindi.]
Character recognition in Indian languages is still an unsolved problem.
  – U. Pal, B. Chaudhuri, Pattern Recognition
Telugu is one of the most complex scripts.
  – A. Negi et al., ICDAR, 2001; C. V. Jawahar et al., ICDAR, 2003; K. S. Sesh Kumar et al., ICDAR
Recognition of a book has received some attention recently.
  – P. Xiu and H. S. Baird, DRR XV, 2008; N. V. Neeba, C. V. Jawahar, ICPR
Word images are efficiently matched for retrieval.
  – T. M. Rath et al., IJDAR, 2007; T. M. Rath et al., CVPR, 2003; Anand Kumar et al., ACCV, 2007
Use of word image clusters to improve OCR accuracy.
  – H. Tao, J. Hull, Document Analysis and Information Retrieval, 1995

Conventional Recognition Process
[Diagram] Scanned Images → Preprocessing → Segmentation and Word Detection → Recognizer (Feature Extraction, Classification) → Text (Unicode)

Proposed Recognition Process
[Diagram] Additional stages: Word-level Feature Extraction → Word Grouping (Clustering) → Word groups → Combining OCR Results

Locality Sensitive Hashing (LSH)
Goal: "r-Near Neighbour"
  – For any query q, return a point p ∈ P such that ||p - q|| ≤ r (if it exists).
LSH has been used for
  – Data mining: Taher H. Haveliwala, Aristides Gionis, Piotr Indyk, WebDB, 2000
  – Information retrieval: A. Andoni, M. Datar, N. Immorlica, V. Mirrokni, Piotr Indyk, 2006
  – Document image search: Anand Kumar, C. V. Jawahar, R. Manmatha, ACCV, 2007
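The r-near-neighbour query can be sketched with a small hash index. This is a minimal illustration, assuming Euclidean distance and random-projection hash functions of the form floor((a·v + b) / w); the class name `L2LSH` and all parameter values are hypothetical and are not taken from the slides.

```python
import math
import random
from collections import defaultdict
from typing import Optional, Sequence, Tuple


class L2LSH:
    """Toy LSH index for r-near-neighbour queries under Euclidean distance."""

    def __init__(self, dim: int, num_tables: int = 8, hashes_per_table: int = 4,
                 bucket_width: float = 4.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = bucket_width
        self.points = []
        self.tables = []
        for _ in range(num_tables):
            # Each table concatenates several random-projection hashes.
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)],
                      rng.uniform(0.0, bucket_width))
                     for _ in range(hashes_per_table)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v: Sequence[float]) -> Tuple[int, ...]:
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, v)) + b) / self.w)
            for proj, b in funcs)

    def add(self, v: Sequence[float]) -> int:
        idx = len(self.points)
        self.points.append(tuple(v))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)
        return idx

    def query(self, q: Sequence[float], r: float) -> Optional[Tuple[int, float]]:
        """Return (index, distance) of the closest colliding point within r, else None."""
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), ()))
        best = None
        for idx in candidates:
            d = math.dist(q, self.points[idx])
            if d <= r and (best is None or d < best[1]):
                best = (idx, d)
        return best


index = L2LSH(dim=2)
for p in [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0)]:
    index.add(p)

print(index.query((0.2, 0.1), r=0.05))       # exact match always collides
print(index.query((1000.0, 1000.0), r=1.0))  # nothing within r
```

Only points that share a bucket with the query are compared, which is where the speed-up comes from. The price is that a true near neighbour can be missed when it collides in none of the tables.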

LSH clustering on word images [TODO]

Character Majority Voting (CMV)
[Figure labels: Word Cluster, OCR output, Components, Final Output.]
Algorithm [TODO]
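The slide leaves the algorithm unspecified, so the following is only one plausible reading of character majority voting: take the OCR outputs of all words in a cluster and vote component by component at each position. The function name and the rule for picking the output length are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def character_majority_vote(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Position-wise majority vote over the OCR outputs of one word cluster."""
    if not outputs:
        return []
    # Assumption: the output length is the most common length in the cluster.
    length = Counter(len(o) for o in outputs).most_common(1)[0][0]
    voted = []
    for pos in range(length):
        votes = Counter(o[pos] for o in outputs if pos < len(o))
        # Ties go to the label seen first at this position.
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical OCR outputs for five occurrences of the same word.
cluster = ["cluster", "clvster", "cluster", "clusler", "c1uster"]
print("".join(character_majority_vote(cluster)))  # cluster
```

Position-wise voting only works when the outputs line up. A single cut or merge shifts every later component, which is what the alignment step on the next slide addresses.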

Alignment: Dynamic Programming [1,2]
[Figure: word images and their OCR outputs are aligned by dynamic programming, followed by voting for word 1 after aligning. The DTW and CMV outputs for word 1 were shown as images and are not recovered.]

Results
The word generation process makes correct annotations available for evaluating performance.
Degraded dataset: clusters with 20 variations each.
[Table: component accuracy and word accuracy for OCR, CMV, and DTW on datasets SF1 to SF4; numeric values not recovered.]

Results
Word Accuracy vs. No. of Words
  – Adding more words makes the dataset more ambiguous.
  – Algorithm performance increases with the number of words, then saturates.
Word Accuracy vs. Word Length
  – Word accuracy decreases as word length increases.
  – Using cluster information helps achieve good word accuracies.

Analysis
[Table columns: Image, OCR, CMV, DTW; the example rows were images and are not recovered.]

Results
[Table: symbol accuracy and word accuracy for OCR, CMV, and DTW on books B1 to B4, with B1 also split into short, medium, and long words. Other columns: size, number of clusters, word length range, number of words. Numeric values not recovered.]
For a small increase in component accuracy, there is a large improvement in word accuracy.
The improvement is high for long words.
Relative improvement of 12% for words that occur at least twice.

Analysis
Cuts and merges
CMV vs. DTW
Wrong word in the cluster
Cases that can't be handled
[Each case was illustrated by a table with columns Image, OCR, CMV, DTW; the examples are not recovered.]

Conclusion & Future Work
A new framework has been proposed for OCRing books.
A word recognition technique that uses document constraints is shown.
An efficient clustering algorithm is used to speed up the process.
Word-level accuracy is improved from 70.37% to 79.12%.
This technique can also be used for other languages.
Future work: extend the approach to handle unique words by creating clusters over parts of words.

END

Additional slides

LSH Algorithm
Algorithm: Word Image Clustering
Require: Word images Wj and features Fj, j = 1,...,n
Ensure: Word image clusters O
for each i = 1,...,l do
    for each j = 1,...,n do
        Compute hash bucket I = gi(Fj)
        Store word image Wj in bucket I of hash table Ti
    end for
end for
k = 1
for each i = 1,...,n and Wi unmarked do
    Query hash table for word Wi to get cluster Ok
    Mark word Wi with k
    k = k + 1
end for
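The clustering steps above can be sketched as follows. This is an illustrative version under several assumptions not stated on the slide: features are fixed-length numeric vectors, the hash functions g_i are concatenated random projections, candidates are verified against a distance threshold `radius`, and every word returned by a query is marked along with the query word. All names and parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def make_hash(dim: int, k: int, w: float,
              rng: random.Random) -> Callable[[Vector], Tuple[int, ...]]:
    """Build one hash function g_i as k concatenated random projections."""
    funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]

    def g(f: Vector) -> Tuple[int, ...]:
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, f)) + b) / w)
            for proj, b in funcs)

    return g


def cluster_words(features: Sequence[Vector], num_tables: int = 6, k: int = 3,
                  w: float = 4.0, radius: float = 1.0, seed: int = 0) -> List[int]:
    """Return a cluster label (starting at 1) for each word feature vector."""
    rng = random.Random(seed)
    dim = len(features[0])
    hashes = [make_hash(dim, k, w, rng) for _ in range(num_tables)]
    tables = [defaultdict(list) for _ in range(num_tables)]

    # Step 1: store every word in one bucket of each hash table.
    for g, table in zip(hashes, tables):
        for j, f in enumerate(features):
            table[g(f)].append(j)

    # Step 2: each unmarked word queries the tables and marks its cluster.
    labels: List[Optional[int]] = [None] * len(features)
    next_label = 1
    for i, f in enumerate(features):
        if labels[i] is not None:
            continue
        labels[i] = next_label
        for g, table in zip(hashes, tables):
            for j in table.get(g(f), ()):
                if labels[j] is None and math.dist(f, features[j]) <= radius:
                    labels[j] = next_label
        next_label += 1
    return labels


# Hypothetical 2-D features: two groups of near-identical word images.
feats = [(0.0, 0.0), (0.1, 0.0), (50.0, 50.0), (50.1, 50.0), (0.0, 0.1)]
print(cluster_words(feats))
```

Because the hashing is randomized, two nearby words may occasionally land in different clusters. Words further apart than `radius` are never merged, and identical feature vectors always are.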

Word Error Correction
Algorithm: Word Error Correction
Require: Cluster C of words Wi, i = 1,...,n
Ensure: Clusters O of correct words
for each i = 1,...,n do
    for each j = 1,...,n do
        if j != i then
            Align word Wi and Wj
            Record errors Ek, k = 1,...,m in Wi
            Record possible corrections Gk for Ek
        end if
    end for
    Correct Ek if probability pk of correction Gk is maximum
    O <- O U Wi
end for
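A minimal sketch of the correction loop above, with two substitutions that should be read as assumptions: a unit-cost edit-distance alignment stands in for the DTW alignment named in the slides, and "maximum probability" is approximated by a plain vote count in which each word also votes for its own components. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def align(ref: Sequence[str], other: Sequence[str]) -> List[Pair]:
    """Unit-cost edit-distance alignment; None marks a gap on that side."""
    n, m = len(ref), len(other)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + (ref[i - 1] != other[j - 1]),
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Trace back from the end, preferring match/substitution steps.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != other[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def correct_cluster(cluster: Sequence[str]) -> List[str]:
    """Correct each word by voting over components aligned from the other words."""
    corrected = []
    for i, word in enumerate(cluster):
        votes = [Counter({c: 1}) for c in word]  # the word votes for itself
        for j, other in enumerate(cluster):
            if j == i:
                continue
            for a, b in align(word, other):
                if a is not None and b is not None:
                    votes[a][other[b]] += 1
        corrected.append("".join(v.most_common(1)[0][0] for v in votes))
    return corrected


# Hypothetical OCR outputs for one cluster; each character is one component.
cluster = ["clusler", "cluster", "cluter", "cluster"]
print(correct_cluster(cluster))
```

In this sketch only substitutions are corrected. The third word, which has lost a component, keeps its length, so cuts and merges would need extra handling.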

Dataset
  – 5000 clusters, each with 20 images of the same word at different font sizes and resolutions.
  – Words were generated using ImageMagick.
  – Words were degraded with the Kanungo degradation model to approximate real data.
  – The SF1, SF2, SF3, and SF4 datasets were degraded with 0%, 10%, 20%, and 30% noise, respectively.

[Diagram labels: Pre-processing; Segmentation and word detection; Feature Extraction; Hashing; Hashed Words; Word image; Cluster of words; OCR; OCR output; Fusion (Method 1 / Method 2); Text.]