Handwritten Character Recognition using Hidden Markov Models
Quantifying the marginal benefit of exploiting correlations between adjacent characters and words

Optical Character Recognition
- A rich field of research with many applicable domains:
  - Off-line vs. on-line (the latter includes time-sequence information)
  - Handwritten vs. typed
  - Cursive vs. hand-printed
  - Cooperative vs. random writers
  - Language-specific differences in grammar and dictionary size
- We focus on an off-line, mixed-modal English data set with mostly hand-printed and some cursive data.
- Each observation is a monochrome bitmap of a single letter; the segmentation problem is already solved for us (though imperfectly).
- Pre-processing of the data set for noise filtering and scale normalization is also assumed to be done.

Common Approaches to OCR
- Statistical grammar rules and dictionaries
- Feature extraction from observations:
  - Global features: moments and invariants of the image (e.g., percentage of pixels in a certain region, curvature measurements); a small sketch follows this list
  - Local features: grouping windows around image pixels
- Hidden Markov Models:
  - Used mostly in the cursive domain because training is easy and segmentation issues are avoided
  - Most HMMs use very large models with words as states, combined with the above approaches; this is more applicable to domains with a small dictionary and other restrictions
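To make the "percentage of pixels in a certain region" idea concrete, here is a minimal sketch of a global feature extractor. It is not from the original project; the 16x8 bitmap shape is taken from the data-set slide below, and the function name and the 4x2 region grid are our own assumptions.

```python
import numpy as np

def region_density_features(bitmap, rows=4, cols=2):
    """Split a binary character bitmap into a grid of regions and
    return the fraction of 'on' pixels in each region.

    `bitmap` is assumed to be a 16x8 array of 0/1 values, matching the
    data set described on the next slide.
    """
    bitmap = np.asarray(bitmap)
    h, w = bitmap.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            block = bitmap[r * h // rows:(r + 1) * h // rows,
                           c * w // cols:(c + 1) * w // cols]
            feats.append(block.mean())  # fraction of set pixels in this region
    return np.array(feats)
```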

Visualizing the Dataset
- Data collected from 159 subjects with varying styles, both printed and cursive
- The first letter of each word is omitted so that capital letters do not have to be handled
- Each character is represented by a 16x8 array of bits (one possible in-memory representation is sketched below)
- Character metadata includes the correct label and end-of-word boundaries
- Pre-processed into 10 cross-validation folds
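A minimal sketch of how a single record could be represented, assuming only the fields listed above; the class name, field names, and types are hypothetical rather than part of the original data format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Character:
    """One labeled observation from the data set (names are our own)."""
    pixels: np.ndarray   # 16x8 array of 0/1 values
    label: str           # correct lower-case letter, e.g. 'a'
    word_end: bool       # True if this is the last letter of its word
    fold: int            # cross-validation fold index, 0-9

# Illustrative values only: a blank bitmap labeled 'a' in fold 0
example = Character(pixels=np.zeros((16, 8), dtype=np.uint8),
                    label='a', word_end=False, fold=0)
```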

Our Approach: HMMs
- Primary goal: quantify the impact of correlations between adjacent letters and words
- Secondary goal: learn an accurate classifier for our data set
- Our approach: use an HMM and compare it to other algorithms
  - The HMM has 26 states, one for each letter of the alphabet
  - The model is learned in a supervised fashion from labeled data
  - Prior probabilities and the transition matrix are learned from letter frequencies in the training data
  - Emission probabilities are learned under the Naive Bayes assumption (i.e., pixels are conditionally independent given the letter)
  - The Viterbi algorithm predicts the most probable sequence of states given the observed character pixel maps (see the sketch after this list)
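The decoding step can be made concrete with a short log-space Viterbi sketch. This is not the project's code: the array shapes follow from the 26-state description above, and the function and variable names are our own.

```python
import numpy as np

def viterbi(obs_loglik, log_prior, log_trans):
    """Most probable letter sequence for one word.

    obs_loglik : (T, 26) array, log P(pixels_t | letter) at each position
    log_prior  : (26,) array, log P(first letter)
    log_trans  : (26, 26) array, log P(letter_t | letter_{t-1})
    """
    T, S = obs_loglik.shape
    delta = np.empty((T, S))           # best log score ending in each state
    back = np.zeros((T, S), dtype=int)
    delta[0] = log_prior + obs_loglik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # rows: from-state, cols: to-state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + obs_loglik[t]
    # Backtrack from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path                        # indices 0..25 map to 'a'..'z'
```

Working in log space avoids numerical underflow from multiplying many small emission probabilities across a word.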

Algorithms and Optimizations
Learning algorithms implemented and tested:
- Baseline algorithm: Naïve Bayes classifier (no HMM)
- Algorithm 2: NB with maximum-probability classification over a set of shifted observations
  - Motivation: compensate for correlations between adjacent pixels that the Naïve Bayes assumption ignores
- Algorithm 3: HMM with NB assumption
  - Fix for incomplete data: examples are 'hallucinated' prior to training (see the smoothing sketch after this list)
- Algorithm 4: Optimized HMM with NB assumption
  - Ignores the effects of inter-word transitions when learning the HMM
- Algorithm 5: Dictionary creation and lookup with NB assumption (no HMM)
  - Geared toward this specific data set with its small dictionary; less generalizable to data sets with larger dictionaries and fewer constraints
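One natural reading of the 'hallucinated examples' fix is additive (Laplace-style) smoothing of the per-pixel Bernoulli estimates, so that no pixel ever receives probability exactly 0 or 1. The sketch below shows that reading; the pseudo-count of 1, the 128-pixel flattening, and the function names are assumptions rather than details taken from the slides.

```python
import numpy as np

def train_nb_emissions(pixel_arrays, labels, n_classes=26, hallucinated=1):
    """Estimate P(pixel_j = 1 | letter) for a Naive Bayes pixel model.

    pixel_arrays : (N, 128) array of flattened 16x8 binary bitmaps
    labels       : (N,) array of letter indices in 0..25
    hallucinated : pseudo-count added to both outcomes of every pixel,
                   mimicking the 'hallucinated examples' fix described above
    """
    X = np.asarray(pixel_arrays)
    y = np.asarray(labels)
    n_pixels = X.shape[1]
    theta = np.empty((n_classes, n_pixels))
    for c in range(n_classes):
        Xc = X[y == c]
        on = Xc.sum(axis=0) + hallucinated
        total = len(Xc) + 2 * hallucinated
        theta[c] = on / total
    return theta   # theta[c, j] = P(pixel j is on | letter c)

def nb_log_likelihood(bitmap, theta):
    """log P(bitmap | letter) for every letter, under pixel independence."""
    x = np.asarray(bitmap).ravel()
    return (np.log(theta) * x + np.log(1 - theta) * (1 - x)).sum(axis=1)
```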

Alternative Algorithms and Experimental Setup
Other variants considered but not implemented:
- Joint Bayes parameter estimation (too many probabilities to learn: 2^128 vs. 3,328; see the arithmetic below)
- HMM with a 2nd-order Markov assumption (exponential growth in the number of Viterbi paths)
- Training Naïve Bayes over a set of shifted and overlaid observations (preprocessing to create a thicker boundary)
All experiments are run with 10-fold cross-validation.
Results are given as averages with standard deviations.
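The "2^128 vs. 3,328" comparison follows from simple counting: a joint model over all 128 binary pixels has 2^128 possible bitmaps per letter, while Naive Bayes needs only one Bernoulli parameter per pixel per letter. The short check below is ours, not from the slides.

```python
n_pixels = 16 * 8   # 128 pixels per character bitmap
n_letters = 26

joint_outcomes = 2 ** n_pixels             # ~3.4e38 possible bitmaps per letter
naive_bayes_params = n_letters * n_pixels  # 26 * 128 = 3,328 Bernoulli parameters

print(joint_outcomes, naive_bayes_params)
```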

Experimental Results

Conclusions
- The Naïve Bayes classifier performed well on its own (62.7% accuracy, more than 15x better than a random classifier)
- Classification on shifted data did worse, since data at the edges was lost
- The small dictionary of this data set affected the results:
  - The optimized HMM with NB achieves 71% accuracy
  - The optimizations are only marginally significant because of the data set
  - This remains the simpler, more flexible approach for achieving strong results on other data sets
- The dictionary approach is almost perfect, with 99.3% accuracy (one plausible lookup sketch follows below)
  - Demonstrates the additional benefit of exploiting domain constraints and grammatical or syntactic rules
  - Not always feasible: the dictionary may be unknown or too large, or the data may not be predictable
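For readers curious how a dictionary lookup could reach that accuracy, here is one plausible sketch of "dictionary creation and lookup with the NB assumption": score every same-length dictionary word by summing the per-letter Naive Bayes log-likelihoods and keep the best. The scoring rule, function name, and parameter shapes are our assumptions; the slides do not specify them.

```python
import numpy as np

def dictionary_decode(word_pixels, dictionary, theta):
    """Pick the dictionary word that best explains one segmented word.

    word_pixels : (T, 128) array of flattened binary bitmaps, one per letter
    dictionary  : iterable of candidate lower-case words
    theta       : (26, 128) Naive Bayes pixel probabilities (smoothed away from 0/1)
    """
    X = np.asarray(word_pixels)
    # log P(bitmap | letter) for every position and every letter, shape (T, 26)
    loglik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    best_word, best_score = None, -np.inf
    for word in dictionary:
        if len(word) != len(X):
            continue                   # only consider same-length candidates
        score = sum(loglik[t, ord(ch) - ord('a')] for t, ch in enumerate(word))
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```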