Large Vocabulary Unconstrained Handwriting Recognition
J. Subrahmonia, Pen Technologies, IBM T. J. Watson Research Center

Pen Technologies
- Pen-based interfaces in mobile computing

Mathematical Formulation
- H : handwriting evidence on the basis of which a recognizer will make its decision – H = {h1, h2, h3, h4, ..., hm}
- W : word string from a large vocabulary – W = {w1, w2, w3, w4, ..., wn}
- Recognizer : pick the word string that is most probable given the evidence,
  W* = arg max over W of p(W|H) = arg max over W of p(H|W) p(W)
  (by Bayes' rule; the denominator p(H) does not depend on W and can be dropped)

Mathematical Formulation
[Diagram: source-channel model – a source emits the word string W, and a noisy channel transforms it into the observed evidence H.]

Source Channel Model
[Diagram: WRITER -> DIGITIZER -> FEATURE EXTRACTOR -> DECODER. The writer, digitizer, and feature extractor together form the channel that turns W into H; the decoder recovers W from H.]

Source Channel Model
- Handwriting modeling : HMMs (the channel model p(H|W))
- Language modeling (the source model p(W))
- Search strategy (finding the most probable W)

Hidden Markov Models
- Memoryless model + add memory -> Markov model
- Memoryless model + hide something -> mixture model
- Add memory and hide something -> hidden Markov model
Alan B. Poritz, "Hidden Markov Models: A Guided Tour", ICASSP 1988

Memoryless Model
COIN : heads (1) with probability p, tails (0) with probability 1-p
Flip the coin 10 times (an IID random sequence)
Sequence 1 0 1 0 0 0 1 1 1 1 : probability = p*(1-p)*p*(1-p)*(1-p)*(1-p)*p*p*p*p = p^6 (1-p)^4
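A minimal sketch of this model in Python (the bias p = 0.6 in the example call is an arbitrary illustration, not a value from the slides). Because the flips are independent, the sequence probability is just a product of per-flip probabilities:

def iid_sequence_prob(seq, p):
    """Probability of a 0/1 sequence under a memoryless coin with p = P(heads)."""
    prob = 1.0
    for h in seq:
        prob *= p if h == 1 else (1.0 - p)
    return prob

print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], p=0.6))  # p**6 * (1-p)**4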

Add Memory – Markov Model
2 coins :
COIN 1 => p(1) = 0.9, p(0) = 0.1
COIN 2 => p(1) = 0.1, p(0) = 0.9
Experiment :
Flip COIN 1 and note the outcome.
If (outcome = head) flip COIN 1, else flip COIN 2. Repeat.
Sequence 1100 : probability = 0.9*0.9*0.1*0.9
Sequence 1010 : probability = 0.9*0.1*0.1*0.1
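The same experiment as Python, using the coin parameters from the slide (the function name is my own):

def markov_sequence_prob(seq):
    """Probability of a 0/1 sequence under the two-coin Markov experiment."""
    p_heads = {1: 0.9, 2: 0.1}        # COIN 1 and COIN 2 from the slide
    coin = 1                          # the experiment starts with COIN 1
    prob = 1.0
    for h in seq:
        p1 = p_heads[coin]
        prob *= p1 if h == 1 else (1.0 - p1)
        coin = 1 if h == 1 else 2     # the next coin depends on the outcome
    return prob

print(markov_sequence_prob([1, 1, 0, 0]))  # 0.9*0.9*0.1*0.9
print(markov_sequence_prob([1, 0, 1, 0]))  # 0.9*0.1*0.1*0.1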

State Sequence Representation
[Diagram: the two coins drawn as states of a Markov chain, with transitions labeled by the outcome probabilities 0.9 and 0.1.]
Observed output sequence => unique state sequence

Hide the states => Hidden Markov Model
[Diagram: states s1, s2 with the coin identities hidden; only the outputs are observed, so an output sequence no longer determines the state sequence.]

Why use Hidden Markov Models Instead of Non-hidden?
- Hidden Markov Models can be smaller – fewer parameters to estimate
- States may be truly hidden
  – position of the hand
  – positions of the articulators

Summary of HMM Basics
- We are interested in assigning probabilities p(H) to feature sequences
- Memoryless model – this model has no memory of the past
- Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE – an equivalence class of pasts that influences the future
- Hide the states : HMM

Hidden Markov Models
Given an observed sequence H :
– compute p(H) for decoding (forward algorithm)
– find the most likely state sequence for a given Markov model (Viterbi algorithm)
– estimate the parameters of the Markov source (training)

Compute p(H)
[Diagram: a small example HMM with states s1, s2, s3; each arc carries a transition probability and output probabilities p(a), p(b). This model is used in the worked examples that follow.]

Compute p(H) – contd.
- Compute p(H) where H = a a b b
- Enumerate all ways of producing h1 = a
[Diagram: the partial paths through s1, s2, s3 that emit the first a, each arc labeled with its transition probability times its output probability.]

Compute p(H) – contd.
- Enumerate all ways of producing h1 = a, h2 = a
[Diagram: the tree of partial paths through s1, s2, s3 that emit a a, each arc again labeled with transition probability times output probability.]
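This enumeration can be written directly as code. A brute-force sketch in Python for an arc-emission HMM (outputs produced on transitions, as in the figures); the matrices A and B below are hypothetical placeholders, not the example model's actual probabilities. The run time is exponential: there are N**m paths for N states and m outputs.

import itertools

# Hypothetical two-state arc-emission HMM (placeholder values):
# A[i][k] = probability of taking transition i -> k
# B[i][k] = output distribution on that transition
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[{'a': 0.8, 'b': 0.2}, {'a': 0.5, 'b': 0.5}],
     [{'a': 0.0, 'b': 0.0}, {'a': 0.3, 'b': 0.7}]]

def brute_force_prob(H, A, B, start=0):
    """Sum p(path, H) over every possible state path."""
    n = len(A)
    total = 0.0
    for path in itertools.product(range(n), repeat=len(H)):
        prob, state = 1.0, start
        for sym, nxt in zip(H, path):
            prob *= A[state][nxt] * B[state][nxt].get(sym, 0.0)
            state = nxt
        total += prob
    return total

print(brute_force_prob(['a', 'a', 'b', 'b'], A, B))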

Compute p(H)
- Can save computation by combining paths
[Diagram: the partial paths merging at shared states s1, s2, s3, so each state at each time step is evaluated only once.]

Compute p(H)
- Trellis diagram
[Diagram: a trellis with states s1, s2, s3 on the vertical axis and the prefixes 0, a, aa, aab, aabb of H on the horizontal axis; each arc is labeled with its transition probability times its output probability, e.g. .5 x .8, .5 x .2, .4 x .5, .3 x .7, .3 x .3, .5 x .3, .5 x .7, .2, .1.]

Basic Recursion
- Prob(node) = sum over predecessors of ( Prob(predecessor) x Prob(predecessor -> node) )
- Boundary condition : Prob(s1, 0) = 1
[Diagram: the trellis for H = aabb with node probabilities filled in by this recursion, e.g. s1 after the first a : 0.4, and so on column by column up to the final column, whose entries sum to p(H).]
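The recursion above is the forward algorithm. A runnable sketch for the arc-emission setting (same hypothetical placeholder model as in the brute-force example; null transitions are omitted for simplicity):

def forward(H, A, B, start=0):
    """Return p(H) by the forward recursion -- O(m * N**2) instead of O(N**m)."""
    n = len(A)
    alpha = [0.0] * n
    alpha[start] = 1.0               # boundary condition: Prob(start state, 0) = 1
    for sym in H:
        new = [0.0] * n
        for i in range(n):
            if alpha[i] == 0.0:
                continue
            for k in range(n):
                # predecessor prob x (transition prob x output prob)
                new[k] += alpha[i] * A[i][k] * B[i][k].get(sym, 0.0)
        alpha = new
    return sum(alpha)

A = [[0.5, 0.5], [0.0, 1.0]]          # same placeholder model as above
B = [[{'a': 0.8, 'b': 0.2}, {'a': 0.5, 'b': 0.5}],
     [{'a': 0.0, 'b': 0.0}, {'a': 0.3, 'b': 0.7}]]
print(forward(['a', 'a', 'b', 'b'], A, B))  # matches brute_force_prob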

More Formally – Forward Algorithm
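In standard notation, with transition probabilities a_ik and output probabilities b_ik(h) attached to transitions, the forward recursion reads:

\alpha_1(i) = \begin{cases} 1 & i = s_1 \\ 0 & \text{otherwise} \end{cases}
\qquad
\alpha_{j+1}(k) = \sum_i \alpha_j(i)\, a_{ik}\, b_{ik}(h_j)
\qquad
p(H) = \sum_i \alpha_{m+1}(i)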

Find Most Likely Path for aabb – Dynamic Programming (Viterbi)
- MaxProb(node) = max over predecessors of ( MaxProb(predecessor) x Prob(predecessor -> node) )
[Diagram: the same trellis for H = aabb, each node now carrying the probability of the best path into it, e.g. s1 after a : 0.4, s1 after aa : 0.16, s1 after aab : 0.016, s1 after aabb : 0.0016; backtracking the maxima yields the most likely state sequence.]
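The same dynamic program as Python: replace the sum in forward() with a max and keep backpointers (same hypothetical placeholder model):

def viterbi(H, A, B, start=0):
    """Most likely state path for H under an arc-emission HMM."""
    n = len(A)
    best = [0.0] * n
    best[start] = 1.0
    back = []                          # back[t][k] = best predecessor of state k
    for sym in H:
        new, ptr = [0.0] * n, [0] * n
        for k in range(n):
            for i in range(n):
                p = best[i] * A[i][k] * B[i][k].get(sym, 0.0)
                if p > new[k]:
                    new[k], ptr[k] = p, i
        best = new
        back.append(ptr)
    state = max(range(n), key=lambda k: best[k])   # most probable final state
    path = [state]
    for ptr in reversed(back):                     # follow backpointers
        state = ptr[state]
        path.append(state)
    return best[path[0]], list(reversed(path))

A = [[0.5, 0.5], [0.0, 1.0]]          # same placeholder model as above
B = [[{'a': 0.8, 'b': 0.2}, {'a': 0.5, 'b': 0.5}],
     [{'a': 0.0, 'b': 0.0}, {'a': 0.3, 'b': 0.7}]]
print(viterbi(['a', 'a', 'b', 'b'], A, B))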

Training HMM parameters
[Diagram: a small HMM whose transition probabilities are initialized uniformly (1/3 and 1/2) and whose output probabilities p(a), p(b) are to be estimated.]
H = abaa, p(H) =

Training HMM parameters
- c_i = a posteriori probability of path i :
  c_i = p(path_i, H) / p(H) = p(path_i, H) / sum over j of p(path_j, H)

Training HMM parameters
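In standard Baum-Welch notation, the re-estimation step replaces each probability by a ratio of expected (posterior-weighted) counts:

a'_{ik} = \frac{\mathrm{count}(i \to k)}{\sum_{k'} \mathrm{count}(i \to k')}
\qquad
b'_{ik}(h) = \frac{\mathrm{count}(i \to k \text{ emitting } h)}{\mathrm{count}(i \to k)}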

Keep on repeating : after 600 iterations, p(H) =
Another initial parameter set gives : p(H) =

Training HMM parameters
- Converges to a local maximum
- There are at least 7 local maxima
- The final solution depends on the starting point
- The speed of convergence depends on the starting point

Training HMM parameters : Forward-Backward algorithm
- Improves on the path-enumeration algorithm by using the trellis
- Reduces the computation from exponential to linear in the length of the output sequence

Forward-Backward Algorithm
[Diagram: the trellis with a transition at time j highlighted, splitting every path through it into a forward (prefix) part and a backward (suffix) part.]

Forward-Backward Algorithm
- alpha_j(i) a_ik b_ik(h_j) beta_(j+1)(k) = probability that h_j is produced by the transition s_i -> s_k and that the complete output is H
- alpha_j(i) = probability of being in state s_i and having produced the output h_1, ..., h_(j-1)
- beta_(j+1)(k) = probability of producing the output h_(j+1), ..., h_m from state s_k

Forward-Backward Algorithm
- Transition count : count(i -> k) = sum over j of alpha_j(i) a_ik b_ik(h_j) beta_(j+1)(k) / p(H)

Training HMM parameters
- Guess initial values for all parameters
- Compute the forward and backward pass probabilities
- Compute the counts
- Re-estimate the probabilities and repeat
This procedure goes by several names : Baum-Welch, Baum-Eagon, forward-backward, EM.
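Putting the four steps together, a compact Baum-Welch sketch for the arc-emission HMM used in the earlier examples. This is a bare-bones illustration under assumed placeholder parameters: it trains on a single sequence and does no numerical-stability scaling, so it is only suitable for short sequences.

def baum_welch(H, A, B, start=0, iters=100):
    """Re-estimate A and B to locally maximize p(H) for an arc-emission HMM."""
    n, m = len(A), len(H)
    for _ in range(iters):
        # Forward pass: alpha[t][i] = p(h_1..h_t, state i at time t).
        alpha = [[0.0] * n for _ in range(m + 1)]
        alpha[0][start] = 1.0
        for t, sym in enumerate(H):
            for i in range(n):
                for k in range(n):
                    alpha[t + 1][k] += alpha[t][i] * A[i][k] * B[i][k].get(sym, 0.0)
        pH = sum(alpha[m])
        # Backward pass: beta[t][i] = p(h_(t+1)..h_m | state i at time t).
        beta = [[0.0] * n for _ in range(m + 1)]
        beta[m] = [1.0] * n
        for t in range(m - 1, -1, -1):
            for i in range(n):
                for k in range(n):
                    beta[t][i] += A[i][k] * B[i][k].get(H[t], 0.0) * beta[t + 1][k]
        # Expected transition and output counts (posterior-weighted).
        tcount = [[0.0] * n for _ in range(n)]
        ocount = [[{} for _ in range(n)] for _ in range(n)]
        for t, sym in enumerate(H):
            for i in range(n):
                for k in range(n):
                    g = alpha[t][i] * A[i][k] * B[i][k].get(sym, 0.0) * beta[t + 1][k] / pH
                    tcount[i][k] += g
                    ocount[i][k][sym] = ocount[i][k].get(sym, 0.0) + g
        # Re-estimate: probability = expected count / total expected count.
        for i in range(n):
            tot = sum(tcount[i])
            if tot == 0.0:
                continue
            for k in range(n):
                A[i][k] = tcount[i][k] / tot
                if tcount[i][k] > 0.0:
                    B[i][k] = {s: c / tcount[i][k] for s, c in ocount[i][k].items()}
    return A, B, pH

A = [[0.4, 0.6], [0.5, 0.5]]                       # arbitrary non-symmetric start
B = [[{'a': 0.6, 'b': 0.4}, {'a': 0.5, 'b': 0.5}],
     [{'a': 0.3, 'b': 0.7}, {'a': 0.5, 'b': 0.5}]]
A, B, pH = baum_welch(['a', 'b', 'a', 'a'], A, B)  # H = abaa, as on the training slide
print(pH)                                          # p(H) is non-decreasing across iterations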