Learning, Uncertainty, and Information: Big Ideas
November 8, 2004

Roadmap
- Turing, Intelligence, and Learning
- Noisy-channel model
  - Uncertainty, Bayes' Rule, and Applications
- Hidden Markov Models
  - The Model
  - Decoding the best sequence
  - Training the model (EM)
- N-gram models: Modeling sequences
  - Shannon, Information Theory, and Perplexity
- Conclusion

Turing & Intelligence
- Turing (1950): "Computing Machinery and Intelligence"
  - The "Imitation Game" (aka the Turing test)
  - A functional definition of intelligence: behavior indistinguishable from a human's
- Key question raised: learning
  - Can a system be intelligent if it only knows what it was programmed with?
  - Learning is necessary for intelligence
- An intelligent system needs both: 1) programmed knowledge and 2) a learning mechanism
- Components: knowledge, reasoning, learning, communication

Noisy-Channel Model
- The original message is not directly observable
  - It passes through some channel between sender and receiver, and noise is added
  - Examples: the telephone (Shannon), word sequence vs. acoustics (Jelinek), genome sequence vs. observed CATG string, object vs. image
- Goal: derive the most likely original input from what was observed

Bayesian Inference
- P(W|O) is difficult to compute directly (W = original input, O = observations)
- Apply Bayes' rule to decompose it into a generative (observation) model and a sequence (prior) model
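
Written out, this is the standard Bayes-rule decomposition; the denominator P(O) can be dropped because it does not depend on W:

  W^* = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)} = \arg\max_{W} \underbrace{P(O \mid W)}_{\text{generative model}} \; \underbrace{P(W)}_{\text{sequence model}}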

Applications
- AI: speech recognition(!), POS tagging, sense tagging, dialogue, image understanding, information retrieval
- Non-AI:
  - Bioinformatics: gene sequencing
  - Security: intrusion detection
  - Cryptography

Hidden Markov Models

Probabilistic Reasoning over Time
- Issue: our models are discrete
  - Many processes change continuously
  - How do we define observations and states?
- Solution: discretize
  - "Time slices": make time discrete
  - Observations and states are indexed by time: O_t, Q_t
- Observations can be discrete or continuous
  - Here we focus on discrete observations for clarity

Modelling Processes over Time
- Goal: infer the underlying state sequence from the observations
- Issue: a new state depends on the preceding states (we are analyzing sequences)
- Problem 1: possibly unbounded number of probability tables (one per observation, state, and time)
  - Solution: assume a stationary process, i.e. the rules governing the process are the same at all times
- Problem 2: possibly unbounded number of parents for each state
  - Solution: the Markov assumption, i.e. only consider a finite history
  - Common choices: first- or second-order Markov, i.e. depend only on the last one or two states
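
In symbols, for the first-order case:

  P(Q_t \mid Q_{t-1}, Q_{t-2}, \ldots, Q_1) = P(Q_t \mid Q_{t-1}) \quad \text{(Markov assumption)}

  P(Q_t = j \mid Q_{t-1} = i) = a_{ij} \ \text{for all } t \quad \text{(stationarity)}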

Hidden Markov Models (HMMs)
An HMM consists of:
1) A set of states Q = {q_1, ..., q_N}
2) A set of transition probabilities A = {a_ij}, where a_ij is the probability of the transition q_i -> q_j
3) Observation probabilities B = {b_i(o_t)}, the probability of observing o_t in state i
4) An initial probability distribution over states, where π_i is the probability of starting in state i
5) A set of accepting (final) states
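
As a minimal sketch, the components above map onto a small Python container. The class and field names (HMM, trans, emit, init) are mine, not from the slides, and the accepting-state set is omitted for simplicity:

from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class HMM:
    states: List[str]                    # 1) the set of states
    trans: Dict[Tuple[str, str], float]  # 2) a_ij = P(state j at t | state i at t-1)
    emit: Dict[Tuple[str, str], float]   # 3) b_i(o) = P(observing o | state i)
    init: Dict[str, float]               # 4) pi_i = P(starting in state i)
    # 5) accepting states omitted in this sketch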

Three Problems for HMMs
1) Evaluation: find the probability of an observation sequence given a model
   - Forward algorithm
2) Decoding: find the most likely state path through a model given an observed sequence
   - Viterbi algorithm
3) Learning: find the most likely model (parameters) given an observed sequence
   - Baum-Welch (EM) algorithm

Bins and Balls Example
Assume there are two bins filled with red and blue balls. Behind a curtain, someone selects a bin and draws a ball from it (and replaces it). They then select either the same bin or the other one, draw another ball, and so on. We only see the sequence of ball colors, not which bin each ball came from. (Example due to J. Martin.)

Bins and Balls Example
[Figure: Bin 1 and Bin 2, each containing red and blue balls]

Bins and Balls
Π (initial probabilities): Bin 1: 0.9; Bin 2: 0.1

A (transition probabilities):
               to Bin 1   to Bin 2
  from Bin 1     0.6        0.4
  from Bin 2     0.3        0.7

B (observation probabilities):
          Bin 1   Bin 2
  Red      0.7     0.4
  Blue     0.3     0.6
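
Transcribing the Π, A, and B tables into the hypothetical HMM container sketched earlier (assuming that class definition):

bins = HMM(
    states=["Bin1", "Bin2"],
    trans={("Bin1", "Bin1"): 0.6, ("Bin1", "Bin2"): 0.4,   # A
           ("Bin2", "Bin1"): 0.3, ("Bin2", "Bin2"): 0.7},
    emit={("Bin1", "Red"): 0.7, ("Bin1", "Blue"): 0.3,     # B
          ("Bin2", "Red"): 0.4, ("Bin2", "Blue"): 0.6},
    init={"Bin1": 0.9, "Bin2": 0.1},                       # Pi
)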

Bins and Balls
- Assume the observation sequence: Blue Blue Red (BBR)
- Both bins contain red and blue balls, so any state (bin) sequence could have produced these observations
- However, the sequences are NOT equally likely:
  - Big difference in start probabilities
  - Each observation depends on the state
  - Each state depends on the prior state

Bins and Balls
Observations: Blue Blue Red

State sequence   Computation                        Probability
1 1 1            (0.9*0.3)*(0.6*0.3)*(0.6*0.7)    = 0.020412
1 1 2            (0.9*0.3)*(0.6*0.3)*(0.4*0.4)    = 0.007776
1 2 1            (0.9*0.3)*(0.4*0.6)*(0.3*0.7)    = 0.013608
1 2 2            (0.9*0.3)*(0.4*0.6)*(0.7*0.4)    = 0.018144
2 1 1            (0.1*0.6)*(0.3*0.3)*(0.6*0.7)    = 0.002268
2 1 2            (0.1*0.6)*(0.3*0.3)*(0.4*0.4)    = 0.000864
2 2 1            (0.1*0.6)*(0.7*0.6)*(0.3*0.7)    = 0.005292
2 2 2            (0.1*0.6)*(0.7*0.6)*(0.7*0.4)    = 0.007056

Each factor pairs a transition (or initial) probability with the observation probability for that time step.

Answers and Issues
- To compute the probability of the observations: just add up all the state-sequence probabilities
- To find the most likely state sequence: just pick the sequence with the highest value
- Problem: computing all paths is expensive, roughly 2T * N^T operations
- Solution: dynamic programming
  - Sweep across all states at each time step, summing (Problem 1) or maximizing (Problem 2)
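
A minimal sketch of the brute-force version, assuming the HMM class and the bins model from the earlier sketches: it enumerates all N^T state sequences, sums for Problem 1, and maximizes for Problem 2.

from itertools import product

def joint_prob(hmm, path, obs):
    # P(path, obs) = pi_{q1} * b_{q1}(o_1) * product over t>1 of a_{q_{t-1} q_t} * b_{q_t}(o_t)
    p = hmm.init[path[0]] * hmm.emit[(path[0], obs[0])]
    for t in range(1, len(obs)):
        p *= hmm.trans[(path[t - 1], path[t])] * hmm.emit[(path[t], obs[t])]
    return p

obs = ["Blue", "Blue", "Red"]
paths = list(product(bins.states, repeat=len(obs)))              # all N^T state sequences
p_obs = sum(joint_prob(bins, path, obs) for path in paths)       # Problem 1: P(obs)
best = max(paths, key=lambda path: joint_prob(bins, path, obs))  # Problem 2: most likely path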

Forward Probability
Where α is the forward probability, t is the time index in the utterance, i and j are states in the HMM, a_ij is the transition probability from i to j, b_j(o_t) is the probability of observing o_t in state j, N is the number of states, and T is the final time step.
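
In this notation, the standard forward recurrence is:

  \alpha_1(j) = \pi_j \, b_j(o_1)

  \alpha_t(j) = \Big[ \sum_{i=1}^{N} \alpha_{t-1}(i) \, a_{ij} \Big] \, b_j(o_t), \quad 1 < t \le T

  P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_T(j)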

Pronunciation Example
Observations: 0/1

Sequence Pronunciation Model

Acoustic Model
- 3-state phone model for [m]
  - Use a Hidden Markov Model (HMM)
  - Probability of a sequence: sum of the probabilities of its paths
- States: Onset -> Mid -> End -> Final, with transition probabilities between them
- Observation probabilities:
  Onset: C1: 0.5, C2: 0.2, C3: 0.3
  Mid:   C3: 0.2, C4: 0.7, C5: 0.1
  End:   C4: 0.1, C6: 0.5, C6: 0.4

Forward Algorithm
- Idea: a matrix where each cell forward[t, j] represents the probability of being in state j after seeing the first t observations.
- Each cell expresses the probability forward[t, j] = P(o_1, o_2, ..., o_t, q_t = j | w), where q_t = j means that the t-th state in the state sequence is state j.
- Compute the probability by summing over extensions of all paths leading to the current cell.
- An extension of a path from state i at time t-1 to state j at time t is computed by multiplying together:
  i. the previous path probability from the previous cell, forward[t-1, i],
  ii. the transition probability a_ij from previous state i to current state j, and
  iii. the observation likelihood b_j(o_t) that current state j matches observation symbol o_t.

Forward Algorithm
Function Forward(observations of length T, state-graph) returns the probability of the observations
  num-states <- num-of-states(state-graph)
  Create probability matrix forward[num-states+2, T+2]
  forward[0, 0] <- 1.0
  For each time step t from 0 to T do
    for each state s from 0 to num-states do
      for each transition s' from s in state-graph do
        new-score <- forward[s, t] * a[s, s'] * b_s'(o_t)
        forward[s', t+1] <- forward[s', t+1] + new-score
  Return the sum of the final column of forward[]
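
A runnable Python rendering of this pseudocode, assuming the hypothetical HMM class and bins model sketched earlier (indexed from the first observation rather than from a dummy start state):

def forward(hmm, obs):
    # alpha[t][j]: probability of the first t+1 observations, ending in state j (t is 0-based here)
    alpha = [{j: hmm.init[j] * hmm.emit[(j, obs[0])] for j in hmm.states}]
    for t in range(1, len(obs)):
        alpha.append({
            j: sum(alpha[t - 1][i] * hmm.trans[(i, j)] for i in hmm.states)
               * hmm.emit[(j, obs[t])]
            for j in hmm.states
        })
    return sum(alpha[-1][j] for j in hmm.states)   # P(obs | model): sum of the final column

print(forward(bins, ["Blue", "Blue", "Red"]))  # equals the brute-force sum over all paths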

Viterbi Code
Function Viterbi(observations of length T, state-graph) returns best-path
  num-states <- num-of-states(state-graph)
  Create path probability matrix viterbi[num-states+2, T+2]
  viterbi[0, 0] <- 1.0
  For each time step t from 0 to T do
    for each state s from 0 to num-states do
      for each transition s' from s in state-graph do
        new-score <- viterbi[s, t] * a[s, s'] * b_s'(o_t)
        if (viterbi[s', t+1] == 0) or (viterbi[s', t+1] < new-score) then
          viterbi[s', t+1] <- new-score
          back-pointer[s', t+1] <- s
  Backtrace from the highest-probability state in the final column of viterbi[] and return that path
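
The same structure with max in place of sum, plus back-pointers; again a sketch assuming the hypothetical HMM class and bins model from earlier:

def viterbi(hmm, obs):
    # v[t][j]: probability of the best path for the first t+1 observations that ends in state j
    v = [{j: hmm.init[j] * hmm.emit[(j, obs[0])] for j in hmm.states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for j in hmm.states:
            prev = max(hmm.states, key=lambda i: v[t - 1][i] * hmm.trans[(i, j)])
            v[t][j] = v[t - 1][prev] * hmm.trans[(prev, j)] * hmm.emit[(j, obs[t])]
            back[t][j] = prev                    # remember the best predecessor
    # Backtrace from the highest-probability state in the final column
    last = max(hmm.states, key=lambda j: v[-1][j])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, v[-1][last]

print(viterbi(bins, ["Blue", "Blue", "Red"]))  # best path and its probability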

Modeling Sequences, Redux
- Discrete observation values
  - Simple, but often inadequate: many observations are highly variable
- Gaussian pdfs over continuous values
  - Assume normally distributed observations
  - Typically sum over multiple shared Gaussians: "Gaussian mixture models"
  - Trained jointly with the HMM
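
In the standard GMM-HMM formulation, the emission density for state j is a weighted sum of M shared Gaussians, with mixture weights c_{jm} that sum to one:

  b_j(o_t) = \sum_{m=1}^{M} c_{jm} \, \mathcal{N}(o_t;\ \mu_{jm}, \Sigma_{jm}), \qquad \sum_{m=1}^{M} c_{jm} = 1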

Learning HMMs
- Issue: where do the probabilities come from?
- Solution: learn them from data
  - Trains the transition (a_ij) and emission (b_j) probabilities
  - Typically the model structure is assumed; only the probabilities are learned
- Baum-Welch, aka the forward-backward algorithm
  - Iteratively estimate expected counts of transitions taken and symbols emitted
  - Get the estimated probabilities from the forward-backward computation: divide the probability mass over the contributing paths
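
In the standard Baum-Welch notation, with α from the forward pass and β from the backward pass, the expected counts and the reestimated parameters are:

  \xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}, \qquad \gamma_t(i) = \sum_{j} \xi_t(i,j)

  \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \hat{b}_j(v_k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}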