1 CONTEXT DEPENDENT CLASSIFICATION  Remember: Bayes rule  Here: The class to which a feature vector belongs depends on:  Its own value  The values.

Slides:

Advertisements

Similar presentations

ECE 8443 – Pattern Recognition LECTURE 05: MAXIMUM LIKELIHOOD ESTIMATION Objectives: Discrete Features Maximum Likelihood Resources: D.H.S: Chapter 3 (Part.

Advertisements

Angelo Dalli Department of Intelligent Computing Systems

2 – In previous chapters: – We could design an optimal classifier if we knew the prior probabilities P(wi) and the class- conditional probabilities P(x|wi)

Hidden Markov Models By Marc Sobel. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Introduction Modeling.

Automatic Speech Recognition II  Hidden Markov Models  Neural Network.

Hidden Markov Models. A Hidden Markov Model consists of 1.A sequence of states {X t |t  T } = {X 1, X 2,..., X T }, and 2.A sequence of observations.

Maximum Likelihood Sequence Detection (MLSD) and the Viterbi Algorithm

Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723: Introduction to Computational Linguistics Lecture 5 October 6, 2004.

 CpG is a pair of nucleotides C and G, appearing successively, in this order, along one DNA strand.  CpG islands are particular short subsequences in.

An Introduction to Markov Decision Processes Sarah Hickmott

Ch 9. Markov Models 고려대학교 자연어처리연구실 한 경 수

Statistical NLP: Lecture 11

Ch-9: Markov Models Prepared by Qaiser Abbas ( )

Profiles for Sequences

Hidden Markov Models Theory By Johan Walters (SR 2003)

Statistical NLP: Hidden Markov Models Updated 8/12/2005.

1 Hidden Markov Models (HMMs) Probabilistic Automata Ubiquitous in Speech/Speaker Recognition/Verification Suitable for modelling phenomena which are dynamic.

Hidden Markov Models Fundamentals and applications to bioinformatics.

Hidden Markov Models in NLP

Hidden Markov Models (HMMs) Steven Salzberg CMSC 828H, Univ. of Maryland Fall 2010.

SPEECH RECOGNITION Kunal Shalia and Dima Smirnov.

PatReco: Hidden Markov Models Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall

HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.

. Parameter Estimation and Relative Entropy Lecture #8 Background Readings: Chapters 3.3, 11.2 in the text book, Biological Sequence Analysis, Durbin et.

Hidden Markov Models I Biology 162 Computational Genetics Todd Vision 14 Sep 2004.

Lecture 5: Learning models using EM

Learning, Uncertainty, and Information Big Ideas November 8, 2004.

Hidden Markov Models K 1 … 2. Outline Hidden Markov Models – Formalism The Three Basic Problems of HMMs Solutions Applications of HMMs for Automatic Speech.

Hidden Markov Models Usman Roshan BNFO 601. Hidden Markov Models Alphabet of symbols: Set of states that emit symbols from the alphabet: Set of probabilities.

Learning HMM parameters Sushmita Roy BMI/CS 576 Oct 21 st, 2014.

Fall 2001 EE669: Natural Language Processing 1 Lecture 9: Hidden Markov Models (HMMs) (Chapter 9 of Manning and Schutze) Dr. Mary P. Harper ECE, Purdue.

Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.

1 Markov Chains. 2 Hidden Markov Models 3 Review Markov Chain can solve the CpG island finding problem Positive model, negative model Length? Solution:

Isolated-Word Speech Recognition Using Hidden Markov Models

7-Speech Recognition Speech Recognition Concepts

BINF6201/8201 Hidden Markov Models for Sequence Analysis

Fundamentals of Hidden Markov Model Mehmet Yunus Dönmez.

Sergios Theodoridis Konstantinos Koutroumbas Version 2

Hidden Markov Models Usman Roshan CS 675 Machine Learning.

CS774. Markov Random Field : Theory and Application Lecture 19 Kyomin Jung KAIST Nov

1 E. Fatemizadeh Statistical Pattern Recognition.

ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: ML and Simple Regression Bias of the ML Estimate Variance of the ML Estimate.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.

CS Statistical Machine learning Lecture 24

CHAPTER 8 DISCRIMINATIVE CLASSIFIERS HIDDEN MARKOV MODELS.

The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)

Hidden Markovian Model. Some Definitions Finite automation is defined by a set of states, and a set of transitions between states that are taken based.

. Parameter Estimation and Relative Entropy Lecture #8 Background Readings: Chapters 3.3, 11.2 in the text book, Biological Sequence Analysis, Durbin et.

Hidden Markov Models (HMMs) Chapter 3 (Duda et al.) – Section 3.10 (Warning: this section has lots of typos) CS479/679 Pattern Recognition Spring 2013.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Elements of a Discrete Model Evaluation.

1 Hidden Markov Models Hsin-min Wang References: 1.L. R. Rabiner and B. H. Juang, (1993) Fundamentals of Speech Recognition, Chapter.

1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.

Statistical Models for Automatic Speech Recognition Lukáš Burget.

Stochastic Processes and Transition Probabilities D Nagesh Kumar, IISc Water Resources Planning and Management: M6L5 Stochastic Optimization.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.

Hidden Markov Model Parameter Estimation BMI/CS 576 Colin Dewey Fall 2015.

Hidden Markov Models. A Hidden Markov Model consists of 1.A sequence of states {X t |t  T } = {X 1, X 2,..., X T }, and 2.A sequence of observations.

Data-Intensive Computing with MapReduce Jimmy Lin University of Maryland Thursday, March 14, 2013 Session 8: Sequence Labeling This work is licensed under.

Definition of the Hidden Markov Model A Seminar Speech Recognition presentation A Seminar Speech Recognition presentation October 24 th 2002 Pieter Bas.

Other Models for Time Series. The Hidden Markov Model (HMM)

Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.

Hidden Markov Models BMI/CS 576

Chapter 3: Maximum-Likelihood Parameter Estimation

Statistical Models for Automatic Speech Recognition

Hidden Markov Models Part 2: Algorithms

Statistical Models for Automatic Speech Recognition

Hidden Markov Models (HMMs)

CONTEXT DEPENDENT CLASSIFICATION

LECTURE 15: REESTIMATION, EM AND MIXTURES

Presentation transcript:

1 CONTEXT DEPENDENT CLASSIFICATION  Remember: Bayes rule  Here: The class to which a feature vector belongs depends on:  Its own value  The values of the other features  An existing relation among the various classes

2  This interrelation demands the classification to be performed simultaneously for all available feature vectors  Thus, we will assume that the training vectors occur in sequence, one after the other and we will refer to them as observations

3  The Context Dependent Bayesian Classifier  Let  Let be a sequence of classes, that is There are M N of those  Thus, the Bayesian rule can equivalently be stated as  Markov Chain Models (for class dependence)

4  NOW remember: or  Assume:  statistically mutually independent  The pdf in one class independent of the others, then

5  From the above, the Bayes rule is readily seen to be equivalent to: that is, it rests on  To find the above maximum in brute-force task we need Ο(NM Ν ) operations!!

6  The Viterbi Algorithm

7  Thus, each Ω i corresponds to one path through the trellis diagram. One of them is the optimum (e.g., black). The classes along the optimal path determine the classes to which ω i are assigned.  To each transition corresponds a cost. For our case

8 Equivalently where, Define the cost up to a node,k,

9  Bellman’s principle now states  The optimal path terminates at Complexity O (NM 2 )

10  Channel Equalization  The problem

11  Example In x k three input symbols are involved: I k, I k-1, I k-2

12 IkIk I k-1 I k-2 xkxk x k ω1ω ω2ω ω3ω ω4ω ω5ω ω6ω ω7ω ω8ω8

13  Not all transitions are allowed Then

14 In this context, ω i are related to states. Given the current state and the transmitted bit, I k, we determine the next state. The probabilities P(ω i |ω j ) define the state dependence model.  The transition cost for all allowable transitions

15  Assume: Noise white and Gaussian A channel impulse response estimate to be available The states are determined by the values of the binary variables I k-1,…,I k-n+1 For n=3, there will be 4 states

16  Hidden Markov Models  In the channel equalization problem, the states are observable and can be “learned” during the training period  Now we shall assume that states are not observable and can only be inferred from the training data  Applications: Speech and Music Recognition OCR Blind Equalization Bioinformatics

17  An HMM is a stochastic finite state automaton, that generates the observation sequence, x 1, x 2,…, x N  We assume that: The observation sequence is produced as a result of successive transitions between states, upon arrival at a state:

18  This type of modeling is used for nonstationary stochastic processes that undergo distinct transitions among a set of different stationary processes.

19  Examples of HMM: The single coin case: Assume a coin that is tossed behind a curtain. All it is available to us is the outcome, i.e., H or T. Assume the two states to be: S = 1  H S = 2  T This is also an example of a random experiment with observable states. The model is characterized by a single parameter, e.g., P(H). Note that P(1|1) = P(H) P(2|1) = P(T) = 1 – P(H)

20 The two-coins case: For this case, we observe a sequence of H or T. However, we have no access to know which coin was tossed. Identify one state for each coin. This is an example where states are not observable. H or T can be emitted from either state. The model depends on four parameters. P 1 (H), P 2 (H), P(1|1), P(2|2)

21 The three-coins case example is shown below: Note that in all previous examples, specifying the model is equivalent to knowing: –The probability of each observation (H,T) to be emitted from each state. –The transition probabilities among states: P(i|j).

22  A general HMM model is characterized by the following set of parameters Κ, number of states

23 That is:  What is the problem in Pattern Recognition Given M reference patterns, each described by an HMM, find the parameters, S, for each of them (training) Given an unknown pattern, find to which one of the M, known patterns, matches best (recognition)

24  Recognition: Any path method Assume the M models to be known ( M classes). A sequence of observations, X, is given. Assume observations to be emissions upon the arrival on successive states Decide in favor of the model S * (from the M available) according to the Bayes rule for equiprobable patterns

25 For each model S there is more than one possible sets of successive state transitions Ω i, each with probability Thus: For the efficient computation of the above DEFINE – History Local activity

26 Observe that Compute this for each S

27 Some more quantities –

28  Training The philosophy: Given a training set X, known to belong to the specific model, estimate the unknown parameters of S, so that the output of the model, e.g. to be maximized  This is a ML estimation problem with missing data

29  Assumption: Data x discrete  Definitions:

30  The Algorithm: Initial conditions for all the unknown parameters. Step 1: From the current estimates of the model parameters reestimate the new model S from

31 Step 3: Compute go to step 2. Otherwise stop Remarks: –Each iteration improves the model –The algorithm converges to a maximum (local or global) –The algorithm is an implementation of the EM algorithm