Segmental Hidden Markov Models with Random Effects for Waveform Modeling
Authors: Seyoung Kim & Padhraic Smyth
Presenter: Lu Ren

Outline: Introduction; Segmental HMMs; Segmental HMMs with Random Effects; Model Inference and Parameter Estimation; Experimental Results; Conclusions

Introduction
Problem: automatically parsing and recognizing waveforms based on their shapes.
Examples: analysis of waveforms from turbulent flow, discriminating earthquakes in seismograph data, waveform matching and fault diagnosis in complex systems.
Challenge: the significant variability in waveform shape is difficult for automated methods to handle.
Sources of variability: shifts in the locations of prominent features, scaling along the axes, and measurement noise.
The paper proposes a new statistical model that directly addresses within-class shape variability.

Standard discrete-time HMMs generate noisy versions of piecewise-constant shapes over time, since the observations within a run of states with the same value have a constant mean.
Segmental HMMs allow an arbitrary distribution on run lengths and allow the mean to be a function of time within each segment, but they use the same fixed parameters for all waveforms.

Segmental HMMs with random effects allow each individual waveform (each group of observations) to have its own segment parameters, which are still coupled together by an overall population prior. Specifically, the slopes and intercepts of each segment are allowed to vary according to a prior distribution.

Segmental HMMs
In a segmental HMM, the transition matrix A governs the state dynamics. In waveform modeling, A is constrained to allow only left-to-right transitions, and self-transitions are not allowed.
Duration distribution: state k produces a segment of observations of length d with probability p(d | k), and the segment itself is drawn from the segment distribution model.   (1)
Segment model: given state k and duration d, the segment y = (y_1, ..., y_d)' is generated as
    y = X β_k + e,   e ~ N(0, σ_k² I),   (2)
where β_k is the vector of regression coefficients for the intercept and slope, X is the d × 2 design matrix (a column of ones and the within-segment time index), and e is the Gaussian noise.
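To make the segment model in equation (2) concrete, here is a minimal numerical sketch (not the authors' code; the function names, the NumPy/SciPy dependency, and the assumption of 1-D observations are my own choices): within a segment the mean is a linear function of the within-segment time index, and the noise is isotropic Gaussian.

```python
import numpy as np
from scipy.stats import multivariate_normal

def design_matrix(d):
    """d x 2 design matrix: a column of ones (intercept) and the time index (slope)."""
    return np.column_stack([np.ones(d), np.arange(1, d + 1)])

def seg_loglik(y_seg, beta_k, sigma2_k):
    """Log-likelihood of one segment under the segmental HMM: y ~ N(X beta_k, sigma2_k I)."""
    d = len(y_seg)
    X = design_matrix(d)
    mean = X @ beta_k              # linear trend within the segment
    cov = sigma2_k * np.eye(d)     # independent Gaussian noise
    return multivariate_normal.logpdf(y_seg, mean=mean, cov=cov)
```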

The F-B algorithm for segmental HMMs takes the duration distribution into account. It recursively computes the forward variables α_t(k) = p(y_1, ..., y_t, a segment in state k ends at t) in the forward pass (3) and the corresponding backward variables p(y_{t+1}, ..., y_T | a segment in state k ends at t) in the backward pass (4), where each recursion sums over all admissible segment durations. The F-B algorithm scores a previously unseen waveform y by calculating the likelihood
    p(y | Θ) = Σ_k α_T(k).   (5)
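A sketch of the forward pass just described, reusing seg_loglik and design_matrix from the previous snippet; the array layout and variable names are illustrative assumptions, not the paper's notation. At every position it sums over all admissible durations and predecessor states, and it returns the log of the likelihood in equation (5).

```python
def forward_segmental(y, log_pi, log_A, log_dur, beta, sigma2, max_dur):
    """log alpha[t, k] = log p(y[0:t], a segment in state k ends at position t-1).

    log_pi:  (K,) initial-state log-probabilities
    log_A:   (K, K) transition log-probabilities (self-transitions set to -inf)
    log_dur: (K, max_dur) duration log-probabilities, log_dur[k, d-1] = log p(d | k)
    beta:    list of per-state regression coefficient vectors (intercept, slope)
    sigma2:  list of per-state noise variances
    """
    T, K = len(y), len(log_pi)
    alpha = np.full((T + 1, K), -np.inf)
    for t in range(1, T + 1):
        for k in range(K):
            for d in range(1, min(max_dur, t) + 1):
                ll = log_dur[k, d - 1] + seg_loglik(y[t - d:t], beta[k], sigma2[k])
                if t - d == 0:
                    # first segment of the waveform
                    alpha[t, k] = np.logaddexp(alpha[t, k], log_pi[k] + ll)
                else:
                    # sum over the state that produced the previous segment
                    prev = np.logaddexp.reduce(alpha[t - d] + log_A[:, k])
                    alpha[t, k] = np.logaddexp(alpha[t, k], prev + ll)
    return np.logaddexp.reduce(alpha[T])   # log p(y | model), cf. equation (5)
```

Note that the triple loop evaluates the segment likelihood O(T · max_dur · K) times, which is why the per-segment cost discussed later matters.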

Segmental HMMs with Random Effects
Consider the j-th segment, of length d_ij, from the i-th individual waveform, generated by state k:
    y_ij = X_ij (β_k + u_ij) + e_ij,   u_ij ~ N(0, Ψ_k),   e_ij ~ N(0, σ_k² I),   (6)
where β_k represents the mean regression parameters, u_ij represents the random variation in the regression parameters for this segment, and u_ij is independent of e_ij. Notice the model is equivalent to y_ij = X_ij β_k + ε_ij with ε_ij ~ N(0, σ_k² I + X_ij Ψ_k X_ij').
It can be shown that the joint distribution of y_ij and u_ij is Gaussian,   (7)
and the posterior distribution of u_ij given y_ij can be written as
    u_ij | y_ij ~ N(û_ij, V_ij),   (8)

where
    û_ij = Ψ_k X_ij' (X_ij Ψ_k X_ij' + σ_k² I)⁻¹ (y_ij − X_ij β_k)   (9)
and
    V_ij = Ψ_k − Ψ_k X_ij' (X_ij Ψ_k X_ij' + σ_k² I)⁻¹ X_ij Ψ_k.   (10)
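Equations (9) and (10) are the standard Gaussian conditioning formulas. A small sketch (building on design_matrix above; Psi_k is my name for the random-effects covariance) that computes the posterior mean and covariance of the random effect for one segment:

```python
def posterior_random_effect(y_seg, beta_k, Psi_k, sigma2_k):
    """Posterior N(u_hat, V) of the random effect u for one segment y, under
    y = X (beta_k + u) + e with u ~ N(0, Psi_k) and e ~ N(0, sigma2_k I)."""
    d = len(y_seg)
    X = design_matrix(d)
    S = X @ Psi_k @ X.T + sigma2_k * np.eye(d)   # marginal covariance of y
    G = Psi_k @ X.T @ np.linalg.inv(S)           # "gain" matrix
    u_hat = G @ (y_seg - X @ beta_k)             # posterior mean, cf. eq. (9)
    V = Psi_k - G @ X @ Psi_k                    # posterior covariance, cf. eq. (10)
    return u_hat, V
```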

Inference and Parameter Estimation
Inference
The likelihood of a waveform y_i given fixed parameters Θ is obtained by summing, over all segmentations and state sequences, the joint probability of the waveform and the segmentation; it is computed with the F-B algorithm.   (11)
The Viterbi algorithm can compute the most likely state sequence. However, in the F-B algorithm the segment likelihood p(y_t, ..., y_{t+d−1} | k) needs to be calculated for all possible durations d, at each of the T positions and at each recursion. Direct inversion of the d × d covariance matrix σ_k² I + X Ψ_k X' costs O(d³) per segment and leads to a high overall time complexity.
Based on Bayes' rule, the likelihood of a segment y_ij generated by state k can be written as
    p(y_ij | k) = p(y_ij | u_ij, k) p(u_ij | k) / p(u_ij | y_ij, k),   (12)
which holds for any value of u_ij.

By setting u_ij to û_ij as in equation (9), the likelihood in (12) can be reduced to a form whose evaluation requires only small (2 × 2) matrix operations.   (13)
In this way, the time complexities of the F-B and Viterbi algorithms are greatly reduced, since the expensive d × d covariance inversion is avoided.
The EM algorithm is then used to obtain maximum-likelihood estimates of the parameters from a training set of waveforms, where the state sequence of each waveform implies its segmentation.
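The paper reaches equation (13) through the Bayes-rule identity (12). An equivalent way to see the saving, sketched below from my own derivation (not a transcription of equation (13)), is to evaluate the marginal likelihood N(y; X β_k, σ_k² I + X Ψ_k X') with the matrix inversion and determinant lemmas, so that only 2 × 2 matrices are ever inverted; design_matrix is reused from the first snippet.

```python
def seg_loglik_re_fast(y_seg, beta_k, Psi_k, sigma2_k):
    """Marginal log-likelihood of one segment under the random effects model,
    using the Woodbury identity and the matrix determinant lemma so the cost
    is O(d) in the segment length d rather than O(d^3)."""
    d = len(y_seg)
    X = design_matrix(d)
    r = y_seg - X @ beta_k                             # residual from the mean regression
    M = np.linalg.inv(Psi_k) + (X.T @ X) / sigma2_k    # 2x2 "information" matrix
    Xtr = X.T @ r
    # quadratic form r' (sigma2 I + X Psi X')^{-1} r via the Woodbury identity
    quad = (r @ r) / sigma2_k - Xtr @ np.linalg.solve(M, Xtr) / sigma2_k ** 2
    # log-determinant of (sigma2 I + X Psi X') via the matrix determinant lemma
    _, logdet_M = np.linalg.slogdet(M)
    _, logdet_Psi = np.linalg.slogdet(Psi_k)
    logdet = logdet_M + logdet_Psi + d * np.log(sigma2_k)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
```

Plugging a function like this into the forward recursion above in place of seg_loglik gives the random effects segmental HMM likelihood without any d × d inversion.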

Parameter Estimation
Given the complete data (the waveforms together with their state sequences, segment durations, and random effects), the log-likelihood decouples into four parts.   (14)
Because only part of the data is observed, EM is used to find a local maximum of the likelihood.

In the E step, the expected log-likelihood of the complete data is taken with respect to the posterior distribution of the hidden quantities given the observed waveforms and the current parameter estimates.   (15) (16)
In the M step, the optimization problem decouples into four parts and closed-form solutions exist for all of the parameters, but in practice the algorithm often converges relatively slowly.
[Figure: segmental HMM vs. random effects segmental HMM]
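To illustrate the closed-form structure of the M step, here is a sketch of the updates for a single state k under the simplifying assumption of a hard segmentation (the actual algorithm weights every candidate segment by its posterior probability from the F-B pass). It reuses design_matrix and posterior_random_effect from the earlier snippets, and the formulas are the standard random-effects regression updates rather than a transcription of the paper's equations.

```python
def m_step_state_k(segments, beta_k, Psi_k, sigma2_k):
    """Closed-form parameter updates for one state k given the segments
    currently assigned to it (each a 1-D numpy array of observations)."""
    stats = [(y, design_matrix(len(y)),
              *posterior_random_effect(y, beta_k, Psi_k, sigma2_k))
             for y in segments]
    # update the mean regression coefficients beta_k
    A = sum(X.T @ X for _, X, _, _ in stats)
    b = sum(X.T @ (y - X @ u_hat) for y, X, u_hat, _ in stats)
    beta_new = np.linalg.solve(A, b)
    # update the observation noise variance sigma2_k
    num, den = 0.0, 0
    for y, X, u_hat, V in stats:
        resid = y - X @ (beta_new + u_hat)
        num += resid @ resid + np.trace(X @ V @ X.T)
        den += len(y)
    sigma2_new = num / den
    # update the random-effects covariance Psi_k
    Psi_new = sum(np.outer(u_hat, u_hat) + V for _, _, u_hat, V in stats) / len(stats)
    return beta_new, sigma2_new, Psi_new
```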

Faster Learning with ECME
Expectation conditional maximization (ECM): replace the M step of EM with a sequence of constrained or conditional maximization (CM) steps, where the set of parameters is divided into subvectors.
In ECME, some of the CM steps of ECM are replaced by maximization of the actual log-likelihood instead of the expected complete-data log-likelihood: at the s-th CM step of the t-th iteration, the objective is maximized over the s-th subvector under the constraint that the remaining subvectors are held at their current values.
For random effects segmental HMMs, we partition the parameters into two subvectors and consider ECME with two CM steps, as follows:

From these two CM steps we obtain the corresponding update equations for the two parameter subvectors.
[Figure: random effects segmental HMM fit to fluid-flow waveform data]
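The control flow that distinguishes ECME from ECM can be summarized in a few lines. The sketch below is schematic only: the split into theta1/theta2, the callable objectives, and the use of a generic numerical optimizer are placeholder assumptions (the paper derives its CM updates analytically); the point is that one CM step maximizes the actual observed-data log-likelihood rather than the expected complete-data log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize

def ecme_iteration(theta1, theta2, expected_loglik, actual_loglik):
    """One schematic ECME iteration with two CM steps.

    theta1, theta2:  numpy vectors holding the two parameter sub-blocks
    expected_loglik: Q(theta1, theta2), the expected complete-data log-likelihood
    actual_loglik:   the observed-data log-likelihood
    """
    # CM step 1: maximize the expected complete-data log-likelihood over theta1
    theta1 = minimize(lambda t1: -expected_loglik(t1, theta2), theta1).x
    # CM step 2: maximize the ACTUAL log-likelihood over theta2 (the "E" in ECME)
    theta2 = minimize(lambda t2: -actual_loglik(theta1, t2), theta2).x
    return theta1, theta2
```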

Experimental Results
Two data sets were used:
1. Bubble-probe interaction data (turbulent flow measurements): 9-fold cross-validation with 5 waveforms in the training set and 43 waveforms in the test set; another 72 waveforms were used as negative examples in testing.
2. ECG data (heartbeat cycles): 4-fold cross-validation with 6 waveforms as the training set and 18 waveforms as the test set; another 22 abnormal heartbeats were used as negative examples in testing.
Evaluation methods: average logP score, recognition accuracy, and segmentation quality (the mean squared difference between the observed data and the regression model).
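For the recognition experiments, one plausible reading (an assumption on my part, not stated on the slides) is that a test waveform is accepted as a positive example when its logP score under the trained model exceeds a threshold; the ROC curves on the next slide are then traced out by sweeping that threshold, as in this small sketch (NumPy as above):

```python
def roc_points(pos_scores, neg_scores):
    """ROC curve from per-waveform logP scores: accept a waveform when its
    score is at least the threshold, and sweep the threshold."""
    thresholds = np.sort(np.concatenate([pos_scores, neg_scores]))[::-1]
    tpr = np.array([(pos_scores >= th).mean() for th in thresholds])
    fpr = np.array([(neg_scores >= th).mean() for th in thresholds])
    return fpr, tpr
```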

[Figures: segmentation results from the segmental HMM and the random effects segmental HMM; ROC plot for ECG data; ROC plot for bubble-probe data]

Conclusions
1. Segmental HMMs with random effects allow an individual waveform to vary its shape in a constrained manner.
2. The ECME algorithm greatly improved the speed of convergence of parameter estimation compared to a standard EM approach.

Thanks!