SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models Yasuhiro Fujiwara (NTT Cyber Space Labs) Yasushi Sakurai (NTT Communication Science Labs) Masashi Yamamuro (NTT Cyber Space Labs) Speaker: Yasushi Sakurai

Motivation HMM (Hidden Markov Model) applications: mental task classification (understand human brain functions with EEG signals), biological analysis (predict organism functions from DNA sequences), and many others (speech recognition, image processing, etc.). Goal: fast and exact identification of the highest-likelihood model for large datasets.

Mini-introduction to HMM The observation sequence is a probabilistic function of the states. An HMM consists of three sets of parameters: the initial state probability π_i (probability that the state at time t = 1 is u_i), the state transition probability a_ij (probability of a transition from state u_i to state u_j), and the symbol probability b_i(v) (probability of outputting symbol v in state u_i).
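To make the notation concrete, here is a toy illustration (my own example, not from the slides) of the three parameter sets as plain arrays; all numbers are made up.

```python
import numpy as np

# Hypothetical toy HMM with m = 3 states and s = 2 output symbols.
pi = np.array([0.6, 0.3, 0.1])            # initial state probabilities: pi[i] = P(state u_i at t = 1)
A  = np.array([[0.7, 0.2, 0.1],           # transition probabilities: A[i, j] = P(u_i -> u_j)
               [0.3, 0.5, 0.2],
               [0.2, 0.3, 0.5]])
B  = np.array([[0.9, 0.1],                # symbol probabilities: B[i, v] = P(output v | state u_i)
               [0.4, 0.6],
               [0.1, 0.9]])

# Sanity checks: pi and each row of A and B sum to 1.
assert np.isclose(pi.sum(), 1) and np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```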

Mini-introduction to HMM HMM types: Ergodic HMM: every state can be reached from every other state. Left-right HMM: transitions to lower-numbered states are prohibited, the sequence always begins with the first state, and transitions are limited to a small number of (higher-numbered) states.
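As an illustration (my own sketch, not from the slides) of the structural difference: an ergodic HMM has a fully populated transition matrix, while a left-right HMM has zeros below the diagonal and only a small band above it; the `max_jump` limit below is an assumed constraint.

```python
import numpy as np

def is_left_right(A, max_jump=2):
    """Check the left-right constraints: no transition to a lower-numbered state,
    and forward jumps limited to `max_jump` states ahead (assumed limit)."""
    m = A.shape[0]
    return all(A[i, j] == 0 for i in range(m) for j in range(m) if j < i or j > i + max_jump)

ergodic    = np.array([[0.5, 0.3, 0.2],
                       [0.2, 0.5, 0.3],
                       [0.3, 0.3, 0.4]])   # every state reachable from every other state
left_right = np.array([[0.6, 0.3, 0.1],
                       [0.0, 0.7, 0.3],
                       [0.0, 0.0, 1.0]])   # no transitions to lower-numbered states

print(is_left_right(ergodic), is_left_right(left_right))   # False True
```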

Mini-introduction to HMM Viterbi path in the trellis structure. Trellis structure: states lie on the vertical axis, and the sequence is aligned along the horizontal axis. Viterbi path: the state sequence that gives the highest likelihood.

Mini-introduction to HMM Viterbi algorithm: a dynamic programming approach that maximizes over the probabilities coming from the previous states: p_it = max_j ( p_{j,t-1} · a_ji ) · b_i(x_t), where p_it is the maximum probability of state u_i at time t.
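A minimal sketch of this recursion in Python (reusing the hypothetical pi, A, B arrays from the earlier toy example); it returns only the Viterbi-path likelihood, without backtracking the path itself.

```python
import numpy as np

def viterbi_likelihood(pi, A, B, x):
    """Likelihood of the best state path for the observation sequence x
    (a list of symbol indices), via the recursion
    p[i, t] = max_j( p[j, t-1] * A[j, i] ) * B[i, x[t]]."""
    m, n = len(pi), len(x)
    p = np.zeros((m, n))
    p[:, 0] = pi * B[:, x[0]]                        # initialization at t = 0
    for t in range(1, n):
        for i in range(m):                           # O(n m^2) overall
            p[i, t] = np.max(p[:, t - 1] * A[:, i]) * B[i, x[t]]
    return float(np.max(p[:, -1]))                   # likelihood of the Viterbi path

# e.g. viterbi_likelihood(pi, A, B, [0, 1, 1, 0]) with the toy model above
```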

Problem Definition Given: an HMM dataset and a sequence X of arbitrary length. Find: the highest-likelihood model, estimated with respect to X, from the dataset.

Why not 'Naive'? Naive solution: 1. Compute the likelihood for every model using the Viterbi algorithm. 2. Then choose the highest-likelihood model. But: high search cost, O(nm^2) time for every model (m: number of states, n: sequence length of X), which is prohibitive for large HMM datasets.
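As a sketch, the naive search is just a loop that calls the hypothetical viterbi_likelihood above on every model (each model represented as a (pi, A, B) triple); this is the baseline SPIRAL improves on.

```python
def naive_search(models, x):
    """Compute the Viterbi likelihood of every model and keep the best one.
    Cost: O(n m^2) per model, so the total cost grows linearly with the dataset size."""
    best_model, best_likelihood = None, -1.0
    for pi, A, B in models:
        likelihood = viterbi_likelihood(pi, A, B, x)
        if likelihood > best_likelihood:
            best_model, best_likelihood = (pi, A, B), likelihood
    return best_model, best_likelihood
```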

Our Solution, SPIRAL Requirements: high-speed search (identify the model efficiently), exactness (accuracy is not sacrificed), and no restriction on model type (achieve high search performance for any type of model).

Likelihood Approximation Reminder: the naive approach computes the exact likelihood on the full trellis of the original model.

Likelihood Approximation Create compact models (reduce the model size): for given m states and granularity g, create m/g states by merging 'similar' states.

Likelihood Approximation Use the vector F_i of state u_i for clustering (s: number of symbols). Merge all the states u_i in a cluster C and create a new state u_C, choosing the highest probability among the probabilities of the merged u_i, to obtain the upper-bounding likelihood.
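A rough sketch of how such a compact model could be built (my own reading of the slide, not the authors' code): I assume the feature vector F_i concatenates the transition and symbol probabilities of state u_i, use k-means for the clustering, and take the maximum probability within each cluster so that the merged model can only over-estimate the original likelihood.

```python
import numpy as np
from sklearn.cluster import KMeans

def compact_model(pi, A, B, g):
    """Merge 'similar' states of an m-state HMM into roughly m/g merged states.
    Probabilities of merged states are the maxima over their members, so the
    compact model upper-bounds the likelihood of the original model."""
    m = len(pi)
    F = np.hstack([A, B])                              # assumed feature vector F_i of state u_i
    k = max(1, m // g)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(F)

    pi_c = np.zeros(k)
    A_c = np.zeros((k, k))
    B_c = np.zeros((k, B.shape[1]))
    for i in range(m):
        ci = labels[i]
        pi_c[ci] = max(pi_c[ci], pi[i])
        B_c[ci] = np.maximum(B_c[ci], B[i])
        for j in range(m):
            A_c[ci, labels[j]] = max(A_c[ci, labels[j]], A[i, j])
    return pi_c, A_c, B_c                              # rows no longer sum to 1, by design
```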

Likelihood Approximation Compute the approximate likelihood from the compact model. Upper-bounding likelihood: for the approximate likelihood P̂ of a model with exact likelihood P, P̂ ≥ P holds. Exploit this property to guarantee exactness in search processing.

Likelihood Approximation Advantages: The best model cannot be pruned, because the approximation gives the upper-bounding likelihood of the original model. Any model type is supported, since no probabilistic constraint is imposed on the approximation.

Multi-granularities The likelihood approximation has a trade-off between accuracy and computation time: as the model size increases, accuracy improves, but the likelihood computation cost also increases. Q: How to choose the granularity?

Multi-granularities The likelihood approximation has a trade-off between accuracy and computation time: as the model size increases, accuracy improves, but the likelihood computation cost also increases. Q: How to choose the granularity? A: Use multiple granularities: h+1 distinct granularities that form a geometric progression, g_i = 2^i (i = 0, 1, 2, …, h), i.e., geometrically increase the model size.

Multi-granularities As the first step, compute the approximate likelihood from the coarsest model, which has m/2^h states. Prune the model if its approximate likelihood is below the threshold θ; otherwise, proceed to the next (finer) granularity. θ: threshold.

Multi-granularities Compute the approximate likelihood from the second coarsest model, which has m/2^(h-1) states. Prune the model if its approximate likelihood is below the threshold θ.

Multi-granularities Threshold θ: exploit the fact that we have already found a good model of high likelihood. θ is the exact likelihood of the best-so-far candidate during search processing; it is updated and increases whenever a promising model is found. Use θ for model pruning.

Multi-granularities Compute the approximate likelihood from the second coarsest model, which has m/2^(h-1) states. Prune the model if its approximate likelihood is below θ; otherwise, proceed to the next granularity. θ: exact likelihood of the best-so-far candidate.

Multi-granularities Compute the likelihood from a more accurate (finer-grained) model. Prune the model if its approximate likelihood is below θ.

Multi-granularities Repeat until the finest granularity (the original model). Update the answer candidate and the best-so-far likelihood θ if the exact likelihood of the model exceeds θ.

Multi-granularities Optimize the trade-off between accuracy and computation time: low-likelihood models are pruned by coarse-grained models, and fine-grained approximation is applied only to high-likelihood models. This efficiently finds the best model in a large dataset; the exact likelihood computations are limited to the minimum necessary.
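Putting the multi-granularity steps together, here is a simplified sketch of the overall search loop (my own reconstruction, reusing the hypothetical compact_model and viterbi_likelihood helpers from the earlier sketches); each model is again a (pi, A, B) triple.

```python
def spiral_search(models, x, h=3):
    """Screen each model with increasingly fine approximations (granularities
    2^h, ..., 2) and compute the exact likelihood only for the survivors."""
    granularities = [2 ** i for i in range(h, 0, -1)]   # coarsest first
    best_model, theta = None, -1.0                       # theta: exact likelihood of best-so-far
    for model in models:
        pruned = False
        for g in granularities:
            pi_c, A_c, B_c = compact_model(*model, g)
            if viterbi_likelihood(pi_c, A_c, B_c, x) < theta:   # upper bound below threshold
                pruned = True                                    # safe to discard this model
                break
        if not pruned:
            exact = viterbi_likelihood(*model, x)        # finest granularity = original model
            if exact > theta:
                best_model, theta = model, exact          # update best-so-far candidate
    return best_model, theta
```

In practice the compact models would be built once per dataset rather than per query; this sketch rebuilds them inline only for brevity.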

Transition Pruning The trellis structure has too many transitions. Q: How to exclude unlikely paths?

Transition Pruning The trellis structure has too many transitions. Q: How to exclude unlikely paths? A: Use two properties: the likelihood is monotone non-increasing (during likelihood computation), and the threshold is monotone non-decreasing (during search processing).

Transition Pruning In the likelihood computation, compute the estimate e_it, a conservative estimate based on the likelihood p_it of state u_i at time t. If e_it < θ, prune all paths that pass through u_i at time t, where θ is the exact likelihood of the best-so-far candidate.

Transition Pruning Terminate the likelihood computation if all the paths are excluded. This is especially efficient for long sequences, and it is also applicable to the approximate likelihood computation.
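The following sketch shows the idea of pruning inside the Viterbi computation. The estimate e_it used here, the partial likelihood p_it multiplied by the largest possible per-step factor for the remaining steps, is a simplifying assumption of mine, not necessarily the exact estimate defined in the paper.

```python
import numpy as np

def viterbi_with_pruning(pi, A, B, x, theta):
    """Viterbi likelihood computation that drops states whose optimistic estimate
    cannot reach theta (the exact likelihood of the best-so-far candidate).
    Returns 0.0 early if every path is excluded."""
    m, n = len(pi), len(x)
    step_bound = A.max() * B.max()                   # optimistic per-step multiplier (assumption)
    p = pi * B[:, x[0]]
    p[p * step_bound ** (n - 1) < theta] = 0.0       # prune hopeless states at t = 0
    for t in range(1, n):
        p_new = np.zeros(m)
        for i in range(m):
            p_new[i] = np.max(p * A[:, i]) * B[i, x[t]]
        e = p_new * step_bound ** (n - 1 - t)        # conservative estimate of the final likelihood
        p_new[e < theta] = 0.0                       # prune all paths through unlikely states
        if not p_new.any():
            return 0.0                               # all paths excluded: terminate early
        p = p_new
    return float(p.max())
```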

Accuracy and Complexity Both the Viterbi algorithm and SPIRAL guarantee exactness. SPIRAL needs the same order of memory space as the Viterbi algorithm, while its computation time can be significantly smaller.

Experimental Evaluation Setup: Intel Core 2 1.66 GHz, 2 GB memory. Datasets: EEG, Chromosome, Traffic. Evaluation: mainly computation time, using ergodic HMMs. Compared against the Viterbi algorithm and beam search (a popular technique, but one that does not guarantee exactness).

Experimental Evaluation Wall clock time versus number of states Wall clock time versus number of models Effect of likelihood approximation Effect of transition pruning SPIRAL vs Beam search

Experimental Evaluation Wall clock time versus number of states EEG: up to 200 times faster

Experimental Evaluation Wall clock time versus number of states Chromosome: up to 150 times faster

Experimental Evaluation Wall clock time versus number of states Traffic: up to 500 times faster

Experimental Evaluation Wall clock time versus number of states Wall clock time versus number of models Effect of likelihood approximation Effect of transition pruning SPIRAL vs Beam search

Experimental Evaluation Wall clock time versus number of models EEG: up to 200 times faster

Experimental Evaluation Wall clock time versus number of states Wall clock time versus number of models Effect of likelihood approximation Effect of transition pruning SPIRAL vs Beam search

Experimental Evaluation Effect of likelihood approximation: most of the models are pruned by the coarser approximations.

Experimental Evaluation Wall clock time versus number of states Wall clock time versus number of models Effect of likelihood approximation Effect of transition pruning SPIRAL vs Beam search

Experimental Evaluation Effect of transition pruning: SPIRAL finds the highest-likelihood model more efficiently thanks to transition pruning.

Experimental Evaluation Wall clock time versus number of states Wall clock time versus number of models Effect of likelihood approximation Effect of transition pruning SPIRAL vs Beam search

Experimental Evaluation SPIRAL vs Beam search: SPIRAL is significantly faster while it guarantees exactness. Likelihood error ratio: SPIRAL gives no error. Wall clock time: SPIRAL is up to 27 times faster.

Conclusion Design goals: high-speed search (SPIRAL is significantly, up to 500 times, faster), exactness (we prove that it guarantees exactness), and no restriction on model type (it can handle any HMM model type). SPIRAL achieves all of these goals.