Published by Jeffry Baker. Modified over 9 years ago.
1
Fast Inference and Learning in Large-State-Space HMMs
Sajid M. Siddiqi, Andrew W. Moore
The Auton Lab, Carnegie Mellon University
Siddiqi and Moore, www.autonlab.org
2
- HMM Overview
- Reducing quadratic complexity in the number of states
  - The model
  - Algorithms for fast evaluation and inference
  - Algorithms for fast learning
- Results
  - Speed
  - Accuracy
- Conclusion
4
Hidden Markov Models
[Figure: chain of hidden states $q_0, q_1, q_2, q_3, q_4$ with observations $O_0, O_1, O_2, O_3, O_4$; the label "1/3" appears in the figure.]
6
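Transition Model
[Figure: chain of hidden states $q_0, q_1, q_2, q_3, q_4$; the label "1/3" appears in the figure.]
Each of these probability tables is identical:
$$
\begin{array}{c|cccccc}
i & P(q_{t+1}=s_1 \mid q_t=s_i) & P(q_{t+1}=s_2 \mid q_t=s_i) & \cdots & P(q_{t+1}=s_j \mid q_t=s_i) & \cdots & P(q_{t+1}=s_N \mid q_t=s_i) \\ \hline
1 & a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N} \\
2 & a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N} \\
3 & a_{31} & a_{32} & \cdots & a_{3j} & \cdots & a_{3N} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
i & a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{iN} \\
N & a_{N1} & a_{N2} & \cdots & a_{Nj} & \cdots & a_{NN}
\end{array}
$$
Notation: $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$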
8
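Observation Model
[Figure: chain of hidden states $q_0, q_1, q_2, q_3, q_4$ with observations $O_0, O_1, O_2, O_3, O_4$.]
$$
\begin{array}{c|cccccc}
i & P(O_t=1 \mid q_t=s_i) & P(O_t=2 \mid q_t=s_i) & \cdots & P(O_t=k \mid q_t=s_i) & \cdots & P(O_t=M \mid q_t=s_i) \\ \hline
1 & b_1(1) & b_1(2) & \cdots & b_1(k) & \cdots & b_1(M) \\
2 & b_2(1) & b_2(2) & \cdots & b_2(k) & \cdots & b_2(M) \\
3 & b_3(1) & b_3(2) & \cdots & b_3(k) & \cdots & b_3(M) \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
i & b_i(1) & b_i(2) & \cdots & b_i(k) & \cdots & b_i(M) \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
N & b_N(1) & b_N(2) & \cdots & b_N(k) & \cdots & b_N(M)
\end{array}
$$
Notation: $b_i(k) = P(O_t = k \mid q_t = s_i)$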
14
Some Famous HMM Tasks
- Question 1: State Estimation. What is $P(q_T = s_i \mid O_1 O_2 \ldots O_T)$?
- Question 2: Most Probable Path. Given $O_1 O_2 \ldots O_T$, what is the most probable path that I took?
  Example path: woke up at 8.35, got on bus at 9.46, sat in lecture 10.05-11.22…
17
Some Famous HMM Tasks
- Question 1: State Estimation. What is $P(q_T = s_i \mid O_1 O_2 \ldots O_T)$?
- Question 2: Most Probable Path. Given $O_1 O_2 \ldots O_T$, what is the most probable path that I took?
- Question 3: Learning HMMs. Given $O_1 O_2 \ldots O_T$, what is the maximum likelihood HMM that could have produced this string of observations?
[Figure: three-state example HMM with states labelled Eat, Bus and walk; transition probabilities $a_{AA}, a_{AB}, a_{BA}, a_{BB}, a_{BC}, a_{CB}, a_{CC}$; emission probabilities $b_A(O_{t-1}), b_B(O_t), b_C(O_{t+1})$ for observations $O_{t-1}, O_t, O_{t+1}$.]
19
Basic Operations in HMMs
For an observation sequence $O = O_1 \ldots O_T$, the three basic HMM operations are:

| Problem | Algorithm | Complexity |
| --- | --- | --- |
| Evaluation: calculating $P(O \mid \lambda)$ | Forward-Backward | $O(TN^2)$ |
| Inference: computing $Q^* = \arg\max_Q P(O, Q \mid \lambda)$ | Viterbi Decoding | $O(TN^2)$ |
| Learning: computing $\lambda^* = \arg\max_\lambda P(O \mid \lambda)$ | Baum-Welch (EM) | $O(TN^2)$ |

T = # timesteps, i.e. datapoints; N = # states
This talk: a simple approach to reducing the complexity in N
21
Reducing Quadratic Complexity in N
Why does it matter?
- Quadratic HMM algorithms hinder HMM computations when N is large
- Several promising applications for efficient large-state-space HMM algorithms in
  - topic modeling
  - speech recognition
  - real-time HMM systems such as for activity monitoring
  - … and more
25
Idea One: Sparse Transition Matrix
- Only K << N non-zero next-state probabilities
- Only O(TNK)!
- But can get very badly confused by "impossible transitions"
- Cannot learn the sparse structure (once chosen cannot change)
26
Dense-Mostly-Constant (DMC) Transitions
- K non-constant probabilities per row
- DMC HMMs comprise a richer and more expressive class of models than sparse HMMs
[Figure: a DMC transition matrix with K = 2]
27
Dense-Mostly-Constant (DMC) Transitions
The transition model for state i now consists of:
- K = the number of non-constant values per row
- $NC_i = \{\, j : s_i \to s_j \text{ is a non-constant transition probability} \,\}$
- $c_i$ = the transition probability for $s_i$ to all states not in $NC_i$
- $a_{ij}$ = the non-constant transition probability for $s_i \to s_j$
Example: K = 2, $NC_3 = \{2, 5\}$, $c_3 = 0.05$, $a_{32} = 0.25$, $a_{35} = 0.6$
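As a minimal sketch of the row structure above: the `DMCRow` container and its method names are hypothetical, not from the slides, and N = 5 is an assumption (the slide does not state N; 5 is the value that makes the example row sum to 1).

```python
from dataclasses import dataclass


@dataclass
class DMCRow:
    """One row of a DMC transition matrix (hypothetical container)."""
    n_states: int    # N (assumed to be 5 in the example below)
    nonconst: dict   # NC_i as {j: a_ij}, using the slide's 1-based state indexes
    const: float     # c_i, shared by every state not in NC_i

    def prob(self, j: int) -> float:
        """Return a_ij, falling back to the constant c_i outside NC_i."""
        return self.nonconst.get(j, self.const)

    def dense(self) -> list:
        """Expand the row to an explicit length-N list (position 0 holds state 1)."""
        return [self.prob(j) for j in range(1, self.n_states + 1)]


# The slide's example for state 3: K = 2, NC_3 = {2, 5}, c_3 = 0.05.
row3 = DMCRow(n_states=5, nonconst={2: 0.25, 5: 0.6}, const=0.05)
dense_row3 = row3.dense()
```

Storing only the K non-constant entries plus one constant per row is what lets the later recursions avoid touching all N entries of each row.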
32
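Evaluation in Regular HMMs
$$P(q_t = s_i \mid O_1, O_2, \ldots, O_t) = \frac{\alpha_t(i)}{\sum_{j=1}^{N} \alpha_t(j)}$$
Where
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t \wedge q_t = s_i \mid \lambda)$$
Then,
$$\alpha_{t+1}(j) = \sum_{i=1}^{N} a_{ij}\, b_j(O_{t+1})\, \alpha_t(i)$$
The $\alpha_t(i)$ are called the "forward variables".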
36
[Table: forward variables, one row per time step t = 1, 2, …, 9 and one column per state: $\alpha_t(1)$, $\alpha_t(2)$, $\alpha_t(3)$, …, $\alpha_t(N)$.]
Cost: $O(TN^2)$
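The table of forward variables can be filled row by row. The following is a rough sketch of the regular (dense) recursion, assuming discrete observations, 0-based indexes, and an initial state distribution `pi` that the slides do not specify; the function name and the toy numbers are illustrative only, and no rescaling is done, so it is only suitable for short sequences.

```python
import numpy as np


def forward_dense(A, B, pi, obs):
    """Unscaled forward pass.

    A: (N, N) transition matrix, B: (N, M) observation matrix,
    pi: (N,) assumed initial state distribution, obs: list of symbol indexes.
    """
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # alpha[t - 1] @ A sums over all N predecessors for each of N states: O(N^2).
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0]
alpha = forward_dense(A, B, pi, obs)
likelihood = alpha[-1].sum()          # P(O | model)
posterior = alpha[-1] / likelihood    # P(q_T = s_i | O_1 ... O_T)
```

Each of the T rows costs one N × N matrix-vector product, which is where the $O(TN^2)$ total comes from.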
38
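Similarly,
$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = s_i, \lambda)$$
and
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$
Also costs $O(TN^2)$. The $\beta_t(i)$ are called the "backward variables".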
40
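Fast Evaluation in DMC HMMs
$$\alpha_{t+1}(j) = b_j(O_{t+1}) \left[ \sum_{i=1}^{N} \alpha_t(i)\, c_i + \sum_{i \,:\, j \in NC_i} \alpha_t(i)\,(a_{ij} - c_i) \right]$$
- First sum: O(N), but only computed once per row of the table!
- Second sum: O(K) for each $\alpha_t(j)$ entry
This yields O(TNK) complexity for the evaluation problem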
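One way to realize the O(NK) step is sketched below. The array layout (`c`, `nc_idx`, `nc_val`) and the function name are assumptions made for illustration; the idea is to add the shared constant term to every state and then correct only the N × K non-constant entries.

```python
import numpy as np


def dmc_forward_step(alpha_prev, c, nc_idx, nc_val, b_next):
    """One forward step for a DMC transition model.

    alpha_prev: (N,) forward variables at time t
    c:          (N,) constant probability c_i of each row
    nc_idx:     (N, K) int array, row i lists the states in NC_i (0-based)
    nc_val:     (N, K) the matching non-constant probabilities a_ij
    b_next:     (N,) observation probabilities b_j(O_{t+1})
    """
    # Shared term sum_i alpha_t(i) * c_i: O(N), computed once per time step.
    shared = alpha_prev @ c
    out = np.full(alpha_prev.shape[0], shared)
    # Corrections alpha_t(i) * (a_ij - c_i) for the N*K non-constant entries.
    corr = alpha_prev[:, None] * (nc_val - c[:, None])
    np.add.at(out, nc_idx.ravel(), corr.ravel())
    return out * b_next


# Tiny check against the dense recursion (N = 4, K = 1; every row sums to 1).
c = np.array([0.1, 0.2, 0.1, 0.2])
nc_idx = np.array([[1], [2], [3], [0]])
nc_val = np.array([[0.7], [0.4], [0.7], [0.4]])
A_dense = np.tile(c[:, None], (1, 4))
A_dense[np.arange(4)[:, None], nc_idx] = nc_val

alpha_prev = np.array([0.1, 0.2, 0.3, 0.4])
b_next = np.array([0.5, 0.6, 0.7, 0.8])
fast = dmc_forward_step(alpha_prev, c, nc_idx, nc_val, b_next)
slow = (alpha_prev @ A_dense) * b_next
```

`np.add.at` is used rather than plain fancy-index assignment so that several rows contributing to the same target state accumulate correctly.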
43
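Fast Inference in DMC HMMs
$O(N^2)$ recursion in regular model:
$$\delta_{t+1}(j) = b_j(O_{t+1}) \max_{i} \delta_t(i)\, a_{ij}$$
O(NK) recursion in DMC model:
$$\delta_{t+1}(j) = b_j(O_{t+1}) \max\!\left( \max_{i} \delta_t(i)\, c_i,\; \max_{i \,:\, j \in NC_i} \delta_t(i)\, a_{ij} \right)$$
- First inner max: O(N), but only computed once per row of the table
- Second inner max: O(K) for each $\delta_t(j)$ entry
(The DMC form is exact when each non-constant $a_{ij}$ is at least $c_i$.)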
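A hedged sketch of one Viterbi step under a DMC model follows. It assumes the same illustrative array layout as a dense-free representation (`c`, `nc_idx`, `nc_val`, all hypothetical names), assumes every non-constant $a_{ij}$ is at least its row constant $c_i$, and tracks scores only (no backpointers).

```python
import numpy as np


def dmc_viterbi_step(delta_prev, c, nc_idx, nc_val, b_next):
    """One Viterbi score update; assumes each non-constant a_ij >= c_i."""
    # Best constant-transition score max_i delta_t(i) * c_i: O(N), once per step.
    best = np.full(delta_prev.shape[0], np.max(delta_prev * c))
    # Candidates delta_t(i) * a_ij for the N*K non-constant entries.
    cand = delta_prev[:, None] * nc_val
    np.maximum.at(best, nc_idx.ravel(), cand.ravel())
    return best * b_next


# Tiny check against the dense recursion (N = 4, K = 1).
c = np.array([0.1, 0.2, 0.1, 0.2])
nc_idx = np.array([[1], [2], [3], [0]])
nc_val = np.array([[0.7], [0.4], [0.7], [0.4]])
A_dense = np.tile(c[:, None], (1, 4))
A_dense[np.arange(4)[:, None], nc_idx] = nc_val

delta_prev = np.array([0.1, 0.2, 0.3, 0.4])
b_next = np.array([0.5, 0.6, 0.7, 0.8])
fast = dmc_viterbi_step(delta_prev, c, nc_idx, nc_val, b_next)
slow = (delta_prev[:, None] * A_dense).max(axis=0) * b_next
```

The global constant-term maximum includes predecessors whose transition into j is actually non-constant; under the stated assumption those are dominated by their own non-constant candidate, so the result matches the dense maximum.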
47
Learning a DMC HMM
Idea One:
- Ask user to tell us the DMC structure
- Learn the parameters using EM
Simple! But in general, don't know the DMC structure
50
Learning a DMC HMM
Idea Two: Use EM to learn the DMC structure also
1. Guess DMC structure
2. Find expected transition counts and observation parameters, given current model and observations
3. Find maximum likelihood DMC model given counts
4. Go to 2
DMC structure can (and does) change!
In fact, just start with an all-constant transition model
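Step 3 can be read, for a single row, as follows: keep the K largest expected counts as the non-constant entries and spread the remaining probability mass evenly over the other states. This is one plausible reading rather than a confirmed detail of the authors' method; the function name and toy counts are made up for illustration, and the sketch assumes 1 <= K < N.

```python
import numpy as np


def dmc_m_step_row(counts, K):
    """Fit one DMC row from a length-N array of expected transition counts."""
    N = counts.shape[0]
    total = counts.sum()
    # Indexes of the K largest counts become the non-constant set NC_i.
    nc = np.argpartition(counts, N - K)[N - K:]
    a = counts[nc] / total
    # The leftover probability mass is spread evenly over the other N - K states.
    c = (1.0 - a.sum()) / (N - K)
    return nc, a, c


counts = np.array([1.0, 12.0, 2.0, 1.0, 4.0])
nc, a, c = dmc_m_step_row(counts, K=2)
```

Because the top-K set is recomputed from the counts on every iteration, the non-constant structure is free to change between EM iterations, as the slide notes.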
51
Learning a DMC HMM
Step 2: Find expected transition counts and observation parameters, given current model and observations
55
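We want
$$a_{ij}^{\text{new}} = \text{new estimate of } P(q_{t+1} = s_j \mid q_t = s_i) = \frac{\sum_{t} P(q_t = s_i,\, q_{t+1} = s_j \mid \lambda^{\text{old}}, O_1 \ldots O_T)}{\sum_{k=1}^{N} \sum_{t} P(q_t = s_i,\, q_{t+1} = s_k \mid \lambda^{\text{old}}, O_1 \ldots O_T)}$$
Applying Bayes rule to both terms gives us…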
59
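We want
$$a_{ij}^{\text{new}} = \frac{S_{ij}}{\sum_{k=1}^{N} S_{ik}}$$
where
$$S_{ij} = a_{ij} \sum_{t=1}^{T-1} \alpha_t(i)\, \beta_{t+1}(j)\, b_j(O_{t+1})$$
[Figure: two T × N tables.] Can get this in O(TN) time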
68
[Figure: the N × N matrix S next to two T × N tables; the entry $S_{24}$ is formed from column 2 of the α table ($\alpha_{*2}$) and column 4 of the β table ($\beta_{*4}$).]
Dot Product of Columns: $O(TN^2)$
Speedups:
- Strassen
- Approximate by DMC
- Approximate randomized $A^T B$
- Sparse structure fine
- Fixed DMC is fine
- Speedup without approximation
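To make the dot-product view concrete, here is a sketch using the standard Baum-Welch form $S_{ij} = a_{ij} \sum_t \alpha_t(i)\, b_j(O_{t+1})\, \beta_{t+1}(j)$. That formula, the initial distribution `pi`, and all function names are assumptions for illustration; the slide's own equations did not survive extraction. No rescaling is done, so this only suits short sequences.

```python
import numpy as np


def forward(A, B, pi, obs):
    alpha = np.zeros((len(obs), A.shape[0]))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


def backward(A, B, obs):
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta


def transition_scores(alpha, beta, A, B, obs):
    """S[i, j] = A[i, j] * sum_t alpha[t, i] * B[j, obs[t+1]] * beta[t+1, j]."""
    # weighted[t, j] = b_j(O_{t+1}) * beta_{t+1}(j): one O(TN) pass.
    weighted = B[:, obs[1:]].T * beta[1:]
    # Entry (i, j) of the product is a dot product of two length-(T-1) columns,
    # so forming all N^2 entries costs O(T N^2).
    return A * (alpha[:-1].T @ weighted)


A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 0, 0]
alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
S = transition_scores(alpha, beta, A, B, obs)
A_new = S / S.sum(axis=1, keepdims=True)   # re-estimated transition rows
```

The matrix product is the $O(TN^2)$ bottleneck that the listed speedups target.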
69
[Figure: the N × N matrix S, with entry $S_{24}$ marked, next to the two T × N tables.]
- Insight One: only need the top K entries in each row of S
- Insight Two: Values in columns of α and β are often very skewed
70
[Figure: the T × N tables of α and β, with the index sets α-biggies(i) and β-biggies(j) marked.]
- For i = 1..N, store indexes of R largest values in i'th column of α: α-biggies(i)
- For j = 1..N, store indexes of R largest values in j'th column of β: β-biggies(j)
There's an important detail I'm omitting here to do with prescaling the rows of α and β.
75
- R << T
- Takes O(TN) time to do all indexes
Annotations on the slide's formula (formula not recovered):
- R'th largest value in i'th column of α: O(1) time to obtain
- A second term: O(1) time to obtain (precached for all j in time O(TN))
- A third term: O(R) computation
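One pair of bounds that is consistent with the three annotated costs is sketched below; it is a guess at the idea, not the slide's recovered formula. It bounds the dot product of a nonnegative α column and a nonnegative β column using only the R largest α entries. In a real implementation the index sets and the β column sums would be precomputed; here they are recomputed inline to keep the example short, and all names are illustrative.

```python
import numpy as np


def dot_bounds(alpha_col, beta_col, R):
    """Lower and upper bounds on alpha_col . beta_col (both nonnegative)."""
    T = alpha_col.shape[0]
    # Indexes of the R largest alpha entries ("alpha-biggies").
    biggies = np.argpartition(alpha_col, T - R)[T - R:]
    # Exact contribution of the R biggest entries: O(R).
    head = alpha_col[biggies] @ beta_col[biggies]
    # R'th largest alpha value in this column.
    rth_largest = alpha_col[biggies].min()
    # Every omitted alpha entry is <= rth_largest and every beta entry is >= 0,
    # so the omitted part is at most rth_largest * (remaining beta mass).
    tail_cap = rth_largest * (beta_col.sum() - beta_col[biggies].sum())
    return head, head + tail_cap


rng = np.random.default_rng(0)
alpha_col = rng.random(50) ** 8   # skewed column: a few large values, many tiny
beta_col = rng.random(50)
lower, upper = dot_bounds(alpha_col, beta_col, R=10)
exact = alpha_col @ beta_col
```

The more skewed the column, the smaller the R'th largest value and the tighter the gap between the two bounds.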
80
Computing the i'th row of S…
[Figure: row i of the N × N matrix S, with entries $S_{ij}$ for j = 1, 2, 3, …, N.]
- In O(NR) time, we can put upper and lower bounds on $S_{ij}$ for j = 1, 2, …, N
- Only need exact values of $S_{ij}$ for the K largest values within the row
- Ignore j's that can't be the best
- Be exact for the rest: O(N) time each.
If there's enough pruning, total time is $O(TN + RN^2)$
82
In Short …
- Sub-quadratic evaluation
- Sub-quadratic inference
- 'Nearly' sub-quadratic learning
- Fully connected transition models allowed
- Some extra work to extract 'important' transitions from data
84
Evaluation and Inference Speedup
Dataset: synthetic data with T = 2000 time steps
[Figure: speedup plot]
85
Parameter Learning Speedup
Dataset: synthetic data with T = 2000 time steps
[Figure: speedup plot]
87
Datasets
- DMC-friendly dataset: from a 2-D Gaussian 20-state DMC HMM with K = 5 (20,000 train, 5,000 test)
- Anti-DMC dataset: from a 2-D Gaussian 20-state regular HMM with steadily varying, well-distributed transition probabilities (20,000 train, 5,000 test)
- Motionlogger dataset: accelerometer data from two sensors worn over several days (10,000 train, 4,720 test)
89
HMMs Used
- Regular and DMC HMMs: 20 states
- Baseline 1: 5-state regular HMM (Do we really need a large HMM?)
- Baseline 2: 20-state HMM with uniform transition probabilities (Does the transition model matter?)
97
Learning Curves for DMC-friendly data
[Figure: learning curves]
DMC model achieves full model score!
105
Learning Curves for Anti-DMC data
[Figure: learning curves]
DMC model worse than full model
113
Learning Curves for Motionlogger data
[Figure: learning curves]
- DMC model achieves full model score!
- Baselines do much worse
114
Regularization with DMC HMMs
- # of transition parameters in regular 100-state HMM: 10,000
- # of transition parameters in DMC 100-state HMM with K = 5: 500
115
Tradeoffs between N and K
- We vary N and K while keeping the number of transition parameters (N×K) constant
- Increasing N and decreasing K allows more states for modeling data features but fewer parameters per state for temporal structure
117
Tradeoffs between N and K
Average test-set log-likelihoods at convergence
Datasets: A: DMC-friendly, B: Anti-DMC, C: Motionlogger
[Figure: test-set log-likelihoods for datasets A, B and C]
Each dataset has a different optimal N-vs-K tradeoff
122
Conclusions
- DMC HMMs are an important class of models that allow parameterized complexity-vs-efficiency tradeoffs in large state spaces
- The speedup can be several orders of magnitude
- Even for non-DMC domains, DMC HMMs yield higher scores than baseline models
- The DMC HMM model can be applied to arbitrary state spaces and observation densities
123
Related Work
- Felzenszwalb et al. (2003): fast HMM algorithms when transition probabilities can be expressed as distances in an underlying parameter space
- Murphy and Paskin (2002): fast inference in hierarchical HMMs cast as DBNs
- Salakhutdinov et al. (2003): combines EM and conjugate gradient for faster HMM learning when missing information amount is high
- Ghahramani and Jordan (1996): Factorial HMMs for distributed representation of large state spaces
- Beam Search: widely used heuristic in Viterbi inference for speech systems
124
Future Work
- Eliminate R parameter using an automatic backoff evaluation approach
- Investigate DMC HMMs as regularization mechanism
- Compare robustness against overfitting with factorial HMMs for large-state-space problems
125
Thank You!