1
Variational Infinite Hidden Conditional Random Fields with Coupled Dirichlet Process Mixtures
K. Bousmalis, S. Zafeiriou, L.-P. Morency, M. Pantic, Z. Ghahramani
2
Hidden Conditional Random Field
[Figure: an HCRF chain with observations X1…X5, hidden states s1…s5 (each taking a value in {ha, hb, hc}), and a sequence label y. Input features include gestures (Head Nod, Head Shake, Shoulder Shrug, Hand Wag, Hand Scissor, …) and audio (F0, Energy). The model answers P(y = 'Agreement' | X) = ? and P(y = 'Disagreement' | X) = ?]
3
Learned HCRF Model
Weights and equivalent potentials for each relationship:
– hidden states and labels: θ_y, with potential exp{θ_y}
– features and hidden states: θ_x, with potential exp{∑_t f_t θ_x}
– transitions among hidden states and labels: θ_e, with potential exp{θ_e}
[Figure: the HCRF chain X1…X5, s1…s5, y, with hidden states ha, hb, hc.]
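The three weight-potential pairs above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation; all names (theta_y, theta_x, theta_e, n_states, n_labels, n_feats) and the random dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_labels, n_feats = 3, 2, 4  # illustrative sizes only

theta_y = rng.normal(size=(n_states, n_labels))            # state-label weights
theta_x = rng.normal(size=(n_feats, n_states))             # feature-state weights
theta_e = rng.normal(size=(n_labels, n_states, n_states))  # label-specific transition weights

f_t = rng.normal(size=n_feats)  # observation feature vector at time t

# Equivalent potentials: exponentiated linear scores, one per relationship.
phi_y = np.exp(theta_y)        # hidden states <-> labels
phi_x = np.exp(f_t @ theta_x)  # features <-> hidden states (one score per state)
phi_e = np.exp(theta_e)        # transitions among hidden states, per label
```

Exponentiating keeps every potential strictly positive, so products of potentials along the chain stay positive and can be normalized into a distribution over labels.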
4
HCRF Problems
– The number of hidden states is not intuitive for behavior problems
– Cross-validating the number of hidden states is computationally expensive
Solution: allow a potentially infinite number of hidden states
[Figure: the HCRF chain X1…X5, s1…s5, y, with hidden states ha, hb, hc.]
5
Motivation and Novelty
– Previous work introduced infinite-state HCRFs with an efficient MCMC sampling approach (IHCRF-MCMC)
– This work proposes a model that generalizes:
  – finite HCRFs, in its ability to determine its hidden structure automatically, without cross-validation
  – IHCRF-MCMC, in its ability to handle continuous input gracefully
– We present a novel variational inference method for learning:
  – a deterministic alternative to MCMC
  – a precise stopping criterion for learning
6
Our Framework
– No a priori bound on the number of hidden states: we introduce a set of random variables (the π-variables)
– These are drawn from distinct processes that allow the number of hidden states to grow with the data
– …and are incorporated into our potentials
7
The HCRF-DPM Model
In our model, the π-variables are driven by coupled DPs. By the stick-breaking construction, each weight is
π_k = β_k ∏_{l&lt;k} (1 − β_l), with β_k ~ Beta(1, α),
where ω_μ = {h_k, y}
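The stick-breaking construction mentioned above can be sketched numerically. This is a generic GEM (stick-breaking) sample, not the paper's coupled construction; the concentration value alpha = 1.0 and the number of sticks K are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 1.0   # DP concentration parameter (assumed)
K = 1000      # draw many sticks; the weights decay toward zero

# Break a unit-length stick: beta_k is the fraction of the remaining stick.
beta = rng.beta(1.0, alpha, size=K)
remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))
pi = beta * remaining  # pi_k = beta_k * prod_{l<k} (1 - beta_l)
```

The partial sums telescope, so after K sticks the assigned mass is 1 − ∏_k (1 − β_k); the leftover belongs to the infinite tail of components never drawn.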
8
The HCRF-DPM Model
[Figure: plate diagram of the actual joint distribution, with observations X, label y, hidden states s1…sT, concentration parameters α_x, α_y, α_e, and infinite sets of stick weights π_x, π_y, π_e.]
9
Variational Approximation
We approximate all π-variables (with variational parameters τ) using a truncated stick-breaking representation, which replaces the infinite number of hidden states with a finite truncation level L (e.g., L = 5). In practice, L is large enough that the truncated tail of the actual sum is negligible.
10
Model Training
Objective: find the parameters {θ, τ} that minimize the KL divergence KL[q ‖ p].
We alternate, until convergence, between:
– a coordinate descent method for finding τ
– an HCRF-like gradient ascent method for finding θ
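The alternating scheme above follows a common pattern: an exact coordinate update on one block of variables, gradient steps on the other, with a precise stopping criterion on the objective. A toy sketch of that pattern, where the quadratic f stands in for KL[q ‖ p] and every value is an assumption:

```python
import numpy as np

def f(tau, theta):
    # Toy stand-in for the KL objective over the two parameter blocks.
    return (tau - 2.0) ** 2 + (theta - 3.0) ** 2

tau, theta, prev = 0.0, 0.0, np.inf
for _ in range(200):
    tau = 2.0                          # exact coordinate-descent update for tau
    theta -= 0.1 * 2.0 * (theta - 3.0)  # gradient step for theta
    cur = f(tau, theta)
    if prev - cur < 1e-10:             # precise stopping criterion on the objective
        break
    prev = cur
```

Because the objective is monitored directly, the loop stops as soon as an iteration fails to improve it, which is the "precise stopping criterion" a deterministic method offers over MCMC.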
11
Experiments: Human Behavior
Classification performance (F1) on:
1. Agreement vs. Disagreement (ADA2)
2. Agreement vs. Disagreement vs. Neutral (ADA3)
3. Extreme Pain vs. No Pain (PAIN2)
4. Extreme vs. Moderate vs. No Pain (PAIN3)
12
Agreement and Disagreement
Canal 9 Dataset of Political Debates
– Ground truth based ONLY on verbal content
– 11 debates, 28 distinct individuals
– 53 episodes of agreement, 94 episodes of disagreement, 130 neutral episodes
– Binary visual features: presence per frame of 8 gestures
– Continuous auditory features: F0, Energy
13
UNBC Dataset of Pain
– Different levels of elicited shoulder pain in 200 sequences from 25 subjects
– Annotations of 12 pain-related facial action units (AUs)
– 2 classification problems:
  – extreme pain vs. minimal pain
  – including moderate pain as a third class
14
Classification Performance
– 10 different random initializations
– HCRFs cross-validated over:
  – 2, 3, 4, and 5 hidden states
  – regularization factors of 1, 10, 100
– HCRF-DPM with L = 10
[Figure: F1 scores per task.]
15
No Overfitting
[Figure: HCRF-DPM performance on the Canal 9 validation set.]
16
Sparsity
[Figure: node-feature weights for an HCRF-DPM with L = 50 and for a finite HCRF with K = 50, over hidden states 1–50.]
17
Future Avenues
– More datasets
– Using HDPs and Pitman-Yor processes
– Infinite Latent Dynamic CRFs
18
Thank you! Poster Stand #46 for more details
19
Dirichlet Process Mixture
A DPM model is a hierarchical Bayesian model that uses a DP as a nonparametric prior.
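What makes the DP prior nonparametric is that the number of mixture components is not fixed in advance but grows with the data. One standard way to see this is the Chinese restaurant process view of the DP; a small sketch, with alpha and n chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 1.0, 500  # assumed concentration and number of observations
counts = []          # observations assigned to each component so far

for _ in range(n):
    # Join an existing component with probability proportional to its size,
    # or open a new one with probability proportional to alpha.
    probs = np.array(counts + [alpha], dtype=float)
    k = rng.choice(len(probs), p=probs / probs.sum())
    if k == len(counts):
        counts.append(1)  # a new component is created
    else:
        counts[k] += 1
```

The number of distinct components grows roughly as alpha · log n, so the model can always allocate new hidden states as more data arrives, yet uses far fewer components than observations.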
20
HCRF-DPM π-sticks
Although the π-variables are drawn from distinct processes, they are coupled together by a common latent-variable assignment.
[Figure: stick-breaking weights for Feature 1, Feature 2, Label 1, Label 2, and Previous States 1–3.]
21
Variational Approximation
[Figure: plate diagrams comparing the actual joint distribution with the approximate (variational) joint distribution.]
22
Model Training for Variational HCRF-DPM

Initialize α_x, α_y, α_e, θ, τ
Initialize nbItrs, nbVarItrs
itr = 0
converged = FALSE
while (not converged) and (itr < nbItrs) do
    varItr = 0
    varConverged = FALSE
    while (not varConverged) and (varItr < nbVarItrs) do
        Compute q(s_t = h_k | i), q(s_t = h_k | y), q(s_t = h_k, y, s_{t-1} = h_a),
            i.e. the approximate marginals, by using the forward-backward algorithm
        Hyperparameter posterior sampling for α_x, α_y, α_e
        Calculate KL[q || p](varItr)
        Update τ
        varItr = varItr + 1
    end while
    Gradient ascent to find θ(itr) by using a quasi-Newton method and an Armijo
        backtracking line search with projected gradients to keep θ non-negative
    itr = itr + 1
end while
23
Performance Evaluation
Classification of Agreement and Disagreement
– leave-2-debates-out for testing (5 folds)
– optimal parameter choice based on 3 debates
Classification of Pain Levels
– leave-1-subject-out for testing (25 folds)
– optimal parameter choice based on 7 subjects
24
Synthetic Dataset
[Figure: synthetic transition diagram with probabilities 0.1, 0.4, and 0.7.]