1
SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE
Jaume Escofet Carmona
IDIAP, Martigny, Switzerland / UPC, Barcelona, Spain
2
Contents
– Bayesian Networks
– Automatic Speech Recognition using Dynamic BNs
– Auxiliary variables
– Experiments with energy as an auxiliary variable
– Conclusions
3
What is a Bayesian Network?
A BN is a type of graphical model composed of:
– A directed acyclic graph (DAG)
– A set of variables V = {v_1, …, v_N}
– A set of probability density functions P(v_n | parents(v_n))
Joint distribution of V: P(V) = ∏_{n=1}^{N} P(v_n | parents(v_n))
Example (v_2 → v_1, v_2 → v_3): P(V) = P(v_1, v_2, v_3) = P(v_1|v_2) P(v_2) P(v_3|v_2)
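The factorization on this slide can be sketched directly. A minimal example, with made-up probability tables for three binary variables matching the v_2 → v_1, v_2 → v_3 structure above:

```python
# Hypothetical sketch of the example BN: P(v1, v2, v3) = P(v1|v2) P(v2) P(v3|v2).
# All table entries are illustrative numbers, not from the talk.

P_v2 = {0: 0.6, 1: 0.4}                        # P(v2)
P_v1_given_v2 = {0: {0: 0.9, 1: 0.1},          # P_v1_given_v2[v2][v1]
                 1: {0: 0.3, 1: 0.7}}
P_v3_given_v2 = {0: {0: 0.5, 1: 0.5},          # P_v3_given_v2[v2][v3]
                 1: {0: 0.2, 1: 0.8}}

def joint(v1, v2, v3):
    """P(V): product of each variable's probability given its parents."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# Sanity check: the joint sums to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

Note that only local tables (one per variable) are stored; the full joint is never materialized, which is what makes the factorization useful for larger networks.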
4
Automatic Speech Recognition (ASR)
Feature extraction (LPC, MFCC, ...) maps the signal to X = {x_1, …, x_T}; statistical models (HMM, ANN, ...) score the word hypotheses {M_k} (M_1: 'cat', M_2: 'dog', …, M_K: 'tiger').
Decision rule: M_j = argmax_k P(M_k|X) = argmax_k P(X|M_k) P(M_k)
Model likelihood: P(X|M_k) = ∏_{t=1}^{T} p(x_t|q_t) p(q_t|q_{t-1})
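The likelihood above can be sketched for a fixed state path. A minimal scalar version, assuming 1-D Gaussian emissions and illustrative parameters (the talk's models are multivariate; this is just the scoring rule made concrete):

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood(X, path, means, vars_, trans):
    """sum_t [ log p(x_t|q_t) + log p(q_t|q_{t-1}) ] along a given state path."""
    ll = 0.0
    prev = None
    for x, q in zip(X, path):
        ll += log_gauss(x, means[q], vars_[q])     # emission term p(x_t|q_t)
        if prev is not None:
            ll += math.log(trans[prev][q])          # transition term p(q_t|q_{t-1})
        prev = q
    return ll
```

In a real recognizer the path is not known, so the sum over all paths is computed with the forward algorithm (or the best path with Viterbi); the per-frame terms are exactly the ones shown here.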
5
ASR with Dynamic Bayesian Networks
At each frame t = 1, …, 4 a hidden phone state q_t (/k/, /a/, /t/) emits the acoustics x_t. This structure is equivalent to a standard HMM.
6
ASR with Dynamic Bayesian Networks
Transition probabilities: P(q_t | q_{t-1})
Gaussian emissions: p(x_t | q_t = k) ~ N_x(μ_k, Σ_k)
7
Auxiliary information (1)
The main advantage of BNs is flexibility in defining dependencies between variables. Energy damages system performance if it is simply appended to the feature vector. BNs allow us to use it in an alternative way:
– Conditioning the emission distributions upon this auxiliary variable
– Marginalizing it out in recognition
8
Auxiliary information (2)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k z, Σ_k)
The value of a_t affects the value of x_t.
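The conditioned emission can be sketched in the scalar case. A minimal version, assuming 1-D variables: the regression weight B_k shifts the state's mean linearly with the auxiliary value z (all parameter values are illustrative):

```python
import math

def emission(x, z, mu_k, B_k, var_k):
    """p(x_t | q_t=k, a_t=z): Gaussian whose mean is shifted by B_k * z."""
    mean = mu_k + B_k * z                      # auxiliary variable moves the mean
    return math.exp(-0.5 * (x - mean) ** 2 / var_k) / math.sqrt(2 * math.pi * var_k)
```

With B_k = 0 this reduces to the plain HMM emission of the previous slide, so the standard model is a special case of the conditioned one.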
9
Auxiliary information (3)
p(a_t | q_t = k) ~ N_a(μ_ak, Σ_ak)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k z, Σ_k)
The value of the auxiliary variable can also be influenced by the hidden state q_t.
10
Auxiliary information (4)
p(x_t, a_t | q_t = k) ~ N_xa(μ_k^xa, Σ_k^xa)
Modelling x_t and a_t jointly is equivalent to appending the auxiliary variable to the feature vector.
11
Hiding auxiliary information
We can also marginalize out (hide) the auxiliary variable in recognition:
p(x_t | q_t) = ∫ p(x_t | q_t, a_t) p(a_t | q_t) da_t
This is useful when the auxiliary variable is noisy or not accessible.
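The marginalization above can be checked numerically in the scalar case. A sketch, assuming 1-D Gaussians with illustrative parameters: when both factors are Gaussian, integrating out a_t yields another Gaussian with mean μ_k + B_k μ_ak and variance σ_k² + B_k² σ_ak², which the numeric integral should reproduce:

```python
import math

def gauss(x, mean, var):
    """1-D Gaussian density N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def marginal_emission(x, mu_k, B_k, var_k, mu_ak, var_ak, n=4000):
    """Midpoint-rule approximation of ∫ N(x; mu_k + B_k a, var_k) N(a; mu_ak, var_ak) da."""
    lo = mu_ak - 8 * math.sqrt(var_ak)         # integrate over ±8 sigma of p(a|q)
    hi = mu_ak + 8 * math.sqrt(var_ak)
    da = (hi - lo) / n
    total = 0.0
    for i in range(n):
        a = lo + (i + 0.5) * da
        total += gauss(x, mu_k + B_k * a, var_k) * gauss(a, mu_ak, var_ak) * da
    return total
```

Because the integral has this closed form, hiding the auxiliary variable at recognition time costs nothing extra: the emission is just a Gaussian with inflated variance.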
12
Experimental setup
– Isolated word recognition
– Small vocabulary (75 words)
– Feature extraction: Mel Frequency Cepstral Coefficients (MFCC)
– p(x_t|q_t) modeled with a mixture of 4 Gaussians
– p(a_t|q_t) modeled with a single Gaussian
13
Experiments with energy as an auxiliary variable
Frame energy: E = log Σ_{n=1}^{N} s²[n] w²[n]

Word error rates (WER):
System     Observed Energy   Hidden Energy
Baseline   5.9 %             –
System 1   6.9 %             5.3 %
System 2   6.1 %             5.6 %
System 3   5.8 %             5.9 %
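The energy measure above is the log of the windowed frame's summed squared samples. A minimal sketch, assuming a Hamming window for w[n] (the talk does not specify the window; the example frame is made up):

```python
import math

def log_energy(frame):
    """E = log sum_n s^2[n] w^2[n], with a Hamming window w[n] (assumed)."""
    N = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    return math.log(sum((s * w) ** 2 for s, w in zip(frame, window)))

frame = [0.1, -0.2, 0.3, 0.05, -0.1, 0.2, 0.15, -0.05]   # illustrative samples
```

A useful sanity check: doubling the amplitude of every sample multiplies the summed energy by 4, so the log energy rises by exactly log 4.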
14
Conclusions
BNs are more flexible than HMMs. You can easily:
– Change the topology of the distributions
– Hide variables when necessary
Energy can improve system performance if used in a non-traditional way.
15
Questions?