Bayesian Enhancement of Speech Signals Jeremy Reed
Outline: Speech Model; Bayes application; MCMC algorithm; Results
Speech Model: Predict the current speech sample from the p previous samples (AR process). Justified by physics: lossless acoustic tubes; time for the vocal tract to change shape. Use a window of T samples for short-time analysis.
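A minimal sketch of the AR prediction described on this slide, assuming the usual form x[t] = a_1 x[t-1] + ... + a_p x[t-p] + e[t]. The function names `ar_predict` and `ar_residual` and the 8 kHz sampling rate in the usage note are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def ar_predict(x, a):
    """Predict x[t] from its p previous samples, for every t >= p.

    a[i-1] is the coefficient applied to x[t-i], i = 1..p.
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    p = len(a)
    pred = np.zeros(len(x) - p)
    for i in range(1, p + 1):
        # x[p-i : len(x)-i] holds x[t-i] for t = p .. len(x)-1
        pred += a[i - 1] * x[p - i : len(x) - i]
    return pred

def ar_residual(x, a):
    """Prediction error e[t] = x[t] - prediction, for t >= p."""
    x = np.asarray(x, dtype=float)
    return x[len(a):] - ar_predict(x, a)
```

As a usage check, a pure sinusoid obeys an exact AR(2) recursion with a = [2 cos(ω), -1], so a 440 Hz sine sampled at an assumed 8 kHz gives a residual that is zero up to rounding error.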
Speech Model: x_1 are the corrupted or “bad” samples. Prior for e: e ~ N(0, σ_e^2). Prior p(a, σ_e^2) ~ IG(σ_e^2; α_e, β_e), with α_e, β_e chosen to be broad enough to incorporate a (approaching the Jeffreys prior). AR coefficients are normal, with ML mean and variance related to the error and the samples.
Speech Model: v_t is the channel noise, v_t ~ N(0, σ_v^2). Inverse Gamma prior on σ_v^2. A different distribution can be used if prior knowledge of the channel's characteristics is available.
Bayesian Speech Enhancement: x is the clean speech sequence; y is x plus additive noise, v; θ is a vector containing the parameters of the speech and noise.
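The observation model y = x + v with Gaussian channel noise and an Inverse Gamma prior on σ_v^2 can be sketched as follows. The helper names and the shape/scale parameterisation IG(α, β), with density proportional to s^(-α-1) exp(-β/s), are assumptions made for illustration; the slides do not state which parameterisation is used.

```python
import numpy as np

def sample_inverse_gamma(alpha, beta, rng):
    """Draw one sample from IG(alpha, beta).

    If g ~ Gamma(shape=alpha, rate=beta), then 1/g ~ IG(alpha, beta).
    NumPy's gamma takes a scale argument, which is 1/rate.
    """
    return 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta)

def add_channel_noise(x, sigma_v2, rng):
    """Return y = x + v, with v[t] ~ N(0, sigma_v2) independently per sample."""
    x = np.asarray(x, dtype=float)
    v = rng.normal(loc=0.0, scale=np.sqrt(sigma_v2), size=x.shape)
    return x + v
```

A typical use would draw `sigma_v2 = sample_inverse_gamma(alpha_v, beta_v, rng)` with `rng = np.random.default_rng()` and then form `y = add_channel_noise(x, sigma_v2, rng)`; `alpha_v` and `beta_v` are hypothetical hyperparameter names.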
Algorithm: Window the audio into segments of T samples, overlapping successive windows by p samples. Assign initial values to a, σ_v^2, and σ_e^2 using values from the last p samples of the previous window. For the first window, inferences for these parameters are drawn from p(x, θ | y).
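The windowing step can be sketched as below, assuming that "overlapping by p samples" means each window starts T - p samples after the previous one. The function name `overlapping_windows` is hypothetical, and trailing samples that do not fill a complete window are simply dropped in this sketch.

```python
import numpy as np

def overlapping_windows(signal, T, p):
    """Split signal into windows of T samples that overlap by p samples."""
    signal = np.asarray(signal)
    if not 0 <= p < T:
        raise ValueError("need 0 <= p < T")
    hop = T - p  # distance between the starts of successive windows
    starts = range(0, len(signal) - T + 1, hop)
    return [signal[s:s + T] for s in starts]
```

With this layout, the last p samples of one window are the first p samples of the next, which is what lets the previous window supply the initial conditions for the AR model.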
Algorithm: Perform Gibbs sampling for the unknown parameters:
Algorithm: R_v is the covariance matrix for the corrupted samples, assumed to be diag(σ_v^2).
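To illustrate the Gibbs idea on a reduced problem, the sketch below alternates draws of the AR coefficients a and the error variance σ_e^2 for a window whose samples x are treated as known. It is not the full sampler from the slides, which would also draw the corrupted samples and σ_v^2. The conditionals used are the standard conjugate results under an assumed flat prior on a: a is normal around the ML (least-squares) estimate with covariance σ_e^2 (XᵀX)⁻¹, and σ_e^2 is Inverse Gamma. The function names and default hyperparameters are illustrative assumptions.

```python
import numpy as np

def lagged_matrix(x, p):
    """Rows are [x[t-1], ..., x[t-p]] for t = p .. len(x)-1."""
    n = len(x)
    return np.column_stack([x[p - i : n - i] for i in range(1, p + 1)])

def gibbs_ar(x, p, n_iter=500, alpha_e=1e-3, beta_e=1e-3, seed=0):
    """Gibbs draws of (a, sigma_e2) for an AR(p) model with x known."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    X = lagged_matrix(x, p)
    target = x[p:]
    XtX = X.T @ X
    a_ml = np.linalg.solve(XtX, X.T @ target)  # ML / least-squares estimate
    XtX_inv = np.linalg.inv(XtX)
    XtX_inv = 0.5 * (XtX_inv + XtX_inv.T)      # enforce exact symmetry
    # Start the chain at the ML residual variance.
    sigma_e2 = np.mean((target - X @ a_ml) ** 2) + 1e-12
    a_draws = np.empty((n_iter, p))
    s_draws = np.empty(n_iter)
    for k in range(n_iter):
        # a | x, sigma_e2 ~ N(a_ml, sigma_e2 * (X^T X)^-1)  (flat prior assumed)
        a = rng.multivariate_normal(a_ml, sigma_e2 * XtX_inv)
        # sigma_e2 | x, a ~ IG(alpha_e + n/2, beta_e + e^T e / 2)
        e = target - X @ a
        shape = alpha_e + 0.5 * len(e)
        rate = beta_e + 0.5 * float(e @ e)
        sigma_e2 = 1.0 / rng.gamma(shape=shape, scale=1.0 / rate)
        a_draws[k] = a
        s_draws[k] = sigma_e2
    return a_draws, s_draws
```

Averaging the draws after discarding an initial burn-in portion gives posterior-mean estimates of a and σ_e^2 for the window.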
Results – 440 Hz Sine Wave
Results - Speech