INTRODUCTION Sibilant speech is aperiodic. The sibilants are the fricatives /s/, /ʃ/, /z/ and /ʒ/ and the affricates /tʃ/ and /dʒ/. We present a sibilant detection algorithm that is robust to high levels of noise.
Gaussian model for the noisy speech signal: $x_{k,i}$ = power, $k$ = frequency, $i$ = time frame, $\mu_{k,i}$ = mean power
PSD for /ʃ/
Log-likelihood $\mu_{k,N_1} = \mu_{k,N_2} = a_k$, $\mu_{k,S} = a_k + b_k$
Maximizing the log-likelihood 74% of sibilants last between 60 and 130 ms. |t| < 30 ms: high probability of being inside the sibilant. |t| > 65 ms: high probability of being outside the sibilant. The weighting reduces the contribution of the transition region 30 ms < |t| < 65 ms.
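A minimal Python sketch of how such weights could look. The Hamming shape follows the three Hamming windows mentioned under EXPERIMENTS; the window centres and half-widths (65 ms and 95 ms) and the function names are illustrative assumptions, chosen only so that all weights are small in the transition region 30 ms < |t| < 65 ms.

```python
import numpy as np

def hamming_weight(t_ms, centre_ms, half_width_ms):
    """Hamming-shaped weight centred at centre_ms, zero outside +/- half_width_ms."""
    u = (np.asarray(t_ms, dtype=float) - centre_ms) / half_width_ms
    w = 0.54 + 0.46 * np.cos(np.pi * u)
    return np.where(np.abs(u) <= 1.0, w, 0.0)

def weighting_functions(t_ms, sib_half=65.0, noise_centre=95.0, noise_half=65.0):
    """Return (w_n1, w_s, w_n2) for times t_ms relative to the candidate sibilant centre.

    All three window positions are hypothetical values, not taken from the slides.
    """
    w_s = hamming_weight(t_ms, 0.0, sib_half)             # sibilant window
    w_n1 = hamming_weight(t_ms, -noise_centre, noise_half)  # noise before
    w_n2 = hamming_weight(t_ms, +noise_centre, noise_half)  # noise after
    return w_n1, w_s, w_n2
```

With these assumed values the sibilant weight peaks at t = 0, the noise weights peak at |t| = 95 ms, and both are well below their peak in between.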
Maximizing the log-likelihood
Estimate noise and sibilant
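As an illustration of this step, the sketch below estimates $a_k$ (noise mean power) and $b_k$ (sibilant mean power) from the log-likelihood slide. It assumes a Gaussian model with the same fixed variance in every frame, in which case the weighted maximum-likelihood solution reduces to weighted means. The clipping of $b_k$ at zero and the function name are assumptions made here, and the estimator on the slides may differ.

```python
import numpy as np

def estimate_noise_and_sibilant(x, w_noise, w_sib):
    """Weighted-mean estimates of a_k and b_k per frequency bin.

    x       : (K, T) power values, K frequency bins by T frames
    w_noise : (T,) weights of the noise regions (N1 and N2 combined)
    w_sib   : (T,) weights of the sibilant region
    """
    x = np.asarray(x, dtype=float)
    w_noise = np.asarray(w_noise, dtype=float)
    w_sib = np.asarray(w_sib, dtype=float)
    a = x @ w_noise / w_noise.sum()      # noise mean power a_k
    s = x @ w_sib / w_sib.sum()          # mean power a_k + b_k in the sibilant region
    b = np.maximum(s - a, 0.0)           # assumption: power cannot be negative
    return a, b
```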
Estimated sibilant mean power
Maximum filter W = 30
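A sketch of a running maximum along the time axis. The slide gives W = 30 without units; the code treats the width as a number of frames, which is an assumption, and the function name is hypothetical.

```python
import numpy as np

def max_filter_time(b, width):
    """Running maximum along the last (time) axis.

    Each output frame is the maximum over frames i - width//2 .. i + width//2,
    truncated at the edges, so an even width such as 30 covers 31 frames.
    """
    b = np.asarray(b, dtype=float)
    half = width // 2
    n_frames = b.shape[-1]
    out = np.empty_like(b)
    for i in range(n_frames):
        lo = max(0, i - half)
        hi = min(n_frames, i + half + 1)
        out[..., i] = b[..., lo:hi].max(axis=-1)
    return out
```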
Normalization To make the estimate independent of the overall speech level
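The slides do not say which level measure is used for the normalization. The sketch below divides by the mean of the estimate over the whole utterance purely as a stand-in, to show the intended effect: scaling the input by a constant gain leaves the normalized estimate unchanged.

```python
import numpy as np

def normalise_level(b, eps=1e-12):
    """Divide the estimated sibilant power by an overall level.

    The level here is the mean over all bins and frames (an assumption);
    eps only guards against division by zero for an all-zero input.
    """
    b = np.asarray(b, dtype=float)
    level = b.mean()
    return b / (level + eps)
```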
Gaussian Mixture Model Each frame is scored by two Gaussian mixture models (GMMs): one trained on non-sibilant speech and the other on sibilant speech.
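A sketch of how a frame could be classified from the two GMMs by comparing their log-likelihoods. Diagonal covariances, the zero threshold and the function names are assumptions; the GMM parameters would come from training, which is not shown.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of each row of x under a diagonal-covariance GMM.

    x: (N, D) or (D,), weights: (M,), means: (M, D), variances: (M, D)
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = x[:, None, :] - means[None, :, :]                      # (N, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                         # (N, M)
    peak = log_comp.max(axis=1, keepdims=True)                    # log-sum-exp
    return peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))

def is_sibilant(x, gmm_sib, gmm_non, threshold=0.0):
    """True where the log-likelihood ratio sibilant / non-sibilant exceeds threshold."""
    llr = gmm_log_likelihood(x, *gmm_sib) - gmm_log_likelihood(x, *gmm_non)
    return llr > threshold
```

Each GMM is passed as a tuple `(weights, means, variances)`. Moving the threshold trades misses against false alarms.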
EXPERIMENTS Filter from 1.5 kHz to 8 kHz. The weighting function used three Hamming windows.
GMMs The input for the GMMs was a 14-component vector containing the estimated sibilant power spectrum from 1.5 kHz to 8 kHz every 500 Hz.
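Sampling from 1.5 kHz to 8 kHz in 500 Hz steps gives (8000 - 1500) / 500 + 1 = 14 values. The sketch below builds such a vector by linear interpolation of the estimated spectrum; the interpolation and the function name are assumptions, since the slides do not say how the spectrum is sampled.

```python
import numpy as np

def gmm_feature(freqs_hz, sibilant_power):
    """Sample the estimated sibilant power spectrum every 500 Hz from 1.5 to 8 kHz.

    freqs_hz must be increasing and should cover 1500..8000 Hz.
    """
    targets = np.arange(1500.0, 8000.0 + 1.0, 500.0)   # 14 frequencies
    return np.interp(targets, freqs_hz, sibilant_power)
```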
Result White Gaussian noise was added to the speech files. It is more difficult to detect sibilants in white noise than in other typical stationary noise.
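A sketch of adding white Gaussian noise at a chosen SNR. It measures SNR from the mean power of the whole file, which is an assumption; the experiments may have used a different level measure. The function name is hypothetical.

```python
import numpy as np

def add_white_noise(speech, snr_db, rng):
    """Return speech plus white Gaussian noise scaled to the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = rng.standard_normal(speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_speech / scaled noise power equals 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

A generator such as `np.random.default_rng(0)` can be passed as `rng` to make the noise reproducible.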
Result $P_{miss}$ = miss probability, $P_{fa}$ = false alarm probability
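The two error rates can be computed from reference labels and detector decisions as sketched below. Counting per frame is an assumption, and the function name is hypothetical.

```python
import numpy as np

def miss_and_false_alarm(truth, decision):
    """Return (P_miss, P_fa) from boolean reference labels and decisions.

    P_miss: fraction of true sibilant frames that were not detected.
    P_fa  : fraction of non-sibilant frames that were flagged as sibilant.
    Both classes must be present in truth.
    """
    truth = np.asarray(truth, dtype=bool)
    decision = np.asarray(decision, dtype=bool)
    p_miss = float(np.mean(~decision[truth]))
    p_fa = float(np.mean(decision[~truth]))
    return p_miss, p_fa
```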
Result
CONCLUSIONS We have presented a sibilant detection algorithm for noisy speech. A sibilant mean power estimation stage is followed by a likelihood ratio of two GMMs. Tested on TIMIT. 80% classification accuracy for positive SNRs.
Future Work It is possible that the classification accuracy could be further improved by applying temporal constraints to the classification decisions.
Thank you