III. Analysis of Modulation Metrics IV. Modifications


1 III. Analysis of Modulation Metrics IV. Modifications
I. Introduction

- The Speech Transmission Index (STI) predicts speech intelligibility in noisy and reverberant environments.
- It is based on the change in modulation depth between clean and degraded envelopes in the 125 Hz to 8 kHz octave bands.
- It originally used artificial probe signals and was later adapted to use speech.
- Empirical methods do not require a priori knowledge of the degradation condition.
- The short-time speech-based STI (ssSTI) algorithm was developed by Payton & Shrestha [1]:
  - reliable predictor of intelligibility using windows as short as 1/3 s
  - implemented with the Envelope Regression (ER) STI method

Applications: predict intelligibility
- in fluctuating noise environments
- for differences between talkers
- in a real-time environment

Current study:
- Focus on 0 dB SNR with stationary speech-shaped Gaussian noise degradation.
- Aim: understand and correct
  - deviation from the Theoretical (TH) method for windows shorter than 1/3 s
  - non-zero STI during silence (data points on the vertical axis)

[Figure: ssSTI vs. Theoretical STI for 80 ms, 160 ms, and 320 ms windows; axes: Theoretical Method, ssSTI]

II. Short-time STI Computation

[Block diagram: clean and degraded signals → octave bandpass filters (125 Hz band through 8 kHz band) → extract envelope: intensity (square), 50 Hz lowpass filter → envelopes x1[m] ... x7[m] (clean) and y1[m] ... y7[m] (degraded) → rectangular window → modulation metric Mk → apparent SNR aSNRk, clipped to ±15 dB → transmission index TIk → STI]

Modulation metrics (equations shown on the poster):
- Theoretical Method, MTH (requires the noise envelope mean)
- Envelope Regression Method, MER

Symbols:
- μxk: mean of the clean envelope, xk[m]
- μyk: mean of the degraded envelope, yk[m]
- μnk: mean of the noise envelope, nk[m] (not accessible in natural environments)
- Cxyk: covariance of the clean and degraded envelopes
- σxk²: variance of the clean envelope
- αk and βk: octave band weighting and redundancy correction factors, respectively

III. Analysis of Modulation Metrics

Standard Deviation of MER (σMER)
- MER is computed from a single speech string degraded by 200 realizations of noise.
- This yields 200 realizations of MER for each window.
- The mean and standard deviation of MER are computed across realizations.
- σMER is inversely proportional to window length and octave band.

[Figure: Standard deviation of MER with 50 Hz LPF vs. window length; 250 Hz, 1 kHz, and 4 kHz octave bands]

Algebraic Expansion of MER
- MER can be algebraically expanded to resemble MTH plus error (k subscript dropped).
- Equation annotations: "Zero if speech & noise uncorrelated"; "Not zero because z is correlated with speech".
- The ob subscript denotes an octave band signal.
- σMER is inversely proportional to the variance of the clean speech envelope, σx².
- Reduce σMER → reduce deviation from MTH.

[Figures: broadband speech; 1 kHz octave band speech; 1 kHz band intensity envelope; 1 kHz band intensity during silence; x-axis: Time (s)]

IV. Modifications

Increasing Envelope Fine Structure
- Adding envelope fine structure (increasing the LPF cutoff) reduces σMER.
- Low octave bands (< 1 kHz) and short windows (< 320 ms) benefit most.
- The greatest reduction occurs when the LPF cutoff is slightly higher than twice the octave band upper cutoff.

[Figure: Spectral content after squaring; 2x band center frequency; axis: f]
[Figure: LPF cutoff for 95th percentile of σMER ≤ 0.15; x-axis: Lowpass Filter Cutoff (Hz); 250 Hz, 1 kHz, and 4 kHz octave bands; 320 ms and 160 ms windows; marked cutoffs: 884 Hz LPF, 309 Hz LPF, 110 Hz LPF]

Silence Detection
- Reduces non-zero STI during silence and deviation from the TH method for low STI values.
- If μxk is below a threshold relative to the long-term mean, Mk is set to zero.
- The long-term mean is approximated with exponential smoothing.

[Figures: MER and silence detector applied to MER; 250 Hz band, 80 ms window; x-axis: Time (s)]

Combined Modifications

[Figure: Modified ssSTI vs. Theoretical STI for 320 ms, 160 ms, and 80 ms windows; axes: Theoretical Method, Modified ssSTI]

V. Conclusions

- The modifications significantly improve performance of the ssSTI for windows shorter than 320 ms and provide slight improvement for longer windows.
- For windows shorter than 160 ms, deviation from the TH method is significantly reduced but still greater than that of longer windows (e.g. 1 s).

References:
[1] K. L. Payton, M. Shrestha, 2013, "Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data," J. Acoust. Soc. Am., 134.
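The per-band quantities named in Sections II and IV can be sketched in a few lines of Python. The poster's equations did not survive in this transcript, so the formulas below are assumptions taken from the forms commonly given in the STI literature: MTH as μx / (μx + μn), MER as (μx / μy)(Cxy / σx²), and aSNR as 10 log10(M / (1 − M)) clipped to ±15 dB and mapped to a transmission index in [0, 1]. The function names and the silence-detector parameters `threshold_db` and `alpha` are hypothetical, chosen only for illustration.

```python
import numpy as np

def m_theoretical(mu_x, mu_n):
    """Assumed theoretical metric; requires the noise envelope mean."""
    return mu_x / (mu_x + mu_n)

def m_envelope_regression(x, y):
    """Assumed envelope regression metric for one window of one band.

    x, y: clean and degraded intensity envelopes (1-D arrays).
    """
    mu_x, mu_y = np.mean(x), np.mean(y)
    c_xy = np.mean((x - mu_x) * (y - mu_y))   # covariance of x and y
    var_x = np.mean((x - mu_x) ** 2)          # variance of clean envelope
    return (mu_x / mu_y) * (c_xy / var_x)

def transmission_index(m, eps=1e-6):
    """Map a modulation metric to apparent SNR (clipped to +/-15 dB), then to TI."""
    m = np.clip(m, eps, 1.0 - eps)            # keep the log argument finite
    asnr = 10.0 * np.log10(m / (1.0 - m))
    return (np.clip(asnr, -15.0, 15.0) + 15.0) / 30.0

def gate_silence(m, mu_x, threshold_db=-20.0, alpha=0.05):
    """Zero M in windows whose clean-envelope mean falls below a threshold
    relative to an exponentially smoothed long-term mean.

    threshold_db and alpha are illustrative values, not taken from the poster.
    """
    m = np.asarray(m, dtype=float).copy()
    ratio = 10.0 ** (threshold_db / 10.0)     # intensity ratio for the threshold
    long_term = float(mu_x[0])
    for i, mu in enumerate(mu_x):
        long_term = alpha * mu + (1.0 - alpha) * long_term
        if mu < ratio * long_term:
            m[i] = 0.0
    return m

# Toy check: a constant noise envelope is uncorrelated with the clean
# envelope, so the regression metric matches the theoretical one.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.5, size=64)            # synthetic clean envelope
noise_mean = 2.0
y = x + noise_mean                            # synthetic degraded envelope
m_er = m_envelope_regression(x, y)
m_th = m_theoretical(np.mean(x), noise_mean)
gated = gate_silence([0.5] * 5, [1.0, 1.0, 1.0, 0.001, 1.0])
```

In this toy case the two metrics agree because the added noise envelope is constant. With real noise and short windows, the sample covariance between speech and noise is generally not zero, which is the source of the deviation the poster analyzes.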

2 2aSC7: Error Analysis and Modifications to the Short-Time Speech Transmission Index
Karen L. Payton and Matthew J. Ferreira; Electrical & Computer Engineering Department, University of Massachusetts Dartmouth, N. Dartmouth, MA

