Published by Camilla Pearson. Modified over 8 years ago.
1
Speech Signal Representations I Seminar Speech Recognition 2002 F.R. Verhage
2
Decomposition of the speech signal x[n] as a source e[n] passed through a linear time-varying filter h[n].
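As a minimal sketch of this decomposition, the Python below treats the filter as fixed over one short segment (an assumption made for illustration; the filter on the slide is time-varying) and computes x[n] as the convolution of e[n] with h[n]. The function name `convolve` and the toy values are hypothetical.

```python
def convolve(e, h):
    """Linear convolution: x[n] = sum_k h[k] * e[n - k]."""
    x = [0.0] * (len(e) + len(h) - 1)
    for i, e_val in enumerate(e):
        for k, h_val in enumerate(h):
            x[i + k] += e_val * h_val
    return x

# Toy source: impulse train with a period of 4 samples (voiced-like excitation).
e = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
# Toy filter impulse response, assumed constant within this segment.
h = [1.0, 0.5, 0.25]

x = convolve(e, h)
print(x)
```

Each impulse in e[n] triggers one copy of h[n] in the output, which is the sense in which the filter shapes the source.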
3
Estimation of the filter, inspired by:
Speech production models
– Linear Predictive Coding (LPC)
– Cepstral analysis
Speech perception models (part II)
– Mel-frequency cepstrum
– Perceptual Linear Prediction (PLP)
Speech recognizers estimate the filter characteristics and ignore the source.
4
Short-Time Fourier Analysis
Spectrogram
– Representation of a signal, based on short-time Fourier analysis, that highlights several of its properties
– Two-dimensional: time on the horizontal axis, frequency on the vertical axis
– Third ‘dimension’: gray or color level indicating energy
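The energy values behind such a picture can be sketched as follows. This is only an illustration: it uses a direct DFT on each frame with no tapering window (i.e. a rectangular window), and the names `stft_energy`, `frame_len` and `hop` are hypothetical.

```python
import cmath
import math

def stft_energy(x, frame_len, hop):
    """Return one list per frame holding |X[k]|^2 for k = 0 .. frame_len // 2."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = x[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            X_k = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(X_k) ** 2)
        frames.append(spectrum)
    return frames

# Toy signal: a sinusoid completing 4 cycles per 32-sample frame.
x = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
S = stft_energy(x, frame_len=32, hop=16)

# S[t][k] is the energy at time frame t and frequency bin k;
# a spectrogram maps these values to gray or color levels.
peak_bins = [max(range(len(frame)), key=lambda k: frame[k]) for frame in S]
print(len(S), len(S[0]), peak_bins)
```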
5
Short-Time Fourier Analysis
Spectrogram
– Narrow band
    Long windows (> 20 ms) → narrow bandwidth
    Lower time resolution, better frequency resolution
– Wide band
    Short windows (< 10 ms) → wide bandwidth
    Good time resolution, lower frequency resolution
– Pitch synchronous
    Requires knowledge of the local pitch period
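The trade-off can be made concrete with a rough calculation. The sketch below assumes the rule of thumb that the analysis bandwidth is on the order of 1 / (window duration); the exact figure depends on the window shape, and the 16 kHz sampling rate and the function name `window_summary` are illustrative assumptions.

```python
def window_summary(duration_ms, sample_rate_hz):
    """Return (window length in samples, rough analysis bandwidth in Hz)."""
    n_samples = duration_ms * sample_rate_hz // 1000
    approx_bandwidth_hz = 1000.0 / duration_ms  # rule of thumb, not exact
    return n_samples, approx_bandwidth_hz

narrow = window_summary(25, 16000)  # long window, narrow band
wide = window_summary(5, 16000)     # short window, wide band
print(narrow, wide)
```

With pitch harmonics spaced roughly 100 Hz apart, a bandwidth of a few tens of Hz can separate individual harmonics, while a bandwidth of a few hundred Hz smears them together and instead follows fast changes in time.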
6
Short-Time Fourier Analysis
Spectrogram
7
Short-Time Fourier Analysis
Window analysis
– Series of short segments (analysis frames)
– Short enough that the signal is approximately stationary within a frame
– Frame length usually constant, 20-30 ms
– Frames may overlap
– Different types of window functions (w_m[n]):
    Rectangular (equivalent to applying no window function)
    Hamming
    Hanning
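A minimal sketch of the framing step and the three window types, assuming the common symmetric definitions with N - 1 in the denominator (other texts use N). The 8 kHz sampling rate, 25 ms frames and 50% overlap are illustrative choices, and `frames` is a hypothetical helper.

```python
import math

def rectangular(N):
    return [1.0] * N

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def hanning(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frames(x, frame_len, hop, window):
    """Cut x into overlapping frames and multiply each by the window."""
    w = window(frame_len)
    return [[x[start + n] * w[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]

sample_rate = 8000
frame_len = 25 * sample_rate // 1000  # 25 ms -> 200 samples
hop = frame_len // 2                  # 50% overlap between frames
x = [1.0] * sample_rate               # one second of a constant toy signal
out = frames(x, frame_len, hop, hamming)
print(len(out), len(out[0]))
```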
8
Short-Time Fourier Analysis
Window analysis
– Window length N must be long enough relative to the pitch period M
    Rectangular: N ≥ M
    Hamming, Hanning: N ≥ 2M
– Pitch period not known in advance → prepare for the lowest pitch, i.e. the longest pitch period
– For a lowest pitch of 50 Hz (20 ms period): at least 20 ms for rectangular or 40 ms for Hamming/Hanning
– But longer windows average the spectrum over time instead of giving distinct spectra → the rectangular window, which can be shorter, has better time resolution
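The arithmetic behind the 20 ms and 40 ms figures, as a small sketch. The function name `min_window_ms` is hypothetical; the factors 1 and 2 are the N ≥ M and N ≥ 2M conditions from the slide.

```python
def min_window_ms(lowest_pitch_hz, window="rectangular"):
    """Shortest window (in ms) covering the required number of pitch periods."""
    pitch_period_ms = 1000.0 / lowest_pitch_hz
    periods_needed = 1 if window == "rectangular" else 2  # N >= M vs. N >= 2M
    return periods_needed * pitch_period_ms

print(min_window_ms(50))             # rectangular window, lowest pitch 50 Hz
print(min_window_ms(50, "hamming"))  # Hamming/Hanning window, lowest pitch 50 Hz
```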
9
Short-Time Fourier Analysis
16
Window analysis
– Frequency response is not completely zero outside the main lobe → spectral leakage
– Highest side lobe of a Hamming window is approx. 43 dB below the main lobe → less spectral leakage
– Hamming, Hanning and triangular windows offer less spectral leakage than the rectangular window →
– Rectangular windows are rarely used despite their better time resolution
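The side-lobe levels can be checked numerically. The sketch below evaluates the window's frequency response on a dense grid, walks down the main lobe to its first minimum, and reports the highest remaining lobe relative to the peak. The window length of 64, the grid size and the helper name `peak_sidelobe_db` are illustrative assumptions.

```python
import cmath
import math

def rectangular(N):
    return [1.0] * N

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def peak_sidelobe_db(w, grid=2048):
    """Highest side lobe of |W(e^{jw})| relative to the main-lobe peak, in dB."""
    mags = []
    for i in range(grid + 1):
        omega = math.pi * i / grid
        W = sum(w_n * cmath.exp(-1j * omega * n) for n, w_n in enumerate(w))
        mags.append(abs(W))
    # Walk down the main lobe until the magnitude starts rising again.
    i = 1
    while i < grid and mags[i] <= mags[i - 1]:
        i += 1
    return 20.0 * math.log10(max(mags[i:]) / mags[0])

rect_db = peak_sidelobe_db(rectangular(64))
hamm_db = peak_sidelobe_db(hamming(64))
print(round(rect_db, 1), round(hamm_db, 1))
```

A rectangular result near -13 dB and a Hamming result in the low -40s would be consistent with the figure quoted on the slide.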
17
Short-Time Fourier Analysis
21
Short-time spectrum of male voice speech
a) Time signal /ah/, local pitch 110 Hz
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window
22
Short-Time Fourier Analysis
Short-time spectrum of female voice speech
a) Time signal /aa/, local pitch 200 Hz
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window
23
Short-Time Fourier Analysis
Short-time spectrum of unvoiced speech
a) Time signal
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window
24
Linear Predictive Coding
LPC, a.k.a. auto-regressive (AR) modeling
An all-pole filter is a good approximation of speech, with p the order of the LPC analysis
Predicts the current sample as a linear combination of the past p samples
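The model equations did not survive in this transcript. In the usual textbook form, and assuming the sign convention in which the predictor coefficients a_k enter the denominator with a minus sign, they can be written as:

```latex
\[
H(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}
\quad\Longleftrightarrow\quad
x[n] = \sum_{k=1}^{p} a_k\, x[n-k] + e[n]
\]
\[
\tilde{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k],
\qquad
e[n] = x[n] - \tilde{x}[n]
\]
```

Here \tilde{x}[n] is the predicted sample and e[n] doubles as the prediction error, which is why the error is expected to resemble the excitation.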
25
Linear Predictive Coding
To estimate the predictor coefficients (a_k), use a short-term analysis technique
Per segment, choose the coefficients that minimize the total squared prediction error
Take the derivative with respect to each a_k and set it to 0; this gives a set of p linear equations: the Yule-Walker equations
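A sketch of that derivation for one analysis segment, assuming the predictor \tilde{x}[n] = \sum_k a_k x[n-k] and writing \phi[i,k] for the correlation terms (this notation is an assumption, not taken from the slides):

```latex
\[
E = \sum_{n} \Bigl( x[n] - \sum_{k=1}^{p} a_k\, x[n-k] \Bigr)^{2}
\]
\[
\frac{\partial E}{\partial a_i}
  = -2 \sum_{n} x[n-i] \Bigl( x[n] - \sum_{k=1}^{p} a_k\, x[n-k] \Bigr) = 0,
\qquad i = 1, \dots, p
\]
\[
\sum_{k=1}^{p} a_k\, \phi[i,k] = \phi[i,0],
\qquad
\phi[i,k] = \sum_{n} x[n-i]\, x[n-k]
\]
```

The range of the sum over n is what distinguishes the solution methods: different choices give the matrix of \phi[i,k] values a different structure.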
26
Linear Predictive Coding
Solution of the Yule-Walker equations:
– Any standard matrix inversion package
– Due to the special form of the matrix, efficient solutions exist:
    Covariance method, using the Cholesky decomposition
    Autocorrelation method, using windows; results in equations with Toeplitz matrices, solved by the Durbin recursion algorithm
    Lattice method, equivalent to the Levinson-Durbin recursion; often used in fixed-point implementations because lack of precision does not result in unstable filters
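A minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, assuming the predictor convention x~[n] = sum_k a[k] x[n-k]. The tapering window is omitted for brevity, and the function names and the toy signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] over one segment."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve sum_k a[k] * r[|i - k|] = r[i] for i = 1..p.

    Returns (predictor coefficients a[1..p], reflection coefficients, final error).
    """
    a = [0.0] * (p + 1)  # a[0] is unused
    error = r[0]
    reflection = []
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], reflection, error

# Toy segment: impulse response of a one-pole filter with coefficient 0.5.
x = [0.5 ** n for n in range(50)]
r = autocorrelation(x, 2)
a, reflection, error = levinson_durbin(r, 2)
print(a, reflection, error)
```

For this toy segment the recursion should recover a first coefficient close to 0.5 and a second close to 0, since one past sample already explains the signal.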
27
Linear Predictive Coding
29
Spectral analysis via LPC
– All-pole (IIR) filter
– Spectral peaks at the frequencies where roots of the denominator (the poles) lie close to the unit circle
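A sketch of how the LPC spectrum follows from the coefficients, assuming H(z) = gain / (1 - sum_k a[k] z^-k). The second-order predictor is a made-up example with a pole pair at radius 0.95 and angle pi/4, chosen so the peak location is known in advance; `lpc_spectrum_db` is a hypothetical helper.

```python
import cmath
import math

def lpc_spectrum_db(a, n_points=256, gain=1.0):
    """|H(e^{jw})| in dB on n_points frequencies w in [0, pi)."""
    out = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = 1.0 - sum(a_k * cmath.exp(-1j * w * (k + 1)) for k, a_k in enumerate(a))
        out.append(20.0 * math.log10(gain / abs(A)))
    return out

# Hypothetical predictor: 1 - a1 z^-1 - a2 z^-2 with poles at r * exp(+/- j*theta).
r, theta = 0.95, math.pi / 4
a = [2 * r * math.cos(theta), -r * r]

spec = lpc_spectrum_db(a)
peak_bin = max(range(len(spec)), key=lambda i: spec[i])
peak_angle = math.pi * peak_bin / len(spec)
print(peak_angle, theta)
```

The spectral peak lands close to the pole angle; the closer the pole is to the unit circle, the sharper the peak.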
30
Linear Predictive Coding
Prediction error
– Should be (approximately) the excitation
– Unvoiced speech: expect white noise; OK
– Voiced speech: expect an impulse train; not OK
    All-pole assumption not altogether valid
    Real speech not perfectly periodic
    Pitch-synchronous analysis gives better results
– LPC order
    Larger p gives lower prediction error
    Too large a p results in fitting the individual harmonics → poorer separation between filter and source
31
Linear Predictive Coding
Prediction error
– Inverse LPC filter gives the residual signal
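A sketch of inverse filtering under idealized conditions: a toy signal is synthesized with a known all-pole filter, then passed through the inverse filter A(z) = 1 - sum_k a[k] z^-k. The coefficients, the impulse-train excitation and the function names are illustrative assumptions; with real speech the residual only approximates the excitation.

```python
def synthesize(e, a):
    """All-pole filter: x[n] = e[n] + sum_k a[k] * x[n - 1 - k]."""
    x = []
    for n in range(len(e)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        x.append(e[n] + pred)
    return x

def lpc_residual(x, a):
    """Inverse filter: e[n] = x[n] - sum_k a[k] * x[n - 1 - k]."""
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e

a = [0.9, -0.5]  # hypothetical stable 2nd-order predictor
excitation = [1.0 if n % 8 == 0 else 0.0 for n in range(32)]
x = synthesize(excitation, a)
residual = lpc_residual(x, a)
print([round(v, 6) for v in residual[:9]])
```

Because the same coefficients are used for synthesis and analysis here, the residual matches the impulse train up to rounding.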
32
Linear Predictive Coding
Alternatives for the predictor coefficients
– Line Spectral Frequencies
    Local sensitivity
    Efficiency
– Reflection Coefficients
    Guaranteed stable → useful when coefficients are interpolated over time
– Log-area ratios
    Flat spectral sensitivity
– Roots of the polynomial
    Represent resonance frequencies and bandwidths
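For the last representation, a sketch of how one pole pair maps to a resonance frequency and bandwidth. It is restricted to a second-order polynomial so the roots follow from the quadratic formula, and it uses the commonly quoted approximation bandwidth = -ln(|pole|) * fs / pi. The function name, the coefficients and the 8 kHz sampling rate are assumptions.

```python
import cmath
import math

def second_order_resonance(a1, a2, sample_rate_hz):
    """Frequency and approximate bandwidth (Hz) of the pole pair of
    A(z) = 1 - a1 z^-1 - a2 z^-2, i.e. the roots of z^2 - a1 z - a2 = 0."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    pole = (a1 + disc) / 2
    freq = abs(cmath.phase(pole)) * sample_rate_hz / (2 * math.pi)
    bandwidth = -math.log(abs(pole)) * sample_rate_hz / math.pi
    return freq, bandwidth

# Hypothetical coefficients with poles at radius 0.95 and angle pi/4.
r, theta = 0.95, math.pi / 4
a1, a2 = 2 * r * math.cos(theta), -r * r
freq, bandwidth = second_order_resonance(a1, a2, 8000)
print(freq, bandwidth)
```

The pole angle sets the resonance frequency and the distance from the unit circle sets the bandwidth; higher-order polynomials need a general root finder.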
33
Cepstral Processing
– A homomorphic transformation D converts a convolution into a sum: if x[n] = e[n] * h[n], then D(x[n]) = D(e[n]) + D(h[n])
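The convolution-to-sum property can be checked numerically with the real cepstrum (inverse DFT of the log magnitude spectrum). This sketch uses circular convolution so that the DFT identity holds exactly, and toy sequences chosen so that no DFT bin is zero; both are assumptions made for the illustration.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def real_cepstrum(x):
    """Inverse DFT of log|X[k]|."""
    N = len(x)
    log_mag = [math.log(abs(X_k)) for X_k in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N
            for n in range(N)]

def circular_convolve(e, h):
    N = len(e)
    return [sum(e[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]

N = 16
e = [1.0] + [0.0] * 7 + [0.5] + [0.0] * 7  # toy source with two pulses
h = [0.6 ** n for n in range(N)]           # toy decaying filter response
x = circular_convolve(e, h)

c_x, c_e, c_h = real_cepstrum(x), real_cepstrum(e), real_cepstrum(h)
max_diff = max(abs(cx - (ce + ch)) for cx, ce, ch in zip(c_x, c_e, c_h))
print(max_diff)
```

The difference should be at rounding-error level: in the cepstral domain the source and filter contributions add instead of convolving.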
34
Cepstral Processing