Speech Recognition Chapter 3


1 Speech Recognition Chapter 3

2 Speech Front-Ends
Linear Prediction Analysis; Linear-Prediction-Based Processing; Cepstral Analysis; Auditory Signal Processing

3 Linear Prediction Analysis
Introduction; Linear Prediction Model; Linear Prediction Coefficients Computation; Linear Prediction for Automatic Speech Recognition; Linear Prediction in Speech Processing; How Good Is the LP Model?

4 Signal Processing Front End
Converts the speech waveform s(k) into some type of parametric representation, the observation sequence O = o(1) o(2) ... o(T). Two common choices are a filter-bank front end and a linear prediction front end, whose output is a set of linear prediction coefficients.

5 Introduction: Over short intervals, linear prediction provides a good model of speech. It is mathematically precise and simple, easy to implement in software or hardware, and works well for recognition applications. It also has applications in formant and pitch estimation, speech coding, and synthesis.

6 Linear Prediction Model
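Basic idea: the current sample is predicted as a linear combination of the past M samples,
$$\hat{s}(n) = \sum_{k=1}^{M} a_k\, s(n-k),$$
where $a_1, \dots, a_M$ are called the LP (linear prediction) coefficients. By including the excitation signal, we obtain
$$s(n) = \sum_{k=1}^{M} a_k\, s(n-k) + G\, u(n),$$
where $u(n)$ is the normalised excitation and $G$ is the gain of the excitation.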
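A minimal, self-contained sketch of this model in plain Python (no external libraries; the function names are hypothetical): `lp_synthesize` runs the all-pole recursion driven by the excitation G u(n), and `lp_residual` applies the inverse operation to recover u(n). Samples before n = 0 are assumed to be zero.

```python
def lp_synthesize(u, a, gain=1.0):
    """All-pole model: s(n) = sum_{k=1}^{M} a_k s(n-k) + G u(n).

    u: excitation u(0..T-1); a: [a_1, ..., a_M]; samples before n = 0 are zero.
    """
    s = []
    for n in range(len(u)):
        value = gain * u[n]
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                value += a_k * s[n - k]
        s.append(value)
    return s


def lp_residual(s, a, gain=1.0):
    """Inverse filtering: u(n) = (s(n) - sum_{k=1}^{M} a_k s(n-k)) / G."""
    u = []
    for n in range(len(s)):
        prediction = sum(a_k * s[n - k] for k, a_k in enumerate(a, start=1) if n - k >= 0)
        u.append((s[n] - prediction) / gain)
    return u


# Usage: a two-pulse excitation through a stable second-order model, then back again.
excitation = [1.0] + [0.0] * 19 + [1.0] + [0.0] * 19
speech = lp_synthesize(excitation, [1.3, -0.6], gain=2.0)
recovered = lp_residual(speech, [1.3, -0.6], gain=2.0)
```

When the coefficients are known exactly, the prediction error is just the scaled excitation, which is why the error signal reveals the excitation structure.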

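7 In the z-domain (Sec. 1.1.4, p. 15, Deller):
$$S(z) = \sum_{k=1}^{M} a_k z^{-k} S(z) + G\, U(z),$$
leading to the transfer function (Fig. 3.27)
$$H(z) = \frac{S(z)}{G\, U(z)} = \frac{1}{1 - \sum_{k=1}^{M} a_k z^{-k}} = \frac{1}{A(z)}.$$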

8 The LP model retains the spectral magnitude, but it is minimum phase (Sec. 1.1.7, Deller).
In practice, however, phase is not very important for speech perception. Observation: H(z) also models the glottal filter G(z) and the lip radiation R(z), not only the vocal tract.

9 Linear Prediction Coefficients Computation
Introduction; Methodologies

10 Linear Prediction Coefficients Computation
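LP coefficients can be obtained by minimising the mean squared prediction error
$$E = \sum_n e^2(n) = \sum_n \Big( s(n) - \sum_{k=1}^{M} a_k\, s(n-k) \Big)^2 .$$
Setting $\partial E / \partial a_i = 0$ gives the following system of equations (Sec. …, proof …):
$$\sum_{k=1}^{M} a_k\, \phi(i,k) = \phi(i,0), \qquad 1 \le i \le M, \qquad \phi(i,k) = \sum_n s(n-i)\, s(n-k).$$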

11 Methodologies: the autocorrelation method and the covariance method.
The covariance method is not commonly used in speech recognition.

12 Autocorrelation Method
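Assumption: each frame is analysed independently, with the signal taken to be zero outside the frame (Fig. …). Solution (Juang, Sec. …, p. …):
$$\sum_{k=1}^{M} a_k\, r(|i-k|) = r(i), \qquad 1 \le i \le M,$$
where
$$r(i) = \sum_{n=0}^{N-1-i} s(n)\, s(n+i) \tag{2}$$
and M is the number of LPC parameters. These equations are known as the Yule-Walker equations.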
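The autocorrelation method can be sketched in plain Python as follows, assuming the N-sample frame has already been windowed; `autocorrelation` and `yule_walker_system` are illustrative names, not a library API. Building the matrix from r(|i-k|) makes its symmetric Toeplitz structure explicit.

```python
def autocorrelation(frame, M):
    """r(i) = sum_{n=0}^{N-1-i} s(n) s(n+i) for i = 0..M.

    Samples outside the N-sample frame are taken as zero (autocorrelation method).
    """
    N = len(frame)
    return [sum(frame[n] * frame[n + i] for n in range(N - i)) for i in range(M + 1)]


def yule_walker_system(r):
    """Return the M x M matrix with entries r(|i-k|) and the right-hand side r(1..M)."""
    M = len(r) - 1
    R = [[r[abs(i - k)] for k in range(M)] for i in range(M)]
    return R, r[1:]


r = autocorrelation([1.0, 2.0, 3.0], M=2)   # [14.0, 8.0, 3.0]
R, rhs = yule_walker_system(r)               # [[14.0, 8.0], [8.0, 14.0]], [8.0, 3.0]
```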

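13 Using matrix notation:
$$\begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & r(M-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}$$
or
$$\mathbf{R}\, \mathbf{a} = \mathbf{r}.$$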

14 Features: the matrix is symmetric, and all the elements along each diagonal are equal, i.e., it is a Toeplitz matrix.

15 This matrix is known as a Toeplitz matrix. A linear system with this matrix can be solved very efficiently: Durbin's recursion needs on the order of M² operations instead of the M³ of general Gaussian elimination. Examples: Fig. … and …; Fig. …; Fig. …; Fig. ….

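16 Linear Prediction for Automatic Speech Recognition
Processing chain: preemphasis (flattens the spectrum) → frame blocking → windowing (to minimise signal discontinuity) → autocorrelation analysis (equation (2), usually M = 8) → LPC analysis (Durbin algorithm) → LPC parameter conversion (to cepstral coefficients) → parameter weighting (to minimise noise sensitivity) → temporal cepstral derivative (to incorporate signal dynamics).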

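17 Preemphasis: The transfer function of the glottis can be modelled as a two-pole low-pass filter:
$$G(z) = \frac{1}{(1 - \beta z^{-1})^2}, \qquad \beta \lesssim 1.$$
The radiation effect at the lips can be modelled as a first-order differentiator:
$$R(z) \approx 1 - z^{-1}.$$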

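18 Hence, since the radiation zero approximately cancels one glottal pole, to obtain the transfer function of the vocal tract the other pole must be cancelled with a preemphasis filter:
$$P(z) = 1 - a z^{-1}, \qquad a \approx 1.$$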

19 Preemphasis should be done only for sonorant sounds.
This process can be automated by adapting the preemphasis coefficient to each frame:
$$a = \frac{r(1)}{r(0)},$$
where $r(i)$ is the autocorrelation function.
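A sketch of first-order preemphasis y(n) = s(n) - a s(n-1) together with the frame-adaptive coefficient a = r(1)/r(0). The default a = 0.95 is an assumed typical fixed value, and the function names are hypothetical. A low-pass (sonorant-like) frame gives a close to 1, while a frame dominated by high frequencies gives a small or negative value, so it is barely emphasised.

```python
def preemphasis(s, a=0.95):
    """y(n) = s(n) - a s(n-1), with s(-1) taken as 0."""
    return [s[n] - (a * s[n - 1] if n > 0 else 0.0) for n in range(len(s))]


def adaptive_preemphasis_coefficient(s):
    """a = r(1) / r(0) for one frame (0.0 for an all-zero frame)."""
    r0 = sum(x * x for x in s)
    r1 = sum(s[n] * s[n + 1] for n in range(len(s) - 1))
    return r1 / r0 if r0 > 0 else 0.0


frame = [1.0, 1.0, 1.0, 1.0]                       # slowly varying frame
a = adaptive_preemphasis_coefficient(frame)        # 0.75
emphasised = preemphasis(frame, a)
```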

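20 Frame Blocking: frames of N samples, with adjacent frames separated by a shift of M samples (here M denotes the frame shift, not the LPC order).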
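Frame blocking can be sketched as overlapping slices; this hypothetical helper (an assumption, not from the slides) drops trailing samples that do not fill a complete frame.

```python
def frame_blocking(s, N, M):
    """Split s into frames of N samples whose starting points are M samples apart."""
    return [s[start:start + N] for start in range(0, len(s) - N + 1, M)]


frames = frame_blocking(list(range(10)), N=4, M=3)
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With M < N, consecutive frames overlap by N - M samples.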

21 Windowing minimises signal discontinuities at the edges of the frames.
A typical window is the Hamming window, $w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$, $0 \le n \le N-1$.
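A minimal sketch of Hamming windowing with the standard coefficients 0.54 and 0.46; the helper names are hypothetical. The window tapers to 0.08 at both ends and peaks at 1 in the middle.

```python
import math


def hamming(N):
    """w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), 0 <= n <= N - 1."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def window_frame(frame):
    """Multiply a frame sample by sample by a Hamming window of the same length."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]


w = hamming(5)
windowed = window_frame([2.0] * 5)
```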


23 LPC Analysis: converts the autocorrelation coefficients into the LPC "parameter set", which may consist of the LPC coefficients, the reflection (PARCOR) coefficients, or the log area ratio coefficients. The formal method for obtaining the LPC parameter set is known as Durbin's method.

24 Durbin’s method
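A sketch of Durbin's recursion for the Yule-Walker equations in its common textbook form, returning the LPC coefficients, the reflection (PARCOR) coefficients and the final prediction error energy. The function name and return layout are assumptions; the recursion itself uses O(M²) operations.

```python
def durbin(r, M):
    """Solve sum_k a_k r(|i-k|) = r(i), i = 1..M, by the Levinson-Durbin recursion.

    E(0) = r(0)
    k_i = [r(i) - sum_{j=1}^{i-1} alpha_j^(i-1) r(i-j)] / E(i-1)
    alpha_i^(i) = k_i
    alpha_j^(i) = alpha_j^(i-1) - k_i alpha_{i-j}^(i-1),  1 <= j <= i-1
    E(i) = (1 - k_i^2) E(i-1)
    Returns (a, k, E) with a = [a_1..a_M], k = [k_1..k_M], E = E(M).
    """
    E = r[0]
    alpha = []          # alpha^(i-1)_1 .. alpha^(i-1)_{i-1}
    ks = []
    for i in range(1, M + 1):
        acc = r[i] - sum(alpha[j - 1] * r[i - j] for j in range(1, i))
        k = acc / E
        new_alpha = [alpha[j - 1] - k * alpha[i - j - 1] for j in range(1, i)]
        new_alpha.append(k)
        alpha = new_alpha
        ks.append(k)
        E = (1.0 - k * k) * E
    return alpha, ks, E


a, k, E = durbin([14.0, 8.0, 3.0], 2)
```

For r = [14, 8, 3] this gives a ≈ [0.667, -0.167], the same solution as solving the 2 × 2 Toeplitz system directly, and |k_i| < 1 at each stage.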


26 LPC (Typical values)

27 LPC Parameter Conversion
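Conversion to cepstral coefficients, a robust feature set for speech recognition. Algorithm:
$$c_0 = \ln \sigma^2,$$
$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le M,$$
$$c_m = \sum_{k=m-M}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > M,$$
where $\sigma^2$ is the gain term of the LPC model.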
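A direct transcription of the LPC-to-cepstrum recursion (c_0 = ln σ²; c_m = a_m + Σ (k/m) c_k a_{m-k} for m ≤ M, and the same sum without the a_m term for m > M). The function name and the `gain_sq` argument standing for σ² are assumptions.

```python
import math


def lpc_to_cepstrum(a, Q, gain_sq=1.0):
    """Return [c_0, c_1, ..., c_Q] from LPC coefficients a = [a_1, ..., a_M]."""
    M = len(a)
    c = [math.log(gain_sq)] + [0.0] * Q
    for m in range(1, Q + 1):
        acc = a[m - 1] if m <= M else 0.0
        # k runs from 1 (m <= M) or from m - M (m > M) up to m - 1, so a_{m-k} stays in range.
        for k in range(max(1, m - M), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c


c = lpc_to_cepstrum([0.5], Q=4)
```

For a single pole, a = [α], the recursion reproduces the known closed form c_n = αⁿ/n.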

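28 Parameter Weighting: low-order cepstral coefficients are highly sensitive to the overall spectral slope, and high-order cepstral coefficients are sensitive to noise, so the cepstral coefficients are weighted by a tapered window:
$$\hat{c}_m = w_m\, c_m, \qquad w_m = 1 + \frac{Q}{2}\sin\!\left(\frac{\pi m}{Q}\right), \qquad 1 \le m \le Q.$$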
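A sketch of the raised-sine cepstral weighting (lifter) w_m = 1 + (Q/2) sin(πm/Q), which de-emphasises both the lowest- and the highest-order coefficients. The weighting form and Q = 12 in the example are assumptions based on common practice; the helper names are hypothetical.

```python
import math


def lifter_weights(Q):
    """w_m = 1 + (Q/2) sin(pi m / Q), m = 1..Q."""
    return [1.0 + (Q / 2.0) * math.sin(math.pi * m / Q) for m in range(1, Q + 1)]


def weight_cepstrum(c, Q):
    """c = [c_1, ..., c_Q] (c_0 excluded); returns [w_1 c_1, ..., w_Q c_Q]."""
    return [w * c_m for w, c_m in zip(lifter_weights(Q), c)]


w = lifter_weights(12)          # peaks at m = 6 with weight 7
weighted = weight_cepstrum([1.0] * 12, 12)
```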

29 Temporal Cepstral Derivative
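First- or second-order derivatives are enough. The derivative can be approximated by a regression over $2K+1$ frames:
$$\frac{\partial c_m(t)}{\partial t} \approx \Delta c_m(t) = \mu \sum_{k=-K}^{K} k\, c_m(t+k),$$
where $\mu$ is a normalisation constant.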
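A sketch of the regression-based temporal derivative. The choices μ = 1/Σk² (which makes the result the least-squares slope), K = 2, and replicating the edge frames at both ends are assumptions, not taken from the slide.

```python
def delta(frames, K=2):
    """Delta c(t) = mu * sum_{k=-K}^{K} k c(t+k) for every frame t.

    frames: list of equal-length cepstral vectors c(t).
    mu = 1 / sum k^2; frames past either end are replaced by the edge frame.
    """
    T = len(frames)
    mu = 1.0 / sum(k * k for k in range(-K, K + 1))
    deltas = []
    for t in range(T):
        acc = [0.0] * len(frames[t])
        for k in range(-K, K + 1):
            neighbour = frames[min(max(t + k, 0), T - 1)]
            for j, value in enumerate(neighbour):
                acc[j] += k * value
        deltas.append([mu * v for v in acc])
    return deltas


ramp = [[2.0 * t, 5.0] for t in range(10)]   # first coefficient rises by 2 per frame
d = delta(ramp)
```

Applying `delta` to its own output gives a second-order derivative.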



32 Given

33 Hamming windowed: large prediction errors occur, since the speech is predicted from previous samples that are arbitrarily set to zero.

34 Large prediction errors, since the speech is predicted from previous samples that are arbitrarily set to zero.

35 Unvoiced signals are not position sensitive: the prediction error does not show any special effect at the edges.

36 Observe the "whitening" phenomenon in the error spectrum.

37 Observe the "whitening" phenomenon in the error spectrum.

38 Observe the periodicity of the error waveform, which is taken as the basis for pitch estimators.

39 Observe that the prediction error decreases sharply for small values of M (M = 1...4), and that the unvoiced signal has a higher RMS error.

40 Observe the ability of the all-pole model to match the spectrum.

41 Linear Prediction in Speech Processing
LPC for Vocal Tract Shape Estimation; LPC for Pitch Detection; LPC for Formant Detection

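42 LPC for Vocal Tract Shape Estimation
Processing chain, as in the recognition front end: preemphasis (so that the estimate is free of glottis and radiation effects) → frame blocking and windowing (to minimise signal discontinuity) → parameter calculation → vocal tract shape estimation. As in recognition, the parameters can also be converted to cepstral coefficients and weighted to minimise noise sensitivity.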

43 Parameter Calculation
Durbin's method (as in speech recognition): if this method is used, the autocorrelation analysis must be performed first. Alternatively: lattice filter.

44 Lattice Filter: the reflection coefficients are obtained directly from the signal, avoiding the autocorrelation analysis. Methods: Itakura-Saito (PARCOR), Burg, and new forms. Advantage: easier to implement in hardware. Disadvantage: needs around 5 times more computation.

45 Itakura-Saito (PARCOR)
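$$k_i = \frac{\sum_n f_{i-1}(n)\, b_{i-1}(n-1)}{\sqrt{\sum_n f_{i-1}^2(n)\, \sum_n b_{i-1}^2(n-1)}},$$
where $f_{i-1}(n)$ and $b_{i-1}(n)$ are the forward and backward prediction errors of order $i-1$ (with $f_0(n) = b_0(n) = s(n)$, $f_i(n) = f_{i-1}(n) - k_i\, b_{i-1}(n-1)$ and $b_i(n) = b_{i-1}(n-1) - k_i\, f_{i-1}(n)$), and each sum accumulates over time (n). It can be shown that the PARCOR coefficients obtained with the Itakura-Saito method are exactly the same as the reflection coefficients obtained with the Levinson-Durbin algorithm. Example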

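46 Burg:
$$k_i = \frac{2 \sum_n f_{i-1}(n)\, b_{i-1}(n-1)}{\sum_n f_{i-1}^2(n) + \sum_n b_{i-1}^2(n-1)},$$
where $f_{i-1}(n)$ and $b_{i-1}(n)$ are the forward and backward prediction errors of order $i-1$. Example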
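A sketch of the lattice computation of reflection coefficients with both normalisations (Itakura-Saito and Burg), using the order-update f_i(n) = f_{i-1}(n) - k_i b_{i-1}(n-1), b_i(n) = b_{i-1}(n-1) - k_i f_{i-1}(n). It assumes the frame is zero-extended so that the sums cover the full extent of the error sequences; under that assumption the forward and backward error energies are equal, and both methods agree with Durbin's recursion (for s = [1, 2, 3], all give k = [4/7, -1/6]). If the sums are restricted to the frame interior, the methods generally differ. The function name and `method` argument are hypothetical.

```python
import math


def lattice_reflection(s, M, method="burg"):
    """Reflection coefficients k_1..k_M computed directly from the samples s."""
    f = [float(x) for x in s] + [0.0] * M   # f_0(n) = s(n), zero-extended
    b = list(f)                             # b_0(n) = s(n)
    ks = []
    for _ in range(M):
        b_prev = [0.0] + b[:-1]             # b_{i-1}(n-1); the dropped sample is always zero
        cross = sum(x * y for x, y in zip(f, b_prev))
        ef = sum(x * x for x in f)
        eb = sum(y * y for y in b_prev)
        if method == "itakura":
            k = cross / math.sqrt(ef * eb)  # normalised cross-correlation (PARCOR)
        elif method == "burg":
            k = 2.0 * cross / (ef + eb)     # harmonic-mean style normalisation
        else:
            raise ValueError("method must be 'itakura' or 'burg'")
        ks.append(k)
        # Lattice order update; both lists use the old f and b_prev.
        f, b = ([x - k * y for x, y in zip(f, b_prev)],
                [y - k * x for x, y in zip(f, b_prev)])
    return ks


k_itakura = lattice_reflection([1.0, 2.0, 3.0], 2, method="itakura")
k_burg = lattice_reflection([1.0, 2.0, 3.0], 2, method="burg")
```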

47 Example: Itakura-Saito vs. Burg

48 New Forms: Strobach, "New forms of Levinson and Schur algorithms," IEEE Signal Processing Magazine, pp. …, 1991.

49 Vocal Tract Shape Estimation
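From the relation between the reflection coefficients and the areas of adjacent tube sections,
$$k_i = \frac{A_i - A_{i+1}}{A_i + A_{i+1}},$$
we obtain
$$A_{i+1} = A_i\, \frac{1 - k_i}{1 + k_i}.$$
Therefore, by setting the lip area to an arbitrary value, we can obtain the vocal tract configuration relative to this initial condition. This technique has been successfully used to train deaf persons.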
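A sketch of the area recursion A_{i+1} = A_i (1 - k_i)/(1 + k_i), starting from an arbitrary lip area. The sign convention for k_i and the choice of the first section as the lips are assumptions (conventions differ between texts), and the function name is hypothetical. Because every area scales with the starting value, only the relative shape is meaningful.

```python
def vocal_tract_areas(k, first_area=1.0):
    """Areas A_1..A_{M+1} from reflection coefficients k_1..k_M (|k_i| < 1).

    Hypothetical convention: A_1 is the (arbitrarily chosen) lip area and
    A_{i+1} = A_i (1 - k_i) / (1 + k_i).
    """
    areas = [float(first_area)]
    for k_i in k:
        areas.append(areas[-1] * (1.0 - k_i) / (1.0 + k_i))
    return areas


areas = vocal_tract_areas([0.5, -0.5])   # approximately [1.0, 0.333, 1.0]
```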

50 LPC for Pitch Detection
Processing chain: speech sampled at 10 kHz → low-pass filter (800 Hz) → 5:1 downsampler → LPC analysis → inverse filtering with A(z) → autocorrelation → peak finding → V/U decision or pitch.

51 LPC for Formant Detection
Processing chain: sampled speech → LPC analysis → LPC spectrum → peak emphasis (second derivative) → peak finding → formants.

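52 LPC Spectrum: LP assumes that the vocal tract system can be modelled with an all-pole system:
$$H(z) = \frac{G}{A(z)} = \frac{G}{1 - \sum_{k=1}^{M} a_k z^{-k}}.$$
The spectrum can be obtained by evaluating $H(z)$ on the unit circle, $z = e^{j\omega}$:
$$H(e^{j\omega}) = \frac{G}{1 - \sum_{k=1}^{M} a_k e^{-j\omega k}}.$$
In order to emphasise the formant peaks, we can set …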

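53 Spectrum (DTFT):
$$H(e^{j\omega}) = \frac{G}{\sum_{n=0}^{M} \tilde{a}(n)\, e^{-j\omega n}}, \qquad \tilde{a}(0) = 1, \quad \tilde{a}(n) = -a_n \ (1 \le n \le M).$$
Spectrum (DFT), sampling at $\omega_k = 2\pi k / N$:
$$H(k) = \frac{G}{\sum_{n=0}^{N-1} \tilde{a}(n)\, e^{-j 2\pi k n / N}}, \qquad 0 \le k \le N-1.$$
In order to increase the spectral resolution (i.e., to sample the spectrum more densely), we pad with zeros, $\tilde{a}(n) = 0$ for $M < n \le N-1$, and choose $N$ as a power of two in order to use an FFT algorithm.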

54 Calculate the DFT magnitude $|A(k)|$ of the zero-padded inverse-filter coefficients $1, -a_1, \dots, -a_M, 0, \dots, 0$.
Invert it: $|H(k)| = G / |A(k)|$. This spectrum is called the LPC spectrum.
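A sketch of the LPC spectrum computed as described: zero-pad the inverse-filter coefficients, take the DFT, and invert the magnitude. A direct DFT is used for clarity (in practice an FFT would be used), and `spectral_peaks` is a hypothetical, crude local-maximum picker standing in for the formant peak-finding stage.

```python
import cmath
import math


def lpc_spectrum(a, G=1.0, nfft=256):
    """|H(k)| = G / |A(k)| for k = 0..nfft/2, A(k) the DFT of [1, -a_1, ..., -a_M, 0, ...]."""
    coeffs = [1.0] + [-a_k for a_k in a]
    coeffs += [0.0] * (nfft - len(coeffs))       # zero padding: denser spectral sampling
    spectrum = []
    for k in range(nfft // 2 + 1):
        A_k = sum(c * cmath.exp(-2j * math.pi * k * n / nfft) for n, c in enumerate(coeffs))
        spectrum.append(G / abs(A_k))
    return spectrum


def spectral_peaks(mag):
    """Indices of strict local maxima of a magnitude spectrum."""
    return [k for k in range(1, len(mag) - 1) if mag[k - 1] < mag[k] > mag[k + 1]]


# Usage: a single resonance (pole pair at radius 0.95, angle pi/4) should peak near bin 32.
r, theta = 0.95, math.pi / 4
resonance = lpc_spectrum([2 * r * math.cos(theta), -r * r])
peaks = spectral_peaks(resonance)
```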

55 How Good Is the LP Model? As shown by the physiological analysis of the vocal tract, the speech model is a pole-zero system:
$$H(z) = G\, \frac{1 - \sum_{l=1}^{L} b_l z^{-l}}{1 - \sum_{k=1}^{R} a_k z^{-k}}.$$
However, it can be shown (…) that the LP model is good for estimating the magnitude of a pole-zero system.

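56 Proof: According to Lemma 1 (…) and Lemma 2 (…), $H(z)$ can be written as follows:
$$H(z) = H_{\min}(z)\, H_{\mathrm{ap}}(z) = \frac{G}{1 - \sum_{k=1}^{M} a_k z^{-k}}\, H_{\mathrm{ap}}(z),$$
where $H_{\mathrm{ap}}(z)$ is the all-pass component. The estimates $a_k$ are calculated such that they correspond to the minimum-phase component of this model.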

57 Since $|H_{\mathrm{ap}}(e^{j\omega})| = 1$ for all $\omega$, we have $|H(e^{j\omega})| = |H_{\min}(e^{j\omega})|$; therefore, if the estimates are exact, we at least obtain a model with the correct magnitude.

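58 Lemma 1 (System Decomposition): Any causal rational system $H(z)$
can be decomposed as (proof …):
$$H(z) = H_{\min}(z)\, H_{\mathrm{ap}}(z),$$
where $H_{\min}(z)$ is the minimum-phase component and $H_{\mathrm{ap}}(z)$ is the all-pass component.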

59 Proof: For two poles and two zeros, let us define:
Re-arranging this equation:

60 With the knowledge that:
Hence:

61 Therefore: End of proof.

62 Lemma 2: The minimum-phase component can be expressed as an all-pole system:
$$H_{\min}(z) = \frac{G}{1 - \sum_{k=1}^{M} a_k z^{-k}},$$
where in theory M goes to infinity; in practice it is limited.

63 Linear-Prediction-Based Processing
Criticism of the Linear Prediction Model; Perceptual Linear Prediction (PLP); LP Cepstra

64 Criticism of the Linear Prediction Model
The LP spectrum approximates the speech spectrum equally well at all frequencies of the analysis band. This property is inconsistent with human hearing.

65 Perceptual Linear Prediction (PLP)
Critical-band spectral analysis → equal-loudness pre-emphasis → intensity-loudness conversion → IDFT → solution of the Yule-Walker equations

66 Critical-Band Analysis
Speech signal → frame → windowing (20 ms Hamming window) → DFT (200 samples plus 56 zeros of padding, i.e., 256 points, for a 10 kHz sampling rate) → short-term spectra → critical-band spectral resolution

67 Critical-Band Spectral Resolution
Frequency warping (hertz → Bark), then convolution with a filter-bank masking-curve approximation and downsampling
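For the hertz-to-Bark warping, PLP implementations commonly use Ω(ω) = 6 ln{ω/1200π + [(ω/1200π)² + 1]^0.5}, with ω in rad/s. The slide does not show the formula, so treat this form as an assumption from the PLP literature. Since ω/1200π = f/600 for f in hertz, the warping equals 6·asinh(f/600); the function name is hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Omega = 6 ln(w/1200pi + sqrt((w/1200pi)^2 + 1)), with w = 2 pi f in rad/s."""
    x = 2.0 * math.pi * f_hz / (1200.0 * math.pi)   # equals f / 600
    return 6.0 * math.log(x + math.sqrt(x * x + 1.0))


barks = [hz_to_bark(f) for f in (0.0, 600.0, 1000.0, 5000.0)]
```

The warping is nearly linear below a few hundred hertz and logarithmic above, which compresses the high-frequency region as hearing does.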

68 Equal-Loudness Pre-emphasis
Approximates the unequal sensitivity of human hearing at different frequencies.
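A sketch of the equal-loudness weighting commonly used in PLP, E(ω) = (ω² + 56.8·10⁶) ω⁴ / [(ω² + 6.3·10⁶)² (ω² + 0.38·10⁹)], with ω in rad/s. The slide gives no formula, so this particular curve is an assumption taken from the PLP literature; the helper name is hypothetical. E is 0 at DC, rises through the speech band, and tends to 1 at very high frequencies.

```python
import math


def equal_loudness(omega):
    """E(w) = (w^2 + 56.8e6) w^4 / ((w^2 + 6.3e6)^2 (w^2 + 0.38e9)), w in rad/s."""
    w2 = omega * omega
    return ((w2 + 56.8e6) * w2 * w2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


# Weight each critical-band power by E evaluated at its centre frequency (in Hz here).
centres_hz = [250.0, 1000.0, 4000.0]
weights = [equal_loudness(2.0 * math.pi * f) for f in centres_hz]
```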

69 Intensity-Loudness Power Law
Approximates the non-linear relation between the intensity of a sound and its perceived loudness.
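In PLP this relation is usually approximated by a cube-root compression of the equal-loudness-weighted critical-band spectrum. The exponent 0.33 and the symbols $\Xi(\Omega)$ (weighted spectrum) and $\Phi(\Omega)$ (loudness) are assumptions for illustration:
$$\Phi(\Omega) = \Xi(\Omega)^{0.33}.$$
For example, doubling the intensity raises the approximated loudness only by a factor of $2^{0.33} \approx 1.26$.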

70 Cepstral Analysis
Introduction; Homomorphic Processing; Cepstral Spectrum; Cepstrum; Mel-Cepstrum; Cepstrum in Speech Processing

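71 Introduction
When speech is pre-emphasised, its spectrum can be viewed as the product of the excitation spectrum and the vocal tract frequency response. The excitation is not necessary for estimating the vocal tract function; therefore, it is desirable to separate the excitation information from the vocal tract information.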

72 We can think of the speech spectrum as a signal:
it is the product of a slowly varying signal (the vocal tract envelope) and a rapidly varying signal (the excitation harmonics). We can exploit this knowledge, and the formal technique that does so is called "homomorphic processing".

73 Homomorphic Processing
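It is a technique for filtering signals that are combined non-linearly (for example, by multiplication or convolution). In homomorphic processing, the non-linearly related signals are transformed to a domain where they combine linearly, filtered there, and transformed back: H[·] → F(z) → H⁻¹[·].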

74 In order to obtain a linear system, a complex logarithm is applied to the speech spectrum: log[·] → S+(z) → exp[·].

75 Cepstral Spectrum: Definition:
$$\hat{S}_n(\omega) = \log |S_n(\omega)|,$$
where $S_n(\omega)$ is the STFT.

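76 Cepstrum: Definition:
$$c_n(m) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log |S_n(\omega)|\, e^{j\omega m}\, d\omega,$$
where $S_n(\omega)$ is the STFT of the frame at time n.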
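A sketch of the (real) cepstrum for one frame, computed with an N-point DFT in place of the integral: the inverse DFT of the log magnitude spectrum. A direct DFT is used for clarity, the function name is hypothetical, and it assumes no DFT bin is exactly zero. For the minimum-phase sequence [1, 0.5], log|1 + 0.5e^{-jω}| expands into a series, so the cepstrum is known in closed form: c(0) = 0, c(±1) = 0.25, c(±2) = -0.0625.

```python
import cmath
import math


def real_cepstrum(x):
    """c(n) = (1/N) sum_k log|X(k)| e^{j 2 pi k n / N}, X(k) the N-point DFT of x."""
    N = len(x)
    X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]
    log_mag = [math.log(abs(X_k)) for X_k in X]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]


c = real_cepstrum([1.0, 0.5] + [0.0] * 62)   # index N - n holds quefrency -n
```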

77 Cepstrum in Speech Processing
Pitch Estimation; Formant Estimation; Pitch and Formant Estimation

78 Pitch Estimation
Sampled speech → cepstrum → high-pass liftering → peak emphasis (second derivative) → peak finding → pitch
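A sketch of cepstral pitch estimation: compute the real cepstrum of a zero-padded frame, ignore quefrencies below fs/fmax (which plays the role of high-pass liftering), and take the largest peak up to fs/fmin. The search range 50 to 400 Hz, the 512-point DFT, and the omission of the second-derivative emphasis stage are simplifying assumptions; the synthetic frame (decaying pulses every 40 samples at 8 kHz, i.e., 200 Hz) is hypothetical test data.

```python
import cmath
import math


def _dft(x, inverse=False):
    """Direct O(N^2) DFT with precomputed twiddle factors (an FFT would be used in practice)."""
    N = len(x)
    sign = 1.0 if inverse else -1.0
    W = [cmath.exp(sign * 2j * math.pi * m / N) for m in range(N)]
    return [sum(x[n] * W[(k * n) % N] for n in range(N)) for k in range(N)]


def real_cepstrum(frame, nfft):
    x = list(frame) + [0.0] * (nfft - len(frame))
    log_mag = [math.log(abs(X)) for X in _dft(x)]
    return [v.real / nfft for v in _dft(log_mag, inverse=True)]


def cepstral_pitch(frame, fs, fmin=50.0, fmax=400.0, nfft=512):
    """Pitch in Hz from the largest cepstral peak between quefrencies fs/fmax and fs/fmin."""
    c = real_cepstrum(frame, nfft)
    lo = int(fs / fmax)                       # discard low quefrencies (vocal tract part)
    hi = min(int(fs / fmin), nfft // 2)
    peak = max(range(lo, hi + 1), key=lambda n: c[n])
    return fs / peak


frame = [0.0] * 161
for m in range(5):
    frame[40 * m] = 0.9 ** m                  # pulse train with a 40-sample period
pitch = cepstral_pitch(frame, fs=8000)
```

The periodic excitation shows up as cepstral peaks at multiples of the period, well separated from the low-quefrency vocal tract contribution, which is what makes the simple peak search work.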

79 Formant Estimation
Sampled speech → cepstrum → low-pass liftering → peak emphasis (second derivative) → peak finding → formants

80 Pitch and Formant Estimation
Sampled speech → cepstrum, then two branches: high-pass liftering → peak emphasis (second derivative) → peak finding → pitch; and low-pass liftering → peak emphasis (second derivative) → peak finding → formants

