Published by Christian Norris. Modified over 9 years ago.
1
1 2.5.4.1 Basics of Neural Networks
2
2 2.5.4.2 Neural Network Topologies
3
3
4
4
5
5 TDNN
6
6 2.5.4.6 Neural Network Structures for Speech Recognition
7
7
8
8 3.1.1 Spectral Analysis Models
9
9
10
10 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
11
11 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
12
12 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
13
13 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
14
14 3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
15
15 3.2.1 Types of Filter Bank Used for Speech Recognition
16
16 Nonuniform Filter Banks
17
17 Nonuniform Filter Banks
18
18 3.2.1 Types of Filter Bank Used for Speech Recognition
19
19 3.2.1 Types of Filter Bank Used for Speech Recognition
20
20 3.2.2 Implementations of Filter Banks
- Instead of direct convolution, which is computationally expensive, we assume each bandpass filter impulse response to be represented by $h_i(n) = w(n)\, e^{j\omega_i n}$, where w(n) is a fixed lowpass filter and $\omega_i$ is the center frequency of the i-th band.
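A minimal sketch of this modulated-lowpass idea, assuming the complex-modulated form h_i(n) = w(n) e^{jω_i n}. The Hamming window standing in for w(n), the uniformly spaced center frequencies, and the function name `modulated_filter_bank` are illustrative assumptions, not taken from the slides. It filters by direct convolution only to make the structure visible.

```python
import numpy as np

def modulated_filter_bank(x, num_bands, win_len):
    """Filter x with bandpass filters h_i(n) = w(n) * exp(j * omega_i * n)."""
    w = np.hamming(win_len)                  # assumed lowpass prototype w(n)
    n = np.arange(win_len)
    outputs = []
    for i in range(num_bands):
        omega_i = 2 * np.pi * i / num_bands  # assumed uniform center frequencies
        h_i = w * np.exp(1j * omega_i * n)   # lowpass prototype shifted to omega_i
        outputs.append(np.convolve(x, h_i, mode="full")[: len(x)])
    return np.array(outputs)

fs = 8000                                    # assumed sampling rate in Hz
t = np.arange(1024) / fs
x = np.cos(2 * np.pi * 1000 * t)             # 1 kHz test tone
y = modulated_filter_bank(x, num_bands=8, win_len=64)

# Energy per band over the non-negative-frequency bands, skipping the start-up transient.
energy = np.mean(np.abs(y[:5, 64:]) ** 2, axis=1)
strongest = int(np.argmax(energy))           # band whose center is closest to the tone
```

With 8 bands at 8 kHz the centers fall every 1000 Hz, so the tone should land in band 1.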
21
21 3.2.2 Implementations of Filter Banks
22
22 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
23
23 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
24
24 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
25
25 3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
26
26 Linear Filter Interpretation of the STFT
27
27 3.2.2.4 FFT Implementation of a Uniform Filter Bank
28
28 Direct implementation of an arbitrary filter bank
29
29 3.2.2.5 Nonuniform FIR Filter Bank Implementations
30
30 3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks
31
31 3.2.4 Practical Examples of Speech-Recognition Filter Banks
32
32 3.2.4 Practical Examples of Speech-Recognition Filter Banks
33
33 3.2.4 Practical Examples of Speech-Recognition Filter Banks
34
34 3.2.4 Practical Examples of Speech-Recognition Filter Banks
35
35 3.2.5 Generalizations of Filter-Bank Analyzer
36
36 3.2.5 Generalizations of Filter-Bank Analyzer
37
37 3.2.5 Generalizations of Filter-Bank Analyzer
38
38 3.2.5 Generalizations of Filter-Bank Analyzer
39
39
40
40
41
41
42
42
43
43
44
44
45
45
46
46 Mel-cepstrum method
Time signal -> Framing -> |FFT|^2 -> Mel-scaling -> Logarithm -> IDCT -> Low-order coefficients (Cepstra) -> Differentiator -> Delta & Delta-Delta Cepstra
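The mel-cepstrum chain on this slide can be sketched roughly as follows. The mel formula, the triangular filter shapes, the frame sizes, and all function names are assumptions chosen for illustration, since the slide gives only the block names. The cosine transform of the log mel energies stands in for the IDCT block.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # one common mel formula (assumed)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters with centers equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), num_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / fs).astype(int)
    fb = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):                 # rising edge
            fb[m - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):                 # falling edge
            fb[m - 1, k] = (hi - k) / max(hi - mid, 1)
    return fb

def mfcc(signal, fs, frame_len=256, step=128, num_filters=20, num_ceps=12):
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, step)
    frames = np.array([signal[s:s + frame_len] * window for s in starts])  # framing
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2                       # |FFT|^2
    mel_energy = power @ mel_filterbank(num_filters, frame_len, fs).T      # mel-scaling
    log_mel = np.log(mel_energy + 1e-10)                                   # logarithm
    m = np.arange(num_filters)
    q = np.arange(num_ceps)[:, None]
    basis = np.cos(np.pi * q * (m + 0.5) / num_filters)                    # cosine transform
    return log_mel @ basis.T                                               # low-order coefficients

def delta(features):
    """Differentiator: central difference along the frame axis."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

rng = np.random.default_rng(0)
signal = rng.standard_normal(8000)               # stand-in for one second of speech at 8 kHz
ceps = mfcc(signal, fs=8000)
d1 = delta(ceps)                                 # delta cepstra
d2 = delta(d1)                                   # delta-delta cepstra
features = np.hstack([ceps, d1, d2])
```

Each frame ends up with 12 cepstra plus their first and second differences, 36 values in total under these assumed settings.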
47
47 Time-Frequency analysis
- Short-term Fourier Transform
- Standard way of frequency analysis: decompose the incoming signal into the constituent frequency components.
- w(n): windowing function
- N: frame length
- p: step size
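As a rough illustration of how the window w(n), frame length N, and step size p fit together, here is a minimal short-term Fourier transform sketch. The Hamming window, the test signal, and the function name `stft` are assumptions. Frame m covers samples m*p to m*p + N - 1.

```python
import numpy as np

def stft(x, N, p, window=None):
    """Return one spectrum per frame: rows are frames, columns are frequency bins."""
    w = np.hamming(N) if window is None else window   # w(n), assumed Hamming by default
    num_frames = 1 + (len(x) - N) // p
    frames = np.stack([x[m * p : m * p + N] * w for m in range(num_frames)])
    return np.fft.rfft(frames, axis=1)

fs = 8000                                             # assumed sampling rate in Hz
x = np.sin(2 * np.pi * 500 * np.arange(2000) / fs)    # 500 Hz test tone
X = stft(x, N=256, p=128)
peak_bin = int(np.argmax(np.abs(X[0])))
peak_hz = peak_bin * fs / 256                         # bin index converted to Hz
```

A smaller p gives more overlap between frames and a finer time grid, at the cost of more frames to process.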
48
48 Critical band integration
- Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.
- Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole.
49
49 Bark scale
50
50 Feature orthogonalization
- Spectral values in adjacent frequency channels are highly correlated.
- The correlation results in a Gaussian model with many parameters: all the elements of the covariance matrix have to be estimated.
- Decorrelation is useful to improve the parameter estimation.
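To make the parameter-count argument concrete, the small sketch below compares a full-covariance Gaussian with a diagonal one. The 20-dimensional feature vector and the helper name `gaussian_param_count` are assumptions for illustration.

```python
def gaussian_param_count(dim, diagonal):
    """Free parameters of one Gaussian: mean plus (symmetric) covariance."""
    mean_params = dim
    # A symmetric covariance has dim*(dim+1)/2 distinct entries; a diagonal one has dim.
    cov_params = dim if diagonal else dim * (dim + 1) // 2
    return mean_params + cov_params

full = gaussian_param_count(20, diagonal=False)   # correlated features
diag = gaussian_param_count(20, diagonal=True)    # decorrelated features
```

The full-covariance count grows quadratically with the dimension, which is why decorrelated features are easier to model from limited training data.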
51
51 Cepstrum
- Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal.
- The log magnitude is real and symmetric, so the transform is equivalent to the Discrete Cosine Transform.
- Approximately decorrelated.
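A short numerical sketch of this definition, assuming a random windowed frame as a stand-in for speech. It computes the cepstrum through the inverse FFT, then recomputes the first few coefficients with a pure cosine sum to show that the two agree when the log magnitude is real and symmetric. The function name `real_cepstrum` and the small floor added before the logarithm are assumptions.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse Fourier transform of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)   # floor avoids log(0)
    return np.fft.ifft(log_mag).real

rng = np.random.default_rng(0)
frame = rng.standard_normal(256) * np.hamming(256)
c = real_cepstrum(frame)

# The same coefficients from a cosine-only sum over the symmetric log magnitude.
K = len(frame)
k = np.arange(K)
log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
c_cos = np.array([np.sum(log_mag * np.cos(2 * np.pi * k * n / K)) / K
                  for n in range(20)])
```

The sine terms of the inverse transform cancel because the log magnitude of a real signal is even in frequency, leaving only the cosine sum.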
52
52 Principal Component Analysis
- Find an orthogonal basis such that the reconstruction error over the training set is minimized.
- This turns out to be equivalent to diagonalizing the sample autocovariance matrix.
- Complete decorrelation.
- Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.
53
53 Principal Component Analysis (PCA)
- Mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PC).
- Find an orthogonal basis such that the reconstruction error over the training set is minimized.
- This turns out to be equivalent to diagonalizing the sample autocovariance matrix.
- Complete decorrelation.
- Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.
54
54 PCA (Cont.)
Algorithm: Input (N-dim vectors) -> Covariance matrix -> Eigenvalues and eigenvectors -> Transform matrix -> Apply transform -> Output (R-dim vectors)
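The algorithm boxes on this slide can be sketched as follows, assuming the transform matrix is built from the R eigenvectors with the largest eigenvalues. The synthetic data and the names `pca_fit` and `pca_apply` are illustrative assumptions.

```python
import numpy as np

def pca_fit(X, R):
    """Estimate the mean and an N-by-R transform matrix from the rows of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:R]       # keep the R largest
    return mean, eigvecs[:, order], eigvals[order]

def pca_apply(X, mean, W):
    """Project N-dim vectors onto the R principal directions."""
    return (X - mean) @ W

rng = np.random.default_rng(0)
mixing = rng.standard_normal((3, 3))
X = rng.standard_normal((500, 3)) @ mixing      # correlated 3-dim input vectors
mean, W, variances = pca_fit(X, R=2)
Y = pca_apply(X, mean, W)                       # decorrelated 2-dim output vectors
out_cov = np.cov(Y, rowvar=False)
```

The covariance of the output is diagonal up to rounding, which is the "complete decorrelation" property, and its diagonal holds the retained eigenvalues.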
55
55 PCA (Cont.)
- PCA in speech recognition systems
56
56 Linear Discriminant Analysis
- Find an orthogonal basis such that the ratio of the between-class variance to the within-class variance is maximized.
- This also turns out to be a generalized eigenvalue-eigenvector problem.
- Complete decorrelation.
- Provides the optimal linear separability under quite restrictive assumptions.
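A minimal sketch of the eigenproblem behind LDA, assuming the usual within-class and between-class scatter matrices and solving for the eigenvectors of Sw^{-1} Sb. The two-class synthetic data and the name `lda_fit` are assumptions. The data are built so that the largest variance lies along the first axis while the classes differ along the second, which lets the result be contrasted with the leading PCA direction.

```python
import numpy as np

def lda_fit(X, labels, R):
    """Return R directions maximizing between-class over within-class variance."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))                        # within-class scatter
    Sb = np.zeros((d, d))                        # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of Sw^{-1} Sb, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:R]
    return eigvecs[:, order].real

rng = np.random.default_rng(0)
n = 200
X0 = rng.standard_normal((n, 2)) * [10.0, 0.3] + [0.0, -1.0]   # class 0
X1 = rng.standard_normal((n, 2)) * [10.0, 0.3] + [0.0, 1.0]    # class 1
X = np.vstack([X0, X1])
labels = np.repeat([0, 1], n)

lda_dir = lda_fit(X, labels, R=1)[:, 0]
# Leading PCA direction of the pooled data, for comparison.
pca_dir = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]
```

Here the LDA direction lines up with the second axis, where the classes separate, while the leading PCA direction follows the first axis, where the variance is largest.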
57
57 PCA vs. LDA
58
58 Spectral smoothing
- Formant information is crucial for recognition.
- Enhance and preserve the formant information:
  - Truncating the number of cepstral coefficients
  - Linear prediction: peak-hugging property
59
59 Temporal processing
- To capture the temporal features of the spectral envelope and to provide robustness:
  - Delta features: first- and second-order differences; regression
  - Cepstral Mean Subtraction: normalizes for channel effects and adjusts for spectral slope
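Both operations can be sketched in a few lines. The regression window half-width K = 2, the edge padding, and the function names are assumptions. The example relies on the fact that a fixed linear channel appears as a constant offset in the cepstral domain, so subtracting the per-utterance mean removes it.

```python
import numpy as np

def cepstral_mean_subtraction(ceps):
    """Subtract the mean of each coefficient over all frames of the utterance."""
    return ceps - ceps.mean(axis=0, keepdims=True)

def delta_regression(ceps, K=2):
    """Slope of a least-squares line fitted over 2K+1 frames around each frame."""
    T = len(ceps)
    padded = np.pad(ceps, ((K, K), (0, 0)), mode="edge")
    num = sum(k * (padded[K + k : K + k + T] - padded[K - k : K - k + T])
              for k in range(1, K + 1))
    den = 2 * sum(k * k for k in range(1, K + 1))
    return num / den

# A ramp that rises by 3 per frame in every coefficient.
ramp = 3.0 * np.arange(50, dtype=float)[:, None] * np.ones((1, 4))
d = delta_regression(ramp)                       # interior frames should have slope 3
dd = delta_regression(d)                         # second-order (delta-delta) features

rng = np.random.default_rng(0)
ceps = rng.standard_normal((50, 4))
channel = np.array([0.5, -1.0, 2.0, 0.1])        # assumed constant channel offset
clean = cepstral_mean_subtraction(ceps)
shifted = cepstral_mean_subtraction(ceps + channel)
```

The regression form is less sensitive to frame-to-frame noise than a plain first difference because it averages over several frames.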
60
60 RASTA (RelAtive SpecTral Analysis)
- Filtering of the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features.
- This is usually a bandpass filter, maintaining the linguistically important spectral envelope modulation (1-16 Hz).
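A rough sketch of bandpass filtering one temporal trajectory, for example the log energy of a single frequency band over successive frames. The filter coefficients are modeled on commonly quoted RASTA values and should be treated as an assumption, as should the 100 frames-per-second rate and the name `rasta_like_filter`.

```python
import numpy as np

def rasta_like_filter(traj, pole=0.94):
    """Bandpass IIR filter along time: y[n] = sum_k b[k] x[n-k] + pole * y[n-1]."""
    b = np.array([0.2, 0.1, 0.0, -0.1, -0.2])    # assumed numerator; sums to zero, so DC is blocked
    y = np.zeros(len(traj))
    for n in range(len(traj)):
        acc = sum(b[k] * traj[n - k] for k in range(len(b)) if n - k >= 0)
        y[n] = acc + (pole * y[n - 1] if n > 0 else 0.0)
    return y

frame_rate = 100.0                               # assumed frames per second
t = np.arange(300) / frame_rate

constant = np.ones(300)                          # e.g. a fixed channel offset
speech_like = np.sin(2 * np.pi * 4.0 * t)        # 4 Hz modulation, inside 1-16 Hz

y_const = rasta_like_filter(constant)
y_mod = rasta_like_filter(speech_like)
```

The constant trajectory decays toward zero while the 4 Hz modulation passes with close to unit gain, which matches the intent of keeping only speech-rate changes of the spectral envelope.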
61
61
62
62 RASTA-PLP
63
63
64
64