Published by Buddy York. Modified over 9 years ago.
1 Perceptual Linear Predictive Analysis of Speech
Hynek Hermansky, Speech Technology Laboratory
J. Acoustical Society of America, April 1990
Presenter: 張志豪
2 Outline
– Linear Prediction Coding
– Mel-scale Frequency Cepstral Coefficients
– Perceptual Linear Predictive
3 Introduction
Feature Extraction
– Speech production model: Linear Prediction Coding
– Speech perception model: Mel-scale Frequency Cepstral Coefficients
4 Linear Prediction Coding
Property
– Approximates the areas of high-energy concentration while smoothing out the fine harmonic structure and other less relevant spectral details.
– The approximated high-energy spectral areas often correspond to the resonance frequencies of the vocal tract (formants).
5 Linear Prediction Coding
Autocorrelation
– Levinson-Durbin recursion
[Figure: impulse response in the time domain and in the frequency domain; panels: Speech, LPC, Speech and LPC]
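The autocorrelation method with the Levinson-Durbin recursion can be sketched as follows. This is a minimal illustration, not the code behind the slides: the function names `autocorrelation` and `levinson_durbin` and the sign convention (a[0] = 1, predictor s[n] ≈ -Σ a[k] s[n-k]) are assumptions made for the example.

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one (windowed) speech frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err) where a[0] == 1 and err is the residual (prediction error) energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        # Correlation of the current predictor with lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient of stage i
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # error energy shrinks at every stage
    return a, err

# Example: autocorrelation r[k] = 0.5**k corresponds to a first-order model.
a, err = levinson_durbin(np.array([1.0, 0.5, 0.25]), order=2)
```

The recursion solves the Toeplitz system in O(p²) operations for order p, which is why LPC is listed as computationally cheap.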
6 Linear Prediction Coding
Disadvantage
– LPC approximates speech equally well at all frequencies of the analysis band. This property is inconsistent with human hearing: beyond about 800 Hz, the spectral resolution of hearing decreases with frequency.
– At the amplitude levels typically encountered in conversational speech, hearing is more sensitive in the middle frequency range of the audible spectrum.
– The spectral details of speech are not always preserved or discarded by LPC analysis according to their auditory prominence.
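7 Mel-scale Frequency Cepstral Coefficients
Mel-scale
– At low frequencies, the human ear is relatively sensitive.
– At high frequencies, the ear's perception becomes progressively coarser.
– The ear's perception of frequency varies logarithmically.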
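The slide does not give a formula for the mel scale. The sketch below assumes the widely used mapping mel = 2595 · log10(1 + f/700); the function names are hypothetical. It shows that a fixed step in Hz covers more mels at low frequencies than at high frequencies.

```python
import math

def hz_to_mel(f_hz):
    """Assumed mel mapping: roughly linear at low frequencies, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# A 100 Hz step near 100 Hz versus the same step near 4000 Hz.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
print(low_step > high_step)
```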
8 Mel-scale Frequency Cepstral Coefficients
9 Discrete cosine transform
– Converts from the frequency domain back to the time domain
– "Frequency of frequency"
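As an illustration of this step, the sketch below applies a DCT-II to a vector of log filter-bank energies to obtain cepstral coefficients. The function name `dct_cepstrum` and the unnormalized DCT-II form are assumptions for the example; toolkits differ in scaling.

```python
import numpy as np

def dct_cepstrum(log_energies, num_ceps):
    """Unnormalized DCT-II of log filter-bank energies -> first num_ceps coefficients."""
    log_energies = np.asarray(log_energies, dtype=float)
    n = len(log_energies)
    # basis[i, j] = cos(pi * i * (j + 0.5) / n)
    basis = np.cos(np.pi * np.outer(np.arange(num_ceps), np.arange(n) + 0.5) / n)
    return basis @ log_energies

# A flat log spectrum has no "frequency of frequency": only c[0] is non-zero.
ceps = dct_cepstrum(np.full(8, 2.0), num_ceps=4)
```

Low-index coefficients describe the smooth spectral envelope, while higher-index coefficients capture faster ripples across the filter-bank channels.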
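10 MFCC & LPC
Mel-scale Frequency Cepstral Coefficients
– Advantage: emphasizes the characteristics of the speech spectrum, so it maintains a better recognition rate even in noisy environments.
– Disadvantage: higher computational cost.
Linear Prediction Coding
– Advantage: low computational cost.
– Disadvantage: does not take the characteristics of the speech spectrum into account, so the recognition rate drops as noise increases.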
11 Perceptual Linear Predictive
[Figure; labels: MFCC, LPC]
12 Perceptual Linear Predictive Equal-Loudness Preemphasis
13 Perceptual Linear Predictive
Equal-Loudness Preemphasis (cont.)
– Does it have the same effect as pre-emphasis?
[Figure: frequency domain]
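The slide does not state the equal-loudness curve itself. The sketch below assumes the rational approximation usually quoted from Hermansky's paper, E(ω) = (ω² + 56.8·10⁶) ω⁴ / ((ω² + 6.3·10⁶)² (ω² + 0.38·10⁹)), with ω in rad/s. The function name is hypothetical and the constants should be checked against the paper before use.

```python
import math

def equal_loudness_weight(f_hz):
    """Assumed equal-loudness weight E(omega) applied to the power spectrum."""
    w2 = (2.0 * math.pi * f_hz) ** 2        # omega squared, omega in rad/s
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

# The weight grows with frequency over the lower part of the speech band.
for f in (100.0, 1000.0, 4000.0):
    print(f, equal_loudness_weight(f))
```

Over this range the weight rises with frequency, which is qualitatively what a time-domain pre-emphasis filter does. Here it is applied as a multiplication in the frequency domain.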
14 Perceptual Linear Predictive Intensity-Loudness Power Law – Frequency domain
15 Perceptual Linear Predictive
Intensity-Loudness Power Law (cont.)
– The power spectrum no longer needs a square root:
  ek = (float)sqrt((double)(t1*t1 + t2*t2));
– The filter-bank outputs no longer need a log:
  bins[bin] = log((double)t1);
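A minimal sketch of the compression step, assuming the cubic-root form (exponent 0.33) associated with PLP. The exponent is not given on the slide, and the function name is hypothetical. The power law takes the place of the log used in the MFCC pipeline.

```python
import numpy as np

def intensity_to_loudness(power_spectrum, exponent=0.33):
    """Compress equal-loudness-weighted power values with a power law."""
    return np.power(np.asarray(power_spectrum, dtype=float), exponent)

# With an exact cube root, a power of 8 maps to a loudness of 2.
loudness = intensity_to_loudness([0.0, 1.0, 8.0], exponent=1.0 / 3.0)
```

Unlike the log, the power law is defined at zero and stays non-negative, so the compressed spectrum can be passed directly to an inverse DFT.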
16 Perceptual Linear Predictive
Inverse Discrete Fourier Transform
– Converts from the frequency domain back to the time domain
[Figure: frequency domain → time domain]
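The inverse DFT of a power spectrum yields an autocorrelation sequence, which is what the autoregressive step needs. The sketch below assumes a one-sided spectrum sampled at N/2 + 1 uniformly spaced points; the function name is hypothetical. In PLP the input would be the compressed auditory spectrum rather than the raw power spectrum used in this check.

```python
import numpy as np

def spectrum_to_autocorrelation(power_spectrum, order):
    """Inverse real DFT of a one-sided power spectrum -> lags 0..order."""
    r = np.fft.irfft(np.asarray(power_spectrum, dtype=float))
    return r[:order + 1]

# Check against a direct computation: zero-pad so circular lags equal linear lags.
x = np.array([1.0, 2.0, 3.0])
power = np.abs(np.fft.rfft(x, n=8)) ** 2
r = spectrum_to_autocorrelation(power, order=2)
```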
17 Perceptual Linear Predictive Autoregressive Modeling (LPC) Time domain
18 Experiment
            3       6       9
MFCC      54.21   55.11   55.37
PLP_05    39.01   39.32   39.88
PLP_10    52.55, 53.02 (only two values in the source; column positions unclear)
PLP_12    53.79   54.49   54.94
PLP_14    53.62   53.94   54.27
*PLP_12   31.65   32.08   32.03
19 Thanks
21 Choice Of The Order Of The Autoregressive PLP Model
– Introduction
– Spectral distortion measure of PLP
– Single-frame phoneme identification
– Isolated-word identification
22 Choice Of The Order Of The Autoregressive PLP Model
Introduction
– With increasing model order, the spectrum of the all-pole model asymptotically approaches the auditory spectrum.
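To make the role of the model order concrete, the sketch below fits all-pole models of increasing order to the same autocorrelation sequence and evaluates the model spectrum. The function names are hypothetical, and the normal equations are solved with a general linear solver for brevity rather than a dedicated Toeplitz routine.

```python
import numpy as np

def allpole_fit(r, order):
    """Solve the Toeplitz normal equations R c = -r[1..order]; return (a, err)."""
    r = np.asarray(r, dtype=float)
    if order == 0:
        return np.array([1.0]), r[0]
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]                                   # R[i, j] = r[|i - j|]
    coeffs = np.linalg.solve(R, -r[1:order + 1])
    a = np.concatenate(([1.0], coeffs))
    err = r[0] + np.dot(coeffs, r[1:order + 1])   # residual energy
    return a, err

def allpole_spectrum(a, err, n_fft=256):
    """Model power spectrum err / |A(e^{jw})|^2 on n_fft // 2 + 1 points."""
    return err / np.abs(np.fft.rfft(a, n_fft)) ** 2

r = [1.0, 0.5, 0.25]
for order in (1, 2):
    a, err = allpole_fit(r, order)
    print(order, a, err)
```

The residual energy never increases as the order grows, so a higher order can only follow the target spectrum more closely. A low order keeps just the gross spectral shape.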
23 Choice Of The Order Of The Autoregressive PLP Model
Spectral Distortion Measure of PLP
– Group-delay distortion measure
  The spectral peaks of the model are enhanced and its spectral slope is suppressed.
  The group-delay metric is more sensitive to the distance between narrow peaks.
  The group-delay measure is more sensitive to the actual value of the spectral peak width.
– Exponential measure
  Allows for various degrees of peak enhancement.
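The slide gives no formulas for these measures. One common way to realize them, assumed here, is an index-weighted cepstral distance: weighting the n-th cepstral difference by n corresponds to comparing group-delay spectra, and weighting by n**s gives an adjustable degree of peak enhancement. The function name and the parameter `s` are illustrative, not taken from the slides.

```python
import numpy as np

def index_weighted_distance(c_ref, c_test, s=1.0):
    """Sum over n of (n**s * (c_ref[n] - c_test[n]))**2 for cepstral indices n = 1..P.

    s = 1 corresponds to a group-delay-style measure; s = 0 is the plain
    Euclidean cepstral distance. c_0 is assumed to be excluded from the inputs.
    """
    c_ref = np.asarray(c_ref, dtype=float)
    c_test = np.asarray(c_test, dtype=float)
    n = np.arange(1, len(c_ref) + 1)
    return float(np.sum((n ** s * (c_ref - c_test)) ** 2))

d_group_delay = index_weighted_distance([1.0, 1.0], [0.0, 0.0], s=1.0)
d_euclidean = index_weighted_distance([1.0, 1.0], [0.0, 0.0], s=0.0)
```

Larger values of `s` emphasize higher-index coefficients, which carry the sharper spectral detail such as narrow peaks.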
24 Choice Of The Order Of The Autoregressive PLP Model
Single-Frame Phoneme Identification
– PLP identification accuracy increases up to about the 5th order of the autoregressive model, then decreases as the model order increases further.
25 Choice Of The Order Of The Autoregressive PLP Model Isolated-Word Identification
26 Choice Of The Order Of The Autoregressive PLP Model
Discussion
– The advantage of PLP over LP is that it allows effective suppression of speaker-dependent information through the choice of model order.
– The linguistically relevant, speaker-independent cues lie in the gross shape of the auditory spectrum. This gross shape can be characterized by the one or two spectral peaks of the 5th-order PLP model.
27 PLP and Human Hearing
– Introduction
– Formant Frequency Changes
– Sensitivity to Bandwidth Changes
– Sensitivity to Spectral Tilt
– Sensitivity to F0
– Discussion
28 PLP and Human Hearing
Introduction
– The sensitivity of hearing to changes in the first three formant frequencies is approximately constant in relative frequency. LP analysis conflicts with this.
29 PLP and Human Hearing Formant Frequency Changes
30 PLP and Vowel Perception
– Introduction
– The effective second formant
– Spectral peak integration theory
– The significance of the bandwidth B2
– Discussion