1
Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition
Nengheng Zheng
Supervised by Professor P.C. Ching
Nov. 26, 2004
2
Outline
Speech production and glottal pulse excitation in detail
Linear prediction: short-term and long-term
Glottal spectrum estimated with long-term prediction, and acoustic features
Speaker recognition implementation
3
Speech Production
Glottal pulses -> Vocal tract -> Speech signal
Discrete-time model for speech production
A combined transfer function
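The slide's own transfer function is not preserved in this transcript. As a minimal sketch of the discrete-time source-filter idea, the code below drives a chain of two all-pole filters with an impulse train. Everything specific here is an assumption for illustration: the 8 kHz sampling rate, the 100 Hz pitch, the double real pole at 0.9 standing in for glottal pulse shaping, and the single 500 Hz resonance standing in for the vocal tract. Lip radiation is left out.

```python
import math

def impulse_train(n_samples, period):
    """Unit impulses every `period` samples (idealised vocal cord excitation)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def all_pole_filter(x, a):
    """Apply 1 / (1 + a[0] z^-1 + a[1] z^-2 + ...) to the sequence x."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y

def resonator(freq_hz, bandwidth_hz, fs):
    """Denominator coefficients of a two-pole resonance (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return [-2.0 * r * math.cos(theta), r * r]

fs = 8000            # assumed sampling rate in Hz
period = 80          # assumed pitch period in samples (100 Hz)

excitation = impulse_train(400, period)
# Double real pole at z = 0.9: a crude glottal shaping filter (assumption).
glottal = all_pole_filter(excitation, [-1.8, 0.81])
# One formant at 500 Hz with 80 Hz bandwidth (hypothetical vocal tract).
speech = all_pole_filter(glottal, resonator(500.0, 80.0, fs))
```

Because the two filters are applied in cascade, their product acts as one combined transfer function between the impulse train and the output, which is the sense in which the slide speaks of a combined model.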
4
Acoustic Features of Glottal Pulse
Time domain
  – pitch period
  – pitch period perturbation (jitter)
  – pulse amplitude perturbation (shimmer)
  – glottal pulse width
  – abruptness of closure of the glottal flow
  – aspiration noise
Frequency domain
  – fundamental frequency (F0)
  – spectral tilt (slope)
  – harmonic richness
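Jitter and shimmer can be illustrated with a few lines of code. The slide does not give formulas, so this sketch assumes one common definition: the mean absolute difference between consecutive cycles, divided by the mean value. The measurement values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical measurements for four consecutive glottal cycles.
periods_ms = [8.0, 8.2, 7.9, 8.1]            # pitch periods
peak_amplitudes = [0.50, 0.48, 0.52, 0.49]   # pulse peak amplitudes

jitter = relative_perturbation(periods_ms)        # period perturbation
shimmer = relative_perturbation(peak_amplitudes)  # amplitude perturbation
print(f"jitter  = {100 * jitter:.2f} %")
print(f"shimmer = {100 * shimmer:.2f} %")
```

The same function serves both measures because, under this definition, they differ only in which per-cycle quantity is tracked.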
5
Glottal Pulse and Voice Quality
Glottal pulse shape plays an important role in the quality of natural or synthesized vowels [Rosenberg 1971]
  – The shape and periodicity of vocal cord excitation are subject to large variation
  – Such variations are significant for preserving the naturalness of speech
  – A typical glottal pulse: asymmetric, with a shorter falling phase; spectrum with -12 dB/octave decay
More variation among different speakers than among different utterances of the same speaker [Mathews 1963]
Such variations have little significance for speech intelligibility but affect the perceived vocal quality [Childers 1991]
6
Various Glottal Pulses
Some other vocal types: breathy, falsetto, vocal fry
Temporal and spectral characteristics
7
Some Comments
Generally, studying glottal pulse characteristics requires reconstructing the glottal pulse waveform by inverse filtering
Automatically and exactly reconstructing the glottal waveform from real speech is almost impossible, especially during transient phases of articulation or for high-pitched speakers
Fortunately, it is possible to estimate the glottal spectrum from the residual signal with pitch prediction
8
Linear Prediction
Speech waveform: current samples are correlated with past samples and are thus predictable
Short-term correlation
  – Occurs within one pitch period
  – Formant modulation
  – Classical linear prediction analysis (short-term prediction)
Long-term correlation
  – Occurs across consecutive pitch periods
  – Vocal cord vibration
  – Long-term/pitch prediction
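Classical short-term linear prediction can be sketched with the autocorrelation method and the Levinson-Durbin recursion. The slides do not say which estimation method was used, so this choice is an assumption, and the test signal is a hypothetical decaying exponential, not speech. The sign convention is s[n] ≈ a[0]·s[n-1] + a[1]·s[n-2] + ...

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; return (predictor coefficients, error energy)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lp_residual(x, a):
    """Prediction error e[n] = x[n] - sum_k a[k-1] * x[n-k]."""
    return [x[n] - sum(ak * x[n - k]
                       for k, ak in enumerate(a, start=1) if n - k >= 0)
            for n in range(len(x))]

# Hypothetical test frame: a first-order decaying exponential.
frame = [0.9 ** n for n in range(200)]
coeffs, error_energy = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lp_residual(frame, coeffs)
print(coeffs)
```

For this frame the first coefficient comes out very close to 0.9 and the second very close to zero, so the residual is essentially the single impulse that started the sequence. On voiced speech the residual would instead keep a pulse near each pitch period, which is the long-term correlation the second predictor targets.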
9
Linear Prediction
Short-term predictor
  – Removes the short-term correlation, leaving a glottal excitation signal
Long-term predictor
  – Removes the correlation across consecutive pitch periods
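A long-term (pitch) predictor applied to an excitation signal can be sketched as follows. The three taps match the coefficients b_-1, b_0, b_1 listed among the features later in the talk; everything else is an assumption made for illustration: the lag search by normalised correlation, the least-squares fit, the prediction gain taken as input energy over residual energy in dB, and the idealised impulse-train input.

```python
import math

TAPS = (-1, 0, 1)

def find_pitch_lag(e, min_lag, max_lag):
    """Lag with the highest normalised correlation (first one on ties)."""
    def score(lag):
        idx = range(lag, len(e))
        num = sum(e[n] * e[n - lag] for n in idx)
        den = math.sqrt(sum(e[n] ** 2 for n in idx) *
                        sum(e[n - lag] ** 2 for n in idx))
        return num / den if den > 0 else 0.0
    return max(range(min_lag, max_lag + 1), key=score)

def solve_linear(A, y):
    """Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def pitch_predictor(e, lag):
    """Least-squares taps b_j so that e[n] ~ sum_j b_j * e[n - lag - j]."""
    idx = range(lag + 1, len(e))
    phi = [[sum(e[n - lag - i] * e[n - lag - j] for n in idx) for j in TAPS]
           for i in TAPS]
    c = [sum(e[n] * e[n - lag - i] for n in idx) for i in TAPS]
    return solve_linear(phi, c)

def pitch_residual(e, lag, b):
    """Subtract the prediction; samples before the frame count as zero."""
    out = []
    for n in range(len(e)):
        pred = sum(bj * e[n - lag - j] for j, bj in zip(TAPS, b)
                   if 0 <= n - lag - j < len(e))
        out.append(e[n] - pred)
    return out

# Hypothetical excitation: one unit pulse every 50 samples.
e = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
lag = find_pitch_lag(e, 20, 80)
b = pitch_predictor(e, lag)
r = pitch_residual(e, lag, b)
gain_db = 10.0 * math.log10(sum(v * v for v in e) / sum(v * v for v in r))
print(lag, b, gain_db)
```

With this perfectly periodic input only the first pulse is left unpredicted. Real residuals are less regular, so the taps spread across the three positions and the gain falls, which is what makes both quantities informative about the source.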
10
Linear Prediction: An example
11
Examples of glottal spectra estimated by pitch prediction
12
Harmonic Structure of Glottal Spectrum
Two parameters describing the harmonic structure
  – Harmonic richness factor and noise-to-harmonic ratio
Harmonic richness factor (HRF)
Noise-to-harmonic ratio (NHR)
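The formulas for HRF and NHR are not preserved in this transcript. The sketch below uses definitions that are assumptions, not necessarily the ones on the slide: HRF as the summed amplitude of the harmonics above the fundamental relative to the fundamental, and NHR as the power outside the harmonic bins relative to the power inside them, both in dB. The function name, its parameters, and the spectrum values are hypothetical.

```python
import math

def harmonic_measures(mag, f0_bin, n_harmonics):
    """Return (HRF in dB, NHR in dB) for a magnitude spectrum.

    mag         -- magnitude per frequency bin
    f0_bin      -- bin index of the fundamental (harmonic k sits at k * f0_bin)
    n_harmonics -- number of harmonics to consider, fundamental included
    """
    harmonics = [mag[k * f0_bin] for k in range(1, n_harmonics + 1)
                 if k * f0_bin < len(mag)]
    if len(harmonics) < 2 or harmonics[0] <= 0 or sum(harmonics[1:]) <= 0:
        raise ValueError("need a fundamental and at least one higher harmonic")
    # Assumed HRF: higher-harmonic amplitude relative to the fundamental.
    hrf_db = 20.0 * math.log10(sum(harmonics[1:]) / harmonics[0])
    # Assumed NHR: power outside the harmonic bins over power inside them.
    harmonic_power = sum(h * h for h in harmonics)
    noise_power = sum(m * m for m in mag) - harmonic_power
    if noise_power <= 0:
        return hrf_db, float("-inf")
    return hrf_db, 10.0 * math.log10(noise_power / harmonic_power)

# Hypothetical spectrum: flat noise floor plus three harmonics.
spectrum = [0.05] * 64
spectrum[8], spectrum[16], spectrum[24] = 1.0, 0.6, 0.3
hrf_db, nhr_db = harmonic_measures(spectrum, 8, 3)
print(f"HRF = {hrf_db:.2f} dB, NHR = {nhr_db:.2f} dB")
```

A richer harmonic structure raises HRF, while breathier or noisier excitation raises NHR, so the two numbers summarise complementary aspects of the glottal spectrum.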
13
Feature Generation
Acoustic features include the following:
  – Fundamental frequency F0
  – Pitch prediction gain g
  – Pitch prediction coefficients b_-1, b_0, b_1
  – HRF_n and NHR_n (10 Mel-scale frequency bank)
Feature generation process
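To make the Mel-scale frequency bank concrete, the sketch below computes the edges of 10 bands that are equally wide on the Mel scale. The slide gives only the number of bands. The Hz-to-Mel formula (the widely used 2595·log10(1 + f/700) form), the 0 to 4000 Hz range (a guess based on telephone speech), and the non-overlapping layout are all assumptions.

```python
import math

def hz_to_mel(f_hz):
    """Common Hz-to-Mel mapping (assumed; the slides do not specify one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_low_hz, f_high_hz):
    """Edges in Hz of n_bands adjacent bands, equally wide on the Mel scale."""
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]

edges = mel_band_edges(10, 0.0, 4000.0)   # assumed telephone bandwidth
for n, (low, high) in enumerate(zip(edges, edges[1:]), start=1):
    print(f"band {n:2d}: {low:7.1f} to {high:7.1f} Hz")
```

Under this layout the low bands are narrow and the high bands are wide, so a per-band measure such as HRF_n or NHR_n resolves the first few harmonics more finely than the upper ones.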
14
Experimental Conditions
Speech quality: telephone speech
Subjects: 49 male speakers
Training condition:
  – 3 training sessions, about 90 s of speech in total, over 3~6 weeks
  – 128-component GMM
Testing condition:
  – 12 testing sessions, over 4~6 months
15
Speaker recognition experiments

Feature       F0     g      [b_-1 b_0 b_1]   HRF    NHR
Iden. rate    18%    11%    14%              32%    17%
Identification results with long-term prediction related features

Features                (column label lost)   Identification error rate (%)
F_gs: F0_g_HRF_NHR      25                    52
LPCC_D_A                36                    2.84
LPCC_D_A+F_gs                                 2.26
MFCC_D_A                                      2.1
MFCC_D_A+F_gs                                 1.9
Comparison of glottal source feature with classical features
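The size of the improvement from adding the glottal source features can be read off the error rates as a relative reduction. The short calculation below uses only the four error rates reported on this slide; the helper name is hypothetical.

```python
def relative_reduction(baseline, combined):
    """Relative error reduction in percent when going from baseline to combined."""
    return 100.0 * (baseline - combined) / baseline

# Identification error rates (%) as reported: (classical alone, classical + F_gs).
reported = {
    "LPCC_D_A": (2.84, 2.26),
    "MFCC_D_A": (2.1, 1.9),
}

for name, (alone, with_fgs) in reported.items():
    print(f"{name}: {relative_reduction(alone, with_fgs):.1f} % relative reduction")
```

This works out to roughly a 20 % relative reduction for the LPCC-based system and roughly 10 % for the MFCC-based one.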
16
Summary
Glottal source excitation is important for the perceptual naturalness of voice quality and helps distinguish one speaker from another.
Linear prediction is a powerful tool for speech analysis. The spectral properties of the supraglottal vocal tract system can be estimated by short-term prediction, while long-term prediction estimates the spectrum of the glottal excitation system.
Recognition results show that the glottal source related acoustic features (F0, prediction gain, HRF, NHR, etc.) provide a certain degree of speaker discriminative power.
17
Other Applications
Speech coding
Speech recognition?
Speech emotion recognition!
18
Thank You!