Assessment of Vocal Noise via Bi-directional Long-term Linear Prediction of Running Speech
F. Bettens*, F. Grenez*, J. Schoentgen*,**
* Université Libre de Bruxelles
** National Fund for Scientific Research, Belgium
Vocal Dysperiodicities, by Cause
– Vocal fold dynamics: diplophonia, bi-phonation, random vibrations
– Perturbations: vocal jitter & shimmer, frequency & amplitude tremor
– (Audible) additive noise owing to turbulence: breathiness, breathy voice, whispery voice, …
– “Parasitic” vibrations: vibrations of the ventricular folds or ary-epiglottic ligaments, …
– Transients: pitch breaks, phonation breaks, timbre breaks, …
Existing Cues of Vocal Noise
Cues:
– Period Perturbation Quotient
– Amplitude Perturbation Quotient
– Harmonics-to-Noise Ratio
Requirements:
– Detection of individual vocal cycles (or harmonics)
– Steady vowel fragments
– (Pseudo-)periodicity
Objectives: Analysis of Dysperiodicities
– Drop the requirement that speech fragments be (pseudo-)periodic and steady
– Analyse any speech fragment: modal voices & (very) hoarse voices, sustained vowels & running speech
Motivation: Analysis of Running Speech
– Voicing in running speech: variable acoustic impedance, voicing onsets & offsets, variable pressure drops, variable laryngeal positions
– Voice loading
Double Linear Predictive Analysis (Qi, 1999)
– Conventional short-term linear prediction of x[n] gives the forward short-term prediction error e_S[n]
– Long-term linear prediction of e_S[n] gives the forward double prediction error e_L[n]
– Aim: remove the existing correlations so that the unpredictable noise component remains
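The two stages can be sketched as follows. This is a minimal illustration, not the exact formulation of Qi (1999). It assumes the textbook short-term form e_S[n] = x[n] - sum_i a_i x[n-i] with least-squares coefficients, and a single-tap long-term stage e_L[n] = e_S[n] - b e_S[n-P]. The function names, the order argument and the single-tap choice are hypothetical.

```python
import numpy as np

def short_term_error(x, order):
    """Forward short-term prediction error e_S[n] = x[n] - sum_i a_i * x[n-i].

    Assumption: coefficients a_i are fitted by least squares over the whole
    signal; the first `order` samples cannot be predicted and are left at zero.
    """
    x = np.asarray(x, dtype=float)
    # Row for sample n holds the preceding samples x[n-1], ..., x[n-order].
    past = np.column_stack([x[order - i:len(x) - i] for i in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    e_s = np.zeros_like(x)
    e_s[order:] = target - past @ coeffs
    return e_s

def long_term_error(e_s, lag):
    """Forward long-term error e_L[n] = e_S[n] - b * e_S[n-lag] (one tap).

    Assumption: the gain b is the least-squares gain; the first `lag`
    samples have no past reference and are copied unchanged.
    """
    e_s = np.asarray(e_s, dtype=float)
    ref, target = e_s[:-lag], e_s[lag:]
    energy = np.dot(ref, ref)
    b = np.dot(target, ref) / energy if energy > 0 else 0.0
    e_l = e_s.copy()
    e_l[lag:] = target - b * ref
    return e_l
```

Chaining the two calls, long_term_error(short_term_error(x, order), P), gives a forward double prediction error in the sense of the slide, under the assumptions stated above.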
Double Linear Predictive Analysis
Drawbacks:
– e_S[n] is an artificial signal
– the dysperiodicities in the weighted sum of x[n] are omitted
– e_L[n] is inflated to the right of unvoiced/voiced boundaries
Solutions:
– remove the short-term linear predictive analysis stage
– proceed to a bi-directional analysis
Bi-directional Long-term Prediction
– Forward long-term linear prediction: gives the forward long-term prediction error
– Backward long-term linear prediction: gives the backward long-term prediction error
– Bi-directional long-term linear prediction: keep the “best” of the two (frame by frame) as the bi-directional long-term prediction error
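A minimal sketch of the frame-by-frame selection is given here, under several assumptions the slide does not confirm: a single-tap predictor applied directly to x[n], a least-squares gain, a lag searched in a hypothetical range [p_min, p_max], and “best” taken to mean the candidate with the smallest error energy. All function and parameter names are illustrative.

```python
import numpy as np

def one_tap_error(target, reference):
    """Least-squares one-tap prediction error of `target` from `reference`."""
    energy = np.dot(reference, reference)
    gain = np.dot(target, reference) / energy if energy > 0 else 0.0
    return target - gain * reference

def bidirectional_ltp_error(x, frame_len, p_min, p_max):
    """Per frame, keep the lowest-energy forward or backward prediction error."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    err = np.zeros(n)
    for start in range(0, n, frame_len):
        stop = min(start + frame_len, n)
        frame = x[start:stop]
        best = frame.copy()                 # fallback: frame left unpredicted
        best_energy = np.dot(frame, frame)
        for lag in range(p_min, p_max + 1):
            candidates = []
            if start - lag >= 0:            # forward: predict from x[n - lag]
                candidates.append(one_tap_error(frame, x[start - lag:stop - lag]))
            if stop + lag <= n:             # backward: predict from x[n + lag]
                candidates.append(one_tap_error(frame, x[start + lag:stop + lag]))
            for e in candidates:
                energy = np.dot(e, e)
                if energy < best_energy:    # assumption: "best" = smallest energy
                    best, best_energy = e, energy
        err[start:stop] = best
    return err
```

For a frame just to the right of an unvoiced/voiced boundary, the past samples carry no voiced cycle to predict from, so the forward candidate stays large, whereas the backward candidate can use the following cycles. This is the situation the bi-directional analysis is meant to handle.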
Long-term Prediction Distance: P
– P is set at the maximum of the auto-correlation function
– Example: steady vowel [a] (dysphonic speaker), P = 184 (2 cycles)
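One way to pick P is sketched below. It assumes a normalised auto-correlation evaluated over a hypothetical lag range [p_min, p_max]; the slide only states that P is the position of the auto-correlation maximum, so the normalisation, the mean removal and the search range are assumptions.

```python
import numpy as np

def long_term_lag(frame, p_min, p_max):
    """Return the lag in [p_min, p_max] with the largest normalised auto-correlation."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()            # assumption: remove the DC offset first
    best_lag, best_val = p_min, -np.inf
    for lag in range(p_min, p_max + 1):
        a, b = frame[lag:], frame[:-lag]    # the frame and its copy shifted by `lag`
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        val = np.dot(a, b) / denom if denom > 0 else 0.0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

The maximum need not fall on a single cycle: in the example on the slide, P = 184 spans two cycles, so the search range decides whether one or several cycles can be selected.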
Vocal Noise Cue
Signal-to-Dysperiodicity Ratio (SDR)
Example: steady vowel [a]
[Figure: speech signal x[n] and bi-directional long-term prediction error e_L[n] for a dysphonic speaker and a healthy speaker. SDR values shown: 31.2 dB and 10.1 dB; a higher SDR indicates weaker dysperiodicities.]
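The slide does not spell out the SDR formula. A minimal sketch, assuming the usual energy ratio in decibels between the speech signal x[n] and the prediction error e_L[n] over the analysed interval (the function name is hypothetical):

```python
import numpy as np

def sdr_db(x, e):
    """Signal-to-dysperiodicity ratio in dB.

    Assumption: SDR = 10 * log10(sum(x^2) / sum(e^2)), with x the speech
    signal and e the long-term prediction error over the same samples.
    """
    x = np.asarray(x, dtype=float)
    e = np.asarray(e, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(e ** 2))
```

With this definition, an error whose amplitude is one tenth of the signal amplitude corresponds to an SDR of about 20 dB.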
Results 1: Sentence “Il est sorti avant le jour” (“He went out before daybreak”), 1 female speaker; modal phonation type
[Figure: speech signal, forward long-term prediction error and bi-directional long-term prediction error; segments [il]]
Results 2: Sentence “Il est sorti avant le jour”, 1 female speaker; 5 phonation types
Conclusion
– The forward & backward long-term prediction of speech enables the analysis of any speech signal with a view to assessing vocal noise (i.e. vocal dysperiodicities).
– The analysis does not rely on any assumption regarding the periodicity or stationarity of the speech signals.