Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model Didier Cadic 1, engineering student supervised by Olivier Cappé.

Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model
Didier Cadic (1), engineering student supervised by Olivier Cappé (1), Maurice Charbit (1), Gérard Chollet (1), Eric Moulines (1) (presented here by Guido Aversano (1,2))
(1) Département TSI, ENST, Paris, France
(2) IIASS, Vietri sul Mare (SA), Italy

Plan of the presentation
• Text-to-speech: classic methods
• HNM model
• Analysis
• Synthesis
• Analysis-Synthesis examples
• Conclusions

Text-To-Speech by concatenation
Examples realized on the AT&T web site:
• English, male
• English, female (vocal server example)
• English, female (another vocal server example)
• German, male
• French, female

Text-To-Speech by concatenation
2 major challenges:
• smooth connection between acoustic units
• flexible prosody

TD-PSOLA method
Analysis:
• Pitch estimation
• Pitch-synchronous windowing
Synthesis:
• Rearrangement of frames
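As a rough illustration of the two steps above (pitch-synchronous windowing, then rearrangement of frames), here is a minimal Python sketch of time-stretching. It assumes that the pitch marks are already known. The function name `td_psola_stretch`, the two-period Hann window and the nearest-mark selection rule are illustrative assumptions, not details taken from the slides.

```python
import math


def td_psola_stretch(signal, marks, stretch):
    """Time-stretch `signal` by `stretch` (> 1 means slower), keeping the pitch.

    `marks` are integer sample positions of the pitch marks (assumed given).
    """
    out_len = int(len(signal) * stretch)
    out = [0.0] * out_len
    t = float(marks[0])  # position of the current synthesis mark
    while t < out_len:
        # Pick the analysis mark closest to the corresponding original time.
        src = t / stretch
        i = min(range(len(marks)), key=lambda j: abs(marks[j] - src))
        # Local pitch period, in samples, around that analysis mark.
        if i + 1 < len(marks):
            period = marks[i + 1] - marks[i]
        else:
            period = marks[i] - marks[i - 1]
        # Copy a two-period Hann-windowed frame and overlap-add it at t.
        centre = int(round(t))
        for n in range(-period, period + 1):
            s_idx = marks[i] + n
            o_idx = centre + n
            if 0 <= s_idx < len(signal) and 0 <= o_idx < out_len:
                w = 0.5 * (1.0 + math.cos(math.pi * n / period))
                out[o_idx] += w * signal[s_idx]
        t += period  # synthesis marks keep the original period
    return out
```

Because frames are repeated (or dropped) rather than resampled, the spacing between pitch marks, and therefore the perceived pitch, is preserved while the duration changes.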

TD-PSOLA method
Some very good-quality results:
• Time-scaling: singing, original; singing, modified
• Pitch-shifting: cello, original; cello, modified

TD-PSOLA method
Artifacts appearing in non-voiced sounds:
• "rain", original; "rain", 0.5 rate
• "ss", original; "ss", slowed down (classic method); "ss", slowed down (improved)

Phase Vocoder method
Intuitive description: compression/stretching of the (narrow-band) spectrogram’s time-frequency scales…
• time-scaling
• pitch-shifting

Phase Vocoder method
Examples:
• "rain", male voice: slow-motion by Vocoder (PSOLA: [audio])
• "The quick fox …", female voice: slow-motion by Vocoder
Main problem:
• phase coherence is lost in the synthesized signal

• TD-PSOLA and Vocoder allow basic prosodic modifications.
• The problem of unit concatenation for TTS is not solved.
• Other kinds of modifications (timbre, denoising, …) should be considered.
We need a parametric model.

Harmonic plus Noise Model (HNM)
Main assumption:
• stationary segments of a speech signal can always be seen as the superposition of a periodic part and a noisy part

HNM Model
Modelling: $S(t) = H(t) + B(t)$
where $H(t) = \sum_k A_k \cos(2\pi k f_0 t + \varphi_k)$
and $B(t)$ = white noise passed through an AR filter

HNM analysis of a frame
1. Pitch estimation
• Spectral comb method
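The slides do not detail the spectral comb, so the following Python sketch shows only one plausible variant: for each candidate f0, the Hann-windowed spectrum magnitude is averaged over the comb teeth k·f0 and the magnitude halfway between teeth is subtracted. The function names, the search range, the 2 Hz step and the 2 kHz band are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_at(windowed, freq, fs):
    """Magnitude of the DFT of an already windowed frame at frequency `freq` (Hz)."""
    acc = 0j
    for n, x in enumerate(windowed):
        acc += x * cmath.exp(-2j * math.pi * freq * n / fs)
    return abs(acc)


def comb_pitch(frame, fs, f_min=80.0, f_max=300.0, step=2.0, band=2000.0):
    """Return the candidate f0 (Hz) whose comb best matches the spectrum."""
    size = len(frame)
    # Hann window, applied once.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)))
                for n, x in enumerate(frame)]
    best_f0, best_score = None, -math.inf
    f0 = f_min
    while f0 <= f_max:
        n_teeth = int(band // f0)
        score = 0.0
        for k in range(1, n_teeth + 1):
            # Reward energy on the teeth, penalise energy between them.
            score += magnitude_at(windowed, k * f0, fs)
            score -= magnitude_at(windowed, (k + 0.5) * f0, fs)
        score /= n_teeth  # averaging makes f0/2 score lower than f0
        if score > best_score:
            best_f0, best_score = f0, score
        f0 += step
    return best_f0
```

Without the averaging and the between-teeth penalty, a comb at f0/2 collects exactly the same harmonics as a comb at f0, which is one way to see why a plain comb can return f0/2.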

HNM analysis of a frame
1. Pitch estimation
• Good results are obtained
• In some cases the method erroneously returns f0/2
• Possibility of tracking…
Example: "aka…aga"

HNM analysis of a frame
2. Harmonic part: extraction of amplitudes
• Least squares method
$H(t) = \sum_k a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t)$
$\min_{a_k, b_k} \| s(t) - H(t) \|^2$
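The least-squares problem above is linear in $a_k$ and $b_k$, so it can be solved through the normal equations. The Python sketch below does this with the standard library only; the function names and the small Gaussian-elimination solver are illustrative choices, and the pitch f0 is assumed to be known from the previous step.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def harmonic_least_squares(frame, f0, fs, n_harmonics):
    """Return (a, b) minimising sum_n (s[n] - H[n])^2 for the frame."""
    basis = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k * f0 / fs
        basis.append([math.cos(w * n) for n in range(len(frame))])
        basis.append([math.sin(w * n) for n in range(len(frame))])
    # Normal equations: (B^T B) c = B^T s
    gram = [[sum(u * v for u, v in zip(bi, bj)) for bj in basis] for bi in basis]
    proj = [sum(u * s for u, s in zip(bi, frame)) for bi in basis]
    coeffs = solve_linear(gram, proj)
    return coeffs[0::2], coeffs[1::2]  # a_k (cosine), b_k (sine)
```

The amplitude and phase of each harmonic then follow from $A_k = \sqrt{a_k^2 + b_k^2}$ and $\varphi_k = \operatorname{atan2}(-b_k, a_k)$, since $A\cos(x + \varphi) = A\cos\varphi\cos x - A\sin\varphi\sin x$.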

HNM analysis of a frame
2. Extraction of amplitudes
Problem: the noisy part gives a non-null contribution to the spectral power
• Gain correction for the harmonics (using a heuristic formula g(DV), where DV is the estimated voicing degree)

HNM analysis of a frame
2. Extraction of amplitudes
• Residual: $R(t) = s(t) - H(t)$

HNM analysis of a frame
2. Extraction of amplitudes
• Possibility of improving harmonic estimation

HNM analysis of a frame
3. AR filter estimation for the residual:
• Linear prediction method
$R(t) = B_g * F(t)$
where $B_g$ = Gaussian white noise and $F(t)$ = AR filter, $F(z) = \dfrac{1}{a_0 + a_1 z^{-1} + \dots + a_N z^{-N}}$

HNM Synthesis
• Interpolation for each harmonic between two successive frames
$H(t) = \sum_k a_k(t) \cos(2\pi k f_0(t)\, t) + b_k(t) \sin(2\pi k f_0(t)\, t) = \sum_k A_k(t) \cos\varphi_k(t)$
$A_k(t_a)$ and $\varphi_k(t_a)$ are known at the analysis instants $t_a$.
$\varphi_k'(t_a) = 2\pi k f_0(t_a)$ is known by pitch analysis.

HNM Synthesis
Erroneous pitch (usually f0/2)
• The harmonic correspondence problem is solved by introducing fictitious harmonics

HNM Synthesis
$A_k(t) \cos\varphi_k(t)$
• $A_k(t)$: linear interpolation
• $\varphi_k(t)$: unwrapping + cubic interpolation
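One common way to realise "unwrapping + cubic interpolation" is the cubic phase interpolation used in McAulay and Quatieri's sinusoidal model: the phase and its derivative (the angular frequency) are matched at both frame boundaries, and the multiple of 2π added to the end phase is chosen to make the track as smooth as possible. Whether the toolbox uses exactly this formula is an assumption; the function names below are illustrative.

```python
import math


def cubic_phase_track(phase1, omega1, phase2, omega2, duration):
    """Return phase(t) for 0 <= t <= duration.

    phase1, phase2: phases (rad) at the two analysis instants.
    omega1, omega2: angular frequencies (rad/s), e.g. 2*pi*k*f0, at those instants.
    """
    # Unwrapping: choose the integer m giving the smoothest trajectory.
    m = round(((phase1 + omega1 * duration - phase2)
               + (omega2 - omega1) * duration / 2.0) / (2.0 * math.pi))
    d = phase2 + 2.0 * math.pi * m - phase1 - omega1 * duration
    alpha = 3.0 * d / duration ** 2 - (omega2 - omega1) / duration
    beta = -2.0 * d / duration ** 3 + (omega2 - omega1) / duration ** 2

    def phase(t):
        return phase1 + omega1 * t + alpha * t ** 2 + beta * t ** 3

    return phase


def synth_harmonic(amp1, amp2, phase, duration, fs):
    """One harmonic between two frames: linear amplitude, cubic phase."""
    n_samples = int(round(duration * fs))
    return [((1.0 - i / n_samples) * amp1 + (i / n_samples) * amp2)
            * math.cos(phase(i / fs)) for i in range(n_samples)]
```

By construction the track starts at `phase1` with slope `omega1` and ends at `phase2` (modulo 2π) with slope `omega2`, so consecutive segments join without phase or frequency jumps.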

HNM Synthesis
Noisy part
• Generation of normally distributed random numbers
• AR filtering (abrupt changes of coefficients between 2 windows have no noticeable effect…)
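The two steps above can be sketched in Python as follows. Each frame carries its own AR coefficients and gain, the coefficients are switched abruptly at frame boundaries, and the filter memory is kept across frames. The function name, the frame tuple layout and the normalisation `a[0] == 1` are assumptions made for this illustration.

```python
import random


def synth_noise(frames, seed=0):
    """Generate the noisy part frame by frame.

    frames: list of (a, gain, length) where `a` holds the AR coefficients
    with a[0] == 1, i.e. y[n] = gain * e[n] - sum_{i>=1} a[i] * y[n - i].
    """
    rng = random.Random(seed)
    out = []
    for a, gain, length in frames:
        for _ in range(length):
            # Normally distributed excitation sample.
            y = gain * rng.gauss(0.0, 1.0)
            # All-pole (AR) filtering using past output samples,
            # including those produced with the previous frame's coefficients.
            for i in range(1, len(a)):
                if len(out) >= i:
                    y -= a[i] * out[-i]
            out.append(y)
    return out
```

For example, `synth_noise([([1.0, -0.9], 1.0, 200), ([1.0, 0.5], 0.5, 200)])` produces 400 samples whose spectral envelope changes from low-pass to high-pass halfway through.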

HNM Synthesis Results
Audio examples (synthesized / original):
• "Carottes"
• "Lawyer"
• Tuba
• "wazi"
• a-e-i-o-u
• singing

HNM Synthesis Results
Audio examples (synthesized / original):
• Discours
• "aka aga"
• Dussolier
• Andie (also: noisy part)
• "coiffe"

Synthesis with time-stretching
Synthesis instants ($t_s$) ≠ Analysis instants ($t_a$)
The following parameters remain unchanged:
• Noisy part parameters
• The pitch
• The amplitudes $A_k$ of the harmonics
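In code, this amounts to moving each frame to a new instant while copying its parameters untouched. The Python sketch below assumes a hypothetical frame layout (a dict with keys `t`, `f0`, `amps` and `ar`) and assumes that a rate below 1 means slow-motion, so that $t_s = t_a / \text{rate}$; neither is stated explicitly in the slides. Phases are deliberately left out, since they need the separate adaptation step.

```python
def stretch_frames(frames, rate):
    """Map analysis frames to synthesis frames for time-stretching.

    Only the instant `t` changes; pitch, harmonic amplitudes and
    noise (AR) parameters are carried over as they are.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return [dict(frame, t=frame["t"] / rate) for frame in frames]


# Two analysis frames 10 ms apart, slowed down by a factor of two.
analysis = [
    {"t": 0.00, "f0": 120.0, "amps": [0.9, 0.4], "ar": [1.0, -0.5]},
    {"t": 0.01, "f0": 122.0, "amps": [0.8, 0.5], "ar": [1.0, -0.4]},
]
synthesis = stretch_frames(analysis, rate=0.5)
```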

Synthesis with time-stretching
Phase adaptation:
• Simple phase trajectories resampling, or
• "harmonic" rephasing
Example, a-e-i-o-u: original; slow-motion with phase "stretching"; slow-motion with "harmonic" rephasing

Final results
Audio examples, one row per sample. Columns: original, then synthesized with rate 1, 0.4, 0.5, 0.6, 0.7, 0.8, 1.2, 1.5, 2.
Samples: "carottes", "lawyer", tuba, "wazi", singing, "a-e-i-o-u", Dussolier, Discours, Andie, "aka aga", "coiffe"

Conclusions
• Good results, showing the method’s potential for different applications, including TTS
• Future work will include other kinds of modifications (pitch shifting, timbre, etc.)
