
Implementation of a Speech Analysis-Synthesis Toolbox using the Harmonic plus Noise Model
Didier Cadic¹, engineering student, supervised by Olivier Cappé¹, Maurice Charbit¹, Gérard Chollet¹, Eric Moulines¹
(presented here by Guido Aversano¹,²)
¹ Département TSI, ENST, Paris, France
² IIASS, Vietri sul Mare (SA), Italy

Plan of the presentation
- Text-to-speech: classic methods
- HNM model
- Analysis
- Synthesis
- Analysis-synthesis examples
- Conclusions

Text-To-Speech by concatenation
Examples realized on the AT&T web site:
- English, male
- English, female (vocal server example)
- English, female (another vocal server example)
- German, male
- French, female

Text-To-Speech by concatenation
Two major challenges:
- smooth connection between acoustic units
- flexible prosody

TD-PSOLA method
Analysis:
- Pitch estimation
- Pitch-synchronous windowing
Synthesis:
- Rearrangement of frames

TD-PSOLA method
Some very good-quality results:
- Time-scaling: singing, original / singing, modified
- Pitch-shifting: cello, original / cello, modified

TD-PSOLA method
Artifacts appearing in non-voiced sounds:
- "rain", original / "rain", 0.5 rate
- "ss", original / "ss", slowed down (classic method) / "ss", slowed down (improved)

Phase Vocoder method
Intuitive description: compression/stretching of the (narrow-band) spectrogram's time-frequency scales…
- time-scaling
- pitch-shifting

Phase Vocoder method
Examples:
- "rain", male voice: slow-motion by vocoder (PSOLA: )
- "The quick fox …", female voice: slow-motion by vocoder
Main problem:
- phase coherence is lost in the synthesized signal

- TD-PSOLA and the phase vocoder allow basic prosodic modifications.
- The problem of unit concatenation for TTS is not solved.
- Other kinds of modifications (timbre, denoising, …) should be considered.
We need a parametric model.

Harmonic plus Noise Model (HNM)
Main assumption:
- stationary segments of a speech signal can always be seen as the superposition of a periodic part and a noisy part

HNM Model
Modelling:
s(t) = H(t) + B(t)
where H(t) = Σ_k A_k cos(2π k f0 t + φ_k)
and B(t) = white noise passed through an AR filter
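As a minimal sketch of this model for one stationary frame (every parameter value and the function name are illustrative; the noise recursion is a direct all-pole filter, not the toolbox's actual code):

```python
import numpy as np

def hnm_frame(f0, amps, phases, ar_coeffs, noise_gain, n, fs, seed=0):
    """Sketch of s(t) = H(t) + B(t) for one stationary frame."""
    t = np.arange(n) / fs
    # Harmonic part: sum of sinusoids at integer multiples of the pitch f0
    H = sum(A * np.cos(2 * np.pi * k * f0 * t + ph)
            for k, (A, ph) in enumerate(zip(amps, phases), start=1))
    # Noise part: white Gaussian noise through an all-pole AR filter
    # with denominator A(z) = 1 + a_1 z^-1 + ... + a_N z^-N
    e = noise_gain * np.random.default_rng(seed).standard_normal(n)
    B = np.zeros(n)
    for i in range(n):
        B[i] = e[i] - sum(a * B[i - j]
                          for j, a in enumerate(ar_coeffs, start=1) if i >= j)
    return H + B
```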

HNM analysis of a frame 1.Pitch estimation  Spectral comb method

HNM analysis of a frame 1.Pitch estimation  Good results are obtained  In some cases the method erroneously returns f0/2  Possibility of tracking… "aka…aga"

HNM analysis of a frame
2. Harmonic part: extraction of amplitudes
- Least squares method:
H(t) = Σ_k [a_k cos(2π k f0 t) + b_k sin(2π k f0 t)]
min over a_k, b_k of Σ_t | s(t) − H(t) |²
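This least-squares fit is an ordinary linear regression on cosine and sine columns, e.g. (a sketch; the function name and interface are illustrative):

```python
import numpy as np

def harmonic_lsq(frame, f0, fs, n_harm):
    """Least-squares fit of the harmonic amplitudes:
    minimize sum_t |s(t) - H(t)|^2 over the a_k, b_k."""
    t = np.arange(len(frame)) / fs
    # Design matrix: one cosine and one sine column per harmonic k
    M = np.column_stack(
        [np.cos(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1)] +
        [np.sin(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1)])
    coef, *_ = np.linalg.lstsq(M, frame, rcond=None)
    a, b = coef[:n_harm], coef[n_harm:]
    H = M @ coef                 # reconstructed harmonic part
    return a, b, frame - H       # residual R(t) = s(t) - H(t)
```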

HNM analysis of a frame
2. Extraction of amplitudes
Problem: the noisy part gives a non-null contribution to the spectral power
- Gain correction for the harmonics (using a heuristic formula g(DV), where DV is the estimated voicing degree)

HNM analysis of a frame 2.Extraction of amplitudes  Residual:R(t) = s(t) - H(t)

HNM analysis of a frame 2.Extraction of amplitudes  Possibility of improving harmonic estimation

HNM analysis of a frame
3. AR filter estimation for the residual:
- Linear prediction method
R(t) = (Bg ∗ F)(t), where Bg = Gaussian white noise and F = AR filter,
F(z) = 1 / (a_0 + a_1 z^-1 + … + a_N z^-N)
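One standard way to carry out this linear-prediction step is the autocorrelation method; the sketch below solves the Yule-Walker equations directly rather than with the usual Levinson-Durbin recursion (the toolbox's exact routine is not given on the slide):

```python
import numpy as np

def lpc_autocorr(x, order):
    """Linear-prediction (autocorrelation method) estimate of the AR
    model for the residual, solved directly."""
    # Autocorrelation sequence r[0..order]
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    # Normal (Yule-Walker) equations R a = r[1:], Toeplitz R
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Whitening polynomial A(z) = 1 - a_1 z^-1 - ...; model F(z) = 1/A(z)
    return np.concatenate(([1.0], -a))
```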

HNM Synthesis  Interpolation for each harmonic between two succesive frames H(t) =  a k (t) cos ( 2  k f 0 (t) t ) + b k (t) sin ( 2  k f 0 (t) t ) = =  A k (t) cos  k (t) =  A k (t) cos  k (t)  k (t a ) = 2  k f 0 (t a ) is known by pitch analysis. A k (t a ) and  k (t a ) are known at analysis instants t a

HNM Synthesis
- Erroneous pitch (usually f0/2): the harmonic correspondence problem is solved by introducing fictitious harmonics

HNM Synthesis
For each harmonic A_k cos φ_k(t):
- A_k(t): linear interpolation
- φ_k(t): unwrapping + cubic interpolation
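A possible realization of these two interpolation rules for one harmonic (a sketch in the spirit of McAulay-Quatieri cubic phase interpolation; the exact unwrapping rule used in the toolbox is not specified on the slide):

```python
import numpy as np

def interp_harmonic(A0, A1, phi0, phi1, f0_0, f0_1, k, T, fs):
    """Interpolate harmonic k between two analysis frames T seconds
    apart: amplitude linearly, phase with a cubic that matches phase
    and instantaneous frequency (2*pi*k*f0) at both boundaries."""
    n = int(T * fs)
    t = np.arange(n) / fs
    A = A0 + (A1 - A0) * t / T                  # linear amplitude track
    w0, w1 = 2 * np.pi * k * f0_0, 2 * np.pi * k * f0_1
    # Unwrap the target phase so the cubic takes the most plausible path
    phi1u = phi1 + 2 * np.pi * np.round(
        (phi0 + 0.5 * (w0 + w1) * T - phi1) / (2 * np.pi))
    # Cubic phi(t) = phi0 + w0 t + b t^2 + c t^3
    # with phi(T) = phi1u and phi'(T) = w1
    D = phi1u - phi0 - w0 * T
    b = (3 * D - (w1 - w0) * T) / T**2
    c = (-2 * D + (w1 - w0) * T) / T**3
    phi = phi0 + w0 * t + b * t**2 + c * t**3
    return A * np.cos(phi)
```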

HNM Synthesis
Noisy part:
- Generation of normally distributed random numbers
- AR filtering (abrupt changes of the coefficients between two windows have no audible effect…)

HNM Synthesis Results
- "Carottes": synthesized / original
- "Lawyer": synthesized / original
- Tuba: synthesized / original
- "wazi": synthesized / original
- a-e-i-o-u: synthesized / original
- singing: synthesized / original

HNM Synthesis Results
- Discours: synthesized / original
- "aka aga": synthesized / original
- Dussolier: synthesized / original
- Andie: synthesized / original / noisy part
- "coiffe": synthesized / original

Synthesis with time-stretching
Synthesis instants (t_s) ≠ analysis instants (t_a)
The following parameters remain unchanged:
- Noisy part parameters
- The pitch
- The amplitudes A_k of the harmonics

Synthesis with time-stretching  Simple phase trajectories resampling or  "harmonic" rephasing Phase adaptation a-e-i-o-u : slow-motion with phase "stretching" original slow-motion with "harmonic" rephasing

Final results
Original / synthesized with rate:
- "carottes":
- "lawyer":
- tuba:
- "wazi":
- singing:
- "a-e-i-o-u":
- Dussolier:
- Discours:
- Andie:
- "aka aga":
- "coiffe":

Conclusions
- Good results, showing the method's potential for different applications, including TTS
- Future work will include other kinds of modifications (pitch shifting, timbre, etc.)