Logo Prosodic Manipulation Advanced Signal Processing, SE 3.12.03 David Ludwig

Slides:



Advertisements
Similar presentations
ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL Preeti Rao and Pushkar Patwardhan Department of Electrical Engineering,
Advertisements

Acoustic/Prosodic Features
Liner Predictive Pitch Synchronization Voiced speech detection, analysis and synthesis Jim Bryan Florida Institute of Technology ECE5525 Final Project.
Overview of Real-Time Pitch Tracking Approaches Music information retrieval seminar McGill University Francois Thibault.
High Level Prosody features: through the construction of a model for emotional speech Loic Kessous Tel Aviv University Speech, Language and Hearing
Time-scale and pitch modification Algorithms review Alexey Lukin.
A System for Hybridizing Vocal Performance By Kim Hang Lau.
Prosody modification in speech signals Project by Edi Fridman & Alex Zalts supervision by Yizhar Lavner.
Page 0 of 34 MBE Vocoder. Page 1 of 34 Outline Introduction to vocoders MBE vocoder –MBE Parameters –Parameter estimation –Analysis and synthesis algorithm.
CENTER FOR SPOKEN LANGUAGE UNDERSTANDING 1 PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS Jan P.H. van Santen and Xiaochuan.
Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors
VOICE CONVERSION METHODS FOR VOCAL TRACT AND PITCH CONTOUR MODIFICATION Oytun Türk Levent M. Arslan R&D Dept., SESTEK Inc., and EE Eng. Dept., Boğaziçi.
1 Frequency Domain Analysis/Synthesis Concerned with the reproduction of the frequency spectrum within the speech waveform Less concern with amplitude.
CELLULAR COMMUNICATIONS 5. Speech Coding. Low Bit-rate Voice Coding  Voice is an analogue signal  Needed to be transformed in a digital form (bits)
December 2006 Cairo University Faculty of Computers and Information HMM Based Speech Synthesis Presented by Ossama Abdel-Hamid Mohamed.
Overview What is in a speech signal?
SPPA 6010 Advanced Speech Science 1 The Source-Filter Theory: The Sound Source.
SOME SIMPLE MANIPULATIONS OF SOUND USING DIGITAL SIGNAL PROCESSING Richard M. Stern demo August 31, 2004 Department of Electrical and Computer.
1st and 2nd Generation Synthesis
EE2F1 Speech & Audio Technology Sept. 26, 2002 SLIDE 1 THE UNIVERSITY OF BIRMINGHAM ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING Digital Systems & Vision.
EE2F1 Speech & Audio Technology Sept. 26, 2002 SLIDE 1 THE UNIVERSITY OF BIRMINGHAM ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING Digital Systems & Vision.
On improving the intelligibility of synchronized over-lap-and-add (SOLA) at low TSM factor Wong, P.H.W.; Au, O.C.; Wong, J.W.C.; Lau, W.H.B. TENCON '97.
Speech Enhancement Based on a Combination of Spectral Subtraction and MMSE Log-STSA Estimator in Wavelet Domain LATSI laboratory, Department of Electronic,
Effects in frequency domain Stefania Serafin Music Informatics Fall 2004.
Communications & Multimedia Signal Processing Formant Tracking LP with Harmonic Plus Noise Model of Excitation for Speech Enhancement Qin Yan Communication.
Communications & Multimedia Signal Processing Refinement in FTLP-HNM system for Speech Enhancement Qin Yan Communication & Multimedia Signal Processing.
Analysis & Synthesis The Vocoder and its related technology.
Text-To-Speech Synthesis An Overview. What is a TTS System  Goal A system that can read any text Automatic production of new sentences Not just audio.
1 Speech synthesis 2 What is the task? –Generating natural sounding speech on the fly, usually from text What are the main difficulties? –What to say.
Voice Transformations Challenges: Signal processing techniques have advanced faster than our understanding of the physics Examples: – Rate of articulation.
Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition Nengheng Zheng Supervised under Professor P.C. Ching Nov. 26,
A Text-to-Speech Synthesis System
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING MARCH 2010 Lan-Ying Yeh
Patrick-André Savard, Philippe Gournay and Roch Lefebvre Université de Sherbrooke, Québec, Canada.
IIT Bombay ICA 2004, Kyoto, Japan, April 4 - 9, 2004   Introdn HNM Methodology Results Conclusions IntrodnHNM MethodologyResults.
„Bandwidth Extension of Speech Signals“ 2nd Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction 22nd and 23rd June.
LECTURE Copyright  1998, Texas Instruments Incorporated All Rights Reserved Encoding of Waveforms Encoding of Waveforms to Compress Information.
Page 0 of 23 MELP Vocoders Nima Moghadam SN#: Saeed Nari SN#: Supervisor Dr. Saameti April 2005 Sharif University of Technology.
Chapter 16 Speech Synthesis Algorithms 16.1 Synthesis based on LPC 16.2 Synthesis based on formants 16.3 Synthesis based on homomorphic processing 16.4.
The role of prosody in dialect synthesis and authentication Kyuchul Yoon Division of English Kyungnam University Spring 2008 Joint Conference of KSPS.
SPEECH CODING Maryam Zebarjad Alessandro Chiumento.
Building a sentential model for automatic prosody evaluation Kyuchul Yoon School of English Language & Literature Yeungnam University Korea.
Speech Signal Processing I By Edmilson Morais And Prof. Greg. Dogil Second Lecture Stuttgart, October 25, 2001.
Structure of Spoken Language
ECE 5525 Osama Saraireh Fall 2005 Dr. Veton Kepuska
VOCODERS. Vocoders Speech Coding Systems Implemented in the transmitter for analysis of the voice signal Complex than waveform coders High economy in.
Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio
ITU-T G.729 EE8873 Rungsun Munkong March 22, 2004.
Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model Didier Cadic 1, engineering student supervised by Olivier Cappé.
Pitch Estimation by Enhanced Super Resolution determinator By Sunya Santananchai Chia-Ho Ling.
P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti / DSP 2009, Santorini, 5-7 July DSP 2009 (Santorini, Greece. 5-7 July 2009), Session: S4P,
(Extremely) Simplified Model of Speech Production
Imposing native speakers’ prosody on non-native speakers’ utterances: Preliminary studies Kyuchul Yoon Spring 2006 NAELL The Division of English Kyungnam.
Present document contains informations proprietary to France Telecom. Accepting this document means for its recipient he or she recognizes the confidential.
Chapter 20 Speech Encoding by Parameters 20.1 Linear Predictive Coding (LPC) 20.2 Linear Predictive Vocoder 20.3 Code Excited Linear Prediction (CELP)
IIT Bombay 17 th National Conference on Communications, Jan. 2011, Bangalore, India Sp Pr. 1, P3 1/21 Detection of Burst Onset Landmarks in Speech.
By Sarita Jondhale 1 Signal preprocessor: “conditions” the speech signal s(n) to new form which is more suitable for the analysis Postprocessor: operate.
IIT Bombay ISTE, IITB, Mumbai, 28 March, SPEECH SYNTHESIS PC Pandey EE Dept IIT Bombay March ‘03.
The role of prosody in dialect authentication Simulating Masan dialect with Seoul speech segments Kyuchul Yoon Division of English, Kyungnam University.
Dialect Simulation through Prosody Transfer: A preliminary study on simulating Masan dialect with Seoul dialect Kyuchul Yoon Division of English, Kyungnam.
1 Speech Compression (after first coding) By Allam Mousa Department of Telecommunication Engineering An Najah University SP_3_Compression.
HIGH-RESOLUTION SINUSOIDAL MODELING OF UNVOICED SPEECH GEORGE P. KAFENTZIS, YANNIS STYLIANOU MULTIMEDIA INFORMATICS LABORATORY DEPARTMENT OF COMPUTER SCIENCE.
영어교육에 있어서의 영어억양의 역할 (The role of prosody in English education) Korea Nazarene University Kyuchul Yoon English Division Kyungnam University.
Vocoders.
Automated Detection of Speech Landmarks Using
1 Vocoders. 2 The Channel Vocoder (analyzer) : The channel vocoder employs a bank of bandpass filters,  Each having a bandwidth between 100 HZ and 300.
Speech and Language Processing
Vocoders.
A System for Hybridizing Vocal Performance
Presentation transcript:

Logo Prosodic Manipulation Advanced Signal Processing, SE David Ludwig

Logo 2 Contents  Introduction  SOLA, PSOLA  LP-PSOLA  RELP  Sinusoidal/harmonic-plus-noise modeling  MBROLA  Application

Logo Introduction

Logo 4 Definition  prosody (noun) 1. the study of poetic meter and the art of versification 2. the patterns of stress and intonation in a language Synonyms: inflection 3. a system of versification Synonyms: poetic rhythm, rhythmic patterninflectionpoetic rhythmrhythmic pattern  pitch, duration, amplitude (gestures)  Function:  Stress, non – lexical information, discourse, emotion

Logo 5 Pitch ´n Time (Robot: His name is R1D1...)  Playful_Time:  Random number between 10 and 400 milliseconds and use that for the phone duration.  Serious_Time:  same duration value to each phone.  Playful_Pitch:  random melody for the sentence  Serious_Pitch:  same pitch values, monotone

Logo SOLA, PSOLA (pitch-)synchronous overlap-and-add

Logo 7 SOLA  Time-segment processing  Segmentation of x[n] into overlapping frames  Shifting according to scaling- factor   Repositioning, Overlap/Add  Cross-Correlation in the overlap interval  Maximum of CC  Fade in / fade out  Flexible time lag

Logo 8 PSOLA  Variation especially for voice processing  Splitting signal ino overlapping windows  Synchronized with fundamental frequency  Avoids pitch discontinuities  Neccesitates preliminary pitch marking  Analysis:  Pitch Period P(t) at pitch mark t i  Segment extraction by windowing with pitch mark as its center

Logo 9

10 Synthesis  Time-Scaling:  ST-signals must be added (or suppressed) without altering the distance among adjacent pitch periods  Pitch-Shifting:  synthesis time axis will have the same duration, but it will be necessary to scale the local pitch period  ST-Signals might be discarded (compression/lower pitch)  ST-Signals might be used twice (stretching/higher pitch)  Artefacts:  Transient smearing, audible slices, Distortion due to phase errors

Logo 11

Logo 12

Logo LP-PSOLA, RELP

Logo 14 LP-PSOLA  LP-Residual or Error Function e(t) is used  spectrally flat  Separating excitation and vocal tract  Little correlation within each pitch period  TD-PSOLA algorithm is applied to the residual part  Advantages:  Control of spectral structure  No additional computation time

Logo 15 RELP  Residual Excited LPC  Vocoding technique for speech transmission (e.g. mobile phones)  Residual Signal is compressed  Low-Pass Filtering  Downsampling  Re-Quantisation

Logo 16

Logo 17 Source-Filter Models  Source=oscillation of vocal chords  Voiced (Dirac-Impulses)  Unvoiced(Noise)  Filter=TF of vocal tract  LP  Approximation of spectral envelope  Problem: Estimation of filter coefficients

Logo Sinusoidal/Harmonic + Residual Model (HNM)

Logo 19 Analysis/Synthesis  Signal is decomposed in harmonic+noise part:  Number of harmonics, fundamental frequency, time- variant amplitudes (harmonic model)  Peak detection/continuation, pitch detection, Subtraction  Residual~ time-pulsed,filtered noise  Synthesis: Additive/Subtractive Synthesis

Logo 20

Logo 21 Features  Voiced/unvoiced decision  Crucial:  Pitch estimation  Peak continuation  McAulay + Quatieri Algorithm  Phase Relationships

Logo 22 MBROLA

Logo 23 MBROLA  Multiband Resynthesis OLA  Faculté polytechnique de Mons (Belgium)  Open source synthesizer  As many voices, dialects and languages as possible  Actually 27 languages !!  Diphone concatenation  Time-domain approach (MBR-PSOLA)  Smoothing of spectral discontinuities in the time domain enhances fludity

Logo 24 Examples  German  U.K. English  U.S. English  Japanese

Logo 25 Manipulation  Manipulation in frequency domain  Pitch-Shifting  Direct access to sinusoidal components  frequency shifting with/without formant preservation  Time-Scaling  No change of Input/Output hopsize  Superior to phase vocoder  Computationally expensive

Logo Application

Logo 27 Application  Mean value  Macro-Prosody D F0  Micro-Prosody M F0  Pitch Modification by:

Logo 28

Logo 29 References  PSOLA:  U.Zoelzer: DAFX Wiley, John & Sons, Incorporated  E. Moulines and F. Charpentier: Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis using Diphones, Speech Comminucation, vol 9, pp ,  HNM:  J. Laroche, Y. Stylianou, and E. Moulines: HNS: Speech Modification Based on a Harmonic+Noise Model, Proc. of ICASSP 1993, vol.2, pp  MBROLA:  tcts.fpms.ac.be/synthesis/mbrola.html

Logo 30 THANK YOU