Prosody modification in speech signals. Project by Edi Fridman & Alex Zalts, supervised by Yizhar Lavner.


Prosody: the "non-textual" aspects of the speech signal, i.e. its suprasegmental aspects: timing, duration, rhythm, stress, and metrical structure. The duration of each individual segment is under the control of the speaker to varying degrees and varies with stress and speaking rate. The relative strength of an individual syllable, word, or phrase may be realized in a number of ways, including lengthening (or shortening and cliticization) and changes in pitch, amplitude, and spectral character.

Project goals:
- Prosody modification with the TD-PSOLA algorithm
- Prosody modification with the HNM model
- Conversion of male voice to female voice and vice versa

Four steps in prosody modification:
- Time-scale modification
- Pitch-scale modification
- Energy envelope modification
- Modification of the distribution of utterances

TD-PSOLA approach:
- Based on the Overlap-and-Add (OLA) idea
- Synchronization with the original pitch by:
  1) Setting pitch marks in the analysis signal
  2) Setting new pitch marks in the synthesis signal according to the time-scale and pitch-scale factors (e.g. 0.6 for pitch, 1.3 for time)
- Building the synthesis signal using OLA, as sketched below
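A minimal numpy sketch of the overlap-and-add step, assuming the analysis pitch marks, the synthesis pitch marks, and the mapping between them are already available; the function and variable names are illustrative, not taken from the project:

```python
import numpy as np

def psola_overlap_add(x, analysis_marks, synthesis_marks, src_of_synth):
    """Overlap-and-add pitch-synchronous segments of x at new positions.

    x               -- analysis speech signal (1-D numpy array)
    analysis_marks  -- pitch-mark sample indices in x
    synthesis_marks -- pitch-mark sample indices in the output signal
    src_of_synth    -- for each synthesis mark, the index of the analysis
                       mark whose segment is copied there
    """
    y = np.zeros(synthesis_marks[-1] + max(np.diff(analysis_marks)) + 1)
    for u, t_s in enumerate(synthesis_marks):
        a = src_of_synth[u]                      # chosen analysis mark
        t_a = analysis_marks[a]
        # the segment spans roughly two pitch periods around the analysis mark
        left = t_a - analysis_marks[max(a - 1, 0)]
        right = analysis_marks[min(a + 1, len(analysis_marks) - 1)] - t_a
        if left <= 0 or right <= 0:
            continue
        seg = x[t_a - left:t_a + right]
        win = np.hanning(len(seg))               # taper to avoid clicks at joins
        start = t_s - left
        if start < 0 or start + len(seg) > len(y):
            continue
        y[start:start + len(seg)] += seg * win   # overlap-and-add at the new mark
    return y
```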

Setting up new pitch marks. Define the time instants t_a(s) in the analysis signal as the original pitch marks, and let P(t) be the pitch contour. The stream of synthesis pitch marks t_s(u) is determined from t_a(s) according to the desired time-scale modification t → D(t) and pitch-scale modification P'(t) = F_p(P(t)) by

$$ t_s(u+1) - t_s(u) = \frac{1}{t'_s(u+1) - t'_s(u)} \int_{t'_s(u)}^{t'_s(u+1)} P'(t)\,dt, \qquad \text{with } t_s(u+1) = D\bigl(t'_s(u+1)\bigr), $$

where t'_s(u) are the synthesis instants mapped back onto the analysis time axis.
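For illustration, a simplified discrete version of this mapping, assuming a constant time-scale factor alpha (D(t) = alpha*t) and a constant pitch-scale factor beta (P'(t) = P(t)/beta); it evaluates P' at a single point rather than averaging it over the interval as the formula requires, and all names are hypothetical:

```python
import numpy as np

def synthesis_pitch_marks(pitch_period, alpha, beta, out_len):
    """Generate synthesis pitch-mark positions (in samples).

    pitch_period -- callable: analysis time in samples -> local period P(t) in samples
    alpha        -- time-scale factor, D(t) = alpha * t
    beta         -- pitch-scale factor, modified period P'(t) = P(t) / beta
    out_len      -- desired length of the synthesis signal in samples
    """
    marks = [0]
    while marks[-1] < out_len:
        t_s = marks[-1]
        t_a = t_s / alpha                              # map back to the analysis time axis
        period = max(pitch_period(t_a) / beta, 1.0)    # modified local pitch period
        marks.append(int(round(t_s + period)))
    return np.array(marks[:-1])

# Example with the factors from the previous slide (pitch 0.6, time 1.3)
# and a constant 8 ms period at 16 kHz (128 samples):
# marks = synthesis_pitch_marks(lambda t: 128.0, alpha=1.3, beta=0.6, out_len=16000)
```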

Problems with TD-PSOLA:
- It is impossible to change the shape of the pitch contour, because the algorithm is tied to the original pitch marks.
- Too many pitch marks are left unused, resulting in poor sound quality.

[Figure: original pitch marks vs. new pitch marks]

HNM approach:
- The speech signal is modeled as harmonics of the pitch plus noise
- Harmonics and noise are treated in different ways
- Analysis and synthesis are performed in a pitch-synchronous way

Let x(n) be a speech segment. According to the HNM model it can be written as

$$ x(n) = \sum_{k=1}^{K} h_k\, z_k^{\,n} + w(n), \qquad z_k = e^{\,j 2\pi f_k T}, $$

where h_k is the complex amplitude of harmonic k, f_k is the frequency of harmonic k, T is the sampling period, and w(n) is the noise. The complex amplitudes are chosen so as to minimize the error between the harmonic part and the original signal.
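A small sketch of what the harmonic part of this model looks like numerically, with z_k assumed to be exp(j*2*pi*f_k*T); names and conventions (e.g. whether conjugate terms are carried explicitly) are illustrative:

```python
import numpy as np

def harmonic_part(h, f, fs, n_samples):
    """Harmonic component sum_k h_k * z_k**n with z_k = exp(j*2*pi*f_k/fs).

    h  -- complex amplitudes h_k
    f  -- harmonic frequencies f_k in Hz
    fs -- sampling frequency in Hz (so T = 1/fs)
    """
    n = np.arange(n_samples)
    z = np.exp(1j * 2 * np.pi * np.asarray(f) / fs)             # z_k on the unit circle
    comps = np.asarray(h)[:, None] * z[:, None] ** n[None, :]   # h_k * z_k**n, one row per harmonic
    # take the real part; if the model carries the conjugate (negative-frequency)
    # terms explicitly, the result would be 2*Re(.) instead
    return comps.sum(axis=0).real
```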

- The amplitudes and phases of the pitch harmonics are computed with the Prony algorithm by minimizing the least-squares error between the harmonic part and the original signal.
- Harmonic k is placed at k*F0, where F0 is the pitch found by a pitch detection algorithm (PDA).
- In each voiced speech fragment the maximum voiced frequency Fm is calculated, and the noise part is obtained by filtering the signal with a high-pass filter with cutoff frequency Fm.
- In unvoiced fragments the signal's spectrum is modeled by a p-th order all-pole filter H(z). The noise is synthesized by filtering unit-variance Gaussian noise through H(z).
- When pitch scaling is applied, the amplitudes and phases of the modified pitch harmonics must be recomputed. For this purpose a frequency-continuous spectral and phase envelope is necessary.
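As a simpler stand-in for the Prony-based estimation described above, the harmonic amplitudes and phases can be illustrated with a plain least-squares fit of cosines and sines at multiples of F0; F0 is assumed to come from the PDA, the number of harmonics from Fm/F0, and the function name is hypothetical:

```python
import numpy as np

def estimate_harmonics(x, f0, fs, num_harmonics):
    """Least-squares amplitudes and phases of harmonics k*F0 in segment x."""
    n = np.arange(len(x))
    k = np.arange(1, num_harmonics + 1)
    ang = 2 * np.pi * np.outer(n, k) * f0 / fs      # phase of harmonic k at sample n
    A = np.hstack([np.cos(ang), np.sin(ang)])       # real basis: cosines then sines
    coefs, *_ = np.linalg.lstsq(A, x, rcond=None)   # minimize ||x - A*coefs||^2
    a, b = coefs[:num_harmonics], coefs[num_harmonics:]
    amplitudes = np.hypot(a, b)                     # per-harmonic amplitude
    phases = np.arctan2(-b, a)                      # so x ~ sum A_k cos(2*pi*k*f0*n/fs + phi_k)
    return amplitudes, phases
```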

Comparison of TD-PSOLA and HNM

- The only target of pitch scaling was to change F0 while preserving the formant structure.
- There was an attempt to change the spectral envelope in order to convert a male voice to a female voice and vice versa.
- A new algorithm was proposed for this purpose.
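The slides do not give the details of the proposed algorithm. Purely as a generic illustration (not the project's method), one common way to move formants for male-to-female conversion is to warp the spectral envelope linearly along the frequency axis:

```python
import numpy as np

def warp_spectral_envelope(envelope, warp_factor):
    """Linearly stretch a magnitude spectral envelope along frequency.

    envelope    -- magnitude envelope sampled on a uniform frequency grid
    warp_factor -- >1 shifts formants up (male-to-female direction), <1 shifts them down
    """
    bins = np.arange(len(envelope))
    # the value at bin i of the warped envelope is the original value at i/warp_factor
    src = bins / warp_factor
    return np.interp(src, bins, envelope)
```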