Voice Transformation: Speech Morphing. Gidon Porat and Yizhar Lavner, SIPL – Technion IIT, December 1, 2002.

Presentation transcript:

Slide 1: Voice Transformation: Speech Morphing. Gidon Porat and Yizhar Lavner, SIPL – Technion IIT, December 2002.

Slide 2: Project Goals. Gradually change a source speaker's voice to sound like the voice of a target speaker. The inputs: two reference voice signals, one for each speaker. The output: N voice signals that gradually change from source to target. (Diagram: source, interpolated signals, target.)

Slide 3: Applications
● Multimedia and video entertainment: voice morphing, just like its "face" counterpart. While watching a face gradually change from one person into another's (as is often done in video clips), we could simultaneously hear the voice changing as well.
● Forensic voice identification by synthesis: identifying a suspect's voice by creating a voice bank of different pitches, rates and timbres. A similar method was developed for face recognition to replace police sketch artists.

Slide 4: The Challenges. The source and target voice signals will never be of the same length, so a time-varying method is needed. The source voice's characteristics must be changed to those of the target speaker: pitch, duration, and spectral parameters. The result must be a natural-sounding speech signal. (Diagram: source and target waveforms.)

Slide 5: The Challenges (cont.). Here are two instances of the same word ("shade") from the source and the target. The target speaker's word lasts longer than the source speaker's, and its "shape" is quite different.

Slide 6: Speech Modeling. Sound transmission in the vocal tract can be modeled as sound passing through concatenated lossless acoustic tubes with cross-sectional areas A1, A2, A3, ..., Ak. A known mathematical relation between the areas of these tubes and the vocal tract's filter will help in the implementation of our algorithm.
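The relation alluded to here is presumably the standard lossless-tube result (the slide itself does not spell it out): adjacent tube areas determine reflection coefficients at the tube junctions, which in turn determine the all-pole vocal-tract filter. In one common sign convention,

\[ k_i = \frac{A_{i+1} - A_i}{A_{i+1} + A_i}, \qquad i = 1, \dots, k-1. \]

Given the reflection coefficients, the LPC predictor coefficients (and hence V(z)) follow from the step-up (Levinson) recursion, and the inverse mapping recovers the areas, up to an overall scale factor, from the LPC parameters.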

Slide 7: Speech Modeling – Synthesis. The basic synthesis of digital speech is done with the discrete-time system model: an excitation signal driving a vocal-tract filter. (Diagram of the model; not reproduced.)
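As a reminder of that standard model (stated here in the usual LPC form, not copied from the slide, which shows it as a diagram): the synthesized speech is an all-pole filter driven by an excitation or residual signal,

\[ s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, e(n), \qquad V(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}, \]

where e(n) is the excitation (the residual in the LPC framework), the a_k are the prediction coefficients, and G is the gain.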

Slide 8: Prototype Waveform Interpolation. PWI is a speech coding method based on the representation of a speech signal, or its residual error function, by a 3-D surface. The creation of such a surface consists of 3 main steps: sample, align, interpolate.

Slide 9: Surface Construction Algorithm. (Block diagram; a code sketch of the construction follows.)
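A minimal sketch of the surface construction, assuming the LPC residual of one voiced phoneme and its pitch marks are already available (function and parameter names here are illustrative, not taken from the project code):

```python
import numpy as np

def build_pwi_surface(residual, pitch_marks, proto_len=256):
    """Stack pitch-cycle prototypes of an LPC residual into a 2-D array
    (pitch cycles x normalized phase), i.e. the discretized 3-D surface of PWI.

    residual    : 1-D array, LPC residual (error) signal of one voiced phoneme
    pitch_marks : indices of pitch-cycle boundaries (assumed to come from a pitch tracker)
    proto_len   : number of samples used to represent one normalized pitch cycle
    """
    prototypes = []
    for start, stop in zip(pitch_marks[:-1], pitch_marks[1:]):
        cycle = residual[start:stop]
        # 1) sample: extract one pitch cycle of the residual
        # 2) resample it to a common length so cycles can be stacked
        phase_src = np.linspace(0.0, 1.0, num=len(cycle), endpoint=False)
        phase_dst = np.linspace(0.0, 1.0, num=proto_len, endpoint=False)
        proto = np.interp(phase_dst, phase_src, cycle)
        # 3) align: circularly shift to maximize cross-correlation with the previous cycle
        if prototypes:
            prev = prototypes[-1]
            corr = np.array([np.dot(np.roll(proto, s), prev) for s in range(proto_len)])
            proto = np.roll(proto, int(np.argmax(corr)))
        prototypes.append(proto)
    # rows = successive pitch cycles, columns = phase within a cycle
    return np.vstack(prototypes)
```

Here surface[i, j] holds the residual value at pitch cycle i and normalized phase j/proto_len; interpolation along the time axis then yields the smooth surface.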

Slide 10: The Solution. Use 3-D surfaces that capture each voiced phoneme's residual error signal characteristics, and interpolate between the two speakers. Unvoiced phonemes are not dealt with, due to their complexity and the fact that they carry little information about the speaker.

Slide 11: Algorithm – Block Diagram. (Figure; not reproduced.)

Slide 12: The Algorithm. Once the surfaces for both source and target speakers are created (for each phoneme), an interpolated surface is created. The new error function, reconstructed from that surface, will then be the input of a new vocal tract filter. (Diagram: Speaker A surface + Speaker B surface yielding the intermediate surface.)
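A sketch of the surface interpolation step, assuming both surfaces were built as above and are already phase-aligned; the convex combination with a morphing factor alpha and the linear handling of duration are illustrative choices, not necessarily the project's exact scheme:

```python
import numpy as np

def interpolate_surfaces(surface_a, surface_b, alpha, num_cycles=None):
    """Convex combination of two residual surfaces.

    surface_a, surface_b : 2-D arrays (pitch cycles x normalized phase) for
                           source and target, assumed already aligned in phase.
    alpha                : morphing factor in [0, 1]; 0 -> source, 1 -> target.
    num_cycles           : number of pitch cycles in the output surface
                           (the two inputs generally differ in duration).
    """
    def resample_time_axis(surface, n):
        # stretch/compress the time axis so both surfaces have n pitch cycles
        src = np.linspace(0.0, 1.0, num=surface.shape[0])
        dst = np.linspace(0.0, 1.0, num=n)
        return np.array([np.interp(dst, src, surface[:, j])
                         for j in range(surface.shape[1])]).T

    if num_cycles is None:
        # interpolate the durations as well
        num_cycles = int(round((1 - alpha) * surface_a.shape[0] + alpha * surface_b.shape[0]))
    a = resample_time_axis(surface_a, num_cycles)
    b = resample_time_axis(surface_b, num_cycles)
    return (1 - alpha) * a + alpha * b
```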

Slide 13: The Waveform – Intermediate. After creating a surface for each speaker's phoneme, an intermediate surface is created.

Slide 14: The Waveform – Reconstruction. The new, intermediate error signal can be evaluated from the new surface, assuming the pitch cycle changes slowly in time. (The slide's equations are not reproduced in this transcript.)
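Since the slide's own equations are not available, the following is the standard PWI reconstruction, given here only as a stand-in: with u(t, φ) the surface and P(t) the local pitch period, the residual is read off the surface along a phase track,

\[ \hat{e}(n) = u\big(n, \phi(n)\big), \qquad \phi(n) = \phi(n-1) + \frac{2\pi}{P(n)} \ (\mathrm{mod}\ 2\pi), \]

and the slowly-varying-pitch assumption means that P(n) and the surface change little over one pitch cycle, so linear interpolation between neighbouring prototypes suffices.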

Slide 15: Algorithm – Cont. The areas of the new tube model will be an interpolation between the source's and the target's. Once the new areas are computed, the LPC parameters and V(z) can be calculated, and the signal can be synthesized.
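A sketch of the areas-to-filter step, assuming the standard lossless-tube relations from Slide 6; the linear area interpolation and all names are illustrative:

```python
import numpy as np

def areas_to_lpc(areas):
    """Convert tube areas to LPC prediction coefficients.

    areas : 1-D array of tube cross-sectional areas A_1..A_K (glottis to lips).
    Returns prediction coefficients a_1..a_p (p = K - 1) of
    V(z) = G / (1 - sum_k a_k z^{-k}).
    Sign conventions for reflection coefficients vary between texts;
    this follows one common choice.
    """
    areas = np.asarray(areas, dtype=float)
    # reflection coefficient at each tube junction
    refl = (areas[1:] - areas[:-1]) / (areas[1:] + areas[:-1])
    # step-up (Levinson) recursion: reflection coefficients -> predictor coefficients
    a = np.zeros(0)
    for k in refl:
        a_new = np.zeros(len(a) + 1)
        a_new[:-1] = a - k * a[::-1]
        a_new[-1] = k
        a = a_new
    return a  # a[i] multiplies z^{-(i+1)} in the predictor

def morph_areas(areas_src, areas_tgt, alpha):
    """Interpolated tube areas (illustrative: plain linear interpolation)."""
    return (1 - alpha) * np.asarray(areas_src) + alpha * np.asarray(areas_tgt)
```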

Slide 16: New Voiced Phoneme Creation – Block Diagram. (Figure; not reproduced.)

Slide 17: The Transfer Function. The morphing factor α could be invariant, in which case the voice produced will be an intermediate between the two speakers, or it could vary in time (from α = 0 at t = 0 to α = 1 at t = T), yielding a gradual change from one voice to the other. In order for one to hear a "linear" change between the source's and the target's voices, the coefficient α (the relative weight of the target) has to vary in time nonlinearly, with a fast transition toward α = 0.5 and slow variation around it.
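The slide does not give the actual mapping used. As one illustration of a schedule with a fast transition toward 0.5 and slow variation around it, a symmetric power-law curve could be used (the exponent gamma is a free parameter, not from the project):

```python
import numpy as np

def morph_schedule(num_frames, gamma=3.0):
    """Nonlinear morphing-factor schedule alpha(t) on [0, 1].

    gamma > 1 compresses the curve around alpha = 0.5: the factor moves
    quickly away from the endpoints and lingers near 0.5.
    gamma = 1 gives a plain linear ramp.
    """
    t = np.linspace(-1.0, 1.0, num_frames)            # normalized time in [-1, 1]
    return 0.5 * (1.0 + np.sign(t) * np.abs(t) ** gamma)
```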

Slide 18: The Algorithm – Cont. The final, new speech signal is created by concatenating the new voiced phonemes, in order, along with the source's/target's unvoiced phonemes and silent periods. (Diagram: segment sequence 1 silent, 2 voiced, 3 unvoiced, 4 voiced.)
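A minimal sketch of this concatenation step, assuming the utterance has already been segmented into labeled pieces and the voiced segments have been morphed (the segment format and helper names are illustrative):

```python
import numpy as np

def assemble_output(segments, morphed_voiced):
    """Concatenate the output signal from labeled segments.

    segments       : list of (label, samples) pairs in utterance order,
                     label in {"silent", "voiced", "unvoiced"}; silent and
                     unvoiced samples are taken directly from the source (or target).
    morphed_voiced : list of morphed voiced-phoneme signals, one per voiced
                     segment, in the order they appear in `segments`.
    """
    out, vi = [], 0
    for label, samples in segments:
        if label == "voiced":
            out.append(morphed_voiced[vi])   # use the newly synthesized phoneme
            vi += 1
        else:
            out.append(samples)              # keep silence / unvoiced as-is
    return np.concatenate(out)
```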

Slide 19: Conclusion. The utterances produced by the algorithm were shown to have intermediate features of the two speakers. Both intermediate sounds and gradually changing sounds were produced.

Slide 20: Algorithm's Limitations. Although most of the speaker's individuality is concentrated in the voiced speech segments, degradation could be noticed when interpolating between two speech signals that differ greatly in their unvoiced speech segments, such as heavy breathing, long periods of silence, etc.

Slide 21: What Has Been Done in the Second Part of This Project?
Basic implementation of the reconstruction algorithm.
Interpolation between two surfaces.
Final implementation of the algorithm.
Fixes to the algorithm, such as:
– Maximization of the cross-correlation between surfaces.
– Modifications to the morphing factor.

Slide 22: Future Work Proposals. The effect of the unvoiced/silent segments of speech on voice individuality. The effect of shifting the vocal tract's formants around the unit circle on the human ear, in order to find a robust mapping of the transform function alpha.

Slide 23: Appendix – Equations. (Equations shown as images; not reproduced.)

Slide 24: Appendix – Transfer Function. (Equations shown as images; not reproduced.)