DR.D.Y.PATIL POLYTECHNIC, AMBI COMPUTER DEPARTMENT TOPIC : VOICE MORPHING.



• WHAT IS VOICE MORPHING?
• APPROACHES TO THE PROBLEM.
• SPEECH PRODUCTION.
• CONVERSION OF VOICE.
• TYPES OF VOICE MORPHING.
• REFERENCES OR METHODS.
• APPLICATIONS OF VOICE MORPHING.
• AVAILABLE SOFTWARE FOR VOICE MORPHING.
• SUMMARY.
• CONCLUSION.

• Voice morphing, also referred to as voice transformation or voice conversion, is a technique that modifies a source speaker's utterance so that it sounds as if it were spoken by a target speaker.
• Many applications can benefit from this technology. For example, a text-to-speech (TTS) system with integrated voice morphing can produce many different voices. Where speaker identity plays a key role, such as in dubbing movies and TV shows, high-quality voice morphing would allow the appropriate voice to be generated (possibly in different languages) without the original actors being present.

• Voice conversion is performed in two phases.
• In the first phase, training, the speech signals of the source and target speakers are analyzed and their voice characteristics are extracted by means of Linear Predictive Coding (LPC), a mathematical optimization technique widely used in speech processing.

• In the second phase, the transformed features are used to synthesize speech that should resemble that of the target speaker.
• Speech synthesis is again performed by means of LPC.
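The analysis and synthesis roles that the slides assign to LPC can be sketched as follows. This is a minimal illustration, not the presenters' implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion, and the function names (`lpc_coefficients`, `lpc_residual`, `lpc_synthesize`) as well as the choice of prediction order are hypothetical, since the source gives neither.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients of one frame (autocorrelation method).

    Returns (a, err) where a[0] == 1 and A(z) = sum_j a[j] z^-j is the
    prediction-error filter; err is the final prediction-error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent frame: nothing left to predict
            break
        # Reflection coefficient for model order i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_residual(frame, a):
    """Inverse-filter the frame with A(z) to obtain the excitation signal."""
    frame = np.asarray(frame, dtype=float)
    return np.convolve(frame, a)[:len(frame)]


def lpc_synthesize(excitation, a):
    """Pass an excitation through the all-pole filter 1 / A(z)."""
    excitation = np.asarray(excitation, dtype=float)
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * out[n - j]
        out[n] = acc
    return out
```

In a two-phase setup like the one described, the coefficient vectors returned by `lpc_coefficients` would be the per-frame features compared during training, and `lpc_synthesize` would rebuild a waveform once those features have been transformed. How the features are actually mapped from source to target is not specified at this point in the slides, so it is left out of the sketch.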

• The respiratory subsystem is composed of the lungs, the trachea (windpipe), the diaphragm and the chest cavity.
• The larynx and the pharyngeal cavity (throat) constitute the laryngeal subsystem.
• The articulatory subsystem includes the oral cavity and the nasal cavity.

• The oral cavity comprises the velum, the tongue, the lips, the jaw and the teeth.
• In speech processing, the term vocal tract refers to the combination of the larynx, the pharyngeal cavity and the oral cavity.
• The respiratory subsystem behaves like an air pump, supplying the aerodynamic energy for the other two subsystems.
• In speech processing, the basic aerodynamic parameters are air volume, flow, pressure and resistance.

• TECHNIQUES: wavelet decomposition; proposed model.
• Wavelet decomposition:
• Wavelets are a class of functions that possess compact support and form a basis for all finite-energy signals.
• They capture the non-stationary spectral characteristics of a signal by decomposing it over a set of atoms that are localized in both time and frequency. The discrete wavelet transform (DWT) uses dyadic scales and translates of the mother wavelet to form an orthonormal basis for signal analysis.

 The original signal S is split into an approximation cA1 and a detail cD1.  The approximation is then itself split into an approximation and a detail and so on.  Decomposing a signal into k levels of decomposition therefore results in k+1 sets of coefficients at different frequency resolutions, k levels of detail and 1 level of approximation coefficients.
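The repeated approximation/detail split described above can be sketched in a few lines. The source does not say which mother wavelet is used; the Haar wavelet is assumed here only because it is the simplest orthonormal case, and the names `haar_step` and `haar_wavedec` are hypothetical.

```python
import numpy as np


def haar_step(signal):
    """One level of the Haar DWT: split a signal into (approximation, detail)."""
    s = np.asarray(signal, dtype=float)
    even, odd = s[0::2], s[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass half (cA)
    detail = (even - odd) / np.sqrt(2.0)  # high-pass half (cD)
    return approx, detail


def haar_wavedec(signal, levels):
    """k-level decomposition, returned as [cA_k, cD_k, ..., cD_1]."""
    approx = np.asarray(signal, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    details = []
    for _ in range(levels):
        # Only the approximation is split again at the next level
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]
```

Calling `haar_wavedec(np.arange(8.0), 3)` returns four arrays (k + 1 with k = 3) of lengths 1, 1, 2 and 4: one set of approximation coefficients and three sets of detail coefficients. Other wavelets and boundary-handling modes would give different coefficient counts.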

• Proposed model:
• Voice morphing is performed in two steps: training and transformation. The training data consist of repetitions of the same phonemes uttered by both the source and the target speaker.
• The source and target training data are divided into frames of 128 samples, and the frames are randomly divided into training and validation sets.
• A 5-level wavelet decomposition is then performed on the source and target training data.
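The framing and random split could look roughly like the sketch below. Only the 128-sample frame length comes from the slides; the validation fraction, the fixed random seed, the assumption that source and target frames are paired one-to-one, and the function names are all illustrative choices not stated in the source.

```python
import numpy as np

FRAME_LENGTH = 128  # frame size given in the slides


def split_into_frames(signal, frame_length=FRAME_LENGTH):
    """Cut a 1-D signal into non-overlapping frames, dropping any remainder."""
    s = np.asarray(signal, dtype=float)
    n_frames = len(s) // frame_length
    return s[:n_frames * frame_length].reshape(n_frames, frame_length)


def train_validation_split(source_frames, target_frames,
                           validation_fraction=0.2, seed=0):
    """Randomly split paired frames into training and validation sets.

    Assumes (hypothetically) that frame i of the source corresponds to
    frame i of the target, so both are shuffled with the same permutation.
    """
    if len(source_frames) != len(target_frames):
        raise ValueError("source and target must have the same number of frames")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(source_frames))
    n_val = int(round(validation_fraction * len(order)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    train = (source_frames[train_idx], target_frames[train_idx])
    validation = (source_frames[val_idx], target_frames[val_idx])
    return train, validation
```

Because 128 is a power of two, each frame can be halved five times, which is consistent with the 5-level decomposition mentioned above.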

• This section shows the forms in which a normal voice or speech can be transformed.

TYPE      SOURCE    TARGET    RESULT1   RESULT2
F TO M    SPEECH1   TARGET1   RESULT1   VOICE1
M TO F    SPEECH2   TARGET2   RESULT2   VOICE2
F TO F    SPEECH3   TARGET3   RESULT3   VOICE3
M TO M    SPEECH4   TARGET4   RESULT4   VOICE4

• The "Source Speech" column contains the utterances of the source speaker.
• The "Target Speech" column contains the target speaker's utterances.
• The utterances in these two columns are NOT included in the training data used to estimate the conversion function.
• The next two columns give the results.
• The difference between them is that "RESULT1" applies the target prosody extracted from the target utterance, whereas "RESULT2" keeps the original prosody of the source utterance.

• Abe M., Nakamura S., Shikano K. and Kuwabara H.: Voice Conversion through Vector Quantization, Proceedings of the ICASSP,
• Stylianou Y., Cappe O. and Moulines E.: Statistical Methods for Voice Quality Transformation, Proceedings of Eurospeech,
• Arslan L. and Talkin D.: Voice Conversion by Codebook Mapping of Line Spectral Frequencies and Excitation Spectrum, Proceedings of Eurospeech, 1997.

• ENTERTAINMENT.
• FILM INDUSTRY.
• SECURITY.
• COMPUTER GAMING.

• MORPHVOX PRO VOICE CHANGER
• TERA VOICE SERVER
• FLASH VOICE BUTTONS 3.0
• VOICE TWISTER
• VOICE AGAIN
• QUICK VOICE FOR OSX
• QUICK VOICE FOR WINDOWS

• Voice morphing is the process of changing voice personality, i.e. speech uttered by a source speaker is modified to sound as if the target speaker had uttered it.
• In this dissertation, we began by introducing the basic properties of speech signals, followed by the basic techniques of voice morphing and the concept behind it.

Because voice morphing is a technology with many interesting, useful and fun applications, further research on the subject, with or without the Generative Topographic Mapping (GTM) model, is bound to follow and should lead to morphed speech of excellent quality.