VOICE CONVERSION METHODS FOR VOCAL TRACT AND PITCH CONTOUR MODIFICATION
Oytun Türk, Levent M. Arslan
R&D Dept., SESTEK Inc., and EE Eng. Dept., Boğaziçi University, İstanbul, Turkey

Presentation transcript:

MOTIVATION
- Vocal tract and pitch characteristics play a dominant role in the perception of speaker identity.
- Detailed modeling of the vocal tract and the pitch contour is required for high-quality voice conversion.
- As the number of parameters increases, transformation of the vocal tract spectrum becomes problematic (distortion at the output).
- This study proposes two methods for detailed modeling and transformation of the vocal tract spectra and the pitch contours.
- The new methods are compared with existing ones in a subjective listening test.

VOICE CONVERSION
- Formant frequencies, sinusoidal model parameters, and LSFs are used for transformation of the vocal tract spectrum.
- Codebooks can be used to represent the mapping between the source and the target speaker's acoustical spaces.
- Vocal tract spectra and pitch are processed separately.

SELECTIVE PRE-EMPHASIS SYSTEM
- Pre-emphasis improves the numerical properties of LPC analysis.
- We combine pre-emphasis with perceptual sub-band processing to model the vocal tract spectrum in detail.
- Detailed analysis with a lower LP order is possible at high sampling rates. Example: an LP order of 50 might be required at 44 kHz, whereas an LP order of 24 is sufficient with selective pre-emphasis.

SELECTIVE PRE-EMPHASIS ANALYSIS
1. Bandpass filtering and frame-by-frame processing.
2. LP analysis on each sub-band component.
3. The full-band spectrum is estimated by combining the sub-band LP spectra, where k1 denotes the lower cut-off frequency of the (i+1)th filter and k2 the higher cut-off frequency of the ith filter (the overlap region between adjacent bands).
(An illustrative code sketch of this analysis is given after the segmental pitch contour model below.)

SELECTIVE PRE-EMPHASIS SYNTHESIS
1. Use the synthesis vocal tract and excitation spectra.
2. Apply the inverse Fourier transform.
3. Perform overlap-add synthesis.

EXAMPLE: LP vs. selective pre-emphasis based spectral estimation.

SELECTIVE PRE-EMPHASIS TRAINING
1. Collect the same utterances from the source and target speakers.
2. Automatic alignment by sentence-HMMs and/or manual alignment.
3. Selective pre-emphasis analysis.
4. Generate LSF codebooks for each sub-band component.

SELECTIVE PRE-EMPHASIS TRANSFORMATION
1. Selective pre-emphasis is applied to the source utterance.
2. The full-band excitation spectrum is modified separately.
3. A weighted average of the corresponding target sub-band codebook entries is computed.
4. The full-band vocal tract spectrum is estimated using selective pre-emphasis based synthesis.

SEGMENTAL PITCH CONTOUR MODEL
1. Source and target utterances are aligned and their pitch contours are extracted.
2. Target pitch contours are linearly interpolated over unvoiced segments.
3. The corresponding target segment is found for each voiced source pitch contour segment.
4. The pitch contour is extracted for the source utterance to be transformed.
5. The minimum Mahalanobis-distance source segments are found.
6. The transformed contour is synthesized as a weighted average of the corresponding target segments.
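The two core computations described above, sub-band spectral estimation with selective pre-emphasis and segmental pitch contour mapping, can be made concrete with short sketches. The Python code below is not taken from the poster or the paper; the filter bank, pre-emphasis coefficient, band edges, frame size, LP order, segment parameterization, and weighting scheme are illustrative assumptions.

```python
# Sketch 1: selective pre-emphasis analysis of one frame (assumed parameters).
import numpy as np
from scipy.signal import butter, sosfilt

def lpc(x, order):
    """LP coefficients via the autocorrelation method (Levinson-Durbin)."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def selective_preemphasis_spectrum(frame, fs, bands, lp_order=24, nfft=1024, mu=0.97):
    """Estimate a full-band vocal tract magnitude spectrum from per-band LP spectra."""
    freqs = np.linspace(0.0, fs / 2.0, nfft // 2 + 1)
    full = np.zeros_like(freqs)
    win = np.hanning(len(frame))
    for lo, hi in bands:
        edges = [max(lo, 50.0), min(hi, fs / 2.0 - 50.0)]    # keep filter edges valid
        sos = butter(4, edges, btype="bandpass", fs=fs, output="sos")
        sub = sosfilt(sos, frame)                             # sub-band component
        sub = np.append(sub[0], sub[1:] - mu * sub[:-1])      # first-order pre-emphasis
        a, gain = lpc(sub * win, lp_order)                    # low-order LP per band
        lp_spec = np.sqrt(max(gain, 1e-12)) / np.maximum(np.abs(np.fft.rfft(a, nfft)), 1e-12)
        in_band = (freqs >= lo) & (freqs < hi)
        # The poster's full-band estimate uses the overlap region between k1 and k2
        # when joining adjacent bands; this sketch simply keeps each band's LP
        # spectrum over its nominal pass-band.
        full[in_band] = lp_spec[in_band]
    return freqs, full

# Hypothetical usage: one 1024-sample frame at 44.1 kHz, three assumed bands.
fs = 44100
frame = np.random.randn(1024)                        # stand-in for a windowed speech frame
bands = [(0, 4000), (4000, 11000), (11000, 22050)]
freqs, vt_spec = selective_preemphasis_spectrum(frame, fs, bands)
```

In a full system the per-band spectra (or their LSF representations) would feed the codebook training and the weighted-average transformation steps; only the spectral estimation itself is sketched here.

The segmental pitch contour mapping can be sketched in the same spirit. The segment parameterization (resampling each voiced segment to a fixed-length vector), the covariance estimate used in the Mahalanobis distance, the number of selected neighbours, and the distance-based weights are all assumptions made for illustration.

```python
# Sketch 2: mapping one voiced source pitch segment to a target-like segment.
import numpy as np

def parameterize(segment, dim=10):
    """Resample a voiced pitch contour segment to a fixed-length vector."""
    return np.interp(np.linspace(0.0, 1.0, dim),
                     np.linspace(0.0, 1.0, len(segment)), segment)

def transform_segment(src_segment, src_codebook, tgt_codebook, n_best=3, dim=10):
    """src_codebook and tgt_codebook are aligned lists of training segments."""
    src_vecs = np.stack([parameterize(s, dim) for s in src_codebook])
    tgt_vecs = np.stack([parameterize(t, dim) for t in tgt_codebook])
    query = parameterize(src_segment, dim)
    # Mahalanobis distance from the query to every training source segment.
    cov_inv = np.linalg.inv(np.cov(src_vecs, rowvar=False) + 1e-3 * np.eye(dim))
    diffs = src_vecs - query
    dists = np.sqrt(np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs))
    # Weighted average of the target segments paired with the closest source segments.
    best = np.argsort(dists)[:n_best]
    w = 1.0 / (dists[best] + 1e-6)
    w /= w.sum()
    out = (w[:, None] * tgt_vecs[best]).sum(axis=0)
    # Resample back to the duration of the input segment.
    return np.interp(np.linspace(0.0, 1.0, len(src_segment)),
                     np.linspace(0.0, 1.0, dim), out)

# Hypothetical usage with tiny synthetic codebooks of F0 values in Hz.
src_cb = [120 + 10 * np.sin(np.linspace(0, np.pi, n)) for n in (40, 55, 60)]
tgt_cb = [200 + 15 * np.sin(np.linspace(0, np.pi, n)) for n in (42, 50, 63)]
converted = transform_segment(1.02 * src_cb[0], src_cb, tgt_cb)
```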
EVALUATIONS
- Subjective test material: 30 sentences and 50 words in Turkish, recorded at 44 kHz from 8 speakers (4 female, 4 male).
- Vocal tract conversion methods: STASC, DWT, selective pre-emphasis.
- Pitch conversion methods: mean-variance, segmental.
- Transplantations: vocal tract, vocal tract + pitch.
- 10 subjects listened to 112 triples of sound files.
- Each triple consisted of a source recording, a target recording, and an output recording.
- The output recording is either a source or target recording, the output of voice conversion, or an acoustic feature transplantation.
- Scores: identity (0.0, 0.5, 1.0), confidence (1-5), quality (1-5).
- Means and interquartile ranges were calculated.

EVALUATIONS: SUBJECTIVE TEST RESULTS

CONCLUSIONS
- Lower scores were obtained when only the vocal tract was converted.
- Confidence and quality scores decrease as the amount of processing increases.
- STASC is more robust across different gender combinations.
- Selective pre-emphasis performs well at a lower prediction order and can be used to employ differing amounts of resolution in different sub-bands.
- Segmental pitch modeling improves identity scores.
- Source, target, and third speakers were identified perfectly.

REFERENCES
[1] Turk, O., "New Methods for Voice Conversion", M.S. Thesis, Boğaziçi University.
[2] Gutierrez-Arriola, J.M., Hsiao, Y.S., Montero, J.M., Pardo, J.M., and Childers, D.G., "Voice Conversion Based on Parameter Transformation", in Proc. of ICSLP 1998, Vol. 3, Sydney, Australia.
[3] Stylianou, Y., Cappe, O., and Moulines, E., "Continuous Probabilistic Transform for Voice Conversion", IEEE Transactions on Speech and Audio Processing, Vol. 6, No. 2, 1998.
[4] Arslan, L.M., "Speaker Transformation Algorithm Using Segmental Codebooks", Speech Communication, Vol. 28, 1999.
[5] Kain, A.B., and Macon, M., "Personalizing a Speech Synthesizer by Voice Adaptation", in Proc. of the 3rd ESCA/COCOSDA International Speech Synthesis Workshop, 1998.
[6] Chappell, D.T., and Hansen, J.H.L., "Speaker-Specific Pitch Contour Modeling and Modification", in Proc. of ICASSP 1998, Vol. II, Seattle, USA.
[7] Turk, O., and Arslan, L.M., "Subband Based Voice Conversion", in Proc. of ICSLP 2002, Vol. 1, Denver, Colorado, USA.

EUROSPEECH 2003 (Interspeech 2003), 8th European Conference on Speech Communication and Technology, September 1-4, 2003, Geneva, Switzerland.