Analysis of Parameter Importance in Speaker Identity. Ricardo de Córdoba, Juana M. Gutiérrez-Arriola. Speech Technology Group, Departamento de Ingeniería Electrónica, Universidad Politécnica de Madrid.


1 Analysis of Parameter Importance in Speaker Identity. Ricardo de Córdoba, Juana M. Gutiérrez-Arriola. Speech Technology Group, Departamento de Ingeniería Electrónica, Universidad Politécnica de Madrid

2 Index Introduction System description Parameter extraction Voice conversion and synthesis Parameter analysis Application to a voice quality task Results Conclusions

3 Introduction: Voice conversion. [Block diagram: the source speaker voice and the target speaker voice each pass through Analysis to yield parameters; a Transformation functions computation stage derives the transformation functions, which convert the source parameters; Synthesis then produces converted speech resembling the target speaker voice.]

4 Index Introduction System description Parameter extraction Voice conversion and synthesis Parameter analysis Application to a voice quality task Results Conclusions

5 System description

6 Index Introduction System description Parameter extraction Voice conversion and synthesis Parameter analysis Application to a voice quality task Results Conclusions

7 Parameter Extraction I We used a 39-parameter synthesizer – F0 – Glottal source: FLUTTER, KOPEN, RET, SKEW, VELO, Eo, Ee – AV, ASP, ATURB, AF – F1, F2, F3, F4, F5, F6 – B1, B2, B3, B4, B5 – FNZ, FNP, BNZ, BNP, B2P, B3P, B4P, B5P, B6P – A2, A3, A4, A5, A6 – AB, GAIN
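For reference, the inventory above can be tallied as a simple data structure; a minimal Python sketch (only the parameter names come from the slide, the group labels are my own):

```python
# The 39 synthesizer parameters from the slide, grouped for readability.
# Group labels are illustrative, not part of the original synthesizer.
PARAM_GROUPS = {
    "pitch": ["F0"],
    "glottal_source": ["FLUTTER", "KOPEN", "RET", "SKEW", "VELO", "Eo", "Ee"],
    "source_amplitudes": ["AV", "ASP", "ATURB", "AF"],
    "formant_frequencies": ["F1", "F2", "F3", "F4", "F5", "F6"],
    "formant_bandwidths": ["B1", "B2", "B3", "B4", "B5"],
    "nasal_and_parallel_bandwidths": ["FNZ", "FNP", "BNZ", "BNP",
                                      "B2P", "B3P", "B4P", "B5P", "B6P"],
    "parallel_amplitudes": ["A2", "A3", "A4", "A5", "A6"],
    "misc": ["AB", "GAIN"],
}
ALL_PARAMS = [name for group in PARAM_GROUPS.values() for name in group]
assert len(ALL_PARAMS) == 39  # consistent with the 39-parameter synthesizer
```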

8 Parameter Extraction I Glottal parameters:

9 Parameter Extraction II

10 Parameter Extraction III

11 Parameter Extraction IV We calculate F0, AV, AF, formant frequencies and bandwidths. Pitch marks and formants are manually revised. Only voiced sounds are transformed.

12 Index Introduction System description Parameter extraction Voice conversion and synthesis Parameter analysis Application to a voice quality task Results Conclusions

13 Voice conversion I Linear transformation functions: for each pair of source-target units, we compute the transformation coefficients, which are stored in a file
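A minimal sketch of such per-unit linear transformation functions, fitting y = a·x + b independently for each parameter by least squares (function names are hypothetical; pitch-synchronous alignment of the source and target frames is assumed done elsewhere):

```python
import numpy as np

def fit_unit_transform(src_frames, tgt_frames):
    """For one aligned source-target unit pair, fit y = a*x + b per parameter.

    src_frames, tgt_frames: (n_frames, n_params) arrays of synthesizer
    parameters. Returns one (a, b) pair per parameter, which could then be
    stored in the coefficient file the slide mentions.
    """
    return [tuple(np.polyfit(src_frames[:, p], tgt_frames[:, p], 1))
            for p in range(src_frames.shape[1])]

def apply_unit_transform(src_frames, coeffs):
    """Convert source parameters with the stored (a, b) pairs."""
    out = np.empty_like(src_frames, dtype=float)
    for p, (a, b) in enumerate(coeffs):
        out[:, p] = a * src_frames[:, p] + b
    return out
```

On perfectly linear toy data the fit recovers the exact coefficients; on real parameter tracks it gives the least-squares mapping per parameter.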

14 Synthesis Formant synthesizer (Klatt) Concatenation of parameterized units Prosodic modification, changing the glottal pulse length and the number of glottal pulses Formant smoothing during unit transitions
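The pitch-synchronous duration change above can be sketched as index resampling over per-pulse parameter frames (my own illustration, assuming one parameter frame per glottal pulse; `retime_frames` is a hypothetical helper, and F0 change would instead scale the pulse length):

```python
def retime_frames(frames, target_n):
    """Duration modification by repeating or dropping glottal-pulse frames.

    frames: list of per-pulse parameter frames for one unit.
    target_n: desired number of glottal pulses after prosodic modification.
    Uniformly resamples the frame indices, so lengthening repeats pulses
    and shortening drops them.
    """
    n = len(frames)
    return [frames[int(i * n / target_n)] for i in range(target_n)]
```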

15 Index Introduction System description Parameter extraction Voice conversion and synthesis Parameter analysis Application to a voice quality task Results Conclusions

16 Parameter Analysis I 11 speakers (5 female, 6 male) EUROM1 database in Castilian Spanish Sentence: “Mi abuelo me animó a estudiar solfeo” (My grandfather encouraged me to study solfa) Fs=16 kHz

17 Parameter Analysis II

18 Parameter Analysis III We want to know which parameters are actually relevant for speaker identity Discriminant functions are linear combinations of the variables that best discriminate between classes – They can be used to rank the variables by their relative contribution to class discrimination LDA is performed: – For each phoneme of the sentence (it does not work well for the whole sentence) – Coefficients of the first discriminant function are used to rank the parameters
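A hedged sketch of this ranking step using scikit-learn's LDA (the data here are synthetic placeholders; in the paper's setting each row would be a frame of synthesizer parameters for one phoneme, with one class per speaker). The features are standardized first so the coefficient magnitudes are comparable across parameters:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
params = ["F0", "F1", "B1", "OQ"]
# Synthetic frames for three speaker classes that differ mainly in F0 and F1.
X = np.vstack([
    rng.normal(loc=[100 + 40 * c, 500 + 80 * c, 60.0, 0.5],
               scale=[5.0, 20.0, 10.0, 0.05], size=(30, 4))
    for c in range(3)
])
y = np.repeat([0, 1, 2], 30)

# Standardize so coefficient magnitudes are comparable across parameters.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
lda = LinearDiscriminantAnalysis().fit(Xz, y)
first = lda.scalings_[:, 0]  # coefficients of the first discriminant function
ranking = sorted(zip(params, np.abs(first)), key=lambda t: -t[1])
```

Sorting the absolute coefficients of the first discriminant function then gives the parameter importance ranking described on the slide.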

19 Application to a Voice Quality Task We extracted four sentences of the Brian VOQUAL'03 database: normal, clear, creaky, and relax. We analyzed two phonemes of the sentence: “She has left for a great party today” We wanted to rank parameter importance to discriminate between the four classes: – We use the coefficients of the first discriminant function

20 Index Introduction System description Parameter extraction Voice conversion and synthesis Parameter analysis Application to a voice quality task Results Conclusions

21 Results I: Voice Quality Task. Frame classification for the phonemes E and A using LDA, plotted over the first two discriminant functions. [Scatter plots for E and A; classes: normal, creaky, clear, relax.]

22 Results II: Voice Quality Task. [Bar charts for E and A: absolute values of the coefficients that multiply each parameter in the first discriminant function.]

23 Results III: Speaker Identity. Number of times each parameter was the most relevant (top) and the least relevant (bottom) in the first discriminant function.

24 Index Introduction System description Parameter extraction Voice conversion and synthesis Parameter analysis Application to a voice quality task Results Conclusions

25 Conclusions Parameter importance depends on: – the type of speech – the gender of the speaker – the phonemes under study Results show that F0, the formant frequencies, and OQ (open quotient) are the most important parameters for speaker classification.