Published by Álvaro Ignacio Soto Miranda. Modified over 6 years ago.
1-R-43

Neutral-to-Emotional Voice Conversion with Latent Representations of F0 using Generative Adversarial Networks

Zhaojie Luo, Tetsuya Takiguchi, and Yasuo Ariki (Kobe University)

Canonical Correlation Analysis

Overview

Background
[Figure: emotional voice conversion. A neutral utterance ("Hey") is converted to sad, happy, or angry speech while the linguistic information is kept unchanged. Example application: emotional robot.]

Problems
1. The representation of the fundamental frequency (F0) is too simple for emotion conversion.
2. The available emotional voice data is insufficient.

Goal
1. Apply the continuous wavelet transform (CWT) and the cross wavelet transform (XWT) to systematically capture F0 features at different temporal scales.
2. Use the VAE-GAN to train on the MCC and AS-CWT features.

Framework

Training model
[Figure: training model. Labels: x, E, h, G, D, x', y; input, real data, output.]

$L = L_{\mathrm{GAN}} + L_{\mathrm{like}}^{D_l} + L_{\mathrm{prior}}$

Dataset

Samples

Results
Table 1. F0-RMSE results for different emotions. N2A, N2S, and N2H denote the neutral-to-angry, neutral-to-sad, and neutral-to-happy datasets, respectively.
[Figure: MOS evaluation of emotional voice conversion.]
Methods compared: Source, LG, NN, VAE, GAN, VA-GAN. Conversion tasks: N2A, N2S, N2H.
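The idea of describing F0 at several temporal scales with the CWT can be illustrated with a minimal sketch. It is not the authors' implementation: the Mexican-hat mother wavelet, the dyadic scale spacing, the number of scales, and the names `mexican_hat` and `cwt_f0` are all assumptions made for illustration, and the input is assumed to be a continuous (already interpolated) log-F0 contour sampled once per frame.

```python
import numpy as np


def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet with unit L2 norm."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def cwt_f0(log_f0, num_scales=5, base_scale=2.0):
    """Decompose a log-F0 contour into `num_scales` dyadic scales.

    Hypothetical helper: scale i is base_scale * 2**i frames.
    Returns an array of shape (num_scales, len(log_f0)).
    """
    log_f0 = np.asarray(log_f0, dtype=float)
    n = log_f0.size
    # Normalize to zero mean and unit variance before the transform.
    x = (log_f0 - log_f0.mean()) / (log_f0.std() + 1e-8)
    coeffs = np.empty((num_scales, n))
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i
        # Truncate the wavelet support so the kernel is never longer than x.
        half = int(min(5 * scale, (n - 1) // 2))
        t = np.arange(-half, half + 1) / scale
        kernel = mexican_hat(t) / np.sqrt(scale)
        # The wavelet is symmetric, so convolution equals correlation.
        coeffs[i] = np.convolve(x, kernel, mode="same")
    return coeffs


# Synthetic contour: a slow phrase-level trend plus a fast syllable-level ripple.
frames = np.arange(400)
contour = (5.0
           + 0.2 * np.sin(2 * np.pi * frames / 200)
           + 0.05 * np.sin(2 * np.pi * frames / 10))
coeffs = cwt_f0(contour)
print(coeffs.shape)  # (5, 400)
```

Each row of `coeffs` follows the contour at one temporal scale, so fine scales respond mainly to the fast ripple and coarse scales to the slow trend. Real F0 tracks contain unvoiced gaps, which would need to be interpolated before a transform like this is applied.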