1-R-43 Neutral-to-Emotional Voice Conversion with Latent Representations of F0 using Generative Adversarial Networks. Zhaojie Luo, Tetsuya Takiguchi, and Yasuo Ariki (Kobe University).

Presentation transcript:

1-R-43 Neutral-to-Emotional Voice Conversion with Latent Representations of F0 using Generative Adversarial Networks
Zhaojie Luo, Tetsuya Takiguchi, and Yasuo Ariki (Kobe University)

Overview / Background
Emotional voice conversion (e.g., for an emotional robot) converts a neutral utterance ("Hey") into a sad, happy, or angry one while keeping the linguistic information unchanged.

Problems
1. The representation of the fundamental frequency (F0) is too simple for emotion conversion.
2. The available emotional voice data are insufficient.

Goal
1. Apply the continuous wavelet transform (CWT) and the cross wavelet transform (XWT) to systematically capture F0 features at different temporal scales.
2. Use a VAE-GAN to train the MCC and AS-CWT features.

Framework
[Diagram: input x -> encoder E -> latent h -> generator G -> output x'; the discriminator D distinguishes the output x' from real data y.]
Training objective: L = L_GAN + L_Dis_l^llike + L_prior, i.e., the adversarial loss, the reconstruction loss measured in the l-th feature layer of the discriminator, and the KL prior loss of the VAE.

Training model / Dataset / Samples
[Slide figures and audio samples not recoverable from the transcript.]

Results
Table 1. F0-RMSE results for different emotions (lower is better). N2A, N2S, and N2H represent the datasets from neutral to angry, sad, and happy voice, respectively.

      | Source | LG   | NN   | VAE  | GAN  | VAE-GAN
N2A   | 76.8   | 76.3 | 70.4 | 73.4 | 59.5 | 51.2
N2S   | 73.7   | 72.0 | 62.3 | 77.5 | 56.1 | 58.5
N2H   | 100.4  | 99.1 | 75.2 | 85.8 | 65.5 | 62.1

[MOS evaluation of emotional voice conversion: figure not recoverable from the transcript.]
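
To make Goal 1 concrete, the sketch below decomposes a continuous log-F0 contour into dyadic temporal scales with a Mexican-hat CWT. This is a minimal sketch rather than the poster's exact pipeline: the 5-ms frame shift, the 10-scale dyadic setup, and the use of PyWavelets (pywt.cwt) are assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def cwt_f0(logf0, frame_rate=200.0, num_scales=10):
    """Decompose a continuous log-F0 contour into dyadic temporal scales.

    logf0      : 1-D array of log-F0 with unvoiced frames already interpolated
    frame_rate : analysis frames per second (assumes a 5-ms frame shift)
    num_scales : number of dyadic scales (assumed 10)
    """
    # Normalize to zero mean and unit variance before the transform.
    z = (logf0 - logf0.mean()) / (logf0.std() + 1e-8)
    # Dyadic scales 2^1 ... 2^num_scales (in frames), from short- to long-term prosody.
    scales = 2.0 ** np.arange(1, num_scales + 1)
    # Mexican-hat mother wavelet; one row of coefficients per temporal scale.
    coefs, _freqs = pywt.cwt(z, scales, "mexh", sampling_period=1.0 / frame_rate)
    return coefs  # shape: (num_scales, len(logf0))

if __name__ == "__main__":
    # Toy 1-second contour at a 5-ms frame shift (200 frames).
    t = np.linspace(0.0, 1.0, 200)
    logf0 = np.log(120.0 + 20.0 * np.sin(2.0 * np.pi * 3.0 * t))
    print(cwt_f0(logf0).shape)  # -> (10, 200)
```

Each row of the output is the F0 contour seen at one temporal scale, which is what lets the converter modify phone-level and phrase-level prosody separately.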
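Likewise, here is a minimal PyTorch sketch of the stated objective L = L_GAN + L_Dis_l^llike + L_prior (the VAE-GAN formulation of Larsen et al., 2016). The network sizes, the feature dimensionality, and the pairing of source frames x with aligned target frames y are illustrative assumptions, not the poster's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, LATENT_DIM, HID = 40, 16, 128  # illustrative sizes for MCC/AS-CWT frames

class Encoder(nn.Module):
    """Maps an input feature frame x to the latent posterior q(h|x)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(FEAT_DIM, HID), nn.ReLU())
        self.mu = nn.Linear(HID, LATENT_DIM)
        self.logvar = nn.Linear(HID, LATENT_DIM)
    def forward(self, x):
        t = self.body(x)
        return self.mu(t), self.logvar(t)

class Generator(nn.Module):
    """VAE decoder doubling as the GAN generator: latent h -> output frame x'."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(LATENT_DIM, HID), nn.ReLU(),
                                  nn.Linear(HID, FEAT_DIM))
    def forward(self, h):
        return self.body(h)

class Discriminator(nn.Module):
    """Returns the real/fake probability plus the l-th layer features for L_Dis_l^llike."""
    def __init__(self):
        super().__init__()
        self.layer_l = nn.Sequential(nn.Linear(FEAT_DIM, HID), nn.ReLU())
        self.head = nn.Linear(HID, 1)
    def forward(self, x):
        f = self.layer_l(x)
        return torch.sigmoid(self.head(f)), f

def vae_gan_loss(E, G, D, x, y):
    """x: source feature frames; y: aligned real target frames (assumed parallel)."""
    mu, logvar = E(x)
    h = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
    x_out = G(h)
    p_real, f_real = D(y)
    p_fake, f_fake = D(x_out)
    # L_prior: KL divergence between q(h|x) and the standard normal prior.
    l_prior = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # L_Dis_l^llike: reconstruction error in the discriminator's l-th feature layer.
    l_llike = F.mse_loss(f_fake, f_real)
    # L_GAN: standard adversarial loss on real vs. generated frames.
    l_gan = -(torch.log(p_real + 1e-8) + torch.log(1 - p_fake + 1e-8)).mean()
    return l_gan + l_llike + l_prior  # L = L_GAN + L_Dis_l^llike + L_prior
```

In practice the three modules are not trained on the full sum: following the VAE-GAN recipe, the encoder is updated with L_prior + L_Dis_l^llike, the generator with L_Dis_l^llike and the adversarial term, and the discriminator with L_GAN alone.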