1-R-43 Neutral-to-Emotional Voice Conversion with Latent Representations of F0 using Generative Adversarial Networks
Zhaojie Luo, Tetsuya Takiguchi, and Yasuo Ariki (Kobe University)

Canonical Correlation Analysis

Overview

Background
Emotional voice conversion: keep linguistic information unchanged.
"Hey" (neutral) to "Hey" (sad, happy, angry)
Emotional robot

Problems
1. The representation of fundamental frequency (F0) is too simple for emotion conversion.
2. The emotional voice data is insufficient.

Goal
1. Applying the continuous wavelet transform (CWT) and cross wavelet transform (XWT) methods to systematically capture the F0 features of different temporal scales.
2. Using the VAE-GAN to train the MCC and AS-CWT features.

Framework

Training model
$L = L_{GAN} + L^{D_l}_{like} + L_{prior}$
[Figure: training model. Diagram labels: x, E, h, G, D, x', y; input, real data, output.]

Dataset

Samples:

Results
Table 1. F0-RMSE results for different emotions. N2A, N2S and N2H represent the datasets from neutral to angry, sad and happy voice, respectively.

        Source   LG      NN      VAE     GAN     VA-GAN
N2A     76.8     76.3    70.4    73.4    59.5    51.2
N2S     73.7     72.0    62.3    77.5    56.1    58.5
N2H     100.4    99.1    75.2    85.8    65.5    62.1

[Figure: MOS evaluation of emotional voice conversion]
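Table 1 reports F0-RMSE, but the poster does not say how it is computed. The following is a minimal sketch under stated assumptions: the two contours are already frame-aligned, unvoiced frames are marked with 0, and the error is taken only over frames voiced in both contours. The function name `f0_rmse` is hypothetical and is not taken from the poster.

```python
import math


def f0_rmse(f0_converted, f0_target):
    """RMSE between two frame-aligned F0 contours.

    Assumptions (not stated on the poster): both contours have the same
    number of frames, and a value of 0 marks an unvoiced frame.
    """
    if len(f0_converted) != len(f0_target):
        raise ValueError("contours must be frame-aligned and of equal length")

    # Keep only the frames that are voiced in both contours.
    pairs = [(c, t) for c, t in zip(f0_converted, f0_target) if c > 0 and t > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both contours")

    squared_error = sum((c - t) ** 2 for c, t in pairs)
    return math.sqrt(squared_error / len(pairs))


# Toy example: the middle frame is unvoiced in the converted contour and is skipped.
print(f0_rmse([100.0, 0.0, 200.0], [110.0, 120.0, 190.0]))
```

A lower value means the converted F0 contour is closer to the target emotional contour, which is how the columns of Table 1 are compared.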
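The first Goal item, decomposing F0 into several temporal scales with the CWT, can be illustrated with a small sketch. The poster gives no wavelet, scale set, or preprocessing, so everything here is an assumption: a Mexican hat mother wavelet, dyadic scales measured in frames, and a log-F0 contour that has already been interpolated over unvoiced regions. The XWT step is not covered. Only the standard library is used so that the example stays self-contained; a numerical library would normally be used for real signals.

```python
import math


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, normalized to unit energy."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales):
    """Direct-summation CWT: returns one list of coefficients per scale."""
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for tau in range(n):
            acc = sum(signal[k] * mexican_hat((k - tau) / s) for k in range(n))
            row.append(acc / math.sqrt(s))
        rows.append(row)
    return rows


def normalize(contour):
    """Zero-mean, unit-variance version of a contour (constant input maps to zeros)."""
    mean = sum(contour) / len(contour)
    var = sum((v - mean) ** 2 for v in contour) / len(contour)
    std = math.sqrt(var) or 1.0
    return [(v - mean) / std for v in contour]


# Hypothetical toy contour: a slow phrase-level rise plus a faster wobble.
log_f0 = [
    math.log(120.0 + 20.0 * i / 63 + 5.0 * math.sin(2 * math.pi * i / 8))
    for i in range(64)
]
scales = [2.0 ** i for i in range(1, 5)]  # assumption: dyadic scales, in frames
coeffs = cwt(normalize(log_f0), scales)
print(len(coeffs), len(coeffs[0]))
```

Each row of `coeffs` tracks variation at one temporal scale: small scales respond to fast, syllable-like movement and large scales to slow, phrase-like trends.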
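The Framework section gives only the sum $L = L_{GAN} + L^{D_l}_{like} + L_{prior}$ without defining its terms. As a hedged reading, the terms below follow the standard VAE-GAN formulation in its autoencoding form. Interpreting E, G and D as encoder, generator and discriminator, taking $D_l$ as the $l$-th hidden layer of D, and using a standard normal prior are all assumptions; the poster's conversion setup, where the real data is $y$, may define the terms differently.

```latex
% Objective as printed on the poster ("like" kept as written there).
L = L_{GAN} + L^{D_l}_{like} + L_{prior}

% Assumed standard VAE-GAN terms (not stated on the poster):
% KL regularizer on the latent code h produced by the encoder E.
L_{prior} = D_{KL}\big( q(h \mid x) \,\|\, p(h) \big), \qquad p(h) = \mathcal{N}(0, I)

% Reconstruction likelihood measured in the feature space of layer D_l.
L^{D_l}_{like} = -\, \mathbb{E}_{q(h \mid x)} \big[ \log p\big( D_l(x) \mid h \big) \big]

% Adversarial term: D tries to increase it, G tries to decrease it.
L_{GAN} = \log D(x) + \log\big( 1 - D(G(h)) \big)
```

Under this reading, measuring reconstruction in the discriminator's feature space rather than sample by sample is what distinguishes the combined model from a plain VAE.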