HW07 607410009 洪銘佑 607410150 王璽喆.

DNN Voice Conversion Technique
Bruce Wang, Simon Hong
Department of Computer Science, CCU, Minhsiung, Chiayi 62102

Introduction
Voice conversion is a technique for modifying source speech so that it sounds like another type of speech (the target speech) while retaining the linguistic information. There are many ways to achieve voice conversion, such as trajectory-based conversion using a Gaussian mixture model (GMM) and vocoder-based conversion. These approaches solve the conversion problem with a mathematical model. Given recent developments in machine learning, we think that a deep neural network (DNN) might be a good way to implement voice conversion. This poster presents the structure of our DNN-based voice conversion process, the method we used, and the final result.

Methods
To implement voice conversion based on a DNN, we divide the whole process into several steps. Like any DNN, the network needs to be trained before we can convert source data. The activation function we chose is ReLU. In the training process, the alignment between the source data and the target data affects the accuracy dramatically, so we chose dynamic time warping (DTW) to implement the alignment. In the conversion process, we use the weights and biases that the training process produces.

Figure 1. The ReLU function: the output is always 0 if the input is less than 0.
Figure 2. The steps in the training process. The raw data are cut into frames, and the F0 and the mel-cepstrum are extracted.
Figure 3. A simple graph illustrating dynamic time warping.
Figure 4. The "convert feature" block includes the DNN calculation and the MLSA filter function.

Results
Figure 5. This chart shows the similarity between the source and the target. We chose 50 people at random.

Conclusions
In general, the conversion is good enough that half of the people could not tell the difference between the real target sentence and the sentence converted with our method.

Literature cited
T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," Proc. INTERSPEECH, pp. 369–372, Aug. 2013.
D. Erro, A. Moreno, and A. Bonafonte, "INCA algorithm for training voice conversion systems from nonparallel corpora," IEEE Trans. ASLP, vol. 18, no. 5, pp. 944–953, 2010.
K. Kobayashi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, "Statistical singing voice conversion with direct waveform modification based on the spectrum differential," Proc. INTERSPEECH, pp. 2514–2518, Sept. 2014.

Acknowledgments
We would like to thank Fin Jones, Jeffery Walker, and Siori Uchino for technical assistance. Thanks also to Fiona Brown for helping to calculate the feedback survey results.

Further information
Please contact chechewang@ccu.edu. More information about this work can be obtained at www.chechewang.com. The online PDF link: www.chechewang.com/pdf.
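The alignment step named in Methods can be illustrated with a minimal sketch of dynamic time warping. The poster does not give the implementation, so everything here is an assumption made for illustration: the function names `frame_distance` and `dtw_align` are hypothetical, frames are plain lists of floats (for example mel-cepstrum vectors), and the local cost is Euclidean distance.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(source, target):
    """Align two frame sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) pairs
    meaning source[i] is matched with target[j].
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i source
    # frames with the first j target frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat target frame
                                 cost[i][j - 1],      # repeat source frame
                                 cost[i - 1][j - 1])  # advance both

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Toy one-dimensional "features": the target holds the middle frame longer.
    src = [[0.0], [1.0], [2.0]]
    tgt = [[0.0], [1.0], [1.0], [2.0]]
    total, path = dtw_align(src, tgt)
    print(total, path)
```

Each (i, j) pair on the returned path gives one source frame and one target frame that could be used together as a training example, which is why alignment quality matters so much for accuracy.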
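The conversion step, which reuses the weights and biases produced by training, can be sketched as a plain forward pass with ReLU activations. This is only an illustration under stated assumptions: the layer sizes, the numeric values, and the names `relu`, `dense`, and `convert_frame` are hypothetical, and the choice of a linear (non-ReLU) output layer is our assumption, since the poster does not describe the network's architecture.

```python
def relu(x):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return x if x > 0.0 else 0.0


def dense(inputs, weights, biases):
    """One fully connected layer; weights[k] holds the weights of output unit k."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def convert_frame(frame, layers):
    """Map one source feature frame to a converted frame.

    layers is a list of (weights, biases) pairs taken from training.
    ReLU is applied after every layer except the last one (an assumption),
    so that the output features may be negative.
    """
    hidden = frame
    for index, (weights, biases) in enumerate(layers):
        hidden = dense(hidden, weights, biases)
        if index < len(layers) - 1:
            hidden = [relu(v) for v in hidden]
    return hidden


if __name__ == "__main__":
    # Hypothetical tiny network: 2 inputs -> 2 hidden units -> 1 output.
    layers = [
        ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
        ([[1.0, 1.0]], [0.5]),
    ]
    print(convert_frame([2.0, 0.5], layers))
```

In a full pipeline the converted features would then be passed to a synthesis filter such as the MLSA filter mentioned in Figure 4; that stage is not shown here.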