HW07 洪銘佑 王璽喆
DNN Voice Conversion Technique
Bruce Wang, Simon Hong
Department of Computer Science, CCU, Minhsiung, Chiayi 62102

Introduction
Voice conversion is a technique for modifying source speech so that it sounds like another type of speech (the target speech) while retaining the linguistic information. There are many ways to achieve voice conversion, such as the trajectory-based conversion method using a GMM (Gaussian mixture model) and vocoder-based conversion. These methods solve the voice conversion problem with a mathematical model. Given the development of machine learning techniques, we think a DNN might be a good way to implement voice conversion. This poster presents the structure of the voice conversion process we built with a DNN, the method we used, and the final result.

Methods
To implement voice conversion based on a DNN, we divide the whole process into several steps. As with any DNN, the network must be trained before we can start converting source data. The activation function we chose is ReLU. In the training process, the alignment between the source data and the target data affects the accuracy dramatically, so we use dynamic time warping (DTW) to perform the alignment. In the conversion process, we use the weights and biases that the training process produces.

Results
Half of the people surveyed could not tell the difference between the real target sentence and the sentence converted through our method (see Figure 5).

Conclusions
In general, the conversion is good enough that half of the people could not tell the difference between the real target sentence and the sentence converted through our method.

Literature cited
T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, “Voice conversion in high-order eigen space using deep belief nets,” Proc. INTERSPEECH, pp. 369–372, Aug.
D. Erro, A. Moreno, and A. Bonafonte, “INCA algorithm for training voice conversion systems from nonparallel corpora,” IEEE Trans. ASLP, vol. 18, no. 5, pp. 944–953, 2010.
K. Kobayashi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “Statistical singing voice conversion with direct waveform modification based on the spectrum differential,” Proc. INTERSPEECH, pp. 2514–2518, Sept.

Acknowledgments
The authors would like to thank Fin Jones, Jeffery Walker and Siori Uchino for their technical assistance. Thanks to Fiona Brown for helping calculate the feedback survey.

Further information
Please contact
More information on this device can be obtained at
The online PDF link:

Figure 1. The ReLU function: the output is always 0 if the value is less than 0.
Figure 2. The steps in the training process. The raw data need to be cut into frames, and the f0 and the mel-cepstrum are extracted.
Figure 3. A simple graph introducing the dynamic time warping function.
Figure 4. The “convert feature” block includes the DNN calculation and the MLSA filter function.
Figure 5. The similarity between the source and the target. We chose 50 people at random.
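The two building blocks named in Methods, the ReLU activation (Figure 1) and the dynamic time warping alignment (Figure 3), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `relu`, `frame_distance` and `dtw_align` are hypothetical, each frame is assumed to be a list of floats (for example a mel-cepstrum vector), and the Euclidean frame distance and the three-way step pattern are assumptions chosen for simplicity.

```python
import math


def relu(x):
    """ReLU activation: 0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)


def frame_distance(a, b):
    """Euclidean distance between two equal-length feature frames (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_align(source, target):
    """Align two frame sequences with dynamic time warping.

    Returns (total_cost, path), where path is a list of
    (source_index, target_index) pairs from start to end.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i source frames
    # with the first j target frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # source frame advances
                                 cost[i][j - 1],      # target frame advances
                                 cost[i - 1][j - 1])  # both advance

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy one-dimensional "frames": the target repeats the middle frame once.
source_frames = [[0.0], [1.0], [2.0]]
target_frames = [[0.0], [1.0], [1.0], [2.0]]
total_cost, path = dtw_align(source_frames, target_frames)
print(total_cost, path)
```

In a training setup like the one described, the index pairs in `path` would select which source frame is paired with which target frame, so that the network sees time-aligned input and output features.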