Published by Sabrina Ellis. Modified over 6 years ago.
1
Denoising Recurrent Neural Network for Deep Bidirectional LSTM based Voice Conversion
Jie Wu1, Dongyan Huang2, Lei Xie1 and Haizhou Li2,3
1 School of Computer Science, Northwestern Polytechnical University, Xi’an, China
2 Institute for Infocomm Research, A*STAR, Singapore
3 Department of Electrical and Computer Engineering, National University of Singapore

Samples to support the INTERSPEECH 2017 submission titled above. More details can be found in our paper. If you have any questions, please drop me an email (Jie Wu).

The next page (Demos) contains the following samples:
Source: Recorded speech of the source speaker (CLB in CMU ARCTIC).
Target: Recorded speech of the target speaker (BDL in CMU ARCTIC).
NONE: Typical deep bidirectional LSTM based voice conversion [1] without postprocessing.
GV: A global variance (GV) based postfilter [2] is applied.
MS: A modulation spectrum (MS) based postfilter [3] is applied.
RNN: A recurrent neural network (RNN) based postfilter [4] is applied.
DeRNN: The proposed postfilter, in which a denoising recurrent neural network (DeRNN) is used in the mel-cepstral domain.

CMU ARCTIC Corpus:
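To make the GV condition listed above more concrete, here is a minimal sketch of variance-scaling GV postfiltering as it is commonly formulated: each mel-cepstral dimension of a converted utterance is rescaled around its own mean so that its utterance-level variance matches a target value. This is an illustration under assumptions, not necessarily the exact procedure of [2] or of the paper. The function name `gv_postfilter`, the list-of-frames layout, and the toy numbers are all hypothetical.

```python
import math
from statistics import fmean, pvariance


def gv_postfilter(converted, target_gv):
    """Rescale each dimension so its utterance-level variance matches target_gv.

    converted: list of frames, each a list of D floats (assumed layout).
    target_gv: list of D target variances, e.g. averaged over
               target-speaker training utterances (assumed to be given).
    """
    num_dims = len(converted[0])
    filtered = [list(frame) for frame in converted]
    for d in range(num_dims):
        track = [frame[d] for frame in converted]
        mean = fmean(track)
        var = pvariance(track, mu=mean)
        if var == 0.0:
            continue  # constant trajectory: nothing to scale
        scale = math.sqrt(target_gv[d] / var)
        for t, value in enumerate(track):
            # Stretch the trajectory around its mean; the mean is unchanged.
            filtered[t][d] = scale * (value - mean) + mean
    return filtered


# Toy example: 4 frames, 2 dimensions, with over-smoothed (low-variance) tracks.
converted = [[0.0, 1.0], [0.2, 1.1], [0.4, 0.9], [0.2, 1.0]]
target_gv = [0.08, 0.02]
enhanced = gv_postfilter(converted, target_gv)
```

In this sketch the per-dimension mean is preserved and only the spread of each trajectory changes, which is why GV postfiltering is used to counter the over-smoothing of converted features.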
2
Demos
Audio samples 1 to 6, each in seven versions: Source, Target, NONE, GV, MS, RNN, DeRNN.
3
References
[1] L. Sun, S. Kang, K. Li, and H. M. Meng, “Voice conversion using deep bidirectional long short-term memory based recurrent neural networks,” in Proc. ICASSP. IEEE, 2015, pp. 4869–4873.
[2] T. Toda, T. Muramatsu, and H. Banno, “Implementation of computationally efficient real-time voice conversion,” in Proc. INTERSPEECH. ISCA, 2012, pp. 94–97.
[3] S. Takamichi, T. Toda, A. W. Black, and S. Nakamura, “Modulation spectrum-based post-filter for GMM-based voice conversion,” in Proc. APSIPA. IEEE, 2014, pp. 1–4.
[4] P. K. Muthukumar and A. W. Black, “Recurrent neural network postfilters for statistical parametric speech synthesis,” CoRR, vol. abs/ , 2016.

Cite our paper as:
Jie Wu, Dongyan Huang, Lei Xie and Haizhou Li, “Denoising Recurrent Neural Network for Deep Bidirectional LSTM based Voice Conversion,” Interspeech 2017, August 20-24, Stockholm, Sweden.