Speech Enhancement based on

Presentation on theme: "Speech Enhancement based on"— Presentation transcript:

1 Speech Enhancement based on
Deep Learning Jiawen Wu 2017/4/27

2 Background
Task: model the mapping between noisy and clean speech signals
1989, Tamura [1] -- time domain
1994, Xie and Van Compernolle [2] -- frequency domain
2006, Hinton [3] -- RBM
Since 2006 -- classification task [4-5], auto-encoder [6]
Small scale, simple structure, few training samples, no reliable initialization scheme, prone to getting stuck in local optima
[1] S. I. Tamura, “An analysis of a noise reduction neural network,” in Proc. ICASSP, 1989, pp. 2001–2004.
[2] F. Xie and D. V. Compernolle, “A family of MLP based nonlinear spectral estimators for noise reduction,” in Proc. ICASSP, 1994, pp. 53–56.
[3] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[4] Y. X. Wang and D. L. Wang, “Towards scaling up classification-based speech separation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381–1390, Jul
[5] E. W. Healy, S. E. Yoho, Y. X. Wang, and D. L. Wang, “An algorithm to improve speech recognition in noise for hearing-impaired listeners,” J. Acoust. Soc. Amer., vol. 134, no. 4, pp. 3029–3038, 2013.
[6] X.-G. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising Auto-Encoder,” in Proc. Interspeech, 2013, pp. 436–440.

3 Yong Xu, University of Science and Technology of China
Personal homepage:
Xu Y, Du J, Dai L R, et al. An experimental study on speech enhancement based on deep neural networks[J]. Signal Processing Letters, IEEE, 2014, 21(1): cited:81
Xu Y, Du J, Dai L R, et al. A regression approach to speech enhancement based on deep neural networks[J]. Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 2015, 23(1): cited:20
cited up to

4 Contributions
Nonlinear regression-based framework using DNNs
A large amount of training data, measured in hours, with more than 100 noise types
Context information
Unseen noise and non-stationary noise

5 DNN-based SE system

6 Baseline System
A nonlinear regression function: finding a mapping function between noisy and clean speech
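The regression idea on this slide can be sketched as a plain feed-forward pass. This is a minimal illustration, not the authors' implementation: the sigmoid hidden units, the linear output layer, the layer sizes, and the name `dnn_forward` are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_forward(noisy_features, weights, biases):
    """Map noisy feature frames (N, D_in) to enhanced feature frames (N, D_out).

    Assumption: sigmoid hidden layers and a linear output layer, a common
    choice for regression; the slide does not state the activations.
    """
    h = noisy_features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)
    return h @ weights[-1] + biases[-1]

# Hypothetical layer sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
dims = [257, 64, 64, 257]
weights = [rng.normal(0.0, 0.1, (i, o)) for i, o in zip(dims[:-1], dims[1:])]
biases = [np.zeros(o) for o in dims[1:]]
enhanced = dnn_forward(rng.normal(size=(5, 257)), weights, biases)
```

The linear output lets the network produce unbounded real-valued features, which a sigmoid output could not.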

7 Baseline System
Normalization: zero mean and unit variance
Log-power spectrum, phase
J. Du and Q. Huo, “A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions,” in Proc. Interspeech, 2008, pp. 569–572.
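A minimal sketch of the feature pipeline this slide names: log-power spectrum, phase, and zero-mean, unit-variance normalization. The frame length, hop size, Hamming window, and function names are assumptions chosen for illustration.

```python
import numpy as np

def log_power_spectrum(signal, frame_len=512, hop=256):
    """Return (log-power spectrum, phase), each of shape (frames, bins)."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)
    lps = np.log(np.abs(spectrum) ** 2 + 1e-12)  # small floor avoids log(0)
    phase = np.angle(spectrum)
    return lps, phase

def normalize(features, mean=None, std=None):
    """Zero-mean, unit-variance normalization per frequency bin.

    Pass the training-set mean and std to normalize test data consistently.
    """
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = np.maximum(features.std(axis=0), 1e-8)
    return (features - mean) / std, mean, std

rng = np.random.default_rng(0)
lps, phase = log_power_spectrum(rng.normal(size=16000))
normed, mean, std = normalize(lps)
```

The phase is returned separately because the network only models the log-power spectrum; reusing the noisy phase when resynthesizing the waveform is an assumption here.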

8 Baseline System
Fine-tuning
A nonlinear regression function from noisy speech features to clean speech features
E --- mean squared error
W --- weight parameters
b --- bias parameters
Enhanced value: the d-th enhanced frequency bin of the log-spectral feature at sample index n
Target value: the target frequency bins
Update of the weights and bias
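The equations on this slide did not survive extraction. A form consistent with the legend is sketched below; the symbols \hat{X}_n^d (enhanced bin), X_n^d (target bin), N (number of samples), D (number of frequency bins), and \lambda (learning rate) are assumed names, not taken from the slide.

```latex
E = \frac{1}{N}\sum_{n=1}^{N}\sum_{d=1}^{D}
    \left(\hat{X}_n^{d}(W, b) - X_n^{d}\right)^{2},
\qquad
(W, b) \leftarrow (W, b) - \lambda\,\frac{\partial E}{\partial (W, b)}
```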

9 Improved System
Fine-tuning
Input: the noisy log-spectral feature vector, where the window size of context is 2*τ+1
E --- mean squared error
W --- weight parameters
b --- bias parameters
Enhanced value: the d-th enhanced frequency bin of the log-spectral feature at sample index n
Target value: the target frequency bins
Update of the weights and bias
κ --- the weight decay coefficient
ω --- the momentum
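The two ingredients on this slide, a context window of 2*τ+1 frames and an update with weight decay κ and momentum ω, can be sketched as follows. The edge padding at utterance boundaries, the exact form of the update, and the default hyperparameter values are assumptions.

```python
import numpy as np

def splice_context(frames, tau):
    """Stack each frame with tau neighbours per side -> (N, (2*tau+1)*D).

    Assumption: boundary frames are repeated to pad the utterance edges.
    """
    padded = np.pad(frames, ((tau, tau), (0, 0)), mode="edge")
    n = frames.shape[0]
    return np.hstack([padded[k : k + n] for k in range(2 * tau + 1)])

def momentum_step(param, grad, velocity, lr=0.1, kappa=1e-5, omega=0.9):
    """One gradient step with weight decay (kappa) and momentum (omega).

    In practice weight decay is usually applied to weights, not biases;
    pass kappa=0.0 for bias parameters.
    """
    velocity = -lr * grad - kappa * lr * param + omega * velocity
    return param + velocity, velocity

frames = np.arange(12, dtype=float).reshape(4, 3)
spliced = splice_context(frames, tau=1)  # shape (4, 9)
```

With τ = 1 each row of `spliced` holds the previous, current, and next frame side by side, so the network sees short-term temporal context for every frame it enhances.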

10 Improved System
Post-processing: global variance equalization
Dropout training
Noise estimation: noise-aware training (NAT)

11 Global Variance Equalization
A simple type of histogram equalization
The global variance of the estimated clean speech features is defined for each feature dimension. A dimension-independent global variance can also be computed.
A. D. L. Torre, A. M. Peinado, J. C. Segura, J. L. Perez-Cordoba, M. C. Benitez, and A. J. Rubio, “Histogram equalization of speech representation for robust speech recognition,” IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 355–366, May 2005.
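A minimal sketch of the two global-variance definitions and one way to use them for equalization. The slide's formulas were lost in extraction, so the scaling rule below (rescale around the overall mean so the pooled variance matches a reference) and all function names are assumptions.

```python
import numpy as np

def global_variance(features):
    """Per-dimension global variance over all frames, shape (D,)."""
    return features.var(axis=0)

def global_variance_independent(features):
    """Single global variance pooled over all frames and dimensions."""
    return features.var()

def equalize(estimated, reference_gv):
    """Rescale estimated features so their pooled variance equals reference_gv.

    Assumption: scaling is applied around the overall mean of the estimate.
    """
    alpha = np.sqrt(reference_gv / global_variance_independent(estimated))
    mean = estimated.mean()
    return mean + alpha * (estimated - mean), alpha

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 2.0, size=(200, 8))
over_smoothed = 0.5 * clean  # stand-in for a variance-reduced estimate
equalized, alpha = equalize(over_smoothed, global_variance_independent(clean))
```

A single pooled factor keeps the relative shape of the spectrum intact, whereas a per-dimension factor would rescale each frequency bin independently.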

12 Noise Aware Training[2]
Dropout Training[1]: In DNN training, dropout randomly omits a certain percentage of the neurons in the input layer and each hidden layer every time a training sample is presented, which can be treated as model averaging to avoid over-fitting.
Noise Aware Training[2]
[1] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” Arxiv, 2012 [Online].
[2] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “Dynamic noise aware training for speech enhancement based on deep neural networks,” to appear at Interspeech 2014.
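Both techniques on this slide can be sketched in a few lines. The dropout function uses inverted scaling, an assumption that is equivalent in expectation to rescaling at test time. The noise-aware input shown here appends a fixed noise estimate taken from the first few frames, which are assumed to contain noise only; the slide does not spell out how the estimate is formed, and the dynamic variant in [2] may differ.

```python
import numpy as np

def dropout(activations, drop_rate, rng, training=True):
    """Zero a random fraction of units during training (inverted scaling)."""
    if not training or drop_rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= drop_rate
    return activations * keep / (1.0 - drop_rate)

def append_noise_estimate(noisy_frames, n_leading=6):
    """Noise-aware input: append a noise estimate to every frame.

    Assumption: the first n_leading frames are noise-only, so their mean
    serves as the estimate.
    """
    noise = noisy_frames[:n_leading].mean(axis=0)
    tiled = np.tile(noise, (len(noisy_frames), 1))
    return np.hstack([noisy_frames, tiled])

rng = np.random.default_rng(0)
hidden = dropout(np.ones((1000, 50)), drop_rate=0.2, rng=rng)
nat_input = append_noise_estimate(rng.normal(size=(10, 4)))  # shape (10, 8)
```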

13 Measures
SSNR: segmental SNR
LSD: log-spectral distortion
PESQ: perceptual evaluation of speech quality
STOI: short-time objective intelligibility
[1] J. Du and Q. Huo, “A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions,” in Proc. Interspeech, 2008, pp. 569–572.
[2] ITU-T, Rec. P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, International Telecommunication Union-Telecommunication Standardisation Sector, 2001.
[3] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An algorithm for intelligibility prediction of time frequency weighted noisy speech,” IEEE Trans. Audio, Speech, Lang. Process., Sep
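The first two measures are simple enough to sketch directly. The definitions below are common textbook forms, not necessarily the exact ones used in the cited work: the frame length, the clamping range of -10 to 35 dB, and the function names are assumptions. PESQ and STOI are standardized or published algorithms and are better taken from an existing implementation.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, each frame clamped to [lo, hi]."""
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len : (i + 1) * frame_len]
        e = s - enhanced[i * frame_len : (i + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + 1e-12) / (np.sum(e ** 2) + 1e-12))
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))

def log_spectral_distortion(clean_lps, enhanced_lps):
    """RMS log-spectral difference per frame, averaged over frames.

    Inputs have shape (frames, bins); the result is in the same units as
    the inputs (dB only if the log-power spectra are given in dB).
    """
    per_frame = np.sqrt(np.mean((clean_lps - enhanced_lps) ** 2, axis=1))
    return float(np.mean(per_frame))

rng = np.random.default_rng(0)
clean = rng.normal(size=2048)
ssnr = segmental_snr(clean, 1.1 * clean)  # error is 10% of the signal, about 20 dB
```

Higher SSNR is better, while lower LSD is better.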

14 Experiment Setup
The Depth of DNN
The Number of Noise Types

15 Experiment Setup
The Size of Training Set
The Length of Acoustic Context

16 Experiment Results
Noise Aware Training
Global Variance Equalization

17 Experiment Results
Unseen Noise
Noisy 1.42
LogMMSE 1.83
DNN-baseline 1.87
GVE 2.00
Dropout 2.06
Dropout&GVE 2.13
Dropout&GVE&NAT 2.25
Clean 4.5

18 Experiment Results
Non-stationary and Unseen Noise
Noisy 1.85
LogMMSE did not work
4-noise DNN 2.14
104-noise DNN 2.78

19 Experiment Results
Changing Noise Environments
Noisy 2.05
LogMMSE 1.46
DNN 2.99
Clean 4.50

20 Experiment Results Real-world

21 Experiment Results Overall Evaluation on 15 Unseen Noise Types

22 Experiment Results
32 real-world noisy utterances (22 spoken in English, the rest in other languages)
10 persons: five Chinese males and five Chinese females

23 Summary
Experiment Setup
Input data processing
Change the Deep Model

24 THANK YOU

