Speech Enhancement based on

Name: Speech Enhancement based on
Uploaded: 2017-10-04T10:47:37+00:00
Duration: PTM11S48
Channel: Gregory Holland
Description: Speech Enhancement based on

Speech Enhancement based on Deep Learning
Jiawen Wu
2017/4/27

Background
Task: Model the mapping relationship between the noisy and clean speech signals
1989, Tamura [1] -- Time domain
1994, Xie and Van Compernolle [2] -- Frequency domain
2006, Hinton [3] -- RBM
Since 2006 -- Classification task [4-5], auto-encoder [6]
Small scale, simple structure, few training samples, no reliable initialization scheme, easily trapped in local optima
[1] S. I. Tamura, “An analysis of a noise reduction neural network,” in Proc. ICASSP, 1989, pp. 2001–2004.
[2] F. Xie and D. V. Compernolle, “A family of MLP based nonlinear spectral estimators for noise reduction,” in Proc. ICASSP, 1994, pp. 53–56.
[3] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[4] Y. X. Wang and D. L. Wang, “Towards scaling up classification-based speech separation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381–1390, Jul
[5] E. W. Healy, S. E. Yoho, Y. X. Wang, and D. L. Wang, “An algorithm to improve speech recognition in noise for hearing-impaired listeners,” J. Acoust. Soc. Amer., vol. 134, no. 4, pp. 3029–3038, 2013.
[6] X.-G. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising Auto-Encoder,” in Proc. Interspeech, 2013, pp. 436–440.

Yong Xu
University of Science and Technology of China
Personal homepage:
Xu Y, Du J, Dai L R, et al. An experimental study on speech enhancement based on deep neural networks[J]. Signal Processing Letters, IEEE, 2014, 21(1): (cited: 81)
Xu Y, Du J, Dai L R, et al. A regression approach to speech enhancement based on deep neural networks[J]. Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 2015, 23(1): (cited: 20)
cited up to

Contributions
Nonlinear regression-based framework using DNNs
A large amount of training data hours, and more than 100 noise types
Context information
Unseen noise and non-stationary noise

DNN-based SE system

Baseline System
A nonlinear regression function: finding a mapping function between noisy and clean speech

Baseline System
Normalization: zero mean and unit variance
Log-power spectrum, phase
J. Du and Q. Huo, “A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions,” in Proc. Interspeech, 2008, pp. 569–572.
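The feature step on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' front end: the frame length, hop size, Hamming window, and the helper names `log_power_spectrum` and `normalize` are all assumptions, and random noise stands in for a real recording. The phase is returned alongside the log-power spectrum because the slide lists both; keeping it for later waveform reconstruction is a common choice, assumed here.

```python
import numpy as np

def log_power_spectrum(signal, frame_len=512, hop=256):
    """Split a 1-D signal into windowed frames; return (log-power spectra, phases)."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)
    lps = np.log(np.abs(spectrum) ** 2 + 1e-12)  # small floor avoids log(0)
    phase = np.angle(spectrum)
    return lps, phase

def normalize(features, mean=None, std=None):
    """Scale each feature dimension to zero mean and unit variance."""
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = features.std(axis=0) + 1e-8
    return (features - mean) / std, mean, std

rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)  # stand-in for a noisy recording (assumption)
lps, phase = log_power_spectrum(noisy)
lps_norm, mean, std = normalize(lps)
```

In practice the mean and standard deviation would be estimated once on the training set and passed back into `normalize` at test time.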

Baseline System Fine-tuning
A nonlinear regression function from noisy speech features to clean speech features
Fine-tuning
E --- mean squared error
W --- weight parameters
b --- bias parameters
$\hat{X}_n^d$ --- the d-th enhanced frequency bin of the log-spectral feature at sample index n
$X_n^d$ --- the d-th target frequency bin
Update of the weights and bias
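A small sketch may help make the fine-tuning step concrete. It assumes a single sigmoid hidden layer with a linear output, a mean-squared-error objective averaged over the mini-batch, and plain gradient descent; the layer sizes, learning rate, and toy data are illustrative assumptions rather than the settings used in the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, Y):
    """Map noisy features Y to enhanced features (hidden activations also returned)."""
    W1, b1, W2, b2 = params
    H = sigmoid(Y @ W1 + b1)
    return H, H @ W2 + b2

def mse(X_hat, X):
    """Squared error summed over frequency bins, averaged over the mini-batch."""
    return np.sum((X_hat - X) ** 2) / X.shape[0]

def sgd_step(params, Y, X, lr=0.01):
    """One back-propagation update of all weights and biases."""
    W1, b1, W2, b2 = params
    H, X_hat = forward(params, Y)
    d_out = 2.0 * (X_hat - X) / Y.shape[0]
    d_hid = (d_out @ W2.T) * H * (1.0 - H)
    grads = (Y.T @ d_hid, d_hid.sum(axis=0), H.T @ d_out, d_out.sum(axis=0))
    return tuple(p - lr * g for p, g in zip(params, grads))

rng = np.random.default_rng(0)
D, hidden, N = 8, 16, 32                 # toy sizes (assumptions)
Y = rng.standard_normal((N, D))          # stand-in noisy features
X = 0.5 * Y                              # stand-in clean targets
params = (0.1 * rng.standard_normal((D, hidden)), np.zeros(hidden),
          0.1 * rng.standard_normal((hidden, D)), np.zeros(D))
loss_before = mse(forward(params, Y)[1], X)
for _ in range(200):
    params = sgd_step(params, Y, X)
loss_after = mse(forward(params, Y)[1], X)
```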

Improved System Fine-tuning
$Y_{n-\tau}^{n+\tau}$ --- the noisy log-spectral feature vector, where the window size of context is 2τ+1
E --- mean squared error
W --- weight parameters
b --- bias parameters
$\hat{X}_n^d$ --- the d-th enhanced frequency bin of the log-spectral feature at sample index n
$X_n^d$ --- the d-th target frequency bin
Update of the weights and bias
κ --- the weight decay coefficient
ω --- the momentum
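The two ingredients named on this slide, a context window of 2τ+1 frames and an update with weight decay κ and momentum ω, might look like the sketch below. The edge padding by repetition, the exact form of the update (gradient term, decay term scaled by the learning rate, momentum term), and the helper names `add_context` and `momentum_step` are assumptions made for illustration.

```python
import numpy as np

def add_context(features, tau):
    """Stack each frame with its tau left and tau right neighbours (2*tau+1 frames)."""
    padded = np.pad(features, ((tau, tau), (0, 0)), mode="edge")  # repeat edge frames
    n = features.shape[0]
    return np.hstack([padded[k:k + n] for k in range(2 * tau + 1)])

def momentum_step(param, grad, delta, lr=0.1, weight_decay=0.0, momentum=0.9):
    """One update: gradient step, weight decay (kappa), and momentum (omega)."""
    delta = -lr * grad - weight_decay * lr * param + momentum * delta
    return param + delta, delta

frames = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames, 3 bins (toy data)
context = add_context(frames, tau=1)                # each row now holds 3 frames
w, delta = momentum_step(1.0, 0.5, 0.0)
w2, delta2 = momentum_step(w, 0.5, delta)
```

With τ = 1 each input row triples in width, so the input layer of the network grows accordingly.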

Improved System
Post-processing ---- Global Variance Equalization
Dropout Training
Noise Estimation ---- Noise Aware Training (NAT)

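Global Variance Equalization
A simple type of histogram equalization
The global variance of the estimated clean speech features is defined as:
$$GV_{est}(d) = \frac{1}{M}\sum_{n=1}^{M}\left(\hat{X}_n^d - \frac{1}{M}\sum_{n=1}^{M}\hat{X}_n^d\right)^2$$
A dimension-independent global variance can be computed as follows:
$$GV_{est} = \frac{1}{MD}\sum_{n=1}^{M}\sum_{d=1}^{D}\left(\hat{X}_n^d - \frac{1}{MD}\sum_{n=1}^{M}\sum_{d=1}^{D}\hat{X}_n^d\right)^2$$
where M is the number of frames and D is the feature dimension.
A. D. L. Torre, A. M. Peinado, J. C. Segura, J. L. Perez-Cordoba, M. C. Benitez, and A. J. Rubio, “Histogram equalization of speech representation for robust speech recognition,” IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 355–366, May 2005.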
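One way to read this post-processing step is sketched below. It assumes the global variance is the ordinary variance over frames (per dimension) or over all frames and dimensions (dimension-independent), and that equalization rescales the normalized network output by the square root of the reference-to-estimate variance ratio. The function names and the toy "over-smoothed" estimate are illustrative assumptions.

```python
import numpy as np

def global_variance(features, per_dimension=True):
    """Variance over frames, either per dimension or pooled over all dimensions."""
    return features.var(axis=0 if per_dimension else None)

def equalize(est_norm, gv_ref, gv_est):
    """Rescale normalized estimates so their global variance matches the reference."""
    return est_norm * np.sqrt(gv_ref / gv_est)

rng = np.random.default_rng(0)
clean_norm = rng.standard_normal((200, 5))   # stand-in normalized clean features
est_norm = 0.6 * clean_norm                  # toy over-smoothed estimate (assumption)

gv_ref = global_variance(clean_norm, per_dimension=False)
gv_est = global_variance(est_norm, per_dimension=False)
beta = np.sqrt(gv_ref / gv_est)              # dimension-independent scaling factor
equalized = equalize(est_norm, gv_ref, gv_est)
```

The dimension-independent factor uses a single scalar for all frequency bins, whereas the per-dimension variant would scale each bin separately.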

Dropout Training [1]
In DNN training, dropout randomly omits a certain percentage of the neurons in the input and each hidden layer during each presentation of each training sample, which can be treated as model averaging to avoid the over-fitting problem.
Noise Aware Training [2]
[1] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” Arxiv, 2012 [Online].
[2] Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “Dynamic noise aware training for speech enhancement based on deep neural networks,” to appear at Interspeech 2014.
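Both techniques on this slide can be illustrated with a short sketch. Two assumptions are worth flagging: the dropout shown is the "inverted" variant, which rescales the kept units at training time (a convenience swapped in here, not necessarily what the talk used), and the noise-aware input appends a noise estimate taken as the average of the first few frames of the utterance, with the frame count chosen arbitrarily.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Randomly zero a fraction `rate` of units; rescale the rest (inverted dropout)."""
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep

def noise_aware_input(context_features, noisy_lps, n_noise_frames=6):
    """Append a per-utterance noise estimate to every input frame."""
    noise_estimate = noisy_lps[:n_noise_frames].mean(axis=0)  # assumes leading frames are noise only
    tiled = np.tile(noise_estimate, (context_features.shape[0], 1))
    return np.hstack([context_features, tiled])

rng = np.random.default_rng(0)
dropped = dropout(np.ones((100, 50)), rate=0.5, rng=rng)
noisy_lps = rng.standard_normal((20, 4))      # stand-in log-power spectra
nat_input = noise_aware_input(noisy_lps, noisy_lps)
```

At test time dropout is simply switched off, while the noise estimate is still appended to the input.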

Measures
SSNR: segmental SNR
LSD: log-spectral distortion
PESQ: perceptual evaluation of speech quality
STOI: short-time objective intelligibility
[1] J. Du and Q. Huo, “A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions,” in Proc. Interspeech, 2008, pp. 569–572.
[2] ITU-T, Rec. P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, International Telecommunication Union-Telecommunication Standardisation Sector, 2001.
[3] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An algorithm for intelligibility prediction of time frequency weighted noisy speech,” IEEE Trans. Audio, Speech, Lang. Process., Sep
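The two simpler measures can be sketched directly; PESQ and STOI are standardized or published algorithms and are not reproduced here. The definitions below are common textbook forms and may differ in detail from those used in the talk: segmental SNR is averaged over non-overlapping frames and clipped to a fixed range, and LSD is the per-frame root-mean-square difference between log-power spectra, averaged over frames. The frame length and clipping limits are assumptions.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Average per-frame SNR in dB, with each frame clipped to [lo, hi]."""
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = s - enhanced[i * frame_len:(i + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))

def log_spectral_distortion(clean_lps, enhanced_lps):
    """Root-mean-square log-spectral difference per frame, averaged over frames."""
    per_frame = np.sqrt(np.mean((clean_lps - enhanced_lps) ** 2, axis=1))
    return float(np.mean(per_frame))

rng = np.random.default_rng(0)
clean = rng.standard_normal(2048)             # stand-in clean waveform
ssnr_scaled = segmental_snr(clean, 0.9 * clean)
ssnr_silent = segmental_snr(clean, np.zeros_like(clean))
ssnr_perfect = segmental_snr(clean, clean)
lsd_offset = log_spectral_distortion(np.zeros((10, 129)), np.full((10, 129), 2.0))
```

Higher SSNR and lower LSD indicate an enhanced signal closer to the clean reference.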

Experiment Setup
The Depth of DNN
The Number of Noise Types

Experiment Setup
The Size of Training Set
The Length of Acoustic Context

Experiment Results
Noise Aware Training
Global Variance Equalization

Experiment Results: Unseen Noise
Noisy             1.42
LogMMSE           1.83
DNN-baseline      1.87
GVE               2.00
Dropout           2.06
Dropout&GVE       2.13
Dropout&GVE&NAT   2.25
Clean             4.5

Experiment Results: Non-stationary and Unseen Noise
Noisy           1.85
LogMMSE         did not work
4-noise DNN     2.14
104-noise DNN   2.78

Experiment Results: Changing Noise Environments
Noisy     2.05
LogMMSE   1.46
DNN       2.99
Clean     4.50

Experiment Results Real-world

Experiment Results Overall Evaluation on 15 Unseen Noise Types

Experiment Results
32 real-world noisy utterances (22 spoken in English, the others in other languages)
10 persons: five Chinese males and five Chinese females

Summary
Experiment Setup
Input data processing
Change the Deep Model

THANK YOU
