Speech Enhancement Based on Nonparametric Factor Analysis


Speech Enhancement Based on Nonparametric Factor Analysis
Lin Li¹, Jiawen Wu¹, Xinghao Ding¹, Qingyang Hong¹, Delu Zeng²
¹School of Information Science and Technology, Xiamen University, China
²School of Mathematics, South China University of Technology, China
Reporter: Jiawen Wu, 10/11/2016

Outline
- Background of the Research
- The Proposed Method
- Experiment Setup
- Experiment Results

Background
Classical speech-enhancement approaches:
- Spectral Subtraction (SS) [Boll79]
- Subspace methods [Moor93]
- MMSE NPS [Cohen03]: minimum mean-square error algorithm using a non-causal a priori SNR
- MMSE MAP [Paliwal12]: maximum a posteriori estimator of the magnitude-squared spectrum
- Wiener Filtering [Scalart96]
- Sparse representation: K-SVD (K-singular value decomposition) [Zhao11]; CLSMD (constrained low-rank and sparse matrix decomposition) [Sun14]

S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.
B. De Moor, "The singular value decomposition and long and short spaces of noisy matrices," IEEE Transactions on Signal Processing, vol. 41, no. 9, pp. 2826–2838, 1993.
I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, pp. 466–475, 2003.
K. Paliwal, B. Schwerin et al., "Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator," Speech Communication, vol. 54, no. 2, pp. 282–305, 2012.
P. Scalart et al., "Speech enhancement based on a priori signal to noise estimation," ICASSP 1996, pp. 629–632.
N. Zhao, X. Xu, and Y. Yang, "Sparse representations for speech enhancement," Chinese Journal of Electronics, vol. 19, no. 2, pp. 268–272, 2011.
C. Sun, Q. Zhu, and M. Wan, "A novel speech enhancement method based on constrained low-rank and sparse matrix decomposition," Speech Communication, vol. 60, pp. 44–55, 2014.
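To make the classical baseline concrete, here is a minimal magnitude spectral-subtraction sketch in the spirit of [Boll79]. This is an illustration only, not the code used in any of the cited papers; the frame length, hop size, and the use of an averaged noise-only segment for the noise estimate are all assumptions.

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, n_fft=128, hop=64):
    """Minimal magnitude spectral subtraction (illustrative sketch).
    noise_only: a noise-only segment used to estimate the noise spectrum."""
    win = np.hanning(n_fft)

    def stft(x):
        frames = [x[i:i + n_fft] * win
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.fft.rfft(np.array(frames), axis=1)

    X = stft(noisy)
    noise_mag = np.abs(stft(noise_only)).mean(axis=0)  # average noise magnitude spectrum
    mag = np.maximum(np.abs(X) - noise_mag, 0.0)       # subtract and floor at zero
    S = mag * np.exp(1j * np.angle(X))                 # reuse the noisy phase
    # overlap-add resynthesis
    out = np.zeros(len(noisy))
    frames = np.fft.irfft(S, n=n_fft, axis=1)
    for idx, start in enumerate(range(0, len(noisy) - n_fft + 1, hop)):
        out[start:start + n_fft] += frames[idx]
    return out
```

Flooring the subtracted magnitude at zero is the source of the well-known "musical noise" artifact that motivates the later MMSE and sparse-representation methods.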

The Proposed Method
A sparse representation framework with a nonparametric dictionary learning model based on beta process factor analysis.

Contributions
1. Nonparametric: the average sparsity level of the representation and the dictionary size can be learned by using a beta process.
2. The noise variance is not required: it can be inferred automatically from the analytical posterior calculation.
3. An in situ training process: speech is processed in situ, without having to train the dictionary beforehand.

Problem formulation
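The equations on this slide did not survive transcription. The standard sparse-representation formulation that such methods build on (generic notation, not necessarily the paper's own symbols) can be written as:

```latex
% Noisy observation model: each speech frame y_i is clean speech plus noise
y_i = x_i + v_i, \qquad i = 1, \dots, P
% Sparse model: clean frames are approximately sparse over a dictionary D
x_i \approx D \alpha_i, \qquad \|\alpha_i\|_0 \ \text{small}
% Joint dictionary learning / sparse coding objective
\min_{D,\,\{\alpha_i\}} \; \sum_{i=1}^{P} \|y_i - D\alpha_i\|_2^2
\quad \text{s.t.} \quad \|\alpha_i\|_0 \le L
```

The enhanced speech is then reconstructed as \(\hat{x}_i = D\alpha_i\); the paper's contribution is to replace the fixed sparsity constraint with a nonparametric beta process prior.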

K-SVD [1]
K-SVD requires two parameters to be preset: the sparsity level L and the error threshold σ.

[1] N. Zhao, X. Xu, and Y. Yang, "Sparse representations for speech enhancement," Chinese Journal of Electronics, vol. 19, no. 2, pp. 268–272, 2011.
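A minimal orthogonal matching pursuit sketch (illustrative, not the cited implementation) makes the roles of these two parameters concrete: sparse coding stops when either the sparsity level L is reached or the residual falls below σ.

```python
import numpy as np

def omp(D, y, L, sigma):
    """Orthogonal matching pursuit: greedily select atoms until either
    the sparsity level L is reached or the residual norm drops below sigma."""
    residual = y.astype(float).copy()
    support, x = [], np.zeros(0)
    while len(support) < L and np.linalg.norm(residual) > sigma:
        k = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if k in support:
            break
        support.append(k)
        # least-squares fit of y on the selected atoms
        x, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x
    alpha = np.zeros(D.shape[1])
    alpha[support] = x
    return alpha
```

In K-SVD both stopping criteria must be tuned by hand; in the proposed nonparametric model the effective sparsity is inferred from the data instead.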

Architecture
The model places priors on the dictionary, the sparse coefficients, and the noise (equations (1)–(6) on the slide), and derives the corresponding posterior (equations (7)–(8)). Via variational Bayesian inference [Paisley09] or Gibbs-sampling analysis, a full posterior density function can be inferred for the update of D and α, together with all other model parameters.

J. Paisley and L. Carin, "Nonparametric factor analysis with beta process priors," in Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, pp. 777–784.
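The prior equations were images on the original slide. The beta process factor analysis model of [Paisley09], which this architecture follows, is typically written as below; treat this as a reconstruction in that paper's notation (K is a truncation level, P the frame dimension), not a verbatim copy of the slide:

```latex
x_i = D\,(z_i \circ s_i) + \varepsilon_i \\
d_k \sim \mathcal{N}(0,\, P^{-1} I_P) \\
s_i \sim \mathcal{N}(0,\, \gamma_s^{-1} I_K), \qquad \gamma_s \sim \mathrm{Gamma}(c_0, d_0) \\
z_{ik} \sim \mathrm{Bernoulli}(\pi_k), \qquad
\pi_k \sim \mathrm{Beta}\!\left(\tfrac{a_0}{K},\, \tfrac{b_0 (K-1)}{K}\right) \\
\varepsilon_i \sim \mathcal{N}(0,\, \gamma_\varepsilon^{-1} I_P), \qquad \gamma_\varepsilon \sim \mathrm{Gamma}(e_0, f_0)
```

Here \(\circ\) is the elementwise product; the binary vector \(z_i\) selects which dictionary atoms each frame uses, so the beta-Bernoulli construction lets the data determine both the sparsity level and the effective dictionary size, and the posterior over \(\gamma_\varepsilon\) supplies the noise variance automatically.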

Setup (1): Parameters
- Database: standard NOIZEUS database [Loizou13]
- Noise types: white, train, street
- Noise levels: 0 dB, 5 dB, 10 dB, and 15 dB
- Frame size: 128 points
- Increase step: 1 point
- Initial dictionary size: 512
- Hyper-parameters: c0 = d0 = e0 = f0 = 10⁻⁶
- Parameters of the beta distribution: a0 = 1; b0 = P/9
- Quality evaluation: SNR and SegSNR [Hu07]; PESQ (Perceptual Evaluation of Speech Quality) [P.862]

P. C. Loizou, Speech Enhancement: Theory and Practice. CRC Press, 2013.
Y. Hu and P. C. Loizou, "Subjective comparison and evaluation of speech enhancement algorithms," Speech Communication, vol. 49, no. 7, pp. 588–601, 2007.
ITU-T Recommendation P.862, "Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs," 2001.

Setup (2)
Figure: output SNR values vs. iteration (100 iterations) for different values of b0.
Input speech: the text "the birch canoe slid on the smooth planks", corrupted with street noise at 0 dB.
In the posterior, P is the number of frames of the input speech.

Setup (2): An extra handling
If the output SNR declines for ten consecutive iterations, b0 is changed to a larger number (e.g., 1000×P); otherwise b0 = P/9 is kept unchanged.
This remains a great challenge to be further investigated, since the output SNR is unavailable in practical applications.
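The extra handling above can be sketched as a small heuristic. The function name and the exact windowing are illustrative; the slide only specifies the ten-consecutive-declines trigger and the 1000×P example value.

```python
def adapt_b0(snr_history, P, decline_window=10, boost=1000):
    """If the output SNR declined on each of the last `decline_window`
    iterations, switch b0 to a larger value (e.g., 1000 * P);
    otherwise keep the default b0 = P / 9."""
    b0 = P / 9.0
    if len(snr_history) > decline_window:
        recent = snr_history[-(decline_window + 1):]
        # strictly decreasing over the whole window?
        if all(b < a for a, b in zip(recent, recent[1:])):
            b0 = boost * P
    return b0
```

As the slide notes, this rule cannot be applied as-is in practice, because the output SNR requires the clean reference signal.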

Results: Comparison with K-SVD
Figures: (a) PESQ; (b) SegSNR.
Noise type: Gaussian white noise.
SegSNR / PESQ: mean values calculated over the 30 utterances at each input SNR.
Match: the noise variance estimate for K-SVD matches the ground truth.
Mismatch: the noise variance estimate for K-SVD does not match the ground truth.

Results: Statistics of nonparametric dictionary learning
Figures: (a) sorted final probabilities of the dictionary elements (πk); (b) distribution of the number of elements used per frame.
Input utterance: "we talked of the sideshow in the circus" ("sp19.wav"), with input SNR at 0 dB.

Results

Thanks!