Speech Enhancement by Online Non-negative Spectrogram Decomposition in Non-stationary Noise Environments
Zhiyao Duan (1), Gautham J. Mysore (2), Paris Smaragdis (2,3)
1. EECS Department, Northwestern University
2. Advanced Technology Labs, Adobe Systems Inc.
3. University of Illinois at Urbana-Champaign
Presentation at Interspeech, September 11

Classical Speech Enhancement

Typical algorithms:
a) Spectral subtraction
b) Wiener filtering
c) Statistical-model-based (e.g. MMSE)
d) Subspace algorithms

Properties:
– Do not require clean speech for training (only pre-learn the noise model)
– Online algorithms, good for real-time applications
– Cannot deal with non-stationary noise: most of them model the noise with a single spectrum
[Figure: spectrograms of keyboard noise and bird noise as examples of non-stationary noise]
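The "single noise spectrum" limitation above can be made concrete with a minimal spectral-subtraction sketch (not the exact algorithm from any of the baselines; the `floor` parameter is an assumed spectral-floor heuristic):

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, floor=0.02):
    """Subtract a single noise magnitude spectrum from every noisy frame.

    noisy_mag : (F, T) magnitude spectrogram of the noisy speech
    noise_mag : (F,)   noise spectrum estimated from a noise-only excerpt
    floor     : spectral floor to avoid negative magnitudes
    """
    enhanced = noisy_mag - noise_mag[:, None]
    # Clamp to a fraction of the noisy magnitude instead of zero,
    # which reduces "musical noise" artifacts.
    return np.maximum(enhanced, floor * noisy_mag)

# Toy example: 3 frequency bins, 2 frames
noisy = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 0.4]])
noise = np.array([0.5, 0.5, 0.5])
out = spectral_subtraction(noisy, noise)
```

Because `noise_mag` is one fixed spectrum, any noise whose spectrum changes over time (keyboard clicks, bird chirps) is under- or over-subtracted, which is exactly the failure mode this talk addresses.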

Non-negative Spectrogram Decomposition (NSD)

Uses a dictionary of basis spectra to model a non-stationary sound source.
[Figure: spectrogram of keyboard noise ≈ dictionary × activation weights]
Decomposition criterion: minimize the approximation error (e.g. KL divergence).

NSD for Source Separation

[Figure: the spectrogram of keyboard noise + speech is decomposed against the concatenation of a noise dictionary and a speech dictionary, yielding noise weights and speech weights; the speech dictionary and speech weights alone then reconstruct the separated speech]
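Given such a joint decomposition, a common way to reconstruct the speech is a Wiener-style soft mask built from the speech part of the model (standard practice in NMF/PLCA separation; the paper's exact reconstruction step may differ):

```python
import numpy as np

def separate_speech(V, W_s, H_s, W_n, H_n):
    """Reconstruct speech from a joint decomposition of the mixture V.

    V          : (F, T) mixture magnitude spectrogram
    W_s, H_s   : speech dictionary and speech activation weights
    W_n, H_n   : noise dictionary and noise activation weights
    """
    S = W_s @ H_s                      # model's speech reconstruction
    N = W_n @ H_n                      # model's noise reconstruction
    mask = S / (S + N + 1e-12)         # soft mask in [0, 1]
    return mask * V                    # masked mixture = separated speech

# Toy example: one bin, one frame; speech model explains 3 of 4 units
out = separate_speech(np.array([[4.0]]),
                      np.array([[1.0]]), np.array([[3.0]]),
                      np.array([[1.0]]), np.array([[1.0]]))
```

Masking the original mixture (rather than outputting `W_s @ H_s` directly) keeps the phase and fine detail of the observed signal.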

Semi-supervised NSD for Speech Enhancement

Properties:
– Capable of dealing with non-stationary noise
– Does not require clean speech for training (only pre-learns the noise model)
– Offline algorithm: learning the speech dictionary requires access to the whole noisy speech

[Figure: training stage decomposes a noise-only excerpt into the noise dictionary and its activation weights; separation stage decomposes the noisy speech using the trained noise dictionary together with a speech dictionary and activation weights learned from the mixture]

Proposed Online Algorithm

Objective: decompose the current mixture frame, using the trained noise dictionary and an adaptive speech dictionary.
Constraint on the speech dictionary: weighted buffer frames prevent it from overfitting the current mixture frame.
Only the weights of the current frame are estimated (the weights of previous frames were already calculated).

[Figure: the current frame (objective) and weighted buffer frames (constraint) are decomposed against the trained noise dictionary and the speech dictionary, yielding noise weights and speech weights for the current frame]

EM Algorithm for Each Frame

For each frame t:
– E step: calculate the posterior probabilities of the latent components
– M step: a) update the speech dictionary; b) calculate the current activation weights
Then move on to frame t+1.
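For the activation-weight part of the M step, a minimal PLCA-style sketch for a single frame looks like the following (dictionaries held fixed here; the dictionary update with its prior is the other half of the M step):

```python
import numpy as np

def frame_weights(v, W, n_iter=50):
    """EM estimate of activation weights for one magnitude frame.

    v : (F,)   magnitude spectrum of the current frame
    W : (F, K) fixed dictionary whose columns are normalized basis spectra
    Returns h : (K,) normalized activation weights.
    """
    K = W.shape[1]
    h = np.full(K, 1.0 / K)            # uniform initialization
    for _ in range(n_iter):
        # E step: posterior P(component k | frequency f)
        p = W * h                      # (F, K) joint up to normalization
        p /= p.sum(axis=1, keepdims=True) + 1e-12
        # M step: reassign the frame's energy to the components
        h = (v[:, None] * p).sum(axis=0)
        h /= h.sum() + 1e-12
    return h

# Toy example: two disjoint basis spectra; the frame splits its energy 3:1
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
h = frame_weights(np.array([3.0, 1.0]), W)
```

With disjoint bases the posterior is deterministic and the weights converge immediately; with overlapping spectra the E and M steps alternate until the KL objective stops improving.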

Update Speech Dictionary through a Prior

Each basis spectrum is a discrete (categorical) distribution, so its conjugate prior is a Dirichlet distribution. The old dictionary serves as an exemplar/guide for the new dictionary. The M step for a speech basis spectrum combines a likelihood part (the calculation from decomposing the spectrogram) with a prior part, weighted by the prior strength.
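The resulting MAP M step has a simple form, sketched below (variable names are illustrative; `alpha` stands for the prior strength):

```python
import numpy as np

def update_basis_map(counts, w_old, alpha):
    """MAP M-step for one basis spectrum with a Dirichlet prior
    centered on the previous spectrum.

    counts : (F,) expected energy assigned to this basis in the E step
    w_old  : (F,) previous basis spectrum (sums to 1)
    alpha  : prior strength; larger alpha keeps the new spectrum closer
             to w_old, smaller alpha lets it follow the current data
    """
    w = counts + alpha * w_old         # likelihood part + prior part
    return w / w.sum()                 # renormalize to a distribution

# Toy example: data says [0.5, 0.5], the old spectrum says [1, 0]
w_strong = update_basis_map(np.array([2.0, 2.0]), np.array([1.0, 0.0]), 4.0)
w_none   = update_basis_map(np.array([2.0, 2.0]), np.array([1.0, 0.0]), 0.0)
```

With `alpha = 0` the update reduces to the plain maximum-likelihood estimate; increasing `alpha` pulls the new spectrum toward the old one, which is exactly the tradeoff discussed on the next slide.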

Prior Strength Affects Enhancement

[Figure: enhancement behavior as a function of the number of iterations and the prior strength. Where the prior determines the speech dictionary, the dictionary is more restricted, giving better noise reduction but stronger speech distortion; where the likelihood determines it, the result has less distorted speech but more residual noise.]

Experiments

Non-stationary noise corpus: 10 kinds
– Birds, casino, cicadas, computer keyboard, eating chips, frogs, jungle, machine guns, motorcycles and ocean
Speech corpus: the NOIZEUS dataset [1]
– 6 speakers (3 male and 3 female), each 15 seconds
Noisy speech
– 5 SNRs (-10, -5, 0, 5, 10 dB)
– All combinations of noise, speaker and SNR generate 300 files
– About 300 × 15 seconds = 1.25 hours

[1] Loizou, P. (2007), Speech Enhancement: Theory and Practice, CRC Press, Boca Raton, FL.

Comparisons with Classical Algorithms

Baselines: KLT (subspace algorithm), logMMSE (statistical-model-based), MB (spectral subtraction), Wiener-as (Wiener filtering).
Metrics (higher is better):
– PESQ: an objective speech quality metric that correlates well with human perception
– SDR: a source separation metric measuring the fidelity of the enhanced speech to the uncorrupted speech

[Figure: additional comparison results; higher is better]

Examples

[Table: PESQ and SDR (dB) for spectral subtraction, Wiener filtering, the statistical-model-based method, the subspace algorithm, and the proposed method on keyboard noise at SNR = 0 dB; larger values indicate better performance]

Noise Reduction vs. Speech Distortion

BSS_EVAL: broadly used source separation metrics (higher is better)
– Signal-to-Distortion Ratio (SDR): measures both noise reduction and speech distortion
– Signal-to-Interference Ratio (SIR): measures noise reduction
– Signal-to-Artifacts Ratio (SAR): measures speech distortion

Examples

[Figure: SDR, SIR and SAR for bird noise at SNR = 10 dB; larger values indicate better performance]
SDR measures both noise reduction and speech distortion; SIR measures noise reduction; SAR measures speech distortion.

Conclusions

A novel algorithm for speech enhancement, combining the strengths of classical algorithms and of semi-supervised non-negative spectrogram decomposition:
– Online algorithm, good for real-time applications
– Does not require clean speech for training (only pre-learns the noise model)
– Deals with non-stationary noise
The speech dictionary is updated through a Dirichlet prior; the prior strength controls the tradeoff between noise reduction and speech distortion.

Complexity and Latency

Parameters

Buffer Frames

They are used to constrain the speech dictionary:
– Not too many or too old: we use the 60 most recent frames (about 1 second long)
– They should contain speech signals
How do we judge whether a mixture frame contains speech (Voice Activity Detection)?
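Keeping "the 60 most recent speech frames" is naturally a bounded queue; a minimal sketch (the buffer length matches the slide, everything else is illustrative):

```python
from collections import deque

BUFFER_LEN = 60  # about 1 second of frames, per the slide

buffer = deque(maxlen=BUFFER_LEN)  # the oldest frame drops out automatically

def maybe_buffer(frame, contains_speech):
    """Only frames judged to contain speech enter the buffer."""
    if contains_speech:
        buffer.append(frame)

# Pushing more than 60 speech frames keeps only the 60 most recent
for i in range(65):
    maybe_buffer(i, contains_speech=True)
```

The `contains_speech` decision is supplied by the voice activity detector described on the next slide.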

Voice Activity Detection (VAD)

Decompose the mixture frame using only the (trained) noise dictionary:
– If the reconstruction error is large, the frame probably contains speech: it goes into the buffer, and semi-supervised separation (the proposed algorithm, with the up-to-date speech dictionary) is applied.
– If the reconstruction error is small, the frame probably contains no speech: it does not go into the buffer, and supervised separation (trained noise dictionary only) is applied.
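A minimal sketch of this test, reusing a one-frame EM fit against the noise dictionary and thresholding the KL reconstruction error (`threshold` is an assumed tuning parameter, not a value from the paper):

```python
import numpy as np

def vad_by_reconstruction(v, W_noise, threshold, n_iter=50):
    """Fit frame v with only the noise dictionary; a large KL error
    suggests the frame contains something the noise model cannot
    explain, i.e. speech.

    v        : (F,)   magnitude spectrum of the current frame
    W_noise  : (F, K) trained noise dictionary, columns sum to 1
    Returns (contains_speech, error).
    """
    K = W_noise.shape[1]
    h = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: posterior over noise components per frequency bin
        p = W_noise * h
        p /= p.sum(axis=1, keepdims=True) + 1e-12
        # M step: unnormalized weights carry the frame's energy
        h = (v[:, None] * p).sum(axis=0)
    v_hat = W_noise @ h + 1e-12
    # Generalized KL divergence between the frame and its reconstruction
    err = np.sum(v * np.log((v + 1e-12) / v_hat) - v + v_hat)
    return err > threshold, err

# Toy example: one noise basis concentrated in bin 0
W = np.array([[1.0], [0.0]])
is_speech_noise_frame, _ = vad_by_reconstruction(np.array([2.0, 0.0]), W, 1.0)
is_speech_mixed_frame, _ = vad_by_reconstruction(np.array([1.0, 1.0]), W, 1.0)
```

A pure-noise frame is reconstructed almost exactly (tiny error), while a frame with energy outside the noise model's support yields a large error and is flagged as speech.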