Wavelet-Based Speech Enhancement
Mahdi Amiri
April 2003
Sharif University of Technology
Course Project Presentation 1

Page 1 of 43 Wavelet-Based Speech Enhancement
Presentation Outline
- Motivation and Goals
- Wavelet Transform - Overview
- Basic Denoising in Wavelet Domain
- Literature Survey
- Implementation and Results
- Conclusions and Future Works

Motivation and Goals: Key Applications
- Improving perceptual quality of speech
  – Reduce listener’s fatigue
  – Hearing aids
- Improving performance of
  – Speech coders
  – Voice recognition systems

Motivation and Goals: Goals of Speech Enhancement (SE) in Wavelet Domain
- Variable window size for different frequency components
  – Long time intervals for precise low-frequency information
  – Short time intervals for precise high-frequency information
- Easy to implement
  – Fast WT computation complexity: O(n)
  – FFT computation complexity: O(n log2 n)
- Denoising by simple thresholding
  – Real-time implementation

Wavelet Transform - Overview

Wavelet Transform - Overview: History
- Fourier (1807)
- Haar (1910)
- Math World

Wavelet Transform - Overview
- What kind of basis function could be useful?
  – Impulse function (Haar): best time resolution
  – Sinusoids (Fourier): best frequency resolution
  – We want both of the best resolutions
- Heisenberg (1930)
  – Uncertainty Principle: there is a lower bound for the product of the time and frequency resolutions (an intuitive proof in [Mac91])

Wavelet Transform - Overview
- Gabor (1945)
  – Short Time Fourier Transform (STFT)
  – Disadvantage: fixed window size

Wavelet Transform - Overview
- Constructing wavelets
  – Daubechies (1988): compactly supported wavelets
- Computation of WT coefficients
  – Mallat (1989): a fast algorithm using filter banks

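The filter-bank computation mentioned above can be sketched with the Haar wavelet, chosen here only because its filters have two taps; the slides do not prescribe it. The names `haar_analysis`, `haar_synthesis` and `haar_dwt` are hypothetical helpers, not part of any library. Each level processes half as many samples as the previous one, so the total work is below 2n operations, which is consistent with the O(n) complexity quoted earlier.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_analysis(signal):
    """One filter-bank level: low-pass/high-pass filtering plus downsampling by 2."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [SCALE * (a + b) for a, b in pairs]   # coarse version
    detail = [SCALE * (a - b) for a, b in pairs]   # what the coarse version misses
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: rebuilds the signal from one level."""
    out = []
    for a, d in zip(approx, detail):
        out.append(SCALE * (a + d))
        out.append(SCALE * (a - d))
    return out


def haar_dwt(signal, levels):
    """Wavelet tree: only the approximation branch is split again."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        details.append(detail)  # details[0] is the finest scale
    return approx, details
```

Usage: `haar_dwt(x, 3)` on an 8-sample frame returns one approximation coefficient and detail lists of length 4, 2 and 1.
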
Wavelet Transform - Overview: Multiresolution Signal Representation
- The coarse version (approximation) is more useful than the detail
  – Browsing image databases on the web
  – Signal transmission for communication
  – Denoising
- Wavelet tree decomposition
  – Wavelet Transform (WT)
  – Undecimated WT (UWT)
- We may lose what is in the detail

Wavelet Transform - Overview: Full Tree Decomposition
- Wavelet Packet Transform (WPT)
- Undecimated WPT (UWPT)
- S = A1 + D1, or S = A1 + AD2 + DD2, or …
- Which decomposition path could be the best choice? The answer leads us to the best basis.

Wavelet Transform - Overview: Best Basis Selection Criteria
- Entropy
  – Coifman, Meyer, Wickerhauser (1992)
- Rate-distortion
  – Vetterli (1995)
- A branch of the tree is cut when the chosen criterion is satisfied

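The cut condition on this slide did not survive extraction. A minimal sketch of entropy-based best-basis selection, assuming the usual additive Shannon cost and the rule "keep the parent if its cost does not exceed the summed cost of its children", could look as follows. The Haar split and the names `shannon_cost`, `haar_split` and `best_basis` are illustrative assumptions, not taken from the presentation.

```python
import math


def shannon_cost(coeffs):
    """Additive (non-normalized) Shannon entropy cost: -sum(c^2 * log(c^2))."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)


def haar_split(node):
    """Split a node into low-pass and high-pass children (Haar, assumed here)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = [(node[i], node[i + 1]) for i in range(0, len(node), 2)]
    return [s * (a + b) for a, b in pairs], [s * (a - b) for a, b in pairs]


def best_basis(node, max_level):
    """Return (leaf nodes, total cost) of the cheapest wavelet packet basis."""
    cost_here = shannon_cost(node)
    if max_level == 0 or len(node) < 2 or len(node) % 2:
        return [node], cost_here
    low, high = haar_split(node)
    leaves_low, cost_low = best_basis(low, max_level - 1)
    leaves_high, cost_high = best_basis(high, max_level - 1)
    if cost_here <= cost_low + cost_high:
        return [node], cost_here          # cut: the children do not help
    return leaves_low + leaves_high, cost_low + cost_high
```

The recursion compares costs bottom-up, so the returned leaves always cover the whole signal and never cost more than the undecomposed node.
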
Basic Denoising in Wavelet Domain

Basic Denoising in Wavelet Domain: Principle
Only a few coefficients in the lower bands are needed to approximate the main features of the clean signal. Hence, by setting the smaller coefficients to zero, we can eliminate noise nearly optimally while preserving the important information of the clean signal.

Basic Denoising in Wavelet Domain: Notation
Symbols are defined in the time domain and in the wavelet domain for:
- Clean signal
- Noise signal
- Noisy signal

Basic Denoising in Wavelet Domain: Algorithm
1. Frame the input noisy signal
2. Forward WT of a frame
3. Threshold the (detail) wavelet coefficients
4. Inverse WT
5. Keep the center part of the frame
6. Repeat for all of the frames

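The six steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: a Haar transform, soft thresholding with a caller-supplied threshold, 50% overlapping frames whose middle half is kept, and edge replication for padding. None of these choices, nor the names `denoise`, `haar_dwt`, `haar_idwt` and `soft`, come from the slides.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(frame, levels):
    """Forward multi-level Haar transform of one frame."""
    approx, details = list(frame), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    approx = list(approx)
    for detail in reversed(details):
        out = []
        for a, d in zip(approx, detail):
            out += [(a + d) / SQRT2, (a - d) / SQRT2]
        approx = out
    return approx


def soft(c, t):
    """Soft thresholding of a single coefficient."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(signal, threshold, frame_len=16, levels=2):
    """frame_len must be divisible by 4 and by 2**levels."""
    hop, quarter = frame_len // 2, frame_len // 4
    n_frames = -(-len(signal) // hop)                      # ceiling division
    right_pad = n_frames * hop + hop - quarter - len(signal)
    padded = [signal[0]] * quarter + list(signal) + [signal[-1]] * right_pad
    out = []
    for k in range(n_frames):                              # step 6: all frames
        frame = padded[k * hop:k * hop + frame_len]        # step 1: framing
        approx, details = haar_dwt(frame, levels)          # step 2: forward WT
        details = [[soft(c, threshold) for c in d] for d in details]  # step 3
        clean = haar_idwt(approx, details)                 # step 4: inverse WT
        out.extend(clean[quarter:quarter + hop])           # step 5: keep center
    return out[:len(signal)]
```

With `threshold=0.0` the function returns the input unchanged, which is a quick check that framing and reconstruction are consistent.
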
Basic Denoising in Wavelet Domain: Threshold Value
- VisuShrink [DonJ94b]
  – The threshold is computed from an estimate of the noise variance and the frame length
  – For Gaussian white noise, the noise level is estimated from the MAD (Median Absolute Deviation)
  – Another definition is used in wden.m

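The formulas on this slide were lost in extraction. As a hedged sketch, the commonly quoted form of the universal (VisuShrink) threshold is T = sigma * sqrt(2 ln N), with sigma estimated for Gaussian white noise as median(|d|) / 0.6745 over the detail coefficients d. The code below assumes exactly that form; the function names are hypothetical and the slide's own notation may differ.

```python
import math
import statistics


def estimate_sigma(detail_coeffs):
    """Robust noise level estimate: median absolute value divided by 0.6745."""
    return statistics.median([abs(c) for c in detail_coeffs]) / 0.6745


def universal_threshold(detail_coeffs, n):
    """Universal threshold sigma * sqrt(2 ln n), n being the frame length."""
    return estimate_sigma(detail_coeffs) * math.sqrt(2.0 * math.log(n))
```

Because the median ignores a minority of large coefficients, the estimate is driven by the noise rather than by the few speech-dominated coefficients.
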
Basic Denoising in Wavelet Domain: Threshold Value
- Threshold in the WPT case
- For the correlated noise situation: use a level-dependent threshold (SureShrink [DonJ94b])

Basic Denoising in Wavelet Domain: How to Threshold
- Hard thresholding
- Soft thresholding
- Comparison: hard thresholding introduces a discontinuity; soft thresholding alters the values of the retained coefficients

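The two rules can be written in a few lines. This is the standard textbook form, given here as a sketch since the slide's own plots were not recovered; the function names are illustrative.

```python
import math


def hard_threshold(c, t):
    """Keep the coefficient unchanged if it exceeds t in magnitude, else zero it."""
    return c if abs(c) > t else 0.0


def soft_threshold(c, t):
    """Zero small coefficients and shrink the rest toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)
```

For a threshold of 1.0, a coefficient of 1.5 stays 1.5 under hard thresholding but becomes 0.5 under soft thresholding: the hard rule jumps from 0 to t at the threshold, while the soft rule is continuous but biases every retained coefficient by t.
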
Literature Survey

Literature Survey: [SeoB97], Novelty
- Title: Speech enhancement with reduction of noise components in the wavelet domain
- Novelty:
  – Semisoft thresholding [GaoB95]
  – Classification of unvoiced region in WD
  – Different thresholding for unvoiced region

Literature Survey: [SeoB97], Thresholding
- Semisoft thresholding [GaoB95]
  – Less sensitivity to small perturbations in the data
  – Smaller bias
- Panels: Hard, Soft, Semisoft
- Like [DonJ94b]

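A sketch of semisoft shrinkage with two thresholds, assuming the usual piecewise-linear form (zero below the lower threshold, identity above the upper one, linear in between). The parameter names `t1` and `t2` are illustrative; how [SeoB97] chooses the two thresholds is not stated on the slide.

```python
import math


def semisoft_threshold(c, t1, t2):
    """Semisoft shrinkage with lower threshold t1 and upper threshold t2."""
    if not 0.0 <= t1 < t2:
        raise ValueError("thresholds must satisfy 0 <= t1 < t2")
    mag = abs(c)
    if mag <= t1:
        return 0.0                                      # behaves like hard/soft: killed
    if mag <= t2:
        return math.copysign(t2 * (mag - t1) / (t2 - t1), c)  # linear transition
    return c                                            # behaves like hard: untouched
```

The rule is continuous at both thresholds, which avoids the jump of hard thresholding, and it leaves large coefficients unchanged, which avoids the constant bias of soft thresholding.
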
Literature Survey: [SeoB97], Unvoiced Regions
- Separation of unvoiced region
  – Use the DWT to obtain the subbands
  – Calculate the average energy of each subband
  – The current speech segment is classified as unvoiced by a test on these subband energies

Literature Survey: [SeoB97], Implementations
- If unvoiced, then threshold only the highest frequency band
- Implementation results
  – Additive white Gaussian noise
  – SNR: -10 dB to 10 dB
  – Test sentence: “Should we chase those cowboys?”
- Table: SNR (dB), Noisy vs. Enhanced

Literature Survey: [SooKY97], Novelty
- Title: Wavelet for speech denoising
- Novelty:
  – Evaluation of different wavelets and different orders (db1-10, coif1-5, sym2-8, bior)
  – Spectral Subtraction in WD
  – Wiener Filtering in WD (uses two methods for estimating the a priori SNR): Maximum Likelihood approach and Decision Directed approach

Literature Survey: [SooKY97], Thresholding 1
- Use DWT and find L levels of decomposition
- 1. Spectral Subtraction (SS) in WD
  – Rule of the form: if … then … else …, giving the denoised value
  – The expected value of the noise magnitude could be estimated from silence frames
  – Use similar scheme for

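The if/then/else rule on this slide was not recovered. A plausible reading, offered only as an assumption, is plain magnitude subtraction: subtract the expected noise magnitude from each coefficient magnitude, clamp at zero, and keep the sign. The helper names below are hypothetical.

```python
import math


def noise_magnitude(silence_coeffs):
    """Mean absolute value of coefficients taken from silence (noise-only) frames."""
    return sum(abs(c) for c in silence_coeffs) / len(silence_coeffs)


def magnitude_subtract(coeffs, noise_mag):
    """If |c| exceeds the noise magnitude, subtract it; otherwise output zero."""
    return [math.copysign(max(abs(c) - noise_mag, 0.0), c) for c in coeffs]
```

Under this reading the rule has the same shape as soft thresholding, with the threshold replaced by the noise magnitude measured on silence frames.
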
Literature Survey: [SooKY97], Thresholding 2
- 2. Wiener Filtering in WD
  – The filter depends on the a priori SNR
  – Estimating the a priori SNR:
    a. Maximum Likelihood
    b. Decision Directed, with a weighting factor in [0, 1], typ. 0.9

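The slide's equations are missing, so the following is a sketch built on the textbook forms: Wiener gain xi / (1 + xi), a simplified single-observation maximum-likelihood estimate xi = max(y^2 / noise_var - 1, 0), and the decision-directed estimate that mixes the previous denoised value with the current ML term using the weight `alpha`. Treating "previous" as the same coefficient in the preceding frame is an assumption, as are all function names.

```python
def wiener_gain(xi):
    """Wiener gain for a priori SNR xi."""
    return xi / (1.0 + xi)


def ml_prior_snr(y, noise_var):
    """Simplified maximum-likelihood estimate from a single noisy coefficient y."""
    return max(y * y / noise_var - 1.0, 0.0)


def dd_prior_snr(y, prev_clean, noise_var, alpha=0.9):
    """Decision-directed estimate: alpha weights the previous denoised value."""
    return alpha * prev_clean ** 2 / noise_var + (1.0 - alpha) * ml_prior_snr(y, noise_var)


def wiener_track(ys, noise_var, alpha=0.9):
    """Denoise one coefficient position observed over successive frames."""
    prev_clean, out = 0.0, []
    for y in ys:
        xi = dd_prior_snr(y, prev_clean, noise_var, alpha)
        prev_clean = wiener_gain(xi) * y
        out.append(prev_clean)
    return out
```

Setting `alpha=0.0` reduces the decision-directed estimate to the ML estimate, while values near 1 smooth the SNR estimate across frames.
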
Literature Survey: [SooKY97], Implementations
- Implementation results
  – White Gaussian noise
  – Both male and female voices
  – 10 levels of decomposition
- Table (SNR: 5 dB, L: 10), columns: Wave Type | Method 1 (dB) | Method 2b (dB); rows: bior, bior, Sym

Literature Survey: [SooKY97], Conclusions
- The methods are not particularly sensitive to the wavelet type, with the exception of Bior3.1
- Wiener-filtered speech has better SNR values than magnitude subtraction
- For Wiener filtering, the decision directed approach gives better SNR values than the maximum likelihood approach

Literature Survey: [KimYK01], Novelty
- Title: Speech enhancement using adaptive wavelet shrinkage
- Novelty:
  – Adaptive threshold value: the threshold depends on the variance of the estimated clean signal (BayesShrink)
  – Classification of unvoiced region using entropy: applies a smaller threshold for the unvoiced region and calls the method “Adaptive BayesShrink”

Literature Survey: [KimYK01], Threshold Value
- BayesShrink [ChaYV00a]: an adaptive threshold value that minimizes the Bayesian risk
- The threshold is then estimated from the data

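The BayesShrink expressions on this slide were lost. A sketch assuming the form usually given for this method: threshold = noise variance divided by the standard deviation of the clean signal, with the clean-signal variance estimated as max(variance of the noisy subband - noise variance, 0). The fallback for a zero signal estimate (threshold set to the largest magnitude, so every coefficient is removed) and the function name are assumptions of this sketch.

```python
import math


def bayes_shrink_threshold(subband, noise_sigma):
    """Data-driven threshold noise_sigma^2 / sigma_x for one subband."""
    var_y = sum(c * c for c in subband) / len(subband)       # noisy-signal variance
    sigma_x = math.sqrt(max(var_y - noise_sigma ** 2, 0.0))  # clean-signal estimate
    if sigma_x == 0.0:
        # Subband looks like pure noise: pick a threshold that zeroes everything.
        return max(abs(c) for c in subband)
    return noise_sigma ** 2 / sigma_x
```

For example, a subband with variance 25 and a noise standard deviation of 3 gives a signal estimate of 4 and a threshold of 9 / 4 = 2.25; a stronger signal yields a smaller threshold.
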
Literature Survey: [KimYK01], Unvoiced Regions
- The current region is classified as unvoiced by an entropy test
- An unvoiced region has smaller energy, so a smaller threshold is applied
- The constants involved are selected by simulation
- The paper does not comment on the type of entropy used

Literature Survey: [KimYK01], Implementations
- Implementation results:
  – Additive white Gaussian noise
  – SNR: 0 dB, 10 dB and 20 dB
- Table: results in dB for VisuShrink, BayesShrink, and Adaptive BayesShrink at 0 dB, 10 dB, and 20 dB

Literature Survey: [ChaKYK02], Novelty
- Title: Speech enhancement for non-stationary noise environment by adaptive wavelet packet
- Novelty:
  – Node-dependent thresholding for adaptation in colored or non-stationary noise
  – Noise estimation based on spectral entropy, not MAD
  – Modified hard thresholding to alleviate time-frequency discontinuities

Literature Survey: [ChaKYK02], Threshold Value
- Create the WPT and find the leaf nodes of the best basis tree
- Node-dependent thresholding
- Noise estimation: either a standard estimate or the following proposed method

Literature Survey: [ChaKYK02], Noise Estimation
- Estimate the spectral pdf of the wavelet packet coefficients through a B-bin histogram
- Calculate the normalized spectral entropy for each node in the adapted wavelet packet tree

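A minimal sketch of the two steps above for a single node, assuming the histogram is taken over coefficient magnitudes with equal-width bins and the entropy is normalized by log(B) so that it falls in the range 0 to 1. The exact binning used in [ChaKYK02] is not given on the slide, and the function name is hypothetical.

```python
import math


def normalized_spectral_entropy(coeffs, bins):
    """Entropy of a B-bin magnitude histogram, divided by log(B); needs bins >= 2."""
    mags = [abs(c) for c in coeffs]
    top = max(mags)
    if top == 0.0:
        return 0.0
    counts = [0] * bins
    for m in mags:
        idx = min(int(m / top * bins), bins - 1)  # the maximum falls in the last bin
        counts[idx] += 1
    n = len(mags)
    entropy = -sum((k / n) * math.log(k / n) for k in counts if k)
    return entropy / math.log(bins)
```

A node whose magnitudes are spread evenly over the bins scores close to 1, while a node whose magnitudes all share one bin scores 0.
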
Literature Survey: [ChaKYK02], Noise Estimation (cont.)
- Estimate the spectral magnitude intensity by histogram
  – node_length bins of coefficient magnitudes
  – Number of coefficients with magnitude equal to or greater than the bin’s amplitude
- Define an auxiliary threshold
- Estimate the standard deviation of the noise

Literature Survey: [ChaKYK02], Noise Estimation (cont.)
Greater disorder of wavelet coefficients (less voiced, more unvoiced)
→ More uniform spectral pdf
→ Bigger values for entropy (range 0 to 1)
→ Bigger value for alpha
→ Smaller number of bins bigger than alpha
→ Smaller estimate of the standard deviation of the noise

Literature Survey: [ChaKYK02], Thresholding
- Modified hard thresholding

Literature Survey: [ChaKYK02], Implementations
- Implementation results:
  – Pink noise, SNR: -5 dB to 15 dB
- Table columns: Noisy Speech SNR (dB) | Level Dep. with MAD | Node Dep. with MAD | Node Dep. with Proposed | Spectral Subtraction
- Subjective tests favored the level-dependent thresholding, but not every time
- The proposed method has better spectral performance (spectrogram)

Literature Survey: [ChaKYK02], Implementations (cont.)
- SNR (dB) test for various noisy speech: “We like bleu cheese but Victor prefers swiss cheese.” (SNR = 10 dB)
- Table columns: Noise type | Level Dep. with MAD | Node Dep. with Proposed | Spectral Subtraction
- Table rows: White, Pink, F, Car, Babble

Literature Survey
To be continued…
Thank You.

References (1 of 2)
[ChaKYK02] S. Chang, Y. Kwon, S. I. Yang, and I. J. Kim, “Speech enhancement for non-stationary noise environment by adaptive wavelet packet,” Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-2002, Vol. 1.
[ChaYV00a] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive Wavelet Thresholding for Image Denoising and Compression,” IEEE Transactions on Image Processing, Vol. 9, No. 9, Sep.
[DonJ94b] D. L. Donoho and I. M. Johnstone, “Threshold selection for wavelet shrinkage of noisy data,” Proceedings of the 16th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Engineering Advances: New Opportunities for Biomedical Engineers, Vol. 1, pp. A24-A25, Nov.
[GaoB95] H. Y. Gao and A. G. Bruce, “WaveShrink with Semisoft Shrinkage,” Research Report No. 39, StatSci Division of MathSoft, Inc.
[KimYK01] I. J. Kim, S. I. Yang, and Y. Kwon, “Speech enhancement using adaptive wavelet shrinkage,” Proceedings of IEEE International Symposium on Industrial Electronics, ISIE-2001, Vol. 1, 2001.

References (2 of 2)
[SeoB97] J. W. Seok and K. S. Bae, “Speech enhancement with reduction of noise components in the wavelet domain,” IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-97, Vol. 2, Apr.
[SooKY97] I. Y. Soon, S. N. Koh, and C. K. Yeo, “Wavelet for speech denoising,” Proceedings of IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications, TENCON-97, Vol. 2, Dec.
