Wavelet-Based Speech Enhancement Mahdi Amiri April 2003 Sharif University of Technology Course Project Presentation 1.


Presentation Outline
- Motivation and Goals
- Wavelet Transform - Overview
- Basic Denoising in Wavelet Domain
- Literature Survey
- Implementation and Results
- Conclusions and Future Work

Motivation and Goals: Key Applications
- Improving perceptual quality of speech
  - Reduce listener's fatigue
  - Hearing aids
- Improving performance of
  - Speech coders
  - Voice recognition systems

Motivation and Goals: Goals of Speech Enhancement (SE) in the Wavelet Domain
- Variable window size for different frequency components
  - Long time intervals give precise low-frequency information
  - Short time intervals give precise high-frequency information
- Easy to implement
  - Fast WT computational complexity: O(n)
  - FFT computational complexity: O(n log2 n)
- Denoising by simple thresholding
  - Real-time implementation

Wavelet Transform - Overview

Wavelet Transform - Overview: History
- Fourier (1807)
- Haar (1910)
- Math World

Wavelet Transform - Overview
- What kind of basis functions could be useful?
  - Impulse function (Haar): best time resolution
  - Sinusoids (Fourier): best frequency resolution
  - We want the best of both resolutions
- Heisenberg (1930)
  - Uncertainty principle: there is a lower bound on the product of the time resolution and the frequency resolution (an intuitive proof is given in [Mac91])

Wavelet Transform - Overview
- Gabor (1945)
  - Short Time Fourier Transform (STFT)
  - Disadvantage: fixed window size

Wavelet Transform - Overview
- Constructing wavelets
  - Daubechies (1988): compactly supported wavelets
- Computation of WT coefficients
  - Mallat (1989): a fast algorithm using filter banks

Wavelet Transform - Overview: Multiresolution Signal Representation
- The coarse version (Approximation) is more useful than the Detail
  - Browsing image databases on the web
  - Signal transmission for communication
  - Denoising
- Wavelet tree decomposition
  - Wavelet Transform (WT)
  - Undecimated WT (UWT)
- We may lose what is in the Detail

Wavelet Transform - Overview: Full Tree Decomposition
- Wavelet Packet Transform (WPT)
- Undecimated WPT (UWPT)
- S = A1 + D1, or S = A1 + AD2 + DD2, or ...
- Which decomposition path is the best choice? The answer leads us to the Best Basis.

Wavelet Transform - Overview: Best Basis Selection Criteria
- Cut a branch of the decomposition tree if the chosen criterion is met:
  - Entropy: Coifman, Meyer, Wickerhauser (1992)
  - Rate-distortion: Vetterli (1995)
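
To make the entropy criterion concrete, here is a minimal sketch of the bottom-up comparison used in entropy-based best-basis search. It assumes the additive Shannon-type cost -sum(c^2 log c^2) and an orthonormal transform; the function names `shannon_cost` and `keep_parent` are illustrative and do not come from the slides.

```python
import math

def shannon_cost(coeffs):
    """Additive entropy cost -sum(c^2 * log(c^2)); zero coefficients contribute 0."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def keep_parent(parent, left, right):
    """Return True if the children should be cut, i.e. the parent node is
    at least as cheap as its two children taken together."""
    return shannon_cost(parent) <= shannon_cost(left) + shannon_cost(right)

# Example with an orthonormal Haar split of a constant pair [1, 1]:
# the children concentrate the energy in one coefficient, so splitting wins.
split_is_better = not keep_parent([1.0, 1.0], [math.sqrt(2.0)], [0.0])
```

A lower cost means the energy is concentrated in fewer coefficients, which is what makes the later thresholding step effective.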

Basic Denoising in Wavelet Domain

Basic Denoising in Wavelet Domain: Principle
Only a few coefficients in the lower bands are needed to approximate the main features of the clean signal. Hence, by setting the smaller coefficients to zero, we can nearly optimally eliminate noise while preserving the important information of the clean signal.

Basic Denoising in Wavelet Domain: Notation
[Table: time-domain and wavelet-domain symbols for the clean signal, the noise signal, and the noisy signal]

Basic Denoising in Wavelet Domain: Algorithm
1. Frame the input noisy signal
2. Apply the forward WT to a frame
3. Threshold the (detail) wavelet coefficients
4. Apply the inverse WT
5. Keep the center part of the frame
6. Repeat for all frames
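
The six steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a one-level Haar transform stands in for the WT, hard thresholding is used, and the frame length, hop size, and threshold are arbitrary values chosen for the example, not the settings used in the presentation.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar DWT of an even-length list -> (approx, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def denoise(signal, frame_len=8, hop=4, threshold=0.5):
    """Frame, transform, threshold the details, invert, keep each frame's center."""
    margin = (frame_len - hop) // 2
    # Zero-pad so every frame is full length and centers tile the signal.
    padded = [0.0] * margin + list(signal) + [0.0] * frame_len
    out = []
    for start in range(0, len(signal), hop):                      # steps 1 and 6
        frame = padded[start:start + frame_len]
        approx, detail = haar_forward(frame)                      # step 2
        detail = [d if abs(d) > threshold else 0.0 for d in detail]  # step 3
        rec = haar_inverse(approx, detail)                        # step 4
        out.extend(rec[margin:margin + hop])                      # step 5
    return out[:len(signal)]
```

With `threshold=0.0` the function returns the input unchanged, which is a quick check that the framing and the transform pair are consistent.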

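Basic Denoising in Wavelet Domain: Threshold Value
- VisuShrink [DonJ94b]
  - Threshold: $T = \hat{\sigma}_n \sqrt{2 \ln N}$, where $\hat{\sigma}_n^2$ is the estimate of the noise variance and $N$ is the frame length
  - For Gaussian white noise: $\hat{\sigma}_n = \mathrm{MAD} / 0.6745$
  - MAD: Median Absolute Deviation
- Another definition (wden.m): [equation not preserved]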
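
As a rough illustration, the universal (VisuShrink) threshold with a MAD-based noise estimate can be computed as follows. The sketch assumes the common simplification that the detail coefficients have zero median, so the MAD reduces to the median of their absolute values; the function names are hypothetical.

```python
import math
import statistics

def mad_sigma(detail):
    """Noise standard deviation estimate: median(|d|) / 0.6745.
    Assumes the detail coefficients are centered on zero."""
    return statistics.median(abs(d) for d in detail) / 0.6745

def universal_threshold(detail, n):
    """Universal threshold sigma * sqrt(2 ln n) for a frame of length n."""
    return mad_sigma(detail) * math.sqrt(2.0 * math.log(n))
```

The constant 0.6745 is the median of the absolute value of a standard Gaussian variable, which is why the estimate is tied to the Gaussian white noise case.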

Basic Denoising in Wavelet Domain: Threshold Value
- Threshold in the WPT case: [equation not preserved]
- For the correlated noise situation: use a level-dependent threshold (SureShrink [DonJ94b])

Basic Denoising in Wavelet Domain: How to Threshold
- Hard thresholding
- Soft thresholding
- Comparison: hard thresholding introduces a discontinuity; soft thresholding alters the values of the retained coefficients
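
The two rules are short enough to write out. The sketch below uses the standard textbook definitions; the function names and the sample values are only for illustration.

```python
import math

def hard(c, t):
    """Keep the coefficient unchanged if it exceeds the threshold, else zero it.
    The output jumps from 0 to about t at |c| = t (the discontinuity)."""
    return c if abs(c) > t else 0.0

def soft(c, t):
    """Zero small coefficients and shrink the rest toward zero by t.
    Continuous, but every retained value is altered."""
    return math.copysign(max(abs(c) - t, 0.0), c)

sample = [-1.5, -0.5, 0.5, 1.5]
hard_out = [hard(c, 1.0) for c in sample]
soft_out = [soft(c, 1.0) for c in sample]
```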

Literature Survey

Literature Survey: [SeoB97], Novelty
- Title: Speech enhancement with reduction of noise components in the wavelet domain
- Novelty:
  - Semisoft thresholding [GaoB95]
  - Classification of unvoiced region in the wavelet domain (WD)
  - Different thresholding for unvoiced region

Literature Survey: [SeoB97], Thresholding
- Semisoft thresholding [GaoB95]
  - Less sensitivity to small perturbations in the data (compared with hard thresholding)
  - Smaller bias (compared with soft thresholding)
- Thresholding functions compared: hard, soft, semisoft
- Like [DonJ94b]
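
A minimal sketch of semisoft (firm) thresholding follows, assuming the usual two-threshold form associated with [GaoB95]: zero below the lower threshold, identity above the upper threshold, and a linear ramp in between. The parameter names `t1` and `t2` are illustrative.

```python
import math

def semisoft(c, t1, t2):
    """Semisoft thresholding with lower threshold t1 and upper threshold t2 (t1 < t2)."""
    a = abs(c)
    if a <= t1:
        return 0.0                       # small coefficients are removed
    if a <= t2:
        # Linear ramp from 0 at |c| = t1 up to t2 at |c| = t2, so the
        # function is continuous at both thresholds.
        return math.copysign(t2 * (a - t1) / (t2 - t1), c)
    return c                             # large coefficients are left unbiased
```

In this form, letting `t2` approach `t1` gives hard thresholding, and letting `t2` grow without bound gives soft thresholding.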

Literature Survey: [SeoB97], Unvoiced Regions
- Separation of unvoiced region
  - Use the DWT for finding [symbol not preserved]
  - Calculate the average energy of each subband
  - The current speech segment is unvoiced if:
    1. [condition not preserved]
    2. [condition not preserved]

Literature Survey: [SeoB97], Implementations
- If unvoiced, then threshold just the highest frequency band
- Implementation results
  - Additive white Gaussian noise
  - SNR from -10 dB to 10 dB
  - Test sentence: "Should we chase those cowboys?"

SNR (dB)
Noisy    Enhanced
-10      0.93
-5       3.42
0        7.12
5        11.34
10       13.92

Literature Survey: [SooKY97], Novelty
- Title: Wavelet for speech denoising
- Novelty:
  - Evaluation of different wavelets and different orders (db1-10, coif1-5, sym2-8, bior1.3-6.8)
  - Spectral Subtraction in WD
  - Wiener Filtering in WD (uses two methods for estimating the a priori SNR)
    - Maximum Likelihood approach
    - Decision Directed approach

Literature Survey: [SooKY97], Thresholding 1
- Use the DWT and find L levels of decomposition
- 1. Spectral Subtraction (SS) in WD
  - Denoised value: given by an if/then/else rule [equation not preserved]
  - The expected value of the noise magnitude could be estimated from silence frames
  - Use a similar scheme for [symbol not preserved]

Literature Survey: [SooKY97], Thresholding 2
- 2. Wiener Filtering in WD
  - The filter gain depends on the a priori SNR
  - Estimating the a priori SNR:
    a. Maximum Likelihood
    b. Decision Directed, with a parameter in [0, 1], typically 0.9
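
The slide's equations are not preserved, so the following is only a sketch of how a decision-directed Wiener filter is commonly written, applied here to one wavelet coefficient tracked across successive frames. The gain xi / (1 + xi), the a posteriori SNR `gamma`, the smoothing weight `alpha`, and all function names are assumptions taken from the standard decision-directed formulation, not from [SooKY97].

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi (linear scale, not dB)."""
    return xi / (1.0 + xi)

def decision_directed(noisy, noise_var, alpha=0.9):
    """Filter a sequence of noisy coefficient values (one per frame).

    noisy     -- values of the same coefficient in successive frames
    noise_var -- assumed known noise variance for this coefficient
    alpha     -- weight in [0, 1] on the previous frame's estimate
    """
    enhanced = []
    prev_power = 0.0                      # |previous enhanced value|^2
    for y in noisy:
        gamma = y * y / noise_var         # a posteriori SNR
        ml = max(gamma - 1.0, 0.0)        # maximum-likelihood style estimate
        xi = alpha * prev_power / noise_var + (1.0 - alpha) * ml
        x = wiener_gain(xi) * y
        enhanced.append(x)
        prev_power = x * x
    return enhanced
```

Setting `alpha=0.0` reduces the estimate to the instantaneous term alone, which is one way to see the smoothing that the decision-directed weight adds.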

Literature Survey: [SooKY97], Implementations
- Implementation results
  - White Gaussian noise
  - Both male and female voices
  - 10 levels of decomposition

SNR: 5 dB, L: 10
Wavelet type    Method 1 (dB)    Method 2b (dB)
bior3.1         6.569            1.764
bior4.4         19.523           21.981
sym8            19.751           22.215

Literature Survey: [SooKY97], Conclusions
- The methods are not particularly sensitive to the wavelet type, with the exception of bior3.1
- Wiener-filtered speech has better SNR values than magnitude subtraction
- For Wiener filtering, the decision-directed approach gives better SNR values than the maximum likelihood approach

Literature Survey: [KimYK01], Novelty
- Title: Speech enhancement using adaptive wavelet shrinkage
- Novelty:
  - Adaptive threshold value: the threshold depends on the variance of the estimated clean signal (BayesShrink)
  - Classification of unvoiced region using entropy: a smaller threshold is applied to the unvoiced region, and the method is called "Adaptive BayesShrink"

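Literature Survey: [KimYK01], Threshold Value
- BayesShrink: the adaptive threshold value for minimizing the Bayesian risk is $T = \sigma_n^2 / \sigma_x$, where $\sigma_n^2$ is the noise variance and $\sigma_x$ is the standard deviation of the clean signal
- Thus, the estimated threshold value is $\hat{T} = \hat{\sigma}_n^2 / \hat{\sigma}_x$, where $\hat{\sigma}_x = \sqrt{\max(\hat{\sigma}_y^2 - \hat{\sigma}_n^2, 0)}$ and $\hat{\sigma}_y^2$ is the variance of the noisy coefficients [ChaYV00a]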
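
A small sketch of the BayesShrink estimate, assuming the form described in [ChaYV00a]: the noise variance divided by the estimated standard deviation of the clean signal, with zero-mean coefficients. The function name and the fallback for the degenerate case are illustrative choices.

```python
import math

def bayes_shrink_threshold(coeffs, sigma_n):
    """Estimate the BayesShrink threshold for one subband.

    coeffs  -- noisy wavelet coefficients of the subband (assumed zero mean)
    sigma_n -- noise standard deviation, e.g. from a MAD estimate
    """
    var_y = sum(c * c for c in coeffs) / len(coeffs)   # noisy variance
    var_x = max(var_y - sigma_n ** 2, 0.0)             # clean-signal variance
    if var_x == 0.0:
        # The subband looks like pure noise: use the largest magnitude
        # as threshold so that every coefficient is removed.
        return max(abs(c) for c in coeffs)
    return sigma_n ** 2 / math.sqrt(var_x)
```

The threshold becomes small when the signal dominates the subband and large when the noise dominates, which is the adaptivity the slide refers to.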

Literature Survey: [KimYK01], Unvoiced Regions
- The current region is unvoiced if: [condition not preserved]
- The unvoiced region has smaller energy, so a smaller threshold is applied: [equation not preserved]
- The parameters are selected by simulation
- The paper does not state which type of entropy is used; it could be: [equation not preserved]

Literature Survey: [KimYK01], Implementations
- Implementation results:
  - Additive white Gaussian noise
  - SNR: 0 dB, 10 dB and 20 dB

Input SNR    VisuShrink    BayesShrink    Adaptive BayesShrink
0 dB         4.8208 dB     4.4982 dB      5.5733 dB
10 dB        11.5650 dB    12.8456 dB     14.1543 dB
20 dB        16.8488 dB    21.8313 dB     23.8455 dB

Literature Survey: [ChaKYK02], Novelty
- Title: Speech enhancement for non-stationary noise environment by adaptive wavelet packet
- Novelty:
  - Node-dependent thresholding for adaptation in colored or non-stationary noise
  - Noise estimation based on spectral entropy, not MAD
  - Modified hard thresholding to alleviate time-frequency discontinuities

Literature Survey: [ChaKYK02], Threshold Value
- Create the WPT and find the leaf nodes of the best basis tree
- Node-dependent thresholding: [equation not preserved]
- Noise estimation could be like [equation not preserved], or the following proposed method

Literature Survey: [ChaKYK02], Noise Estimation
1. Estimate the spectral pdf of the wavelet packet coefficients through a B-bin histogram
2. Calculate the normalized spectral entropy for each node in the adapted wavelet packet tree
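
Steps 1 and 2 can be sketched as below. The exact definitions in [ChaKYK02] are not preserved in this transcript, so the sketch assumes a histogram over coefficient magnitudes with equal-width bins and a Shannon entropy normalized by log(B), which keeps the result between 0 and 1. The function name is hypothetical.

```python
import math

def normalized_entropy(coeffs, bins):
    """Normalized entropy of one node's coefficients, using a B-bin histogram
    of magnitudes as the pdf estimate. Requires bins >= 2."""
    mags = [abs(c) for c in coeffs]
    top = max(mags)
    counts = [0] * bins
    for m in mags:
        # Map the magnitude to a bin index; the maximum goes in the last bin.
        idx = min(int(m / top * bins), bins - 1) if top > 0 else 0
        counts[idx] += 1
    n = len(mags)
    entropy = -sum((k / n) * math.log(k / n) for k in counts if k > 0)
    return entropy / math.log(bins)
```

Coefficients spread evenly over the bins give a value near 1, while coefficients that all fall in one bin give 0.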

Literature Survey: [ChaKYK02], Noise Estimation (cont.)
3. Estimate the spectral magnitude intensity by histogram
4. Define an auxiliary threshold
5. Estimate the standard deviation of the noise
[Figure labels: node_length; bins of coefficient magnitudes; number of coefficients with magnitude equal to or greater than the bin's amplitude]

Literature Survey: [ChaKYK02], Noise Estimation (cont.)
Greater disorder of the wavelet coefficients (less voiced, more unvoiced)
-> More uniform spectral pdf
-> Bigger values for the entropy (from 0 toward 1)
-> Bigger value for alpha
-> Smaller number of bins bigger than alpha
-> Smaller estimate of the standard deviation of the noise

Literature Survey: [ChaKYK02], Thresholding
- Modified hard thresholding

Literature Survey: [ChaKYK02], Implementations
- Implementation results:
  - Pink noise, SNR from -5 dB to 15 dB

Noisy speech SNR (dB)    Level dep. with MAD    Node dep. with MAD    Node dep. with proposed    Spectral subtraction
-5                       -3.7                   3.53                  3.31                       0.10
0                        1.11                   5.43                  5.91                       1.77
5                        5.79                   7.44                  8.30                       2.35
10                       10.15                  9.49                  10.47                      2.83
15                       14.15                  11.39                 12.15                      4.08

- Subjective tests were in favor of the level-dependent thresholding, but not every time
- The proposed method has better spectral performance (spectrogram)

Literature Survey: [ChaKYK02], Implementations (cont.)
- SNR (dB) test for various types of noisy speech: "We like bleu cheese but Victor prefers swiss cheese." (SNR = 10 dB)

Noise type    Level dep. with MAD    Node dep. with proposed    Spectral subtraction
White         10.29                  10.35                      2.39
Pink          9.47                   10.49                      2.42
F16           9.71                   10.35                      2.18
Car           9.65                   13.50                      1.95
Babble        9.59                   10.18                      2.23

Literature Survey
To be continued…
Thank You.

References (1 of 2)
[ChaKYK02] S. Chang, Y. Kwon, S. I. Yang, and I. J. Kim, "Speech enhancement for non-stationary noise environment by adaptive wavelet packet," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-2002, Vol. 1, pp. 561-564, 2002.
[ChaYV00a] S. G. Chang, B. Yu, and M. Vetterli, "Adaptive Wavelet Thresholding for Image Denoising and Compression," IEEE Transactions on Image Processing, Vol. 9, No. 9, pp. 1532-1546, Sep. 2000.
[DonJ94b] D. L. Donoho and I. M. Johnstone, "Threshold selection for wavelet shrinkage of noisy data," Proceedings of the 16th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1994. Engineering Advances: New Opportunities for Biomedical Engineers, Vol. 1, pp. A24-A25, Nov. 1994.
[GaoB95] H. Y. Gao and A. G. Bruce, "WaveShrink with Semisoft Shrinkage," Research Report No. 39, StatSci Division of MathSoft, Inc., 1995.
[KimYK01] I. J. Kim, S. I. Yang, and Y. Kwon, "Speech enhancement using adaptive wavelet shrinkage," Proceedings of IEEE International Symposium on Industrial Electronics, ISIE-2001, Vol. 1, pp. 501-504, 2001.

References (2 of 2)
[SeoB97] J. W. Seok and K. S. Bae, "Speech enhancement with reduction of noise components in the wavelet domain," IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-97, Vol. 2, pp. 1323-1326, Apr. 1997.
[SooKY97] I. Y. Soon, S. N. Koh, and C. K. Yeo, "Wavelet for speech denoising," Proceedings of IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications, TENCON-97, Vol. 2, pp. 479-482, Dec. 1997.

Wavelet-Based Speech Enhancement
Thank You
Course Project Presentation 1
Find out more at:
1. http://ce.sharif.edu/~m_amiri/
2. http://www.aictct.com/dml/
