Speech Enhancement Using Noise Estimation Based on

Presentation transcript:

Speech Enhancement Using Noise Estimation Based on Dynamic Quantile Tracking for Hearing Impaired Listeners
Nitya Tiwari & Prem C. Pandey, {nitya, pcpandey}@ee.iitb.ac.in, www.ee.iitb.ac.in/~spilab
IIT Bombay
NCC 2015, Mumbai, 27 Feb. – 1 Mar. 2015, Paper No. 1570056299 (28th Feb., Sat., Session SI, 10:05 – 11:15, Paper I)

Overview
1. Introduction
2. Signal Processing for Speech Enhancement
3. Implementation for Real-time Processing
4. Test Results
5. Summary & Conclusion

4. Test Results
Test material
  Speech: Recording with three isolated vowels, a Hindi sentence, and an English sentence (-/a/-/i/-/u/ – “aayiye aap kaa naam kyaa hai?” – “Where were you a year ago?”) from a male speaker.
  Noise: white, street, babble, car, and train noises (AURORA).
  SNR: ∞, 15, 12, 9, 6, 3, 0, –3, –6, –9, and –12 dB.
Evaluation methods
  Informal listening
  Objective evaluation using PESQ measure (0 – 4.5)
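The test material above pairs one speech recording with several noises at fixed SNRs. The sketch below shows one common way to prepare such a mixture: scale the noise so that the speech-to-noise power ratio matches the target. The function names, the sine-tone stand-in for speech, and the 8 kHz sampling rate are illustrative assumptions, not details taken from the slides.

```python
import math
import random


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so that the speech/noise power ratio is snr_db."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / p_noise)


def mix_at_snr(speech, noise, snr_db):
    """Add scaled noise to speech, sample by sample."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of `noisy` when the clean `speech` is known."""
    p_speech = sum(s * s for s in speech)
    p_noise = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_noise)


# Stand-in signals (assumptions): a 200 Hz tone at 8 kHz and Gaussian white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for snr in (6, 3, 0, -3, -6):
    noisy = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, noisy), 2))
```

Measuring power over the whole recording is the simplest choice; a speech-active-level measure would give somewhat different gains for material with long silences.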

Results: Offline processing
Investigations for most suitable values of processing parameters
Processing with noise estimation carried out using sample quantile (SQ) values and the following processing parameters:
  β = 0, α = 0.4 – 6
  τ = 0.1, σ = (0.9)^(1/1024) (rise time = 1 frame shift, fall time = 1024 frame shifts)
  p = 0.1, 0.25, 0.5, 0.75, 0.9
  M = 32, 64, 128, 256, & 512
M = 128 resulted in the highest PESQ scores (for fixed SNR, α, & p).
Noise estimation with p = 0.25 resulted in nearly the best scores for different types of noise at all SNRs.
PESQ scores obtained for processing with noise estimation using dynamic quantile tracking with λ = 1/256 were nearly equal to the PESQ scores obtained using SQ with M = 128.
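The slides that define the tracking update are not part of this transcript, so the following is only a sketch of how the two estimators compared above might look for a single frequency bin. It assumes that SQ sorts the last M magnitudes and picks the p-quantile, and that the dynamic tracker nudges its estimate up by λ·p·R or down by λ·(1 − p)·R, where the range R comes from a peak detector and a valley detector with rise and fall controlled by τ and σ. The update form, function names, and test signal are all assumptions.

```python
import random


def sample_quantile(window, p):
    """p-quantile of the values in `window`, found by sorting (assumed SQ form)."""
    ordered = sorted(window)
    return ordered[int(p * (len(ordered) - 1))]


def track_quantile(values, p=0.25, lam=1 / 256, tau=0.1, sigma=0.9 ** (1 / 1024)):
    """Recursive p-quantile estimate with no stored or sorted history (assumed form)."""
    peak = valley = q = values[0]
    history = []
    for x in values:
        prev_peak, prev_valley = peak, valley
        # Peak detector: fast rise (tau), slow fall toward the valley (sigma).
        if x >= prev_peak:
            peak = tau * prev_peak + (1 - tau) * x
        else:
            peak = sigma * prev_peak + (1 - sigma) * prev_valley
        # Valley detector: fast fall (tau), slow rise toward the peak (sigma).
        if x <= prev_valley:
            valley = tau * prev_valley + (1 - tau) * x
        else:
            valley = sigma * prev_valley + (1 - sigma) * prev_peak
        spread = peak - valley
        # Small step up or down; the steps balance when a fraction p of
        # the inputs lies below the estimate.
        if x >= q:
            q += lam * p * spread
        else:
            q -= lam * (1 - p) * spread
        history.append(q)
    return history


# Illustrative input (assumption): uniform random magnitudes in [0, 1).
rng = random.Random(1)
magnitudes = [rng.random() for _ in range(20000)]
tracked = track_quantile(magnitudes)
print("tracked:", round(tracked[-1], 3))
print("sample quantile, M = 128:", round(sample_quantile(magnitudes[-128:], 0.25), 3))
```

In this form the tracker keeps three numbers per bin (peak, valley, estimate), whereas the sorting approach needs a buffer of M past values per bin.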

Processing examples & PESQ scores
PESQ scores of the unprocessed (Unpr.) noisy speech with babble (a non-stationary noise) and of the processed (Pr.) signals, with noise estimation by sample quantile (SQ) with M = 128 and by dynamic quantile tracking (DQT) with λ = 1/256.

SNR (dB)   Unpr.   Pr., α=1, β=0   Pr., α=2, β=0   Pr., α=3, β=0
                   SQ      DQT     SQ      DQT     SQ      DQT
-6         1.68    1.72    1.66    1.71    1.75    1.62    1.57
[?]        1.97    2.00    2.13    2.20    2.19    2.17    2.28
6          2.39    2.54    2.53    2.70    2.65    2.69    2.67

PESQ scores obtained using the 0.25-quantile are not sensitive to changes in α.
Combination of λ = 1/256, p = 0.25, & α = 2 used for more detailed examination of scores.
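The transcript does not define α and β. A plausible reading, assumed here, is the generalized spectral subtraction rule in which α is the subtraction factor applied to the noise estimate and β is a spectral floor factor. The sketch below applies that rule to per-bin magnitudes; the exponent `gamma` and the function name are illustrative assumptions.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.0, gamma=1.0):
    """Per-bin generalized spectral subtraction (assumed form).

    noisy_mag: magnitudes of the noisy spectrum for one frame
    noise_mag: estimated noise magnitudes for the same bins
    alpha:     subtraction factor (assumed meaning of α)
    beta:      spectral floor factor (assumed meaning of β)
    gamma:     exponent; 1 for magnitude, 2 for power subtraction
    """
    enhanced = []
    for x, d in zip(noisy_mag, noise_mag):
        difference = x ** gamma - alpha * d ** gamma
        floor = beta * d ** gamma
        # The floor keeps the result non-negative and, when beta > 0,
        # leaves a little residual noise in place of isolated spectral peaks.
        enhanced.append(max(difference, floor) ** (1 / gamma))
    return enhanced


frame = [1.0, 0.5, 0.9]
noise = [0.4, 0.4, 0.1]
print(spectral_subtract(frame, noise, alpha=2.0, beta=0.0))
print(spectral_subtract(frame, noise, alpha=2.0, beta=0.001))
```

With β = 0 any bin where the scaled noise estimate exceeds the noisy magnitude is set to zero; a small positive β replaces those zeros with a low floor.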

PESQ score vs SNR: noisy & enhanced speech
Increase in scores: 0.24 – 0.46 for white noise, 0.08 – 0.32 for babble noise.
SNR advantage: ≈ 6 dB for white noise, ≈ 3 dB for babble noise.
Informal listening: β = 0.001 reduced the musical noise without degrading speech quality.
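One way to read an SNR advantage off two PESQ-versus-SNR curves is to find, by linear interpolation, the input SNR at which each curve reaches a chosen score and take the difference. The sketch below does this. The curve values are hypothetical numbers chosen for illustration and are not results from the slides, and the function names are assumptions.

```python
def snr_at_score(curve, target):
    """Input SNR (dB) at which a PESQ-vs-SNR curve reaches `target`.

    curve: list of (snr_db, pesq) pairs sorted by SNR, with non-decreasing PESQ.
    """
    for (s0, q0), (s1, q1) in zip(curve, curve[1:]):
        if q0 <= target <= q1:
            if q1 == q0:
                return s0
            return s0 + (target - q0) * (s1 - s0) / (q1 - q0)
    raise ValueError("target score is outside the range of the curve")


def snr_advantage(unprocessed, processed, target=2.0):
    """How much lower the input SNR can be after processing for the same score."""
    return snr_at_score(unprocessed, target) - snr_at_score(processed, target)


# Hypothetical curves, for illustration only.
unprocessed = [(-6, 1.5), (0, 1.9), (6, 2.3)]
processed = [(-6, 1.8), (0, 2.2), (6, 2.6)]
print(round(snr_advantage(unprocessed, processed, target=2.0), 2))
```

The result depends on the target score, so the score at which the advantage is read should be stated alongside the figure.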

Results: Real-time processing
Testing of real-time processing using white, babble, car, street, and train noises at different SNRs
Listening: Real-time processed output perceptually similar to the offline processed output
Objective verification: High PESQ scores (> 3.5) for output of real-time processing with output of offline processing as the reference
Signal delay: 36 ms
Processing capacity required: ≈ 41% (system clock needed for satisfactory processing = 50 MHz, highest system clock = 120 MHz)

Example: -/a/-/i/-/u/ – “aayiye aap kaa naam kyaa hai?” – “Where were you a year ago?”, white noise, input SNR = 3 dB.
(a) Clean speech (b) Noisy speech (c) Offline processed (d) Real-time processed
More examples: http://www.ee.iitb.ac.in/~spilab/material/nitya/ncc2015

5. Summary & Conclusions
Proposed technique: Suppression of stationary & non-stationary background noise by estimation of noise spectrum using dynamic quantile tracking, without voice activity detection or storage & sorting of past samples.
Speech enhancement: SNR advantage (at PESQ score = 2) of 3 – 6 dB for different stationary & non-stationary noises.
Implementation for real-time operation using 16-bit fixed-point processor TI/TMS320C5515: signal delay ≈ 36 ms, processing capacity required ≈ 41%.
Technique permits use of frequency-dependent quantile for noise estimation without introducing processing overheads.
Further work
  Combination of noise suppression with other processing techniques in sensory aids
  Implementation using other processors

Thank You
