EE Dept., IIT Bombay NCC 2013, Delhi, 15-17 Feb. 2013, Paper 3.2_2_1569696063 (Sat. 16th, 1135 – 1320, 3.2_2)
Speech Enhancement Using Spectral Subtraction and Cascaded-Median Based Noise Estimation for Hearing Impaired Listeners
Santosh K. Waddi, Prem C. Pandey, Nitya Tiwari
{wsantosh, pcpandey, ...}@ee.iitb.ac.in
IIT Bombay

EE Dept., IIT Bombay 2/19 Overview
1. Introduction
2. Signal Processing for Spectral Subtraction
3. Implementation for Real-time Processing
4. Test Results
5. Summary & Conclusion

EE Dept., IIT Bombay 3/19 1. Introduction
Sensorineural hearing loss
– Increased hearing thresholds and high-frequency loss
– Decreased dynamic range & abnormal loudness growth
– Reduced speech perception due to increased spectral & temporal masking → decreased speech intelligibility in noisy environments
Signal processing in hearing aids
– Frequency-selective amplification
– Automatic volume control
– Multichannel dynamic range compression (settable attack time, release time, and compression ratios)
Processing for reducing the effect of increased spectral masking in sensorineural loss
– Binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
– Spectral contrast enhancement (Yang et al. 2003)
– Multiband frequency compression (Arai et al. 2004, Kulkarni et al. 2012)

EE Dept., IIT Bombay 4/19 Techniques for reducing the background noise
– Directional microphone
– Adaptive filtering (a second microphone needed for noise reference)
– Single-channel noise suppression using spectral subtraction (Boll 1979, Berouti et al. 1979, Martin 1994, Loizou 2007, Lu & Loizou 2008, Paliwal et al. 2010)
Processing steps
– Dynamic estimation of the non-stationary noise spectrum
  - during non-speech segments using voice activity detection, or
  - continuously using statistical techniques
– Estimation of the noise-free speech spectrum
  - spectral noise subtraction, or
  - multiplication by a noise suppression function
– Speech resynthesis (using enhanced magnitude and noisy phase)
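The steps above rest on the usual additive-noise model; stating it briefly (standard in the spectral-subtraction literature, assumed rather than spelled out on the slide) makes clear why subtracting spectra is justified:

```latex
% Additive, uncorrelated noise model for single-channel spectral subtraction
x(n) = s(n) + d(n) \;\Rightarrow\; X_n(k) = S_n(k) + D_n(k),
\qquad
E\{|X_n(k)|^2\} \approx |S_n(k)|^2 + E\{|D_n(k)|^2\}
```

Since the cross-term averages to zero for uncorrelated speech and noise, subtracting an estimate of the noise power (or magnitude) spectrum from the noisy spectrum approximates the clean spectrum, and the noisy phase is reused for resynthesis.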

EE Dept., IIT Bombay 5/19 Research objective
Real-time single-input speech enhancement for use in hearing aids and other sensory aids (cochlear prostheses, etc.) for hearing-impaired listeners
Main challenges
– Noise estimation without voice activity detection, to avoid errors under low SNR & during long speech segments
– Low signal delay (algorithmic + computational) for real-time application
– Low computational complexity & memory requirement for implementation on a low-power processor
Proposed technique: spectral subtraction using cascaded-median based continuous updating of the noise spectrum (without using voice activity detection)
Real-time implementation: 16-bit fixed-point DSP with on-chip FFT hardware
Evaluation: informal listening, PESQ-MOS

EE Dept., IIT Bombay 6/19 2. Signal Processing for Spectral Subtraction
– Dynamic estimation of the non-stationary noise spectrum
– Estimation of the noise-free speech spectrum
– Speech resynthesis

EE Dept., IIT Bombay 7/19 Power subtraction
Windowed speech spectrum: $X_n(k)$; estimated noise magnitude spectrum: $D_n(k)$
Estimated speech spectrum: $Y_n(k) = \left[ |X_n(k)|^2 - D_n^2(k) \right]^{0.5} e^{j \angle X_n(k)}$
Problems: residual noise due to under-subtraction; distortion in the form of musical noise & clipping due to over-subtraction.
Generalized spectral subtraction (Berouti et al. 1979)
$$|Y_n(k)| = \begin{cases} \beta^{1/\gamma} D_n(k), & \text{if } |X_n(k)| < (\alpha + \beta)^{1/\gamma} D_n(k) \\ \left[ |X_n(k)|^{\gamma} - \alpha D_n^{\gamma}(k) \right]^{1/\gamma}, & \text{otherwise} \end{cases}$$
$\gamma$ = exponent factor (2: power subtraction, 1: magnitude subtraction)
$\alpha$ = over-subtraction factor (for limiting the effect of short-term variations in the noise spectrum)
$\beta$ = floor factor to mask the musical noise due to over-subtraction
Re-synthesis with noisy phase, without explicit phase calculation: $Y_n(k) = |Y_n(k)| \, X_n(k) / |X_n(k)|$
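As a concrete illustration, a minimal numpy sketch of the per-frame subtraction rule above; the function name and interface are illustrative (not the paper's DSP code), and the defaults follow the parameters reported later in the deck ($\beta$ = 0.001, $\gamma$ = 1, $\alpha$ around 2):

```python
import numpy as np

def spectral_subtraction_frame(X, D, alpha=2.0, beta=0.001, gamma=1.0):
    """Generalized spectral subtraction for one analysis frame.
    X: complex FFT of the windowed noisy frame; D: estimated noise magnitude
    spectrum (same length). Returns the enhanced complex spectrum with the
    noisy phase retained (no explicit phase computation)."""
    X_mag = np.abs(X)
    sub = X_mag**gamma - alpha * D**gamma    # over-subtracted |.|^gamma spectrum
    floor = beta * D**gamma                  # spectral floor masking musical noise
    Y_mag = np.where(sub > floor, sub, floor) ** (1.0 / gamma)
    return Y_mag * X / np.maximum(X_mag, 1e-12)   # |Y| * X/|X|, guard against /0
```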

EE Dept., IIT Bombay 8/19 Dynamic estimation of noise magnitude spectrum
– Minimum-statistics based estimation (Martin 1994): SNR-dependent over-subtraction factor
– Median based estimation (Stahl et al. 2000): large computation & memory
– Pseudo-median based estimation (Basha & Pandey 2012): moving median approximated by a p-point q-stage cascaded median, with a saving in memory & computation for real-time implementation

Median                          Storage per freq. bin    Sortings per frame per freq. bin
M-point                         2M                       (M–1)/2
p-point q-stage (M = p^q)       pq                       p(p–1)/2

Condition for reducing sorting operations and storage: low p, q ≈ ln(M)
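A sketch of a p-point q-stage cascaded-median noise estimator along these lines (per-bin storage of p·q values, i.e. 12 for p = 3, q = 4, matching the table); the exact buffering and update schedule of the pseudo-median in Basha & Pandey (2012) may differ in detail, so treat this as an approximation rather than their algorithm:

```python
import numpy as np

class CascadedMedianEstimator:
    """Approximates a moving median of the magnitude spectra of the last
    M = p**q frames by cascading medians over non-overlapping groups of p."""
    def __init__(self, num_bins, p=3, q=4):
        self.p, self.q = p, q
        self.buf = np.zeros((q, num_bins, p))   # one p-slot buffer per stage
        self.count = np.zeros(q, dtype=int)     # entries currently held per stage
        self.noise = np.zeros(num_bins)         # latest noise magnitude estimate

    def update(self, frame_mag):
        """Feed the magnitude spectrum of one frame; return the noise estimate."""
        value = frame_mag
        for s in range(self.q):
            self.buf[s, :, self.count[s]] = value
            self.count[s] += 1
            if self.count[s] < self.p:          # this stage is not full yet
                return self.noise
            value = np.median(self.buf[s], axis=1)   # stage median feeds next stage
            self.count[s] = 0
        self.noise = value                      # output of the final stage
        return self.noise
```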

EE Dept., IIT Bombay 9/19 Re-synthesis of enhanced signal
– Spectral subtraction → enhanced magnitude spectrum
– Enhanced magnitude spectrum & original phase spectrum → complex spectrum
– Resynthesis using IFFT and overlap-add
Investigations using offline implementation (f_s = 12 kHz, frame = 30 ms)
– Overlap of 50% & 75%: indistinguishable outputs
– FFT length N = 512 & higher: indistinguishable outputs
– γ = 1 (magnitude subtraction): higher tolerance to variation in α, β values
– Duration needed for dynamic noise estimation ≈ 1 s
– p = 3 for simplifying programming and reducing the sorting operations
– 3-frame 4-stage cascaded median (M = 81, p = 3, q = 4), 50% overlap: moving median over ≈ 1.2 s
– Outputs using true median & cascaded median: indistinguishable
– Reduction in storage requirement per freq. bin: from 162 to 12 samples
– Reduction in number of sorting operations per frame per freq. bin: from 40 to 3
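Tying the pieces together, an offline sketch of the analysis / subtraction / overlap-add resynthesis chain with the parameters listed above (f_s = 12 kHz, 30 ms window, 50% overlap, N = 512). It reuses the two sketches given earlier; the Hann window and the absence of a synthesis window or normalization are assumptions for the sketch, not the paper's choices:

```python
import numpy as np
# assumes spectral_subtraction_frame and CascadedMedianEstimator from the
# sketches above are in scope

def enhance(x, fs=12000, frame_ms=30, overlap=0.5, nfft=512,
            alpha=2.0, beta=0.001, gamma=1.0):
    L = int(fs * frame_ms / 1000)                 # 360 samples per window
    S = int(L * (1 - overlap))                    # 180-sample shift (15 ms)
    win = np.hanning(L)
    est = CascadedMedianEstimator(nfft // 2 + 1, p=3, q=4)
    y = np.zeros(len(x) + nfft)
    for start in range(0, len(x) - L + 1, S):
        X = np.fft.rfft(win * x[start:start + L], nfft)   # zero-padded FFT
        D = est.update(np.abs(X))                 # dynamic noise estimate
        Y = spectral_subtraction_frame(X, D, alpha, beta, gamma)
        y[start:start + nfft] += np.fft.irfft(Y, nfft)    # overlap-add
    return y[:len(x)]
```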

EE Dept., IIT Bombay 10/19 3. Implementation for Real-time Processing
16-bit fixed-point DSP: TI TMS320C5515
– 320 KB on-chip RAM (with 64 KB dual-access RAM), 128 KB on-chip ROM
– Three 32-bit programmable timers, 4 DMA controllers, each with 4 channels
– FFT hardware accelerator (8- to 1024-point FFT)
– Max. clock speed: 120 MHz
DSP board: eZdsp
– 4 MB on-board NOR flash for user program
– Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization, 8 – 192 kHz sampling
Development environment for C: TI's CCStudio, ver. 4.0

EE Dept., IIT Bombay 11/19 Implementation
– One codec channel (ADC and DAC) with 16-bit quantization
– Sampling frequency: 12 kHz
– Window length of 30 ms (L = 360) with 50% overlap, FFT length N = 512
– Storage of input samples, spectral values, and processed samples: 16-bit real & 16-bit imaginary parts

EE Dept., IIT Bombay 12/19 Data transfers and buffering operations (S = L/2)
DMA cyclic buffers
– 3-block input buffer
– 2-block output buffer (each block of S samples)
Pointers (incremented cyclically on DMA interrupt)
– current input block
– just-filled input block
– current output block
– write-to output block
Signal delay
– Algorithmic: 1 frame (30 ms)
– Computational: ≤ frame shift (15 ms)
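A host-side simulation of the cyclic-buffer bookkeeping listed above: a 3-block DMA input buffer and a 2-block output buffer, each block of S samples, with pointers rotated on every DMA interrupt. The names and timing are illustrative; the actual implementation is interrupt-driven C on the DSP:

```python
import numpy as np

S = 180                        # frame shift: 15 ms at 12 kHz
in_buf = np.zeros((3, S))      # codec/DMA fills the current input block
out_buf = np.zeros((2, S))     # codec/DMA drains the current output block
cur_in, just_filled = 0, 2     # input-block pointers
cur_out, write_to = 0, 1       # output-block pointers

def on_dma_interrupt(adc_block, processed_block):
    """Simulate one DMA interrupt: latch the block the codec just filled,
    advance the input pointer cyclically, queue the processed block, and swap
    the two output blocks. Returns (block for processing, block played out)."""
    global cur_in, just_filled, cur_out, write_to
    in_buf[cur_in] = adc_block                   # samples captured this period
    just_filled, cur_in = cur_in, (cur_in + 1) % 3
    out_buf[write_to] = processed_block          # result of the previous frame
    cur_out, write_to = write_to, cur_out
    return in_buf[just_filled], out_buf[cur_out]
```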

EE Dept., IIT Bombay 13/19 4. Test Results
Test material
– Speech: recording with three isolated vowels, a Hindi sentence, and an English sentence (-/a/-/i/-/u/- – "aayiye aap kaa naam kyaa hai?" – "Where were you a year ago?") from a male speaker
– Noise: white, pink, babble, car, and train noises (AURORA)
– SNR: ∞, 15, 12, 9, 6, 3, 0, -3, -6 dB
Evaluation methods
– Informal listening
– Objective evaluation using PESQ measure (0 – 4.5)
Results: offline processing
– Processing parameters: β = 0.001, α: 1.5 – 2.5
– Informal listening: no audible roughness in the enhanced speech; speech clipping at larger α
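For the objective evaluation step, a hedged sketch of computing the PESQ-MOS of noisy and enhanced speech against the clean reference using the third-party `pesq` package (an implementation of ITU-T P.862, not necessarily the tool used by the authors). P.862 narrow-band mode expects 8 kHz input, so the 12 kHz material is resampled first; the signal names are placeholders:

```python
import numpy as np
from scipy.signal import resample_poly
from pesq import pesq           # pip install pesq (ITU-T P.862 implementation)

def pesq_nb_12k(clean_12k, test_12k):
    """Narrow-band PESQ score for float arrays sampled at 12 kHz."""
    ref = resample_poly(np.asarray(clean_12k, dtype=float), 2, 3)   # 12 -> 8 kHz
    deg = resample_poly(np.asarray(test_12k, dtype=float), 2, 3)
    return pesq(8000, ref, deg, 'nb')

# Usage (placeholders): compare scores of noisy vs. enhanced speech at each SNR;
# the SNR advantage is read off where the two score-vs-SNR curves match.
# score_noisy    = pesq_nb_12k(clean, noisy)
# score_enhanced = pesq_nb_12k(clean, enhance(noisy))
```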

EE Dept., IIT Bombay 14/19 PESQ score vs. SNR for noisy & enhanced speech
SNR advantage: 13 dB for white noise, 4 dB for babble

EE Dept., IIT Bombay 15/19 Processing examples & PESQ scores
Table: PESQ scores of unprocessed and processed speech (SNR = 0 dB) and optimal α, for white, pink, babble, car, and train noise.
Speech material: speech-speech-silence-speech, speech: /-a-i-u-/ – "aayiye" – "aap kaa naam kyaa hai?" – "where were you a year ago?"
Processing parameters: frame length = 30 ms, overlap = 50%, β = 0.001, noise estimation by 3-point 4-stage cascaded median (estimation duration ≈ 1.2 s)

EE Dept., IIT Bombay 16/19 (a) Clean speech, (b) noisy speech, (c) offline processed, (d) real-time processed.
Example: "Where were you a year ago", white noise, input SNR = 3 dB. More examples: see [24].

EE Dept., IIT Bombay 17/19 Results: Real-time processing
– Real-time processing tested using white, babble, car, pink, and train noises: real-time processed output perceptually similar to the offline processed output
– Signal delay = 48 ms
– Lowest clock for satisfactory operation = 16.4 MHz → processing capacity used ≈ 1/7 of the capacity at the highest clock (120 MHz)

EE Dept., IIT Bombay 18/19 5. Summary & Conclusions
– Proposed technique for suppression of additive noise: cascaded-median based dynamic noise estimation for reducing the computation and memory requirement for real-time operation
– Enhancement of speech with different types of additive stationary and non-stationary noise: SNR advantage (at PESQ score = 2.5): 4 – 13 dB; increase in PESQ score (at SNR = 0 dB): 0.37 –
– Implementation for real-time operation using the 16-bit fixed-point processor TI/TMS320C5515: used-up processing capacity ≈ 1/7, delay = 48 ms
Further work
– Frequency & a posteriori SNR-dependent subtraction & spectral floor factors
– Combination of the speech enhancement technique with other processing techniques in sensory aids
– Implementation using other processors


EE Dept., IIT Bombay 20/19 Abstract
A spectral subtraction technique is presented for real-time speech enhancement in the aids used by hearing-impaired listeners. For reducing the computational complexity and memory requirement, it uses a cascaded-median based estimation of the noise spectrum without voice activity detection. The technique is implemented and tested for satisfactory real-time operation, with a sampling frequency of 12 kHz, processing using a window length of 30 ms with 50% overlap, and noise estimation by a 3-frame 4-stage cascaded median, on a 16-bit fixed-point DSP processor with on-chip FFT hardware. Enhancement of speech with different types of additive stationary and non-stationary noise resulted in an SNR advantage of 4 – 13 dB.

EE Dept., IIT Bombay 21/19 References
[1] H. Levitt, J. M. Pickett, and R. A. Houde (eds.), Sensory Aids for the Hearing Impaired. New York: IEEE Press.
[2] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn Bacon, 1999, pp. 289–323.
[3] H. Dillon, Hearing Aids. New York: Thieme Medical.
[4] T. Lunner, S. Arlinger, and J. Hellgren, “8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes,” Scand. Audiol. Suppl., vol. 38, pp. 75–81, 1993.
[5] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss,” Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012.
[6] J. Yang, F. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003.
[7] T. Arai, K. Yasu, and N. Hodoshima, “Effective speech processing for various impaired listeners,” in Proc. 18th Int. Cong. Acoust., Kyoto, Japan, 2004, pp. 1389–1392.
[8] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss,” Speech Commun., vol. 54, no. 3, pp. 341–350, 2012.
[9] P. C. Loizou, “Speech processing in vocoder-centric cochlear implants,” in A. R. Moller (ed.), Cochlear and Brainstem Implants, Adv. Otorhinolaryngol., vol. 64, Basel: Karger, 2006, pp. 109–143.
[10] P. C. Loizou, Speech Enhancement: Theory and Practice. New York: CRC, 2007.
[11] R. Martin, “Spectral subtraction based on minimum statistics,” in Proc. Eur. Signal Process. Conf., 1994.
[12] I. Cohen, “Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging,” IEEE Trans. Speech Audio Process., vol. 11, no. 5, 2003.
[13] H. Hirsch and C. Ehrlicher, “Noise estimation techniques for robust speech recognition,” in Proc. IEEE ICASSP, 1995.

EE Dept., IIT Bombay 22/19
[14] V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and Wiener filtering,” in Proc. IEEE ICASSP, 2000.
[15] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proc. IEEE ICASSP, 1979.
[16] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, 1979.
[17] Y. Lu and P. C. Loizou, “A geometric approach to spectral subtraction,” Speech Commun., vol. 50, no. 6, 2008.
[18] S. K. Basha and P. C. Pandey, “Real-time enhancement of electrolaryngeal speech by spectral subtraction,” in Proc. Nat. Conf. Commun., Kharagpur, India, 2012.
[19] K. Paliwal, K. Wójcicki, and B. Schwerin, “Single-channel speech enhancement using spectral subtraction in the short-time modulation domain,” Speech Commun., vol. 52, no. 5, pp. 450–475, 2010.
[20] Texas Instruments, Inc., “TMS320C5515 Fixed-Point Digital Signal Processor,” 2011. [Online]. Available: focus.ti.com/lit/ds/symlink/tms320c5515.pdf
[21] Spectrum Digital, Inc., “TMS320C5515 eZdsp USB Stick Technical Reference,” 2010. [Online]. Available: support.spectrumdigital.com/boards/usbstk5515/reva/files/usbstk5515_TechRef_RevA.pdf
[22] Texas Instruments, Inc., “TLV320AIC3204 Ultra Low Power Stereo Audio Codec,” 2008. [Online]. Available: focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf
[23] ITU, “Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T Rec. P.862, 2001.
[24] S. K. Waddi, “Speech enhancement results,” 2013. [Online]. Available: ~spilab/material/santosh/ncc2013