EE Dept., IIT Bombay. IEEE Workshop on Intelligent Computing, IIIT Allahabad, 13-15 Oct. 2014. Signal processing for improving speech perception by persons with sensorineural hearing loss: Challenges and some solutions.

Presentation transcript:

IEEE Workshop on Intelligent Computing, IIIT Allahabad, 13-15 Oct. 2014
Signal processing for improving speech perception by persons with sensorineural hearing loss: Challenges and some solutions
P. C. Pandey, EE Dept., IIT Bombay

Outline
  A. Speech & Hearing
  B. Sliding-band Dynamic Range Compression (N. Tiwari & P. C. Pandey, NCC 2014)
  C. Automated modification of consonant-vowel ratio of stops (A. R. Jayan & P. C. Pandey, Int. J. Speech Technology, 2014)

Part A: Speech & Hearing
P. C. Pandey, "Signal processing for improving speech perception by persons with sensorineural hearing loss: Challenges and some solutions", IEEE Workshop on Intelligent Computing, IIIT Allahabad, Oct. 2014

Speech Production Mechanism
Excitation source & filter model
  Excitation: voiced/unvoiced glottal, frication
  Filtering: vocal tract filter
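The excitation source and filter model on this slide can be illustrated with a minimal sketch: a voiced glottal excitation (an impulse train at the pitch frequency) is passed through a cascade of two-pole resonators standing in for the vocal tract filter. The function names, sampling rate, pitch, and formant values below are illustrative assumptions, not values from the talk.

```python
import math

def glottal_impulse_train(n_samples, fs, f0):
    """Voiced excitation: one unit impulse per pitch period (assumed idealization)."""
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, fs, freq, bandwidth):
    """Two-pole resonator approximating one formant of the vocal tract filter."""
    r = math.exp(-math.pi * bandwidth / fs)          # pole radius from bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2             # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        y.append(out)
        y2, y1 = y1, out
    return y

def synthesize_vowel(n_samples=800, fs=8000, f0=100,
                     formants=((700, 90), (1200, 110))):
    """Source-filter synthesis: excitation shaped by a cascade of formant resonators.

    The formant (frequency, bandwidth) pairs are hypothetical, roughly vowel-like values.
    """
    signal = glottal_impulse_train(n_samples, fs, f0)
    for freq, bw in formants:
        signal = resonator(signal, fs, freq, bw)
    return signal

vowel = synthesize_vowel()
```

Replacing the impulse train with random noise would give a crude unvoiced (frication-like) excitation through the same filter.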

Speech segments
  Words
  Syllables
  Phonemes
  Sub-phonemic segments
Phonemes: basic speech units
  Vowels: Pure vowels, Diphthongs
  Consonants: Semivowels, Stops, Fricatives, Affricates, Nasals
/aba/ /apa/ /aga/ /ada/

Phonemic features
Modes of excitation
  Glottal: Unvoiced (constriction at the glottis), Voiced (glottal vibration)
  Frication: Unvoiced (constriction in vocal tract), Voiced (constriction in vocal tract, glottal vibration)
Movement of articulators
  Continuant (steady-state vocal tract configuration): vowels, nasal stops, fricatives
  Non-continuant (changing vocal tract configuration): diphthongs, semivowels, oral stops (plosives)
Place of articulation (place of maximum constriction in vocal tract)
  Bilabial, Labio-dental, Linguo-dental, Alveolar, Palatal, Velar, Glottal
Changes in voicing frequency (F0)
  Supra-segmental features: Intonation, Rhythm

Hearing Mechanism
Peripheral auditory system
  External ear: sound collection
    ○ Pinna
    ○ Auditory canal
  Middle ear: impedance matching
    ○ Ear drum
    ○ Middle ear bones
  Inner ear (cochlea): analysis & transduction
  Auditory nerve: transmission of neural impulses
Central auditory system
  Information processing & interpretation

[Figures: auditory system; tonotopic map of cochlea]

Hearing impairment
Types of hearing losses
  Conductive
  Sensorineural
  Central
  Functional
Sensorineural hearing loss: abnormalities in the cochlear hair cells or the auditory nerve
  Aging
  Excessive exposure to noise
  Infection
  Adverse effect of medicines
  Congenital

Effects of sensorineural hearing loss
  Elevated hearing thresholds: inaudibility of low-level sounds
  Reduced dynamic range & loudness recruitment (abnormal loudness growth): distortion of loudness relationship among speech components
  Increased temporal masking: poor detection of acoustic landmarks
  Increased spectral masking (widening of auditory filters): reduced ability to sense spectral shapes
>> Poor intelligibility and degraded perception of speech, particularly in noisy environments.

Signal processing in hearing aids
Currently available techniques
  Frequency selective amplification: improves audibility but not necessarily intelligibility
  Automatic volume control: not effective in improving intelligibility
  Multichannel dynamic range compression (with settable attack & release times, compression ratios): effectiveness reduced due to processing artifacts

Techniques under development
  Noise suppression
  Distortion-free dynamic range compression
  Techniques for reducing the effects of increased spectral masking
    o Binaural dichotic presentation
    o Spectral contrast enhancement
    o Multi-band frequency compression
  Improvement of consonant-to-vowel ratio (CVR): for reducing the effects of increased temporal masking

Analog Hearing Aids
  Pre-amp → AVC → Selectable Freq. Response → Amp.
Digital Hearing Aids
  Pre-amp & AVC → ADC → Multi-band Amplitude Compr. & Freq. Resp. → DAC & Amp.
Existing Problems
  Noisy environment & reverberation
  Distortions due to multiband amplitude compression
  Poor speech perception due to increased spectral & temporal masking
  Visit to audiologist for change of settings

Proposed Hearing Aids
  Distortion-free dynamic range compression & adjustable frequency response
  Noise suppression & de-reverberation
  Processing for reducing the effects of increased spectral masking
  Processing for reducing the effects of increased temporal masking
  User selectable settings
  Implementation using a low-power DSP chip with acceptable signal delay (< 60 ms)

Some Solutions for improving speech perception by listeners with moderate-to-severe sensorineural loss
  Sliding-band dynamic range compression as a solution to the problem posed by loudness recruitment
  Automated modification of consonant-vowel ratio of stop consonants as a solution to the problem posed by increased intraspeech spectral and temporal masking
  Implementation using a 16-bit fixed-point DSP processor & testing for satisfactory operation

To be continued in Part B.

Workshop: IEEE Workshop on Intelligent Computing, Allahabad, October 13-15, 2014, organized jointly by CSIR-CEERI Pilani and IIIT Allahabad.
Speaker: Prof. P. C. Pandey, EE Dept., IIT Bombay
Topic: Signal processing for improving speech perception by persons with sensorineural hearing loss: Challenges and some solutions

Abstract

Sensorineural hearing loss is caused by abnormalities in the cochlear hair cells or the auditory nerve. It occurs due to aging, excessive exposure to noise, infection, or abnormalities at the time of birth. It is generally associated with elevated hearing thresholds, reduced dynamic range, and increased temporal and spectral masking, leading to degraded perception of speech, particularly in noisy environments. Several signal processing techniques have been investigated and reported to address these problems. However, most of these are not suited for use in hearing aids, either because of distortions caused by processing-related artifacts or because of constraints on size, power, and acceptable signal delay. As some of the possible solutions, two signal processing techniques have been investigated: (i) a sliding-band dynamic range compression as a solution to the problem posed by loudness recruitment, and (ii) automated modification of consonant-vowel ratio of stop consonants as a solution to the problem posed by increased intraspeech spectral and temporal masking. Both techniques have been implemented using a 16-bit fixed-point DSP processor and tested for satisfactory operation.

Persons with sensorineural loss generally have a highly reduced dynamic range of hearing, with a significant frequency-dependent elevation of hearing threshold levels without a corresponding increase in the upper comfortable listening levels. Signal processing for dynamic range compression is used to present the sounds comfortably within the limited dynamic range of the listener.
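The idea of fitting sounds into a reduced dynamic range can be made concrete with a small worked sketch. It assumes a simple linear-in-dB mapping from the normal range (normal threshold up to the upper comfortable level) onto the listener's residual range (elevated threshold up to the same upper comfortable level). The function names and the 0/60/100 dB figures are hypothetical and are not taken from the talk.

```python
def compression_ratio(normal_threshold_db, impaired_threshold_db, ucl_db):
    """Ratio of the normal dynamic range to the listener's residual dynamic range."""
    return (ucl_db - normal_threshold_db) / (ucl_db - impaired_threshold_db)

def compress_level(level_db, normal_threshold_db, impaired_threshold_db, ucl_db):
    """Map an input level (dB) into the residual range with a linear-in-dB rule."""
    cr = compression_ratio(normal_threshold_db, impaired_threshold_db, ucl_db)
    return impaired_threshold_db + (level_db - normal_threshold_db) / cr

# Hypothetical listener: threshold elevated from 0 dB to 60 dB,
# upper comfortable level unchanged at 100 dB.
cr = compression_ratio(0.0, 60.0, 100.0)
for level in (0.0, 50.0, 100.0):
    out = compress_level(level, 0.0, 60.0, 100.0)
    print(f"in {level:5.1f} dB -> out {out:5.1f} dB (gain {out - level:+.1f} dB)")
```

Under these assumed numbers the gain falls from +60 dB for the softest sounds to 0 dB at the upper comfortable level, which is the level-dependent gain that a compressor has to provide.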
Analog hearing aids generally use single-band compression, with the gain dependent on the time-varying signal level. As the power is mostly contributed by the low-frequency components, the amplification of the high-frequency components depends on the energy in the low-frequency components. Thus the high-frequency components may become inaudible, and distortions in the temporal envelope may be introduced. In multiband compression, available in most digital hearing aids, the spectral components of the input signal are divided into multiple bands and the gain for each band is calculated on the basis of the signal power in that band. This type of processing can introduce spurious spectral distortions, and the use of a large number of bands reduces spectral contrasts and the modulation depth of speech, resulting in an adverse effect on the perception of certain speech cues. Further, the frequency response of a multiband compression system has a time-varying magnitude response without corresponding variation in the phase response, which can cause audible distortions, particularly for non-speech audio. These distortions may partly offset the advantages of dynamic range compression for the hearing-impaired listener.

In order to significantly reduce the temporal and spectral distortions associated with the currently used single-band and multiband compressions in hearing aids, a "sliding-band compression" has been developed. It involves calculating a frequency-dependent gain function, in which the gain for each spectral sample is determined by the short-time power in an auditory critical band centered at it. The gain calculation takes into account the specified hearing thresholds, compression ratios, and attack and release times. Unlike single-band compression, it does not result in any significant temporal distortions because the effect of short-time energy of a spectral component on other spectral components is limited to those located within a critical bandwidth.
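A rough sketch of the per-bin gain calculation described above is given here. It is not the authors' implementation: the critical bandwidth uses the Zwicker-Terhardt approximation as an assumed choice, the gain rule is a generic threshold-and-ratio compressor rather than one fitted to hearing thresholds, and all names and parameter values are hypothetical.

```python
import math

def critical_bandwidth_hz(freq_hz):
    """Zwicker-Terhardt approximation of critical bandwidth (assumed, not from the talk)."""
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def sliding_band_power(power_spectrum, fs, n_fft):
    """For each bin, sum the power of the bins inside a critical band centered on it."""
    bin_hz = fs / n_fft
    n_bins = len(power_spectrum)
    band_power = []
    for k in range(n_bins):
        half_bins = int(critical_bandwidth_hz(k * bin_hz) / 2.0 / bin_hz)
        lo = max(0, k - half_bins)
        hi = min(n_bins - 1, k + half_bins)
        band_power.append(sum(power_spectrum[lo:hi + 1]))
    return band_power

def sliding_band_gains_db(power_spectrum, fs, n_fft, threshold_db, ratio):
    """Per-bin target gain: reduce the level above threshold_db by the given ratio."""
    gains = []
    for p in sliding_band_power(power_spectrum, fs, n_fft):
        level_db = 10.0 * math.log10(p + 1e-12)      # small offset avoids log(0)
        excess_db = max(0.0, level_db - threshold_db)
        gains.append(-excess_db * (1.0 - 1.0 / ratio))
    return gains

def smooth_gain(prev_gain_db, target_gain_db, attack_coeff, release_coeff):
    """One-pole smoothing: attack when the gain must fall, release when it recovers."""
    coeff = attack_coeff if target_gain_db < prev_gain_db else release_coeff
    return coeff * prev_gain_db + (1.0 - coeff) * target_gain_db

# A single spectral component at 1 kHz (bin 32 of a 256-point FFT at 8 kHz).
spectrum = [0.0] * 129
spectrum[32] = 1.0
gains = sliding_band_gains_db(spectrum, fs=8000, n_fft=256, threshold_db=-20.0, ratio=2.0)
```

In this example only the bins within about one critical bandwidth of the 1 kHz component receive a gain reduction; distant bins are unaffected, which is the property the abstract contrasts with single-band compression.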
Due to the use of sliding critical bands for calculating the power spectrum, formant transitions do not result in discontinuities in the processed output. The technique is realized using an FFT-based analysis-synthesis method which masks phase-related discontinuities and can be integrated with other FFT-based signal processing in hearing aids. The technique is implemented and tested for satisfactory real-time operation on a 16-bit fixed-point DSP processor.
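The general shape of a transform-based analysis-synthesis loop can be sketched as below. This is a generic windowed overlap-add scheme, assumed for illustration only; the abstract does not specify the window, overlap, or frame length, and the sketch does not reproduce the authors' method for masking phase-related discontinuities. A naive DFT is used in place of an FFT to keep the example self-contained.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n_pts = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts)
                 for k in range(n_pts)) / n_pts).real
            for n in range(n_pts)]

def analysis_synthesis(signal, frame_len=16, modify=lambda spectrum: spectrum):
    """Window, transform, modify, inverse-transform, and overlap-add each frame.

    A periodic Hann window with 50% overlap sums to one, so samples covered by
    two frames are reconstructed exactly when `modify` leaves the spectrum unchanged.
    """
    hop = frame_len // 2
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        processed = idft(modify(dft(frame)))
        for n in range(frame_len):
            out[start + n] += processed[n]
    return out

test_signal = [math.sin(0.3 * n) for n in range(64)]
reconstructed = analysis_synthesis(test_signal)
```

A frequency-dependent gain function would be applied inside `modify` by scaling each spectral sample; the frame length and hop also set a lower bound on the signal delay of such a scheme.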

Increasing the level of the consonant segments relative to the nearby vowel segments, known as consonant-vowel ratio (CVR) modification, is reported to be effective in improving speech intelligibility for listeners in noisy backgrounds and for hearing-impaired listeners. A technique for real-time CVR modification of stops using the rate of change of spectral centroid for detection of spectral transitions is presented. Its effectiveness in improving the recognition of consonants in the presence of speech-spectrum-shaped noise is evaluated by conducting listening tests on normal-hearing subjects. At lower values of SNR, there was an increase of % in recognition scores and an equivalent SNR advantage of 3 dB. The technique is implemented on a DSP board based on a 16-bit fixed-point processor with on-chip FFT hardware and tested for satisfactory real-time operation.

References
[1] N. Tiwari and P. C. Pandey, "A sliding-band dynamic range compression for use in hearing aids", Proc. National Conference on Communications 2014 (NCC 2014), Kanpur, Feb. 28 - Mar. 2, 2014.
[2] A. R. Jayan and P. C. Pandey, "Automated modification of consonant-vowel ratio of stops for improving speech intelligibility", Int. J. Speech Technology, 2014 (accepted for publication).

Dr. Prem C. Pandey
Dr. Pandey is a Professor in Electrical Engineering at IIT Bombay. He is currently also the Associate Dean of Academic Programmes. He received B.Tech. in electronics engineering from Banaras Hindu University in 1979, M.Tech. in electrical engineering from IIT Kanpur in 1981, and Ph.D. in electrical & biomedical engineering from the University of Toronto (Canada) in 1987. In 1987, he joined the University of Wyoming (USA) as an assistant professor and later joined IIT Bombay. His research interests include speech & signal processing; biomedical signal processing & instrumentation; electronic instrumentation & embedded system design. The focus of his R&D efforts has been in the areas of impedance cardiography and aids for persons with speech and hearing impairment.
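As a supplementary illustration of the transition detection mentioned in the abstract, the sketch below computes a frame-wise spectral centroid and its rate of change. It is a simplified stand-in, not the published algorithm: the frame length, hop, use of a naive DFT, and all function names are assumptions, and no detection threshold or CVR gain rule is implied.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive DFT for self-containment)."""
    n_pts = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                    for n in range(n_pts)))
            for k in range(n_pts // 2 + 1)]

def spectral_centroid_hz(frame, fs):
    """Magnitude-weighted mean frequency of one frame."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0
    bin_hz = fs / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total

def centroid_rate_of_change(signal, fs, frame_len=64, hop=32):
    """Difference of successive frame centroids, in Hz per second."""
    centroids = [spectral_centroid_hz(signal[s:s + frame_len], fs)
                 for s in range(0, len(signal) - frame_len + 1, hop)]
    return [(b - a) / (hop / fs) for a, b in zip(centroids, centroids[1:])]

# Synthetic test: a 500 Hz tone that switches abruptly to 3000 Hz.
fs = 8000
tone = ([math.sin(2.0 * math.pi * 500 * n / fs) for n in range(128)] +
        [math.sin(2.0 * math.pi * 3000 * n / fs) for n in range(128, 256)])
rates = centroid_rate_of_change(tone, fs)
peak_frame = rates.index(max(rates))
```

The rate of change stays near zero within each steady tone and peaks at the frames spanning the switch, which is the kind of spectral transition a stop-consonant detector would look for.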