Interrupted speech perception Su-Hyun Jin, Ph.D. University of Texas & Peggy B. Nelson, Ph.D. University of Minnesota.

Introduction In everyday life, speech is not equally intelligible over time because background noise fluctuates in time, amplitude, and frequency rather than remaining steady. Normal-hearing (NH) listeners can take advantage of this fluctuating nature of noise. Masking release: the improvement in speech recognition in fluctuating noise compared to performance in steady noise, i.e., the listener's ability to use the momentary decreases (dips) in fluctuating noise to resolve the background fluctuations and extract speech information (Dubno et al., 2002; Nelson et al., 2003; Jin & Nelson, 2006).
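As a worked example, masking release (MR) is simply the difference between recognition scores in fluctuating and steady noise at the same SNR. A minimal sketch; the scores below are hypothetical, for illustration only:

```python
def masking_release(score_fluctuating, score_steady):
    """MR in percentage points: fluctuating-noise score minus steady-noise score."""
    return score_fluctuating - score_steady

# A hypothetical NH listener scoring 30% in steady noise but 75% in
# fluctuating noise shows a large release; a hypothetical HI listener
# scoring 35% in fluctuating noise shows almost none.
mr_nh = masking_release(75.0, 30.0)   # 45 points of release
mr_hi = masking_release(35.0, 30.0)   # 5 points of release
```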

Speech and Noise (NH) Quiet speech Speech in noise Speech in fluctuating noise

Introduction Effect of noise on speech perception by hearing-impaired (HI) listeners: little or no release from masking in fluctuating noise, even in listeners with mild hearing loss (Bacon et al., 1998; Dubno et al., 2002). Some listeners whose speech recognition in steady noise was close to normal were significantly worse than normal in fluctuating noise. Speech perception in fluctuating noise may therefore provide a more sensitive measure of impairment due to hearing loss.

Jin & Nelson (2006) Investigated the relationship between the amount of masking release (MR) and hearing sensitivity and temporal resolution in NH and HI listeners: sentence and consonant recognition in quiet, steady, and fluctuating noise; hearing sensitivity and forward masking.

Jin & Nelson (06) Amplification and shaping A two-stage process amplified speech and noise for the HI listeners: shaping was applied based on the half-gain rule to compensate for the hearing loss configuration, then overall amplification was added to bring listeners to maximum sentence recognition (90% or better) in quiet. The process was applied to speech and noise individually for each HI listener.
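A minimal sketch of the shaping stage, assuming the classic half-gain rule (frequency-specific gain equal to half the hearing threshold in dB HL). The audiogram and the overall-gain value below are hypothetical, not the study's data:

```python
def shaping_gain(thresholds_db_hl):
    """Half-gain rule: insertion gain at each frequency = threshold / 2."""
    return {f: hl / 2.0 for f, hl in thresholds_db_hl.items()}

# Hypothetical sloping audiogram (dB HL) for illustration.
audiogram = {500: 40, 1000: 50, 2000: 60, 4000: 70}
gains = shaping_gain(audiogram)           # e.g., 30 dB at 2 kHz

# Stage 2: a flat overall gain, set per listener until quiet sentence
# recognition reaches 90% or better (value assumed here).
overall_gain_db = 10.0
total_gain = {f: g + overall_gain_db for f, g in gains.items()}
```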

Jin & Nelson (06) Role of hearing sensitivity: listeners with sensorineural hearing loss showed reduced hearing sensitivity. Noise is more detrimental to HI listeners than to NH listeners because the speech signal already has reduced redundancy for them even in quiet (Van Tasell, 1993). Role of temporal resolution: compared to NH listeners, HI listeners are more affected by non-simultaneous maskers, and there is a strong correlation between masking release and forward-masking thresholds (Dubno et al., 2002), who observed that syllable recognition in fluctuating noise may be associated with age-related increases in forward-masked thresholds.

Result 1: sentence recognition Percent correct keyword identification at -5 dB SNR

Result 1: Sentence recognition Masking release at -5 dB SNR

Result 1: Speech recognition for both sentence and syllable identification In quiet and steady noise, there were no significant differences between the NH and HI groups (p > 0.177). In gated noise, significant improvements over steady noise were seen for both NH and HI listeners. NH listeners could use the gating to regain about 80% of their quiet performance, whereas HI listeners regained only 15% to 55%.

Result 1b: Percent correct syllable identification and MR at -5 dB SNR

Result 2: Forward-masked thresholds Masked thresholds for HI listeners were higher than those for NH listeners; those with lower thresholds in quiet also showed better thresholds in the presence of noise. The slopes of the recovery functions were shallower for HI than for NH listeners; the HI listeners with close-to-normal recovery functions showed a relatively larger amount of masking release in fluctuating noise.

Discussion Relation between MR and other measures: a high negative correlation (r ≈ -0.8) between the amount of MR and the forward-masking results for both the NH and HI groups. When the results of the HI listeners alone were analyzed, only a few factors retained a strong correlation with MR: hearing thresholds at 0.5 and 1.0 kHz, and forward-masked thresholds at 2 kHz. The MR for sentence recognition and for CV syllable recognition were analyzed separately.

Result: regression analyses The strength of the relationship between MR and the predictors: for sentence recognition in gated noise, hearing sensitivity at low-to-mid frequencies (0.5 and 1 kHz) accounted for a substantial proportion of the variance in the MR; for consonant recognition, forward-masked thresholds contributed primarily to the variance in the MR.

Discussion HI listeners who performed close to normal in quiet and in steady noise still showed reduced masking release for both sentence and consonant recognition in gated noise. The overall pattern of masking release measured with sentences and with CV syllables was similar.

Discussion Several factors seemed to contribute to the MR: hearing sensitivity at low-to-mid frequencies (0.5 kHz and 1.0 kHz) was strongly related to the MR for sentence recognition but not for syllable identification, whereas forward-masking thresholds were more strongly related to the MR for CV syllable identification, consistent with Dubno et al. (2002).

Follow-up Continued to investigate additional factors that might contribute to the reduced sentence recognition in fluctuating noise for HI listeners: auditory integration and frequency resolution.

Follow-up Role of spectral resolution: hearing impairment is often associated with reduced frequency selectivity, i.e., a degraded auditory representation of the spectral peaks and valleys in speech (Miller et al., 1997). Cochlear implant listeners showed little MR for sentence recognition in fluctuating noise (Nelson et al., 2003); they are known to have normal-like temporal resolution (Nelson & Donaldson, 2001) but limited spectral resolution, so reduced MR may be related to broader auditory filters.

Follow-up Role of auditory integration: understanding speech in real life requires the listener to analyze complex sounds and separate the acoustic characteristics of the input signal from the background noise. This process is known as auditory stream segregation (Bregman, 1990). Listeners seem able to segregate sounds into streams when the sounds differ in spectral cues (Rose & Moore) or in the harmonicity and fundamental frequency (F0) of the signal (Qin & Oxenham, 2003).

Follow-up Two tasks: (1) auditory filter characteristics and (2) auditory integration, measured as interrupted sentence recognition, in the same NH and HI listeners who participated in Jin & Nelson (2006). Analysis: examine the relationship between the MR from Jin & Nelson (2006) and the auditory filter shape and interrupted speech recognition.

Follow-up Auditory filter characteristics: used the filter-shape equation of Patterson et al. (1982) to estimate the equivalent rectangular bandwidths (ERB) and slopes (p) of the auditory filters at 2000 and 4000 Hz. Interrupted IEEE sentence recognition with silent gaps: IEEE sentences were gated at rates of 1, 2, 4, 8, and 16 Hz with no noise (no masking). Depending on the gate frequency, whole words or only parts of words were available to listeners. Percent correct keyword identification was recorded.
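The symmetric roex(p) filter of Patterson et al. (1982) has the weighting W(g) = (1 + pg)·exp(-pg), where g is the deviation from the center frequency normalized by the center frequency; integrating W gives ERB = 4·fc/p, so a shallower slope p directly implies a broader filter. A sketch with illustrative p values (not the study's estimates):

```python
import math

def roex_w(g, p):
    """Weighting of the symmetric roex(p) filter at normalized deviation g."""
    return (1 + p * g) * math.exp(-p * g)

def erb_from_p(fc_hz, p):
    """ERB of a symmetric roex(p) filter: 4 * fc / p (in Hz)."""
    return 4.0 * fc_hz / p

# Illustrative slopes at 2 kHz: a sharper (NH-like) filter vs. a
# broadened (HI-like) filter. Values are assumptions for the sketch.
erb_sharp = erb_from_p(2000.0, 27.0)   # ~296 Hz
erb_broad = erb_from_p(2000.0, 15.0)   # ~533 Hz
```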

Speech in gated noise vs. interrupted speech with silent gaps: fluctuating noise at 8 Hz; interrupted sentences at 8 Hz.
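The two conditions can be sketched with the same square-wave gate: in one the gate modulates the noise, in the other it interrupts the speech itself. Sampling rate, 50% duty cycle, and white-noise stand-ins for the signals are assumptions for illustration:

```python
import numpy as np

def square_gate(n_samples, fs, rate_hz, duty=0.5):
    """1/0 gating envelope: 'on' for `duty` of each cycle at `rate_hz`."""
    t = np.arange(n_samples) / fs
    return ((t * rate_hz) % 1.0 < duty).astype(float)

fs = 16000
speech = np.random.randn(fs)      # stand-in for a 1-s sentence
noise = np.random.randn(fs)       # stand-in for masking noise
gate = square_gate(len(speech), fs, rate_hz=8.0)

speech_in_gated_noise = speech + noise * gate   # masker fluctuates at 8 Hz
interrupted_speech = speech * gate              # silent gaps at 8 Hz
```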

Follow-up Result 1: frequency resolution Compared to the NH group, HI listeners showed greater ERBs and shallower filter slopes at both 2 kHz and 4 kHz. The average ERBs for the HI listeners (at both 2 kHz and 4 kHz) were about … times those of the NH group.

Follow-up Result 2: interrupted sentence recognition The average percent-correct scores of the NH group at each gate frequency were higher than those of the HI listeners. The relationship between sentence recognition in gated noise and interrupted sentence recognition across the NH and HI listeners was significant (r ≈ 0.8). When the scores of the HI listeners alone were compared, the correlation remained strong (r ≥ 0.8).

Follow-up Result 2: interrupted sentence recognition

Discussion Several factors seemed to contribute to the MR: hearing sensitivity at low-to-mid frequencies (0.5 kHz and 1.0 kHz) as well as the auditory filter shape at 2 and 4 kHz were strongly related to the MR for sentence recognition. Understanding speech interrupted either by noise or by silence may require a similar underlying integration process: percent-correct scores for interrupted sentence recognition and for sentence recognition in gated noise were strongly correlated, and hearing sensitivity at low-to-mid frequencies and the ERBs were significant predictors of both the MR and interrupted sentence recognition.

Follow-up 2: interrupted sentence recognition The degree of MR in sentence recognition seemed to be correlated with low-to-mid-frequency hearing sensitivity. Gaëtan & Christophe (2002) found that older listeners with mild hearing loss put more perceptual weight on the mid-frequency band (… Hz) than NH listeners. Qin & Oxenham (2003) suggested that a strong pitch cue is important for a listener to segregate speech from noise. Low-to-mid-frequency information may therefore be especially important for HI listeners understanding sentences in competing noise, so reducing low-frequency gain to improve comfort in noise may have unwanted consequences for HI listeners.

Follow-up 2 Task: understanding IEEE sentences in quiet, in steady noise, in gated noise, and gated with silent gaps. Participants: 10 young adult NH listeners.

Follow-up 2 Stimuli: both speech and noise were either unprocessed (natural) or processed through four band-pass (bp) filters with cutoff frequencies of 0.5-0.75 kHz, 0.75-1 kHz, 1-2 kHz, and 2-3 kHz.
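A sketch of the band-pass processing. The slides specify only the cutoff frequencies; the zero-phase Butterworth design, filter order, and sampling rate below are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, fs, lo_hz, hi_hz, order=4):
    """Zero-phase Butterworth band-pass between lo_hz and hi_hz."""
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

fs = 16000
x = np.random.randn(fs)                 # stand-in for speech or noise
bands_hz = [(500, 750), (750, 1000), (1000, 2000), (2000, 3000)]
filtered = {band: bandpass(x, fs, *band) for band in bands_hz}
```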

Follow-up 2 Speech: IEEE sentences spoken by ten talkers, presented at a comfortable level (70-75 dB SPL) for each NH listener. Noise: shaped to the long-term spectrum of speech, at an SNR of -5 dB; steady or fluctuating (gate frequency: 8 Hz); the RMS amplitude of the noise was adjusted relative to the RMS of the target speech.
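Adjusting the noise RMS relative to the speech RMS to hit a target SNR can be sketched as follows; the white-noise stand-ins for speech and noise are illustrative:

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude."""
    return np.sqrt(np.mean(x ** 2))

def scale_noise_to_snr(speech, noise, snr_db):
    """Scale noise so 20*log10(rms(speech)/rms(noise)) equals snr_db."""
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    return noise * (target_noise_rms / rms(noise))

speech = np.random.randn(16000)   # stand-in for a sentence
noise = np.random.randn(16000)    # stand-in for speech-shaped noise
noise_m5 = scale_noise_to_snr(speech, noise, -5.0)   # mix at -5 dB SNR
```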

Results [Chart: percent correct keyword identification in Quiet, Steady noise, and Gated noise for each condition: Natural, bp 0.5-0.75 kHz, bp 0.75-1 kHz, bp 1-2 kHz, bp 2-3 kHz]

Results In quiet, performance was similar across filter conditions (above 90%) except for the 1-2 kHz band-pass condition. In steady noise at -5 dB SNR, performance was quite low (below 5%), except when mid-to-high frequency information was audible (bp 2-3 kHz). In gated noise, performance was significantly better than in steady noise: NH listeners were able to use the limited spectral information in the dips of the noise to separate speech from noise, except in the bp 1-2 kHz condition, which also yielded poorer performance in quiet and in steady noise than the other filter conditions.

Percent correct keyword identification and masking release

Keyword identification of interrupted sentences vs. masking release

Results Performance in interrupted speech recognition showed a pattern similar to performance in gated noise: when only limited spectral information was available, performance was low both in gated noise and for interrupted speech (e.g., bp 1-2 kHz); with more spectral cues available in the dips, performance improved in both conditions.

Discussion NH listeners were able to use limited spectral information to understand speech in quiet and in gated noise. With high-frequency cues in the filtered speech (e.g., bp 2-3 kHz), NH listeners were better able to segregate speech from gated noise and showed a greater amount of masking release. High-frequency cues may thus be more important than low-frequency cues for NH listeners recognizing speech in complex noise backgrounds, and these cues would be less available to HI listeners.

Discussion Performance was similar for recognition of speech in gated noise and for recognition of gated (interrupted) speech. Understanding interrupted speech is dominated by the information in the dips: limited information in the dips means limited masking release, suggesting that masking release is determined by listeners' ability to decode the information in the dips (consistent with Kwon and Turner, 2001).

Future directions Investigate recognition of narrow band-pass-filtered speech by HI listeners with different degrees and configurations of hearing loss. Compare the perceptual weight functions of NH and HI listeners using sentence recognition in different types of noise. Implications.