Frequency Band-Importance Functions for Auditory and Auditory-Visual Speech Recognition
Ken W. Grant
Walter Reed Army Medical Center, Washington, D.C.

Background
Speech recognition involves broadband listening. Information is not uniformly distributed across the frequency spectrum.
– different cues (spectral and temporal) of different relative value reside at different frequencies.
– in general, the greatest importance is placed on the mid-frequencies.
– probably related to place-of-articulation cues (F2/F3 transitions).

Background (continued)
How can we determine the relative importance, or weights, that listeners place on various frequency regions?
Doherty and Turner, 1996; Turner et al., 1998:
– correlational procedure (Lutfi, 1995; Richards and Zhu, 1994) applied to speech recognition.
– partition speech into a number of spectral bands.
– perturb each band so that the amount of information in each band can be correlated with a listener's performance (see the sketch below).
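A minimal sketch of this correlational procedure in Python, assuming trial-by-trial data from a completed experiment. The array names (band_snr, correct) and the simulated listener weights are illustrative assumptions, not values from the studies cited above.

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
n_trials, n_bands = 500, 4

# band_snr[t, b]: random SNR perturbation (dB) applied to band b on trial t.
band_snr = rng.normal(loc=0.0, scale=3.0, size=(n_trials, n_bands))

# Simulate a listener who weights the bands unequally (hypothetical weights).
true_w = np.array([0.10, 0.50, 0.25, 0.15])
correct = (band_snr @ true_w + rng.normal(0.0, 1.0, n_trials) > 0).astype(int)

# Correlate each band's perturbation with trial-by-trial performance;
# a larger correlation means the listener relied more on that band.
r = np.array([pointbiserialr(correct, band_snr[:, b])[0] for b in range(n_bands)])

# Normalize the positive correlations to relative weights
# (a band-importance function).
w = np.clip(r, 0.0, None)
print(w / w.sum())   # approximately proportional to true_w
```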

Correlation Method for Speech
[Figure: the speech spectrum partitioned into Bands 1–4 along a frequency (Hz) axis.]

Background (continued)
Is the relative importance of different frequency regions altered by the presence of visual speech cues?
Past results using isolated spectral bands of speech show that low-frequency speech provides more benefit to speechreading than other spectral regions (Grant and Walden, 1996).

Background (continued)
[Figure from Grant and Walden (1996). JASA, 100.]

Background (continued)
Evidence from electrophysiological studies shows that visual speech cues fundamentally alter the way the auditory cortex responds to sound input (Calvert, 1997; van Wassenhove et al., 2005).
– reduction in N1-P2 amplitude.
– latency shift in the N2 peak for highly visible consonants.

Visual Speech Alters Neural Processing of Auditory Speech
[Figure: evoked responses at electrode CPz. From van Wassenhove, Grant, and Poeppel (2005). PNAS, 102.]

Goals
Determine the relative importance of different frequency regions for auditory and auditory-visual speech.
Minimize band-on-band interactions by partitioning the speech signal into widely spaced narrow bands (sketched below).
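A minimal sketch of such a partition, assuming fourth-order Butterworth bandpass filters and 1/3-octave slit widths. The center frequencies are illustrative placeholders, not the slit parameters used in the experiments.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def spectral_slits(x, fs, centers_hz, width_oct=1/3):
    """Split x into one narrow band ("slit") per center frequency."""
    slits = []
    for cf in centers_hz:
        lo = cf * 2.0 ** (-width_oct / 2)
        hi = cf * 2.0 ** (width_oct / 2)
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        slits.append(sosfiltfilt(sos, x))
    return slits

fs = 16000
x = np.random.randn(fs)                       # stand-in for 1 s of speech
bands = spectral_slits(x, fs, [300, 850, 2000, 4500])
stimulus = sum(bands)                         # widely spaced 4-slit stimulus
```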

Spectral Slits - Sentences
[Figure: sentence intelligibility for individual slits and slit combinations, plotted by slit number and center frequency (Hz). From Greenberg, Arai, and Silipo (1998). Proc. ICSLP, Sydney, Dec. 1998.]

Spectral Slits - Consonants
[Figure: consonant recognition scores for individual slits (21%, 22%, 48%, 50%) and slit combinations (91%, 76%, 63%), plotted by slit number and center frequency (Hz).]

Spectral Slits - Consonants
[Figure: per-slit consonant scores after the addition of masking noise (8.2%, 7.4%, 8.6%, 7.6%), plotted by slit number and center frequency (Hz).]
Individual band scores are too high for AV testing; AV scores would be at ceiling.
Different amounts of masking noise are needed for each band (see the sketch below).
The goal in selecting noise levels was to:
– make each band roughly equal in intelligibility.
– make the combination of all 4 bands roughly 40% intelligible.
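A minimal sketch of masking one slit with noise confined to the same passband at a band-specific SNR. The SNR values are placeholders that would be tuned per band in pilot testing, not the levels used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

rng = np.random.default_rng(0)

def mask_band(band, fs, lo, hi, snr_db):
    """Add noise band-limited to [lo, hi] Hz at the requested SNR."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    noise = sosfiltfilt(sos, rng.standard_normal(band.size))
    # Scale the noise so that 10*log10(P_band / P_noise) == snr_db.
    gain = np.sqrt(np.mean(band**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return band + gain * noise

# Hypothetical per-band SNRs (dB), tuned so each band is roughly equally
# intelligible and the 4-band combination scores near 40% correct.
snrs_db = [-2.0, -6.0, -4.0, 0.0]
```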

Correlation Method for Speech
[Figure: correlation method applied to the four spectral slits; frequency axis (Hz).]

Band Importance (Audio Alone)
[Figure: normalized band importance by band number, for overall intelligibility of A = 44.3% and A = 70.9%.]

Band Importance (A versus AV)
[Figure: normalized band importance by band number, for A = 44.3%, A = 70.9%, and AV = 78.1%.]

Discussion – Audio Alone
Frequency-importance functions for auditory-alone conditions show that listeners consistently weighted band 2 the greatest.
Relative importance changed slightly when the overall intelligibility of the auditory condition was increased:
– band 2 was still given the greatest weight.
– the relative weights for bands 3 and 4 swapped.

Discussion – Audiovisual
When visual speech cues are present, listeners place more importance on low frequencies.
Results are consistent with past studies using isolated spectral bands of speech:
– low-frequency speech provides cues for voicing, which are highly complementary with speechreading.
– mid-to-high-frequency speech provides cues for place of articulation, which are highly redundant with speechreading.

Conclusions - Questions
For robust speech recognition, information must be extracted from many different spectral regions.
The presence or absence of visual speech cues alters the importance of different spectral regions for the listener.
For listening conditions where low-frequency speech cues are compromised (noise, reverberation, hearing loss), enhancement of the low frequencies of speech may be advantageous, especially in situations where visual cues are available.