Extracting the pinna spectral notches
Vikas C. Raykar | Ramani Duraiswami, University of Maryland, College Park
B. Yegnanarayana, Indian Institute of Technology, Chennai, India



Plan of the talk
- Human spatial hearing
- Role of the pinna
- Extracting the pinna spectral notches
- A few exploratory studies

Human spatial hearing
An intricate system, not yet completely modelled.
- Primary cues:
  - Interaural Time Difference (ITD)
  - Interaural Level Difference (ILD)
  - These explain localization only in the horizontal plane: all points on one half of a hyperboloid of revolution share the same ITD and ILD (the "cone of confusion").
- Other cues:
  - The pinna gives elevation cues at higher frequencies.
  - The torso and head give elevation cues at lower frequencies.

Head-Related Transfer Function (HRTF)
- The spectral filtering caused by the head, torso, and pinna is described by the HRTF (equivalently, in the time domain, the head-related impulse response, HRIR).
- The HRTF can be measured experimentally at all elevations and azimuths, for both ears, for different subjects.
- Convolving the source signal with the measured HRIR creates virtual audio.
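This rendering step can be sketched in a few lines of NumPy (a toy illustration; the array contents below are made-up values, not measured HRIRs):

```python
import numpy as np

def render_virtual_source(mono, hrir_left, hrir_right):
    """Convolve a mono source with the left/right HRIRs to
    synthesize a binaural signal for headphone playback."""
    return np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)

# Toy check: an impulse source just reproduces each HRIR.
src = np.array([1.0, 0.0, 0.0])
left, right = render_virtual_source(src,
                                    np.array([0.5, 0.25]),
                                    np.array([0.25, 0.5]))
```

In a real system the two HRIRs would be selected (or interpolated) for the desired source direction before convolving.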

CIPIC Database
- Public-domain HRIR database
- HRIRs sampled at 1250 points around the head
- 45 subjects, including the KEMAR mannequin
- Anthropometric measurements
V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, "The CIPIC HRTF database," in Proc. 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk Mountain House, New Paltz, NY, Oct. 2001.

Interaural-polar coordinate system (azimuth, elevation)

Sample HRTF

HRTF Visualization

Motivation for our work

HRTF composition: direct pulse, head diffraction, pinna resonances, pinna spectral notches, torso reflection, knee reflection.

HRTF composition: good parametric models exist for the head and torso effects; the role of the pinna is yet to be properly understood.

HRTFs for different pinnae

Features due to pinna
- The pinna contributes sharp spectral notches and peaks.
- The pinna spectral notches are important for elevation perception; there is substantial psychoacoustical, behavioral, and neurophysiological evidence.
- We propose a method to automatically extract the frequencies of the pinna spectral notches.

Extracting the spectral notches  Direct application of Pole-Zero modelling techniques do not work.  All pole-model is very good at picking the dominant spectral peaks. Can use Linear Prediction analysis.  However Pole-Zero models are highly sensitive to noise.  The measured HRIR includes the combined effects of the head diffraction and shoulder, torso, and the knee reflection. Due to the combined effects it is difficult to isolate the notches due to pinna alone.  The order of the model is also not known.  We are not guaranteed that perceptually relevant nulls will be captured.  They do a good fit for the envelope of the signal (provided the order is high), but perceptually relevant features may not be captured.

Pole-Zero Modelling

Determine the initial onset: the slope of the unwrapped phase spectrum gives the initial delay.
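A minimal sketch of this onset estimate (function name and FFT length are ours; it assumes the response is dominated by a single initial delay):

```python
import numpy as np

def onset_delay(h):
    """Estimate the initial delay (in samples) from the slope of the
    unwrapped phase spectrum of the impulse response."""
    n = len(h)
    phase = np.unwrap(np.angle(np.fft.rfft(h)))
    slope = np.polyfit(np.arange(len(phase)), phase, 1)[0]  # rad per bin
    return -slope * n / (2.0 * np.pi)

# A pure 5-sample delay is recovered exactly.
h = np.zeros(64)
h[5] = 1.0
d = onset_delay(h)
```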

Eliminate torso/knee reflections: window the signal with a Hann window so that the later torso and knee reflections fall outside the window.
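The windowing might look like this (the window length of 64 samples is an assumption for illustration, not the paper's exact choice):

```python
import numpy as np

def window_hrir(h, onset, length=64):
    """Keep only `length` samples after the onset, tapered by the
    decaying half of a Hann window, so the later torso and knee
    reflections are suppressed."""
    taper = np.hanning(2 * length)[length:]      # decaying half
    out = np.zeros_like(h)
    seg = h[onset:onset + length]
    out[onset:onset + len(seg)] = seg * taper[:len(seg)]
    return out

h = np.ones(200)                  # stand-in for a measured HRIR
w = window_hrir(h, onset=10)
```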

Effect of Delay

Effect of windowing on the group delay function

The group delay function
To emphasize the spectral nulls we compute the group delay function, the negative first derivative of the phase response of the filter. The notch frequencies can then be extracted by finding the extrema of the group delay function.
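In code, the group delay can be approximated by a finite difference of the unwrapped phase (a simple sketch; for a pure delay the group delay is flat at that delay value):

```python
import numpy as np

def group_delay(h, nfft=512):
    """Group delay in samples: negative first derivative of the
    unwrapped phase response; notch frequencies appear as extrema."""
    phase = np.unwrap(np.angle(np.fft.rfft(h, n=nfft)))
    return -np.diff(phase) / (2.0 * np.pi / nfft)

h = np.zeros(32)
h[7] = 1.0                      # pure 7-sample delay
gd = group_delay(h)
```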

Remove the spectral peaks
The poles can be extracted by linear prediction (LP) analysis, which fits an all-pole model of order p to the HRIR. The LP residual is computed by passing the HRIR through the inverse filter.
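A sketch of the LP analysis and inverse filtering (autocorrelation method, solving the normal equations directly; a Levinson-Durbin recursion would be the usual efficient route):

```python
import numpy as np

def lp_residual(x, order):
    """Fit an all-pole model of the given order (autocorrelation
    method) and return (LP coefficients, inverse-filter residual)."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])   # x[n] ~ sum a_k x[n-k]
    inv = np.concatenate(([1.0], -a))        # A(z) = 1 - sum a_k z^-k
    return a, np.convolve(x, inv)[:len(x)]

# Toy check: for x[n] = 0.5**n (a one-pole signal), order-1 LP
# recovers the pole and the residual is (almost) a single impulse.
x = 0.5 ** np.arange(50)
a, res = lp_residual(x, order=1)
```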

Window the LP residual [figure: original signal and LP residual]

Autocorrelation of the windowed LP residual [figure: LP residual and its autocorrelation]
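The autocorrelation step, sketched (a biased, normalized estimate; the group-delay analysis is then applied to this sequence):

```python
import numpy as np

def autocorr(x):
    """Normalized autocorrelation of the windowed LP residual."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]
    return r / r[0]

ac = autocorr(np.array([1.0, 2.0]))
```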

The complete algorithm

A particular subject

Different subjects

Pinna is essential for elevation perception
- Psychoacoustical experiments have shown that high frequencies are essential for vertical localization. [Gardner and Gardner; Hebrank and Wright; Musicant and Butler]
- When the pinna cavities are occluded with plastic moulds, localization ability decreases. [Gardner and Gardner; Hoffman et al.]
- There is a consensus that elevation perception is monaural. [Wightman and Kistler; Middlebrooks and Green]

Three questions
- Are the spectral notches perceptually significant?
- How are these spectral notches caused?
- What can we do once we have them? Interpolation, customization, and anthropometric studies.

The case for notches
- Spectral peaks and notches are the dominant cues contributed by the pinna.
- As elevation increases, the notch frequency increases; the spectral peaks show no definite trend with elevation.
- Spectral notches can be detected and discriminated. [Wright et al.; Moore et al.]
- Experiments on cats suggest that single auditory nerve fibers can signal the presence of a spectral notch in their discharge rates. [Poon and Brugge]
- Interestingly, a moving notch is signalled even better (implications for head movement).
- Vertical illusion in cats. [Tollin and Yin]

Vertical illusion in cats
Tollin DJ and Yin TCT (2003). "Spectral cues explain illusory elevation effects with stereo sounds in cats," J. Neurophysiol., 90.

Problem 1: Interpolation
- HRTFs are generally measured on a finite sampling grid of elevations and azimuths; implementing a virtual audio system requires interpolating between them.
- HRTF measurement is tedious and time-consuming: it normally takes 2 to 4 hours, during which the subject must remain immobile.

Current approaches for interpolation
- Time domain, frequency domain, or principal-components domain.
- These may not be the right way to interpolate, since they do not consider the perceptual aspects of human hearing. Ideally, interpolation should be done in a perceptually important feature space [those features of the HRIR important for source localization].
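For the frequency-domain approach, a minimal sketch of bilinear interpolation between four measured magnitude spectra (the function and weighting scheme are illustrative, not the paper's method):

```python
import numpy as np

def interp_hrtf(H00, H10, H01, H11, wa, we):
    """Bilinearly interpolate four neighbouring HRTF magnitude
    spectra; wa, we in [0, 1] are the fractional azimuth/elevation
    positions within the measurement grid cell."""
    return ((1 - wa) * (1 - we) * H00 + wa * (1 - we) * H10 +
            (1 - wa) * we * H01 + wa * we * H11)

# Midway between all four directions -> plain average.
H = [np.array([0.0, 4.0]), np.array([2.0, 4.0]),
     np.array([4.0, 4.0]), np.array([6.0, 4.0])]
Hm = interp_hrtf(*H, wa=0.5, we=0.5)
```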

Interpolation in the frequency domain

Problem 2: HRTF Customization
- Experiments show that if an HRTF measured for one person is used for another, elevation perception is very poor.
- The shape of each person's ear, and their anatomy, are unique; each person's localization abilities are tuned to them.

Current approaches for customization
- Complete measurement
- Database matching
- Numerical modelling
- Frequency scaling

Pinna shape and notches: correlate the pinna spectral notches with elevation, azimuth, and the anthropometric measurements.

Empirical approaches

The cause for notches?
- Batteau's reflection model.
- Hebrank and Wright: reflection from the posterior concha wall.
- Lopez-Poveda and Meddis incorporated diffraction into the model.
- Shaw's model for resonances.

Reflection Model

Shaw’s modes

Extracted Pinna Resonances

Compensating for the pinna angles

Coordinate system alignment

Reduction in ISD

Scope of our work
- We propose a method to automatically extract the frequencies of the pinna spectral notches.
- The pinna spectral notches are perceptually significant for source localization, so instead of convolving with the complete HRIR we can build simplified models based on the extracted features.
- Interpolation can be done in the perceptually important feature domain.
- The pinna spectral notches can be related to the shape and size of the pinna, making customization of the HRIR possible.

Some open questions
- How sensitive are the notch frequencies to the probe position?
- Given an image of the ear, can we hope to get the HRTF?
- What happens behind the ears?
- Are multiple spectral notches redundant?
- What is the role of the crus helias?
- How are the spectral notches from the right and left ears integrated?
- Is analysing in terms of notches the right approach? Could there be a template-matching approach?

Thank you! | Questions?