Chapter 12: Sound Localization and the Auditory Scene
Overview of Questions What makes it possible to tell where a sound is coming from in space? When we are listening to a number of musical instruments playing at the same time, how can we perceptually separate the sounds coming from the different instruments? Why does music sound better in some concert halls than in others?
Auditory Localization Auditory space - surrounds an observer and exists wherever there is sound Researchers study how sounds are localized in space by using: –Azimuth coordinates - position left to right –Elevation coordinates - position up and down –Distance coordinates - position from observer
Figure 12.1 The three directions used for studying sound localization: azimuth (left-right); elevation (up- down); and direction.
Auditory Localization - continued On average, people can localize sounds –Directly in front of them most accurately –To the sides and behind their heads least accurately. Location cues are not contained in the receptor cells like on the retina in vision; thus, location for sounds must be calculated.
Figure 12.3 Comparing location information for vision and hearing. Vision: The bird and the cat are at different locations and are imaged on different places on the retina. Hearing: The frequencies in the sounds from the bird and the cat are spread out over the cochlea, with no regard to the locations of the bird and the cat.
Cues for Sound Location Binaural cues - location cues based on the comparison of the signals received by the left and right ears –Interaural time difference (ITD)- difference between the times sounds reach the two ears When distance to each ear is the same, there are no differences in time. When the source is to the side of the observer, the times will differ.
Figure 12.4 The principle behind interaural time difference (ITD). The tone directly in front of the listener, at A, reaches the left and the right ears at the same time. However, when the tone is off to the side, at B, it reaches the listener’s right before it reaches the left ear.
Binaural Cues –Interaural level difference (ILD)- difference in sound pressure level reaching the two ears Reduction in intensity occurs for high frequency sounds for the far ear. –The head casts an acoustic shadow. This effect doesn’t occur for low frequency sounds.
Figure Why interaural level difference (ILD) occurs for high frequencies but not for low frequencies. (a) When water ripples are small compared to an object, such as this boat, they are stopped by the object. (b) The spaces between high-frequency sound waves is small compared to the head. The head interferes with the sound waves, creating an acoustic shadow on the other side of the head. (c) The same ripples are large compared to the single cattail, so they are unaffected by it. (d) The spacing between low-frequency sound waves is large compared to the person’s head, so the sound is unaffected by the head.
Figure 12.6 The three curves indicate interaural level difference (ILD) as a function of frequency for three different sound source locations. Note that the difference in ILD for different locations is higher at high frequencies (Adapted from Hartmann, 1999).
Monaural Cue for Sound Location The pinna and head affect the intensities of frequencies. Measurements have been performed by placing small microphones in ears and comparing the intensities of frequencies with those at the sound source. –This is a spectral cue since the information for location comes from the spectrum of frequencies.
Judging Elevation ILD and ITD are not effective for judgments on elevation since in many locations they may be zero. Experiment investigating spectral cues –Listeners were measured for performance locating sounds differing in elevation. –They were then fitted with a mold that changed the shape of their pinnae.
Experiment on Judging Elevation –Right after the molds were inserted, performance was poor for elevation but was unaffected for azimuth. –After 19 days, performance for elevation was close to original performance. –Once the molds were removed, performance stayed high. –This suggests that there might be two different sets of neurons—one for each set of cues.
Figure 12.9 How localization changes when a mold is placed in the ear. See text for explanation. (Hofman et al., 1998).
The Physiological Representation of Auditory Space Two mechanisms have been proposed –Narrowly tuned ITD neurons They are found in the inferior colliculus and the superior olivary nuclei. This response is a form of specificity coding.
Figure ITD tuning curves for six neurons that each respond to a narrow range of ITD’s. The neurons on the left respond when sound reaches the left ear first. The ones on the right respond when sound reaches the right ear first. Neurons such as this have been recorded from the barn owl and other animals (Adapted from McAlpine, 2005).
The Physiological Representation of Auditory Space - continued Jeffress Model for narrowly tuned ITD neurons –These neurons receive signals from both ears. –Coincidence detectors fire only when signals arrive from both ears simultaneously. –Other neurons in the circuit fire to locations corresponding to other ITDs.
Figure How the Jeffress circuit operates. Axons transmit signals from the left ear (blue) and right ear (red) to neurons, indicated by circles. (a) Sound in front: signals start in left and right channels simultaneously. (b) Signals meet at neuron 5, causing it to fire. (c) Sound to the right: signal starts in the right channel first. (d) Signals meet at neuron 3, causing it to fire.
The Physiological Representation of Auditory Space - continued Broadly-tuned ITD neurons –Research on gerbils indicates that neurons in the left hemisphere respond best to sound from the right, and vice versa. –Location of sound is indicated by the ratio of responding for two types of neurons. –This is a distributed coding system.
Figure (a) ITD tuning curves for broadly-tuned neurons. The left curve represents the tuning of neurons in the right hemisphere; the right curve is the tuning of neurons in the left hemisphere; (b) Patterns of response of the broadly tuned curves for stimuli coming from the left, in front, and the right. Neurons such as this have been recorded from the gerbil (Adapted from McAlpine, 2005).
Identifying Sound Sources Auditory Scene - the array of all sound sources in the environment Auditory Scene Analysis - process by which sound sources in the auditory scene are separated into individual perceptions This does not happen at the cochlea since simultaneous sounds are together in the pattern of vibration of the basilar membrane.
Figure Each musician produces a sound stimulus, and all three sounds are combined in the output of the loudspeaker.
Principles of Auditory Grouping Heuristics that help to perceptually organize stimuli –Onset time - sounds that start at different times are likely to come from different sources –Location - a single sound source tends to come from one location and to move continuously –Similarity of timbre and pitch - similar sounds are grouped together
Auditory Stream Segregation Compound melodic line in music is an example of auditory stream segregation. Experiment by Bregman and Campbell –Stimuli were alternating high and low tones –When stimuli played slowly, the perception is hearing high and low tones alternating. –When the stimuli are played quickly, the listener hears two streams; one high and one low.
Figure Four measures of a composition by J. S. Bach (Chorale Prelude on Jesus Christus unser Heiland, 1739). When played rapidly, the upper notes become perceptually grouped and the lower notes become perceptually grouped, a phenomenon called auditory stream segregation.
Figure (a) When high and low tones are alternated slowly, auditory stream segregation does not occur, so the listener perceives alternating high and low tones. (b) Faster alternation results in segregation into high and low streams.
Auditory Stream Segregation - continued Experiment by Bregman & Rudnicky –Listener hears two standard tones (X & Y) and can easily perceive the order. –Then two distractor (D) tones are placed between X & Y and the listener can no longer perceive the order. –Adding a series of captor tones (C) forms a stream with the D tones, and the listener can once again hear the order of X & Y.
Figure Bregman and Rudnicky’s (1975) experiment. (a) The standard tones X and Y have different pitches. (b) The distractor (D) tones group with X and Y, making it difficult to judge the order of X and Y. (c) The addition of captor (C) tones with the same pitch as the distractor tones causes the distractor tones to form a separate stream (grouping by similarity), making it easier to judge the order of tones X and Y (Based on Bregman & Rudnicky, 1975).
Figure (a) Two sequences of stimuli: a series of similar notes (red), and a scale (blue). (b) Perception of these stimuli: Separate streams are perceived when they are far apart in frequency, but when the frequencies are in the same range, the tones appear to jump back and forth between stimuli.
Auditory Stream Segregation - continued Experiment by Deutsch - the scale illusion or melodic channeling –Stimuli were two sequences alternating between the right and left ears. –Listeners perceive two smooth sequences by grouping the sounds by similarity in pitch. –This demonstrates the perceptual heuristic that sounds with the same frequency come from the same source, which is usually true in the environment.
Figure (a) These stimuli were presented to a listener’s left ear (blue) and right ear (red) in Deutsch’s (1975) scale illusion experiment. Notice how the notes presented to each ear jump up and down. (b) What the listener hears. Although the notes in each ear jump up and down, the listener perceived a smooth sequence of notes. This effect is called the scale illusion, or melodic channeling. (Adapted from Deutch, 1975).
Principles of Auditory Grouping - continued Proximity in time - sounds that occur in rapid succession usually come from the same source –This principle was illustrated in auditory streaming. Auditory continuity - sounds that stay constant or change smoothly are usually from the same source
Good Continuation Experiment by Warren et al. –Tones were presented interrupted by gaps of silence or by noise. –In the silence condition, listeners perceived that the sound stopped during the gaps. –In the noise condition, the perception was that the sound continued behind the noise.
Figure A demonstration of auditory continuity, using tones.
Principles of Auditory Grouping - continued Effect of past experience –Experiment by Dowling Melody “Three Blind Mice” is played with notes alternating between octaves Listeners find it difficult to identify the song But after they hear the normal melody, they can then hear it in the modified version using melody schema
Figure “Three Blind Mice”: (a) jumping octave version; (b) normal version.
Hearing Inside Rooms Direct sound - sound that reaches the listener’s ears straight from the source Indirect sound - sound that is reflected off of environmental surfaces and then to the listener When a listener is outside, most sound is direct; however inside a building, there is direct and indirect sound.
Figure (a) When you hear a sound outside, you hear mainly direct sound (path a). (b) When you hear a sound inside a room, you hear both direct (a) and indirect sound (b, c, and d) that is reflected from the walls, floor, and ceiling of the room.
Experiment by Litovsky et al. Listeners sat between two speakers: a lead speaker and a lag speaker. When sound comes from the lead speaker followed by the lag speaker with a long delay, listeners hear two sounds. When the delay is decreased to msec, listeners hear the sound as only coming from the lead speaker - the precedence effect.
Figure (a) When sound is first presented first in one speaker and then the other, with enough time between them, they are heard separately, one after another. (b) If there is only a short delay between the two sounds, then the sound is perceived to come from the lead speaker. This is the precedence effect.
Architectural Acoustics The study of how sounds are reflected in rooms. Factors that affect perception in concert halls. –Reverberation time - the time is takes sound to decrease by 1/1000th of its original pressure If it is too long, sounds are “muddled.” If it is too short, sounds are “dead.” Ideal times are around two seconds.
Factors that Affect Perception in Concert Halls –Intimacy time - time between when sound leaves its source and when the first reflection arrives Best time is around 20 ms. –Bass ratio - ratio of low to middle frequencies reflected from surfaces High bass ratios are best. –Spaciousness factor - fraction of all the sound received by listener that is indirect High spaciousness factors are best.
Acoustics in Classrooms Ideal reverberation time in classrooms is –.4 to.6 second for small classrooms. –1.0 to 1.5 seconds for auditoriums. –These maximize ability to hear voices. –Most classrooms have times of one second or more. Background noise is also problematic. –Signal to noise ratio should be +10 to +15 dB or more.
Interactions Between Vision and Sound Visual capture or the ventriloquist effect - an observer perceives the sound as coming from the visual location rather than the source for the sound Experiment by Sekuler et al. –Balls moving without sound appeared to move past each other. –Balls with an added “click” appeared to collide.
Figure Two conditions in the Sekuler et al. (1999) experiment showing successive positions of two balls that were presented so they appeared to be moving. (a) No sound condition: the two balls were perceived to pass each other and continue moving in a straight-line motion. (b) Click added condition: Observers were likely to see the balls as colliding.