Chapter 12: Auditory Localization and Organization
Figure 12.1 Coffee shop scene, which contains multiple sound sources. The most immediate sound source for the man in the middle is the voice of the woman talking to him across the table. Additional sources include speakers on the wall behind him, which are broadcasting music, and all the other people in the room who are speaking. The four problems we will consider in this chapter are indicated in this figure: (1) auditory localization, (2) sound reflection, (3) analysis of the scene into separate sound sources, and (4) musical patterns that are organized in time. Figure 12-1 p290
Auditory Localization
Auditory space - surrounds an observer and exists wherever there is sound
Researchers study how sounds are localized in space by using:
  Azimuth coordinates - position left to right
  Elevation coordinates - position up and down
  Distance coordinates - position from observer
Figure 12.3 The three directions used for studying sound localization: azimuth (left–right), elevation (up–down), and distance. Figure 12-3 p291
Auditory Localization - continued
On average, people can localize sounds
  Directly in front of them most accurately.
  To the sides and behind their heads least accurately.
Unlike vision, where location is signaled by which receptors on the retina are stimulated, the auditory receptor cells carry no location cues; thus, the location of a sound must be calculated.
Figure 12.2 Comparing location information for vision and hearing. Vision: The bird and the cat, which are located at different places, are imaged on different places on the retina. Hearing: The frequencies in the sounds from the bird and cat are spread out over the cochlea, with no regard to the animals’ locations. Figure 12-2 p291
Binaural Cues for Sound Localization
Binaural cues - location cues based on the comparison of the signals received by the left and right ears
Interaural time difference (ITD) - difference between the times sounds reach the two ears
  When distance to each ear is the same, there are no differences in time.
  When the source is to the side of the observer, the times will differ.
Figure 12.4 The principle behind interaural time difference (ITD). The tone directly in front of the listener, at A, reaches the left and right ears at the same time. However, when the tone is off to the side, at B, it reaches the listener’s right ear before it reaches the left ear. Figure 12-4 p292
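The ITD principle can be sketched numerically. The following is a minimal illustration, not a model from the chapter: it assumes straight-line sound paths, an ear separation of 0.18 m, and a source 2 m away (all hypothetical values), and it ignores how sound bends around the head.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C
EAR_SEPARATION = 0.18    # m; assumed illustrative value, not from the text

def itd_seconds(azimuth_deg, distance_m=2.0):
    """Left-ear arrival time minus right-ear arrival time, in seconds.

    Azimuth 0 is straight ahead; positive azimuth is to the listener's right.
    The ears sit at x = -half and x = +half on the left-right axis.
    A positive result means the sound reaches the right ear first.
    """
    az = math.radians(azimuth_deg)
    x = distance_m * math.sin(az)   # left-right position of the source
    y = distance_m * math.cos(az)   # front-back position of the source
    half = EAR_SEPARATION / 2
    d_left = math.hypot(x + half, y)    # path length to the left ear
    d_right = math.hypot(x - half, y)   # path length to the right ear
    return (d_left - d_right) / SPEED_OF_SOUND

for azimuth in (0, 30, 90):
    print(azimuth, round(itd_seconds(azimuth) * 1e6), "microseconds")
```

Under these assumptions the ITD is zero for a source straight ahead and grows to roughly half a millisecond for a source directly to one side.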
Binaural Cues for Sound Localization - continued
Interaural level difference (ILD) - difference in sound pressure level reaching the two ears
  Reduction in intensity occurs for high-frequency sounds at the far ear.
  The head casts an acoustic shadow.
  This effect doesn’t occur for low-frequency sounds.
Cone of confusion - set of locations that result in the same ILD and ITD
Figure 12.5 Why interaural level difference (ILD) occurs for high frequencies but not for low frequencies. (a) Person listening to a high-frequency sound; (b) person listening to a low-frequency sound. (c) When the spacing between waves is smaller than the size of the object, illustrated here by water ripples that are smaller than the boat, the waves are stopped by the object. This occurs for the high-frequency sound waves in (a) and causes the sound intensity to be lower on the far side of the listener’s head. (d) When the spacing between waves is larger than the size of the object, as occurs for the water ripples and the narrow stalks of the cattails, the object does not interfere with the waves. This occurs for the low-frequency sound waves in (b), so the sound intensity on the far side of the head is not affected. Figure 12-5 p292
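The wavelength argument in this caption can be checked with a small calculation. This is a rough sketch only: the head width of 0.18 m is an assumed value, and the rule "a shadow forms when the wavelength is smaller than the object" is the caption's rule of thumb, not a full acoustic model.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C
HEAD_WIDTH = 0.18       # m; assumed illustrative value, not from the text

def wavelength_m(frequency_hz):
    """Spacing between successive wave crests, in meters."""
    return SPEED_OF_SOUND / frequency_hz

def head_casts_shadow(frequency_hz, head_width_m=HEAD_WIDTH):
    """Rule of thumb: the head blocks waves whose spacing is smaller than the head."""
    return wavelength_m(frequency_hz) < head_width_m

for frequency in (200, 6000):
    print(frequency, "Hz:", round(wavelength_m(frequency), 3), "m,",
          "shadow" if head_casts_shadow(frequency) else "no shadow")
```

With these numbers a 200 Hz tone has a wavelength of about 1.7 m, much larger than the head, while a 6000 Hz tone has a wavelength of about 6 cm, which is smaller than the head.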
Figure 12.6 The three curves indicate interaural level difference (ILD) as a function of frequency for three different sound-source locations. Note that the difference in ILD for different locations is greater at higher frequencies. Figure 12-6 p293
Figure 12.7 The “cone of confusion.” There are many pairs of points on this cone that have the same left-ear distance and right-ear distance and so result in the same ILD and ITD. There are also other cones in addition to this one. Figure 12-7 p294
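The geometry behind the cone of confusion can be illustrated with a short sketch. It treats the ears as two points 0.18 m apart (an assumed value) and uses hypothetical source positions; it only shows that different locations can share the same pair of ear distances.

```python
import math

HALF_EAR_SEPARATION = 0.09  # m; assumed illustrative value, not from the text

def ear_distances(x, y, z):
    """Distances from a source to the left and right ears.

    x points to the listener's right, y points forward, z points up.
    The ears are modeled as points at (-h, 0, 0) and (+h, 0, 0).
    """
    left = math.dist((x, y, z), (-HALF_EAR_SEPARATION, 0.0, 0.0))
    right = math.dist((x, y, z), (HALF_EAR_SEPARATION, 0.0, 0.0))
    return left, right

# Three different locations, all 1 m to the right and 1 m from the ear axis:
front = ear_distances(1.0, 1.0, 0.0)   # ahead and to the right
back = ear_distances(1.0, -1.0, 0.0)   # behind and to the right
above = ear_distances(1.0, 0.0, 1.0)   # overhead and to the right

print(front, back, above)
```

Because the three points differ only by a rotation around the axis through the two ears, the left-ear and right-ear distances match, so binaural cues alone cannot tell them apart.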
Monaural Cue for Sound Location
Monaural cue - uses information from one ear
  The pinna and head affect the intensities of frequencies.
  Measurements have been performed by placing small microphones in ears and comparing the intensities of frequencies with those at the sound source.
  This is a spectral cue since the information for location comes from the spectrum of frequencies.
Figure 12.8 (a) Pinna showing sound bouncing around in nooks and crannies. (b) Frequency spectra recorded by a small microphone inside the listener’s right ear for the same broadband sound coming from two different locations. The difference in the pattern when the sound is 15 degrees above the head (blue curve) and 15 degrees below the head (red curve) is caused by the way different frequencies bounce around within the pinna when entering it from different angles. Figure 12-8 p294
Monaural Cue for Sound Location - continued
ILD and ITD are not effective for judgments on elevation since in many locations they may be zero.
Experiment investigating spectral cues
  Listeners were measured for performance locating sounds differing in elevation.
  They were then fitted with a mold that changed the shape of their pinnae.
Monaural Cue for Sound Location - continued
  Right after the molds were inserted, performance was poor for elevation but was unaffected for azimuth.
  After 19 days, performance for elevation was close to original performance.
  Once the molds were removed, performance stayed high.
  This suggests that there might be two different sets of neurons, one for each set of cues.
Figure 12.9 How localization changes when a mold is placed in the ear. See text for explanation. Figure 12-9 p295
The Physiological Representation of Auditory Space
Auditory nerve fibers synapse in a series of subcortical structures:
  Cochlear nucleus
  Superior olivary nucleus (in the brain stem)
  Inferior colliculus (in the midbrain)
  Medial geniculate nucleus (in the thalamus)
Signals then reach the auditory receiving area (A1 in the temporal lobe).
Figure 12.10 Diagram of the auditory pathways. This diagram is greatly simplified, as numerous connections between the structures are not shown. Note that auditory structures are bilateral (they exist on both the left and right sides of the body) and that messages can cross over between the two sides. Figure 12-10 p296
The Physiological Representation of Auditory Space - continued
Hierarchical processing occurs in the cortex.
  Neural signals travel through the core, then the belt, followed by the parabelt area.
  Simple sounds cause activation in the core area.
  Belt and parabelt areas are activated in response to more complex stimuli made up of many frequencies.
Figure 12.11 The three main auditory areas in the monkey cortex: the core area, which contains the primary auditory receiving area (A1); the belt area; and the parabelt area. P indicates the posterior end of the belt area, and A indicates the anterior end of the belt area. Signals, indicated by the arrows, travel from core to belt to parabelt. The dark lines indicate where the temporal lobe was pulled back to show areas that would not be visible from the surface. Figure 12-11 p297
The Physiological Representation of Auditory Space - continued
Jeffress Model for narrowly tuned ITD neurons
  These neurons receive signals from both ears.
  Coincidence detectors fire only when signals arrive from both ears simultaneously.
  Other neurons in the circuit fire to locations corresponding to other ITDs.
Figure 12.12 How the circuit proposed by Jeffress operates. Axons transmit signals from the left ear (blue) and the right ear (red) to neurons, indicated by circles. (a) Sound in front. Signals start in left and right channels simultaneously. (b) Signals meet at neuron 5, causing it to fire. (c) Sound to the right. Signal starts in the right channel first. (d) Signals meet at neuron 3, causing it to fire. Figure 12-12 p297
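The coincidence-detector idea can be sketched as a toy simulation. The sketch below assumes nine neurons in a row, with the left-ear signal entering at neuron 1 and the right-ear signal entering at neuron 9, each advancing one neuron per time step. The step size and the function name are hypothetical simplifications chosen to match the neuron numbers in the figure.

```python
def coincidence_neuron(right_lead_steps, n_neurons=9):
    """Return the 1-based index of the neuron where both signals arrive together.

    right_lead_steps > 0 means the right-ear signal starts that many steps
    before the left-ear signal; a negative value means the left ear leads.
    Returns None when the signals cross between two neurons.
    """
    left_start = max(right_lead_steps, 0)     # delay before the left signal starts
    right_start = max(-right_lead_steps, 0)   # delay before the right signal starts
    for neuron in range(1, n_neurons + 1):
        left_arrival = left_start + (neuron - 1)            # travels 1 -> 9
        right_arrival = right_start + (n_neurons - neuron)  # travels 9 -> 1
        if left_arrival == right_arrival:
            return neuron
    return None

print(coincidence_neuron(0))   # sound in front
print(coincidence_neuron(4))   # sound to the right: right channel starts first
print(coincidence_neuron(-4))  # sound to the left: left channel starts first
```

In this discrete sketch only even leads land exactly on a neuron; an odd lead makes the signals pass each other between neurons, which is why the function can return None.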
Figure 12.13 ITD tuning curves for six neurons that each respond to a narrow range of ITDs. The neurons on the left respond when sound reaches the left ear first. The ones on the right respond when sound reaches the right ear first. Neurons such as these have been recorded from the barn owl and other animals. However, when we consider mammals, another story emerges. Figure 12-13 p297
Broad ITD Tuning Curves in Mammals
Broadly tuned ITD neurons
  Research on gerbils indicates that neurons in the left hemisphere respond best to sound from the right, and vice versa.
  Location of sound is indicated by the ratio of responding for two types of neurons.
  This is a distributed coding system.
Figure 12.14 (a) ITD tuning curve for a neuron in the gerbil superior olivary nucleus. (b) ITD tuning curve for a neuron in the barn owl’s inferior colliculus. The “range” indicator below each curve indicates that the gerbil curve is much broader than the owl curve. The gerbil curve is, in fact, broader than the range of ITDs that typically occur in the environment. This range is indicated by the light bar (between the dashed lines). Figure 12-14 p298
Figure 12.15 Responses recorded from a neuron in the left auditory cortex of the monkey to sounds originating at different places around the head. The monkey’s position is indicated by the circle in the middle. The firing of a single cortical neuron to a sound presented at different locations around the monkey’s head is shown by the records at each location. Greater firing is indicated by a greater density of dots. This neuron responds to sounds coming from a number of locations on the right. Figure 12-15 p298
Figure 12.16 (a) ITD tuning curves for broadly tuned neurons like the one shown in Figure 12.14a. The left curve represents the tuning of neurons in the right hemisphere; the right curve is the tuning of neurons in the left hemisphere. (b) Patterns of response of the broadly tuned curves for stimuli coming from the left, in front, and from the right. Figure 12-16 p299
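The ratio idea behind this distributed code can be sketched with two hypothetical channels. The sigmoid tuning shape and the 200-microsecond slope below are assumptions made purely for illustration; they are not measured tuning curves.

```python
import math

def channel_responses(itd_us, slope_us=200.0):
    """Return (left_hemisphere, right_hemisphere) responses, each between 0 and 1.

    Positive ITD means the sound reaches the right ear first (source on the right).
    The left-hemisphere channel is modeled as preferring sounds from the right.
    The sigmoid shape and slope are illustrative assumptions.
    """
    left_hemisphere = 1.0 / (1.0 + math.exp(-itd_us / slope_us))
    right_hemisphere = 1.0 - left_hemisphere
    return left_hemisphere, right_hemisphere

def response_ratio(itd_us):
    """Ratio of left- to right-hemisphere response; above 1 suggests a source on the right."""
    left_hemisphere, right_hemisphere = channel_responses(itd_us)
    return left_hemisphere / right_hemisphere

for itd in (-300, 0, 300):
    print(itd, round(response_ratio(itd), 2))
```

Neither channel identifies a location on its own; the relative activity of the two channels is what changes as the source moves from left to right.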
Localization in Area A1 and the Auditory Belt Area
Broadly tuned ITD neurons
Malhotra and Lomber (2007)
  Cooling and lesioning
Auditory Where (and What) Pathways
What, or ventral stream, starts in the anterior portion of the core and belt and extends to the prefrontal cortex.
  It is responsible for identifying sounds.
Where, or dorsal stream, starts in the posterior core and belt and extends to the parietal and prefrontal cortices.
  It is responsible for locating sounds.
Evidence from neural recordings, brain damage, and brain scanning supports these findings.
Figure 12.17 Auditory what and where pathways. The blue arrow from the anterior core and belt is the what pathway. The red arrow from the posterior core and belt is the where pathway. Figure 12-17 p300
Figure 12.18 Results of Lomber and Malhotra’s (2008) experiment. (a) When the anterior (what) auditory area of the cat was deactivated by presenting a small cooling probe within the purple area, the cat could not identify sounds but could locate sounds. (b) When the posterior (where) auditory area was deactivated by presenting a cooling probe within the green area, the cat could not locate sounds but could identify sounds. Figure 12-18 p300
Figure 12.19 (a) Colored areas indicate brain damage for J.G. (left) and E.S. (right). (b) Performance on recognition test (green bar) and localization test (red bar). Figure 12-19 p301
Hearing Inside Rooms
Direct sound - sound that reaches the listener’s ears straight from the source
Indirect sound - sound that is reflected off environmental surfaces and then reaches the listener
When a listener is outside, most sound is direct; however, inside a building, there is both direct and indirect sound.
Figure 12.20 (a) When you hear a sound outdoors, you hear mainly direct sound (path 1). (b) When you hear a sound inside a room, you hear both direct sound (1) and indirect sound (2, 3, and 4) that is reflected from the walls, floor, and ceiling of the room. Figure 12-20 p301
Perceiving Two Sounds That Reach the Ears at Different Times
Experiment by Litovsky et al.
  Listeners sat between two speakers: a lead speaker and a lag speaker.
  When sound comes from the lead speaker followed by the lag speaker with a long delay, listeners hear two sounds.
  When the delay is decreased to 5 to 20 ms, listeners hear the sound as coming only from the lead speaker. This is the precedence effect.
Figure 12.21 (a) When sound is presented first in one speaker and then in the other, with enough time between them, they are heard separately, one after the other. (b) If there is only a short delay between the two sounds, then the sound is perceived to come from the lead speaker only. This is the precedence effect. Figure 12-21 p302
Architectural Acoustics
The study of how sounds are reflected in rooms.
Factors that affect perception in concert halls
Reverberation time - the time it takes sound to decrease to 1/1000th of its original pressure
  If it is too long, sounds are “muddled.”
  If it is too short, sounds are “dead.”
  Ideal times are around two seconds.
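The 1/1000th pressure criterion corresponds to a 60 dB drop in sound pressure level, which a short calculation makes explicit. The second function is a sketch that assumes the level falls at a constant rate in dB per second (an idealization), and the decay rate used in the example is a hypothetical value.

```python
import math

def pressure_change_db(pressure_ratio):
    """Change in sound pressure level, in dB, for a given pressure ratio."""
    return 20.0 * math.log10(pressure_ratio)

def reverberation_time_s(decay_db_per_second):
    """Time for the level to fall by 60 dB, assuming a constant decay rate."""
    return 60.0 / decay_db_per_second

print(round(pressure_change_db(1 / 1000)))   # drop to 1/1000th of the pressure
print(reverberation_time_s(30.0))            # hypothetical hall decaying at 30 dB/s
```

A hall whose sound level decays at 30 dB per second would, under this idealization, have a reverberation time of two seconds.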
Architectural Acoustics - continued
Factors that affect perception in concert halls
Intimacy time - time between when sound leaves its source and when the first reflection arrives
  Best time is around 20 ms.
Bass ratio - ratio of low to middle frequencies reflected from surfaces
  High bass ratios are best.
Spaciousness factor - fraction of all the sound received by listener that is indirect
  High spaciousness factors are best.
Acoustics in Classrooms - continued
Ideal reverberation time is
  0.4 to 0.6 seconds for small classrooms.
  1.0 to 1.5 seconds for auditoriums.
These times maximize the ability to hear voices.
Most classrooms have times of one second or more.
Background noise is also problematic.
  Signal-to-noise ratio should be +10 to +15 dB or more.
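The signal-to-noise guideline can be made concrete with a small check. The sketch assumes both levels are already expressed in dB, so the ratio is a simple difference; the voice and noise levels in the example are hypothetical, and the function names are illustrative.

```python
def snr_db(signal_level_db, noise_level_db):
    """Signal-to-noise ratio in dB when both levels are given in dB."""
    return signal_level_db - noise_level_db

def meets_snr_guideline(signal_level_db, noise_level_db, minimum_db=10.0):
    """True when the ratio reaches the lower bound of the +10 to +15 dB guideline."""
    return snr_db(signal_level_db, noise_level_db) >= minimum_db

# Hypothetical levels: a teacher's voice at 65 dB against two noise levels.
print(snr_db(65.0, 50.0), meets_snr_guideline(65.0, 50.0))
print(snr_db(65.0, 60.0), meets_snr_guideline(65.0, 60.0))
```

In the first case the voice is 15 dB above the noise and meets the guideline; in the second it is only 5 dB above and does not.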
Figure 12.22 Each musician produces a sound stimulus, but these signals are combined into one signal, which enters the ear. Figure 12-22 p304
Auditory Organization: Scene Analysis
Auditory scene - the array of all sound sources in the environment
Auditory scene analysis - process by which sound sources in the auditory scene are separated into individual perceptions
  This does not happen at the cochlea since simultaneous sounds are together in the pattern of vibration of the basilar membrane.
Auditory Organization: Scene Analysis - continued
Heuristics that help to perceptually organize stimuli
  Onset time - sounds that start at different times are likely to come from different sources
  Location - a single sound source tends to come from one location and to move continuously
  Similarity of timbre and pitch - similar sounds are grouped together
Separating the Sources
Compound melodic line in music is an example of auditory stream segregation.
Experiment by Bregman and Campbell
  Stimuli were alternating high and low tones.
  When the stimuli are played slowly, the listener hears high and low tones alternating.
  When the stimuli are played quickly, the listener hears two streams: one high and one low.
Figure 12.23 Four measures of a composition by J. S. Bach (Choral Prelude on Jesus Christus unser Heiland, 1739). When played rapidly, the upper notes become perceptually grouped and the lower notes become perceptually grouped, a phenomenon called auditory stream segregation. Figure 12-23 p305
Figure 12.24 (a) When high and low tones are alternated slowly, auditory stream segregation does not occur, so the listener perceives alternating high and low tones. (b) Faster alternation results in segregation into high and low streams. Figure 12-24 p305
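The stimulus in this kind of experiment can be sketched as a schedule of alternating tones. The frequencies, tone durations, and the 700 Hz pitch boundary below are hypothetical values chosen for illustration; the code only builds the sequence and groups it by pitch, and it does not predict at what rate listeners actually hear two streams.

```python
def alternating_tones(n_tones, tone_duration_s, high_hz=1000.0, low_hz=400.0):
    """Return (onset_seconds, frequency_hz) pairs, starting with the high tone."""
    return [(i * tone_duration_s, high_hz if i % 2 == 0 else low_hz)
            for i in range(n_tones)]

def split_by_pitch(sequence, boundary_hz=700.0):
    """Group tones into a high list and a low list, as in the segregated percept."""
    high = [tone for tone in sequence if tone[1] >= boundary_hz]
    low = [tone for tone in sequence if tone[1] < boundary_hz]
    return high, low

slow = alternating_tones(6, 0.5)     # slow alternation
fast = alternating_tones(6, 0.125)   # fast alternation
high_stream, low_stream = split_by_pitch(fast)

print(fast)
print(high_stream)
print(low_stream)
```

Within each pitch group the tones are separated by twice the tone duration, so at fast rates successive tones of the same pitch are close together in time while staying far apart in pitch from the other group.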
Figure 12.25 (a) Two sequences of stimuli: a sequence of similar notes (red), and a scale (blue). (b) Perception of these stimuli: Separate streams are perceived when they are far apart in frequency, but the tones appear to jump back and forth between stimuli when the frequencies are in the same range. Figure 12-25 p306
Separating the Sources - continued
Experiment by Deutsch - the scale illusion or melodic channeling
  Stimuli were two sequences alternating between the right and left ears.
  Listeners perceive two smooth sequences by grouping the sounds by similarity in pitch.
  This demonstrates the perceptual heuristic that sounds with similar frequencies come from the same source, which is usually true in the environment.
Figure 12.26 (a) These stimuli were presented to a listener’s left ear (blue) and right ear (red) in Deutsch’s (1975) scale illusion experiment. Notice how the notes presented to each ear jump up and down. (b) Although the notes in each ear jump up and down, the listener perceives a smooth sequence of notes. This effect is called the scale illusion, or melodic channeling. Figure 12-26 p306
Separating the Sources - continued
Proximity in time - sounds that occur in rapid succession usually come from the same source
  This principle was illustrated in auditory streaming.
Auditory continuity - sounds that stay constant or change smoothly are usually from the same source
Separating the Sources - continued
Experiment by Warren et al.
  Tones were presented interrupted by gaps of silence or by noise.
  In the silence condition, listeners perceived that the sound stopped during the gaps.
  In the noise condition, the perception was that the sound continued behind the noise.
Figure 12.27 A demonstration of auditory continuity, using tones. Figure 12-27 p307
Separating the Sources - continued
Effect of past experience
Experiment by Dowling
  The melody “Three Blind Mice” is played with notes alternating between octaves.
  Listeners find it difficult to identify the song.
  After they hear the normal melody, they can hear it in the modified version by using a melody schema.
Figure 12.28 “Three Blind Mice.” (a) Jumping octave version. (b) Normal version. Figure 12-28 p307
Auditory Organization: Perceiving Meter
Rhythmic pattern - a series of changes across time
Metrical structure - the underlying beat of music
Figure 12.29 First line of The Star-Spangled Banner. The rhythmic pattern (the series of changes across time) is indicated by the horizontal blue lines. The metrical structure (the underlying beat of the music, determined by the time signature) is indicated by the red arrows. When the piece is performed, there is an equal amount of time between beats. Figure 12-29 p308
Figure 12.30 (a) Subjects listened to sequences of short and long tones. On half the trials, the first tone was short; on the other half, long. The durations of the tones ranged from about 150 ms to 500 ms (durations varied for different experimental conditions), and the entire sequence repeated for 5 seconds. (b) English-speaking subjects (E) were more likely than Japanese-speaking subjects (J) to perceive the stimulus as short–long. (c) Japanese-speaking subjects were more likely than English-speaking subjects to perceive the stimulus as long–short. Figure 12-30 p310
Connections Between Hearing and Vision
Visual capture or the ventriloquist effect - an observer perceives the sound as coming from the visual location rather than from the actual source of the sound
Experiment by Sekuler et al.
  Balls moving without sound appeared to move past each other.
  Balls with an added “click” appeared to collide.
Figure 12.31 (a) Stimulus for the two-flash illusion. One flash of light is accompanied by two tones. (b) The illusion occurs when the subject perceives two flashes of light, even though there was just one. Figure 12-31 p311
Figure 12.32 Two conditions in the Sekuler et al. (1997) experiment showing successive positions of two balls that were presented so they appeared to be moving. (a) No sound condition: the two balls were perceived to pass each other and continue moving in a straight-line motion. (b) Click-added condition: observers were more likely to see the balls as colliding. Figure 12-32 p311
Hearing and Vision: Physiology
The interaction between vision and hearing is multisensory in nature.
Thaler et al. (2011) - had expert blind echolocators create clicking sounds and observed how these signals activated the brain.
Figure 12.33 There are connections between the primary receiving areas for vision, hearing, and somatosensory sensation (touch, pain). These connections create interactions between the senses. Figure 12-33 p312
Figure 12.34 Receptive fields of neurons in the monkey’s parietal lobe that respond to (a) auditory stimuli that are located in the lower left area of space, and (b) visual stimuli presented in the lower left area of the monkey’s visual field. (c) Superimposing the two receptive fields indicates that there is a high level of overlap between the auditory and visual fields. Figure 12-34 p312
Figure 12.35 (a) Brain activity for a blind subject listening to sound stimuli. The activity shown here is the activity generated by a stimulus that contained echoes, minus the activity of the same stimulus without the echoes. Because the auditory cortex was activated in both of these conditions, no auditory cortex activation is shown. However, listening to the echo stimulus resulted in activity in the visual cortex shown here. (b) Activity for a sighted subject listening to the same stimuli. Activation with and without echoes was the same, so no activity is shown. Figure 12-35 p313