The Auditory Kappa Effect in a Speech Context

Slides:



Advertisements
Similar presentations
Tone perception and production by Cantonese-speaking and English- speaking L2 learners of Mandarin Chinese Yen-Chen Hao Indiana University.
Advertisements

The Role of F0 in the Perceived Accentedness of L2 Speech Mary Grantham O’Brien Stephen Winters GLAC-15, Banff, Alberta May 1, 2009.
Human Speech Recognition Julia Hirschberg CS4706 (thanks to John-Paul Hosum for some slides)
Auditory scene analysis 2
Function words are often reduced or even deleted in casual conversation (Fig. 1). Pairs may neutralize: he’s/he was, we’re/we were What sources of information.
Perceptual Organization in Intonational Phonology: A Test of Parallelism J. Devin McAuley 1 & Laura C. Dilley 2 Department of Psychology Bowling Green.
Prosody Modeling (in Speech) by Julia Hirschberg Presented by Elaine Chew QMUL: ELE021/ELED021/ELEM March 2012.
The Perception of Speech. Speech is for rapid communication Speech is composed of units of sound called phonemes –examples of phonemes: /ba/ in bat, /pa/
Can a prosodic pattern induce/ reduce the perception of a lower- class suburban accent in French? Philippe Boula de Mareüil 1 & Iryna Lehka-Lemarchand.
SPEECH PERCEPTION 2 DAY 17 – OCT 4, 2013 Brain & Language LING NSCI Harry Howard Tulane University.
The Neuroscience of Language. What is language? What is it for? Rapid efficient communication – (as such, other kinds of communication might be called.
Using prosody to avoid ambiguity: Effects of speaker awareness and referential context Snedeker and Trueswell (2003) Psych 526 Eun-Kyung Lee.
Speech Science XII Speech Perception (acoustic cues) Version
1 The Effect of Pitch Span on the Alignment of Intonational Peaks and Plateaux Rachael-Anne Knight University of Cambridge.
Music Perception. Why music perception? 1. Found in all cultures - listening to music is a universal activity. 2. Interesting from a developmental point.
Nuclear Accent Shape and the Perception of Prominence Rachael-Anne Knight Prosody and Pragmatics 15 th November 2003.
Speech and speaker normalization (in vowel normalization)
Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan IEEE 2007 Min-Hsuan.
Analyzing Students’ Pronunciation and Improving Tonal Teaching Ropngrong Liao Marilyn Chakwin Defense.
Niebuhr, D‘Imperio, Gili Fivela, Cangemi 1 Are there “Shapers” and “Aligners” ? Individual differences in signalling pitch accent category.
Sentence Durations and Accentedness Judgments ABSTRACT Talkers in a second language can frequently be identified as speaking with a foreign accent. It.
Auditory Scene Analysis (ASA). Auditory Demonstrations Albert S. Bregman / Pierre A. Ahad “Demonstration of Auditory Scene Analysis, The perceptual Organisation.
A.Diederich – International University Bremen – USC – MMM – Spring 2005 Rhythm and timing  Clarke, E.F. Rhythm and timing in music. In Deutsch, D. Chapter.
AUDITORY PERCEPTION Pitch Perception Localization Auditory Scene Analysis.
Influence of Word Class Proportion on Cerebral Asymmetries for High and Low Imagery Words Christine Chiarello 1, Connie Shears 2, Stella Liu 3, and Natalie.
Cognitive Processes PSY 334 Chapter 2 – Perception.
Streaming David Meredith Aalborg University. Sequential integration The connection of parts of an auditory spectrum over time to form concurrent streams.
Grouping David Meredith Aalborg University. Musical grouping structure Listeners automatically chunk or “segment” music into structural units of various.
Background Infants and toddlers have detailed representations for their known vocabulary items Consonants (e.g., Swingley & Aslin, 2000; Fennel & Werker,
Audio Scene Analysis and Music Cognitive Elements of Music Listening
Speech Perception 4/6/00 Acoustic-Perceptual Invariance in Speech Perceptual Constancy or Perceptual Invariance: –Perpetual constancy is necessary, however,
Funded by NIH grant RO1 HD-4152 to J. Arnold NSF BCS and NSF BCS to Z. Griffin Why do speakers modulate acoustic prominence? Listener-oriented.
Perceived prominence and nuclear accent shape Rachael-Anne Knight LAGB 5 th September 2003.
1 Speech Perception 3/30/00. 2 Speech Perception How do we perceive speech? –Multifaceted process –Not fully understood –Models & theories attempt to.
Prosody-driven Sentence Processing: An Event-related Brain Potential Study Ann Pannekamp, Ulrike Toepel, Kai Alter, Anja Hahne and Angela D. Friederici.
Comprehension of Grammatical and Emotional Prosody is Impaired in Alzheimer’s Disease Vanessa Taler, Shari Baum, Howard Chertkow, Daniel Saumier and Reported.
Tone sensitivity & the Identification of Consonant Laryngeal Features by KFL learners 15 th AATK Annual Conference Hye-Sook Lee -Presented by Hi-Sun Kim-
The role of prosody in dialect synthesis and authentication Kyuchul Yoon Division of English Kyungnam University Spring 2008 Joint Conference of KSPS.
Results Tone study: Accuracy and error rates (percentage lower than 10% is omitted) Consonant study: Accuracy and error rates 3aSCb5. The categorical nature.
A prosodically sensitive diphone synthesis system for Korean Kyuchul Yoon Linguistics Department The Ohio State University.
1. Background Evidence of phonetic perception during the first year of life: from language-universal listeners to native listeners: Consonants and vowels:
Chapter 7: Loudness and Pitch. Loudness (1) Auditory Sensitivity: Minimum audible pressure (MAP) and Minimum audible field (MAF) Equal loudness contours.
Segmental encoding of prosodic categories: A perception study through speech synthesis Kyuchul Yoon, Mary Beckman & Chris Brew.
Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents Elisabeth Chorianopoulou MSc in Speech and Language Processing.
Epenthetic vowels in Japanese: a perceptual illusion? Emmanual Dupoux, et al (1999) By Carl O’Toole.
Background: Speakers use prosody to distinguish between the meanings of ambiguous syntactic structures (Snedeker & Trueswell, 2004). Discourse also has.
Recognizing Discourse Structure: Speech Discourse & Dialogue CMSC October 11, 2006.
1 Cross-language evidence for three factors in speech perception Sandra Anacleto uOttawa.
Intonational Meaning in Discourse Jennifer J. Venditti Tutorial for the IRCS 5 th Annual Undergraduate Summer Workshop in Cognitive Science 18 June 2002.
Acoustic Illusions & Sound Segregation Reading Assignments for Quizette 3 Chapters 10 & 11.
Nuclear Accent Shape and the Perception of Syllable Pitch Rachael-Anne Knight LAGB 16 April 2003.
Control of prosodic features under perturbation in collaboration with Frank Guenther Dept. of Cognitive and Neural Systems, BU Carrie Niziolek [carrien]
The role of prosody in dialect authentication Simulating Masan dialect with Seoul speech segments Kyuchul Yoon Division of English, Kyungnam University.
Dialect Simulation through Prosody Transfer: A preliminary study on simulating Masan dialect with Seoul dialect Kyuchul Yoon Division of English, Kyungnam.
Audio Scene Analysis and Music Cognitive Elements of Music Listening Kevin D. Donohue Databeam Professor Electrical and Computer Engineering University.
What you see is what you get? Heather Johnston March 24, 2005.
Suprasegmental features and Prosody Lect 6A&B LING1005/6105.
On the role of context and prosody in the interpretation of ‘okay’ Julia Agustín Gravano, Stefan Benus, Julia Hirschberg Héctor Chávez, and Lauren Wilcox.
Auditory Perception 1 Streaming 400 vs. 504 Hz 400 vs. 566 Hz 400 vs. 635 Hz 400 vs. 713 Hz A 400-Hz tone (tone A) is alternated with a tone of a higher.
Sentence Durations and Accentedness Judgments
Caterina Petrone. , Mariapaola D’Imperio. , Susanne Fuchs
Cognitive Processes PSY 334
Studying Intonation Julia Hirschberg CS /21/2018.
Intonational and Its Meanings
Intonational and Its Meanings
The American School and ToBI
Representing Intonational Variation
Representing Intonational Variation
Integrating Segmentation and Similarity in Melodic Analysis
Speech Perception (acoustic cues)
Presentation transcript:

The Auditory Kappa Effect in a Speech Context Alejna Brugos & Jonathan Barnes Speech Prosody, May 22, 2012 Shanghai, China

The perception of time

The perception of time Measured time ≠ Perceived time

The interaction of pitch and timing Dynamic F0 in speech can lead to longer perceived vowel duration (Yu, 2010; Cumming, 2011) Non-speech research showing that pitch manipulations can alter perception of timing (Crowder & Neath, 1995 ; Henry, 2011; inter alia) The auditory kappa effect (Cohen et al., 1954; Henry & McAuley, 2009; inter alia)

The auditory kappa effect In a sequence of level tones, the relative frequency of the tones can distort the perception of silent intervals between them. The two silent intervals t1 and t2 are of the same objective duration, but t2 is perceived as longer than t1 The mind expects that a greater pitch distance will take longer to traverse than a shorter distance, and adjusts perception accordingly.

Does the auditory kappa effect obtain in speech? Conducted an experiment closely modelled after non-speech kappa studies Sequences of short spoken words in place of short tones Used concatenated AXB sequences A, X, and B were all resynthesized versions of the spoken word one Single-word full IP (H* L-L%) To be speech-like (vs. sounding like singing) Symmetrical rise-fall From same 302 ms. naturally spoken base recording

The kappa cell paradigm Following Shigeno, 1986; MacKenzie, 2007 In AXB sequences of sound events, A and B are fixed in pitch space, and in time relative to each other  Only the intermediate event X changes, in both time and pitch space A pitch X B time

The kappa cell paradigm Following Shigeno, 1986; MacKenzie, 2007 In AXB sequences of sound events, A and B are fixed in pitch space, and in time relative to each other  Only the intermediate event X changes, in both time and pitch space A pitch X B time

The kappa cell paradigm Following Shigeno, 1986; MacKenzie, 2007 In AXB sequences of sound events, A and B are fixed in pitch space, and in time relative to each other  Only the intermediate event X changes, in both time and pitch space A pitch X B time

The kappa cell paradigm Following Shigeno, 1986; MacKenzie, 2007 In AXB sequences of sound events, A and B are fixed in pitch space, and in time relative to each other  Only the intermediate event X changes, in both time and pitch space A pitch X B time

Stimuli: timing & pitch steps The whole rise-fall contour was shifted in 1 st. steps Highest contour 8 st above base Base contour had range of 150-200 hz 7 intermediate steps for X AXB sequences concatenated with 2 intervening silences, t1 and t2 t1 + t2 always equal to 1000 ms. 10 time steps for each between 410 and 590 ms.

Stimuli: pitch change direction 2 directions: descending & ascending A B pitch pitch X X B A time time

A sample stimulus A X B

A sample stimulus A X B 6 semitones 2 semitones

A sample stimulus A X B 6 semitones 2 semitones t1=490 ms. t2=510 ms.

A sample stimulus A X B 6 semitones 2 semitones t1=490 ms. t2=510 ms.

Task Task: subjects asked to indicate whether the middle one was closer in time to the first or last one Explicitly instructed to try to ignore pitch 31 subjects 16 for the ascending condition, 15 for the descending All heard 4 repetitions of 70 stimuli (7 pitch steps x 10 time steps)

Results

Results X sounds closer to B

Results X sounds closer to A

Results X is closer to A

Results X is closer to B

Idealized time perception

Expected time perception

Results: all pitch steps merged

Results: time perception by pitch step

Results: time perception by pitch step

Analysis: The kappa effect obtains Subject responses were based primarily on interval duration, but modulated by relative pitch.  As with the kappa effect in non-speech studies, perception of pause duration was distorted by pitch differences  Closer in pitch sounded closer in time. Many possible directions to go… Exploring the magnitude, robustness and generalizability of the effect Order effects Effect of pitch change velocity, length of material Cross linguistic studies How might these same manipulations affect linguistic judgments?

Follow-up experiment: Prosodic Grouping Using the same materials, this time we asked subjects not about the timing of the words, but their “grouping” Did the sequences of numbers sound like (one one) (one) or (one) (one one) ? Identical stimuli to the timing experiment 14 subjects, descending order only

Results: grouping perception Proportion responses: X grouped with B

Results: grouping perception Proportion responses: X grouped with B

Timing perception Grouping perception

Analysis: grouping perception Surprisingly, timing affected judgments of grouping fairly little Items closer in pitch were perceived as grouped together The results looking strikingly different from those of the time judgment task. If the kappa effect is active in speech perception, this in itself is not sufficient to explain the results The effect of pitch looks strikingly categorical Only the middle (ambiguous) pitch steps showed a strong effect of time It looks like pitch distance may have some sort of status of its own for prosodic grouping

Pitch, timing & grouping F0 cues are recognized as important to grouping Phrase accents and boundary tones (Beckman & Ayers Elam,1997) Phrase-initial reset (Jun, 2006; Lin & Fon, 2011) Pitch accent scaling (Ladd, 1988; Féry & Truckenbrodt, 2005) Discourse segmentation (Oliveira & Cunha, 2004 ; Hirschberg, 2004; Carlson et al. 2005) F0 cues are sometimes found to be secondary to timing ones (Holzgrefe et al 2011; Hansson, 2003)   F0  omitted from some studies Quantification of boundary strength based only on objective duration may miss powerful cues from F0.

Exploring pitch/time interaction Investigations of pitch/time interaction in perception may: Shed light on mismatches of duration and phrasing perception Jumps in pitch across pauses may signal stronger boundaries Steady pitch may signal weaker boundary than duration indicates Contribute to our understanding of grouping across phrases Compatible with boundary strength being inherently relative and grouping being recursive (Wagner & Crivellaro, 2010; Kentner & Féry, forthcoming) We may consider pitch distance between phrases (with timing distance) in the light of principles of grouping: Proximity & Anti-proximity (Kentner & Fery, forthcoming) Gestalt principles of grouping (Lerdahl & Jackendoff, 1983; Wertheimer, 1938) Auditory streaming, auditory scene analysis (Bregman, 1990)

Points of departure Listeners are sensitive to F0, even when judging time Perceived time is subject to F0-based distortions Pitch and timing may be in a cue trading relationship (Beach, 1991) Future directions: Segmental length, boundary-related lengthening Interaction of pitch jumps with F0 contour/boundary tone Look for similar effects in other languages Influence of temporal factors on perceived pitch (tau effect) Look at production data, spontaneous speech We should work towards a quantitative measure of boundary strength that incorporates aspects of both pitch and duration

Timing perception Grouping perception There is much to be investigated in the interaction of timing and pitch in speech perception.

Thank you! Acknowledgments: This work was supported by NSF grant #1023853

Timing perception Grouping perception There is much to be investigated in the interaction of timing and pitch in speech perception.

References Beach, C. (1991). The interpretation of prosodic patterns at points of syntactic structure ambiguity: Evidence for cue trading relations. Journal of Memory and Language, 30(6): 644–663. Beckman, M., & Ayers Elam, G. (1997). Guidelines for ToBI Labelling. (v. 3). Carlson, R., Hirschberg, J., & Swerts, M. (2005). Cues to upcoming Swedish prosodic boundaries: Subjective judgment studies and acoustic correlates. Speech Communication, 46(3-4), 326–333. Cohen, J., Hansel, C. & Sylvester, J. (1954). Interdependence of temporal and auditory judgments. Nature, 174: 642–644. Crowder, R. & Neath, I. (1995). The influence of pitch on time perception in short melodies. Music Perception, 12(4): 379–386. Cumming, R. (2001).  The effect of dynamic fundamental frequency on the perception of duration. Journal of Phonetics, 39(3): 375–387. Féry, C. & Truckenbrodt, H. (2005).  Sisterhood and tonal scaling. Studia  Linguistica, 59(3): 223-243. Hansson, P., 2003. Prosodic phrasing in spontaneous Swedish. PhD thesis. Lund University, Sweden.

Henry, M. & McAuley, J.  (2009). Evaluation of an imputed pitch velocity model of the auditory kappa effect. Journal of Experimental Psychology: HPP, 35(2): 551–564. Henry, M. (2011). A Test of an Auditory Motion Hypothesis for Continous and Discrete Sounds Moving in Pitch Space. PhD. Dissertation. Bowling Green State University. Hirschberg, J. (2004). Pragmatics and intonation. The handbook of pragmatics, 515–537. Holzgrefe, J., Schröder, C., Höhle, B. & Wartenburger, I.  (2011). Neurophysiological investigations on the processing of prosodic boundary cues. ETAP 2, Montreal. Jun, S.-A. (2006). Intonational phonology of Seoul Korean revisited. In T. Vance & K.  Jones (Eds.), Japanese/Korean Linguistics 14 (p. 15-26). Stanford: CSLI. Kentner, G. & Féry, C. (forthcoming). A new approach of prosodic grouping. The Linguistic Review. Ladd, D. (1988). Declination ‘reset’ and the hierarchical organization of utterances.JASA, 84: 530-544. Lerdahl, F., Jackendoff, R., 1983. A generative theory of tonal music. The MIT Press.

Lin, H. & Fon, J. (2011). The role of pitch reset in perception at discourse boundaries.  ICPhS XVII, Hong Kong. MacKenzie, N.  (2007). The kappa effect in pitch/time context. PhD. Dissertation, Ohio State University. Oliveira, M., Jr, & Cunha, D. (2004). Prosody As Marker of Direct Reported Speech Boundary. Speech Prosody. Shigeno, S. (1986). The auditory tau and kappa effects for speech and nonspeech stimuli. Perception & Psychophysics, 40(1): 9–19. Wagner, M. & Crivellaro,  (2010). Relative Prosodic Boundary Strength and Prior Bias in Disambiguation. SpPros, Chicago. Wertheimer, M. (1938). Laws of organization in perceptual forms. In: Ellis, W. (Ed.), A source book of Gestalt psychology. London: Routledge & Kegan Paul, pp. 71–88. Wightman, C., Shattuck-Hufnagel, S., Ostendorf, M. & Price, P.  (1992).  Segmental durations in the vicinity of prosodic phrase boundaries. JASA, 91: 1707–1717. Yu, A. (2010). Tonal effects on perceived vowel duration. In C. Fougeron, B. Kühnert, M. D’Imperio & N. Vallée (Eds.), Papers in Lab. Phon. (10). Berlin: M. de Gruyter.