Splice: From vowel offset to vowel onset FIG 3. Example of stimulus spliced from the repetitive syllables. EXPERIMENT 2 (Voicing ID) METHOD Speech materials:

Slides:



Advertisements
Similar presentations
Tom Lentz (slides Ivana Brasileiro)
Advertisements

CNBH, Physiology Department, Cambridge University 2. Experimental procedure The experiment is a 2AFC paradigm design in which.
Tone perception and production by Cantonese-speaking and English- speaking L2 learners of Mandarin Chinese Yen-Chen Hao Indiana University.
Sounds that “move” Diphthongs, glides and liquids.
Human Speech Recognition Julia Hirschberg CS4706 (thanks to John-Paul Hosum for some slides)
Hearing relative phases for two harmonic components D. Timothy Ives 1, H. Martin Reimann 2, Ralph van Dinther 1 and Roy D. Patterson 1 1. Introduction.
Voice quality variation with fundamental frequency in English and Mandarin.
Effects of Competence, Exposure, and Linguistic Backgrounds on Accurate Production of English Pure Vowels by Native Japanese and Mandarin Speakers Malcolm.
Voice Onset Time (VOT) An Animated and Narrated Glossary of Terms used in Linguistics presents.
The perception of dialect Julia Fischer-Weppler HS Speaker Characteristics Venice International University
Phonetic variability of the Greek rhotic sound Mary Baltazani University of Ioannina, Greece  Rhotics exhibit considerable phonetic variety cross-linguistically.
Infant sensitivity to distributional information can affect phonetic discrimination Jessica Maye, Janet F. Werker, LouAnn Gerken A brief article from Cognition.
Using prosody to avoid ambiguity: Effects of speaker awareness and referential context Snedeker and Trueswell (2003) Psych 526 Eun-Kyung Lee.
Interlanguage Production of English Stop Consonants: A VOT Analysis Author: Liao Shu-jong Presenter: Shu-ling Hung (Sherry) Advisor: Raung-fu Chung Date:
Speech perception 2 Perceptual organization of speech.
Development of Speech Perception. Issues in the development of speech perception Are the mechanisms peculiar to speech perception evident in young infants?
Nuclear Accent Shape and the Perception of Prominence Rachael-Anne Knight Prosody and Pragmatics 15 th November 2003.
Speech and speaker normalization (in vowel normalization)
Perception of syllable prominence by listeners with and without competence in the tested language Anders Eriksson 1, Esther Grabe 2 & Hartmut Traunmüller.
Niebuhr, D‘Imperio, Gili Fivela, Cangemi 1 Are there “Shapers” and “Aligners” ? Individual differences in signalling pitch accent category.
Sentence Durations and Accentedness Judgments ABSTRACT Talkers in a second language can frequently be identified as speaking with a foreign accent. It.
Identification and discrimination of the relative onset time of two component tones: Implications for voicing perception in stops David B. Pisoni ( )
Phonetic Similarity Effects in Masked Priming Marja-Liisa Mailend 1, Edwin Maas 1, & Kenneth I. Forster 2 1 Department of Speech, Language, and Hearing.
Profile of Phoneme Auditory Perception Ability in Children with Hearing Impairment and Phonological Disorders By Manal Mohamed El-Banna (MD) Unit of Phoniatrics,
TEMPLATE DESIGN © Perceptual compensation for /u/-fronting in American English KATAOKA, Reiko Department.
TEMPLATE DESIGN © Listener’s variation in phoneme category boundary as a source of sound change: a case of /u/-fronting.
Chapter three Phonology
Influence of Word Class Proportion on Cerebral Asymmetries for High and Low Imagery Words Christine Chiarello 1, Connie Shears 2, Stella Liu 3, and Natalie.
Auditory-acoustic relations and effects on language inventory Carrie Niziolek [carrien] may 2004.
The ‘when’ pathway of the right parietal lobe L. Battelli A. Pascual - LeoneP. Cavanagh.
Distinguishing Evidence Accumulation from Response Bias in Categorical Decision-Making Vincent P. Ferrera 1,2, Jack Grinband 1,2, Quan Xiao 1,2, Joy Hirsch.
Conclusions  Constriction Type does influence AV speech perception when it is visibly distinct Constriction is more effective than Articulator in this.
Interarticulator programming in VCV sequences: Effects of closure duration on lip and tongue coordination Anders Löfqvist Haskins Laboratories New Haven,
Present Experiment Introduction Coarticulatory Timing and Lexical Effects on Vowel Nasalization in English: an Aerodynamic Study Jason Bishop University.
Sebastián-Gallés, N. & Bosch, L. (2009) Developmental shift in the discrimination of vowel contrasts in bilingual infants: is the distributional account.
Speech Perception. Phoneme - a basic unit of a speech sound that distinguishes one word from another Phonemes do not have meaning on their own but they.
Speech Perception 4/6/00 Acoustic-Perceptual Invariance in Speech Perceptual Constancy or Perceptual Invariance: –Perpetual constancy is necessary, however,
Statistical learning, cross- constraints, and the acquisition of speech categories: a computational approach. Joseph Toscano & Bob McMurray Psychology.
Jiwon Hwang Department of Linguistics, Stony Brook University Factors inducing cross-linguistic perception of illusory vowels BACKGROUND.
VOT trumps other measures in predicting Korean children’s early mastery of tense stops Eun Jong Kong Mary E. Beckman Jan Edwards LSA2010 January 7 th.
Adaptive Design of Speech Sound Systems Randy Diehl In collaboration with Bjőrn Lindblom, Carl Creeger, Lori Holt, and Andrew Lotto.
Results 1.Boundary shift Japanese vs. English perceptions Korean vs. English perceptions 1.Category boundary was shifted toward boundaries in listeners’
Speech Science Fall 2009 Oct 28, Outline Acoustical characteristics of Nasal Speech Sounds Stop Consonants Fricatives Affricates.
Results Tone study: Accuracy and error rates (percentage lower than 10% is omitted) Consonant study: Accuracy and error rates 3aSCb5. The categorical nature.
5aSC5. The Correlation between Perceiving and Producing English Obstruents across Korean Learners Kenneth de Jong & Yen-chen Hao Department of Linguistics.
Acoustic Cues to Laryngeal Contrasts in Hindi Susan Jackson and Stephen Winters University of Calgary Acoustics Week in Canada October 14,
1. Background Evidence of phonetic perception during the first year of life: from language-universal listeners to native listeners: Consonants and vowels:
Sh s Children with CIs produce ‘s’ with a lower spectral peak than their peers with NH, but both groups of children produce ‘sh’ similarly [1]. This effect.
Epenthetic vowels in Japanese: a perceptual illusion? Emmanual Dupoux, et al (1999) By Carl O’Toole.
Sensation & Perception
The long-term retention of fine- grained phonetic details: evidence from a second language voice identification training task Steve Winters CAA Presentation.
The New Normal: Goodness Judgments of Non-Invariant Speech Julia Drouin, Speech, Language and Hearing Sciences & Psychology, Dr.
1 Cross-language evidence for three factors in speech perception Sandra Anacleto uOttawa.
CSD 2230 INTRODUCTION TO HUMAN COMMUNICATION DISORDERS Normal Sound Perception, Speech Perception, and Auditory Characteristics at the Boundaries of the.
Neurophysiologic correlates of cross-language phonetic perception LING 7912 Professor Nina Kazanina.
Katherine Morrow, Sarah Williams, and Chang Liu Department of Communication Sciences and Disorders The University of Texas at Austin, Austin, TX
B10. On Segmental Factorability in Second Language Learning Kenneth de Jong and Noah Silbert Linguistics and Cognitive Science, Indiana University (kdejong.
A STUDY ON PERCEPTUAL COMPENSATION FOR / /- FRONTING IN A MERICAN E NGLISH Reiko Kataoka February 14, 2009 BLS 35.
Early Time Course Hemisphere Differences in Phonological & Orthographic Processes Laura K. Halderman 1, Christine Chiarello 1 & Natalie Kacinik 2 1 University.
Speech Perception in Infants Peter D. Eimas, Einar R. Siqueland, Peter Jusczyk, and James Vigorito 1971.
Danielle Werle Undergraduate Thesis Intelligibility and the Carrier Phrase Effect in Sinewave Speech.
Effects of Musical Experience on Learning Lexical Tone Categories
Sentence Durations and Accentedness Judgments
The 157th Meeting of Acoustical Society of America in Portland, Oregon, May 21, pSW35. Confusion Direction Differences in Second Language Production.
Copyright © American Speech-Language-Hearing Association
Elaine R. Hitchcocka, Ph.D., Laura L. Koenigb,c, Ph.D.
Jessica McKee Speech, Language and Hearing Sciences
Vincent Porretta & Benjamin V. Tucker University of Alberta
Speech Perception (acoustic cues)
A Japanese trilogy: Segment duration, articulatory kinematics, and interarticulator programming Anders Löfqvist Haskins Laboratories New Haven, CT.
Presentation transcript:

Splice: From vowel offset to vowel onset FIG 3. Example of stimulus spliced from the repetitive syllables. EXPERIMENT 2 (Voicing ID) METHOD Speech materials: The speech corpus from EXPERIMENT 1 was used. Stimuli: 21 stimuli were spliced from each repetitive utterance as shown in FIG 3. Each stimulus consists of three repeated syllables. VOT and Syllable duration for each stimulus were based on the middle syllable in the stimulus. Listeners: 18 native listeners of American English Task: Four-alternative forced choice test (‘pea’, ‘bee’, ‘eep’, and ‘eeb’) Analysis: We discuss the results for voicing judgments only, not for syllable structure. RESULTS Identification of /p/ is accurate even at fast rates. Accuracy of /b/ identification decreases at fast rates. The perceptual boundary matches the VOT boundaries found in EXP 1. Boundary VOT values were estimated by logistic regression analysis with VOT and SYLLABLE DURATION as predicted variables. Positive slope indicates that as syllable duration increases, boundary VOT values increase. The perceptual boundary from responses to natural speech matches speakers’ produced voicing distinction more accurately than the perceptual boundary from responses to synthesized speech. 2aSC23 Perceptual rate normalization in naturally produced bilabial stops Kyoko Nagao Kenneth de Jong Department of Linguistics, Indiana University The 146 th Meeting of the ASA at Austin TX, November 11, 2003 INTRODUCTION Influence of speech rate 1. Speaking rate affects acoustic properties of voicing production (Miller, Green, & Reeves, 1986; Volaitis & Miller, 1992) VOT systematically increases as syllable duration increases. VOT for voiceless consonants changes more with rate than voiced consonants. The range of VOT distribution increases as syllable duration increases. 2. Speaking rate affects voicing contrast perception (Summerfield 1981; Miller & Volaitis, 1989; Volaitis & Miller, 1992) Changing rate shifts perceptual boundary. The boundary between perceived /b/ and /p/ is at a longer VOT when syllable duration is longer. Changing rate shifts best example. VOT values of the highest rated /p/ are longer when syllable duration is longer. Characteristics of previous studies Production experiments examined rates which were self-controlled by speakers. (Summerfield, 1981; Miller, Green, & Reeves, 1986; Volaitis & Miller, 1992) Perception experiments used synthesized syllables. (Summerfield, 1981; Miller & Volaitis, 1989; Volaitis & Miller, 1992) Studies by Miller and colleagues (Miller & Volaitis, 1989; Volaitis & Miller, 1992) VOT was modified in proportion to the total syllable duration. Stimuli included three categories, /b/, /p/, and a third unnatural category called the exaggerated voiceless consonant (*/p/). The third category was introduced to demarcate the category for /p/. The stimulus space is shown below. The perceptual boundary did not match the boundary estimated from natural speech (Miller, Green, & Reeves, 1986 ) nor other studies ( Lisker & Abramson, 1970, Summerfield, 1981 ). Perceptual VOT boundaries were at much higher values than estimates from production studies. References de Jong, K. (2001a). Effects of syllable affiliation and consonant voicing on temporal adjustment in a repetitive speech-production task. Journal of Speech, Language, and Hearing Research, 44, de Jong, K. (2001b). Rate-induced resyllabification revisited. Language and Speech, 44, Lisker, L., & Abramson, A. S. (1970). The voicing dimension: Some experiments in comparative phonetics. In Proceedings of the 6th International Congress of Phonetic Sciences (pp ). Prague: Academia. Miller, J. L., Green, K. P., & Reeves, A. (1986). Speaking rate and segments: A look at the relation between speech production and speech perception for the voicing contrast. Phonetica, 43, Miller, J. L., & Volaitis, L. E. (1989). Effect of speaking rate on the perceptual structure of a phonetic category. Perception & Psychophysics, 46(6), Summerfield, Q. (1981). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance, 7(5), Volaitis, L. E., & Miller, J. L. (1992). Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure of voicing categories. Journal of the Acoustical Society of America, 92(2), EXPERIMENT 1 ( Acoustic analysis ) METHOD Speakers: 4 native speakers of American English (2 female, 2 male) Speech materials: Speech corpus originally collected for studying speech production (de Jong, 2001a, 2001b). Speakers repeated the same syllable approximately 25 times. Syllables used in current study were /pi/ and /bi/. Each syllable was repeated in time with a metronome. The metronome increased the rate of production throughout each utterance. Acoustic measurements: The fastest 21 syllables of each /bi/ and /pi/ utterance were measured. Measures: VOT: Duration from the consonant release to the beginning of voicing Syllable duration: Duration from the consonant release to the ending of vowel The /b/-/p/ boundary was estimated by logistic regression analysis. RESULTS Rate-induced method successfully elicits fast rate of speech. As syllable duration increases, VOT values for /p/ increase. This rate effect on VOT is large for /p/, whereas there is little rate effect on VOT for /b/. Range of VOT distribution of /p/ is wider than /b/. VOT values for /b/ and /p/ are overlapped at fast rates. Some tokens were produced with VOT values not typical of their category; long VOT for voiceless stops, and short VOT for voiced stops. Rate-dependent optimal VOT values for /b/-/p/ boundary in Miller et al (1989) does not differentiate categories successfully at fast rates. Stimulus VOT ranges in the previous perceptual studies greatly exceeded the ranges in natural speech. SUMMARY VOT values in production are rate sensitive. /p/ and /b/ values overlap at fast speech rates. In general, identification of voicing categories is accurate. Identification of /b/ is more affected by speech rate than identification of /p/. The perceptual boundary between voicing categories is rate sensitive. The perceptual VOT boundaries from natural speech match the values found in production. Perceptual boundaries with synthesized speech are much higher, possibly due to task-related effects in previous work, or perhaps due to a hyperspace effect in response to synthetic speech. Goodness judgments are also affected by speech rate. CONCLUSIONS Rate normalization effects from synthesized speech also occur in the perception of natural speech. Speech rate affects both production and perception in a similar manner. The perceptual identification system is neatly tuned to the distributions found in production. The results of goodness ratings indicate that listeners store fine-grained information to distinguish voicing contrasts. Accurate identification of segments with aberrant VOT values suggests listeners use signal attributes in addition to VOT to differentiate the contrast. Even though rate-varied repetitive speech is uncommonly encountered, listeners effectively deal with rate-induced variation in categorization tasks. This suggests an active component in listeners’ perceptual systems which generalizes to novel circumstances. RESEARCH QUESTIONS Is rate-controlled speech similar to rate-self-controlled speech ? Since production and perception studies find different boundaries, how does perception of naturally rate-varied speech compare with studies of previous synthetic speech? How do produced rate variation in voicing and perceived rate-normalized voicing judgments correlate with each other? FIG 1. Stimuli space and perceptual /b/-/p/ boundary (Volaitis & Miller 1992) and /b/-/p/ boundary in production study (Miller et al 1989). FIG 4. Relationship between perception and production results. The size of markers indicates the degree of performance on voicing identifications (The larger, the better). The dotted line is the estimated perceptual /b/-/p/ boundary in Volaitis & Miller (1992) and the straight line is the estimated perceptual boundary from the current study. EXPERIMENT 3 (Goodness Judgments) METHOD Stimuli: The same stimuli from EXPERIMENT 2 were used. Listeners: 17 native listeners of American English (different from EXP 2). Tasks: 1. Two two-alternative forced choice tests, one for consonant identification (‘p’ or ‘b’) and one for syllable structure identification (‘Consonant before Vowel’ or ‘Vowel before Consonant’). 2. Goodness rating of identified consonant from 1 (=Terrible) to 10 (=Excellent). Analysis: We focus on goodness judgments for voicing. Mean goodness ratings were separately calculated for correct and incorrect identification. Stimuli were grouped in terms of values of VOT and syllable duration. RESULTS The consonant identification results from EXP 2 were replicated. Two- tailed paired t-test of (ArcSine transformed) mean percent of /p/ responses revealed no significant difference between EXP 2 and EXP3; p= The highest rated /b/ and /p/ both had greater VOT values as syllable duration increased. Syllable duration affects goodness judgments of consonants. The listeners tended to rate /b/ with relatively short VOT and /p/ with relatively long VOT as better than their more moderate counterparts. This appears to be a ‘Hyperspace effect’. Presumably it would be even more evident with the synthesized stimuli in previous studies. When VOT values were typical of their category, and the listeners categorized such tokens in terms of their VOT, goodness ratings were high. When VOT were typical of their category, and the listeners misidentified a consonant, goodness ratings were lower than when they correctly identified it. Hence, listeners apparently were aware of the stimuli’s properties which mismatched their identifications, and were basing their identifications on some other property of the stimuli. FIG 2. VOT values as function of syllable duration for /b/ and /p/. The dotted line is the optimal VOT values to differentiate /b/ and /p/ given in Miller et al (1989), while the solid line is the estimated boundary from the current study. FIG 5. Mean goodness ratings for correctly identified /b/- and /p/-inputs (left top panel and left bottom panel) and for misidentified /b/- and /p/-inputs (right top panel and right bottom panel) as function of VOT values. Syllable duration (ms) VOT boundary based on production (Miller et al, 1989) 1437 Perceptual /b/-/p/ boundary (Volaitis & Miller, 1992) Best /p/ VOT (Volaitis & Miller, 1992) Table 1. Estimated VOT values of the category boundary and of the best /p/. VOT boundaries in other studies (Lisker & Abramson, 1970; Summerfield, 1980) are 20-30ms. Acknowledgements This work is supported by the NIDCD (grant# R03 DC04095) and the NSF (grant# BCS ). We also appreciate the Indiana University Linguistics Club for their support. Appendix Summary of Logistic regression analysis: /p/-inputs and /p/-responses were coded 1, and /b/-inputs and /b/-responses were coded 0. VOT values at 50% cutoff points (logit = 0.5) were computed as the /p/-/b/ boundaries. logit = *VOT *SYLLABLE DURATION (EXP1) logit = *VOT *SYLLABLE DURATION (EXP2) logit = *VOT *SYLLABLE DURATION (EXP3) Syllable duration (ms) Estimated VOT at boundary (EXP 1) Estimated VOT at boundary (EXP 2) Estimated VOT at boundary (EXP 3)