Words, Voices and Memories: the interaction of linguistic and indexical information in cross-language speech perception Steve Winters (in collaboration with Susie Levi, David Pisoni and Karen Lichtman) Cognition and Cognitive Development Research Group February 5, 2009
Leading Off… A story about God and electricity. The moral of the story? Perceivers make use of whatever information is available to them to make sense of what they’re experiencing. And: perceptual boundaries are not necessarily limited by the intentions of production. “Any potential cue is an actual cue.” (Liberman, 1985) This observation is an essential feature of exemplar- based models of speech perception. (Johnson, 1997)
Exemplar Theory: Basic Precepts 1.Listeners store in memory every speech experience they have in their lifetime. Including all details of those experiences. 2.Speech tokens in memory (traces) are associated with different (linguistic, indexical) category labels. 3.New speech tokens (probes) are categorized on the basis of their similarity to exemplars in memory. Categories which are associated with the most similar exemplars receive the most activation. 4.The perceptual response to speech (the echo) is a weighted average of the activated exemplars.
Important! Echoes are similar to, but more abstract than probes. General properties are not stored explicitly in memory; they just emerge during the processing of speech. Note: Francis Galton
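The four precepts above can be summarized in a short computational sketch of an exemplar model, in the spirit of Hintzman's MINERVA 2 (the lineage behind Johnson, 1997). The cosine similarity measure, the cubed activation weighting, and all function and variable names here are illustrative assumptions, not the specific model from the talk.

```python
import numpy as np

def categorize(probe, traces, labels):
    """Exemplar-style categorization (precepts 1-4).

    traces : array of stored exemplars, one row per remembered token
    labels : category label attached to each trace
    probe  : the new incoming token
    """
    # Precept 3: each trace is activated in proportion to its similarity
    # to the probe (cosine similarity, cubed to sharpen the weighting).
    sims = traces @ probe / (np.linalg.norm(traces, axis=1) * np.linalg.norm(probe))
    acts = sims ** 3

    # The category whose traces collect the most activation wins.
    totals = {}
    for a, lab in zip(acts, labels):
        totals[lab] = totals.get(lab, 0.0) + a

    # Precept 4: the "echo" is an activation-weighted average of all traces --
    # more abstract than any single trace, yet never stored explicitly.
    echo = (acts[:, None] * traces).sum(axis=0) / acts.sum()
    return max(totals, key=totals.get), echo
```

Note that the general category properties emerge only in the echo at retrieval time, which is the point of the "Important!" slide above.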
Normalization Theory In contrast, normalization theories of speech perception hold that listeners identify speech sounds on the basis of their defining properties. Mental representations of speech sounds are abstract and sparse [+voice], [-round], [+nasal] “Indexical” properties, such as a speaker’s identity, age, gender, etc., are “extra-linguistic” (Abercrombie, 1967) = “noise” in the signal that perception must sort through to get to the good stuff.
Received Wisdom “...when we learn a new word we practically never remember most of the salient acoustic properties that must have been present in the signal that struck our ears; for example, we do not remember the voice quality, speed of utterance, and other properties directly linked to the unique circumstances surrounding every utterance.” -- Morris Halle (1985) Is this true? Are words and voice quality stored separately in memory?
Is this true? No. Voice information does affect language processing. 1.Phoneme Classification (Mullennix and Pisoni, 1990) 2.Recognition Memory (Palmeri, Goldinger and Pisoni, 1993) 3.Spoken Word Intelligibility (Nygaard, Sommers and Pisoni, 1994)
Maybe the Other Way? No. A familiar language facilitates voice identification. Forensic phonetics and the “voice line-up” task Thompson (1987) English > Spanish-accented English > Spanish Goggin, Thompson, Strube and Simental (1991) English, German, bilingual listeners familiar language > unfamiliar language Sullivan and Schlichting (2000) L2 learners of Swedish > no knowledge of Swedish
Testing Independence Evidence indicates that the indexical and linguistic properties of speech are integrated in memory. Note: in previous research, both language and talkers were changed between listening conditions. Q: What happens when the language changes but the talker remains the same?
Experimental Plan Experiment 1: Cross-language voice identification Train listeners to identify talkers in one language Test ability to generalize to new language Experiment 2: Cross-linguistic transfer task Train listeners to identify talkers in one language Test ability of listeners to recognize words (in noise) spoken by the same talkers in a different language. Experiment 3: Continuous word recognition task Listeners identify words from a language they don’t know as “old” or “new”; words are repeated in either the same or a different voice
Experiment 1: Hypotheses Q: Can listeners generalize knowledge of talkers across languages? If linguistic and voice information are integrated in processing (and memory): Transfer of talker knowledge across languages should be incomplete Also: expect better talker identification accuracy for familiar language If language and voice information are processed separately: Complete transfer of talker knowledge should occur across languages
Experiment 1: Materials 10 L1 German / L2 English talkers 5 male, 5 female Similar dialect Similar in perceived nativeness These talkers produced 360 CVC English words (e.g., buzz, cheek) 360 CVC German words (e.g., hoch, Rahm)
Listeners 40 L1 English listeners No knowledge of German 20 were trained on English stimuli only 20 were trained on German stimuli only Had to show evidence of learning: >40% accuracy on half of the testing sessions
Training Demo
Procedure: Training 4 days of training 2 sessions per day (~30 min each) Each session involved: Familiarization: same 5 words from each talker Re-familiarization: same word from each talker Recognition: 5 words/talker, heard twice with feedback Testing: 10 words/talker, no feedback, presented twice
Procedure: Generalization 5th day Familiarization: 3 words from each talker Re-familiarization: 1 word from each talker Testing (both languages) 10 novel words/talker in each language blocked by language counterbalanced for which language was first
Generalization Demo
Training Results: no effect of language
Generalization: English Trained
Generalization: German Trained
Findings: Experiment 1 Knowledge of bilinguals’ voices can generalize across languages Training language interacts with generalization = effect of language Listeners learn at the same rate regardless of their training language = no effect of language
Discussion Evidence that talker identification is both language- independent and language-dependent. Q1: What are the language-independent properties of the signal? two possibilities: F0, duration Q2: Why the generalization asymmetry? A1: It’s easier to generalize to a familiar language. A2: Listeners may rely on language-dependent indexical cues only when they hear a language that they know.
F0 Chart #1: Females, English
F0 Chart #2: Females
F0 Chart #3: Females
F0 Chart #4: Females, All
Language-Specific Cues? Percent correct talker identifications, in generalization to German, by English-trained listeners:
Phonological type   | % Correct | Examples
"Same"              | 50.9%     | mein, Bahn
/x/                 | 54.0%     | sich, Loch
Final liquid        | 51.3%     | Bier, Ball
/pf/                | 46.0%     | Pfiff, Kopf
Initial /r/         | 44.4%     | Rad, Rock
Front, rounded V    | 38.9%     | kühl, schön
Long /o/ or /e/     | 38.1%     | Boot, Weg
Experiment 2 Q: Does knowledge of a talker in one language facilitate linguistic processing of that talker in another? Training task: talker identification English-speaking listeners (monolingual) Bilingual talkers, speaking in either English or German. Testing task: English word recognition in noise Three talker groups: Familiar bilinguals Unfamiliar bilinguals Native English talkers
Experiment 2: Motivation Known: ability to identify a talker’s voice in English facilitates recognition of words spoken by that talker in English. (Nygaard et al., 1994) 1.Exemplar-based account: linguistic representations include talker-specific information. Processing is facilitated by similarity to traces in memory. 2.Normalization account: listeners learn how to filter indexical properties of particular talkers. …thereby becoming more adept at revealing the linguistic core of the spoken word.
Experiment 2: Predictions Known: listeners show complete generalization of talker knowledge from German to English. (Experiment 1) These listeners identify talkers based on language-independent information in speech. Exemplar-based prediction: Learning to identify talkers in German will not facilitate word recognition in English. (Listeners do not develop integrated representations.) Normalization prediction: Listeners filter the same talker properties in both languages, so facilitation should occur across languages.
Experiment 2: Training Listeners were trained to identify voices of either: Group 1 (five German female talkers) Group 2 (five German female talkers) Half trained in German; half trained in English Three days of training Two sessions per day Criterion: 40% correct on at least three training sessions.
Experiment 2: Results
Word Recognition Across Groups
Results: Word Recognition, all listeners (English Learners vs. German Learners) Interaction between listener and talker groups is not significant.
Splitting Hairs Review of literature revealed that Nygaard et al. (1994) split listeners up into “good” and “poor” listeners. Good listeners = 70% correct or better in training. Poor listeners = < 70% correct in training. Splitting listeners in the same way yielded significant interactions in Experiment 2 data.
Results: Word Recognition, English Listeners (Good Learners vs. Poor Learners) Interaction (Good learners): p = .008; Interaction (Poor learners): p = .025.
Results: Word Recognition, German Listeners (Good Learners vs. Poor Learners) Interaction between listener and talker groups is not significant.
Discussion (Good) English-trained listeners exhibited better word recognition scores for familiar talkers. (Good) German-trained listeners did not. Familiar talker effect is based on rich, talker-specific linguistic representations… rather than a filtering of “extra-linguistic” talker information. Caveat: some listeners develop these representations better than others.
Patterns 1.English-trained listeners displayed: Interactions between linguistic and talker categories in both experiments. 2.German-trained listeners: No interactions between linguistic and talker categories in either experiment. Implications: English-trained listeners can develop richly detailed, exemplar-like representations of speech. German-trained listeners develop sparser, language-independent representations of voices.
Experiment 3: Motivation Experiments 1 and 2: voice identification training. German-trained listeners developed language- independent representations of voices. One explanation: listeners simply ignore the words. 1.meaningless to them 2.irrelevant to the task Q: Is there a double dissociation? Do English listeners ignore voices when listening to German words?
Experiment 3: Task Task: continuous word recognition Listeners hear a series of words; Must decide if each word is “new” or a “repeat” of an earlier word in the list. The catch: some words are repeated in the same voice; others are repeated in a different voice. Finding: same-voice repeats are easier to recognize. (Palmeri et al., 1993) Q: Is this also true in an unfamiliar language?
Experiment 3: Methods Stimuli: German words only 5 male talkers in one series 5 female talkers in another series 160 (distinct) trials in each series 40 repeats in old voice 40 repeats in new voice Listeners: native English listeners 17 with no knowledge of German 19 German L2 learners Sample:
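The series structure described above (every selected word presented twice, with half the repeats in the old voice and half in a new voice) can be sketched as trial-list construction code. The function and variable names are hypothetical, and no constraint on the lag between a first presentation and its repeat is modeled.

```python
import random

def build_series(words, voices, n_same=40, n_diff=40, seed=0):
    """Continuous-recognition list: each selected word appears twice;
    n_same words repeat in the original voice, n_diff in a different voice."""
    assert len(words) >= n_same + n_diff and len(voices) >= 2
    rng = random.Random(seed)
    chosen = rng.sample(words, n_same + n_diff)
    trials = []
    for i, w in enumerate(chosen):
        v1 = rng.choice(voices)
        # first n_same words repeat in the same voice, the rest switch voice
        v2 = v1 if i < n_same else rng.choice([v for v in voices if v != v1])
        trials.append((w, v1, "new"))      # first presentation
        trials.append((w, v2, "repeat"))   # second presentation
    rng.shuffle(trials)
    # ensure each first presentation precedes its repeat: swap if reversed
    first_pos = {}
    for i, (w, v, status) in enumerate(trials):
        if w in first_pos and status == "new":
            j = first_pos[w]
            trials[i], trials[j] = trials[j], trials[i]
        elif w not in first_pos:
            first_pos[w] = i
    return trials
```

With 80 words and the defaults above, this yields the 160-trial series described on the slide (40 old-voice repeats, 40 new-voice repeats, each preceded by its first presentation).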
Raw Results Old-Same > Old-Diff, p < .001 No main effect of language group (!)
Velar Fricative Data
Front Rounded Vowel Data
Conclusions 1. Voice information affects word recognition even when listeners hear words from a language they do not know. 2. L2 learners seem to encode exotic sounds more effectively than naïve listeners. Apparent interaction asymmetry: Linguistic information may be ignored Voice information may not
Thoughts for the future? Perhaps we only store what we know how to label. (Pierrehumbert, 2001) An alternative task: continuous voice recognition New voice or old voice? Some old voices repeated with same word; Others repeated with different word. (Anecdotally: this is an extremely difficult task)
Future Directions: Thai What (phonetic) properties do listeners use to identify voices across languages? Are there limitations on the amount and kinds of phonetic information listeners can store in memory? Ex: second language perception Next up: creation of Thai-English bilingual database Thai has: lexical tone distinctions three-way VOT distinctions 3 x 3 vowel space distinction
Future Directions: Thai Talker identification training paradigm: Talker A is associated with Tone 1 Talker B is associated with Tone 2 Talker C is associated with Tone 3, etc. Generalization: Talker A is presented with not-Tone 1 Talker B is presented with not-Tone 2, etc. How much is identification accuracy impaired in generalization? = How robust are the cognitive representations of the talkers’ voices?
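The confounded training/generalization design above can be sketched as list-generation code. The talker and tone pairings are placeholders for the proposed design, and every name here is a hypothetical illustration.

```python
import random

def thai_paradigm(talkers, tones, words, seed=0):
    """Training: each talker is heard only with their assigned tone.
    Generalization: each talker is heard with every tone EXCEPT their own."""
    assert len(talkers) == len(tones)
    rng = random.Random(seed)
    training = [(t, tone, w)
                for t, tone in zip(talkers, tones)
                for w in words]
    generalization = [(t, other, w)
                      for t, own in zip(talkers, tones)
                      for other in tones if other != own
                      for w in words]
    rng.shuffle(training)
    rng.shuffle(generalization)
    return training, generalization
```

Comparing identification accuracy across the two lists then indexes how much of the talker representation was carried by the confounded tone cue.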
Conclusions These results pose something of a challenge to both existing models of speech perception. 1.Generalization of talker knowledge across languages is possible. Extent depends on language of training. 2.Familiar talker facilitation of word recognition is based on talker-specific linguistic representations… Rather than a filtering of extra-linguistic indexical information. 3.Exemplar-like representations seem to depend on: the language being heard which information the task allows listeners to ignore
Future Directions Theoretical possibility: you only store in exemplars what you know how to label. (Pierrehumbert, 2001) But then…how do you learn how to label it? Research Option #1: repeat experiments with bilinguals, who know category labels for both languages. Should get interactions both ways. Research Option #2: try languages which aren’t so phonetically similar. Ex: Chinese (tone), Japanese (F0 differences)
Study #1
Answer #1 Theories? Integral (exemplar) view: language should affect the perception of talker identity (talker identification process depends on the phonological particulars in different languages) Separable (formal) view: voice information is processed separately from language. Predicts transfer of knowledge across languages …because listeners attend to language- independent voice information.
Nobody’s Perfect Do listeners really store every acoustic detail of speech exemplars in memory? Some limitations to think about: 1.Storage space/forgetting 2.Categorical Perception: sound discrimination follows sound identification Note: this works best with speech sounds that exhibit rapid spectral changes (POA) rather than more gradual or stable spectral qualities (vowels) 3.Second Language (L2) speech perception 4.Development of first language (L1) speech perception
Questions We know that linguistic and indexical information interact within a given language… But how language-dependent is a listener’s mental representation of a particular talker’s voice? If you know what a bilingual talker sounds like in one language… Can you identify them when they are speaking in a different language? Do voice and linguistic information interact when listeners identify bilingual talkers across languages? (and are there any language-independent properties in the signal that listeners might attend to when identifying voices speaking in two different languages?) How much do we need to consider voice information in phonological representations of speech?
References Garner, W.R. (1974). The Processing of Information and Structure. Potomac, MD: Erlbaum. Garner, W.R., & Felfoldy, G.L. (1970). Integrality of stimulus dimensions in various types of information processing. Cognitive Psychology, 1, 225–241.
Leading Off… A story about God and electricity. The moral of the story? Perceivers will make use of whatever information is available to them to make sense of what they’re experiencing. “Any potential cue is an actual cue.” (Klatt, 1979) Any arbitrary detail may be perceived to contribute to the meaning of a ritualistic act… especially if it’s consistent. The mind does not always draw the perceptual boundaries you expect it to draw, especially if the meaning of what it perceives is arbitrary.
All is One? Cognitive psychologists distinguish between integral and separable dimensions in perception. (Garner, 1974) Integral dimensions: size and shape of polygons in an area estimation task E.g. which is bigger:
All is One? Cognitive psychologists distinguish between integral and separable dimensions in perception. Separable dimensions: size and color of polygons in an area estimation task E.g. which is bigger:
Linguistic Distinctions In formal phonology, features of speech sounds have traditionally been represented as a set of separable dimensions. (Chomsky & Halle, 1968) [+voice], [-back], [-round], etc. This set has been modified somewhat over the years… But the basic set of distinctive features still operates independently of “micro-features” …such as the phonetic properties of different speakers’ voices or the environmental setting in which sounds are heard, etc.
Interactions and Exemplars The interaction of voice and linguistic information in speech perception seems to support the predictions of exemplar theories of speech perception.
Two Theories The normalization view: Categorization of speech sounds operates on the basis of properties in the signal. Linguistic and indexical properties are orthogonal. (=voice information is meaningless noise) The exemplar view: Categorization of speech sounds operates on the basis of exemplars in memory. Linguistic and indexical properties are not separated in cognitive representations… And may be integrated in processing.
Game Plan Study 1: identification of bilingual talkers across languages Study 2: transfer of voice identification knowledge to a word identification task (across languages) Study 3: continuous word recognition in an unfamiliar language Wrap-up/future plans.
Experiment 2: Methods For starters: a norming study. 32 monolingual, native English listeners 15 female talkers: Group 1: 5 native English talkers Group 2: 5 native German-English bilinguals Group 3: 5 native German-English bilinguals Task: identify words spoken in four levels of white noise by all three groups of talkers.
Baseline Word Recognition in English Graph and Demos