Speech Perception Models

Perry C. Hanavan, Au.D.

Speech Perception Issues
Linearity
Segmentation
Speaker normalization
Basic unit of perception
Specialization of speech perception

Linearity & Segmentation
Linearity principle: a specific sound in a word corresponds to a specific phoneme.
Segmentation: the ability to break the spoken language signal into the parts that make up words.
Together, these two principles suggest that speech perception is based on a linear correspondence between the acoustic signal and the phoneme units.
However, although we perceive speech as a series of separate and distinct phonemes and words, the acoustic boundaries between phonemes are blurred, e.g. /ki/ vs. /ku/ (the speech signal is not invariant).

Speaker Normalization
How are listeners able to recognize speech sounds and words despite wide variations in speaker production?
Speaker variations: gender differences, age differences.
Normalization is the ability to perceive words spoken by different speakers, at different rates, and in different phonetic contexts as the same.

Basic Unit of Perception
What is the basic unit of speech perception?
Acoustic-phonetic features
Allophones
Phonemes
Syllables
Words
Listening in noise (focus on smaller units)
Young children focus on syllables and formant transitions.

Specialization of Speech Perception
Is speech perception a specialized function or process in humans?
Animals, however, have been able to demonstrate categorical perception.
The perceptual magnet effect has not been demonstrated in animals.

Categories of Speech Perception Theories
Active vs. passive
Bottom-up vs. top-down
Autonomous vs. interactive

Active vs. Passive
Active theories suggest that speech perception and production are closely related.
Listener knowledge of how sounds are produced facilitates recognition of sounds.
Passive theories emphasize the sensory aspects of speech perception.
Listeners utilize internal filtering mechanisms.
Knowledge of vocal tract characteristics plays only a minor role, for example when listening in noisy conditions.

Bottom-Up vs. Top-Down
Top-down processing works with the knowledge a listener has about a language, context, experience, etc.
Listeners use stored information about language and the world to make sense of the speech.
Bottom-up processing works in the absence of a knowledge base providing top-down information: listeners receive auditory information, convert it into a neural signal, and process the phonetic feature information.

Autonomous vs. Interactive
Autonomous theories posit feed-forward processing, with lexical influence restricted to post-perceptual decision processes (uni-directional).
Interactive theories posit that information and knowledge from many sources available to the listener are involved at any or all stages of the processing of the signal (bi-directional).

Speech Perception Theories
Motor Theory
Acoustic Invariance Theory
Direct Realism
TRACE Model
Logogen Theory
Cohort Theory
Fuzzy Logic Model of Perception
Native Language Magnet Theory

Question
This theory postulates that speech is perceived by reference to how it is produced:
Motor Theory
Acoustic Invariance Theory
Direct Realism
TRACE Model
Logogen Theory
Cohort Theory
Fuzzy Logic Model of Perception
Native Language Magnet Theory

Motor Theory
Postulates that speech is perceived by reference to how it is produced: when perceiving speech, listeners access their own knowledge of how phonemes are articulated.
Articulatory gestures (such as rounding or pressing the lips together) are units of perception that directly provide the listener with phonetic information.
Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967

Question
In this theory, the acoustic properties of the landmarks constitute the basis for establishing the distinctive features:
Motor Theory
Acoustic Invariance Theory
Direct Realism
TRACE Model
Logogen Theory
Cohort Theory
Fuzzy Logic Model of Perception
Native Language Magnet Theory

Acoustic Invariance Theory
Listeners inspect the incoming signal for so-called acoustic landmarks, which are particular events in the spectrum that carry information about the gestures that produced them.
Because gestures are limited by the capacities of human articulators and listeners are sensitive to their auditory correlates, the lack-of-invariance problem does not arise in this model.
The acoustic properties of the landmarks constitute the basis for establishing the distinctive features.
Bundles of distinctive features uniquely specify phonetic segments (phonemes, syllables, words).
Stevens, K.N. (2002). "Toward a model of lexical access based on acoustic landmarks and distinctive features". Journal of the Acoustical Society of America 111 (4): 1872–1891.

Question
This theory hypothesizes that perception allows listeners to have direct awareness of the world because it involves direct recovery of the distal source of the event that is perceived:
Motor Theory
Acoustic Invariance Theory
Direct Realism
TRACE Model
Logogen Theory
Cohort Theory
Fuzzy Logic Model of Perception
Native Language Magnet Theory

Direct Realism
Hypothesizes that perception allows listeners to have direct awareness of the world because it involves direct recovery of the distal source of the event that is perceived.
Asserts that the objects of perception are actual vocal tract movements, or gestures, and not abstract phonemes or (as in the Motor Theory) events that are causally antecedent to these movements, i.e. intended gestures.
Listeners perceive gestures not by means of a specialized decoder (as in the Motor Theory) but because information in the acoustic signal specifies the gestures that form it.
Suggests that the actual articulatory gestures that produce different speech sounds are themselves the units of speech perception.
Fowler, C. A. (1986). "An event approach to the study of speech perception from a direct-realist perspective". Journal of Phonetics 14: 3–28.

TRACE Model
Assumes there is a cognitive unit for each feature (for example, nasality) at the feature level, for each phoneme at the phoneme level, and for each word at the word level.
At any given time, all of these units are activated to a greater or lesser extent, as opposed to all or none.
When units are activated above a certain threshold, they may influence other units at the same or different levels.
These effects may be either excitatory or inhibitory; that is, they may increase or decrease the activation of other units.
The entire network of units is referred to as the trace, because “the pattern of activation left by a spoken input is a trace of the analysis of the input at each of the three processing levels.”
The network is active and changes with subsequent input.
McClelland, J.L., & Elman, J.L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.

TRACE Model
For example, a listener hears the beginning of bald, and the words bald, ball, bad, and bill become active in memory.
Soon after, only bald and ball remain in competition (bad and bill have been eliminated because their vowel sounds do not match the input).
Soon after that, bald is recognized.
TRACE simulates this process by representing the temporal dimension of speech, by allowing words in the lexicon to vary in activation strength, and by having words compete during processing.
Figure 1 shows a line graph of word activation in a simple TRACE simulation.
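
The competition among bald, ball, bad, and bill can be sketched in a few lines of Python. This is only a toy illustration of graded activation with lateral inhibition among word units, not a reimplementation of TRACE: it has no feature or phoneme layers and no top-down feedback. The phoneme labels, the function name simulate, and every parameter value (excite, mismatch, inhibit, decay) are assumptions chosen for illustration.

```python
# Toy word-level competition in the spirit of TRACE (not the full model).
# Phoneme labels and all parameter values are illustrative assumptions.
LEXICON = {
    "bald": ("b", "aw", "l", "d"),
    "ball": ("b", "aw", "l"),
    "bad": ("b", "ae", "d"),
    "bill": ("b", "ih", "l"),
}


def simulate(heard, lexicon, excite=0.3, mismatch=0.3, inhibit=0.1, decay=0.1):
    """Return one {word: activation} snapshot per input phoneme."""
    act = {word: 0.0 for word in lexicon}
    history = []
    for t, phoneme in enumerate(heard):
        new_act = {}
        for word, phones in lexicon.items():
            # Bottom-up support: does this word have the heard phoneme here?
            matches = t < len(phones) and phones[t] == phoneme
            bottom_up = excite if matches else -mismatch
            # Lateral inhibition: every other active word pushes this one down.
            competition = sum(a for other, a in act.items() if other != word)
            value = act[word] * (1 - decay) + bottom_up - inhibit * competition
            new_act[word] = min(1.0, max(0.0, value))  # keep activation in [0, 1]
        act = new_act
        history.append(dict(act))
    return history


history = simulate(("b", "aw", "l", "d"), LEXICON)
for step in history:
    print({word: round(a, 3) for word, a in step.items()})
winner = max(history[-1], key=history[-1].get)
print("most active word:", winner)
```

In this sketch all four words become active on the first sound, bad and bill fall to zero at the vowel, and bald ends up as the most active word once the final /d/ arrives.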

Logogen Theory
A model designed to explain word recognition using a new type of unit known as a "logogen".
A critical element is the lexicon: a specialized part of memory that includes semantic and phonemic information about each item contained in memory.
A given lexicon consists of many smaller, abstract items known as logogens.
Logogens contain a variety of properties of a given word, such as its appearance, sound, and meaning.
Logogens do not store words within themselves; they store the information that is specifically necessary for retrieval of whatever word is being searched for.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Logogen Theory
A given logogen becomes activated by stimuli or contextual information (words) consistent with the properties of that specific logogen.
When the logogen's activation level rises to or above its threshold level, the pronunciation of the given word is sent to the output system.
Certain stimuli can affect the activation levels of more than one word at a time, usually words that are similar to one another.
When this occurs, whichever word's activation level reaches the threshold is the word sent to the output system, and the listener remains unaware of any partially excited logogens.
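
The threshold mechanism described above can be sketched as a small evidence accumulator. This is a minimal illustration, not Morton's formal model: the class name Logogen, the function recognize, the example words, and all numeric values are assumptions, and the rule for breaking a tie when two logogens cross threshold on the same step is an arbitrary choice.

```python
class Logogen:
    """A word detector that fires once accumulated evidence reaches its threshold."""

    def __init__(self, word, threshold):
        self.word = word
        self.threshold = threshold
        self.activation = 0.0

    def receive(self, amount):
        """Add evidence and report whether the threshold has been reached."""
        self.activation += amount
        return self.activation >= self.threshold


def recognize(logogens, evidence_stream):
    """Feed evidence step by step; return the first word whose logogen fires.

    Each step is a dict mapping a word to the evidence (sensory or contextual)
    that its logogen receives. Returns None if nothing reaches threshold.
    """
    by_word = {logogen.word: logogen for logogen in logogens}
    for evidence in evidence_stream:
        fired = [word for word, amount in evidence.items()
                 if by_word[word].receive(amount)]
        if fired:
            # Assumed tie-break: if several fire together, take the most active.
            return max(fired, key=lambda word: by_word[word].activation)
    return None


# Hypothetical example: context ("butter") primes "bread" before the
# acoustic input, which supports the similar words "bread" and "bred" equally.
units = [Logogen("bread", threshold=1.0), Logogen("bred", threshold=1.0)]
stream = [
    {"bread": 0.4},                 # contextual evidence
    {"bread": 0.35, "bred": 0.35},  # first stretch of acoustic evidence
    {"bread": 0.35, "bred": 0.35},  # second stretch of acoustic evidence
]
result = recognize(units, stream)
print("output:", result)
```

Here "bred" remains partially excited below threshold, which corresponds to the partially excited logogens the listener is unaware of.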

Cohort Theory
Designed specifically to account for auditory word recognition.
Breaks the word down.
This model states that when a word is heard, all words that begin with the first sound of the target word are activated.
This set of words is considered the cohort.
Once the first cohort has been activated, the other information, or sounds, in the word narrows down the choices.
The listener recognizes the word when left with a single choice; this is considered the "recognition point."
Marslen-Wilson, W. (1987). "Functional parallelism in spoken word recognition." Cognition, 25, 71-102.
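
The narrowing of the cohort can be sketched as prefix filtering over a toy lexicon. This is a minimal sketch under stated assumptions: the four-word lexicon, the phoneme labels, and the function name cohort_recognition are hypothetical, and candidates are eliminated all-or-none on the first mismatching sound.

```python
# Toy lexicon: word -> sequence of phoneme labels (illustrative assumption).
LEXICON = {
    "bald": ("b", "aw", "l", "d"),
    "ball": ("b", "aw", "l"),
    "bad": ("b", "ae", "d"),
    "bill": ("b", "ih", "l"),
}


def cohort_recognition(heard, lexicon):
    """Narrow the cohort one sound at a time.

    Returns (history, recognition_point). history holds the surviving
    candidates after each sound; recognition_point is the number of sounds
    heard when exactly one candidate first remains (None if that never happens).
    """
    history = []
    recognition_point = None
    for i in range(1, len(heard) + 1):
        prefix = tuple(heard[:i])
        # A word stays in the cohort only if its first i sounds match the input.
        cohort = sorted(word for word, phones in lexicon.items()
                        if tuple(phones[:i]) == prefix)
        history.append(cohort)
        if recognition_point is None and len(cohort) == 1:
            recognition_point = i
    return history, recognition_point


history, point = cohort_recognition(("b", "aw", "l", "d"), LEXICON)
for count, cohort in enumerate(history, start=1):
    print(count, cohort)
print("recognition point after sound:", point)
```

With this lexicon the cohort shrinks from four words to two at the vowel, and the recognition point falls on the final sound, where only bald remains.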

Fuzzy Logic Model of Perception
Proposes that people remember speech sounds in a probabilistic, or graded, way.
Suggests people remember descriptions of the perceptual units of language, called prototypes.
Within each prototype various features may combine.
Features are not binary (true or false); instead, a fuzzy value expresses how likely it is that a sound belongs to a particular speech category.
When perceiving a speech signal, the decision about what is actually heard is based on the relative goodness of the match between the stimulus information and the values of particular prototypes.
The final decision is based on multiple features or sources of information, even visual information (this explains the McGurk effect).
Computer models of the fuzzy logical theory demonstrate that the theory's predictions of how speech sounds are categorized correspond to the behavior of human listeners.
iBaldi
Massaro, D. (1989). The logic of the fuzzy logical model of perception. Behavioral and Brain Sciences, 12, 778-794. Cambridge University Press.
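
The relative-goodness decision can be illustrated numerically. The sketch below uses the form in which the model is commonly stated: the fuzzy support values from each source are multiplied for every candidate category, and each product is divided by the sum of products over all candidates. The function name flmp_probabilities and the support values for the auditory and visual sources are hypothetical.

```python
from math import prod


def flmp_probabilities(support):
    """Combine fuzzy support values and apply a relative-goodness rule.

    support maps each candidate category to a list of truth values in [0, 1],
    one per information source (e.g. auditory, visual).
    """
    goodness = {category: prod(values) for category, values in support.items()}
    total = sum(goodness.values())
    if total == 0:
        raise ValueError("no category receives any support")
    return {category: g / total for category, g in goodness.items()}


# Hypothetical McGurk-style input: the sound favors /ba/ (0.8 vs. 0.2),
# while the visible lip movement favors /da/ (0.9 vs. 0.1).
support = {
    "ba": [0.8, 0.1],  # [auditory support, visual support]
    "da": [0.2, 0.9],
}
probabilities = flmp_probabilities(support)
for category, p in probabilities.items():
    print(category, round(p, 3))
```

With these assumed values the visual source outweighs the auditory one, so /da/ receives the higher probability even though the sound alone favors /ba/.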

Native Language Magnet Theory
According to Kuhl’s (1994) Native Language Magnet (NLM) theory, the phonetic perceptual space is organized in terms of prototypes.
Prototypes are defined as particularly good category exemplars that function as perceptual references for linguistic phonetic units (mental representations or perceptual maps of speech).
Prototypes function as “perceptual magnets” and exert an attracting force on neighboring auditory representations, which tend to be assimilated by (attracted towards) the prototype that conforms to the phonetic categories of the language that is heard.
Thus, the perceptual space appears to be warped in the neighborhood of a prototype, because a prototype attracts exemplars that fall within its zone of influence.
Kuhl, P.K., Conboy, B.T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T. (2008). Phonetic learning as a pathway to language: new data and native language magnet theory expanded (NLM-e). Phil Trans R Soc B 363: 979-1000.
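
The warping of perceptual space near a prototype can be pictured with a one-dimensional toy. This is purely an illustration and not Kuhl's formalism: the Gaussian fall-off of the pull, the function name perceived, the parameters pull and width, and the stimulus values (arbitrary units on a single acoustic dimension) are all assumptions.

```python
import math


def perceived(x, prototypes, pull=0.6, width=1.0):
    """Shift stimulus x toward its nearest prototype; the pull fades with distance."""
    nearest = min(prototypes, key=lambda p: abs(p - x))
    strength = pull * math.exp(-((x - nearest) ** 2) / (2 * width ** 2))
    return x + strength * (nearest - x)


PROTOTYPES = [0.0]      # one category prototype (arbitrary units)
near_pair = (0.2, 0.6)  # two stimuli close to the prototype
far_pair = (2.6, 3.0)   # two stimuli far from it, same physical spacing

near_gap = abs(perceived(near_pair[1], PROTOTYPES) - perceived(near_pair[0], PROTOTYPES))
far_gap = abs(perceived(far_pair[1], PROTOTYPES) - perceived(far_pair[0], PROTOTYPES))
print("physical gap in both pairs: 0.4")
print("perceived gap near prototype:", round(near_gap, 3))
print("perceived gap far from prototype:", round(far_gap, 3))
```

Both pairs are separated by the same physical distance, but in this sketch the pair near the prototype is perceived as closer together, which is the shrinking of perceptual distance that the magnet metaphor describes.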