Published by Silas Horn. Modified over 9 years ago.
1
Perceptual and Neural Modeling
Automatic Speech Attribute Transcription (ASAT) Project
Sorin Dusan
Center for Advanced Information Processing
Rutgers University, Piscataway, NJ
Project Kickoff Meeting – Rutgers University 9-13-04
2
Can Automatic Speech Recognition Learn from Human Speech Perception?
- The human auditory system as a model (Geisler ’98, Warren ’99, Plomp ’02, Ledoux ’02)
- The neuro-cognitive process of speech perception is still not fully understood
- Auditory processing and speech perception are better understood today than 30-50 years ago, thanks to advances in technology: functional magnetic resonance imaging (fMRI), positron emission tomography (PET), magnetoencephalography (MEG)
- Better models of speech perception that explain the data (e.g., FLMP, Oden & Massaro ’78; TRACE, McClelland & Elman ’86)
- Speech perception viewed as a process related to other perceptual processes (e.g., reading – Massaro ’87)
- Take an engineering look at recent findings about the auditory system and speech perception from neuroscience and psychology
Sorin Dusan, Sept. 13, 2004, NSF ASAT Project
3
Automatic Speech Recognition: from Sound to Words
- What are the possible levels of perceptual representation in speech: words, phonemes, features?
- Subword units are extremely appealing for ASR because they make modeling more efficient, but any kind of subword “unit” could damage the accuracy of the sound-to-words mapping
- Is it possible to replace the phoneme? Is it the right time to dethrone the phoneme in speech processing?
[Figure: Neural Speech Processing; levels shown: Sound, Features, Phonemes, Words]
4
Automatic Speech Recognition: from Sound to Words
- ASR can be seen simply as a mapping from acoustics to words, with no hard-coded intermediate units
- Can one build a system that maps sound or features directly to lexical representations? (Marslen-Wilson & Warren ’94)
- What are the architectural implications of such a mapping for the system? (levels, complexity, processing time, etc.)
[Figure, Hypothesis 1: Speech Sound; Measurements; Phonological Features; Word 1, Word 2, Word 3, Word N; labels 1, 2, 3, 4. Complexity: 1 -> 2 -> 3 -> 4]
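One way to picture the direct feature-to-word mapping asked about on this slide is the toy sketch below. It is not the project's method: the feature set (`voiced`, `nasal`, `labial`), the three-word lexicon, the fixed three-frame templates, and the squared-distance score are all assumptions made for illustration. The frames are also assumed to be already time-aligned with the templates, which a real recognizer could not take for granted.

```python
# Hypothetical feature inventory; each frame is a tuple in this order.
FEATURES = ("voiced", "nasal", "labial")

# Hypothetical lexicon: each word is stored directly as a sequence of
# feature vectors, with no phoneme layer in between.
LEXICON = {
    "bat": [(1, 0, 1), (1, 0, 0), (0, 0, 0)],
    "pat": [(0, 0, 1), (1, 0, 0), (0, 0, 0)],
    "mat": [(1, 1, 1), (1, 0, 0), (0, 0, 0)],
}


def word_score(observed, template):
    """Return the negative squared distance between detected feature
    strengths (values in [0, 1]) and a word's feature template."""
    if len(observed) != len(template):
        raise ValueError("observed frames and template must have equal length")
    return -sum(
        (o - t) ** 2
        for obs_frame, tmpl_frame in zip(observed, template)
        for o, t in zip(obs_frame, tmpl_frame)
    )


def recognize(observed, lexicon=LEXICON):
    """Pick the word whose feature template is closest to the observation."""
    return max(lexicon, key=lambda word: word_score(observed, lexicon[word]))


# Detected feature strengths for three frames (made-up numbers).
observed = [(0.9, 0.1, 0.8), (1.0, 0.0, 0.1), (0.1, 0.0, 0.0)]
best = recognize(observed)
```

The point of the sketch is architectural: the lexicon is indexed by feature patterns, so the only levels are measurements, features, and words.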
5
Automatic Speech Recognition: from Sound to Words
- Speech recognition could be a heterogeneous process that uses multiple types of phonological representation simultaneously (features, phonemes, diphones, syllables, words)
- Test this hypothesis by building a hybrid system that uses, for example, both features and phonemes, and compare its performance with that of the individual systems
- Add a top-down structure for context and knowledge integration that uses the same processing principle as the bottom-up structure (Plomp ’02, Massaro ’75)
[Figure, Hypothesis 2: Speech Sound; Feature-Based Recognizer, Phoneme-Based Recognizer, Word-Based Recognizer; Fusion; Word 1, Word 2, Word N]
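The fusion stage of Hypothesis 2 could be sketched as follows. The slide does not say how the recognizer outputs are combined, so the weighted log-linear rule used here is an assumption, as are the function name `fuse_word_scores`, the recognizer names, the weights, and the made-up posterior values.

```python
import math


def fuse_word_scores(posteriors, weights, floor=1e-12):
    """Weighted log-linear fusion of word posteriors from several recognizers.

    posteriors: {recognizer_name: {word: probability}}
    weights:    {recognizer_name: non-negative weight}
    Returns a normalized {word: probability} over the words that every
    recognizer has scored.
    """
    # Only fuse words that every recognizer has scored.
    shared = set.intersection(*(set(p) for p in posteriors.values()))
    if not shared:
        raise ValueError("recognizers share no word hypotheses")

    # Weighted sum of log-probabilities; the floor avoids log(0).
    log_scores = {
        word: sum(
            weights[name] * math.log(max(p[word], floor))
            for name, p in posteriors.items()
        )
        for word in shared
    }

    # Softmax normalization, shifted by the maximum for numerical stability.
    peak = max(log_scores.values())
    unnorm = {w: math.exp(s - peak) for w, s in log_scores.items()}
    total = sum(unnorm.values())
    return {w: v / total for w, v in unnorm.items()}


# Made-up outputs of three hypothetical recognizers for one utterance.
posteriors = {
    "feature": {"bat": 0.5, "pat": 0.4, "mat": 0.1},
    "phoneme": {"bat": 0.3, "pat": 0.6, "mat": 0.1},
    "word": {"bat": 0.6, "pat": 0.3, "mat": 0.1},
}
weights = {"feature": 1 / 3, "phoneme": 1 / 3, "word": 1 / 3}

fused = fuse_word_scores(posteriors, weights)
best_word = max(fused, key=fused.get)
```

To run the comparison the slide proposes, the fused output would be evaluated on the same test set as each individual recognizer. The weights could be tuned on held-out data rather than fixed to equal values.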
6
References
Geisler, C. D., From Sound to Synapse, Oxford University Press, 1998
Ledoux, J., Synaptic Self: How Our Brains Become Who We Are, New York, 2002
Massaro, D. W., Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry, LEA Publishers, Hillsdale, London, 1987
Marslen-Wilson, W. and Warren, P., “Levels of Perceptual Representation and Process in Lexical Access: Words, Phonemes, and Features”, Psychological Review, Vol. 101, Issue 4, pp. 653-675, 1994
Massaro, D. W., Understanding Language – An Information Processing Analysis of Speech Perception, Reading, and Psycholinguistics, Academic Press, New York, 1975
McClelland, J. L. and Elman, J. L., “The TRACE Model of Speech Perception”, Cognitive Psychology, Vol. 18, pp. 1-86, 1986
Oden, G. C. and Massaro, D. W., “Integration of Featural Information in Speech Perception”, Psychological Review, Vol. 85, pp. 172-191, 1978
Plomp, R., The Intelligent Ear, LEA Publishers, Mahwah, London, 2002
Warren, R. M., Auditory Perception – A New Analysis and Synthesis, Cambridge University Press, 1999