1 Perceptual and Neural Modeling
Automatic Speech Attribute Transcription (ASAT) Project
Sorin Dusan
Center for Advanced Information Processing, Rutgers University, Piscataway, NJ
Project Kickoff Meeting, Rutgers University, 9-13-04

2 Can Automatic Speech Recognition Learn from Human Speech Perception?

- The human auditory system as a model (Geisler ’98, Warren ’99, Plomp ’02, Ledoux ’02)
- The neuro-cognitive process of speech perception is still not fully understood
- Auditory processing and speech perception are better understood today than 30-50 years ago, thanks to advances in technology: functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and magnetoencephalography (MEG)
- Better models of speech perception now explain the data (e.g., FLMP, Oden & Massaro ’78; TRACE, McClelland & Elman ’86)
- Speech perception can be viewed as a process related to other perceptual processes (e.g., reading, Massaro ’87)
- Take an engineering look at recent findings from neuroscience and psychology about the auditory system and speech perception

Sorin Dusan, Sept. 13, 2004, NSF ASAT Project

3 Automatic Speech Recognition: from Sound to Words

- What are the possible levels of perceptual representation in speech: words, phonemes, features?
- Using subword units for ASR is extremely appealing because it makes modeling more efficient, but …
- Any kind of subword “unit” used for speech recognition could reduce the accuracy of the sound-to-words mapping
- Is it possible to replace the phoneme? Is it the right time to dethrone the phoneme in speech processing?

[Figure: Neural Speech Processing. Levels labeled Sound, Features, Phonemes, Words]

4 Automatic Speech Recognition: from Sound to Words

- ASR can be seen simply as a mapping from acoustics to words, with no hard-coded intermediate units
- Can one build a system that maps sound or features directly to lexical representations? (Marslen-Wilson & Warren ’94)
- What are the architectural implications of such a mapping for the system (levels, complexity, processing time, etc.)?

[Figure: Hypothesis 1. Labels: Speech Sound, Measurements, Phonological Features, Word 1, Word 2, Word 3, ..., Word N; levels numbered 1 to 4. Complexity: 1 -> 2 -> 3 -> 4]
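To make the idea of a direct feature-to-word mapping concrete, here is a minimal sketch and not the project's method. It assumes a hypothetical toy lexicon in which each word is stored as a sequence of binary phonological feature vectors (voiced, nasal, vowel), and it assumes the feature detectors output one probability per feature per segment. The names `LEXICON`, `log_likelihood`, and `recognize` are invented for illustration, and the sketch sidesteps time alignment by comparing only sequences of equal length.

```python
import math

# Hypothetical toy lexicon: every word is stored directly as a sequence of
# binary feature vectors (voiced, nasal, vowel). There is no phoneme layer.
LEXICON = {
    "ma": [(1, 1, 0), (1, 0, 1)],
    "pa": [(0, 0, 0), (1, 0, 1)],
    "am": [(1, 0, 1), (1, 1, 0)],
}


def log_likelihood(detections, template, eps=1e-6):
    """Log-probability that the detector outputs agree with a word template.

    detections: one tuple of feature probabilities per segment.
    template:   one tuple of binary feature targets per segment.
    Features are treated as independent (an assumption of this sketch).
    """
    total = 0.0
    for frame, target in zip(detections, template):
        for p, t in zip(frame, target):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += math.log(p if t == 1 else 1.0 - p)
    return total


def recognize(detections):
    """Return the word whose feature template best explains the detections."""
    scores = {
        word: log_likelihood(detections, template)
        for word, template in LEXICON.items()
        if len(template) == len(detections)  # no time alignment in this sketch
    }
    return max(scores, key=scores.get)


# Detector outputs for two segments: strongly voiced and nasal, then a vowel.
detections = [(0.9, 0.8, 0.1), (0.95, 0.1, 0.9)]
print(recognize(detections))
```

A realistic system would need time alignment (for example, dynamic programming over segments) and a far larger lexicon, which is where the questions about levels, complexity, and processing time raised on this slide become pressing.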

5 Automatic Speech Recognition: from Sound to Words

- Speech recognition could be a heterogeneous process that uses multiple types of phonological representation simultaneously (features, phonemes, diphones, syllables, words)
- Test this hypothesis by building a hybrid system that uses, for example, both features and phonemes, and compare its performance with that of the individual systems
- Add to the system a top-down structure for context and knowledge integration that uses the same processing principle as the bottom-up structure (Plomp ’02, Massaro ’75)

[Figure: Hypothesis 2. Labels: Speech Sound, Feature-Based Recognizer, Phoneme-Based Recognizer, Word-Based Recognizer, Fusion, Word 1, Word 2, ..., Word N]
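The fusion stage of such a hybrid system could be sketched as follows. The slide does not say how the recognizer outputs are combined, so this sketch assumes a weighted log-linear (product-of-experts) combination of per-word posteriors. The function name `fuse`, the weights, and the example scores are all hypothetical.

```python
import math


def fuse(posteriors, weights):
    """Log-linear fusion of word posteriors from several recognizers.

    posteriors: {recognizer_name: {word: probability}}
    weights:    {recognizer_name: non-negative weight}
    Returns a normalized {word: probability} over the words that every
    recognizer scored.
    """
    shared = set.intersection(*(set(p) for p in posteriors.values()))
    log_scores = {
        word: sum(
            weights[name] * math.log(max(scores[word], 1e-12))
            for name, scores in posteriors.items()
        )
        for word in shared
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {w: math.exp(s - top) for w, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {w: v / total for w, v in unnormalized.items()}


# Made-up posteriors from the three recognizers named on the slide.
posteriors = {
    "feature": {"cat": 0.5, "cap": 0.4, "cut": 0.1},
    "phoneme": {"cat": 0.3, "cap": 0.6, "cut": 0.1},
    "word": {"cat": 0.6, "cap": 0.2, "cut": 0.2},
}
weights = {"feature": 1.0, "phoneme": 1.0, "word": 1.0}

fused = fuse(posteriors, weights)
print(max(fused, key=fused.get))
```

In this made-up example the phoneme-based recognizer alone prefers "cap", while the fused decision is "cat". Comparing fused and individual decisions in this way on real data is the kind of test the slide proposes.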

6 References

- Geisler, C. D., From Sound to Synapse, Oxford University Press, 1998
- Ledoux, J., Synaptic Self: How Our Brains Become Who We Are, New York, 2002
- Massaro, D. W., Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry, LEA Publishers, Hillsdale, London, 1987
- Marslen-Wilson, W. and Warren, P., “Levels of Perceptual Representation and Process in Lexical Access: Words, Phonemes, and Features”, Psychological Review, Vol. 101, Issue 4, pp. 653-675, 1994
- Massaro, D. W., Understanding Language: An Information Processing Analysis of Speech Perception, Reading, and Psycholinguistics, Academic Press, New York, 1975
- McClelland, J. L. and Elman, J. L., “The TRACE Model of Speech Perception”, Cognitive Psychology, Vol. 18, pp. 1-86, 1986
- Oden, G. C. and Massaro, D. W., “Integration of Featural Information in Speech Perception”, Psychological Review, Vol. 85, pp. 172-191, 1978
- Plomp, R., The Intelligent Ear, LEA Publishers, Mahwah, London, 2002
- Warren, R. M., Auditory Perception: A New Analysis and Synthesis, Cambridge University Press, 1999

