Published by Alexina Stafford. Modified over 9 years ago.
1
Computers that read, hear and understand
Prof. Wolfgang Wahlster
German Research Center for Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster
Brain and Communication, Mainz, Friday, 24 November 2000
2
© Wolfgang Wahlster, DFKI
Pervasive Speech and Language Technology
Speech-controlled coffee machine: "A cappuccino in 10 minutes, please!"
Dictation: "Send the following email to Mark Maybury: Hi Mark, please forward the following agenda to your project partners!"
Speech-based car navigation: "Let's go to Baker Street in Berkeley!"
Speech-enabled music selection: "I would like to hear Mozart's Piano Concerto No. 3!"
3
Pervasive Speech and Language Technology
Applications: Information on Demand, Audio Mining, Speech-to-Speech Translation
Example requests:
"Show me all CNN news of the last 3 months that feature Bill Clinton discussing health care!"
"What has Jim Hendler said about DAML during our recent Dagstuhl seminar?"
"I would like to make an appointment with Dr. Kurematsu in Kyoto next week!"
4
Three Levels of Language Processing
Speech Input
Speech Recognition: What has the speaker said? 100 alternatives. Knowledge sources: Acoustic Language Models, Word Lists
Speech Analysis: What has the speaker meant? 10 alternatives. Knowledge sources: Grammar, Lexical Meaning
Speech Understanding: What does the speaker want? Unambiguous understanding in the dialog context. Knowledge sources: Discourse Context, Knowledge about Domain of Discourse
Each level reduces the uncertainty left by the previous one.
5
Challenges for Language Engineering
Each dimension is listed in order of increasing complexity.
Input Conditions: Close-Speaking Microphone/Headset, Push-to-talk; Telephone, Pause-based Segmentation; Open Microphone, GSM Quality
Naturalness: Isolated Words; Read Continuous Speech; Spontaneous Speech
Adaptability: Speaker Dependent; Speaker Independent; Speaker Adaptive
Dialog Capabilities: Monolog Dictation; Information-seeking Dialog; Multiparty Negotiation
6
Context-Sensitive Speech-to-Speech Translation
Verbmobil Server
German: Wann fährt der nächste Zug nach Hamburg ab?
English: When does the next train to Hamburg depart?
German: Wo befindet sich das nächste Hotel?
English: Where is the nearest hotel?
7
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Verbmobil Speech Translation Server
Solution: a conference call. The Verbmobil Speech Translation Server is accessed by GSM mobile phones.
8
Speech-to-Speech Translation
9
The Control Panel of Verbmobil
10
General Speech Recognition Task
Audio Signal -> Recognizers (German, English, Japanese) -> Word Hypotheses Graph
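To make the idea of a word hypotheses graph concrete, here is a minimal Python sketch, not Verbmobil's actual data structure. It assumes a toy lattice whose edges carry a word and a cost (lower means more likely) and whose node ids are already in topological order. The edge list, the costs and the function name `best_path` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lattice: (start_node, end_node, word, cost).
# Competing edges between the same nodes are alternative word hypotheses.
EDGES = [
    (0, 1, "I", 0.1),
    (1, 2, "need", 0.3),
    (1, 2, "knead", 1.2),
    (2, 3, "a", 0.2),
    (3, 4, "car", 0.4),
    (3, 4, "cart", 0.9),
]


def best_path(edges, start, goal):
    """Return (total_cost, words) for the cheapest path from start to goal.

    Assumes every edge goes from a lower node id to a higher one, so
    visiting nodes in ascending order is a valid topological order.
    Returns None if the goal cannot be reached.
    """
    outgoing = defaultdict(list)
    for src, dst, word, cost in edges:
        outgoing[src].append((dst, word, cost))

    best = {start: (0.0, [])}  # node -> (cost so far, words so far)
    for node in sorted(outgoing):
        if node not in best:
            continue  # node is not reachable from start
        cost_so_far, words_so_far = best[node]
        for dst, word, cost in outgoing[node]:
            candidate = (cost_so_far + cost, words_so_far + [word])
            if dst not in best or candidate[0] < best[dst][0]:
                best[dst] = candidate
    return best.get(goal)


if __name__ == "__main__":
    total, words = best_path(EDGES, 0, 4)
    print(" ".join(words))
```

Later processing stages can work on the whole graph rather than only on the single best path, which is what lets additional knowledge sources rule out competing hypotheses.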
11
Extracting Statistical Properties from Large Corpora
Corpora: Transcribed Speech Data; Segmented Speech with Prosodic Labels; Annotated Dialogs with Dialog Acts; Treebanks & Predicate-Argument Structures; Aligned Bilingual Corpora
Learned models: Hidden Markov Models; Neural Nets, Multilayered Perceptrons; Probabilistic Automata; Probabilistic Grammars; Probabilistic Transfer Rules
Machine Learning for the Integration of Statistical Properties into Symbolic Models for Speech Recognition, Parsing, Dialog Processing, Translation
12
The Use of Prosodic Information at All Processing Stages
Input to the Multilingual Prosody Module: Speech Signal, Word Hypotheses Graph
Prosodic features: duration, pitch, energy, pause
Information passed on: Boundary Information, Sentence Mood, Accented Words, Prosodic Feature Vector
Uses: Search Space Restriction (Parsing); Dialog Act Segmentation and Recognition (Dialog Understanding); Constraints for Transfer (Translation); Lexical Choice (Generation); Speaker Adaptation (Speech Synthesis)
13
The Understanding of Spontaneous Speech Repairs
Original utterance: "I need a car next Tuesday oops Monday"
Reparandum: "Tuesday". Hesitation (editing phase): "oops". Reparans (repair phase): "Monday".
Recognition of Substitutions -> Transformation of the Word Hypothesis Graph -> "I need a car next Monday"
Verbmobil technology: understands speech repairs and extracts the intended meaning.
Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech and NaturallySpeaking cannot deal with spontaneous speech and transcribe the corrupted utterances.
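The reparandum / hesitation / reparans pattern can be illustrated with a small rule-based Python sketch. This is not Verbmobil's method, which operates on the word hypothesis graph. The sketch works on a plain token list, handles only single-word substitutions, and uses a hypothetical list of editing terms and a hypothetical weekday check to decide whether two words are substitutable.

```python
# Hypothetical editing terms that signal a self-correction.
EDIT_TERMS = {"oops", "uh", "um", "äh"}

# Hypothetical semantic category used to license a substitution.
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}


def both_weekdays(a, b):
    """True if both tokens are weekday names."""
    return a.lower() in WEEKDAYS and b.lower() in WEEKDAYS


def repair(tokens, same_category=both_weekdays):
    """Replace a one-word reparandum with its reparans.

    Pattern: <reparandum> <editing term> <reparans>, where reparandum
    and reparans belong to the same category. Anything that does not
    match the pattern is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        is_repair = (
            tok.lower() in EDIT_TERMS
            and out
            and i + 1 < len(tokens)
            and same_category(out[-1], tokens[i + 1])
        )
        if is_repair:
            out.pop()  # drop the reparandum
            i += 1     # skip the editing term; the reparans is appended next
            continue
        out.append(tok)
        i += 1
    return out


if __name__ == "__main__":
    utterance = "I need a car next Tuesday oops Monday".split()
    print(" ".join(repair(utterance)))
```

A multi-word repair such as "in Mannheim, äh, in Saarbrücken" is outside the scope of this sketch, because reparandum and reparans there each span a whole phrase.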
14
Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
German: Wir treffen uns in Mannheim, äh, in Saarbrücken. (We are meeting in Mannheim, oops, in Saarbruecken.)
English: We are meeting in Saarbruecken.
15
Spoken Dialogs about Schedules
Fielded applications
Train schedules (German Railway System, DB): TABA (Philips) +49 241 60 40 20; OSCAR (DaimlerChrysler) +49 1805 99 66 22
Flight schedules (Lufthansa): ALF (Philips) +49 1803 00 00 74
Technical challenges: phone-based dialogs, many proper names, clarification subdialogs
16
Linguatronic: Spoken Dialogs with Mercedes-Benz
Microphone, Push-to-talk Switch
Example commands: "Please call Doris Wahlster." "Open the left window in the back." "I want to hear the weather channel." "When will I reach the next gas station?" "Where is the next parking lot?"
Speech control of: cellular phone, radio, windows / AC, route guidance system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent; garbage models for non-speech (blinker, AC, wheels)
17
Speech-based Interaction with an Organizer on a WAP Phone (Voice In - WML out)
"With Maier on 25 October, with Tetzlaff, and with Streit too. Oops, not with Streit. From 2 to 3. Okay!"
18
Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library
Mobile Dialog with a Virtual Tourist Guide for the Heidelberg Castle
Location-adaptive Query Interpretation
19
Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library
Multimodal Route Description
Mobile Speech Translation and Multilingual Information Access
20
Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Access to a Digital Library
Speech-based Access to 3D Virtual Views
Multimodal Output from a Digital Library and Speech-based Access to Internet Content
21
International Research Trends in Multilingual Systems
Multilingual and Mobile Communication Assistants
Multimodal Interfaces: SmartKom
Speech-based Web Access to Multilingual Web Pages: WAP Phones, WebTV
Multilingual Audio Retrieval and Audio Mining: Discussions, Lecture Notes, Organizers
Multilingual Indexing and Annotation of Videos: Video Archives, News Archives
Telephone Translations: Call Centers, ECommerce, Mobile Travel Assistance
Dialog Translation: Verbmobil
Multilingual Language Technology: Speech Recognition, Language Understanding, Language Generation, and Speech Synthesis
Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding
22
Open Problems for the Next Decade
Problems with current machine learning approaches: expensive data collection, cognitively unrealistic training data, data sparseness
Problems with current hand-crafted knowledge sources: brittleness, domain dependence, limited scalability
23
A Speculative Conclusion (+50 years)
-500 years: Oral Society. News and knowledge are passed orally. No mass storage, no automatic processing, no automatic retrieval.
TODAY: Textual Society. News and knowledge are passed textually. Mass storage of texts, text processing, text retrieval.
+50 years: Oral Society. News and knowledge are passed orally. Mass storage of speech, speech processing, audio retrieval.