Wolfgang Wahlster German Research Center for Artificial Intelligence, DFKI GmbH Stuhlsatzenhausweg 3 66123 Saarbruecken, Germany phone: (+49 681) 302-5252/4162.


1 Pervasive Speech and Language Technology
Dagstuhl 2000
Wolfgang Wahlster
German Research Center for Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de
WWW: http://www.dfki.de/~wahlster

2 Dagstuhl 2000 © Wolfgang Wahlster, DFKI
Pervasive Speech and Language Technology
- Speech-controlled coffee machine: "A cappuccino in 10 minutes, please!"
- Dictation: "Send the following email to Mark Maybury: Hi Mark, please forward the following agenda to your project partners!"
- Speech-based car navigation: "Let's go to Baker Street in Berkeley!"
- Speech-enabled music selection: "I would like to hear Mozart's piano concerto No. 3!"

3 Pervasive Speech and Language Technology
- Information on demand: "Show me all CNN news of the last 3 months that feature Bill Clinton discussing health care!"
- Audio Mining: "What has Jim Hendler said about DAML during our recent Dagstuhl seminar?"
- Speech-to-Speech Translation: "I would like to make an appointment with Dr. Kuremastu in Kyoto next week!"

4 Three Levels of Language Processing
Speech input passes through three levels, each reducing uncertainty:
- Speech Recognition: What has the speaker said? (100 alternatives) Knowledge sources: acoustic language models, word lists
- Speech Analysis: What has the speaker meant? (10 alternatives) Knowledge sources: grammar, lexical meaning
- Speech Understanding: What does the speaker want? (unambiguous understanding in the dialog context) Knowledge sources: discourse context, knowledge about the domain of discourse

5 Challenges for Language Engineering
Four dimensions, each listed in order of increasing complexity (the diagram is labeled "Verbmobil"):
- Input Conditions: close-speaking microphone/headset with push-to-talk; telephone with pause-based segmentation; open microphone, GSM quality
- Naturalness: isolated words; read continuous speech; spontaneous speech
- Adaptability: speaker dependent; speaker independent; speaker adaptive
- Dialog Capabilities: monolog dictation; information-seeking dialog; multiparty negotiation

6 Context-Sensitive Speech-to-Speech Translation
Verbmobil Server
- German: "Wann fährt der nächste Zug nach Hamburg ab?" English: "When does the next train to Hamburg depart?"
- German: "Wo befindet sich das nächste Hotel?" English: "Where is the nearest hotel?"

7 Mobile Speech-to-Speech Translation of Spontaneous Dialogs
As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations:
1. face-to-face conversations
2. telecommunication

8 Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Verbmobil Speech Translation Server
Solution: a conference call. The Verbmobil Speech Translation Server is accessed by GSM mobile phones.

9 Speech-to-Speech Translation

10 The Control Panel of Verbmobil

11 General Speech Recognition Task
Audio Signal -> Recognizers (German, English, Japanese) -> Word Hypotheses Graph

12 Word Hypotheses Graphs (WHGs)
WHGs realize the interface between acoustic and linguistic processing.
Diagram labels: Edge = Word; Acoustic Score; Best Hypothesis
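As a minimal sketch of the idea on slide 12, a word hypotheses graph can be modeled as a directed acyclic graph whose edges carry a word and an acoustic score, with the best hypothesis being the cheapest path through it. The edge list, the cost values, and the function name `best_hypothesis` are illustrative assumptions, not taken from Verbmobil.

```python
from collections import defaultdict

# Hypothetical WHG: each edge is (start_node, end_node, word, acoustic_cost).
# Lower cost means a better acoustic match; all numbers are made up.
EDGES = [
    (0, 1, "I", 1.0),
    (0, 1, "eye", 2.5),
    (1, 2, "need", 1.2),
    (1, 2, "knead", 3.0),
    (2, 3, "a", 0.5),
    (3, 4, "car", 1.1),
    (3, 4, "cart", 1.9),
]

def best_hypothesis(edges, start, end):
    """Return (total_cost, words) of the cheapest path from start to end.

    Assumes node ids are integers ordered in time, so every edge runs
    from a smaller id to a larger one (the graph is acyclic).
    """
    outgoing = defaultdict(list)
    for src, dst, word, cost in edges:
        outgoing[src].append((dst, word, cost))

    best = {start: (0.0, [])}  # node -> (cost so far, words so far)
    for node in range(start, end):
        if node not in best:
            continue  # node not reachable from start
        cost_here, words_here = best[node]
        for dst, word, cost in outgoing[node]:
            candidate = (cost_here + cost, words_here + [word])
            if dst not in best or candidate[0] < best[dst][0]:
                best[dst] = candidate
    return best[end]

cost, words = best_hypothesis(EDGES, 0, 4)
print(" ".join(words))  # prints "I need a car"
```

Keeping the competing edges ("eye", "knead", "cart") in the graph rather than discarding them early is what lets later linguistic processing overrule the acoustically best path.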

13 Massive Data Collection Efforts
The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations:
Transliteration Variant 1, Transliteration Variant 2, Lexical Orthography, Canonical Pronunciation, Manual Phonological Segmentation, Automatic Phonological Segmentation, Word Segmentation, Prosodic Segmentation, Dialog Acts, Noises, Superimposed Speech, Syntactic Category, Word Category, Syntactic Function, Prosodic Boundaries
Corpus: 3,200 dialogs (182 hours) with 1,658 speakers; 79,562 turns distributed on 56 CDs, 21.5 GB

14 Extracting Statistical Properties from Large Corpora
Machine learning for the integration of statistical properties into symbolic models for speech recognition, parsing, dialog processing, translation
Corpora: Transcribed Speech Data; Segmented Speech with Prosodic Labels; Annotated Dialogs with Dialog Acts; Treebanks & Predicate-Argument Structures; Aligned Bilingual Corpora
Models: Hidden Markov Models; Neural Nets, Multilayered Perceptrons; Probabilistic Automata; Probabilistic Grammars; Probabilistic Transfer Rules

15 From Multi-Agent Architectures to Multi-Blackboard Architectures
Multi-Agent Architecture (diagram: modules M1 to M6 connected directly)
- Each module must know which module produces what data
- Direct communication between modules
- Each module has only one instance
- Heavy data traffic for moving copies around
- Multiparty and telecooperation applications are impossible
- Software: ICE and ICE Master
- Basic Platform: PVM
Multi-Blackboard Architecture (diagram: modules M1 to M6 connected through blackboards BB 1 to BB 3)
- All modules can register for each blackboard dynamically
- No direct communication between modules
- Each module can have several instances
- No copies of representation structures (word lattice, VIT chart)
- Multiparty and telecooperation applications are possible
- Software: PCA and Module Manager
- Basic Platform: PVM
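The contrast on slide 15 can be illustrated with a small sketch: modules register for named blackboards at run time and never call each other directly, and several instances can read the same shared entry without copying it. The class `Blackboards`, the board names, and the toy modules are hypothetical; this is not the PCA or Module Manager software.

```python
from collections import defaultdict

class Blackboards:
    """Hypothetical stand-in for a set of named blackboards."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # board name -> callbacks
        self._entries = defaultdict(list)      # board name -> shared data

    def register(self, board, callback):
        """A module instance registers for a blackboard at run time."""
        self._subscribers[board].append(callback)

    def publish(self, board, item):
        """Store one shared entry and notify every registered instance."""
        self._entries[board].append(item)
        for callback in list(self._subscribers[board]):
            callback(item)

    def entries(self, board):
        return list(self._entries[board])


boards = Blackboards()
log = []

def recognizer(audio):
    # Toy recognizer: writes a word lattice and knows nothing about readers.
    boards.publish("word_lattice", audio.split())

def make_parser(name):
    def parse(lattice):
        log.append((name, len(lattice)))
        # The lattice is passed by reference, so no copy is made.
        boards.publish("vits", {"parser": name, "words": lattice})
    return parse

boards.register("audio", recognizer)
boards.register("word_lattice", make_parser("chunk"))
boards.register("word_lattice", make_parser("hpsg"))

boards.publish("audio", "let us meet in Saarbruecken")
print(log)  # prints [('chunk', 5), ('hpsg', 5)]
```

Adding a third parser here only takes one more `register` call; neither the recognizer nor the existing parsers need to change, which is the property the slide attributes to the multi-blackboard design.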

16 A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules
Blackboards: Audio Data; Word Hypotheses Graph with Prosodic Labels; VITs (Underspecified Discourse Representations)
Modules: Command Recognizer, Spontaneous Speech Recognizer, Channel/Speaker Adaptation, Prosodic Analysis, Statistical Parser, Dialog Act Recognition, Chunk Parser, HPSG Parser, Semantic Construction, Robust Dialog Semantics, Semantic Transfer, Generation

17 The Use of Prosodic Information at All Processing Stages
Input: Speech Signal, Word Hypotheses Graph
Multilingual Prosody Module
Prosodic features: duration, pitch, energy, pause
Information passed to later stages: Boundary Information, Sentence Mood, Accented Words, Prosodic Feature Vector
Stages and uses: Search Space Restriction; Parsing; Dialog Act Segmentation and Recognition; Dialog Understanding; Constraints for Transfer; Translation; Lexical Choice; Generation; Speech Synthesis; Speaker Adaptation

18 Competing Strategies for Robust Speech Translation
Concurrent processing modules combine deep semantic translation with shallow surface-oriented translation methods.
Input: Word Lattice
Expensive, but precise translation:
- Principled and compositional syntactic and semantic analysis
- Semantic-based transfer of Verbmobil Interface Terms (VITs) as set of underspecified DRS
Cheap, but approximate translation:
- Case-based translation
- Dialog-act based translation
- Statistical translation
Both branches check for a time out and deliver results with confidence values; the best result is selected. Goal: Acceptable Translation Rate.
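The selection step on slide 18 can be sketched as follows: every strategy reports a translation with a confidence value, results that miss the deadline are dropped, and the most confident remaining result wins. The latencies, confidence values, and sample outputs are invented for illustration, and the sketch assumes confidences are directly comparable across strategies, which the slides do not state.

```python
from typing import NamedTuple, Optional

class Result(NamedTuple):
    strategy: str
    translation: str
    confidence: float  # assumed comparable across strategies
    latency_ms: int    # simulated processing time

def select_best(results, deadline_ms) -> Optional[Result]:
    """Pick the most confident result among those that met the deadline."""
    in_time = [r for r in results if r.latency_ms <= deadline_ms]
    if not in_time:
        return None
    return max(in_time, key=lambda r: r.confidence)

# Made-up outputs of four concurrent translation strategies.
results = [
    Result("semantic transfer", "If you prefer another hotel,", 0.92, 900),
    Result("case-based", "If you like another hotel,", 0.75, 200),
    Result("dialog-act based", "Other hotel?", 0.40, 50),
    Result("statistical", "If you prefer other hotel,", 0.70, 150),
]

print(select_best(results, deadline_ms=1000).strategy)  # prints "semantic transfer"
print(select_best(results, deadline_ms=500).strategy)   # prints "case-based"
```

With a generous deadline the expensive deep analysis wins; when it times out, a cheaper shallow result is still available, so some translation is always delivered.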

19 Integrating Shallow and Deep Analysis Components in a Multi-Blackboard Architecture
Augmented Word Hypotheses Graph
-> Chunk Parser, Statistical Parser, HPSG Parser (producing partial VITs)
-> Chart with a combination of partial VITs
-> Robust Dialog Semantics: combination and knowledge-based reconstruction of complete VITs
-> Complete and Spanning VITs

20 VHG: A Packed Chart Representation of Partial Semantic Representations
- Incremental chart construction and anytime processing
- Rule-based combination and transformation of partial UDRS coded as VITs
- Selection of a spanning analysis using a bigram model for VITs (trained on a treebank of 24k VITs)
Parsers feeding Semantic Construction:
- Chunk parser using cascaded finite-state transducers
- Statistical LR parser trained on a treebank
- Very fast HPSG parser

21 The Understanding of Spontaneous Speech Repairs
Original utterance: "I need a car next Tuesday oops Monday"
- Reparandum: "Tuesday"
- Hesitation (editing phase): "oops"
- Reparans (repair phase): "Monday"
Recognition of substitutions, followed by a transformation of the word hypothesis graph, yields: "I need a car next Monday"
Verbmobil technology: understands speech repairs and extracts the intended meaning.
Dictation systems like ViaVoice, VoiceXpress, FreeSpeech, and Naturally Speaking cannot deal with spontaneous speech and transcribe the corrupted utterances.
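The repair pattern on slide 21 (reparandum, hesitation, reparans) can be illustrated on a plain token list. This is a deliberately simplified sketch: it works on one word string rather than on a word hypothesis graph, handles only the first repair it finds, and relies on a hypothetical category table (`CATEGORY`) and editing-term list (`EDITING_TERMS`) invented for the example.

```python
EDITING_TERMS = {"oops", "uh", "äh"}

# Hypothetical word categories; a real system would use a lexicon or tagger.
CATEGORY = {
    "Monday": "weekday", "Tuesday": "weekday",
    "Mannheim": "city", "Saarbruecken": "city",
    "in": "prep",
}

def repair(tokens):
    """Drop the reparandum and the editing term, keeping the reparans."""
    for i, token in enumerate(tokens):
        if token not in EDITING_TERMS or i + 1 >= len(tokens):
            continue
        category = CATEGORY.get(tokens[i + 1])
        if category is None:
            continue
        # Search backwards for the word the speaker is replacing.
        for j in range(i - 1, -1, -1):
            if CATEGORY.get(tokens[j]) == category:
                return tokens[:j] + tokens[i + 1:]
    return tokens

print(" ".join(repair("I need a car next Tuesday oops Monday".split())))
# prints "I need a car next Monday"
```

The same function also covers a multi-word reparandum such as "in Mannheim oops in Saarbruecken", because the backward search stops at the earlier "in" and removes everything from there up to the editing term.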

22 Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
German: "Wir treffen uns in Mannheim, äh, in Saarbrücken." (We are meeting in Mannheim, oops, in Saarbruecken.)
English: "We are meeting in Saarbruecken."

23 Robust Dialog Semantics: Combining and Completing Partial Representations
Recognized: "Let us meet the late afternoon to catch the train to Frankfurt"
Completed: "Let us meet (in) the late afternoon to catch the train to Frankfurt"
The preposition "in" is missing in all paths through the word hypotheses graph.
A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] -> [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] -> [apply(V2, V1, V3)] & V3
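The two rules on slide 23 can be read as small rewrite steps over typed fragments. The sketch below is one hypothetical reading of that notation: the dictionary layout, the placeholder relation string, and the function names `typeraise_to_mod` and `apply_modifier` are assumptions made for illustration, not the VIT formalism itself.

```python
def typeraise_to_mod(fragment):
    """temporal_np(V1) becomes a modifier V2 with an open temporal relation."""
    if fragment["type"] != "temporal_np":
        return None
    # The relation stays underspecified: no preposition was recognized.
    return {"type": "mod", "relation": "temporal(?)", "arg": fragment}

def apply_modifier(prop, mod):
    """A modifier V2 applied to a proposition V1 yields a new proposition V3."""
    if prop["type"] != "prop" or mod["type"] != "mod":
        return None
    return {"type": "prop", "core": prop, "modifiers": [mod]}

# Partial analyses of "Let us meet the late afternoon ..."
meet = {"type": "prop", "text": "let us meet"}
afternoon = {"type": "temporal_np", "text": "the late afternoon"}

mod = typeraise_to_mod(afternoon)
combined = apply_modifier(meet, mod)
print(combined["type"], combined["modifiers"][0]["relation"])
```

The point of the sketch is that the two fragments are joined without ever guessing the missing preposition; the temporal relation is left open for later stages to resolve.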

24 Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads
Input segments:
- Segment 1: "If you prefer another hotel,"
- Segment 2: "please let me know."
Concurrent translation threads: Statistical Translation, Dialog-Act Based Translation, Semantic Transfer, Case-Based Translation
The threads deliver alternative translations with confidence values to the Selection Module, which in this example chooses:
- Segment 1: translated by Semantic Transfer
- Segment 2: translated by Case-Based Translation

25 Unit Selection Algorithm
Sentence to synthesize: "I have time on monday."
[Diagram: token graph from start node S to end node E; labels: Tokens, Edge direction; tokens shown: I, have, time, on, monday]

26 Linguatronic: Spoken Dialogs with Mercedes-Benz
Setup: microphone and push-to-talk switch
Example utterances:
- "Please call Doris Wahlster."
- "Open the left window in the back."
- "I want to hear the weather channel."
- "When will I reach the next gas station?"
- "Where is the next parking lot?"
Speech control of: cellular phone, radio, windows / AC, route guidance system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent; garbage models for non-speech (blinker, AC, wheels)

27 International Research Trends in Multilingual Systems
Foundation: Multilingual Language Technology (Speech Recognition, Language Understanding, Language Generation, and Speech Synthesis); Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding
Application areas:
- Multilingual and Mobile Communication Assistants: Multimodal Interfaces, SmartKom
- Speech-based Web Access to Multilingual Web Pages: WAP Phones, WebTV
- Multilingual Audio Retrieval and Audio Mining: Discussions, Lecture Notes, Organizers
- Multilingual Indexing and Annotation of Videos: Video Archives, News Archives
- Dialog Translation: Call Centers, ECommerce, Mobile Travel Assistance, Telephone Translations, Verbmobil

28 Conclusion I
- Real-world problems in language technology, like the understanding of spoken dialogs, speech-to-speech translation, and multimodal dialog systems, can only be cracked by the combined muscle of deep and shallow processing approaches.
- In a multi-blackboard architecture based on packed representations on all processing levels (speech recognition, parsing, semantic processing, translation, generation), using charts with underspecified representations (e.g., UDRS), the results of concurrent processing threads can be combined in an incremental fashion.

29 Conclusion II
- All results of concurrent processing modules should come with a confidence value, so that a selection module can choose the most promising result at each processing stage.
- Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that the uncertainties can be reduced by linguistic, discourse, and domain constraints as soon as they become applicable.

30 Conclusion III
- Deep processing can be used for merging, completing, and repairing the results of shallow processing strategies.
- Shallow methods can be used to guide the search in deep processing.
- Statistical methods must be augmented by symbolic models (e.g., class-based language modelling, word order normalization as part of statistical translation).
- Statistical methods can be used to learn operators or selection strategies for symbolic processes.
It is much more than a balancing act... (see Klavans and Resnik 1996)

31 Open Problems for the Next Decade
- Problems with current machine learning approaches
  - Expensive data collection
  - Cognitively unrealistic training data
  - Data sparseness
- Problems with current hand-crafted knowledge sources
  - Brittleness
  - Domain dependence
  - Limited scalability

32 A Speculative Conclusion (+50 years)
Oral Society -> Textual Society -> Oral Society
-500 years, Oral Society: news and knowledge are passed orally; no mass storage; no automatic processing; no automatic retrieval
TODAY, Textual Society: news and knowledge are passed textually; mass storage of texts; text processing; text retrieval
+50 years, Oral Society: news and knowledge are passed orally; mass storage of speech; speech processing; audio retrieval

