Robust Translation of Spontaneous Speech: A Multi-Engine Approach
Wolfgang Wahlster, German Research Center for Artificial Intelligence (DFKI GmbH)
Seventeenth International Joint Conference on Artificial Intelligence, IJCAI-01, Seattle, Wednesday, 8 August 2001
© Wolfgang Wahlster, DFKI GmbH Mobile Speech-to-Speech Translation of Spontaneous Dialogs: As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations, both in (1) face-to-face conversations and (2) telecommunication.
© Wolfgang Wahlster, DFKI GmbH Mobile Speech-to-Speech Translation of Spontaneous Dialogs Verbmobil Speech Translation Server Conference Call: The Verbmobil Speech Translation Server connects GSM cell phone users
© Wolfgang Wahlster, DFKI GmbH Robust Realtime Translation with Verbmobil At a German Airport: An American business man calls the secretary of a German business partner.
© Wolfgang Wahlster, DFKI GmbH Outline: (1) Verbmobil's Multi-Blackboard and Multi-Engine Architecture, (2) Exploiting Underspecification in a Multi-Stratal Semantic Representation Language, (3) Combining Deep and Shallow Processing Strategies for Robust Dialog Translation, (4) Evaluation and Technology Transfer, (5) Lessons Learned and Conclusions.
© Wolfgang Wahlster, DFKI GmbH Telephone-based Dialog Translation: the German and the American dialog partners are connected via Bianca/Brick XS BinTec ISDN-LAN routers to the Verbmobil server cluster (Sun Server 450, LINUX server, Sun ULTRA 60/80), which translates German to English and English to German. The ISDN conference call has three participants — the German speaker, Verbmobil, and the American speaker — and the conference call itself is set up by speech.
© Wolfgang Wahlster, DFKI GmbH Verbmobil: The First Speech-Only Dialog Translation System. The American speaker (on a mobile GSM or DECT phone) says "Verbmobil" (voice dialing) and is connected to the Verbmobil speech-to-speech translation server. Verbmobil: "Welcome to the Verbmobil Translation System. Please speak the telephone number of your partner." American speaker: " ". The foreign participant is placed into the conference call. To the German participant, Verbmobil says: "Verbmobil hat eine neue Verbindung aufgebaut. Bitte sprechen Sie jetzt." (Verbmobil has set up a new connection. Please speak now.) To the American participant: "Welcome to the Verbmobil server. Please start your input after the beep."
© Wolfgang Wahlster, DFKI GmbH Verbmobil is a Multilingual System. It supports bidirectional translation between: German–English (American), German–Japanese, and German–Chinese (Mandarin).
Verbmobil Partners (Phase 2): DFKI (W. Wahlster), DaimlerChrysler, Universität des Saarlandes, Ruhr-Universität Bochum, Universität Hamburg, Universität Karlsruhe, Universität Bielefeld, Technische Universität München, Friedrich-Alexander-Universität Erlangen-Nürnberg, Universität Stuttgart, Rheinische Friedrich-Wilhelms-Universität Bonn, Ludwig-Maximilians-Universität München, TU Braunschweig, Eberhardt-Karls-Universität Tübingen.
© Wolfgang Wahlster, DFKI GmbH Three Levels of Language Processing — Reduction of Uncertainty for Telephone Speech Input. Speech recognition (using acoustic and language models and word lists): What has the caller said? — 100 alternatives. Speech analysis (using the grammar and lexical meaning): What has the caller meant? — 10 alternatives. Speech understanding (using the discourse context and knowledge about the domain of discourse): What does the caller want? — unambiguous understanding in the dialog context.
© Wolfgang Wahlster, DFKI GmbH Challenges for Language Engineering — increasing complexity along four dimensions:
Input Conditions: Close-Speaking Microphone/Headset, Push-to-talk -> Telephone, Pause-based Segmentation -> Open Microphone, GSM Quality
Naturalness: Isolated Words -> Read Continuous Speech -> Spontaneous Speech
Adaptability: Speaker Dependent -> Speaker Independent -> Speaker Adaptive
Dialog Capabilities: Monolog Dictation -> Information-seeking Dialog -> Multiparty Negotiation
Verbmobil combines the most demanding end of each dimension: open microphone with GSM quality, spontaneous speech, speaker adaptivity, and multiparty negotiation.
© Wolfgang Wahlster, DFKI GmbH Verbmobil II: Three Domains of Discourse. Scenario 1 — Appointment Scheduling (When?): focus on temporal expressions; vocabulary size 6,000. Scenario 2 — Travel Planning & Hotel Reservation (When? Where? How?): focus on temporal and spatial expressions. Scenario 3 — PC-Maintenance Hotline (What? When? Where? How?): integration of special sublanguage lexica.
© Wolfgang Wahlster, DFKI GmbH Wann fährt der nächste Zug nach Hamburg ab? When does the next train to Hamburg depart? Wo befindet sich das nächste Hotel? Where is the nearest hotel? Context-Sensitive Speech-to-Speech Translation Verbmobil Server
© Wolfgang Wahlster, DFKI GmbH The Control Panel of Verbmobil
© Wolfgang Wahlster, DFKI GmbH Verbmobil's Massive Data Collection Effort: 3,200 dialogs (182 hours) with 1,658 speakers and 79,562 turns, distributed on 56 CDs (21.5 GB). The so-called Partitur (the German word for musical score) orchestrates fifteen strata of annotations: two transliteration variants, lexical orthography, canonical pronunciation, manual and automatic phonological segmentation, word segmentation, prosodic segmentation, dialog acts, noises, superimposed speech, syntactic category, word category, syntactic function, and prosodic boundaries.
© Wolfgang Wahlster, DFKI GmbH Extracting Statistical Properties from Large Corpora — machine learning for the integration of statistical properties into symbolic models for speech recognition, parsing, dialog processing, and translation: transcribed speech data -> Hidden Markov Models; segmented speech with prosodic labels -> neural nets and multilayer perceptrons; annotated dialogs with dialog acts -> probabilistic automata; treebanks and predicate-argument structures -> probabilistic grammars; aligned bilingual corpora -> probabilistic transfer rules.
© Wolfgang Wahlster, DFKI GmbH Multilinguality. [Chart: word accuracy (%) of the German, English, and Japanese recognizers across the evaluations VM1, '97, '98, '99.1, and '99.2.]
© Wolfgang Wahlster, DFKI GmbH Multilinguality — Language Identification (LID): an independent LID module routes the speech input to the German, English, or Japanese recognizer, which produces the word sequence w1 … wn.
© Wolfgang Wahlster, DFKI GmbH From a Multi-Agent Architecture to a Multi-Blackboard Architecture. Verbmobil I (multi-agent architecture): each module must know which module produces what data; direct communication between modules; heavy data traffic for moving copies around. Verbmobil II (multi-blackboard architecture): all modules can register for each blackboard dynamically; no direct communication between modules; no copies of representation structures (word lattice, VIT chart).
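The contrast between the two architectures can be made concrete in a few lines of code. This is only a toy publish/subscribe sketch with hypothetical module names, not the actual Verbmobil middleware; it shows how modules register for blackboards dynamically and exchange one shared structure without direct module-to-module links or copying.

```python
class Blackboard:
    """A named data pool: modules subscribe to it instead of addressing each other."""
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def register(self, callback):
        # Any module may register at runtime; no fixed producer/consumer wiring.
        self.subscribers.append(callback)

    def post(self, item):
        # The posted structure is shared by reference, not copied per consumer.
        for callback in self.subscribers:
            callback(item)

boards = {}
def blackboard(name):
    return boards.setdefault(name, Blackboard(name))

# Hypothetical modules: two parsers register for the word-lattice blackboard;
# a recognizer posts its result without knowing who will read it.
blackboard("word_lattice").register(lambda lat: print("chunk parser got", lat))
blackboard("word_lattice").register(lambda lat: print("HPSG parser got", lat))
blackboard("word_lattice").post({"words": ["he", "is", "coming"]})
```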
© Wolfgang Wahlster, DFKI GmbH Multi-Blackboard/Multi-Engine Architecture: processing modules communicate via a cascade of blackboards — Blackboard 1: preprocessed speech signal; Blackboard 2: word lattice; Blackboard 3: syntactic representation (parsing results); Blackboard 4: semantic representation (lambda DRS); Blackboard 5: dialog acts.
© Wolfgang Wahlster, DFKI GmbH A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules: the audio data pass through channel/speaker adaptation and are processed by the command recognizer, the spontaneous speech recognizer, and prosodic analysis, producing a word hypotheses graph with prosodic labels. This graph is consumed by the statistical parser, the chunk parser, the HPSG parser, and dialog act recognition; semantic construction and robust dialog semantics produce VITs and underspecified discourse representations, which feed semantic transfer and generation.
© Wolfgang Wahlster, DFKI GmbH VIT (Verbmobil Interface Terms) as a Multi-Stratal Representation Language: used as a common representation scheme for information exchange between all components and processing threads; its design is inspired by underspecified discourse representation structures (UDRS, Reyle/Kamp 1993); it provides a compact representation of lexical and structural ambiguities and of scope underspecifications of quantifiers, negations, and adverbs; it consists of variable-free sets of non-recursive terms, e.g. [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]; these streams of literals are flat multi-stratal representations that are very efficient for incremental processing.
© Wolfgang Wahlster, DFKI GmbH VIT for 'He is coming at the beginning of August':
vit(vitID(sid(104, a, en, 10, 80, 1, en, y, semantics)), % segment identifier
 [word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]), word(at, 4, [36]), word(the, 5, [28]), word(beginning, 6, [35]), word(of, 7, [35]), word(``August'', 8, [34])], % WHG string
 index(38, 25, i35), % index
 [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)], % conditions
 [in_g(26, 25), in_g(37, 38), in_g(27, 25), in_g(28, 30), in_g(31, 33), in_g(34, 32), in_g(35, 29), in_g(36, 25), leq(25, h41), leq(25, h43), leq(29, h42), leq(29, h44), leq(30, h43), leq(32, h45), leq(33, h43)], % scope and grouping constraints
 [s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)], % sortal specifications for instance variables
 [dialog_act(25, inform), dir(36, no), prontype(i36, third, std)], % discourse and pragmatics
 [cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg), num(i38, sg), pcase(l135, i38, of)], % syntax
 [ta_aspect(i35, progr), ta_mood(i35, ind), ta_perf(i35, nonperf), ta_tense(i35, pres)], % tense and aspect
 [pros_accent(35)]) % prosody
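To give a feel for how such a flat term set can be processed, here is a small Python rendering of the conditions and sorts of the VIT above (my own illustration; the original VITs are Prolog terms): because every entry is a non-recursive literal over constant labels and instances, incremental operations reduce to filtering and appending.

```python
# Each condition is (predicate, label, *args); labels and instances are plain constants.
conditions = [
    ("beginning", 35, "i37"), ("arg3", 35, "i37", "i38"),
    ("come", 27, "i35"),      ("arg1", 27, "i35", "i36"),
    ("decl", 37, "h43"),      ("pron", 26, "i36"),
    ("at", 36, "i35", "i37"), ("mofy", 34, "i38", "aug"),
    ("def", 28, "i37", "h42", "h41"), ("udef", 31, "i38", "h45", "h44"),
]
sorts = {"i35": "situation", "i37": "time", "i38": "time"}

def conditions_about(instance):
    """All literals that mention a given instance constant."""
    return [c for c in conditions if instance in c[2:]]

# Because the representation is flat and variable-free, cross-layer lookup
# is just filtering over constant symbols:
print(conditions_about("i37"))
print(sorts["i38"])
```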
© Wolfgang Wahlster, DFKI GmbH Information between Layers is Linked Together Using Constant Symbols — instances are constants interpreted as skolemized variables:
WHG string: [word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]), word(at, 4, [36]), word(the, 5, [28]), word(beginning, 6, [35]), word(of, 7, [35]), word(``August'', 8, [34])]
Conditions: [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]
Sorts: [s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)]
Syntax: [cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg)]
© Wolfgang Wahlster, DFKI GmbH The Use of Underspecified Representations: the two readings of the source sentence "Wir telephonierten mit Freunden aus Schweden." are mapped to a single underspecified semantic representation, which yields an ambiguity-preserving translation with the same two readings in the target language: "We called friends from Sweden." (Either the friends are from Sweden, or the call was made from Sweden.) The result is a compact representation of such ambiguities in a logical language without using disjunctions.
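A toy sketch (my own, far simpler than Verbmobil's UDRS machinery) of why one underspecified structure beats a disjunction of readings: the attachment of "aus Schweden" is left open in a single representation, and the two readings are enumerated only if a later step — for example a target language that forces the choice — actually asks for them.

```python
# One underspecified analysis instead of two complete readings.
underspecified = {
    "predicates": ["call(e, we, friends)", "from(x, sweden)"],
    # The argument of 'from' is left open: it may attach to the friends
    # or to the calling event itself.
    "open_attachment": ("x", ["friends", "e"]),
}

def resolve(analysis):
    """Enumerate the fully specified readings only on demand."""
    var, candidates = analysis["open_attachment"]
    for cand in candidates:
        yield [p.replace(var, cand) for p in analysis["predicates"]]

# English can keep the ambiguity ("We called friends from Sweden"), so
# resolve() is only called if a target language forces a choice.
for reading in resolve(underspecified):
    print(reading)
```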
© Wolfgang Wahlster, DFKI GmbH Verbmobil is the First Dialog Translation System that Uses Prosodic Information Systematically at All Processing Stages. The multilingual prosody module annotates the word hypotheses graph derived from the speech signal with prosodic features (duration, pitch, energy, pause). Boundary information, sentence mood, and accented words are passed on to parsing (search space restriction), dialog act segmentation and recognition (dialog understanding), semantic transfer (constraints for translation), generation (lexical choice), and speech synthesis, which also receives the prosodic feature vector and supports speaker adaptation.
© Wolfgang Wahlster, DFKI GmbH Using Syntactic-Prosodic Boundaries to Speed Up the Parsing Process. Example (boundary labels inserted by the prosody module): "yes S1 no problem S4 Mister Mueller S4 when would you like to go to Hannover S4". Without boundaries: 1256 chart edges, 1.31 secs runtime; with boundaries: 632 chart edges, 0.62 secs runtime; speed-up: 53%.
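The arithmetic behind this speed-up can be sketched in a few lines (an illustration only, using the boundary labels from the example; the real system restricts the parser's chart directly): splitting the recognizer output at major prosodic boundaries before parsing roughly halves the number of potential chart edges.

```python
# Prosodic boundary labels from the prosody module (S4 = major boundary).
labelled = [("yes", "S1"), ("no", None), ("problem", "S4"),
            ("Mister", None), ("Mueller", "S4"),
            ("when", None), ("would", None), ("you", None), ("like", None),
            ("to", None), ("go", None), ("to", None), ("Hannover", "S4")]

def segments(labelled_words):
    """Split the word sequence at major (S4) boundaries before parsing."""
    seg = []
    for word, label in labelled_words:
        seg.append(word)
        if label == "S4":
            yield seg
            seg = []
    if seg:
        yield seg

def potential_spans(n):
    # A chart parser introduces edges over substrings; their number grows with n*(n+1)/2.
    return n * (n + 1) // 2

whole = potential_spans(len(labelled))
split = sum(potential_spans(len(s)) for s in segments(labelled))
print(whole, split)   # far fewer spans (hence chart edges) with boundaries
```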
© Wolfgang Wahlster, DFKI GmbH Integrating Shallow and Deep Analysis Components in a Multi-Engine Approach: an A* algorithm guides the chunk parser, the statistical parser, and the HPSG parser through the augmented word hypotheses graph; their partial VITs are collected in a chart, and robust dialog semantics combines them and, through knowledge-based reconstruction, completes them into complete and spanning VITs.
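The "A* algorithm guiding through the augmented word hypotheses graph" can be illustrated generically. The sketch below runs a best-first search over a toy lattice with invented costs; the heuristic is left trivial (0), whereas a real system would plug in an estimate of the best achievable remaining score.

```python
import heapq

# A tiny word hypothesis graph: edges (start_node, end_node, word, cost),
# where cost stands in for combined acoustic/language-model/prosodic scores
# (lower is better).
edges = [
    (0, 1, "I", 1.0), (1, 2, "need", 1.2), (1, 2, "kneed", 2.5),
    (2, 3, "a", 0.4), (3, 4, "car", 1.0), (3, 4, "cart", 1.9),
]

def astar(edges, start, goal, h=lambda node: 0.0):
    """Best path through the lattice; h is an admissible estimate of the remaining
    cost (0 here, which degrades A* to uniform-cost search)."""
    out = {}
    for s, e, w, c in edges:
        out.setdefault(s, []).append((e, w, c))
    frontier = [(h(start), 0.0, start, [])]
    expanded = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if expanded.get(node, float("inf")) <= g:
            continue
        expanded[node] = g
        for succ, word, cost in out.get(node, []):
            heapq.heappush(frontier, (g + cost + h(succ), g + cost, succ, path + [word]))
    return None, float("inf")

print(astar(edges, 0, 4))   # (['I', 'need', 'a', 'car'], 3.6)
```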
© Wolfgang Wahlster, DFKI GmbH Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs. German input: "Wir treffen uns in Mannheim, äh, in Saarbrücken." (We are meeting in Mannheim, oops, in Saarbruecken.) English translation: "We are meeting in Saarbruecken."
© Wolfgang Wahlster, DFKI GmbH The Understanding of Spontaneous Speech Repairs. Original utterance: "I need a car next Tuesday oops Monday" — reparandum ("Tuesday"), editing phase with hesitation ("oops"), repair phase with the reparans ("Monday"). After recognition of the substitution, the word hypothesis graph is transformed so that the intended meaning is extracted: "I need a car next Monday". Verbmobil technology understands speech repairs and extracts the intended meaning.
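A drastically simplified, string-level sketch of the repair transformation (Verbmobil performs it on the word hypothesis graph and uses prosodic cues to locate the editing phase): find a hesitation, treat the word before it as the reparandum and the word after it as the reparans, and rewrite the sequence.

```python
HESITATIONS = {"oops", "uh", "uhm", "aeh"}

def repair(words):
    """Detect a simple substitution repair of the form
       ... reparandum HESITATION reparans ...
    and replace the reparandum by the reparans (one word each in this sketch)."""
    for i, w in enumerate(words):
        if w.lower() in HESITATIONS and 0 < i < len(words) - 1:
            # Drop the reparandum and the editing term, keep the reparans.
            return words[:i - 1] + words[i + 1:]
    return words

print(" ".join(repair("I need a car next Tuesday oops Monday".split())))
# -> I need a car next Monday
```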
© Wolfgang Wahlster, DFKI GmbH VHG: A Packed Chart Representation of Partial Semantic Representations — incremental chart construction and anytime processing. The chart is filled by a chart parser using cascaded finite-state transducers, a statistical LR parser trained on a treebank, and a very fast HPSG parser; partial UDRS coded as VITs are combined and transformed by rules, and a spanning analysis is selected using a bigram model for VITs.
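How a spanning analysis might be selected from a chart of partial results can be sketched with a small dynamic program. The fragments, scores, and the stand-in "bigram" function below are invented; the latter merely plays the role of Verbmobil's bigram model over adjacent VITs.

```python
# Partial analyses in the chart: (start, end, label, local_score); higher is better.
chart = [
    (0, 2, "greeting", 0.9),
    (0, 5, "request",  0.4),
    (2, 5, "suggest",  0.8),
    (5, 8, "confirm",  0.7),
    (2, 8, "inform",   0.5),
]

def bigram(prev_label, label):
    """Stand-in for a bigram model over adjacent fragments."""
    good_pairs = {("greeting", "suggest"), ("suggest", "confirm")}
    return 0.3 if (prev_label, label) in good_pairs else 0.0

def best_spanning(chart, n):
    """Dynamic programming over end positions: best-scoring sequence of
    fragments that covers positions 0..n without gaps."""
    best = {0: (0.0, None, [])}          # end position -> (score, prev, labels)
    for end in range(1, n + 1):
        for s, e, label, score in chart:
            if e == end and s in best:
                prev_score, _, labels = best[s]
                prev_label = labels[-1] if labels else None
                total = prev_score + score + bigram(prev_label, label)
                if end not in best or total > best[end][0]:
                    best[end] = (total, s, labels + [label])
    return best.get(n)

print(best_spanning(chart, 8))   # (3.0, 5, ['greeting', 'suggest', 'confirm'])
```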
© Wolfgang Wahlster, DFKI GmbH Semantic Correction of Recognition Errors. Recognized German input: "Wir treffen uns Kaiserslautern." (We are meeting Kaiserslautern — the preposition is missing.) English translation: "We are meeting in Kaiserslautern."
© Wolfgang Wahlster, DFKI GmbH Robust Dialog Semantics: Deep Processing of Shallow Structures. Goals of robust semantic processing (Pinkal, Worm, Rupp): combination of unrelated analysis fragments, completion of incomplete analysis results, and skipping of irrelevant fragments. Method: transformation rules over the VIT Hypothesis Graph, consisting of conditions on VIT structures and operations on VIT structures. The rules are based on various knowledge sources: a lattice of semantic types, the domain ontology, sortal restrictions, and semantic constraints. Results: the analysis is improved in 20% of the cases and degraded in only 0.6%.
© Wolfgang Wahlster, DFKI GmbH Robust Dialog Semantics: Combining and Completing Partial Representations. Example: "Let us meet the late afternoon to catch the train to Frankfurt" — the preposition 'in' is missing in all paths through the word hypothesis graph; the intended reading is "Let us meet (in) the late afternoon to catch the train to Frankfurt". A temporal NP is transformed into a temporal modifier using an underspecified temporal relation: [temporal_np(V1)] -> [typeraise_to_mod(V1, V2)] & V2. The modifier is then applied to a proposition: [type(V1, prop), type(V2, mod)] -> [apply(V2, V1, V3)] & V3.
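The two rules just quoted can be mimicked on toy data structures — plain dicts standing in for VIT fragments. This is an illustration of the rule format only, not the Verbmobil rule engine.

```python
# A VIT fragment is approximated here by a dict with a semantic type and content.
fragments = [
    {"type": "prop", "content": "meet(we)"},
    {"type": "temporal_np", "content": "the_late_afternoon"},
    {"type": "prop", "content": "catch(we, train_to_frankfurt)"},
]

def typeraise_to_mod(frag):
    """[temporal_np(V1)] -> [typeraise_to_mod(V1, V2)] & V2
       A temporal NP becomes a temporal modifier with an underspecified relation."""
    return {"type": "mod", "content": ("temporal_rel_underspec", frag["content"])}

def apply_mod(mod, prop):
    """[type(V1, prop), type(V2, mod)] -> [apply(V2, V1, V3)] & V3"""
    return {"type": "prop", "content": (mod["content"], prop["content"])}

# Rule-driven combination: raise the stranded temporal NP and attach it to the
# neighbouring proposition, recovering the reading with the missing 'in'.
raised = typeraise_to_mod(fragments[1])
combined = apply_mod(raised, fragments[0])
print(combined)
```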
© Wolfgang Wahlster, DFKI GmbH Competing Strategies for Robust Speech Translation: the concurrent processing modules of Verbmobil combine deep semantic translation with shallow, surface-oriented translation methods. The expensive but precise thread performs a principled and compositional syntactic and semantic analysis followed by semantic-based transfer of Verbmobil Interface Terms (VITs) as sets of underspecified DRS; the cheap but approximate threads are case-based translation, dialog-act based translation, and statistical translation. Each thread is subject to a time-out; all results come with confidence values, and a selection module chooses the best result, yielding an acceptable translation rate.
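The control regime — run all engines concurrently, honor time-outs, and pick the most confident surviving result — can be sketched as follows. The engine functions, their confidences, and the deadline are invented stand-ins, not Verbmobil components.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Hypothetical stand-ins for translation threads; each returns
# (translation, confidence). The deep thread is slower but more precise.
def semantic_transfer(segment):
    time.sleep(0.2)
    return "please let me know", 0.9

def statistical_translation(segment):
    time.sleep(0.01)
    return "please tell me", 0.6

ENGINES = {"semantic_transfer": semantic_transfer,
           "statistical": statistical_translation}

def translate(segment, deadline=0.1):
    """Run all engines concurrently; keep every result that arrives before the
    deadline and pick the one with the highest confidence."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        futures = {name: pool.submit(fn, segment) for name, fn in ENGINES.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=deadline)
            except TimeoutError:
                pass   # this engine timed out; fall back to the cheaper ones
    return max(results.items(), key=lambda kv: kv[1][1])

print(translate("bitte sagen Sie mir Bescheid"))
```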
© Wolfgang Wahlster, DFKI GmbH Architecture of the Semantic Transfer Module. Components: monolingual refinement rules and disambiguation rules for both languages, a bilingual dictionary for lexical transfer, and a phrasal dictionary for phrasal transfer. The VIT (L1) is refined into a refined VIT (L1), transferred into an underspecified VIT (L2), and refined into the final VIT (L2).
© Wolfgang Wahlster, DFKI GmbH Lexical Disambiguation On Demand — preserving lexical ambiguities. "How did you find his office?" ('find' = get to or like) can be translated as "Wie fanden Sie sein Büro?": disambiguation is not necessary for the translation between German and English. For the translation between German and Japanese, disambiguation is necessary: "dou kare no jimusho o mitsukeraremashita ka" (how he-POSS office OBJ get-to-can PAST QUESTION) versus "kare no jimusho wa dou omoimasu ka" (he-POSS office TOPIC how think QUESTION).
© Wolfgang Wahlster, DFKI GmbH Three English Translations of the German Word "Termin" Found in the Verbmobil Corpus: (1) "Verschieben wir den Termin." — "Let's reschedule the appointment." (2) "Schlagen Sie einen Termin vor." — "Suggest a date." (3) "Da habe ich einen Termin frei." — "I have got a free slot there." The choice is governed by subsumption relations in the domain model: appointment falls under scheduled_event, date under set_start_time, and slot under time_interval (the default temporal specifications).
© Wolfgang Wahlster, DFKI GmbH Entries in the Transfer Lexicon: German -> English (Simplified)
tau_lex(termin, appointment, pred_sort(subsume(scheduled_event))).
tau_lex(termin, date, pred_sort(subsume(set_start_time))).
tau_lex(termin, slot, pred_sort(subsume(time_interval))).
tau_lex(verschieben, reschedule, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(scheduled_event)])).
tau_lex(ausmachen, fix, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(set_start_time)])).
tau_lex(freihaben, have_free, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(time_interval)])).
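A Python rendering of how such entries drive the choice among "appointment", "date" and "slot" (a sketch of the mechanism only, not the Prolog transfer component): the sort required by the governing verb's object selects the target noun.

```python
# Transfer lexicon for 'Termin', following the tau_lex entries above:
# the English noun is chosen by the sort its context requires.
TAU_LEX_TERMIN = {
    "scheduled_event": "appointment",
    "set_start_time":  "date",
    "time_interval":   "slot",
}

# The governing verb fixes the sort of its object (cf. the verb entries above).
VERB_OBJECT_SORT = {
    "verschieben": ("reschedule", "scheduled_event"),
    "ausmachen":   ("fix",        "set_start_time"),
    "freihaben":   ("have_free",  "time_interval"),
}

def translate_termin(german_verb):
    english_verb, required_sort = VERB_OBJECT_SORT[german_verb]
    return english_verb, TAU_LEX_TERMIN[required_sort]

for verb in ("verschieben", "ausmachen", "freihaben"):
    print(verb, "->", translate_termin(verb))
# verschieben -> ('reschedule', 'appointment'); ausmachen -> ('fix', 'date');
# freihaben -> ('have_free', 'slot')
```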
© Wolfgang Wahlster, DFKI GmbH Using Context and World Knowledge for Semantic Transfer — all other dialog translation systems translate word-by-word or sentence-by-sentence. Example: "Platz" -> room / table / seat. (1) "Nehmen wir dieses Hotel, ja." (Let us take this hotel.) "Ich reserviere einen Platz." -> "I will reserve a room." (2) "Machen wir das Abendessen dort." (Let us have dinner there.) "Ich reserviere einen Platz." -> "I will reserve a table." (3) "Gehen wir ins Theater." (Let us go to the theater.) "Ich möchte Plätze reservieren." -> "I would like to reserve seats."
© Wolfgang Wahlster, DFKI GmbH Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads. The turn is split into Segment 1, "If you prefer another hotel," and Segment 2, "please let me know." The four concurrent threads — statistical translation, dialog-act based translation, semantic transfer, and case-based translation — produce alternative translations with confidence values; the selection module then combines, for example, Segment 1 as translated by semantic transfer with Segment 2 as translated by case-based translation.
© Wolfgang Wahlster, DFKI GmbH A Context-Free Approach to the Selection of the Best Translation Result.
Input: SEQ := the set of all translation sequences for a turn; each Seq in SEQ is a sequence of translation segments s1, s2, ..., sn; every translation thread provides an online confidence value confidence(thread.segment) for each segment.
Task: compute normalized confidence values for each translated Seq:
CONF(Seq) = sum over all segments in Seq of Length(segment) * (alpha(thread) + beta(thread) * confidence(thread.segment))
Output: Best(SEQ) = {Seq in SEQ | Seq is a maximal element of SEQ under CONF}.
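A direct transcription of the selection formula into code. The alpha and beta values below are invented placeholders; in Verbmobil they are learned from an annotated corpus, as described on the next slide.

```python
# A translation sequence is a list of (thread, segment_text, confidence) triples.
ALPHA = {"semantic_transfer": 0.3, "statistical": 0.1,
         "dialog_act": 0.1, "case_based": 0.2}
BETA  = {"semantic_transfer": 0.9, "statistical": 0.7,
         "dialog_act": 0.5, "case_based": 0.8}

def conf(seq):
    """CONF(Seq) = sum over segments of
       Length(segment) * (alpha(thread) + beta(thread) * confidence(thread.segment))"""
    return sum(len(text.split()) * (ALPHA[thread] + BETA[thread] * c)
               for thread, text, c in seq)

def best(sequences):
    """Best(SEQ): the sequence(s) maximal under CONF."""
    top = max(conf(s) for s in sequences)
    return [s for s in sequences if conf(s) == top]

seq_a = [("semantic_transfer", "if you prefer another hotel", 0.8),
         ("case_based", "please let me know", 0.9)]
seq_b = [("statistical", "if you prefer another hotel", 0.7),
         ("statistical", "please let me know", 0.6)]
print(best([seq_a, seq_b]))
```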
© Wolfgang Wahlster, DFKI GmbH Learning the Normalizing Factors Alpha and Beta from an Annotated Corpus. A turn is a sequence of segments: Turn := segment1, segment2, ..., segmentn. For each turn in a training corpus, all segments translated by one of the four translation threads are manually annotated with a score for translation quality. For the sequence of n segments with the best overall translation score, at most 4^n linear inequations are generated, stating that the selected sequence scores better than every alternative translation sequence. From this set of inequations for the spanning analyses (at most 4^n), the values of alpha and beta are determined offline by solving the constraint system.
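A minimal sketch of the offline step, assuming numpy and scipy are available. For an invented toy corpus with two engines and one two-segment turn, one inequation per alternative assignment is generated, and a linear program finds alphas and betas that satisfy them with a margin of 1. This only illustrates the constraint-solving idea, not the actual Verbmobil training procedure.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Toy training data (invented): 2 engines, one turn with 2 segments.
# confidence[engine][segment] is the engine's online confidence for that segment.
engines = ["deep", "shallow"]
seg_len = [5, 4]                                    # Length(segment) in words
confidence = {"deep": [0.9, 0.5], "shallow": [0.6, 0.7]}
best_assignment = ("deep", "shallow")               # hand-annotated best sequence

def conf_coeffs(assignment):
    """Coefficients of CONF(Seq) as a linear function of [alpha_e, beta_e] per engine."""
    coeff = np.zeros(2 * len(engines))
    for seg, engine in enumerate(assignment):
        i = engines.index(engine)
        coeff[2 * i] += seg_len[seg]                               # alpha(engine)
        coeff[2 * i + 1] += seg_len[seg] * confidence[engine][seg]  # beta(engine)
    return coeff

# One inequation per alternative assignment: CONF(best) - CONF(alt) >= 1.
rows, rhs = [], []
for alt in itertools.product(engines, repeat=len(seg_len)):
    if alt == best_assignment:
        continue
    diff = conf_coeffs(best_assignment) - conf_coeffs(alt)
    rows.append(-diff)          # linprog expects A_ub @ x <= b_ub
    rhs.append(-1.0)

res = linprog(c=np.ones(2 * len(engines)), A_ub=np.array(rows), b_ub=np.array(rhs),
              bounds=[(0, 10)] * (2 * len(engines)))
labels = [f"{p}_{e}" for e in engines for p in ("alpha", "beta")]
print(dict(zip(labels, res.x)))
```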
© Wolfgang Wahlster, DFKI GmbH Integrating a Deep HPSG-based Analysis with Probabilistic Dialog Act Recognition for Semantic Transfer: the probabilistic analysis of dialog acts (HMM) and the recognition of dialog plans (plan operators) determine the dialog act type and the dialog phase; together with the VIT produced by the HPSG analysis and robust dialog semantics, they are fed into semantic transfer.
© Wolfgang Wahlster, DFKI GmbH The Dialog Act Hierarchy used for Planning, Prediction, Translation and Generation includes: CONTROL_DIALOG, MANAGE_TASK, PROMOTE_TASK, GREETING, INTRODUCE, POLITENESS_FORMULA, THANK, DELIBERATE, BACKCHANNEL, INIT, DEFER, CLOSE, REQUEST, SUGGEST, INFORM, FEEDBACK, COMMIT, REQUEST_SUGGEST, REQUEST_CLARIFY, REQUEST_COMMENT, REQUEST_COMMIT, GREETING_BEGIN, GREETING_END, DIGRESS, EXCLUDE, CLARIFY, GIVE_REASON, DEVIATE_SCENARIO, REFER_TO_SETTING, CLARIFY_ANSWER, FEEDBACK_NEGATIVE, REJECT, EXPLAINED_REJECT, FEEDBACK_POSITIVE, ACCEPT, CONFIRM.
© Wolfgang Wahlster, DFKI GmbH Learning of Probabilistic Plan Operators from Annotated Corpora:
(OPERATOR-s goal [IN-TURN confirm-s ?S-3314 ?S-3316] subgoals (sequence [IN-TURN confirm-s-10521 ?S-3314 ?S-3315] [IN-TURN confirm-s-10522 ?S-3315 ?S-3316]) PROB 0.72)
(OPERATOR-s goal [IN-TURN confirm-s ?S-3321 ?S-3322] subgoals (sequence [DOMAIN-DEPENDENT accept ?S-3321 ?S-3322]) PROB 0.95)
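Where the PROB values can come from is easy to illustrate: a relative-frequency estimate over annotated goal expansions. The observations below are hypothetical, and the real operators additionally carry typed state variables such as ?S-3314; this only shows the counting step.

```python
from collections import Counter, defaultdict

# Hypothetical annotated expansions: (goal, tuple_of_subgoals) observed in the corpus.
observations = [
    ("confirm-s", ("confirm-s-1", "confirm-s-2")),
    ("confirm-s", ("accept",)),
    ("confirm-s", ("accept",)),
    ("confirm-s", ("accept",)),
]

def learn_operators(observations):
    """Relative-frequency estimate of P(subgoal sequence | goal),
    yielding plan operators with attached probabilities."""
    counts = defaultdict(Counter)
    for goal, subgoals in observations:
        counts[goal][subgoals] += 1
    operators = []
    for goal, c in counts.items():
        total = sum(c.values())
        for subgoals, n in c.items():
            operators.append({"goal": goal, "subgoals": list(subgoals),
                              "prob": n / total})
    return operators

for op in learn_operators(observations):
    print(op)
```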
© Wolfgang Wahlster, DFKI GmbH Automatic Generation of Multilingual Summaries of Telephone Conversations: after the dialog translation by Verbmobil, multilingual generation produces summaries as HTML documents in English and in German, which are transferred to the American and the German dialog partner by Internet or fax.
© Wolfgang Wahlster, DFKI GmbH Dialog Summary. Participants: Mr. Jones, Mr. Mueller. Date: —. Time: 8:57 AM to 10:03 AM. Theme: appointment schedule with trip and accommodation. Scheduling: Mr. Jones and Mr. Mueller will meet at the train station on the 1st of March 2001 at 10:00 am. Travelling: the trip from Hamburg to Hanover by train will start on the 1st of March at 10:15 am. Summary automatically generated at :31:24 h.
© Wolfgang Wahlster, DFKI GmbH Microplanning: Create Syntactic Building Blocks. Method: mapping of semantic dependency structures (VIT) onto syntactic dependency structures (TAG). Example — time expressions: the VIT conditions DEF(L, I, G, H), DOWF(L1, I, mo), ORD(L2, I, 11), MOFY(L3, I, may) are mapped onto a TAG dependency tree over the nodes MONDAY1, ELEVENTH_DAY, THE, MAY, and OF_P (i.e., "Monday, the eleventh of May").
© Wolfgang Wahlster, DFKI GmbH Speeding Up the Language Generation Process by Compiling the HPSG Grammar into an LTAG Generation Grammar: the HPSG analysis grammar is compiled into a Lexicalized Tree Adjoining Grammar with 2,350 trees, giving an extended domain of locality, no recursive feature structures, and fast generation (0.5 secs average runtime).
© Wolfgang Wahlster, DFKI GmbH Corpus-based Speech Synthesis. [Diagram: for the sentence to synthesize, "I have time on Monday.", the synthesizer searches a graph of token edges from the start node (S) to the end node (E), concatenating speech units found in the recorded corpus.]
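A text-level sketch of the longest-match idea behind the edge search from S to E. Verbmobil's synthesizer of course selects and concatenates recorded speech units using acoustic join costs; here corpus reuse is only simulated on token strings with a greedy strategy.

```python
# Corpus utterances from which synthesis units (token sequences) can be reused.
corpus = [
    ["i", "have", "time", "on", "friday"],
    ["we", "have", "time", "on", "monday"],
    ["i", "have", "no", "time"],
]

def covered_by_corpus(span):
    """True if the token span occurs contiguously in some corpus utterance."""
    n = len(span)
    return any(span == utt[i:i + n]
               for utt in corpus for i in range(len(utt) - n + 1))

def select_units(tokens):
    """Greedy longest-match: cover the target sentence with as few,
    as long as possible, corpus fragments."""
    units, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):        # try the longest span first
            if covered_by_corpus(tokens[i:j]):
                units.append(tokens[i:j])
                i = j
                break
        else:
            units.append(tokens[i:i + 1])           # back off to a single token
            i += 1
    return units

print(select_units("i have time on monday".split()))
# -> [['i', 'have', 'time', 'on'], ['monday']]
```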
© Wolfgang Wahlster, DFKI GmbH Verbmobil: Long-Term, Large-Scale Funding and Its Impact. Funding by the German Ministry for Education and Research BMBF (Dr. Reuse): Phase I: $33 M; Phase II: $28 M; 60% industrial funding according to a shared-cost model: $17 M; additional R&D investments of the industrial partners: $11 M; total: $89 M.
© Wolfgang Wahlster, DFKI GmbH Verbmobil: Long-Term, Large-Scale Funding and Its Impact: more than 800 publications (more than 600 refereed), many patents, more than 20 commercial spin-off products, 8 spin-off companies, and more than 900 trained researchers for the German language industry; Philips, DaimlerChrysler and Siemens are leaders in spoken dialog applications.
© Wolfgang Wahlster, DFKI GmbH [Chart: distribution of sentence length in the large-scale evaluation.] Web-based evaluation of 25,345 translations by 65 evaluators.
© Wolfgang Wahlster, DFKI GmbH Evaluation Results. The translation of a turn is approximately correct if it preserves the intention of the speaker and the main propositional content of her utterance.

Translation Thread            | Word Accuracy 75% (3267 turns) | Word Accuracy 80% (2723 turns)
Case-based Translation        | 44%                            | 46%
Statistical Translation       | 79%                            | 81%
Dialog-Act based Translation  | 45%                            | 46%
Semantic Transfer             | 47%                            | 49%
Substring-based Translation   | 75%                            | 79%
Automatic Selection           | 66% / 83%*                     | 68% / 85%*
Manual Selection              | 95%                            | 97%

* After training with an instance-based learning algorithm.
© Wolfgang Wahlster, DFKI GmbH Results of the End-to-End Evaluation Based on Dialog Task Completion for 31 Trials.

Topic                          | Successful Completions/Attempts | Successful Tasks | Frequency-Based Weighting Factor
Meeting time                   | 25/28                           | 89.3%            | 0.90
Meeting place                  | 21/27                           | 77.8%            | 0.87
Means of transportation        | 30/30                           | 100%             | 0.97
Departure place                | 22/25                           | 88.0%            | 0.81
Arrival time                   | 22/26                           | 84.6%            | 0.84
Who reserves the hotel         | 28/31                           | 90.3%            | 1.00
How to get to departure place  | 7/9                             | 77.8%            | 0.29
Total                          | 227/255                         |                  |

Average percentage of successful task completions (frequency-weighted): 89.6%.
© Wolfgang Wahlster, DFKI GmbH Checklist for the Final Verbmobil System. Vocabulary size: for German (with an equivalent English lexicon), 2,500 for Japanese. Operational success criteria: word recognition rate at 16 kHz — German: 75% spontaneous (85% cooperative); English: 72% spontaneous (82% cooperative); Japanese: 75% spontaneous (85% cooperative); at 8 kHz: 70% spontaneous (80% cooperative). 80% of the translations should be approximately correct, and the dialog task success rate should be around 90%. The average end-to-end processing time should be four times real time (the length of the input signal).
© Wolfgang Wahlster, DFKI GmbH Results of the Verbmobil Project have been used in 20 Spin-Off Products by the Industrial Partners DaimlerChrysler, Philips and Siemens: dictation systems (3), spoken dialog systems (4), dialog engines (2), command & control systems (5), text classification systems (3), and translation systems (3).
© Wolfgang Wahlster, DFKI GmbH Linguatronic: Spoken Dialogs with a Mercedes-Benz. Speech control of cell phone, radio, windows/AC, and navigation system; offered as an option for the S-, C-, and E-Class of Mercedes and for BMW; speaker-independent, with garbage models for non-speech (blinker, AC, wheels). Example commands: "Please call Doris Wahlster." "Open the left window in the back." "I want to hear the weather channel." "When will I reach the next gas station?" "Where is the next parking lot?"
© Wolfgang Wahlster, DFKI GmbH Spoken Dialogs about Schedules — fielded applications: train schedules for the German Railway System (DB): TABA (Philips) and OSCAR (DaimlerChrysler); flight schedules for Lufthansa: ALF (Philips). Technical challenges: phone-based dialogs, many proper names, clarification subdialogs.
© Wolfgang Wahlster, DFKI GmbH Successful Technology Transfer: 8 High-Tech Spin-Off Companies in the Area of Language Technology have been founded by Verbmobil Researchers: XtraMind Technologies (language technology for customer interaction services, Saarbrücken), GSDC GmbH (multilingual documentation, Nürnberg), SCHEMA GmbH (document engineering, Nürnberg), SYMPALOG GmbH (spoken dialog systems, Nürnberg), RETIVOX GbR (speech synthesis systems, Bonn), CLT Sprachtechnologie GmbH (language technology for text processing, Saarbrücken), AIXPLAIN AG (human language technology, Aachen), and SONICSON GmbH (natural language access to online music, Kaiserslautern).
© Wolfgang Wahlster, DFKI GmbH Verbmobil was the Key Resource for the Education and Training of Researchers and Engineers Needed to Build Up Language Industry in Germany: internships 18, master students 238, PhD students 164, student research assistants 483, habilitations 16 — total 919.
© Wolfgang Wahlster, DFKI GmbH From Spoken Dialog to Multimodal Dialog: Verbmobil (today's cell phone, speech only) -> SmartKom (third-generation UMTS phone; speech, graphics and gesture).
© Wolfgang Wahlster, DFKI GmbH Merging Various User Interface Paradigms: natural language dialog, graphical user interfaces, and gestural interaction merge into multimodal interaction (see Phil Cohen's invited talk on Friday).
© Wolfgang Wahlster, DFKI GmbH SmartKom: Intuitive Multimodal Interaction. The SmartKom Consortium — project budget: $34 M; project duration: 4 years; main contractor, project management, and testbed software integration: DFKI Saarbrücken; partners include MediaInterface, European Media Lab, IMS (Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart), and Ludwig-Maximilians-Universität München.
© Wolfgang Wahlster, DFKI GmbH SmartKom-Mobile: A Handheld Communication Assistant with camera, GPS, microphone, loudspeaker, stylus-activated sketch pad, wearable compute server, docking station for the car PC, biosensor for authentication and emotional feedback, and GSM for telephone, fax, and Internet connectivity.
© Wolfgang Wahlster, DFKI GmbH SmartKom: Multimodal Dialogs with a Life-like Character
© Wolfgang Wahlster, DFKI GmbH Verbmobil is a Very Large Dialog System: 69 modules communicate via 224 blackboards; the HPSG for German uses a hierarchy of 2,400 types; 15,385 entries in the semantic database; 22,783 transfer rules and 13,640 microplanning rules; 30,000 templates for case-based translation; 691,583 alignment templates; 334 finite-state transducers.
© Wolfgang Wahlster, DFKI GmbH Lessons Learned from Verbmobil: deep processing can be used for merging, completing and repairing the results of shallow processing strategies; shallow methods can be used to guide the search in deep processing; statistical methods must be augmented by symbolic models to achieve higher accuracy and broader coverage; statistical methods can be used to learn operators or selection strategies for symbolic processes.
© Wolfgang Wahlster, DFKI GmbH Real-world problems in language technology like the understanding of spoken dialogs, speech-to-speech translation and multimodal dialog systems can only be cracked by the combined muscle of deep and shallow processing approaches. Conclusions and Take-Home Messages
© Wolfgang Wahlster, DFKI GmbH Conclusions and Take-Home Messages: In a multi-blackboard and multi-engine architecture based on packed representations at all processing levels (speech recognition, parsing, semantic processing, translation, generation) and using charts with underspecified representations, the results of concurrent processing threads can be combined in an incremental fashion.
© Wolfgang Wahlster, DFKI GmbH All results of concurrent and competing processing modules should come with a confidence value, so that statistically trained selection modules can choose the most promising result at each stage, if demanded by a following processing step. Conclusions and Take-Home Messages
© Wolfgang Wahlster, DFKI GmbH Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that these uncertainties can be reduced by linguistic, discourse and domain constraints as soon as they become applicable. Conclusions and Take-Home Messages
© Wolfgang Wahlster, DFKI GmbH Conclusions and Take-Home Messages Underspecification allows disambiguation requirements to be delayed until later processing stages where better-informed decisions can be made. The massive use of underspecification makes the syntax-semantic interface and transfer rules almost deterministic, thereby boosting processing speed.
© Wolfgang Wahlster, DFKI GmbH Integrating top-down knowledge into low-level speech recognition processes Exploiting more knowledge about human interpretation strategies More robust translation of turns with very low word accuracy rates Expensive data collection and cognitively unrealistic training data Open Problems:
© Wolfgang Wahlster, DFKI GmbH Further Reading: a 10-page paper can be found in the IJCAI-01 Proceedings, Vol. 2 (see pages – ). An extended version will appear in the Winter issue of the AI Magazine; or check the URL: verbmobil.dfki.de
© Wolfgang Wahlster, DFKI GmbH Wahlster, W. (2000) (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Berlin, New York, Tokyo: Springer. 679 pp. 224 figs., 88 tabs. Hardcover ISBN The Verbmobil Book