Published by Estella Parrish. Modified over 9 years ago.
1
Robust Translation of Spontaneous Speech: A Multi-Engine Approach
Wolfgang Wahlster, German Research Center for Artificial Intelligence, DFKI GmbH, www.dfki.de/~wahlster
Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, Wednesday, 8 August 2001
2
© Wolfgang Wahlster, DFKI GmbH
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations: (1) face-to-face conversations and (2) telecommunication.
3
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Conference call: the Verbmobil Speech Translation Server connects GSM cell phone users.
4
Robust Realtime Translation with Verbmobil
At a German airport: an American businessman calls the secretary of a German business partner.
5
Outline
1. Verbmobil's Multi-Blackboard and Multi-Engine Architecture
2. Exploiting Underspecification in a Multi-Stratal Semantic Representation Language
3. Combining Deep and Shallow Processing Strategies for Robust Dialog Translation
4. Evaluation and Technology Transfer
5. Lessons Learned and Conclusions
6
Telephone-based Dialog Translation
[Diagram labels: American dialog partner; German dialog partner; Bianca/Brick XS BinTec ISDN-LAN router; Verbmobil server cluster (Sun Server 450, LINUX server, Sun ULTRA 60/80); translation between German and English in both directions.]
ISDN conference call (3 participants): German speaker, Verbmobil, American speaker.
Speech-based set-up of the conference call.
11
Verbmobil: The First Speech-Only Dialog Translation System
Devices: mobile GSM phone, mobile DECT phone.
1. American speaker: “Verbmobil” (voice dialing).
2. Connect to the Verbmobil Speech-to-Speech Translation Server: +49 631 3111911.
3. Verbmobil: “Welcome to the Verbmobil Translation System. Please speak the telephone number of your partner.”
4. American speaker: “0177555”
5. The foreign participant is placed into the conference call.
6. Verbmobil to the German participant: “Verbmobil hat eine neue Verbindung aufgebaut. Bitte sprechen Sie jetzt.” (Verbmobil has set up a new connection. Please speak now.)
7. Verbmobil to the American participant: “Welcome to the Verbmobil server. Please start your input after the beep.”
12
Verbmobil is a Multilingual System
It supports bidirectional translation between German and English (American), German and Japanese, and German and Chinese (Mandarin).
13
Verbmobil Partners (Phase 2)
DaimlerChrysler; Universität des Saarlandes; Ruhr-Universität Bochum; Universität Hamburg; Universität Karlsruhe; Universität Bielefeld; Technische Universität München; Friedrich-Alexander-Universität Erlangen-Nürnberg; Universität Stuttgart; Rheinische Friedrich-Wilhelms-Universität Bonn; Ludwig-Maximilians-Universität München; TU Braunschweig; Eberhard Karls Universität Tübingen.
14
Three Levels of Language Processing
Input: speech (telephone input). Each level reduces uncertainty.
1. Speech recognition: What has the caller said? 100 alternatives. Knowledge sources: acoustic language models, word lists.
2. Speech analysis: What has the caller meant? 10 alternatives. Knowledge sources: grammar, lexical meaning.
3. Speech understanding: What does the caller want? Unambiguous understanding in the dialog context. Knowledge sources: discourse context, knowledge about the domain of discourse.
15
Challenges for Language Engineering
Increasing complexity along four dimensions, with Verbmobil at the most demanding level of each:
Input conditions: close-speaking microphone/headset, push-to-talk → telephone, pause-based segmentation → open microphone, GSM quality.
Naturalness: isolated words → read continuous speech → spontaneous speech.
Adaptability: speaker dependent → speaker independent → speaker adaptive.
Dialog capabilities: monolog dictation → information-seeking dialog → multiparty negotiation.
19
Verbmobil II: Three Domains of Discourse
Scenario 1, Appointment Scheduling: When? Focus on temporal expressions. Vocabulary size: 6000.
Scenario 2, Travel Planning & Hotel Reservation: When? Where? How? Focus on temporal and spatial expressions. Vocabulary size: 10000.
Scenario 3, PC-Maintenance Hotline: What? When? Where? How? Integration of special sublanguage lexica. Vocabulary size: 30000.
20
Context-Sensitive Speech-to-Speech Translation (Verbmobil Server)
Wann fährt der nächste Zug nach Hamburg ab? → When does the next train to Hamburg depart?
Wo befindet sich das nächste Hotel? → Where is the nearest hotel?
21
The Control Panel of Verbmobil
32
Verbmobil's Massive Data Collection Effort
3,200 dialogs (182 hours) with 1,658 speakers; 79,562 turns distributed on 56 CDs, 21.5 GB.
The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations: Transliteration Variant 1, Transliteration Variant 2, Lexical Orthography, Canonical Pronunciation, Manual Phonological Segmentation, Automatic Phonological Segmentation, Word Segmentation, Prosodic Segmentation, Dialog Acts, Noises, Superimposed Speech, Syntactic Category, Word Category, Syntactic Function, Prosodic Boundaries.
33
Extracting Statistical Properties from Large Corpora
Corpora: transcribed speech data; segmented speech with prosodic labels; annotated dialogs with dialog acts; treebanks & predicate-argument structures; aligned bilingual corpora.
Learned models: hidden Markov models; neural nets, multilayered perceptrons; probabilistic automata; probabilistic grammars; probabilistic transfer rules.
Machine learning for the integration of statistical properties into symbolic models for speech recognition, parsing, dialog processing and translation.
34
Multilinguality
[Chart: word accuracy [%] (axis from 50 to 100) for German, English and Japanese, plotted over VM1, '97, '98, '99.1, '99.2, '99.3 and 2000.]
35
Multilinguality: Language Identification (LID)
[Diagram labels: speech; independent LID module; German recognizer, English recognizer, Japanese recognizer; w_1 … w_n.]
36
From a Multi-Agent Architecture to a Multi-Blackboard Architecture
Verbmobil I, multi-agent architecture (modules M1 to M6): each module must know which module produces what data; direct communication between modules; heavy data traffic for moving copies around.
Verbmobil II, multi-blackboard architecture (modules M1 to M6, blackboards BB 1 to BB 3): all modules can register for each blackboard dynamically; no direct communication between modules; no copies of representation structures (word lattice, VIT chart).
37
Multi-Blackboard/Multi-Engine Architecture
Blackboard 1: Preprocessed Speech Signal
Blackboard 2: Word Lattice
Blackboard 3: Syntactic Representation: Parsing Results
Blackboard 4: Semantic Representation: Lambda DRS
Blackboard 5: Dialog Acts
Several engines per module class (Module 1.1, 1.2, ...; 2.1, 2.2, ...; up to 6.1, 6.2, ...) are attached to the blackboards.
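The registration idea behind the blackboards can be illustrated with a minimal sketch. It is not Verbmobil code: the `Blackboard` class, the two parser functions and the sample lattice are hypothetical names chosen for illustration. The sketch only shows that engines register with a blackboard instead of addressing each other, and that the posted structure is shared rather than copied.

```python
from typing import Any, Callable, List


class Blackboard:
    """Shared store for one representation level (hypothetical class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: List[Any] = []
        self._listeners: List[Callable[[Any], None]] = []

    def register(self, listener: Callable[[Any], None]) -> None:
        """A module registers dynamically; it never addresses another module."""
        self._listeners.append(listener)

    def post(self, entry: Any) -> None:
        """Store the entry once and notify every registered module."""
        self.entries.append(entry)
        for listener in list(self._listeners):
            listener(entry)


# Two blackboards named after the slide; the wiring below is an assumption.
word_lattice = Blackboard("Word Lattice")
syntax = Blackboard("Syntactic Representation")


def chunk_parser(lattice: Any) -> None:
    # Shallow engine: posts its (here trivial) result to the syntax blackboard.
    syntax.post(("chunk", lattice))


def hpsg_parser(lattice: Any) -> None:
    # Deep engine: works on the same lattice object, no copy is made.
    syntax.post(("hpsg", lattice))


word_lattice.register(chunk_parser)
word_lattice.register(hpsg_parser)

lattice = ["we", "meet", "on", "monday"]
word_lattice.post(lattice)
print([engine for engine, _ in syntax.entries])
```

Adding a further engine only needs another `register` call; none of the existing modules has to change.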
40
A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules
Audio Data: Command Recognizer, Spontaneous Speech Recognizer, Channel/Speaker Adaptation, Prosodic Analysis.
Word Hypotheses Graph with Prosodic Labels: Statistical Parser, Dialog Act Recognition, Chunk Parser, HPSG Parser.
VITs, Underspecified Discourse Representations: Semantic Construction, Robust Dialog Semantics, Semantic Transfer, Generation.
41
VIT (Verbmobil Interface Terms) as a Multi-Stratal Representation Language
Used as a common representation scheme for information exchange between all components and processing threads.
Design inspired by underspecified discourse representation structures (UDRS, Reyle/Kamp 1993).
Compact representation of lexical and structural ambiguities and of scope underspecifications of quantifiers, negations and adverbs.
Variable-free sets of non-recursive terms:
[beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]
Streams of literals as flat multi-stratal representations that are very efficient for incremental processing.
42
VIT for ‘He is coming at the beginning of August’
vit(vitID(sid(104, a, en, 10, 80, 1, en, y, semantics),   % Segment Identifier
      [word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]), word(at, 4, [36]),
       word(the, 5, [28]), word(beginning, 6, [35]), word(of, 7, [35]), word(``August'', 8, [34])]),   % WHG String
    index(38, 25, i35),   % Index
    [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43),
     pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)],   % Conditions
    [in_g(26, 25), in_g(37, 38), in_g(27, 25), in_g(28, 30), in_g(31, 33), in_g(34, 32), in_g(35, 29), in_g(36, 25),
     leq(25, h41), leq(25, h43), leq(29, h42), leq(29, h44), leq(30, h43), leq(32, h45), leq(33, h43)],   % Scope and Grouping Constraints
    [s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)],   % Sortal Specifications for Instance Variables
    [dialog_act(25, inform), dir(36, no), prontype(i36, third, std)],   % Discourse and Pragmatics
    [cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg), num(i38, sg), pcase(l135, i38, of)],   % Syntax
    [ta_aspect(i35, progr), ta_mood(i35, ind), ta_perf(i35, nonperf), ta_tense(i35, pres)],   % Tense and Aspect
    [pros_accent(35)]   % Prosody
)
43
Information between Layers is Linked Together Using Constant Symbols
[word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]), word(at, 4, [36]), word(the, 5, [28]), word(beginning, 6, [35]), word(of, 7, [35]), word(``August'', 8, [34])]   % WHG String
[beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]   % Conditions
[s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)]   % Sorts
[cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg)]   % Syntax
Instances are constants interpreted as skolemized variables.
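The linking of layers through shared constants can be made concrete with a small sketch. It copies the word, condition and sort entries from the slide into plain Python tuples; the tuple layout and the `describe` helper are assumptions made for illustration, not the VIT implementation. A word points to labels such as 35, a condition carries that label together with instance constants such as i37, and the sort layer is looked up by the same constants.

```python
# Word layer: (word, position, labels of the conditions the word introduces)
words = [
    ("he", 1, [26]), ("is", 2, []), ("coming", 3, [27]), ("at", 4, [36]),
    ("the", 5, [28]), ("beginning", 6, [35]), ("of", 7, [35]), ("August", 8, [34]),
]

# Condition layer: (predicate, label, further arguments...)
conditions = [
    ("beginning", 35, "i37"), ("arg3", 35, "i37", "i38"), ("come", 27, "i35"),
    ("arg1", 27, "i35", "i36"), ("decl", 37, "h43"), ("pron", 26, "i36"),
    ("at", 36, "i35", "i37"), ("mofy", 34, "i38", "aug"),
    ("def", 28, "i37", "h42", "h41"), ("udef", 31, "i38", "h45", "h44"),
]

# Sort layer: instance constant -> sort
sorts = {"i35": "situation", "i37": "time", "i38": "time"}


def describe(word):
    """Follow the shared constants from a word to its conditions and sorts."""
    labels = {lab for w, _pos, labs in words if w == word for lab in labs}
    conds = [c for c in conditions if c[1] in labels]
    instances = sorted({arg for c in conds for arg in c[2:] if arg in sorts})
    return conds, {inst: sorts[inst] for inst in instances}


conds, inst_sorts = describe("beginning")
print(conds)
print(inst_sorts)
```

Because every layer is a flat list keyed by constants, a module can add or read one layer without traversing a nested structure, which is the property the slides associate with incremental processing.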
48
The Use of Underspecified Representations
Source language, two readings: Wir telephonierten mit Freunden aus Schweden.
Underspecified semantic representation: a compact representation of scope ambiguities in a logical language without using disjunctions.
Target language, two readings (ambiguity-preserving translation): We called friends from Sweden.
49
Verbmobil is the First Dialog Translation System that Uses Prosodic Information Systematically at All Processing Stages
The multilingual prosody module works on the speech signal and the word hypotheses graph. Prosodic features: duration, pitch, energy, pause.
Uses in later stages: search space restriction in parsing; dialog act segmentation and recognition in dialog understanding; constraints for transfer in translation; lexical choice in generation; speaker adaptation in speech synthesis.
Information passed on: boundary information, sentence mood, accented words, prosodic feature vector.
50
Using Syntactic-Prosodic Boundaries to Speed Up the Parsing Process
yes S1 no problem S4 Mister Mueller S4 when would you like to go to Hannover S4
Without boundaries: 1256 chart edges, runtime 1.31 secs.
With boundaries: 632 chart edges, runtime 0.62 secs.
Speed-up: 53%.
53
Integrating Shallow and Deep Analysis Components in a Multi-Engine Approach
An A* algorithm guides the chunk parser, the statistical parser and the HPSG parser through the augmented word hypotheses graph.
The parsers deliver partial VITs, which are collected in a chart with a combination of partial VITs.
Robust dialog semantics: combination and knowledge-based reconstruction of complete VITs.
Result: complete and spanning VITs.
54
Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
German: Wir treffen uns in Mannheim, äh, in Saarbrücken. (We are meeting in Mannheim, oops, in Saarbruecken.)
English: We are meeting in Saarbruecken.
55
The Understanding of Spontaneous Speech Repairs
Original utterance: I need a car next Tuesday oops Monday
Reparandum: Tuesday. Hesitation (editing phase): oops. Reparans (repair phase): Monday.
Recognition of substitutions; transformation of the word hypothesis graph.
Result: I need a car next Monday
Verbmobil technology understands speech repairs and extracts the intended meaning.
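A toy version of the substitution step may help to see what is removed. It works on a plain token list instead of a word hypothesis graph, handles only one-word reparanda, and relies on a hand-written category table; `EDIT_TERMS`, `CATEGORY` and `repair` are hypothetical names, and the actual Verbmobil repair module is not described in the slides at this level of detail.

```python
# Hesitation markers that open the editing phase (illustrative list).
EDIT_TERMS = {"oops", "uh", "äh"}

# Hand-written word categories; a real system would use lexical knowledge.
CATEGORY = {"Monday": "weekday", "Tuesday": "weekday"}


def repair(tokens):
    """Drop hesitations; drop a reparandum that the reparans substitutes."""
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in EDIT_TERMS:
            if out and i + 1 < len(tokens):
                reparandum, reparans = out[-1], tokens[i + 1]
                category = CATEGORY.get(reparandum)
                # Substitution: both words must belong to the same category.
                if category is not None and category == CATEGORY.get(reparans):
                    out.pop()
            i += 1  # the hesitation itself is never kept
            continue
        out.append(token)
        i += 1
    return out


print(" ".join(repair("I need a car next Tuesday oops Monday".split())))
```

A hesitation without a matching reparandum, as in "I need uh a car", is simply skipped and the surrounding words are kept.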
61
VHG: A Packed Chart Representation of Partial Semantic Representations
Input from three parsers: chart parser using cascaded finite-state transducers; statistical LR parser trained on a treebank; very fast HPSG parser.
Incremental chart construction and anytime processing.
Rule-based combination and transformation of partial UDRS coded as VITs.
Selection of a spanning analysis using a bigram model for VITs.
62
Semantic Correction of Recognition Errors
German: Wir treffen uns Kaiserslautern. (We are meeting Kaiserslautern.)
English: We are meeting in Kaiserslautern.
63
Robust Dialog Semantics: Deep Processing of Shallow Structures
Goals of robust semantic processing (Pinkal, Worm, Rupp): combination of unrelated analysis fragments; completion of incomplete analysis results; skipping of irrelevant fragments.
Method: transformation rules on the VIT Hypothesis Graph, consisting of conditions on VIT structures and operations on VIT structures.
The rules are based on various knowledge sources: lattice of semantic types, domain ontology, sortal restrictions, semantic constraints.
Results: 20% of the analyses are improved, 0.6% get worse.
64
Robust Dialog Semantics: Combining and Completing Partial Representations
Recognized: Let us meet the late afternoon to catch the train to Frankfurt
Reconstructed: Let us meet (in) the late afternoon to catch the train to Frankfurt
The preposition ‘in’ is missing in all paths through the word hypothesis graph.
A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] → [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] → [apply(V2, V1, V3)] & V3
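The two rules can be read as a small rewriting procedure over analysis fragments. The sketch below is an illustration under strong simplifications: fragments are strings tagged with a type instead of VITs, the underspecified temporal relation is written as the placeholder `temp_rel`, and the names `Fragment`, `typeraise_to_mod`, `apply_mod` and `combine` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    kind: str     # "prop", "temporal_np" or "mod"
    content: str  # stand-in for a partial VIT


def typeraise_to_mod(np: Fragment) -> Fragment:
    """Rule 1: temporal NP -> modifier with an underspecified temporal relation."""
    return Fragment("mod", f"temp_rel({np.content})")


def apply_mod(mod: Fragment, prop: Fragment) -> Fragment:
    """Rule 2: apply a modifier to a proposition, giving a new proposition."""
    return Fragment("prop", f"{mod.content} & {prop.content}")


def combine(fragments: List[Fragment]) -> List[Fragment]:
    """Apply rule 1 to every temporal NP, then rule 2 to adjacent prop/mod pairs."""
    raised = [typeraise_to_mod(f) if f.kind == "temporal_np" else f for f in fragments]
    result: List[Fragment] = []
    for frag in raised:
        if frag.kind == "mod" and result and result[-1].kind == "prop":
            result[-1] = apply_mod(frag, result[-1])
        else:
            result.append(frag)
    return result


partial = [Fragment("prop", "meet(we)"), Fragment("temporal_np", "late_afternoon")]
print(combine(partial))
```

The two unrelated fragments for "Let us meet" and "the late afternoon" end up as one spanning proposition, although no preposition was recognized.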
65
Competing Strategies for Robust Speech Translation
The concurrent processing modules of Verbmobil combine deep semantic translation with shallow surface-oriented translation methods.
Expensive, but precise translation: principled and compositional syntactic and semantic analysis; semantic-based transfer of Verbmobil Interface Terms (VITs) as sets of underspecified DRS.
Cheap, but approximate translation: case-based translation; dialog-act based translation; statistical translation.
Both paths are checked for a time-out and deliver results with confidence values, from which the best result is selected (acceptable translation rate).
66
Architecture of the Semantic Transfer Module
[Diagram labels: VIT (L1), Underspecified VIT (L1), Refined VIT (L1); Refinement with monolingual refinement rules and disambiguation rules; Lexical Transfer with bilingual dictionary; Phrasal Transfer with phrasal dictionary; Refined VIT (L2), Underspecified VIT (L2), VIT (L2).]
67
Lexical Disambiguation On-Demand
Preserving lexical ambiguities: How did you find his office? (get to or like) → Wie fanden Sie sein Büro?
Disambiguation is not necessary for the translation between German and English.
Disambiguation is necessary for the translation between German and Japanese:
dou kare no jimusho o mitsukeraremashita ka
(how / he / POSS / office / OBJ / get to / can / PAST / QUESTION)
kare no jimusho wa dou omoimasu ka
(he / POSS / office / TOPIC / how / think / QUESTION)
68
Three English Translations of the German Word “Termin” Found in the Verbmobil Corpus
1. Verschieben wir den Termin. → Let’s reschedule the appointment.
2. Schlagen Sie einen Termin vor. → Suggest a date.
3. Da habe ich einen Termin frei. → I have got a free slot there.
Subsumption relations in the domain model: [diagram with the nodes scheduled_event, temporal_specification, appointment, set_start_time, time_interval, date, slot; one link is marked "default"]
69
Entries in the Transfer Lexicon: German → English (Simplified)
tau_lex(termin, appointment, pred_sort(subsume(scheduled_event))).
tau_lex(termin, date, pred_sort(subsume(set_start_time))).
tau_lex(termin, slot, pred_sort(subsume(time_interval))).
tau_lex(verschieben, reschedule, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(scheduled_event)])).
tau_lex(ausmachen, fix, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(set_start_time)])).
tau_lex(freihaben, have_free, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(time_interval)])).
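How the sort restrictions select one of the three translations of “Termin” can be sketched in a few lines. The lexicon contents are taken from the entries above; the Python data layout, the `subsumes` and `transfer` helpers and, in particular, the two parent links in `PARENT` are assumptions for illustration, since the slides do not spell out the full domain model.

```python
# Assumed fragment of the sort hierarchy (child -> parent); illustrative only.
PARENT = {
    "set_start_time": "temporal_specification",
    "time_interval": "temporal_specification",
}


def subsumes(general, specific):
    """True if `general` equals `specific` or lies above it in PARENT."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


# Noun entries: source noun -> [(target noun, sort that must subsume the context)]
NOUN_LEX = {
    "termin": [
        ("appointment", "scheduled_event"),
        ("date", "set_start_time"),
        ("slot", "time_interval"),
    ],
}

# Verb entries: source verb -> (target verb, sort required of its object)
VERB_LEX = {
    "verschieben": ("reschedule", "scheduled_event"),
    "ausmachen": ("fix", "set_start_time"),
    "freihaben": ("have_free", "time_interval"),
}


def transfer(verb, noun):
    """Translate a verb/object pair, letting the verb's sort pick the noun."""
    target_verb, required_sort = VERB_LEX[verb]
    for target_noun, sort in NOUN_LEX[noun]:
        if subsumes(sort, required_sort):
            return target_verb, target_noun
    raise LookupError(f"no translation of {noun!r} fits sort {required_sort!r}")


for verb in ("verschieben", "ausmachen", "freihaben"):
    print(verb, "->", transfer(verb, "termin"))
```

The noun entry itself stays ambiguous; the choice is made only when a governing predicate supplies a sort, which keeps the lexicon small.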
70
Using Context and World Knowledge for Semantic Transfer
All other dialog translation systems translate word-by-word or sentence-by-sentence.
Example: Platz → room / table / seat
1. Nehmen wir dieses Hotel, ja. → Let us take this hotel. Ich reserviere einen Platz. → I will reserve a room.
2. Machen wir das Abendessen dort. → Let us have dinner there. Ich reserviere einen Platz. → I will reserve a table.
3. Gehen wir ins Theater. → Let us go to the theater. Ich möchte Plätze reservieren. → I would like to reserve seats.
73
Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads
Input segments: Segment 1 “If you prefer another hotel,”; Segment 2 “please let me know.”
Four concurrent translation threads: statistical translation, dialog-act based translation, semantic transfer, case-based translation. They deliver alternative translations with confidence values.
Selection module: Segment 1 is translated by semantic transfer, Segment 2 by case-based translation.
74
A Machine Learning Approach to the Selection of the Best Translation Result
Input: $\mathit{SEQ}$ := set of all translation sequences for a turn; $\mathit{Seq} \in \mathit{SEQ}$ := sequence of translation segments $s_1, s_2, \ldots, s_n$.
Each translation thread provides for every segment an online confidence value $\mathrm{confidence}(\mathit{thread}.\mathit{segment})$.
76
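A Context-Free Approach to the Selection of the Best Translation Result
Input: $\mathit{SEQ}$ := set of all translation sequences for a turn; $\mathit{Seq} \in \mathit{SEQ}$ := sequence of translation segments $s_1, s_2, \ldots, s_n$. Each translation thread provides for every segment an online confidence value $\mathrm{confidence}(\mathit{thread}.\mathit{segment})$.
Task: compute normalized confidence values for each translated $\mathit{Seq}$:
$$\mathrm{CONF}(\mathit{Seq}) = \sum_{\mathit{segment} \in \mathit{Seq}} \mathrm{Length}(\mathit{segment}) \cdot \bigl(\alpha(\mathit{thread}) + \beta(\mathit{thread}) \cdot \mathrm{confidence}(\mathit{thread}.\mathit{segment})\bigr)$$
Output: $\mathrm{Best}(\mathit{SEQ}) = \{\mathit{Seq} \in \mathit{SEQ} \mid \mathit{Seq} \text{ is a maximal element in } (\mathit{SEQ}, \le_{\mathrm{CONF}})\}$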
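The selection rule can be tried out on a two-segment turn. The sketch below enumerates all sequences and returns the one with maximal CONF. The thread names, the alpha and beta values, the confidence values and the use of a word count as Length(segment) are made-up assumptions for illustration only.

```python
from itertools import product


def conf(seq, alpha, beta):
    """CONF(Seq): length-weighted, thread-normalized confidence of a sequence."""
    return sum(
        length * (alpha[thread] + beta[thread] * confidence)
        for thread, length, confidence in seq
    )


def best(candidates, alpha, beta):
    """Return the sequence with maximal CONF among all thread combinations."""
    return max(product(*candidates), key=lambda seq: conf(seq, alpha, beta))


# Illustrative normalizing factors per thread (not values from the project).
alpha = {"semantic_transfer": 0.1, "case_based": 0.0}
beta = {"semantic_transfer": 1.0, "case_based": 0.8}

# One list per segment; each candidate is (thread, length in words, confidence).
candidates = [
    [("semantic_transfer", 5, 0.9), ("case_based", 5, 0.6)],  # segment 1
    [("semantic_transfer", 4, 0.4), ("case_based", 4, 0.9)],  # segment 2
]

chosen = best(candidates, alpha, beta)
print([thread for thread, _length, _confidence in chosen])
```

With these numbers the first segment is taken from semantic transfer and the second from case-based translation. Exhaustive enumeration grows exponentially with the number of segments, which is acceptable for the short turns of a dialog but is a design choice of this sketch, not a statement about the original selection module.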
79
Learning the Normalizing Factors Alpha and Beta from an Annotated Corpus
Turn := segment_1, segment_2, ..., segment_n
For each turn in a training corpus, all segments translated by one of the four translation threads are manually annotated with a score for translation quality.
For the sequence of n segments resulting in the best overall translation score, at most $4^n$ linear inequalities are generated, so that the selected sequence is better than all alternative translation sequences.
From the set of inequalities for spanning analyses ($4^n$), the values of alpha and beta can be determined offline by solving the constraint system.
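The inequalities are linear in alpha and beta, because CONF is a dot product between a weight vector (alpha and beta per thread) and per-thread sums of Length and Length times confidence. The sketch below generates the inequalities for one annotated toy turn and finds a satisfying weight vector with a perceptron-style update. This update rule is a stand-in chosen to keep the example free of dependencies; the slides only say that the constraint system is solved offline, not how. The thread names, the toy data and the summing of quality scores over a sequence are assumptions.

```python
from itertools import product

THREADS = ["deep", "shallow"]  # hypothetical thread names (Verbmobil has four)


def features(seq):
    """Coefficients of (alpha_t, beta_t) per thread t in CONF(seq)."""
    phi = [0.0] * (2 * len(THREADS))
    for thread, length, confidence, _quality in seq:
        k = 2 * THREADS.index(thread)
        phi[k] += length                   # multiplies alpha(thread)
        phi[k + 1] += length * confidence  # multiplies beta(thread)
    return phi


def inequalities(turn):
    """One vector d per alternative sequence; the weights w must give w . d > 0."""
    seqs = list(product(*turn))
    best = max(seqs, key=lambda s: sum(c[3] for c in s))  # best annotated quality
    best_phi = features(best)
    return [
        [b - a for b, a in zip(best_phi, features(alt))]
        for alt in seqs if alt is not best
    ]


def learn(turns, epochs=100):
    """Perceptron-style search for weights that satisfy all inequalities."""
    ds = [d for turn in turns for d in inequalities(turn)]
    w = [0.0] * (2 * len(THREADS))
    for _ in range(epochs):
        violated = False
        for d in ds:
            if sum(wi * di for wi, di in zip(w, d)) <= 0:
                w = [wi + di for wi, di in zip(w, d)]
                violated = True
        if not violated:
            break
    return w, ds


# One toy turn; each candidate is (thread, length, confidence, annotated quality).
turn = [
    [("deep", 5, 0.9, 1.0), ("shallow", 5, 0.8, 0.4)],
    [("deep", 4, 0.3, 0.2), ("shallow", 4, 0.7, 0.9)],
]
weights, constraints = learn([turn])
print(dict(zip(["alpha_deep", "beta_deep", "alpha_shallow", "beta_shallow"], weights)))
```

With two threads and two segments the turn yields three inequalities, the counterpart of the at most $4^n$ inequalities for four threads. If the annotated data cannot be separated, the loop stops after `epochs` passes and the returned weights may still violate some constraints.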
81
Integrating a Deep HPSG-based Analysis with Probabilistic Dialog Act Recognition for Semantic Transfer
[Diagram labels: HPSG Analysis; Robust Dialog Semantics; VIT; Semantic Transfer; Probabilistic Analysis of Dialog Acts (HMM); Recognition of Dialog Plans (Plan Operators); Dialog Act Type; Dialog Phase.]
82
The Dialog Act Hierarchy used for Planning, Prediction, Translation and Generation
[Tree diagram rooted in Dialog Act; node labels: CONTROL_DIALOG, MANAGE_TASK, PROMOTE_TASK, GREETING, INTRODUCE, POLITENESS_FORMULA, THANK, DELIBERATE, BACKCHANNEL, INIT, DEFER, CLOSE, REQUEST, SUGGEST, INFORM, FEEDBACK, COMMIT, REQUEST_SUGGEST, REQUEST_CLARIFY, REQUEST_COMMENT, REQUEST_COMMIT, GREETING_BEGIN, GREETING_END, DIGRESS, EXCLUDE, CLARIFY, GIVE_REASON, DEVIATE_SCENARIO, REFER_TO_SETTING, CLARIFY_ANSWER, FEEDBACK_NEGATIVE, REJECT, EXPLAINED_REJECT, FEEDBACK_POSITIVE, ACCEPT, CONFIRM.]
83
Learning of Probabilistic Plan Operators from Annotated Corpora
(OPERATOR-s-10523-6
  goal [IN-TURN confirm-s-10523 ?S-3314 ?S-3316]
  subgoals (sequence [IN-TURN confirm-s-10521 ?S-3314 ?S-3315]
                     [IN-TURN confirm-s-10522 ?S-3315 ?S-3316])
  PROB 0.72)
(OPERATOR-s-10521-8
  goal [IN-TURN confirm-s-10521 ?S-3321 ?S-3322]
  subgoals (sequence [DOMAIN-DEPENDENT accept ?S-3321 ?S-3322])
  PROB 0.95)
84
Automatic Generation of Multilingual Summaries of Telephone Conversations
[Diagram] Dialog Translation by Verbmobil; Multilingual Generation of Summaries
Outputs: HTML document in English, transferred by Internet or fax; HTML document in German, transferred by Internet or fax
Recipients: German Dialog Partner, American Dialog Partner
85
Dialog Summary
Participants: Mr. Jones, Mr. Mueller
Date: 22.3.2001
Time: 8:57 AM to 10:03 AM
Theme: Appointment schedule with trip and accommodation
Dialog Summary:
Scheduling: Mr. Jones and Mr. Mueller will meet at the train station on the 1st of March 2001 at 10:00 am.
Travelling: The trip from Hamburg to Hanover by train will start on the 1st of March at 10:15 am.
Summary automatically generated at 22.3.2001 12:31:24 h
86
Microplanning: Create Syntactic Building Blocks
Method: Mapping of dependency structures
Example: Time Expressions
Semantic dependency (VIT): DEF (L,I,G,H), DOWF (L1,I,mo), ORD (L2,I,11), MOFY (L3,I,may)
Syntactic dependency (TAG): nodes MONDAY1, ELEVENTH_DAY, THE, MAY, OF_P; edge labels ARG, SPEC
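The mapping in this example can be illustrated with a small lookup-based sketch. It is only a guess at the correspondence, read off the node names on the slide: DOWF and MOFY are presumably "day of week" and "month of year", DEF is assumed to yield THE, and MOFY is assumed to contribute both OF_P and the month node. The sketch produces the node labels only and ignores the ARG and SPEC edges. The tables and the function name are hypothetical.

```python
# Hypothetical lookup tables, filled only with the values from the slide.
WEEKDAYS = {"mo": "MONDAY1"}
ORDINALS = {11: "ELEVENTH_DAY"}
MONTHS = {"may": "MAY"}


def map_time_expression(vit_predicates):
    """Map VIT-style predicates to labels of syntactic building blocks."""
    blocks = []
    for name, args in vit_predicates:
        value = args[-1]  # the last argument carries the value on the slide
        if name == "DEF":
            blocks.append("THE")
        elif name == "DOWF":
            blocks.append(WEEKDAYS[value])
        elif name == "ORD":
            blocks.append(ORDINALS[value])
        elif name == "MOFY":
            blocks.extend(["OF_P", MONTHS[value]])
        else:
            raise ValueError(f"no mapping rule for predicate {name}")
    return blocks


vit = [
    ("DEF", ("L", "I", "G", "H")),
    ("DOWF", ("L1", "I", "mo")),
    ("ORD", ("L2", "I", 11)),
    ("MOFY", ("L3", "I", "may")),
]
blocks = map_time_expression(vit)
```

A real microplanner would map whole dependency structures rather than isolated predicates; the sketch only shows the lexical side of that mapping.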
87
Speeding Up the Language Generation Process by the Compilation of the HPSG Grammar to an LTAG Generation Grammar
[Diagram] HPSG Analysis Grammar, compiled into a Lexicalized Tree Adjoining Grammar with 2,350 trees
- extended domain of locality
- no recursive feature structures
- fast generation (0.5 secs average runtime)
88
Corpus-based Speech Synthesis
Sentence to synthesize: I have time on Monday.
[Figure: graph of corpus tokens built from the words i, have, time, on, monday, running from start node S to end node E; edge direction from S to E]
89
Verbmobil: Long-Term, Large-Scale Funding and Its Impact
Funding by the German Ministry for Education and Research BMBF (Dr. Reuse):
Phase I (1993-1996): $ 33 M
Phase II (1997-2000): $ 28 M
60% industrial funding according to a shared cost model: $ 17 M
Additional R&D investments of industrial partners: $ 11 M
Total: $ 89 M
90
Verbmobil: Long-Term, Large-Scale Funding and Its Impact
> 800 Publications (> 600 refereed)
Many Patents
> 20 Commercial Spin-off Products
> 8 Spin-off Companies
> 900 trained Researchers for German Language Industry
Philips, DaimlerChrysler and Siemens are leaders in Spoken Dialog Applications
91
Distribution of Sentence Length in Large-Scale Evaluation
Web-based Evaluation of 25,345 Translations by 65 Evaluators
[Figure: histogram of sentence length; one axis runs from 1 to 60, the other from 0 to 350]
92
Evaluation Results
The translation of a turn is approximately correct if it preserves the intention of the speaker and the main propositional content of her utterance.

Translation Thread              Word Accuracy 75%    Word Accuracy 80%
                                (3267 Turns)         (2723 Turns)
Case-based Translation          44%                  46%
Statistical Translation         79%                  81%
Dialog-Act based Translation    45%                  46%
Semantic Transfer               47%                  49%
Substring-based Translation     75%                  79%
Automatic Selection             66% / 83% *          68% / 85% *
Manual Selection                95%                  97%

* After Training with Instance-based Learning Algorithm
93
Results of End-to-End Evaluation Based on Dialog Task Completion for 31 Trials

Topic                            Successful Completions/   Successful   Frequency-Based
                                 Attempts                  Tasks (%)    Weighting Factor
Meeting time                     25/28                     89.3         0.90
Meeting place                    21/27                     77.8         0.87
Means of transportation          30/30                     100          0.97
Departure place                  22/25                     88           0.81
Arrival time                     22/26                     84.6         0.84
Who reserves the hotel           28/31                     90.3         1
How to get to departure place    7/9                       77.8         0.29
........
Total Number of Tasks            227/255
Average Percentage of Successful Task Completions          86.8         89.6
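The per-topic columns of this table can be recomputed from the completion counts. The sketch below assumes that the frequency-based weighting factor is the number of attempts divided by the 31 trials; the slide does not state this, but it reproduces the listed values. Only the seven listed topics are included, so the totals (227/255, 86.8, 89.6), which also depend on the elided rows, are not recomputed. All names are hypothetical.

```python
TRIALS = 31  # number of trials given in the slide title

# (topic, successful completions, attempts) for the rows listed on the slide
ROWS = [
    ("Meeting time", 25, 28),
    ("Meeting place", 21, 27),
    ("Means of transportation", 30, 30),
    ("Departure place", 22, 25),
    ("Arrival time", 22, 26),
    ("Who reserves the hotel", 28, 31),
    ("How to get to departure place", 7, 9),
]


def success_percentage(successes, attempts):
    """Share of successful task completions in percent, one decimal."""
    return round(100.0 * successes / attempts, 1)


def frequency_weight(attempts, trials=TRIALS):
    """Assumed weighting factor: how often the topic came up per trial."""
    return round(attempts / trials, 2)


table = [(topic, success_percentage(s, a), frequency_weight(a))
         for topic, s, a in ROWS]

for topic, pct, weight in table:
    print(f"{topic:32s} {pct:6.1f} {weight:5.2f}")
```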
94
Checklist for Final Verbmobil System
Vocabulary Size: 10,000 for German, Equivalent English Lexicon, 2,500 for Japanese
Operational Success Criteria:
Word recognition rate (16 kHz):
German: spontaneous: 75% (cooperative: 85%)
English: spontaneous: 72% (cooperative: 82%)
Japanese: spontaneous: 75% (cooperative: 85%)
Word recognition rate (8 kHz): spontaneous: 70% (cooperative: 80%)
80% of the translations are approximately correct and the dialog task success rate should be around 90%.
The average end-to-end processing time should be four times real time (length of the input signal).
95
Results of the Verbmobil Project have been used in 20 Spin-Off Products by the Industrial Partners DaimlerChrysler, Philips and Siemens
Dictation Systems: 3
Spoken Dialog Systems: 4
Dialog Engines: 2
Command & Control Systems: 5
Text Classification Systems: 3
Translation Systems: 3
96
Linguatronic: Spoken Dialogs with a Mercedes-Benz
Speech control of: cell phone, radio, windows / AC, navigation system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent, garbage models for non-speech (blinker, AC, wheels)
Mike
Please call Doris Wahlster.
Open the left window in the back.
I want to hear the weather channel.
When will I reach the next gas station?
Where is the next parking lot?
97
Spoken Dialogs about Schedules
Fielded applications:
Train schedules (German Railway System, DB): TABA (Philips) +49 241 60 40 20; OSCAR (DaimlerChrysler) +49 1805 99 66 22
Flight schedules (Lufthansa): ALF (Philips) +49 1803 00 00 74
Technical challenges: phone-based dialogs, many proper names, clarification subdialogs
98
Successful Technology Transfer: 8 High-Tech Spin-Off Companies in the Area of Language Technology have been founded by Verbmobil Researchers
XtraMind Technologies, Language Technology for Customer Interaction Services, www.xtramind.com, Saarbrücken
GSDC GmbH, Multilingual Documentation, www.ic-portal.gsdc.de, Nürnberg
SCHEMA GmbH, Document Engineering, www.schema.de, Nürnberg
SYMPALOG GmbH, Spoken Dialog Systems, www.sympalog.de, Nürnberg
RETIVOX GbR, Speech Synthesis Systems, www.retivox.de, Bonn
CLT Sprachtechnologie GmbH, LT for Text Processing, www.clt-st.de, Saarbrücken
AIXPLAIN AG, Human Language Technology, www.aixplain.de, Aachen
SONICSON GmbH, Natural Language Access to Online Music, www.sonicson.com, Kaiserslautern
99
Verbmobil was the Key Resource for the Education and Training of Researchers and Engineers Needed to Build Up Language Industry in Germany
Internships: 18
Master Students: 238
PhD Students: 164
Student Research Assistants: 483
Habilitations: 16
Total: 919
100
From Spoken Dialog to Multimodal Dialog
Verbmobil: Today's Cell Phone; Speech only
SmartKom: Third Generation UMTS Phone; Speech, Graphics and Gesture
101
Merging Various User Interface Paradigms
Natural Language Dialog, Graphical User Interfaces and Gestural Interaction merge into Multimodal Interaction
(see Phil Cohen's invited talk on Friday)
102
SmartKom: Intuitive Multimodal Interaction
The SmartKom Consortium: Project Budget: $ 34 M; Project Duration: 4 years
Main Contractor (Project Management, Testbed, Software Integration): DFKI Saarbrücken
Partners include: MediaInterface; European Media Lab; IMS Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart; Ludwig-Maximilians-Universität München
103
SmartKom-Mobile: A Handheld Communication Assistant
Camera
GPS
Microphone
Loudspeaker
Stylus-Activated Sketch Pad
Wearable Compute Server
Docking Station for Car PC
Biosensor for Authentication & Emotional Feedback
GSM for Telephone, Fax, Internet Connectivity
104
SmartKom: Multimodal Dialogs with a Life-like Character
105
Verbmobil is a Very Large Dialog System
69 modules communicate via 224 blackboards
HPSG for German uses a hierarchy of 2,400 types
15,385 entries in the semantic database
22,783 transfer rules and 13,640 microplanning rules
30,000 templates for case-based translation
691,583 alignment templates
334 finite-state transducers
106
Lessons Learned from Verbmobil
Deep processing can be used for merging, completing and repairing the results of shallow processing strategies.
Shallow methods can be used to guide the search in deep processing.
Statistical methods must be augmented by symbolic models to achieve higher accuracy and broader coverage.
Statistical methods can be used to learn operators or selection strategies for symbolic processes.
107
Conclusions and Take-Home Messages
Real-world problems in language technology, like the understanding of spoken dialogs, speech-to-speech translation and multimodal dialog systems, can only be cracked by the combined muscle of deep and shallow processing approaches.
108
Conclusions and Take-Home Messages
In a multi-blackboard and multi-engine architecture based on packed representations on all processing levels (speech recognition, parsing, semantic processing, translation, generation), using charts with underspecified representations, the results of concurrent processing threads can be combined in an incremental fashion.
109
Conclusions and Take-Home Messages
All results of concurrent and competing processing modules should come with a confidence value, so that statistically trained selection modules can choose the most promising result at each stage, if demanded by a following processing step.
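A confidence-based selection step of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the Verbmobil selection module: each competing result carries a raw confidence, the raw values are assumed to be made comparable by a per-thread linear rescaling alpha * confidence + beta (with factors trained offline), and the highest rescaled value wins. The class, the function, the thread names and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    thread: str        # which processing thread produced the result
    translation: str
    confidence: float  # raw, thread-specific confidence value


def select(candidates, alpha, beta):
    """Pick the candidate with the highest normalized confidence."""
    def normalized(c):
        return alpha[c.thread] * c.confidence + beta[c.thread]
    return max(candidates, key=normalized)


# Made-up normalizing factors and candidates, for illustration only.
alpha = {"statistical": 1.0, "semantic_transfer": 0.5}
beta = {"statistical": 0.0, "semantic_transfer": 0.4}
candidates = [
    Candidate("statistical", "translation A", 0.6),
    Candidate("semantic_transfer", "translation B", 0.7),
]
winner = select(candidates, alpha, beta)
```

Keeping the rescaling separate from the modules means each module can report confidence on its own scale, and only the selection step needs retraining when a module changes.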
110
Conclusions and Take-Home Messages
Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that the uncertainties can be reduced by linguistic, discourse and domain constraints as soon as they become applicable.
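The idea of keeping uncertainty packed and narrowing it as constraints become applicable can be illustrated with a toy sketch. It is not the VIT formalism: the packed representation is modelled as a dictionary from slots to sets of still-possible values, and a constraint simply intersects one slot with the values it allows. The slot names, values and constraints are invented for the example.

```python
def count_readings(packed):
    """Number of fully specified readings the packed form stands for."""
    total = 1
    for values in packed.values():
        total *= len(values)
    return total


def apply_constraint(packed, slot, allowed):
    """Return a new packed representation with one slot narrowed."""
    narrowed = dict(packed)
    narrowed[slot] = packed[slot] & allowed
    if not narrowed[slot]:
        raise ValueError(f"constraint leaves no value for slot {slot!r}")
    return narrowed


def is_resolved(packed):
    """True once every slot has exactly one remaining value."""
    return all(len(values) == 1 for values in packed.values())


# Two uncertain slots stand for 2 * 2 = 4 readings without listing them.
packed = {
    "weekday": {"Tuesday", "Thursday"},   # e.g. recognition uncertainty
    "dialog_act": {"SUGGEST", "ACCEPT"},  # e.g. interpretation uncertainty
}
after_discourse = apply_constraint(packed, "dialog_act", {"SUGGEST", "REQUEST"})
after_domain = apply_constraint(after_discourse, "weekday", {"Thursday", "Friday"})
```

Each constraint is applied whenever it becomes available, in any order, and the original packed form is never expanded into its individual readings.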
111
Conclusions and Take-Home Messages
Underspecification allows disambiguation requirements to be delayed until later processing stages, where better-informed decisions can be made.
The massive use of underspecification makes the syntax-semantics interface and transfer rules almost deterministic, thereby boosting processing speed.
112
Open Problems:
Integrating top-down knowledge into low-level speech recognition processes
Exploiting more knowledge about human interpretation strategies
More robust translation of turns with very low word accuracy rates
Expensive data collection and cognitively unrealistic training data
113
Further Reading
You can find a 10-page paper in the IJCAI-01 Proceedings, Vol. 2, pages 1484-1493.
An extended version will appear in the Winter issue of the AI Magazine, or check the URL: verbmobil.dfki.de
114
The Verbmobil Book
Wahlster, W. (ed.) (2000): Verbmobil: Foundations of Speech-to-Speech Translation. Berlin, New York, Tokyo: Springer. 679 pp., 224 figs., 88 tabs. Hardcover, ISBN 3-540-67783-6