1
Robust Translation of Spontaneous Speech: A Multi-Engine Approach
Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, Wednesday, 8 August 2001. Wolfgang Wahlster, German Research Center for Artificial Intelligence, DFKI GmbH
2
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations: (1) face-to-face conversations, (2) telecommunication.
3
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Conference call: the Verbmobil Speech Translation Server connects GSM cell phone users.
4
Robust Realtime Translation with Verbmobil
At a German airport: an American businessman calls the secretary of a German business partner.
5
Outline
- Verbmobil's Multi-Blackboard and Multi-Engine Architecture
- Exploiting Underspecification in a Multi-Stratal Semantic Representation Language
- Combining Deep and Shallow Processing Strategies for Robust Dialog Translation
- Evaluation and Technology Transfer
- Lessons Learned and Conclusions
6
Telephone-based Dialog Translation
Verbmobil Server Cluster
- ISDN conference call (3 participants): German speaker, Verbmobil, American speaker
- Speech-based set-up of the conference call
Hardware: BinTec Bianca/Brick XS ISDN-LAN router; Linux server; Sun Enterprise 450; Sun Ultra 60/80.
7
Verbmobil: The First Speech-Only Dialog Translation System
Mobile GSM phone / mobile DECT phone.
American speaker: "Verbmobil" (voice dialing); the call is connected to the Verbmobil speech-to-speech translation server.
Verbmobil: "Welcome to the Verbmobil Translation System. Please speak the telephone number of your partner."
American speaker: " "
The foreign participant is placed into the conference call.
Verbmobil (to the German participant): "Verbmobil hat eine neue Verbindung aufgebaut. Bitte sprechen Sie jetzt." (Verbmobil has set up a new connection. Please speak now.)
Verbmobil (to the American participant): "Welcome to the Verbmobil server. Please start your input after the beep."
12
Verbmobil is a Multilingual System
It supports bidirectional translation between: German ↔ English (American), German ↔ Japanese, and German ↔ Chinese (Mandarin).
13
Verbmobil partners (Phase 2): TU Braunschweig, DaimlerChrysler, Rheinische Friedrich-Wilhelms-Universität Bonn, Ludwig-Maximilians-Universität München, Universität Bielefeld, Universität des Saarlandes, Technische Universität München, Universität Hamburg, Friedrich-Alexander-Universität Erlangen-Nürnberg, Eberhard-Karls-Universität Tübingen, Ruhr-Universität Bochum, Universität Stuttgart, Universität Karlsruhe. W. Wahlster, DFKI
14
Three Levels of Language Processing
Speech input over the telephone passes through three levels, each reducing uncertainty:
1. Speech recognition (acoustic and language models, word lists): What has the caller said? About 100 alternatives.
2. Speech analysis (grammar, lexical meaning): What has the caller meant? About 10 alternatives.
3. Speech understanding (discourse context, knowledge about the domain of discourse): What does the caller want? Unambiguous understanding in the dialog context.
15
Challenges for Language Engineering
Four dimensions of increasing complexity:
- Input conditions: close-speaking microphone/headset, push-to-talk → telephone, pause-based segmentation → open microphone, GSM quality
- Naturalness: isolated words → read continuous speech → spontaneous speech
- Adaptability: speaker dependent → speaker independent → speaker adaptive
- Dialog capabilities: monolog dictation → information-seeking dialog → multiparty negotiation
Verbmobil operates at the most complex end of each dimension.
16
Verbmobil II: Three Domains of Discourse
Scenario 1: Appointment Scheduling. When? Focus on temporal expressions. Vocabulary size: 6,000.
Scenario 2: Travel Planning & Hotel Reservation. When? Where? How? Focus on temporal and spatial expressions. Vocabulary size: 10,000.
Scenario 3: PC-Maintenance Hotline. What? When? Where? How? Integration of special sublanguage lexica. Vocabulary size: 30,000.
20
Context-Sensitive Speech-to-Speech Translation
Wann fährt der nächste Zug nach Hamburg ab? → When does the next train to Hamburg depart?
Wo befindet sich das nächste Hotel? → Where is the nearest hotel?
The Verbmobil server resolves the context-dependent German word "nächste" to either "next" or "nearest".
21
The Control Panel of Verbmobil
32
Verbmobil's Massive Data Collection Effort
The corpus: 3,200 dialogs (182 hours) with 1,658 speakers; 79,562 turns distributed on 56 CDs (21.5 GB). The so-called Partitur (German for musical score) orchestrates fifteen strata of annotation: transliteration variant 1, transliteration variant 2, lexical orthography, canonical pronunciation, manual phonological segmentation, automatic phonological segmentation, word segmentation, prosodic segmentation, dialog acts, noises, superimposed speech, syntactic category, word category, syntactic function, and prosodic boundaries.
33
Extracting Statistical Properties from Large Corpora
Transcribed speech data, segmented speech with prosodic labels, treebanks and predicate-argument structures, dialogs annotated with dialog acts, and aligned bilingual corpora feed machine learning for the integration of statistical properties into symbolic models for speech recognition, parsing, dialog processing, and translation: hidden Markov models, neural nets (multilayered perceptrons), probabilistic transfer rules, probabilistic automata, and probabilistic grammars.
34
Multilinguality
[Chart: word accuracy (%) for the German, English, and Japanese recognizers over the milestones '97, '98, VM1, '99.1, '99.2, '99.3, 2000; y-axis 50-100%]
35
Language Identification (LID)
Multilinguality: a recognizer-independent LID module classifies the incoming speech (w1 … wn) and routes it to the German, English, or Japanese recognizer.
36
From a Multi-Agent Architecture to a Multi-Blackboard Architecture
Verbmobil I: Multi-Agent Architecture
- Each module must know which module produces what data
- Direct communication between modules
- Heavy data traffic for moving copies around
Verbmobil II: Multi-Blackboard Architecture (modules M1-M6, blackboards BB1-BB3)
- All modules can register for each blackboard dynamically
- No direct communication between modules
- No copies of representation structures (word lattice, VIT chart)
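The contrast can be sketched as a minimal publish/subscribe blackboard. This is an illustrative Python sketch; the class and method names are invented for this example and are not the Verbmobil API.

```python
# Minimal sketch of the multi-blackboard idea: modules register for
# blackboards dynamically, never talk to each other directly, and posted
# results are shared by reference rather than copied around.

class Blackboard:
    def __init__(self, name):
        self.name = name
        self.entries = []        # shared structures, e.g. word lattices
        self.subscribers = []    # modules registered for this blackboard

    def register(self, callback):
        # Any module can register dynamically; no module needs to know
        # which other module produces the data.
        self.subscribers.append(callback)

    def post(self, entry):
        self.entries.append(entry)       # no copies are made
        for notify in self.subscribers:
            notify(entry)

received = []
word_lattice_bb = Blackboard("word_lattice")
word_lattice_bb.register(lambda e: received.append(e))  # e.g. a parser
word_lattice_bb.post({"words": ["when", "does", "the", "train", "leave"]})
```

Because subscribers receive the posted object itself, a consumer such as a parser works on the very structure the recognizer produced, which is what avoids the heavy copy traffic of the Verbmobil I design.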
37
Multi-Blackboard/Multi-Engine Architecture
Modules 1.1, 1.2, … through 6.1, 6.2, … communicate exclusively via five blackboards:
- Blackboard 1: preprocessed speech signal
- Blackboard 2: word lattice
- Blackboard 3: syntactic representation (parsing results)
- Blackboard 4: semantic representation (lambda DRS)
- Blackboard 5: dialog acts
38
A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules
The audio data pass through channel/speaker adaptation to a command recognizer, a spontaneous speech recognizer, and prosodic analysis, producing a word hypotheses graph with prosodic labels. On this graph operate a statistical parser, a chunk parser, an HPSG parser, and dialog act recognition. Semantic construction and robust dialog semantics produce underspecified discourse representations (VITs), which feed semantic transfer and generation.
41
VIT (Verbmobil Interface Terms) as a Multi-Stratal Representation Language
- used as a common representation scheme for information exchange between all components and processing threads
- design inspired by underspecified discourse representation structures (UDRS, Reyle/Kamp 1993)
- compact representation of lexical and structural ambiguities and scope underspecification of quantifiers, negations, and adverbs
- variable-free sets of non-recursive terms: [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]
- streams of literals as flat multi-stratal representations that are very efficient for incremental processing
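To illustrate why such flat, variable-free term lists are easy to process incrementally, here is a small Python sketch; the data layout and the helper function are invented for this example, not part of the Verbmobil code.

```python
# A few of the VIT conditions from the slide, held as flat
# (predicate, argument-tuple) pairs; labels are plain numbers,
# instances are constants like "i35".
conditions = [
    ("beginning", (35, "i37")), ("arg3", (35, "i37", "i38")),
    ("come", (27, "i35")),      ("arg1", (27, "i35", "i36")),
    ("pron", (26, "i36")),      ("at", (36, "i35", "i37")),
]

def literals_about(instance):
    # All conditions mentioning a given instance constant: the strata of
    # a VIT are linked purely through such shared constants, so a module
    # can collect everything known about one instance with a flat scan.
    return [pred for pred, args in conditions if instance in args]

# The instance "i35" (the 'coming' situation) is shared by three literals.
```

No recursion or unification is needed: because the terms are non-recursive and variable-free, any stratum can be extended or queried by simple list operations.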
42
VIT for 'He is coming at the beginning of August'
vit(vitID(sid(104,a,en,10,80,1,en,y,semantics)),              % segment identifier
  [word(he,1,[26]), word(is,2,[]), word(coming,3,[27]), word(at,4,[36]),
   word(the,5,[28]), word(beginning,6,[35]), word(of,7,[35]),
   word('August',8,[34])],                                    % WHG string
  index(38,25,i35),                                           % index
  [beginning(35,i37), arg3(35,i37,i38), come(27,i35), arg1(27,i35,i36),
   decl(37,h43), pron(26,i36), at(36,i35,i37), mofy(34,i38,aug),
   def(28,i37,h42,h41), udef(31,i38,h45,h44)],                % conditions
  [in_g(26,25), in_g(37,38), in_g(27,25), in_g(28,30), in_g(31,33),
   in_g(34,32), in_g(35,29), in_g(36,25), leq(25,h41), leq(25,h43),
   leq(29,h42), leq(29,h44), leq(30,h43), leq(32,h45), leq(33,h43)],  % scope and grouping constraints
  [s_sort(i35,situation), s_sort(i37,time), s_sort(i38,time)],  % sortal specifications for instance variables
  [dialog_act(25,inform), dir(36,no), prontype(i36,third,std)], % discourse and pragmatics
  [cas(i36,nom), gend(i36,masc), num(i36,sg), num(i37,sg), num(i38,sg),
   pcase(l135,i38,of)],                                       % syntax
  [ta_aspect(i35,progr), ta_mood(i35,ind), ta_perf(i35,nonperf),
   ta_tense(i35,pres)],                                       % tense and aspect
  [pros_accent(35)]                                           % prosody
)
43
Information between Layers is Linked Together Using Constant Symbols
Instances are constants interpreted as Skolemized variables.
[word(he,1,[26]), word(is,2,[]), word(coming,3,[27]), word(at,4,[36]), word(the,5,[28]), word(beginning,6,[35]), word(of,7,[35]), word('August',8,[34])]   % WHG string
[beginning(35,i37), arg3(35,i37,i38), come(27,i35), arg1(27,i35,i36), decl(37,h43), pron(26,i36), at(36,i35,i37), mofy(34,i38,aug), def(28,i37,h42,h41), udef(31,i38,h45,h44)]   % conditions
[s_sort(i35,situation), s_sort(i37,time), s_sort(i38,time)]   % sorts
[cas(i36,nom), gend(i36,masc), num(i36,sg), num(i37,sg)]   % syntax
48
The Use of Underspecified Representations
Wir telephonierten mit Freunden aus Schweden. (two readings in the source language)
We called friends from Sweden. (two readings in the target language)
An underspecified semantic representation encodes the ambiguity compactly in a logical language without using disjunctions, enabling ambiguity-preserving translation.
49
Verbmobil is the first dialog translation system that uses prosodic information systematically at all processing stages. The multilingual prosody module analyzes the speech signal and the word hypotheses graph for duration, pitch, energy, and pauses, and delivers: boundary information to parsing (search space restriction), boundary information and sentence mood to translation (constraints for transfer), accented words to generation (lexical choice), a prosodic feature vector to dialog understanding (dialog act segmentation and recognition), and speaker adaptation data to speech synthesis.
50
Using Syntactic-Prosodic Boundaries to Speed-Up the Parsing Process
yes S1 no problem S4 Mister Mueller S4 when would you like to go to Hannover S4
without boundaries: # chart edges: , runtime: 1.31 secs
with boundaries: # chart edges: , runtime: 0.62 secs
speed-up: 53%
52
Augmented Word Hypotheses Graph
Integrating Shallow and Deep Analysis Components in a Multi-Engine Approach
An A* algorithm guides the search through the augmented word hypotheses graph. The statistical parser, the chunk parser, and the HPSG parser each contribute partial VITs to a chart. Robust dialog semantics then combines the partial VITs and performs knowledge-based reconstruction of complete, spanning VITs.
54
Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
Wir treffen uns in Mannheim, äh, in Saarbrücken. (We are meeting in Mannheim, oops, in Saarbruecken.) German English We are meeting in Saarbruecken.
55
The Understanding of Spontaneous Speech Repairs
Original utterance: "I need a car next Tuesday oops Monday". Reparandum: "Tuesday"; editing phase (hesitation): "oops"; reparans: "Monday". Verbmobil recognizes the substitution, transforms the word hypothesis graph, and extracts the intended meaning: "I need a car next Monday".
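The transformation can be sketched in Python. This is a simplification: the real system operates on the word hypothesis graph rather than a flat word list, and the repair spans are assumed here to be already recognized.

```python
# Sketch of the repair transformation: once the reparandum and the editing
# phase are identified, both are deleted and the reparans is kept, yielding
# the intended meaning of the utterance.

HESITATIONS = {"oops", "uh", "um"}  # illustrative editing terms

def apply_repair(words, reparandum):
    """Delete the reparandum span and any following hesitations."""
    out = []
    i = 0
    while i < len(words):
        if words[i:i + len(reparandum)] == reparandum:
            i += len(reparandum)                  # drop the reparandum
            while i < len(words) and words[i] in HESITATIONS:
                i += 1                            # drop the editing phase
        else:
            out.append(words[i])
            i += 1
    return out

utterance = "I need a car next Tuesday oops Monday".split()
repaired = apply_repair(utterance, ["Tuesday"])
```

On the example above this leaves "I need a car next Monday", the reading the translation components should receive.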
56
VHG: A Packed Chart Representation of Partial Semantic Representations
- Incremental chart construction and anytime processing
- Chart parser using cascaded finite-state transducers
- Statistical LR parser trained on a treebank
- Very fast HPSG parser
- Rule-based combination and transformation of partial UDRS coded as VITs
- Selection of a spanning analysis using a bigram model for VITs
62
Semantic Correction of Recognition Errors
Wir treffen uns Kaiserslautern. (We are meeting Kaiserslautern.) German English We are meeting in Kaiserslautern.
63
Robust Dialog Semantics: Deep Processing of Shallow Structures
Goals of robust semantic processing (Pinkal, Worm, Rupp):
- combination of unrelated analysis fragments
- completion of incomplete analysis results
- skipping of irrelevant fragments
Method: transformation rules on the VIT hypothesis graph, with conditions on VIT structures and operations on VIT structures. The rules draw on several knowledge sources: a lattice of semantic types, the domain ontology, sortal restrictions, and semantic constraints.
Results: in 20% of the cases the analysis is improved; in 0.6% it gets worse.
64
Robust Dialog Semantics: Combining and Completing Partial Representations
The intended utterance "Let us meet (in) the late afternoon to catch the train to Frankfurt" is recognized as "Let us meet the late afternoon to catch the train to Frankfurt": the preposition 'in' is missing in all paths through the word hypothesis graph.
A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] → [typeraise_to_mod(V1, V2)] & V2
The modifier is then applied to a proposition:
[type(V1, prop), type(V2, mod)] → [apply(V2, V1, V3)] & V3
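The two rules can be sketched over toy typed fragments. This is a hedged illustration: the dictionary-based fragments, type names, and helper functions below are invented stand-ins for real VIT structures.

```python
# Toy versions of the two transformation rules quoted above.

def typeraise_to_mod(fragment):
    # [temporal_np(V1)] -> [typeraise_to_mod(V1, V2)] & V2
    # A temporal NP becomes a modifier via an underspecified temporal
    # relation (rendered here as a generic "at").
    if fragment["type"] == "temporal_np":
        return {"type": "mod", "content": ("at", fragment["content"])}
    return fragment

def apply_mod(prop, mod):
    # [type(V1, prop), type(V2, mod)] -> [apply(V2, V1, V3)] & V3
    # The modifier is applied to a proposition, yielding a new proposition.
    assert prop["type"] == "prop" and mod["type"] == "mod"
    return {"type": "prop",
            "content": ("apply", mod["content"], prop["content"])}

np = {"type": "temporal_np", "content": "the late afternoon"}
meet = {"type": "prop", "content": "let us meet"}
completed = apply_mod(meet, typeraise_to_mod(np))
```

The chain repairs the analysis even though no path in the word hypothesis graph contains the missing preposition: the temporal NP is raised to a modifier and attached to the proposition.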
65
Competing Strategies for Robust Speech Translation
The concurrent processing modules of Verbmobil combine deep semantic translation with shallow, surface-oriented translation methods.
Expensive but precise translation: principled, compositional syntactic and semantic analysis; semantic-based transfer of Verbmobil Interface Terms (VITs) as sets of underspecified DRS.
Cheap but approximate translation: case-based translation, dialog-act based translation, statistical translation.
All threads deliver results with confidence values; subject to a time-out, a selection module picks the best result, raising the rate of acceptable translations.
66
Architecture of the Semantic Transfer Module
The transfer module maps a VIT of the source language (L1) to a VIT of the target language (L2). Disambiguation rules and monolingual refinement rules turn the underspecified VIT (L1) into a refined VIT (L1); lexical transfer (bilingual dictionary) and phrasal transfer (phrasal dictionary) then yield a refined VIT (L2), from which the target VIT (L2) is derived.
67
Lexical Disambiguation On-Demand
Preserving lexical ambiguities: "How did you find his office?" ('find' can mean 'get to' or 'like').
Wie fanden Sie sein Büro? German 'finden' preserves the ambiguity, so disambiguation is not necessary for translation between German and English.
For Japanese the two readings diverge, so disambiguation is necessary:
- dou kare no jimusho o mitsukeraremashita ka (how he-POSS office OBJ get-to can-PAST QUESTION)
- kare no jimusho wa dou omoimasu ka (he-POSS office TOPIC how think QUESTION)
68
Subsumption Relations in the Domain Model
Three English translations of the German word "Termin" found in the Verbmobil corpus:
1. Verschieben wir den Termin. → Let's reschedule the appointment.
2. Schlagen Sie einen Termin vor. → Suggest a date.
3. Da habe ich einen Termin frei. → I have got a free slot there.
In the domain model, temporal_specification subsumes scheduled_event (default lexicalization: appointment), set_start_time (date), and time_interval (slot).
69
Entries in the Transfer Lexicon: German → English (Simplified)
tau_lex(termin, appointment, pred_sort(subsume(scheduled_event))).
tau_lex(termin, date, pred_sort(subsume(set_start_time))).
tau_lex(termin, slot, pred_sort(subsume(time_interval))).
tau_lex(verschieben, reschedule, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(scheduled_event)])).
tau_lex(ausmachen, fix, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(set_start_time)])).
tau_lex(freihaben, have_free, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(time_interval)])).
70
Using Context and World Knowledge for Semantic Transfer
Example: Platz → room / table / seat
1. "Nehmen wir dieses Hotel." → Let us take this hotel. "Ich reserviere einen Platz." → I will reserve a room.
2. "Machen wir das Abendessen dort." → Let us have dinner there. "Ich reserviere einen Platz." → I will reserve a table.
3. "Gehen wir ins Theater." → Let us go to the theater. "Ich möchte Plätze reservieren." → I would like to reserve seats.
All other dialog translation systems translate word-by-word or sentence-by-sentence.
71
Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads
Segment 1: "If you prefer another hotel," Segment 2: "please let me know."
Statistical translation, case-based translation, dialog-act based translation, and semantic transfer each deliver alternative translations with confidence values. The selection module chooses per segment: here, segment 1 is translated by semantic transfer and segment 2 by case-based translation.
74
A Machine Learning Approach to the Selection of the Best Translation Result
Input:
SEQ := set of all translation sequences for a turn
Seq ∈ SEQ := sequence of translation segments s1, s2, ..., sn
Each translation thread provides for every segment an online confidence value confidence(thread.segment).
Task: compute a normalized confidence value for each translated sequence:
CONF(Seq) = Σ_{segment ∈ Seq} Length(segment) · (alpha(thread) + beta(thread) · confidence(thread.segment))
Output:
Best(SEQ) = {Seq ∈ SEQ | Seq is a maximal element of SEQ under CONF}
77
Learning the Normalizing Factors Alpha and Beta from an Annotated Corpus
Turn := segment1, segment2, ..., segmentn
For each turn in a training corpus, all segments translated by one of the four translation threads are manually annotated with a score for translation quality. For the sequence of n segments with the best overall translation score, at most 4^n linear inequalities are generated, requiring that the selected sequence scores better than every alternative translation sequence. From this set of inequalities for spanning analyses (≤ 4^n), the values of alpha and beta are determined offline by solving the constraint system.
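Verbmobil solves the system of linear inequalities offline. As a stand-in illustration, the sketch below enforces the same kind of constraint, "the human-preferred sequence must score higher than each alternative", with a simple perceptron-style update; this is a simplification of my own, not the Verbmobil constraint solver.

```python
# Learn per-thread alpha/beta weights so that the annotated best sequence
# outscores its alternatives under the CONF formula.

def score(seq, alpha, beta):
    return sum(l * (alpha[t] + beta[t] * c) for l, t, c in seq)

def fit(preferred, alternatives, threads, epochs=200, lr=0.01):
    alpha = {t: 0.0 for t in threads}
    beta = {t: 0.0 for t in threads}
    for _ in range(epochs):
        for alt in alternatives:
            # Violated inequality: CONF(preferred) <= CONF(alternative).
            if score(preferred, alpha, beta) <= score(alt, alpha, beta):
                for l, t, c in preferred:   # push the preferred sequence up
                    alpha[t] += lr * l
                    beta[t] += lr * l * c
                for l, t, c in alt:         # and the alternative down
                    alpha[t] -= lr * l
                    beta[t] -= lr * l * c
    return alpha, beta

preferred = [(5, "deep", 0.9), (3, "case", 0.7)]     # annotated best
alternatives = [[(5, "stat", 0.8), (3, "stat", 0.9)]]
alpha, beta = fit(preferred, alternatives, ["deep", "case", "stat"])
```

If the inequalities are satisfiable, such updates terminate with weights under which the annotated sequence wins, which is exactly the condition the offline constraint system expresses.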
80
Integrating a Deep HPSG-based Analysis with Probabilistic Dialog Act Recognition for Semantic Transfer
The HPSG analysis delivers a VIT via robust dialog semantics; a probabilistic analysis of dialog acts (HMM) delivers the dialog act type; recognition of dialog plans (plan operators) tracks the dialog phase. Semantic transfer combines the VIT, the dialog act type, and the dialog phase.
82
The Dialog Act Hierarchy used for Planning, Prediction, Translation and Generation
Dialog Act (top level: CONTROL_DIALOG, MANAGE_TASK, PROMOTE_TASK, DEVIATE_SCENARIO)
- CONTROL_DIALOG: GREETING (GREETING_BEGIN, GREETING_END), INTRODUCE, POLITENESS_FORMULA, THANK, DELIBERATE, BACKCHANNEL
- MANAGE_TASK: INIT, DEFER, CLOSE
- PROMOTE_TASK: REQUEST (REQUEST_SUGGEST, REQUEST_CLARIFY, REQUEST_COMMENT, REQUEST_COMMIT), SUGGEST, INFORM, FEEDBACK (FEEDBACK_POSITIVE: ACCEPT, CONFIRM; FEEDBACK_NEGATIVE: REJECT, EXPLAINED_REJECT; CLARIFY_ANSWER), COMMIT, GIVE_REASON, CLARIFY
- DEVIATE_SCENARIO: REFER_TO_SETTING, DIGRESS, EXCLUDE
83
Learning of Probabilistic Plan Operators from Annotated Corpora
(OPERATOR-s
  goal: [IN-TURN confirm-s ?S-3314 ?S-3316]
  subgoals: (sequence [IN-TURN confirm-s ?S-3314 ?S-3315]
                      [IN-TURN confirm-s ?S ?S-3316])
  PROB 0.72)
(OPERATOR-s
  goal: [IN-TURN confirm-s ?S-3321 ?S-3322]
  subgoals: (sequence [DOMAIN-DEPENDENT accept ?S-3321 ?S-3322])
  PROB 0.95)
84
Generation of Summaries
Automatic Generation of Multilingual Summaries of Telephone Conversations: dialog translation by Verbmobil feeds the multilingual generation of summaries. An HTML document in German and an HTML document in English are generated and transferred by Internet or fax to the German and the American dialog partner, respectively.
85
Dialog Summary
Participants: Mr. Jones, Mr. Mueller
Date:
Time: 8:57 AM to 10:03 AM
Theme: Appointment schedule with trip and accommodation
Scheduling: Mr. Jones and Mr. Mueller will meet at the train station on the 1st of March 2001 at 10:00 am.
Travelling: The trip from Hamburg to Hanover by train will start on the 1st of March at 10:15 am.
Summary automatically generated at :31:24 h
86
Microplanning: Create Syntactic Building Blocks
Method: mapping of semantic dependency structures (VIT) onto syntactic dependency structures (TAG). Example, time expressions: the VIT literals DEF(L,I,G,H), DOWF(L1,I,mo), ORD(L2,I,11), and MOFY(L3,I,may) are mapped onto the syntactic dependency tree headed by MONDAY1, with arguments ELEVENTH_DAY, THE (SPEC), OF_P, and MAY. (Norbert Reithinger)
87
Speeding up the language generation process by compiling the HPSG analysis grammar into a lexicalized tree adjoining grammar (LTAG) generation grammar of 2,350 trees: extended domain of locality, no recursive feature structures, fast generation (0.5 secs average runtime).
88
Corpus-based Speech Synthesis
Sentence to synthesize: "I have time on Monday."
[Diagram: the sentence is covered by corpus tokens (i, have, time, on, monday, i have, on monday); directed edges from sentence start (S) to end (E) indicate which stretches can be taken as whole units from the speech corpus.]
89
Verbmobil: Long-Term, Large-Scale Funding and Its Impact
- Funding by the German Ministry for Education and Research, BMBF (Dr. Reuse): Phase I ( ) $33 M; Phase II ( ) $28 M
- 60% industrial funding according to a shared-cost model: $17 M
- Additional R&D investments of industrial partners: $11 M
Total: $89 M
90
Verbmobil: Long-Term, Large-Scale Funding and Its Impact
- > 800 publications (> 600 refereed)
- many patents
- > 20 commercial spin-off products
- > 8 spin-off companies
- > 900 trained researchers for the German language industry
Philips, DaimlerChrysler, and Siemens are leaders in spoken dialog applications.
91
Distribution of Sentence Length in Large-Scale Evaluation
Web-based evaluation of 25,345 translations by 65 evaluators.
[Histogram: number of turns (up to 350) by sentence length, 1 to 60 words]
92
Evaluation Results
The translation of a turn is approximately correct if it preserves the intention of the speaker and the main propositional content of her utterance.
At word accuracy 75% (3267 turns): case-based translation 44%, statistical translation 79%, dialog-act based translation 45%, semantic transfer 47%, substring-based translation 75%, automatic selection 66% / 83%*, manual selection 95%.
At word accuracy 80% (2723 turns) the reported figures are: 46%, 81%, 49%, 79%, 68% / 85%*, 97%.
* after training with an instance-based learning algorithm
93
Results of End-to-End Evaluation Based on Dialog Task Completion for 31 Trials
Topic                          Successful completions/attempts   Success rate (%)   Frequency-based weighting factor
Meeting time                   25/28                             89.3               0.90
Meeting place                  21/27                             77.8               0.87
Means of transportation        30/30                             100                0.97
Departure place                22/25                             88                 0.81
Arrival time                   22/26                             84.6               0.84
Who reserves the hotel         28/31                             90.3               1.00
How to get to departure place  7/9                               86.8               0.29
Total                          227/255
Average percentage of successful task completions (frequency-weighted): 89.6
94
Checklist for Final Verbmobil System
Vocabulary Size: for German, equivalent English lexicon, 2,500 for Japanese
Operational Success Criteria:
Word recognition rate (16 kHz):
l German: spontaneous 75% (cooperative: 85%)
l English: spontaneous 72% (cooperative: 82%)
l Japanese: spontaneous 75% (cooperative: 85%)
Word recognition rate (8 kHz): spontaneous 70% (cooperative: 80%)
80% of the translations are approximately correct, and the dialog task success rate should be around 90%.
The average end-to-end processing time should be four times real time (the length of the input signal).
95
Results of the Verbmobil Project have been used in 20 Spin-Off Products by the Industrial Partners DaimlerChrysler, Philips and Siemens
Spoken Dialog Systems: 4
Translation Systems: 3
Command & Control Systems: 5
Text Classification Systems: 3
Dictation Systems: 3
Dialog Engines: 2
96
Linguatronic : Spoken Dialogs with a Mercedes-Benz
Example commands: "Please call Doris Wahlster." "Open the left window in the back." "I want to hear the weather channel." "When will I reach the next gas station?" "Where is the next parking lot?"
l Speech control of: cell phone, radio, windows / AC, navigation system
l Option for S-, C-, and E-Class of Mercedes and BMW
l Speaker-independent; garbage models for non-speech sounds (blinker, AC, wheels)
97
Spoken Dialogs about Schedules
Fielded applications:
l Train schedules (German Railway System, DB): TABA (Philips), OSCAR (DaimlerChrysler)
l Flight schedules (Lufthansa): ALF (Philips)
Technical challenges: phone-based dialogs, many proper names, clarification subdialogs
98
Successful Technology Transfer: 8 High-Tech Spin-Off Companies in the Area of Language Technology have been founded by Verbmobil Researchers
l RETIVOX GbR: Speech Synthesis Systems, Bonn
l XtraMind Technologies: Language Technology for Customer Interaction Services, Saarbrücken
l SONICSON GmbH: Natural Language Access to Online Music, Kaiserslautern
l GSDC GmbH: Multilingual Documentation, Nürnberg
l CLT Sprachtechnologie GmbH: LT for Text Processing, Saarbrücken
l SCHEMA GmbH: Document Engineering, Nürnberg
l AIXPLAIN AG: Human Language Technology, Aachen
l SYMPALOG GmbH: Spoken Dialog Systems, Nürnberg
99
Verbmobil was the Key Resource for the Education and Training of Researchers and Engineers Needed to Build Up Language Industry in Germany
Master Students: 238
Student Research Assistants: 483
PhD Students: 164
Habilitations: 16
Internships: 18
Total: 919
100
From Spoken Dialog to Multimodal Dialog
Today's Cell Phone → Third Generation UMTS Phone
Verbmobil (speech only) → SmartKom (speech, graphics and gesture)
101
Merging Various User Interface Paradigms
Natural Language Dialog + Graphical User Interfaces + Gestural Interaction → Multimodal Interaction
(see Phil Cohen's invited talk on Friday)
102
SmartKom: Intuitive Multimodal Interaction
Project Budget: $ 34 M
Project Duration: 4 years
Main Contractor, Project Management, Testbed Software Integration: DFKI Saarbrücken
The SmartKom Consortium: MediaInterface, European Media Lab, IMS (Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart), Ludwig-Maximilians-Universität München
103
SmartKom-Mobile: A Handheld Communication Assistant
l GSM for Telephone, Fax, Internet Connectivity
l GPS
l Camera
l Wearable Compute Server
l Stylus-Activated Sketch Pad
l Microphone and Loudspeaker
l Biosensor for Authentication & Emotional Feedback
l Docking Station for Car PC
104
SmartKom: Multimodal Dialogs with a Life-like Character
105
Verbmobil is a Very Large Dialog System
l 69 modules communicate via 224 blackboards
l The HPSG grammar for German uses a hierarchy of 2,400 types
l 15,385 entries in the semantic database
l 22,783 transfer rules and 13,640 microplanning rules
l 30,000 templates for case-based translation
l 691,583 alignment templates
l 334 finite-state transducers
106
Lessons Learned from Verbmobil
l Deep processing can be used for merging, completing and repairing the results of shallow processing strategies.
l Shallow methods can be used to guide the search in deep processing.
l Statistical methods must be augmented by symbolic models to achieve higher accuracy and broader coverage.
l Statistical methods can be used to learn operators or selection strategies for symbolic processes.
107
Conclusions and Take-Home Messages
Real-world problems in language technology like
l the understanding of spoken dialogs,
l speech-to-speech translation,
l and multimodal dialog systems
can only be cracked by the combined muscle of deep and shallow processing approaches.
108
Conclusions and Take-Home Messages
In a multi-blackboard and multi-engine architecture based on packed representations on all processing levels
l speech recognition
l parsing
l semantic processing
l translation
l generation
using charts with underspecified representations, the results of concurrent processing threads can be combined in an incremental fashion.
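The multi-blackboard idea can be illustrated with a minimal publish/subscribe sketch (all class names, engine names and scores here are hypothetical illustrations, not Verbmobil's actual architecture or API): producer modules post hypotheses to a named blackboard, and downstream modules are notified incrementally as results arrive.

```python
class Blackboard:
    """Minimal blackboard: modules post hypotheses, subscribers
    are notified incrementally as each entry arrives."""
    def __init__(self, name):
        self.name = name
        self.entries = []
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def post(self, producer, hypothesis, confidence):
        entry = (producer, hypothesis, confidence)
        self.entries.append(entry)
        for callback in self.subscribers:
            callback(entry)

# Two concurrent translation engines write to the same blackboard;
# a selection module listens and keeps the best-scored hypothesis so far.
translations = Blackboard("translation.hypotheses")
best = {}

def select(entry):
    producer, hyp, conf = entry
    if conf > best.get("conf", 0.0):
        best.update(producer=producer, hyp=hyp, conf=conf)

translations.subscribe(select)
translations.post("statistical", "I have time on Monday.", 0.81)
translations.post("case-based", "I have Monday time.", 0.46)
print(best["hyp"])  # prints "I have time on Monday."
```

Because selection happens inside the subscriber callback, the best result is available after every posting, which is one way to realize the incremental combination the slide describes.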
109
Conclusions and Take-Home Messages
All results of concurrent and competing processing modules should come with a confidence value, so that statistically trained selection modules can choose the most promising result at each stage, if demanded by a following processing step.
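This selection step can be sketched in a few lines (the engine names, weights and confidences below are invented for illustration; Verbmobil's actual selector used instance-based learning): since raw confidence scores from different engines are not directly comparable, a trained selector can rescale them with per-engine reliability weights fitted on evaluation data before picking a winner.

```python
# Hypothetical per-engine reliability weights, e.g. fit on held-out dialogs.
learned_weights = {
    "case-based": 0.55,
    "statistical": 0.95,
    "semantic-transfer": 0.80,
}

def select_translation(candidates):
    """candidates: list of (engine, translation, raw_confidence).
    Returns the candidate with the highest calibrated score."""
    def calibrated(candidate):
        engine, _, raw = candidate
        return raw * learned_weights.get(engine, 0.5)
    return max(candidates, key=calibrated)

candidates = [
    ("case-based", "I have Monday time.", 0.90),        # overconfident engine
    ("statistical", "I have time on Monday.", 0.70),
    ("semantic-transfer", "On Monday I have time.", 0.60),
]
engine, text, _ = select_translation(candidates)
print(engine, "->", text)  # statistical -> I have time on Monday.
```

Note how the case-based engine's high raw score (0.90) loses to the statistical engine's calibrated score (0.70 × 0.95), which is the point of statistically training the selection rather than trusting raw confidences.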
110
Conclusions and Take-Home Messages
Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that the uncertainties can be reduced by linguistic, discourse and domain constraints as soon as they become applicable.
111
Conclusions and Take-Home Messages
Underspecification allows disambiguation requirements to be delayed until later processing stages where better-informed decisions can be made. The massive use of underspecification makes the syntax-semantic interface and transfer rules almost deterministic, thereby boosting processing speed.
112
Open Problems:
l Integrating top-down knowledge into low-level speech recognition processes
l Exploiting more knowledge about human interpretation strategies
l More robust translation of turns with very low word accuracy rates
l Expensive data collection and cognitively unrealistic training data
113
Further Reading
l You can find a 10-page paper in the IJCAI-01 Proceedings, Vol. 2 (see pages ).
l An extended version will appear in the Winter issue of the AI Magazine.
l Or check the URL: verbmobil.dfki.de
114
The Verbmobil Book
Wahlster, W. (2000) (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Berlin, New York, Tokyo: Springer. 679 pp., 224 figs., 88 tabs. Hardcover, ISBN