Carnegie Mellon / IRST-itc
Balancing Expressiveness and Simplicity in an Interlingua for Task-Based Dialogue
Lori Levin, Donna Gates, Dorcas Wallace, Kay Peterson, Alon Lavie, Fabio Pianesi, Emanuele Pianta, Roldano Cattoni, Nadia Mana
Outline
– Overview of the Interchange Format (IF)
– Proposals for Evaluating Interlinguas
  – Measuring coverage
  – Measuring reliability
  – Measuring scalability
Multilingual Translation with an Interlingua
[Slide diagram: analyzers for English, French, German, Italian, Japanese, Korean, Arabic, Chinese, Spanish, and Catalan map input into the interlingua; generators map it back out to each language.]
Input sentence (Chinese): san1 tian1 qian2, wo3 kai1 shi3 jue2 de2 tong4
Interlingua: give-information+onset+body-state (body-state-spec=pain, time=(interval=3d, relative=before))
Output sentence (English): The pain started three days ago.
Paraphrase (Chinese): wo3 yi3 jin1 tong4 le4 san1 tian1
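The interlingua expression above has a regular shape: a domain-action name followed by a parenthesized, possibly nested argument list. As a rough illustration of that shape (this is not the actual C-STAR/NESPOLE tooling; the function names are hypothetical), such an expression might be decomposed like this:

```python
def parse_if(expr):
    """Split 'act (arg=value, ...)' into (act, nested dict of args)."""
    expr = expr.strip()
    if "(" not in expr:
        return expr, {}
    act, _, rest = expr.partition("(")
    rest = rest.rstrip()
    if rest.endswith(")"):          # drop the closing paren of the outer list
        rest = rest[:-1]
    return act.strip(), parse_args(rest)

def parse_args(text):
    """Parse 'k1=v1, k2=(k3=v3), bare' into a dict; bare features map to True."""
    items, depth, start = [], 0, 0
    for i, ch in enumerate(text):   # split on commas at nesting depth 0 only
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append(text[start:i])
            start = i + 1
    items.append(text[start:])
    args = {}
    for item in (s.strip() for s in items):
        if not item:
            continue
        key, eq, value = item.partition("=")
        if not eq:
            args[item] = True       # bare feature such as 'desire'
        elif value.startswith("(") and value.endswith(")"):
            args[key.strip()] = parse_args(value[1:-1])   # nested argument list
        else:
            args[key.strip()] = value.strip()
    return args
```

Applied to the slide's example, this yields the domain-action name plus a nested dictionary mirroring the argument structure.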
Advantages of Interlingua
– Add a new language easily: adding one analysis grammar and one generation grammar gives all-ways translation to and from all previous languages.
– Monolingual development teams.
– Paraphrase: generate a new source-language sentence from the interlingua so that the user can confirm the meaning.
Disadvantages of Interlingua
– "Meaning" is arbitrarily deep: at what level of detail do you stop? If the interlingua is too simple, meaning is lost in translation; if it is too complex, analysis and generation become too difficult.
– It should be applicable to all languages.
– Human development time.
Speech Acts: Speaker Intention vs. Literal Meaning
"Can you pass the salt?"
– Literal meaning: the speaker asks for information about the hearer's ability.
– Speaker intention: the speaker requests the hearer to perform an action.
Domain Actions: Extended, Domain-Specific Speech Acts
– give-information+existence+body-state: "It hurts."
– give-information+onset+body-object: "The rash started three days ago."
– request-information+personal-data: "What is your name?"
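As the examples show, a domain-action name is a general speech act extended with '+'-joined domain concepts; splitting on '+' recovers both parts. A minimal illustration (the helper name is hypothetical):

```python
def split_domain_action(name):
    """Separate a domain-action name into its speech act and concept list."""
    speech_act, *concepts = name.split("+")
    return speech_act, concepts

# The speech act is the first component; the rest are domain concepts.
print(split_domain_action("give-information+existence+body-state"))
print(split_domain_action("request-information+personal-data"))
```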
Domain Actions: Extended, Domain-Specific Speech Acts (continued)
In domain:
– I sprained my ankle yesterday.
– When did the headache start?
Out of domain:
– Yesterday I slipped in the driveway on my way to the garage.
– The headache started after my boss noticed that I deleted the file.
Formulaic Utterances
– "Good night." = tisbaH cala xEr (literal gloss: "waking up on good")
Romanization of Arabic from CallHome Egypt.
Same Intention, Different Syntax
– rigly bitiwgacny: "my leg hurts"
– candy wagac fE rigly: "I have pain in my leg"
– rigly bitiClimny: "my leg hurts"
– fE wagac fE rigly: "there is pain in my leg"
– rigly bitinqaH calya: "my leg bothers on me"
Romanization of Arabic from CallHome Egypt.
Comparison of Two Interlinguas
"I would like to make a reservation for the fourth through the seventh of July."
IF-1 (C-STAR II, 1997-1999):
c:request-action+reservation+temporal+hotel (time=(start-time=md4, end-time=(md7, july)))
IF-2 (NESPOLE, 2000-2002):
c:give-information+disposition+reservation+accommodation (disposition=(who=I, desire), reservation-spec=(reservation, identifiability=no), accommodation-spec=hotel, object-time=(start-time=(md=4), end-time=(md=7, month=7, incl-excl=inclusive)))
The Interchange Format Database
61.2.3 olang I lang I Prv IRST "telefono per prenotare delle stanze per quattro colleghi"
61.2.3 olang I lang E Prv IRST "I'm calling to book some rooms for four colleagues"
61.2.3 IF Prv IRST c:request-action+reservation+features+room (for-whom=(associate, quantity=4))
61.2.3 comments: dial-oo5-spkB-roca0-02-3
Comparison of Four Databases (travel domain, role playing, spontaneous speech)
– DB-1: C-STAR II English database tagged with IF-1 (2278 sentences)
– DB-2: C-STAR II English database tagged with IF-2 (2564 sentences); same data as DB-1, different interlingua
– DB-3: NESPOLE English database tagged with IF-2 (1446 sentences); only about 50% of the vocabulary overlaps with the C-STAR database
– DB-4: Combined database tagged with IF-2 (4010 sentences); significantly larger domain
Measuring Coverage
– No-tag rate: can a human expert assign an interlingua representation to each sentence?
  – C-STAR II no-tag rate: 7.3%
  – NESPOLE no-tag rate: 2.4%
  – About 300 more sentences of the C-STAR English database were covered.
– End-to-end translation performance: measures recognizer, analyzer, and generator performance in combination with interlingua coverage.
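The no-tag rate itself is a simple ratio. A sketch, with invented counts since the slides report only the resulting percentages:

```python
def no_tag_rate(n_untagged, n_total):
    """Percent of database sentences to which no interlingua tag could be assigned."""
    return 100.0 * n_untagged / n_total

# Hypothetical counts chosen to reproduce the slide's reported rates.
print(no_tag_rate(73, 1000))   # C-STAR II-like rate
print(no_tag_rate(24, 1000))   # NESPOLE-like rate
```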
Example of a Reliability Failure
Input: "3:00, right?"
Interlingua: verify (time=3:00)
Poor choice of speech-act name: does verify mean that the speaker is confirming the time or requesting verification from the hearer?
Output: "3:00 is right."
Measuring Reliability: Cross-Site Evaluations
Compare the performance of analyzer → interlingua → generator pipelines:
– where the analyzer and generator are built at the same site (or by the same person)
– where the analyzer and generator are built at different sites (or by different people who may not know each other)
C-STAR II interlingua: comparable end-to-end performance within sites and across sites (around 60% acceptable translations from speech-recognizer output).
NESPOLE interlingua: cross-site end-to-end performance is lower.
Intercoder Agreement: Average Pairwise Percent Agreement

                                                    Speech act   Domain action   Arguments
IF-1: Site 1 and Site 2 (experts)                      82%           66%            86%
IF-2: Site 1 and Site 2 (4 experts)                    92%           75%            87%
IF-2: Within Site 1 (3 experts)                        94%           88%            90%
IF-2: Site 1 vs. Site 2 (3 experts and 1 expert)       89%           62%            83%
IF-2: Site 1 and Site 2 (experts and novices)          88%           63%            86%
IF-2: Within Site 2 (expert and novices)               89%           64%            87%
IF-2: Within Site 2 (novices)                          91%           61%            83%
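The table's "average pairwise percent agreement" can be computed by comparing every pair of coders item by item and averaging over pairs. A minimal sketch with toy labels (the real figures came from the IF-tagged databases):

```python
from itertools import combinations

def avg_pairwise_agreement(codings):
    """codings: one equal-length list of labels per coder.

    Returns the mean, over all coder pairs, of the percent of items
    the two coders labeled identically.
    """
    per_pair = [
        100.0 * sum(a == b for a, b in zip(x, y)) / len(x)
        for x, y in combinations(codings, 2)
    ]
    return sum(per_pair) / len(per_pair)

# Toy example: three coders labeling three utterances with speech acts.
coders = [
    ["give-information", "verify", "thank"],
    ["give-information", "request-information", "thank"],
    ["give-information", "verify", "greeting"],
]
print(avg_pairwise_agreement(coders))
```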
Workshop on Interlingua Reliability (SIG-IL)
Association for Machine Translation in the Americas
October 8, 2002, Tiburon, California
Submissions by July 21:
– 500-1500 word abstract (email to lsl@cs.cmu.edu)
– Intent to participate in the coding experiment
Measuring Scalability: Coverage Rate
What percent of the database is covered by the top n most frequent domain actions?
Coverage of the 50 most frequent domain actions:

Database    Client    Agent
C-STAR      66.7%     67.3%
NESPOLE     66.5%     71.4%
Combined    62.9%     64.0%
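The coverage-rate computation is a frequency count over the tagged sentences. A sketch with toy tags (the slides apply it with n=50 to the real databases):

```python
from collections import Counter

def top_n_coverage(tags, n):
    """Percent of tagged sentences whose domain action is among the n most frequent."""
    top = Counter(tags).most_common(n)
    return 100.0 * sum(count for _, count in top) / len(tags)

# Toy corpus of domain-action tags, one per sentence.
tags = ["give-information+price"] * 3 + ["request-action+reservation"] * 2 \
     + ["thank", "greeting"]
print(top_n_coverage(tags, 2))
```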
Measuring Scalability: Number of Domain Actions as a Function of Database Size
– Sample sizes range from 100 to 3000 sentences in increments of 25.
– For each sample size, the number of unique domain actions is averaged over ten random samples.
– Each sample includes a random selection of frequent and infrequent domain actions.
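The sampling procedure above might look like this in outline (the corpus here is synthetic; the study drew samples from the IF-tagged travel databases):

```python
import random

def avg_unique_actions(corpus_tags, sample_size, n_samples=10, seed=0):
    """Average number of distinct domain actions seen across random samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        total += len(set(rng.sample(corpus_tags, sample_size)))
    return total / n_samples

# Growth curve over increasing sample sizes, as plotted in the study:
# a synthetic corpus of 3000 tags drawn from 40 invented domain actions.
corpus = ["act-%d" % i for i in range(40)] * 75
curve = [avg_unique_actions(corpus, n) for n in range(100, 3001, 25)]
```

If the number of distinct domain actions flattens out as sample size grows, the inventory scales without exploding; the curve above makes that visible.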
Conclusions
An interlingua based on domain actions is suitable for task-oriented dialogue:
– reliable
– good coverage
– scalable without an explosion of domain actions
It is possible to evaluate an interlingua for:
– reliability
– expressivity
– scalability