AQA: a multilingual Anaphora annotation scheme for Question Answering E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas.

1 AQA: a multilingual Anaphora annotation scheme for Question Answering E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra [eboldrini/patricio/borja/marcel/]@dlsi.ua.es chelo.vargas@ua.es CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages

2 Outline: Introduction, Corpus, Principles, Previous work, Problematic cases, Evaluation, Conclusion.

3 Introduction: interaction. AQA is a multilingual annotation scheme for anaphora resolution that can be applied in machine learning to improve QA systems. Goals: to understand and annotate the way anaphora is used in each language; to detect the antecedent of each anaphora and find the correct answer; to support interaction between the user and the system.

5 Introduction: languages. Languages: Italian, Spanish, English. Advantages: the system can participate successfully in competitions in which the question is formulated in one language and the answer is shown in another. Disadvantages: the languages have different characteristics. Examples: ¿Qué medio de transporte se utilizó en la Expedición Kon-tiki? ¿Cuántas personas la tripulaban? / Quale mezzo di trasporto venne usato nella spedizione Kon-Tiki? Quanti membri d'equipaggio aveva 0? / What transport was used in the Kon-Tiki Expedition? How many people crewed it?

6 Corpus. Corpus for CLEF 2008 in English, Italian and Spanish. 200 questions per language. Topic-related questions. Categories of questions: factoid, definition, and list.

8 Principles: annotated elements. Each group has a topic. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

9 Principles: annotated elements. If there is a subtopic, we mark it. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

11 Principles: annotated elements. Each question (question/answer pair) has a number. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

12 Principles: annotated elements. Each anaphora has a number, the same as that of its antecedent. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

14 Principles: annotated elements. We indicate whether the antecedent is in the question or in the answer. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

15 Principles: annotated elements. We indicate whether the antecedent is in the question or in the answer. Example: Which city is the headquarters of the China's Eastern Fleet? How far from China's capital city is it? What was its population in 2002?

17 Principles: annotated elements. We indicate the number of the question or the answer where the antecedent is located. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

19 Principles: annotated elements. We select the type of anaphora. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

20 Principles: annotated elements. We select the type of anaphora. Example: In which country is the Colditz Castle? Exactly in which state is it? Who was the first who escaped from there?

21 Principles: annotated elements. We select the type of anaphora. Example: Who published the Evangelium Vitae encyclical? How many 0 did he publish?

23 Principles: annotated elements. We select the type of relation. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

24 Principles: annotated elements. We select the type of relation. Example: Which islands are in the Pelagie Islands? Which is the biggest one?

26 Principles: annotated elements. We record whether or not the annotator has doubts. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?
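
The annotated elements listed on the Principles slides can be pictured as one small record per anaphor. The following Python sketch is only an illustration: the class names, field names, allowed string values and the example values are assumptions made here, not the AQA scheme's actual markup, which this transcript does not reproduce.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnaphoraLink:
    """One anaphor and the pointer to its antecedent (hypothetical field names)."""
    link_id: int            # number shared by the anaphor and its antecedent
    anaphor: str            # surface text of the anaphor
    antecedent_in: str      # "q" if the antecedent is in a question, "a" if in an answer
    ref_number: int         # number of the question/answer pair holding the antecedent
    anaphora_type: str      # e.g. "elips", "pron", "adv", "sup", "dd"
    relation: str           # "dir" (direct) or "indir" (bridging)
    doubtful: bool = False  # True when the annotator has doubts

    def __post_init__(self) -> None:
        if self.antecedent_in not in ("q", "a"):
            raise ValueError("antecedent_in must be 'q' or 'a'")
        if self.relation not in ("dir", "indir"):
            raise ValueError("relation must be 'dir' or 'indir'")


@dataclass
class Question:
    number: int  # each question/answer pair has a number
    text: str
    links: List[AnaphoraLink] = field(default_factory=list)


@dataclass
class TopicGroup:
    topic: str
    subtopic: Optional[str] = None
    questions: List[Question] = field(default_factory=list)


# Illustrative values only: the real annotation of this group is not shown
# in the transcript, so topic, subtopic, type and relation are assumed.
group = TopicGroup(topic="the battle of Brunete", subtopic="Gerda Taro")
group.questions.append(Question(1, "Between what days was the battle of Brunete?"))
group.questions.append(
    Question(
        2,
        "Where was the article of Gerda Taro about this battle published?",
        links=[AnaphoraLink(1, "this battle", "q", 1, "dd", "dir")],
    )
)
```

Keeping the antecedent location ("q" or "a") and the reference number as separate fields mirrors the two separate principles on the slides, so a QA system could look up the antecedent without re-parsing the question text.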

27 Previous work. UCREL (Fligelstone, 1992; Garside et al., 1997): first scheme for anaphora resolution. MUC: inclusion of the coreference task in MUC-6 and MUC-7. Last decade of the 20th century: anaphora resolution project for French (Popescu-Belis and Robba, 1997). Martínez-Barco and Palomar (2001): an annotation scheme for dialogues applied to an anaphora resolution algorithm. MATE/GNOME (Poesio, 2004): meta-model.

28 Previous work: what we added. MATE/GNOME (Poesio, 2004): meta-model. Element link in the text with the information about the anaphora. Identification of the question/answer pair. Topic/subtopic. Antecedent in the question or in the answer. Status of the annotation. Applied to three languages. Applied to collections of questions.

29 Problematic cases: world knowledge; an antecedent contains another one; collective nouns; two antecedents, but separated; doubtful position of the antecedent; an anaphora inside a discourse entity.

30 Problematic cases: world knowledge. Example: Which was the "gordo" in the 1995 Christmas? Which was the prize?

31 Problematic cases: an antecedent contains another one. Example: Who were the founders of Magnum Photos? In what year did they found it?

32 Problematic cases: collective nouns. Example: What is the starring cast of the film Beetlejuice? Who of them is the main character?

33 Problematic cases: two antecedents, but separated. Example: Between what days was the battle of Brunete? Where was the article of Gerda Taro about this battle published? Which hospital were she moved to after her accident?

34 Problematic cases: doubtful position of the antecedent. Example: What transport was used in the Kon-Tiki Expedition? How many people crewed it?

35 Problematic cases: an anaphora inside a discourse entity. Example: What is a censer? What name is given to the one of the Cathedral of Santiago de Compostela? How much does it weight?

36 Evaluation. Annotation: 2 annotators; blind annotation. Evaluation: each language independently; global results.

37 Evaluation: subdivision. Topic boundary. Anaphora detection. Anaphora attributes. Antecedent recognition.

38 Evaluation: topic boundary. Class N: new topic. Class S: same topic. Confusion counts are given as A1 label/A2 label.
Spanish: S/S 62, S/N 0, N/S 0, N/N 138; Kappa 1
Italian: S/S [lost in transcript], S/N 0, N/S 0, N/N [lost in transcript]; Kappa 1
English: S/S 61, S/N 0, N/S 1, N/N [lost in transcript]; Kappa 0,988

39 Evaluation: anaphora detection (Spanish / Italian / English).
Anaphors detected by A1: 70 / 69 / 67
Anaphors detected by A2: 70 / 69 / 68
Anaphora detection agreement: 70 / 69 / 67
Different anaphora boundary: 1 / 1 / 0

40 Evaluation: anaphora attributes (antecedent). Q: antecedent in the question. A: antecedent in the answer. Confusion counts are given as A1 label/A2 label.
Spanish: Q/Q 64, Q/A 0, A/Q 0, A/A 6; Kappa 1
Italian: Q/Q 62, Q/A 0, A/Q 0, A/A 7; Kappa 1
English: Q/Q 61, Q/A 0, A/Q 0, A/A 6; Kappa 1

41 Evaluation: anaphora attributes (type). Counts are given per annotator as A1 / A2.
Elips: Spanish 33 / 33, Italian 32 / 32, English 3 / 3
Pron: Spanish 13 / 15, Italian 13 / 13, English 42 / 42
Adv: Spanish 1 / 1, Italian 2 / 2, English 1 / 1
Sup: Spanish 1 / 0, Italian 0 / 0, English 0 / 0
DD: Spanish 22 / 21, Italian 22 / 22, English 21 / 21
Kappa: Spanish 0,955, Italian 1, English 1

42 Evaluation: anaphora attributes (relation). Dir: direct relation. Indir: bridging relation. Confusion counts are given as A1 label/A2 label.
Spanish: DIR/DIR 52, DIR/INDIR 0, INDIR/DIR 4, INDIR/INDIR 14; Kappa 0,838
Italian: DIR/DIR 51, DIR/INDIR 0, INDIR/DIR 1, INDIR/INDIR 17; Kappa 0,961
English: DIR/DIR 52, DIR/INDIR 0, INDIR/DIR 2, INDIR/INDIR 13; Kappa 0,909
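
The Kappa values on the evaluation slides can be recomputed from the two-annotator confusion counts. The sketch below is not the authors' code: it assumes the reported Kappa is Cohen's kappa, the function name `cohen_kappa` is chosen here for illustration, and the input is the Spanish DIR/INDIR counts from this slide (52, 0, 4, 14), read with A1 as rows and A2 as columns. Kappa does not change if rows and columns are swapped, so that reading does not affect the result.

```python
from typing import List


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Cohen's kappa for a square confusion matrix of two annotators."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of the two annotators' marginal totals per class.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        # Both annotators used a single class for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Spanish relation counts from the slide: rows DIR, INDIR; columns DIR, INDIR.
spanish_relation = [[52, 0], [4, 14]]
kappa = cohen_kappa(spanish_relation)
print(f"{kappa:.4f}")  # about 0.8387; the slide reports 0,838
```

The same function returns 1 for the Spanish topic-boundary counts (62, 0, 0, 138), matching the Kappa of 1 reported for that table.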

43 Evaluation: antecedent recognition (Spanish / Italian / English).
Total antecedents in the answer (agreement): 6 / 7 / 6
Total antecedents in the question (agreement): 64 / 62 / 61
Anaphors pointing to the same question (refq) (agreement): 64 / 62 / 61
Antecedents with different boundary (disagreement): 2 / 3 / 1

44 Evaluation: global results. Total agreement results: Spanish 60/70 = 0,857; Italian 60/69 = 0,869; English 59/67 = 0,880; average 0,868.
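
As a quick arithmetic check of the global results, the short sketch below recomputes the three ratios. The function and variable names are chosen here for illustration. The slide values are reproduced when each ratio is truncated, not rounded, to three decimals and the average is taken over the truncated values; that truncation is an assumption inferred from the numbers, not something the slides state.

```python
def truncated_thousandths(numerator: int, denominator: int) -> int:
    """Return numerator/denominator truncated to three decimals, in thousandths."""
    return (numerator * 1000) // denominator


# Agreement counts as given on the slide: (agreed, total) per language.
counts = {"Spanish": (60, 70), "Italian": (60, 69), "English": (59, 67)}

thousandths = {lang: truncated_thousandths(n, d) for lang, (n, d) in counts.items()}
average = sum(thousandths.values()) // len(thousandths)

for lang, value in thousandths.items():
    print(f"{lang}: 0.{value:03d}")
print(f"Average: 0.{average:03d}")
```

Integer arithmetic is used so that the truncation is exact and does not depend on floating-point rounding.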

45 Conclusion. Multilingual annotation scheme for anaphora resolution. For the improvement of QA systems: the system can detect the antecedent of each anaphora and extract the correct answer. For a true interaction between the system and the user. Simple but complete. Positive results of the evaluation.

46 Future work. Integration of other languages. Application of the annotation scheme to other corpora.

48 Evaluation: measure used. Kappa.

