Slide 1
3rd Answer Validation Exercise (AVE 2008)
QA subtrack at the Cross-Language Evaluation Forum 2008
UNED: Anselmo Peñas, Álvaro Rodrigo, Felisa Verdejo
Thanks to the Main Task QA organizing committee
Slide 2
Answer Validation Exercise 2008
Goal: validate the correctness of answers produced by real QA systems.
For each question, a Question Answering system returns a candidate answer together with a supporting text. The Answer Validation module takes the triple (question, candidate answer, supporting text) and decides between two outcomes:
- The answer is correct.
- The answer is not correct, or there is not enough evidence to accept it (in which case the question can be attempted again).
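A minimal sketch of this decision interface in Python; the ValidationItem class and the validate function are illustrative names, not part of AVE:

from dataclasses import dataclass

@dataclass
class ValidationItem:
    """One entry of an AVE collection: a question plus one candidate
    answer and the snippet the QA system offered as justification."""
    question: str
    answer: str
    supporting_text: str

def validate(item: ValidationItem) -> bool:
    """Return True to ACCEPT the answer as correct, False to REJECT it
    (incorrect, or not enough evidence in the supporting text)."""
    # Trivial placeholder policy: accept only if the answer literally
    # appears in the supporting text. Real AVE participants used
    # RTE-style techniques (see the techniques table near the end).
    return item.answer.lower() in item.supporting_text.lower()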
Slide 3
Collections: example
Question: What is Zanussi?
Candidate answer 1: "was an Italian producer of home appliances"
Supporting text: "Zanussi. For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought ..."
Candidate answer 2: "who had also been in Cassibile since August 31"
Supporting text: "Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August 31."
Candidate answer 3: "3 (1985)"
Supporting text: "3 Out of 5 Live (1985) What Is This?"
Candidate answers are grouped by question. For each group, systems must:
- Accept or reject all answers
- Select one of the accepted answers
Slide 4
Collections: preprocessing
- Remove duplicated answers inside the same question group
- Discard NIL answers, void answers, and answers with too long a supporting snippet
- This processing led to a reduction in the number of answers to be validated
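A sketch of that filtering step, reusing the ValidationItem class from the earlier sketch; the MAX_SNIPPET_CHARS threshold is an assumption, since the slides do not state the actual limit:

MAX_SNIPPET_CHARS = 1000  # assumed cutoff; the real AVE limit is not given here

def preprocess(groups):
    """groups: dict mapping each question to its list of ValidationItem
    objects. Applies the three reductions described on this slide."""
    cleaned = {}
    for question, items in groups.items():
        seen = set()
        kept = []
        for item in items:
            answer = item.answer.strip()
            if not answer or answer.upper() == "NIL":          # void / NIL answers
                continue
            if len(item.supporting_text) > MAX_SNIPPET_CHARS:  # overlong snippet
                continue
            if answer.lower() in seen:                         # duplicate in group
                continue
            seen.add(answer.lower())
            kept.append(item)
        cleaned[question] = kept
    return cleaned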
Slide 5
AVE collections 2008 (number of answers to validate)
Available for CLEF participants at nlp.uned.es/clef-qa/ave/

Language     Testing   % Correct   Development   % Correct
Spanish      1528      10%         551           23%
English      1055      7.5%        195           10.8%
German       1027      10.8%       264           25.4%
Portuguese   1014      20.5%       148           42.8%
Dutch        228       19.3%       78            15.8%
French       199       26.1%       171           49.7%
Romanian     119       10.5%       82            43.7%
Basque       104       7.2%        --            --
Bulgarian    27        44.4%       --            --
Italian      --        --          100           16%
Slide 6
Evaluation
- Collections are not balanced (real-world distribution)
- Approach: detect whether there is enough evidence to accept an answer
- Measures: precision, recall and F over ACCEPTED answers
- Baseline system: accept all answers
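These measures can be made concrete with a short Python sketch (function names are illustrative); note how the accept-all baseline gets recall 1 by construction, with precision equal to the proportion of correct answers in the collection:

def precision_recall_f(accepted, correct):
    """accepted: set of answer ids the system ACCEPTED.
    correct: set of answer ids that are actually correct (gold).
    Returns (precision, recall, F1) over accepted answers."""
    true_positives = len(accepted & correct)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# The "accept all answers" baseline: recall is 1 by construction, and
# precision is the fraction of correct answers in the collection
# (about 0.08 for English, as in the results table on slide 8).
def accept_all_baseline(all_answers, correct):
    return precision_recall_f(set(all_answers), correct)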
Slide 7
Participants and runs

Group                      DE  EN  ES  FR  RO  Total
Fernuniversität in Hagen   2   -   -   -   -   2
LIMSI                      -   -   -   2   -   2
U. Iasi                    -   2   -   -   2   4
DFKI                       1   1   -   -   -   2
INAOE                      -   -   2   -   -   2
U. Alicante                -   1   2   -   -   3
UNC                        -   2   -   -   -   2
U. Jaén (UJA)              -   2   2   2   -   6
LINA                       -   -   -   1   -   1
Total                      3   8   6   5   2   24
Slide 8
Evaluation: P, R, F

Group   System          F     Precision  Recall
DFKI    Ltqa            0.64  0.54       0.78
UA      Ofe             0.49  0.35       0.86
UNC     Jota_2          0.21  0.13       0.56
IASI    Uaic_2          0.19  0.11       0.85
UNC     Jota_1          0.17  0.09       0.94
IASI    Uaic_1          0.17  0.09       0.76
--      100% VALIDATED  0.14  0.08       1
UJA     Magc_2(bbr)     0.02  0.17       0.01
UJA     Magc_1(timbl)   0     0          0

Precision, recall and F measure over correct answers for English.
Slide 9
Additional measures (new)
- Compare AVE systems with QA systems' performance
- Count the answers SELECTED correctly
- Reward the detection of question groups in which all answers are incorrect: this allows a new, justified attempt to answer the question
Slide 10
Additional measures (new) [figure]
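Slide 10 itself was a figure; the definitions below are a reconstruction, not quoted from the original, chosen to be consistent with every row of the table on the next slide (n is the number of test questions):

\[
\begin{aligned}
\text{qa\_accuracy} &= \frac{\#\{\text{answers selected correctly}\}}{n}\\
\text{qa\_rej\_accuracy} &= \frac{\#\{\text{question groups correctly rejected}\}}{n}\\
\text{estimated\_qa\_performance} &= \text{qa\_accuracy}\cdot(1 + \text{qa\_rej\_accuracy})\\
\text{qa\_accuracy\_max} &= \text{qa\_accuracy} + \text{qa\_rej\_accuracy}
\end{aligned}
\]

The intuition: each correctly rejected group permits a new attempt at the question, assumed to succeed at the system's own accuracy rate; in the best case every new attempt succeeds, giving qa_accuracy_max.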
Slide 11
Evaluation: estimated performance

System             Type  estimated_qa_performance  qa_accuracy (% best combination)  qa_rej_accuracy  qa_accuracy_max
Perfect selection  --    0.56                      0.34 (100%)                       0.66             1
ltqa               AV    0.34                      0.24 (70.37%)                     0.44             0.68
ofe                AV    0.27                      0.19 (57.41%)                     0.40             0.59
uaic_2             AV    0.24                      0.24 (70.37%)                     0.01             0.25
wlvs081roen        QA    0.21                      0.21 (62.96%)                     0                0.21
uaic_1             AV    0.19                      0.19 (57.41%)                     0                0.19
jota_2             AV    0.17                      0.16 (46.30%)                     0.10             0.26
dfki081deen        QA    0.17                      0.17 (50%)                        0                0.17
jota_1             AV    0.16                      0.16 (46.30%)                     0                0.16
dcun081deen        QA    0.10                      0.10 (29.63%)                     0                0.10
Random             --    0.09                      0.09 (25.25%)                     0                0.09
nlel081enen        QA    0.06                      0.06 (18.52%)                     0                0.06
nlel082enen        QA    0.05                      0.05 (14.81%)                     0                0.05
ilkm081nlen        QA    0.04                      0.04 (12.96%)                     0                0.04
magc_2(bbr)        AV    0.01                      0.01 (1.85%)                      0.64             0.65
dcun082deen        QA    0.01                      0.01 (1.85%)                      0                0.01
magc_1(timbl)      AV    0                         0 (0%)                            0.63             0.63
Slide 12
Comparing AV systems' performance with QA systems (English) [figure]
Slide 13
Techniques reported at AVE 2007 & 2008
(counts out of 10 reports in 2007 and 9 reports in 2008)

Technique                              2007  2008
Generates hypotheses                   6     2
Wordnet                                3     5
Chunking                               3     4
n-grams, longest common subsequences   5     4
Phrase transformations                 2     2
NER                                    5     7
Num. expressions                       6     7
Temp. expressions                      4     5
Coreference resolution                 2     0
Dependency analysis                    3     3
Syntactic similarity                   4     0
Functions (sub, obj, etc.)             3     3
Syntactic transformations              1     2
Word-sense disambiguation              2     1
Semantic parsing                       4     2
Semantic role labeling                 2     1
First order logic representation       3     2
Theorem prover                         3     1
Semantic similarity                    2     0
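As an illustration of one of the shallower techniques in this list (longest common subsequences), an overlap score between a hypothesis built from the question and answer and the supporting text might look as follows; this is a generic sketch of the idea, not any participant's actual system:

def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b,
    computed with the standard dynamic-programming table."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[m][n]

def lcs_overlap(hypothesis: str, supporting_text: str) -> float:
    """Fraction of hypothesis tokens covered by a common subsequence of
    the supporting text; one feature a validation system could threshold."""
    h = hypothesis.lower().split()
    t = supporting_text.lower().split()
    return lcs_length(h, t) / len(h) if h else 0.0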
Slide 14
Conclusion (of AVE)
- Three years of evaluation in a real environment: real systems' outputs became AVE's input
- Developed methodologies:
  - Build collections from QA responses
  - Evaluate in chain with a QA track
  - Compare results with QA systems
- Introduced RTE techniques into QA: more NLP, more machine learning
- New testing collections for the QA and RTE communities, in 8 languages, not only English
Slide 15
Many thanks!!
- CLEF
- AVE/QA organizing committee
- AVE participants
- UNED team