Answer Validation Exercise (AVE)
QA subtrack at the Cross-Language Evaluation Forum 2007
UNED (coord.): Anselmo Peñas, Álvaro Rodrigo, Valentín Sama, Felisa Verdejo
Thanks to… Main task organizing committee
nlp.uned.es/QA/ave
What? Answer Validation Exercise
Validate the correctness of the answers given by the participants at CLEF QA 2007
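AVE 2006: an RTE exercise
If the text semantically entails the hypothesis, then the answer is expected to be correct.
[Diagram: the QA system receives a question and returns an exact answer with a supporting snippet and doc ID. The question and the exact answer are put into affirmative form to give the hypothesis; the supporting snippet is the text.]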
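The step "into affirmative form" can be illustrated with a minimal sketch, assuming a single hand-written template for "What is X?" questions. The function name `build_hypothesis` and both regular expressions are illustrative assumptions, not the method used by the organizers or by any participant.

```python
import re
from typing import Optional

# Hypothetical template: "What is X?" + answer -> "X is <answer>."
WHAT_IS = re.compile(r"^What\s+is\s+(?P<topic>.+?)\s*\?$", re.IGNORECASE)
# Answers that already start with a copula keep it as-is.
LEADING_COPULA = re.compile(r"^(is|was|are|were)\s+", re.IGNORECASE)


def build_hypothesis(question: str, answer: str) -> Optional[str]:
    """Return an affirmative hypothesis, or None if the template does not match."""
    match = WHAT_IS.match(question.strip())
    if match is None:
        return None
    topic = match.group("topic")
    answer = answer.strip().rstrip(".")
    if LEADING_COPULA.match(answer):
        return f"{topic} {answer}."
    return f"{topic} is {answer}."


hypothesis = build_hypothesis(
    "What is Zanussi?", "was an Italian producer of home appliances"
)
print(hypothesis)
```

The resulting hypothesis would then be paired with the supporting snippet (the text) and passed to an entailment component. Questions that do not fit the template return `None`, which a real system would have to handle with further patterns or a parser.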
Answer Validation Exercise
[Diagram: Question Answering passes the question, the candidate answer and the supporting text to Answer Validation, which outputs either "Answer is correct" or "Answer is not correct or not enough evidence". Labelled components: Automatic Hypothesis Generation (question to hypothesis) and Textual Entailment. Scope markers: AVE 2006; AVE 2007 (Answer Validation as a black box).]
Answer Validation Exercise
- AVE 2006: not possible to quantify the potential gain that AV modules give to QA systems
- Change in AVE 2007 methodology:
  - Group answers by question
  - Systems must validate all answers, but select one
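The "validate all, but select one" rule can be sketched as follows. This is a minimal illustration under stated assumptions: the scores stand in for the output of some validation module, the threshold is arbitrary, and the label names `SELECTED`, `VALIDATED` and `REJECTED` are chosen here for readability rather than taken from the slides.

```python
from collections import defaultdict

# Each candidate: (question_id, answer_id, score). Scores are made up.
candidates = [
    ("q1", "q1_a1", 0.91),
    ("q1", "q1_a2", 0.15),
    ("q1", "q1_a3", 0.64),
    ("q2", "q2_a1", 0.20),
    ("q2", "q2_a2", 0.05),
]

THRESHOLD = 0.5  # hypothetical acceptance threshold


def validate_and_select(candidates, threshold=THRESHOLD):
    """Label every answer and mark at most one per question as SELECTED."""
    groups = defaultdict(list)
    for question_id, answer_id, score in candidates:
        groups[question_id].append((answer_id, score))

    labels = {}
    for answers in groups.values():
        accepted = [(a, s) for a, s in answers if s >= threshold]
        # The best accepted answer of the group, if any, is the selected one.
        best = max(accepted, key=lambda pair: pair[1])[0] if accepted else None
        for answer_id, score in answers:
            if answer_id == best:
                labels[answer_id] = "SELECTED"
            elif score >= threshold:
                labels[answer_id] = "VALIDATED"
            else:
                labels[answer_id] = "REJECTED"
    return labels


labels = validate_and_select(candidates)
for answer_id, label in labels.items():
    print(answer_id, label)
```

A group in which no answer reaches the threshold (here `q2`) gets no selected answer, which corresponds to the validator finding no candidate with enough evidence.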
AVE 2007 Collections
Question: What is Zanussi?
- Answer: was an Italian producer of home appliances
  Supporting text: Zanussi For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought
- Answer: who had also been in Cassibile since August 31
  Supporting text: Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August
- Answer: (1985)
  Supporting text: 3 Out of 5 Live (1985) What Is This?
Collections
- Remove duplicated answers inside the same question group
- Discard NIL answers, void answers and answers whose supporting snippet is too long
- This processing led to a reduction in the number of answers to be validated
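The filtering steps above can be sketched as a single pass over one question group. The slide gives neither the length limit nor the exact duplicate criterion, so `MAX_SNIPPET_CHARS` and the case-insensitive string comparison below are assumptions for illustration only.

```python
MAX_SNIPPET_CHARS = 500  # hypothetical limit; the slide does not state one


def clean_group(answers, max_snippet_chars=MAX_SNIPPET_CHARS):
    """Filter one question group.

    Each item is a dict with 'answer' and 'snippet' keys.
    """
    seen = set()
    kept = []
    for item in answers:
        answer = item["answer"].strip()
        # Discard void and NIL answers.
        if not answer or answer.upper() == "NIL":
            continue
        # Discard answers whose supporting snippet is too long.
        if len(item["snippet"]) > max_snippet_chars:
            continue
        # Remove duplicates within the group (assumed: case-insensitive match).
        key = answer.lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append(item)
    return kept


group = [
    {"answer": "an Italian producer", "snippet": "Zanussi was an Italian producer"},
    {"answer": "An Italian producer", "snippet": "a different snippet"},
    {"answer": "NIL", "snippet": ""},
    {"answer": "   ", "snippet": "some text"},
    {"answer": "a general", "snippet": "x" * 1000},
]
cleaned = clean_group(group)
print(len(group), "->", len(cleaned))
```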
Collections (# answers to validate)
Available for CLEF participants at nlp.uned.es/QA/ave/

Language     Testing   Development
English
Spanish
German
French
Italian
Dutch
Portuguese
Bulgarian    -         70
Romanian     127       -
Evaluation
- Collections are not balanced
- Approach: detect whether there is enough evidence to accept an answer
- Measures: precision, recall and F over ACCEPTED answers
- Baseline system: accept all answers
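A minimal sketch of these measures, assuming the usual definitions: precision is the share of accepted answers that are correct, recall is the share of correct answers that were accepted, and F is their harmonic mean. The function name and the toy judgments are illustrative.

```python
def prf_over_accepted(gold, predicted):
    """Precision, recall and F over accepted answers.

    gold: answer_id -> True if the answer is correct.
    predicted: answer_id -> True if the system accepted the answer.
    """
    accepted = [a for a, is_accepted in predicted.items() if is_accepted]
    true_positives = sum(1 for a in accepted if gold[a])
    total_correct = sum(1 for is_correct in gold.values() if is_correct)

    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / total_correct if total_correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy collection: 2 correct answers out of 5 (unbalanced, as on the slide).
gold = {"a1": True, "a2": False, "a3": False, "a4": True, "a5": False}

# Baseline: accept every answer.
baseline = {answer_id: True for answer_id in gold}
# A made-up system that accepts only a1 and a2.
system = {"a1": True, "a2": True, "a3": False, "a4": False, "a5": False}

print("baseline:", prf_over_accepted(gold, baseline))
print("system:  ", prf_over_accepted(gold, system))
```

Because the baseline accepts everything, its recall is always 1 and its precision equals the proportion of correct answers in the collection, which is why it is a meaningful reference point for unbalanced data.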
Evaluation
Precision, recall and F measure over correct answers for English

Group               System        F   Precision   Recall
DFKI                ltqa_
DFKI                ltqa_
U. Alicante         ofe_
Text-Mess Project   Text-Mess_
Iasi                adiftene
UNED                rodrigo
Text-Mess Project   Text-Mess_
U. Alicante         ofe_
                    % VALIDATED
                    % VALIDATED
Comparing AV systems performance with QA systems (German)

Group   System              Type   QA accuracy   % of perfect selection
        Perfect selection   QA
FUH     iglockner_2         AV
FUH     iglockner_1         AV
DFKI    dfki071dede         QA
FUH     fuha071dede         QA
        Random              AV
DFKI    dfki071ende         QA
FUH     fuha072dede         QA
DFKI    dfki071ptde         QA
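One possible reading of the two measures in this comparison can be sketched as follows. The definitions are assumptions inferred from the column headers: QA accuracy is taken as the share of questions whose selected answer is correct, "perfect selection" as the share of questions for which at least one candidate is correct, and the percentage as the ratio of the two.

```python
def qa_accuracy(groups, selected):
    """Share of questions whose selected answer is correct.

    groups: question_id -> {answer_id: is_correct}
    selected: question_id -> answer_id, or None if nothing was selected
    """
    hits = sum(
        1
        for question_id, answers in groups.items()
        if selected.get(question_id) is not None
        and answers.get(selected[question_id], False)
    )
    return hits / len(groups)


def perfect_selection_accuracy(groups):
    """Upper bound: share of questions with at least one correct candidate."""
    answerable = sum(1 for answers in groups.values() if any(answers.values()))
    return answerable / len(groups)


def pct_of_perfect_selection(groups, selected):
    """QA accuracy as a percentage of the perfect-selection upper bound."""
    ceiling = perfect_selection_accuracy(groups)
    return 100.0 * qa_accuracy(groups, selected) / ceiling if ceiling else 0.0


# Made-up judgments for four questions.
groups = {
    "q1": {"a1": True, "a2": False},
    "q2": {"a3": False, "a4": True},
    "q3": {"a5": False},
    "q4": {"a6": True},
}
selected = {"q1": "a1", "q2": "a3", "q3": None, "q4": "a6"}

print(qa_accuracy(groups, selected))
print(perfect_selection_accuracy(groups))
print(round(pct_of_perfect_selection(groups, selected), 2))
```

Normalizing by the upper bound separates the quality of the selection step from the quality of the candidate answers it was given, which is what makes AV and QA runs comparable in one table.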
Techniques reported at AVE 2007
10 reports, all reporting an RTE approach (number of reports per technique):
- Generates hypotheses: 6
- Wordnet: 3
- Chunking: 3
- n-grams, longest common subsequences: 5
- Phrase transformations: 2
- NER: 5
- Num. expressions: 6
- Temp. expressions: 4
- Coreference resolution: 2
- Dependency analysis: 3
- Syntactic similarity: 4
- Functions (sub, obj, etc.): 3
- Syntactic transformations: 1
- Word-sense disambiguation: 2
- Semantic parsing: 4
- Semantic role labeling: 2
- First order logic representation: 3
- Theorem prover: 3
- Semantic similarity: 2
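As an illustration of one of the lexical techniques listed (longest common subsequences), the sketch below computes a token-level overlap between a supporting text and a hypothesis. Whitespace tokenization, lowercasing and normalization by hypothesis length are assumptions made here; the slide does not say how participants used the feature.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_overlap(text, hypothesis):
    """Fraction of hypothesis tokens covered by the LCS with the text."""
    text_tokens = text.lower().split()
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0
    return lcs_length(text_tokens, hyp_tokens) / len(hyp_tokens)


hypothesis = "Zanussi was an Italian producer of home appliances"
supporting = (
    "Zanussi was an Italian producer of home appliances "
    "that in 1984 was bought"
)
unrelated = (
    "Only after the signing had taken place "
    "was Giuseppe Castellano informed"
)

print(lcs_overlap(supporting, hypothesis))
print(lcs_overlap(unrelated, hypothesis))
```

A score like this would typically be one feature among several fed to a classifier or thresholded directly; on its own it ignores negation, word order beyond the subsequence, and paraphrase.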
Conclusion
- Evaluation in a real environment
  - Real systems' outputs -> AVE input
- Developed methodologies
  - Build collections from QA responses
  - Evaluate in chain with a QA Track
  - Compare results with QA systems
- New testing collections for the QA and RTE communities
  - In 7 languages, not only English
Conclusion
- 9 groups, 16 systems, 4 languages
- All systems based on Textual Entailment
- 5 out of 9 groups participated in QA
- Introduction of RTE techniques in QA
  - More NLP
  - More Machine Learning
- Systems based on syntactic or semantic analysis perform Automatic Hypothesis Generation
  - Combination of the question and the answer
  - In some cases, directly in a logic form