RTE Planning Session
Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo
Discussion items
- What’s done so far: RTE 1-7
- What’s next: what, where, when?
- Open discussion and audience feedback
Where we have got to
7 years of RTE challenges (sponsored by PASCAL, finishing in 2011)
- RTE 1-5: balanced datasets based on the output of NLP applications
- RTE 6-7: moving toward more realistic scenarios
  – Main task: TE performed against a real corpus, focused on the Summarization (SUM) setting (after experimentation in the RTE-5 pilot)
  – KBP Validation experiment
A considerable number of datasets have been created over the 7 RTE campaigns
What next?
SUM and IE (KBP) have already been investigated in RTE-6 and RTE-7
Proposal: investigate the potential of RTE systems in another NLP application setting
What next?
RTE-8 will not be at TAC 2012
– Co-locate with a major conference to get wider engagement with the NLP community
– NIST will continue to support the activities and contribute to the organization of the challenges
No RTE-8 in 2012, in order to:
– allow the shift to an earlier time in the year
– prepare datasets for a new setting
Future directions for RTE: new NLP application scenarios
QA appears to be the most natural direction
– open domain, unsupervised setting
Possible QA scenarios (Answer Validation):
1. QA4MRE scenario
2. QA from Textbooks scenario
3. AVE on traditional QA track data
Answer Validation
Deciding whether an answer is correct or not according to a given text
AV as a Textual Entailment problem:
– H: question + answer (turned into a declarative sentence)
– T: the text supporting the answer
– T entails H = the answer is correct according to the supporting text
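The AV-to-TE mapping can be sketched in a few lines of Python: an AV instance carries the question, the answer, the declarative H and the supporting T, and the answer is accepted exactly when an entailment function says T entails H. The names `AVInstance`, `overlap_entails` and `validate_answer` are hypothetical, and the default `entails` function is a naive lexical-overlap baseline (with an arbitrary 0.8 threshold) used only as a placeholder for a real RTE system; it is not the method of any RTE participant.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class AVInstance:
    question: str
    answer: str
    hypothesis: str  # H: question + answer as a declarative sentence
    text: str        # T: the text supporting the answer


def tokens(s: str) -> set[str]:
    """Lower-cased word tokens (letters and digits)."""
    return set(re.findall(r"\w+", s.lower()))


def overlap_entails(text: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Naive lexical-overlap baseline, NOT a real RTE system: predicts
    entailment when at least `threshold` of H's tokens also occur in T."""
    h = tokens(hypothesis)
    if not h:
        return False
    return len(h & tokens(text)) / len(h) >= threshold


def validate_answer(
    inst: AVInstance,
    entails: Callable[[str, str], bool] = overlap_entails,
) -> bool:
    # AV as TE: the answer is judged correct iff T entails H.
    return entails(inst.text, inst.hypothesis)


if __name__ == "__main__":
    t = "The capital of Croatia, Zagreb, has a population of around 700,000 citizens."
    q = "Which is the capital of Croatia?"
    print(validate_answer(AVInstance(q, "Zagreb", "Zagreb is the capital of Croatia", t)))  # True
    print(validate_answer(AVInstance(q, "Split", "Split is the capital of Croatia", t)))    # False
```

Keeping `entails` as a parameter is the point of the design: the AV decision stays the same while any RTE system can be plugged in. The bag-of-words baseline ignores word order, so it would also accept "Croatia is the capital of Zagreb", which is exactly the kind of inference a real RTE system must get right.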
Answer Validation – An Example
AV input:
Question: Which is the capital of Croatia?
Answer: Zagreb
Text: The capital of Croatia, Zagreb, has a population of around 700,000 citizens and it is known for …
RTE input:
1) T: Text (The capital of Croatia, Zagreb, has a population...)
   H: Q + A (Zagreb is the capital of Croatia)
   => H created manually or with automatic tools
2) Original AV triplet (Question, Answer, Text)
   => requires automatic H generation
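Option 2 above requires turning Q + A into H automatically. A minimal sketch, assuming a hypothetical hand-written template table: each regular expression matches one wh-question shape and fills the answer into a declarative frame. `TEMPLATES` and `generate_hypothesis` are illustrative names; a practical generator would need far wider coverage (or a parser) than this single pattern.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny template set: each entry maps a wh-question
# pattern to a declarative frame with an {answer} slot.
TEMPLATES = [
    (re.compile(r"^(?:which|what|who) is (?P<rest>.+?)\s*\?$", re.IGNORECASE),
     "{answer} is {rest}."),
]


def generate_hypothesis(question: str, answer: str) -> Optional[str]:
    """Turn a question + answer pair into a declarative hypothesis H.

    Returns None when no template matches, i.e. when H would have to be
    created manually instead.
    """
    q = question.strip()
    for pattern, frame in TEMPLATES:
        m = pattern.match(q)
        if m:
            return frame.format(answer=answer.strip(), rest=m.group("rest"))
    return None


if __name__ == "__main__":
    print(generate_hypothesis("Which is the capital of Croatia?", "Zagreb"))
    # Zagreb is the capital of Croatia.
```

Returning `None` rather than guessing keeps the fallback to manual H creation explicit. For instance, the QA4MRE question "What company owns wells in Surat Basin?" matches no template here and would need its own frame.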
1. The QA4MRE scenario
Focuses on the validation step of the QA pipeline
– Formulated as a multiple-choice reading comprehension test: questions about 1 given text, with candidate answers provided
– Plus a reference collection of documents, so that systems can acquire the same background knowledge, which helps in answering some of the questions
End of the roadmap: full QA setting
QA4MRE Reading Test
Text
Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. Australia's Easternwell, being acquired by Transfield Services, has ceased drilling because of the flooding. The company is drilling coal seam gas wells for Australia's Santos Ltd. Santos said the impact was minimal.
Multiple-choice test
According to the text…
What company owns wells in Surat Basin?
1. Australia
2. Coal seam gas wells
3. Transfield Services
4. Santos Ltd.
5. Ausam Energy Corporation
QA4MRE-based RTE task
T(ext)
Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. Australia's Easternwell, being acquired by Transfield Services, has ceased drilling because of the flooding. The company is drilling coal seam gas wells for Australia's Santos Ltd. Santos said the impact was minimal.
Hs (Q + given A)
1. AUSTRALIA owns wells in Surat Basin (NO ENTAILMENT)
2. COAL SEAM GAS WELLS owns wells in Surat Basin (NO ENTAILMENT)
3. TRANSFIELD SERVICES owns wells in Surat Basin (NO ENTAILMENT)
4. SANTOS LTD. owns wells in Surat Basin (ENTAILMENT)
5. AUSAM ENERGY CORPORATION owns wells in Surat Basin (NO ENTAILMENT)
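The conversion from a multiple-choice question to RTE pairs is mechanical once the question has been rewritten as a declarative frame: every candidate answer fills the slot, and only the correct option yields ENTAILMENT. A minimal sketch, assuming the frame is written by hand and the gold option is known; `RTEPair` and `pairs_from_multiple_choice` are hypothetical names, not part of any QA4MRE tooling.

```python
from dataclasses import dataclass
from typing import List

ENTAILMENT = "ENTAILMENT"
NO_ENTAILMENT = "NO ENTAILMENT"


@dataclass(frozen=True)
class RTEPair:
    text: str
    hypothesis: str
    label: str


def pairs_from_multiple_choice(
    text: str, frame: str, options: List[str], correct: int
) -> List[RTEPair]:
    """Expand one multiple-choice question into one T-H pair per option.

    `frame` is the question rewritten as a declarative sentence with an
    {answer} slot; `correct` is the 0-based index of the right option.
    """
    if not 0 <= correct < len(options):
        raise ValueError("correct must index one of the options")
    return [
        RTEPair(text, frame.format(answer=opt),
                ENTAILMENT if i == correct else NO_ENTAILMENT)
        for i, opt in enumerate(options)
    ]


if __name__ == "__main__":
    text = ("Coal seam gas drilling in Australia's Surat Basin has been halted by flooding. "
            "Australia's Easternwell, being acquired by Transfield Services, has ceased "
            "drilling because of the flooding. The company is drilling coal seam gas wells "
            "for Australia's Santos Ltd. Santos said the impact was minimal.")
    options = ["Australia", "Coal seam gas wells", "Transfield Services",
               "Santos Ltd.", "Ausam Energy Corporation"]
    for p in pairs_from_multiple_choice(text, "{answer} owns wells in Surat Basin",
                                        options, correct=3):
        print(f"{p.hypothesis} ({p.label})")
```

Because the substitution is purely mechanical, it reproduces the slide's "Coal seam gas wells owns wells" agreement error; that is harmless for entailment labels, but a generator producing Hs for human inspection might want a verb-agreement pass.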
Interesting data:
– questions are posed so as to require various kinds of textual inference (lexical, syntactic, discourse)
Available datasets:
– 2011: up to 600 Hs (12 reading tests, 120 questions, 600 options)
– The task will be proposed again in 2012
When the full QA setting is reached => AV of the answers of QA4MRE systems
2. QA from a Textbook (e.g., Biology)
Textbooks as a natural source of Q&A pairs:
T = a paragraph / chapter / book
Hs = revision/test questions from teachers and/or from the end of the chapter:
– True/false questions
– «Find-a-value» questions turned into declarative sentences
A natural and interesting challenge
– established task, ready supply of data
QA from a Textbook (cont.): Example (Biology)
T(ext) – from a Biology textbook
…Normally, the genetic material in the nucleus is in a loosely bundled coil called chromatin. At the onset of prophase, chromatin condenses together into a highly ordered structure called a chromosome. Since the genetic material has already been duplicated earlier in S phase, the replicated chromosomes have two sister chromatids, bound together at the centromere by the cohesin protein complex….
Hs
Which of the following statement(s) are true?
a. Genetic material is duplicated during prophase. (NO ENTAILMENT)
b. During prophase, chromosomes form from chromatin. (ENTAILMENT)
c. S phase follows prophase. (NO ENTAILMENT)
d. Chromatin is a form of genetic material. (ENTAILMENT)
e. Cohesin keeps the sister chromatid pairs connected with each other. (ENTAILMENT)
3. AVE on «traditional» QA data
Answer Validation Exercise (Peñas et al., 2006)
– Validating the correctness of answers given by QA systems, according to the supporting documents returned by the systems
– Similar to the RTE 6-7 KBP Validation Task
Data available from past QA campaigns (TREC & CLEF)
Pilot task: RTE on Specialized Datasets
Possible pilot task using specialized datasets, where all T-H pairs contain one or more specific phenomena that affect inference:
– Temporal expressions
– Numerical expressions
Focus on temporal and quantitative reasoning
TE-related initiatives for 2012:
- Task #6: Semantic Textual Similarity
- Task #8: Cross-Lingual Textual Entailment
- QA4MRE 2012
Possible venues for RTE-8 in 2013
Semantics conferences are trying to join their efforts:
*Sem 2012
– The first joint conference on lexical and computational semantics
– Co-located with NAACL-HLT 2012
Proposal: co-locate RTE-8 with
- SIGLEX, NAACL-HLT or ACL (summer 2013)
- IWCS (winter or spring 2013)
Thank you
See you all in 2013!