AQUAINT Pilot evaluation for knowledge-oriented systems April 20, 2005

Represented Projects
Arizona, Brandeis, Cycorp, ICSI, Illinois, ISI, LCC, MIT, Monmouth, PARC, Stanford, UT Dallas

Tentative Agenda
10: :15 Welcome
10: :15 Survey/prioritization of general issues and controversies
11: :15 Discussion/resolution of most significant disagreements
12:15 - 1:15 Lunch
1:15 - 2:00 Resolution of other general issues (if needed)
2:00 - 3:00 Examination/discussion of particular examples (about 5 examples per team)
3:00 - 3:15 Break
3:15 - 4:15 Continue examination/discussion of team examples
4:15 - 4:45 Technical issues: data formats, answer coding, scoring
4:45 - 5:30 Next steps, goals for June PI meeting
5:30 Depart

Decisions
Form of challenge: Inference-based questions (yes-no, choice, wh, …)
Development and test sets:
–Open domain, but no reliance on specialist knowledge; general common sense
–Include long chains of inference, but shorter ones too
–Allow constructed and natural examples
–Not aiming to produce training data
System output:
–Mandatory: Response, strict/plausible
–Optional: System justifications (e.g. linguistic/world-knowledge), human explanations, system confidence
Annotation:
–Mandatory: Passage, Question, Response, strict/plausible, linguistic/world-knowledge, True/False/Unknown (to allow for distractors)
–Optional: Characterize the particular knowledge (world, linguistic) that an answer depends on, assumptions (including which interpretation if ambiguous), context-type (belief, plan, …), annotator’s confidence, provenance
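
The mandatory and optional annotation fields listed above could be captured in a record type along the following lines. This is only an illustrative sketch, not the agreed format: the class names, field names, the `missing_mandatory` helper, and the example passage are all hypothetical, and treating linguistic/world-knowledge as a two-way choice is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Strength(Enum):
    STRICT = "strict"
    PLAUSIBLE = "plausible"


class Knowledge(Enum):
    # Assumption: exactly one of the two is recorded per item.
    LINGUISTIC = "linguistic"
    WORLD = "world-knowledge"


class Truth(Enum):
    TRUE = "True"
    FALSE = "False"
    UNKNOWN = "Unknown"  # lets distractor items be marked


@dataclass
class Annotation:
    # Mandatory fields named on the Decisions slide
    passage: str
    question: str
    response: str
    strength: Strength
    knowledge: Knowledge
    truth: Truth
    # Optional fields named on the Decisions slide
    knowledge_detail: Optional[str] = None
    assumptions: List[str] = field(default_factory=list)
    context_type: Optional[str] = None  # e.g. "belief", "plan"
    annotator_confidence: Optional[float] = None
    provenance: Optional[str] = None


MANDATORY = ("passage", "question", "response", "strength", "knowledge", "truth")


def missing_mandatory(record: dict) -> List[str]:
    """Return the mandatory field names that are absent or empty in a raw record."""
    return [name for name in MANDATORY if not record.get(name)]


# Invented example item, for illustration only.
example = Annotation(
    passage="Mary sold her car to Bill.",
    question="Did Bill buy a car?",
    response="yes",
    strength=Strength.STRICT,
    knowledge=Knowledge.LINGUISTIC,
    truth=Truth.TRUE,
)
```

A check such as `missing_mandatory` could be run over each team's examples before they are passed around, so that incomplete items are caught early.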

Next Steps
Annotation guidelines: May 20 subcommittee: Crouch (PARC), Sauri (Brandeis), Fowler (LCC)
Format in uniform way: XML specified by Kaplan (PARC), Fowler (LCC) May 25
Data validation: Small number of examples passed around by June 1
Evaluation and Scoring: Discussion at June PI meeting
–What do we want to learn?
–How is it done?
–How is it presented?