AQUAINT Evaluation Overview. Ellen M. Voorhees.


1 AQUAINT Evaluation Overview
  Ellen M. Voorhees

2 NIST AQUAINT Components of Evaluation
  Pilot investigations into how to evaluate advanced QA systems
  TREC QA track

3 Pilot Studies
  Definitions
  Dialogue
  Fixed Domain
  Justifications
  Multimedia & multilingual systems
  No answer
  Opinions
  Relationships

4 Pilot Studies
  Too many pilots diluted efforts
    – dialog and definition pilots mostly complete
    – relationship and opinion pilots had start
    – data obtained for fixed domain pilot
  Emphasis for 2nd year of AQUAINT
    – fixed domain
    – relationships
    – multimedia/multilingual versions

5 TREC QA Track
  Community-wide evaluation of open domain QA systems
    – currently limited to “factoid” questions
        Who was the first person to run the mile in less than four minutes?
        When was the Rosenberg trial?
        How deep is Crater Lake?
    – use news corpus as source of answers
        systems required to return document from corpus in support of answer

6 QA Track Participation
  34 groups from North America, Europe, & Asia

7 Task
  Return exactly one response for each of 500 questions
    – response is either [doc, string] pair or NIL
    – NIL indicates belief that there is no answer in the corpus
    – new this year, string must be nothing more or less than an answer
  Rank questions by confidence in answer
    – emphasis on getting systems to recognize when they have found a good answer
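The response format on this slide can be sketched as a small data structure. This is only an illustration, not the track's actual submission format: the names `Response` and `validate_run`, the field names, and the document ID `DOC-1` are all hypothetical. A run is assumed to be a list of responses already ordered from most to least confident.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Response:
    """One response to one question (hypothetical layout)."""
    question_id: str
    doc_id: Optional[str] = None   # supporting document; None for NIL
    answer: Optional[str] = None   # exact answer string; None for NIL

    @property
    def is_nil(self) -> bool:
        # NIL: the system believes the corpus holds no answer.
        return self.doc_id is None and self.answer is None


def validate_run(run: Sequence[Response], question_ids: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the run is well formed."""
    problems = []
    known = set(question_ids)
    seen = set()
    for r in run:
        if r.question_id not in known:
            problems.append(f"{r.question_id}: unknown question")
        if r.question_id in seen:
            problems.append(f"{r.question_id}: more than one response")
        seen.add(r.question_id)
        # A response is a full [doc, string] pair or NIL, never half of a pair.
        if (r.doc_id is None) != (r.answer is None):
            problems.append(f"{r.question_id}: need both doc and string, or NIL")
    for qid in question_ids:
        if qid not in seen:
            problems.append(f"{qid}: no response")
    return problems
```

For example, `validate_run([Response("q1", "DOC-1", "some answer"), Response("q2")], ["q1", "q2"])` returns an empty list, since `q1` has a full pair and `q2` is NIL.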

8 Data
  New AQUAINT document set
    articles from NY Times newswire (1998-2000), AP newswire (1998-2000), and Xinhua News Agency (1996-2000)
    approximately 3 GB of text
    approximately 1,033,000 articles
  Questions taken from MSNSearch and AskJeeves logs
    no definition questions
    some spelling/grammatical errors remain
    46 questions with no known answer in docs

9 Motivation for Exact Answers
  Text snippets masking important differences among systems
  Pinpointing precise extent of answer important to driving technology
    – not a statement that deployed systems should return only exact answers
    – exact answers may be important as component in larger language systems

10 Exact Answers
  Human assessors judged responses
    Wrong: string does not contain a correct answer or answer is unresponsive
    Not Supported: string contains a correct answer, but doc does not support that answer
    Not Exact: string contains correct answer and doc supports it, but string contains too much (or too little) info
    Right: string is exactly a correct answer that is supported by the doc
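The four categories on this slide nest: each one presupposes that the previous test passed. A minimal sketch of that ordering follows. The judgments themselves were made by human assessors, so the three boolean inputs stand in for an assessor's decisions, and the names `Judgment` and `judge` are hypothetical.

```python
from enum import Enum


class Judgment(Enum):
    WRONG = "wrong"
    NOT_SUPPORTED = "not supported"
    NOT_EXACT = "not exact"
    RIGHT = "right"


def judge(contains_correct_answer: bool, doc_supports: bool, is_exact: bool) -> Judgment:
    """Map an assessor's three yes/no decisions to one category."""
    if not contains_correct_answer:
        # Includes unresponsive strings.
        return Judgment.WRONG
    if not doc_supports:
        return Judgment.NOT_SUPPORTED
    if not is_exact:
        # Correct and supported, but too much or too little text.
        return Judgment.NOT_EXACT
    return Judgment.RIGHT
```

Checking the criteria in this order means a string that lacks a correct answer is always Wrong, whatever the document says.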

11 Distribution of Judgments
  15,948 judgments across 500 questions
    Wrong        12,639   79.3%
    Unsupported     505    3.2%
    ineXact         442    2.8%
    Right         2,362   14.8%
  In general, systems can find extent of answer if they can find it at all
    distribution skewed across systems
    attempt to get exact answer sometimes caused units to be lost (so marked wrong)
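The percentages on this slide follow directly from the four counts, which sum to the stated 15,948 judgments. A short sketch of the arithmetic, with the hypothetical names `COUNTS` and `judgment_shares`:

```python
# Counts as given on the slide.
COUNTS = {
    "Wrong": 12639,
    "Unsupported": 505,
    "ineXact": 442,
    "Right": 2362,
}


def judgment_shares(counts):
    """Return each category's share of all judgments, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    for label, share in judgment_shares(COUNTS).items():
        print(f"{label:<12}{COUNTS[label]:>7,}{share:>7.1f}%")
```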

12 Confidence-weighted Scoring
  Focus on getting systems to know when they have found a good answer
    – questions ranked by confidence in answer
    – compute score based on ranking
  $$\frac{1}{N} \sum_{i=1}^{N} \frac{\text{number right to rank } i}{i}$$
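The score on this slide takes, for each rank i, the number of right answers among the first i ranked questions divided by i, then averages over the N questions. A minimal sketch of that computation follows. The function name `confidence_weighted_score` and the input layout (one boolean per question, ordered from most to least confident) are assumptions made for illustration.

```python
from typing import Sequence


def confidence_weighted_score(is_right: Sequence[bool]) -> float:
    """Score a run whose questions are ordered by decreasing confidence.

    is_right[k] is True when the response at rank k + 1 was judged right.
    """
    n = len(is_right)
    if n == 0:
        raise ValueError("need at least one question")
    right_so_far = 0
    total = 0.0
    for rank, right in enumerate(is_right, start=1):
        if right:
            right_so_far += 1
        # Number right up to this rank, divided by the rank.
        total += right_so_far / rank
    return total / n
```

With one right and one wrong answer, ranking the right one first gives `confidence_weighted_score([True, False])` = (1/1 + 1/2) / 2 = 0.75, while the reverse order gives (0/1 + 1/2) / 2 = 0.25. The measure therefore rewards a system for placing the answers it gets right at the top of its ranking.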

13 Main Task Results

14 Main Themes
  Many systems now using specific data sources for expected question types
    name lists
    gazetteers
  Web used by most systems, but in different ways
    primary source of answer that is then mapped to corpus
    one of several sources whose results are fused
    place to validate answer found in corpus

15 Confidence Ranking
  Different approaches
    – most groups used the type of question as a factor
    – some systems that use scoring techniques to rank candidate answers also used score for ranking questions
    – few groups used training set to learn good feature set and corresponding weights, then applied classifier to test set
    – many groups ranked NIL questions last

