1
AQUAINT Testbed
John Aberdeen, John Burger, Conrad Chang, John Henderson, Scott Mardis
The MITRE Corporation
© 2002, The MITRE Corporation
2
AQUAINT Activities @ MITRE
- Testbed
  - Provide access to Q&A systems on classified data.
  - Solicit user feedback (user studies).
- Testweb
  - Provide public access to a broad variety of Q&A capabilities.
  - Evaluate systems and architectures for inclusion in testbed.
- User Studies
  - Determine tasks for which Q&A is a useful technology.
  - Determine advantages of Q&A over related information technologies.
  - Obtain feedback from users on utility and usability.
3
Non-AQUAINT studies
- AFIWC (Air Force Information Warfare Center): installed both classified & unclassified systems
  - QANDA configured to answer information security questions on BUGTRAC data
  - Example: How can CDE be exploited?
- IRS study: questions about completing tax forms
  - Example: What is required to claim a child as a dependent?
4
Lessons learned
- Answers are only useful in context (long answers are preferred)
- Source text must be available
- Chapter and section headings are important context
- Issues with a classified system
  - may not be able to know the top-level objectives of the users
  - may not be able to be told or record any actual questions
  - feedback is largely qualitative
5
Testbed
- Classified network (ICTESTNET)
  - access to users, data, and scenarios will be restricted
- Evaluate systems prior to installation
  - Testweb becomes more important
  - MITRE installations are more than rehearsal
- To facilitate feedback, initial deployment should use open source data, possibly on a different network
6
Testbed Activity
- MITRE installations (need to assess portability to the IC environment, maintainability, features, resources, etc.)
  - QUIRK (CYCorp/IBM)
  - Javelin (CMU) - in progress
  - who's next?
- Support scenario development on CNS data with a search and Q/A interface. Centralize collection of user questions. Available for analysts, reservists, AQUAINT executive committee members, etc.
7
Testweb
- Clarity measure
- ISI TextMap integrated into web demo
Soon:
- CNS data: search + Google API
- QANDA on CNS
[Architecture diagram: Users connect to a Q/A Portal/Demo, which draws on an IR service, a Clarity service, and a Q/A repository; Q/A systems include Javelin, LCC, Qanda, and TextMap; collections include CNS, TREC 2002, and other collections via the Google API.]
9
System Interoperability
[Diagram (systems labeled A, B, C): question "When did Columbus discover America?"; candidate answers "1492" and "the previous year"; supporting text "Columbus discovered America in 1492"; source www.columbus.org.]
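To make the exchanged answer concrete, here is a minimal sketch of how one such record might be represented; the field names (question, answers, justification, source) are assumptions chosen to mirror the labels in the diagram, not a format specified by the slides.

```python
from dataclasses import dataclass

@dataclass
class AnswerRecord:
    """One system's answer to a question, in a form another system could consume.
    Field names are hypothetical; they mirror the diagram labels above."""
    question: str
    answers: list[str]        # candidate answer strings, best first
    justification: str = ""   # supporting text from the source document
    source: str = ""          # where the supporting document came from

record = AnswerRecord(
    question="When did Columbus discover America?",
    answers=["1492", "the previous year"],
    justification="Columbus discovered America in 1492",
    source="www.columbus.org",
)
print(record)
```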
10
Answer Combination
- 67 systems submitted to the TREC-11 main QA task
  - Including some variants
- Average raw submission accuracy was 22%
  - 28% for loosely correct (judgments {1, 2, 3})
- How well can we do by combining systems in some way?
  - Simplest approach: voting (see the sketch below)
  - More sophisticated approaches?
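As a concrete illustration of the simplest approach, here is a minimal voting sketch. It assumes each system contributes one answer string per question and that a crude normalization is enough to pool trivially different strings; the function names and normalization are illustrative, not taken from the TREC-11 systems themselves.

```python
from collections import Counter

def normalize(answer: str) -> str:
    """Crude normalization so trivially different strings vote together."""
    return " ".join(answer.lower().split()).strip(" .,;:")

def vote(answers: list[str]) -> str:
    """Return the (normalized) answer string submitted by the most systems."""
    counts = Counter(normalize(a) for a in answers)
    best, _ = counts.most_common(1)[0]
    return best

# Hypothetical submissions from four systems for one question:
print(vote(["1492", "1492.", "the previous year", "1492"]))  # -> "1492"
```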
11
Basic Approach
- Define distance measures between answer pairs
  - Generalization of simple voting
  - Can use partial matches, other evidence sources
- Select near-"centroid" of all submissions
  - Minimize sum of pairwise distances (SOP), as sketched below
  - Previously used to select DNA sequences (Gusfield 1993)
- Endless possibilities for distance measures
  - Edit distance, ngrams, geo & time distances …
- Also used a document source prior
  - NY Times vs. Associated Press vs. Xinhua vs. NIL
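A minimal sketch of the sum-of-pairwise-distances idea, assuming one answer string per system and normalized edit distance as the pairwise measure (exact voting is the special case where the distance is 0 for identical strings and 1 otherwise). The distance choice, normalization, and sample submissions are illustrative assumptions, not the measures or data actually used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def distance(a: str, b: str) -> float:
    """Normalized edit distance in [0, 1]; partial matches get partial credit."""
    if not a and not b:
        return 0.0
    return edit_distance(a.lower(), b.lower()) / max(len(a), len(b))

def select_by_sop(answers: list[str]) -> tuple[str, float]:
    """Pick the submitted answer minimizing the sum of pairwise distances (SOP)
    to all submissions; the SOP score can also be reused for confidence ranking."""
    scored = [(sum(distance(a, b) for b in answers), a) for a in answers]
    sop, best = min(scored)
    return best, sop

# Hypothetical submissions for a question like 1674 (cf. the example slide below):
subs = ["1969", "1969", "July 20, 1969", "on July 20, 1969", "July 18, 1969", "20"]
print(select_by_sop(subs))  # partial matches pool, so "July 20, 1969" beats "1969"
```

A document source prior could be folded in by adding a per-answer penalty term to the SOP score before taking the minimum.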
12
Sample Results
- Simple voting is more than twice as good as the average submission
- More sophisticated measures are even better
- SOP scores can also be used for confidence ranking
13
Example: Question 1674
What day did Neil Armstrong land on the moon?
- 22 different answer strings submitted
  - 1969 (plurality of submissions; incorrect)
  - July 20, 1969; on July 20, 1969 (correct)
  - July 18, 1969; July 14, 1999 …
  - 20
  - Plus variants differing in punctuation
- Best-scoring selector chooses the correct answer
  - The answers above all contribute
14
Future Work
- Did not have access to
  - System identity (even anonymized)
  - Confidence rankings
- Would like to use both
  - Simple system-specific priors would be easy
  - More sophisticated models possible
- Better confidence estimation
  - Should do better than using the SOP score directly
15
Initial User Study: Comparison to traditional IR
- Establish a baseline for relative utility via a task-based comparison of Q/A to traditional IR.
- Initial task: collect a set of geographic, temporal, and monetary facts regarding Hurricane Mitch
- Data: TREC-11
- Measures: task completeness, accuracy, time
- Analyze logs for query reformulations, document usage, etc.
16
Preliminary Results
- Initial subjects are MITRE employees
- We have run N subjects each on Q/A and IR (Lucene)
17
What’s Next
- Testbed system appraisals
- Testweb stability & facelift
- Studies with other Q/A systems & features
- Other tasks (based on CNS data)
- Other component integrations:
  - answer combination & summarization