AQUAINT Testbed
John Aberdeen, John Burger, Conrad Chang, John Henderson, Scott Mardis
The MITRE Corporation
© 2002, The MITRE Corporation

AQUAINT MITRE
- Testbed
  - Provide access to Q&A systems on classified data.
  - Solicit user feedback (user studies).
- Testweb
  - Provide public access to a broad variety of Q&A capabilities.
  - Evaluate systems and architectures for inclusion in testbed.
- User Studies
  - Determine tasks for which Q&A is a useful technology.
  - Determine advantages of Q&A over related information technologies.
  - Obtain feedback from users on utility and usability.

Non-AQUAINT studies
- AFIWC (Air Force Information Warfare Center)
  - Installed both classified & unclassified systems
  - QANDA configured to answer information security questions on BUGTRAQ data
  - Example question: How can CDE be exploited?
- IRS study
  - Questions about completing tax forms
  - Example question: What is required to claim a child as a dependent?

Lessons learned
- Answers are only useful in context (long answers are preferred)
- Source text must be available
- Chapter and section headings are important context
- Issues with a classified system:
  - may not be able to know the top-level objectives of the users
  - may not be told, or allowed to record, any actual questions
  - feedback is largely qualitative

Testbed
- Classified network (ICTESTNET)
  - Access to users, data, and scenarios will be restricted
- Evaluate systems prior to installation
  - Testweb becomes more important
  - MITRE installations are more than rehearsal
- To facilitate feedback, initial deployment should use open-source data, possibly on a different network

Testbed Activity
- MITRE installations (need to assess portability to the IC environment, maintainability, features, resources, etc.)
  - QUIRK (Cycorp/IBM)
  - Javelin (CMU) - in progress
  - Who's next?
- Support scenario development on CNS data with a search and Q/A interface
  - Centralize collection of user questions
  - Available to analysts, reservists, AQUAINT executive committee members, etc.

Testweb
- Clarity measure
- ISI TextMap integrated into web demo
Soon:
- CNS data: search + Google API
- QANDA on CNS
[Architecture diagram components: Users, Q/A Portal/Demo, Q/A systems (Javelin, LCC, Qanda, TextMap), Q/A repository, Google API, IR service, Clarity service, collections (CNS, TREC 2002, other).]
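To make the portal/services relationship concrete, here is a minimal, hypothetical sketch of how a Q/A portal might fan a question out to several back-end Q/A services and collect their answers. The service names come from the slide, but the endpoint URLs, request format, and JSON response shape are assumptions, not the actual Testweb interfaces.

```python
# Hypothetical sketch: a portal fanning a question out to back-end Q/A services.
# Endpoint URLs and the JSON request/response shapes are invented for illustration.
import json
import urllib.request

QA_SERVICES = {
    "Javelin": "http://qa.example.org/javelin/ask",
    "Qanda":   "http://qa.example.org/qanda/ask",
    "TextMap": "http://qa.example.org/textmap/ask",
}

def ask_all(question: str) -> dict:
    """Send the question to every registered Q/A service and collect answers."""
    answers = {}
    for name, url in QA_SERVICES.items():
        payload = json.dumps({"question": question}).encode("utf-8")
        req = urllib.request.Request(url, data=payload,
                                     headers={"Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                answers[name] = json.load(resp).get("answer")
        except OSError:
            answers[name] = None  # service unavailable; the portal degrades gracefully
    return answers
```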

System Interoperability
[Diagram residue: an example exchange among systems A, B, and C for the question "When did Columbus discover America?", with candidate answers "1492", "the previous year", and "Columbus discovered America in …".]
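The residue above suggests a simple message exchange among three systems for a single question. Below is a purely hypothetical Python sketch of what such an interoperability message could look like; the element and attribute names are invented, not a documented AQUAINT exchange format.

```python
# Purely hypothetical interoperability message; element/attribute names are invented.
import xml.etree.ElementTree as ET

msg = ET.Element("qa-exchange", id="0")
ET.SubElement(msg, "question").text = "When did Columbus discover America?"
for system, answer in [("A", "1492"),
                       ("B", "the previous year"),
                       ("C", "Columbus discovered America in ...")]:  # third answer is truncated on the slide
    ET.SubElement(msg, "answer", system=system).text = answer

print(ET.tostring(msg, encoding="unicode"))
```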

Answer Combination
- 67 systems submitted to the TREC-11 main QA task
  - Including some variants
- Average raw submission accuracy was 22%
  - 28% for loosely correct (judgment ∈ {1, 2, 3})
- How well can we do by combining systems in some way?
  - Simplest approach: voting (see the sketch below)
  - More sophisticated approaches?
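For concreteness, a minimal sketch of the plain plurality-voting baseline mentioned above, over exact string matches; the whitespace/case normalization is an assumption for illustration, not the scoring used at TREC.

```python
# Minimal plurality-voting baseline: pick the answer string submitted by the
# most systems, after a simple normalization (assumed for illustration).
from collections import Counter

def vote(answers):
    """Return the most frequently submitted answer string."""
    counts = Counter(a.strip().lower() for a in answers if a)
    answer, _ = counts.most_common(1)[0]
    return answer

# e.g. vote(["1492", "1492 ", "the previous year"]) == "1492"
```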

Basic Approach
- Define distance measures between answer pairs
  - Generalization of simple voting
  - Can use partial matches, other evidence sources
- Select the near-"centroid" of all submissions
  - Minimize the sum of pairwise distances (SOP); sketched below
  - Previously used to select DNA sequences (Gusfield 1993)
- Endless possibilities for distance measures
  - Edit distance, ngrams, geographic & time distances, …
- Also used a document source prior
  - NY Times vs. Associated Press vs. Xinhua vs. NIL
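A minimal sketch of the sum-of-pairwise-distances (SOP) selection described above, using edit distance as the single pairwise measure. The actual work combined several distance measures and a document-source prior, which are omitted here.

```python
# SOP selection: among all submitted answers, choose the one whose total
# distance to every other submission is smallest. Edit distance stands in
# for the richer measures (ngrams, geo/time distances, source priors).
def edit_distance(a: str, b: str) -> int:
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def sop_select(answers):
    """Return (answer, score): the near-centroid submission and its SOP score."""
    scores = [(sum(edit_distance(a, b) for b in answers), a) for a in answers]
    score, best = min(scores)
    return best, score
```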

Sample Results
- Simple voting is more than twice as accurate as the average submission
- More sophisticated measures do even better
- SOP scores can also be used for confidence ranking

Example: Question 1674
"What day did Neil Armstrong land on the moon?"
- 22 different answer strings submitted (the plurality of submissions was incorrect)
  - July 20, 1969; on July 20, 1969 (correct)
  - July 18, 1969; July 14, 1999; …
  - Plus variants differing in punctuation
- The best-scoring selector chooses the correct answer (illustrated below)
  - The answers above all contribute
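Reusing the sop_select sketch above on this example: the submission counts below are illustrative only (the slide does not give the actual tallies), but they show how near-duplicate correct strings reinforce one another so that the correct date wins even when an incorrect string holds the plurality.

```python
# Illustrative tallies only; the actual TREC-11 submission counts are not on the slide.
submissions = (["July 20, 1969"] * 3 +
               ["on July 20, 1969"] * 2 +
               ["July 18, 1969"] * 2 +
               ["July 14, 1999"] * 4)       # incorrect, yet the plurality here

answer, score = sop_select(submissions)
print(answer, score)   # -> "July 20, 1969": the near-duplicate July 20 strings
                       #    pull the centroid toward the correct date
```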

Future Work
- Did not have access to:
  - System identity (even anonymized)
  - Confidence rankings
- Would like to use both
  - Simple system-specific priors would be easy
  - More sophisticated models are possible
- Better confidence estimation
  - Should do better than using the SOP score directly

Initial User Study: Comparison to traditional IR
- Establish a baseline for relative utility via a task-based comparison of Q/A to traditional IR
- Initial task: collect a set of geographic, temporal, and monetary facts regarding Hurricane Mitch
- Data: TREC-11
- Measures: task completeness, accuracy, time
- Analyze logs for query reformulations, document usage, etc.

Preliminary Results
- Initial subjects are MITRE employees
- We have run N subjects each on Q/A and IR (Lucene)

What’s Next
- Testbed system appraisals
- Testweb stability & facelift
- Studies with other Q/A systems & features
- Other tasks (based on CNS data)
- Other component integrations:
  - Answer combination & summarization