1
AQUAINT User Testbed and System Integration Activities
Marc Light, John Burger, Clarence Huff
MITRE
2
In the beginning was the BAA
“As independent activities, but in close collaboration with AQUAINT contractors, the Government intends … to establish a testbed for the demonstration of emerging capabilities.”
“ARDA and its AQUAINT Program Committee, with technical advice and support from a separately solicited and procured system integration contractor, will throughout the entire duration of the AQUAINT program, be reviewing and evaluating each of the programs and projects for components, subsystems and full systems that could be successfully integrated... demonstrate application against existing operational problems …”
3
Models of Technology Transfer
(Diagram: parallel Research and Transfer efforts)
4
System Integrator’s Role
–MITRE will advise and support with respect to which systems could be part of the user testbed, and how
–MITRE will advise and support with respect to user-centric evaluations and data and performance analysis, and will report findings back to researchers
5
Other User Testbed Details
User-centric evaluation
–E.g., task time studies, quality-of-solution studies, user satisfaction surveys
Intelligence analysts are the user group
–Many different kinds of analysts
User testbed will be in both unclassified and classified environments
6
More on MITRE’s Role
What we will do
–Advise on and enable testbed participation
–User-centric evaluations
–Data and system performance analysis
–Provide an open-source reference QA system
–Provide integration infrastructure (glue)
What we won't do
–Act as a gatekeeper
–Force a one-size-fits-all architecture
7
What is MITRE?
MITRE is a set of FFRDCs
–Federally Funded Research and Development Centers
Does not sell products, must not compete
Charged to act in the public interest
–Unbiased advisor, system integrator, evaluator, etc.
8
Okay, so what is this user testbed thing?
(Diagram: Researchers, Integration, User Eval & Analysis, Users, User Testbed; shows the separation between the parallel efforts of the research systems and the user testbed)
9
User-Centric Evaluation
Select users, design tasks
Timing measures
–How fast are users able to accomplish a task?
Task performance measures
–Quality of the user’s solution to a task
User satisfaction
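For illustration only (none of this appears in the slides), here is a minimal Python sketch of how timing, quality-of-solution, and satisfaction measures from such a study might be tabulated per system; the field names, the 1-5 scales, and the example records are all invented:

```python
from statistics import mean, median

# Hypothetical per-session records from a task-time / quality-of-solution study.
# Field names, scales (1-5), and values are illustrative, not real data.
sessions = [
    {"user": "analyst01", "system": "AQUAINT",  "seconds": 412, "quality": 4, "satisfaction": 3},
    {"user": "analyst02", "system": "baseline", "seconds": 655, "quality": 3, "satisfaction": 2},
    {"user": "analyst03", "system": "AQUAINT",  "seconds": 507, "quality": 5, "satisfaction": 4},
    {"user": "analyst04", "system": "baseline", "seconds": 720, "quality": 2, "satisfaction": 2},
]

def summarize(records, system):
    """Aggregate timing, solution-quality, and satisfaction measures for one system."""
    rows = [r for r in records if r["system"] == system]
    return {
        "n": len(rows),
        "median_seconds": median(r["seconds"] for r in rows),
        "mean_quality": mean(r["quality"] for r in rows),
        "mean_satisfaction": mean(r["satisfaction"] for r in rows),
    }

for system in ("AQUAINT", "baseline"):
    print(system, summarize(sessions, system))
```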
10
Example Survey Questions
Please take a few moments to provide us with feedback on your experiences with AQUAINT. Your responses to the questions below will help us improve the system. Keep in mind that we are evaluating the system – not you.
Gender: ___M ___F
Age group: ___20-29 ___30-39 ___40-49 ___50+
How often do you use a computer for your work? ___never ___once a week ___a few times a week ___a few hours a day ___all day
Which keyword search engine(s) do you normally use to find information? _____
Have you ever used AQUAINT before or seen a demonstration of AQUAINT? _____
Using AQUAINT to answer my questions was ___very easy ___easy ___neither easy nor hard ___hard ___very hard ___no opinion
AQUAINT worked as expected ___all of the time ___most of the time ___neutral ___some of the time ___never ___no expectations
I had to rephrase my questions to get the answer I wanted ___all of the time ___most of the time ___some of the time ___never ___no opinion
AQUAINT’s response time in returning answers to my questions was ___very fast ___fast ___slightly fast ___neutral ___slightly slow ___slow ___very slow ___no opinion
The first answer AQUAINT returned for each question I asked was the one I wanted ___all of the time ___most of the time ___neutral ___some of the time ___never ___not sure
At least one of the answers AQUAINT returned for each question I asked was what I wanted ___strongly agree ___agree ___slightly agree ___neutral ___slightly disagree ___disagree ___strongly disagree
AQUAINT provided accurate answers to my questions ___all of the time ___most of the time ___neutral ___some of the time ___never ___not sure
11
Analysis Activity
Inherent properties of the data
–What percentage of questions are definitions?
System-level feature analysis
–Is there a correlation between system performance and answer redundancy?
Bounds
–How much ambiguity remains after answer typing …
–What is the limit of weighted word-overlap approaches?
–Related to error analysis
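As a concrete illustration of the system-level feature analysis bullet (not from the slides), one could check whether answer redundancy correlates with correctness. The per-question data below and the plain Pearson computation are a hypothetical sketch:

```python
from math import sqrt

# Hypothetical per-question data: how many distinct supporting passages
# contained the answer (redundancy), and whether the system answered correctly.
redundancy = [1, 5, 2, 8, 3, 7, 1, 4]
correct    = [0, 1, 0, 1, 0, 1, 0, 1]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"redundancy vs. correctness: r = {pearson(redundancy, correct):.2f}")
```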
12
Example Analysis: Answer Multiplicity
13
Integration Activities
Inserting complete systems into the testbed
Connecting components
Answer collation
Reference system
14
Testbed “Glue” Wish List
Arbitrary system topologies
Distributed systems
Scalability
Delivery of only relevant inputs to components
Caching and archiving
Fault tolerance
MITRE’s Catalyst has many of these characteristics
15
Catalyst Characteristics
Data model
–Standoff annotations
–Flexible restructuring, renaming, indexing
Processing model
–Components connected by streams of annotations
–Stream operations (merge, split, …)
–Flexible system topologies
–Synchronization
In use for DARPA’s TIDES Program
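To make the standoff-annotation idea concrete, here is a minimal sketch in Python. This is not Catalyst's actual API; the class, layer names, and example text are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """A standoff annotation: markup stored apart from the text it describes,
    anchored by character offsets rather than inline tags."""
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    layer: str   # e.g. "token", "entity"
    label: str   # e.g. the token string or an entity type

text = "The PLO rejected the proposal."

annotations = [
    Annotation(0, 3, "token", "The"),
    Annotation(4, 7, "token", "PLO"),
    Annotation(4, 7, "entity", "ORGANIZATION"),
]

# The text itself is never modified; separate layers can be merged, filtered,
# renamed, or streamed to downstream components independently.
for a in annotations:
    if a.layer == "entity":
        print(text[a.start:a.end], a.label)
```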
16
Three ways to integrate a language processor into a Catalyst system
Go native: the language processor uses Catalyst stream operators directly
Wrap: write a separate, native process to mediate with an existing non-Catalyst LP
Middle ground: if the existing LP uses Expat (or an equivalent) XML parser, replace it with Catalyst’s Expat-like API
Existing APIs in C, Lisp, Python; Java and Perl in progress
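A rough sketch of the "Wrap" option, assuming a legacy language processor that reads text on stdin and writes one "start end label" record per line on stdout. The command name and the send_downstream() stand-in are hypothetical; Catalyst's real stream API is not shown:

```python
import subprocess

def send_downstream(annotation):
    # Stand-in for publishing an annotation onto a Catalyst-style stream.
    print("annotation:", annotation)

def wrap_language_processor(documents, command=("my_tagger", "--stdin")):
    """Mediating process: run an existing non-Catalyst LP as a separate native
    process on each document and convert its output into standoff annotations."""
    for doc_id, text in documents:
        result = subprocess.run(command, input=text, capture_output=True, text=True)
        for line in result.stdout.splitlines():
            # Assume the LP emits "start end label" records, one per line.
            start, end, label = line.split(maxsplit=2)
            send_downstream({"doc": doc_id, "start": int(start), "end": int(end), "label": label})

# Example (requires the hypothetical 'my_tagger' executable on PATH):
# wrap_language_processor([("d1", "The PLO rejected the proposal.")])
```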
17
Answer Collation
Problem
–Many systems mean many answers
–Answers will duplicate, overlap, cluster
–Combining rankings is problematic
As system integrator, MITRE will develop an answer collation module
18
Answer Collation Issues
Merging
–PLO / Palestine Liberation Organization
–Last Christmas / 2000-12-25
–Issues of approximation, partial orders...
Clustering
–{Yasser Arafat, PLO, Hamas}
Ranking
–Answer type, number of occurrences, source quality
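A toy sketch of the merging and ranking steps (not MITRE's actual collation module): candidate answers from several systems are normalized with a small alias table, then ranked by how many occurrences support each merged answer. The alias table and candidate tuples are invented, and real merging of dates and partial orders would need more than a lookup:

```python
from collections import Counter, defaultdict

# Hypothetical alias table; real merging needs approximate matching and date logic.
ALIASES = {
    "plo": "Palestine Liberation Organization",
    "last christmas": "2000-12-25",
}

def normalize(answer):
    key = answer.strip().lower()
    return ALIASES.get(key, answer.strip())

def collate(candidates):
    """candidates: list of (system, answer, score) tuples. Returns merged answers
    ranked by occurrence count, keeping per-system provenance for inspection."""
    counts = Counter()
    support = defaultdict(list)
    for system, answer, score in candidates:
        merged = normalize(answer)
        counts[merged] += 1
        support[merged].append((system, score))
    return [(ans, n, support[ans]) for ans, n in counts.most_common()]

print(collate([("sysA", "PLO", 0.9),
               ("sysB", "Palestine Liberation Organization", 0.7),
               ("sysC", "Hamas", 0.4)]))
```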
19
Reference System
Open-source QA system from MITRE
Components include a question analyzer, passage filter, tokenizer, entity taggers, answer selector...
Uses Catalyst for glue
–Some components can communicate using inline XML
Intent:
–Contractors without end-to-end systems can insert their own components
–Possibly useful for baseline evaluation
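The component list suggests a simple pipeline. The following schematic sketch uses placeholder heuristics in Python; the function names and logic are illustrative assumptions, not the actual MITRE reference system, and in the real system each stage would be a separate component glued together by Catalyst:

```python
def analyze_question(question):
    # Placeholder question analyzer: guess the expected answer type.
    return {"text": question, "answer_type": "PERSON" if question.lower().startswith("who") else "OTHER"}

def filter_passages(analysis, corpus):
    # Placeholder passage filter: keep passages sharing any word with the question.
    qwords = set(analysis["text"].lower().split())
    return [p for p in corpus if qwords & set(p.lower().split())]

def tag_entities(passage):
    # Placeholder entity tagger: treat capitalized words as PERSON spans.
    return [(w, "PERSON") for w in passage.split() if w.istitle()]

def select_answer(analysis, passages):
    # Placeholder answer selector: first entity matching the expected answer type.
    for p in passages:
        for span, etype in tag_entities(p):
            if etype == analysis["answer_type"]:
                return span
    return None

corpus = ["Arafat led the PLO for decades.", "The treaty was signed in Oslo."]
analysis = analyze_question("Who led the PLO?")
print(select_answer(analysis, filter_passages(analysis, corpus)))
```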
20
Summary of User Testbed Activities
Parallel research and transition efforts
MITRE to assist in the transition effort
User testbed activities include
–Selecting users and defining tasks
–User-centric evaluation
–Analysis
–Integration