1
Empirical Studies of Knowledge Acquisition - or - Natasha and Mark do time at Leavenworth
Natasha Fridman Noy and Mark A. Musen
Stanford University, Stanford, California, USA
2
Overview
Protégé-2000 version 1.0
DARPA's HPKB program
Empirical evaluation of Protégé-2000
Where do we go from here?
3
Generations of Protégé systems at SMI
PROTÉGÉ: LISP-Machine system for rapid knowledge acquisition for clinical-trial specifications
PROTÉGÉ-II: NeXTSTEP system that allowed independent ontology editing and selection of alternative problem-solving methods
Protégé/Win: Finally, a Protégé system for the masses ...
Protégé/Java (a.k.a. Protégé-2000): The subject of this talk ...
4
Protégé/2000
Represents the latest in a series of interactive tools for knowledge-system development
Facilitates construction of knowledge bases in a principled fashion from reusable components
Allows a variety of "plug-ins" to facilitate customization in various dimensions
Still needs a better name ...
5
Knowledge-base development with Protégé/2000
Build a domain ontology (a conceptual model of the application area)
Custom-tailor a GUI for acquisition of content knowledge
Elicit content knowledge from application specialists
Map the domain ontology to appropriate problem solvers for automation of particular tasks
6
Building knowledge bases: The Protégé methodology
Domain ontology to provide the domain of discourse
Knowledge-acquisition tool for entry of detailed content
7
Protégé/2000 Ontology-editing tab
Add additional constraints on classes and attributes
Developer can see the knowledge organization clearly
Easy to edit attributes and facets
Classification problems become viewable
8
Generation of usable domain-specific KA tools
The Protégé/2000 system takes as input a domain ontology and generates in real time a graphical KA tool (a rough sketch of this idea follows below)
Developers:
Tweak KA-tool appearance by using direct-manipulation layout-editing facilities
Add custom user-interface widgets when the complexity of the domain warrants more specialized visual metaphors
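A minimal Java sketch of this form-generation idea (not the actual Protégé-2000 generator; every name here is hypothetical): a default entry widget is derived from each slot's value type, and a developer could then override the choice in the layout editor.

import java.util.List;

public class FormSketch {
    enum ValueType { STRING, INTEGER, BOOLEAN, INSTANCE }

    // A slot of a domain class, described only by its name and value type.
    record Slot(String name, ValueType type) {}

    // Choose a default widget for a slot; a layout editor would let a developer override this.
    static String defaultWidget(Slot slot) {
        switch (slot.type()) {
            case STRING:   return "text field";
            case INTEGER:  return "numeric field";
            case BOOLEAN:  return "check box";
            case INSTANCE: return "instance-selection list";
        }
        return "text field"; // fallback; the compiler cannot prove the switch is exhaustive
    }

    public static void main(String[] args) {
        List<Slot> unitSlots = List.of(
                new Slot("name", ValueType.STRING),
                new Slot("personnel-count", ValueType.INTEGER),
                new Slot("subunits", ValueType.INSTANCE));
        for (Slot s : unitSlots) {
            System.out.println(s.name() + " -> " + defaultWidget(s));
        }
    }
}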
9
A great case for customized widgets: monitoring nuclear power plants
12
Some Advances in Protégé/2000
Much improved:
editing of ontologies
creation and customization of knowledge-acquisition tools
adaptation of the system to new requirements
But still no automated support for mapping of knowledge bases to problem-solving methods (yet!)
No more shuffling among different development tools!
13
Protégé-2000 adopts the OKBC knowledge model
Protégé-2000 knowledge bases are OKBC-compliant
Protégé-2000 is not OKBC-generic: there are some OKBC knowledge bases that Protégé-2000 cannot handle (it's very close, though!)
Differences are required to ease KA, e.g., instances are instances of exactly one class (sketched below)
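A minimal sketch, in Java, of the frame-style model this slide describes; it assumes nothing about the actual OKBC or Protégé-2000 APIs. Classes carry slots with facet constraints, and every instance is created with exactly one direct class.

import java.util.HashMap;
import java.util.Map;

public class FrameSketch {
    // A class (frame) with named slots; each slot carries a facet/constraint description.
    static class Cls {
        final String name;
        final Map<String, String> slotFacets = new HashMap<>();
        Cls(String name) { this.name = name; }
    }

    // An instance holds slot values and has exactly one direct class, fixed at creation.
    static class Instance {
        final String name;
        final Cls directType;
        final Map<String, Object> slotValues = new HashMap<>();
        Instance(String name, Cls directType) {
            this.name = name;
            this.directType = directType;
        }
    }

    public static void main(String[] args) {
        Cls artilleryBattalion = new Cls("Artillery-Battalion");  // hypothetical class
        artilleryBattalion.slotFacets.put("num-howitzers", "integer, minimum 0");

        Instance unit = new Instance("Example-Artillery-Battalion", artilleryBattalion);
        unit.slotValues.put("num-howitzers", 18);

        System.out.println(unit.name + " is an instance of " + unit.directType.name);
    }
}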
14
The race to develop plug-ins
GUI widgets for tables, diagrams, animation
File I/O plug-ins for interoperability with databases and other knowledge-based systems
Tab plug-ins for embedded applications
15
Swapping components
Each of the major Protégé-2000 components can be swapped out and replaced with a different one:
Knowledge model
Storage
User interface
16
Protégé-2000 plug-ins
Will revolutionize development of KA tools
Allow nearly every aspect of the system to be modified in a well-defined manner (see the sketch below)
Allow multiple groups each to develop special-purpose plug-ins for their own purposes
Will lead to libraries of plug-ins that allow KA systems to be adapted in radical ways
Are already being developed by a widely distributed user community!
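To picture what a "well-defined" extension point can look like, here is a purely hypothetical Java sketch; it is not the real Protégé-2000 plug-in API, only an illustration of the tab-plug-in idea that the HPKB Tab embodies.

// Hypothetical extension-point sketch; interface and class names are illustrative only.
interface KnowledgeBase { }               // stand-in for the loaded project's knowledge base

interface TabPlugin {
    String getLabel();                    // name shown on the tab
    void initialize(KnowledgeBase kb);    // called when the tab is added to the workbench
    void dispose();                       // called when the project is closed
}

// A domain-specific tab in the spirit of the HPKB Tab would implement the same contract.
class UnitOrganizationTab implements TabPlugin {
    public String getLabel() { return "Unit organization (illustrative)"; }
    public void initialize(KnowledgeBase kb) { /* build unit-related forms here */ }
    public void dispose() { /* release resources */ }
}

class PluginDemo {
    public static void main(String[] args) {
        TabPlugin tab = new UnitOrganizationTab();
        tab.initialize(new KnowledgeBase() { });   // anonymous stand-in knowledge base
        System.out.println("Loaded tab: " + tab.getLabel());
        tab.dispose();
    }
}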
17
But how do we know we’re making progress?
Most KA systems are never evaluated:
There are no well-established evaluation approaches
There are no benchmarks for comparison
Most KA-tool users do not want to participate in evaluation experiments:
They have their own work to do
Evaluation is time-consuming
18
Sisyphus experiments
Have been organized by the KA community
Have involved shared tasks: office assignment, elevator configuration, rock and mineral classification
Have done a better job of allowing comparison of knowledge-system architectures than of KA techniques
19
What is needed
Empirical studies of subject-matter experts entering "real" knowledge
Metrics for assessing:
Quality of entered knowledge
Quantity of entered knowledge
Usability of KA tools
Environments where subject-matter experts can allocate the necessary time for these kinds of studies
20
We found a captive audience in Kansas ...
21
What the rest of the talk is about
High-Performance Knowledge Bases program
Empirical evaluation of knowledge-based systems: why and how?
How we designed, conducted, and evaluated a usability experiment
Extensions to Protégé
Experiment results
22
High-performance knowledge bases (HPKB) program
Enable developers to construct large knowledge bases
Reuse the knowledge in multiple applications, with diverse problem-solving methods, in rapidly changing environments
Foster collaboration among multiple teams of technology developers and integrators
23
Two challenge problems
Crisis-management challenge problem:
Managing and understanding information before confrontation
Building systems to help warning analysts and policy makers
Battlespace challenge problem:
Analyzing courses of action for conformance with principles of warfare, resource allocation, feasibility, and so on
24
Why does SMI care about HPKB?
Research challenges common to both:
collaboration and knowledge sharing
management of large knowledge bases
knowledge-base development by subject-matter experts (SMEs) who are not experts in knowledge engineering
empirical evaluation of the tools and knowledge bases
Tools developed for HPKB are also applied in medical domains
25
Evaluating artificial-intelligence systems
“Studying AI systems is not very different from studying moderately intelligent animals such as rats” — Paul R. Cohen, “Empirical Methods for Artificial Intelligence”
26
Designing an experiment
Formulate a hypothesis: what are we testing?
Determine what exactly affects performance: remove various factors from the system and compare results
Create conditions for a controlled experiment: script sessions, design tasks carefully
27
Knowledge-acquisition experiment
Evaluate how subject-matter experts (in this case, military experts) can use Protégé to develop and maintain knowledge bases
28
The problem
Knowledge is not static:
The world changes
What we know about the world changes
29
Large-scale changes in military doctrine
From a presentation by COL Mike Smith
30
Domain experts need to interact with knowledge bases
Understand the knowledge base:
Know what it contains (and what it doesn't)
Perform quality control:
Remove or change outdated knowledge
Acquire new knowledge:
Extend the knowledge base to cover new areas of expertise
31
Specific goals for the experiment
Hypothesis 1: Subject-matter experts can use Protégé-2000 effectively for knowledge acquisition
Hypothesis 2: Highly custom-tailored tools for the specific domain improve knowledge-acquisition rate and quality
32
Domain: Opposing-force unit organization
Source: Opposing Force (OPFOR) Battle book: force structure for the opposing force
Why this domain?
The OPFOR information is used by intelligence analysts in planning battles
The OPFOR information is changing and needs to be verified and updated by intelligence analysts
33
Information represented in the knowledge base
34
Protégé-2000
35
HPKB tab
36
Purpose of the experiment
Compare Protégé-2000 and the HPKB Tab
Protégé-2000: a general-purpose tool for knowledge-base design and maintenance; allows automatic generation of forms for browsing and entering knowledge
HPKB Tab: a battlespace-analysis-specific addition to Protégé to collect unit-related information
37
Experiment methodology
Ablation experiment: remove a factor (here, the custom HPKB Tab) and compare performance with and without it
38
Experiment time line
Day 1 (Group 1: Protégé-2000; Group 2: HPKB tab): morning, training session; afternoon, experiment 1. Purpose: compare Protégé-2000 to the HPKB tab.
Day 2 (Group 1: HPKB tab; Group 2: Protégé-2000): morning, training session; afternoon, experiment 2.
Day 3 (Group 1: HPKB tab; Group 2: Protégé-2000): afternoon, experiment 3. Purpose: test retention of skills.
39
Tasks
Task design:
Seven tasks each day, from easy to more difficult
Each task starts on a new version of the ontology
Sets of tasks for all three days are similar
Tasks included:
Verifying what is in the knowledge base
Correcting wrong information
Making information more specific
Creating new classes of units
40
Example of a task (task 4)
Verify that all Artillery subunits of Mechanized Infantry Brigade (IFV)(DIV) have their organization chart specified. You need to verify that each artillery unit mentioned in the chart for Mechanized Infantry Brigade (IFV)(DIV) has its own chart defined. All subunits of other types are now fully specified and you do not need to verify this fact. Study only the artillery subunits. For each artillery unit that does not have the chart defined, or does not have it checked (that is, it may not be fully specified), create or complete the chart.
41
Preparing for evaluation
For each task, define a set of evaluation criteria in advance:
What constitutes a correct answer?
What to do if there is more than one answer?
What do we measure?
Logging capability: keep logs of all steps for each user
Still hard to measure quality – some of the analysis had to be done manually
Usability questionnaires
42
Evaluation criteria
Knowledge-acquisition rate
Ability to find errors
Quality of knowledge entry
Subjective opinion
43
Evaluating quality of knowledge entry
How many errors SMEs found in the knowledge base
How many wrong steps SMEs took (vs. correct steps)
How many terms SMEs correctly added to the knowledge base
Whether the SMEs noticed their errors themselves and were able to recover
How long it took for a user to recover from an error
(A sketch of how such metrics can be computed from the step logs follows below.)
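Because every user step was logged, metrics like these can be computed as simple aggregations over a step log. The Java sketch below uses made-up log fields and numbers; it illustrates only how the share of wrong steps and the number of steps needed to recover from an error might be derived, not the actual analysis scripts.

import java.util.ArrayList;
import java.util.List;

public class StepLogMetrics {
    // One logged user action; both fields are hypothetical stand-ins for the real log data.
    record Step(boolean correct, boolean endsRecovery) {}

    public static void main(String[] args) {
        // Made-up session: one wrong step, then three further steps until it is repaired.
        List<Step> log = List.of(
                new Step(true, false),
                new Step(false, false),   // the wrong step
                new Step(true, false),
                new Step(true, false),
                new Step(true, true),     // repair finished here
                new Step(true, false));

        long wrong = log.stream().filter(s -> !s.correct()).count();
        System.out.printf("Wrong steps: %d of %d (%.0f%%)%n",
                wrong, log.size(), 100.0 * wrong / log.size());

        // Number of steps from each wrong step until recovery is complete.
        List<Integer> recoveryLengths = new ArrayList<>();
        int sinceError = -1;                       // -1 means "not currently recovering"
        for (Step s : log) {
            if (!s.correct() && sinceError < 0) {
                sinceError = 0;                    // an error just happened
            } else if (sinceError >= 0) {
                sinceError++;
                if (s.endsRecovery()) { recoveryLengths.add(sinceError); sinceError = -1; }
            }
        }
        recoveryLengths.forEach(n -> System.out.println("Steps to recover from an error: " + n));
    }
}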
44
Knowledge-acquisition rate (Days 1-3)
HPKB Tab outperforms Protégé-2000 by 43%
45
KA rate improves substantially with learning
46
Knowledge base verification: finding errors
93% of errors found
The knowledge base contained a small number of errors for each task. The subjects had to find all the errors.
On average, the subjects using the HPKB tab performed 26% better than the subjects using Protégé-2000.
47
Quality of knowledge entry: wrong steps versus correct steps
48
Removing the “hangover effect”
Wrong steps: 1%
49
Task 6: enter a large amount of data
50
Error recovery rate
Average number of steps to recover from an error: 3.5
51
Creating new classes
14 new classes to create
Observations:
All the classes were placed in correct places
On the first two days, subjects created additional categories to hold groups of similar classes
Subjects explored (and changed) the hierarchy on their own
52
Retention of skills experiment: knowledge-acquisition rate
53
Retention of skills experiment
Results
Number of errors found: increased to 81% with Protégé; was 72% with the HPKB Tab
Correctness: 93% of the steps were correct
54
User satisfaction
55
Testing the hypothesis: Protégé-2000 versus HPKB tab
KA rate is 43% higher with the HPKB tab
On the first day, the quality of knowledge entry is significantly better with the HPKB tab
56
Summary of results
Very small amount of training; no help at all on day 3
Knowledge-acquisition rate improves substantially with learning
Subjects found up to 93% of errors
Very low error rate: 6% (almost 1% with the HPKB Tab if you discount the hangover effect)
One week later: still works ...
57
Lessons learned
Preparation, preparation, preparation
Do not expect anything:
What you think is going to be hard is actually easy
What you think is easy turns out to be hard
A dry run is very important:
Test the tasks
Test the software
Test the metrics-collection mechanism
58
Lessons learned (2)
Do not underestimate the human factor: you need to break the ice
Design a valid experiment (avoid results like "Our system does 5 apples per millennium"):
Carefully designed tasks
Scripts for training sessions
59
Lessons learned (3) Leavenworth is not as bad as you would expect
Or is it?