Empirical Studies of Knowledge Acquisition
- or -
Natasha and Mark do time at Leavenworth
Natasha Fridman Noy and Mark A. Musen
Stanford University, Stanford, California, USA
Overview
Protégé-2000 version 1.0
DARPA's HPKB program
Empirical evaluation of Protégé-2000
Where do we go from here?
Generations of Protégé systems at SMI
PROTÉGÉ: LISP-Machine system for rapid knowledge acquisition for clinical-trial specifications
PROTÉGÉ-II: NeXTSTEP system that allowed independent ontology editing and selection of alternative problem-solving methods
Protégé/Win: finally, a Protégé system for the masses ...
Protégé/Java (a.k.a. Protégé-2000): the subject of this talk ...
Protégé-2000
Represents the latest in a series of interactive tools for knowledge-system development
Facilitates construction of knowledge bases in a principled fashion from reusable components
Allows a variety of "plug-ins" to facilitate customization in various dimensions
Still needs a better name ...
Knowledge-base development with Protégé-2000
Build a domain ontology (a conceptual model of the application area)
Custom-tailor a GUI for acquisition of content knowledge
Elicit content knowledge from application specialists
Map the domain ontology to appropriate problem solvers for automation of particular tasks
Building knowledge bases: the Protégé methodology
Domain ontology to provide the domain of discourse
Knowledge-acquisition tool for entry of detailed content
Protégé-2000 ontology-editing tab
Add constraints on classes and attributes
The developer can see the knowledge organization clearly
Easy to edit attributes and facets
Classification problems become viewable
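As a rough illustration of the frame-style model behind these slides, here is a minimal Java sketch of classes, slots, and facets such as value types and cardinality constraints. All class and field names are hypothetical; this is not the Protégé-2000 API, only the flavor of the knowledge organization being edited.

    import java.util.*;

    // Hypothetical slot definition with two common facets.
    class Slot {
        String name;
        Class<?> valueType;                  // facet: allowed value type
        int minCardinality, maxCardinality;  // facet: cardinality constraints
        Slot(String name, Class<?> valueType, int min, int max) {
            this.name = name; this.valueType = valueType;
            this.minCardinality = min; this.maxCardinality = max;
        }
    }

    // Hypothetical class ("frame") with a superclass and attached slots.
    class Frame {
        String name;
        Frame superclass;  // single inheritance, for brevity
        Map<String, Slot> slots = new HashMap<>();
        Frame(String name, Frame superclass) { this.name = name; this.superclass = superclass; }
        void addSlot(Slot s) { slots.put(s.name, s); }
    }

    public class OntologyDemo {
        public static void main(String[] args) {
            Frame unit = new Frame("MilitaryUnit", null);
            unit.addSlot(new Slot("name", String.class, 1, 1));  // exactly one name
            unit.addSlot(new Slot("subunits", Frame.class, 0, Integer.MAX_VALUE));
            Frame brigade = new Frame("MechanizedInfantryBrigade", unit);  // inherits slots
            System.out.println(brigade.name + " is-a " + brigade.superclass.name);
        }
    }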
Generation of usable domain-specific KA tools
The Protégé-2000 system takes a domain ontology as input and generates a graphical KA tool in real time
Developers can:
  Tweak the KA tool's appearance by using direct-manipulation layout-editing facilities
  Add custom user-interface widgets when the complexity of the domain warrants more specialized visual metaphors
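A hedged sketch of the idea behind automatic form generation: walk the slots of a class and pick a default widget from each slot's facets. The widget names and the selection rules below are illustrative assumptions, not Protégé-2000's actual algorithm.

    import java.util.*;

    public class FormGenerator {
        // Minimal stand-in for a slot definition: name, value type, multi-valued?
        static class SlotDef {
            final String name, valueType; final boolean multiple;
            SlotDef(String name, String valueType, boolean multiple) {
                this.name = name; this.valueType = valueType; this.multiple = multiple;
            }
        }

        // Pick a default widget from the slot's facets; the generated form can
        // then be tweaked by hand in the layout editor, as the slide describes.
        static String defaultWidget(SlotDef s) {
            switch (s.valueType) {
                case "Boolean":  return "CheckBoxWidget";
                case "Integer":  return "IntegerFieldWidget";
                case "Instance": return s.multiple ? "InstanceListWidget" : "InstanceFieldWidget";
                default:         return s.multiple ? "StringListWidget" : "TextFieldWidget";
            }
        }

        public static void main(String[] args) {
            List<SlotDef> unitSlots = Arrays.asList(
                new SlotDef("name", "String", false),
                new SlotDef("subunits", "Instance", true));
            for (SlotDef s : unitSlots)
                System.out.println(s.name + " -> " + defaultWidget(s));
        }
    }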
A great case for customized widgets: monitoring nuclear power plants
Some advances in Protégé-2000
Much improved:
  editing of ontologies
  creation and customization of knowledge-acquisition tools
  adaptation of the system to new requirements
But still no automated support for mapping of knowledge bases to problem-solving methods—yet!
No more shuffling among different development tools!
Protégé-2000 adopts the OKBC knowledge model
Protégé-2000 knowledge bases are OKBC-compliant
Protégé-2000 is not OKBC-generic: there are some OKBC knowledge bases that Protégé-2000 cannot handle
It's very close, though!
The differences are required to ease KA (for example, instances are instances of exactly one class)
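One way to picture the "exactly one class" restriction: in the sketch below an instance records a single direct type, fixed at creation, so the multi-class membership that OKBC permits simply cannot be expressed. Names are illustrative assumptions, not the Protégé-2000 API.

    import java.util.Objects;

    public class SingleTypeDemo {
        static class Instance {
            final String name;
            final String directType;  // exactly one class, fixed at creation
            Instance(String name, String directType) {
                this.name = name;
                this.directType = Objects.requireNonNull(directType);
            }
        }

        public static void main(String[] args) {
            Instance i = new Instance("3rd-Artillery-Battalion", "ArtilleryBattalion");
            System.out.println(i.name + " : " + i.directType);
            // An OKBC individual that belongs to two unrelated classes has no
            // direct counterpart in this model; hence "not OKBC-generic".
        }
    }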
The race to develop plug-ins
GUI widgets for tables, diagrams, animation
File I/O plug-ins for interoperability with databases and other knowledge-based systems
Tab plug-ins for embedded applications
Swapping components
Each of Protégé-2000's major components can be swapped out and replaced with a different one:
  Knowledge model
  Storage
  User interface
Protégé-2000 plug-ins
Will revolutionize development of KA tools
Allow nearly every aspect of the system to be modified in a well-defined manner
Allow multiple groups each to develop special-purpose plug-ins for their own purposes
Will lead to libraries of plug-ins that allow KA systems to be adapted in radical ways
Are already being developed by a widely distributed user community!
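To make the plug-in idea concrete, here is a hypothetical sketch of what a tab-plug-in contract could look like: the host application discovers implementations and hands each one the shared knowledge base to render. The interface and method names are assumptions for illustration, not the actual Protégé-2000 plug-in API.

    public class PluginDemo {
        // Hypothetical contract every tab plug-in would implement.
        interface TabPlugin {
            String getLabel();                      // tab title shown in the UI
            void initialize(Object knowledgeBase);  // called when the tab is opened
        }

        // A domain-specific tab, in the spirit of the HPKB Tab described later.
        static class HpkbTab implements TabPlugin {
            public String getLabel() { return "HPKB Tab"; }
            public void initialize(Object knowledgeBase) {
                // build battlespace-specific forms on top of the shared KB here
            }
        }

        public static void main(String[] args) {
            TabPlugin tab = new HpkbTab();
            System.out.println("Loading tab: " + tab.getLabel());
            tab.initialize(new Object());  // stand-in for the real knowledge base
        }
    }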
But how do we know we're making progress?
Most KA systems are never evaluated
There are no well-established evaluation approaches
There are no benchmarks for comparison
Most KA-tool users do not want to participate in evaluation experiments: they have their own work to do, and evaluation is time-consuming
Sisyphus experiments
Have been organized by the KA community
Have involved shared tasks: office assignment, elevator configuration, rock and mineral classification
Have done a better job of allowing comparison of knowledge-system architectures than of KA techniques
What is needed
Empirical studies of subject-matter experts entering "real" knowledge
Metrics for assessing the quality of entered knowledge, the quantity of entered knowledge, and the usability of KA tools
Environments where subject-matter experts can allocate the necessary time for these kinds of studies
We found a captive audience in Kansas ...
What the rest of the talk is about
The High-Performance Knowledge Bases program
Empirical evaluation of knowledge-based systems: why and how?
How we designed, conducted, and evaluated a usability experiment
Extensions to Protégé
Experiment results
High-Performance Knowledge Bases (HPKB) program
Enable developers to construct large knowledge bases
Reuse the knowledge in multiple applications, with diverse problem-solving methods, in rapidly changing environments
Foster collaboration among multiple teams of technology developers and integrators
Two challenge problems
Crisis-management challenge problem: managing and understanding information before confrontation; building systems to help warning analysts and policy makers
Battlespace challenge problem: analyzing courses of action for conformance with principles of warfare, resource allocation, feasibility, and so on
Why does SMI care about HPKB?
Research challenges common to both:
  collaboration and knowledge sharing
  management of large knowledge bases
  knowledge-base development by subject-matter experts (SMEs) who are not experts in knowledge engineering
  empirical evaluation of the tools and knowledge bases
Tools developed for HPKB are also applied in medical domains
Evaluating artificial-intelligence systems “Studying AI systems is not very different from studying moderately intelligent animals such as rats” — Paul R. Cohen, “Empirical Methods for Artificial Intelligence”
Designing an experiment
Formulate a hypothesis: what are we testing?
Determine what exactly affects performance: remove various factors from the system and compare results
Create conditions for a controlled experiment: script the sessions and design the tasks carefully
Knowledge-acquisition experiment Evaluate how subject-matter experts (in this case, military experts) can use Protégé to develop and maintain knowledge bases
The problem
Knowledge is not static:
  The world changes
  What we know about the world changes
Large-scale changes in military doctrine From presentation by COL Mike Smith (http://192.111.52.19/jadd/fm1005/)
Domain experts need to interact with knowledge bases
Understand the knowledge base: know what it contains (and what it doesn't)
Perform quality control: remove or change outdated knowledge
Acquire new knowledge: extend the knowledge base to cover new areas of expertise
Specific goals for the experiment
Hypothesis 1: subject-matter experts can use Protégé-2000 effectively for knowledge acquisition
Hypothesis 2: highly custom-tailored tools for the specific domain improve knowledge-acquisition rate and quality
Domain: opposing-force unit organization
Source: the Opposing Force (OPFOR) Battle Book (force structure for the opposing force)
Why this domain?
  The OPFOR information is used by intelligence analysts in planning battles
  The OPFOR information is changing and needs to be verified and updated by intelligence analysts
Information represented in the knowledge base
Protégé-2000
HPKB tab
Purpose of the experiment: compare Protégé-2000 and the HPKB Tab
Protégé-2000: a general-purpose tool for knowledge-base design and maintenance; allows automatic generation of forms for browsing and entering knowledge
HPKB Tab: a battlespace-analysis-specific addition to Protégé that collects unit-related information
Experiment methodology: an ablation experiment
Experiment time line
Days 1-2 compare Protégé-2000 to the HPKB Tab; Day 3 tests retention of skills.
Day 1: Group 1 uses Protégé-2000, Group 2 uses the HPKB Tab (morning: training session; afternoon: experiment 1)
Day 2: Group 1 uses the HPKB Tab, Group 2 uses Protégé-2000 (morning: training session; afternoon: experiment 2)
Day 3: Group 1 uses the HPKB Tab, Group 2 uses Protégé-2000 (afternoon: experiment 3)
Tasks
Task design:
  Seven tasks each day, from easy to more difficult
  Each task starts on a new version of the ontology
  The sets of tasks for all three days are similar
Tasks included:
  Verifying what is in the knowledge base
  Correcting wrong information
  Making information more specific
  Creating new classes of units
Example of a task (task 4)
Verify that all artillery subunits of Mechanized Infantry Brigade (IFV)(DIV) have their organization chart specified. You need to verify that each artillery unit mentioned in the chart for Mechanized Infantry Brigade (IFV)(DIV) has its own chart defined. All subunits of other types are now fully specified and you do not need to verify this fact; study only the artillery subunits. For each artillery unit that does not have the chart defined, or does not have it checked (that is, it may not be fully specified), create or complete the chart.
Preparing for evaluation
For each task, define a set of evaluation criteria in advance:
  What constitutes a correct answer?
  What to do if there is more than one answer?
  What do we measure?
Logging capability: keep logs of all steps for each user
Quality is still hard to measure; some of the analysis had to be done manually
Usability questionnaires
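A minimal sketch of what the per-user step logging might look like, assuming a simple timestamped, tab-separated format where every UI action is appended to a per-subject file; the experiment's actual logging mechanism and format may differ.

    import java.io.*;
    import java.time.Instant;

    public class StepLogger {
        private final PrintWriter out;

        // One append-only log file per subject, flushed on every entry.
        StepLogger(String user) throws IOException {
            out = new PrintWriter(new FileWriter(user + ".log", true), true);
        }

        // Record one step: timestamp, action, and the frame or slot it touched.
        void log(String action, String target) {
            out.printf("%s\t%s\t%s%n", Instant.now(), action, target);
        }

        public static void main(String[] args) throws IOException {
            StepLogger log = new StepLogger("subject-07");  // hypothetical subject ID
            log.log("CREATE_CLASS", "ArtilleryBattalion");
            log.log("SET_SLOT", "ArtilleryBattalion.subunits");
        }
    }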
Evaluation criteria
Knowledge-acquisition rate
Ability to find errors
Quality of knowledge entry
Subjective opinion
Evaluating quality of knowledge entry
How many errors SMEs found in the knowledge base
How many wrong steps SMEs took (vs. correct steps)
How many terms SMEs correctly added to the knowledge base
Whether SMEs noticed their errors themselves and were able to recover
How long it took a user to recover from an error
Knowledge-acquisition rate (Days 1-3) [chart]
The HPKB Tab outperforms Protégé-2000 by 43%
KA rate improves substantially with learning
Knowledge-base verification: finding errors
The knowledge base contained a small number of errors for each task; the subjects had to find all the errors
93% of errors were found
On average, the subjects using the HPKB Tab performed 26% better than the subjects using Protégé-2000
Quality of knowledge entry: wrong steps versus correct steps
Removing the "hangover effect"
Wrong steps: 1%
Task 6: enter a large amount of data
Error recovery rate
Average number of steps to recover from an error: 3.5
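One plausible way to derive "steps to recover" from the step logs, assuming wrong and corrective steps were marked during the manual analysis pass mentioned earlier; the markers and the counting rule below are assumptions for illustration, not the experiment's actual coding scheme.

    import java.util.Arrays;
    import java.util.List;

    public class RecoveryAnalysis {
        // Count the steps from each action marked wrong up to and including
        // the step that fixes it, then average over all errors.
        static double meanRecoverySteps(List<String> steps) {
            int total = 0, errors = 0, since = -1;
            for (String s : steps) {
                if (s.startsWith("WRONG:")) {
                    since = 0;               // start counting at the erroneous step
                } else if (since >= 0 && s.startsWith("FIXED:")) {
                    total += since + 1;      // count the corrective step itself
                    errors++;
                    since = -1;
                } else if (since >= 0) {
                    since++;                 // an intermediate step before recovery
                }
            }
            return errors == 0 ? 0 : (double) total / errors;
        }

        public static void main(String[] args) {
            List<String> log = Arrays.asList(
                "SET_SLOT", "WRONG:DELETE_CLASS", "BROWSE", "BROWSE", "FIXED:RECREATE_CLASS");
            System.out.println(meanRecoverySteps(log));  // prints 3.0
        }
    }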
Creating new classes
14 new classes to create
Observations:
  All the classes were placed in the correct places
  On the first two days, subjects created additional categories to hold groups of similar classes
  Subjects explored (and changed) the hierarchy on their own
Retention of skills experiment: knowledge-acquisition rate
Retention-of-skills experiment: results
Number of errors found: increased to 81% with Protégé; was 72% with the HPKB Tab
Correctness: 93% of the steps were correct
User satisfaction
Testing the hypothesis: Protégé-2000 versus the HPKB Tab
The KA rate is 43% higher with the HPKB Tab
On the first day, the quality of knowledge entry is significantly better with the HPKB Tab
Summary of results
Very small amount of training; no help at all on day 3
Knowledge-acquisition rate improves substantially with learning
Subjects found up to 93% of errors
Very low error rate: 6% (almost 1% with the HPKB Tab if you discount the hangover effect)
One week later: it still works ...
Lessons learned
Preparation, preparation, preparation
Do not expect anything:
  What you think is going to be hard is actually easy
  What you think is easy turns out to be hard
A dry run is very important:
  Test the tasks
  Test the software
  Test the metrics-collection mechanism
Lessons learned (2)
Do not underestimate the human factor: you need to break the ice
Design a valid experiment ("Our system does 5 apples per millennium"):
  Carefully designed tasks
  Scripts for the training sessions
Lessons learned (3) Leavenworth is not as bad as you would expect Or is it?