Technologies to Enable Biologists to Build Large Knowledge Bases on Human Anatomy and Physiology

Presentation transcript:

Technologies to Enable Biologists to Build Large Knowledge Bases on Human Anatomy and Physiology

Bruce Porter, Ken Barker, Art Souther
Department of Computer Science, University of Texas at Austin

Vinay Chaudhri
AI Center, Stanford Research Institute

Peter Clark
Math and Computing Research Center, Boeing

What’s in an Ontology?
lexicon to aid communication
  – both for people and computers
catalog system to organize a library
  – library contains multi-media objects
meta-level schema for integrating databases
  – so queries can be answered across databases
hierarchy of classes and instances
  – supporting inheritance of general information
knowledge base for autonomous reasoning
“strong AI”

What is Autonomous Reasoning Good for?
answering questions that are unanticipated when the knowledge base is built
  – why and what-if questions
  – answers tailored to user’s interest and background
  – superhuman performance
finding gaps and inconsistencies in the knowledge base
raising good questions

Knowledge Base Evolution
from expert systems to multifunctional knowledge bases:
  – Mycin and Guidon
  – broadening both the task and the domain

Large Multi-functional KBs can be Built
e.g., the Botany Knowledge Base
10-year construction effort by a full-time domain expert and tools developers
contains 40,000 concepts and 160,000 facts
much more information available via inheritance and rules
performance goal: robust, expert-level ability to answer questions with good explanations

… and they can work well: e.g., for the task of generating descriptions
Q: What happens during embryo sac formation?
A: Embryo sac formation is a kind of female gametophyte formation. During embryo sac formation, the embryo sac is formed from the megaspore mother cell. Embryo sac formation occurs in the ovule. Embryo sac formation is a step of angiosperm sexual reproduction. It consists of megasporogenesis and embryo sac generation. During megasporogenesis, the megaspore mother cell divides in the nucellus to form 4 megaspores. During embryo sac generation, the embryo sac is generated from the megaspore.
… but we need a better process

Enabling Domain Experts to Build Knowledge Bases
Why not use knowledge engineers instead?
  – they are less concerned with the fidelity of the representations
  – they lack the knowledge to simplify and abstract the knowledge thoughtfully
  – they operate with sentence-level facts rather than domain-level theories
We envision extensive knowledge bases built by the distributed community of active scientists, and maintained by organizations like NSF, NIH, NLM.
This will only work if domain experts can work with familiar concepts and without writing axioms!

Our Approach
Building knowledge bases is a joint effort:
knowledge engineers build a library consisting of
  – a small hierarchy of reusable, composable, domain-independent knowledge units (“components”)
  – a small vocabulary of relations to connect them
knowledge engineers develop generic question answering methods, such as simulation
domain specialists build representations of fundamental concepts (“pump priming”)
domain experts build a KB through the instantiation and composition of components
supported by DARPA’s Rapid Knowledge Formation project

A Library of Components
small
easy to learn and use
broad semantic distinctions (easy to choose)
allows detailed pre-engineering of declarative executable models (Paul Cohen, UMass)
drawn from related work
  – ontology design/knowledge engineering
  – linguistics: semantic primitives, case theory, discourse analysis, semantics
  – English lexical resources: dictionaries, thesauri, word lists, WordNet, Roget, LDOCE, corpora, etc.

Library Contents
actions: things that happen, change states
  – Breach, Enter, Copy, Replace, Transfer, etc.
states: relatively temporally stable events
  – Be-Closed, Be-Attached-To, Be-Confined, etc.
entities: things that are
  – Substance, Place, Object, etc.
roles: things that are, but only in the context of things that happen
  – Catalyst, Container, Template, Vehicle, etc.

Library Contents
relations between events, entities, roles
  – agent, object, recipient, result, etc.
  – content, part, material, possession, etc.
  – causes, defeats, enables, prevents, etc.
  – purpose, plays, etc.
properties between events/entities and values
  – rate, frequency, intensity, direction, etc.
  – size, color, integrity, shape, etc.

Access
browsing the hierarchy top-down
semantic search
  – all components have hooks to WordNet
  – climb the WordNet hypernym tree with search terms
  – assemble: Attach, Come-Together
    mend: Repair
    infiltrate: Enter, Traverse, Penetrate, Move-Into
    gum-up: Block, Obstruct
    busted: Be-Broken, Be-Ruined
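
The hypernym-climbing search described on this slide might look roughly like the following Python sketch. It is not SHAKEN's code: the `HYPERNYMS` table is a toy stand-in for WordNet, and the `COMPONENT_HOOKS` table and the function name `find_components` are hypothetical, chosen only to illustrate the idea.

```python
# Toy hypernym links (term -> more general term).
# A stand-in for WordNet, not real WordNet data.
HYPERNYMS = {
    "infiltrate": "penetrate",
    "penetrate": "enter",
    "enter": "move",
    "mend": "repair",
    "repair": "improve",
}

# Hypothetical hooks from WordNet-style terms to library components.
COMPONENT_HOOKS = {
    "penetrate": ["Penetrate"],
    "enter": ["Enter", "Move-Into"],
    "repair": ["Repair"],
}


def find_components(term, max_steps=10):
    """Climb hypernym links from `term`, collecting hooked components."""
    found = []
    current = term
    steps = 0
    while current is not None and steps <= max_steps:
        for component in COMPONENT_HOOKS.get(current, []):
            if component not in found:
                found.append(component)
        current = HYPERNYMS.get(current)  # None once the top is reached
        steps += 1
    return found


matches_for_infiltrate = find_components("infiltrate")
matches_for_mend = find_components("mend")
```

Components hooked closer to the search term come first in the result, which gives a crude ranking from most specific to most general.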

A Small Example
The software system is called SHAKEN
mRNA-Transport:
  – “mRNA is transported out of the cell nucleus into the cytoplasm”
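
As a rough illustration of building a concept by specializing a generic component, the mRNA-Transport sentence could be encoded along these lines in Python. The `Component` class, the generic `Move` component, and the slot names `origin`, `destination`, and `kind` are assumptions made for this sketch; the slides do not show SHAKEN's actual representation.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A named knowledge unit with slots; unfilled slots are inherited."""
    name: str
    parent: "Component | None" = None
    slots: dict = field(default_factory=dict)

    def specialize(self, name, **slots):
        """Create a more specific component that overrides some slots."""
        return Component(name, parent=self, slots=slots)

    def get(self, slot):
        """Look up a slot locally, then up the chain of parents."""
        node = self
        while node is not None:
            if slot in node.slots:
                return node.slots[slot]
            node = node.parent
        return None


# Hypothetical generic, domain-independent component.
move = Component("Move", slots={"kind": "action", "object": "Entity"})

# Domain-specific concept built by filling in the generic slots.
mrna_transport = move.specialize(
    "mRNA-Transport",
    object="mRNA",
    origin="Nucleus",
    destination="Cytoplasm",
)
```

Here `mrna_transport.get("object")` returns the overriding value `"mRNA"`, while `mrna_transport.get("kind")` falls back to the value inherited from `Move`.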

[Two figure slides; only the labels “unify” and “location” survive in the transcript]

“Real KBs” are Significantly Larger
Here’s part of the representation of mRNA-Processing built by a biologist (Art)

Knowledge Types
Taxonomic:
  – RNA Capping is-a-kind-of Attach
Partonomic:
  – Eucaryotic Cell has-parts Nucleus, Mitochondrion
Causal:
  – RNA Capping enables mRNA Export
Subevents:
  – mRNA processing has-subevents RNA Capping, Polyadenylation, mRNA Splicing...
Temporal:
  – RNA Capping occurs-before mRNA Export

Knowledge Types
Qualitative Influences:
  – RNA Capping inhibits mRNA Degradation
Spatial Information:
  – Eucaryotic Primary RNA Transcript has-region 5-prime UTR
Structural:
  – Nuclear Envelope encloses mRNA
Telic:
  – RNA polymerase has-purpose to be a Catalyst in Polyadenylation
Imagery:
  – graphics and animation
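
Several of the knowledge types listed on these two slides can be written down as subject-relation-object triples. The Python sketch below does that for a subset of the facts, taken verbatim from the slides; the triple encoding and the `query` helper are assumptions for illustration, not a description of how SHAKEN stores or retrieves knowledge.

```python
# Facts copied from the "Knowledge Types" slides, stored as
# (subject, relation, object) triples. The encoding is illustrative only.
FACTS = [
    ("RNA Capping", "is-a-kind-of", "Attach"),
    ("Eucaryotic Cell", "has-parts", "Nucleus"),
    ("Eucaryotic Cell", "has-parts", "Mitochondrion"),
    ("RNA Capping", "enables", "mRNA Export"),
    ("mRNA Processing", "has-subevents", "RNA Capping"),
    ("mRNA Processing", "has-subevents", "Polyadenylation"),
    ("mRNA Processing", "has-subevents", "mRNA Splicing"),
    ("RNA Capping", "occurs-before", "mRNA Export"),
    ("RNA Capping", "inhibits", "mRNA Degradation"),
    ("Nuclear Envelope", "encloses", "mRNA"),
]


def query(subject=None, relation=None, obj=None):
    """Return every fact matching the given fields; None is a wildcard."""
    return [
        (s, r, o)
        for (s, r, o) in FACTS
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    ]


# Partonomic question: what are the parts of a eucaryotic cell?
parts = [o for (_, _, o) in query("Eucaryotic Cell", "has-parts")]

# Everything the knowledge base says about RNA Capping as a subject.
capping_facts = query(subject="RNA Capping")
```

A flat list with wildcard matching is enough to show how one event (RNA Capping) participates in taxonomic, causal, temporal, and qualitative relations at once.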

Evaluation
Can domain experts learn to use the library to encode domain knowledge?
Can sophisticated knowledge be captured through composition of components?

Methodology
train biologists (4 graduate students) for six days
have them encode knowledge from a college textbook, Essential Cell Biology by Bruce Alberts
supply end-of-the-chapter-style Biology questions
have the biologists pose the questions to their knowledge bases and record the answers
have another biologist evaluate the answers on a scale of 0-3
qualitatively evaluate their KBs

Some Example Questions
What nucleotide base pairs with adenine in RNA?
How is uracil in RNA like thymine in DNA?
What is the relationship between thymine and uracil?
For a given bacterial gene, how are bacterial RNA and DNA molecules different?
Describe RNA as a kind of polymer.
What are the four bases/nucleotides of RNA?
What is the relationship between a DNA gene and its RNA transcription product?

Evaluation — Question Answering

Evaluation — Productivity

Summary
Multi-functional knowledge bases can be built
… by domain experts, almost
… and they will be, with or without sound principles of ontological engineering
… and ontologists can significantly improve the results
Art and I would love to give you a demo! Ask us how you can get a PC version of SHAKEN for research use.