Bruce Porter Art Souther Department of Computer Science

Presentation transcript:

Technologies to Enable Biologists to Build Large Knowledge Bases on Human Anatomy and Physiology
Bruce Porter, Art Souther (Department of Computer Science, University of Texas at Austin)
Vinay Chaudhri (AI Center, Stanford Research Institute)
Peter Clark (Math and Computing Research Center, Boeing)

What’s in an Ontology?
- lexicon to aid communication, both for people and computers
- catalog system to organize a library (the library contains multi-media objects)
- meta-level schema for integrating databases, so queries can be answered across databases
- hierarchy of classes and instances, supporting inheritance of general information
- knowledge base for autonomous reasoning (“strong AI”)

What is Autonomous Reasoning Good for?
- answering questions that are unanticipated when the knowledge base is built
- why and what-if questions
- answers tailored to the user’s interest and background
- superhuman performance
- finding gaps and inconsistencies in the knowledge base
- raising good questions

Knowledge Base Evolution
- from expert systems to multifunctional knowledge bases: Mycin and Guidon
- broadening both the task and the domain

Large Multi-functional KBs can be Built
e.g., the Botany Knowledge Base
- 10-year construction effort by a full-time domain expert and tools developers
- contains 40,000 concepts and 160,000 facts
- much more information available via inheritance and rules
- performance goal: robust, expert-level ability to answer questions with good explanations

… and they can work well: e.g., for the task of generating descriptions
Q: What happens during embryo sac formation?
A: Embryo sac formation is a kind of female gametophyte formation. During embryo sac formation, the embryo sac is formed from the megaspore mother cell. Embryo sac formation occurs in the ovule. Embryo sac formation is a step of angiosperm sexual reproduction. It consists of megasporogenesis and embryo sac generation. During megasporogenesis, the megaspore mother cell divides in the nucellus to form 4 megaspores. During embryo sac generation, the embryo sac is generated from the megaspore.

… but we need a better process

Enabling Domain Experts to Build Knowledge Bases
Why not use knowledge engineers instead?
- they are less concerned with the fidelity of the representations
- they lack the knowledge to simplify and abstract the knowledge thoughtfully
- they operate with sentence-level facts rather than domain-level theories
We envision extensive knowledge bases built by the distributed community of active scientists, and maintained by organizations like NSF, NIH, NLM.
This will only work if domain experts can work with familiar concepts and without writing axioms!

Our Approach
Building knowledge bases is a joint effort:
- knowledge engineers build a library consisting of
  - a small hierarchy of reusable, composable, domain-independent knowledge units (“components”)
  - a small vocabulary of relations to connect them
- knowledge engineers develop generic question answering methods, such as simulation
- domain specialists build representations of fundamental concepts (“pump priming”)
- domain experts build a KB through the instantiation and composition of components
Supported by DARPA’s Rapid Knowledge Formation project.

A Library of Components
- small
  - easy to learn and use
  - broad semantic distinctions (easy to choose)
  - allows detailed pre-engineering of declarative executable models (Paul Cohen, UMass)
- drawn from related work
  - ontology design/knowledge engineering
  - linguistics: semantic primitives, case theory, discourse analysis, semantics
  - English lexical resources: dictionaries, thesauri, word lists (WordNet, Roget, LDOCE, corpora, etc.)

Library Contents
- actions: things that happen, change states (Breach, Enter, Copy, Replace, Transfer, etc.)
- states: relatively temporally stable events (Be-Closed, Be-Attached-To, Be-Confined, etc.)
- entities: things that are (Substance, Place, Object, etc.)
- roles: things that are, but only in the context of things that happen (Catalyst, Container, Template, Vehicle, etc.)

Library Contents
- relations between events, entities, roles
  - agent, object, recipient, result, etc.
  - content, part, material, possession, etc.
  - causes, defeats, enables, prevents, etc.
  - purpose, plays, etc.
- properties between events/entities and values
  - rate, frequency, intensity, direction, etc.
  - size, color, integrity, shape, etc.
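One way to picture the library is as a typed set of components plus a closed vocabulary of relations for connecting them. The sketch below is only an illustration under that assumption: the dictionary layout and the `connect` helper are hypothetical and are not SHAKEN's actual representation. The component and relation names are taken from the slides.

```python
# Hypothetical encoding of a few library entries named on the slides.
COMPONENT_KINDS = {
    "Enter": "action",
    "Transfer": "action",
    "Be-Closed": "state",
    "Object": "entity",
    "Place": "entity",
    "Container": "role",
    "Catalyst": "role",
}

# The small relation vocabulary listed on the slide.
RELATIONS = {
    "agent", "object", "recipient", "result",
    "content", "part", "material", "possession",
    "causes", "defeats", "enables", "prevents",
    "purpose", "plays",
}


def connect(subject, relation, filler):
    """Return a (subject, relation, filler) triple, rejecting unknown names."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    for name in (subject, filler):
        if name not in COMPONENT_KINDS:
            raise ValueError(f"unknown component: {name}")
    return (subject, relation, filler)


# Illustrative statements only: an Enter has an Object as its object,
# and an Object can play the Container role.
triples = [
    connect("Enter", "object", "Object"),
    connect("Object", "plays", "Container"),
]
```

Keeping the relation vocabulary closed is what lets a domain expert pick from a short menu rather than invent predicates.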

Access
- browsing the hierarchy top-down
- semantic search
  - all components have hooks to WordNet
  - climb the WordNet hypernym tree with search terms
  - assemble: Attach, Come-Together
  - mend: Repair
  - infiltrate: Enter, Traverse, Penetrate, Move-Into
  - gum-up: Block, Obstruct
  - busted: Be-Broken, Be-Ruined
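The semantic search idea can be sketched as a breadth-first climb up a hypernym graph, collecting every component whose hook word is reached. This is a minimal sketch, not the SHAKEN implementation: `TOY_HYPERNYMS` is an invented stand-in for WordNet (the real hypernym chains may differ), and `HOOKS` is a hypothetical word-to-component table.

```python
from collections import deque

# Invented hypernym links standing in for WordNet; real chains may differ.
TOY_HYPERNYMS = {
    "infiltrate": ["penetrate"],
    "penetrate": ["enter"],
    "enter": ["move"],
    "mend": ["repair"],
    "repair": ["improve"],
}

# Hypothetical hooks from words to library components.
HOOKS = {
    "penetrate": "Penetrate",
    "enter": "Enter",
    "repair": "Repair",
}


def semantic_search(term):
    """Climb the hypernym graph from `term`, returning hooked components in order found."""
    found = []
    seen = {term}
    queue = deque([term])
    while queue:
        word = queue.popleft()
        component = HOOKS.get(word)
        if component is not None and component not in found:
            found.append(component)
        for parent in TOY_HYPERNYMS.get(word, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


results = semantic_search("infiltrate")
```

With this toy data, a search for "infiltrate" reaches the hooks for Penetrate and then Enter, and a word with no links returns an empty list.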

A Small Example
The software system is called SHAKEN.
mRNA-Transport: “mRNA is transported out of the cell nucleus into the cytoplasm”

This sequence of screen shots shows some of the kinds of knowledge stored in components. It also shows how connecting components can result in extra knowledge being asserted through inferencing. The example is MRNA-Transport, which has been defined as a kind of Move-Out-Of. Nodes with red outline and red text show knowledge that the user supplied. On this slide, the user has defined MRNA-Transport and supplied the knowledge that MRNA is the object of the MRNA-Transport. The definition that shows in the popup is not from the documentation; it’s the text-gen slot on MRNA-Transport. The definition suggests that there’s more to this concept than what appears on the screen. We can get at the rest by showing the expanded version of MRNA-Transport…

Here is the other information that was hidden. Note that none of this was added by the user; it all comes from the Move-Out-Of component. In fact, there’s more than what’s shown here. Move-Out-Of also specifies relationships between the container that the object moves out of and the portal it moves through…

By expanding the Container node we see that the Portal the MRNA moves through is a region of the Container. We also see that the MRNA is currently contained in the Container. This slide also shows a new component that we’ve added: Eucaryotic Cell (only Eucaryotic Cells have nuclei). All of the knowledge related to the Eucaryotic Cell is in the Biology pump-priming we put in for the summer trials. Here the user has to specify that the Container the MRNA is moving out of is the Nuclear Envelope. The user does this by dragging the Nuclear-Envelope node on top of the Container node…

Once the user has dragged the Nuclear-Envelope node on top of the Container node, SHAKEN asks if the intention is to make these two things one and the same…

The thing to notice is that SHAKEN inferred that the Nuclear-Pore region of the Nuclear-Envelope is the Portal region of the Container. It also inferred that the origin of the Transport is the Nucleoplasm (but only after you ask it to show the abridged description of Nucleoplasm). Finally, the existing knowledge (from pump-priming) that the Nucleoplasm is inside the Nuclear-Envelope is consistent with the requirement that the origin of a Move-Out-Of is inside the container. The only thing left to do is specify that the MRNA is moving into the cytoplasm by unifying Eucaryotic-Cytoplasm with the destination of the Transport…

Now that the Eucaryotic-Cytoplasm has been specified as the destination of the Transport, SHAKEN’s simulator will be able to infer that after the MRNA is transported out of the nucleus, its new location will be the Eucaryotic-Cytoplasm. This will not be shown as an arc in the static representation on this screen. (This example was particularly effective because we chose Nuclear-Envelope as the Container and not Nucleus. It’s not quite reasonable to expect our users to do the same. One of our Y2 projects is to allow the user to specify the Nucleus as Container, but have the system figure out that the Nuclear-Envelope part of the Nucleus is more appropriate in this case. We’ve been referring to this as one of many kinds of “loose speak” that we’re investigating.)
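The unify-then-simulate steps in this walkthrough can be approximated in a few lines. This is a sketch under stated assumptions, not SHAKEN's inference engine: `BACKGROUND`, `unify_container`, and `simulate` are hypothetical names, and the background facts are a toy stand-in for the pump-primed Biology knowledge.

```python
# Toy stand-in for pump-primed knowledge about the Nuclear-Envelope.
BACKGROUND = {
    "Nuclear-Envelope": {"region": "Nuclear-Pore", "encloses": "Nucleoplasm"},
}


def unify_container(event, entity):
    """Identify the event's generic Container with a specific entity."""
    facts = BACKGROUND[entity]
    updated = dict(event)
    updated["container"] = entity
    updated["portal"] = facts["region"]    # the portal is a region of the container
    updated["origin"] = facts["encloses"]  # the origin must lie inside the container
    return updated


def simulate(event, locations):
    """Return object locations after the event: the object ends up at the destination."""
    after = dict(locations)
    after[event["object"]] = event["destination"]
    return after


# MRNA-Transport before the user unifies anything (generic inherited fillers).
event = {
    "object": "MRNA",
    "container": "Container",
    "portal": "Portal",
    "origin": "Place",
    "destination": "Place",
}
event = unify_container(event, "Nuclear-Envelope")
event["destination"] = "Eucaryotic-Cytoplasm"  # the second unification

before = {"MRNA": event["origin"]}
after = simulate(event, before)
```

In this toy version the portal and origin fall out of the container choice, and the object's new location appears only in the simulated state, not in the static event description.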

“Real KBs” are Significantly Larger
Here’s part of the representation of mRNA-Processing built by a biologist (Art).

Knowledge Types
- Taxonomic: RNA Capping is-a-kind-of Attach
- Partonomic: Eucaryotic Cell has-parts Nucleus, Mitochondrion
- Causal: RNA Capping enables mRNA Export
- Subevents: mRNA processing has-subevents RNA Capping, Polyadenylation, mRNA Splicing . . .
- Temporal: RNA Capping occurs-before mRNA Export

Knowledge Types
- Qualitative Influences: RNA Capping inhibits mRNA Degradation
- Spatial Information: Eucaryotic Primary RNA Transcript has-region 5-prime UTR
- Structural: Nuclear Envelope encloses mRNA
- Telic: RNA polymerase has-purpose to be a Catalyst in Polyadenylation
- Imagery: graphics and animation
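The knowledge types listed on these two slides all fit a subject-relation-value pattern, so they can be held in one small store and queried uniformly. The following is an illustrative sketch only: the tuple layout and the `facts_about` and `subjects_with` helpers are assumptions, while the facts themselves are the examples from the slides.

```python
from collections import defaultdict

# Example facts copied from the slides, one (subject, relation, value) per row.
TRIPLES = [
    ("RNA Capping", "is-a-kind-of", "Attach"),
    ("Eucaryotic Cell", "has-parts", "Nucleus"),
    ("Eucaryotic Cell", "has-parts", "Mitochondrion"),
    ("RNA Capping", "enables", "mRNA Export"),
    ("mRNA processing", "has-subevents", "RNA Capping"),
    ("mRNA processing", "has-subevents", "Polyadenylation"),
    ("mRNA processing", "has-subevents", "mRNA Splicing"),
    ("RNA Capping", "occurs-before", "mRNA Export"),
    ("RNA Capping", "inhibits", "mRNA Degradation"),
    ("Nuclear Envelope", "encloses", "mRNA"),
]


def facts_about(subject):
    """Group everything stated about `subject` by relation."""
    grouped = defaultdict(list)
    for s, relation, value in TRIPLES:
        if s == subject:
            grouped[relation].append(value)
    return dict(grouped)


def subjects_with(relation, value):
    """Find every subject that stands in `relation` to `value`."""
    return [s for s, r, v in TRIPLES if r == relation and v == value]


capping = facts_about("RNA Capping")
enablers = subjects_with("enables", "mRNA Export")
```

A description-style answer about RNA Capping could then be assembled by walking `capping` relation by relation.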

Evaluation
- Can domain experts learn to use the library to encode domain knowledge?
- Can sophisticated knowledge be captured through composition of components?

Methodology
- train biologists (4 graduate students) for six days
- have them encode knowledge from a college textbook, Essential Cell Biology by Bruce Alberts
- supply end-of-the-chapter-style Biology questions
- have the biologists pose the questions to their knowledge bases and record the answers
- have another biologist evaluate the answers on a scale of 0-3
- qualitatively evaluate their KBs

Some Example Questions
- What nucleotide base pairs with adenine in RNA?
- How is uracil in RNA like thymine in DNA?
- What is the relationship between thymine and uracil?
- For a given bacterial gene, how are bacterial RNA and DNA molecules different?
- Describe RNA as a kind of polymer.
- What are the four bases/nucleotides of RNA?
- What is the relationship between a DNA gene and its RNA transcription product?

Evaluation — Question Answering

Evaluation — Productivity

Summary
Multi-functional knowledge bases can be built
… by domain experts, almost
… and they will be, with or without sound principles of ontological engineering
… and ontologists can significantly improve the results
Art and I would love to give you a demo! Ask us how you can get a PC version of SHAKEN for research use.