Formal Structuring of Genomic Knowledge Nigam Shah Postdoctoral Fellow, SMI

The ‘Understanding’ cycle
* Formulate hypothesis
* Store validated hypotheses
* Design experiment to test hypothesis
* Get best possible match with data
* Evaluate for consistency with known information
* Identify conflicts and suggest ‘corrections’
HyBrow assists in the tasks bound by the red outline in the slide's diagram.

Walking along this cycle is “hard”
* The way much of biology works is by applying prior knowledge (‘what is known’) for interpreting datasets rather than the application of a set of axioms that will elicit knowledge (Stevens et al., 2000).
* We need to explicitly articulate ‘what is known’, which is a problem given the current information overload.
* If we explicitly articulate ‘what is known’ in an organizing framework, it serves as a reference for integrating new data with prior knowledge, and it increases our ability to fit the results into the “big picture”.

How can we make it easier?
If we design a framework for making statements about biological processes (or sets of statements, comprising a hypothesis), and systematically examine a wide variety of datasets to evaluate them, we can speed up the ‘understanding cycle’.

Events and implicit claims
An hypothesis is a statement about relationships (among objects) within a biological system.
An ‘event’ is a relationship between two biological entities, which we call ‘agents’.
Example: Protein P induces transcription of gene X.
Implicit claims that can be tested:
1. P is a transcription factor.
2. P is a transcriptional activator.
3. P is localized to the nucleus.
4. P can bind to the promoter of gene X.
[Figure: P, promoter, gene X]
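The expansion of one event into its implicit claims can be sketched in Python. This is only an illustration, not HyBrow's implementation: the lookup table `IMPLICIT_CLAIMS` and the function `implicit_claims` are hypothetical names, and only the single entry shown is taken from the slide.

```python
# Hypothetical lookup from a relationship to the testable claims it implies.
# Only the "induces transcription of" entry comes from the slide.
IMPLICIT_CLAIMS = {
    "induces transcription of": [
        "{a} is a transcription factor",
        "{a} is a transcriptional activator",
        "{a} is localized to the nucleus",
        "{a} can bind to the promoter of {b}",
    ],
}


def implicit_claims(agent_a, relation, agent_b):
    """Expand one event (agent_a, relation, agent_b) into its implicit claims."""
    templates = IMPLICIT_CLAIMS.get(relation, [])
    return [t.format(a=agent_a, b=agent_b) for t in templates]


claims = implicit_claims("P", "induces transcription of", "gene X")
for claim in claims:
    print(claim)
```

Each returned claim is something that could be checked separately against annotation or experimental data.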

Components of a formal representation
* Formal representation: a conceptual framework and a domain knowledge model (ontology). Establish a correspondence between the conceptual framework and the ontology.
* Knowledgebase: domain information and knowledge structured into the knowledge model.
* Database:
  * Curated data = information. A large amount of information is created and stored by model organism databases.
  * Data = generated by researchers. Not always accessible or available in a Model Organism Database (except sequence and microarray data).

The conceptual framework
* The terminal symbols, which cannot be further decomposed in a grammar, are supplied by the hypothesis ontology.
* This grammar, together with the hypothesis ontology, allows us to represent hypotheses in a formal language:
    Event → Subject.Verb.Object
    Event → Subject.Verb.Object.Context
    Event → Subject.Verb.Object.Context.AssocCond
    Subject → (Actor | Context | Event)
    Verb → (Physical | Biochemical | Logical)
    Object → (Actor | Context | Event)
    Actor → (Gene | Protein | Complex …)
    Context → (Physical | Genetic | Temporal)
    AssocCond → (Presence of | absence of).Agent
We have specified methods to evaluate formal language hypotheses for internal consistency and agreement with existing knowledge.
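One way to read the productions is as a small data structure. The sketch below is an assumption-laden illustration, not HyBrow's code: the class names `Term`, `AssocCond` and `Event` and the example terminals (`Gal4p`, `binds_to_promoter`, `GAL1`, `nucleus`) are hypothetical, chosen only to show which production an event instance corresponds to.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Category names are copied from the productions on the slide.
VERB_KINDS = {"Physical", "Biochemical", "Logical"}
CONTEXT_KINDS = {"Physical", "Genetic", "Temporal"}


@dataclass(frozen=True)
class Term:
    kind: str  # grammar category, e.g. "Protein" or "Physical"
    name: str  # terminal symbol, which the hypothesis ontology would supply


@dataclass(frozen=True)
class AssocCond:
    presence: bool  # True for "presence of", False for "absence of"
    agent: Term


@dataclass(frozen=True)
class Event:
    # Subject and Object may themselves be events, as the grammar allows.
    subject: Union[Term, "Event"]
    verb: Term
    obj: Union[Term, "Event"]
    context: Optional[Term] = None
    assoc_cond: Optional[AssocCond] = None

    def __post_init__(self):
        if self.verb.kind not in VERB_KINDS:
            raise ValueError(f"not a verb category: {self.verb.kind}")
        if self.context is not None and self.context.kind not in CONTEXT_KINDS:
            raise ValueError(f"not a context category: {self.context.kind}")
        if self.assoc_cond is not None and self.context is None:
            raise ValueError("AssocCond only appears after a Context")

    def production(self):
        """Name the Event production this instance corresponds to."""
        parts = ["Subject", "Verb", "Object"]
        if self.context is not None:
            parts.append("Context")
        if self.assoc_cond is not None:
            parts.append("AssocCond")
        return "Event -> " + ".".join(parts)


# Illustrative terminals only; they are not taken from the slides.
event = Event(
    subject=Term("Protein", "Gal4p"),
    verb=Term("Physical", "binds_to_promoter"),
    obj=Term("Gene", "GAL1"),
    context=Term("Physical", "nucleus"),
)
print(event.production())
```

Rejecting malformed events at construction time keeps later consistency checks free of structural special cases.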

The conceptual framework
* Consistency of an hypothesis with prior knowledge is evaluated by applying constraints and rules.
* A constraint is a statement specifying the evidence that contradicts or supports an event. Example: a protein must be in the nucleus to bind to a promoter.
* A rule comprises the ‘steps’ for deciding whether a constraint is satisfied or violated:
    Binds_to_promoter [P, g] : Annotation constraints
        if cellular location of P is not nucleus, give a penalty.
        if biological process is not transcription, give a penalty.
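The Binds_to_promoter rule can be written as a short Python function. This is a minimal sketch under stated assumptions: the annotation layout (a dict of sets keyed by `cellular_location` and `biological_process`), the penalty size of 1, and the toy annotations for P and Q are all hypothetical. The gene argument g is omitted because the two annotation constraints on the slide only inspect P.

```python
def binds_to_promoter_penalty(annotations, protein, penalty=1):
    """Apply the two annotation constraints to a protein.

    Returns (total_penalty, reasons). The penalty size is an assumption;
    the slides only say "give a penalty".
    """
    record = annotations.get(protein, {})
    total, reasons = 0, []
    if "nucleus" not in record.get("cellular_location", set()):
        total += penalty
        reasons.append(f"{protein} is not annotated to the nucleus")
    if "transcription" not in record.get("biological_process", set()):
        total += penalty
        reasons.append(f"{protein} is not annotated to transcription")
    return total, reasons


# Toy annotations, invented for illustration.
annotations = {
    "P": {"cellular_location": {"nucleus"},
          "biological_process": {"transcription"}},
    "Q": {"cellular_location": {"cytoplasm"},
          "biological_process": {"transcription"}},
}

print(binds_to_promoter_penalty(annotations, "P"))
print(binds_to_promoter_penalty(annotations, "Q"))
```

Returning the reasons alongside the score makes it possible to explain each conflict to the user, not just count it.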

Hypothesis Ontology
* Expressive enough to describe the galactose system at a coarse level of detail.
* It is compatible with other ontology efforts, e.g. GO, so that GO annotations can be used directly in HyBrow.
* We have also developed a grammar to write hypotheses using events from this ontology.

Grammar for a hypothesis
* A hypothesis consists of at least one event stream.
* An event stream is a sequence of one or more events or event streams with logical joints (or operators) between them.
* An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also has a physical location that denotes ‘where’ the event happened, the genetic context of the organism and associated experimental perturbations when the event happened.
* A logical joint is the conjunction between two event streams.
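The nesting described above (hypothesis, event streams, events, logical joints) maps naturally onto recursive data classes. The following Python sketch is illustrative only: the field names follow the slide where it names them (`agent_a`, `agent_b`, `operator`), while the class layout, the joint label `"AND"`, and the agents A, B and C are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Event:
    agent_a: str
    operator: str  # the relationship between the two agents
    agent_b: str
    location: str = ""         # 'where' the event happened
    genetic_context: str = ""  # genetic context of the organism
    perturbations: List[str] = field(default_factory=list)


@dataclass
class EventStream:
    items: List[Union[Event, "EventStream"]]
    joints: List[str] = field(default_factory=list)  # one joint per adjacent pair

    def __post_init__(self):
        if not self.items:
            raise ValueError("an event stream needs at least one item")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("expected one logical joint between adjacent items")

    def events(self):
        """Yield every event in this stream, descending into nested streams."""
        for item in self.items:
            if isinstance(item, Event):
                yield item
            else:
                yield from item.events()


@dataclass
class Hypothesis:
    streams: List[EventStream]

    def __post_init__(self):
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


# Placeholder agents and joint label, not taken from the slides.
inner = EventStream([Event("A", "binds", "B"), Event("B", "activates", "C")], ["AND"])
outer = EventStream([inner, Event("C", "represses", "A")], ["AND"])
hypothesis = Hypothesis([outer])
all_events = [e for s in hypothesis.streams for e in s.events()]
print(len(all_events))
```

Flattening the streams into a list of events is the step that would let each event be checked individually before the joints are considered.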

Constraints
A constraint is a statement specifying the evidence that supports or contradicts an event.
Types of constraints:
* Ontology
* Data
* Existence
* Temporal
Example: X binds to promoter of Y
* Ontology: X must be a protein or complex; Y must be a gene.
* Data: X must be annotated as localized to the nucleus. The promoter of Y must have a binding site for X.
* Existence: the gene for X must be present.
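The example constraints for "X binds to promoter of Y" can be grouped by type in code. The sketch below is hypothetical throughout: the knowledgebase layout (`type`, `location`, `binding_sites`, `encoded_by`, `genes_present`) and the toy entries are assumptions made for illustration, and the temporal type is left out because the slide gives no example for it.

```python
def check_binds_to_promoter(x, y, kb):
    """Return {constraint type: satisfied?} for 'x binds to promoter of y'."""
    return {
        # Ontology: x must be a protein or complex, y must be a gene.
        "ontology": kb["type"].get(x) in {"protein", "complex"}
        and kb["type"].get(y) == "gene",
        # Data: x is annotated to the nucleus, and the promoter of y
        # has a binding site for x.
        "data": "nucleus" in kb["location"].get(x, set())
        and x in kb["binding_sites"].get(y, set()),
        # Existence: the gene encoding x must be present.
        "existence": kb["encoded_by"].get(x) in kb["genes_present"],
    }


# Toy knowledgebase, invented for illustration.
kb = {
    "type": {"X": "protein", "Y": "gene"},
    "location": {"X": {"nucleus"}},
    "binding_sites": {"Y": {"X"}},
    "encoded_by": {"X": "geneX"},
    "genes_present": {"geneX", "Y"},
}

result = check_binds_to_promoter("X", "Y", kb)
print(result)

# Removing the gene for X (a deletion strain, say) breaks only the
# existence constraint.
deletion = dict(kb, genes_present={"Y"})
deletion_result = check_binds_to_promoter("X", "Y", deletion)
print(deletion_result)
```

Keeping the verdicts separate by type shows which kind of evidence a conflict comes from.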

Rules
A rule decides whether a constraint is satisfied or violated.
* The first layer of rules enforces the constraints to decide support or conflict based on the data we have.
* A second layer of rules checks the logical structure of the hypothesis.
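The two layers can be sketched as two separate passes over the events. This is an illustration under loud assumptions: events are plain `(agent_a, operator, agent_b)` tuples, the constraint table is hypothetical, and the second layer here only flags a hypothesis that asserts both an event and its negation. The slides do not say which logical checks HyBrow actually performs.

```python
from collections import Counter


def first_layer(events, constraints):
    """Label each event 'support' or 'conflict' by running its constraints.

    Events whose operator has no registered constraint are labelled
    'untested' instead of being counted as supported.
    """
    verdicts = {}
    for event in events:
        checks = constraints.get(event[1])
        if not checks:
            verdicts[event] = "untested"
        elif all(check(event) for check in checks):
            verdicts[event] = "support"
        else:
            verdicts[event] = "conflict"
    return verdicts


def second_layer(events):
    """Return events that are asserted together with their own negation.

    This is an assumed stand-in for a logical-structure check.
    """
    asserted = set(events)
    return sorted(e for e in asserted if (e[0], "not " + e[1], e[2]) in asserted)


# Toy data: which proteins are annotated to the nucleus (invented).
nuclear = {"P"}
constraints = {"binds_to_promoter": [lambda e: e[0] in nuclear]}
events = [
    ("P", "binds_to_promoter", "X"),
    ("Q", "binds_to_promoter", "X"),
    ("P", "phosphorylates", "Q"),
]

verdicts = first_layer(events, constraints)
counts = Counter(verdicts.values())
print(counts["support"], counts["conflict"], counts["untested"])

contradictions = second_layer(events + [("P", "not binds_to_promoter", "X")])
print(contradictions)
```

The counts from the first layer are the kind of tally that could feed a plot of supports versus conflicts.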

The knowledgebase
Sources shown around the HyBrow KB: Proteomics, Sequence, Literature, Microarray.
Example table (labelled MS on the slide):
    protein_name   ratio    method
    gal1p           1.143   ICAT
    gal10p          1.067   ICAT
    gal2p           0.858   ICAT
    gal7p           1.122   ICAT
    gal5p           0.269   ICAT
    gcy1p           0.144   ICAT
    acc1p          -0.035   ICAT
    tup1p           0.173   ICAT

User interfaces
* Hypothesis described in natural language
* Biological process described in a formal language

Evaluating an hypothesis

Screen shot of the output
* A list of events in the submitted hypothesis
* A plot of the counts of support and conflicts
* An explanation for each support / conflict with a link to the data source

HyBrow: take home
* The minimum requirements for a formal representation:
  * Ability to represent data, information and knowledge
  * A language to express your “thought experiment” (your model, hypothesis, theory, theorem, etc.)
  * A reasoning framework to evaluate the outcome, validity or accuracy of your thought experiment
* We should not aim to use all the data and come up with ONE model that explains everything.
* It is much better to propose a model and examine whether your data support or contradict it.

A clinical example
* Autism is a developmental disability characterized by “severe and pervasive impairment in several areas of development.”
* Nutrigenomics is gathering a lot of attention in autism treatment.
* DAN! (Defeat Autism Now!) researchers sometimes refer to this as “biomedical treatment”.
* Tests for deciding the optimal nutrigenomics therapy are costly and hard to interpret.

Excerpt from a parent’s
* …right now, that is a manual process to relate the genetic (mutation info...) and any microbial inputs to a biochemical pathway diagram and relate the mutations to specific supplement or enzyme therapies. It costs > $1000 and 6-8 months for someone to manually interpret the results.
* I was wondering if it would be helpful to develop a model to contain the static/known information and some dynamic models to help answer some interesting questions relevant to the person's data.
* This might make it possible to develop tools for a physician or motivated individual to use nutrigenomic information.

Credits and acknowledgements
* Stephen Racunas: co-developer of HyBrow
* Funding: NIH

Organon
* An Organon: an instrument for the proper conduct and representation of scientific research.
* The first Organon was written by the Ancient Greek philosopher Aristotle in the 4th century B.C., and included his works on logic and the theory of science. [1]
* The second great Organon, the Novum Organum (1620) of Francis Bacon, was written as an update, extension and correction of the Aristotelian Organon in light of the success and experimental methods of post-Galilean modern natural science almost 2000 years later. [2]
[1] The works known as Aristotle’s Organon can be found in The Complete Works of Aristotle, Two Volumes (Jonathan Barnes ed.). Princeton: Princeton University Press,
[2] Bacon, F. Novum Organum (Urbach, P. and Gibson, J. transl. and eds.). Chicago: Open Court, 1994.