Formal Structuring of Genomic Knowledge Nigam Shah Postdoctoral Fellow, SMI


1 Formal Structuring of Genomic Knowledge
Nigam Shah, Postdoctoral Fellow, SMI
nigam@stanford.edu

2 The ‘Understanding’ cycle
* Formulate hypothesis
* Store validated hypotheses
* Design experiment to test hypothesis
* Get best possible match with data
* Evaluate for consistency with known information
* Identify conflicts and suggest ‘corrections’
HyBrow assists in the tasks bound by the red outline.

3 Walking along this cycle is “hard”
* The way much of biology works is by applying prior knowledge (‘what is known’) for interpreting datasets rather than the application of a set of axioms that will elicit knowledge. (Stevens et al., 2000)
* We need to explicitly articulate ‘what is known’, and that is a problem with the current information overload.
* If we explicitly articulate ‘what is known’ in an organizing framework, it serves as a reference for integrating new data with prior knowledge.
* It also increases our ability to fit the results into the “big picture”.

4 How can we make it easier?
If we design a framework for making statements (or sets of statements comprising a hypothesis) about biological processes, and systematically examine a wide variety of datasets to evaluate them, we can speed up the ‘understanding cycle’.

5 Events and implicit claims
An hypothesis is a statement about relationships (among objects) within a biological system.
Example: Protein P induces transcription of gene X.
An ‘event’ is a relationship between two biological entities, which we call ‘agents’.
Implicit claims that can be tested:
1. P is a transcription factor.
2. P is a transcriptional activator.
3. P is localized to the nucleus.
4. P can bind to the promoter of gene X.
[Diagram: protein P at the promoter of gene X]
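As a minimal sketch of the idea on this slide, an event can be stored as a pair of agents plus a relationship, and the implicit claims can be looked up from the relationship. The `Event` class, the `IMPLICIT_CLAIMS` table, and the operator string are hypothetical names chosen for illustration; they are not taken from HyBrow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """A relationship (operator) between two agents."""
    agent_a: str
    operator: str
    agent_b: str


# Hypothetical lookup table: which testable claims an operator implies.
# The four templates mirror the claims listed on the slide.
IMPLICIT_CLAIMS: Dict[str, List[str]] = {
    "induces transcription of": [
        "{a} is a transcription factor",
        "{a} is a transcriptional activator",
        "{a} is localized to the nucleus",
        "{a} can bind to the promoter of {b}",
    ],
}


def implicit_claims(event: Event) -> List[str]:
    """Expand an event into the claims it implicitly makes."""
    templates = IMPLICIT_CLAIMS.get(event.operator, [])
    return [t.format(a=event.agent_a, b=event.agent_b) for t in templates]


event = Event("P", "induces transcription of", "gene X")
for claim in implicit_claims(event):
    print(claim)
```

Each returned claim could then be checked separately against available data.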

6 Components of a formal representation
* Formal representation: a conceptual framework and a domain knowledge model (ontology), with a correspondence established between the conceptual framework and the ontology.
* Knowledgebase: domain information and knowledge structured into the knowledge model.
* Curated data = information. A large amount of information is created and stored by model organism databases.
* Database: data = generated by researchers. Not always accessible or available in a Model Organism Database (except sequence and microarray data).

7 The conceptual framework
* The terminal symbols, which cannot be further decomposed in a grammar, are supplied by the hypothesis ontology.
* This grammar, together with the hypothesis ontology, allows us to represent hypotheses in a formal language:
    Event → Subject.Verb.Object
    Event → Subject.Verb.Object.Context
    Event → Subject.Verb.Object.Context.AssocCond
    Subject → (Actor | Context | Event)
    Verb → (Physical | Biochemical | Logical)
    Object → (Actor | Context | Event)
    Actor → (Gene | Protein | Complex …)
    Context → (Physical | Genetic | Temporal)
    AssocCond → (Presence of | absence of).Agent
* We have specified methods to evaluate formal language hypotheses for internal consistency and agreement with existing knowledge.
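The three Event productions above can be sketched as a small data structure. This is an illustrative assumption about how one might encode the grammar, not HyBrow's implementation; the class name, field names, and example symbols (`P`, `binds_to_promoter`, `X`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Event -> Subject.Verb.Object[.Context[.AssocCond]]"""
    subject: str
    verb: str
    obj: str
    context: Optional[str] = None
    assoc_cond: Optional[str] = None

    def __post_init__(self) -> None:
        # The grammar only allows AssocCond after a Context.
        if self.assoc_cond is not None and self.context is None:
            raise ValueError("AssocCond requires a Context")

    def to_formal(self) -> str:
        """Render the event in the dotted form used by the grammar."""
        parts = [self.subject, self.verb, self.obj, self.context, self.assoc_cond]
        return ".".join(p for p in parts if p is not None)


print(Event("P", "binds_to_promoter", "X").to_formal())
print(Event("P", "binds_to_promoter", "X", context="nucleus").to_formal())
```

Because Subject and Object may themselves be Events in the grammar, a fuller version would allow nested `Event` values in those fields; plain strings keep the sketch short.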

8 The conceptual framework
* Consistency of an hypothesis with prior knowledge is evaluated by applying constraints and rules.
* A constraint is a statement specifying the evidence that contradicts or supports an event.
  * Example: A protein must be in the nucleus to bind to a promoter.
* A rule comprises the ‘steps’ for deciding whether a constraint is satisfied or violated.
Binds_to_promoter [P, g] : Annotation constraints
    if cellular location of P is not nucleus, give a penalty.
    if biological process is not transcription, give a penalty.
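The Binds_to_promoter rule above could be sketched roughly as follows. The annotation keys, the penalty size of 1, and the choice to treat a missing annotation as a violation are all assumptions made for illustration; the slide does not specify them.

```python
from typing import Dict


def binds_to_promoter_penalty(annotations: Dict[str, str], penalty: int = 1) -> int:
    """Total penalty for the two annotation constraints on protein P.

    `annotations` holds P's annotations under hypothetical keys.
    A missing annotation counts as a violation (an assumption).
    """
    total = 0
    if annotations.get("cellular_location") != "nucleus":
        total += penalty  # P is not annotated as nuclear
    if annotations.get("biological_process") != "transcription":
        total += penalty  # P is not annotated to transcription
    return total


nuclear_tf = {"cellular_location": "nucleus", "biological_process": "transcription"}
cytosolic = {"cellular_location": "cytoplasm", "biological_process": "transcription"}
print(binds_to_promoter_penalty(nuclear_tf))
print(binds_to_promoter_penalty(cytosolic))
```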

10 Hypothesis Ontology
* Expressive enough to describe the galactose system at a coarse level of detail.
* It is compatible with other ontology efforts, e.g. GO, so that GO annotations can be used directly in HyBrow.
* We have also developed a grammar to write hypotheses using events from this ontology.

11 Grammar for a hypothesis
* A hypothesis consists of at least one event stream.
* An event stream is a sequence of one or more events or event streams with logical joints (or operators) between them.
* An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also has a physical location that denotes ‘where’ the event happened, the genetic context of the organism, and the associated experimental perturbations when the event happened.
* A logical joint is the conjunction between two event streams.
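The structure described on this slide can be sketched as nested containers. The class names, the joint labels such as "and" and "or", and the example events are hypothetical; the slide only fixes the cardinalities (one agent_a, one agent_b, one operator per event; at least one event stream per hypothesis).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass(frozen=True)
class Event:
    agent_a: str
    operator: str
    agent_b: str
    location: str = ""                     # 'where' the event happened
    genetic_context: str = ""              # genetic context of the organism
    perturbations: Tuple[str, ...] = ()    # experimental perturbations


@dataclass
class EventStream:
    items: List[Union[Event, "EventStream"]]
    joints: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.items:
            raise ValueError("an event stream needs at least one item")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("need exactly one joint between adjacent items")

    def events(self) -> Iterator[Event]:
        """Yield every event, descending into nested streams."""
        for item in self.items:
            if isinstance(item, EventStream):
                yield from item.events()
            else:
                yield item


@dataclass
class Hypothesis:
    streams: List[EventStream]

    def __post_init__(self) -> None:
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


e1 = Event("P", "binds_to_promoter", "X", location="nucleus")
e2 = Event("P", "induces_transcription", "X", location="nucleus")
stream = EventStream([e1, e2], joints=["and"])
hypothesis = Hypothesis([stream])
print(len(list(stream.events())))
```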

13 Constraints
A constraint is a statement specifying the evidence that supports or contradicts an event.
Types of constraints:
* Ontology
* Data
* Existence
* Temporal
Example: X binds to promoter of Y
* Ontology: X must be a protein or complex; Y must be a gene.
* Data: X must be annotated to be localized to the nucleus. The promoter of Y must have a binding site for X.
* Existence: The gene for X must be present.
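A rough sketch of how the example constraints for "X binds to promoter of Y" might be checked against a knowledgebase and tallied into supports and conflicts. The toy `KB` dictionary, its keys, and both function names are made up for illustration, and the temporal constraint type is omitted because the slide gives no example of it.

```python
from collections import Counter
from typing import List, Tuple

# Toy knowledgebase; every entry is invented for this example.
KB = {
    "type": {"X": "protein", "Y": "gene"},
    "location": {"X": "nucleus"},
    "binding_sites": {"Y": {"X"}},   # promoter of Y has a site for X
    "genes_present": {"X", "Y"},     # simplification: gene named like its protein
}


def check_binds_to_promoter(x: str, y: str, kb: dict) -> List[Tuple[str, bool, str]]:
    """Return (constraint_type, satisfied, explanation) for each constraint."""
    return [
        ("ontology",
         kb["type"].get(x) in {"protein", "complex"} and kb["type"].get(y) == "gene",
         f"{x} must be a protein or complex; {y} must be a gene"),
        ("data",
         kb["location"].get(x) == "nucleus",
         f"{x} must be annotated as localized to the nucleus"),
        ("data",
         x in kb["binding_sites"].get(y, set()),
         f"promoter of {y} must have a binding site for {x}"),
        ("existence",
         x in kb["genes_present"],
         f"the gene for {x} must be present"),
    ]


def tally(results: List[Tuple[str, bool, str]]) -> Counter:
    """Count satisfied constraints as support and violated ones as conflict."""
    return Counter("support" if ok else "conflict" for _, ok, _ in results)


results = check_binds_to_promoter("X", "Y", KB)
print(tally(results))
```

Keeping the explanation string next to each result makes it straightforward to report why a particular constraint counted as a support or a conflict.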

14 Rules
* A rule decides whether a constraint is satisfied or violated.
* The first layer of rules enforces the constraints to decide support or conflict based on the data we have.
* A second layer of rules checks the logical structure of the hypothesis.

16 The knowledgebase
[Diagram labels: Proteomics, Sequence, Literature, Microarray, MS, HyBrow KB]
protein_name   ratio    method
gal1p          1.143    ICAT
gal10p         1.067    ICAT
gal2p          0.858    ICAT
gal7p          1.122    ICAT
gal5p          0.269    ICAT
gcy1p          0.144    ICAT
acc1p          -0.035   ICAT
tup1p          0.173    ICAT

17 User interfaces
* Hypothesis described in natural language
* Biological process described in a formal language

18 Evaluating an hypothesis

19

20 Screen shot of the output
* A list of events in the submitted hypothesis
* A plot of the counts of support and conflicts
* An explanation for each support / conflict with a link to the data source

21 HyBrow: take home
* The minimum requirement for a formal representation:
  * Ability to represent data, information, and knowledge
  * A language to express your “thought experiment” (your model, hypothesis, theory, theorem etc.)
  * A reasoning framework to evaluate the outcome/validity/accuracy of your thought experiment
* We should not aim to use all the data and come up with ONE model that explains everything.
* It is much better to propose a model and examine if your data supports/contradicts it.

22 A clinical example
* Autism is a developmental disability characterized by “severe and pervasive impairment in several areas of development.”
* Nutrigenomics is gathering a lot of attention in autism treatment.
* DAN! (Defeat Autism Now!) researchers sometimes refer to this as “biomedical treatment”.
* Tests for deciding the optimal nutrigenomics therapy are costly and hard to interpret.

23 Excerpt from a parent’s email
* …right now, that is a manual process to relate the genetic (mutation info...) and any microbial inputs to a biochemical pathway diagram and relate the mutations to specific supplement or enzyme therapies. It costs > $1000 and 6-8 months for someone to manually interpret the results.
* I was wondering if it would be helpful to develop a model to contain the static/known information and some dynamic models to help answer some interesting questions relevant to the person's data.
* This might make it possible to develop tools for a physician or motivated individual to use nutrigenomic information.

24 Credits and acknowledgements
* Stephen Racunas: co-developer of HyBrow
* Funding: NIH

25 Organon
* An Organon is an instrument for the proper conduct and representation of scientific research.
* The first Organon was written by the Ancient Greek philosopher Aristotle in the 4th century B.C., and included his works on logic and the theory of science. [1]
* The second great Organon, the Novum Organum (1620) of Francis Bacon, was written as an update, extension and correction of the Aristotelian Organon in light of the success and experimental methods of post-Galilean modern natural science almost 2000 years later. [2]
[1] The works known as Aristotle’s Organon can be found in The Complete Works of Aristotle, Two Volumes (Jonathan Barnes, ed.). Princeton: Princeton University Press, 1984.
[2] Bacon, F. Novum Organum (Urbach, P. and Gibson, J., transl. and eds.). Chicago: Open Court, 1994.

