Computations using pathways and networks Nigam Shah

THE GOAL = MAKING SENSE OF HIGH THROUGHPUT DATA

High throughput data
“High throughput” is a fuzzy term that is rarely defined precisely. Genomics data is considered high throughput if you cannot simply “look” at your data to interpret it. Generally speaking, that means roughly 1,000 or more genes and 20 or more samples. There are about 40 different high throughput genomics data generation technologies. DNA, mRNA, proteins, metabolites … all can be measured.

How does ontology help?
An ontology provides an organizing framework for creating “abstractions” of the high throughput data. The simplest ontologies (i.e. terminologies, controlled vocabularies) provide the most bang for the buck; Gene Ontology (GO) is the prime example. More structured ontologies, such as those that represent pathways and higher-order biological concepts, still have to demonstrate real utility.

Gene Ontology to analyze microarray data

Using GO annotations

Descriptions built by connecting/linking ontology terms
Biologists interpret a list of genes and form a result statement such as: “The photosynthesis genes located in the chloroplast are repressed in response to ozone stress and have the ABRE binding site enriched in their promoters.”

…more structure? in OBOL Relations Ontology

Between-ontology structure

… more structure [beyond GO]: PATO
The building blocks of phenotype descriptions are EQ pairs: an Entity (the bearer), such as spermatocyte or wing, and a Quality (property, attribute), which is a kind of dependent continuant. Formally, an EQ description defines a Quality which inheres_in a bearer entity. The building blocks are combined according to the Pheno-syntax.
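The EQ idea above can be sketched as a tiny data structure. This is only an illustration: the class name, the `to_statement` helper and the example terms are assumptions made for the sketch, and the printed string is not the actual Pheno-syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQDescription:
    """An Entity-Quality pair: a quality that inheres_in a bearer entity."""
    entity: str   # the bearer, e.g. an anatomy or cell-type term
    quality: str  # the property, e.g. a PATO-style quality term

    def to_statement(self) -> str:
        # Illustrative rendering only; not the real Pheno-syntax.
        return f"{self.quality} inheres_in {self.entity}"


# Hypothetical example terms chosen for illustration.
eq = EQDescription(entity="wing", quality="decreased size")
print(eq.to_statement())
```

Keeping entity and quality as separate fields is what lets each one be drawn from a different ontology and combined afterwards.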

Semantically structured annotations WHY HOW

Open Questions/Challenges
Creation and acceptance of a systematic formalism for creating expressive annotations (e.g. associated_with, involves).
A generic tool that uses ontologies and allows the user to compose terms and cross-ontology annotations.
Easy term/annotation composition.
Control over the number of alternative [compositional] statements allowed.

Pathways to analyze array data

“Pathways” to analyze array data
The notion of a cancer signaling pathway can serve as an organizing framework for interpreting microarray expression data. By examining a relatively small set of genes chosen from prior biological knowledge about a given pathway, the analysis becomes more specific.

Reactome’s sky painter

Operations on pathway resources
Columns: Custom code | RDF + SPARQL | OWL + SWRL
Verify a pathway resource: Proofreading Reactome [1] | In progress
Perform integrated querying of multiple pathway resources: Hard (“wrapper” approaches) | PKB [2]
Verify multiple pathway resources: Too hard (there are ~200)
Merge and compare multiple pathway resources
“Reason” over pathway resources
[1] A case study in pathway knowledgebase verification, BMC Bioinformatics 2006, 7:196
[2] Pathway Knowledge Base: An Integrated pathway resource using BioPAX, submitted to Applied Ontology

Merge and compare pathway resources
Given a set of ‘nodes’ and some ‘links’ among them, query multiple pathway sources and fill in the most plausible interactions between the nodes. Plausible = not contradicted by existing data and knowledge. Current pathway resources [in BioPAX] cannot support this because the manner in which ‘nodes’ are identified and the manner in which ‘links’ are identified are arbitrary. Reactome has started to connect pathway steps with GO biological processes. BioPAX lets pathway sources “export” their nodes and links … but p53 in resource A is still different from P53 in resource B, and Activate in resource A is still different from activates in resource B.
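The identity problem described above (p53 vs P53, Activate vs activates) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the synonym tables are hypothetical and hard-coded, and the function names are not part of BioPAX, Reactome or any pathway resource.

```python
# Hypothetical synonym tables; a real system would need curated identifier
# mappings rather than hard-coded dictionaries.
ENTITY_SYNONYMS = {"p53": "TP53", "tp53": "TP53", "mdm2": "MDM2"}
RELATION_SYNONYMS = {"activate": "activates", "activates": "activates"}


def _canonical(name, table):
    """Map a raw label to a canonical one; fall back to the lowercased label."""
    key = name.strip().lower()
    return table.get(key, key)


def normalize_link(source, relation, target):
    """Return a (source, relation, target) triple with canonical labels."""
    return (
        _canonical(source, ENTITY_SYNONYMS),
        _canonical(relation, RELATION_SYNONYMS),
        _canonical(target, ENTITY_SYNONYMS),
    )


def shared_links(resource_a, resource_b):
    """Links that two resources agree on once labels are normalized."""
    links_a = {normalize_link(*link) for link in resource_a}
    links_b = {normalize_link(*link) for link in resource_b}
    return links_a & links_b


# Illustrative exports from two made-up resources.
resource_a = [("p53", "Activate", "MDM2")]
resource_b = [("P53", "activates", "Mdm2")]
print(shared_links(resource_a, resource_b))
```

Without the normalization step the two exports share no links at all, which is the point of the slide: exporting nodes and links in a common format does not by itself make them comparable.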

Problem: I have no clue what a pathway is!
A set or series of interactions, often forming a network, which biologists have found useful to group together for organizational, historic, biophysical or other reasons. The complexity and abstraction represented in a pathway are decided by its author, who is attempting to represent the interactions between a set of genes, proteins, and small molecules.

“Networks” to analyze high throughput genomic data

Building networks
Take a high throughput dataset.
Define a notion of ‘relatedness’ depending on the dataset: co-expression for microarray data, co-occurrence for literature networks, …
List the [node]--[node] pairs.
Find a good graph drawing program!
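As a minimal sketch of these steps, assuming co-expression is measured by Pearson correlation and that a fixed cutoff decides which pairs are ‘related’ (the cutoff of 0.9, the function names and the toy expression values are all assumptions for illustration):

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """List (gene, gene, r) pairs whose |r| meets the assumed threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Toy data: three genes measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(expression)
print(edges)
```

The resulting edge list is what would be handed to a graph drawing program. With thousands of genes the all-pairs loop grows quadratically, which is one reason such networks quickly turn into hairballs.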

Nice hairball but …
From Long et al, in Trends in Biochemical Sciences, vol 32, no 7.
Srinivasan B, Snow R, Shah N and Batzoglou S in Interactome Networks, CSHL.
From Srinivasan et al, in Briefings in Bioinformatics, August 2007.

Hypotheses/Models to analyze high throughput genomic data

Events and Implicit claims
A hypothesis is a statement about relationships (among objects) within a biological system, e.g. “Protein P induces transcription of gene X”. An ‘event’ is a relationship between two biological entities. Implicit claims that can be tested:
1. P is a transcription factor.
2. P is a transcriptional activator.
3. P is localized to the nucleus.
4. P can bind to the promoter of gene X.
[Diagram: promoter, gene X, P]
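One way to picture how implicit claims follow from an event is a lookup from the event's relationship to claim templates. This is a sketch under assumptions: the table, the function name and the operator string are hypothetical, and only the four claims listed above are taken from the slide.

```python
# Hypothetical rule table: relationship -> implicit, testable claims.
IMPLICIT_CLAIMS = {
    "induces transcription of": [
        "{a} is a transcription factor",
        "{a} is a transcriptional activator",
        "{a} is localized to the nucleus",
        "{a} can bind to the promoter of {b}",
    ],
}


def implicit_claims(agent_a, operator, agent_b):
    """Expand one event into the implicit claims registered for its operator."""
    templates = IMPLICIT_CLAIMS.get(operator, [])
    return [t.format(a=agent_a, b=agent_b) for t in templates]


claims = implicit_claims("P", "induces transcription of", "gene X")
for claim in claims:
    print(claim)
```

Each generated claim can then be checked separately against existing data, so a single stated event yields several independent tests.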

Representing Events Explicitly
A hypothesis consists of at least one event stream.
An event stream is a sequence of one or more events or event streams with logical joints (or operators) between them.
An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also has a physical location that denotes ‘where’ the event happened, the genetic context of the organism and the associated experimental perturbations when the event happened.
A logical joint is the conjunction between two event streams.
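The definitions above map naturally onto a few record types. The following is a minimal sketch, not HyBrow's actual data model: the class layout, the default values, the joint names and the `iter_events` helper are assumptions; only agent_a, agent_b, operator, location, genetic context and perturbations come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Event:
    agent_a: str
    operator: str                  # the relationship between the two agents
    agent_b: str
    location: str = "unspecified"  # 'where' the event happened
    genetic_context: str = "wild type"
    perturbations: List[str] = field(default_factory=list)


@dataclass
class EventStream:
    # One or more events or nested streams, with one joint between neighbours.
    elements: List[Union[Event, "EventStream"]]
    joints: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.elements:
            raise ValueError("an event stream needs at least one element")
        if len(self.joints) != len(self.elements) - 1:
            raise ValueError("need exactly one joint between adjacent elements")


@dataclass
class Hypothesis:
    streams: List[EventStream]

    def __post_init__(self):
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


def iter_events(stream):
    """Yield every Event in a stream, descending into nested streams."""
    for element in stream.elements:
        if isinstance(element, Event):
            yield element
        else:
            yield from iter_events(element)


# Illustrative hypothesis with made-up agents and a made-up joint name.
ev1 = Event("P", "binds to promoter of", "gene X", location="nucleus")
ev2 = Event("P", "induces transcription of", "gene X", location="nucleus")
hypothesis = Hypothesis([EventStream([ev1, ev2], joints=["then"])])
print([e.operator for e in iter_events(hypothesis.streams[0])])
```

The constructor checks encode the “at least one” and “exactly one” constraints, so a malformed hypothesis is rejected when it is built rather than when it is evaluated.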

User interfaces
Hypothesis described in natural language
Biological process described in a formal language

Evaluating a hypothesis

A. Representation of a hypothesis in terms of events (ev = event).
B. Holding the mouse on a neighboring hypothesis (b1) shows what event was replaced to create it.
C. Plot of the support versus conflicts for submitted and neighboring hypotheses (n1, b1). Clicking on n1 submits that hypothesis as the ‘seed’.

HyBrow: lessons learnt
The minimum requirements for a formal representation:
Ability to represent data → information → knowledge
A language to unambiguously express your “thought experiment” (your model, hypothesis, theory, theorem, etc.)
A reasoning framework to evaluate the outcome/validity/accuracy of your thought experiment
Project Home page:

Pathways as “models”?
Pathways are assumed to be models representing biological processes, without actually knowing the modeling formalism in which the model is valid. The ‘language’ for writing out a pathway doesn’t really have a grammar and/or a logic. Most pathways end up being lists of heterogeneous sets of “steps” (in terms of the time of execution, the place of execution, the abstraction level, the kind of ‘thing’ passed along, etc.). There is lots of discussion on the requirements of data providers; where are the users/consumers and their use cases?

Claims
Pathways are useful only if they can serve as “models” [accurate representations] of a process. Hence, whatever needs to be done to ensure that a pathway is a valid model in at least one formalism should be required of the pathway author. A pathway representation that doesn’t solve the problem of uniquely identifying entities doesn’t solve the problem of integrating pathways. We just end up with marked-up, structured information from multiple providers, without actually integrating anything.

Success of projects in the Biomedical domain
[Figure: projects placed by KR complexity (minimal to high) and computational complexity (minimal to high)]
