1
FUNCTIONAL ANNOTATION OF REGULATORY PATHWAYS
Jayesh Pandey, Mehmet Koyuturk, Wojciech Szpankowski, and Ananth Grama. PURDUE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE. Supported by the National Institutes of Health. Hello and welcome. I will talk about my thesis work today, which is about comparative analysis of molecular interaction networks.
2
GENE REGULATION Gene expression is the process of synthesizing a functional protein coded by the corresponding gene. Genes (and their products) regulate the extent of each other’s expression. Any step of gene expression can be modulated: transcription, translation, post-transcriptional modification, RNA transport, mRNA degradation… I will start with a brief overview of molecular interaction networks. I will talk about where the data comes from, how it is modeled, and present some biological observations that motivate comparative network analysis. Then, I will talk about how we address various algorithmic and analytical problems on interaction networks. Finally, I will briefly discuss problems I am currently working on and what I am planning to explore in the near future. [Figure: ligand-independent transcriptional regulation at the chromatin level]
3
GENE REGULATORY NETWORKS
Model the organization of regulatory interactions in the cell. Genes are nodes; regulatory interactions are directed edges. Boolean network model: edges are signed, indicating up-regulation (promotion) and down-regulation (suppression). [Figure: flowering time in Arabidopsis; legend: gene, up-regulation, down-regulation]
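A signed, directed gene network of this kind can be stored as an edge list with +1 for up-regulation and -1 for down-regulation. The minimal Python sketch below assumes that encoding and adds one synchronous Boolean update under a hypothetical rule chosen only for illustration (a regulated gene is ON when at least one activator is ON and no repressor is ON); the gene names g1 to g3 are placeholders, not genes from the Arabidopsis network.

```python
from collections import defaultdict

# Signed regulatory edges: (regulator, target, sign); +1 = up-regulation, -1 = down-regulation.
# Gene names are placeholders for illustration only.
EDGES = [("g1", "g2", +1), ("g2", "g3", -1), ("g1", "g3", +1)]


def regulators(edges):
    """Map each target gene to its sets of activators and repressors."""
    regs = defaultdict(lambda: {"up": set(), "down": set()})
    for src, dst, sign in edges:
        regs[dst]["up" if sign > 0 else "down"].add(src)
    return regs


def boolean_step(state, edges):
    """One synchronous update under a hypothetical rule: a regulated gene is ON
    iff some activator is ON and no repressor is ON. Genes without regulators
    keep their current value."""
    regs = regulators(edges)
    new_state = dict(state)
    for gene, r in regs.items():
        activated = any(state.get(g, False) for g in r["up"])
        repressed = any(state.get(g, False) for g in r["down"])
        new_state[gene] = activated and not repressed
    return new_state


state = {"g1": True, "g2": False, "g3": False}
print(boolean_step(state, EDGES))
```

Starting from only g1 ON, one step switches on both of its targets; a second step switches g3 off again, because its repressor g2 is now ON.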
4
MOLECULAR ANNOTATION Similar systems involve different molecules (genes, proteins) in different species. Functional annotation of genes provides a unified understanding of the underlying principles. Molecular function: What is the role of a gene? Biological process: In which processes is a gene involved? Cellular component: Where is a gene’s product localized? Gene Ontology provides a library of molecular annotations. We refer to each annotation class as a functional attribute.
5
FROM MOLECULES TO SYSTEMS
Networks are species-specific, while annotation is at the molecular level. Map networks from gene space to function space; this can generate a library of annotated “modular (sub-) networks”. [Figure: network of Gene Ontology terms based on significance of pairwise interactions in the yeast synthetic gene array (SGA) network (Tong et al., Science, 2004)]
6
INDIRECT REGULATION Assessment of pairwise interactions is simple, but not adequate. [Figure: two example gene networks over genes g1 to g5]
7
FUNCTIONAL ATTRIBUTE NETWORKS
Multigraph model: a gene is associated with multiple functional attributes, and a functional attribute is associated with multiple genes. Functional attributes are represented by nodes; genes are represented by ports, reflecting context. [Figure: gene network over genes g1 to g6 and the corresponding functional attribute network]
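The mapping from the gene network to the functional attribute multigraph can be sketched as follows, assuming that every gene edge u → v contributes one multigraph edge from each attribute of u to each attribute of v, with the gene pair kept as its ports. The function name, annotation dictionary, and attribute labels T1 to T3 are hypothetical.

```python
from collections import Counter


def attribute_multigraph(gene_edges, annotation):
    """Return multigraph edges (attr_src, attr_dst, gene_src, gene_dst).

    Genes act as ports: the same attribute pair can appear many times,
    once per gene-level context. Unannotated genes contribute nothing."""
    multi_edges = []
    for u, v in gene_edges:
        for a in sorted(annotation.get(u, ())):
            for b in sorted(annotation.get(v, ())):
                multi_edges.append((a, b, u, v))
    return multi_edges


gene_edges = [("g1", "g2"), ("g3", "g2"), ("g2", "g4")]
annotation = {"g1": {"T1"}, "g2": {"T2", "T3"}, "g3": {"T1"}, "g4": {"T3"}}
edges = attribute_multigraph(gene_edges, annotation)

# Multiplicity of each attribute-level edge, i.e. how many gene contexts support it.
multiplicity = Counter((a, b) for a, b, _, _ in edges)
print(multiplicity)
```

Keeping the gene pair on each edge is what lets a tool move back from an attribute-level pattern to its gene-level occurrences.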
8
FREQUENCY OF A MULTIPATH
A pathway of functional attributes occurs in various contexts in the gene network; it corresponds to a multipath in the functional attribute network. The frequency of the multipath is 4 in the example on the left and 0 in the example on the right.
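Counting the frequency of a multipath can be sketched with dynamic programming over path positions. The sketch assumes that the frequency of T1 → … → Tk is the number of directed gene paths g1 → … → gk in which gene gi carries attribute Ti; repeated genes are allowed here, which is a simplifying assumption. All names are hypothetical.

```python
from collections import defaultdict


def multipath_frequency(gene_edges, annotation, attrs):
    """Count gene paths g_1 -> ... -> g_k whose i-th gene carries attrs[i].

    counts[g] holds the number of valid partial paths ending at gene g;
    each step extends every partial path along outgoing gene edges."""
    succ = defaultdict(list)
    for u, v in gene_edges:
        succ[u].append(v)
    counts = {g: 1 for g, a in annotation.items() if attrs[0] in a}
    for attr in attrs[1:]:
        nxt = defaultdict(int)
        for g, c in counts.items():
            for h in succ[g]:
                if attr in annotation.get(h, ()):
                    nxt[h] += c
        counts = nxt
    return sum(counts.values())


gene_edges = [("g1", "g2"), ("g3", "g2"), ("g2", "g4"), ("g2", "g5")]
annotation = {"g1": {"T1"}, "g3": {"T1"}, "g2": {"T2"}, "g4": {"T3"}, "g5": {"T3"}}
print(multipath_frequency(gene_edges, annotation, ["T1", "T2", "T3"]))
```

In this toy network the multipath T1 → T2 → T3 has four gene-level occurrences (two choices of source times two choices of target), while T3 → T2 has none.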
9
SIGNIFICANCE OF A PATHWAY
We want to identify multipaths with unusual frequency These might correspond to modular pathways Frequency alone is not a good measure of statistical significance The distribution of functional attributes among genes is highly skewed The degree distribution in the gene network is highly skewed Pathways that contain common functional attributes have high frequency, but they are not necessarily interesting
10
STATISTICAL INTERPRETABILITY
Additional positive observation => increased significance. Additional negative observation => decreased significance. [Figure: pattern A extended to B, with P(B) < P(A), and to B′, with P(B′) > P(A)] Frequency is not statistically interpretable!
11
MONOTONICITY Frequency is a monotonic measure
If a pathway is frequent, then all of its sub-paths are frequent. Algorithmic advantage: all frequent patterns can be enumerated in a bottom-up fashion, as is commonly exploited in traditional data mining applications. Statistically interpretable measures are not monotonic! Statistical significance fluctuates in the search space, so existing data mining algorithms do not apply. Significance of pathways is non-monotonic in two dimensions.
12
GO HIERARCHY
Functional attributes are organized hierarchically: “regulation of steroid biosynthetic process” is a “regulation of steroid metabolic process” and is part of “steroid biosynthetic process”. Interpretable statistical measures are not monotonic with respect to the GO hierarchy. [Figure: pathway p-values over genes g1 to g5 compared across levels of the GO hierarchy]
13
PATHWAY LENGTH [Figure: pathway p-values compared across pathway lengths; extending a pathway can either raise or lower its p-value] Open problems
How can we effectively search the pathway space, where significance fluctuates? How can we find the optimal resolution in functional attribute space?
14
STATISTICAL MODEL
Emphasize modularity of pathways: condition on the frequency of building blocks! We denote each frequency random variable by $N$ and its realization by $n$. Significance of pathway $\pi_{123}$:
$p_{123} = P(N_{123} \ge n_{123} \mid N_{12} = n_{12}, N_{23} = n_{23}, N_1 = n_1, N_2 = n_2, N_3 = n_3)$
[Figure: pathway $\pi_{123}$ and its building blocks, with frequencies $N_1, N_2, N_3, N_{12}, N_{23}, N_{123}$]
15
SIGNIFICANCE OF A PATHWAY
Assume that regulatory interactions are independent. There are $n_{12} n_{23}$ pairs formed by an occurrence of $\pi_{12}$ and an occurrence of $\pi_{23}$. The probability that a given pair goes through the same gene is $1/n_2$. For $t > q$, the probability that at least $n_{123}$ of the $n_{12} n_{23}$ pairs of edges go through the same gene can be bounded by $p_{123} \le \exp(n_{12} n_{23} H_q(t))$, where $q = 1/n_2$, $t = n_{123} / (n_{12} n_{23})$, and $H_q(t) = t \log(q/t) + (1-t) \log((1-q)/(1-t))$ is the weighted entropy of $t$ with respect to $q$. The bound can be generalized to pathways of arbitrary length.
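Under the independence assumption, the number of the $n_{12} n_{23}$ edge pairs that share their middle gene behaves like a Binomial($n_{12} n_{23}$, $1/n_2$) variable, and the bound above is the Chernoff bound on its upper tail ($H_q(t)$ is the negative Kullback-Leibler divergence of $t$ from $q$). The Python sketch below evaluates the bound and, for comparison, the exact binomial tail on small hypothetical counts; the function names are illustrative.

```python
import math


def entropy_bound(n12, n23, n2, n123):
    """Upper bound exp(n12*n23*H_q(t)) on p123, with q = 1/n2 and t = n123/(n12*n23).

    Returns 1.0 when t <= q, where the bound is uninformative."""
    n_pairs = n12 * n23
    q = 1.0 / n2
    t = n123 / n_pairs
    if t <= q:
        return 1.0
    h = t * math.log(q / t)
    if t < 1.0:  # the (1 - t) term vanishes at t = 1
        h += (1 - t) * math.log((1 - q) / (1 - t))
    return math.exp(n_pairs * h)


def binomial_tail(n_pairs, q, m):
    """Exact P(X >= m) for X ~ Binomial(n_pairs, q), for comparison."""
    return sum(math.comb(n_pairs, k) * q**k * (1 - q) ** (n_pairs - k)
               for k in range(m, n_pairs + 1))


# Hypothetical counts: 3 x 4 = 12 edge pairs, 3 genes carry the middle attribute,
# and 8 of the pairs share their middle gene.
print(entropy_bound(3, 4, 3, 8))    # about 0.0625
print(binomial_tail(12, 1 / 3, 8))  # about 0.0188
```

The bound is looser than the exact tail but has a closed form, which is what makes it convenient for longer pathways and large counts.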
16
SIGNIFICANCE OF AN EDGE
A single regulatory interaction is the shortest pathway. Statistical significance is evaluated with respect to a baseline model: the number of edges leaving and entering each functional attribute is specified, and edges are assumed to be independent. The frequency of a regulatory interaction is then a hypergeometric random variable, and a similar bound can be derived for the p-value of a single regulatory interaction.
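One way to make this baseline model concrete, assumed here rather than taken from the slides: with $E$ edges in total, $out_i$ of them leaving attribute $i$ and $in_j$ entering attribute $j$, the number of $i \to j$ edges is modeled as Hypergeometric(population $E$, successes $in_j$, draws $out_i$). The Python sketch computes the exact upper-tail p-value instead of the bound mentioned on the slide; the function name and counts are hypothetical.

```python
from math import comb


def edge_pvalue(total_edges, out_i, in_j, n_ij):
    """P(X >= n_ij) for X ~ Hypergeometric(population=total_edges,
    successes=in_j, draws=out_i).

    math.comb returns 0 when k > n, which covers impossible configurations."""
    denom = comb(total_edges, out_i)
    upper = min(out_i, in_j)
    numer = sum(comb(in_j, k) * comb(total_edges - in_j, out_i - k)
                for k in range(n_ij, upper + 1))
    return numer / denom


# Hypothetical counts: 10 edges, 3 leave attribute i, 4 enter attribute j,
# and all 3 edges leaving i enter j.
print(edge_pvalue(10, 3, 4, 3))
```

Here the p-value is 4/120 = 1/30: only 4 of the 120 equally likely ways to choose 3 edge heads fall entirely inside attribute j.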
17
ALGORITHMIC ISSUES Significance is not monotonic
Need to enumerate all pathways? Strongly significant pathways: a pathway is strongly significant if all its building blocks are significant (defined recursively). This allows effective pruning of the search space. Shortcutting common functional attributes: transcription factors, DNA binding genes, etc. are responsible for mediating regulation. Shortcut these terms and consider the regulatory effect of different processes on each other directly.
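The recursive definition suggests a level-wise search in which a longer pathway is tested only if both of its building blocks survived the previous level, in the same way that $\pi_{123}$ is built from $\pi_{12}$ and $\pi_{23}$. The Python sketch below assumes that the building blocks of $(a_1, \dots, a_k)$ are its prefix $(a_1, \dots, a_{k-1})$ and suffix $(a_2, \dots, a_k)$, that length counts attributes, and that `is_significant` is a hypothetical callback standing in for the statistical test.

```python
from collections import defaultdict


def strongly_significant(edges, is_significant, max_length):
    """Level-wise enumeration of strongly significant pathways (tuples of attributes).

    Level 1 holds significant edges. A longer candidate p + (q[-1],) is formed only
    from kept pathways p and q with p[1:] == q[:-1], so both of its building blocks
    are already strongly significant; it is kept if it is itself significant."""
    level = {tuple(e) for e in edges if is_significant(tuple(e))}
    result = set(level)
    length = 2
    while level and length < max_length:
        by_prefix = defaultdict(list)
        for q in level:
            by_prefix[q[:-1]].append(q)
        nxt = set()
        for p in level:
            for q in by_prefix[p[1:]]:
                cand = p + q[-1:]
                if is_significant(cand):
                    nxt.add(cand)
        result |= nxt
        level = nxt
        length += 1
    return result


# Hypothetical test outcomes: ABCD is significant on its own, but its suffix BCD is not.
SIG = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "B", "C"), ("A", "B", "C", "D")}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
found = strongly_significant(edges, lambda p: p in SIG, max_length=4)
print(sorted(found))
```

In this toy run ABCD is never tested: its building block BCD fails, so the pruning discards it even though it would pass the test in isolation. That is the trade-off the strongly significant criterion makes to keep the search tractable.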
18
NARADA http://www.cs.purdue.edu/homes/jpandey/narada/
Software for identifying significant pathways. Queries: given functional attribute T, find all significant pathways that originate at T; given functional attribute T, find all significant pathways that terminate at T; given a sequence of functional attributes T1, T2, …, Tk, find all occurrences of the corresponding pathway. Identified pathways are displayed as a tree. The user can explore back and forth between the gene network and the functional attribute network.
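The tree display for the "originate at T" query can be sketched as a prefix tree over the identified pathways; running the same code on reversed pathways answers the "terminate at T" query. This is a hypothetical illustration, not NARADA's implementation, and the attribute labels are placeholders.

```python
def pathway_tree(pathways, origin):
    """Nest the pathways that start at `origin` into a prefix tree (dict of dicts)."""
    tree = {}
    for path in sorted(pathways):
        if path[0] != origin:
            continue
        node = tree
        for attr in path[1:]:
            node = node.setdefault(attr, {})
    return tree


def render(tree, depth=0):
    """Return indented lines, one per tree node, children in sorted order."""
    lines = []
    for attr in sorted(tree):
        lines.append("  " * depth + attr)
        lines.extend(render(tree[attr], depth + 1))
    return lines


pathways = [("T", "X"), ("T", "X", "Y"), ("T", "Z"), ("W", "T")]
print("\n".join(["T"] + render(pathway_tree(pathways, "T"), depth=1)))

# "Terminate at T": reverse each pathway, then reuse the same tree builder.
incoming = pathway_tree([p[::-1] for p in pathways], "T")
```

Sharing prefixes in a tree keeps long lists of overlapping pathways readable, which matters when many pathways branch from one attribute.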
19
RESULTS E. coli transcription network obtained from RegulonDB
3159 regulatory interactions between 1364 genes. Using Gene Ontology, 881 of these genes are mapped to 318 processes.
Pathway length: 2 | 3 | 4 | 5
All: 427 | 580 | 1401 | 942
Strongly significant: 208, 183, 142
Common terms shortcut: 184, 119, 1
20
MOLYBDATE ION TRANSPORT
Significant regulatory pathways that originate at molybdate ion transport Their occurrences in the gene network
21
WHAT IS SIGNIFICANT? Molybdate ion transport regulates various processes directly Mo-molybdopterin cofactor biosynthesis, oligopeptide transport, cytochrome complex assembly It regulates various other processes indirectly Through DNA-dependent regulation of transcription, two-component signal transduction system, nitrate assimilation Direct regulation of these mediator processes is not significant NARADA captures modularity of indirect regulation!
22
CONCLUSION Mapping gene regulatory networks to functional attribute space demonstrates great potential Abstract, unified understanding of regulatory systems Algorithmically, a wide range of new challenges How can we bound interpretable statistical measures? How can we handle hierarchy in functional attribute space? Discovering new information How can we project identified “canonical” patterns on other species to discover new regulatory relationships?