Proteome Network Evolution by Gene Duplication S. Cenk Şahinalp Simon Fraser University.


Acknowledgements
Colin Cooper (KCL), Joe Nadeau (CWRU), Petra Berenbrink (SFU), Gurkan Bebek (CWRU)
NSF, NSERC, CRC Programme, Charles Wang Foundation

Networks are found in biological systems of varying scales:
1. the evolutionary tree of life
2. ecological networks
3. the genetic control networks of organisms
4. the protein interaction network in cells
5. the metabolic network in cells
The time units range from millions of years at the top of this list to millionths of a second at the bottom.

Proteins in a cell
There are thousands of different active proteins in a cell, acting as:
- enzymes, catalysts of the chemical reactions of metabolism
- components of cellular machinery (e.g. ribosomes)
- regulators of gene expression
Certain proteins play specific roles in special cellular compartments. Others move from one compartment to another as "signals".

Protein Interactions
Proteins are produced and degraded all of the time.
- The rates at which these processes occur depend on what proteins are already present, how they interact with one another directly, and how they interact with genes (at the DNA or mRNA level).
Proteins that bind to DNA or RNA have a direct effect on the production or degradation of other proteins.
- One protein can speed up or slow down the rate of production of another by binding to the corresponding DNA or mRNA.

What is a proteome network?
It represents interactions between pairs of proteins as a binary relationship and forms a network in which:
- Vertex = protein
- Link = interaction
This establishes an ordinary graph of all proteins in an organism and all possible interactions between them.

MIPS Proteome Network Topology

PPI Database Sources
- DIP (Database of Interacting Proteins, UCLA)
- BIND (also includes interactions with other molecules)
- MIPS (Munich Information Center for Protein Sequences)
- others, including PROTEOME, PRONET, CURAGEN, PIM
See http://www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html

Complete Yeast Proteome Network

The yeast proteome network seems to reveal two basic graph-theoretic properties:
- The frequency of proteins having interactions with exactly d other proteins follows a power law: $f(d) \sim C\, d^{-\beta}$
- The network exhibits the small-world phenomenon: a small degree of separation between individuals
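A rough way to check the power-law claim on an interaction list is to tabulate the degree frequencies and fit a straight line on log-log axes. The sketch below assumes the network is available as a list of protein pairs (the name `edges` and both function names are hypothetical), and it uses a plain least-squares fit, which is only a first approximation to a proper power-law fit.

```python
import math
from collections import Counter

def degree_frequencies(edges):
    """Map each degree d to the fraction of nodes with that degree.

    Self-interactions and repeated pairs are ignored; nodes that appear in
    no pair are not counted at all.
    """
    pairs = {frozenset((u, v)) for u, v in edges if u != v}
    degree = Counter()
    for pair in pairs:
        for node in pair:
            degree[node] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {d: c / n for d, c in sorted(counts.items())}

def loglog_slope(freq):
    """Least-squares slope of log f(d) against log d.

    For f(d) ~ C * d**(-beta) the slope estimates -beta. Needs at least two
    distinct degrees, otherwise the denominator is zero.
    """
    xs = [math.log(d) for d in freq]
    ys = [math.log(f) for f in freq.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy input: a star with one hub and four leaves.
star = [(0, leaf) for leaf in range(1, 5)]
freq = degree_frequencies(star)
slope = loglog_slope(freq)
```

On real data the low-degree and high-degree ends of the plot are noisy, so the fitted slope should be read as indicative only.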

Degree Distribution of PPI Network of the yeast [Wagner], [Jeong et al.]

Small-world phenomena and power-law degree distributions are also observed in:
- communication networks
- web graphs
- research citation networks
- social networks
[Albert, Barabási & Jeong], [Broder et al.], [Faloutsos³]
Classical Erdős–Rényi type random graphs do not exhibit these properties. Links between pairs of a fixed set of nodes are picked uniformly, so:
- the maximum degree is logarithmic in the network size
- there are no hubs to make short connections between nodes
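The absence of hubs in a uniform random graph can be seen in a small simulation. The sketch below assumes a G(n, p) graph with link probability c / (n - 1), so that the mean degree is about c; the function name and the parameter `c` are illustrative choices, not part of the talk.

```python
import random

def erdos_renyi_degrees(n, c, seed=None):
    """Degrees of a G(n, p) random graph with p = c / (n - 1).

    Every unordered pair of nodes is linked independently with the same
    probability, so the expected degree of each node is c.
    """
    rng = random.Random(seed)
    p = c / (n - 1)
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                degree[u] += 1
                degree[v] += 1
    return degree

degrees = erdos_renyi_degrees(1000, 4, seed=0)
mean_degree = sum(degrees) / len(degrees)
max_degree = max(degrees)  # stays close to the mean: no hubs
```

Comparing `max_degree` across growing `n` shows it creeping up very slowly, in contrast to the polynomial growth expected under a power law.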

Preferential Attachment Model [Yule], [Simon]
Power-law graphs can be generated by an iterative process:
- Add one new node at a time.
- Connect the new node to existing ones independently: the probability that a node is connected to the new node is proportional to its degree [Bollobás et al.].
Such graphs also exhibit small-world phenomena [Barabási & Albert], [Bollobás & Riordan].
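The iterative process can be sketched as follows. Note the swap: instead of the independent per-node coin flips described above, this sketch uses the common simplification of drawing `m` targets from a list of edge endpoints, which also selects nodes with probability proportional to degree. The seed graph (two linked nodes), the parameter `m`, and the function name are assumptions made for illustration.

```python
import random

def preferential_attachment(n, m=1, seed=None):
    """Grow an undirected graph to n nodes by degree-proportional attachment.

    Returns an adjacency dict {node: set(neighbours)}. Each node appears in
    `endpoints` once per incident edge, so a uniform draw from that list
    picks a node with probability proportional to its degree.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}   # assumed seed graph: a single edge
    endpoints = [0, 1]
    for new in range(2, n):
        # Draw targets before the new node is added, so it cannot pick itself.
        targets = {rng.choice(endpoints) for _ in range(m)}
        adj[new] = set()
        for v in targets:
            adj[new].add(v)
            adj[v].add(new)
            endpoints.extend((new, v))
    return adj

graph = preferential_attachment(500, m=1, seed=1)
max_degree = max(len(neighbours) for neighbours in graph.values())
```

With `m = 1` every new node adds exactly one edge, so the result is a tree; larger `m` may add fewer than `m` edges when the same target is drawn twice.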

Proteome network modeling
The model should capture the underlying mechanisms that generate the network while satisfying its known mathematical properties:
- Ohno's model of genome growth by duplication
- Duplication-based graphs [Kleinberg et al.], [Kumar et al.], [Pastor-Satorras et al.], [Chung et al.]:
  - each iteration duplicates a randomly chosen vertex with all its links
  - it then independently deletes existing edges and inserts new ones
Analysis of the incoming degree distribution in directed graphs reveals a power law. Simulations on undirected networks exhibit power-law-like degree distributions.

Duplication Model
At each iteration t (= total number of nodes):
1. An existing vertex is chosen uniformly at random and is "duplicated" with all its links.
2. Mutations are emulated by
   a. deleting each link of the new vertex with probability q = 1 - p
   b. inserting an edge between the new node and every other node with probability r/t
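The two steps above can be sketched as a short simulation. The seed graph (two linked nodes), the function name, and the choice to let the new node also link to the vertex it was copied from in step 2b are assumptions of this sketch; the talk does not specify them.

```python
import random

def duplication_model(n, p, r, seed=None):
    """Grow an undirected graph to n nodes with the duplication model.

    p: probability that a copied link is kept (deleted with q = 1 - p)
    r: the new node links to each existing node with probability r / t
    Returns an adjacency dict {node: set(neighbours)}.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}   # assumed seed graph: a single edge
    for new in range(2, n):
        t = new                       # current number of nodes
        source = rng.randrange(t)     # step 1: uniform choice of a vertex
        # Step 2a: each copied link survives with probability p.
        kept = {v for v in adj[source] if rng.random() < p}
        # Step 2b: random extra links, each with probability r / t.
        extra = {v for v in range(t) if rng.random() < r / t}
        adj[new] = kept | extra
        for v in adj[new]:
            adj[v].add(new)
    return adj

graph = duplication_model(1000, p=0.5, r=0.1, seed=0)
singletons = sum(1 for nb in graph.values() if not nb)
```

Tracking `singletons` for growing `n` with `r = 0` is a quick way to observe how isolated nodes accumulate in this model.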

Degree distribution of the best fitting duplication model

Expected degree distribution
The iterative process gives difference equations based on both degree and time. With $F_t(d)$ the expected number of degree-d nodes at time t and q = 1 - p, for r = 0:
$$F_{t+1}(d) = F_t(d)\left(1 - \frac{pd}{t}\right) + F_t(d-1)\,\frac{p(d-1)}{t} + \frac{1}{t}\sum_{j \ge d} F_t(j)\binom{j}{d}\, p^{d} q^{j-d}$$
[Pastor-Satorras et al.]: approximate the difference equations by differential equations to come up with a power law with exponential cutoff: $F_t(d)/t = f(d) \sim C\, d^{-\gamma} \lambda^{d}$, for constants $\gamma > 0$ and $0 < \lambda < 1$.
- Underlying assumption: Pr[step t+1 generates a degree-d node] depends on $f_t(d+1)$ and $f_t(d)$ only.
[Chung et al.]: verify whether a power-law degree distribution is satisfied by the difference equations: $f_t(d) \sim C\, d^{-\beta}$ for sufficiently large t and d.
- Underlying assumption: $f_t(d)$ is independent of t for all d.
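The r = 0 difference equation can also be iterated numerically instead of being approximated. The sketch below assumes the process starts at t = 2 with two linked nodes, reads the sum as running over j ≥ d with the binomial coefficient "j choose d", and stores expected counts in a list indexed by degree; the function name is hypothetical.

```python
from math import comb

def expected_degree_counts(p, T):
    """Iterate the r = 0 difference equation from t = 2 up to t = T.

    Returns a list F where F[d] is the expected number of degree-d nodes
    once the graph has T nodes. Assumed start: one edge, two nodes.
    """
    q = 1.0 - p
    F = [0.0, 2.0]                      # t = 2: two nodes of degree 1
    for t in range(2, T):
        new = [0.0] * (len(F) + 1)      # the maximum degree can grow by one
        for d in range(len(new)):
            # Degree-d nodes that do not gain a link in this step.
            stay = F[d] * (1 - p * d / t) if d < len(F) else 0.0
            # Degree-(d-1) nodes that gain a link from the new node.
            grow = F[d - 1] * p * (d - 1) / t if d >= 1 else 0.0
            # The new node itself ends up with degree d.
            born = sum(F[j] * comb(j, d) * p**d * q**(j - d)
                       for j in range(d, len(F))) / t
            new[d] = stay + grow + born
        F = new
    return F

F = expected_degree_counts(0.5, 150)
singleton_fraction = F[0] / 150
```

Because the expected counts always sum to the current number of nodes, `sum(F)` is a cheap sanity check on the iteration.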

Counter evidence
- An $f_t(d)$ with exponential cutoff would result in a maximum degree of O(log t), as in Erdős–Rényi graphs. A power-law degree distribution implies a maximum degree on the order of $t^{p}$.
- $f_t(d)$ cannot be independent of t: for r = 0 and p = 0.5, the fraction of singletons approaches 1 as t grows.
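Both points can be made plausible with a short calculation, sketched here under the assumption that r = 0 and that the difference equation for expected counts holds as stated in the talk. A node v gains a link only when one of its neighbours is duplicated (probability deg/t) and the copied link is kept (probability p); evaluating the difference equation at d = 0 gives the singleton count.

```latex
% Expected degree of a fixed node v, present since time t_0 (assumes r = 0):
\mathbb{E}\left[\deg_{t+1}(v) \mid \deg_t(v)\right] = \deg_t(v)\left(1 + \frac{p}{t}\right)
\quad\Longrightarrow\quad
\mathbb{E}\left[\deg_t(v)\right]
  = \deg_{t_0}(v) \prod_{s=t_0}^{t-1}\left(1 + \frac{p}{s}\right)
  = \deg_{t_0}(v)\,\frac{\Gamma(t+p)\,\Gamma(t_0)}{\Gamma(t)\,\Gamma(t_0+p)}
  = \Theta\!\left(t^{p}\right).

% Singletons: the difference equation at d = 0, with f_t(0) = F_t(0)/t:
F_{t+1}(0) = F_t(0) + \frac{1}{t}\sum_{j \ge 0} F_t(j)\, q^{j}
  \;\ge\; F_t(0)\left(1 + \frac{1}{t}\right)
\quad\Longrightarrow\quad
f_{t+1}(0) \;\ge\; f_t(0).
```

The first line shows polynomial rather than logarithmic degree growth for early nodes; the second shows only that the expected singleton fraction never decreases, which is consistent with, but weaker than, the claim that it approaches 1.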

What to do with singletons
- Allow them to exist and duplicate (and let them dominate).
- Allow them to exist but not duplicate (this yields a fixed fraction, which does not agree with the yeast network).
- Do not allow them to exist:
  - either delete a singleton as soon as it is created,
  - or always have one default connection to one of the existing nodes, adding a fourth term to the difference equation:
$$F_{t+1}(d) = F_t(d)\left(1 - \frac{pd}{t}\right) + F_t(d-1)\,\frac{p(d-1)}{t} + \frac{1}{t}\sum_{j \ge d} F_t(j)\binom{j}{d}\, p^{d} q^{j-d} + \bigl(F_t(d-1) - F_t(d)\bigr)\,\frac{1}{t^{2}}\sum_{j > 0} F_t(j)\, q^{j}$$
which gives a power law $f(d) \sim C\, d^{-\beta}$ if $1 = p\beta - p + p^{\beta - 1}$.
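Assuming the power-law condition has the form 1 = pβ - p + p^(β-1), with β the power-law exponent (the symbols are garbled in the transcript, so this reading is an assumption), the exponent can be found numerically for a given p. β = 1 always satisfies the equation trivially, so the sketch below looks for the root with β > 1, which exists only when p + ln p < 0. The function name is hypothetical.

```python
import math

def power_law_exponent(p, tol=1e-12):
    """Root beta > 1 of 1 = p*beta - p + p**(beta - 1), or None if absent.

    g(beta) = p*beta - p + p**(beta - 1) - 1 is convex with g(1) = 0, so a
    second root above 1 exists exactly when g'(1) = p + ln(p) is negative.
    """
    if not 0.0 < p < 1.0 or p + math.log(p) >= 0.0:
        return None

    def g(beta):
        return p * beta - p + p ** (beta - 1) - 1.0

    # Start at the minimiser of g, where g is guaranteed to be negative.
    lo = 1.0 + math.log(-p / math.log(p)) / math.log(p)
    hi = lo + 1.0
    while g(hi) <= 0.0:          # g grows linearly, so this terminates
        hi *= 2.0
    while hi - lo > tol:         # plain bisection on [lo, hi]
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

beta_half = power_law_exponent(0.5)
```

For p = 0.5 the equation is satisfied exactly by β = 2, which gives a quick check on the solver.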

Other properties: k-reachability
$r_k(n)$: the number of nodes that are at most k hops away from n.
Example (from the slide's figure): $r_1(n_1) = 5$, $r_2(n_1) = 9$, $r_3(n_1) = 10$
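k-reachability can be computed with a breadth-first search that stops expanding at depth k. The sketch below assumes the graph is given as an adjacency mapping and that the start node itself is not counted; that convention and the function name are assumptions, not something the slide states.

```python
from collections import deque

def k_reachability(adj, start, k):
    """Number of nodes at most k hops away from start (start excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue              # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist) - 1

# Toy input: a path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
reach = [k_reachability(path, 2, k) for k in range(4)]
```

Running the function for every node and a few values of k gives the per-node reachability profiles that the following slides plot.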

k-reachability of individual nodes (nodes sorted by degree)

Average k-reachability of nodes with fixed initial degree
The average degree distribution of the neighbors is independent of the degree of the original node. However, the variance is high.

Sequence Homology
Two proteins are considered sequence-wise homologous if their pairwise cDNA alignment yields 50% similarity or more.
The total number of protein pairs, as a function of percentage similarity, follows a dual-phase distribution.
cDNA sequence source: ftp://genome-ftp.stanford.edu/pub/yeast

Sequence homology vs interactions
- If $p_1$ and $p_2$ are similar and $p_2$ and $p_3$ interact, then $p_1$ and $p_3$ interact with probability 0.03.
- If $p_1$ and $p_2$ are similar and $p_2$ and $p_3$ are similar, then $p_1$ and $p_3$ are similar with probability 0.64.
- If two genes physically interact with each other, it is very likely that they are not similar (excluding self-interactions).

Sequence Homology Enhanced Duplication Model
Given the duplicate node i:
- Each interaction edge (i,j) is deleted with probability q.
  - For each similarity edge (j,k), with probability 0.03 the interaction edge (i,k) is deleted.
- Each similarity edge (i,j) is deleted with probability q'.
  - For each similarity edge (j,k), with probability 0.64 the similarity edge (i,k) is deleted.
  - For each interaction edge (j,k), with probability 0.03 the interaction edge (i,k) is deleted.
- For each j, a new interaction edge (i,j) is added with probability r/t.
  - For each similarity edge (j,k), with probability 0.03 the interaction edge (i,k) is added.
- A new similarity edge (i,j) is added with probability r'/t.
  - For each similarity edge (j,k), with probability 0.64 the similarity edge (i,k) is added.
  - For each interaction edge (j,k), with probability 0.03 the interaction edge (i,k) is added.

Degree Distribution of the Enhanced Model

k-reachability of individual nodes (nodes sorted by degree)
