Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.

2 Content
How to characterize a biological network?
– Graph theory, topological parameters (node degree, average path length, clustering coefficient, and node degree correlation)
– Random graph, scale-free network, hierarchical network
Search algorithms
– Breadth-first search, depth-first search

3 Biological Networks - metabolic networks Metabolism is the most basic network of biochemical reactions, which generate energy for driving various cell processes, and degrade and synthesize many different bio-molecules.

4 Biological Networks - Protein-protein interaction network (PIN) Proteins perform distinct and well-defined functions, but little is known about how interactions among them are structured at the cellular level. Protein-protein interactions account for binding interactions and the formation of protein complexes. - Experiment: yeast two-hybrid method, or co-immunoprecipitation (www.utoronto.ca/boonelab/proteomics.htm) - Limitation: no subcellular location or temporal information. - Cliques – protein complexes?

5 Biological Networks - PIN Yeast protein-protein interaction network - protein-protein interactions are not random - highly connected proteins are unlikely to interact with each other. Not a random network - Data from the high-throughput two-hybrid experiment (T. Ito, et al. PNAS (2001)) - The full set contains 4549 interactions among 3278 yeast proteins - 87% of the nodes are in the largest component - k_max ~ 285! - Figure shows nuclear proteins only

6 Biological Networks – Gene regulation networks Example of a genetic regulatory network of two genes (a and b), each coding for a regulatory protein (A and B). In a gene regulatory network, the protein encoded by a gene can regulate the expression of other genes, for instance, by activating or inhibiting DNA transcription. These genes in turn produce new regulatory proteins that control other genes.

7 Biological Networks – Gene regulation networks
Transcription regulatory network in yeast
– From the YPD database: 1276 regulations among 682 proteins by 125 transcription factors (~10 regulated genes per TF)
– Part of a bigger genetic regulatory network of 1772 regulations among 908 proteins
Transcription regulatory network in H. sapiens
– Data courtesy of Ariadne Genomics, obtained from a literature search: 1449 regulations among 689 proteins
Transcription regulatory network in E. coli
– Data (courtesy of Uri Alon) curated from the Regulon database: 606 interactions between 424 operons (by 116 TFs)

8 Graph Theory – Basic concepts
Different systems → different graphs
– Graph: $G = (N, E)$, $N = \{n_1, n_2, \ldots, n_N\}$, $E = \{e_1, e_2, \ldots, e_M\}$, $e_k = \{n_i, n_j\}$. Nodes: proteins. Edges: protein interactions.
– Multigraph: $e_k = \{n_i, n_j\}$ plus duplicate edges, i.e. $e_m = \{n_i, n_j\}$. Nodes: proteins. Edges: interactions of different sorts, e.g. binding and similarity.
– Hypergraph: hyperedge $e_x = \{n_i, n_j, n_k, \ldots\}$. Nodes: proteins. Edges: protein complexes.
– Directed hypergraph: hyperedge $e_x = \{n_i, n_j, \ldots \mid n_k, n_l, \ldots\}$. Nodes: substances. Edges: chemical reactions, e.g. A + B → C + D gives $e_X = \{A, B, \ldots \mid C, D, \ldots\}$.
– Directed graph: $e_k = (n_i, n_j)$, an ordered pair. Nodes: genes and their products. Edges from A to B: gene regulation, i.e. gene A regulates the expression of gene B.
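As a rough illustration of how the graph types on this slide differ as data structures, the following Python sketch encodes one edge of each kind. All node names, labels, and variable names are hypothetical and chosen only for this example.

```python
# Hypothetical encodings of the edge types listed on the slide.

# Simple undirected graph: each edge is an unordered pair of nodes.
graph_edges = {frozenset({"n1", "n2"}), frozenset({"n2", "n3"})}

# Multigraph: the same node pair may appear more than once, here
# distinguished by an interaction type (binding vs. similarity).
multigraph_edges = [
    ("n1", "n2", "binding"),
    ("n1", "n2", "similarity"),
]

# Hypergraph: one hyperedge joins any number of nodes (a protein complex).
hyperedge = frozenset({"n1", "n2", "n3"})

# Directed hypergraph: a reaction A + B -> C + D as (substrates, products).
reaction = (frozenset({"A", "B"}), frozenset({"C", "D"}))

# Directed graph: an ordered pair, gene A regulates gene B.
regulation = ("geneA", "geneB")

# Duplicate edges collapse in a simple graph but not in a multigraph.
pairs_in_multigraph = [frozenset(e[:2]) for e in multigraph_edges]
```

Here `pairs_in_multigraph` holds the same node pair twice, whereas converting it to a set would leave a single edge, which is the distinction between a multigraph and a simple graph.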

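9 Graph Theory – Basic concepts
– Node degree
– Components
– Complete graph (clique)
– Shortest path length
– Clustering coefficient $C_i$: if A-B and B-C, then it is highly probable that A-C
Two ways to compute $C_i$:
– $E_i$ actual connections among the $k_i$ neighbors of node $i$, out of $C^{k_i}_{2}$ possible connections: $C_i = E_i / C^{k_i}_{2} = 2E_i / (k_i(k_i-1))$
– the number of triangles that include node $i$, divided by $k_i(k_i-1)/2$
– Average clustering coefficient: $\langle C \rangle = \frac{1}{N} \sum_i C_i$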
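The clustering coefficient defined above can be sketched in a few lines of Python. The toy graph (a triangle A-B-C with an extra node D attached to C) and the function names are assumptions made for illustration; nodes with fewer than two neighbors are given $C_i = 0$ here, which is a common convention rather than something stated on the slide.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """C_i = 2 * E_i / (k_i * (k_i - 1)), where E_i counts the links
    that actually exist among the k_i neighbors of node i."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed for nodes with degree 0 or 1
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


# Hypothetical toy graph: triangle A-B-C plus a pendant node D on C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

c_values = {node: clustering_coefficient(toy, node) for node in toy}
```

For this toy graph, A and B each get $C_i = 1$, C gets $1/3$ (one link among three neighbors), D gets 0, and the average is $7/12$.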

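10 Graph Theory – Vertex adjacency matrix
[Figure: example graph with nodes 1, 2, 3, 4 and its vertex adjacency matrix; panels for an undirected graph and a bipartite graph]
– ∞ means not directly connected
– connectivity of node $i$: $k_i$ = count over $j$ of ($m_{ij} = 1$); in the example, $k_i$ = 1, 3, 1, 1 for nodes 1 to 4
– the vertex adjacency matrix of an undirected graph is symmetric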
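A minimal sketch of the vertex adjacency matrix and the degree count follows. The edge list (1-2, 2-3, 2-4) is an assumption: the figure is not reproduced in this transcript, but a star centered on node 2 is consistent with the degrees 1, 3, 1, 1 listed on the slide. Setting the diagonal to 0 is also an assumption.

```python
import math

INF = math.inf  # marks "not directly connected", as on the slide


def vertex_adjacency(n, edges):
    """Build a symmetric n x n matrix with 1 for an edge and INF otherwise."""
    M = [[INF] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = 0  # assumed convention for the diagonal
    for i, j in edges:  # nodes are numbered from 1, as in the figure
        M[i - 1][j - 1] = 1
        M[j - 1][i - 1] = 1
    return M


def degrees(M):
    """k_i = number of entries equal to 1 in row i."""
    return [sum(1 for m in row if m == 1) for row in M]


def is_symmetric(M):
    n = len(M)
    return all(M[i][j] == M[j][i] for i in range(n) for j in range(n))


# Assumed example graph: star with center 2.
star_edges = [(1, 2), (2, 3), (2, 4)]
M = vertex_adjacency(4, star_edges)
k = degrees(M)
```

With these assumed edges, `k` reproduces the degree list 1, 3, 1, 1 and `is_symmetric(M)` holds, as expected for an undirected graph.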

11 Graph Theory – Edge adjacency matrix
[Figure: graph G with nodes 1, 2, 3, 4 and edges a, b, c, d; its line graph L(G) with vertices a, b, c, d; the symmetric edge adjacency matrix with rows and columns a, b, c, d]
The edge adjacency matrix (E) of a graph G is identical to the vertex adjacency matrix (A) of the line graph of G, L(G): A(L(G)) = E(G). That is, the edges in G are replaced by vertices in L(G), and two vertices in L(G) are connected whenever the corresponding edges in G are adjacent. Different labelings of the same graph G are related by a similarity transformation, $P^{-1} A(G_1) P = A(G_2)$.
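The construction of the edge adjacency matrix can be sketched as follows. Because the figure is missing from this transcript, the assignment of the labels a, b, c, d to node pairs is hypothetical; only the rule (two edges are adjacent when they share a node) comes from the slide.

```python
def edge_adjacency(edges):
    """Return (labels, E) where E[p][q] = 1 if edges p and q share a node.

    E is also the vertex adjacency matrix of the line graph L(G).
    """
    labels = sorted(edges)
    size = len(labels)
    E = [[0] * size for _ in range(size)]
    for p, ep in enumerate(labels):
        for q, eq in enumerate(labels):
            if p != q and set(edges[ep]) & set(edges[eq]):
                E[p][q] = 1
    return labels, E


# Hypothetical labeling of four edges on nodes 1..4.
G = {"a": (1, 2), "b": (2, 3), "c": (2, 4), "d": (3, 4)}
labels, E = edge_adjacency(G)
```

In this hypothetical graph, edges a and d have no node in common, so they are the only pair of vertices left unconnected in L(G).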

12 Graph Theory – average network distance
Interaction path length or average network distance, d
– the average of the distances between all pairs of nodes
– frequency of the shortest interaction path length, f(L)
– determined by using Floyd's algorithm
The average network distance d is given by $d = \sum_L L\, f(L) / \sum_L f(L)$, where L is the shortest path length between two nodes.
Network diameter (global) vs. average network distance (local)
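The following sketch computes f(L) and the average network distance as the f(L)-weighted mean of L. Two things are assumptions here: the example graph (a star with edges 1-2, 2-3, 2-4) and the use of breadth-first search instead of Floyd's algorithm to obtain the shortest path lengths, which gives the same distances on an unweighted graph.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_frequency(adj):
    """f(L): number of unordered node pairs whose shortest path length is L."""
    f = Counter()
    nodes = sorted(adj)
    for idx, s in enumerate(nodes):
        dist = bfs_distances(adj, s)
        for t in nodes[idx + 1:]:
            if t in dist:  # unreachable pairs are skipped
                f[dist[t]] += 1
    return f


def average_distance(f):
    """d = sum(L * f(L)) / sum(f(L))."""
    return sum(L * count for L, count in f.items()) / sum(f.values())


# Assumed example: star graph centered on node 2.
star = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
f = path_length_frequency(star)
d = average_distance(f)
diameter = max(f)  # the longest shortest path
```

For the star graph, three pairs are at distance 1 and three at distance 2, so d = 1.5 while the diameter is 2.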

13 Graph Theory – the shortest path
The shortest path: Floyd's algorithm, an $O(N^3)$ algorithm.
– At iteration n, given three nodes i, j and k, check whether it is shorter to reach j from i by passing through k: $M^{n}_{ij} = \min\{M^{n-1}_{ij},\ M^{n-1}_{ik} + M^{n-1}_{kj}\}$
– This searches over all possible paths, e.g. 1-2, 1-2-3, 1-2-4, 2-3, 2-4
[Figure: example graph with nodes 1, 2, 3, 4; schematic of nodes i, k, j]
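The update rule above translates almost directly into code. This is a minimal sketch of Floyd's algorithm on an assumed example (a star graph with edges 1-2, 2-3, 2-4, which matches the paths listed on the slide); the function name and the 0 diagonal are choices made for this illustration.

```python
import math

INF = math.inf


def floyd(M):
    """All-pairs shortest path lengths.

    M[i][j] is the direct distance (INF if not directly connected).
    Each pass over k applies M_ij = min(M_ij, M_ik + M_kj).
    """
    n = len(M)
    D = [row[:] for row in M]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = D[i][k] + D[k][j]
                if via_k < D[i][j]:
                    D[i][j] = via_k
    return D


# Assumed star graph: node 2 (index 1) is linked to nodes 1, 3 and 4.
M = [
    [0, 1, INF, INF],
    [1, 0, 1, 1],
    [INF, 1, 0, INF],
    [INF, 1, INF, 0],
]
D = floyd(M)
```

After the run, every pair that was not directly connected has distance 2, reached through node 2 (for example 1-2-3 and 1-2-4). The three nested loops over N nodes give the $O(N^3)$ cost.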

14 Random Graph Theory = Graph Theory + Probability

15 Random Graph Theory = Graph Theory + Probability

16 Random Graph Theory = Graph Theory + Probability
Random graph (Erdős and Rényi, 1960): N labeled nodes connected by n edges
– $C^{N}_{2} = N(N-1)/2$ possible edges
– $C^{N(N-1)/2}_{n}$ possible graphs with N nodes and n edges
Example: N = 4 gives 6 possible edges and $C^{6}_{n}$ possible graphs
n    Number of possible graphs, $C^{6}_{n}$
1    6
2    15
3    20
4    15
5    6
6    1
[Figure: example graphs for N = 4 with n = 3, 3, 4, 4, 5 and 6 edges]
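The counts in the table can be reproduced with binomial coefficients, and one random graph with N nodes and n edges can be drawn by sampling n of the possible edges without replacement. This is a sketch; the function names and the `seed` parameter are assumptions for illustration.

```python
import math
import random
from itertools import combinations


def count_graphs(N, n):
    """Number of labeled graphs with N nodes and n edges: C(N(N-1)/2, n)."""
    return math.comb(N * (N - 1) // 2, n)


def random_graph(N, n, seed=None):
    """Pick n distinct edges uniformly from the N(N-1)/2 possible ones."""
    rng = random.Random(seed)
    possible_edges = list(combinations(range(1, N + 1), 2))
    return rng.sample(possible_edges, n)


counts = [count_graphs(4, n) for n in range(1, 7)]
sample = random_graph(4, 3, seed=0)
```

For N = 4, `counts` matches the table (6, 15, 20, 15, 6, 1), and `sample` is one of the 20 possible graphs with three edges.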

17 Search Algorithms
– Find the shortest route, in terms of distance, between nodes S and G.
– A matrix representation of the graph in Figure 3.1

18 Search Algorithms – Breadth-first search (BFS)
– Nodes are expanded in the order in which they are generated. S is expanded into A, B and C, which are generated in the order 1, 2 and 3.
– A is expanded first, into B, C and D, which have generation orders 4, 5 and 6.
– BFS then goes back to node B and expands it into A, C and E (generation orders 7, 8 and 9), and then goes back to node 3 (C) and expands it into A, B, D, E and F (generation orders 10, 11, 12, 13 and 14).
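A minimal breadth-first search sketch follows. The links among S and A to F are taken from the expansions described above; the link from F to G is purely an assumption, since Figure 3.1 is not reproduced here. Unlike the slide's walk-through, which generates already-seen nodes again, this version keeps a record of visited nodes, and it ignores edge distances, so it finds the route with the fewest edges rather than the shortest distance.

```python
from collections import deque


def bfs_path(adj, start, goal):
    """Return (path, expansion_order) using a first-in, first-out queue."""
    parent = {start: None}  # also serves as the set of generated nodes
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        if node == goal:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if goal not in parent:
        return None, order
    path = []
    node = goal
    while node is not None:  # walk back from goal to start
        path.append(node)
        node = parent[node]
    return path[::-1], order


adj = {
    "S": ["A", "B", "C"],
    "A": ["S", "B", "C", "D"],
    "B": ["S", "A", "C", "E"],
    "C": ["S", "A", "B", "D", "E", "F"],
    "D": ["A", "C"],
    "E": ["B", "C"],
    "F": ["C", "G"],  # the F-G link is an assumption
    "G": ["F"],
}
path, order = bfs_path(adj, "S", "G")
```

With this adjacency list the nodes are expanded level by level (S, then A, B, C, then D, E, F, then G), and the path found is S, C, F, G.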

19 Search Algorithms – Depth-first search (DFS)
– Begin from the root node of the tree.
– Visit the first unvisited node and mark it as visited.
– Then find the next unvisited node and mark it as visited.
– When all neighboring nodes have already been visited, go back to the parent node.
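The steps above map naturally onto a recursive function, where returning from a call corresponds to going back to the parent node. In this sketch the node names A to E are borrowed from the next slide, but the edges between them are hypothetical.

```python
def dfs(adj, node, visited=None):
    """Return the nodes in the order a depth-first search visits them."""
    if visited is None:
        visited = []
    visited.append(node)  # mark the node as visited
    for nxt in adj[node]:
        if nxt not in visited:
            dfs(adj, nxt, visited)  # go deeper before trying siblings
    return visited  # all neighbors done: back to the parent


# Hypothetical tree: A is the root, B and C its children, D and E under B.
tree = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A"],
    "D": ["B"],
    "E": ["B"],
}
visit_order = dfs(tree, "A")
```

For this tree the search goes down through B to D, backs up to B to reach E, and only then returns to A to visit C, giving the order A, B, D, E, C.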

20 Search Algorithms – Depth-first search (DFS)
[Figure: example graph with nodes A, B, C, D, E]
