1
Graph, Search Algorithms
Ka-Lok Ng
Department of Bioinformatics, Asia University
2
Content
How to characterize a biological network?
– Graph theory, topological parameters (node degree, average path length, clustering coefficient, and node degree correlation)
– Random graph, scale-free network, hierarchical network
Search algorithms
– Breadth-first search, depth-first search
3
Biological Networks - metabolic networks
Metabolism is the most basic network of biochemical reactions, which generate energy for driving various cell processes, and degrade and synthesize many different biomolecules.
4
Biological Networks - Protein-protein interaction network (PIN)
Proteins perform distinct and well-defined functions, but little is known about how interactions among them are structured at the cellular level. Protein-protein interactions account for binding interactions and the formation of protein complexes.
- Experiment: yeast two-hybrid method, or co-immunoprecipitation (www.utoronto.ca/boonelab/proteomics.htm)
- Limitation: no subcellular location or temporal information
- Cliques: protein complexes?
5
Biological Networks - PIN
Yeast protein-protein interaction network
- Protein-protein interactions are not random: highly connected proteins are unlikely to interact with each other, so this is not a random network
- Data from the high-throughput two-hybrid experiment (T. Ito et al., PNAS (2001))
- The full set contains 4549 interactions among 3278 yeast proteins
- 87% of the nodes are in the largest component
- $k_{max} \sim 285$
- The figure shows nuclear proteins only
6
Biological Networks – Gene regulation networks
Example of a genetic regulatory network of two genes (a and b), each coding for a regulatory protein (A and B). In a gene regulatory network, the protein encoded by a gene can regulate the expression of other genes, for instance by activating or inhibiting DNA transcription. These genes in turn produce new regulatory proteins that control other genes.
7
Biological Networks – Gene regulation networks
Transcription regulatory network in yeast
- From the YPD database: 1276 regulations among 682 proteins by 125 transcription factors (~10 regulated genes per TF)
- Part of a bigger genetic regulatory network of 1772 regulations among 908 proteins
Transcription regulatory network in H. sapiens
- Data courtesy of Ariadne Genomics, obtained from a literature search: 1449 regulations among 689 proteins
Transcription regulatory network in E. coli
- Data (courtesy of Uri Alon) curated from the Regulon database: 606 interactions between 424 operons (by 116 TFs)
8
Graph Theory – Basic concepts
Different systems, different graphs:
- Graph: $G = (N, E)$, $N = \{n_1, n_2, \ldots, n_N\}$, $E = \{e_1, e_2, \ldots, e_M\}$, $e_k = \{n_i, n_j\}$. Nodes: proteins. Edges: protein interactions.
- Multigraph: $e_k = \{n_i, n_j\}$ plus duplicate edges, i.e. $e_m = \{n_i, n_j\}$. Nodes: proteins. Edges: interactions of different sorts (binding and similarity).
- Hypergraph: hyperedge $e_x = \{n_i, n_j, n_k, \ldots\}$. Nodes: proteins. Edges: protein complexes.
- Directed hypergraph: hyperedge $e_x = \{n_i, n_j, \ldots \mid n_k, n_l, \ldots\}$. Nodes: substances. Edges: chemical reactions, e.g. $A + B \rightarrow C + D$ gives $e_X = \{A, B, \ldots \mid C, D, \ldots\}$.
- Directed graph: $e_k = (n_i, n_j)$, an ordered pair. Nodes: genes and their products. Edge from A to B: gene regulation (gene A regulates the expression of gene B).
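The graph types listed on this slide can be written down as plain Python data structures. This is only an illustrative sketch: the node names (`P1`, `geneA`, and so on) and the helper `degree` are hypothetical and do not come from the slides.

```python
# Hypothetical protein / gene / substance names, used only for illustration.

# Simple graph: each edge is an unordered pair of nodes.
graph_edges = {frozenset({"P1", "P2"}), frozenset({"P2", "P3"})}

# Multigraph: the same pair may appear more than once, so edges carry a label
# and are kept in a list instead of a set.
multigraph_edges = [
    (frozenset({"P1", "P2"}), "binding"),
    (frozenset({"P1", "P2"}), "similarity"),
]

# Hypergraph: a hyperedge may join any number of nodes (e.g. a protein complex).
hyperedges = [frozenset({"P1", "P2", "P3"})]

# Directed hypergraph: (inputs | outputs), e.g. the reaction A + B -> C + D.
directed_hyperedges = [(frozenset({"A", "B"}), frozenset({"C", "D"}))]

# Directed graph: each edge is an ordered pair (regulator, target).
directed_edges = [("geneA", "geneB")]


def degree(node, edges):
    """Number of simple-graph edges that contain the node."""
    return sum(1 for e in edges if node in e)


print(degree("P2", graph_edges))
```

Using `frozenset` for undirected edges and tuples for directed ones keeps the ordered/unordered distinction visible in the data itself.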
9
Graph Theory – Basic concepts
- Node degree
- Components
- Complete graph (clique)
- Shortest path length
- Clustering coefficient $C_i$: if A-B and B-C, then it is highly probable that A-C
Two ways to compute $C_i$:
- $E_i$ actual connections among the $k_i$ neighbours of node i, out of $C^{k_i}_2 = k_i(k_i-1)/2$ possible connections: $C_i = 2E_i / (k_i(k_i-1))$
- the number of triangles that include node i, divided by $k_i(k_i-1)/2$
Average clustering coefficient: the mean of $C_i$ over all nodes
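A minimal sketch of the clustering coefficient, computed as the number of links among the neighbours of node i divided by the $k_i(k_i-1)/2$ possible links. The example graph (a triangle A-B-C with a pendant node D) is hypothetical, and setting $C_i = 0$ for nodes with fewer than two neighbours is a convention assumed here, not something stated on the slide.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """C_i = E_i / C(k_i, 2), where E_i counts edges among the neighbours of i."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: C_i = 0 when k_i < 2
    links = sum(1 for u, v in combinations(sorted(neighbours), 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


# Hypothetical graph: triangle A-B-C plus a pendant node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

for node in adj:
    print(node, clustering_coefficient(adj, node))
print("average", average_clustering(adj))
```

Each link among the neighbours of i closes exactly one triangle through i, so the two counting methods on the slide give the same value.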
10
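Graph Theory – Vertex adjacency matrix
[Figure: example undirected graph with nodes 1, 2, 3, 4 and its vertex adjacency matrix]
- ∞ means not directly connected
- connectivity of node i: $k_i = \mathrm{count}_j(m_{ij} = 1)$
- for the example graph, $k_i = 1, 3, 1, 1$
Undirected graph: the matrix is symmetric. Bipartite graph.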
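The degree rule above can be checked on a small matrix. The sketch below assumes the example graph is the 4-node graph with edges 1-2, 2-3 and 2-4, which is consistent with the degrees 1, 3, 1, 1 but is not given explicitly in the extracted text; the zero diagonal is also an assumption.

```python
import math

INF = math.inf  # "not directly connected"

# Assumed example graph: nodes 1..4 with edges 1-2, 2-3, 2-4.
# Row/column index 0 corresponds to node 1, and so on.
M = [
    [0,   1,   INF, INF],
    [1,   0,   1,   1],
    [INF, 1,   0,   INF],
    [INF, 1,   INF, 0],
]

# k_i = number of entries equal to 1 in row i.
degrees = [sum(1 for m in row if m == 1) for row in M]

# For an undirected graph the matrix equals its transpose.
n = len(M)
is_symmetric = all(M[i][j] == M[j][i] for i in range(n) for j in range(n))

print(degrees, is_symmetric)
```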
11
Graph Theory – Edge adjacency matrix
[Figure: graph G with nodes 1, 2, 3, 4 and edges a, b, c, d; its line graph L(G); and the symmetric edge adjacency matrix with rows and columns a, b, c, d]
The edge adjacency matrix (E) of a graph G is identical to the vertex adjacency matrix (A) of the line graph of G, L(G): $A(L(G)) = E(G)$. That is, the edges of G are replaced by vertices in L(G). Two vertices in L(G) are connected whenever the corresponding edges in G are adjacent.
Different labelings of the same graph G are related by a similarity transformation, $P^{-1} A(G_1) P = A(G_2)$, where P is a permutation matrix.
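As a rough sketch of the rule "two edges are adjacent when they share an endpoint", the function below builds the edge adjacency matrix from an edge list. The graph used (edges a to d on nodes 1 to 4) is hypothetical and is not necessarily the one drawn in the figure; `edge_adjacency` is an illustrative name.

```python
def edge_adjacency(edges):
    """Return (labels, E) where E[r][c] = 1 if edges r and c share a node."""
    labels = sorted(edges)
    n = len(labels)
    E = [[0] * n for _ in range(n)]
    for r, a in enumerate(labels):
        for c, b in enumerate(labels):
            # An edge is not adjacent to itself, so the diagonal stays 0.
            if a != b and set(edges[a]) & set(edges[b]):
                E[r][c] = 1
    return labels, E


# Hypothetical graph G: edge label -> (endpoint, endpoint).
edges = {"a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (2, 4)}

labels, E = edge_adjacency(edges)
for label, row in zip(labels, E):
    print(label, row)
```

The resulting matrix is the vertex adjacency matrix of L(G) with the vertices of L(G) labelled a, b, c, d.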
12
Graph Theory – average network distance
Interaction path length or average network distance, d
- the average of the distances between all pairs of nodes
- $f(L)$: frequency of the shortest interaction path length L
- determined by using Floyd's algorithm
The average network diameter d is given by
$$d = \frac{\sum_L L \, f(L)}{\sum_L f(L)}$$
where L is the shortest path length between two nodes.
Network diameter (global); average network distance (local)
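A small sketch of the averaging step, assuming d is the frequency-weighted mean of the shortest path lengths, $d = \sum_L L f(L) / \sum_L f(L)$. The input list is hypothetical: it holds the six pairwise shortest path lengths of a 4-node graph with edges 1-2, 2-3 and 2-4.

```python
from collections import Counter


def average_distance(path_lengths):
    """Return (f, d): the frequency table f(L) and the weighted mean d."""
    f = Counter(path_lengths)
    total = sum(f.values())
    d = sum(L * count for L, count in f.items()) / total
    return f, d


# Hypothetical shortest path lengths, one per unordered node pair.
lengths = [1, 1, 1, 2, 2, 2]

f, d = average_distance(lengths)
print(dict(f), d)
```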
13
Graph Theory – the shortest path
The shortest path: Floyd algorithm, an $O(N^3)$ algorithm.
- For iteration n, given three nodes i, j and k, test whether it is shorter to reach j from i by passing through k:
  $M^{n}_{ij} = \min\{M^{n-1}_{ij},\ M^{n-1}_{ik} + M^{n-1}_{kj}\}$
- Search over all possible paths, e.g. 1-2, 1-2-3, 1-2-4, 2-3, 2-4
[Figure: example graph with nodes 1, 2, 3, 4, and a path from i to j passing through k]
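The update rule can be sketched in a few lines of Python. The input matrix assumes the 4-node example graph has edges 1-2, 2-3 and 2-4 (suggested by the example paths listed above, but an assumption nonetheless), with unit edge lengths and ∞ for unconnected pairs.

```python
import math

INF = math.inf


def floyd(M):
    """All-pairs shortest path lengths via M_ij = min(M_ij, M_ik + M_kj)."""
    n = len(M)
    D = [row[:] for row in M]  # work on a copy of the input matrix
    for k in range(n):          # intermediate node
        for i in range(n):      # start node
            for j in range(n):  # end node
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


# Assumed example: index 0 is node 1, index 1 is node 2, and so on.
M = [
    [0,   1,   INF, INF],
    [1,   0,   1,   1],
    [INF, 1,   0,   INF],
    [INF, 1,   INF, 0],
]

D = floyd(M)
for row in D:
    print(row)
print("longest shortest path:", max(max(row) for row in D))
```

The three nested loops over N nodes give the $O(N^3)$ running time.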
14
Random Graph Theory = Graph Theory + Probability
15
Random Graph Theory = Graph Theory + Probability
16
Random Graph Theory = Graph Theory + Probability
Random graph (Erdős and Rényi, 1960)
- N nodes, labeled and connected by n edges
- $C^N_2 = N(N-1)/2$ possible edges
- $C^{N(N-1)/2}_n$ possible graphs with N nodes and n edges
Example, N = 4 (6 possible edges):
n    Number of possible graphs, $C^6_n$
1    6
2    15
3    20
4    15
5    6
6    1
[Figure: plot of $C^6_n$ against n for N = 4]
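The counts for N = 4 can be reproduced with a binomial coefficient, and one random graph with N nodes and n edges can be drawn by sampling n of the possible edges uniformly. This is a sketch only; `count_graphs` and `random_graph` are illustrative names, and `math.comb` requires Python 3.8 or later.

```python
import math
import random
from itertools import combinations


def count_graphs(N, n):
    """Number of labeled graphs with N nodes and n edges."""
    possible_edges = N * (N - 1) // 2
    return math.comb(possible_edges, n)


def random_graph(N, n, seed=None):
    """Pick n distinct edges uniformly at random among all possible edges."""
    rng = random.Random(seed)
    possible = list(combinations(range(1, N + 1), 2))
    return rng.sample(possible, n)


for n in range(1, 7):
    print(n, count_graphs(4, n))

print(random_graph(4, 3, seed=0))
```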
17
Search Algorithms
Find the shortest route, in terms of distance, between nodes S and G.
A matrix representation of the graph in Figure 3.1
18
Search Algorithms – Breadth-first search (BFS)
Nodes are expanded in the order in which they are generated.
- S is expanded into A, B and C, which are generated in the order 1, 2 and 3.
- A is expanded first, into B, C and D (generation order 4, 5 and 6).
- BFS then goes back to node B and expands it into A, C and E (generation order 7, 8 and 9).
- BFS then goes back to node C (the third node generated) and expands it into A, B, D, E and F (generation order 10, 11, 12, 13 and 14).
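A minimal BFS sketch follows. Unlike the expansion described above, it keeps track of nodes that were already generated and does not generate them again, which is a common variant. The neighbour lists for S, A, B and C follow the slide; the links from D, E and F to the goal G are assumptions, since Figure 3.1 is not available in the text. The search returns a path with the fewest edges, which equals the shortest distance only if all edges have the same length.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a path from start to goal with the fewest edges, or None."""
    parent = {start: None}  # also serves as the set of generated nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()  # expand nodes in the order they were generated
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


graph = {
    "S": ["A", "B", "C"],
    "A": ["S", "B", "C", "D"],
    "B": ["S", "A", "C", "E"],
    "C": ["S", "A", "B", "D", "E", "F"],
    "D": ["A", "C", "G"],  # link to G is an assumption
    "E": ["B", "C", "G"],  # link to G is an assumption
    "F": ["C", "G"],       # link to G is an assumption
    "G": ["D", "E", "F"],
}

print(bfs_path(graph, "S", "G"))
```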
19
Search Algorithms – Depth-first search (DFS)
- Begin from the root node of the tree.
- Visit the first unvisited node and mark it as visited.
- Then find the next unvisited node and mark it as visited.
- When all the nodes reachable from the current node have already been visited, go back to the parent node.
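The steps above can be sketched recursively: returning from the recursive call is what "go back to the parent node" means in code. The tree below, with nodes A to E, is hypothetical and is not taken from the slide's figure.

```python
def dfs(tree, root):
    """Return the nodes in the order a depth-first search visits them."""
    visited = []

    def visit(node):
        visited.append(node)  # mark the node as visited
        for child in tree.get(node, []):
            if child not in visited:
                visit(child)  # go deeper before trying the next sibling
        # returning here corresponds to going back to the parent node

    visit(root)
    return visited


# Hypothetical tree: A has children B and C; B has children D and E.
tree = {"A": ["B", "C"], "B": ["D", "E"]}

print(dfs(tree, "A"))
```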
20
Search Algorithms – Depth-first search (DFS)
[Figure: example graph with nodes A, B, C, D, E]