1 Graph mining in bioinformatics Laur Tooming

2 Graphs in biology
Graphs are often used in bioinformatics for describing processes in the cell.
Vertices are genes or proteins.
The meaning of an edge depends on the type of the graph:
– Protein-protein interaction
– Gene regulation

3 What we’re looking for
We want to find sets of genes that have a biological meaning.
Idea: find graph-theoretically relevant sets of vertices and find out if they are also biologically meaningful.
Simple example: connected components.
A more advanced idea: graph clustering. Find subgraphs that have a high edge density.
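
The two ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the gene names and interaction pairs are placeholders, and edge density is taken here to mean the fraction of possible vertex pairs that are actually connected, which is one common definition among several.

```python
from collections import defaultdict


def connected_components(edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects every vertex reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


def edge_density(vertices, edges):
    """Fraction of possible vertex pairs inside `vertices` joined by an edge."""
    n = len(vertices)
    if n < 2:
        return 0.0
    inside = {frozenset((u, v)) for u, v in edges if u in vertices and v in vertices}
    return len(inside) / (n * (n - 1) / 2)


# Hypothetical interaction pairs; the gene names are placeholders.
edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]
components = connected_components(edges)
density = edge_density({"geneA", "geneB", "geneC"}, edges)
```

A component with density close to 1 is nearly a clique; a clustering method looks for vertex sets whose internal density is much higher than that of the graph as a whole.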

6 Markov Cluster Algorithm (MCL)
If there is cluster structure in a graph, random walks tend to remain in a cluster for a long time.
The graph is modelled as a stochastic matrix: the entries in each column sum to 1.
a_ij is the probability that a random walk leaving j goes to i on the next step.
A bigger edge weight means a greater probability of choosing that edge.
Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000. http://micans.org/mcl/
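
As a rough sketch of the matrix described here, the following Python builds a column-stochastic matrix from a weighted adjacency matrix by dividing each column by its sum. The three-vertex weight matrix is a made-up example, and the function name `column_stochastic` is an assumption for illustration only.

```python
def column_stochastic(weights):
    """Normalise each column of a square weight matrix so that it sums to 1."""
    n = len(weights)
    result = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_sum = sum(weights[i][j] for i in range(n))
        if column_sum == 0:
            raise ValueError(f"vertex {j} has no edges, so its column cannot be normalised")
        for i in range(n):
            result[i][j] = weights[i][j] / column_sum
    return result


# Hypothetical weighted undirected graph on three vertices:
# edge 0-1 has weight 3, edge 0-2 has weight 1.
weights = [
    [0, 3, 1],
    [3, 0, 0],
    [1, 0, 0],
]
transition = column_stochastic(weights)
# transition[i][j] is the probability that a walk leaving j steps to i.
```

In this example a walk leaving vertex 0 moves to vertex 1 three times as often as to vertex 2, because the edge 0-1 carries three times the weight.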

7 Markov Cluster Algorithm (MCL)
Two procedures, inflation and expansion, are applied alternately.
Expansion: matrix squaring
– Considers longer random walks
Inflation: raising entries to some power, then rescaling to remain stochastic
– Weakens weak edges and strengthens strong ones
The process converges to a steady state.
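
The alternation of expansion and inflation can be sketched in plain Python as below. This is a simplified illustration under stated assumptions, not the reference MCL implementation: the inflation power of 2, the added self-loops, the convergence tolerance, and the way clusters are read off the final matrix (each non-zero row is treated as an attractor that collects the columns it receives flow from) are all choices made for this example.

```python
def normalise_columns(m):
    """Rescale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. take two random-walk steps at once."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise entries to `power`, then rescale columns to stay stochastic."""
    return normalise_columns([[x ** power for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iterations=100, tolerance=1e-9):
    n = len(adjacency)
    # Assumption: add a self-loop to every vertex before normalising.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(max_iterations):
        previous = m
        m = inflate(expand(m), inflation)
        change = max(abs(m[i][j] - previous[i][j])
                     for i in range(n) for j in range(n))
        if change < tolerance:
            break
    return m


def clusters_from(m, threshold=1e-6):
    """Read clusters off the steady state: one cluster per distinct non-zero row."""
    found = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > threshold)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
result = clusters_from(mcl(adjacency))
```

On this toy graph the flow across the bridge 2-3 is weakened by each inflation step, so the two triangles should end up as separate clusters. A larger inflation power generally gives finer clusters.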

8 Markov Cluster Algorithm (MCL)
Images from http://micans.org/mcl/ani/mcl-animation.html

9 Betweenness centrality clustering
An edge between different clusters is on many shortest paths from one cluster to another.
An edge inside a cluster is on fewer shortest paths, because there are more alternative paths inside a cluster.
Betweenness centrality of an edge: the number of shortest paths in the graph containing that edge.
Remove edges with the highest centrality from the graph to obtain a clustering.
Optimisations:
– Instead of all shortest paths, pick a sample of vertices and calculate shortest paths from them
– Remove several edges at once
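
A minimal sketch of this procedure for an unweighted, undirected graph without self-loops follows. It makes two assumptions that go beyond the slide: when a pair of vertices has several shortest paths, each path contributes a fractional share to its edges (the usual Brandes-style accumulation) instead of a full count, and removal stops as soon as the graph falls into more components than it started with. Neither optimisation from the slide is implemented.

```python
from collections import defaultdict, deque


def edge_betweenness(adjacency):
    """Edge betweenness for an unweighted, undirected graph given as {node: set}."""
    scores = defaultdict(float)
    for source in adjacency:
        # Breadth-first search: count shortest paths and record predecessors.
        distance = {source: 0}
        path_count = defaultdict(int)
        path_count[source] = 1
        predecessors = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest vertices, passing path shares to edges.
        dependency = defaultdict(float)
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node] * (1 + dependency[node])
                scores[frozenset((pred, node))] += share
                dependency[pred] += share
    # Every unordered pair of endpoints was visited from both sides.
    return {edge: value / 2 for edge, value in scores.items()}


def components(adjacency):
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


def split_once(adjacency):
    """Remove highest-betweenness edges until the number of components grows."""
    graph = {node: set(neighbours) for node, neighbours in adjacency.items()}
    initial = len(components(graph))
    while len(components(graph)) == initial:
        scores = edge_betweenness(graph)
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)


# Toy graph: two triangles joined by the bridge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = edge_betweenness(graph)
clusters = split_once(graph)
```

In the toy graph every shortest path between the two triangles crosses the bridge 2-3, so that edge scores highest and is removed first. Recomputing the scores after each removal, as done here, is the expensive part that the optimisations on the slide aim to reduce.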

10 GraphWeb
Web interface for analysing biological graphs
Simple syntax for entering graphs:
– multiple datasets
– directed edges
– edge weights
Visualising graphs with GraphViz
Finding biological meaning with g:Profiler
Example input:
ds1: A > B 10
ds2: A > B 4
ds1: B C 5
ds2: C > D 12
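
To make the input format concrete, here is a hypothetical parser for lines shaped like the four examples on this slide. The grammar is inferred from those examples only (an optional `dataset:` prefix, two vertex names, an optional `>` marking a directed edge, and an optional numeric weight); the actual GraphWeb syntax may accept more than this or differ in details, and the `Edge` type, the default dataset name, and the default weight are assumptions.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    dataset: str
    source: str
    target: str
    directed: bool
    weight: float


def parse_line(line, default_dataset="default", default_weight=1.0):
    """Parse one edge line such as 'ds1: A > B 10' (format inferred, not official)."""
    dataset = default_dataset
    if ":" in line:
        dataset, line = line.split(":", 1)
        dataset = dataset.strip()
    tokens = line.split()
    directed = ">" in tokens
    tokens = [token for token in tokens if token != ">"]
    if len(tokens) == 2:
        source, target = tokens
        weight = default_weight
    elif len(tokens) == 3:
        source, target = tokens[:2]
        weight = float(tokens[2])
    else:
        raise ValueError(f"cannot parse edge line: {line!r}")
    return Edge(dataset, source, target, directed, weight)


example = ["ds1: A > B 10", "ds2: A > B 4", "ds1: B C 5", "ds2: C > D 12"]
edges = [parse_line(line) for line in example]
```

Here `ds1: B C 5` is read as an undirected edge because it has no `>`, which matches the "directed edges" bullet only by inference.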

11 Combining several datasets
Whether or not there is an edge between two vertices is determined in biological experiments, which may sometimes give false results.
For a given graph, different sources may give different information.
Some sources may be more trustworthy than others.
We would like to combine different sources and assess the trustworthiness of each edge in the resulting graph.
Edge weight in the summary graph: a sum over datasets
– $w(e, G) = \sum_i w(e, G_i) \cdot w(G_i)$
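
The summary weight is a trust-weighted sum, which can be sketched as follows. The edge weights reuse the example input from the GraphWeb slide; the per-dataset trust values `w(G_i)` are invented for illustration, and an edge missing from a dataset is assumed to contribute 0.

```python
from collections import defaultdict


def combine(datasets, trust):
    """Summary weight per edge: sum over datasets of edge weight times dataset trust."""
    summary = defaultdict(float)
    for name, edges in datasets.items():
        for edge, weight in edges.items():
            summary[edge] += weight * trust[name]
    return dict(summary)


# Edge weights taken from the example input; edges are keyed as (source, target).
datasets = {
    "ds1": {("A", "B"): 10, ("B", "C"): 5},
    "ds2": {("A", "B"): 4, ("C", "D"): 12},
}
# Hypothetical trust values w(G_i): ds2 is treated as half as reliable as ds1.
trust = {"ds1": 1.0, "ds2": 0.5}

summary = combine(datasets, trust)
```

With these values the edge A-B, which both datasets report, receives 10 * 1.0 + 4 * 0.5 = 12, while C-D, reported only by the less trusted dataset, is scaled down from 12 to 6.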

12 Combining several datasets

15 The end

