
Graph mining in bioinformatics
Laur Tooming

Graphs in biology
Graphs are often used in bioinformatics for describing processes in the cell.
- Vertices are genes or proteins.
- The meaning of an edge depends on the type of the graph:
  - protein-protein interaction
  - gene regulation

What we're looking for
We want to find sets of genes that have a biological meaning.
- Idea: find graph-theoretically relevant sets of vertices and check whether they are also biologically meaningful.
- Simple example: connected components.
- A more advanced idea: graph clustering, i.e. finding subgraphs that have a high edge density.
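To make the two kinds of vertex sets concrete, here is a minimal sketch in plain Python. It assumes a simple undirected graph given as a vertex list and an edge list; the function names are illustrative, not taken from the slides. Connected components are found by breadth-first search, and edge density is the fraction of possible vertex pairs in a set that are joined by an edge.

```python
from collections import deque


def connected_components(vertices, edges):
    """Connected components of a simple undirected graph, as a list of sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from `start`.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components


def edge_density(subset, edges):
    """Fraction of the possible edges among `subset` that are present."""
    n = len(subset)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in subset and v in subset)
    return inside / (n * (n - 1) / 2)


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
print(connected_components(["A", "B", "C", "D", "E"], edges))
print(edge_density({"A", "B", "C"}, edges))  # 1.0
```

A component such as {A, B, C} with density 1.0 is the kind of dense vertex set that graph clustering looks for inside larger components.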

Markov Cluster Algorithm (MCL)
- If a graph has cluster structure, random walks tend to remain inside a cluster for a long time.
- The graph is modelled as a stochastic matrix: the entries of each column sum to 1.
- $a_{ij}$ is the probability that a random walk leaving vertex $j$ goes to vertex $i$ on the next step.
- A bigger edge weight means a greater probability of choosing that edge.

Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May
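The matrix described above can be built by normalising the weighted adjacency matrix column by column. This is a sketch assuming an undirected graph with vertices numbered 0..n-1; the `self_loop` option is a hypothetical extra (many MCL setups add loops, but the slide does not say so).

```python
def stochastic_matrix(n, weighted_edges, self_loop=0.0):
    """Column-stochastic transition matrix of an undirected weighted graph.

    a[i][j] is the probability that a walk at vertex j steps to vertex i,
    proportional to the weight of edge {i, j}. `self_loop` optionally adds a
    loop of that weight to every vertex (an assumption, not from the slide).
    """
    a = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        a[i][j] += w
        a[j][i] += w
    for j in range(n):
        a[j][j] += self_loop
        total = sum(a[i][j] for i in range(n))
        if total > 0:  # isolated vertices without loops keep a zero column
            for i in range(n):
                a[i][j] /= total
    return a


# Path 0 - 1 - 2 with weights 1 and 3: from vertex 1 the walk prefers vertex 2.
a = stochastic_matrix(3, [(0, 1, 1.0), (1, 2, 3.0)])
print(a[0][1], a[2][1])  # 0.25 0.75
```

The heavier edge 1-2 receives three times the probability of edge 1-0, which is the "bigger edge weight, greater probability" rule in numbers.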

Markov Cluster Algorithm (MCL)
Two procedures, inflation and expansion, are applied alternately:
- Expansion: squaring the matrix; this considers longer random walks.
- Inflation: raising each entry to some power, then rescaling the columns so the matrix stays stochastic; this weakens weak edges and strengthens strong ones.

The process converges to a steady state.
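The expansion/inflation loop can be sketched in a few lines of plain Python. This is a minimal, unoptimised version assuming a small dense adjacency matrix that already includes self-loops; the inflation exponent 2, the convergence tolerance, and the rule for reading clusters off the final matrix (each non-empty row is an attractor whose nonzero columns form one cluster) are choices made here for illustration.

```python
def normalize_columns(m):
    """Rescale each column of a square matrix (list of rows) to sum to 1."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(x, y):
    """Plain square matrix product x @ y."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    m = normalize_columns([list(map(float, row)) for row in adjacency])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walks
        # inflation: raise entries to a power, then make columns stochastic again
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        change = max(abs(a - b) for ra, rb in zip(inflated, m)
                     for a, b in zip(ra, rb))
        m = inflated
        if change < tol:
            break
    return m


def extract_clusters(m, eps=1e-6):
    """Each row with nonzero entries (an attractor) collects the columns
    (vertices) whose flow ends up there; those columns form one cluster."""
    found = []
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > eps)
        if members and members not in found:
            found.append(members)
    return found


# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3, with self-loops.
A = [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 0, 0, 0],
     [1, 1, 1, 1, 0, 0],
     [0, 0, 1, 1, 1, 1],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 1]]
print(extract_clusters(mcl(A)))
```

On this toy graph the flow across the bridge 2-3 shrinks with every inflation step, so the two triangles should come out as separate clusters. Real implementations use sparse matrices and prune tiny entries.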

Markov Cluster Algorithm (MCL)
[Figure: MCL illustration]

Betweenness centrality clustering
An edge between different clusters lies on many shortest paths from one cluster to the other. An edge inside a cluster lies on fewer shortest paths, because there are more alternative paths inside a cluster.
- Betweenness centrality of an edge: the number of shortest paths between pairs of vertices in the graph that pass through that edge.
- Remove the edges with the highest centrality from the graph, recomputing centrality after each removal, to obtain a clustering.

Optimisations:
- instead of all shortest paths, pick a sample of vertices and calculate shortest paths only from them;
- remove several edges at once.
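The procedure above can be sketched for an unweighted, undirected graph using a Brandes-style breadth-first search to compute edge betweenness. The `sources` argument corresponds to the sampling optimisation and `per_round` to removing several edges at once. The stopping rule (stop once a target number of components is reached) is an assumption made for this sketch; the slide does not say when to stop.

```python
from collections import deque


def edge_betweenness(adj, sources=None):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style BFS).

    adj maps each vertex to a set of neighbours. Each vertex pair is counted
    once from each endpoint, so scores are doubled; rankings are unaffected.
    Passing a sample of vertices as `sources` counts only shortest paths that
    start there (the sampling optimisation).
    """
    score = {}
    for s in (adj if sources is None else sources):
        sigma, dist, preds = {s: 1}, {s: 0}, {s: []}
        order, queue = [], deque([s])
        while queue:  # BFS: count shortest paths and remember predecessors
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # push path counts back towards s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    return score


def components(adj):
    """Connected components, as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def betweenness_clustering(adj, n_clusters, per_round=1):
    """Repeatedly delete the highest-betweenness edge(s), recomputing scores
    each round, until the graph splits into at least n_clusters components."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while len(components(adj)) < n_clusters:
        score = edge_betweenness(adj)
        if not score:
            break
        for edge in sorted(score, key=score.get, reverse=True)[:per_round]:
            u, v = tuple(edge)
            adj[u].discard(v)
            adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(betweenness_clustering(adj, 2))
```

The bridge 2-3 carries every shortest path between the two triangles, so it has the highest score and is removed first. For the sampling variant, pass e.g. `sources=random.sample(list(adj), k)` to `edge_betweenness`.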

GraphWeb
Web interface for analysing biological graphs:
- simple syntax for entering graphs, supporting
  - multiple datasets
  - directed edges
  - edge weights
- visualising graphs with GraphViz
- finding biological meaning with g:Profiler

Example input:
    ds1: A > B 10
    ds2: A > B 4
    ds1: B C 5
    ds2: C > D 12
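A small parser shows how such input could be read into per-dataset edge lists. This is a hypothetical sketch: the grammar below is inferred from the example lines (dataset label, two vertices, an optional `>` for a directed edge, an optional weight), and GraphWeb's real input format may differ. The default weight of 1.0 is also an assumption.

```python
from collections import defaultdict


def parse_graph(text):
    """Parse lines like 'ds1: A > B 10' into {dataset: [(u, v, directed, weight)]}.

    Assumed grammar (hypothetical, inferred from the example input):
      <dataset>: <u> [>] <v> [<weight>]
    '>' marks a directed edge u -> v; without it the edge is undirected.
    A missing weight defaults to 1.0 (also an assumption).
    """
    graphs = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        dataset, rest = line.split(":", 1)
        tokens = rest.split()
        directed = len(tokens) > 1 and tokens[1] == ">"
        if directed:
            del tokens[1]
        u, v = tokens[0], tokens[1]
        weight = float(tokens[2]) if len(tokens) > 2 else 1.0
        graphs[dataset.strip()].append((u, v, directed, weight))
    return dict(graphs)


example = """ds1: A > B 10
ds2: A > B 4
ds1: B C 5
ds2: C > D 12"""
print(parse_graph(example))
```

Keeping the dataset label on every edge is what later allows the datasets to be weighted differently when they are combined.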

Combining several datasets
Whether or not there is an edge between two vertices is determined in biological experiments, which sometimes give false results. For a given graph, different sources may give different information, and some sources may be more trustworthy than others. We would like to combine the different sources and assess the trustworthiness of each edge in the resulting graph.

The edge weight in the summary graph is a sum over the datasets $G_i$:
$$w(e, G) = \sum_i w(e, G_i)\, w(G_i)$$
where $w(e, G_i)$ is the weight of edge $e$ in dataset $G_i$ and $w(G_i)$ is the weight (trustworthiness) given to dataset $G_i$.
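The summary formula translates directly into code. In this sketch each dataset is assumed to be a dictionary from edge to weight; edge direction is not modelled, and the trust values `w(G_i)` in the example are invented for illustration, since the slides give none.

```python
from collections import defaultdict


def combine_datasets(datasets, dataset_weight):
    """Summary graph weights: w(e, G) = sum_i w(e, G_i) * w(G_i).

    datasets: {name: {edge: weight}}, i.e. w(e, G_i) for each dataset G_i
    dataset_weight: {name: w(G_i)}, how much each dataset is trusted
    Edges missing from a dataset simply add nothing for that dataset.
    """
    summary = defaultdict(float)
    for name, edges in datasets.items():
        for edge, w in edges.items():
            summary[edge] += w * dataset_weight[name]
    return dict(summary)


# Edge weights from the GraphWeb example input; the trust values are made up.
datasets = {
    "ds1": {("A", "B"): 10, ("B", "C"): 5},
    "ds2": {("A", "B"): 4, ("C", "D"): 12},
}
trust = {"ds1": 0.5, "ds2": 1.0}
print(combine_datasets(datasets, trust))
# {('A', 'B'): 9.0, ('B', 'C'): 2.5, ('C', 'D'): 12.0}
```

Edge A-B is supported by both datasets and so accumulates weight from each (10 * 0.5 + 4 * 1.0), while edges reported by only one source keep just that source's weighted contribution.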

Combining several datasets

The end