SEG5010 Presentation Zhou Lanjun.

A graph-based approach to systematically reconstruct human transcriptional regulatory modules Xifeng Yan et al. ISMB 2007 SEG5010 Presentation 2019/2/24

Problem Gene regulation includes the processes that cells and viruses use to turn the information in genes into gene products (Wikipedia). Common approach: derive coexpression clusters from a microarray dataset (http://en.wikipedia.org/wiki/DNA_microarray). Question: are coexpression clusters mined from multiple microarray datasets across diverse conditions more likely to form a transcription module?

Problem Mining frequent dense vertexsets (FDVS). The vertex set {d, e, f, g} is a frequent dense vertexset because more than 80% of its vertex pairs are connected in at least 2 of the 4 graphs (thick lines in the figure).
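The definition above can be checked mechanically. The following is a minimal sketch, not the paper's code: the four toy graphs are hypothetical stand-ins for the figure (which is not in the transcript), and `density`, `is_frequent_dense` and `edge_set` are illustrative names. It uses the thresholds from the slide (80% of pairs, 2 of 4 graphs).

```python
from itertools import combinations

def density(vertices, edges):
    """Fraction of vertex pairs in `vertices` that are connected in `edges`."""
    pairs = list(combinations(sorted(vertices), 2))
    if not pairs:
        return 0.0
    connected = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
    return connected / len(pairs)

def is_frequent_dense(vertices, graphs, min_density, min_support):
    """True if `vertices` reaches `min_density` in at least `min_support` graphs."""
    support = sum(1 for g in graphs if density(vertices, g) >= min_density)
    return support >= min_support

def edge_set(pairs):
    """Turn strings such as "de" into undirected edges."""
    return {frozenset(p) for p in pairs}

# Hypothetical toy data: {d, e, f, g} is dense in the first two graphs only.
graphs = [
    edge_set(["de", "df", "dg", "ef", "eg", "fg"]),  # 6 of 6 pairs connected
    edge_set(["de", "df", "dg", "ef", "eg", "ab"]),  # 5 of 6 pairs connected
    edge_set(["de", "ab", "bc"]),
    edge_set(["fg", "ac"]),
]

defg_is_fdvs = is_frequent_dense(set("defg"), graphs, 0.8, 2)
abc_is_fdvs = is_frequent_dense(set("abc"), graphs, 0.8, 2)
```

With this data `{d, e, f, g}` qualifies and `{a, b, c}` does not, because no single graph connects enough of its pairs.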

Problem Why not use the summary graph directly? One of the two dense subgraphs in the summary graph, {a, b, c, d}, is not dense in any original graph. Noise may also become indistinguishable from true patterns.
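The pitfall can be reproduced on a small example. This is an illustrative sketch with made-up graphs, not the figure from the slides: three graphs each contribute two edges on {a, b, c, d}, so the summary graph looks like a clique even though no single graph is dense on that set.

```python
from collections import Counter
from itertools import combinations

def summary_graph(graphs):
    """Edge weight = number of graphs that contain the edge."""
    weights = Counter()
    for g in graphs:
        weights.update(g)
    return weights

def pair_fraction(vertices, edges):
    """Fraction of vertex pairs in `vertices` that appear in `edges`."""
    pairs = [frozenset(p) for p in combinations(sorted(vertices), 2)]
    return sum(1 for p in pairs if p in edges) / len(pairs)

# Hypothetical graphs: each one holds a different perfect matching on a, b, c, d.
graphs = [
    {frozenset("ab"), frozenset("cd")},
    {frozenset("ac"), frozenset("bd")},
    {frozenset("ad"), frozenset("bc")},
]

summary = summary_graph(graphs)
module = set("abcd")
summary_density = pair_fraction(module, set(summary))          # every pair has weight >= 1
best_single = max(pair_fraction(module, g) for g in graphs)    # densest original graph
```

Here every pair is present in the summary graph, while each original graph connects only 2 of the 6 pairs.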

Problem Formulation

Problem Formulation cont’d

Mining frequent dense vertexsets Important observation: given m graphs, a frequent dense vertexset with density δ and frequency θ must form a subgraph with density ≥ δθm in the summary graph. We can therefore start from the summary graph and mine its dense subgraphs first.
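A short derivation of this observation, under two assumptions that the transcript does not spell out: the summary-graph weight w(u,v) counts the graphs containing edge (u,v), and the density of a vertex set in the summary graph is its total edge weight divided by the number of vertex pairs. Let V be the vertexset with k vertices, s = k(k−1)/2 pairs, and let E_i(V) be the edges among V in graph i.

```latex
\[
\sum_{\{u,v\}\subseteq V} w(u,v)
  \;=\; \sum_{i=1}^{m} \lvert E_i(V) \rvert
  \;\ge\; \underbrace{\theta m}_{\text{graphs where } V \text{ is dense}}
          \cdot \underbrace{\delta s}_{\text{edges in each}}
\quad\Longrightarrow\quad
\frac{1}{s}\sum_{\{u,v\}\subseteq V} w(u,v) \;\ge\; \delta\theta m .
\]
```

The bound is necessary but not sufficient, which is why dense subgraphs found in the summary graph still need refinement against the original graphs.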

Mining frequent dense vertexsets cont’d

Mining frequent dense vertexsets cont’d Benefits: significantly shrinks the search space; provides a good starting point for the refinement process. Defects: may report false patterns; fails to split large infrequent dense vertexsets; might break a true dense vertexset in half.

Mining frequent dense vertexsets cont’d Discussion of noise tolerance. G': noise graph; G*: real graph; G: observed graph. The chance for a noise edge to have weight ≥ θm in a summary graph is:
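The formula itself did not survive in the transcript. The sketch below is one plausible reading, stated as an assumption: if a noise edge appears independently in each of the m graphs with probability q (q is not defined in the surviving text), the chance that it reaches weight ≥ θm is a binomial tail. The function name `noise_edge_probability` is illustrative.

```python
from math import ceil, comb

def noise_edge_probability(m, theta, q):
    """P(X >= theta*m) for X ~ Binomial(m, q): assumed model for a noise edge."""
    # Small epsilon guards against float error such as 0.3 * 10 == 3.0000000000000004.
    threshold = ceil(theta * m - 1e-9)
    return sum(comb(m, i) * q**i * (1 - q) ** (m - i)
               for i in range(threshold, m + 1))

# Example with hypothetical values: 4 graphs, support threshold 50%, q = 0.5.
p_example = noise_edge_probability(4, 0.5, 0.5)  # (6 + 4 + 1) / 16
```

Under this assumed model, the result plays the role of the quantity p = b(m, θ, q) used on the next slide.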

Mining frequent dense vertexsets cont’d The expected number of k-vertex dense subgraphs that could be formed by noise edges is expressed in terms of: p = b(m, θ, q); s = k(k−1)/2; P(k, l, d), the probability that a k-vertex, l-edge graph has minimum degree d (derived through simulation). The result is very sensitive to p.

Mining frequent dense vertexsets cont’d Solutions: divide the coexpression graphs into small groups (reduce m); re-weight the summary graph to reduce the weights of noise edges.

Pipeline of NeMo

Partitioning Group together graphs that likely contain at least one FDVS.

Re-weight edges in summary graph Traditional summary graph: the weight w of an edge is the number of the m graphs in which its two vertices are connected. Proposed method: a ‘neighbor association’ summary graph. Intuition: if two vertices share many small frequent dense subgraphs, they likely come from the same dense vertexset.

Re-weight edges in summary graph (cont’d) Graphlets: there is a connection between the density of a graph and its k-graphlets.

Re-weight edges in summary graph (cont’d) Let score(u,v) be the weight of edge (u,v) in a neighbor association graph. If u and v are in the same dense subgraph, score(u,v) should be close to 1. If u and v are not in the same dense subgraph, score(u,v) should be smaller. If u and v do not share any dense k-graphlet, score(u,v) should be set to 0.

Re-weight edges in summary graph (cont’d) Given two vertices u and v in a large clique with n vertices, the maximum number of k-graphlets they share is C(n−2, k−2), the number of ways to choose the other k−2 vertices. After normalization (Equation 4), this value is close to 1 when n >> k.

Re-weight edges in summary graph (cont’d) Let π_u be the set of frequent dense (k−1)-vertexlets that contain vertex u, and π_{u,v} be the set of frequent dense k-vertexlets that contain both u and v. score(u,v) is defined from these two sets. Note: score(u,v) is not, in general, equal to score(v,u).
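The defining formula is missing from the transcript. The sketch below assumes the ratio |π_{u,v}| / |π_u|, which is consistent with the properties listed earlier (asymmetric, 0 when no dense k-vertexlet is shared, close to 1 inside a large clique); treat it as an assumption rather than the paper's exact definition. The vertexlet lists and the function name `association_score` are hypothetical.

```python
def association_score(u, v, dense_k_minus_1, dense_k):
    """Assumed score(u, v) = |pi_{u,v}| / |pi_u| (returns 0.0 if pi_u is empty)."""
    pi_u = [s for s in dense_k_minus_1 if u in s]      # (k-1)-vertexlets containing u
    pi_uv = [s for s in dense_k if u in s and v in s]  # k-vertexlets containing u and v
    if not pi_u:
        return 0.0
    return len(pi_uv) / len(pi_u)

# Hypothetical input for k = 3: frequent dense pairs and triples.
dense_pairs = [set("ab"), set("ac"), set("bc"), set("ad")]
dense_triples = [set("abc")]

score_ab = association_score("a", "b", dense_pairs, dense_triples)  # 1 / 3
score_ba = association_score("b", "a", dense_pairs, dense_triples)  # 1 / 2
score_dc = association_score("d", "c", dense_pairs, dense_triples)  # no shared triple
```

The two directions differ because the denominators count vertexlets around different vertices. In an n-vertex clique this ratio would be C(n−2, k−2) / C(n−1, k−2) = (n−k+1)/(n−1), which approaches 1 when n >> k.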

Re-weight edges in summary graph (cont’d)

Experiment Setup 105 human microarray datasets. An edge exists between two genes if their expression correlation is significant, with a p-value below 0.01 (Zhou et al., 2002). In this study, only the top 2% most significant correlations (justified by Equation 2), all with a p-value below 0.01, are included in each graph.

Experimental results

Comparison with other approaches

Comparison with other approaches cont’d

Conclusions A novel graph-based algorithm, NeMo, efficiently mines the frequent dense vertexsets in a set of coexpression graphs. NeMo was demonstrated by identifying frequent coexpression clusters across many microarray datasets. NeMo can also be applied to other biological relational graphs to find approximate network modules.

Conserved pathways within bacteria and yeast as revealed by global protein network alignment Brian P. Kelley et al. PNAS 2003, Vol. 100, No. 20

Concepts Pathway: a sequence of protein–protein interactions forming a connected path in the network.

Problem Given that protein sequences and structures are conserved in and among species, are networks of protein interactions conserved as well? Is there some minimal set of interaction pathways required for all species? Can we measure evolutionary distance at the level of network connectivity rather than at the level of DNA or protein sequence? Mounting evidence suggests that conserved protein interaction pathways indeed exist and may be ubiquitous.

Method PATHBLAST: an efficient computational procedure for aligning two protein interaction networks to identify their conserved interaction pathways.

Overview of the PATHBLAST Algorithm The two networks are combined into a global alignment graph. Vertex: a pair of proteins (one from each network) with at least weak sequence similarity (BLAST E ≤ 10^-2). Edge: a conserved interaction.
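A minimal sketch of this construction, under a simplifying assumption: an interaction counts as conserved only when both protein pairs interact directly in their own networks. The protein names, E-values and the function name `build_alignment_graph` are hypothetical.

```python
E_MAX = 1e-2  # BLAST E-value cutoff taken from the slide

def build_alignment_graph(similar_pairs, interactions_a, interactions_b):
    """Return (vertices, edges) of the alignment graph.

    similar_pairs: dict mapping (protein_a, protein_b) -> BLAST E-value.
    interactions_a / interactions_b: sets of frozenset({p1, p2}) per network.
    """
    vertices = [pair for pair, e in similar_pairs.items() if e <= E_MAX]
    edges = set()
    for i, (a1, b1) in enumerate(vertices):
        for a2, b2 in vertices[i + 1:]:
            if a1 == a2 or b1 == b2:
                continue  # both vertices must involve distinct proteins
            if (frozenset((a1, a2)) in interactions_a
                    and frozenset((b1, b2)) in interactions_b):
                edges.add(frozenset(((a1, b1), (a2, b2))))
    return vertices, edges

# Hypothetical data: yeast proteins y1..y3, bacterial proteins b1..b3.
similar = {("y1", "b1"): 1e-30, ("y2", "b2"): 1e-5, ("y3", "b3"): 0.5}
yeast = {frozenset(("y1", "y2")), frozenset(("y2", "y3"))}
bacteria = {frozenset(("b1", "b2"))}

vertices, edges = build_alignment_graph(similar, yeast, bacteria)
```

The pair (y3, b3) is dropped because its E-value exceeds the cutoff, leaving two vertices joined by one conserved interaction.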

BLAST BLAST is one of the most widely used bioinformatics programs because it addresses a fundamental problem and its algorithm emphasizes speed over sensitivity. Typical questions it answers: Which bacterial species have a protein that is related in lineage to a certain protein with a known amino-acid sequence? Where does a certain sequence of DNA originate? What other genes encode proteins that exhibit structures or motifs such as ones that have just been determined?

The PATHBLAST Algorithm cont’d Scoring function of a path P: p(v): probability of true homology within the protein pair represented by v; q(e): probability that the protein-protein interactions represented by e are real; p_random and q_random: expected values of p(v) and q(e) over all vertices and edges in G.
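The scoring formula itself is not in the transcript. Given the four quantities defined above, a natural form is a log-odds sum over the vertices and edges of the path; the sketch below assumes that form (base-10 logarithms are also an assumption) and should not be read as the paper's exact expression. `path_score` is an illustrative name.

```python
from math import log10

def path_score(p_values, q_values, p_random, q_random):
    """Assumed log-odds score: sum of log10(p(v)/p_random) + log10(q(e)/q_random)."""
    vertex_part = sum(log10(p / p_random) for p in p_values)
    edge_part = sum(log10(q / q_random) for q in q_values)
    return vertex_part + edge_part

# Hypothetical path with two vertices and one edge, each 10x above background.
example_score = path_score([0.1, 0.1], [0.01], p_random=0.01, q_random=0.001)
# A path whose values equal the background expectations scores zero.
background_score = path_score([0.5], [0.2], p_random=0.5, q_random=0.2)
```

Under this form, a path scores above zero only when its homology and interaction probabilities beat the graph-wide averages.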

The PATHBLAST Algorithm cont’d p(v) is computed using Bayes’ rule: p(v) = p(H | E_v) = p(E_v | H) p(H) / p(E_v). H: the event of true homology between the proteins represented by v. p(E_v): the frequency of each E value over all v in G. p(E_v | H): based on E values within the subset of vertices for which both proteins are in the same cluster of orthologous groups (COG). p(H): overall frequency of vertices with proteins that are in the same COG.
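The three frequencies can be estimated by counting. This is a sketch under assumptions: E values are binned into labelled ranges (the bin labels and the data are made up), and `homology_posterior` is an illustrative name.

```python
from collections import Counter

def homology_posterior(vertices):
    """Map each E-value bin to p(H | E) = p(E | H) * p(H) / p(E).

    vertices: list of (e_bin, same_cog) pairs, one per alignment-graph vertex.
    """
    n = len(vertices)
    e_counts = Counter(e for e, _ in vertices)
    cog_bins = [e for e, same_cog in vertices if same_cog]
    cog_counts = Counter(cog_bins)
    p_h = len(cog_bins) / n                      # p(H)
    posterior = {}
    for e_bin, count in e_counts.items():
        p_e = count / n                          # p(E)
        p_e_given_h = cog_counts[e_bin] / len(cog_bins) if cog_bins else 0.0
        posterior[e_bin] = p_e_given_h * p_h / p_e
    return posterior

# Hypothetical vertices: strong hits are mostly same-COG, weak hits mostly not.
data = ([("strong", True)] * 3 + [("strong", False)]
        + [("weak", True)] + [("weak", False)] * 3)
posterior = homology_posterior(data)
```

With these counts, a strong hit is a true homolog with probability 0.75 and a weak hit with probability 0.25.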

The PATHBLAST Algorithm cont’d q(e) of each edge is computed from the underlying probabilities of the protein-protein interactions it represents. The paper estimates the probability of each interaction from the number of independent experimental studies reporting it, then computes q(e) as the product of these probabilities.

The PATHBLAST Algorithm cont’d Alignment procedure: identify the highest-scoring pathway alignment P* of fixed length L (L vertices and L−1 edges). If G is directed and acyclic, this can be accomplished in linear time using dynamic programming (DP). Base case is:
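The base case and recurrence are missing from the transcript. The sketch below assumes the usual formulation: a path of one vertex scores that vertex's own score, and a path of l vertices ending at v extends the best path of l−1 vertices ending at a predecessor of v. Vertex and edge scores are taken as precomputed numbers; the graph and the name `best_path` are hypothetical.

```python
def best_path(vertex_score, edge_score, length):
    """Highest-scoring path with `length` vertices in a directed acyclic graph.

    vertex_score: dict vertex -> score; edge_score: dict (u, v) -> score.
    Returns (score, path) or None if no such path exists.
    """
    predecessors = {}
    for (u, v), s in edge_score.items():
        predecessors.setdefault(v, []).append((u, s))

    # Assumed base case: a one-vertex path scores its vertex.
    best = {v: (s, [v]) for v, s in vertex_score.items()}
    for _ in range(length - 1):
        extended = {}
        for v, v_score in vertex_score.items():
            candidates = [(best[u][0] + e_score + v_score, best[u][1] + [v])
                          for u, e_score in predecessors.get(v, [])
                          if u in best]
            if candidates:
                extended[v] = max(candidates, key=lambda c: c[0])
        best = extended
    if not best:
        return None
    return max(best.values(), key=lambda c: c[0])

# Hypothetical DAG with four vertices.
v_scores = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 0.1}
e_scores = {("a", "b"): 0.2, ("b", "c"): 0.3, ("a", "c"): 1.0, ("c", "d"): 0.4}
top_score, top_path = best_path(v_scores, e_scores, length=3)
```

Because the graph is acyclic, no path can revisit a vertex, so indexing the table by path length alone is enough. Each of the L−1 rounds touches every edge once.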

The PATHBLAST Algorithm cont’d Unfortunately, G is not generally acyclic. Construct a sufficient number of directed acyclic subgraphs (5L!), then compute the highest-scoring paths for each.
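The slide does not say how the acyclic subgraphs are built. One simple way, assumed here, is to draw a random ordering of the vertices and direct every edge from the lower-ranked to the higher-ranked endpoint, which can never create a cycle. The function name and toy graph are hypothetical; the repetition count 5L! comes from the slide.

```python
import random
from math import factorial

def random_acyclic_orientation(vertices, undirected_edges, rng):
    """Orient each edge along a random vertex ranking (assumed construction)."""
    order = list(vertices)
    rng.shuffle(order)
    rank = {v: i for i, v in enumerate(order)}
    directed = [(u, v) if rank[u] < rank[v] else (v, u)
                for u, v in undirected_edges]
    return rank, directed

# Hypothetical cyclic graph: a triangle plus one extra edge.
nodes = ["a", "b", "c", "d"]
links = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]

L = 3
num_trials = 5 * factorial(L)        # 5L! repetitions, as stated on the slide
rng = random.Random(0)               # fixed seed so the sketch is repeatable
trials = [random_acyclic_orientation(nodes, links, rng) for _ in range(num_trials)]
```

Each trial yields a directed acyclic graph on which a fixed-length DP can run; the best path over all trials is then reported.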

Thanks! Q&A