A phylogenetic application of the combinatorial graph Laplacian. Eric A. Stone, Department of Statistics, Bioinformatics Research Center, North Carolina State University.

Presentation transcript:

A phylogenetic application of the combinatorial graph Laplacian Eric A. Stone Department of Statistics Bioinformatics Research Center North Carolina State University

My motivation for this project Trees in statistics or biology –Often a latent branching structure relating some observed data Trees in mathematics –Always a connected graph with no cycles

My motivation for this project Trees in statistics or biology –PROBLEM: Recover properties of latent branching structure Trees in mathematics –Always a connected graph with no cycles

My motivation for this project Trees in statistics or biology –PROBLEM: Recover properties of latent branching structure Trees in mathematics –Characterization of observed structure by spectral graph theory

Bridging the gap Rectifying trees and trees Can we use some powerful tools of spectral graph theory to recover latent structure? –Natural relationship between trees and complete graphs?!?

Tree and distance matrices The tree with vertex set {1,…,8} has distance matrix D. The phylogenetic tree can only be observed at the leaves {1,…,5} –We can only observe (estimate) the phylogenetic portion D*, the submatrix of D indexed by the leaves
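A minimal sketch of how D and D* relate, assuming a concrete tree. The topology (leaves 1 and 2 on node 6, leaves 3 and 4 on node 7, leaf 5 on node 8, internal edges 6-8 and 7-8) is inferred from the splits and cuts quoted on later slides; the branch lengths and the helper name `tree_distance_matrix` are made up for illustration and do not come from the talk.

```python
import numpy as np

# Assumed tree: topology inferred from the slides, branch lengths hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 1.0, (3, 7): 1.0, (4, 7): 1.0,
         (5, 8): 1.0, (6, 8): 3.0, (7, 8): 1.0}

def tree_distance_matrix(edges, n):
    """All-pairs path lengths via Floyd-Warshall (vertices are labelled 1..n)."""
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for (u, v), length in edges.items():
        D[u - 1, v - 1] = D[v - 1, u - 1] = length
    for k in range(n):
        # Allow paths that pass through vertex k.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

D = tree_distance_matrix(EDGES, 8)   # full 8x8 matrix, internal nodes included
D_star = D[:5, :5]                   # leaves 1..5 only: the observable portion
print(D_star)
```

Only `D_star` would be available in practice; the internal rows and columns of `D` are the latent part that reconstruction has to recover.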

More motivation for this project Trees in statistics or biology –PROBLEM: Recover properties of latent branching structure Given D* only, recover latent branching structure –This is the problem of phylogenetic reconstruction (w/o error!)

NJ finds (2,n-2) splits from D* A split is a bipartition of the leaf set (e.g. {1,2,3,4,5}) that can be induced by cutting a branch on the tree –e.g. {{1,2},{3,4,5}} or {{1,2,5},{3,4}} The neighbor-joining criterion identifies (2,n-2) splits through the Q matrix

A recipe for tree reconstruction from D* 1.Find a split –NJ relies on theorem that guarantees (2,n-2) split from Q matrix 2.Use knowledge of split to reduce dimension –NJ prunes the cherry (neighboring taxa) to reduce leaves by one 3.Iterate until tree has been fully reconstructed –Tree topology specified by its split set
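Step 1 of the recipe, as neighbor-joining does it, can be sketched with the standard Q-matrix criterion Q(i,j) = (n-2)·d(i,j) - Σ_k d(i,k) - Σ_k d(j,k). The matrix `D_star` below is the leaf-to-leaf distance matrix of an assumed tree (cherries {1,2} and {3,4}, hypothetical branch lengths); neither the numbers nor the function name `nj_q_matrix` are taken from the slides.

```python
import numpy as np

# Leaf distances for an assumed 5-leaf tree (hypothetical branch lengths).
D_star = np.array([[0, 2, 6, 6, 5],
                   [2, 0, 6, 6, 5],
                   [6, 6, 0, 2, 3],
                   [6, 6, 2, 0, 3],
                   [5, 5, 3, 3, 0]], dtype=float)

def nj_q_matrix(D):
    """Neighbor-joining Q matrix; the diagonal is masked so argmin skips i == j."""
    n = D.shape[0]
    r = D.sum(axis=1)
    Q = (n - 2) * D - r[:, None] - r[None, :]
    np.fill_diagonal(Q, np.inf)
    return Q

Q = nj_q_matrix(D_star)
i, j = np.unravel_index(np.argmin(Q), Q.shape)
cherry = {int(i) + 1, int(j) + 1}          # back to 1-based leaf labels
print(cherry)
```

The minimising pair is a cherry, which is exactly a (2,n-2) split; NJ would then merge the pair and repeat on a matrix with one fewer leaf.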

Our narrow goal 1.Find a split –NJ relies on theorem that guarantees (2,n-2) split from Q matrix –Hypothesize criterion that identifies deeper splits … and prove that it actually works

Our solution The phylogenetic portion D*

Our solution Let H be the centering matrix: $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$, where n is the number of leaves and $\mathbf{1}$ is the all-ones vector. Find the eigenvector Y of HD*H with the smallest eigenvalue –The signs of the entries of Y identify a split of the tree

About the matrix HD*H Entries of HD*H are $D_{ij} - D_{i\cdot} - D_{\cdot j} + D_{\cdot\cdot}$, where a dot denotes averaging over that index. HD*H is negative semidefinite –Zero is a simple eigenvalue, with the constant vector as its eigenvector –The remaining eigenvectors have both + and - entries HD*H appears prominently in: –Multidimensional scaling –Principal coordinate analysis

Example of our solution Find eigenvector Y of HD*H with the smallest eigenvalue: Signs of Y identify the split {{1,2},{3,4,5}}
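The computation on this slide can be sketched as follows. The distance matrix is an assumed stand-in (a 5-leaf tree with cherries {1,2} and {3,4} and hypothetical branch lengths), not the matrix shown in the talk, so only the procedure should be read as representative.

```python
import numpy as np

# Leaf distances for an assumed 5-leaf tree (hypothetical branch lengths).
D_star = np.array([[0, 2, 6, 6, 5],
                   [2, 0, 6, 6, 5],
                   [6, 6, 0, 2, 3],
                   [6, 6, 2, 0, 3],
                   [5, 5, 3, 3, 0]], dtype=float)

n = D_star.shape[0]
H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
M = H @ D_star @ H                       # doubly centered distances

eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
Y = eigvecs[:, 0]                        # eigenvector of the smallest eigenvalue

# The sign pattern of Y gives the bipartition (the overall sign of Y is arbitrary).
positive = frozenset(i + 1 for i in range(n) if Y[i] > 0)
negative = frozenset(range(1, n + 1)) - positive
split = {positive, negative}
print(eigvals[0], split)
```

Because `eigh` may return either Y or -Y, the split is stored as an unordered pair of leaf sets rather than as a "positive side" and a "negative side".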

A real example (data from ToL) Two iterations

Our solution 1.Find a split –NJ relies on theorem that guarantees (2,n-2) split from Q matrix –Hypothesize criterion that identifies deep splits … and prove that it actually works

Affinity and distance In phylogenetics, common to consider pairwise distances (distance-based) –In graph theory, common to consider pairwise affinities (affinity-based)

Distance matrix Laplacian matrix

The genius of Miroslav Fiedler G connected ⇒ the smallest eigenvalue of L, zero, is simple –The smallest positive eigenvalue, λ, is called the algebraic connectivity of G Fiedler vectors Y satisfy LY = λY –The Fiedler cut is the sign-induced bipartition

The genius of Miroslav Fiedler G connected ⇒ the smallest eigenvalue of L, zero, is simple –The smallest positive eigenvalue, λ, is called the algebraic connectivity of G Fiedler vectors Y satisfy LY = λY –The Fiedler cut is the sign-induced bipartition The Fiedler cut here is {{1,2,6},{3,4,5,7,8}} Note that the cut implies a leaf split: {{1,2},{3,4,5}}
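A small sketch of a Fiedler cut on a concrete tree. It assumes the 8-vertex topology implied by the slides, made-up branch lengths, and the convention that an edge's affinity is the reciprocal of its branch length; the helper name `laplacian` is hypothetical.

```python
import numpy as np

# Assumed tree: topology inferred from the slides, branch lengths hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 1.0, (3, 7): 1.0, (4, 7): 1.0,
         (5, 8): 1.0, (6, 8): 3.0, (7, 8): 1.0}

def laplacian(edges, n):
    """Combinatorial Laplacian with edge weight = 1 / branch length (assumption)."""
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

L = laplacian(EDGES, 8)
eigvals, eigvecs = np.linalg.eigh(L)     # ascending; eigvals[0] is (numerically) zero
algebraic_connectivity = eigvals[1]      # smallest positive eigenvalue
fiedler = eigvecs[:, 1]                  # a Fiedler vector

positive = frozenset(i + 1 for i in range(8) if fiedler[i] > 0)
negative = frozenset(range(1, 9)) - positive
fiedler_cut = {positive, negative}
print(algebraic_connectivity, fiedler_cut)
```

With these lengths the long 6-8 branch is the weakest link, so the cut falls across it; restricting each side to the leaves gives the leaf split.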

Is this relevant here? We do not observe an 8x8 Laplacian matrix L –All we get is a 5x5 matrix of between-leaf pairwise distances D* Where is the connection to graph theory?

Recall: Our solution Let H be the centering matrix: $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$. Find the eigenvector Y of HD*H with the smallest eigenvalue –The signs of the entries of Y identify a split of the tree

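An extremely useful relationship Recall the centering matrix H –The (Moore-Penrose) pseudoinverse of HDH is in fact $-\tfrac{1}{2}L$, i.e. $HDH = -2L^{+}$ (taking each edge weight in L to be the reciprocal of its branch length) We have shown in the context of this formula –Principal submatrices of D relate to Schur complements of L In particular, $(HD^*H)^{+} = -\tfrac{1}{2}L^* = -\tfrac{1}{2}(L/Z) = -\tfrac{1}{2}(W - XZ^{-1}Y)$, where $L = \begin{pmatrix} W & X \\ Y & Z \end{pmatrix}$ with W indexed by the leaves and Z by the internal nodes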
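The relationship can be checked numerically. The sketch below assumes the 8-vertex topology implied by the slides, hypothetical branch lengths, and edge weights in L equal to reciprocal branch lengths; under those assumptions it tests HDH = -2·pinv(L) on the full tree and the same identity between D* and the Schur complement L/Z on the leaves. All helper names are illustrative.

```python
import numpy as np

# Assumed tree: topology inferred from the slides, branch lengths hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 1.0, (3, 7): 1.0, (4, 7): 1.0,
         (5, 8): 1.0, (6, 8): 3.0, (7, 8): 1.0}

def laplacian(edges, n):
    """Combinatorial Laplacian with edge weight = 1 / branch length (assumption)."""
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

def tree_distance_matrix(edges, n):
    """All-pairs path lengths via Floyd-Warshall."""
    D = np.full((n, n), np.inf)
    np.fill_diagonal(D, 0.0)
    for (u, v), length in edges.items():
        D[u - 1, v - 1] = D[v - 1, u - 1] = length
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def centering(n):
    return np.eye(n) - np.ones((n, n)) / n

L = laplacian(EDGES, 8)
D = tree_distance_matrix(EDGES, 8)

# Full tree: H D H versus the pseudoinverse of L.
H8 = centering(8)
full_ok = np.allclose(H8 @ D @ H8, -2.0 * np.linalg.pinv(L, rcond=1e-10))

# Leaves only: block-partition L and take the Schur complement of Z.
W, X = L[:5, :5], L[:5, 5:]
Yb, Z = L[5:, :5], L[5:, 5:]
L_star = W - X @ np.linalg.solve(Z, Yb)       # L/Z = W - X Z^{-1} Y

H5 = centering(5)
leaf_ok = np.allclose(H5 @ D[:5, :5] @ H5, -2.0 * np.linalg.pinv(L_star, rcond=1e-10))
print(full_ok, leaf_ok)
```

An explicit `rcond` is passed to `pinv` so that the zero eigenvalue shared by every Laplacian is treated as exactly zero despite rounding.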

Recall: Our solution Find eigenvector Y of HD*H with the smallest eigenvalue –The signs of the entries of Y identify a split of the tree The eigenvector for the smallest (most negative) eigenvalue of HD*H, which is negative semidefinite, is the eigenvector for the smallest positive eigenvalue of L* In fact, L* can be seen as a graph Laplacian –And our solution, Y, is the Fiedler vector of that graph! But what does this graph look like?

Schur complementation of a vertex The vertices adjacent to 8 become adjacent to each other
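What Schur complementation at a single vertex does to the graph can be seen on a small example. The sketch assumes the 8-vertex topology implied by the slides, hypothetical branch lengths, and reciprocal-length edge weights; `schur_complement_at` is an illustrative helper, not a function from the talk.

```python
import numpy as np

# Assumed tree: topology inferred from the slides, branch lengths hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 1.0, (3, 7): 1.0, (4, 7): 1.0,
         (5, 8): 1.0, (6, 8): 3.0, (7, 8): 1.0}

def laplacian(edges, n):
    """Combinatorial Laplacian with edge weight = 1 / branch length (assumption)."""
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

def schur_complement_at(L, k):
    """Eliminate the vertex with 0-based index k from the Laplacian L."""
    keep = [i for i in range(L.shape[0]) if i != k]
    return L[np.ix_(keep, keep)] - np.outer(L[keep, k], L[k, keep]) / L[k, k]

L = laplacian(EDGES, 8)
L_8 = schur_complement_at(L, 7)      # eliminate vertex 8; rows/cols are now 1..7

# Vertex 8 was adjacent to 5, 6 and 7 (0-based indices 4, 5, 6).
# Before: no edges among them. After: they form a triangle.
print(L[4, 5], L[4, 6], L[5, 6])
print(L_8[4, 5], L_8[4, 6], L_8[5, 6])
```

Negative off-diagonal entries are edges, so the three new negative entries are the edges created among the former neighbours of vertex 8; the result is still a Laplacian because its rows sum to zero.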

Schur complementation of the interior The graph described by L* is fully connected –All cuts yield connected subgraphs No help from Fiedler

Recap thus far Given matrix D* of pairwise distances between leaves Find eigenvector Y of HD*H with the smallest eigenvalue –Claim: The signs of the entries of Y identify a split of the tree Y shown to be a Fiedler vector of the Laplacian L* –But graph of L* is fully connected, has no apparent structure Thus Fiedler says nothing about signs of entries of Y –But claim requires signs to be consistent with structure of the tree

Recap thus far Thus Fiedler says nothing about signs of entries of Y –But claim requires signs to be consistent with structure of the tree How does L* inherit the structure of the tree?

The quotient rule inspires a Schur tower

How does this help?

Cutpoints and connected components A point of articulation (or cutpoint) is a vertex r ∈ G whose deletion yields a subgraph with ≥ 2 connected components –Cutpoints: 6,7,8 –Shown: {1}, {2}, {3,4,5,7,8} are the connected components at 6 The cutpoints of a tree are its internal nodes

The key observation (i.e. theorem) Let L be the Laplacian of a graph G with some cutpoint v –Let $L_{\{v\}}$ be the Laplacian of $G_{\{v\}}$, obtained by Schur complement at v Then the Fiedler cut of $G_{\{v\}}$ identifies a split of G –Here the Fiedler cut of $G_{\{6\}}$ is {{1,2,5,8},{3,4,7}} –Including 6 in {1,2,5,8} defines two connected components in G
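The theorem can be illustrated by eliminating cutpoint 6 and reading off the Fiedler cut of the derived graph. The sketch assumes the topology implied by the slides, hypothetical branch lengths, and reciprocal-length edge weights, so the particular cut it finds need not match the one quoted on the slide; what should carry over is that the cut extends to a split of the original tree.

```python
import numpy as np

# Assumed tree: topology inferred from the slides, branch lengths hypothetical.
EDGES = {(1, 6): 1.0, (2, 6): 1.0, (3, 7): 1.0, (4, 7): 1.0,
         (5, 8): 1.0, (6, 8): 3.0, (7, 8): 1.0}

def laplacian(edges, n):
    """Combinatorial Laplacian with edge weight = 1 / branch length (assumption)."""
    L = np.zeros((n, n))
    for (u, v), length in edges.items():
        w = 1.0 / length
        L[u - 1, v - 1] -= w
        L[v - 1, u - 1] -= w
        L[u - 1, u - 1] += w
        L[v - 1, v - 1] += w
    return L

def schur_complement_at(L, k):
    """Eliminate the vertex with 0-based index k from the Laplacian L."""
    keep = [i for i in range(L.shape[0]) if i != k]
    return L[np.ix_(keep, keep)] - np.outer(L[keep, k], L[k, keep]) / L[k, k]

L = laplacian(EDGES, 8)
L_6 = schur_complement_at(L, 5)          # Schur complement at cutpoint 6
labels = [1, 2, 3, 4, 5, 7, 8]           # vertices that remain, in row order

eigvals, eigvecs = np.linalg.eigh(L_6)
fiedler = eigvecs[:, 1]                  # eigenvector of the smallest positive eigenvalue

positive = frozenset(v for v, y in zip(labels, fiedler) if y > 0)
negative = frozenset(labels) - positive
cut_of_G6 = {positive, negative}
print(cut_of_G6)
```

With these lengths the cut separates {1,2} from the rest; placing 6 with {1,2} gives {1,2,6} and {3,4,5,7,8}, both connected in the tree, which is the split obtained by cutting the 6-8 branch.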

The quotient rule inspires a Schur tower How does this help? Look at Schur paths from the graph with Laplacian L to the graph with Laplacian L*

The punch line The graph with Laplacian L* can be obtained in three ways The Fiedler cut of $G_{\{6,7,8\}}$ must split $G_{\{6,7\}}$ and $G_{\{6,8\}}$ and $G_{\{7,8\}}$

Recall: Example Find eigenvector Y of HD*H with the smallest eigenvalue: Signs of Y identify the split {{1,2},{3,4,5}}

The punch line The graph with Laplacian L* can be obtained in three ways The Fiedler cut of $G_{\{6,7,8\}}$ must split $G_{\{6,7\}}$ and $G_{\{6,8\}}$ and $G_{\{7,8\}}$ This implies that the cut splits the progenitor graph G: {{1,2,6},{3,4,5,7,8}}

Our solution actually works Let H be the centering matrix: $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$. Find the eigenvector Y of HD*H with the smallest eigenvalue –The signs of the entries of Y identify a split of the tree

A recipe for tree reconstruction 1.Find a split –NJ relies on theorem that guarantees (2,n-2) split from Q matrix –We have a theorem that guarantees splits from HD*H matrix 2.Use knowledge of split to reduce dimension –NJ prunes the cherry (neighboring taxa) to reduce leaves by one –We use a divisive method that reduces to pairs of subtrees 3.Iterate until tree has been fully reconstructed –Tree topology specified by its split set

Reconstruction from the inside out

Connections with Classical MDS and PCoA Classical solution to multidimensional scaling –a.k.a. principal coordinate analysis Recipe for dimension reduction given distance matrix D: 1. Construct matrix A from D entrywise: $x \mapsto -x^2/2$ 2. Double centering: B = HAH 3. Find the k largest eigenvalues $\lambda_i$ of B with corresponding eigenvectors $X_i$ 4. Coordinates of point $P_r$ are given by row r of the eigenvector entries Taking k = 1 with the square root of tree distance is equivalent to our approach
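The recipe above can be sketched in a few lines, together with a check of the k = 1 equivalence. The distance matrix is an assumed 5-leaf example with hypothetical branch lengths, `classical_mds` is an illustrative name, and scaling each eigenvector by the square root of its eigenvalue is the usual classical-MDS convention rather than something stated on the slide.

```python
import numpy as np

# Leaf distances for an assumed 5-leaf tree (hypothetical branch lengths).
D_star = np.array([[0, 2, 6, 6, 5],
                   [2, 0, 6, 6, 5],
                   [6, 6, 0, 2, 3],
                   [6, 6, 2, 0, 3],
                   [5, 5, 3, 3, 0]], dtype=float)

def classical_mds(D, k):
    """Classical MDS / PCoA: returns an (n, k) array of coordinates."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    A = -0.5 * D ** 2                        # step 1: x -> -x^2 / 2
    B = H @ A @ H                            # step 2: double centering
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending order
    top = np.argsort(eigvals)[::-1][:k]      # step 3: k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale           # step 4: row r = coordinates of point r

# k = 1 on the square root of tree distance: then A = -D*/2 and B = -(H D* H)/2,
# so the top eigenvector of B is the bottom eigenvector of H D* H.
axis1 = classical_mds(np.sqrt(D_star), k=1)[:, 0]
mds_split = {frozenset(i + 1 for i in range(5) if axis1[i] > 0),
             frozenset(i + 1 for i in range(5) if axis1[i] <= 0)}

# Direct computation from H D* H for comparison.
H = np.eye(5) - np.ones((5, 5)) / 5
Y = np.linalg.eigh(H @ D_star @ H)[1][:, 0]
direct_split = {frozenset(i + 1 for i in range(5) if Y[i] > 0),
                frozenset(i + 1 for i in range(5) if Y[i] <= 0)}
print(mds_split == direct_split)
```

Splitting the first PCoA axis at zero therefore reproduces the sign-based split, provided the square-root distance is the one fed into the recipe.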

Phylogenetic ordination PCoA on sequence data with k = 3: –For appropriate distance, C1 (x-axis) guaranteed to split taxa at 0 Our results support popular use of PCoA –Provided that the right distance is considered…

Conclusion I Natural connection between matrix of pairwise distances and the Laplacian of a complete graph

Conclusion II Structure of tree embedded in complete graph and recoverable via spectral theory Notion of Fiedler cut extends concept to Fiedler split –Inheritance propagated through Schur tower

Conclusion III Results inspire fast divisive tree reconstruction method

Conclusion IV Provides guidance and justification for ordination approach

Acknowledgements Alex Griffing (NCSU Bioinformatics) Carl Meyer (NCSU Math) Amy Langville (CoC Math)

Cutpoints and Perron components Each connected component identifies a principal submatrix Each such principal submatrix is inverse positive –Implies that the inverse has a Perron value that is simple –The Perron component is that with the largest Perron value

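Cutpoints and Perron components [Figure: inverses of the principal submatrices for the components at a cutpoint, with Perron values 1, 0.5 and 7.49; the component with the largest value, 7.49, is the Perron component]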

The key observation Take Schur complement of L at cutpoint, e.g. 6 Consider Fiedler vector of derived Laplacian –Signs of entries outside Perron component are positive (+) –Signs of entries inside Perron component indeterminate (+/-)