Fast Jensen-Shannon Graph Kernel Bai Lu and Edwin Hancock Department of Computer Science University of York Supported by a Royal Society Wolfson Research.

Presentation transcript:

Fast Jensen-Shannon Graph Kernel Bai Lu and Edwin Hancock Department of Computer Science University of York Supported by a Royal Society Wolfson Research Merit Award

Structural Variations

Protein-Protein Interaction Networks

Manipulating graphs: Is the structure similar (graph isomorphism, inexact match)? Is the complexity similar (are the graphs from the same class but different in detail)? Is the complexity (type of structure) uniform?

Goals: Can we determine the similarity of structures using measures that capture their intrinsic complexity? Can graph entropies be used for this purpose? If they can, they lead naturally to information-theoretic kernels and description lengths for learning over graph data.

Outline: Literature review: state-of-the-art graph kernels. Existing graph kernel methods are based on a) walks, b) paths, or c) subgraph or subtree structures. Prior work: we recently developed an information-theoretic graph kernel based on the Jensen-Shannon divergence between probability distributions on graphs. Fast Jensen-Shannon graph kernel: based on a depth-based subgraph representation of a graph, built around the graph centroid. Experiments. Conclusion.

Literature Review: Graph Kernels. Existing graph kernels (i.e. graph kernels from the R-convolution framework [Haussler, 1999]) fall into three classes. Restricted subgraph or subtree kernels: Weisfeiler-Lehman subtree kernel [Shervashidze et al., 2009, NIPS]. Random walk kernels: product graph kernels [Gärtner et al., 2003, ICML]; marginalized kernels on graphs [Kashima et al., 2003, ICML]. Path-based kernels: shortest path kernel [Borgwardt, 2005, ICDM].

Motivation: Limitations of existing graph kernels. They cannot scale up to substructures of large size (e.g. (sub)graphs with hundreds or even thousands of vertices), so they are restricted to substructures of limited size and only roughly capture the topological arrangement within a graph. Even for relatively small subgraphs, most graph kernels still incur significant computational overheads. Aim: develop a novel subgraph kernel that can be computed efficiently, even when a pair of full-sized subgraphs is compared.

Approach: Investigate how to kernelize depth-based graph representations by measuring the similarity of K-layer subgraphs with the Jensen-Shannon divergence. Commence by showing how to compute a fast Jensen-Shannon diffusion kernel for a pair of (sub)graphs. Describe how to compute a fast depth-based graph representation, based on complexity of structure. Combine these ideas to compute a fast Jensen-Shannon subgraph kernel.

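Notation: Consider a graph $G(V, E)$ with vertex set $V$ and edge set $E$. Its adjacency matrix $A$ has elements $A(i,j) = 1$ if $(v_i, v_j) \in E$ and $A(i,j) = 0$ otherwise. The vertex degree matrix of $G$ is the diagonal matrix $D$ given by $D(i,i) = d_i = \sum_{j} A(i,j)$. The normalised Laplacian and its spectrum are $\tilde{L} = D^{-1/2}(D - A)D^{-1/2} = \Phi \Lambda \Phi^{\top}$, where $\Lambda$ is the diagonal matrix of eigenvalues and $\Phi$ is the matrix of eigenvectors.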
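
The notation above can be made concrete with a small sketch. It assumes a simple undirected graph given as a 0/1 adjacency matrix (a list of lists) and builds the normalised Laplacian entry by entry; the function name `normalized_laplacian` and the example graph are illustrative, not taken from the slides.

```python
import math

def normalized_laplacian(adj):
    """Return D^{-1/2} (D - A) D^{-1/2} for a symmetric 0/1 adjacency matrix.

    Assumes no self-loops. Rows of isolated vertices are left as zeros.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]          # vertex degrees d_i
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and deg[i] > 0:
                lap[i][j] = 1.0              # diagonal entries
            elif adj[i][j] and deg[i] > 0 and deg[j] > 0:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap

# Hypothetical example: a path graph on three vertices, 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L_hat = normalized_laplacian(A)
```

The off-diagonal entry for an edge between vertices of degree 1 and 2 is $-1/\sqrt{2}$, and the trace equals the number of non-isolated vertices.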

The Jensen-Shannon Diffusion Kernel: For graphs $G_p$ and $G_q$, the Jensen-Shannon divergence is $D_{JS}(G_p, G_q) = H(G_p \oplus G_q) - \frac{H(G_p) + H(G_q)}{2}$, where $H(G_p \oplus G_q)$ is the entropy of the composite structure formed from the two (sub)graphs being compared (here we use the disjoint union). The Jensen-Shannon diffusion kernel for $G_p$ and $G_q$ is $k_{JS}(G_p, G_q) = \exp\{-\lambda D_{JS}(G_p, G_q)\}$, where $\lambda$ is a decay factor and the entropy $H(\cdot)$ is either the Shannon or the von Neumann entropy.

Composite Structure: Composite entropy of the disjoint union. The disjoint union of a pair of graphs $G_p$ and $G_q$ is $G_{DU} = G_p \oplus G_q = \{V_p \cup V_q, E_p \cup E_q\}$. The graphs $G_p$ and $G_q$ are the connected components of the disjoint union graph $G_{DU}$. Let $p = |V_p|/|V_{DU}|$ and $q = |V_q|/|V_{DU}|$. The entropy (i.e. the composite entropy) of $G_{DU}$ is $H(G_{DU}) = p\,H(G_p) + q\,H(G_q)$.
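
As a rough illustration of how the divergence and kernel fit together, the sketch below starts from entropies that have already been computed. It assumes the composite entropy is the size-weighted average of the two component entropies (weights p and q as defined above) and that the diffusion kernel takes the exponential form exp(-lambda * D_JS); the decay parameter `lam` and all function names are assumptions made for this example.

```python
import math

def composite_entropy(h_p, n_p, h_q, n_q):
    """Entropy of the disjoint union, weighted by vertex counts (assumed form)."""
    p = n_p / (n_p + n_q)
    q = n_q / (n_p + n_q)
    return p * h_p + q * h_q

def js_divergence(h_p, n_p, h_q, n_q):
    """Composite entropy minus the mean of the two individual entropies."""
    return composite_entropy(h_p, n_p, h_q, n_q) - 0.5 * (h_p + h_q)

def js_diffusion_kernel(h_p, n_p, h_q, n_q, lam=1.0):
    """Exponential (diffusion-style) kernel; lam is a hypothetical decay factor."""
    return math.exp(-lam * js_divergence(h_p, n_p, h_q, n_q))

# Hypothetical inputs: entropies 1.0 and 3.0 for graphs with 2 and 6 vertices.
d = js_divergence(1.0, 2, 3.0, 6)
k = js_diffusion_kernel(1.0, 2, 3.0, 6)
```

Because only two entropies and two vertex counts are needed, the kernel evaluation itself is constant time once the entropies are available; the cost lies in computing the entropies.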

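Graph Entropy: Measures of complexity. Shannon entropy of a random walk: the probability of a steady-state random walk on $G$ visiting vertex $v_i$ is $P(i) = d_i / \sum_{v_j \in V} d_j$. The Shannon entropy of the steady-state random walk is $H_S(G) = -\sum_{i=1}^{|V|} P(i) \log P(i)$. von Neumann entropy: the entropy associated with the normalised Laplacian eigenvalues, approximated by $H_{VN}(G) = 1 - \frac{1}{|V|} - \sum_{(v_i, v_j) \in E} \frac{1}{|V|^2 d_i d_j}$ (Han PRL12).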
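
Both entropies depend only on vertex degrees, which a short sketch can show. It assumes a symmetric 0/1 adjacency matrix, natural logarithms, and, for the von Neumann approximation, the quadratic form 1 - 1/|V| - (1/|V|^2) * sum of 1/(d_u d_v) with the sum taken over ordered pairs of adjacent vertices (each undirected edge counted in both directions). That counting convention is an assumption of this sketch, chosen so that the value matches the trace of the squared normalised Laplacian.

```python
import math

def shannon_entropy(adj):
    """Shannon entropy of the steady-state random walk, P(i) = d_i / sum_j d_j."""
    deg = [sum(row) for row in adj]
    total = sum(deg)
    return -sum((d / total) * math.log(d / total) for d in deg if d > 0)

def von_neumann_entropy_approx(adj):
    """Quadratic approximation of the von Neumann entropy from vertex degrees."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Sum over ordered adjacent pairs (u, v): each undirected edge appears twice.
    edge_term = sum(1.0 / (deg[u] * deg[v])
                    for u in range(n) for v in range(n)
                    if u != v and adj[u][v])
    return 1.0 - 1.0 / n - edge_term / (n * n)

# Hypothetical example: a path graph on three vertices, 0 - 1 - 2.
path3 = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
h_s = shannon_entropy(path3)
h_vn = von_neumann_entropy_approx(path3)
```

Each entropy needs one pass over the adjacency matrix, which is where the quadratic cost in the number of vertices comes from in this dense representation.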

Properties: The Jensen-Shannon diffusion kernel for graphs is positive definite (pd). This follows from the definitions in [Kondor and Lafferty, 2002, ICML]: if a dissimilarity measure between a pair of graphs Gp and Gq is symmetric, then the diffusion kernel associated with that measure is pd. Time complexity: for a pair of graphs Gp and Gq, both having n vertices, computing the Jensen-Shannon diffusion kernel requires O(n^2) time.

Idea Decompose graph into layered subgraphs from centroid. Use JSD to compare subgraphs. Construct kernel over subgraphs.

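The Depth-Based Representation of a Graph: Subgraphs from the centroid vertex. For a graph $G(V,E)$, construct the shortest path matrix $S_G$, whose elements $S_G(i,j)$ are the shortest path lengths between vertices $v_i$ and $v_j$. The average-shortest-path vector $S_V$ for $G(V,E)$ is a vector with elements $S_V(i) = \frac{1}{|V|}\sum_{j} S_G(i,j)$, the average shortest path length from vertex $v_i$ to the remaining vertices. The centroid vertex for $G(V,E)$ is $\hat{v}_C$ with $C = \arg\min_i \sum_{j} [S_G(i,j) - S_V(i)]^2$, i.e. the vertex with minimum variance of shortest path lengths. The K-layer centroid expansion subgraph is $G_K(V_K, E_K)$, where $V_K = \{u \in V \mid S_G(C, u) \le K\}$ and $E_K = \{(u,v) \in E \mid u, v \in V_K\}$.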
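
The centroid construction can be sketched with breadth-first search on an unweighted graph. The sketch assumes a connected graph stored as an adjacency-list dictionary and takes the centroid to be the vertex whose shortest-path lengths have minimum variance; that selection rule, like the function names, is an assumption of this example rather than something the surviving slide text spells out.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centroid_vertex(adj):
    """Vertex with minimum variance of shortest-path lengths (assumed rule)."""
    n = len(adj)
    best, best_var = None, float("inf")
    for i in adj:
        dist = bfs_distances(adj, i)
        mean = sum(dist.values()) / n            # entry S_V(i)
        var = sum((d - mean) ** 2 for d in dist.values())
        if var < best_var:
            best, best_var = i, var
    return best

def k_layer_subgraph(adj, centre, k):
    """Subgraph induced by the vertices within k hops of the centre."""
    dist = bfs_distances(adj, centre)
    keep = {v for v, d in dist.items() if d <= k}
    return {v: [u for u in adj[v] if u in keep] for v in keep}

# Hypothetical example: a path graph 0 - 1 - 2 - 3 - 4.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
c = centroid_vertex(path5)
layer1 = k_layer_subgraph(path5, c, 1)
```

One BFS per vertex costs O(mn) in total for n vertices and m edges, which corresponds to the mn term in the time complexity quoted later in the slides.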

Depth-Based Representation: For a graph $G$, we obtain a family of centroid expansion subgraphs $\{G_1, \ldots, G_K, \ldots, G_L\}$. The depth-based representation of $G$ is defined as $DB(G) = [H(G_1), \ldots, H(G_K), \ldots, H(G_L)]^{\top}$, where $H(\cdot)$ is either the Shannon entropy or the von Neumann entropy. It measures complexity via the variation of entropy with depth.

The Depth-Based Representation An example of the depth-based representation for a graph from the centroid vertex

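Fast Jensen-Shannon Subgraph Kernel: For a pair of graphs $G_p(V_p, E_p)$ and $G_q(V_q, E_q)$, the similarity measure is $k_{JS}^{s}(G_p, G_q) = \sum_{K} k_{JS}(G_{p;K}, G_{q;K})$, where $G_{p;K}$ and $G_{q;K}$ are the K-layer subgraphs of $G_p$ and $G_q$ and the sum runs over the layers $K$ available in both graphs. The measure is therefore summed over an entropy-based similarity measure for the K-layer subgraphs: the Jensen-Shannon subgraph kernel is the sum of the diffusion kernel measures for all the pairs of K-layer subgraphs. The Jensen-Shannon subgraph kernel is pd, because it is a sum of positive definite Jensen-Shannon diffusion kernels.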
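
Putting the pieces together, a minimal end-to-end sketch might look as follows. It assumes connected graphs stored as adjacency-list dictionaries, a minimum-variance centroid, the Shannon entropy of the steady-state random walk, a size-weighted composite entropy, and an exponential kernel with a hypothetical decay factor `lam`; layers are paired by depth and the sum stops at the shallower graph. None of these names come from the slides.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to all reachable vertices."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centroid(adj):
    """Vertex whose shortest-path lengths have minimum variance (assumed rule)."""
    def variance(i):
        d = list(bfs_distances(adj, i).values())
        mean = sum(d) / len(adj)
        return sum((x - mean) ** 2 for x in d)
    return min(adj, key=variance)

def layer_subgraphs(adj):
    """K-layer expansion subgraphs for K = 1 .. greatest depth from the centroid."""
    dist = bfs_distances(adj, centroid(adj))
    layers = []
    for k in range(1, max(dist.values()) + 1):
        keep = {v for v, d in dist.items() if d <= k}
        layers.append({v: [u for u in adj[v] if u in keep] for v in keep})
    return layers

def shannon_entropy(adj):
    """Entropy of the steady-state random walk distribution d_i / sum_j d_j."""
    deg = [len(nb) for nb in adj.values()]
    total = sum(deg)
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log(d / total) for d in deg if d > 0)

def js_kernel(sub_p, sub_q, lam=1.0):
    """exp(-lam * D_JS) with a size-weighted composite entropy (assumed forms)."""
    h_p, h_q = shannon_entropy(sub_p), shannon_entropy(sub_q)
    n_p, n_q = len(sub_p), len(sub_q)
    h_du = (n_p * h_p + n_q * h_q) / (n_p + n_q)
    return math.exp(-lam * (h_du - 0.5 * (h_p + h_q)))

def js_subgraph_kernel(adj_p, adj_q, lam=1.0):
    """Sum of pairwise kernels over layers with the same K; zip stops at the shallower graph."""
    pairs = zip(layer_subgraphs(adj_p), layer_subgraphs(adj_q))
    return sum(js_kernel(a, b, lam) for a, b in pairs)

# Hypothetical examples: a five-vertex path and a four-vertex star.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
k_self = js_subgraph_kernel(path5, path5)
k_cross = js_subgraph_kernel(path5, star4)
```

Only subgraphs with the same layer index are compared, so the number of kernel evaluations grows with the depth of the shallower graph rather than with the number of subgraph pairs.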

Time Complexity: For graphs with n vertices and m edges, the subgraph kernel has time complexity O(n^2 L + mn), where L is the size of the largest layer of the expansion subgraph. Computing the depth-based representation is O(n^2 L + mn). Computing the Jensen-Shannon diffusion kernel is O(n^2).

Observations: Advantages. a) The von Neumann entropy is associated with the degree variance of connected vertices, so the subgraph kernel is sensitive to interconnections between vertex clusters. b) For the Shannon entropy, vertices with large degrees dominate the entropy, so the subgraph kernel is suited to characterizing a group of highly interconnected vertices, i.e. a dominant cluster. c) The depth-based representation captures inhomogeneities of complexity with depth. This enables it to gauge structure more finely than straightforwardly applying the Jensen-Shannon diffusion kernel to the original graphs. d) The proposed subgraph kernel only compares pairs of subgraphs with the same layer size K. This avoids enumerating all pairs of subgraphs and makes the computation efficient. e) It overcomes the subgraph size restriction that arises in existing graph kernels.

Experiments (new, not in the paper): We evaluate the classification performance of our kernel using 10-fold cross-validation with a C-Support Vector Machine (run on an Intel i5 3210M 2.5GHz). We classify graphs abstracted from bioinformatics and computer vision databases. The datasets include: GatorBait (3D shapes), DD, COIL5 (images), CATH1, CATH2. The graph kernels compared are: a) our kernel, 1) using the Shannon entropy (JSSS) and 2) using the von Neumann entropy (JSSV); b) the Weisfeiler-Lehman subtree kernel (WL); c) the shortest path graph kernel (SPGK); d) the graphlet count kernel (GCGK).

Experiments Details of the datasets

Experiments Classification Timing

Conclusion and Further Work: Conclusion: presented a fast version of our Jensen-Shannon kernel, which compares well to alternatives on standard ML datasets. Further work: hypergraphs, alternative entropies and divergences.