Presenter: Saurabh Verma, PhD Candidate (2015-Present)


Hunt For The Unique, Stable, Sparse and Fast Feature Learning on Graphs*
Presenter: Saurabh Verma, PhD Candidate (2015-Present), University of Minnesota Twin Cities. Advisor: Zhi-Li Zhang.
*To appear at the 31st Conference on Neural Information Processing Systems (NIPS 2017). Authors: Saurabh Verma, Zhi-Li Zhang.

Graphs are Everywhere… Social networks, biological networks, chemical networks, web networks, ecological networks.

Learning Problems on Graph(s): Node Label Classification and Graph Classification.

Taxonomy of Graph Learning. Learning on graph(s) splits into Node Classification and Graph Classification. Node Classification: embedding techniques such as DeepWalk [1], LINE [2], and node2vec [3]. Graph Classification: graph kernels (RandW [1], MLG [2], WL [3]), convolution networks (PATCHY [1], MCNNs [2], DCNNs [3]), and graph spectra (Skew [1], Graphlet [2], and our FGSD spectrum).

References for the taxonomy above.

Embedding techniques: [1] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014. [2] Tang, Jian, et al. "LINE: Large-scale information network embedding." Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015. [3] Grover, Aditya, and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.

Graph kernels: [1] Gärtner, Thomas, Peter Flach, and Stefan Wrobel. "On graph kernels: Hardness results and efficient alternatives." Learning Theory and Kernel Machines (2003): 129-143. [2] Kondor, Risi, and Horace Pan. "The multiscale Laplacian graph kernel." Advances in Neural Information Processing Systems. 2016. [3] Shervashidze, Nino, et al. "Weisfeiler-Lehman graph kernels." Journal of Machine Learning Research 12 (2011): 2539-2561.

Convolution networks: [1] Niepert, Mathias, Mohamed Ahmed, and Konstantin Kutzkov. "Learning convolutional neural networks for graphs." International Conference on Machine Learning. 2016. [2] Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." Advances in Neural Information Processing Systems. 2015. [3] Atwood, James, and Don Towsley. "Diffusion-convolutional neural networks." Advances in Neural Information Processing Systems. 2016.

Graph spectrum: [1] Kondor, Risi, and Karsten M. Borgwardt. "The skew spectrum of graphs." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008. [2] Kondor, Risi, Nino Shervashidze, and Karsten M. Borgwardt. "The graphlet spectrum." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.

FGSD spectrum: Saurabh Verma and Zhi-Li Zhang. "Hunt For The Unique, Stable, Sparse and Fast Feature Learning on Graphs." To appear in the 31st Conference on Neural Information Processing Systems (NIPS 2017).

We focus on the Graph Classification Problem… Fundamental problem: how do we measure similarity between two graphs? This dates back to the Graph Isomorphism Problem. Babai (2017) showed that graph isomorphism can be solved in quasi-polynomial time, the best bound known so far.

Current Polynomial Alternatives… Graph Kernels & Convolutional Networks. Cons: they bypass computing any explicit graph representation; there is no theoretical justification for which sub-structure(s) need to be exploited or compared; and they are computationally expensive in many cases.

Polynomial Alternatives… Our Approach… Graph Spectrum. Popular graph spectra (only a few exist!): the eigenvalue spectrum of the graph Laplacian loses far too much information about the graph structure; the skew and graphlet spectra are based on group theory and are computationally expensive, ranging from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^6)$.

Our Spectrum Idea: Simple yet Powerful! A graph's atomic structure (or spectrum) is encoded in the multiset of all pairwise node distances. Example: the spectrum $\mathcal{R}$ based on the shortest-path distance is $\mathcal{R} = \{0,0,0,0,1,1,2,2,3,3,3,3,\ldots\}$.

But which distance should one consider on a graph?

Welcome To The Family of Graph Spectral Distances. Based on graph spectral properties (graph Laplacian $L$) and denoted FGSD. The $f$-spectral distance is
$$S_f(x,y) = \sum_{k=0}^{N-1} f(\lambda_k)\,\big(\phi_k(x) - \phi_k(y)\big)^2,$$
where $f(\lambda_k)$ is a bijective function of the eigenvalues and $\phi_k(y)$ is the value of the $k$-th eigenvector at node $y$. Equivalently, $S_f(x,y)$ is the squared Euclidean distance between the weighted spectral coordinates of $x$ and $y$:
$$S_f(x,y) = \Big\|\big(\sqrt{f(\lambda_0)}\,\phi_0(x),\ldots,\sqrt{f(\lambda_{N-1})}\,\phi_{N-1}(x)\big) - \big(\sqrt{f(\lambda_0)}\,\phi_0(y),\ldots,\sqrt{f(\lambda_{N-1})}\,\phi_{N-1}(y)\big)\Big\|_2^2.$$
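To make the definition concrete, here is a minimal Python sketch (my own illustration, not the paper's released code) that evaluates $S_f(x,y)$ for every node pair directly from an eigendecomposition of the Laplacian; the function name, the choice of f, and the triangle example graph are all arbitrary assumptions for illustration.

```python
import numpy as np

def f_spectral_distances(adj, f):
    """All-pairs f-spectral distances S_f(x, y) for a dense adjacency matrix."""
    L = np.diag(adj.sum(axis=1)) - adj            # combinatorial graph Laplacian
    lam, phi = np.linalg.eigh(L)                  # eigenvalues ascending, lam[0] ~ 0
    w = np.zeros_like(lam)
    nz = lam > 1e-9                               # drop the null space (constant eigenvector)
    w[nz] = f(lam[nz])
    diff = phi[:, None, :] - phi[None, :, :]      # diff[x, y, k] = phi_k(x) - phi_k(y)
    return np.einsum('k,xyk->xy', w, diff ** 2)   # sum_k f(lambda_k) * diff^2

# Harmonic distances (f(lambda) = 1/lambda) on a triangle: every pair is at 2/3,
# which is exactly the effective resistance between two nodes of a triangle.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
print(f_spectral_distances(A, lambda lam: 1.0 / lam))
```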

What’s so special about this family of distance? Depending upon 𝑓 λ , FGSD can capture different type of information about graph sub-structures.

What’s so special about this family of distance? Depending upon 𝑓 λ , FGSD can capture different type of information about graph sub-structures. Captures Local Sub-Structure Information: For 𝑓 λ = λp p≥0 , 𝑆𝑓(𝑥,𝑦) takes only p-hop local neighborhood information.

What’s so special about this family of distance? Depending upon 𝑓 λ , FGSD can capture different type of information about graph sub-structures. Captures Local Sub-Structure Information: For 𝑓 λ = λp p≥0 , 𝑆𝑓(𝑥,𝑦) takes only p-hop local neighborhood information. 𝑥 𝑦 Example: if 𝑓 λ = λ, then 𝑆𝑓 𝑥,𝑦 = 𝑑 𝑥 + 𝑑 𝑦 −2 I 𝑒𝑑𝑔𝑒 𝑥,𝑦

What’s so special about this family of distance? Depending upon 𝑓 λ , FGSD can capture different type of information about graph sub-structures. Captures Local Sub-Structure Information: For 𝑓 λ = λp p≥0 , 𝑆𝑓(𝑥,𝑦) takes only p-hop local neighborhood information. 𝑥 In general for 𝑓 λ = λp, we have 𝑆𝑓 𝑥,𝑦 = 𝐿 𝑥𝑥 𝑃 + 𝐿 𝑦𝑦 𝑃 −2 𝐿 𝑥𝑦 𝑃 𝑦

What’s so special about this family of distance? Depending upon 𝑓 λ , FGSD can capture different type of information about graph sub-structures. Captures Global Structure Information: For 𝑓 λ = 1/λp p≥0 , 𝑆𝑓(𝑥,𝑦) accounts for all possible paths from x to y on a graph.

What’s so special about this family of distance? Depending upon 𝑓 λ , FGSD can capture different type of information about graph sub-structures. Captures Global Structure Information: For 𝑓 λ = 1/λp p≥0 , 𝑆𝑓(𝑥,𝑦) accounts for all possible paths from x to y on a graph. 1,3,5 Path 2,4,6 3,4 Path 5,6 In general for 𝑓 λ = 1/λp, we have 𝑆𝑓 𝑥,𝑦 =(𝐿+)𝑝𝑥𝑥+(𝐿+)𝑝𝑦𝑦−2(𝐿+)𝑝xy 𝑦 𝑥

Some Known Graph Distances Derived From FGSD: for $f(\lambda) = 1/\lambda$, $S_f(x,y)$ is the effective resistance (harmonic) distance; for $f(\lambda) = 1/\lambda^2$, $S_f(x,y)$ is the biharmonic distance; for $f(\lambda) = e^{-2\lambda}$, $S_f(x,y)$ is the heat diffusion distance.
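The named members of the family differ only in the choice of f(λ). A minimal sketch (again my own illustration reusing the recipe above; the helper name and the 4-cycle example are assumptions) computes all three on the same graph.

```python
import numpy as np

def f_spectral(adj, f):
    L = np.diag(adj.sum(axis=1)) - adj
    lam, phi = np.linalg.eigh(L)
    w = np.zeros_like(lam)
    nz = lam > 1e-9                   # the lambda = 0 term always vanishes anyway,
    w[nz] = f(lam[nz])                # since the corresponding eigenvector is constant
    diff = phi[:, None, :] - phi[None, :, :]
    return np.einsum('k,xyk->xy', w, diff ** 2)

A = np.array([[0, 1, 0, 1],          # the 4-cycle C4
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

harmonic   = f_spectral(A, lambda lam: 1.0 / lam)           # effective resistance
biharmonic = f_spectral(A, lambda lam: 1.0 / lam ** 2)
heat       = f_spectral(A, lambda lam: np.exp(-2.0 * lam))  # heat diffusion distance
print(harmonic[0, 1], biharmonic[0, 1], heat[0, 1])         # e.g. harmonic[0,1] = 0.75
```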

Properties of the FGSD Graph Spectrum: uniqueness, graph invariance, stability, sparsity, fast computation.

And… the hunt is for the best f(λ) that exhibits these desired properties!

Uniqueness of the FGSD Graph Spectrum. First, why do we seek the uniqueness property? Uniqueness gives us confidence in how well the FGSD elements determine the graph structure. Theorem (Uniqueness of FGSD): the f-spectral distance matrix uniquely determines the underlying graph, and each graph has a unique $S_f$ up to permutation, i.e., $S_{G_1} = P\,S_{G_2}\,P^T$ for some permutation matrix P if $G_1$ and $G_2$ are isomorphic graphs. Proof idea: derive $S_f(x,y)$ explicitly in terms of the graph Laplacian L; this requires f(λ) to be a bijective function. Implication: the FGSD graph spectrum is invariant to permutations of the graph's vertex labels.

Unfortunately… converting the f-spectral distance matrix into a multiset breaks down the uniqueness property to a certain extent. Otherwise, we would have an $\mathcal{O}(n^2)$ polynomial-time algorithm for the graph isomorphism problem! (Illustration: from the FGSD matrix of a labeled 5-node graph we form the FGSD graph spectrum $\mathcal{R} = \{0,0,0,0,1,1,\ldots\}$; can the graph be recovered from the multiset alone?)

But… there is a lot of hope! Spectral hope: each FGSD element is rich in itself, since each element encodes a certain graph sub-structure. Empirical hope: the biharmonic spectrum is unique for all graphs with up to at least 10 nodes (∼11 million graphs) and perhaps beyond; the harmonic spectrum is unique for all graphs with up to at least 8 nodes. Theoretical hope (Uniqueness of the Graph Harmonic Spectrum): if two graphs $G_1$ and $G_2$ have the same number of nodes but a different number of edges, i.e., $|V_1| = |V_2|$ but $|E_1| \ne |E_2|$, then their harmonic spectra differ: $\mathcal{R}(G_1) \ne \mathcal{R}(G_2)$. Proof idea: the harmonic distance is a monotonic function with respect to adding (or removing) edges.
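The empirical claim can be spot-checked on small graphs. Here is a sketch of such a check (illustration only; it only covers the graphs on at most 7 nodes available in the networkx atlas, a smaller range than the paper's exhaustive enumeration, and the helper name and rounding tolerance are my assumptions).

```python
import numpy as np
import networkx as nx

def harmonic_multiset(G, decimals=8):
    """The harmonic FGSD multiset of a connected graph, rounded so it can be hashed."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    Lp = np.linalg.pinv(L)                              # Moore-Penrose pseudoinverse of L
    d = np.diag(Lp)
    S = d[:, None] + d[None, :] - 2.0 * Lp              # all-pairs harmonic distances
    iu = np.triu_indices(G.number_of_nodes(), k=1)
    return tuple(np.round(np.sort(S[iu]), decimals))

seen, collisions = {}, 0
for G in nx.graph_atlas_g():                            # all graphs on at most 7 nodes
    if G.number_of_nodes() < 2 or not nx.is_connected(G):
        continue
    key = harmonic_multiset(G)
    if key in seen:                                     # atlas graphs are pairwise
        collisions += 1                                 # non-isomorphic, so any hit would
    seen[key] = G                                       # be a uniqueness counterexample
print("spectrum collisions on connected graphs with <= 7 nodes:", collisions)
```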

A Side Adventure… Relationship Between FGSD, Graph Embedding, and Dimension Reduction. Graph embedding: let $\Psi = \Phi\, f(\Lambda)^{1/2}$, where $f(\Lambda)^{1/2} = \mathrm{diag}\big(\sqrt{f(\lambda_k)}\big)$. Then the f-spectral distance is $S_f(x,y) = \|\Psi_x - \Psi_y\|_2^2$, so $\Psi$ can be seen as a graph embedding. Dimension reduction: taking the first p columns of $\Psi$ gives a family of spectral embeddings; Laplacian Eigenmaps is recovered by setting $f(\lambda) = 1$ with $L_{rw} = D^{-1}L$, and Diffusion Maps by setting $f(\lambda) = \lambda^{2t}$ with $L_{rw} = D^{-1}L$.

Theorem (Uniqueness of the FGSD Graph Embedding): each graph can be isometrically embedded into a Euclidean space using FGSD as the isometric measure; moreover, this embedding is unique under certain conditions (graph → Euclidean embedding → recover graph). Potential benefit: the embedding can be used for node label classification or served as a node2vec-style tool.
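A minimal sketch of the embedding view (my own illustration of the relations above; the function name, the choice of f, and the example graph are assumptions): rows of Ψ are node coordinates whose squared Euclidean distances reproduce S_f, and keeping the first p columns gives the spectral dimension reduction.

```python
import numpy as np

def fgsd_embedding(adj, f, p=None):
    """Psi = Phi * f(Lambda)^(1/2); optionally truncated to its first p columns."""
    L = np.diag(adj.sum(axis=1)) - adj
    lam, phi = np.linalg.eigh(L)
    w = np.zeros_like(lam)
    nz = lam > 1e-9
    w[nz] = f(lam[nz])
    Psi = phi * np.sqrt(w)                  # scale eigenvector k by sqrt(f(lambda_k))
    return Psi if p is None else Psi[:, :p]

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Psi = fgsd_embedding(A, lambda lam: 1.0 / lam)
# Squared Euclidean distances between rows of Psi equal the f-spectral distances.
print(np.sum((Psi[0] - Psi[3]) ** 2))       # the harmonic distance S_f(0, 3)
Psi2 = fgsd_embedding(A, lambda lam: 1.0 / lam, p=2)   # a 2-D node embedding
```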

Eigenfunction Stability of the FGSD Graph Spectrum. Why is stability important? We want the FGSD graph spectrum to be stable under small perturbations (or noise) in the graph structure. Theorem (Eigenfunction Stability of FGSD): let $\Delta S_{xy}$ be the change in the f-spectral distance with respect to a change $\Delta\omega$ in the weight of any single edge of the graph. Then $\Delta S_{xy} \le 2\,\big|f(\lambda_{n-1} + 2\Delta\omega) - f(\lambda_1)\big|$. Proof idea: rank-one perturbation analysis of the graph Laplacian. Implication: decreasing functions f(λ), such as $1/\lambda^p$, are more stable.

Uniform Stability of the FGSD Graph Spectrum. Theorem (Uniform Stability of FGSD): let $E[S(x,y)]$ be the expected f-spectral distance over all possible graphs with a fixed ordering of n vertices and bounded edge weights. Then, with probability $1-\delta$, $\delta \in (0,1)$,
$$|S(x,y) - E[S(x,y)]| \le 2 f(\theta)\sqrt{n(n-1)\log\tfrac{1}{\delta}}.$$
Proof idea: concentration inequalities (e.g., McDiarmid's inequality). Implication: suggests strong stability of the f-spectral distance over all possible graphs.

Sparsity of the FGSD Graph Spectrum. Conjecture (Sparsity of the FGSD Spectrum): let $|\mathcal{R}(f(\lambda))|_G$ denote the number of unique (distinct) elements in the multiset $\mathcal{R}$. Then $|\mathcal{R}(f(\lambda))|_G \ge |\mathcal{R}(1/\lambda)|_G + 2$. Conclusion: the harmonic spectrum (f(λ) = 1/λ) produces the sparsest features.

Further Empirical Evidence: Harmonic vs. Biharmonic Feature Space. [Feature-matrix plots for graphs with Label 1 and Label 2: matrix sparsity 97.12% (harmonic) vs. 94.28% (biharmonic).]

Fast Computation of the FGSD Graph Spectrum. Recall $S_f(x,y) = f(L)_{xx} + f(L)_{yy} - 2 f(L)_{xy}$, or $S_f(x,y) = f(L^+)_{xx} + f(L^+)_{yy} - 2 f(L^+)_{xy}$. How do we compute $f(L)$ or $f(L^+)$ without performing an eigenvalue decomposition, which costs $\mathcal{O}(n^3)$?

Fast Computation of the FGSD Graph Spectrum. Approximation recipe (for computing f(L)): expand $f(\lambda) \approx \sum_{i=0}^{r} a_i T_i(\lambda)$ in an approximating polynomial series (e.g., Chebyshev polynomials), which gives $f(L) \approx \sum_{i=0}^{r} a_i T_i(L)$; for sparse L the computation cost is $\mathcal{O}(r|E|) \ll \mathcal{O}(n^2)$. Proposed efficient exact computation (for $f(L^+)$): the matrix f(L) satisfies the structural property $f(L)\,f(L^+) = I - \frac{J}{n}$, so the problem turns into solving a sparse linear system, resulting in $\mathcal{O}(n^2)$ complexity.
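A minimal sketch of the linear-system route for the harmonic member f(λ) = 1/λ (my own illustration of the idea, not the released implementation; the helper name, solver tolerance, and the 4-cycle example are assumptions): instead of an eigendecomposition, each column of L⁺ is obtained by solving a sparse system whose right-hand side is orthogonal to the all-ones null space of L.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg
from scipy.sparse.csgraph import laplacian

def harmonic_spectrum(adj):
    """Harmonic FGSD multiset of a connected graph given as a sparse adjacency matrix."""
    n = adj.shape[0]
    L = laplacian(adj.tocsr())                     # sparse combinatorial Laplacian
    Lp = np.zeros((n, n))
    for y in range(n):
        b = -np.full(n, 1.0 / n)
        b[y] += 1.0                                # b = e_y - 1/n, orthogonal to the ones vector
        v, info = cg(L, b, atol=1e-10)             # conjugate gradients on the sparse system
        assert info == 0, "CG did not converge"
        Lp[:, y] = v - v.mean()                    # remove any drift along the ones vector
    d = np.diag(Lp)
    S = d[:, None] + d[None, :] - 2.0 * Lp         # S_f(x,y) = L+_xx + L+_yy - 2 L+_xy
    iu = np.triu_indices(n, k=1)
    return np.sort(S[iu])                          # the multiset, as a sorted array

# Example: the 4-cycle; adjacent pairs ~0.75, opposite pairs ~1.0.
A = sp.csr_matrix(np.array([[0, 1, 0, 1],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [1, 0, 1, 0]], dtype=float))
print(harmonic_spectrum(A))
```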

From the FGSD Graph Spectrum to a Graph Feature Vector: finally, we take a histogram of the FGSD graph spectrum to construct the graph feature vector, since the histogram inherits all of these properties.
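A minimal sketch of that final step (the bin count and bin range here are arbitrary hyper-parameters of my illustration, not values from the paper, and the names in the trailing usage comment refer to the sketch above): bin each graph's multiset into a fixed-length count vector that any off-the-shelf classifier can consume.

```python
import numpy as np

def fgsd_feature_matrix(spectra, n_bins=200, max_dist=None):
    """spectra: list of 1-D arrays, one FGSD multiset per graph."""
    if max_dist is None:
        max_dist = max(s.max() for s in spectra)   # shared bin range across all graphs
    feats = np.zeros((len(spectra), n_bins))
    for i, s in enumerate(spectra):
        hist, _ = np.histogram(s, bins=n_bins, range=(0.0, max_dist))
        feats[i] = hist                            # a (typically sparse) count vector
    return feats

# feats = fgsd_feature_matrix([harmonic_spectrum(A) for A in adjacency_list])
# can then be fed to, e.g., a linear SVM for graph classification.
```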

How good is our complexity in the graph classification world?
Methods compared: SP, GK (k > 3), MLG ($\tilde{n} < n$), SGS, DCNN, GS (k > 1), FGSD.
Approximate complexity: —, $\mathcal{O}(n d^{k-1})$, $\mathcal{O}(\tilde{n}^3)$, $\mathcal{O}(|E|)$ (FGSD).
Worst-case complexity: $\mathcal{O}(n^3)$, $\mathcal{O}(n^k)$, $\mathcal{O}(n^3)$, $\mathcal{O}(n^2)$, $\mathcal{O}(n^{2+k})$.

Dominance of FGSD Graph Features (Datasets):

Dataset         Type            No. of graphs   Avg. nodes   Max. nodes   Class labels
MUTAG           Bioinformatics  188             17.9         28           2
PTC             Bioinformatics  344             25.5         109          2
PROTEINS        Bioinformatics  1113            39.1         620          2
NCI1            Bioinformatics  4110            29.8         111          2
NCI109          Bioinformatics  4127            29.6                      2
D&D             Bioinformatics  1178            284          5784         2
MOA             Bioinformatics  68              18           27           2
COLLAB          Social network  5000            74.49        492          3
IMDB-BINARY     Social network  1000            19.77        136          2
IMDB-MULTI      Social network  1500            13           89           3
REDDIT-BINARY   Social network  2000            429.61       4117         2
REDDIT-MULTI    Social network                  508.5        3690         5

State-of-the-Art Comparison:

Algorithm                                  Type                           Published at
Random Walk (RW)                           Graph Kernel                   COLT (2003)
Shortest Path (SP)                         Graph Kernel                   ICDM (2005)
Skew Spectrum (SGS)                        Graph Spectrum                 ICML (2008)
Graphlet Spectrum (GS)                     Graph Spectrum                 ICML (2009)
Graphlet Kernel (GK)                       Graph Kernel                   AISTATS (2009)
Weisfeiler-Lehman (WL)                     Graph Kernel                   JMLR (2011)
Deep Graph Kernels (DGK)                   Graph Kernel                   SIGKDD (2015)
Multiscale Laplacian Graph Kernel (MLG)    Graph Kernel                   NIPS (2016)
Diffusion Convolutional Network (DCNNs)    Convolutional Neural Network   NIPS (2016)
PATCHY Convolutional Network (PSCN)        Convolutional Neural Network   ICML (2016)

Graph Classification Results


Conclusion and Future Work. We proposed a conceptually simple yet powerful and theoretically motivated graph representation. Our representation, based on the discovery of a family of graph spectral distances, exhibits uniqueness, stability, and sparsity properties and is fast to compute. Our hunt specifically leads to the harmonic distance as an ideal member of this family for extracting graph features. In future work, we plan to generalize FGSD to labeled datasets in order to utilize node and edge label information in the graph representation.

Code Available: https://github.com/vermaMachineLearning/FGSD Supplementary Material: http://www-users.cs.umn.edu/~verma/publications.html

Thank You! Questions?