Presenter: Saurabh Verma, PhD Candidate (2015-Present)

Hunt For The Unique, Stable, Sparse and Fast Feature Learning on Graphs*
Presenter: Saurabh Verma, PhD Candidate (2015-Present), University of Minnesota Twin Cities. Advisor: Zhi-Li Zhang. *To appear at the 31st Conference on Neural Information Processing Systems (NIPS 2017). Authors: Saurabh Verma, Zhi-Li Zhang.

Graphs are Everywhere…
Social networks, biological networks, chemical networks, web networks, ecological networks.

Learning Problems on Graph(s)
Node label classification and graph classification.

Taxonomy of Graph Learning
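Learning on graph(s) splits into node classification and graph classification (reference numbers below restart within each group).
Node classification, embedding techniques: DeepWalk [1], LINE [2], node2vec [3].
Graph classification, graph kernels: RandW [1], MLG [2], WL [3].
Graph classification, convolutional networks: PATCHY [1], MCNNs [2], DCNNs [3].
Graph classification, graph spectrum: Skew [1], Graphlet [2], FGSD Spectrum.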

Embedding techniques, references:
[1] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014.
[2] Tang, Jian, et al. "LINE: Large-scale information network embedding." Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015.
[3] Grover, Aditya, and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.

Graph kernels, references:
[1] Gärtner, Thomas, Peter Flach, and Stefan Wrobel. "On graph kernels: Hardness results and efficient alternatives." Learning Theory and Kernel Machines (2003).
[2] Kondor, Risi, and Horace Pan. "The multiscale Laplacian graph kernel." Advances in Neural Information Processing Systems.
[3] Shervashidze, Nino, et al. "Weisfeiler-Lehman graph kernels." Journal of Machine Learning Research 12.Sep (2011).

Convolutional networks, references:
[1] Niepert, Mathias, Mohamed Ahmed, and Konstantin Kutzkov. "Learning convolutional neural networks for graphs." International Conference on Machine Learning.
[2] Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." Advances in Neural Information Processing Systems.
[3] Atwood, James, and Don Towsley. "Diffusion-convolutional neural networks." Advances in Neural Information Processing Systems.

Graph spectrum, references:
[1] Kondor, Risi, and Karsten M. Borgwardt. "The skew spectrum of graphs." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
[2] Kondor, Risi, Nino Shervashidze, and Karsten M. Borgwardt. "The graphlet spectrum." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.

FGSD spectrum, reference:
(To appear) Saurabh Verma and Zhi-Li Zhang. "Hunt For The Unique, Stable, Sparse and Fast Feature Learning on Graphs." In 31st Conference on Neural Information Processing Systems (NIPS 2017).

We Focus on the Graph Classification Problem…
[Figure: two graphs; are they equivalent (≡)?] Fundamental problem: how do we measure the similarity between two graphs? The question dates back to the graph isomorphism problem. Babai (2017) proved that graph isomorphism can be solved in quasi-polynomial time, which is the best bound so far.

Current Polynomial Alternatives…
Graph kernels and convolutional networks. Cons: They bypass computing any explicit graph representation. There is no theoretical justification for which sub-structure(s) need to be exploited or compared. They are computationally expensive in many cases.

Popular Graph Spectrums
Polynomial alternatives… our approach: graph spectrum. Popular graph spectrums (only a few exist!): The eigenvalue spectrum of the graph Laplacian loses far too much information about graph structure. The skew and graphlet spectrums are based on group theory and are computationally expensive, ranging from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^6)$.

Our Spectrum Idea: Simple yet Powerful!
The atomic structure (or spectrum) of a graph is encoded in the multiset of all pairwise node distances. Example: the spectrum $\mathcal{R}$ based on shortest-path distance is $\mathcal{R} = \{0,0,0,0,1,1,2,2,3,3,3,3,\dots\}$.
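The shortest-path example can be sketched in a few lines. This is a minimal illustration only: the function name `shortest_path_spectrum`, the adjacency-list input, and the 4-node path graph are assumptions made for the example and are not taken from the slides. Distances are collected over ordered node pairs, so every node contributes one 0 and every distinct pair appears twice.

```python
from collections import deque


def shortest_path_spectrum(adj):
    """Return the sorted multiset of all-pairs shortest-path distances.

    `adj` maps each node to a list of neighbours (unweighted, connected graph).
    """
    spectrum = []
    for source in adj:
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        spectrum.extend(dist.values())
    return sorted(spectrum)


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(shortest_path_spectrum(path))
```

Because the multiset is sorted, relabeling the nodes leaves the result unchanged.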

But which distance should one consider on a graph?

Welcome To The Family of Graph Spectral Distances
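Based on graph spectral properties (graph Laplacian $L$) and denoted FGSD. The $f$-spectral distance between nodes $x$ and $y$ is
$$S_f(x,y) = \sum_{k=0}^{N-1} f(\lambda_k)\,\bigl(\Phi_k(x) - \Phi_k(y)\bigr)^2,$$
where $f(\lambda_k)$ is a bijective function of the eigenvalues and $\Phi_k(y)$ is the value of the $k$-th eigenvector at node $y$. Equivalently, each node is described by its eigenvector coordinates $(\Phi_0(x), \Phi_1(x), \dots, \Phi_{N-1}(x))$, scaled by $\sqrt{f(\lambda_k)}$:
$$S_f(x,y) = \Bigl\| \bigl(\sqrt{f(\lambda_0)}\,\Phi_0(x), \dots, \sqrt{f(\lambda_{N-1})}\,\Phi_{N-1}(x)\bigr) - \bigl(\sqrt{f(\lambda_0)}\,\Phi_0(y), \dots, \sqrt{f(\lambda_{N-1})}\,\Phi_{N-1}(y)\bigr) \Bigr\|_2^2.$$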
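The definition of $S_f(x,y)$ translates directly into an eigendecomposition. The sketch below is illustrative and not the authors' implementation: it assumes an undirected graph given as a dense adjacency matrix, takes the Laplacian to be $L = D - A$, and uses hypothetical names (`fgsd_matrix`, `PATH4`). The choice $f(\lambda) = \lambda^2$ is only an example that avoids dividing by the zero eigenvalue.

```python
import numpy as np


def fgsd_matrix(adjacency, f):
    """All-pairs f-spectral distances, computed straight from the definition.

    Assumption: L = D - A for an undirected graph; `f` is applied elementwise
    to the eigenvalues of L.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    lam, Phi = np.linalg.eigh(L)  # column k of Phi is the k-th eigenvector
    # diff[x, y, k] = Phi_k(x) - Phi_k(y)
    diff = Phi[:, None, :] - Phi[None, :, :]
    # Sum over k of f(lambda_k) * (Phi_k(x) - Phi_k(y))^2
    return np.einsum("k,xyk->xy", f(lam), diff ** 2)


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
PATH4 = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

S = fgsd_matrix(PATH4, lambda lam: lam ** 2)
print(np.round(S, 6))
```

This direct route costs a full eigendecomposition, so it is only practical for small graphs.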

What’s so special about this family of distance?
Depending on $f(\lambda)$, FGSD can capture different types of information about graph sub-structures.


Captures local sub-structure information: for $f(\lambda) = \lambda^p$ ($p \ge 0$), $S_f(x,y)$ uses only $p$-hop local neighborhood information. Example: if $f(\lambda) = \lambda$, then $S_f(x,y) = L_{xx} + L_{yy} - 2L_{xy} = d(x) + d(y) + 2\,\mathbb{I}[\mathrm{edge}(x,y)]$ for $x \ne y$, because $L_{xy} = -1$ on an edge of an unweighted graph with $L = D - A$. In general, for $f(\lambda) = \lambda^p$, we have $S_f(x,y) = (L^p)_{xx} + (L^p)_{yy} - 2(L^p)_{xy}$.
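The local case needs no eigendecomposition at all, since it only involves entries of $L^p$. The following sketch assumes an unweighted, undirected graph with $L = D - A$; the name `local_fgsd` and the example graph are hypothetical. For $p = 1$ it checks that the distance depends only on the two degrees and on whether the pair is adjacent, where $L_{xy} = -1$ on an edge makes the adjacency term enter with a plus sign.

```python
import numpy as np


def local_fgsd(adjacency, p):
    """S_f for f(lambda) = lambda^p, read off the entries of L^p."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.matrix_power(L, p)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

S1 = local_fgsd(A, 1)

# For p = 1: degree(x) + degree(y) + 2 * A[x, y] off the diagonal, 0 on it.
deg = A.sum(axis=1)
expected = deg[:, None] + deg[None, :] + 2.0 * A
np.fill_diagonal(expected, 0.0)
print(np.allclose(S1, expected))
```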


Captures global structure information: for $f(\lambda) = 1/\lambda^p$ ($p \ge 0$), $S_f(x,y)$ accounts for all possible paths from $x$ to $y$ on the graph. In general, for $f(\lambda) = 1/\lambda^p$, we have $S_f(x,y) = ((L^{+})^p)_{xx} + ((L^{+})^p)_{yy} - 2((L^{+})^p)_{xy}$, where $L^{+}$ is the pseudoinverse of $L$. [Figure: numbered example paths between $x$ and $y$.]

Some Known Graph Distances…Derived From FGSD
$f(\lambda) = 1/\lambda$: $S_f(x,y)$ is the effective resistance (harmonic) distance.
$f(\lambda) = 1/\lambda^2$: $S_f(x,y)$ is the biharmonic distance.
$f(\lambda) = e^{-2\lambda}$: $S_f(x,y)$ is the heat diffusion distance.
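These three members of the family can be computed side by side. The sketch assumes a connected, undirected graph with $L = D - A$ and uses the Moore-Penrose pseudoinverse for the $1/\lambda$ and $1/\lambda^2$ cases; the function names are hypothetical. On a path of unit-weight edges the harmonic distance between the endpoints should equal the resistance of three unit resistors in series, which gives a quick sanity check.

```python
import numpy as np


def kernel_to_distance(K):
    """Turn a kernel matrix K into distances K_xx + K_yy - 2 K_xy."""
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K


def known_spectral_distances(adjacency):
    """Harmonic, biharmonic and heat diffusion distances for one graph."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    L_pinv = np.linalg.pinv(L)
    lam, Phi = np.linalg.eigh(L)
    # Heat kernel: sum_k exp(-2 * lambda_k) * phi_k phi_k^T
    heat_kernel = (Phi * np.exp(-2.0 * lam)) @ Phi.T
    return {
        "harmonic": kernel_to_distance(L_pinv),             # f = 1 / lambda
        "biharmonic": kernel_to_distance(L_pinv @ L_pinv),  # f = 1 / lambda^2
        "heat": kernel_to_distance(heat_kernel),            # f = exp(-2 lambda)
    }


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

dists = known_spectral_distances(A)
print(np.round(dists["harmonic"], 6))
```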

Properties of FGSD Graph Spectrum
Uniqueness, graph invariance, stability, sparsity, fast computation.

And… hunt for the best $f(\lambda)$ function that can exhibit these desired properties!

Uniqueness of FGSD Graph Spectrum
First, why do we seek uniqueness? "Uniqueness" gives us confidence in how well the FGSD elements determine the graph structure. Theorem (Uniqueness of FGSD): The $f$-spectral distance matrix uniquely determines the underlying graph, and each graph has a unique $S_f$ up to permutation, i.e. $S_{G_1} = P\,S_{G_2}\,P^T$ for some permutation matrix $P$ if and only if $G_1$ and $G_2$ are isomorphic graphs. Proof idea: derive $S_f(x,y)$ explicitly in terms of the graph Laplacian $L$; the argument requires $f(\lambda)$ to be a bijective function. Implication: the FGSD graph spectrum is invariant to permutations of the graph's vertex labels.
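The permutation statement is easy to check numerically in one direction: relabel the vertices of a graph and compare the two distance matrices. The sketch below uses the harmonic distance ($f(\lambda) = 1/\lambda$) on a small connected graph; the function name, the edge list, and the random seed are assumptions made for the example. It does not test the converse direction of the theorem.

```python
import numpy as np


def harmonic_fgsd(adjacency):
    """Harmonic (effective resistance) distances via the pseudoinverse of L."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    K = np.linalg.pinv(L)
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K


# Hypothetical connected 5-node graph.
n = 5
A1 = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]:
    A1[u, v] = A1[v, u] = 1.0

# Relabel the vertices with a random permutation matrix P.
rng = np.random.default_rng(0)
P = np.eye(n)[rng.permutation(n)]
A2 = P @ A1 @ P.T

S1 = harmonic_fgsd(A1)
S2 = harmonic_fgsd(A2)

# The matrices agree up to the same permutation ...
print(np.allclose(S2, P @ S1 @ P.T))
# ... and the sorted multisets (the graph spectra) coincide.
print(np.allclose(np.sort(S1.ravel()), np.sort(S2.ravel())))
```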

Unfortunately… converting the $f$-spectral distance matrix into a multiset breaks the uniqueness property to a certain extent. Otherwise, we would have an $\mathcal{O}(n^2)$ polynomial algorithm for the graph isomorphism problem! [Figure: the FGSD matrix of a 5-node graph flattened into the FGSD graph spectrum $\mathcal{R} = \{0,0,0,0,1,1,\dots\}$.]

But… There Is a Lot of Hope!
Spectral hope: each FGSD element is rich in itself, as each element encodes a certain graph sub-structure.
Empirical hope: the biharmonic spectrum is unique for all graphs with up to at least 10 nodes (∼11 million graphs), and possibly beyond. The harmonic spectrum is unique for all graphs with up to at least 8 nodes, and …
Theoretical hope (Uniqueness of Graph Harmonic Spectrum): if two graphs $G_1$ and $G_2$ have the same number of nodes but different numbers of edges, i.e. $|V_1| = |V_2|$ but $|E_1| \ne |E_2|$, then their harmonic spectra differ: $\mathcal{R}(G_1) \ne \mathcal{R}(G_2)$. Proof idea: the harmonic distance is a monotonic function with respect to adding (or removing) edges.
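The monotonicity behind the proof idea can be observed on a tiny example: adding an edge never increases any effective resistance. The sketch below closes a 4-node path into a cycle and compares the harmonic distances before and after; the helper name and the graphs are assumptions chosen for illustration, and one example is of course not a proof.

```python
import numpy as np


def harmonic_distances(adjacency):
    """Effective resistance between all node pairs (connected graph assumed)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    K = np.linalg.pinv(L)
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K


# Path 0 - 1 - 2 - 3, and the same graph with the extra edge (0, 3).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
cycle = path.copy()
cycle[0, 3] = cycle[3, 0] = 1.0

S_path = harmonic_distances(path)
S_cycle = harmonic_distances(cycle)

# Every pairwise distance shrinks or stays equal after adding the edge.
print(bool(np.all(S_cycle <= S_path + 1e-9)))
# Same node count, different edge count: the sorted spectra differ.
print(not np.allclose(np.sort(S_path.ravel()), np.sort(S_cycle.ravel())))
```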

Relationship Between FGSD and Graph Embedding and Dimension Reduction
A Side Adventure… Graph embedding: let $\Psi = \Phi\, f(\Lambda)^{1/2}$, where $f(\Lambda)^{1/2} = \mathrm{diag}\bigl(\sqrt{f(\lambda_k)}\bigr)$. Then the $f$-spectral distance is $S_f(x,y) = \|\Psi(x) - \Psi(y)\|_2^2$, where $\Psi(x)$ is the row of $\Psi$ for node $x$, so $\Psi$ can be seen as a graph embedding. Dimension reduction: taking the first $p$ columns of $\Psi$ gives a family of spectral embeddings. Laplacian Eigenmaps are obtained by setting $f(\lambda) = 1$ with $L_{rw} = D^{-1}L$; Diffusion Maps are obtained by setting $f(\lambda) = \lambda^{2t}$ with $L_{rw} = D^{-1}L$.
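The embedding view can be checked by building $\Psi$ and comparing squared Euclidean distances between its rows with the pseudoinverse form of the harmonic distance. This is a sketch under stated assumptions: $L = D - A$, a connected graph, $f \ge 0$, and the zero eigenvalue given weight 0 (its eigenvector is constant on a connected graph, so it does not affect any distance). The names `fgsd_embedding` and `tol` are hypothetical.

```python
import numpy as np


def fgsd_embedding(adjacency, f, tol=1e-9):
    """Rows of the returned matrix are the node embeddings Psi(x)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    lam, Phi = np.linalg.eigh(L)
    weights = np.zeros_like(lam)
    nonzero = lam > tol
    weights[nonzero] = f(lam[nonzero])  # skip the zero eigenvalue
    return Phi * np.sqrt(weights)       # scales column k by sqrt(f(lambda_k))


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

Psi = fgsd_embedding(A, lambda lam: 1.0 / lam)
# Squared Euclidean distances between all pairs of embedded nodes.
D2 = ((Psi[:, None, :] - Psi[None, :, :]) ** 2).sum(axis=-1)

# Reference: harmonic distance from the pseudoinverse of L.
K = np.linalg.pinv(np.diag(A.sum(axis=1)) - A)
d = np.diag(K)
S_harmonic = d[:, None] + d[None, :] - 2.0 * K
print(np.allclose(D2, S_harmonic))
```

Keeping only some columns of `Psi` gives a lower-dimensional embedding, at the price of approximating the distances.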

Relationship Between FGSD and Graph Embedding and Dimension Reduction
A Side Adventure… Theorem (Uniqueness of FGSD Graph Embedding): Each graph can be isometrically embedded into a Euclidean space using FGSD as the isometric measure. On top of that, this embedding is unique under certain conditions. [Figure: graph → Euclidean embedding → recovered graph.] Potential benefits: the embedding can be used for node label classification, or can serve as a node-to-vector (Node2Vector) tool.

Stability of FGSD Graph Spectrum
Why is stability important? We want the FGSD graph spectrum to be stable under small perturbations (or noise) in the graph structure. Theorem (Eigenfunction Stability of FGSD): Let $\Delta S_{xy}$ be the change in the $f$-spectral distance caused by a change $\Delta\omega$ in the weight of any single edge of the graph. Then $\Delta S_{xy} \le 2\,\bigl|f(\lambda_{n-1} + 2\Delta\omega) - f(\lambda_1)\bigr|$. Proof idea: rank-one perturbation analysis of the graph Laplacian.


Eigenfunction Stability of FGSD Graph Spectrum
Implications of the eigenfunction stability bound $\Delta S_{xy} \le 2\,\bigl|f(\lambda_{n-1} + 2\Delta\omega) - f(\lambda_1)\bigr|$: decreasing functions $f(\lambda)$ (such as $1/\lambda^p$) are more stable.

Uniform Stability of FGSD Graph Spectrum
Theorem (Uniform Stability of FGSD): Let $E[S(x,y)]$ be the expected value of the $f$-spectral distance over all possible graphs with a fixed ordering of $n$ vertices and bounded weights. Then, with probability $1-\delta$, where $\delta \in (0,1)$, we have $|S(x,y) - E[S(x,y)]| \le 2 f(\theta)\sqrt{n(n-1)}\,\sqrt{\log(1/\delta)}$. Proof idea: concentration inequalities (e.g. McDiarmid). Implication: this suggests strong stability of the $f$-spectral distance over all possible graphs. [Figure: the same node pair $(x, y)$ in several graphs, compared against $E[S(x,y)]$.]

Sparsity of FGSD Graph Spectrum
Conjecture (Sparsity of FGSD Spectrum): Let $|\mathcal{R}(f(\lambda))|_G$ denote the number of distinct elements in the multiset $\mathcal{R}$ of graph $G$. Then the following holds: $|\mathcal{R}(f(\lambda))|_G \ge |\mathcal{R}(1/\lambda)|_G + 2$. Conclusion: the harmonic spectrum ($f(\lambda) = 1/\lambda$) produces the sparsest features.


Further Empirical Evidence: Harmonic vs. Biharmonic Feature Space
[Figure: two feature-matrix panels, each showing graphs with Label 1 and Label 2. Matrix sparsity = 97.12% in the first panel and 94.28% in the second.]

Fast Computation of FGSD Graph Spectrum
Recall that $S_f(x,y) = f(L)_{xx} + f(L)_{yy} - 2f(L)_{xy}$, or $S_f(x,y) = f(L^{+})_{xx} + f(L^{+})_{yy} - 2f(L^{+})_{xy}$. How can we compute $f(L)$ or $f(L^{+})$ without performing the eigenvalue decomposition, which costs $\mathcal{O}(n^3)$?

Fast Computation of FGSD Graph Spectrum
Approximation recipe (for computing $f(L)$): approximate $f(\lambda) \approx \sum_{i=0}^{r} a_i T_i(\lambda)$ by a polynomial series (e.g. Chebyshev polynomials). This results in $f(L) \approx \sum_{i=0}^{r} a_i T_i(L)$. For sparse $L$, the computation cost is $\mathcal{O}(r|E|) \ll \mathcal{O}(n^2)$.
Proposed efficient exact computation (for $f(L^{+})$): the matrix $f(L)$ has a structural property, $f(L)\,f(L^{+}) = I - \frac{J}{n}$, where $J$ is the all-ones matrix. The problem turns into solving a sparse linear system, which results in $\mathcal{O}(n^2)$ complexity.
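For the harmonic case, the structural property gives a standard way to obtain $L^{+}$ from a linear solve instead of an eigendecomposition: on a connected graph, $(L + J/n)^{-1} = L^{+} + J/n$. The sketch below illustrates that identity only. It is not the authors' code, the function name is hypothetical, and it uses a dense solver for clarity, so it does not reach the $\mathcal{O}(n^2)$ cost quoted on the slide, which would depend on a sparse solver.

```python
import numpy as np


def laplacian_pinv_via_solve(adjacency):
    """Pseudoinverse of L = D - A from one linear solve (connected graph)."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    J_over_n = np.full((n, n), 1.0 / n)
    # L + J/n is invertible when the graph is connected,
    # and its inverse equals L^+ + J/n.
    return np.linalg.solve(L + J_over_n, np.eye(n)) - J_over_n


# Hypothetical connected 5-node graph.
n = 5
A = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]:
    A[u, v] = A[v, u] = 1.0

L = np.diag(A.sum(axis=1)) - A
L_pinv = laplacian_pinv_via_solve(A)

print(np.allclose(L_pinv, np.linalg.pinv(L)))
print(np.allclose(L @ L_pinv, np.eye(n) - np.full((n, n), 1.0 / n)))
```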

FGSD Graph Spectrum to Graph Feature Vector
Finally, we take the histogram of the FGSD graph spectrum to construct the graph feature vector, since the histogram can inherit all of these properties.
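A histogram feature of this kind might be built as follows. The sketch uses the harmonic distance and fixed-width bins shared by all graphs, so feature vectors from different graphs are comparable; the function name and the parameters `bin_width` and `max_distance` are assumptions for illustration, not values from the slides. Distances that fall exactly on a bin edge can shift bins under floating-point noise, so in practice the bin width deserves some care.

```python
import numpy as np


def fgsd_histogram_feature(adjacency, bin_width=0.5, max_distance=10.0):
    """Histogram of all pairwise harmonic distances as a graph feature vector."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    K = np.linalg.pinv(L)
    d = np.diag(K)
    S = d[:, None] + d[None, :] - 2.0 * K
    n_bins = int(round(max_distance / bin_width))
    # The same bin edges are used for every graph.
    hist, _ = np.histogram(S.ravel(), bins=n_bins, range=(0.0, max_distance))
    return hist


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

feature = fgsd_histogram_feature(A)
print(feature)
```

Most bins stay empty for small graphs, which matches the sparsity the slides emphasize.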

How good is our complexity in Graph Classification World?
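Methods compared: SP, GK ($k>3$), MLG ($\tilde{n}<n$), SGS, DCNN, GS ($k>1$), FGSD.
Approximate: -, $\mathcal{O}(n d^{k-1})$, $\mathcal{O}(\tilde{n}^3)$, $\mathcal{O}(|E|)$.
Worst-case: $\mathcal{O}(n^3)$, $\mathcal{O}(n^k)$, $\mathcal{O}(n^3)$, $\mathcal{O}(n^2)$, $\mathcal{O}(n^{2+k})$.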

Dominance of FGSD Graph Features (Datasets)
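Dataset | Type | No. of Graphs | Avg. No. of Nodes | Max. No. of Nodes | Class Labels
MUTAG | Bioinformatic | 188 | 17.9 | 28 | 2
PTC | Bioinformatic | 344 | 25.5 | 109 |
PROTEINS | Bioinformatic | 1113 | 39.1 | 620 |
NCI1 | Bioinformatic | 4110 | 29.8 | 111 |
NCI109 | Bioinformatic | 4127 | 29.6 | |
D&D | Bioinformatic | 1178 | 284 | 5784 |
MOA | Bioinformatic | 68 | 18 | 27 |
COLLAB | Social Network | 5000 | 74.49 | 492 | 3
IMDB-BINARY | Social Network | 1000 | 19.77 | 136 |
IMDB-MULTI | Social Network | 1500 | 13 | 89 |
REDDIT-BINARY | Social Network | 2000 | 429.61 | 4117 |
REDDIT-Multi | Social Network | | 508.5 | 3690 | 5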

State-of-the-Art Comparison
State-of-the-Art Algorithm | Type | Venue
Random Walk (RW) | Graph Kernel | COLT (2003)
Shortest Path (SP) | Graph Kernel | ICDM (2005)
Skew Spectrum (SGS) | Graph Spectrum | ICML (2008)
Graphlet Spectrum (GS) | Graph Spectrum | ICML (2009)
Graphlet Kernel (GK) | Graph Kernel | AISTATS (2009)
Weisfeiler-Lehman (WL) | Graph Kernel | JMLR (2011)
Deep Graph Kernels (DGK) | Graph Kernel | SIGKDD (2015)
Multiscale Laplacian Graph Kernel (MLG) | Graph Kernel | NIPS (2016)
Diffusion Convolutional Network (DCNNs) | Convolutional Neural Networks |
PATCHY Convolutional Network (PSCN) | Convolutional Neural Networks | ICML (2016)

Graph Classification Results

Conclusion and Future Work
We proposed a conceptually simple yet powerful and theoretically motivated graph representation. Our graph representation, based on the discovery of a family of graph spectral distances, exhibits certain uniqueness, stability, and sparsity properties and is computationally fast. Our hunt leads specifically to the harmonic distance as an ideal member of this family for extracting graph features. In future work, we plan to generalize FGSD to labeled datasets in order to use the node and edge label information in the graph representation.

Code Available: https://github.com/vermaMachineLearning/FGSD
Supplementary Material:

Thank You! Questions?
