Download presentation
Presentation is loading. Please wait.
Published byEdward Garard Modified over 10 years ago
1
Dept. of Computer Science Rutgers Node and Graph Similarity: Theory and Applications Danai Koutra (CMU) Tina Eliassi-Rad (Rutgers) Christos Faloutsos (CMU) ICDM 2014, Monday December 15 th 2014, Shenzhen, China Copyright for the tutorial materials is held by the authors. The authors grant IEEE ICDM permission to distribute the materials through its website.
2
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Part 2a Graph Similarity: known node correspondence 2
3
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Summary Two main approaches: –Avoiding the correspondence problem Faster Might give good estimates –By finding the node correspondence Potentially more accurate Various methods for unipartite graphs One approach explicitly for bipartite graphs [Koutra+, ICDM’13] + extension to unipartite graphs 3
4
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Roadmap Known node correspondence Unknown node correspondence –Unipartite graphs –Bipartite graphs –Summary 4
5
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Graph Similarity with unknown node correspondence INPUT: 2 anonymized networks –GIVEN : node IDs –NOT GIVEN: side-info class labels OUTPUT: structural similarity score or node mapping 5 1 1 2 2
6
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Can we identify users across social networks? 6 8/29/12 Same or “similar” users?
7
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos More Data Mining Applications 7 Smarter Commerce information network social network x-fer learning internal external networks better anomaly detection better INPUT for graph similarity + graph kernel healthcare team formation
8
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos More Applications? link prediction & viral marketing 8 p rotein-protein alignment c hemical compound comparison IR: synonym extraction Optical character recognition Structure matching in DB wiki translation
9
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Roadmap Known node correspondence Unknown node correspondence –Unipartite graphs Avoiding the correspondence problem Solving the correspondence problem –Bipartite graphs –Summary 9
10
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Main Idea 1.Extract graph features 2.sim(G A, G B ) = “similarity” between the features or apply some classifier 10 deg, cc e-vals e-vectors eccentricity
11
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Spectral methods: λ-distance d(G A, G B ) = Σ(λ Ai - λ Bi ) 2 λ i = eigenvalues of: Adjacency A Laplacian L = D – A Normalized Laplacian L norm = D -1/2 A D -1/2 11 [Bunke ’06, Wilson ’08, ElGhawalby ’08, Haemers+ ’04, Brouwer ’09 ] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 d 1 -.5 d 2 -.5 d 3 -.5 d 1 -.5 d 2 -.5 d 3 -.5 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 d 1 -.5 d 2 -.5 d 3 -.5 d 1 -.5 d 2 -.5 d 3 -.5 d1 d2 d3 d1 d2 d3 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 - O(n 3 ), SVD
12
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos λ-distance: Disadvantages 1)co-spectral graphs with different structure 2)subtle changes in the graphs => big differences in spectra 3)O(n 3 ) runtime SVD 12 [Haemers+ ’04, Brouwer ’09]
13
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Comparison of methods revisited MetricP1P2P3P4 Vertex/Edge Overlap ✗✗✗ ? Graph Edit Distance (XOR) ✗✗✗ ? Signature Similarity ✗✔✗ ? λ-distance (adjacency matrix) ✗✔✗ ? λ-distance (graph laplacian) ✗✔✗ ? λ-distance (normalized lapl.) ✗✔✗ ? D ELTA C ON 0 ✔✔✔✔ D ELTA C ON ✔✔✔✔ 13 edge weight returns focus [Koutra, Faloutsos, Vogelstein. SDM’12]
14
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos SN: Socially Relevant Features 14 [Macindoe ’10] GAGA GBGB Leadership Bonding Diversity sim(G A, G B ) = 1-d(G A, G B ) d: earth mover’s distance between distributions # disjoint communities clust. coefficient dominated by one node?
15
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos NetSimile 15 FG =FG =... 1 2 7... 3 12...n12...n nodesnodes feature s ① # of neighbors ② clustering coefficient ③ edges in egonet ④ # of neighbors of egonet … egonet GAGA median kurtosis s.d. skewness mean ‘signature’ vector P =P = P P P GAGA Canberra distance [Berlingerio, Koutra, Eliassi-Rad, Faloutsos ‘13]
16
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Application: Discontinuity Detection in Yahoo! IM 16 nodes: IM users edges: communication events nodes: IM users edges: communication events 1.Microsoft offers to buy Yahoo!. 2. New features for flickr were announced. [Berlingerio, Koutra, Eliassi-Rad, Faloutsos ‘13]
17
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Roadmap Known node correspondence Unknown node correspondence –Unipartite graphs Avoiding the correspondence problem Solving the correspondence problem –Bipartite graphs –Summary 17
18
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Eigen-Decomposition Approach A B 18 [Umeyama ‘88] G A G B Goal: min || P A P T - B || F 2 P permutation matrix
19
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Eigen-Decomposition Approach A B A = U A Λ A U A T B = U B Λ B U B T 19 G A G B eigendecomp. |UA| |UBT||UA| |UBT| P = Hungarian method on Near-optimal for noiseless graphs O(n 3 ) runtime Only for graphs of same size [Umeyama ‘88]
20
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos NMF-based Approach 20 G A G B Goal: min || P A P T - B || F 2 PP T = P T P = I P>=0 non-0 elements per row/column A B [Ding+ ‘08]
21
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos NMF-based Algorithm Step 1: P 0 = |U A | |U B T | [Umeyama] 21 [Ding+ ‘08]
22
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos NMF-based Algorithm Step 1: P 0 = |U A | |U B T | [Umeyama] Step 2: Non-Negative Matrix Factorization to find P ∞ 22 O(n 3 ) runtime + Guaranteed convergence [Ding+ ‘08]
23
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos NetAlign 23 Possible matchings! [Bayati+ ’11] A L B W kk’
24
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos NetAlign 24 Goal: find matching M C L s.t. it maximizes a linear combination of the weights w & the number of overlapped edges. A L B W kk’ [Bayati+ ’11] NP-hard
25
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Two Solutions 25 A L B W kk’ ApproachSparse LDense LSpeed Belief Propagation [Bayati+’11] ✔ 30% faster Lagrangian [Klau+’09] ✔
26
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos More Approaches Cavity approach [Bradde+ ‘10] Graduated Assignment [Gold ’96] Pattern Recognition [Conte+ ‘04] Biological Networks [Berg, Lassig ‘04, IsoRank [Singh ’07] Linear Programming [Almohamad & Duffuaa ’93] EM [Luo ‘02] Similarity Flooding [Melnik+ ’02] Path-following [Zaslavskiy ’09] Graph Kernels [Smalter+ ‘08] Spectral methods [Qiu, Hancock ‘06] …. 26
27
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Roadmap Known node correspondence Unknown node correspondence –Unipartite graphs Avoiding the correspondence problem Solving the correspondence problem –Bipartite graphs –Summary 27
28
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Bipartite Graph Alignment INPUT: A, B 28 usersusers groups 1 1 0 0 0 0 1 0 1 0 1 1 0 1 usersusers group s 1 1 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 1 0 A B [Koutra, Tong, Lubensky ‘13]
29
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Bipartite Graph Alignment INPUT: A, B OUTPUT: P and … (permutation matrices) 29 P (users) A B usersusers groups 1 1 0 0 0 0 1 0 1 0 1 1 0 1 usersusers groups 1 1 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 1 0 A B 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 [Koutra, Tong, Lubensky ‘13]
30
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos P (users) usersusers groups 1 1 0 0 0 0 1 0 1 0 1 1 0 1 usersusers groups 1 1 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 1 0 A B Bipartite Graph Alignment INPUT: A, B OUTPUT: P and Q (permutation matrices) s.t. min || PAQ - B|| F 2 30 A B Q (groups) A B 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 permutation of users/groups in A Graph isomorphism Hard (P or NP-complete?) Subgraph isomorphism NP-complete Q: Now what? A: Constraints/Relaxations
31
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Main Idea INPUT: A, B OUTPUT: P, Q correspondence matrices s.t. min || PAQ - B|| F 2 31 u g 1 1 0 0 … … 1 1 0 1 g 1 1 0 0 0 … … … 0 1 0 1 0 u A B P (users) A B Q (groups) A B 0.9.2 0.7.3 0.1 0.3.6.4.2 0.5.8 0.9.2 0.7.3 0.1 0.3.6.4.2 0.5.8 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 [Koutra, Tong, Lubensky ‘13]
32
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Main Idea INPUT: A, B OUTPUT: P, Q correspondence matrices s.t. min || PAQ - B|| F 2 CONSTRAINTS: (a)P ij, Q ij = probabilities (not 1-1 mapping) (b)sparse matrices P and Q (more efficient for large scale graphs) 32 u g 1 1 0 0 … … 1 1 0 1 g 1 1 0 0 0 … … … 0 1 0 1 0 u A B P (users) A B Q (groups) A B [Koutra, Tong, Lubensky ‘13]
33
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Main Idea INPUT: A, B OUTPUT: P, Q correspondence matrices. 33 u g 1 1 0 0 … … 1 1 0 1 g 1 1 0 0 0 … … … 0 1 0 1 0 u A B P (users) A B Q (groups) A B min || PAQ - B|| F 2 +λ|| P|| 1 + μ|| Q|| 1 P,Q min || PAQ - B|| F 2 +λ|| P|| 1 + μ|| Q|| 1 P,Q P P A A Q Q B B sparsity [Koutra, Tong, Lubensky ‘13]
34
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Big-Align: Accuracy vs. Runtime 34 Umeyama NetAlign NMF-based BiG-Align skip BiG-Align exact Big-Align improves both speed and accuracy. [Koutra, Tong, Lubensky ‘13]
35
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Extension: Uni-Align 35 d features node degree clustering coeff … … min || PAQ - B|| F 2 fixed P D ETAILS n nodes [Koutra, Tong, Lubensky ‘13]
36
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Extension: Uni-Align 36 n nodes d features min || PAQ - B|| F 2 P P = g*(A,B,S,U)= = closed-form solution SVD A = USV T O(n. d 2 ) D ETAILS [Koutra, Tong, Lubensky ‘13]
37
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Uni-Align: Accuracy vs. Runtime 37 NMF-based NetAlign Umeyama Uni-Align Uni-Align is faster and more accurate than other approaches. subgraphs (64K x 64K users) [Koutra, Tong, Lubensky ‘13]
38
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Beyond BiG-Align: Multi-way Linkage ~ 38 All build upon BiG-Align –~–~ S1: Dynamic Graph Linkage –~–~ S2: Community-level Linkage S3: Hetero. Graph Linkage S4: Multi-relational DB Linkage
39
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Roadmap Known node correspondence Unknown node correspondence –Unipartite graphs –Bipartite graphs –Summary 39
40
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Summary Two main approaches: –Avoiding the correspondence problem Faster Might give good estimates –By finding the node correspondence Potentially more accurate Various methods for unipartite graphs One approach explicitly for bipartite graphs [Koutra+, ICDM’13] + extension to unipartite graphs 40
41
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos References M. Berlingerio, D. Koutra, T. Eliassi-Rad, C. Faloutsos. Network similarity via multiple social theories. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2013.Network similarity via multiple social theories Geng Li, Murat Semerci, Bülent Yener, and Mohammed J. Zaki. 2012. Effective graph classification based on topological and label attributes. Stat. Anal. Data Min. 5, 4 (August 2012), 265-283. J. T. Vogelstein and C. E. Priebe. Shuffled graph classification: Theory and connectome applications. 2011 Macindoe, O. and Richards, W. 2010. Graph comparison using fine structure analysis. InSocialCom/PASSAT. 193–200. A. E. Brouwer and E Spence, Cospectral graphs on 12 vertices. Elec. Journ. Combin., Vol. 16 (20) (2009). R. C. Wilson and P. Zhu, A study of graph spectra for comparing graphs and trees. Journal of Pattern Recognition, vol. 41, no. 9, pp. 2833–2841, 2008. 41
42
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos References Elghawalby, H., & Hancock, E. R. (2008). Measuring Graph Similarity Using Spectral Geometry, 517-526. H. Bunke, P. J. Dickinson, M. Kraetzl, and W. D. Wallis, A Graph-Theoretic Approach to Enterprise Network Dynamics (PCS). Birkhauser, 2006. W. H. Haemers, E. Spence, Enumeration of cospectral graphs. European J. Combin, 25 (2004), 199-211. A. Kelmans. Comparisons of graphs by their number of spanning trees. Discrete Math., 16 (1976), pp. 241–261 M. Fiedler, Algebraic connectivity of graphs. Czechoslovak. Mathematical Journal, vol. 23, no. 98, pp. 298–305, 1973. 42
43
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos References Graph Alignment D. Koutra, H. Tong, D. Lubensky. BIG-ALIGN: Fast Bipartite GraphBIG-ALIGN: Fast Bipartite Graph AlignmentAlignment. In IEEE 13th International Conference on Data Mining (ICDM), pp. 389-398, 2013. J. T. Vogelstein, J. M. Conroy, L. J. Podrazik, S. G. Kratzer, D. E. Fishkind, R. J. Vogelstein, and C. E. Priebe. Fast inexact graph matching with applications in statistical connectomics. CoRR, abs/1112.5507, 2011. S. Bradde, A. Braunstein, H. Mahmoudi, F. Tria, M. Weigt, and R. Zecchina. Aligning graphs and finding substructures by a cavity approach. Europhysics Letters, 89, 2010. M. Zaslavskiy, F. Bach, and J.-P. Vert. A path following algorithm for the graph matching problem. IEEE TPAMI, 31(12):2227–2242, Dec. 2009. G. W. Klau. A new graph-based method for pairwise global network alignment. BMC, 10(S-1), 2009. A. Narayanan and V. Shmatikov. De-anonymizing social networks. In SSP, pages 173 –187, may 2009. 43
44
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos References M. Bayati, M. Gerritsen, D. Gleich, A. Saberi, and Y. Wang. Algorithms for large, sparse network alignment problems. In ICDM, pages 705– 710, 2009. L. Zager and G. Verghese. Graph similarity scoring and matching. Applied Mathematics Letters, 21(1):86–94, 2008. C. H. Q. Ding, T. Li, and M. I. Jordan. Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding. In ICDM, pages 183–192, 2008. A. Smalter, J. Huan, and G. Lushington. GPM: A Graph Pattern Matching Kernel with Diffusion for Chemical Compound Classification. In ICBBE, 2008. Rohit Singh, Jinbo Xu, and Bonnie Berger. 2007. Pairwise global alignment of protein interaction networks by matching neighborhood topology. In RECOMB ’07. H. Qiu and E. R. Hancock. Graph matching and clustering using spectral partitions. IEEE TPAMI, 39(1):22–34, 2006. D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. IJPRAI, 18(3):265–298, 2004. 44
45
ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos References J. Berg and M. Lassig. Local graph alignment and motif search in biological networks. PNAS, 101(41):14689–14694, Oct. 2004. S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In ICDE, 2002. B. Luo and E. R. Hancock. Iterative procrustes alignment with the EM algorithm. Image Vision Comput., 20(5-6):377–396, 2002. S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE TPAMI, 18(4):377–388, 1996. S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE TPAMI, 10(5):695–703, 1988. 45
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.