1
P-Rank: A Comprehensive Structural Similarity Measure over Information Networks
CIKM’09, November 3rd, 2009, Hong Kong
Peixiang Zhao, Jiawei Han, Yizhou Sun
University of Illinois at Urbana-Champaign
Presented by Prof. Hong Cheng, CUHK
2
Outline
Introduction & Motivation
P-Rank
– Formula
– Derivatives
– Computation
Experimental Studies
Future direction & Conclusion
3
Introduction
Information Networks (INs)
– Physical, conceptual, and human/societal entities
– Interconnected relationships among different entities
INs are ubiquitous and form a critical component of modern information infrastructure
– The Web
– Highway or urban transportation networks
– Research collaboration and publication networks
– Biological networks
– Social networks
4
Problem
Similarity computation on entities of INs
– How similar is webpage A to webpage B in the Web?
– How similar is researcher A to researcher B in the DBLP co-authorship network?
First of all, how to define “similarity” within a massive IN?
– Textual proximity of entity labels/contents
– Structural proximity conveyed through links!
A good structural similarity measure in INs: SimRank (KDD’02)
5
Why SimRank is not Enough?
Philosophy
– Two entities are similar if they are referenced by similar entities
Potential problems
– Semantically incomplete
  Only partial structural information, from the in-link direction, is considered during similarity computation
  Biased similarity results
  May fail in different IN settings!
– Inefficient in computation
  Worst-case O(n^4), can be improved to O(n^3), where n is the number of vertices in the information network
6
Why SimRank is not Enough?
[Figure: (a) A Heterogeneous IN and Structural Similarity Scores; (b) A Homogeneous IN and Structural Similarity Scores]
7
P(enetrating)-Rank
Philosophy: two entities are similar if
1. they are referenced by similar entities
2. they reference similar entities
Advantages
– Semantically complete
  Structural information from both the in-link and out-link directions is considered during similarity computation
  Robust in different IN settings
– A unified structural similarity framework
  SimRank is just a special case
8
P-Rank Formula
The structural similarity between vertex a and vertex b (a ≠ b), s(a, b):
– Recursive form
– Approximate iterative form
Each form combines an in-link similarity term and an out-link similarity term
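The recursive and iterative forms can be written out as below. This is a sketch under assumed notation rather than a transcription of the slide: I(a) and O(a) are taken to be the in-neighbor and out-neighbor sets of vertex a, C in (0, 1) is the damping factor, and λ in [0, 1] weights the in-link term against the out-link term. A term whose neighbor set is empty is assumed to contribute 0.

```latex
\begin{align*}
s(a,b) &= \lambda \cdot \frac{C}{|I(a)|\,|I(b)|}
          \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr)
        + (1-\lambda) \cdot \frac{C}{|O(a)|\,|O(b)|}
          \sum_{i=1}^{|O(a)|} \sum_{j=1}^{|O(b)|} s\bigl(O_i(a), O_j(b)\bigr),
        \qquad s(a,a) = 1 \\[6pt]
s_0(a,b) &= \begin{cases} 1 & a = b \\ 0 & a \neq b \end{cases} \\[6pt]
s_{k+1}(a,b) &= \lambda \cdot \frac{C}{|I(a)|\,|I(b)|}
          \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s_k\bigl(I_i(a), I_j(b)\bigr)
        + (1-\lambda) \cdot \frac{C}{|O(a)|\,|O(b)|}
          \sum_{i=1}^{|O(a)|} \sum_{j=1}^{|O(b)|} s_k\bigl(O_i(a), O_j(b)\bigr),
        \qquad s_{k+1}(a,a) = 1
\end{align*}
```

The first summand is the in-link similarity and the second is the out-link similarity.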
9
P-Rank Property
The iterative P-Rank has the following properties:
– Symmetry: s_k(a, b) = s_k(b, a)
– Monotonicity: 0 ≤ s_k(a, b) ≤ s_{k+1}(a, b) ≤ 1
– Existence: the solution to the iterative P-Rank formula always exists and converges to a fixed point, s(*, *), which is the theoretical solution to the recursive P-Rank formula
– Uniqueness: the solution to the iterative P-Rank formula is unique when C ≠ 1
The theoretical solution to P-Rank can be reached by repeated computation via the iterative form
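Symmetry, monotonicity, and the [0, 1] bound can be checked numerically on a toy graph. The sketch below is a naive iterative P-Rank, assuming the usual definition: the average pairwise similarity of in-neighbors weighted by λ, plus that of out-neighbors weighted by 1 − λ, both scaled by C, with an empty neighbor set contributing 0. The function name `p_rank`, its parameters, and the example graph are illustrative and do not come from the slides.

```python
from itertools import product


def p_rank(edges, C=0.8, lam=0.5, iterations=10):
    """Naive iterative P-Rank over a directed edge list (no duplicate edges).

    Returns a list of score dicts, one per iteration (index 0 is s_0).
    C and lam are assumed names for the damping factor and the in/out weight.
    """
    nodes = sorted({v for edge in edges for v in edge})
    in_nbrs = {v: [] for v in nodes}
    out_nbrs = {v: [] for v in nodes}
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)

    def avg(scores, xs, ys):
        # Average pairwise similarity; defined as 0 if either set is empty.
        if not xs or not ys:
            return 0.0
        return sum(scores[x, y] for x in xs for y in ys) / (len(xs) * len(ys))

    # s_0: identity similarity.
    scores = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    history = [scores]
    for _ in range(iterations):
        nxt = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                nxt[a, b] = 1.0
            else:
                nxt[a, b] = C * (
                    lam * avg(scores, in_nbrs[a], in_nbrs[b])
                    + (1 - lam) * avg(scores, out_nbrs[a], out_nbrs[b])
                )
        scores = nxt
        history.append(scores)
    return history


# Toy graph: a and b share their only in-neighbor (u) and only out-neighbor (c).
edges = [("u", "a"), ("u", "b"), ("a", "c"), ("b", "c"), ("c", "u")]
history = p_rank(edges, C=0.8, lam=0.5, iterations=10)
final = history[-1]
```

On this graph, a and b have identical in- and out-neighborhoods, so their score equals C from the first iteration onward. Each iteration visits every vertex pair and every pair of their neighbors, which is where the worst-case O(n^4) cost comes from.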
10
P-Rank Derivatives
P-Rank provides a unified structural similarity framework, of which many existing structural similarity measures are special cases
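One such special case can be worked out directly. Assuming the recursive form weights an in-link term by λ and an out-link term by 1 − λ, with I(a) the in-neighbors of a and C the damping factor (notation assumed here, not taken from the slide), setting λ = 1 removes the out-link term and leaves the SimRank recurrence:

```latex
\begin{align*}
\lambda = 1:\qquad
s(a,b) &= \frac{C}{|I(a)|\,|I(b)|}
          \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr)
\end{align*}
```

By the same argument, λ = 0 keeps only the out-link term, giving a measure that depends solely on what the two vertices reference.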
11
P-Rank Computation
An iterative algorithm is executed until it reaches the fixed point
– Space complexity: O(n^2)
– Time complexity: O(n^4), can be improved to O(n^3) by amortization
Approximation algorithms on different IN scenarios
– Homogeneous IN
  Radius-based pruning: vertex pairs beyond a radius of r are no longer considered in similarity computation
– Heterogeneous IN
  Category-based pruning: vertex pairs in different categories are no longer considered in similarity computation
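Radius-based pruning can be sketched as a candidate-pair generator that runs before the iteration. The slide does not define "radius", so the code below assumes it means hop distance with edge direction ignored; `pairs_within_radius` is a hypothetical helper name, and only the pairs it returns would be scored.

```python
from collections import deque


def pairs_within_radius(edges, r):
    """Return unordered vertex pairs whose undirected hop distance is at most r.

    Assumption: "radius" is measured in hops with edge direction ignored.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    pairs = set()
    for src in adj:
        # Breadth-first search from src, truncated at depth r.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue  # do not expand beyond the radius
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        for v in dist:
            if v != src:
                pairs.add(frozenset((src, v)))
    return pairs


# Path graph a -> b -> c -> d: with r = 2, the pair (a, d) is pruned.
candidates = pairs_within_radius([("a", "b"), ("b", "c"), ("c", "d")], 2)
```

Category-based pruning is simpler still: keep a pair only if both vertices carry the same category label.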
12
Experimental Studies
Data sets
– Heterogeneous IN: DBLP (paper, author, conference, year)
– Homogeneous IN: DBLP (paper with citation), synthetic data (R-MAT)
Methods
– P-Rank
– SimRank
Metrics
– Compactness of clusters
– Algorithmic nature
– Ground truth
13
Compactness of Clusters
P-Rank and SimRank are each used as the underlying similarity measure, and K-Medoids is used to cluster the vertices
– Compactness: intra-cluster distance / inter-cluster distance
[Figure: compactness results; panels: Heterogeneous IN, Homogeneous IN]
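The compactness ratio can be sketched as follows. The slide gives only "intra-cluster distance / inter-cluster distance", so two choices here are assumptions: distance is taken as 1 − similarity, and each side of the ratio is the mean over vertex pairs. The function name `compactness` and the toy scores are illustrative.

```python
def compactness(dist, labels):
    """Mean intra-cluster distance divided by mean inter-cluster distance.

    dist:   function (a, b) -> distance between two vertices
    labels: dict mapping each vertex to its cluster id
    Assumption: both sides of the ratio are averaged over vertex pairs.
    """
    intra, inter = [], []
    items = list(labels)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            bucket = intra if labels[a] == labels[b] else inter
            bucket.append(dist(a, b))
    if not intra or not inter:
        raise ValueError("need at least one intra-cluster and one inter-cluster pair")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Toy similarity scores (made up for illustration); unlisted pairs default to 0.1.
sim = {frozenset(("a", "b")): 0.8, frozenset(("c", "d")): 0.6}


def toy_dist(x, y):
    # Assumed conversion from similarity to distance.
    return 1.0 - sim.get(frozenset((x, y)), 0.1)


ratio = compactness(toy_dist, {"a": 0, "b": 0, "c": 1, "d": 1})
```

A smaller ratio means tighter, better-separated clusters under these assumptions.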
14
Algorithmic Nature
Iterative P-Rank converges fast to the fixed point
[Figure: panels: P-Rank vs. the damping factor C; P-Rank vs. lambda]
15
Ground Truth Ranking Result
Top-10 ranking results for author vertices in DBLP by P-Rank
16
Conclusion
The proliferation of information networks calls for effective structural similarity measures in
– Ranking
– Clustering
– Top-k query processing
– ……
Compared with SimRank, P-Rank is shown to be a more effective structural similarity measure in large information networks
– Semantically complete, general, robust, and flexible enough to be employed in different IN settings
17
Thank you