P-Rank: A Comprehensive Structural Similarity Measure over Information Networks CIKM’09, November 3rd, 2009, Hong Kong Peixiang Zhao, Jiawei Han, Yizhou.


1 P-Rank: A Comprehensive Structural Similarity Measure over Information Networks
CIKM’09, November 3rd, 2009, Hong Kong
Peixiang Zhao, Jiawei Han, Yizhou Sun
University of Illinois at Urbana-Champaign
Presented by Prof. Hong Cheng, CUHK

2 Outline
Introduction & Motivation
P-Rank
– Formula
– Derivatives
– Computation
Experimental Studies
Future direction & Conclusion
CIKM’09, Hong Kong, Nov. 3rd, 2009

3 Introduction
Information Networks (INs)
– Physical, conceptual, and human/societal entities
– Interconnected relationships among different entities
INs are ubiquitous and form a critical component of modern information infrastructure
– The Web
– Highway or urban transportation networks
– Research collaboration and publication networks
– Biological networks
– Social networks

4 Problem
Similarity computation on entities of INs
– How similar is webpage A to webpage B on the Web?
– How similar is researcher A to researcher B in the DBLP co-authorship network?
First of all, how should “similarity” be defined within a massive IN?
– Textual proximity of entity labels/contents
– Structural proximity conveyed through links!
A good structural similarity measure in INs: SimRank (KDD’02)

5 Why SimRank is not Enough?
Philosophy
– Two entities are similar if they are referenced by similar entities
Potential problems
– Semantically incomplete
    Only partial structural information, from the in-link direction, is considered during similarity computation
    Similarity results are biased
    May fail in different IN settings!
– Inefficient to compute
    Worst case O(n^4), which can be improved to O(n^3), where n is the number of vertices in the information network

6 Why SimRank is not Enough?
[Figure: (a) a heterogeneous IN and its structural similarity scores; (b) a homogeneous IN and its structural similarity scores]

7 P(enetrating)-Rank
Philosophy: two entities are similar if
1. They are referenced by similar entities
2. They reference similar entities
Advantages
– Semantically complete
    Structural information from both the in-link and out-link directions is considered during similarity computation
    Robust in different IN settings
– A unified structural similarity framework
    SimRank is just a special case

8 P-Rank Formula
The structural similarity between vertex a and vertex b (a ≠ b), s(a, b), is given in two forms:
– Recursive form
– Approximate iterative form
Each form combines an in-link similarity term and an out-link similarity term
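The equations on this slide were images and do not appear in the transcript. The block below is a reconstruction, not the slide's own rendering. It assumes the usual P-Rank notation: I(a) and O(a) are the in-neighbor and out-neighbor sets of a, C is the damping factor, and λ in [0, 1] weights the in-link term against the out-link term. Treating a term as 0 when either neighbor set is empty is a convention assumed here.

```latex
% Recursive form, for a \neq b (with s(a,a) = 1):
s(a,b) =
  \underbrace{\frac{\lambda\, C}{|I(a)|\,|I(b)|}
    \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr)}_{\text{in-link similarity}}
  +
  \underbrace{\frac{(1-\lambda)\, C}{|O(a)|\,|O(b)|}
    \sum_{i=1}^{|O(a)|} \sum_{j=1}^{|O(b)|} s\bigl(O_i(a), O_j(b)\bigr)}_{\text{out-link similarity}}

% Approximate iterative form:
s_0(a,b) = \begin{cases} 1 & a = b \\ 0 & a \neq b \end{cases}
\qquad
s_{k+1}(a,b) =
  \frac{\lambda\, C}{|I(a)|\,|I(b)|}
    \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s_k\bigl(I_i(a), I_j(b)\bigr)
  +
  \frac{(1-\lambda)\, C}{|O(a)|\,|O(b)|}
    \sum_{i=1}^{|O(a)|} \sum_{j=1}^{|O(b)|} s_k\bigl(O_i(a), O_j(b)\bigr),
\quad a \neq b,
\qquad s_{k+1}(a,a) = 1
```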

9 P-Rank Property
The iterative P-Rank has the following properties:
– Symmetry: s_k(a, b) = s_k(b, a)
– Monotonicity: 0 ≤ s_k(a, b) ≤ s_{k+1}(a, b) ≤ 1
– Existence: the solution to the iterative P-Rank formula always exists and converges to a fixed point, s(∗, ∗), which is the theoretical solution to the recursive P-Rank formula
– Uniqueness: the solution to the iterative P-Rank formula is unique when C ≠ 1
The theoretical solution to P-Rank can be reached by repeated computation via the iterative form
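The properties above can be checked numerically on a toy graph. The following is a minimal sketch of the iterative form, not the authors' implementation. It assumes the update s_{k+1}(a, b) = C · (λ · mean in-neighbor similarity + (1 − λ) · mean out-neighbor similarity), with a term taken as 0 when either neighbor set is empty. The function name `p_rank` and its parameters are hypothetical.

```python
from itertools import product


def p_rank(edges, C=0.8, lam=0.5, iterations=10):
    """Iterative P-Rank sketch on a directed graph given as (source, target) pairs.

    Returns a list of score tables: history[k][(a, b)] holds s_k(a, b).
    """
    edges = set(edges)  # ignore duplicate edges
    nodes = sorted({v for edge in edges for v in edge})
    in_nbrs = {v: [] for v in nodes}
    out_nbrs = {v: [] for v in nodes}
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)

    def mean_pair_score(xs, ys, prev):
        # Mean previous-iteration similarity over all neighbor pairs;
        # 0 when either neighbor set is empty (assumed convention).
        if not xs or not ys:
            return 0.0
        return sum(prev[(x, y)] for x in xs for y in ys) / (len(xs) * len(ys))

    # s_0(a, b) = 1 if a == b, else 0.
    history = [{(a, b): float(a == b) for a, b in product(nodes, repeat=2)}]
    for _ in range(iterations):
        prev = history[-1]
        nxt = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                nxt[(a, b)] = 1.0  # a vertex is maximally similar to itself
                continue
            in_part = mean_pair_score(in_nbrs[a], in_nbrs[b], prev)
            out_part = mean_pair_score(out_nbrs[a], out_nbrs[b], prev)
            nxt[(a, b)] = C * (lam * in_part + (1 - lam) * out_part)
        history.append(nxt)
    return history
```

Calling `p_rank([("u", "a"), ("u", "b"), ("a", "x"), ("b", "x"), ("x", "u")])` returns one table per iteration, so symmetry and monotonicity can be inspected by comparing `history[k]` with `history[k + 1]`. Storing every iteration is for illustration only; a practical version would keep just the last two tables.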

10 P-Rank Derivatives
P-Rank provides a unified structural similarity framework, of which many existing structural similarity measures are special cases
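The table of derivatives from this slide is not in the transcript. As one worked case, assume the usual P-Rank statement in which λ weights the in-link term and (1 − λ) the out-link term, with I(a) the in-neighbor set of a and C the damping factor. Setting λ = 1 then removes the out-link term and leaves the SimRank recurrence:

```latex
% P-Rank with \lambda = 1: only the in-link term survives, which is SimRank.
s(a,b) = \frac{C}{|I(a)|\,|I(b)|}
  \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr),
  \qquad a \neq b
```

Setting λ = 0 instead leaves only the out-link term, which gives the out-link counterpart of the same recurrence.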

11 P-Rank Computation
An iterative algorithm is executed until it reaches the fixed point
– Space complexity: O(n^2)
– Time complexity: O(n^4), which can be improved to O(n^3) by amortization
Approximation algorithms for different IN scenarios
– Homogeneous IN
    Radius-based pruning: vertex pairs beyond a radius of r are no longer considered in similarity computation
– Heterogeneous IN
    Category-based pruning: vertex pairs in different categories are no longer considered in similarity computation
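Radius-based pruning can be illustrated by enumerating the vertex pairs that remain candidates. This is a sketch only: it assumes "radius" means hop distance with edge direction ignored, which the slide does not specify, and the function name `pairs_within_radius` is hypothetical.

```python
from collections import deque


def pairs_within_radius(edges, r):
    """Return the unordered vertex pairs at most r hops apart, ignoring direction."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    kept = set()
    for start in nbrs:
        # Breadth-first search from `start`, truncated at depth r.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if dist[cur] == r:
                continue  # do not expand beyond the radius
            for nxt in nbrs[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        kept.update(frozenset((start, other)) for other in dist if other != start)
    return kept
```

A similarity computation would then update scores only for pairs in the returned set and treat all other pairs as having similarity 0. Category-based pruning could be sketched the same way by keeping only pairs whose vertices share a category label.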

12 Experimental Studies
Data sets
– Heterogeneous IN: DBLP (paper, author, conference, year)
– Homogeneous IN: DBLP (papers with citations), synthetic R-MAT data
Methods
– P-Rank
– SimRank
Metrics
– Compactness of clusters
– Algorithmic nature
– Ground truth

13 Compactness of Clusters
P-Rank and SimRank are each used as the underlying similarity measure, and K-Medoids is used to cluster the vertices
– Compactness: intra-cluster distance / inter-cluster distance
[Figure panels: Heterogeneous IN; Homogeneous IN]
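The compactness ratio can be sketched as follows. The slide gives only the ratio itself, so two choices here are assumptions: each distance is averaged over vertex pairs, and the distance matrix is supplied by the caller (for example, 1 minus a similarity score). The function name `compactness` is hypothetical.

```python
def compactness(dist, labels):
    """Mean intra-cluster distance divided by mean inter-cluster distance.

    dist   -- square matrix (list of lists) of pairwise distances
    labels -- labels[i] is the cluster assigned to vertex i
    """
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            # Same cluster -> intra-cluster pair; otherwise inter-cluster pair.
            (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    if not intra or not inter:
        raise ValueError("need at least one intra-cluster and one inter-cluster pair")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))
```

Under this definition a smaller value indicates tighter, better-separated clusters.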

14 Algorithmic Nature
Iterative P-Rank converges quickly to the fixed point
[Figure panels: P-Rank vs. the damping factor C; P-Rank vs. λ]

15 Ground Truth Ranking Result
Top-10 ranking results for author vertices in DBLP by P-Rank

16 Conclusion
The proliferation of information networks calls for effective structural similarity measures in
– Ranking
– Clustering
– Top-k query processing
– ……
Compared with SimRank, P-Rank proves to be a more effective structural similarity measure in large information networks
– Semantically complete, general, robust, and flexible enough to be employed in different IN settings

17 Thank you

