Information Network Analysis and Discovery Cuiping Li Guoming He Information School, Renmin University of China.



Related Work
1. Whole-graph level
- Macro properties (laws, generators)
- Summary/Visualization
- Index
2. Sub-graph level
- Frequent pattern mining
- Clustering (community/group detection)
- Connected sub-graph, central piece
- Pattern match
3. Node or link level
- Ranking
- Proximity/Similarity
- Node classification
- Outlier detection (abnormal nodes/links)

Node Proximity/Similarity: Why? Link prediction [Liben-Nowell+], [Tong+] Ranking [Haveliwala], [Chakrabarti+] Management [Minkov+] Image caption [Pan+] Neighborhood formation [Sun+] Conn. subgraph [Faloutsos+], [Tong+], [Koren+] Pattern match [Tong+] Collaborative filtering [Fouss+] Many more…

Node Similarity: Related Work (1)
Computer Networks’99: Finding Related Pages in the World Wide Web, Jeffrey Dean, Monika R. Henzinger (adapted from HITS)
KDD’02: SimRank: A Measure of Structural-Context Similarity, Glen Jeh, Jennifer Widom (adapted from PageRank)
–Exploiting Hierarchical Domain Structure to Compute Similarity. P. Ganesan, H. Garcia-Molina, and J. Widom. Transactions on Information Systems, 21(1): 64-93, January 2003.
Vertex similarity in networks: Phys. Rev. E 73, (2006)
Optimization of SimRank
–WWW’05: Scaling link-based similarity search, D. Fogaras, B. Racz (approximate)
–VLDB’08: Accuracy Estimate and Optimization Techniques for SimRank Computation, Dmitry Lizorkin, Pavel Velikhov, Maxim Grinev, Denis Turdakov.

Node Similarity: Related Work (2)
Domain-specific integration of SimRank:
–VLDB’08: Simrank++: Query Rewriting through Link Analysis of the Click Graph, Ioannis Antonellis (Stanford University), Hector Garcia-Molina (Stanford University), Chi-Chao Chang (Yahoo!). (keywords, ads)
Clustering using SimRank:
–SIGIR’03: ReCoM: Reinforcement Clustering of Multi-type Interrelated Data Objects, J. Wang, H.J. Zeng, Z. Chen, H.J. Lu, L. Tao
–VLDB’06: LinkClus: Efficient Clustering via Heterogeneous Semantic Links, Xiaoxin Yin, Jiawei Han, Philip Yu

Existing Research: Limitation 1
Not dynamic
–Static algorithms are iterative
–Challenge of dynamic networks: everything must be re-computed even when a single node or edge changes
–Our solution: non-iterative, incremental computation
Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, Tianyi Wu, "Fast Computation of SimRank for Static and Dynamic Information Networks", Int. Conf. on Extending Database Technology (EDBT'10), Lausanne, Switzerland, March 2010

Existing Research: Limitation 2
Not efficient
–Our solution: exploit modern hardware resources
GPU (Graphics Processing Unit)
Multi-processor

Compute Node Similarity for Dynamic Networks
SimRank formula
Intuition
–Two objects are similar if they are referenced by similar objects.
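The slide's formula did not survive in the transcript. As a reminder, the standard SimRank definition from Jeh and Widom (KDD'02) sets s(a, a) = 1 and, for a ≠ b, s(a, b) = C / (|I(a)| |I(b)|) · Σ s(x, y) over all x in I(a) and y in I(b), where I(v) is the set of in-neighbors of v. Whether the slide showed exactly this form is an assumption. A minimal sketch of the naive iteration follows; the function name `simrank`, the decay value 0.8, and the toy graph are all hypothetical.

```python
from itertools import product

def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive iterative SimRank following the Jeh & Widom definition.

    in_neighbors maps each node to the list of nodes that point to it.
    """
    nodes = list(in_neighbors)
    # Start from s_0(a, b) = 1 if a == b, else 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim

# Hypothetical toy graph: "univ" references both professors.
graph = {"univ": [], "profA": ["univ"], "profB": ["univ"]}
scores = simrank(graph)
```

Every pass touches all node pairs, which is why a single edge change forces a full re-run in this formulation.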

How to Compute SimRank Incrementally
First glance at the SimRank formula
–It is iterative, so it seems to leave no room for incremental computation
Key observation
–The SimRank iteration formula has the same form as the well-known Sylvester equation; based on this, we can compute SimRank without iteration.

Vec-Operator and Kronecker Products
Vec-operator
–vec flattens an $n \times n$ matrix $A$ into an $n^2 \times 1$ vector
–It stacks the columns of the matrix on top of each other, from left to right
Kronecker product
–Product of two matrices $A$ and $B$
–Each element of $A$ is multiplied with the full matrix $B$: $A \otimes B = [a_{ij} B]$
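The two operators can be spelled out in a few lines of plain Python. This is only an illustrative sketch on nested lists; the helper names `vec` and `kron` and the sample matrices are assumptions made for the example.

```python
def vec(matrix):
    """Stack the columns of a matrix (given as a list of rows) into one flat list."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(cols) for i in range(rows)]

def kron(a, b):
    """Kronecker product: every entry a[i][j] scales a full copy of b."""
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
        for i in range(len(a) * p)
    ]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

print(vec(A))   # [1, 3, 2, 4]
for row in kron(A, B):
    print(row)
```

For two n x n inputs the Kronecker product is n² x n², which is the size of the linear system that appears once SimRank is written in vec form.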

Sylvester Equations
Sylvester equation: $X = SXT + X_0$
–Given three $n \times n$ matrices $S$, $T$, and $X_0$
–We want to determine $X$
–Solvable in $O(n^3)$

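Sylvester Equations
Rewrite the Sylvester equation as $\mathrm{vec}(X) = \mathrm{vec}(SXT) + \mathrm{vec}(X_0)$
Exploit the well-known fact $\mathrm{vec}(SXT) = (T^T \otimes S)\,\mathrm{vec}(X)$
We get $\mathrm{vec}(X) = (T^T \otimes S)\,\mathrm{vec}(X) + \mathrm{vec}(X_0)$
Hence $(I - T^T \otimes S)\,\mathrm{vec}(X) = \mathrm{vec}(X_0)$
Now we have to solve $\mathrm{vec}(X) = (I - T^T \otimes S)^{-1}\,\mathrm{vec}(X_0)$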
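The rewriting can be checked numerically. The sketch below assumes NumPy is available and uses small random matrices (the seed, the size 4, and the scaling factor 0.2 are arbitrary choices made so that I - Tᵀ ⊗ S is guaranteed invertible). It solves the vec-form linear system and confirms that the result satisfies X = SXT + X₀.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Entries below 0.2 keep every row sum below 0.8, so the spectral radius
# of T^T (x) S stays below 1 and the system matrix is invertible.
S = 0.2 * rng.random((n, n))
T = 0.2 * rng.random((n, n))
X0 = rng.random((n, n))

def vec(M):
    """Column-stacking vec operator."""
    return M.flatten(order="F")

K = np.kron(T.T, S)
x = np.linalg.solve(np.eye(n * n) - K, vec(X0))
X = x.reshape((n, n), order="F")      # undo vec

# vec(S X T) should equal (T^T (x) S) vec(X).
identity_ok = np.allclose(vec(S @ X @ T), K @ vec(X))
# X should satisfy the original equation X = S X T + X0.
residual = np.max(np.abs(X - (S @ X @ T + X0)))
print(identity_ok, residual)
```

Solving the n² x n² system directly like this is only practical for tiny n; it is meant to illustrate the algebra, not the O(n³) solver the slide refers to.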

SimRank
SimRank has the same form as the Sylvester equation: $X = cA^T X A + (1-c)e$ ($A$ is the normalized adjacency matrix, $e$ is an identity matrix)
Similarly, for SimRank, we have to solve $\mathrm{vec}(X) = (I - cA^T \otimes A^T)^{-1}\,\mathrm{vec}((1-c)e)$, that is, $\mathrm{vec}(X) = (1-c)(I - cA^T \otimes A^T)^{-1}\,\mathrm{vec}(e)$
–The system in $A^T \otimes A^T$ can be solved in $O(n^3)$
–More importantly, when $A$ is sparse or skewed, we can improve the efficiency further.
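A small sketch of the non-iterative computation, assuming NumPy and assuming that "normalized" means column-normalized (each column of A divided by the node's in-degree); the slide does not say which normalization is meant. The function name `simrank_closed_form`, the decay c = 0.8, and the five-edge toy graph are hypothetical. The result is compared against simply iterating the matrix recurrence.

```python
import numpy as np

def simrank_closed_form(adj, c=0.8):
    """Solve X = c A^T X A + (1 - c) I through the Kronecker linear system."""
    n = adj.shape[0]
    in_deg = adj.sum(axis=0)
    # Column-normalize (assumption); columns without in-links stay zero.
    A = np.divide(adj, in_deg, out=np.zeros((n, n)), where=in_deg > 0)
    lhs = np.eye(n * n) - c * np.kron(A.T, A.T)
    rhs = (1 - c) * np.eye(n).flatten(order="F")
    X = np.linalg.solve(lhs, rhs).reshape((n, n), order="F")
    return A, X

# Toy directed graph: adj[i, j] = 1 means an edge i -> j.
adj = np.zeros((4, 4))
for src, dst in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]:
    adj[src, dst] = 1.0

A, X = simrank_closed_form(adj)

# Reference: iterate the same recurrence until it has converged.
X_iter = 0.2 * np.eye(4)
for _ in range(200):
    X_iter = 0.8 * A.T @ X_iter @ A + 0.2 * np.eye(4)

print(np.allclose(X, X_iter))
```

Nodes 1 and 2 share their only in-neighbor (node 0), so their score equals c times the self-score of node 0 under this matrix form.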

Advantages of the non-iterative method
$\mathrm{vec}(X) = (1-c)(I - cA^T \otimes A^T)^{-1}\,\mathrm{vec}(e)$
It can be solved approximately
It can be computed incrementally
It can be computed pairwise

Approximation
$\mathrm{vec}(X) = (1-c)(I - cW \otimes W)^{-1}\,\mathrm{vec}(e)$
Use the singular value decomposition (SVD) and the Sherman-Morrison formula to compute the inverse of L
W =
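One way to carry out this step is sketched below as a derivation. It assumes a rank-k factorization W ≈ UΣVᵀ with U and V of size n × k having orthonormal columns and Σ a k × k diagonal matrix with positive entries, and it uses the Sherman-Morrison-Woodbury identity (the matrix generalization of Sherman-Morrison). The symbol Λ is introduced here for illustration and does not appear on the slide.

```latex
\begin{aligned}
W \otimes W &\approx (U \otimes U)\,(\Sigma \otimes \Sigma)\,(V \otimes V)^{T} \\[4pt]
(I - c\,W \otimes W)^{-1} &\approx I + c\,(U \otimes U)\,\Lambda\,(V \otimes V)^{T} \\[4pt]
\Lambda &= \left[(\Sigma \otimes \Sigma)^{-1} - c\,(V^{T}U) \otimes (V^{T}U)\right]^{-1} \\[4pt]
\mathrm{vec}(X) &\approx (1-c)\left[\mathrm{vec}(e) + c\,(U \otimes U)\,\Lambda\,(V \otimes V)^{T}\,\mathrm{vec}(e)\right]
\end{aligned}
```

The matrix to invert shrinks from n² × n² to k² × k², which is what makes a small k attractive.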

Approximation
Low-rank SVD decomposition of W
Choice of k
–The larger k is, the longer the computation takes and the higher the accuracy
Error bound

Approximation
Pre-computation
Computing SimRank for a given node pair (i, j)
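The split into a pre-computation phase and a per-pair query can be sketched as follows, assuming NumPy. This is not the authors' implementation: it applies the Sherman-Morrison-Woodbury identity to a rank-k SVD of W, which gives X ≈ (1-c)(I + c·U Z Uᵀ) with a k × k matrix Z computed once. The function names `precompute` and `query`, the random test matrix, and the choice k = 3 are assumptions.

```python
import numpy as np

def precompute(W, k, c=0.8):
    """Rank-k SVD of W plus the k x k core matrix Z used by every query."""
    U, s, Vt = np.linalg.svd(W)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    M = Vt @ U                                    # k x k
    # (Sigma (x) Sigma)^-1 - c (V^T U) (x) (V^T U), a k^2 x k^2 matrix.
    core = np.diag(1.0 / np.kron(s, s)) - c * np.kron(M, M)
    # (V (x) V)^T vec(I_n) equals vec(I_k) because V has orthonormal columns.
    z = np.linalg.solve(core, np.eye(k).flatten(order="F"))
    return U, s, Vt, z.reshape((k, k), order="F")

def query(U, Z, i, j, c=0.8):
    """Approximate SimRank score of one node pair in O(k^2) time."""
    return (1 - c) * (float(i == j) + c * (U[i] @ Z @ U[j]))

# Hypothetical input: W is the transpose of a column-normalized random matrix.
rng = np.random.default_rng(1)
n, k, c = 6, 3, 0.8
A = rng.random((n, n))
A /= A.sum(axis=0)
W = A.T

U, s, Vt, Z = precompute(W, k, c)
approx = np.array([[query(U, Z, i, j, c) for j in range(n)] for i in range(n)])

# The formula is exact for the truncated matrix W_k itself, so compare
# against a direct solve that uses W_k in place of W.
W_k = (U * s) @ Vt
rhs = (1 - c) * np.eye(n).flatten(order="F")
exact_k = np.linalg.solve(np.eye(n * n) - c * np.kron(W_k, W_k), rhs)
exact_k = exact_k.reshape((n, n), order="F")
print(np.allclose(approx, exact_k))
```

Only U and Z are needed at query time, so a single pair costs O(k²) regardless of the network size; the approximation error relative to the full W depends on the singular values that were dropped.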

Incremental Computation
Only U, Σ, and V need to be maintained

Applications Similarity Tracking: return the N most similar nodes of i at each time step t. Centrality Tracking: return the N most central nodes at each time step t.
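Similarity tracking reduces to a top-N selection over one row of the similarity matrix at each time step. A minimal sketch in plain Python follows; the function name `top_n_similar` and the sample scores are hypothetical, and how the scores are refreshed between time steps is outside this snippet.

```python
import heapq

def top_n_similar(sim_row, i, n):
    """Return the indices of the n nodes most similar to node i.

    sim_row[j] is the similarity between node i and node j; node i itself
    is excluded from the result.
    """
    candidates = ((score, j) for j, score in enumerate(sim_row) if j != i)
    return [j for score, j in heapq.nlargest(n, candidates)]

# Hypothetical similarity rows for node 0 at two consecutive time steps.
rows_over_time = [
    [1.0, 0.3, 0.7, 0.1],
    [1.0, 0.6, 0.5, 0.2],
]
tracking = [top_n_similar(row, i=0, n=2) for row in rows_over_time]
print(tracking)   # [[2, 1], [1, 2]]
```

Using a heap keeps each selection at O(m log N) for m nodes, which matters when the query is repeated at every time step.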

Experimental Result on DBLP Top-10 Most Similar Terms for ‘Prof. Jennifer Widom’ up to Each Time Step

Experimental Result Top-10 Most Similar Authors for ‘Prof. Jennifer Widom’ up to Each Time Step

Pre-computation time

Computation time for different numbers of node pairs

Wikipedia Data We set the threshold T to be 1.0e-6. For k=15 –the pre-compute time of the Wikipedia dataset is approx hours –the query time for every 1000 node pairs is seconds