Evolutionary Clustering and Analysis of Bibliographic Networks Manish Gupta (UIUC) Charu C. Aggarwal (IBM) Jiawei Han (UIUC) Yizhou Sun (UIUC) ASONAM 2011.

Introduction Information networks are everywhere: social networks, web, academic networks, biological networks. Heterogeneous information networks – Contain multi-typed nodes. – Richer representation compared to homogeneous networks. We study clustering and evolution diagnosis in massive heterogeneous information networks.

Contributions – We present an evolutionary clustering algorithm for heterogeneous information networks (ENetClus). – We define metrics to characterize clustering behavior. – We perform a study of evolution in a bibliographic heterogeneous network: DBLP.

ENetClus features – Multi-typed – Evolutionary – Temporal smoothness – Agglomerative – Multiple granularities – Based on NetClus. Evolution metrics – Consistency – Quality – Cluster sizes – Evolution rate – Cluster appearance/disappearance – Stability of objects – Sociability of objects – Social influence. Study over DBLP.

Problem Formulation – Net-Cluster – Net-Cluster tree – Net-Cluster tree sequence. Problem: Given a graph sequence GS, generate a net-cluster tree sequence CTS such that the trees are consistent across snapshots and represent high-quality clusters.

[Figure: a net-cluster tree with Levels 1 to 3 and K = 3; the trees CT 1, CT 2, ..., CT N form the sequence CTS.]
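To make the terms net-cluster tree and net-cluster tree sequence concrete, here is a minimal data-structure sketch. It is illustrative only: the class and function names (`NetCluster`, `build_tree`) are hypothetical, and storing one ranking dictionary per object type in each node is an assumption about what a net-cluster holds.

```python
from dataclasses import dataclass, field


@dataclass
class NetCluster:
    """One node of a net-cluster tree (hypothetical representation)."""
    # Per-type ranking distributions, e.g. {"author": {"a1": 0.6, "a2": 0.4}}.
    ranks: dict = field(default_factory=dict)
    # Sub-clusters at the next, finer level of the tree.
    children: list = field(default_factory=list)

    def depth(self) -> int:
        """Number of levels in the subtree rooted at this node."""
        return 1 + max((c.depth() for c in self.children), default=0)


def build_tree(levels: int, k: int) -> NetCluster:
    """Build an empty tree in which every internal node has k children."""
    root = NetCluster()
    if levels > 1:
        root.children = [build_tree(levels - 1, k) for _ in range(k)]
    return root


# A net-cluster tree sequence CTS = [CT_1, ..., CT_N], one tree per snapshot.
cts = [build_tree(levels=3, k=3) for _ in range(4)]
print(len(cts), cts[0].depth(), len(cts[0].children))  # prints: 4 3 3
```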

Approaches Problem: Perform evolutionary clustering over a sequence of heterogeneous network snapshots. Approaches – Use homogeneous clustering techniques: these do not exploit the rich type information in the network, and objects related to the same entity may be placed in different clusters. – Apply a heterogeneous network clustering algorithm to each snapshot: this may give high per-snapshot clustering quality but may not give good consistency between clusterings across snapshots.

NetClus NetClus is an algorithm for clustering heterogeneous networks. It iteratively alternates between ranking objects and clustering them. A probabilistic generative model gives the probability that each object is generated by each cluster, and maximum-likelihood estimation is used to compute the posterior probability that an object belongs to a cluster.
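The alternation between ranking and posterior estimation can be illustrated with a heavily simplified sketch, for intuition only. It is not NetClus as published: all attribute types (authors, venues, terms) are merged into one ranking distribution per cluster, ranking is simple membership-weighted link counting rather than authority ranking, and the function name `netclus_like` and its `smooth` parameter are hypothetical.

```python
import math
import random
from collections import defaultdict


def netclus_like(papers, k, iters=20, smooth=0.1, seed=0):
    """Simplified ranking-based soft clustering of a star-schema network.

    papers: dict mapping each target object (paper) to the attribute objects
    it links to. Returns {paper: [p(c_1|d), ..., p(c_k|d)]}.
    """
    rng = random.Random(seed)
    # Background distribution over attribute objects, used for smoothing.
    counts = defaultdict(float)
    for attrs in papers.values():
        for x in attrs:
            counts[x] += 1.0
    total = sum(counts.values())
    background = {x: c / total for x, c in counts.items()}

    # Random soft initial memberships.
    post = {}
    for d in papers:
        w = [rng.random() + 1e-3 for _ in range(k)]
        s = sum(w)
        post[d] = [v / s for v in w]

    for _ in range(iters):
        # Ranking step: p(x | c_j) from membership-weighted link counts,
        # smoothed with the background distribution.
        rank = []
        for j in range(k):
            score = defaultdict(float)
            for d, attrs in papers.items():
                for x in attrs:
                    score[x] += post[d][j]
            z = sum(score.values()) or 1.0
            rank.append({x: (1 - smooth) * score[x] / z + smooth * background[x]
                         for x in background})
        # Cluster sizes p(c_j) as the average membership.
        prior = [sum(post[d][j] for d in papers) / len(papers) for j in range(k)]
        # Posterior step: p(c_j | d) proportional to p(c_j) * prod_x p(x | c_j),
        # computed in log space to avoid underflow.
        for d, attrs in papers.items():
            logs = [math.log(prior[j] + 1e-12)
                    + sum(math.log(rank[j][x]) for x in attrs) for j in range(k)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            post[d] = [v / s for v in w]
    return post


papers = {
    "p1": ["alice", "bob", "kdd", "mining"],
    "p2": ["alice", "kdd", "mining", "pattern"],
    "p3": ["carol", "dave", "sigir", "retrieval"],
    "p4": ["carol", "sigir", "retrieval", "query"],
}
post = netclus_like(papers, k=2)
labels = {d: max(range(2), key=lambda j: post[d][j]) for d in papers}
print(labels)
```

On this toy input the two groups of papers, which share no attribute objects, end up in separate clusters.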


ENetClus – For the first time instant, the priors and net-clusters are initialized as in NetClus. – For each later time instant, the prior probability of an object o belonging to cluster c_k is its representativeness in the corresponding cluster of the net-cluster tree from the previous time instant, and a target object o is assigned to cluster c_k with probability p_k, where p_k is the normalized sum of the prior probabilities of its neighboring attribute-type objects. – Ranking is as in NetClus, except that the prior probabilities are used alongside the authority-based ranking. – The prior weight controls the effect of the priors and hence the degree of temporal smoothness.
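The prior-based initialization and the prior-weighted ranking can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, an attribute object absent from the previous snapshot contributes zero prior, a target object whose neighbors all lack history falls back to a uniform assignment, and the blend of current ranking and prior is assumed to be a linear mix controlled by the prior weight (0 gives plain ranking, 1 gives priors only).

```python
def initial_membership(neighbors, prior, k):
    """p_k for one target object: the normalized sum of its neighbors' priors.

    neighbors: attribute objects linked to the target object.
    prior: dict attribute object -> list of k prior probabilities taken from
    the previous snapshot's net-cluster tree (missing objects count as zero).
    """
    sums = [0.0] * k
    for x in neighbors:
        for j, p in enumerate(prior.get(x, [0.0] * k)):
            sums[j] += p
    total = sum(sums)
    if total == 0:  # assumption: no history for any neighbor -> uniform
        return [1.0 / k] * k
    return [s / total for s in sums]


def smoothed_rank(rank, prior_rank, prior_weight):
    """Assumed linear mix of current and prior rankings, renormalized."""
    keys = set(rank) | set(prior_rank)
    mixed = {x: (1 - prior_weight) * rank.get(x, 0.0)
             + prior_weight * prior_rank.get(x, 0.0) for x in keys}
    z = sum(mixed.values()) or 1.0
    return {x: v / z for x, v in mixed.items()}


m = initial_membership(["alice", "kdd", "newterm"],
                       {"alice": [0.9, 0.1], "kdd": [0.8, 0.2]}, 2)
r = smoothed_rank({"a": 0.6, "b": 0.4}, {"a": 0.2, "c": 0.8}, prior_weight=0.5)
print(m, r)
```

Here "newterm" has no prior and so contributes nothing; the target object inherits most of its membership from the cluster that "alice" and "kdd" belonged to previously.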

How is ENetClus better than NetClus? [Figure: clusterings of Snapshot 1, Snapshot 2 and Snapshot 3; NetClus produces inconsistent clusters across the snapshots, while ENetClus produces consistent clusters.]

Metrics – Membership probability: the probability that object o belongs to cluster c_i. – Consistency – Chained path consistency: the product of the consistency over each interval in the sequence.
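The per-interval consistency formula is not reproduced in this transcript, so the sketch below plugs in a hypothetical stand-in: the average dot product of an object's membership vectors in two consecutive snapshots, assuming clusters are aligned by index and averaging over objects present in both. Only the final step, taking the product over consecutive intervals, follows the slide's definition of chained path consistency.

```python
import math


def interval_consistency(mem_t, mem_next):
    """Hypothetical per-interval consistency: mean dot product of membership
    vectors for objects present in both snapshots (clusters aligned by index)."""
    shared = mem_t.keys() & mem_next.keys()
    if not shared:
        return 0.0
    return sum(sum(a * b for a, b in zip(mem_t[o], mem_next[o]))
               for o in shared) / len(shared)


def chained_path_consistency(snapshots):
    """Product of the consistency over each consecutive interval."""
    return math.prod(interval_consistency(a, b)
                     for a, b in zip(snapshots, snapshots[1:]))


s1 = {"o": [1.0, 0.0]}
s2 = {"o": [1.0, 0.0]}
s3 = {"o": [0.5, 0.5]}
print(chained_path_consistency([s1, s2, s3]))  # prints: 0.5
```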

Metrics Snapshot Quality – Compactness – Entropy
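The transcript names entropy as a quality measure but does not give its formula. One common reading, assumed here, is the Shannon entropy of each object's cluster-membership distribution averaged over the snapshot, where lower values indicate crisper clusters. The function names are hypothetical.

```python
import math


def membership_entropy(probs):
    """Shannon entropy (bits) of one object's cluster-membership distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def snapshot_entropy(memberships):
    """Assumed aggregation: average membership entropy over all objects."""
    return sum(membership_entropy(v) for v in memberships.values()) / len(memberships)


print(snapshot_entropy({"o1": [1.0, 0.0], "o2": [0.5, 0.5]}))  # prints: 0.5
```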

Metrics – O’: objects present at time y but not at y-1 – O’’: objects present at time y – O’’’: objects present at time y but not at y+1
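The three object sets follow directly from set differences between consecutive snapshots. The sketch below computes them; the `appearance_rates` helper, which turns them into the fractions of newly appearing and soon-disappearing objects, is a hypothetical use of these sets rather than a formula taken from the slides.

```python
def appearance_sets(objects_by_year, y):
    """Return (O', O'', O''') for year y, following the slide's definitions."""
    cur = set(objects_by_year[y])
    prev = set(objects_by_year.get(y - 1, ()))
    nxt = set(objects_by_year.get(y + 1, ()))
    o1 = cur - prev  # O': present at y but not at y-1
    o2 = cur         # O'': present at y
    o3 = cur - nxt   # O''': present at y but not at y+1
    return o1, o2, o3


def appearance_rates(objects_by_year, y):
    """Hypothetical rates: |O'|/|O''| (appearing) and |O'''|/|O''| (disappearing)."""
    o1, o2, o3 = appearance_sets(objects_by_year, y)
    if not o2:
        return 0.0, 0.0
    return len(o1) / len(o2), len(o3) / len(o2)


years = {2000: {"a", "b"}, 2001: {"b", "c", "d"}, 2002: {"c"}}
print(appearance_sets(years, 2001), appearance_rates(years, 2001))
```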

Metrics Stability of objects – Degree to which an object is stable with respect to its cluster or network Sociability of objects – Degree to which an object interacts with different clusters Effect of social influence: normality – Normality is the degree to which an object follows the cluster trend
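Stability and sociability are described only informally. The sketch below gives one hypothetical operationalization of each, not the paper's definitions: stability as the fraction of consecutive snapshots in which an object's most likely cluster is unchanged, and sociability as the number of distinct clusters among an object's neighbors divided by the number of clusters K.

```python
def stability(top_clusters):
    """Hypothetical stability: fraction of consecutive snapshot pairs in which
    the object's most likely cluster stays the same."""
    pairs = list(zip(top_clusters, top_clusters[1:]))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


def sociability(neighbor_clusters, k):
    """Hypothetical sociability: distinct clusters among neighbors, over K."""
    return len(set(neighbor_clusters)) / k


print(stability([0, 0, 1, 1]), sociability([0, 0, 2], k=4))
```

An object that keeps the same top cluster in every snapshot has stability 1.0, while one whose neighbors span all K clusters has sociability 1.0.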

Experiments Datasets – DBLP: 1993 to 2008, 654K papers, 484K authors, 107K title terms and 3,900 conferences. Number of clusters = 4; levels of the net-cluster tree = 4; prior weight varied from 0 to 1. – Four_area: DM, DB, IR and ML papers from 1993 to 2008, 29K papers, 28K authors, 20 conferences.

Related work Clustering graphs: Mincut, Min-max cut, Spectral, density-based, RankClus [Sun EDBT 09], NetClus [Sun KDD 09] Evolutionary clustering: k-means [Chak KDD06], spectral [Chi KDD07], text streams [Mei KDD05], social network structure [Kuma KDD06] Evolutionary graph studies: GraphScope [Sun KDD07], density-based [Kim VLDB09], analysis [Back KDD06, Lesk KDD05, Lesk KDD08], communities using FacetNet [Lin WWW08], individual objects [Asur KDD07]

Conclusion – A clustering algorithm (ENetClus) for evolution diagnosis of heterogeneous information networks. – Metrics that give insight into evolution at both the object level and the clustering level. – Analysis and evolutionary study of DBLP.

Acknowledgements Research was sponsored in part by the U.S. National Science Foundation under grant IIS , and by the Army Research Laboratory under Cooperative Agreement Number W911NF (NS-CTA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

References