Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes. Source: VLDB.


1. Background
Heterogeneous information networks contain multiple types of objects and links, and different kinds of attributes. Clustering such networks makes their objects easy to retrieve.
Challenges:
- attribute values of objects are often incomplete (an object may contain only partial or even no observations for a given attribute set);
- the links of different types may carry different kinds of semantic meanings (each may have its own level of semantic importance in the clustering process).
Classic clustering algorithms (k-means, etc.) cannot work directly in this setting.

2.1 The Data Structure
A heterogeneous information network G = (V, E, W) is a directed graph:
- each node v ∈ V corresponds to an object;
- each link e ∈ E corresponds to a relationship between the linked objects, with weight denoted by w(e).
Different from a traditional network, it also carries type information:
- τ : V → A, a mapping function from objects to object types;
- φ : E → R, a mapping function from links to link types;
where A is the object type set and R is the link type set. For object types A and B, a relation ⟨A R B⟩ is different from its inverse ⟨B R⁻¹ A⟩ (links are directed).

2.1 The Data Structure
Attributes are associated with objects. The set 𝒳 = {X1, ..., XT} collects the attributes across all different types of objects, and each object v ∈ V contains a subset of them. The observation set v[X] = {xv,1, xv,2, ..., xv,N(X,v)} holds the observations of attribute X attached to object v, where N(X, v) is their total number. Attributes can be shared by different types of objects or be unique to one type. VX denotes the set of objects that contain attribute X.
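These definitions map naturally onto a small container type. The sketch below is only illustrative: the class and field names (HeterogeneousNetwork, obj_type, add_link, and so on) are assumptions for this write-up, not identifiers from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class HeterogeneousNetwork:
    """Minimal container for G = (V, E, W) with type mappings and attribute observations."""
    obj_type: Dict[str, str] = field(default_factory=dict)               # tau: object -> object type (A)
    links: List[Tuple[str, str]] = field(default_factory=list)           # directed links e = (source, target)
    link_type: Dict[Tuple[str, str], str] = field(default_factory=dict)  # phi: link -> link type (R)
    weight: Dict[Tuple[str, str], float] = field(default_factory=dict)   # w(e)
    observations: Dict[str, Dict[str, list]] = field(default_factory=dict)  # v -> {attribute X: v[X]}

    def add_link(self, src: str, dst: str, ltype: str, w: float = 1.0) -> None:
        self.links.append((src, dst))
        self.link_type[(src, dst)] = ltype
        self.weight[(src, dst)] = w

# Tiny bibliographic example: only the paper carries a (text) attribute.
net = HeterogeneousNetwork()
net.obj_type.update({"p1": "paper", "a1": "author", "v1": "venue"})
net.observations["p1"] = {"text": ["clustering", "networks", "clustering"]}
net.add_link("a1", "p1", "write")
net.add_link("p1", "v1", "published_by")
```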

2.2 The Clustering Problem
The goal is to map every object in the network into a unified hidden space (a soft clustering). Two challenges:
- attribute values of objects are often incomplete (an object may contain only partial or even no observations for a given attribute set);
- the links of different types may carry different kinds of semantic meanings (each may have its own level of semantic importance in the clustering process).

2.2 The Clustering Problem
Example 1: bibliographic information network.
Object types: paper, author, venue.
Link types: paper->author, author->paper, paper->venue, venue->paper, author->venue, venue->author, p->p, a->a, v->v.
Attribute (only one): papers carry a text attribute (a set of words); authors and venues have none.

2.2 The Clustering Problem
For authors and venues, the only available information comes from the papers linked to them; for papers, both text attributes and links of different types are provided. The author type is more important than the venue type for determining a paper's cluster, so the link type strengths should be learned.

2.2 The Clustering Problem
Example 2: weather sensor network.
Object types: precipitation sensor, temperature sensor.
Link types: p->p, t->t, p->t, t->p.
Attributes (multiple): precipitation, temperature.
A sensor may sometimes register no observations or multiple observations, so a sensor object may contain only partial attributes.

2.2 The Clustering Problem
Given a network G = (V, E, W), a cluster number K, the attributes X ∈ 𝒳 of interest, and the attribute observations {v[X]} for all objects, the goals are:
1. To learn a soft clustering: create a membership probability matrix Θ|V|×K = (θv)v∈V, where Θ(v, k) denotes the probability of object v belonging to cluster k, 0 ≤ Θ(v, k) ≤ 1, ∑k=1..K Θ(v, k) = 1, and θv is the K-dimensional cluster membership vector of object v.
2. To learn the strengths (importance weights) of the different link types in determining the cluster memberships of the objects: γ|R|×1, where γ(r) is a real number standing for the importance weight of link type r ∈ R.
The cluster number K is assumed to be given (see the initialization sketch below).
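A minimal sketch of what the two outputs look like as arrays, assuming a random initialization; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_parameters(num_objects: int, num_clusters: int, num_link_types: int):
    """Random soft membership matrix Theta (rows sum to 1) and uniform link type strengths gamma."""
    theta = rng.random((num_objects, num_clusters))
    theta /= theta.sum(axis=1, keepdims=True)   # enforce sum_k Theta(v, k) = 1 for every object v
    gamma = np.ones(num_link_types)             # start with all link types equally important
    return theta, gamma

theta, gamma = init_parameters(num_objects=5, num_clusters=3, num_link_types=3)
```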

3.1 Model Overview
A good clustering configuration Θ should satisfy two properties:
- given the clustering configuration, the observed attributes should be generated with a high probability;
- the clustering configuration should be highly consistent with the network structure.
Given the network G, the relation strength vector γ, and the cluster component parameters β, the likelihood of the observations of all the attributes X ∈ 𝒳 is:
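The formula itself did not survive the slide extraction; a reconstruction consistent with the two properties above (an assumption based on the surrounding text, not copied from the paper):

```latex
p\big(\{v[X]\}_{X \in \mathcal{X}}, \Theta \mid G, \gamma, \beta\big)
  \;=\; \underbrace{\prod_{X \in \mathcal{X}} p\big(\{v[X]\}_{v \in V_X} \mid \Theta, \beta\big)}_{\text{attribute generation}}
  \;\times\;
  \underbrace{p(\Theta \mid G, \gamma)}_{\text{structural consistency}}
```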

3.1 Model Overview
Given G and γ, p(Θ | G, γ) can be computed. Given β and Θ, ∏X∈𝒳 p({v[X]}v∈VX | Θ, β) can be computed. Goal: find the best parameters γ and β and the best clustering configuration Θ that maximize the likelihood.

3.2.1 Single Attribute
Let X be the only attribute of interest in the network. The attribute observations v[X] of each object v are generated from a mixture model: each component is a probabilistic model that stands for a cluster, with parameters β to be learned, and the component weights are given by θv. Given the clustering configuration Θ, the probability of all the observations {v[X]}v∈VX is:
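A reconstruction of the missing mixture formula, following the standard mixture-model form implied by the text (p(x | βk) is the component model of cluster k):

```latex
p\big(\{v[X]\}_{v \in V_X} \mid \Theta, \beta\big)
  \;=\; \prod_{v \in V_X} \; \prod_{x \in v[X]} \; \sum_{k=1}^{K} \theta_v(k)\, p\big(x \mid \beta_k\big)
```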

3.2.1 Single Attribute
Text attribute with categorical distribution: objects in the network contain text attributes in the form of a term list, drawn from a vocabulary l = 1, ..., m. Each cluster k has a different term distribution with parameter βk = (βk,1, ..., βk,m), where βk,l is the probability of term l appearing in cluster k. The probability of observing all the current attribute values, where cv,l denotes the count of term l contained in object v, is:
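A reconstruction of the missing formula, assuming the categorical mixture described above:

```latex
p\big(\{v[X]\}_{v \in V_X} \mid \Theta, \beta\big)
  \;=\; \prod_{v \in V_X} \; \prod_{l=1}^{m} \Big( \sum_{k=1}^{K} \theta_v(k)\, \beta_{k,l} \Big)^{c_{v,l}}
```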

Numerical attribute with Gaussian distribution: objects in the network contain numerical observations in the form of a value list, drawn from the domain ℝ. The k-th cluster is a Gaussian distribution with parameters βk = (μk, σk²). The probability density of all the observations for all objects is:
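A reconstruction of the missing density, assuming the Gaussian mixture described above:

```latex
p\big(\{v[X]\}_{v \in V_X} \mid \Theta, \beta\big)
  \;=\; \prod_{v \in V_X} \; \prod_{x \in v[X]} \; \sum_{k=1}^{K} \theta_v(k)\,
        \mathcal{N}\!\big(x \mid \mu_k, \sigma_k^2\big)
```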

3.2.2 Multiple Attributes
Multiple attributes in the network are specified by users, say X1, ..., XT. Assuming independence among these attributes, the probability density of the observed attribute values {v[X1]}, ..., {v[XT]} for a given clustering configuration Θ is:
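A reconstruction of the missing product form, following the stated independence assumption (βXt denotes the component parameters of attribute Xt):

```latex
p\big(\{v[X_1]\}, \ldots, \{v[X_T]\} \mid \Theta, \beta\big)
  \;=\; \prod_{t=1}^{T} p\big(\{v[X_t]\}_{v \in V_{X_t}} \mid \Theta, \beta_{X_t}\big)
```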

3.3 Modeling Structural Consistency
From the view of links, the more similar two objects are in terms of cluster membership, the more likely they are to be connected by a link. For a link e = ⟨vi, vj⟩ ∈ E with type r = φ(e) ∈ R, the importance of the link type to the clustering process is denoted by a real number γ(r) (w(e) is specified in the network as input, while γ(r) is defined on link types and needs to be learned). The consistency of the two cluster membership vectors θi and θj across link e, under the strength weights γ for the link types, is denoted by a feature function f(θi, θj, e, γ).

3.3 Modeling Structural Consistency
Several desiderata for a good feature function:
- The value of f should increase with greater similarity of θi and θj.
- The value of f should decrease with greater importance of the link e, either in terms of its specified weight w(e) or its learned link type strength γ(r).
- The feature function should not be symmetric in its first two arguments θi and θj (the graph is directed).

3.3 Modeling Structural Consistency
H(θj, θi) is the cross entropy from θj to θi, which evaluates the deviation of vj from vi in terms of cluster membership. For a fixed value of γ(r), f is maximal when H(θj, θi) is minimal. The value of f decreases with an increasing learned link type strength γ(r) or input link weight w(e). Since γ ≥ 0, f ≤ 0.
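The slide's formula for f is missing; the form below is a reconstruction that matches the listed properties (f ≤ 0 for γ ≥ 0, decreasing in w(e) and γ(r), maximal when the cross entropy is minimal). The exact expression in the paper may differ.

```latex
f(\theta_i, \theta_j, e, \gamma) \;=\; -\, w(e)\, \gamma\big(\phi(e)\big)\, H(\theta_j, \theta_i),
\qquad
H(\theta_j, \theta_i) \;=\; -\sum_{k=1}^{K} \theta_j(k) \log \theta_i(k)
```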

3.3 Modeling Structural Consistency
Example: the weights w(e) of all links are 1; the cluster number is 3; Θ is given in the figure. Link types and strengths: write (author, paper) with γ1, published_by (paper, venue) with γ2, written_by (paper, author) with γ3.

3.3 Modeling Structural Consistency
Objects 1 and 3 are more likely to belong to the first cluster, Object 4 is a neutral object, and Object 5 is more likely to belong to the third cluster. f(1, 3) = −0.4701γ3, f(1, 4) = −1.7174γ3, f(1, 5) = −2.3410γ3, so f(1, 3) ≥ f(1, 4) ≥ f(1, 5). f(1, 2) = −0.4701γ2 and f(1, 3) = −0.4701γ3; if γ2 < γ3, then f(1, 2) > f(1, 3) (stronger link types are likely to exist only between objects that are very similar to each other). f(1, 4) = −1.7174γ3 while f(4, 1) = −1.0986γ1, so f(1, 4) ≠ f(4, 1) (the feature function is not symmetric).

3.3 Modeling Structural Consistency
The probability of a clustering configuration given the network is a log-linear model, where Z(γ) is the partition function that makes the distribution normalize to 1:
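A reconstruction of the missing log-linear model implied by the text; the integral form of Z(γ) over all configurations is an assumption, and the paper's normalization may be written differently:

```latex
p(\Theta \mid G, \gamma) \;=\; \frac{1}{Z(\gamma)}
  \exp\Big( \sum_{e = \langle v_i, v_j \rangle \in E} f(\theta_i, \theta_j, e, \gamma) \Big),
\qquad
Z(\gamma) \;=\; \int_{\Theta} \exp\Big( \sum_{e \in E} f(\theta_i, \theta_j, e, \gamma) \Big)\, d\Theta
```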

3.4 The Unified Model
Goal: determine the best clustering results Θ, the link type strengths γ, and the cluster component parameters β that maximize both the generative probability of the attribute observations and the consistency with the network structure. A Gaussian prior is added as a regularization to avoid overfitting. The new objective function is:
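The objective itself is missing from the transcript; the sketch below assumes the Gaussian prior is placed on γ, so that it appears as an L2 penalty with a hypothetical coefficient λ. The paper's exact form may differ.

```latex
(\Theta^*, \beta^*, \gamma^*) \;=\; \arg\max_{\Theta, \beta, \gamma} \;
  \log p\big(\{v[X]\}_{X \in \mathcal{X}} \mid \Theta, \beta\big)
  \;+\; \log p(\Theta \mid G, \gamma)
  \;-\; \lambda \lVert \gamma \rVert_2^2
```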

4. THE CLUSTERING ALGORITHM
Initial assumption: all types of links play an equally important role in the clustering process (γ = 1). γ is then updated according to the average consistency of the links of each type with the current clustering results, which achieves a good clustering together with a reasonable strength vector for the link types. The iterative algorithm contains two steps (see the sketch below):
1. Cluster optimization step: fix the link type weights γ to the best value γ* determined in the last iteration; determine the best clustering results Θ and the attribute parameters β for each cluster component.
2. Link type strength learning step: fix the clustering configuration parameters Θ = Θ* and β = β* determined in the last step; determine the best value of γ.
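A skeleton of the alternating procedure. It is a sketch, not the paper's pseudocode: the two step functions are supplied by the caller (they correspond to Sections 4.1 and 4.2), and all names are illustrative.

```python
import numpy as np

def gen_clus(network, K, num_link_types, cluster_optimization, learn_link_strengths, num_iters=20):
    """Alternate between cluster optimization and link type strength learning.

    cluster_optimization(network, K, gamma) -> (theta, beta)   # EM step of Section 4.1
    learn_link_strengths(network, theta)    -> gamma            # strength update of Section 4.2
    """
    gamma = np.ones(num_link_types)   # initial assumption: all link types equally important (gamma = 1)
    theta, beta = None, None
    for _ in range(num_iters):
        # Step 1: fix gamma, find the best clustering configuration and component parameters.
        theta, beta = cluster_optimization(network, K, gamma)
        # Step 2: fix Theta and beta, update gamma from the consistency of each link type.
        gamma = learn_link_strengths(network, theta)
    return theta, beta, gamma
```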

4.1 Cluster Optimization
An EM-based algorithm is used for the cluster optimization step. In the E-step, the probability of each observation x of each object v and each attribute X belonging to each cluster, usually called the hidden cluster label zv,x of the observation, is derived according to the current parameters Θ and β. In the M-step, the parameters Θ and β are updated according to the new memberships of all the observations from the E-step. (The details of the mathematical formulas and derivations are skipped; a simplified sketch for a single numerical attribute follows.)
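A toy EM loop for one numerical attribute with Gaussian components. It deliberately ignores the network structure and multiple attributes, so it is only a sketch of the E/M mechanics described above; all names are illustrative.

```python
import numpy as np

def em_single_gaussian_attribute(obs, K, num_iters=50, seed=0):
    """Simplified EM: obs[v] is a 1-D array of the observations v[X] of object v (possibly empty).

    Returns theta (|V| x K soft memberships) and the Gaussian parameters (mu, sigma2)."""
    rng = np.random.default_rng(seed)
    all_x = np.concatenate([x for x in obs if x.size > 0])
    mu = rng.choice(all_x, size=K)                         # initialize means from the data
    sigma2 = np.full(K, all_x.var() + 1e-6)
    theta = np.full((len(obs), K), 1.0 / K)
    for _ in range(num_iters):
        new_mu, new_var, counts = np.zeros(K), np.zeros(K), np.zeros(K)
        for v, x in enumerate(obs):
            if x.size == 0:                                # incomplete attribute: nothing observed
                continue
            dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
            resp = theta[v] * dens
            resp /= resp.sum(axis=1, keepdims=True)        # E-step: hidden cluster labels z_{v,x}
            theta[v] = resp.mean(axis=0)                   # M-step: update theta_v per object
            counts += resp.sum(axis=0)
            new_mu += (resp * x[:, None]).sum(axis=0)
            new_var += (resp * (x[:, None] - mu) ** 2).sum(axis=0)
        mu = new_mu / (counts + 1e-12)                     # M-step: update component parameters beta
        sigma2 = new_var / (counts + 1e-12) + 1e-6
    return theta, (mu, sigma2)

obs = [np.array([0.1, 0.2]), np.array([]), np.array([5.0, 5.2, 4.9])]
theta, (mu, sigma2) = em_single_gaussian_attribute(obs, K=2)
```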

4.2 Link Type Strength Learning
(The details of the mathematical formulas and derivations are skipped, since they are too complex.)

4.3 Putting It Together: The GenClus Algorithm (General Heterogeneous Network Clustering algorithm)

5. Effectiveness Study
Normalized Mutual Information (NMI) is used for comparison (it evaluates the similarity between two partitions of the objects), with NetPLSA and iTopicModel as baselines. Two bibliographic networks are used: one contains two types of objects, authors (A) and conferences (C); the other contains objects corresponding to authors (A), conferences (C), and papers (P).
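The slide only names NMI; the snippet below shows one common way to compute it for the hard labels obtained by taking the cluster of maximum probability. Using scikit-learn here is an assumption of this write-up, not something stated in the paper.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical soft memberships Theta for four objects and two clusters, plus ground-truth labels.
theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
predicted = theta.argmax(axis=1)                 # cluster label with maximum probability per object
ground_truth = [0, 0, 1, 1]
print(normalized_mutual_info_score(ground_truth, predicted))  # 1.0 means identical partitions
```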

5. Effectiveness Study
The clustering results of GenClus are also compared with two baselines, the k-means algorithm and a spectral clustering method, by matching the cluster labels with maximum probabilities against the ground truth. Weather sensor network: the network is synthetically generated and contains two types of objects, temperature (T) and precipitation (P) sensors.

END THX