A Three-way Model for Collective Learning on Multi-Relational Data


1 A Three-way Model for Collective Learning on Multi-Relational Data
Today my topic is "A Three-way Model for Collective Learning on Multi-Relational Data". Author: Maximilian Nickel. Speaker: Xinge Wen.

2 INTRODUCTION – Multi-relational Data
Relational data is everywhere in our life: the Web, social networks, bioinformatics. On the Web, hyperlinks denote relationships between pairs of web pages. Look at this picture: it mimics a social network, where each link denotes the relationship between its adjacent nodes.

3 INTRODUCTION – Why Tensor?
Modelling simplicity: an RDF triple (Subject, Predicate, Object) has a fixed three-part structure, so multiple relations can be straightforwardly expressed as a three-way tensor. No structure learning: no a priori domain knowledge is needed and no structure has to be inferred from the data. Expected performance: factorization methods are good at dealing with high-dimensional and sparse data. We have all heard about scalars, vectors and matrices, but most of us haven't heard about tensors. In fact, a scalar is a tensor of order 0, a vector is a tensor of order 1, a matrix is a tensor of order 2, and objects with 3 or more dimensions are called higher-order tensors; normally "tensor" is used to refer to order 3 or higher. Many things are inherently tensors, like RGB pictures, or like what we did in the reinforcement-learning homework (remember the question about the blackjack game). Here I want to talk about RDF. RDF is a way to model relational data using a subject-predicate-object expression, like "Trump is president of the US": Trump is the subject, the US is the object, and "president of" is the predicate that indicates the relationship. Because of this structure, we can use a three-way tensor to represent RDF data. With the tensor representation we need no information about independent variables, no domain knowledge, and no structure inferred from the data. In addition, factorization methods are known to work well on high-dimensional, sparse datasets, because they can do dimensionality reduction, fill in missing entries and discover latent components.

4 INTRODUCTION – How we use tensor here
RDF and the triple form (Subject, Predicate, Object): subject and object are two entities, and the predicate models the relationship, such as an attribute, relation or category. We build an n × n × m three-way tensor: two modes 'E' stand for the entities and one mode 'R' stands for the relationships, and a tensor entry of 1 denotes that the relation holds. Here is how we use the tensor: entities run along the first two directions and relationships along the third, so the picture tells us we can slice the tensor along the relation dimension. Also remember that, just as matrix factorization is good at dealing with sparse matrices, tensor factorization is good at dealing with sparse tensors. Why? Because indexing only the non-zero entries saves space.
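To make the construction concrete, here is a minimal sketch in Python/NumPy of turning a handful of (subject, predicate, object) triples into an n × n × m adjacency tensor. The entity names, relation names and triples below are made up for illustration and are not from the paper's datasets.

```python
import numpy as np

# Illustrative toy vocabulary (not from the paper's data)
entities  = ["Trump", "Pence", "US", "PartyX"]
relations = ["president_of", "vice_president_of", "party"]
triples   = [("Trump", "president_of", "US"),
             ("Pence", "vice_president_of", "Trump"),
             ("Trump", "party", "PartyX")]

e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: k for k, r in enumerate(relations)}

n, m = len(entities), len(relations)
X = np.zeros((n, n, m))          # n x n x m three-way tensor

# X[i, j, k] = 1 means relation k holds between subject i and object j
for s, p, o in triples:
    X[e_idx[s], e_idx[o], r_idx[p]] = 1.0

# Frontal slice for one relation: the adjacency matrix of "president_of"
print(X[:, :, r_idx["president_of"]])
```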

5 RECAP – SVD Factorization
Remember that we have the SVD factorization A = U Σ Vᵀ. It can deal with any m × n matrix (whereas eigenvalue decomposition can only deal with square n × n matrices). Σ is a diagonal matrix whose diagonal entries σᵢ are the singular values of A, normally ordered from largest to smallest. Now let's have a recap on the singular value decomposition. Look at the representation of SVD: a matrix A is decomposed into a matrix U, times a diagonal matrix Σ, times the transpose of a matrix V. U is an m × m matrix, Σ is an m × n diagonal matrix and V is an n × n matrix. The important thing is that the elements on the diagonal of Σ are the singular values of A, ordered from largest to smallest. But why do we want to do this?
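As a quick reminder of what the decomposition looks like in code, here is a small NumPy example; the matrix is random and only serves to show the shapes of U, Σ and Vᵀ.

```python
import numpy as np

A = np.random.rand(6, 4)                      # any m x n matrix
U, s, Vt = np.linalg.svd(A, full_matrices=True)

Sigma = np.zeros((6, 4))                      # rebuild the m x n diagonal matrix Σ
np.fill_diagonal(Sigma, s)                    # s holds the singular values, largest first

print(U.shape, Sigma.shape, Vt.shape)         # (6, 6) (6, 4) (4, 4)
print(np.allclose(A, U @ Sigma @ Vt))         # True: A = U Σ Vᵀ
```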

6 RECAP – SVD Factorization
What is it good for? Remember why SVD is useful? The singular values σᵢ decrease very quickly; in many real datasets a small fraction of the largest singular values already accounts for almost all of the total energy. So SVD can also be written in truncated form, A ≈ U_r Σ_r V_rᵀ with r << m and r << n: dimension reduction and compression. Because the singular values decay so fast, we don't need to keep the three big matrices; three much smaller matrices are enough, so we can reduce the full expression to the truncated one.
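A short sketch of the truncated form: keep only the top r singular values and check how much storage is saved and how much error is incurred. The matrix here is random and purely illustrative.

```python
import numpy as np

A = np.random.rand(100, 80)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 10                                            # keep only the top-r components
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]       # rank-r approximation of A

# Storage drops from 100*80 values to roughly 100*r + r + r*80,
# at the cost of an approximation error that shrinks as r grows.
rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)
print(r, rel_err)
```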

7 RESCAL ALGORITHM Looks familiar?
RESCAL takes the inherent structure of dyadic relational data into account by employing the rank-r tensor factorization X_k ≈ A R_k Aᵀ, for k = 1, ..., m. A is an n × r matrix that contains the latent-component representation of the entities in the domain, and R_k is an asymmetric r × r matrix that specifies the interaction of the latent components for the k-th predicate. Looks familiar? A tensor has 3 or more dimensions, but we can look at it slice by slice: X_k is the k-th frontal slice of the tensor X, i.e. the n × n adjacency matrix of the k-th relation. Factorizing each slice this way gives an expression that looks very similar to SVD. A mode is one of the dimensions of the tensor, and the rank r is the number of latent components kept (the number of linearly independent directions).
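To show the shape of the model in code, here is a simplified gradient-descent sketch that fits A and the per-relation matrices R_k to a list of dense slices. This is not the alternating-least-squares procedure from the paper, just an illustration of X_k ≈ A R_k Aᵀ; the step count, learning rate and regularization weight are made-up values.

```python
import numpy as np

def rescal_gd(X_slices, r, steps=500, lr=0.01, lam=0.1, seed=0):
    """Toy gradient-descent sketch of the RESCAL model X_k ≈ A R_k Aᵀ.
    X_slices is a list of m dense n x n matrices (one per relation).
    This is NOT the ALS algorithm from the paper, only an illustration."""
    rng = np.random.default_rng(seed)
    n = X_slices[0].shape[0]
    A = 0.1 * rng.standard_normal((n, r))
    R = [0.1 * rng.standard_normal((r, r)) for _ in X_slices]

    for _ in range(steps):
        grad_A = lam * A
        for k, Xk in enumerate(X_slices):
            E = Xk - A @ R[k] @ A.T                     # residual of slice k
            grad_A -= E @ A @ R[k].T + E.T @ A @ R[k]   # gradient of 0.5*||E||_F^2 w.r.t. A
            R[k] -= lr * (-(A.T @ E @ A) + lam * R[k])  # gradient step for R_k
        A -= lr * grad_A
    return A, R
```

Usage would be something like `A, R = rescal_gd([X[:, :, k] for k in range(m)], r=2)` on the toy tensor built earlier.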

8 RESCAL – How to solve canonical relational learning tasks
Link Prediction: To predict the existence of a relation between two entities, it is sufficient to look at the corresponding entry of the rank-reduced reconstruction of the appropriate slice, X̂_k = A R_k Aᵀ. Collective Classification: Can be cast as a link prediction problem by including the classes as entities and adding a "class of" relation; alternatively, standard classification algorithms can be applied to the entities' latent-component representation A. Link-based Clustering: Since the entities' latent-component representation A is computed by taking all relations into account, link-based clustering can be done by clustering the entities in the latent-component space. This one expression can be used to solve many relational learning tasks: link prediction, collective classification and link-based clustering. For link prediction, the comparison can be done by comparing the reconstructed entry to a given threshold θ, or just by ranking the entries according to their likelihood. Collective classification works because class membership is just a "class of" relation, for example a cat belongs to "mammal" and a whale belongs to "mammal"; a standard classification algorithm like an SVM can then be used. Link-based clustering is similar.
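A minimal sketch of how these tasks use the factors, assuming A and R come from a factorization like the one sketched above; the threshold and the number of clusters are made up for illustration, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.cluster import KMeans

def predict_links(A, Rk, theta=0.5):
    """Score every (subject, object) pair for one relation.
    theta is an illustrative cut-off; in practice it is tuned on
    validation data, or the entries are simply ranked by score."""
    scores = A @ Rk @ A.T          # reconstructed slice: entry (i, j) scores that triple
    return scores, scores > theta  # raw scores and thresholded link predictions

def cluster_entities(A, n_clusters=3):
    # Link-based clustering: cluster the entities in the latent-component space
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(A)
```

Collective classification follows the same pattern: either add a "class of" relation slice and predict its links, or feed the rows of A to any standard classifier such as an SVM.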

9 RESCAL ALGORITHM Now the factorization can be computed by solving a regularized optimization problem: minimize over A and R_k the objective f(A, R_k) + reg(A, R_k), where the loss function is f(A, R_k) = 1/2 * Σ_k ||X_k - A R_k Aᵀ||²_F and the regularization term is reg(A, R_k) = 1/2 * λ (||A||²_F + Σ_k ||R_k||²_F). Remember what we have learned in machine learning class: we can reduce the previous expression to an optimization problem with a loss term plus a regularization term. I believe everyone is familiar with objectives of this form, so we know what will happen next.
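For concreteness, the objective on this slide can be written down directly; this is a sketch with an assumed regularization weight λ.

```python
import numpy as np

def rescal_objective(X_slices, A, R, lam=0.1):
    """Regularized objective as on the slide:
    loss = 0.5 * sum_k ||X_k - A R_k Aᵀ||_F^2
    reg  = 0.5 * lam * (||A||_F^2 + sum_k ||R_k||_F^2)"""
    loss = 0.5 * sum(np.linalg.norm(Xk - A @ Rk @ A.T, "fro") ** 2
                     for Xk, Rk in zip(X_slices, R))
    reg = 0.5 * lam * (np.linalg.norm(A, "fro") ** 2
                       + sum(np.linalg.norm(Rk, "fro") ** 2 for Rk in R))
    return loss + reg
```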

10 RESCAL – Prediction Example
Predict the party membership of US vice presidents. The slide shows Trump and Bush, their vice presidents Pence and Quayle (connected by "vice president of" edges), and Party X, with one party-membership edge marked "predict". Let me give an example of the RESCAL algorithm. We want to predict the party membership of a US vice president. The latent-component representations of Pence and Quayle will be similar to each other in this example, as both representations reflect that their entities are related to Party X; moreover, they are all Republican. Because of this, Trump and Bush will also have similar latent-component representations. Consequently, the corresponding products of latent factors will yield similar values, so the missing relation can be predicted correctly.

11 EXPERIMENTS – Predict the party membership of US (vice) presidents
Visualization of the "president of" relation on the US presidents dataset. The size of a ring segment indicates how many persons in the dataset are members of the respective party, and the size of an arc indicates how often this relation occurs between the connected segments. No information is included in the data other than the party membership and who is (vice) president of whom. In this paper, the author ran several simple and interesting experiments with RESCAL, like this one: predicting the party membership of US presidents.

12 EXPERIMENTS – Predict the party membership of US (vice) presidents
Results of 10-fold cross-validation on this dataset against standard tensor factorizations and relational learning algorithms. The accuracy measure is AUC, the area under the ROC curve. The author then ran 10-fold cross-validation. The ROC (receiver operating characteristic) curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied; it is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
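As an illustration of the AUC metric (the numbers below are made up, not results from the paper): collect the true 0/1 entries of held-out triples and the corresponding reconstructed scores, then compute the area under the ROC curve. scikit-learn is assumed here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true  = np.array([1, 0, 1, 1, 0, 0])              # held-out tensor entries (illustrative)
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.1])  # reconstructed scores for those entries

print(roc_auc_score(y_true, y_score))               # area under the ROC curve
```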

13 EXPERIMENTS – Comparison to state-of-the-art approaches
Task: perform link prediction on the IRM datasets Kinships, UMLS and Nations. Comparison to MRC, IRM and BCTF, as well as to CP and DEDICOM. The author didn't stop here: he ran further experiments on these larger datasets and compared RESCAL with other state-of-the-art relational learning and tensor decomposition approaches.

14 Conclusion RESCAL is a tensor factorization approach to relational learning. RESCAL works through information propagation via the entities' latent-component representation. It has fast training time and a simple implementation. RESCAL is included in the Scikit-Tensor library.

15 THANK YOU

