
Improving Cross-lingual Entity Alignment via Optimal Transport




1 Improving Cross-lingual Entity Alignment via Optimal Transport
Shichao Pei, Lu Yu, Xiangliang Zhang CEMSE King Abdullah University of Science and Technology (KAUST), SA 10/14/2019 The 28th International Joint Conference on Artificial Intelligence. August 10-16, 2019, Macao, China

2 Outline The Task Background and Related Work Proposed Model Experiment
Conclusion

3 The Task Problem KGs differ in language and content but cover a similar domain, e.g., geography. Each KG consists of a set of triples, each containing a head entity (e.g., Mexico), a relation (e.g., neighbor of), and a tail entity (e.g., USA). Entity alignment is the task of finding pairs of entities with the same meaning (one in the English KG and the other in the French KG), called aligned entities, e.g., Mexico with Mexique, USA with Etats-Unis.
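The setup above can be made concrete with a small sketch. This is hypothetical toy data (the entity and relation names here are illustrative, not from the paper's datasets): two KGs as triple sets, plus a seed set of aligned entity pairs that the task aims to extend.

```python
# Two toy KGs represented as sets of (head, relation, tail) triples.
en_kg = {
    ("Mexico", "neighbor_of", "USA"),
    ("USA", "capital", "Washington"),
}
fr_kg = {
    ("Mexique", "voisin_de", "Etats-Unis"),
    ("Etats-Unis", "capitale", "Washington"),
}

# Seed aligned entity pairs (English entity, French entity); entity
# alignment aims to extend this seed set to all matching entities.
seed_alignment = [("Mexico", "Mexique"), ("USA", "Etats-Unis")]

for en, fr in seed_alignment:
    print(f"{en} <-> {fr}")
```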

4 The Task Motivation Various methods, sources, and languages have been explored to construct KGs, and most existing KGs are developed separately. These KGs are inevitably heterogeneous in surface forms and typically complementary in content. It is thus essential to align entities in multiple KGs and join them into a unified KG for knowledge-driven applications.

5 Background Related Work Feature Engineering Methods
The semantics of OWL properties [Hu et al., 2011]. Compatible neighbors and attribute values of entities [Suchanek et al., 2012]. Structural information of relations [Lacoste-Julien et al., 2013]. Well-designed hand-crafted features [Mahdisoltani et al., 2014]. These methods are time-consuming, labor-expensive, and inflexible to extend. Embedding-based Methods Encoding the KGs in separate or unified embedding spaces: MTransE [Chen et al., 2017], ITransE [Zhu et al., 2017]. Jointly modeling the KGs and attributes: JAPE [Sun et al., 2017], KDCoE [Chen et al., 2018]. Iteratively enlarging the set of labeled entity pairs via a bootstrapping strategy: BootEA [Sun et al., 2018].

6 Background Limitations of Current Methods
Limited gains due to the shortage of labeled entity pairs. Ignorance of duality. Failure to match the whole distribution.

7 Our Objective Objective Challenges
Learning the translation matrix by dually minimizing both the entity-level and group-level losses. The group-level loss describes the discrepancy between the distributions of different embeddings. Challenges: The group-level loss is difficult to measure with a statistical distance. GANs still suffer from an unstable and weak learning signal. Despite the progress of optimal transport, how to use the theory to match embedding distributions has not been explored.

8 Contribution Proposed to solve entity alignment by dually minimizing both the entity-level loss and the group-level loss via optimal transport theory. Imposed an L2,1 norm on the dual translation matrices, which enforces them to be close to orthogonal. Conducted extensive experiments on six real-world datasets and showed the superior performance of our proposed model over the state-of-the-art methods.

9 Outline The Task Background and Related Work Proposed Model Experiment
Conclusion

10 Proposed Model Knowledge Graph Embedding: Entity-level Loss (TransE)
In our paper we consider two KGs, but the method is suitable for multiple KGs. We first build our model on basic TransE with its standard margin-based ranking loss; the formulas are the same as the standard TransE loss. We then define the entity-level loss: after obtaining the entity embeddings of graphs Gi and Gj from TransE, we align the labeled entities by dually minimizing two distances. That is, M1 is learned to translate the embeddings of Gi into the embedding space of Gj, and M2 translates the embeddings of Gj into the embedding space of Gi.

11 Proposed Model Group-level Loss: Optimal Transport and WGAN
We then define the group-level loss by measuring the difference between the distribution p of the transferred embeddings and the distribution q of the entity embeddings with the optimal transport distance. Here Π(p,q) denotes the set of all joint distributions γ(p,q) with marginals p(x) and q(y), and c(x,y): G×G→R is the transportation cost function for moving x to y. The group-level loss measured by Eq. (8) is hard to calculate directly due to its high computational complexity, but by the Kantorovich-Rubinstein duality it can be converted to a dual form; solving the optimal transport problem is thus transformed into optimizing a Wasserstein GAN (WGAN).
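The paper optimizes this group-level loss through a WGAN critic; as an illustration of the same underlying objective, the optimal transport cost between two small embedding sets can also be approximated directly with Sinkhorn iterations (entropic regularization). This is a sketch of the OT distance itself, not the paper's WGAN training procedure; the uniform marginals and Euclidean cost are assumptions.

```python
import numpy as np

def sinkhorn(C, reg=0.1, n_iters=200):
    # Entropically regularized OT between uniform marginals p and q,
    # given cost matrix C; returns the transport plan gamma.
    n, m = C.shape
    p, q = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / reg)       # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):   # alternating marginal projections
        v = q / (K.T @ u)
        u = p / (K @ v)
    return np.diag(u) @ K @ np.diag(v)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))              # transferred embeddings (p)
Y = X + 0.01 * rng.normal(size=(5, 3))   # target embeddings (q)
C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # c(x, y)
gamma = sinkhorn(C)
ot_cost = np.sum(gamma * C)   # group-level loss <gamma, C>
```

Since X and Y are nearly identical here, the plan concentrates near the diagonal and the transport cost is small.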

12 Proposed Model Regularizer
The translation matrix is desired to be orthogonal. We employ the L2,1 norm as the regularizer, which prevents the matrix from being dense and mitigates the error induced by a dense matrix.
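The L2,1 norm of a matrix sums the L2 norms of its columns (some authors sum over rows; the column convention is assumed here), which promotes structured sparsity. A minimal sketch:

```python
import numpy as np

def l21_norm(M):
    # Sum of column-wise L2 norms: ||M||_{2,1} = sum_j ||M[:, j]||_2.
    # Penalizing this discourages dense matrices; for a fixed Frobenius
    # norm it is smallest when the column norms are concentrated.
    return np.linalg.norm(M, axis=0).sum()

I = np.eye(3)
print(l21_norm(I))  # 3.0: each column of the identity has unit norm
```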

13 Outline The Task Background and Related Work Proposed Model Experiment
Conclusion

14 Experiment Datasets Baselines: three categories Evaluation Metric
Category 1: Encoding the KGs in separate or unified embedding spaces: MTransE, ITransE. Category 2: Jointly modeling the KGs and attributes: JAPE, GCN-based method. Category 3: Iteratively enlarging the set of labeled entity pairs via the bootstrapping strategy: BootEA, ITransE. Evaluation Metric: We adopt the popular Hits@k metric and MRR.
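The evaluation metrics can be computed from the rank of the true aligned entity in each test pair's candidate list. A minimal sketch (the function name and toy ranks are illustrative):

```python
def hits_at_k_and_mrr(ranks, k=10):
    # ranks: 1-based rank of the true aligned entity for each test pair.
    # Hits@k: fraction of pairs whose true entity ranks within the top k.
    # MRR: mean of the reciprocal ranks.
    hits = sum(r <= k for r in ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return hits, mrr

ranks = [1, 2, 5, 20]  # hypothetical ranks for four test pairs
hits10, mrr = hits_at_k_and_mrr(ranks, k=10)
print(hits10, mrr)  # 0.75 0.4375
```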

15 Experiment

16 Experiment Sensitivity to the Proportion of Prior Aligned Entities.
All methods perform better as the proportion of aligned entities grows. OTEA and BootEA perform much better than the other baselines, due to their use of unlabeled data and selection of labeled data.

17 Experiment Sensitivity to the Dimension of KG Embeddings
Time Complexity Comparison OTEA is consistently better than all other baselines, and its performance is quite stable when varying d. OTEA is faster than BootEA, because the bootstrapping-based method needs to propose new aligned entities by calculating the similarity with all unaligned entities.

18 Outline The Task Background and Related Work Proposed Model Experiment
Conclusion

19 Conclusion Introduced a novel framework for cross-lingual entity alignment. Solved entity alignment by dually minimizing both the entity-level loss and the group-level loss via optimal transport theory. Imposed a regularizer on the dual translation matrices to mitigate the effect of noise during transformation. Achieved superior results compared with other SOTA methods. Future work: combining the model with attribute and relation information.

20 Thank you for your attention
Thank you for your attention! Q&A Lab of Machine Intelligence and kNowledge Engineering (MINE):

