Semi-Supervised Entity Alignment via Knowledge Graph Embedding with Awareness of Degree Difference
Shichao Pei, Lu Yu, Robert Hoehndorf, Xiangliang Zhang
KAUST Group Meeting
Background
Knowledge Graph
There are several competing definitions:
Ehrlinger, Lisa, and Wolfram Wöß. "Towards a Definition of Knowledge Graphs." SEMANTiCS (Posters, Demos, SuCCESS).
Knowledge Graph
A knowledge graph is a knowledge-based system that contains a knowledge base and a reasoning engine. Its essential characteristic is the collection, extraction, and integration of information from external sources: it extends a pure knowledge-based system with the concept of an integration system.
Knowledge Graph
A working definition: a KG is usually represented as triple facts (head entity, relation, tail entity). A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
Multi-Relational Data
Multi-relational data are directed graphs whose nodes correspond to entities and whose edges have the form (head, relation, tail); each edge indicates that a relationship of that type holds between the head and tail entities. Modeling boils down to extracting local or global connectivity patterns between entities.
Single-relational: ad-hoc but simple modeling assumptions can be made after some descriptive analysis of the data.
Multi-relational: the notion of locality may involve relationships and entities of different types at the same time.
A KG is itself multi-relational data; to obtain good representations, we first need to model it.
KG Embedding
The goal is to embed the components of a KG, its entities and relations, into continuous vector spaces, simplifying manipulation while preserving the inherent structure of the KG. These entity and relation embeddings can then benefit many tasks, such as KG completion, relation extraction, entity classification, and entity resolution. In other words, we can also model the KG (multi-relational data) with embedding-based methods.
KG Embedding
Many works focus on knowledge representation learning. TransE [Bordes et al., 2013] projects both entities and relations into a continuous low-dimensional vector space and assumes that, in this space, h + r ≈ t; the model is simple and effective.
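As a minimal sketch of the h + r ≈ t idea (using made-up toy vectors, not the paper's trained embeddings), TransE scores a triple by how far h + r lands from t:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy embeddings (hypothetical; a real model learns these by
# minimizing a margin-based ranking loss over the KG's triples).
h = rng.normal(size=dim)                        # head entity embedding
r = rng.normal(size=dim)                        # relation embedding
t = h + r + rng.normal(scale=0.01, size=dim)    # tail ~ h + r for a true triple

def transe_score(h, r, t):
    """Energy of a triple under TransE: ||h + r - t||.
    Lower means the triple is more plausible."""
    return np.linalg.norm(h + r - t)

true_score = transe_score(h, r, t)
corrupt_score = transe_score(h, r, rng.normal(size=dim))  # random fake tail
```

A true triple scores near zero while a corrupted one scores high, which is exactly what makes the translation assumption usable for link prediction.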
Translation-Based Models
Hierarchical relationships are extremely common in knowledge bases, and translations are a natural transformation for representing them.
KG Embedding
Wang, Quan, et al. "Knowledge Graph Embedding: A Survey of Approaches and Applications." IEEE Transactions on Knowledge and Data Engineering 29.12 (2017).
Entity Alignment - Motivation
Various methods, sources, and languages have been explored to construct KGs, and most existing KGs are developed separately. These KGs are inevitably heterogeneous in surface form and typically complementary in content. It is thus essential to align entities across multiple KGs and join them into a unified KG for knowledge-driven applications.
Entity Alignment
Chen, Muhao, et al. "Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment." IJCAI 2018.
Entity Alignment
Zhu, Hao, et al. "Iterative Entity Alignment via Joint Knowledge Embeddings." Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press, 2017.
Problem Definition
Knowledge in a knowledge graph is described as triples (h, r, t), where h and t denote the head and tail entities and r denotes the relation between them. A knowledge graph is formalized as KG = (E, R, T), where E, R, T are the sets of entities, relations, and triples respectively. Suppose there are multiple knowledge graphs Σ = {KGi | KGi = (Ei, Ri, Ti)} with heterogeneous and complementary triples. An entity in one KG has counterparts in other KGs, under different languages or surface names.
Problem Definition
Some synonymous entities across the KGs are already known; these are the alignment seeds, and each pair of entities from the seeds is called an aligned pair. The task of entity alignment is to automatically find and align more synonymous entities based on the known alignment seeds.
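The setup above can be sketched with toy data structures (the entity and relation names here are illustrative, not taken from any real dataset):

```python
# Two toy KGs as triple sets (head, relation, tail).
kg_en = {
    ("Matt_Damon", "profession", "Actor"),
    ("Matt_Damon", "born_in", "USA"),
}
kg_fr = {
    ("Matt_Damon_fr", "profession", "Acteur"),
    ("Matt_Damon_fr", "ne_a", "Etats-Unis"),
}

def entities(kg):
    """Collect the entity set E from a triple set T."""
    return {e for (h, _, t) in kg for e in (h, t)}

# Alignment seeds: known synonymous entity pairs across the two KGs.
seeds = {("Actor", "Acteur"), ("USA", "Etats-Unis")}

# The task: starting from the seeds, discover the remaining counterparts,
# e.g. that "Matt_Damon" and "Matt_Damon_fr" denote the same person.
unaligned_en = entities(kg_en) - {a for (a, _) in seeds}
unaligned_fr = entities(kg_fr) - {b for (_, b) in seeds}
```

The unaligned sets are exactly what a semi-supervised method tries to exploit in addition to the labeled seeds.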
Related Works
Feature engineering:
the semantics of OWL properties [Hu et al., 2011]
compatible neighbors and attribute values of entities [Suchanek et al., 2012]
structural information of relations [Lacoste-Julien et al., 2013]
external lexicons, machine translation, and Wikipedia links [Suchanek et al., 2012; Wang et al., 2013]
crowdsourcing [Vrandečić and Krötzsch, 2014]
well-designed hand-crafted features [Mahdisoltani et al., 2014]
These works can achieve high alignment accuracy, but such human-involved approaches are time-consuming, labor-expensive, hard to automate, and usually suffer from inflexible extension.
Related Works
Embedding-based models:
MTransE [Chen et al., 2017] uses TransE to represent different KGs as independent embeddings and learns transformations between KGs via five alignment models.
IPTransE [Zhu et al., 2017] employs PTransE to embed a single KG and integrates three modules (translation-based, linear transformation, and parameter sharing) for jointly embedding different KGs.
JAPE [Sun et al., 2017] learns embeddings for entities and relations of different KGs in a unified embedding space; it also embeds attributes and leverages attribute correlations to refine entity embeddings.
KDCoE [Chen et al., 2018] leverages a weakly aligned multilingual KG for semi-supervised cross-lingual learning using entity descriptions.
BootEA [Sun et al., 2018] iteratively enlarges the set of labeled entity pairs with a bootstrapping strategy.
Introduction
Knowledge graphs have been constructed and widely applied to organize and represent the knowledge of different domains. Even within the same domain, knowledge graphs are generated by different methods and in different languages, so it is essential to connect multiple knowledge graphs in the same domain.
Introduction
The number of accessible prior alignments is usually a small proportion of a whole knowledge graph, yet most existing methods require a sufficient number of labeled entities to generalize well in downstream applications. Our work targets a semi-supervised entity alignment model, which learns from both labeled and unlabeled entities.
Introduction
Knowledge graph embedding methods show significant improvement on entity alignment, and our work also takes advantage of embedding methods to build a semi-supervised entity alignment model. We further address an important issue in the embedding process caused by the degree difference of entities in different knowledge graphs.
Introduction
An issue in the embedding process: Figure 1 shows a toy example of the problem when aligning "Matt Damon" in English and French KGs. "Matt Damon" is a very popular entity in the English KG, while the French KG has fewer records for "Matt Damon". Embedding results for the English KG show "Matt Damon" close to other popular entities like "Actor" (in blue in Figure 1 (a)), while in the French KG (Figure 1 (b)) "Matt Damon" falls in the yellow region, distant from the popular entity "Acteur" in the blue region. By aligning "Actor" in English with "Acteur" in French, and "USA" with "États-Unis", the embedding space of the French KG can be transformed into the embedding space of the English KG (e.g., with a linear transformation W). The alignment is illustrated in Figure 1 (c), where "Matt Damon" in French ends up far from "Matt Damon" in English, even though they are actually the same entity. (Blue: popular entity; orange: normal entity; yellow: rare entity.)
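The seed-based linear transformation W mentioned above can be sketched as a least-squares fit over the seed pairs (all embeddings here are synthetic stand-ins for pretrained KG embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# Hypothetical pretrained embeddings for the seed pairs: rows of X are
# French seed entities, rows of Y their aligned English counterparts.
X = rng.normal(size=(20, dim))
W_true = rng.normal(size=(dim, dim))   # ground-truth map, for this toy only
Y = X @ W_true

# Fit the linear transformation W over the seeds by minimizing ||XW - Y||_F.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A held-out French embedding is then mapped into the English space,
# where its nearest neighbor can be proposed as an alignment.
x_new = rng.normal(size=dim)
x_mapped = x_new @ W
```

As the toy example in Figure 1 argues, this works only when entities of similar degree occupy comparable regions in both spaces; rare entities like the French "Matt Damon" can still be mapped far from their true counterparts.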
Analysis low and high degree values (in blue and red)
normal degree values (in green) It can be observed that entities at the same degree level tend to be close in the mapped space. This is evidence that there is a real phenomena in kg. Comparison of (a) and (c) shows that entities with low and high degree values (in blue and red) are less matchable than entities with normal degree values (in green). So it proves our observation. Results in (d-f) show that the phenomena in (a-c) have been mitigated by our model. Figure best viewed in pdf or colored print. The influence of entity’s degree is less severe after our model.
23
Analysis
Distribution of the 50 nearest neighbors of entities at different degree levels in the embedding space of TransE (top) and our degree-aware KG embedding model (bottom). Values in the %High (%Normal, %Low) columns are the portion of the 50 nearest neighbors with high (normal, low) degrees. Our model mitigates the skew.
Degree-Aware KG embedding
Based on TransE
Degree-Aware KG embedding
We design the degree-aware KGE model by training the knowledge graph embedding in an adversarial framework. TransE suffers from entities with different degrees, which hurts the quality of the embedding, so we need to push entities with different degrees into a common area of the embedding space. We design two discriminators: one distinguishes the embeddings of high-degree entities from those of normal-degree entities, and the other distinguishes normal-degree from low-degree embeddings. TransE acts as the generator. Taking the first discriminator as an example: it tries to tell whether an embedding comes from a high-degree or a normal-degree entity, and after training it can no longer tell them apart, so we obtain an embedding space where high-degree and normal-degree entities lie in a common area. Adversarial training thus addresses the issue.
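A minimal sketch of the adversarial idea, with synthetic embeddings and a linear discriminator standing in for the paper's actual networks (every name and constant here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 16, 200

# Hypothetical entity embeddings (not real trained TransE vectors):
# high-degree entities start out in a different region than normal ones.
e_high = rng.normal(loc=2.0, size=(n, dim))
e_norm = rng.normal(loc=0.0, size=(n, dim))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# Linear discriminator D(e) = sigmoid(e @ w): trained to output 1 for
# high-degree embeddings and 0 for normal-degree ones (logistic regression).
w = np.zeros(dim)
for _ in range(200):
    grad = (e_high.T @ (sigmoid(e_high @ w) - 1.0) +
            e_norm.T @ sigmoid(e_norm @ w)) / (2 * n)
    w -= 1.0 * grad  # gradient descent on the logistic loss

d_scores = sigmoid(e_high @ w)      # high: D confidently separates the groups
logit_before = (e_high @ w).mean()

# Generator step (sketch): move high-degree embeddings against the direction
# D relies on, so the two degree groups drift toward a common region.
e_high -= 0.5 * (d_scores[:, None] - 0.5) * w
```

Alternating these two steps until D is reduced to guessing is what leaves high- and normal-degree entities sharing a common area of the space; the paper's second discriminator plays the same game for normal- versus low-degree entities.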
Framework
Semi-Supervised Entity Alignment
Semi-supervised loss: inspired by the work on CycleGAN in computer vision, we define a cycle-consistent loss. After obtaining the degree-aware embeddings, we can perform entity alignment.
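The cycle-consistency idea can be sketched as follows: mapping an unlabeled embedding to the other KG's space and back should return it to its starting point. This is a toy illustration (the backward map is simply a pseudo-inverse here; in training both mappings would be learned jointly):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical cross-KG mappings: W maps KG1's space to KG2's,
# W_back maps KG2's space back to KG1's.
W = rng.normal(size=(dim, dim))
W_back = np.linalg.pinv(W)

def cycle_loss(X, W, W_back):
    """Cycle-consistent loss on unlabeled entities: an embedding mapped
    to the other space and back should land where it started."""
    return np.linalg.norm(X @ W @ W_back - X) ** 2 / len(X)

X_unlabeled = rng.normal(size=(100, dim))
consistent = cycle_loss(X_unlabeled, W, W_back)         # near zero
inconsistent = cycle_loss(X_unlabeled, W, np.eye(dim))  # clearly larger
```

Because the loss needs no labels, it lets every unlabeled entity constrain the mappings, which is what makes the approach semi-supervised.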
Algorithm
Experiments Dataset
Experiments
How much degree-aware KG embedding can help is limited by the small proportion of entities with high and low degree levels.
Experiments
Experiments
Contributions
We propose to solve entity alignment in a semi-supervised way, not only using the given aligned entities but also incorporating unaligned entities to enhance performance. We investigate the impact of entity degree difference on knowledge graph embedding and address the problem with an adversarial training framework.