Who Are Similar to Einstein: A Multi-Type Object Similarity Measure for Entity Recommendation Zheng Liang.

Outline
Introduction
Similarity measures based on EMD
Approaches to Entity Type Weighting
Evaluation
Summary

Introduction
Today, most user activity in Web search and browsing centers on entities. To help users explore further, more and more online systems (such as Google, Yahoo!, and others) identify the real-world entity behind a query and recommend related entities based on the relationships in a knowledge base.

Introduction
With the publication of a number of knowledge bases as Linked Data (such as Freebase, DBpedia, and others), extremely valuable resources are available. However, such knowledge bases connect the current entity to a large number of related entities through their relationships. It is therefore difficult for an online system to determine which of them users are actually looking for.

Introduction
However, a user's initial understanding of an entity can be uniquely linked to an entity type in a knowledge base, and the entity type is an important and interesting facet of each entity. Here we focus on recommending the most relevant entities, namely those that are similar to the current entity in type. Large-scale knowledge bases define a multitude of entity types. For example, the entity `Albert Einstein' in DBpedia has 63 types, including `Person', `JewishScientists', `NobelLaureatesInPhysics', and `ETHZurichAlumni'.

Introduction
Thus, there is a need to evaluate semantic similarity between multi-type entities. In previous research, the objects being compared are often modeled as sets, and their similarity is traditionally determined from the set intersection. Most existing similarity measures follow this model, including the Cosine measure, the Dice measure, the Jaccard measure, the Overlap measure, and the information-theoretic measure.
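The set-based measures named above can be sketched as follows. This is a minimal illustration that treats each entity as a plain, unweighted set of type names; the function names and the second type set are hypothetical and not taken from the slides. The cosine shown here is the binary-vector form, not the IDF-weighted variant that appears later.

```python
import math

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    """2|A ∩ B| / (|A| + |B|)"""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

def cosine(a, b):
    """|A ∩ B| / sqrt(|A| * |B|), i.e. cosine of two binary vectors."""
    a, b = set(a), set(b)
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0

# Einstein's types are the four listed on the slide; the second set is made up.
einstein = {"Person", "JewishScientists", "NobelLaureatesInPhysics", "ETHZurichAlumni"}
other = {"Person", "Scientist", "NobelLaureatesInPhysics"}

for measure in (jaccard, dice, overlap, cosine):
    print(measure.__name__, round(measure(einstein, other), 3))
```

All four measures depend only on the size of the intersection, so two types that are different but closely related contribute nothing.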

Introduction
However, the similarity measures above cannot take structural similarity between objects into account, because they ignore any hierarchy that describes the relationships among domain elements. By exploiting the hierarchical structure available in some domains, such as WordNet and Cyc, a variety of methods for measuring semantic similarity or distance between objects have been proposed. The main approaches, such as Shortest Path Length and Lowest Common Ancestor (LCA), are based on distance within an ontological structure or on concept information content.
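To make the hierarchy-based idea concrete, here is a small sketch that finds the lowest common ancestor and the path length between two types. It assumes a tree in which every type has at most one parent; the toy hierarchy, the function names, and the 1/(1+d) conversion from distance to similarity are illustrative assumptions, not the measures used in the slides.

```python
def ancestor_chain(node, parent):
    """Return [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while parent.get(chain[-1]) is not None:
        chain.append(parent[chain[-1]])
    return chain

def lowest_common_ancestor(a, b, parent):
    """First node on b's chain that also lies on a's chain, or None."""
    seen = set(ancestor_chain(a, parent))
    for node in ancestor_chain(b, parent):
        if node in seen:
            return node
    return None

def path_length(a, b, parent):
    """Number of edges on the path from a to b through their LCA."""
    lca = lowest_common_ancestor(a, b, parent)
    if lca is None:
        return None
    return ancestor_chain(a, parent).index(lca) + ancestor_chain(b, parent).index(lca)

def path_similarity(a, b, parent):
    """Assumed conversion: shorter paths give higher similarity."""
    d = path_length(a, b, parent)
    return 0.0 if d is None else 1.0 / (1.0 + d)

# Hypothetical toy hierarchy (child -> parent).
parent = {
    "Thing": None,
    "Person": "Thing",
    "Scientist": "Person",
    "Physicist": "Scientist",
    "Chemist": "Scientist",
}

print(lowest_common_ancestor("Physicist", "Chemist", parent))
print(path_similarity("Physicist", "Chemist", parent))
```

Unlike the set-based measures, `Physicist` and `Chemist` now receive a non-zero similarity even though the two names differ.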

Entities (#Entity): Albert Einstein, Max Born, Felix Bloch, Marie Curie
Types (#Types): Person, JewishScientists, NobelLaureatesInPhysics, ETHZurichAlumni, Scientist, NobelLaureatesInChemistry
Sim columns: s(Einstein,Born), s(Einstein,Bloch), s(Einstein,Curie)
Jaccard: 0.5
Cosine-IDF: 0.55, 0.87
Cosine-LCA: 0.79, 0.66, 0.68
The results demonstrate the difference among the three measures. But which one is more reasonable?

Introduction
Measuring pairwise element similarity is clearly important for computing the similarity between two collections. However, the importance of each element within its collection plays an even more crucial role: it represents that element's contribution, or weight, in the similarity between the two collections. It determines how "good" a "match" between elements of the two collections is.

Introduction
In this study we introduce a novel similarity measure based on the earth mover's distance (EMD) [20], which takes into account not only pairwise element similarity but also the weight of each element. Here, the weight of an entity type is the key factor in EMD. We define the new task of entity type weighting, whose goal is to measure the importance of an entity type. We propose several methods for entity type weighting that exploit the entity type hierarchy (e.g., the depth of the ancestors of an entity type), collection statistics (e.g., IDF), and the graph structure (e.g., weighted PageRank).

Similarity measures based on EMD
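Entity X has types $t_{x1}, \dots, t_{xi}, \dots, t_{xm}$ with weights $w_{x1}, \dots, w_{xi}, \dots, w_{xm}$; entity Y has types $t_{y1}, \dots, t_{yj}, \dots, t_{yn}$ with weights $w_{y1}, \dots, w_{yj}, \dots, w_{yn}$, where $1 \le i \le m$ and $1 \le j \le n$.
Capacity: $\sum_i w_{xi} = 1$, $\sum_j w_{yj} = 1$.
Cost: $b(v_{xi}, v_{yj}) = [b_{ij}] = 1 - s(t_{xi}, t_{yj})$, where $0 \le s(t_{xi}, t_{yj}) \le 1$ and therefore $0 \le b_{ij} \le 1$.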

Problem is formalized as follows:
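A minimal sketch of the computation, assuming the standard transportation formulation of EMD: choose flows $f_{ij} \ge 0$ with $\sum_j f_{ij} \le w_{xi}$ and $\sum_i f_{ij} \le w_{yj}$, move a total flow equal to the smaller of the two weight sums, minimize $\sum_{i,j} f_{ij} b_{ij}$, and divide the minimum cost by the total flow. The solver below uses successive shortest paths on a small flow network; that choice, the function name `emd`, and the conversion of the distance to a similarity as 1 - EMD are assumptions made for illustration, not details confirmed by the slides.

```python
def emd(wx, wy, cost, eps=1e-12):
    """Earth mover's distance between weight vectors wx and wy.

    cost[i][j] is the ground distance b_ij between element i of X and
    element j of Y. Solved as a min-cost flow by successive shortest paths.
    """
    m, n = len(wx), len(wy)
    source, sink = m + n, m + n + 1
    size = m + n + 2
    graph = [[] for _ in range(size)]   # node -> indices of outgoing edges
    head, cap, price = [], [], []       # edge e and its reverse e ^ 1 are adjacent

    def add_edge(u, v, capacity, c):
        graph[u].append(len(head)); head.append(v); cap.append(capacity); price.append(c)
        graph[v].append(len(head)); head.append(u); cap.append(0.0); price.append(-c)

    for i in range(m):
        add_edge(source, i, float(wx[i]), 0.0)
    for j in range(n):
        add_edge(m + j, sink, float(wy[j]), 0.0)
    for i in range(m):
        for j in range(n):
            add_edge(i, m + j, float("inf"), float(cost[i][j]))

    total_flow = total_cost = 0.0
    while True:
        # Bellman-Ford: cheapest path in the residual graph.
        dist = [float("inf")] * size
        prev_edge = [-1] * size
        dist[source] = 0.0
        for _ in range(size - 1):
            updated = False
            for u in range(size):
                if dist[u] == float("inf"):
                    continue
                for e in graph[u]:
                    if cap[e] > eps and dist[u] + price[e] < dist[head[e]] - 1e-12:
                        dist[head[e]] = dist[u] + price[e]
                        prev_edge[head[e]] = e
                        updated = True
            if not updated:
                break
        if dist[sink] == float("inf"):
            break  # no augmenting path left
        # Find the bottleneck capacity, then push that much flow.
        push, v = float("inf"), sink
        while v != source:
            e = prev_edge[v]
            push = min(push, cap[e])
            v = head[e ^ 1]
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = head[e ^ 1]
        total_flow += push
        total_cost += push * dist[sink]
    return total_cost / total_flow if total_flow > 0 else 0.0

# Hypothetical example: two types per entity, cost b_ij = 1 - s(t_xi, t_yj).
distance = emd([0.5, 0.5], [0.5, 0.5], [[0.0, 0.1], [0.0, 1.0]])
similarity = 1.0 - distance  # assumed conversion from distance to similarity
print(distance, similarity)
```

Because both weight vectors sum to 1 in the setting above, the total flow is 1 and the EMD equals the minimum total cost. For larger inputs a linear-programming or dedicated optimal-transport solver would be a more practical choice than this small pure-Python routine.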

Approaches to Entity Type Weighting
We define the task of entity type weighting. Given an entity $e$ and its types $T_e = \{t_1, t_2, \dots, t_n\}$ in the knowledge base, we define a type weighting function $w(t_i)$, $t_i \in T_e$, with $w(t_1), w(t_2), \dots, w(t_n) \in [0,1]$ such that $\sum_i w(t_i) = 1$. $w(t_i) > w(t_j)$ means that type $t_i$ is more important than type $t_j$ among the entity types $T_e$.

Statistics-based Approach (IDF): $w_{xi} = \mathrm{idf}(t_{xi}) / \sum_{t_{xk} \in X} \mathrm{idf}(t_{xk})$, with $0 \le w_{xi} \le 1$.
Hierarchy-based Approach (ANC_DEPTH): $w_{xi} = \mathrm{ANC\_DEPTH}(t_{xi}) / \sum_{t_{xk} \in X} \mathrm{ANC\_DEPTH}(t_{xk})$.
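Both weighting schemes normalize a per-type score over the types of one entity. The sketch below illustrates this under two assumptions that the slides do not spell out: idf(t) is taken as log(N / df(t)), where df(t) is the number of entities carrying type t, and ANC_DEPTH(t) is taken as the depth of t in the type hierarchy, with a root at depth 1. All names and counts are hypothetical.

```python
import math

def normalize(scores):
    """Scale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = sum(scores.values())
    if total == 0:
        return {t: 1.0 / len(scores) for t in scores}
    return {t: s / total for t, s in scores.items()}

def idf_weights(types, entity_count_by_type, num_entities):
    """w_xi = idf(t_xi) / sum_k idf(t_xk), with idf(t) = log(N / df(t)) assumed."""
    return normalize({
        t: math.log(num_entities / entity_count_by_type[t]) for t in types
    })

def depth(t, parents):
    """Assumed ANC_DEPTH: 1 for a root, else 1 + depth of the deepest parent."""
    if not parents.get(t):
        return 1
    return 1 + max(depth(p, parents) for p in parents[t])

def anc_depth_weights(types, parents):
    """w_xi = ANC_DEPTH(t_xi) / sum_k ANC_DEPTH(t_xk)."""
    return normalize({t: depth(t, parents) for t in types})

# Hypothetical counts: a rare type receives more IDF weight than a common one.
counts = {"Person": 1000, "Scientist": 10}
print(idf_weights(["Person", "Scientist"], counts, num_entities=1000))

# Hypothetical hierarchy: type -> list of parent types.
parents = {"Thing": [], "Person": ["Thing"], "Scientist": ["Person"]}
print(anc_depth_weights(["Person", "Scientist"], parents))
```

In both cases the weights lie in [0, 1] and sum to 1, as the task definition requires.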

Weighted PageRank-based Approach
People commonly reason in two ways: vertical thinking and horizontal thinking. In the current context, both are reflected in how a user perceives entity types. We therefore restructure the entity type graph and define two new kinds of edge: the "Vertical Edge" and the "Horizontal Edge".

Weighted PageRank-based Approach
[Figure: the restructured entity type graph, with type nodes t1, t2, ..., tn connected by vertical edges and horizontal edges.]

Weighted PageRank-based Approach
Furthermore, a user navigating inside the entity type DAG may have a preference for which kind of edge to follow. We therefore define a Weighted Type Graph with edge weights $w(i, j) = p \cdot \mathrm{vert}(i, j) + (1 - p) \cdot \mathrm{hor}(i, j)$, where $\mathrm{vert}(i, j)$ and $\mathrm{hor}(i, j)$ are 0 or 1, indicating whether a vertical or horizontal edge from $i$ to $j$ exists, and $p$ is the navigational preference of the surfer.
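The edge-weight definition can be written directly as a function. In this sketch the two edge sets are hypothetical examples, and representing each kind of edge as a set of (from, to) pairs is an implementation assumption.

```python
def edge_weight(i, j, vertical_edges, horizontal_edges, p):
    """w(i, j) = p * vert(i, j) + (1 - p) * hor(i, j)."""
    vert = 1 if (i, j) in vertical_edges else 0
    hor = 1 if (i, j) in horizontal_edges else 0
    return p * vert + (1 - p) * hor

# Hypothetical edges between type names.
vertical_edges = {("Physicist", "Scientist")}
horizontal_edges = {("Physicist", "Chemist")}

for p in (0.2, 0.5, 0.8):
    print(
        p,
        edge_weight("Physicist", "Scientist", vertical_edges, horizontal_edges, p),
        edge_weight("Physicist", "Chemist", vertical_edges, horizontal_edges, p),
    )
```

A larger p shifts the surfer toward vertical edges, and a pair with no edge of either kind gets weight 0.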

Weighted PageRank-based Approach
We denote the importance score of an entity type computed by Weighted PageRank as $C_p$. The weight of each entity type is then computed as $w_{xi} = C_p(t_{xi}) / \sum_{t_{xk} \in X} C_p(t_{xk})$, with $0 \le w_{xi} \le 1$.
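A rough sketch of how the scores and the resulting type weights might be computed, using power iteration over a weighted edge dictionary. The damping factor of 0.85, the fixed iteration count, the uniform redistribution of rank from nodes without outgoing edges, and all names are assumptions; the slides do not specify these details.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Power iteration for PageRank where a surfer at u follows edge (u, v)
    with probability proportional to weights[(u, v)]."""
    nodes = sorted({node for edge in weights for node in edge})
    count = len(nodes)
    out_sum = {u: 0.0 for u in nodes}
    for (u, _), w in weights.items():
        out_sum[u] += w
    rank = {u: 1.0 / count for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_sum[u] == 0)
        new_rank = {u: (1 - damping) / count + damping * dangling / count for u in nodes}
        for (u, v), w in weights.items():
            if w > 0:
                new_rank[v] += damping * rank[u] * w / out_sum[u]
        rank = new_rank
    return rank

def type_weights(entity_types, rank):
    """w_xi = C_p(t_xi) / sum_k C_p(t_xk) over the types of one entity."""
    total = sum(rank[t] for t in entity_types)
    return {t: rank[t] / total for t in entity_types}

# Hypothetical weighted type graph with navigational preference p = 0.8.
p = 0.8
weights = {
    ("Physicist", "Scientist"): p,      # vertical edge
    ("Chemist", "Scientist"): p,        # vertical edge
    ("Physicist", "Chemist"): 1 - p,    # horizontal edge
    ("Chemist", "Physicist"): 1 - p,    # horizontal edge
    ("Scientist", "Person"): p,         # vertical edge
}
rank = weighted_pagerank(weights)
print(type_weights(["Person", "Scientist", "Physicist"], rank))
```

The PageRank scores are computed once over the whole type graph; the final normalization is per entity, so the same type can receive different weights in different entities.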

The Experimental Setup
EVALUATION
The Experimental Setup
Source: DBpedia; 4 data sets (Scientist, Actor, Company, City)
Columns: Data set, #Entity, #Types, Max.Type, Avg.Type, Avg.Depth
Scientist: 9920, 7980, 55, 5.68
Actor: 2244, 1513, 26, 5.22
Company: 31096, 9137, 52, 6.71
City: 13494, 2596, 17, 7.63

The Experimental Setup
Case Study
The two tasks: weighting entity types; recommending entities of similar type.
The four entities: Einstein, Sydney, Jackie Chan, IBM.
Gold standard: built with the depth-10 pooling technique. 20 users gave ratings of 3, 2, and 1 ("closely important/similar", "somewhat important/similar", and "not important/similar").

Evaluation Metrics
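The results that follow are reported as NDCG@k. A minimal sketch of that metric, assuming the linear-gain form DCG@k = sum of rel_i / log2(i + 1) over the top k positions; other variants, such as the exponential gain 2^rel - 1, are also common, and the slides do not say which one was used. The function names and the sample ratings are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ratings (position 1 first)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k, all_relevances=None):
    """DCG@k of the ranking divided by the DCG@k of the ideal ordering.

    all_relevances: ratings of every judged item; defaults to the ranked list.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical user ratings (3, 2, 1) listed in the order a method ranked the items.
print(ndcg_at_k([3, 2, 1], k=3))
print(ndcg_at_k([1, 2, 3], k=3))
```

A ranking that already places the highest-rated items first scores 1.0; any other order scores lower.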

1: Type Weight
[Table: NDCG@3 for Albert Einstein, Sydney, Jackie Chan, and IBM under IDF, ANC_DEPTH, WPR (p=0.2), WPR (p=0.5), and WPR (p=0.8).]

Analysis of the Results
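Observation 1: In the final nDCG values, the WPR-based method scores higher than both the IDF and ANC_DEPTH methods, which verifies the effectiveness of the WPR method.
Observation 2: For the WPR-based method with the navigation probability p set to 0.2, 0.5, and 0.8, the nDCG value rises or stays stable as p increases. The inference is that judging the importance of a type by how specific it is and how rich its immediate neighborhood is agrees well with users' intuition.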

2: Entity Recommendation Based on Similar Type
[Table: results for Albert Einstein, Sydney, Jackie Chan, and IBM under Jaccard, Cosine-IDF, and EMD variants (weight: 1/n, IDF, WPR; cost: edit distance, LCA [1]).]
[1] Jiang J J, Conrath D W. Semantic similarity based on corpus statistics and lexical taxonomy[J]. arXiv preprint cmp-lg/ , 1997.

[Table: NDCG@5 for Albert Einstein, Sydney, Jackie Chan, and IBM under Jaccard, Cosine-IDF, and EMD variants (weight: 1/n, IDF, WPR; cost: edit distance, LCA [1]).]

[Table: NDCG@10 for Albert Einstein, Sydney, Jackie Chan, and IBM under Jaccard, Cosine-IDF, and EMD variants (weight: 1/n, IDF, WPR; cost: edit distance, LCA [1]).]

Analysis of the Results
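Observation 1: The nDCG values obtained with the EMD methods are generally higher than those of the traditional methods (except for the IDF-based EMD method), which verifies the effectiveness of the EMD approach.
Observation 2: Among the EMD methods based on 1/n, IDF, and WPR, the WPR-based EMD method generally achieves higher nDCG values than the 1/n-based and IDF-based EMD methods. This verifies the effectiveness of the WPR-based EMD method, and the result agrees with intuition. In some cases the IDF-based EMD method scores even lower than the traditional methods.
Conclusion: Weights play an important role in the EMD method. An unreasonable weight assignment is counterproductive and can lead to results worse than those of simple methods.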

Summary
In summary, the main contributions of this paper are:
We introduce multi-type object similarity measures based on EMD for similar entity recommendation, which yield similar entities that are more intuitive than those generated by traditional similarity measures.
We define the task of weighting entity types and develop a novel approach to type weighting, which mainly simulates a user's walk on the type graph.

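Limitations that need further work:
The experiments use a single evaluation metric (only NDCG; add other metrics such as AP…).
In the Type Weight experiment (k=3,5,10,20).
In the similar entity recommendation experiment, add more traditional measures for comparison.
Time complexity of the algorithm.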
