Who Are Similar to Einstein: A Multi-Type Object Similarity Measure for Entity Recommendation Zheng Liang
Outline Introduction Similarity measures based on EMD Approaches to Entity Type Weighting Evaluation Summary
Introduction Today, most user activity in Web search and browsing is centered on entities. To help users explore further, a growing number of online systems (such as Google, Yahoo!, and others) can identify the real-world entity behind a query and recommend related entities based on the relationships in a knowledge base.
Introduction With the publication of a number of knowledge bases as Linked Data (such as Freebase, DBpedia, and others), we have extremely valuable resources to draw on. However, such knowledge bases contain a large number of entities related to the current entity through its relationships. It is therefore difficult for an online system to determine what users are actually looking for.
Introduction However, we know not only that a user's initial understanding of an entity can be uniquely linked to an entity type in a knowledge base, but also that the entity type is an important and interesting facet of each entity. Here we focus on recommending the entities most relevant to the current entity's types. Large-scale knowledge bases define a multitude of entity types. For example, the entity `Albert Einstein' in DBpedia has 63 types, among which `Person', `JewishScientists', `NobelLaureatesInPhysics', and `ETHZurichAlumni' can be found.
Introduction Thus, there is a need to evaluate semantic similarity between multi-type entities. In previous research, the objects being compared are often modeled as sets, and their similarity is traditionally determined from the set intersection. Most existing similarity measures, such as the Cosine, Dice, Jaccard, and Overlap measures and the information-theoretic measure, follow this approach.
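The set-based measures named above can be sketched in a few lines over type sets (function names and the toy type sets are illustrative, not from the paper):

```python
import math

def jaccard(a, b):
    """Jaccard: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def dice(a, b):
    """Dice: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def overlap(a, b):
    """Overlap: |A ∩ B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def cosine(a, b):
    """Cosine over binary set vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

# Toy type sets for two entities (illustrative only).
einstein = {"Person", "JewishScientists", "NobelLaureatesInPhysics"}
curie = {"Person", "Scientist", "NobelLaureatesInChemistry"}
print(jaccard(einstein, curie), dice(einstein, curie))
```

All four treat types as unordered, equally weighted symbols, which is exactly the limitation the next slides address.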
Introduction However, the above similarity measures cannot take structural similarity between objects into account, even when a hierarchy describing the relationships among domain elements is available. By exploiting hierarchical structure in a domain, such as WordNet or Cyc, a variety of methods for measuring semantic similarity/distance between objects have been proposed. The main approaches, such as Shortest Path Lengths and the Lowest Common Ancestor (LCA), are based on distance within an ontological structure or on concept information content.
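As one concrete instance of the LCA-based family, Wu-Palmer similarity scores two types by the depth of their lowest common ancestor. A minimal sketch, assuming a tree-shaped parent map with a shared root (the helper names and the toy hierarchy are illustrative, not the real DBpedia ontology):

```python
def ancestor_chain(parent, t):
    """Ancestors of type t, from t itself up to the root (inclusive)."""
    chain = [t]
    while t in parent:
        t = parent[t]
        chain.append(t)
    return chain

def wu_palmer(parent, t1, t2):
    """Wu-Palmer: 2 * depth(LCA) / (depth(t1) + depth(t2)), root depth = 1.
    Assumes both types share a root, so an LCA always exists."""
    a1, a2 = ancestor_chain(parent, t1), ancestor_chain(parent, t2)
    s2 = set(a2)
    lca = next(a for a in a1 if a in s2)   # first shared ancestor = lowest
    depth = lambda t: len(ancestor_chain(parent, t))
    return 2 * depth(lca) / (depth(t1) + depth(t2))

# Toy hierarchy (illustrative).
parent = {"Scientist": "Person", "Physicist": "Scientist", "Chemist": "Scientist"}
print(wu_palmer(parent, "Physicist", "Chemist"))
```

Identical types score 1.0; the score decays as the LCA moves toward the root.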
#Entity          #Types
Albert Einstein  Person, JewishScientists, NobelLaureatesInPhysics, ETHZurichAlumni
Max Born
Felix Bloch
Marie Curie      Scientist, NobelLaureatesInChemistry

Sim         s(Einstein,Born)   s(Einstein,Bloch)   s(Einstein,Curie)
Jaccard     0.5
Cosine-IDF  0.55               0.87
Cosine-LCA  0.79               0.66                0.68

The results demonstrate how the three measures differ. But which one is more reasonable?
Introduction Obviously, measuring pairwise element similarity is important for computing the similarity between two collections. But the importance of each element within its collection plays an even more crucial role: it represents the element's contribution, or weight, in computing the similarity between the two collections, and it determines how “good” a “match” between elements of the two collections is.
Introduction In this study we introduce a novel similarity measure based on the earth mover’s distance (EMD) [20], which takes into account not only pairwise element similarity but also the weight of each element. Here, the weight of an entity type is the key factor in EMD. We define the new task of entity type weighting, whose goal is to measure the importance of each entity type, and we propose several methods for it that exploit the entity type hierarchy (e.g., the depth of a type's ancestors), collection statistics (e.g., IDF), and the graph structure (e.g., weighted PageRank).
Similarity measures based on EMD The problem is formalized as follows. Entity X has types t_x1, …, t_xi, …, t_xm with capacities (weights) w_x1, …, w_xi, …, w_xm, and entity Y has types t_y1, …, t_yj, …, t_yn with capacities w_y1, …, w_yj, …, w_yn, where
  Σ_i w_xi = 1 and Σ_j w_yj = 1.
The pairwise type similarity satisfies 0 ≤ s(t_xi, t_yj) ≤ 1, and the cost of moving weight from t_xi to t_yj is
  b_ij = 1 − s(t_xi, t_yj), with 0 ≤ b_ij ≤ 1, for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
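The EMD between the two weighted type sets is the minimum-cost flow that moves the capacities of X onto those of Y, a standard transportation linear program. A minimal sketch using SciPy's LP solver (the function name and the similarity-vs.-distance conversion are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

def emd_similarity(w_x, w_y, sim):
    """EMD-based similarity between two weighted type sets.
    w_x, w_y: weight vectors, each summing to 1; sim[i][j] = s(t_xi, t_yj)."""
    sim = np.asarray(sim, dtype=float)
    m, n = sim.shape
    cost = 1.0 - sim                      # b_ij = 1 - s(t_xi, t_yj)
    # Flatten the flow matrix f (row-major) into an m*n vector for linprog.
    A_eq, b_eq = [], []
    for i in range(m):                    # row sums: sum_j f_ij = w_xi
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(w_x[i])
    for j in range(n):                    # column sums: sum_i f_ij = w_yj
        col = np.zeros(m * n); col[j::n] = 1.0
        A_eq.append(col); b_eq.append(w_y[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return 1.0 - res.fun                  # similarity = 1 - minimal transport cost

print(emd_similarity([0.5, 0.5], [1.0], [[1.0], [0.0]]))
```

Because both weight vectors sum to 1, the minimal cost lies in [0, 1], so 1 minus the cost is a similarity in [0, 1].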
Approaches to Entity Type Weighting We define the task of entity type weighting: given an entity e and its types Te = {t1, t2, …, tn} in the knowledge base, we define a type weighting function w(ti), ti ∈ Te, with w(t1), w(t2), …, w(tn) ∈ [0, 1] such that Σ_i w(ti) = 1. Here w(ti) > w(tj) means that type ti is more important than type tj among the entity types Te.
Approaches to Entity Type Weighting Statistics-based Approach (IDF): w_xi = idf(t_xi) / Σ_{t_xi ∈ X} idf(t_xi), with 0 ≤ w_xi ≤ 1. Hierarchy-based Approach (ANC_DEPTH): w_xi = ANC_DEPTH(t_xi) / Σ_{t_xi ∈ X} ANC_DEPTH(t_xi).
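Both normalizations can be sketched as follows (the helper names and the corpus layout are illustrative assumptions, not the paper's code):

```python
import math
from collections import Counter

def idf_weights(entity_types, corpus_type_sets):
    """Statistics-based weights: normalized IDF over the entity's types.
    corpus_type_sets holds one set of types per entity in the collection."""
    n = len(corpus_type_sets)
    df = Counter(t for ts in corpus_type_sets for t in ts)  # document frequency
    idf = {t: math.log(n / df[t]) for t in entity_types}
    total = sum(idf.values())   # assumes at least one type is non-ubiquitous
    return {t: v / total for t, v in idf.items()}

def anc_depth_weights(entity_types, anc_depth):
    """Hierarchy-based weights: normalized ancestor-depth score per type."""
    total = sum(anc_depth[t] for t in entity_types)
    return {t: anc_depth[t] / total for t in entity_types}
```

Either function returns weights in [0, 1] that sum to 1, as the task definition requires.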
Approaches to Entity Type Weighting Weighted PageRank-based Approach There are common-sense modes of thinking, such as vertical thinking and horizontal thinking. In the current context, vertical and horizontal thinking are reflected in how users perceive entity types. We restructure the entity type graph and define two new kinds of edge: “Vertical Edge” and “Horizontal Edge”.
Approaches to Entity Type Weighting Weighted PageRank-based Approach [Figure: the restructured entity type graph, with vertical edges from t1 down to t2, …, tn and horizontal edges between sibling types.]
Approaches to Entity Type Weighting Weighted PageRank-based Approach Furthermore, when a user navigates inside the entity type DAG, the user may prefer one kind of edge over the other. We therefore define a Weighted Type Graph: w(i, j) = p * vert(i, j) + (1 − p) * hor(i, j), where vert(i, j) and hor(i, j) are 0 or 1, indicating the existence of a vertical or horizontal edge from i to j respectively, and p is the navigational preference of the surfer.
Approaches to Entity Type Weighting Weighted PageRank-based Approach We denote the measurement of an entity type based on Weighted PageRank as Cp. The weight of each entity type is then computed as w_xi = Cp(t_xi) / Σ_{t_xi ∈ X} Cp(t_xi), with 0 ≤ w_xi ≤ 1.
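A power-iteration sketch over the weighted type graph defined above (the damping factor, iteration count, and edge-dictionary layout are assumptions for illustration):

```python
def weighted_pagerank(nodes, weight, d=0.85, iters=50):
    """PageRank by power iteration on a weighted type graph.
    weight[(i, j)] holds w(i, j) = p*vert(i, j) + (1-p)*hor(i, j)."""
    n = len(nodes)
    out = {i: sum(w for (a, _), w in weight.items() if a == i) for i in nodes}
    rank = {i: 1.0 / n for i in nodes}
    for _ in range(iters):
        new = {}
        for j in nodes:
            # Each in-neighbor i passes rank proportional to w(i, j)/out(i).
            inflow = sum(rank[i] * w / out[i]
                         for (i, jj), w in weight.items() if jj == j and out[i] > 0)
            new[j] = (1 - d) / n + d * inflow
        rank = new
    total = sum(rank.values())             # renormalize (sink nodes leak mass)
    return {i: r / total for i, r in rank.items()}

# Tiny graph: t1 points to t2 and t3, t2 points to t3.
scores = weighted_pagerank(["t1", "t2", "t3"],
                           {("t1", "t2"): 1.0, ("t1", "t3"): 1.0, ("t2", "t3"): 1.0})
print(scores)
```

The final renormalization makes the Cp scores usable directly as type weights that sum to 1.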
EVALUATION The Experimental Setup DBpedia, 4 data sets (Scientist, Actor, Company, City):

Data set   #Entity  #Types  Max.Type  Avg.Type   Avg.Depth
Scientist  9920     7980    55        14.328629  5.68
Actor      2244     1513    26        16.070856  5.22
Company    31096    9137    52        11.959127  6.71
City       13494    2596    17        10.809471  7.63
The Experimental Setup Case Study Two tasks: entity type weighting; similar-type entity recommendation. Four entities: Einstein, Sydney, Jackie Chan, IBM. Gold standard: the depth-10 pooling technique; 20 users give ratings of 3, 2, and 1 (“highly important/similar”, “somewhat important/similar”, and “not important/similar”).
Evaluation Metrics
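The experiments report NDCG@k over the users' graded relevance labels. For reference, a minimal sketch of the metric (function names are illustrative):

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Labels of the ranked results, in rank order (3/2/1 rating scale from the study).
print(ndcg_at_k([3, 2, 3, 1, 1], 3))
```

A perfect ranking scores 1.0; putting low-rated entities near the top pushes the score toward 0.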
1: Type Weight, NDCG@3

Method       Albert Einstein  Sydney  Jackie Chan  IBM
IDF          0.4010           0.4574  0.4914       0.4348
ANC_DEPTH    0.7109           0.4702  0.5184       0.4340
WPR (p=0.2)  0.7478           0.4431
WPR (p=0.5)
WPR (p=0.8)  0.5800           0.5990  0.4611
Analysis of the Results Observation 1: the nDCG values of the WPR-based method are higher than those of the IDF and ANC_DEPTH methods, which validates the effectiveness of the WPR approach. Observation 2: for the WPR-based method with navigation probability p set to 0.2, 0.5, and 0.8, the nDCG value rises or stays stable as p increases. The inference is that users judge a type's importance by how specific it is and how rich its neighborhood in the type graph is, which matches users' intuition.
2: Entity Recommendation Based on Similar Type, NDCG@3

Method (Weight, Cost)      Albert Einstein  Sydney  Jackie Chan  IBM
Jaccard                    0.6174           0.8046  0.6967       0.3966
Cosine-IDF                 0.7462           0.7918  0.3440
EMD (1/n, Edit-distances)  0.8308           0.9025  0.6297
EMD (1/n, LCA [1])         1                0.5451
EMD (IDF, Edit-distances)  0.4637
EMD (IDF, LCA [1])         0.5978
EMD (WPR, Edit-distances)  0.7989           0.9595

[1] Jiang, J. J., and Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008, 1997.
NDCG@5

Method (Weight, Cost)      Albert Einstein  Sydney  Jackie Chan  IBM
Jaccard                    0.7925           0.7797  0.7791       0.4322
Cosine-IDF                 0.8647           0.7448  0.6900       0.4912
EMD (1/n, Edit-distances)  0.8586           0.9659  0.7366       0.5810
EMD (1/n, LCA [1])         0.8692           0.8358  0.7821
EMD (IDF, Edit-distances)  0.5718           0.7855  0.6454       0.5604
EMD (IDF, LCA [1])         0.6765           0.8279  0.5648
EMD (WPR, Edit-distances)  0.7526           0.7812  0.7827       0.7085
NDCG@10

Method (Weight, Cost)      Albert Einstein  Sydney  Jackie Chan  IBM
Jaccard                    0.7984           0.7840  0.8390       0.6761
Cosine-IDF                 0.9502           0.8410  0.6474       0.6347
EMD (1/n, Edit-distances)  0.9182           0.9142  0.8391       0.7357
EMD (1/n, LCA [1])         0.9081           0.8870  0.8531       0.8073
EMD (IDF, Edit-distances)  0.6477           0.9287  0.7295       0.7301
EMD (IDF, LCA [1])         0.7394           0.8895  0.6680       0.7257
EMD (WPR, Edit-distances)  0.8455           0.9690  0.8402       0.7048
EMD (WPR, LCA [1])         0.9502           0.9314  0.8855       0.8209
Analysis of the Results Observation 1: the nDCG values obtained with the EMD-based methods are generally higher than those of the traditional methods (except for the IDF-based EMD method), which validates the effectiveness of the EMD approach. Observation 2: among the EMD methods weighted by 1/n, IDF, and WPR, the WPR-based EMD method generally achieves higher nDCG values than the 1/n- and IDF-based variants, validating the effectiveness of the WPR-based EMD method; the results match people's intuition. The IDF-based EMD method sometimes yields nDCG values even lower than the traditional methods. Conclusion: the weights play an important role in the EMD method, and an unreasonable weight assignment is counterproductive, leading to results worse than those of the simple methods.
Summary In summary, the main contributions of this paper are: We introduce multi-type object similarity measures based on EMD for similar entity recommendation, leading to similar entities that are more intuitive than the ones generated by traditional similarity measures. We define the task of entity type weighting and develop a novel approach to it, which simulates a user's walk on the type graph.
Limitations and Future Work The evaluation metric is singular (only NDCG; other metrics such as AP should be added). In the type weight experiment, extend NDCG@k to k = 3, 5, 10, 20. In the similar entity recommendation experiment, add comparisons against more traditional measures. Analyze the time complexity of the algorithms.