Gleaning Types for Literals in RDF Triples with Application to Entity Summarization
Kalpa Gunaratna 1, Krishnaprasad Thirunarayan 1, Amit Sheth 1, Gong Cheng 2
1 Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA
2 National Key Laboratory for Novel Software Technology, Nanjing University, China
13th Extended Semantic Web Conference (ESWC 2016), Greece, 05.31.2016
Talk Overview
o Literals and background of Entity Summarization
o Typing literals in knowledge graphs
o Entity Summarization (FACES-E)
o Evaluation
  – Typing
  – Entity Summarization with datatype properties
o Conclusion and Future Work
Motivating Facts – Literals and Semantics
o A considerable amount of information is captured in datatype properties.
  – 1600 datatype properties vs. 1079 object properties in DBpedia
o Many literals can be "easily typed" for proper interpretation and use.
  – Example: in DBpedia, http://dbpedia.org/property/location has ~100,000 unique simple literals that can be directly mapped to entities.
o The added semantics can be used in practical and useful applications such as (i) entity summarization, (ii) property alignment, (iii) data integration, and (iv) dataset profiling.
Let's Focus on Entity Summarization Now …
o Datasets and knowledge graphs on the web continue to grow in number and size.
  – DBpedia (3.9) has around 200 triples on average per entity.
o All the facts of an entity are difficult to process when browsing.
o Better presentation is required. Good-quality summaries can help!
Importance of Entities and Summaries
Google has its own knowledge graph, the Google Knowledge Graph (GKG), to facilitate search. Google made summarization its second priority in building the GKG*.
* Singhal, A. 2012. Introducing the Knowledge Graph: Things, Not Strings. Official Google Blog, May.
Diversity-Aware Entity Summaries (FACES Approach) – Background
o Introduced the FACES (FACeted Entity Summaries) approach*.
o FACES follows two main steps.
o First, it groups "conceptually" similar features.
  – Any two groups contain facts different from each other.
o Second, it picks features (property-value pairs) from these groups, improving diversity, for the summaries.
* Kalpa Gunaratna, Krishnaprasad Thirunarayan, and Amit Sheth. 'FACES: Diversity-Aware Entity Summarization using Incremental Hierarchical Conceptual Clustering'. 29th AAAI Conference on Artificial Intelligence (AAAI 2015), AAAI, 2015.
Faceted Entity Summary – Example
[Figure: the entity Marie Curie with its features — spouse: Pierre_Curie; birthPlace: Warsaw; deathPlace: Passy,_Haute-Savoie; almaMater: ESPCI_ParisTech; workInstitutions: University_of_Paris; knownFor: Radioactivity; field: Chemistry]
A concise and comprehensive summary could be: {f1, f2, f6}. Another summary could be: {f4, f6, f7}.
Information Coming from Literals???
o FACES utilizes the type semantics of objects when grouping features.
o Literals in RDF triples do not have "semantic" types; they only have primitive datatypes (e.g., date, integer, string).
o Can we add semantic types to literals? How?
Typing Literals in RDF Triples for Entity Summarization
o FACES can only handle object-property-based features.
o Why? – any specific reason?
  – The values of these features are not URIs and have no "semantic" types.
  – Hence, the adapted grouping algorithm (Cobweb) cannot get types for the property values.
  – It cannot create the partitions needed for faceted entity summaries.
o Our contributions are to:
  – First, compute types for the values of datatype-property-based features (data enrichment).
  – Then, adapt and improve the ranking algorithms (summarization).
Together, these form the FACES-E system.
Typing Datatype Property Values – Example
dbr:Barack_Obama rdf:type dbo:Politician .
dbr:Barack_Obama dbp:vicePresident dbr:Joe_Biden .
dbr:Barack_Obama dbp:shortDescription "44th President of the United States"^^xsd:string .
dbr:Calvin_Coolidge dbp:orderInOffice "48th Governor of Massachusetts"^^xsd:string .
The literals suggest the types dbo:President and dbo:Governor, both rdfs:subClassOf dbo:Politician.
Why is it Hard?
o The focus of a literal is not clear, unlike URIs.
o It may contain several entities or labels matching ontology classes.
o The literal can be long.
  – In this work, we focus on literals up to one sentence long.
  – For paragraph-like text, finding a single focus is hard and needs different techniques.
Example: "44th President of the United States" contains multiple candidate type mentions (option 1, option 2, option 3).
Focus Term Identification
o We expect the focus of the sentence or phrase to lead to the representative entity/type of the sentence.
o There are prominent works on identifying the head word of a sentence/phrase.
  – Example: member of committee → head word "member"
o We use existing head-word detection algorithms to identify the focus term.
  – Collins' head-word detection algorithm
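The idea above can be sketched with a toy heuristic. Note this is NOT Collins' algorithm (which traverses a constituency parse with per-category head-rule tables); it only approximates its output on the kind of flat, short phrases shown in the slides:

```python
# Simplified head-word heuristic for short noun phrases.
# Assumption: for phrases like "X of Y", the head is the last
# content word before the first preposition.
PREPOSITIONS = {"of", "in", "at", "on", "for", "to", "from", "by", "with"}

def focus_term(phrase: str) -> str:
    """Return the token heading the phrase: the last content word
    before the first preposition, else the last token."""
    tokens = [t for t in phrase.lower().split() if t.isalpha()]
    for i, tok in enumerate(tokens):
        if tok in PREPOSITIONS:
            return tokens[i - 1] if i > 0 else tokens[-1]
    return tokens[-1]

print(focus_term("member of committee"))                  # member
print(focus_term("44th President of the United States"))  # president
```

A real implementation would run Collins' head rules over a parse tree; this stand-in merely illustrates why the focus term is useful for typing.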
Deriving the Type (Class) from the Head Word
We filter out date and numeric values.
1. Exact matching of the focus term to class labels.
  – E.g., "48th Governor of Massachusetts" → Governor (class)
2. Get the n-grams and look for a matching class using n-gram and focus-term overlap (maximal match).
  I. Check for a matching class for an overlapping n-gram.
  II. If a type is not found, spot entities in the n-grams and get their types.
    – E.g., the 3-gram "United States Senate" matches an entity in DBpedia.
3. Semantic matching of the focus term to class labels.
  – We compare the pairwise similarity of the focus term with all the class labels and pick the highest (we utilize the UMBC similarity service).
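The three-step cascade can be sketched as follows, assuming toy lookup tables; the real system queries DBpedia classes/entities and the UMBC semantic-similarity service, for which plain string similarity is used here as a crude stand-in:

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins for the DBpedia class and entity indexes.
CLASS_LABELS = {"governor": "dbo:Governor", "president": "dbo:President",
                "legislature": "dbo:Legislature"}
ENTITY_TYPES = {"united states senate": ["dbo:Legislature", "dbo:Organisation"]}

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def derive_types(literal, focus):
    tokens = literal.lower().split()
    # Step 1: exact match of the focus term against class labels.
    if focus in CLASS_LABELS:
        return [CLASS_LABELS[focus]]
    # Step 2: maximal n-grams overlapping the focus term; first try
    # class labels, then spot entities and inherit their types.
    for n in range(len(tokens), 0, -1):
        for gram in ngrams(tokens, n):
            if focus in gram.split():
                if gram in CLASS_LABELS:
                    return [CLASS_LABELS[gram]]
                if gram in ENTITY_TYPES:
                    return ENTITY_TYPES[gram]
    # Step 3: fall back to similarity between the focus term and class
    # labels (string similarity here; UMBC semantic similarity in the paper).
    best = max(CLASS_LABELS, key=lambda c: SequenceMatcher(None, focus, c).ratio())
    return [CLASS_LABELS[best]]

print(derive_types("48th governor of massachusetts", "governor"))
print(derive_types("united states senate", "senate"))
```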
Process Flow
[Figure: pipeline.
Pre-processing: Phrase Identifier, Primary Type Filter, N-grams Extractor, Head-word Detector, Entity Spotter.
Type processing: N-grams + Head-word to Class Label Matcher, Head-word to Class Label Matcher, Head-word Semantic Matcher → Types for the literal.]
Typing Literals – Algorithm Outline
[Algorithm shown in the slide image.] If you really wanted to know how …
Ranking Datatype Property Features
o The ranking mechanism for objects (in FACES) does not work for literals.
  – Why? Two literals can be unique even if their types and main entities are the same. E.g., "United States President" vs. "President of the United States" (counting is affected). It is not desirable to search using the whole phrase.
  – Hence, use entities.
  – A literal can have several entities. Which one to choose?
Idea for Ranking
o We observe that humans recognize popular entities.
  – Entities can appear in literals with variations.
o We use the popular entities in literals, not the literals themselves, for ranking.
o Functions
  – Function ES(v) returns all entities present in the value v.
  – Function max(ES(v)) returns the most popular entity in ES(v).
Example: v = "44th President of the United States"; ES(v) = {db:President, db:United_States}; max(ES(v)) = db:United_States.
Remember: the goal of ranking is disjoint from the typing mechanism.
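A minimal sketch of ES(v) and max(ES(v)), assuming a toy substring-based entity spotter and hypothetical popularity counts (the real system spots entities against DBpedia and measures popularity over the dataset):

```python
# Assumed popularity counts and surface forms -- illustrative only.
POPULARITY = {"db:President": 1200, "db:United_States": 56000}
ENTITY_SURFACE_FORMS = {"president": "db:President",
                        "united states": "db:United_States"}

def ES(v: str) -> set:
    """Return all known entities spotted in the literal value v."""
    text = v.lower()
    return {e for surface, e in ENTITY_SURFACE_FORMS.items() if surface in text}

def most_popular(entities: set) -> str:
    """max(ES(v)): the spotted entity with the highest popularity."""
    return max(entities, key=lambda e: POPULARITY.get(e, 0))

v = "44th President of the United States"
print(sorted(ES(v)))        # ['db:President', 'db:United_States']
print(most_popular(ES(v)))  # db:United_States
```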
Modified Ranking Equations
[Equations shown in the slide image.] If you really wanted to know …
– Informativeness is inversely proportional to the number of entities that are associated with overlapping values containing the most popular entity of feature f.
– Popularity is the frequency of the most popular entity in v.
– These combine into a tf-idf-based ranking score.
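The exact equations are in the slide image and the paper, so the log and normalization choices below are illustrative assumptions; the sketch only shows the tf-idf shape of the score, with each literal value represented as its set of spotted entities:

```python
import math

def feature_score(top_entity, value_entity_sets):
    """tf-idf-style score for a feature whose most popular entity
    is `top_entity`; value_entity_sets holds ES(v) for every feature."""
    # popularity: occurrences of the entity across all values (tf-like)
    freq = sum(top_entity in es for es in value_entity_sets)
    # informativeness: inverse of the number of values containing it (idf-like)
    informativeness = math.log(len(value_entity_sets) / freq)
    return math.log(1 + freq) * informativeness

values = [{"db:United_States", "db:President"},
          {"db:United_States"},
          {"db:Texas"}]
# An entity appearing in many values (db:United_States) is less
# informative than one appearing in few (db:Texas).
print(feature_score("db:Texas", values))
print(feature_score("db:United_States", values))
```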
Facet Ranking
o Aggregate the feature ranking scores for each facet.
o Rank facets based on the aggregated scores.
Rank(f) is the original function and Rank(f)' is the modified one for datatype-property-based features.
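The two bullets above amount to a sum-and-sort, sketched here with assumed per-feature scores standing in for Rank(f) / Rank(f)':

```python
def facet_rank(facets, rank):
    """facets: list of feature lists; rank: feature -> score.
    Returns facets ordered by their aggregated feature scores."""
    scored = [(sum(rank(f) for f in facet), facet) for facet in facets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [facet for _, facet in scored]

# Hypothetical feature scores.
rank = {"f1": 0.9, "f2": 0.4, "f4": 0.7, "f6": 0.8, "f7": 0.2}.get
facets = [["f1", "f2"], ["f4", "f7"], ["f6"]]
print(facet_rank(facets, rank))  # [['f1', 'f2'], ['f4', 'f7'], ['f6']]
```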
FACES-E Entity Summary Generation
1. Extract features for the entity e.
2. Enrich each feature and get the word set WS(f) (enriching literals).
3. The enriched feature set FS(e) is input to the partitioning algorithm, yielding the facet set F(e).
4. First compute the feature ranking scores R(f), then the facet ranking scores FacetRank(F(e)) for each facet (modified ranking).
5. Top-ranked features from the top-ranked facets, in order, are picked to form the faceted entity summary. The constraints defined for the faceted entity summary hold.
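The five steps can be sketched end to end with stub components. The partitioner below trivially groups features by an enriched "type word", a stand-in for the incremental conceptual clustering the actual system uses; enrichment and ranks are assumed inputs:

```python
def faces_e_summary(features, enrich, rank, k):
    # Steps 1-2: extract features and enrich each with a word set WS(f).
    enriched = {f: enrich(f) for f in features}
    # Step 3: partition the enriched features into facets F(e)
    # (here: trivially, by the alphabetically first word of WS(f)).
    facets = {}
    for f, ws in enriched.items():
        facets.setdefault(sorted(ws)[0], []).append(f)
    # Step 4: rank features, then facets by aggregated feature scores.
    ordered = sorted(facets.values(),
                     key=lambda c: sum(rank(f) for f in c), reverse=True)
    # Step 5: pick top-ranked features from top-ranked facets in order.
    summary = []
    for facet in ordered:
        if len(summary) < k:
            summary.append(max(facet, key=rank))
    return summary

# Hypothetical enrichment word sets and feature scores.
enrich = {"spouse": {"person"}, "birthPlace": {"place"},
          "deathPlace": {"place"}, "knownFor": {"topic"}}.get
rank = {"spouse": 0.9, "birthPlace": 0.6, "deathPlace": 0.5,
        "knownFor": 0.7}.get
print(faces_e_summary(["spouse", "birthPlace", "deathPlace", "knownFor"],
                      enrich, rank, k=2))
```

With k larger than the number of facets, the real definition requires every facet to contribute at least one feature; the sketch covers only the k ≤ |F(e)| case.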
Type Computation Samples (with super types, excluding owl:Thing)
Literal → Types
– United States Ambassador to the United Nations → Agent, Ambassador, Person
– Chairman of the Republican National Committee → Agent, Politician, Person, President
– United States Navy → Agent, Organisation, Military Unit
– Member of the New York State Senate → Agent, OrganisationMember, Person
– Senate Minority Leader → Agent, Politician, Person, President
– United States Senate → Agent, Organisation, Legislature
– from Virginia → Administrative Region, Place, Region, Populated Place
– Denison, Texas, U.S. → Administrative Region, Place, Country, Region, Populated Place
Evaluation – Type Generation Metrics
o The Type Set TS(v) is the generated set of types for the value v.
[Metric equations shown in the slide image; n is the total number of features.]
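The slide's metric equations are in the image, so the formulas below are assumptions consistent with the metric names used later (per-feature precision of TS(v), credit when any generated type is correct, and the fraction of literals that received a type at all):

```python
def type_metrics(results):
    """results: list of (generated_types, correct_types) pairs,
    one per feature; n = len(results)."""
    n = len(results)
    # Mean Precision (MP): average per-feature precision of TS(v).
    mp = sum(len(ts & gold) / len(ts) if ts else 0.0
             for ts, gold in results) / n
    # Any Mean Precision (AMP): a feature counts if ANY type is correct.
    amp = sum(1.0 if ts & gold else 0.0 for ts, gold in results) / n
    # Coverage: fraction of features for which a type was generated.
    cov = sum(1.0 if ts else 0.0 for ts, _ in results) / n
    return mp, amp, cov

results = [({"Governor", "Person"}, {"Governor"}),
           ({"Place"}, {"Organisation"}),
           (set(), {"Person"})]
print(type_metrics(results))
```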
Evaluation – Type Generation
o DBpedia Spotlight is used as the baseline; the evaluation set had 1117 unique property-value pairs (features).
o 118 pairs (consisting of labelling properties and noisy features) were removed.
o The results convey that special care should be taken in deciding types for literals.

              Mean Precision (MP)  Any Mean Precision (AMP)  Coverage
Our approach  0.8290               0.8829                    0.8529
Baseline      0.4867               0.5825                    0.5533
Evaluation – Summarization Metrics
[Equations shown in the slide image.]
– Agreement: average pairwise agreement among the ideal (human-made) summaries.
– Quality: average summary overlap between system-generated and ideal summaries.
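Both metrics are average set overlaps; a straightforward sketch, with summaries represented as feature sets (the exact normalization follows the paper's equations, not shown on the slide):

```python
from itertools import combinations

def agreement(ideal_summaries):
    """Average pairwise overlap among the ideal summaries."""
    pairs = list(combinations(ideal_summaries, 2))
    return sum(len(a & b) for a, b in pairs) / len(pairs)

def quality(system_summary, ideal_summaries):
    """Average overlap between the system summary and each ideal one."""
    return (sum(len(system_summary & s) for s in ideal_summaries)
            / len(ideal_summaries))

ideals = [{"f1", "f2", "f6"}, {"f1", "f4", "f6"}, {"f2", "f6", "f7"}]
print(agreement(ideals))
print(quality({"f1", "f2", "f6"}, ideals))
```

Agreement upper-bounds quality: a system summary cannot agree with the ideal summaries more than they agree with each other on average, which is why the results table reports both.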
Evaluation – FACES-E Summary Generation
o The gold standard consists of 20 random entities used in FACES, taken from DBpedia 3.9, and 60 random entities taken from DBpedia 2015-04.
o 17 human users created the ideal summaries (900 in total). Each entity received at least 4 ideal summaries for each length.

System          k = 5                       k = 10
                Avg. Quality  % Increase    Avg. Quality  % Increase
FACES-E         1.5308        –             4.5320        –
RELIN           0.9611        59%           3.0988        46%
RELINM          1.0251        49%           3.6514        24%
Avg. Agreement  2.1168                      5.4363
(k is the summary length)
Future Work
o Consider the meaning of the property name when computing types.
o Literals and properties are noisy.
  – Identify those automatically to filter them out.
  – Filter out labelling properties (automatic identification); this is hard.
o A formal model to capture semantic types for literals in RDF, without changing their original representation.
Thank You — Questions?
Kalpa Gunaratna — http://knoesis.wright.edu/researchers/kalpa — kalpa@knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA
FACES project page: http://wiki.knoesis.org/index.php/FACES
Appendix
Preliminaries
o Entities are described by features.
o Feature: a property-value pair.
o Feature Set: all the features that describe an entity.
o Entity Summary of size k: a subset of the feature set for an entity, constrained by size k.

Entity: Marie Curie — Feature Set FS:
f1: spouse → Pierre_Curie
f2: birthPlace → Warsaw
f3: deathPlace → Passy_Haute-Savoie
f4: almaMater → ESPCI_ParisTech
f5: workInstitutions → University_of_Paris
f6: knownFor → Radioactivity
f7: field → Chemistry

Entity summaries for k = 3: {f1, f2, f5}, {f4, f6, f7}, {f3, f4, f5}, …
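These preliminaries map naturally onto small data structures; the encoding below is a straightforward illustration, not code from the paper:

```python
from typing import NamedTuple

class Feature(NamedTuple):
    """A feature is a property-value pair."""
    prop: str
    value: str

# Feature set for Marie Curie (subset, keyed by feature id).
feature_set = {
    "f1": Feature("spouse", "Pierre_Curie"),
    "f2": Feature("birthPlace", "Warsaw"),
    "f6": Feature("knownFor", "Radioactivity"),
    "f7": Feature("field", "Chemistry"),
}

def is_summary(summary, fs, k):
    """An entity summary of size k is any k-subset of the feature set."""
    return len(summary) == k and summary <= set(fs)

print(is_summary({"f1", "f2", "f6"}, feature_set, 3))  # True
```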
Faceted Entity Summary
Facets (partition): Given an entity e, a set of facets F(e) of e is a partition of the feature set FS(e). That is, F(e) = {C1, C2, …, Cn} such that F(e) satisfies:
(i) Non-empty: ∅ ∉ F(e).
(ii) Collectively exhaustive: C1 ∪ C2 ∪ … ∪ Cn = FS(e).
(iii) Mutually (pairwise) disjoint: if Ci ≠ Cj then Ci ∩ Cj = ∅.
Faceted entity summary: Given an entity e and a positive integer k, a subset FSumm(e, k) of FS(e) of size k is a faceted entity summary if either (i) k > |F(e)| and ∀X ∈ F(e): X ∩ FSumm(e, k) ≠ ∅, or (ii) k ≤ |F(e)| and ∀X ∈ F(e): |X ∩ FSumm(e, k)| ≤ 1 holds, where F(e) is a set of facets of FS(e).
Example faceted entity summaries — k = 2: {f1, f6}; k = 3: {f1, f2, f6}.
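The partition conditions (i)–(iii) and the faceted-summary condition translate directly into checks; a minimal transcription, with facets and summaries as sets of feature ids:

```python
def is_partition(facets, fs):
    """Check conditions (i)-(iii) for a candidate facet set."""
    non_empty = all(len(c) > 0 for c in facets)            # (i)
    exhaustive = set().union(*facets) == fs                # (ii)
    disjoint = sum(len(c) for c in facets) == len(fs)      # (iii), given (ii)
    return non_empty and exhaustive and disjoint

def is_faceted_summary(summ, facets, k):
    """Check the faceted-entity-summary condition for summary summ."""
    if len(summ) != k:
        return False
    if k > len(facets):
        # every facet must contribute at least one feature
        return all(c & summ for c in facets)
    # at most one feature per facet
    return all(len(c & summ) <= 1 for c in facets)

fs = {"f1", "f2", "f3", "f4", "f5", "f6", "f7"}
facets = [{"f1"}, {"f2", "f3"}, {"f4", "f5"}, {"f6", "f7"}]
print(is_partition(facets, fs))                           # True
print(is_faceted_summary({"f1", "f2", "f6"}, facets, 3))  # True
```

Note the disjointness check relies on exhaustiveness: if the facets cover FS(e) and their sizes sum to |FS(e)|, no feature can appear twice.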