Published by Robyn Hudson. Modified over 9 years ago.
1
Assigning Global Relevance Scores to DBpedia Facts
Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci
DESWeb, 03/31/2014
2
Structured Data
■ Advantages of structured data over unstructured data:
  □ Search for explicit facts
  □ Summarization of possibly interesting information
  □ Automated knowledge discovery
■ Google Knowledge Graph
■ RDF knowledge bases
  □ DBpedia, YAGO/NAGA
A handful of salient facts about the query entity.
3
Querying YAGO
■ Asking for classes to which Albert Einstein belongs
4
Querying DBpedia
■ Asking for classes to which Albert Einstein belongs

predicate   object
rdf:type    owl:Thing
rdf:type    dbpedia:Agent
rdf:type    dbpedia:Person
rdf:type    dbpedia:Scientist
rdf:type    umbel:Scientist
rdf:type    schema:Person
rdf:type    yago:Astronomer109818343
rdf:type    foaf:Person
rdf:type    19th-centuryAmericanPeople
rdf:type    19th-centuryGermanPeople
5
Challenge

SELECT DISTINCT ?p ?o WHERE { dbpedia:Barack_Obama ?p ?o }

p                   c
rdf:type            owl:Thing
rdf:type            dbpedia:Person
rdf:type            yago:Person100007846
...
rdf:type            dbpedia:Politician
...
dbpedia:spouse      dbpedia:Michelle_Obama

Web Documents

p                   c
owl:orderInOffice   President of the United States
dbpedia:type        dbpedia:Politician
dbpedia:spouse      dbpedia:Michelle_Obama
owl:birthPlace      dbpedia:Honolulu
dbpprop:residence   dbpedia:White_House
...
rdf:type            owl:Thing
6
Challenges
■ Big Data: DBpedia 3.8, ClueWeb corpus
■ Architecture: text extraction, score computation/ranking, query processing
■ Evaluation: conducting user studies
■ Ranking Strategies: improve the ranking results
7
Overview
■ Languages: Python, Java, SPARQL, JavaScript
■ Frameworks: Django, Lucene
■ Diagram components:
  □ Web application (Django)
  □ DBpedia Endpoint (Apache Jena)
  □ Application Data (Postgres)
  □ Web corpus (Lucene Index)
  □ User Studies
  □ Querying
■ Ranking strategies
  □ Intra DBpedia strategies
  □ Web Corpus strategies
8
Ranking Facts
■ Query types:
  □ Subject queries - return all physicists
  □ Property queries - return all facts related to Einstein
■ Ranking strategies
  □ Ranking by frequency and document frequency
  □ Ranking by information diversity
  □ Random walk
  □ Web-based co-occurrence statistics

SELECT ?p ?o { Albert_Einstein ?p ?o }
SELECT ?s { ?s type Physicist }
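The two query types can be illustrated without a SPARQL endpoint. The following is a minimal sketch over a small in-memory list of triples; the function names `subject_query` and `property_query` and the toy data are assumptions made for illustration and are not taken from the authors' system.

```python
# Toy (subject, predicate, object) triples; illustrative data only.
TRIPLES = [
    ("Albert_Einstein", "type", "Physicist"),
    ("Albert_Einstein", "birthPlace", "Ulm"),
    ("Isaac_Newton", "type", "Physicist"),
    ("Barack_Obama", "type", "Politician"),
]


def subject_query(triples, predicate, obj):
    """Return every subject s for which (s, predicate, obj) is a fact."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def property_query(triples, subject):
    """Return every (predicate, object) pair stated about the subject."""
    return [(p, o) for s, p, o in triples if s == subject]


physicists = subject_query(TRIPLES, "type", "Physicist")
einstein_facts = property_query(TRIPLES, "Albert_Einstein")
```

Both functions return results in storage order. A ranking strategy would then reorder these lists by a relevance score.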
9
Ranking by frequency and document frequency [Shady et al ESWC’11]
■ Subject document of "Albert Einstein": ...; "Switzerland"; "Austria-Hungary"; "German Empire"; "Mileva Maric"; ...
■ Predicate document of "topic": ...
■ Object document of "Theoretical physicists": ...
10
Ranking by frequency and document frequency
■ Subject queries:
  □ Global relevance

Isaac Newton: academicAdvisor...; birthDate...; birthPlace...; comment...; ethnicity...; field...; influenced...; influencedBy...; knownFor...; label...; notableStudent...; subject...; type...;
Ravi Gomatam: subject...;
11
Limitations for Property Queries
■ Property queries:
  □ Facts should be globally relevant, but also distinctive for the given subject
    – type Person vs. type Scientist
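The tension between global relevance and distinctiveness can be illustrated with inverse document frequency. This is a minimal sketch, assuming that the facts of each subject form one "subject document" and using the textbook formula idf = log(N / df). The slides do not state the exact scoring formula, so the function name `idf_scores` and the toy data are assumptions.

```python
import math
from collections import Counter


def idf_scores(subject_docs):
    """Compute log(N / df) for every (predicate, object) fact.

    subject_docs maps a subject to the list of facts stated about it.
    df counts in how many subject documents a fact occurs.
    """
    n = len(subject_docs)
    df = Counter()
    for facts in subject_docs.values():
        df.update(set(facts))  # count each fact once per subject
    return {fact: math.log(n / count) for fact, count in df.items()}


# Toy subject documents; illustrative data only.
docs = {
    "Albert_Einstein": [("type", "Person"), ("type", "Scientist")],
    "Isaac_Newton": [("type", "Person"), ("type", "Scientist")],
    "Barack_Obama": [("type", "Person"), ("type", "Politician")],
}
scores = idf_scores(docs)
```

In this toy data, `type Person` holds for every subject and therefore scores zero, while `type Scientist` is rarer and scores higher. A fact that is frequent across the whole knowledge base says little about one particular subject.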
12
Ranking by diversity
■ Following a probabilistic model
  □ Property queries:
    – Properties and objects that are as discriminative as possible
  □ Subject queries:
13
Random Walk Model
■ Consider the knowledge base as a directed graph
  □ Already applied in [Kasneci CIKM’09]
  □ Problem: literals have no outgoing link
■ Use Wiki Pagelinks and Infobox Property Mappings
  □ Entities with high indegree, such as countries, are favored
    – Good for subject queries
    – Bad for property queries
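The effect of indegree on a random walk can be sketched with a plain PageRank iteration. This is an illustrative sketch and not the authors' model: it assumes a damping factor of 0.85 and spreads the rank of nodes without outgoing links (such as literals) uniformly over all nodes, which is one standard way to handle them. The function name `pagerank` and the toy graph are assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {node: [target, ...]}.

    Nodes without outgoing links (dangling nodes) pass their rank
    uniformly to all nodes so that the total rank stays 1.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Toy link graph; illustrative data only.
graph = {
    "Albert_Einstein": ["Germany", "Physics"],
    "Max_Planck": ["Germany", "Physics"],
    "Werner_Heisenberg": ["Germany"],
    "Germany": [],
    "Physics": [],
}
ranks = pagerank(graph)
top_node = max(ranks, key=ranks.get)
```

In this toy graph the country node collects the most incoming links and ends up with the highest rank, which mirrors the observation that high-indegree entities are favored.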
14
Web Documents Co-occurrence statistics
■ Lemur Project ClueWeb09 Category-B web corpus
  □ 50 million web documents (1.5 TB)
  □ Only English-language documents
  □ Includes approx. 2.7 million Wikipedia articles
■ Create an inverted index
■ Treat text windows with different word-distance limits as documents
■ Rank subject-object pairs
  □ "Albert Einstein" and "Physicist"
  □ Store only pairwise co-occurrence:
  □ Compute frequency of s:
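Counting co-occurrences within a word-distance limit can be sketched with a small positional inverted index. This is a simplified sketch in plain Python rather than Lucene: it assumes whitespace tokenization and single-token terms, and the names `build_index` and `cooccurrence_count` as well as the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Build a positional inverted index: token -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def cooccurrence_count(index, term_a, term_b, max_distance):
    """Count documents in which both terms occur at most max_distance tokens apart."""
    docs_a = index.get(term_a, {})
    docs_b = index.get(term_b, {})
    count = 0
    for doc_id in docs_a.keys() & docs_b.keys():
        if any(abs(a - b) <= max_distance
               for a in docs_a[doc_id] for b in docs_b[doc_id]):
            count += 1
    return count


# Sample documents; illustrative text only.
documents = {
    "d1": "Einstein was a theoretical physicist",
    "d2": "Einstein lived in Switzerland for years and later a physicist visited",
    "d3": "The physicist Planck met Einstein",
}
index = build_index(documents)
near = cooccurrence_count(index, "einstein", "physicist", 5)
far = cooccurrence_count(index, "einstein", "physicist", 10)
```

Tightening the distance limit removes documents where the two terms appear far apart, so the count depends on the chosen window size.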
15
Evaluation
■ User study 1
  □ 8 queries
  □ All results
  □ 12 users
  □ 19 approaches/configurations
■ Rating scale 1-4: irrelevant to highly relevant
■ User study 2
  □ 8+20 queries
  □ Top-10 results of best 4 approaches side-by-side
  □ 10 users
  □ Best 3 approaches from user study 1
16
Top 4 Approaches in User study 1
17
User study 2
18
Results Example: Theoretical Physicists
Panels: DBpedia, Random Walk Model

Subject
Albert Einstein
Isaac Newton
Galileo Galilei
James Clerk Maxwell
Richard Feynman
Stephen Hawking
Max Planck
Enrico Fermi
Werner Heisenberg
Pierre-Simon Laplace
19
Results Example: Albert Einstein

DBpedia
predicate          object
rdf:type           owl:Thing
rdf:type           dbpedia:Agent
rdf:type           dbpedia:Person
rdf:type           dbpedia:Scientist
rdf:type           umbel:Scientist
rdf:type           schema:Person
rdf:type           yago:Astronomer109818343
rdf:type           foaf:Person
rdf:type           19th-centuryAmericanPeople
rdf:type           19th-centuryGermanPeople

Co-occurrence statistics
predicate          object
fields             Physics
field              Physics
deathPlace         United States
placeOfDeath       United States
shortDescription   Physicists
description        Physicist
type               Scientist
ethnicity          Jewish
subject            Einstein family
residence          Switzerland
20
Conclusions
■ Investigated multiple approaches to rank DBpedia facts
  □ Information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents
■ DBpedia knowledge base already provides enough information to improve the ranking of results
■ Improvement of property queries through web-based co-occurrence statistics
■ We provide the annotated datasets at
  □ https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/