
Outline: 1. Introduction; 2. Harvesting Classes; 3. Harvesting Facts; 4. Common Sense Knowledge; 5. Knowledge Consolidation (in YAGO, in NELL, in Google, KB Alignment, Linked Data); 6. Web Content Analytics; 7. Wrap-Up

Goal: Combine several extractors. [Figure: extractors for text and for tables run over documents and produce the conflicting statements is(Elvis, alive), is(Elvis, dead), is(Elvis, alive), is(Elvis, alive). Which one should we believe?]
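The most naive way to combine extractors is to let them vote. The sketch below is such a baseline, not a method from the slides; encoding statements as (subject, predicate, object) tuples is an assumption. It returns the most frequent statement and the share of extractors that produced it.

```python
from collections import Counter

def majority_vote(statements):
    """Return the most frequent statement and the fraction of extractors backing it."""
    counts = Counter(statements)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(statements)

# One statement per extractor, as in the Elvis example (subject, predicate, object).
outputs = [("Elvis", "is", "alive"), ("Elvis", "is", "dead"),
           ("Elvis", "is", "alive"), ("Elvis", "is", "alive")]
print(majority_vote(outputs))  # (('Elvis', 'is', 'alive'), 0.75)
```

Voting picks the wrong answer whenever most extractors share the same error, which is why the systems below add type checks, constraints, and learned source reliabilities.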

YAGO combines 170 extractors. [Figure: an Infobox Extractor produces facts such as JimGray bornIn "January 12, 1944" and JimGray bornIn SanFrancisco, …; after the TypeChecker and the MultilingualMerger, JimGray bornIn SanFrancisco remains.]
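A type-checking stage can reject the date extracted as a birth place because it is not an instance of the relation's range. This is a minimal sketch under assumptions: the dictionaries `RANGES` and `TYPES` and the function `type_check` are hypothetical, and a real KB would consult its class hierarchy (a city is a subclass of location) rather than compare class names exactly.

```python
# Hypothetical schema: the expected class of each relation's object (its range).
RANGES = {"bornIn": "location"}
# Hypothetical type assertions; literals such as dates have no class here.
TYPES = {"SanFrancisco": "location"}

def type_check(facts, ranges=RANGES, types=TYPES):
    """Keep facts whose object is an instance of the relation's declared range.
    Relations without a declared range pass unchecked."""
    kept = []
    for subj, rel, obj in facts:
        expected = ranges.get(rel)
        if expected is None or types.get(obj) == expected:
            kept.append((subj, rel, obj))
    return kept

extracted = [("JimGray", "bornIn", '"January 12, 1944"'),
             ("JimGray", "bornIn", "SanFrancisco")]
print(type_check(extracted))  # [('JimGray', 'bornIn', 'SanFrancisco')]
```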

YAGO combines 170 extractors [Mahdisoltani CIDR 2015]: type checking, type coherence checking, translation, learning of foreign-language attributes, deduplication, Horn rule inference, and functional constraint checking (simple preference over sources). Result: 10 languages, with a precision of 95%.
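Deduplication and functional constraint checking with a simple preference over sources can be sketched as follows. YAGO's actual implementation is more elaborate; the source names, their ranking in `SOURCE_PRIORITY`, the set `FUNCTIONAL`, and the conflicting "Berkeley" extraction are all hypothetical.

```python
# Hypothetical source ranking: lower number = more trusted.
SOURCE_PRIORITY = {"en_infobox": 0, "de_infobox": 1, "text": 2}
# Hypothetical functional relations: at most one object per subject.
FUNCTIONAL = {"bornIn"}

def consolidate(candidates):
    """candidates: (subject, relation, object, source) tuples.
    Removes duplicates; for functional relations keeps only the object
    coming from the best-ranked source."""
    best = {}      # (subject, relation) -> (priority, object)
    facts = set()  # set membership removes duplicate non-functional facts
    for subj, rel, obj, src in candidates:
        prio = SOURCE_PRIORITY.get(src, len(SOURCE_PRIORITY))
        if rel in FUNCTIONAL:
            key = (subj, rel)
            if key not in best or prio < best[key][0]:
                best[key] = (prio, obj)
        else:
            facts.add((subj, rel, obj))
    facts.update((s, r, o) for (s, r), (_, o) in best.items())
    return sorted(facts)

candidates = [
    ("JimGray", "bornIn", "SanFrancisco", "de_infobox"),
    ("JimGray", "bornIn", "SanFrancisco", "en_infobox"),
    ("JimGray", "bornIn", "Berkeley", "text"),  # hypothetical wrong extraction
    ("JimGray", "worksAt", "Microsoft", "en_infobox"),
    ("JimGray", "worksAt", "Microsoft", "text"),
]
print(consolidate(candidates))
# [('JimGray', 'bornIn', 'SanFrancisco'), ('JimGray', 'worksAt', 'Microsoft')]
```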

Outline: 1. Introduction; 2. Harvesting Classes; 3. Harvesting Facts; 4. Common Sense Knowledge; 5. Knowledge Consolidation (in YAGO √, in NELL, in Google, KB Alignment, Linked Data); 6. Web Content Analytics; 7. Wrap-Up

NELL couples different learners [Carlson et al. and follow-ups]. [Figure: starting from an initial ontology, a natural-language pattern extractor reads "Krzewski coaches the Blue Devils." and a table extractor reads a table containing Krzewski, Blue Angels, Miller, Red Angels; the learners are coupled by mutual exclusion (sports coach != scientist) and type checks ("If I coach, am I a coach?").] Different learners benefit from each other: table extraction; text extraction; path ranking (rule learning); morphological features ("…ism" is something abstract); active learning (ask for answers in online forums); learning from images; learning from several languages (?).
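Coupling can be pictured as two constraints on promoting a candidate belief: the learners must agree, and a belief must not violate mutual exclusion with a stronger belief about the same entity. This is a toy sketch in the spirit of the slide, not NELL's actual algorithm; the category names, scores, `MUTEX` set, and `threshold` are assumptions.

```python
# Hypothetical pairs of categories that cannot share an instance.
MUTEX = {frozenset({"sportsCoach", "scientist"})}

def promote(candidates, mutex=MUTEX, threshold=0.8):
    """candidates: {(entity, category): [score from each learner]}.
    A belief qualifies only if every learner supports it (agreement coupling);
    qualifying beliefs are then accepted strongest first, skipping any that
    clash with an already accepted belief (mutual-exclusion coupling)."""
    agreed = {key: min(scores) for key, scores in candidates.items()
              if min(scores) >= threshold}
    promoted = set()
    for (entity, category), _ in sorted(agreed.items(), key=lambda kv: -kv[1]):
        clash = any(e == entity and frozenset({category, other}) in mutex
                    for e, other in promoted)
        if not clash:
            promoted.add((entity, category))
    return promoted

candidates = {
    ("Krzewski", "sportsCoach"): [0.95, 0.9],  # text pattern and table agree
    ("Krzewski", "scientist"): [0.85, 0.82],   # conflicts with sportsCoach
    ("Miller", "sportsCoach"): [0.9, 0.4],     # the learners disagree
}
print(promote(candidates))  # {('Krzewski', 'sportsCoach')}
```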

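Estimating Accuracy from Unlabeled Data [Platanios, Blum, Mitchell, UAI'14]. Given: extractors $f_1, \dots, f_n$. Find: the error probability $e_i$ of each extractor. The agreement rate of two extractors is known, because it can be measured on unlabeled data. For binary outputs, two extractors agree exactly when both are wrong or both are right: $a_{ij} = P_x(f_i(x) = f_j(x)) = P(\text{both make an error}) + P(\text{neither makes an error}) = 1 - e_i - e_j + 2e_{ij}$, where $e_{ij}$ is the probability of a simultaneous error. Case 1: independent errors and accuracy > 0.5. Then $e_{ij} = e_i e_j$, so $a_{ij} = 1 - e_i - e_j + 2e_ie_j$. The problem reduces to solving a system of $n(n-1)/2$ equations with $n$ unknowns, which is solvable if $n \ge 3$. The accuracy condition is needed because replacing every $e_i$ by $1 - e_i$ leaves all $a_{ij}$ unchanged.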
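For Case 1 the system has a closed-form solution: substituting $d_i = 1 - 2e_i$ turns $a_{ij} = 1 - e_i - e_j + 2e_ie_j$ into $2a_{ij} - 1 = d_i d_j$. The sketch below uses that substitution and averages over all triples; it is an illustration under the independence and accuracy > 0.5 assumptions (it also assumes pairwise agreement above 0.5), not the optimization procedure of Platanios et al. The function names are hypothetical.

```python
from itertools import combinations

def agreement_rates(predictions):
    """predictions[i][k] is the binary output of extractor i on unlabeled item k.
    Returns {(i, j): fraction of items on which extractors i and j agree}."""
    n_items = len(predictions[0])
    return {(i, j): sum(a == b for a, b in zip(predictions[i], predictions[j])) / n_items
            for i, j in combinations(range(len(predictions)), 2)}

def error_rates(agree, n):
    """Solve a_ij = 1 - e_i - e_j + 2 e_i e_j for independent errors, n >= 3.
    With d_i = 1 - 2 e_i the equations become 2 a_ij - 1 = d_i d_j, hence
    d_i^2 = (2 a_ij - 1)(2 a_ik - 1) / (2 a_jk - 1) for any j, k != i."""
    def c(i, j):
        return 2 * agree[(min(i, j), max(i, j))] - 1

    errors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        estimates = [c(i, j) * c(i, k) / c(j, k) for j, k in combinations(others, 2)]
        # Average the estimates; clamp at 0 in case noisy data makes them negative.
        d_squared = max(0.0, sum(estimates) / len(estimates))
        # Positive root, because accuracy > 0.5 means e_i < 0.5, i.e. d_i > 0.
        errors.append((1 - d_squared ** 0.5) / 2)
    return errors

# Agreement rates implied by true error rates 0.1, 0.2, 0.3:
print(error_rates({(0, 1): 0.74, (0, 2): 0.66, (1, 2): 0.62}, 3))  # approx. [0.1, 0.2, 0.3]
```

On real data, `agreement_rates` measures the $a_{ij}$ from unlabeled items; the resulting estimates are noisy, which is why averaging over several triples helps.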

Estimating Accuracy from Unlabeled Data (cont.) [Platanios, Blum, Mitchell, UAI'14]. Case 2: errors are not independent. Idea: minimize $e_{ij} - e_i e_j$, i.e., find classifiers whose errors are as close to independent as possible.

Outline: 1. Introduction; 2. Harvesting Classes; 3. Harvesting Facts; 4. Common Sense Knowledge; 5. Knowledge Consolidation (in YAGO √, in NELL √, in Google, KB Alignment, Linked Data); 6. Web Content Analytics; 7. Wrap-Up

Google Knowledge Vault [Dong et al.: KDD2014]. Given: Freebase, a relation $r$, and extractors $e_1, \dots, e_n$. Train a classifier that, given the confidences of the extractors, tells us whether an extracted statement is true. [Figure: extractors over text (Txt), DOM trees, tables, and annotations (ANO, e.g., schema.org) each produce facts; together with facts from path ranking over the KB, they are combined by KB fusion.]
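Such a fusion classifier learns how far to trust each extractor from statements whose truth Freebase can confirm. The sketch swaps in plain logistic regression trained by batch gradient descent; the paper's actual fusion features and learner differ, and the toy data, learning rate, and epoch count are assumptions.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_fusion(features, labels, epochs=2000, lr=0.5):
    """Logistic regression by batch gradient descent.
    features[k]: confidences of extractors e_1..e_n for statement k (0.0 if absent);
    labels[k]: 1 if the reference KB confirms statement k, else 0."""
    n = len(features[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(features)
    return w, b

def prob_true(w, b, x):
    """Fused probability that a statement with extractor confidences x is true."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy training data: extractor 1 tracks the truth, extractor 2 is noise.
features = [[0.9, 0.8], [0.8, 0.1], [0.95, 0.6], [0.1, 0.9], [0.2, 0.7], [0.05, 0.2]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_fusion(features, labels)
print(w, prob_true(w, b, [0.9, 0.1]), prob_true(w, b, [0.1, 0.9]))
```

The learned weight of the reliable extractor ends up well above that of the noisy one, so a confident claim from extractor 1 outweighs a confident claim from extractor 2.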

RDFa annotations. [Figure: the page source of "My name is Elvis." carries RDFa markup; a browser shows only the text, while an RDFa analyzer extracts the structured statement.] 30% of Web pages are annotated this way. schema.org is a common vocabulary designed by Google, Microsoft, Yandex, and others for this purpose [Guha: "Schema.org", keynote at AKBC 2014].
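To make the browser/analyzer split concrete, here is a toy analyzer built on Python's standard `html.parser`. The markup is an assumed example using the real RDFa Lite attributes `vocab`, `typeof`, and `property` with the schema.org terms `Person` and `name`; the class `TinyRDFaParser` is hypothetical and handles only this flat pattern (no vocabulary resolution, nesting, `resource`, or `about`), so it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

class TinyRDFaParser(HTMLParser):
    """Collects (type, property, text) from a tiny subset of RDFa Lite:
    typeof on an element, property on an element whose text is the value."""

    def __init__(self):
        super().__init__()
        self.current_type = None
        self.pending_property = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.current_type = attrs["typeof"]
        if "property" in attrs:
            self.pending_property = attrs["property"]

    def handle_data(self, data):
        # Only text directly after a property-bearing tag is treated as a value.
        if self.pending_property:
            self.triples.append((self.current_type, self.pending_property, data.strip()))
            self.pending_property = None

page = ('<p vocab="http://schema.org/" typeof="Person">'
        'My name is <span property="name">Elvis</span>.</p>')
parser = TinyRDFaParser()
parser.feed(page)
parser.close()
print(parser.triples)  # [('Person', 'name', 'Elvis')]
```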

Trustworthiness of Web Sources [Dong et al.: VLDB2015]. [Figure: Knowledge-Based Trust plotted against PageRank; there are tail sources with high trustworthiness and many gossip Web sites.] PageRank and trustworthiness are not always correlated!
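A drastically simplified proxy for knowledge-based trust is the share of a source's claims that agree with a reference KB. Dong et al. use a probabilistic model that also accounts for extraction errors; this sketch replaces it with a smoothed count, and the sites, facts, and prior are made up. Note that no link structure enters the score, which is why it can diverge from PageRank.

```python
def kb_trust(source_facts, kb, prior=0.5, strength=2.0):
    """Smoothed fraction of a source's checkable claims that agree with the KB.
    source_facts: {source: [(subject, predicate, object), ...]};
    kb: {(subject, predicate): object}, assuming one true object per pair.
    Claims about (subject, predicate) pairs unknown to the KB are ignored."""
    trust = {}
    for source, facts in source_facts.items():
        checked = [(s, p, o) for s, p, o in facts if (s, p) in kb]
        correct = sum(kb[(s, p)] == o for s, p, o in checked)
        # Pseudo-counts pull sources with few checkable claims toward the prior.
        trust[source] = (correct + prior * strength) / (len(checked) + strength)
    return trust

kb = {("Elvis", "bornIn"): "Tupelo", ("Elvis", "diedIn"): "Memphis",
      ("Elvis", "spouse"): "Priscilla"}
sources = {
    "gossip.example": [("Elvis", "bornIn", "Mars"), ("Elvis", "diedIn", "never")],
    "fansite.example": [("Elvis", "bornIn", "Tupelo"), ("Elvis", "diedIn", "Memphis"),
                        ("Elvis", "spouse", "Priscilla")],
}
print(kb_trust(sources, kb))  # {'gossip.example': 0.25, 'fansite.example': 0.8}
```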

Outline: 1. Introduction; 2. Harvesting Classes; 3. Harvesting Facts; 4. Common Sense Knowledge; 5. Knowledge Consolidation (in YAGO √, in NELL √, in Google √, KB Alignment, Linked Data); 6. Web Content Analytics; 7. Wrap-Up

Knowledge bases are complementary.

No links → no use: Who is the spouse of the guitar player?
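The question needs facts from two KBs: one knows who plays guitar, the other knows who is married to whom. Without a sameAs link between the two identifiers, the join returns nothing. The KB contents, prefixes, and function name below are hypothetical.

```python
def spouses_of_guitar_players(kb1, kb2, same_as):
    """Answer the query across two KBs. kb1 knows occupations, kb2 knows spouses;
    same_as maps KB 1 entity names to KB 2 entity names."""
    players = {s for s, p, o in kb1 if p == "occupation" and o == "guitar_player"}
    linked = {same_as[x] for x in players if x in same_as}
    return {o for s, p, o in kb2 if p == "spouse" and s in linked}

kb1 = {("kb1:Elvis", "occupation", "guitar_player")}
kb2 = {("kb2:ElvisPresley", "spouse", "kb2:Priscilla")}
print(spouses_of_guitar_players(kb1, kb2, {}))                                 # set()
print(spouses_of_guitar_players(kb1, kb2, {"kb1:Elvis": "kb2:ElvisPresley"}))  # {'kb2:Priscilla'}
```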

Linking Records vs. Linking Knowledge. [Figure: a database record (Susan B. Davidson, Peter Buneman, Yi Chen, University of Pennsylvania) next to a KB / ontology with the class university.] Differences between DB records and KB entities: links have rich semantics (e.g., subclassOf); KBs have only binary predicates; KBs have no schema; and we must match not just entities, but also classes and predicates (relations).

Similarity Flooding matches entities at scale. Build a graph: nodes are pairs of entities, weighted with their similarity; edges are weighted with the degree of relatedness. [Figure: pair nodes with similarities 0.9, 0.7, and 0.8, connected by an edge with relatedness 0.8.] Iterate until convergence: similarity := weighted sum of neighbor similarities. Many variants exist (belief propagation, label propagation, etc.).
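The iteration can be sketched as a fixpoint computation over the pair graph. This is one simple variant under stated assumptions: each pair keeps its initial similarity, adds the relatedness-weighted similarities of its neighbors, and all scores are divided by the maximum so the best pair stays at 1.0. The entity pairs, the `y:`/`dbp:` prefixes, and the weights are hypothetical.

```python
def similarity_flooding(init, edges, tol=1e-9, max_iter=1000):
    """init: {pair: initial similarity}; edges: {(src_pair, dst_pair): relatedness}.
    Returns similarities normalized so that the best pair scores 1.0."""
    sim = dict(init)
    for _ in range(max_iter):
        # Each pair keeps its initial similarity and adds weighted neighbor similarities.
        raw = dict(init)
        for (src, dst), weight in edges.items():
            raw[dst] += weight * sim[src]
        top = max(raw.values())
        new = {pair: value / top for pair, value in raw.items()}
        converged = max(abs(new[p] - sim[p]) for p in new) < tol
        sim = new
        if converged:
            break
    return sim

elvis = ("y:Elvis", "dbp:Elvis_Presley")
priscilla = ("y:Priscilla", "dbp:Priscilla_Presley")
memphis = ("y:Memphis", "dbp:Memphis_Tennessee")
init = {elvis: 0.9, priscilla: 0.7, memphis: 0.2}
# The Elvis and Priscilla pairs are related (spouse links in both KBs), in both directions.
edges = {(elvis, priscilla): 0.8, (priscilla, elvis): 0.8}
result = similarity_flooding(init, edges)
print(result)
```

The Priscilla pair is lifted by its strongly matched neighbor (from 0.7 to about 0.92), while the isolated Memphis pair receives no support and sinks relative to the others.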

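Some neighborhoods are more indicative. [Figure: two candidate entities from different KBs share the birth year 1935 and the spouse Priscilla, each linked by sameAs; are the candidates themselves sameAs?] Many people were born in 1935 → not indicative. Few people are married to Priscilla → highly indicative.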

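Inverse functionality as indicativeness [Suchanek et al.: VLDB'12]. [Figure: the same 1935 / Priscilla example, with sameAs links between the matched neighbors and a "?" between the candidate entities.]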
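Roughly, PARIS defines the (global) inverse functionality of a relation as the number of distinct objects divided by the number of facts, so a shared object is strong evidence only for relations whose objects rarely repeat. The sketch computes this on toy facts and combines the evidence with a rule in the spirit of the paper's formula, simplified here: each matched neighbor independently may establish the match. Relation names, facts, and function names are assumptions.

```python
import math

def inverse_functionality(facts, relation):
    """Number of distinct objects of `relation` divided by the number of its facts.
    Close to 1: objects rarely repeat, so a shared object is strong evidence."""
    pairs = {(s, o) for s, r, o in facts if r == relation}
    if not pairs:
        return 0.0
    return len({o for _, o in pairs}) / len(pairs)

def match_probability(evidence):
    """evidence: (inverse functionality of r, P(y = y')) for each pair of facts
    r(x, y), r(x', y'). Each piece can independently establish x = x'."""
    return 1 - math.prod(1 - ifun * p_same for ifun, p_same in evidence)

facts = [
    ("Elvis", "bornInYear", "1935"), ("Julie_Andrews", "bornInYear", "1935"),
    ("Woody_Allen", "bornInYear", "1935"), ("Dalai_Lama", "bornInYear", "1935"),
    ("Elvis", "marriedTo", "Priscilla"), ("John_Lennon", "marriedTo", "Yoko_Ono"),
]
year = inverse_functionality(facts, "bornInYear")   # 1 object / 4 facts = 0.25
spouse = inverse_functionality(facts, "marriedTo")  # 2 objects / 2 facts = 1.0
print(match_probability([(year, 1.0)]))                 # 0.25
print(match_probability([(year, 1.0), (spouse, 1.0)]))  # 1.0
```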

Match entities, classes and relations. [Figure: sameAs links between entities, subPropertyOf links between relations, and subClassOf links between classes.] PARIS matches YAGO and DBpedia [Suchanek et al.: VLDB'12]: time: 1:30 hours; precision for instances: 90%; precision for classes: 74%; precision for relations: 96%.

Many challenges remain. Entity linkage is at the heart of semantic data integration. More than 50 years of research, and still some way to go! Benchmarks: OAEI Ontology Alignment & Instance Matching: oaei.ontologymatching.org; TAC KBP Entity Linking; TREC Knowledge Base Acceleration: trec-kba.org. Open issues: highly related entities with ambiguous names (George W. Bush (jun.) vs. George H.W. Bush (sen.)); long-tail entities with sparse context; records with complex DB / XML / OWL schemas; ontologies with non-isomorphic structures.

Outline: 1. Introduction; 2. Harvesting Classes; 3. Harvesting Facts; 4. Common Sense Knowledge; 5. Knowledge Consolidation (in YAGO √, in NELL √, in Google √, KB Alignment √, Linked Data); 6. Web Content Analytics; 7. Wrap-Up. Warning: Numbers mentioned here are not authoritative, because (1) they are based on incomplete crawls or (2) they may be outdated. See the respective sources for details.

Linked Open Data Cloud. [Figure: LOD cloud diagram with KBs grouped by domain: user-generated: 51; media: 24; publications: 138; life sciences: 85; social networking: 520; government: 183; geographic: 27; cross-domain: 47; linguistics.] April 2011: 30 billion triples, 500 million links. From 2011 to 2014, the number of KBs tripled, starting from 297 [Schmachtenberg et al.: ISWC 2014].

Links between KBs [Schmachtenberg et al.: ISWC 2014]. Number of links as of April 2014: unknown (only a sample was crawled). As of April 2011: 500 million. sameAs links at sameAs.org: 150 million. 44% of KBs are not linked at all. Top linking predicates: owl:sameAs, rdfs:seeAlso, dct:source, dct:language, dct:creator, skos:exactMatch, skos:closeMatch. [Figure: KB 1 linked to KB 2 via owl:sameAs.] Watch out: "sameAs" has developed 5 meanings: identical to; same in a different context; same but referentially opaque; represents; very similar to [Halpin & Hayes: "When owl:sameAs isn't the Same", LDOW, 2010].

Dereferencing URIs. Dereferenceability of schemas [Schmachtenberg et al.: ISWC 2014]: full: 19%; partial: 9%; none: 72%. [Hogan et al.: "Weaving the Pedantic Web", LDOW, 2010] In a crawl of 1.6m dereferenceable … [Figure: dereferencing the URI y:Elvis returns RDF about it, e.g., y:Elvis rdf:type y:livingPerson; y:Elvis y:wasBornIn y:USA; …]

Publish the Rubbish [… keynote]

Vocabularies [Schmachtenberg et al.: ISWC 2014].
Usage of standard vocabulary terms (% of KBs): rdfs:range 10%; rdfs:subClassOf 9%; rdfs:subPropertyOf 7%; rdfs:domain 6%; rdfs:isDefinedBy 4%; rdfs:seeAlso 2%; owl:equivalentClass 2%; owl:inverseOf 1%; swivt:type 1%; owl:equivalentProperty 1%.
Larger adoption of standard vocabularies (% of KBs): FOAF: 27% → 69%; Dublin Core: 31% → 56%.

Open Problems and Grand Challenges: automatic and continuously maintained sameAs links for the Web of Linked Data, with high accuracy and coverage; distilling out the high-quality pieces of information; Web-scale, robust entity linking with high quality; handling huge amounts of linked-data sources, Web tables, …