YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. 16th International World Wide Web Conference (WWW 2007). Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Max-Planck-Institut, Saarbrücken / Germany. suchanek@mpii.mpg.de, kasneci@mpii.mpg.de, weikum@mpii.mpg.de

What is YAGO? A light-weight, extensible ontology with high coverage and high quality. It contains 1 million entities and 5 million facts. The facts are extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in the paper.

Motivation: Which other singers were born when Elvis was born? A Google search for this question returns: "Elvis Presley - Wikipedia, the free encyclopedia. Elvis Presley was born on January 8, 1935 at around 4:35 a.m. in a two-room ... Other singers had been doing this for generations, but they were black. ..." (en.wikipedia.org/wiki/Elvis_Presley)

Motivation: Many applications use ontological knowledge, but existing applications draw on only a single source. They could boost their performance if a huge ontology with knowledge from several sources were available. Such an ontology would have to be of high quality, with accuracy close to 100 percent. It would also have to be extensible, easily reusable, and application-independent.

Various Approaches to the Problem: (1) Assemble the ontology manually. Examples: WordNet, SUMO, GeneOntology. Problem: usually low coverage. (2) Extract the ontology from corpora (e.g., the Web). Examples: KnowItAll, Espresso, Snowball, LEILA. Problem: usually low accuracy (50%-92%).

The YAGO Model: An extension of RDFS. It unifies Wikipedia and WordNet and makes use of rich structures and information such as infoboxes and category pages.

Wikipedia Model: Utilizes Wikipedia’s category pages. Category pages are lists of articles that belong to a specific category; for example, Zidane is in the category of French football players. These lists give us candidates for entities (e.g., Zidane) and candidates for relations (e.g., isCitizenOf(Zidane, France)).

WordNet: A lexical database for the English language. It groups English words into sets of synonyms called synsets. It serves as a thesaurus that is more intuitively usable, and it supports automatic text analysis and artificial intelligence applications.

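Why Unify? Wikipedia knows a vast number of individuals, but its category system is not a clean taxonomy. WordNet, in contrast, provides a clean and carefully assembled hierarchy of thousands of concepts. The YAGO ontology combines the two: the vast number of individuals known to Wikipedia plus the clean taxonomy of concepts from WordNet.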

The YAGO Model: Knowledge is represented as a set of concepts and relationships. All objects (e.g., cities, people, even URLs) are represented as entities in the YAGO model. Entities: AlbertEinstein, NobelPrize. Relations: hasWonPrize. Facts: AlbertEinstein hasWonPrize NobelPrize.

The YAGO Model: How do we express that a certain word refers to a certain entity? Words are entities as well: "Einstein" means AlbertEinstein. Similar entities are grouped into classes: AlbertEinstein type physicist.

The YAGO Model: Classes are also entities: physicist subClassOf scientist. How do we represent properties of relations (like transitivity)? Relations are entities as well: subClassOf type transitiveRelation.
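
The point that a relation is itself an entity, so that transitivity is just another fact about it, can be sketched in a few lines of Python. This is an illustration only, not the YAGO implementation: the tuple layout, the extra class `person`, and the helper name `transitive_closure` are assumptions made for the example.

```python
# Facts are (subject, relation, object) triples; relation names follow the slides.
facts = {
    ("AlbertEinstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
    ("scientist", "subClassOf", "person"),  # assumed extra fact for the example
    # A relation is an entity too, so its properties are ordinary facts.
    ("subClassOf", "type", "transitiveRelation"),
}

def transitive_closure(facts):
    """Add (x, r, z) whenever (x, r, y) and (y, r, z) hold and r is declared transitive."""
    closed = set(facts)
    transitive = {s for (s, r, o) in closed
                  if r == "type" and o == "transitiveRelation"}
    changed = True
    while changed:
        changed = False
        for (x, r, y) in list(closed):
            if r not in transitive:
                continue
            for (y2, r2, z) in list(closed):
                if r2 == r and y2 == y and (x, r, z) not in closed:
                    closed.add((x, r, z))
                    changed = True
    return closed

derived = transitive_closure(facts) - facts
print(derived)  # the only new fact: physicist subClassOf person
```

Because the transitivity declaration lives in the fact set itself, declaring another relation transitive needs one more fact and no change to the code.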

The YAGO Model (n-ary relations): In a relational database setting: won-prize-in-year(Einstein, Nobel-Prize, 1921). Disadvantage: space is wasted if not all arguments of the n-ary fact are known. YAGO solution: give each fact an identifier and state further facts about that identifier. #1: AlbertEinstein hasWonPrize NobelPrize. #2: #1 time 1921. #3: #1 foundIn http://www.wikipedia.org/Einstein.
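
The fact-identifier scheme above can be sketched as a toy store in Python. The class `FactStore` and its method names are hypothetical, introduced only for this illustration; the three facts are the ones from the slide.

```python
from itertools import count

class FactStore:
    """Toy store in which every fact gets an identifier usable as an argument of other facts."""

    def __init__(self):
        self._ids = count(1)
        self.facts = {}  # identifier -> (subject, relation, object)

    def add(self, subject, relation, obj):
        fact_id = f"#{next(self._ids)}"
        self.facts[fact_id] = (subject, relation, obj)
        return fact_id

    def about(self, fact_id):
        """Return the facts whose subject is the given fact identifier."""
        return {fid: f for fid, f in self.facts.items() if f[0] == fact_id}

store = FactStore()
won = store.add("AlbertEinstein", "hasWonPrize", "NobelPrize")   # becomes #1
store.add(won, "time", "1921")                                   # becomes #2
store.add(won, "foundIn", "http://www.wikipedia.org/Einstein")   # becomes #3
print(store.about(won))
```

Only the arguments that are actually known are stored: a prize with an unknown year simply has no `time` fact, which is the space saving the slide refers to.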

Semantics: Any YAGO ontology must contain at least a minimal set of common entities, C = {entity, class, relation, acyclicTransitiveRelation}, and a minimal set of relation names, R = {type, subClassOf, domain, range, subRelationOf}.

Properties: Property 1: Given a set of facts F ⊂ ℱ, the largest set S with F →* S is unique. A base of a YAGO ontology y is any equivalent YAGO ontology b with b ⊆ y. A canonical base of y is a base such that no other base has fewer elements. Property 2 (Uniqueness of the Canonical Base): The canonical base of a consistent YAGO ontology is unique.
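
To give a feel for what a base is, the sketch below removes facts that are implied by transitivity alone. It is a simplified illustration under stated assumptions, not the construction from the paper: it handles a single acyclic transitive relation (here subClassOf, stored as pairs), ignores every other derivation rule, and the helper names `reachable` and `minimal_base` are made up for the example.

```python
def reachable(edges, start, goal, skip):
    """Depth-first search from start to goal that ignores the single edge `skip`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(v for (u, v) in edges if u == node and (u, v) != skip)
    return False

def minimal_base(edges):
    """Drop every edge implied by transitivity; assumes the relation is acyclic."""
    return {e for e in edges if not reachable(edges, e[0], e[1], skip=e)}

sub_class_of = {
    ("physicist", "scientist"),
    ("scientist", "person"),
    ("physicist", "person"),  # derivable from the two facts above
}
print(minimal_base(sub_class_of))
```

For an acyclic relation the result does not depend on the order in which edges are examined, which mirrors the uniqueness claim of Property 2 in this restricted setting.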

Stand-out Property: RDFS does not have a built-in transitive relation. With respect to OWL, the only YAGO concept that does not have an exact built-in counterpart is acyclicTransitiveRelation.

THE YAGO SYSTEM

Classes and Categories: To establish the class of each individual, we exploit the category system of Wikipedia. To establish the hierarchy of classes, we use the well-defined taxonomy of synsets in WordNet.

Integrating Wikipedia Categories: There are different types of categories. Exploit conceptual categories (e.g., American_singer, read as "is a"). Exploit relational categories (e.g., born 1935). Avoid administrational categories (e.g., Disputed_article).

Integrating WordNet Synsets: Each synset of WordNet becomes a class of YAGO. To be on the safe side, we always give preference to WordNet and discard the Wikipedia individual in case of a conflict. WordNet relations: hypernymy/hyponymy is the relation between a sub-concept and a super-concept; holonymy/meronymy is the relation between a part and the whole.

Establishing Relations: subClassOf: taken from the hyponymy relation in WordNet. A class is a subclass of another one if the first synset is a hyponym of the second. means: Wikipedia and WordNet both yield information on meaning. In WordNet, for example, "urban center" and "metropolis" both belong to the synset "city".

Establishing Relations (continued): Further relations are extracted from relational Wikipedia categories: bornInYear, diedInYear, establishedIn, locatedIn, writtenInYear, politicianOf, and hasWonPrize. For example, the category "1879 births" means that the individual was born in 1879, and the category "1980 establishments" means that the organization was established in 1980.
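
Extraction from relational categories can be sketched with regular expressions. The patterns below are hypothetical and cover only the two category shapes quoted on the slide; the real system's patterns are not given here, and the function name `facts_from_category` is an assumption.

```python
import re

# Hypothetical patterns; only the two category shapes quoted on the slide are covered.
PATTERNS = [
    (re.compile(r"^(\d{4}) births$"), "bornInYear"),
    (re.compile(r"^(\d{4}) establishments$"), "establishedIn"),
]

def facts_from_category(individual, category):
    """Return (individual, relation, value) facts implied by one category name."""
    found = []
    for pattern, relation in PATTERNS:
        match = pattern.match(category)
        if match:
            found.append((individual, relation, match.group(1)))
    return found

print(facts_from_category("AlbertEinstein", "1879 births"))
print(facts_from_category("AlbertEinstein", "German physicists"))  # no relational pattern applies
```

A category that matches no pattern yields no fact, so conceptual and administrational categories pass through this step untouched.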

Meta-relations: Descriptions and witnesses: we store meta-relations uniformly together with the usual relations, e.g., #1 foundIn http://www.wikipedia.org/Einstein. Context: for each individual, we store the individuals it is linked to in the corresponding Wikipedia page, e.g., Albert Einstein is linked to Relativity Theory.

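[Figure: example graph. The word "Elvis Presley" means an individual that is an American_singer and was born in 1935; American_singer is a subclass of Singer#1; Singer#1 is a subclass of Person#3; the word "singer" means Singer#1.]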

Query: An interface for querying YAGO in a SPARQL-like fashion.
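
What a SPARQL-like query over such facts amounts to can be sketched as triple-pattern matching in Python. This is not the YAGO query interface: the functions `match` and `query` are made up for the example, terms starting with `?` are treated as variables, and `SingerA` and `SingerB` are placeholder individuals rather than real data. The query is the motivating question about singers born in the same year as Elvis.

```python
facts = {
    ("ElvisPresley", "type", "singer"),
    ("ElvisPresley", "bornInYear", "1935"),
    ("SingerA", "type", "singer"),         # placeholder individual
    ("SingerA", "bornInYear", "1935"),
    ("SingerB", "type", "singer"),         # placeholder individual
    ("SingerB", "bornInYear", "1940"),
}

def match(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None if impossible."""
    binding = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def query(patterns, facts):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(pattern, f, b)) is not None]
    return bindings

results = query([("ElvisPresley", "bornInYear", "?year"),
                 ("?who", "bornInYear", "?year"),
                 ("?who", "type", "singer")], facts)
others = sorted(b["?who"] for b in results if b["?who"] != "ElvisPresley")
print(others)  # ['SingerA']
```

The shared variable `?year` is what joins the first two patterns, which is exactly the step a keyword search engine cannot perform.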

Evaluation and Experimentation: Since there is no computer-processable ground truth of suitable extent, we had to rely on manual evaluation. We presented randomly selected facts of the ontology to human judges and asked them to assess whether the facts were correct. It would be pointless to evaluate the non-heuristic relations in YAGO, such as describes, means, or context, so we evaluated only those facts that constitute potentially weak points in the ontology.
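
One standard way to turn such sample judgements into an accuracy estimate is a Wilson score interval. The sketch below assumes that approach; the slides do not state the exact procedure, and the counts are made up for illustration, not results from the YAGO evaluation.

```python
from math import sqrt

def wilson_interval(correct, judged, z=1.96):
    """Wilson score interval for a proportion; z=1.96 corresponds to roughly 95% confidence."""
    if judged == 0:
        raise ValueError("no judged facts")
    p = correct / judged
    denom = 1 + z * z / judged
    centre = (p + z * z / (2 * judged)) / denom
    half = z * sqrt(p * (1 - p) / judged + z * z / (4 * judged * judged)) / denom
    return centre - half, centre + half

# Made-up counts for illustration only; they are not results from the YAGO evaluation.
low, high = wilson_interval(correct=95, judged=100)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the naive normal approximation, this interval stays inside the range 0 to 1 even when almost all judged facts are correct, which is the regime an ontology aiming at near-100-percent accuracy operates in.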

Conclusion: Main features and contributions of YAGO: A high-coverage, high-quality ontology. Integration of two large resources, Wikipedia and WordNet. Use of structured information such as infoboxes, Wikipedia categories, and WordNet synsets. Expression of acyclic transitive relations. Type checking, ensuring that only plausible facts are contained.
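
The type-checking idea can be sketched as a domain and range test applied to each candidate fact. The signatures and class memberships below are hypothetical, chosen only for the example, and `is_plausible` is not a function of the YAGO system.

```python
# Hypothetical signatures and class memberships, chosen only for this example.
SIGNATURES = {"bornInYear": ("person", "year"), "hasWonPrize": ("person", "prize")}
TYPES = {"AlbertEinstein": {"person"}, "NobelPrize": {"prize"}, "1879": {"year"}}

def is_plausible(fact):
    """Accept a fact only if subject and object belong to the relation's domain and range."""
    subject, relation, obj = fact
    if relation not in SIGNATURES:
        return False
    domain, range_ = SIGNATURES[relation]
    return domain in TYPES.get(subject, set()) and range_ in TYPES.get(obj, set())

candidates = [
    ("AlbertEinstein", "hasWonPrize", "NobelPrize"),
    ("NobelPrize", "bornInYear", "1879"),  # a prize cannot be born
]
accepted = [f for f in candidates if is_plausible(f)]
print(accepted)
```

Rejecting rather than repairing an ill-typed candidate trades a little coverage for accuracy, in line with the quality goal stated in the motivation.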

Thank you