Information Extraction Lecture 8 – Ontological and Open IE CIS, LMU München Winter Semester 2013-2014 Dr. Alexander Fraser.

Slides:



Advertisements
Similar presentations
Fabian M. Suchanek SOFIE: A Self-Organizing Framework for Information Extraction 1 SOFIE: A Self-Organizing Framework for Information Extraction Fabian.
Advertisements

Fabian M. SuchanekYAGO - A Core of Semantic Knowledge 1 YAGO – A Core of Semantic Knowledge Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum (Max-Planck.
Fabian M. SuchanekYAGO - A Core of Semantic Knowledge 1 YAGO – A Core of Semantic Knowledge Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum (Max-Planck.
The Application of Machine Translation in CADAL Huang Chen, Chen Haiying Zhejiang University Libraries, Hangzhou, China
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
Information Extraction Lecture 7 – Linear Models (Basic Machine Learning) CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
Information Extraction Lecture 3 – Rule-based Named Entity Recognition CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
YAGO: A Large Ontology from Wikipedia and WordNet Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum Max-Planck-Institute for Computer Science, Saarbruecken,
Information Extraction Lecture 4 – Named Entity Recognition II CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
Modifying existing content Adding/Removing content on a page using jQuery.
Of 27 lecture 7: owl - introduction. of 27 ece 627, winter ‘132 OWL a glimpse OWL – Web Ontology Language describes classes, properties and relations.
Information Extraction Lecture 9 – Multilingual Extraction CIS, LMU München Winter Semester Dr. Alexander Fraser.
Information Extraction session in the course “Web Search” at the École nationale supérieure des télécommunications in Paris/France in spring 2011 by Fabian.
The Unreasonable Effectiveness of Data Alon Halevy, Peter Norvig, and Fernando Pereira Kristine Monteith May 1, 2009 CS 652.
Intelligent Information Retrieval CS 336 –Lecture 2: Query Language Xiaoyan Li Spring 2006 Modified from Lisa Ballesteros’s slides.
Database and Information- Retrieval Methods for Knowledge Discovery Database and Information- Retrieval Methods for Knowledge Discovery Gerhard Weikum,
Open Information Extraction From The Web Rani Qumsiyeh.
Using Information Extraction for Question Answering Done by Rani Qumsiyeh.
Modules, Hierarchy Charts, and Documentation
Information Extraction with Unlabeled Data Rayid Ghani Joint work with: Rosie Jones (CMU) Tom Mitchell (CMU & WhizBang! Labs) Ellen Riloff (University.
Saarbrucken / Germany ¨
Outline 1.Introduction 2.Harvesting Classes 3.Harvesting Facts 4.Common Sense Knowledge 5.Knowledge Consolidation 6.Web Content Analytics 7.Wrap-Up in.
Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.
PREPARING FOR THE OSSLT Thursday, March 27 Glenforest S. S. (
Information Extraction Referatsthemen CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
Probabilistic Model for Definitional Question Answering Kyoung-Soo Han, Young-In Song, and Hae-Chang Rim Korea University SIGIR 2006.
Going Beyond Simple Question Answering Bahareh Sarrafzadeh CS 886 – Spring 2015.
YAGO:A LARGE ONTOLOGY FROM WIKIPEDIA AND WORDNET FABIAN M. SUCHANEK, GJERGJI KASNECI, GERHARD WEIKUM Subbalakshmi Iyer.
Entity Recognition via Querying DBpedia ElShaimaa Ali.
Artificial intelligence project
Information Extraction UCAD, ESP, UGB Senegal, Winter 2011 by Fabian M. SuchanekFabian M. Suchanek This document is available under a Creative Commons.
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
Department of Chemical Engineering Project IV Lecture 3: Literature Review.
Ontology-Based Information Extraction: Current Approaches.
NLP And The Semantic Web Dainis Kiusals COMS E6125 Spring 2010.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
Structured Querying of Web Text A Technical Challenge Kulsawasd Jitkajornwanich University of Texas at Arlington CSE6339 Web Mining.
Theory and Application of Database Systems A Hybrid Approach for Extending Ontology from Text He Wei.
Information Extraction Lecture 8 – Ontological and Open IE Dr. Alexander Fraser, U. Munich September 10th, 2014 ISSALE: University of Colombo School of.
BAA - Big Mechanism using SIRA Technology Chuck Rehberg CTO at Trigent Software and Chief Scientist at Semantic Insights™
Fabian M. SuchanekLEILA - Learning to Extract Information by Linguistic Analysis 1 LEILA – Learning to Extract Information by Linguistic Analysis presented.
Automatic Set Instance Extraction using the Web Richard C. Wang and William W. Cohen Language Technologies Institute Carnegie Mellon University Pittsburgh,
An Iterative Approach to Extract Dictionaries from Wikipedia for Under-resourced Languages G. Rohit Bharadwaj Niket Tandon Vasudeva Varma Search and Information.
Trustworthy Semantic Webs Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #4 Vision for Semantic Web.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.
CSC USI Class Meeting 10 November 9, 2010.
Information Extraction Lecture 4 – Named Entity Recognition II CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
Information Extraction Lecture 3 – Rule-based Named Entity Recognition Dr. Alexander Fraser, U. Munich September 3rd, 2014 ISSALE: University of Colombo.
Tutorial: Knowledge Bases for Web Content Analytics
Information Extraction Lecture 3 – Rule-based Named Entity Recognition CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
Gaby Nativ, SDBI  Motivation  Other Ontologies  System overview  YAGO Dive IN  LEILA  NAGA  Conclusion.
Information Extraction Lecture 10 – Ontological and Open IE
Semantic Wiki: Automating the Read, Write, and Reporting functions Chuck Rehberg, Semantic Insights.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
Johannes Hoffart, Fabian Suchanek Klaus BerberichBy Gerhard Weikum Madhav Charan.
Institute of Informatics & Telecommunications NCSR “Demokritos” Spidering Tool, Corpus collection Vangelis Karkaletsis, Kostas Stamatakis, Dimitra Farmakiotou.
Einat Minkov University of Haifa, Israel CL course, U
An Introduction to Markov Logic Networks in Knowledge Bases
NELL Knowledge Base of Verbs
Information Extraction Lecture 3 – Rule-based Named Entity Recognition
Information Extraction from Wikipedia: Moving Down the Long Tail
ece 720 intelligent web: ontology and beyond
Yet Another Great Ontology
Extracting Semantic Concept Relations
Introduction Task: extracting relational facts from text
DBpedia 2014 Liang Zheng 9.22.
ProBase: common Sense Concept KB and Short Text Understanding
Representations & Reasoning Systems (RRS) (2.2)
Presentation transcript:

Information Extraction Lecture 8 – Ontological and Open IE CIS, LMU München Winter Semester Dr. Alexander Fraser

Administravia Today: 45 minute lecture + one two- person Referat Next week: the same Review session next week on Tuesday at 18:00, check web page for location Klausur the following week! Nachholklausur – probably April 2nd (week before Vorlesungsbeginn - check web page!) 2

Ontological IE In the last lecture, we discussed how to extract relations from text We looked in detail at relations expressed in a single sentence Event extraction captures relations which can be expressed at the document level (i.e., in multiple sentences) Consider the CMU Seminar task – the task is to extract events (seminars), with their date, location and title Sometimes referred to as "Fact Extraction" Today we will discuss updating a knowledge base with the extracted relations, facts or events This is called "Ontological IE" 3

Ontologies 4 An ontology is a consistent knowledge base without redundancy EntityRelationEntity Angela MerkelcitizenOfGermany PersonNationality Angela MerkelGerman MerkelGermany A.MerkelFrench  Every entity appears only with exactly the same name There are no semantic contradictions Slide from Suchanek

Ontological IE PersonNationality Angela MerkelGerman MerkelGermany A.MerkelFrench Angela Merkel is the German chancellor Merkel was born in Germany......A. Merkel has French nationality... 5 Ontological Information Extraction (IE) aims to create or extend an ontology. EntityRelationEntity Angela MerkelcitizenOfGermany Slide from Suchanek

Ontological IE Challenges 6 Challenge 1: Map names to names that are already known EntityRelationEntity Angela MerkelcitizenOfGermany A. Merkel Angie Merkel Slide from Suchanek

Ontological IE Challenges 7 Challenge 2: Be sure to map the names to the right known names EntityRelationEntity Angela MerkelcitizenOfGermany Una MerkelcitizenOfUSA ? Merkel is great! Slide from Suchanek

Ontological IE Challenges 8 Challenge 3: Map to known relationships EntityRelationEntity Angela MerkelcitizenOfGermany … has nationality … … has citizenship … … is citizen of … Slide from Suchanek

Ontological IE Challenges 9 Challenge 4: Take care of consistency EntityRelationEntity Angela MerkelcitizenOfGermany Angela Merkel is French…  Slide from Suchanek

Triples 10 EntityRelationEntity Angela MerkelcitizenOfGermany A triple (in the sense of ontologies) is a tuple of an entity, a relation name and another entity: Most ontological IE approaches produce triples as output. This decreases the variance in schema. PersonCountry AngelaGermany PersonBirthdateCountry Angela1980Germany CitizenNationality AngelaGermany Slide from Suchanek

Triples 11 EntityRelationEntity Angela MerkelcitizenOfGermany A triple can be represented in multiple forms: citizenOf = = Slide from Suchanek

YAGO Example: Elvis in YAGOElvis in YAGO 12 Slide from Suchanek

Wikipedia Why is Wikipedia good for information extraction? It is a huge, but homogenous resource (more homogenous than the Web) It is considered authoritative (more authoritative than a random Web page) It is well-structured with infoboxes and categories It provides a wealth of meta information (inter article links, inter language links, user discussion,...) 13 Wikipedia is a free online encyclopedia 3.4 million articles in English 16 million articles in dozens of languages Slide from Suchanek

Ontological IE from Wikipedia Wikipedia is a free online encyclopedia 3.4 million articles in English 16 million articles in dozens of languages Every article is (should be) unique => We get a set of unique entities that cover numerous areas of interest Angela_Merkel Una_Merkel Germany Theory_of_Relativity 14 Slide from Suchanek

Wikipedia Source Example: Elvis on WikipediaElvis on Wikipedia |Birth_name = Elvis Aaron Presley |Born = {{Birth date|1935|1|8}} [[Tupelo, Mississippi|Tupelo]] 15 Slide from Suchanek

IE from Wikipedia 1935 born Elvis Presley Blah blah blub fasel (do not read this, better listen to the talk) blah blah Elvis blub (you are still reading this) blah Elvis blah blub later became astronaut blah ~Infobox~ Born: Exploit Infoboxes Categories: Rock singers 16 bornOnDate = 1935 (hello regexes!) Slide from Suchanek

IE from Wikipedia Rock Singer type Exploit conceptual categories 1935 born Elvis Presley Blah blah blub fasel (do not read this, better listen to the talk) blah blah Elvis blub (you are still reading this) blah Elvis blah blub later became astronaut blah ~Infobox~ Born: Exploit Infoboxes Categories: Rock singers 17 Slide from Suchanek

IE from Wikipedia Rock Singer type Exploit conceptual categories 1935 born Singer subclassOf Person subclassOf Singer subclassOf Person Elvis Presley Blah blah blub fasel (do not read this, better listen to the talk) blah blah Elvis blub (you are still reading this) blah Elvis blah blub later became astronaut blah ~Infobox~ Born: Exploit Infoboxes WordNet Categories: Rock singers 18 Every singer is a person Slide from Suchanek

19 Consistency Checks Rock Singer type Check uniqueness of functional arguments 1935 born Singer subclassOf Person subclassOf 1977 diedIn Place GuitaristGuitar Check domains and ranges of relations Check type coherence Slide from Suchanek

Ontological IE from Wikipedia YAGO 3m entities, 28m facts focus on precision 95% (automatic checking of facts) DBpedia 3.4m entities 1b facts (also from non-English Wikipedia) large community Community project on top of Wikipedia (bought by Google, but still open) 20 Slide from Suchanek

born Recap: The challenges: deliver canonic relations deliver canonic entities deliver consistent facts died in, was killed in Elvis, Elvis Presley, The King born (Elvis, 1970) born (Elvis, 1935) Ontological IE by Reasoning Idea: These problems are interleaved, solve all of them together. Elvis was born in 1935 Slide from Suchanek

Ontology Documents Elvis was born in 1935 Consistency Rules birthdate<deathdate type(Elvis_Presley,singer) subclassof(singer,person)... appears(“Elvis”,”was born in”, ”1935”)... means(“Elvis”,Elvis_Presley,0.8) means(“Elvis”,Elvis_Costello,0.2)... born(X,Y) & died(X,Z) => Y<Z appears(A,P,B) & R(A,B) => expresses(P,R) appears(A,P,B) & expresses(P,R) => R(A,B)... First Order Logic 1935 born Using Reasoning SOFIE system Slide from Suchanek

Ontological IE by Reasoning Reasoning-based approaches use logical rules to extract knowledge from natural language documents. Current approaches use either Weighted MAX SAT or Datalog or Markov Logic Input: often an ontology manually designed rules Condition: homogeneous corpus helps 23 Slide from Suchanek

Ontological IE Summary Current hot approaches: extraction from Wikipedia reasoning-based approaches nationality Ontological Information Extraction (IE) tries to create or extend an ontology through information extraction. 24 Slide from Suchanek

Open Information Extraction Open Information Extraction/Machine Reading aims at information extraction from the entire Web. Vision of Open Information Extraction: the system runs perpetually, constantly gathering new information the system creates meaning on its own from the gathered data the system learns and becomes more intelligent, i.e. better at gathering information 25 Slide from Suchanek

Open Information Extraction Open Information Extraction/Machine Reading aims at information extraction from the entire Web. Rationale for Open Information Extraction: We do not need to care for every single sentence, but just for the ones we understand The size of the Web generates redundancy The size of the Web can generate synergies 26

KnowItAll &Co KnowItAll, KnowItNow and TextRunner are projects at the University of Washington (in Seattle, WA). SubjectVerbObjectCount Egyptiansbuiltpyramids400 Americansbuiltpyramids20... Valuable common sense knowledge (if filtered) 27 Slide from Suchanek

KnowItAll &Co 28 Slide from Suchanek

Read the Web “Read the Web” is a project at the Carnegie Mellon University in Pittsburgh, PA. Natural Language Pattern Extractor Table Extractor Mutual exclusion Type Check Krzewski coaches the Blue Devils. Krzewski Blue Angels Miller Red Angels sports coach != scientist If I coach, am I a coach? Initial Ontology 29 Slide from Suchanek

Open IE: Read the Web 30 Slide from Suchanek

Open Information Extraction Open Information Extraction/Machine Reading aims at information extraction from the entire Web. Main hot projects TextRunner (University of Washington) Read the Web (Carnegie Mellon) Prospera/SOFIE (Max-Planck Informatics Saarbrücken) Input The Web Read the Web: Manual rules Read the Web: initial ontology Conditions none 31 Slide modified from Suchanek

Slide sources – The slides today came from the Information Extraction course of Fabian Suchanek (Télécom ParisTech) 32

Thank you for your attention! 33