Harith Alani, Sanghee Kim, David Millard, Mark Weal, Paul Lewis, Wendy Hall, Nigel Shadbolt Using Protégé for Automatic Ontology Instantiation 7 th International.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
ArtEquAKT Harith Alani, Sanghee Kim, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, Mark Weal.
Hermes: News Personalization Using Semantic Web Technologies
SPICE! An Ontology Based Web Application By Angela Maduko and Felicia Jones Final Presentation For CSCI8350: Enterprise Integration.
References Kempen, Gerard & Harbusch, Karin (2002). Performance Grammar: A declarative definition. In: Nijholt, Anton, Theune, Mariët & Hondorp, Hendri.
March 17, 2008SAC WT Hermes: a Semantic Web-Based News Decision Support System* Flavius Frasincar Erasmus University Rotterdam.
Mark Weal Auld Linky and the AKT - EQUATOR Bridge.
Overall Information Extraction vs. Annotating the Data Conference proceedings by O. Etzioni, Washington U, Seattle; S. Handschuh, Uni Krlsruhe.
COMP 6703 eScience Project Semantic Web for Museums Student : Lei Junran Client/Technical Supervisor : Tom Worthington Academic Supervisor : Peter Strazdins.
Information Extraction and Ontology Learning Guided by Web Directory Authors:Martin Kavalec Vojtěch Svátek Presenter: Mark Vickers.
Shared Ontology for Knowledge Management Atanas Kiryakov, Borislav Popov, Ilian Kitchukov, and Krasimir Angelov Meher Shaikh.
Web Crawler Agent (WCA) Presented by Kirk Martinez University of Southampton.
Danius T. Michaelides, David E. Millard, Mark J. Weal, David De Roure Auld Leaky: A Contextual Open Hypermedia Link Server.
Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Information Extraction from HTML: General Machine Learning Approach Using SRV.
Automatically Constructing a Dictionary for Information Extraction Tasks Ellen Riloff Proceedings of the 11 th National Conference on Artificial Intelligence,
Information Extraction from Documents for Automating Softwre Testing by Patricia Lutsky Presented by Ramiro Lopez.
Systems Analysis I Data Flow Diagrams
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Page 1 ISMT E-120 Introduction to Microsoft Access & Relational Databases The Influence of Software and Hardware Technologies on Business Productivity.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
CS 586 – Distributed Multimedia Information Management Prof. Dennis McLeod.
Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA.
Logic Programming for Natural Language Processing Menyoung Lee TJHSST Computer Systems Lab Mentor: Matt Parker Analytic Services, Inc.
TDT 4242 Inah Omoronyia and Tor Stålhane Guided Natural Language and Requirement Boilerplates TDT 4242 Institutt for datateknikk og informasjonsvitenskap.
“How much context do you need?” An experiment about context size in Interactive Cross-language Question Answering B. Navarro, L. Moreno-Monteagudo, E.
Copyright © 2013 Curt Hill The Zachman Framework What is it all about?
Author: William Tunstall-Pedoe Presenter: Bahareh Sarrafzadeh CS 886 Spring 2015.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
A Language Independent Method for Question Classification COLING 2004.
BAA - Big Mechanism using SIRA Technology Chuck Rehberg CTO at Trigent Software and Chief Scientist at Semantic Insights™
LOD for the Rest of Us Tim Finin, Anupam Joshi, Varish Mulwad and Lushan Han University of Maryland, Baltimore County 15 March 2012
SupervisorStudent Prof. Atilla ElciHussam Hussein ABUAZAB June 2007 Using ORACLE XML Parser to Access Ontology CMPE 588 Engineering Semantic for.
© Copyright 2013 STI INNSBRUCK “How to put an annotation in HTML?” Ioannis Stavrakantonakis.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
© Geodise Project, University of Southampton, Knowledge Management in Geodise Geodise Knowledge Management Team Barry Tao, Colin Puleston, Liming.
Towards the Semantic Web 6 Generating Ontologies for the Semantic Web: OntoBuilder R.H.P. Engles and T.Ch.Lech 이 은 정
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
Authors: Marius Pasca and Benjamin Van Durme Presented by Bonan Min Weakly-Supervised Acquisition of Open- Domain Classes and Class Attributes from Web.
OWL Representing Information Using the Web Ontology Language.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
ESIP Semantic Web Products and Services ‘triples’ “tutorial” aka sausage making ESIP SW Cluster, Jan ed.
Topic Maps introduction Peter-Paul Kruijsen CTO, Morpheus software ISOC seminar, april 5 th 2005.
Semantic web Bootstrapping & Annotation Hassan Sayyadi Semantic web research laboratory Computer department Sharif university of.
Winter 2011SEG Chapter 11 Chapter 1 (Part 1) Review from previous courses Subject 1: The Software Development Process.
An Ontological Approach to Financial Analysis and Monitoring.
Adaptive Faceted Browsing in Job Offers Danielle H. Lee
Semantic Data Extraction for B2B Integration Syntactic-to-Semantic Middleware Bruno Silva 1, Jorge Cardoso 2 1 2
1 Question Answering and Logistics. 2 Class Logistics  Comments on proposals will be returned next week and may be available as early as Monday  Look.
Supporting Collaborative Ontology Development in Protégé International Semantic Web Conference 2008 Tania Tudorache, Natalya F. Noy, Mark A. Musen Stanford.
Semantic Wiki: Automating the Read, Write, and Reporting functions Chuck Rehberg, Semantic Insights.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
General Architecture of Retrieval Systems 1Adrienn Skrop.
Semantic Web Technologies Readings discussion Research presentations Projects & Papers discussions.
Harnessing the Deep Web : Present and Future -Tushar Mhaskar Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, Alon Halevy January 7,
Dr. Barry Norton, Development Manager, ResearchSpace*
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Big Data Quality the next semantic challenge
Searching and browsing through fragments of TED Talks
Semantic Markup for Semantic Web Tools:
Probabilistic Databases
Linked Data Reuse in the Language Services Industry
A framework for ontology Learning FROM Big Data
Presentation transcript:

Harith Alani, Sanghee Kim, David Millard, Mark Weal, Paul Lewis, Wendy Hall, Nigel Shadbolt Using Protégé for Automatic Ontology Instantiation 7 th International Protégé Conference

ArtEquAKT Aims : –Use NLT to automatically extract relevant information about the life and work of artists from online documents –Feed this information automatically to an ontology designed for this domain –Generate stories by extracting and structuring information from the knowledge base in the form of biographical narratives

Motivation The knowledge is out there! –Available on the web, buried in text documents, not understood by machines! Semantic annotation might help –Annotations are rare –In the near future, annotations will probably not be rich or detailed enough to support the capture of extended amounts of content Knowledge extraction –There will always be a need for tools that can locate and extract specific types of knowledge, and store it in a KB for further inference and use

Architecture

ArtEquAKT Ontology Based on the Conceptual Reference Model (CRM) ontology Developed by CIDOC and promoted as an ISO standard CRM models the concepts and relationships used in cultural heritage documentation CRM is extended in ArtEquAKT to cover the life and work of artists

User Interface

Search and Filter Documents Documents are selected following these steps: 1.Query search engine (Google) with the given artist name 1.Calculate the similarity of the returned documents to some example documents about artists 1.Apply some heuristics (e.g. minimum paragraph length) to filter out documents containing mainly tables or hyperlinks 1.Send the remaining documents to the information extraction process

Knowledge Extraction Component

Knowledge Extraction Process Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands. Date Place Person date_of_birth place_of_birth Artequakt Ontology Rembrandt GATE person July 15, 1606 date Netherlandslocation Leiden WordNet city Netherlandscountry Apple Pie Parser Rembrandtnoun wasverb Subject:Rembrandt Harmensoon van Rijn Verb:born Object:on July 15, 1606 … Netherlands Tense:past Syntactic Analysis Input HTML A r t e q u a k t K n o w l e d g e E x t r a c t i o n T o o l Relation Formulation

Send the identified triples to the ontology server: 1. Person_1 Rembrandt … 2. Person_1 15 July Person_1 Leiden Extraction Output <kb:Person rdf:about="&kb;Person_1" kb:name=“Rembrandt Harmenszoon van Rijn" rdfs:label="Person_1"> <kb:Date rdf:about="&kb;Date_1" kb:day=“15" kb:month=“7" kb:year="1606" rdfs:label="Date_1"> <kb:Place rdf:about="&kb;Place_1" kb:name=“Leiden" rdfs:label="Place_1"/> “Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands” extracted triples name date_of_birth place_of_birth RDF add to KB

Knowledge Management Component

Knowledge Management Process Provide guidance to the extraction process Receives extracted knowledge in RDF format Instantiate the ontology with the given knowledge triples (add to the KB) Consolidation the knowledge Verify inconsistencies Ontology server providing a set of inference queries

Knowledge Consolidation

Types of Duplication Rembrandt Leyden1606 Rembrandt Leiden15 July 1606 duplicate attribute values Rembrandt van Rijn Leiden1606 Rembrandt Leiden1606 duplicate instances of the same artist Rembrandt Leyden1606Leiden15 July 1606 duplicate instances and attribute values Rembrandt van Rijn Leiden 15 July 1606 Leyden dob pob synonym

Unique Name Assumption –e.g all “Rembrandts” are merged –Not fool-proof, but works well in this limited domain Information Overlap –Merge similarly named artists if they share specific attribute values –e.g. Rembrandt, and Rembrandt Harmenszoon share a date of birth and a place of birth Merge less specific information into more detailed ones –This is mainly performed for dates and places e.g 1606 is merged into 15/7/1606; Netherlands is merged into Leiden –Place names are expanded with WordNet Synonyms: Leiden = Leyden Holonyms (part of): Leiden is part of The Netherlands What if there is more than one Leiden? How do we know which to select? –Use the specificity variation of the given place for disambiguation –e.g. we are here looking for a Leiden that is related to the Netherlands Consolidation Procedure

Verifying Inconsistencies

We don’t aim for “the right answer”, but for some sort of a confidence value But which answer is more likely to be the correct one? –Trust: certain sources can be more trusted than others, but how do we judge that? –Frequency: certain facts might be extracted more often than others –Extraction: some extraction rules are more reliable than others!

Instantiated Ontology

Narrative Generation Component

1 Level of Detail (LoD) LoD 212 Intro paragraph : DOB + place Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands. His father was a miller who wanted the boy to follow a learned profession, but Rembrandt left the University of Leiden to study painting. Paragraph with DOB and Place Best option is to have one paragraph that contains both pieces of information Sequence Narrative Generation

1 Level of Detail (LoD) 2 Sequence LoD 212 Intro paragraph : DOB + place Constructed sentence: Rembrandt was born on July 15, DOB Otherwise need a sequence of two fragments (DOB and place). Either use a paragraph for each fragment, or construct out of raw facts FOHM Template

Example Biography

ArtEquAKT Challenges Extraction –Some fact are too complex to extract –Rule based IE is not always sufficient –Mapping of ontology terms to those in the text is unreliable (better for the ontology editor to include synonymous terms) Generation –A much wider range of facts should be extracted to be able to generate the biographies from scratch –Narrative construction may require richer semantic support (e.g. ontology of narrative) –Generation is not error free. We rely on people’s ability to parse and understand text –Difficult to track what facts has been included in the biography if these facts have not bee identified Consolidation –Unreliable if the facts are extracted incorrectly –Could be inaccurate with spars information –Geographical expansion can be wrong for places with same names Planning a bid for a second generation of ArtEquAKT –Entirely ontology driven –Domain independent –Much better text generation

Questions you may want to ask! 1.So does this system work with other domains? 2.Why bother with biographies anyway! There are many out there already! 3.Why extract knowledge, then use whole paragraphs in your biographies?! 4.Did you evaluate any of this? 5.What kind of knowledge did you manage to extract? 6.What did you say that Armadillo thing does? 7.How can we get GATE to recognise different entities? 8.How much rubbish does your system extract? 9.Can we use this system?! ….. please? 10.How would you like me to fund you? cash or check?