Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

…to Ontology Repositories Mathieu dAquin Knowledge Media Institute, The Open University From…
1 OOA-HR Workshop, 11 October 2006 Semantic Metadata Extraction using GATE Diana Maynard Natural Language Processing Group University of Sheffield, UK.
Using Several Ontologies for Describing Audio-Visual Documents: A Case Study in the Medical Domain Sunday 29 th of May, 2005 Antoine Isaac 1 & Raphaël.
26/10/2008 SWESE'08 1 Enhanced Semantic Access to Software Artefacts Danica Damljanović and Kalina Bontcheva.
AHRT: The Automated Human Resources Tool BY Roi Ceren Muthukumaran Chandrasekaran.
SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation Presented by: Hussain Sattuwala Stephen Dill, Nadav Eiron, David Gibson,
Semiautomatic Generation of Data-Extraction Ontologies Master’s Thesis Proposal Yihong Ding.
Semantic Annotation for Multilingual Search Shibamouli Lahiri
Research topics Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.
Applications Chapter 9, Cimiano Ontology Learning Textbook Presented by Aaron Stewart.
OWL-AA: Enriching OWL with Instance Recognition Semantics for Automated Semantic Annotation 2006 Spring Research Conference Yihong Ding.
Gimme’ The Context: Context- driven Automatic Semantic Annotation with CPANKOW Philipp Cimiano et al.
Shared Ontology for Knowledge Management Atanas Kiryakov, Borislav Popov, Ilian Kitchukov, and Krasimir Angelov Meher Shaikh.
Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Towards Semantic Web: An Attribute- Driven Algorithm to Identifying an Ontology Associated with a Given Web Page Dan Su Department of Computer Science.
A Framework for Named Entity Recognition in the Open Domain Richard Evans Research Group in Computational Linguistics University of Wolverhampton UK
Text mining and the Semantic Web Dr Diana Maynard NLP Group Department of Computer Science University of Sheffield.
Towards a semantic extraction of named entities Diana Maynard, Kalina Bontcheva, Hamish Cunningham University of Sheffield, UK.
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Erasmus University Rotterdam Introduction Nowadays, emerging news on economic events such as acquisitions has a substantial impact on the financial markets.
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
Logic Programming for Natural Language Processing Menyoung Lee TJHSST Computer Systems Lab Mentor: Matt Parker Analytic Services, Inc.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
1 Large Scale Semantic Annotation, Indexing, and Search at The National Archives Diana Maynard Mark Greenwood University of Sheffield, UK.
Final Review 31 October WP2: Named Entity Recognition and Classification Claire Grover University of Edinburgh.
Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.
Mining the Semantic Web: Requirements for Machine Learning Fabio Ciravegna, Sam Chapman Presented by Steve Hookway 10/20/05.
Survey of Semantic Annotation Platforms
University of Sheffield, NLP Entity Linking Kalina Bontcheva © The University of Sheffield, This work is licensed under the Creative Commons.
The PrestoSpace Project Valentin Tablan. 2 Sheffield NLP Group, January 24 th 2006 Project Mission The 20th Century was the first with an audiovisual.
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
1 Technologies for (semi-) automatic metadata creation Diana Maynard.
Populating A Knowledge Base From Text Clay Fink, Tim Finin, Christine Piatko and Jim Mayfield.
Extracting Metadata for Spatially- Aware Information Retrieval on the Internet Clough, Paul University of Sheffield, UK Presented By Mayank Singh.
Semantic Technologies & GATE NSWI Jan Dědek.
Web-Assisted Annotation, Semantic Indexing and Search of Television and Radio News (proceedings page 255) Mike Dowman Valentin Tablan Hamish Cunningham.
© Copyright 2008 STI INNSBRUCK Semantic Annotation Semantic Web Lecture Dieter Fensel.
8 Apr, 2005 OWLIM - OWL DLP support within Sesame Damyan Ognyanov Ontotext Lab, Sirma AI.
1 Language Technologies (1) Diana Maynard University of Sheffield, UK ACAI 05 ADVANCED COURSE ON KNOWLEDGE DISCOVERY.
Benchmarking ontology-based annotation tools for the Semantic Web Diana Maynard University of Sheffield, UK.
Oracle Database 11g Semantics Overview Xavier Lopez, Ph.D., Dir. Of Product Mgt., Spatial & Semantic Technologies Souripriya Das, Ph.D., Consultant Member.
Using Several Ontologies for Describing Audio-Visual Documents: A Case Study in the Medical Domain Sunday 29 th of May, 2005 Antoine Isaac 1 & Raphaël.
BioRAT: Extracting Biological Information from Full-length Papers David P.A. Corney, Bernard F. Buxton, William B. Langdon and David T. Jones Bioinformatics.
Ontea: Pattern based Annotation Platform Michal Laclavík.
Towards the Semantic Web 6 Generating Ontologies for the Semantic Web: OntoBuilder R.H.P. Engles and T.Ch.Lech 이 은 정
1 Language Technologies (2) Valentin Tablan University of Sheffield, UK ACAI 05 ADVANCED COURSE ON KNOWLEDGE DISCOVERY.
2015/12/121 Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Proceeding of the 18th International.
Jed Hassell, Boanerges Aleman-Meza, Budak ArpinarBoanerges Aleman-MezaBudak Arpinar 5 th International Semantic Web Conference Athens, GA, Nov. 5 – 9,
Named Entity Disambiguation on an Ontology Enriched by Wikipedia Hien Thanh Nguyen 1, Tru Hoang Cao 2 1 Ton Duc Thang University, Vietnam 2 Ho Chi Minh.
Semantic web Bootstrapping & Annotation Hassan Sayyadi Semantic web research laboratory Computer department Sharif university of.
Natural Language Interfaces to Ontologies Danica Damljanović
Acquisition of Categorized Named Entities for Web Search Marius Pasca Google Inc. from Conference on Information and Knowledge Management (CIKM) ’04.
1 Open Ontology Repository initiative - Planning Meeting - Thu Co-conveners: PeterYim, LeoObrst & MikeDean ref.:
Semantic Web Course - Semantic Annotation
Co-funded by the European Union Semantic CMS Community Reference Architecture for Semantic CMS Copyright IKS Consortium 1 Lecturer Organization Date of.
Clinical research data interoperbility Shared names meeting, Boston, Bosse Andersson (AstraZeneca R&D Lund) Kerstin Forsberg (AstraZeneca R&D.
An Ontological Approach to Financial Analysis and Monitoring.
WIKT 2007Košice, november Tvorba sémantických metadát Michal Laclavík Ústav Informatiky SAV.
Multi-Source Information Extraction Valentin Tablan University of Sheffield.
Improving Data Discovery Through Semantic Search
Social Knowledge Mining
Searching and browsing through fragments of TED Talks
Content Augmentation for Mixed-Mode News Broadcasts Mike Dowman
CS246: Information Retrieval
Hierarchical, Perceptron-like Learning for OBIE
Presentation transcript:

Toward Semantic Web Information Extraction B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov Presenter: Yihong Ding

Toward a Semantic Web Fully automatic methods for the semantic annotation are needed Related topics  Information retrieval (IR)  Information extraction (IE)  Name-entity recognition (NER)  Annotation processes

Semantic Annotation Diagram

Name Entities Named Entities (NE)  people, organizations, locations, and others referred by name. May also include scalars and expressions  numbers, amounts of money, dates, etc. (NUMEX, TIMEX) Hypothesis  Named entities (and the relations between them) mentioned in a resource constitute an important part of its semantics

Semantic Annotation of NEs Semantic Annotation of the NEs in a text includes:  Recognition of the type of the entities in the text  Identification of the entity individual Comparison  the traditional NER approach results in: Yihong Ding  the Semantic Annotation of NEs should result in something like the following: Yihong Ding

The KIM Platform The Knowledge and Information Management Platform provides:  Automatic Semantic Annotation of NEs (and relations between them)  Ontology Population with NE individuals and relations  Indexing and Retrieval w.r.t NEs  Query and Navigation over the Formal Knowledge

KIM Constituents KIM Ontology (KIMO) KIM World KB KIM Server – with API for remote access and integration Front-ends: KIM Web UI, Plug-in for Internet Explorer, and KB Explorer

KIM Bases KIM is based on the following open-source platforms:  GATE – NLP and IE platform in University of Sheffield  Sesame – RDF(S) repository Administrator b.v. Ontology Middleware and Custom Inference by Ontotext as extensions of Sesame  Lucene – open source IR-engine from Apache

KIM Architecture

KIM Ontology (KIMO) Light-weight upper- level ontology  250 NE classes  100 relations and attributes: covers mostly NE classes, and ignores general concepts includes classes representing lexical resources kimo.rdfs

KIM World KB A projection of the world (domain ontology)  Quasi-exhaustive coverage of the most popular entities in the world  Entities of general importance – like the ones that appear in the news At present KIM KB consists of about 200,000 entities:  50,000 locations, 130,000 organizations, 6000 people, etc.

Entity Description NEs are represented in KIM World KB with their Semantic Descriptions consisting of…  Aliases (Florida & FL)  Relations with other entities (Person hasPosition Position)  Attributes (latitude & longitude of geographic entities)  Proper class of the NE

KIM Server APIs for:  Semantic Annotation  Document Persistence  Indexing & Retrieval of documents w.r.t NEs  Semantic Repository Access & Exploration

KIM Semantic Information Extraction Based on GATE  NLP IE platform  Rules now based on ontology classes instead of a flat set of NE types Recognition and Identification of the NEs IE supported by a Semantic Repository  Containing lexical and gazetteer resources  Annotations referring to Entity Descriptions Ontology Population with the newly recognized entities & relations

KIM IE Pipeline

KIM Plug-in

KIM IE Performance Evaluated over 3 human-annotated corpora of news articles:  International Business News, International Political News, and UK Political News (~500 articles): Precision 86%, Recall 84% w.r.t the standard NE types But these metrics are not representative for semantic annotation

Semantic Annotation Metrics There are no established metrics for semantic annotation:  No human-annotated corpora with precise class and instance information  No metrics for various partial matches When a more specific class is recognized When a more general class is recognized When the class is correctly recognized, but the individual entity is not correctly identified.

Conclusion It is possible to adopt traditional IE techniques for semantic annotation It is worth using almost-exhaustive entity knowledge for IE KIM is still under development  Proper evaluation metrics  Precise disambiguation  More advanced IE techniques  KIM ontology and KB development