Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding.

Presentation transcript:

Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding

2 Motivation
The representation of web content limits its usability
A machine-understandable web
 –Shared, explicit, formal conceptualizations (ontologies)
 –The semantic web

3 A Problem
How can the current web be transformed into the semantic web?

4 A Solution: Semantic Annotation
Add explicit, formal, and unambiguous metadata to web documents
 –Explicit: publicly accessible
 –Formal: publicly agreeable
 –Unambiguous: publicly identifiable

5 Annotation Representation
Explicit Annotation
Implicit Annotation

6 Semantic Annotation: Current Research Status
Manual annotation through friendly interfaces [Annotea, etc.]
Automatic annotation with ontology generation [SCORE]
Automatic annotation using automated IE tools based on pre-defined ontologies [SemTag, MnM, etc.]

7 Current Automatic Annotator: A Typical Paradigm
[Diagram labels: Document; Non-ontology-based IE Wrapper; Rules and extracting categories; Domain Ontology]
(1) Extraction
(2) Alignment
(3) Annotation

8 Current Automatic Annotator: Problems
[Diagram labels: Document; Non-ontology-based IE Wrapper; Rules and extracting categories; Domain Ontology]
(1) Problem of data recognition
(2) Problem of concept disambiguation
(3) Problem of annotation formatting, storing, indexing, sharing
(4) Problem of assembling ontologies

9 “Main Drawback of Using Automated IE” [Kiryakov04]
“none of these approaches expects an input or produces output with respect to ontologies”
“a set of heuristics for post-processing and mapping of the IE results to an ontology … not sufficient for large-scale, domain-independent semantic annotation.”
“IE and wrapper induction techniques need to use the ontology more directly during the process of extraction.”

10 Ontology-driven Paradigm (Data-Extraction Ontology) for Semantic Annotation
[Diagram labels: Document; Non-ontology-based IE Wrapper; Ontology-based IE Wrapper; Document]

11 Ontology-driven Paradigm for Semantic Annotation: Some Arguments
Resiliency w.r.t. web page layouts (helps scale to large sets of web pages)
Adaptiveness w.r.t. domain specifications (helps scale to large domains)
Creation of ontologies: still a problem but no longer a drawback
Speed of execution: still a drawback (but we are going to propose a solution next)

12 Two-Layer Annotation Model
[Diagram labels: Document; Conceptual Annotator using an ontology-based IE tool; Sample Annotation Process; Structural Annotator; Similar Documents; Massive Annotation Process]

13 Structural Annotator
Major components
 –HTML hierarchical path that leads to concept locations
 –Local context around locations
 –Dependencies among multiple semantic categories
Significance
 –Identify both categories and their semantic meanings
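
The first component above, an HTML hierarchical path leading to a concept location, can be illustrated with a minimal sketch. This is not the proposal's structural annotator; it only assumes that a path can be represented as the chain of open tags above a text node. The names `PathFinder` and `find_paths` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathFinder(HTMLParser):
    """Record the tag path above each text node that contains a target string."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []   # currently open tags, outermost first
        self.paths = []   # one "a/b/c" path per matching text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.target in data:
            self.paths.append("/".join(self.stack))


def find_paths(html, target):
    """Return the hierarchical paths of all text nodes containing target."""
    finder = PathFinder(target)
    finder.feed(html)
    finder.close()
    return finder.paths


# Hypothetical sample page: a price value inside a table cell.
page = "<html><body><table><tr><td>Price</td><td>$4,500</td></tr></table></body></html>"
paths = find_paths(page, "$4,500")
```

A path learned from one sample page could then be tried on similar pages from the same site, which is the intuition behind annotating by layout rather than re-running full extraction.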

14 Ontology Factors in Semantic Annotation Tasks
Knowledge specification
 –Semantic web community
 –Web Ontology Language (OWL)
Knowledge instantiation
 –IE and database community
 –Object-oriented System Model in XML (OSMX)

15 Ontology Conversion
Similarities (OWL vs. OSMX)
 –Class vs. object set
 –ObjectProperty vs. relationship set
 –Cardinality restriction vs. participation constraint
 –subClassOf vs. is-a relationship
Unique features
 –OWL: subPropertyOf; symmetric and transitive properties; namespace declaration; ontology importing
 –OSMX: arbitrary n-ary relationship sets; data frames; general constraints
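
The correspondences listed on this slide can be written down as a lookup table. The sketch below is only an illustration: the strings are descriptive labels taken from the slide, not real OWL or OSMX syntax, and `convert_construct` is a hypothetical helper rather than part of any converter described in the proposal.

```python
# Constructs with a counterpart on both sides (labels copied from the slide).
OWL_TO_OSMX = {
    "Class": "object set",
    "ObjectProperty": "relationship set",
    "cardinality restriction": "participation constraint",
    "subClassOf": "is-a relationship",
}
OSMX_TO_OWL = {osmx: owl for owl, osmx in OWL_TO_OSMX.items()}

# Constructs the slide lists as unique to one side.
OWL_ONLY = {
    "subPropertyOf",
    "symmetric property",
    "transitive property",
    "namespace declaration",
    "ontology importing",
}
OSMX_ONLY = {"n-ary relationship set", "data frame", "general constraint"}


def convert_construct(name, direction="owl->osmx"):
    """Return the counterpart construct, or None when the slide lists none."""
    table = OWL_TO_OSMX if direction == "owl->osmx" else OSMX_TO_OWL
    return table.get(name)


counterpart = convert_construct("Class")                     # "object set"
no_counterpart = convert_construct("data frame", "osmx->owl")  # None
```

Returning `None` for one-sided constructs makes the lossy cases explicit, which is where a converter needs an extension or a documented fallback.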

16 Ontology Construction: An Unavoidable Problem
Semantic annotation tasks require ontologies.
The ontology for a specific semantic annotation task is not guaranteed to be available.

17 Ontology Construction: General and Special
Generally speaking
 –Until now, the mainstream approach has been manual construction
 –Automatic and semi-automatic ontology generation: many research papers, few or no practical tools, a very hard problem
Specific to semantic annotation
 –Very dynamic and varied domains
 –Much overlapping information
 –Limited scope for any one web page
 –Flat structure

18 Ontology Construction: Knowledge Reuse
“What has been will be again, what has been done will be done again; there is nothing new under the sun.” (The Holy Bible, Ecclesiastes 1:9, NIV translation)
A “new” ontology is a new assembly of unions and projections of several pre-existing ontologies.

19 Architecture for Dynamic Assembly
[Diagram labels: Domain of Interest; Web Page; Collection of Knowledge; Selected Knowledge Components; Assembled Ontology]
(1) Knowledge-component selection
(2) Ontology assembly
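
The idea of building a "new" ontology from unions and projections of existing ones can be sketched with plain sets. This is a toy model under stated assumptions: an ontology is reduced to a set of concept names plus (subject, relation, object) triples, and the functions `project` and `assemble` and the sample ontologies are hypothetical, not the proposed assembler.

```python
def project(ontology, keep):
    """Keep only the named concepts and the relationships among them."""
    concepts = ontology["concepts"] & set(keep)
    relations = {
        (subj, name, obj)
        for (subj, name, obj) in ontology["relations"]
        if subj in concepts and obj in concepts
    }
    return {"concepts": concepts, "relations": relations}


def assemble(components):
    """Union a list of (already projected) knowledge components."""
    concepts, relations = set(), set()
    for component in components:
        concepts |= component["concepts"]
        relations |= component["relations"]
    return {"concepts": concepts, "relations": relations}


# Two made-up source ontologies that overlap on the concept "Car".
vehicle = {
    "concepts": {"Car", "Make", "Engine"},
    "relations": {("Car", "has", "Make"), ("Car", "has", "Engine")},
}
sale = {
    "concepts": {"Car", "Price", "Dealer"},
    "relations": {("Car", "sells for", "Price"), ("Dealer", "sells", "Car")},
}

# (1) select the relevant part of each component, (2) assemble them.
car_ad = assemble([
    project(vehicle, {"Car", "Make"}),
    project(sale, {"Car", "Price"}),
])
```

The sketch sidesteps the hard parts, such as deciding which components are relevant to a page and reconciling concepts that share a meaning but not a name.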

20 Thesis Statement
Propose a new solution to perform semantic annotation on normal HTML web pages, specifically
1. apply ontology-based automatic IE techniques
2. augment OWL with a knowledge-recognition extension
3. combine a conceptual annotator and a layout-based annotator
4. dynamically assemble a new domain ontology for an annotation task

21 Standard Evaluation
Annotation performance
 –Precision
 –Recall
 –Speed of execution
Test bed
 –5 to 10 different domains, with over 10 lexical concepts in each domain ontology
 –20 to 50 web pages in each domain
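
Precision and recall here follow their standard definitions: precision is the fraction of produced annotations that are correct, and recall is the fraction of expected annotations that were produced. A minimal sketch, assuming annotations can be compared as (concept, value) pairs; the function name and the sample data are made up for illustration.

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for two collections of (concept, value) pairs."""
    extracted, expected = set(extracted), set(expected)
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Made-up gold standard and annotator output for one page.
expected = {("Price", "$4,500"), ("Year", "1999"), ("Make", "Honda"), ("Color", "red")}
extracted = {("Price", "$4,500"), ("Year", "1999"), ("Make", "Accord")}

p, r = precision_recall(extracted, expected)  # 2 of 3 correct, 2 of 4 found
```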

22 Ontology Converter Test
A complete and sound check is costly and difficult to implement.
Our simple test
 –Start with an OSMX ontology A
 –Convert it to OWL and then transform it back into an OSMX ontology B
 –Use both A and B to annotate the same set of web pages (say, 30 to 50 web pages)
 –Annotation results should be identical
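
The round-trip test above can be expressed as a small harness. The sketch assumes the two converters and the annotator are available as callables; `to_owl`, `to_osmx`, and `annotate` are hypothetical parameters, and the toy stand-ins below exist only to make the example runnable.

```python
def round_trip_mismatches(ontology_a, pages, to_owl, to_osmx, annotate):
    """Return the pages on which ontology A and its round-tripped copy B disagree."""
    ontology_b = to_osmx(to_owl(ontology_a))
    return [
        page
        for page in pages
        if annotate(ontology_a, page) != annotate(ontology_b, page)
    ]


# Toy stand-ins: an "ontology" is a set of keywords, and annotating a page
# means listing which of its words are keywords.
def toy_annotate(ontology, page):
    return sorted(word for word in page.split() if word in ontology)


ontology_a = frozenset({"Price", "Year"})
pages = ["Price 4500", "Year 1999"]

# A faithful converter pair: nothing is lost on the way to "OWL" and back.
ok = round_trip_mismatches(ontology_a, pages, sorted, frozenset, toy_annotate)

# A lossy converter that drops the last keyword ("Year") on the way out.
def lossy_to_owl(ontology):
    return sorted(ontology)[:-1]

bad = round_trip_mismatches(ontology_a, pages, lossy_to_owl, frozenset, toy_annotate)
```

An empty result supports, but does not prove, that the converter preserves everything the annotator relies on; it only covers the constructs exercised by the chosen pages.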

23 Two-Layer Annotation Model Evaluation
Standard evaluation
In addition
 –About five large web sites with machine-generated web pages, each of which contains at least dozens of web pages

24 Dynamic Ontology Assembler Evaluation
Regular precision and recall study according to selected knowledge components
A pilot study on when the ontology assembler works better than manual ontology construction
 –Record the time to use a tool to create an ontology from scratch
 –Record the time to assemble the same ontology
 –Compare the differences and the special conditions for each case
 –Make empirical suggestions about how to build a knowledge base that favors ontology assembly

25 Delimitations
Automatic ontology creation from scratch
Annotation storing, indexing, and sharing mechanisms
Semantic annotation for multimedia content
Parallel or distributed computing to further scale the semantic annotation system to a large number of web pages

26 Contributions
To convert current web pages into machine-understandable semantic web pages:
 –Producing a pure ontology-driven semantic annotator using an ontology-based IE wrapper
 –Proposing a novel two-layer annotation model for fast, accurate, and resilient annotation
 –Studying a dynamic ontology assembler that helps maximize the reuse of existing knowledge and minimize the load of manual ontology creation
 –Implementing an ontology converter so that this work is useful to the rest of the semantic web community