BYU A Synergistic Semantic Annotation Model December 2007 Yihong Ding,

BYU 12/7/2007

Grand challenge: a new-generation World Wide Web
- The current Web
  - An enormous amount of content
  - Feasible for humans to read and write
  - But the content is simply too much to read
- The future Web
  - Even more content, but machine-processable
  - Feasible for both humans and machines to read and write
- Key issue
  - Converting non-machine-processable content to machine-processable content, i.e., semantic annotation

Semantic annotation, the general picture
[Figure: data extraction/instance recognition engine; AptRental ontology]

Ontology
- Definition: explicit, formal specifications of conceptualizations
  - Unique identity of each concept
  - Unique identity of each relationship among concepts
  - Logical derivation rules underneath every declared relationship
- Annotation
  - [phone number] is-a AptRental:ContactPhone
  - $1250 is-a AptRental:MonthlyRate
  - [phone number] is-about AptRentalAd-instance-1
  - $1250 is-about AptRentalAd-instance-1
- Ontology
  - AptRentalAd hasContactPhone
  - AptRentalAd hasMonthlyRate
- Logical derivation (machine understanding): "To rent the apartment that costs $1250 monthly, please call [phone number]."
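
A minimal sketch of how the annotations and ontology relationships on this slide could support the derivation, assuming annotations are kept as plain (subject, predicate, object) tuples in Python rather than in an RDF/OWL store. "PHONE" is a hypothetical placeholder because the number is not preserved in the slide text, and the mapping from each property to its concept, as well as the function `derive_properties`, are illustrative assumptions, not the author's reasoner.

```python
# Annotations from the slide as (subject, predicate, object) triples.
# "PHONE" is a hypothetical placeholder for the ad's contact number.
ANNOTATIONS = {
    ("PHONE", "is-a", "AptRental:ContactPhone"),
    ("$1250", "is-a", "AptRental:MonthlyRate"),
    ("PHONE", "is-about", "AptRentalAd-instance-1"),
    ("$1250", "is-about", "AptRentalAd-instance-1"),
}

# Assumed ontology relationships: which concept fills each AptRentalAd property.
ONTOLOGY = {
    "hasContactPhone": "AptRental:ContactPhone",
    "hasMonthlyRate": "AptRental:MonthlyRate",
}


def derive_properties(ad, triples):
    """Infer `ad hasX value` when value is-a X's concept and value is-about ad."""
    about_ad = {s for (s, p, o) in triples if p == "is-about" and o == ad}
    facts = {}
    for prop, concept in ONTOLOGY.items():
        for s, p, o in triples:
            if p == "is-a" and o == concept and s in about_ad:
                facts[prop] = s
    return facts


facts = derive_properties("AptRentalAd-instance-1", ANNOTATIONS)
print(f"To rent the apartment that costs {facts['hasMonthlyRate']} monthly, "
      f"please call {facts['hasContactPhone']}")
```

The derivation only works because both values are tied to the same ad instance through `is-about`; the `is-a` links alone would not say which ad a rate belongs to.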

Automated semantic annotation, methods
- Layout-driven method (e.g., [Mukherjee et al. 03])
- Machine-learning-based method (e.g., [Handschuh et al. 02])
- Rule-based method (e.g., [Dill et al. 03])
- NLP-based method (e.g., [Popov et al. 03])
- Ontology-based method (e.g., [Ding et al. 06])

Ontology-based annotation

Data extraction ontology
- Standard ontology (e.g., the concept BedroomNr) plus an epistemological extension: the instance recognizer
- Instance recognizer for BedroomNr: external representation, context phrase, exception phrase
- Sample text: "CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call"
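
An instance recognizer with the three parts named above (external representation, context phrase, exception phrase) might be sketched as follows. This is an assumption-laden illustration, not the data-extraction ontology's actual format: each part is a Python regular expression, only the text immediately to the right of a candidate value is checked, and the `deposit` exception phrase and the class name `InstanceRecognizer` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class InstanceRecognizer:
    """Hypothetical instance recognizer attached to one ontology concept."""
    concept: str
    external_rep: str                 # regex for the value itself
    context_phrases: list             # at least one must follow the value
    exception_phrases: list = field(default_factory=list)  # any one vetoes a match
    window: int = 15                  # characters of right-hand context inspected

    def _followed_by(self, patterns, tail):
        # True if some pattern matches at the start of `tail`, after optional spaces.
        return any(re.match(r"\s*(?:" + p + ")", tail, re.IGNORECASE) for p in patterns)

    def recognize(self, text):
        hits = []
        for m in re.finditer(self.external_rep, text):
            tail = text[m.end():m.end() + self.window]
            if self._followed_by(self.exception_phrases, tail):
                continue  # an exception phrase overrides any context match
            if self._followed_by(self.context_phrases, tail):
                hits.append(m.group())
        return hits


ad = "CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call"
bedroom_nr = InstanceRecognizer("BedroomNr", r"\b\d+\b", [r"bdrm", r"bedrooms?", r"br\b"])
monthly_rate = InstanceRecognizer("MonthlyRate", r"\$\d[\d,]*", [r"mo\b", r"month"], [r"deposit"])
print(bedroom_nr.recognize(ad))    # ['2']
print(monthly_rate.recognize(ad))  # ['$1250']
print(monthly_rate.recognize("$500 deposit, $1250 mo."))  # ['$1250']
```

The external representation alone matches every number in the ad (2, 2, 2, 1700, 1250); the context phrase is what singles out the bedroom count, which is why these recognizers are costly to write by hand.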

Ontology-based annotation
- Instance recognizers per concept:
  - BedroomNr: external representation, context phrase
  - BathNr: external representation, context phrase
  - Feature: external representation
  - MonthRate: external representation, context phrase
  - ContactPhone: external representation
- Applied to: "CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call" (context keywords in the ad mark the recognized values)
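
Running several recognizers over the same ad and emitting annotation triples could look like the sketch below. The regexes, the Feature lexicon, and the `annotate` function are hypothetical simplifications: each concept with a context phrase gets one regex whose group 1 is the value and whose remainder is the required context keyword, while Feature, which has only an external representation, is a small word list. The ContactPhone pattern finds nothing here because the ad's phone number is not preserved.

```python
import re

# Hypothetical recognizers for the AptRental concepts named on the slide.
# Group 1 captures the value; the rest of each pattern is its context keyword.
RECOGNIZERS = {
    "BedroomNr": re.compile(r"(\d+)\s*(?:bdrm|bedrooms?|br)\b", re.I),
    "BathNr": re.compile(r"(\d+)\s*(?:bath|ba)\b", re.I),
    "MonthRate": re.compile(r"(\$\d[\d,]*)\s*(?:mo|month)\b", re.I),
    "ContactPhone": re.compile(r"(\(?\d{3}\)?[\s.-]?\d{3}[-.]\d{4})"),
}
# Feature has an external representation but no context phrase: a lexicon.
FEATURES = ["grg", "w/d", "views"]


def annotate(record_id, text):
    """Return (value, concept, record_id) triples for every recognized value."""
    triples = []
    for concept, pattern in RECOGNIZERS.items():
        for m in pattern.finditer(text):
            triples.append((m.group(1), concept, record_id))
    lowered = text.lower()
    for feature in FEATURES:
        if feature in lowered:
            triples.append((feature, "Feature", record_id))
    return triples


ad = "CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call"
triples = annotate("AptRentalAd-instance-1", ad)
for t in triples:
    print(t)
```

Because every recognizer keys on content and context rather than on HTML position, the same code would annotate the ad whether it appeared in a table, a list, or free text; that is the layout independence claimed on the next slide.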

Ontology-based annotation: strengths and weaknesses
- Strengths
  - Ignores layout differences
  - Ignores layout changes
  - Needs less maintenance once built
- Weakness
  - Instance recognizers are expensive to build

Layout-driven annotation

Layout-driven annotation: strengths and weaknesses
- Strengths
  - Accurate
  - Simple and straightforward
  - Requires less domain knowledge
- Weakness
  - Layout patterns are expensive to maintain

Problem: how can we overcome these weaknesses while retaining the strengths?

Observation
- Conceptual annotator (ontology-based annotation): takes an extraction domain ontology and a document and produces an annotated document; this path is resilient
- Structural annotator (layout-driven annotation): takes layout patterns, the domain ontology, and a document and produces an annotated document; this path is accurate

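Synergistic model
- The conceptual annotator (ontology-based annotation) applies the extraction domain ontology to a document and produces an annotated document
- Pattern generation turns that annotated document into layout patterns
- The structural annotator (layout-driven annotation) applies the layout patterns and produces an annotated document
- Instance recognizer enrichment feeds the structural annotator's output back into the extraction domain ontology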
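
The feedback loop can be illustrated with a toy round trip. Everything here is a simplifying assumption for illustration: documents are lists of (layout path, text) leaves, the Location recognizer is a plain dictionary, a "layout pattern" is a single path, the place names are hypothetical data, and the four function names are hypothetical stand-ins for the model's components, not the system's code.

```python
from collections import Counter

# Hypothetical toy documents: each is a list of (layout path, text) leaves.
DOCS = [
    [("tr/td[1]", "Capitol Hill"), ("tr/td[2]", "2 bdrm")],
    [("tr/td[1]", "Sugar House"), ("tr/td[2]", "3 bdrm")],
]


def conceptual_annotate(doc, dictionary):
    """Ontology-based step: tag leaves whose text is a known Location."""
    return [(path, text) for path, text in doc if text.lower() in dictionary]


def generate_pattern(annotated):
    """Pattern generation: the most common layout path among annotated leaves."""
    paths = Counter(path for path, _ in annotated)
    return paths.most_common(1)[0][0] if paths else None


def structural_annotate(doc, pattern):
    """Layout-driven step: tag every leaf found at the learned layout path."""
    return [(path, text) for path, text in doc if path == pattern]


def enrich(dictionary, annotated):
    """Enrichment step: insert values the dictionary recognizer did not know."""
    added = {text.lower() for _, text in annotated} - dictionary
    dictionary |= added  # mutates the caller's set in place
    return added


location_dict = {"capitol hill"}  # deliberately incomplete initial recognizer
ontology_hits = [h for doc in DOCS for h in conceptual_annotate(doc, location_dict)]
pattern = generate_pattern(ontology_hits)
layout_hits = [h for doc in DOCS for h in structural_annotate(doc, pattern)]
new_values = enrich(location_dict, layout_hits)
print(pattern, sorted(new_values))  # tr/td[1] ['sugar house']

# The enriched recognizer now finds the new value even under a different layout.
print(conceptual_annotate([("div/span", "Sugar House")], location_dict))
```

Each annotator covers the other's gap: the dictionary alone misses "Sugar House", the layout pattern alone would fail once the page layout changes, and after one round the dictionary handles both.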

Pattern generation
- Take the annotated output of the ontology-based annotator
- Apply HTML-structure analysis to produce a typical layout pattern for each extracted field
- If applicable, produce a sequential dependency between the generated layouts
- If applicable, produce simple heuristic rules such as "if A then B" between the generated layouts
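
The three pattern-generation steps might be sketched with the standard-library `html.parser` module. The representation is an assumption made for illustration: a layout pattern is a slash-separated tag path whose steps are `tag` or `tag.class`, a sequential dependency is "field A precedes field B on every page where both were found", and a heuristic rule "if A then B" holds when every page annotated with A is also annotated with B. The sample pages, values, and function names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collect (layout path, text) pairs; a path step is 'tag' or 'tag.class'."""

    def __init__(self):
        super().__init__()
        self.stack, self.leaves = [], []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (void tags such as <br> are not handled).
        while self.stack:
            if self.stack.pop().split(".")[0] == tag:
                break

    def handle_data(self, data):
        if data.strip():
            self.leaves.append(("/".join(self.stack), data.strip()))


def generate_patterns(pages, annotations):
    """annotations[i] maps field -> value found by the ontology-based annotator on pages[i]."""
    paths, seqs = {}, []
    for html, fields in zip(pages, annotations):
        collector = PathCollector()
        collector.feed(html)
        collector.close()
        seq = []
        for path, text in collector.leaves:
            for name, value in fields.items():
                if text == value:
                    paths.setdefault(name, Counter())[path] += 1
                    seq.append(name)
        seqs.append(seq)
    # Typical layout pattern: the most frequent path for each field.
    patterns = {name: c.most_common(1)[0][0] for name, c in paths.items()}
    # Sequential dependency: a precedes b on every page where both were found.
    precedes = [(a, b) for a in patterns for b in patterns if a != b
                and any(a in s and b in s for s in seqs)
                and all(s.index(a) < s.index(b) for s in seqs if a in s and b in s)]
    # Heuristic "if A then B": every page annotated with a is also annotated with b.
    rules = [(a, b) for a in patterns for b in patterns if a != b
             and all(b in f for f in annotations if a in f)]
    return patterns, precedes, rules


pages = [
    '<div class="ad"><span class="loc">Capitol Hill</span><b>$1250</b></div>',
    '<div class="ad"><span class="loc">Sugar House</span><b>$900</b></div>',
]
# Hypothetical ontology-based output: Location was missed on the second page.
found = [{"Location": "Capitol Hill", "MonthlyRate": "$1250"}, {"MonthlyRate": "$900"}]
patterns, precedes, rules = generate_patterns(pages, found)
print(patterns)   # {'Location': 'div.ad/span.loc', 'MonthlyRate': 'div.ad/b'}
print(precedes)   # [('Location', 'MonthlyRate')]
print(rules)      # [('Location', 'MonthlyRate')]
```

Note that one correct Location annotation is enough to learn the `div.ad/span.loc` pattern, which the layout-driven annotator could then apply to the second page where the ontology-based annotator missed it.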

Instance recognizer enrichment
- Take the annotated output of the layout-driven annotator
- Apply the results to the corresponding current instance recognizers:
  - If a value is recognized, continue
  - Otherwise, for dictionary-type recognizers, insert the value; for regular-expression-type recognizers, try to generate a new regular expression and alert the user to check it
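
The enrichment rule could be sketched as below. The recognizer representation (a dict with a `type` field) and the regex generalizer are assumptions for illustration: `generalize` is a deliberately naive technique chosen here (digit runs become fixed-width digit classes, letter runs a letter class, everything else is escaped), not the author's regex-generation method. Candidate patterns are queued for the user rather than installed, following "alert the user to check". The phone numbers are hypothetical.

```python
import re
from itertools import groupby


def generalize(value):
    """Naive generalization: digit runs -> fixed-width digit class,
    letter runs -> letter class, other characters escaped literally."""
    parts = []
    key = lambda ch: "d" if ch.isdigit() else "a" if ch.isalpha() else ch
    for kind, run in groupby(value, key=key):
        run = "".join(run)
        if kind == "d":
            parts.append(r"\d{%d}" % len(run))
        elif kind == "a":
            parts.append("[A-Za-z]+")
        else:
            parts.append(re.escape(run))
    return "".join(parts)


def enrich(recognizer, layout_values, review_queue):
    """Apply layout-driven outputs to one (hypothetical) recognizer."""
    for value in layout_values:
        if recognizer["type"] == "dictionary":
            if value.lower() not in recognizer["terms"]:
                recognizer["terms"].add(value.lower())  # insert directly
        elif not any(re.fullmatch(p, value) for p in recognizer["patterns"]):
            # Unrecognized by every regex: propose a candidate for the user to check.
            review_queue.append((value, generalize(value)))


location = {"type": "dictionary", "terms": {"capitol hill"}}
phone = {"type": "regex", "patterns": [r"\d{3}-\d{3}-\d{4}"]}
alerts = []
enrich(location, ["Capitol Hill", "Sugar House"], alerts)
enrich(phone, ["801-555-0100", "(801) 555-0100"], alerts)
print(sorted(location["terms"]))  # ['capitol hill', 'sugar house']
print(alerts)
```

Dictionary insertion is safe to automate because a value the layout annotator extracted is already a concrete instance; a generated regex generalizes beyond the observed value, which is why a human check is sensible there.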

Preliminary results (Apartment Rental domain)
- Ontology-based annotation: about 90% average precision and recall for nearly all fields, except Location and Contact Name
- Layout-driven annotation: nearly 100% precision and recall on Location and Contact Name, but lower recall on fields such as BedroomNr
- Pattern generation: works well on well-structured fields such as Location; less successful on semi-structured fields such as BedroomNr
- Instance recognizer enrichment: good results even with poorly constructed initial instance recognizers
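
For reference, the precision and recall figures above follow the usual extraction definitions: precision is the share of extracted values that are correct, and recall is the share of values present in the documents that were extracted correctly. The counts in this sketch are hypothetical and are not taken from the reported experiments.

```python
def precision_recall(correct, extracted, relevant):
    """precision = correct / extracted; recall = correct / relevant (0.0 if a denominator is 0)."""
    precision = correct / extracted if extracted else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Hypothetical counts for one field, NOT from the reported results:
# 45 values extracted, 40 of them correct, 50 values actually present.
p, r = precision_recall(correct=40, extracted=45, relevant=50)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.89 recall=0.80
```

This is why a layout-driven annotator can be near-perfect in precision yet lower in recall on a field like BedroomNr: what it extracts is right, but values outside its learned layout patterns are never extracted.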

Summary
- Automatically produce layout patterns from the output of ontology-based annotation
- Automatically enrich domain-specific instance recognizers from the output of layout-driven annotation
- A new synergistic annotation model that retains the original strengths and minimizes the original weaknesses
- An annotation system that improves its own performance as it runs

Future work
- Dynamic tuning of annotation based on user perspectives
- Ensembles of various annotators
- Collaborative annotation

Thank you
Yihong Ding
(801)
TMCB, Brigham Young University, Provo, UT
- Data Extraction Research Lab at Brigham Young University
- Homepage: my virtual home on the Web
- Thinking Space: my virtual home on Web 2.0