1
BYU
A Synergistic Semantic Annotation Model
December 2007
Yihong Ding, http://www.deg.byu.edu/ding/
2
BYU 12/7/2007
Grand challenge: a new generation of the World Wide Web
- The current Web
  - Enormous amount of content
  - Feasible for humans to read and write
  - But there is simply too much content to read
- The future Web
  - Even more content, but machine-processable
  - Feasible for humans and machines to read and write
- Key issue
  - Converting non-machine-processable content to machine-processable content, i.e., semantic annotation
3
Semantic annotation, the general picture
[Figure labels: Data Extraction/Instance Recognition Engine; AptRental Ontology]
4
Semantic annotation, the general picture
[Figure label: AptRental Ontology]
5
Ontology
Definition: explicit, formal specifications of conceptualizations
- Unique identity of each concept
- Unique identity of each relationship among concepts
- Logic derivation rules underneath every declared relationship
Annotation:
- 533-0293 is-a AptRental:ContactPhone
- $1250 is-a AptRental:MonthlyRate
- 533-0293 is-about AptRentalAd-instance-1
- $1250 is-about AptRentalAd-instance-1
Ontology:
- AptRentalAd hasContactPhone
- AptRentalAd hasMonthlyRate
Logic derivation: To rent the apartment that costs $1250 monthly, please call 533-0293. (machine understanding)
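The annotation and derivation on this slide can be sketched with plain subject/predicate/object triples. This is only an illustration under assumptions: the tuple representation and the helper names `values_for` and `describe_ad` are hypothetical and are not taken from the presented system.

```python
# Hypothetical triple representation of the slide's example annotations.
ANNOTATIONS = [
    ("533-0293", "is-a", "AptRental:ContactPhone"),
    ("$1250", "is-a", "AptRental:MonthlyRate"),
    ("533-0293", "is-about", "AptRentalAd-instance-1"),
    ("$1250", "is-about", "AptRentalAd-instance-1"),
]


def values_for(instance, concept, triples):
    """Return the values typed as `concept` that are also about `instance`."""
    typed = {s for s, p, o in triples if p == "is-a" and o == concept}
    about = {s for s, p, o in triples if p == "is-about" and o == instance}
    return sorted(typed & about)


def describe_ad(instance, triples):
    """Combine rate and phone for one ad, mirroring the slide's derived sentence."""
    rate = values_for(instance, "AptRental:MonthlyRate", triples)
    phone = values_for(instance, "AptRental:ContactPhone", triples)
    if not rate or not phone:
        return None  # not enough annotated values to derive the sentence
    return (
        f"To rent the apartment that costs {rate[0]} monthly "
        f"please call {phone[0]}."
    )
```

Calling `describe_ad("AptRentalAd-instance-1", ANNOTATIONS)` joins the two annotated values through their shared ad instance, which is the kind of connection the ontology relationships make available to a machine.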
6
Automated semantic annotation, methods
- Layout-driven method (e.g., [Mukherjee et al. 03])
- Machine-learning-based method (e.g., [Handschuh et al. 02])
- Rule-based method (e.g., [Dill et al. 03])
- NLP-based method (e.g., [Popov et al. 03])
- Ontology-based method (e.g., [Ding et al. 06])
7
Ontology-based annotation
8
Data extraction ontology
- Standard ontology: BedroomNr
- Epistemological extension (instance recognizer) for BedroomNr: External representation, Context Phrase, Exception Phrase
- Sample ad: CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d, views, 1700 sq ft. $1250 mo. Call 533-0293
9
Ontology-based annotation
Sample ad: CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d, views, 1700 sq ft. $1250 mo. Call 533-0293
Instance recognizers:
- BedroomNr: External representation, Context Phrase
- BathNr: External representation, Context Phrase
- Feature: External representation
- MonthRate: External representation, Context Phrase
- ContactPhone: External representation, Context Keyword
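A minimal sketch of how such instance recognizers might be applied to the sample ad, assuming each recognizer is a regular expression for the value (the external representation) plus a required neighboring context. The specific patterns, the `RECOGNIZERS` table, and the `recognize` function are illustrative assumptions; the slides do not show the actual recognizer definitions, and exception phrases are left out here.

```python
import re

AD = ("CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, "
      "1700 sq ft. $1250 mo. Call 533-0293")

# Hypothetical recognizers: (value pattern, left context, right context).
# A value is accepted only when its required context is adjacent to it.
RECOGNIZERS = {
    "BedroomNr": (r"\d+", r"", r"\s*(?:bdrm|bedrooms?|br)\b"),
    "BathNr": (r"\d+", r"", r"\s*(?:baths?|ba)\b"),
    "MonthRate": (r"\$\d+", r"", r"\s*(?:mo\b|month)"),
    "ContactPhone": (r"\d{3}-\d{4}", r"call\s*", r""),
}


def recognize(text, recognizers=RECOGNIZERS):
    """Return {field: [matched values]} for every recognizer."""
    found = {}
    for field, (value, left, right) in recognizers.items():
        # Only the value itself is captured; the context is matched but dropped.
        pattern = re.compile(f"{left}({value}){right}", re.IGNORECASE)
        found[field] = [m.group(1) for m in pattern.finditer(text)]
    return found
```

The context is what keeps the bare number pattern from firing on "2 grg" or "1700 sq ft": both contain digits, but neither is followed by a bedroom or bath phrase.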
10
Ontology-based annotation: strengths and weaknesses
Strengths
- Ignores layout differences
- Ignores layout changes
- Less maintenance once built
Weakness
- Expensive to build instance recognizers
11
Layout-driven annotation
12
Layout-driven annotation
13
Layout-driven annotation: strengths and weaknesses
Strengths
- Accurate
- Simple and straightforward
- Requires less domain knowledge
Weakness
- Expensive layout-pattern maintenance
14
Problem
How can we overcome the weaknesses while retaining the strengths?
15
Observation
- Conceptual Annotator (ontology-based annotation): extraction domain ontology + a document -> annotated document (resilient)
- Structural Annotator (layout-driven annotation): layout patterns + domain ontology + a document -> annotated document (accurate)
16
Synergistic model
- Conceptual Annotator (ontology-based annotation): extraction domain ontology + a document -> annotated document
- Pattern Generation: annotated document -> layout patterns
- Structural Annotator (layout-driven annotation): layout patterns + a document -> annotated document
- Instance Recognizer Enrichment: annotated document -> instance recognizers of the extraction domain ontology
17
Pattern generation
- Get the annotated outputs from the ontology-based annotator
- Apply HTML-structure analysis and produce a typical layout pattern for each extracted field
- If applicable, produce a sequential dependency between the generated layouts
- If applicable, produce simple heuristic rules such as “if A then B” between the generated layouts
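The first two steps above could be sketched as follows, assuming a "layout pattern" is simply the most frequent tag path under which a field's annotated values appear. That definition, the `PathCollector` class, and `generate_patterns` are assumptions made for illustration; the slides do not specify the HTML-structure analysis, and the sequential dependencies and "if A then B" rules are not covered here.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathCollector(HTMLParser):
    """Record (tag path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as (tag, label) pairs
        self.nodes = []  # collected (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never enclose text
        cls = dict(attrs).get("class")
        self.stack.append((tag, f"{tag}.{cls}" if cls else tag))

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything nested in it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/".join(label for _, label in self.stack)
            self.nodes.append((path, text))


def generate_patterns(pages):
    """pages: list of (html, {field: value}) pairs from the ontology-based annotator.

    Returns {field: most frequent tag path containing that field's values}.
    """
    votes = {}
    for html, annotations in pages:
        collector = PathCollector()
        collector.feed(html)
        for field, value in annotations.items():
            for path, text in collector.nodes:
                if value in text:
                    votes.setdefault(field, Counter())[path] += 1
    return {field: counts.most_common(1)[0][0] for field, counts in votes.items()}
```

Voting across several annotated pages is one simple way to arrive at a "typical" pattern: an occasional wrong annotation lands on a path that is outvoted by the correct ones.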
18
Instance recognizer enrichment
- Get the annotated outputs from the layout-driven annotator
- Apply the results to the current corresponding instance recognizers
- If a value is recognized, continue; otherwise:
  - for dictionary-type recognizers, insert the value
  - for regular-expression-type recognizers, try to generate a new regular expression and alert the user to check it
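The decision procedure above might look like the following sketch. The dictionary-based recognizer layout, the `enrich` and `generalize` functions, and in particular the character-class generalization used to propose a new regular expression are assumptions for illustration; the slides do not say how new expressions are generated.

```python
import re


def generalize(value):
    """Naively turn a value into a regex: digit and letter classes, rest escaped."""
    parts = []
    for ch in value:
        if ch.isdigit():
            parts.append(r"\d")
        elif ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def enrich(recognizer, value):
    """Apply one layout-annotated value to a recognizer and report what happened."""
    if recognizer["type"] == "dictionary":
        if value in recognizer["entries"]:
            return "recognized"
        recognizer["entries"].add(value)  # new dictionary entry is inserted directly
        return "inserted"
    # Regular-expression-type recognizer.
    if any(re.fullmatch(p, value) for p in recognizer["patterns"]):
        return "recognized"
    # Proposed pattern is queued for the user to check, not activated.
    recognizer["pending"].append(generalize(value))
    return "needs-review"
```

Keeping generated expressions in a separate `pending` list reflects the slide's "alert the user to check" step: a bad generalization would otherwise degrade the ontology-based annotator on every later document.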
19
Preliminary results (Apartment Rental domain)
- Ontology-based annotation
  - About 90% on average for both precision and recall on nearly all fields
  - Exceptions: Location and Contact Name
- Layout-driven annotation
  - Nearly 100% precision and recall on Location and Contact Name
  - Lower recall on fields such as BedroomNr
- Pattern generation
  - Works well on well-structured fields such as Location
  - Less successful on semi-structured fields such as BedroomNr
- Instance recognizer enrichment
  - Good results even with poorly constructed initial instance recognizers
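The slide reports precision and recall per field without defining them. Under the standard definitions (precision is the share of extracted values that are correct, recall is the share of correct values that were extracted), a per-field computation could be sketched as below. The function name and the example values in the usage note are made up for illustration and are not results from the presentation.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for one field.

    predicted: values the annotator extracted, e.g. (ad_id, value) pairs
    gold: values a human marked as correct, in the same form
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For instance, if an annotator extracts three BedroomNr values of which two are correct, and the gold standard contains four, the function returns a precision of 2/3 and a recall of 1/2.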
20
Summary
- Automatically produces layout patterns using outputs of ontology-based annotation
- Automatically enriches domain-specific instance recognizers using outputs of layout-driven annotation
- A new synergistic annotation model that retains the original strengths and minimizes the original weaknesses
- An annotation system that improves its own performance during execution
21
Future work
- Dynamic tuning of annotation based on user perspectives
- Ensembles of various annotators
- Collaborative annotation
22
Thank you
Yihong Ding
ding@cs.byu.edu
(801) 422-7604
2262 TMCB, Brigham Young University, Provo, UT 84601
Data Extraction Research Lab at Brigham Young University: http://www.deg.byu.edu
Homepage, my virtual home on Web 1.0: http://www.deg.byu.edu/ding/
Thinking Space, my virtual home on Web 2.0: http://yihongs-research.blogspot.com/