BYU A Synergistic Semantic Annotation Model December 2007 Yihong Ding,


1 BYU A Synergistic Semantic Annotation Model December 2007 Yihong Ding, http://www.deg.byu.edu/ding/

2 Grand challenge: new generation World Wide Web
The current Web
• Enormous amount of content
• Feasible for humans to read/write
• But … content is simply too much to read
The future Web
• Even more content, but machine-processable
• Feasible for humans and machines to read/write
• Key issue: converting non-machine-processable content to machine-processable content, i.e., semantic annotation
BYU 12/7/2007

3 Semantic annotation, the general picture
[Figure: Data Extraction/Instance Recognition Engine; AptRental Ontology]

4 Semantic annotation, the general picture
[Figure: AptRental Ontology]

5 Ontology
Definition: explicit, formal specifications of conceptualizations
• Unique identity of each concept
• Unique identity of each relationship among concepts
• Logic derivation rules underneath every declared relationship
Annotation:
  533-0293 is-a AptRental:ContactPhone
  $1250 is-a AptRental:MonthlyRate
  533-0293 is-about AptRentalAd-instance-1
  $1250 is-about AptRentalAd-instance-1
Ontology:
  AptRentalAd hasContactPhone
  AptRentalAd hasMonthlyRate
Logic derivation: To rent the apartment that costs $1250 monthly, please call 533-0293. (machine understanding)
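The annotation statements on this slide can be read as simple triples. The following is a minimal sketch of that idea, not the presenter's system: the triple layout and the helper names group_by_instance and describe are assumptions made for illustration.

```python
from collections import defaultdict

# Annotation statements copied from the slide as (value, relation, target).
ANNOTATIONS = [
    ("533-0293", "is-a", "AptRental:ContactPhone"),
    ("$1250", "is-a", "AptRental:MonthlyRate"),
    ("533-0293", "is-about", "AptRentalAd-instance-1"),
    ("$1250", "is-about", "AptRentalAd-instance-1"),
]


def group_by_instance(triples):
    """Collect, per ad instance, the value annotated for each concept.

    Hypothetical helper: assumes every value with an is-about statement
    also has an is-a statement, as in the slide's example.
    """
    concept_of = {value: target for value, rel, target in triples if rel == "is-a"}
    instances = defaultdict(dict)
    for value, rel, target in triples:
        if rel == "is-about":
            instances[target][concept_of[value]] = value
    return dict(instances)


def describe(ad):
    """Derive the sentence shown on the slide from the grouped annotations."""
    return (
        f"To rent the apartment that costs {ad['AptRental:MonthlyRate']} "
        f"monthly, please call {ad['AptRental:ContactPhone']}."
    )


ads = group_by_instance(ANNOTATIONS)
print(describe(ads["AptRentalAd-instance-1"]))
```

Because each value carries both a concept (is-a) and an instance (is-about), a program can join the two facts without reading the original ad text.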

6 Automated semantic annotation, methods
• Layout-driven method (e.g. [Mukherjee et al. 03])
• Machine-learning-based method (e.g. [Handschuh et al. 02])
• Rule-based method (e.g. [Dill et al. 03])
• NLP-based method (e.g. [Popov et al. 03])
• Ontology-based method (e.g. [Ding et al. 06])

7 Ontology-based annotation

8 Data extraction ontology
[Figure: a standard ontology (concept BedroomNr) with an epistemological extension (instance recognizer). The BedroomNr recognizer lists External representation, Context Phrase, and Exception Phrase.]
Sample ad: CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d, views, 1700 sq ft. $1250 mo. Call 533-0293

9 Ontology-based annotation
Sample ad: CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d, views, 1700 sq ft. $1250 mo. Call 533-0293
[Figure: instance recognizers applied to the sample ad]
• BedroomNr: External representation, Context Phrase
• BathNr: External representation, Context Phrase
• Feature: External representation
• MonthRate: External representation, Context Phrase
• ContactPhone: External representation
• Additional figure label: Context Keyword
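To make the idea of an instance recognizer concrete, here is a minimal sketch that treats each recognizer as a regular expression combining an external representation (the value itself) with a context phrase next to it. The specific regular expressions and the annotate function are assumptions for illustration; the slides do not show the actual recognizers, and exception phrases are left out.

```python
import re

# Sample ad text from the slide.
AD = ("CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, "
      "1700 sq ft. $1250 mo. Call 533-0293")

# Hypothetical recognizers: the named group "value" is the external
# representation; the rest of each pattern is the context phrase.
RECOGNIZERS = {
    "BedroomNr": re.compile(r"(?P<value>\d+)\s*(?:bdrms?|bedrooms?)\b", re.I),
    "BathNr": re.compile(r"(?P<value>\d+)\s*(?:baths?|bathrooms?)\b", re.I),
    "MonthRate": re.compile(r"(?P<value>\$\d+)\s*(?:mo\b|per month)", re.I),
    "ContactPhone": re.compile(r"\bcall\s+(?P<value>\d{3}-\d{4})\b", re.I),
}


def annotate(text):
    """Return the first value each recognizer finds in the text."""
    found = {}
    for concept, pattern in RECOGNIZERS.items():
        match = pattern.search(text)
        if match:
            found[concept] = match.group("value")
    return found


annotations = annotate(AD)
for concept, value in annotations.items():
    print(f"{value} is-a AptRental:{concept}")
```

The context phrase is what keeps the three occurrences of "2" apart: only the one followed by "bdrm" is taken as BedroomNr. Nothing here depends on page layout, which is the strength claimed for the ontology-based method.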

10 Ontology-based annotation: strength and weakness
Strengths
• Ignore layout difference
• Ignore layout change
• Less maintenance once built
Weakness
• Expensive to build instance recognizers

11 Layout-driven annotation

12 Layout-driven annotation

13 Layout-driven annotation, strength and weakness
Strengths
• Accurate
• Simple and straightforward
• Less domain knowledge requirement
Weakness
• Expensive in layout-pattern maintenance

14 Problem
How can we overcome the weaknesses while retaining the strengths?

15 Observation
[Figure: two separate annotation pipelines]
• Conceptual Annotator (ontology-based annotation): Extraction Domain ontology + A Document → Annotated Document
• Structural Annotator (layout-driven annotation): Layout Patterns + Domain ontology + A Document → Annotated Document
• Figure labels: accurate, resilient

16 Synergistic model
[Figure: the two pipelines linked together]
• Conceptual Annotator (ontology-based annotation): Extraction Domain ontology + A Document → Annotated Document
• Pattern Generation: Annotated Document → Layout Patterns
• Structural Annotator (layout-driven annotation): Layout Patterns → Annotated Document
• Instance Recognizer Enrichment: fed by the Structural Annotator's Annotated Document

17 Pattern Generation
• Get the annotated outputs from the ontology-based annotator
• Apply HTML-structure analysis and produce a typical layout pattern for each extracted field
• If applicable, produce a sequential dependency between the generated layouts
• If applicable, produce simple heuristic rules such as “if A then B” between the generated layouts
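The first two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the presenter's algorithm: a "layout pattern" is reduced to the most common tag/class path enclosing a field's annotated value, the HTML snippets are invented, and the second ad's values (DOWNTOWN, $900) are hypothetical. Sequential dependencies and heuristic rules are not covered.

```python
from collections import Counter
from html.parser import HTMLParser


class PathRecorder(HTMLParser):
    """Record the tag/class path of every non-empty text node.

    Assumes well-formed HTML without void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_nodes = []  # list of (path, text)

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self.stack.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_nodes.append(("/".join(self.stack), text))


def generate_patterns(pages, annotations):
    """Pick, per field, the path that most often encloses its annotated value."""
    votes = {}
    for html, fields in zip(pages, annotations):
        recorder = PathRecorder()
        recorder.feed(html)
        recorder.close()
        for field, value in fields.items():
            for path, text in recorder.text_nodes:
                if value in text:
                    votes.setdefault(field, Counter())[path] += 1
    return {field: counts.most_common(1)[0][0] for field, counts in votes.items()}


# Invented markup; the annotations stand in for the ontology-based annotator's output.
PAGES = [
    '<div class="ad"><h2 class="loc">CAPITOL HILL</h2><span class="rate">$1250 mo.</span></div>',
    '<div class="ad"><h2 class="loc">DOWNTOWN</h2><span class="rate">$900 mo.</span></div>',
]
ANNOTATIONS = [
    {"Location": "CAPITOL HILL", "MonthRate": "$1250"},
    {"Location": "DOWNTOWN", "MonthRate": "$900"},
]

patterns = generate_patterns(PAGES, ANNOTATIONS)
print(patterns)
```

Voting across several annotated pages is one way to obtain a "typical" pattern: an occasional wrong annotation from the ontology-based annotator is outvoted by the correct ones.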

18 Instance recognizer enrichment
• Get the annotated outputs from the layout-driven annotator
• Apply the results to the current corresponding instance recognizers
  - If recognized, continue
  - Otherwise, for dictionary-type recognizers, insert the new value
  - Otherwise, for regular-expression-type recognizers, try to generate a new regular expression and alert the user to check it
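The decision procedure above can be sketched as below. The recognizer data structure, the enrich and generalize functions, and the character-class generalization rule are all assumptions for illustration (the slides do not say how a new regular expression is generated), and the values DOWNTOWN and (801) 533-0293 are hypothetical.

```python
import re
from itertools import groupby


def generalize(value):
    """Build a candidate regex from one value by character class.

    Assumed rule: digit runs and ASCII letter runs become counted classes;
    every other character is matched literally.
    """
    def char_class(ch):
        if ch.isdigit():
            return r"\d"
        if ch.isalpha():
            return "[A-Za-z]"
        return re.escape(ch)

    parts = []
    for cls, run in groupby(value, key=char_class):
        count = len(list(run))
        parts.append(cls if count == 1 else f"{cls}{{{count}}}")
    return "".join(parts)


def enrich(recognizer, value):
    """Apply one layout-annotated value to a recognizer.

    Returns "known", "inserted", or "needs-review".
    """
    if recognizer["type"] == "dictionary":
        if value in recognizer["entries"]:
            return "known"
        recognizer["entries"].add(value)
        return "inserted"
    if any(re.fullmatch(p, value) for p in recognizer["patterns"]):
        return "known"
    # New regexes are queued rather than activated, so the user can check them.
    recognizer["pending"].append(generalize(value))
    return "needs-review"


location = {"type": "dictionary", "entries": {"CAPITOL HILL"}}
phone = {"type": "regex", "patterns": [r"\d{3}-\d{4}"], "pending": []}

print(enrich(location, "DOWNTOWN"))
print(enrich(phone, "533-0293"))
print(enrich(phone, "(801) 533-0293"))
print(phone["pending"])
```

Keeping generated expressions in a pending list mirrors the "alert the user to check" step: dictionary entries are cheap to add automatically, while an over-general regular expression could hurt precision on every later document.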

19 Preliminary results
Apartment Rental domain
• Ontology-based annotation
  - 90% on average for both precision and recall on nearly all fields
  - Except Location and Contact Name
• Layout-driven annotation
  - Nearly 100% precision and recall on Location and Contact Name
  - Lower recall on fields such as BedroomNr
• Pattern generation
  - Great on well-structured fields such as Location
  - Less successful on semi-structured fields such as BedroomNr
• Instance recognizer enrichment
  - Good results even with poorly constructed initial instance recognizers
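For readers unfamiliar with the two measures reported above, the sketch below shows how per-field precision and recall are conventionally computed from a set of predicted annotations and a hand-labeled gold set. The data is invented for illustration and does not reproduce the presentation's results; precision_recall is a hypothetical helper.

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / gold."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented example: (ad id, field, value) annotations for BedroomNr.
gold = {
    ("ad1", "BedroomNr", "2"),
    ("ad2", "BedroomNr", "3"),
    ("ad3", "BedroomNr", "1"),
    ("ad4", "BedroomNr", "2"),
}
predicted = {
    ("ad1", "BedroomNr", "2"),
    ("ad2", "BedroomNr", "3"),
    ("ad3", "BedroomNr", "4"),  # wrong value
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the annotator misses ad4 entirely, which lowers recall, and mislabels ad3, which lowers both measures.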

20 Summary
• Automatically produce layout patterns using outputs of ontology-based annotation
• Automatically enrich domain-specific instance recognizers using outputs of layout-driven annotation
• A new synergistic annotation model that retains original strengths and minimizes original weaknesses
• An annotation system that self-improves its performance during its execution

21 Future work
• Dynamic tuning of annotation based on user perspectives
• Ensemble of various annotators
• Collaborative annotation

22 Thank you
Yihong Ding
ding@cs.byu.edu
(801) 422-7604
2262 TMCB, Brigham Young University, Provo, UT 84601
Data Extraction Research Lab at Brigham Young University: http://www.deg.byu.edu
Homepage, my virtual home on Web 1.0: http://www.deg.byu.edu/ding/
Thinking Space, my virtual home on Web 2.0: http://yihongs-research.blogspot.com/

