1 A New Web Semantic Annotator Enabling a Machine Understandable Web
BYU Spring Research Conference 2005
Yihong Ding
Sponsored by NSF

2 Machine Understandable Web
Content is represented in commonly shared, explicitly defined, generic conceptualizations (ontologies).
Also known as the Semantic Web.

3 Why Machine Understandable?
Meaningful data
Exchangeable information
Interoperable programs/services
“… allows data to be shared and reused across application, enterprise, and community boundaries …” (Tim Berners-Lee et al., 2001)

4 Semantic Annotation: A Way to Achieve Machine Understandability
Add explicit, formal, and unambiguous notes to web documents
  Explicit: publicly accessible
  Formal: publicly agreeable
  Unambiguous: publicly identifiable

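As a rough illustration of what such a note could look like, the sketch below ties a text span in a document to a concept identifier. The `Annotation` record, the `annotate_prices` helper, the price pattern, and the `example.org` IRI are all hypothetical and are not taken from the presentation.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """One note attached to a document: where the value is and what it means."""
    start: int    # offset of the first character of the annotated span
    end: int      # offset one past the last character of the span
    text: str     # the annotated text itself
    concept: str  # identifier of the ontology concept (hypothetical IRI)


# Hypothetical concept identifier; a real system would point at a shared ontology.
PRICE_CONCEPT = "http://example.org/ontology#Price"


def annotate_prices(document: str) -> List[Annotation]:
    """Attach the Price concept to every dollar amount found in the text."""
    return [
        Annotation(m.start(), m.end(), m.group(), PRICE_CONCEPT)
        for m in re.finditer(r"\$\d+(?:\.\d{2})?", document)
    ]


notes = annotate_prices("Asking $4500 or best offer.")
```

The note is explicit (it is stored as data), formal (it has a fixed structure), and unambiguous (the concept is named by a single identifier), which mirrors the three properties listed on the slide.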
5 Semantic Annotation Using Automated IE Engines
[Figure labels: Document; Non-ontology-based IE Wrapper; Ontology-based IE Wrapper; Document]

6 Augmentations for the Annotator
Semantic annotator using data-extraction ontologies:
  a two-layer annotation model to achieve fast, highly accurate, and resilient semantic annotation
  a divide-and-conquer style architecture to scale the system to large domains
  a web ontology language augmentation to complement OWL for semantic annotation purposes

7 Two-Layer Annotation Model
[Figure labels: Conceptual Annotator using ontology-based IE tool; Document; Structural Annotator; Sample Annotation Process; Same-Layout Documents; Massive Annotation Process]

8 Two-Layer Annotation Model, Benefits
Achieves both resiliency and fast execution
Requires no training to generate structural annotators
Requires no labeling of the results from structural annotators

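One possible reading of the two-layer model is sketched below: a content-based conceptual annotator runs on a single sample page, and its results are turned into layout rules that are then applied to other pages with the same layout. The regular expressions stand in for an ontology-based IE tool, and every name, pattern, and page here is an assumption made for illustration, not the presenter's implementation.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical concept recognizers standing in for an ontology-based IE tool.
CONCEPT_PATTERNS: Dict[str, str] = {
    "Price": r"\$\d+(?:,\d{3})*",
    "Year": r"\b(?:19|20)\d{2}\b",
}

SAMPLE_PAGE = '<tr><td class="year">1999</td><td class="price">$4,500</td></tr>'
OTHER_PAGE = '<tr><td class="year">2003</td><td class="price">$12,900</td></tr>'


def conceptual_annotate(page: str) -> List[Tuple[str, int, int]]:
    """Layer 1: find concept instances by their content, ignoring layout."""
    found = []
    for concept, pattern in CONCEPT_PATTERNS.items():
        for m in re.finditer(pattern, page):
            found.append((concept, m.start(), m.end()))
    return sorted(found, key=lambda item: item[1])


def derive_structural_annotator(
    page: str, annotations: List[Tuple[str, int, int]], context: int = 12
) -> Dict[str, Tuple[str, str]]:
    """Record the literal text around the first sample value of each concept."""
    rules: Dict[str, Tuple[str, str]] = {}
    for concept, start, end in annotations:
        prefix = page[max(0, start - context):start]
        suffix = page[end:end + context]
        rules.setdefault(concept, (prefix, suffix))
    return rules


def structural_annotate(
    page: str, rules: Dict[str, Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Layer 2: extract values purely by layout, with no concept recognition."""
    results: Dict[str, List[str]] = {}
    for concept, (prefix, suffix) in rules.items():
        pattern = re.escape(prefix) + r"(.*?)" + re.escape(suffix)
        results[concept] = re.findall(pattern, page)
    return results


sample_annotations = conceptual_annotate(SAMPLE_PAGE)
rules = derive_structural_annotator(SAMPLE_PAGE, sample_annotations)
extracted = structural_annotate(OTHER_PAGE, rules)
```

In this sketch the layout rules are produced directly from the sample annotation, so no training step is involved, and the values extracted by layout inherit their concept labels from the rules, so no separate labeling is needed.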
9 Scalability Issues
Large domain containing many concepts
Large annotation task dealing with many web pages

10 Observation
A large domain is a combination of several small domains.
Consistently clustered domains exist, where each such domain is
  composed of the same cluster of concepts
  consistent with any larger domain in which it participates
  usually made up of a small number of concepts

11 Divide-and-Conquer Style Architecture for the Scalability Issue
[Figure labels: Document; (1) Text classification; Collection of small atomic domain ontologies; Selected Domain Ontologies; (2) Scalable annotation; Document]

12 Divide-and-Conquer, Benefits
Compared with large ontologies, small ontologies are
  simpler to construct
  faster to execute
  easier to check and update
  more convenient to reuse
Identify the range of an ontology dynamically at the web page level
Avoid the problem of narrowing a large domain ontology down to the web page level
Maximize the reuse of existing ontologies

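The divide-and-conquer idea can be sketched as two steps: pick the small ontologies that apply to a page, then annotate with only those. The keyword-overlap classifier, the three toy ontologies, and all names below are assumptions chosen for illustration; the presentation does not say which text classification method is used.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SmallOntology:
    """A toy stand-in for one small atomic domain ontology."""
    keywords: Set[str]        # words that suggest the domain applies to a page
    concepts: Dict[str, str]  # concept name -> hypothetical recognizer pattern


# Hypothetical collection of small atomic domain ontologies.
ONTOLOGIES: Dict[str, SmallOntology] = {
    "car": SmallOntology(
        {"mileage", "sedan", "engine"},
        {"Mileage": r"\d{1,3}(?:,\d{3})* miles"},
    ),
    "price": SmallOntology(
        {"price", "sale", "offer"},
        {"Price": r"\$\d+(?:,\d{3})*"},
    ),
    "apartment": SmallOntology(
        {"bedroom", "rent", "lease"},
        {"Bedrooms": r"\d+ bedroom"},
    ),
}


def select_ontologies(page: str, min_hits: int = 1) -> List[str]:
    """Step 1: keep ontologies whose keywords overlap the words on the page."""
    words = set(re.findall(r"[a-z]+", page.lower()))
    return [
        name
        for name, ontology in ONTOLOGIES.items()
        if len(words & ontology.keywords) >= min_hits
    ]


def annotate(page: str) -> Dict[str, List[str]]:
    """Step 2: run only the recognizers of the selected ontologies."""
    found: Dict[str, List[str]] = {}
    for name in select_ontologies(page):
        for concept, pattern in ONTOLOGIES[name].concepts.items():
            matches = re.findall(pattern, page)
            if matches:
                found[concept] = matches
    return found


PAGE = "Sedan for sale: 82,000 miles, new engine, price $4,500."
selected = select_ontologies(PAGE)
annotations = annotate(PAGE)
```

Because the applicable ontologies are chosen per page, the apartment recognizers are never run on the car listing, and each small ontology can be reused unchanged in other combinations.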
13 Ontology Representation
Two ontology languages
  Data-extraction ontology (OSMX)
  Semantic web ontology (OWL)
Language unification

14 Contributions
Automatic semantic annotator using an ontology-based IE wrapper
Two-level annotation: a layout-based annotator on top of a conceptual annotator
Divide-and-conquer style solution to scale the annotation process to a large number of concepts
Web ontology language unification