1
Survey of Semantic Annotation Platforms
SAC 2005
Lawrence Reeve, Hyoil Han
2
Semantic Annotation
- Creating semantic labels within documents for the Semantic Web
- Used to support:
  - Advanced searching (e.g. by concept)
  - Information visualization (using an ontology)
  - Reasoning about Web resources
- Converting syntactic structures into knowledge structures (human → machine)
3
Semantic Annotation Process
4
Semantic Annotation Concerns
- Scale, volume
  - Existing and new documents on the Web
- Manual annotation
  - Expensive: economic, time
  - Subject to personal motivation
- Schema complexity
- Storage
  - Support for multiple ontologies
  - Within or external to source document?
  - Knowledge base refinement
- Access: how are annotations accessed?
  - API, custom UI, plug-ins
5
Semantic Annotation Platforms
Why semantic annotation platforms (‘SAPs’)?
- Reduces human involvement
- Consistent application of ontologies
- Reduced cost: economic and time
- Scalability
- Multiple ontologies for a single document
6
Semantic Annotation Platforms
Characteristics
- Provide many services, not just annotation
- Storage: ontology, KB, and annotation
- Access APIs (query annotations)
- Integrate information extraction (IE) methods
- Support for IE (gazetteers)
- Extensible
7
SAP General Architecture
8
SAP Classification
9
SAP Classification
Pattern-based
- Pattern discovery
  - Iterative learning: provide initial seed set, find new entities, find new patterns, repeat
- Rules
  - Manually define rules to find entities in text
  - Simple label matching
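The iterative pattern-discovery loop (seed set, new entities, new patterns, repeat) can be sketched in a few lines of Python. This is a minimal illustration, not the method of any surveyed platform: the corpus, the seed, and the simplification that a "pattern" is just the word to the left of an entity are all assumptions made for the example.

```python
import re

def bootstrap(corpus, seeds, rounds=2):
    """Alternate between learning patterns from known entities and
    finding new entities with the learned patterns."""
    entities = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Find new patterns: here, the word immediately left of a known entity.
        for sentence in corpus:
            for entity in entities:
                for m in re.finditer(r"(\w+) " + re.escape(entity) + r"\b", sentence):
                    patterns.add(m.group(1))
        # Find new entities: a capitalized word right after a known pattern word.
        for sentence in corpus:
            for left in patterns:
                for m in re.finditer(r"\b" + re.escape(left) + r" ([A-Z]\w+)", sentence):
                    entities.add(m.group(1))
    return entities, patterns

# Hypothetical toy corpus and seed set.
corpus = [
    "She flew to Paris last week.",
    "He flew to Berlin on Monday.",
    "They moved from Berlin in May.",
    "We moved from Madrid in June.",
]
entities, patterns = bootstrap(corpus, seeds={"Paris"})
```

Starting from the single seed "Paris", the first round learns the pattern word "to" and finds "Berlin"; the second round learns "from" via "Berlin" and finds "Madrid".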
10
SAP Classification
Machine-learning based
- Wrapper induction
  - LP2
    - Uses structural and linguistic information
    - Produces tagging and correction rules as output
- Statistical models
  - Hidden Markov Model
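To make the statistical-model category concrete, here is a small sketch of how a Hidden Markov Model can tag tokens, using Viterbi decoding. The two states, all probabilities, and the floor value for unseen words are invented for illustration; a real system would estimate them from annotated training data.

```python
import math

# Hypothetical two-state model: "O" (outside any entity) and "PERSON".
STATES = ["O", "PERSON"]
START = {"O": 0.8, "PERSON": 0.2}
TRANS = {
    "O": {"O": 0.8, "PERSON": 0.2},
    "PERSON": {"O": 0.6, "PERSON": 0.4},
}
EMIT = {
    "O": {"met": 0.5, "in": 0.4},
    "PERSON": {"john": 0.5, "smith": 0.4},
}
FLOOR = 0.001  # assumed probability for words a state has never emitted

def emit(state, word):
    return EMIT[state].get(word, FLOOR)

def viterbi(words):
    """Return the most probable state sequence for the given words."""
    # best[s] = (log-probability, path) of the best path ending in state s
    best = {
        s: (math.log(START[s]) + math.log(emit(s, words[0])), [s])
        for s in STATES
    }
    for word in words[1:]:
        step = {}
        for s in STATES:
            score, path = max(
                (best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES
            )
            step[s] = (score + math.log(emit(s, word)), path + [s])
        best = step
    return max(best.values())[1]

tags = viterbi(["john", "smith", "met"])
```

With these toy numbers the decoder labels "john" and "smith" as PERSON and "met" as O.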
11
SAP Classification
Multistrategy
- Combine pattern and machine-learning approaches
- Did not find a platform that implements this approach
- Platform extensibility important for implementation
12
Semantic Annotation Platforms
Selection
- Idea is to get a representative sample of platforms using various information extraction techniques
- System needed to be a platform offering services, not just an algorithm
13
Semantic Annotation Platforms
14
Language Toolkits
GATE: language processing system
- Component architecture, SDK, IDE
- ANNIE (‘A Nearly-New IE system’)
  - Tokenizer, gazetteer, POS tagger, sentence splitter, etc.
- JAPE (Java Annotation Patterns Engine)
  - Provides regular-expression based pattern/action rules
Amilcare
- Adaptive IE system designed for document annotation
- Based on LP2
- Uses ANNIE
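The idea of regular-expression based pattern/action rules can be illustrated without GATE. The sketch below is plain Python and is not JAPE grammar syntax: each rule pairs a pattern with an action that creates an annotation when the pattern matches. The rule set and the annotation record layout are assumptions made for the example.

```python
import re

def make_action(annotation_type):
    """Return an action that turns a regex match into an annotation record."""
    def action(match):
        return {
            "type": annotation_type,
            "start": match.start(),
            "end": match.end(),
            "text": match.group(),
        }
    return action

# Hypothetical rules: (pattern, action) pairs.
RULES = [
    (re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+"), make_action("Person")),
    (re.compile(r"\b(?:19|20)\d{2}\b"), make_action("Year")),
]

def apply_rules(text, rules=RULES):
    """Run every rule over the text and return annotations in document order."""
    annotations = [
        action(m) for pattern, action in rules for m in pattern.finditer(text)
    ]
    return sorted(annotations, key=lambda a: a["start"])

result = apply_rules("Dr. Smith joined the project in 2003.")
```

Here `result` holds a Person annotation for "Dr. Smith" and a Year annotation for "2003", each with character offsets into the original text.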
15
KIM (2003)
- Ontology, KB, semantic annotation, indexing and retrieval server, front-ends (Web UI, Internet Explorer plug-in)
- KIMO ontology
  - 250 classes, 100 properties
  - 80,000 entities from general news corpus in KB (plus >100,000 aliases)
- IE
  - Uses GATE, JAPE
  - Gazetteers (from KB)
Source:
16
Ont-O-Mat (2002)
- Uses Amilcare
- Extensible
- Wrapper induction (LP2)
- Adapted in 2004 for PANKOW algorithm
  - Disambiguation by maximal evidence
  - Proper nouns + ontology linguistic phrases
Source: kcap2001-annotate-sub.pdf
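"Disambiguation by maximal evidence" can be sketched as follows: combine a proper noun with each candidate ontology concept in a set of linguistic phrases, count how often each phrase occurs, and choose the concept with the highest total. This is a rough illustration only. The phrase templates, the concept list, and the use of a small local list of sentences as the evidence source are all assumptions, not the published PANKOW configuration.

```python
# Hypothetical linguistic phrase templates.
PATTERNS = [
    "{noun} is a {concept}",
    "the {noun} {concept}",
    "the {concept} of {noun}",
]

def count_hits(phrase, corpus):
    """Count case-insensitive occurrences of a phrase in the evidence source."""
    return sum(doc.lower().count(phrase.lower()) for doc in corpus)

def classify(noun, concepts, corpus):
    """Pick the concept whose instantiated phrases gather the most evidence."""
    evidence = {
        concept: sum(
            count_hits(p.format(noun=noun, concept=concept), corpus)
            for p in PATTERNS
        )
        for concept in concepts
    }
    return max(evidence, key=evidence.get), evidence

# Hypothetical evidence source standing in for a much larger text collection.
corpus = [
    "Niger is a country in West Africa.",
    "The Niger river floods every year.",
    "The country of Niger exports uranium.",
    "Many say Niger is a country of contrasts.",
]
best, evidence = classify("Niger", ["river", "country"], corpus)
```

In this toy run "country" collects three matching phrases against one for "river", so "Niger" is annotated as a country.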
17
MUSE (2003)
- Pipeline of processing resources (PRs)
  - PRs called conditionally based on text attributes
- Makes use of JAPE
  - Adaptive rules
  - Can link multiple resources together (gazetteer + part-of-speech tagger)
  - Resolve entity ambiguities
Source:
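A pipeline whose processing resources run conditionally on text attributes might look like the sketch below. It is not MUSE code: the document structure, the "cased" attribute, the gazetteer entries, and the two lookup resources are hypothetical, chosen only to show a condition guarding each resource.

```python
# Hypothetical gazetteer mapping lowercase names to entity types.
GAZETTEER = {"london": "Location", "ibm": "Organization"}

def detect_case(doc):
    """Record whether the text contains any uppercase letters."""
    doc["attributes"]["cased"] = any(ch.isupper() for ch in doc["text"])

def cased_lookup(doc):
    """Gazetteer lookup that trusts capitalization as an entity cue."""
    for token in doc["text"].split():
        word = token.strip(".,")
        if word[:1].isupper() and word.lower() in GAZETTEER:
            doc["annotations"].append((word, GAZETTEER[word.lower()]))

def uncased_lookup(doc):
    """Gazetteer lookup for text without reliable capitalization."""
    for token in doc["text"].split():
        word = token.strip(".,")
        if word.lower() in GAZETTEER:
            doc["annotations"].append((word, GAZETTEER[word.lower()]))

# Each entry pairs a condition on the document with a processing resource.
PIPELINE = [
    (lambda doc: True, detect_case),
    (lambda doc: doc["attributes"]["cased"], cased_lookup),
    (lambda doc: not doc["attributes"]["cased"], uncased_lookup),
]

def run(text):
    doc = {"text": text, "attributes": {}, "annotations": []}
    for condition, resource in PIPELINE:
        if condition(doc):
            resource(doc)
    return doc

cased = run("IBM opened an office in London.")
uncased = run("ibm opened an office in london")
```

Both documents end up with the same two entities, but through different resources: the second text has no capitalization, so only the case-insensitive lookup is called.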
18
SemTag (2003)
- Large-scale annotation
  - Annotations separate from source
  - “Semantic Label Bureau”
- Uses the TAP taxonomy
- Approach is:
  - Find match to label in taxonomy
  - Save window before and after match
  - Perform disambiguation
- Main contribution is using taxonomy for disambiguation
Source: resources/semtag.pdf
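The three-step approach (match a label, save a window around it, disambiguate) can be sketched as below. The tiny taxonomy, the window size, and the word-overlap score are stand-ins chosen for illustration; the overlap score in particular is not SemTag's actual disambiguation algorithm. Annotations are returned as separate records that point into the text, in the spirit of keeping them apart from the source.

```python
# Hypothetical taxonomy: one label with two candidate nodes, each carrying
# a few context words that typically appear near that sense.
TAXONOMY = {
    "Jaguar": [
        {"node": "Animal/Jaguar", "context": {"cat", "jungle", "prey", "species"}},
        {"node": "Company/Jaguar", "context": {"car", "engine", "dealer", "sedan"}},
    ],
}

def annotate(text, window=3):
    """Return stand-off annotations: token position, label, chosen node."""
    tokens = text.split()
    results = []
    for i, token in enumerate(tokens):
        label = token.strip(".,")
        # Step 1: find a match to a label in the taxonomy.
        if label not in TAXONOMY:
            continue
        # Step 2: save the window of tokens before and after the match.
        before = tokens[max(0, i - window):i]
        after = tokens[i + 1:i + 1 + window]
        context = {w.strip(".,").lower() for w in before + after}
        # Step 3: disambiguate by choosing the node with the largest overlap.
        best = max(TAXONOMY[label], key=lambda c: len(c["context"] & context))
        results.append({"position": i, "label": label, "node": best["node"]})
    return results

car = annotate("The new Jaguar sedan has a quiet engine.")
cat = annotate("A Jaguar stalks its prey in the jungle.")
```

The first sentence resolves to the company node because "sedan" falls inside the window; the second resolves to the animal node because "prey" does.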
19
Platform Effectiveness
*as reported by platform authors
20
Summary
- Several platforms developed in last several years
  - Large implementation effort; many services
- Differentiated by
  - IE methods used
  - Services provided
- Future IE integration will likely improve annotation accuracy
- Extension of existing platforms will allow for quicker research