1 Semi-Automatic Semantic Annotation for Hidden-Web Tables Cui Tao & David W. Embley Data Extraction Research Group Department of Computer Science Brigham Young University Supported by NSF
2 Semantic Annotation The Hidden Web: Hidden behind forms Hard to query “cdk-4”
3 Semantic Annotation The Hidden Web: Hidden behind forms Hard to query to find the protein and amino-acid information for gene “cdk-4”
4 Semantic Annotation The Hidden Web: Hidden behind forms Hard to query Semantic annotation Machine-“understandable” Publicly accessible
5 System Overview Initial semantic annotation Manually annotate a sample page With respect to a selected ontology Table interpretation Automatic Tables from hidden web pages Final semantic annotation Automatic Annotate interpreted tables
6 Initial Semantic Annotation SMORE: Semantic Markup, Ontology and RDF Editor [Maryland information and network dynamics lab]
7
8 Table Interpretation Table interpretation Locate label and value Pair label-value pairs Remember path TISP – Table Interpretation by Sibling Pages
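The three interpretation steps (locate, pair, remember the path) can be illustrated with a minimal sketch. It is not the TISP implementation: it assumes a well-formed two-column table whose first cell in each row is the label and whose second cell is the value. The sample markup, the placeholder value `value-A`, and the function name `pair_labels_and_values` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample table; real hidden-web pages are messier.
SAMPLE = """
<table>
  <tr><td>Gene</td><td>cdk-4</td></tr>
  <tr><td>Protein Name</td><td>value-A</td></tr>
</table>
"""

def pair_labels_and_values(table_xml):
    """Pair each row's first cell (assumed label) with its second cell (assumed value)."""
    table = ET.fromstring(table_xml.strip())
    pairs = []
    for row_index, row in enumerate(table.findall("tr"), start=1):
        cells = row.findall("td")
        if len(cells) < 2:
            continue  # skip rows that do not fit the assumed label/value layout
        label = (cells[0].text or "").strip()
        value = (cells[1].text or "").strip()
        # Remember where the value was found so the same path can be reused later.
        path = f"table[1]/tr[{row_index}]/td[2]"
        pairs.append((label, value, path))
    return pairs

for label, value, path in pair_labels_and_values(SAMPLE):
    print(label, "=", value, "at", path)
```

Storing the path next to each pair is what lets an interpretation learned on one page be applied to other pages generated from the same form.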
9 TISP
10 Interpretation Technique: Sibling Page Comparison Same
11 Interpretation Technique: Sibling Page Comparison Almost Same
12 Interpretation Technique: Sibling Page Comparison Different Same
13 Interpretation Technique: Sibling Page Comparison Label Path = Identification.Gene model(s).Gene Model Xpath = html[1]/…/table[3]/tr[1]/td[2]/table[1]/tr[6]/td[2]/table[1]/tr[2]/td[1] Structure Pattern of a Table
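The idea behind sibling page comparison can be sketched as follows: cells whose text is the same across sibling pages are likely labels, and cells whose text differs are likely values. This is a simplified illustration under stated assumptions, not the TISP algorithm. It assumes both tables have already been reduced to grids of strings with identical dimensions, and it does not handle the "almost same" case. The function name `classify_cells` and the sample values are hypothetical.

```python
def classify_cells(page_a, page_b):
    """Compare two sibling tables cell by cell.

    Returns a grid of "label" (same text on both pages)
    or "value" (text differs between the pages).
    """
    result = []
    for row_a, row_b in zip(page_a, page_b):
        result.append(
            ["label" if a == b else "value" for a, b in zip(row_a, row_b)]
        )
    return result

# Hypothetical sibling tables generated from the same form for two genes.
page_a = [["Gene", "cdk-4"], ["Protein Name", "value-A"]]
page_b = [["Gene", "other-gene"], ["Protein Name", "value-B"]]

for row in classify_cells(page_a, page_b):
    print(row)
```

Once the labels are located, nested labels can be joined into a dotted label path of the kind shown on the slide, and each value's position can be recorded as an XPath.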
14 Annotation Protein Name
15 Annotation – Split Nucleotide Size
16 Annotation – Merge Protein Information
17 Annotation – Union Name
18 Annotation – Selection Molecular Function
19 Generated RDF Annotation
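As a rough illustration of what generating RDF from annotated values can look like, the sketch below serializes concept-value pairs as N-Triples using only the standard library. It is not the output format of the system described here. The namespace `http://example.org/annotation#`, the subject identifier, the concept name, and the value are all hypothetical, and concept names are assumed to be IRI-safe (no spaces).

```python
BASE = "http://example.org/annotation#"  # hypothetical namespace, not from the slides

def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def to_ntriples(subject_id, pairs):
    """Emit one triple per (concept, value) pair for the given subject."""
    lines = []
    for concept, value in pairs:
        lines.append(
            f'<{BASE}{subject_id}> <{BASE}{concept}> "{escape_literal(value)}" .'
        )
    return "\n".join(lines)

# Hypothetical annotated value for one interpreted table.
print(to_ntriples("gene1", [("ProteinName", "value-A")]))
```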
20 Querying Annotated Data to find the protein and amino-acid information for gene “cdk-4”
21 Summary Semi-automatic semantic annotation for hidden-web tables Facilitates large-scale annotation of the web