1
Table Structure Understanding by Sibling Page Comparison
Cui Tao
Data Extraction Group
Department of Computer Science
Brigham Young University
Supported by NSF
6/17/2015
2
Table Structure Understanding
Motivation
  - Many documents contain tables
  - Data extraction
  - Data integration
  - Ontology evolution
Solution
  - Locate tables
  - Locate table labels
  - Locate table values
  - Find label/value associations
3
Table Structure Understanding
4
Table Structure Understanding
  (Gene Model, 1) = F18H3.5a
  (Gene Model, 2) = F18H3.5b
  ...
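The label/value associations on this slide can be pictured as a mapping from a (label, record index) pair to a value. The sketch below is illustrative only: the dictionary layout and the helper name `values_for` are assumptions, not the presenter's representation, and the gene model names are read from the slide as F18H3.5a and F18H3.5b.

```python
# Hypothetical representation: (label, record index) -> value.
associations = {
    ("Gene Model", 1): "F18H3.5a",
    ("Gene Model", 2): "F18H3.5b",
}


def values_for(label, table):
    """Return the values recorded under `label`, ordered by record index."""
    hits = [(idx, val) for (lab, idx), val in table.items() if lab == label]
    return [val for _, val in sorted(hits)]


print(values_for("Gene Model", associations))
```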
7
Sibling Pages
  - Generated output pages
      - User query results in a predefined page structure
  - Same web site ~ same structure
8
Problems
  - Data rich area: discard the irrelevant parts
  - Find table correspondences
  - Find mappings between table cells
  - Find structure patterns
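To make the cell-mapping problem concrete, here is a minimal sketch of one plausible comparison between two sibling tables. It rests on an assumption that the slides do not state outright: a cell whose text is identical on both pages is a candidate label, and a cell whose text differs is a candidate value. The function name `classify_cells` and the sample tables are made up for illustration, and both tables are assumed to have the same shape.

```python
def classify_cells(page_a, page_b):
    """Compare two sibling tables cell by cell.

    Assumption (not from the slides): identical text on both pages marks a
    candidate label, differing text marks a candidate value. Each table is a
    list of rows, and each row is a list of cell strings.
    """
    result = []
    for row_a, row_b in zip(page_a, page_b):
        result.append(
            ["label" if a == b else "value" for a, b in zip(row_a, row_b)]
        )
    return result


# Made-up sibling tables, as if produced by two queries on the same site.
page_1 = [["Make", "Ford"], ["Year", "1999"]]
page_2 = [["Make", "Honda"], ["Year", "2003"]]

print(classify_cells(page_1, page_2))
```

A real pair of sibling pages rarely lines up this neatly, which is why the table correspondences and cell mappings listed above have to be found first.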
9
HTML Table Components
10
Data Rich Area
11
Table Unnesting
12
DOM Tree
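As a rough illustration of what a DOM tree for an HTML table looks like, the sketch below builds a simple tree with Python's standard `html.parser` module. The `Node`, `TreeBuilder`, `build_tree`, and `outline` names are hypothetical helpers, not part of the presented system, and the parser only copes with markup whose elements are explicitly closed.

```python
from html.parser import HTMLParser


class Node:
    """One element of the tree: tag name, direct text, and child elements."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.text = ""
        self.children = []
        self.parent = parent


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per element."""
    suffix = f": {node.text}" if node.text else ""
    lines = ["  " * depth + node.tag + suffix]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


tree = build_tree("<table><tr><td>Make</td><td>Ford</td></tr></table>")
print("\n".join(outline(tree)))
```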
13
Simple Tree Matching
  - Simple Tree Matching (STM) [Yang91]
  - Maximum matching pairs of nodes
  - O(mn)
  [Figure labels: label, value]
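A minimal sketch of Simple Tree Matching as it is usually described: two trees match only if their roots carry the same label, and the child subtrees are then aligned in order by dynamic programming so that the number of matched node pairs is maximal. The `TreeNode` class and the tag-only labels are assumptions made for illustration; the slides do not show how nodes are labeled in the presented system.

```python
class TreeNode:
    """Ordered, labeled tree node (here the label is just an HTML tag name)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def simple_tree_matching(a, b):
    """Return the size of a maximum order-preserving matching of a and b."""
    if a.label != b.label:
        return 0
    m, n = len(a.children), len(b.children)
    # best[i][j]: best matching between the first i children of a
    # and the first j children of b.
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = best[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            best[i][j] = max(best[i][j - 1], best[i - 1][j], paired)
    return best[m][n] + 1  # +1 for the matched roots


def td():
    return TreeNode("td")


# Two small table skeletons whose rows have different numbers of cells.
table_a = TreeNode("table", [TreeNode("tr", [td(), td()]), TreeNode("tr", [td()])])
table_b = TreeNode("table", [TreeNode("tr", [td()]), TreeNode("tr", [td(), td()])])

print(simple_tree_matching(table_a, table_b))  # 5 of the 6 nodes in each tree match
```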
14
Table Structure Pattern
15
Table Structure Pattern
16
Experimental Results
Initial Test
  - General pattern extraction
      - Molecular biology: 95.6%
      - Car ad: 100%
  - Dynamic adjustment
      - Unseen structure
      - Structure variations