1
Table Interpretation by Sibling Page Comparison Cui Tao & David W. Embley Data Extraction Group Department of Computer Science Brigham Young University Supported by NSF
2
Table Interpretation (in context) Context: Table Understanding Table Recognition Table Interpretation Table Conceptualization Table Understanding Applications Not only “understanding” wrt community knowledge But also creation or augmentation of community knowledge Challenging Conceptual-Modeling Work
3
Table Interpretation (in context) Context: Table Understanding Table Recognition Table Interpretation with Sibling Pages: TISP Table Conceptualization Table Understanding Applications Not only “understanding” wrt community knowledge But also creation or augmentation of community knowledge Challenging Conceptual-Modeling Work
4
TISP: Table Recognition and Interpretation Recognize tables (discard non-tables) Locate table labels Locate table values Find label/value associations
5
Recognize Tables Data Table Layout Tables (discard) Nested Data Tables
6
Locate Table Labels Examples: Identification.Gene model(s).Protein Identification.Gene model(s).2
7
Locate Table Labels Examples: Identification.Gene model(s).Gene Model Identification.Gene model(s).2
8
Locate Table Values Value
9
Find Label/Value Associations Example: (Identification.Gene model(s).Protein, Identification.Gene model(s).2) = WP:CE28918
10
Conceptual Table Interpretation Wang Notation [Wang96]: (Identification.Gene model(s).Protein, Identification.Gene model(s).2) = WP:CE28918 Table Ontology
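One way to picture the label/value association above is as a mapping from a tuple of dotted label paths to a value. The Python sketch below is only an illustration under that assumption; the names `LabelKey`, `make_key`, and `lookup` are hypothetical and are not part of TISP or of Wang notation itself.

```python
from typing import Dict, Optional, Tuple

# A label path is a dotted string such as "Identification.Gene model(s).Protein".
# Assumption for illustration: a key is a tuple of label paths, mirroring the
# parenthesized left-hand side of the slide's example.
LabelKey = Tuple[str, ...]


def make_key(*paths: str) -> LabelKey:
    """Build a key from one or more dotted label paths."""
    return tuple(paths)


def lookup(table: Dict[LabelKey, str], *paths: str) -> Optional[str]:
    """Return the value associated with the given label paths, or None."""
    return table.get(make_key(*paths))


# The single association shown on the slide.
table: Dict[LabelKey, str] = {
    make_key(
        "Identification.Gene model(s).Protein",
        "Identification.Gene model(s).2",
    ): "WP:CE28918",
}
```

Calling `lookup(table, "Identification.Gene model(s).Protein", "Identification.Gene model(s).2")` returns the value from the slide; any other combination of paths returns `None`.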
11
Interpretation Technique: Sibling Page Comparison
12
Same
13
Interpretation Technique: Sibling Page Comparison Almost Same
14
Interpretation Technique: Sibling Page Comparison Different Same
15
Technique Details Unnest tables Match tables in sibling pages “Perfect” match (table for layout: discard) “Reasonable” match (sibling table) Determine/Use Table-Structure Pattern Discover pattern Pattern usage Dynamic pattern adjustment
16
Table Unnesting
17
Match Based on DOM Tree
18
Simple Tree Matching Algorithm Labels Values [Yang91] Match Score Categorization: Exact/Near-Exact, Sibling-Table, False
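As a rough illustration of the tree-matching step, here is a Python sketch of the standard simple tree matching recurrence (roots must agree, then child sequences are aligned by dynamic programming). The `Node` class, the normalization by average tree size, and the two thresholds in `categorize` are assumptions made for this example; the slide names the three categories but does not give the cut-off values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A minimal DOM-like node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(child) for child in node.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the largest order-preserving top-down match between a and b."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # score[i][j]: best match using the first i children of a and first j of b.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = score[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            score[i][j] = max(score[i - 1][j], score[i][j - 1], paired)
    return score[m][n] + 1  # +1 for the matching roots


def categorize(a: Node, b: Node, exact: float = 0.9, sibling: float = 0.5) -> str:
    """Map a normalized score to a category (threshold values are hypothetical)."""
    ratio = simple_tree_matching(a, b) / ((size(a) + size(b)) / 2)
    if ratio >= exact:
        return "exact/near-exact"
    if ratio >= sibling:
        return "sibling-table"
    return "false"
```

With these assumed thresholds, two structurally identical tables fall in the exact/near-exact category, a table that differs by one extra row falls in the sibling-table category, and trees with different root tags are a false match.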
19
Table Structure Patterns Regularity Expectations: ({L} {V})^n; ({L})^n (({V})^n)^+; … Pattern combinations are also possible.
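The two regularity expectations on this slide (a run of n label/value pairs, and a run of n labels followed by one or more runs of n values) can be checked with ordinary regular expressions once each cell has been classified. The sketch below assumes, purely for illustration, that a table has already been flattened in reading order into a string of `L` (label) and `V` (value) characters; the function names are hypothetical.

```python
import re


def matches_pairs(cells: str, n: int) -> bool:
    """True if cells is exactly n label/value pairs: (L V) repeated n times."""
    return re.fullmatch(f"(?:LV){{{n}}}", cells) is not None


def matches_header_rows(cells: str, n: int) -> bool:
    """True if cells is n labels followed by one or more groups of n values."""
    return re.fullmatch(f"L{{{n}}}(?:V{{{n}}})+", cells) is not None
```

For example, `matches_pairs("LVLV", 2)` holds, and `matches_header_rows("LLLVVVVVV", 3)` holds for a three-column table with two value rows, while a trailing incomplete row makes the second check fail.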
20
Pattern Usage (Location.Genetic Position) = X:12.69 +/- 0.000 cM [mapping data] (Location.Genomic Position) = X:13518823..13515773 bp
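Once a label/value pattern is known, applying it amounts to walking the cells in order and pairing each label with the value that follows. The sketch below assumes the cells arrive as a flat list in label, value, label, value order and that the enclosing label (`Location` in the slide's example) is already known; `extract_pairs` is a hypothetical helper, not TISP's implementation.

```python
from typing import Dict, List


def extract_pairs(prefix: str, cells: List[str]) -> Dict[str, str]:
    """Pair alternating label/value cells and prefix each label with its parent."""
    if len(cells) % 2 != 0:
        raise ValueError("expected an even number of cells (label/value pairs)")
    result: Dict[str, str] = {}
    for i in range(0, len(cells), 2):
        label, value = cells[i], cells[i + 1]
        result[f"{prefix}.{label}"] = value
    return result


# Cell contents taken from the slide's example.
cells = [
    "Genetic Position", "X:12.69 +/- 0.000 cM [mapping data]",
    "Genomic Position", "X:13518823..13515773 bp",
]
pairs = extract_pairs("Location", cells)
```

Here `pairs["Location.Genetic Position"]` and `pairs["Location.Genomic Position"]` hold the two values shown on the slide.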
21
Dynamic Pattern Adjustment Initial pattern: ({L})^5 (({V})^5)^+ Adjusted pattern: ({L})^5 (({V})^5)^+ | ({L})^6 (({V})^6)^+
22
TISP Evaluation Applications: commercial (car ads); scientific (molecular biology); geopolitical (US states and countries). Data: > 2,000 tables, 275 sibling tables, 35 web sites. Evaluation on the initial two sibling pages: Correct separation of data tables from layout tables? Correct pattern recognition? Evaluation on the remaining tables in each site: Information properly extracted? Able to detect and adjust for pattern variations?
23
Experimental Results Table recognition: correctly discarded 157 of 158 layout tables Pattern recognition: correctly found 69 of 72 structure patterns Extraction and adjustments: 5 path adjustments and 34 label adjustments all correct
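The counts above translate into simple success rates. The short Python sketch below only does that arithmetic on the two reported ratios; it does not attempt to reproduce the overall F-measure, since the precision and recall figures behind it are not given here.

```python
def rate(correct: int, total: int) -> float:
    """Fraction of correct outcomes."""
    return correct / total


# Counts as reported on the slide.
table_recognition = rate(157, 158)   # layout tables correctly discarded
pattern_recognition = rate(69, 72)   # structure patterns correctly found

print(f"Table recognition:   {table_recognition:.1%}")    # 99.4%
print(f"Pattern recognition: {pattern_recognition:.1%}")  # 95.8%
```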
24
Discovered Difficulties Abundance of null entries Multiple tables as a single table Recognize and group Use box model [Gatterbauer07] Factored labels
25
Table Understanding Table Recognition: data table vs. table for layout; adjust (group table components, defactor labels, …). Table Interpretation: populate table ontology; additional table-ontology elements (title, footnotes, …). Table Conceptualization: capture table semantics; reverse engineer as a conceptual model. Table Understanding: embed within a community ontology; alternatively, augment community knowledge.
26
Knowledge Generation TANGO (Table Analysis for Generating Ontologies) repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology. Repeat: 1. recognize table, 2. interpret table, 3. conceptualize table, 4. merge, 5. adjust; until ontology developed.
Example table:
fleckvelter   gonsity (ld/gg)   hepth (gd)
burlam        1.2               120
falder        2.3               230
multon        2.5               400
Growing Ontology
27
Conclusions and Future Opportunities Conclusions Table Interpretation: overall F-measure of 94.5% Can successfully apply sibling-page technique Future Opportunities Table understanding Knowledge generation Challenging conceptual-modeling work www.deg.byu.edu