1
Schema Matching and Data Extraction over HTML Tables
Cui Tao
Data Extraction Research Group, Department of Computer Science, Brigham Young University
Supported by NSF
2
2 Introduction
Many tables on the Web
Ontology-based extraction:
  Works well for unstructured or semi-structured data
  What about structured data – tables?
How to integrate data stored in different tables?
  Detect the table of interest
  Form attribute-value pairs (adjust if necessary)
  Do extraction
  Infer mappings from extraction patterns
3
3 Problem Detecting The Table of Interest ?
4
4 Problem Different schemas
Different source table schemas:
  {Run #, Yr, Make, Model, Tran, Color, Dr}
  {Make, Model, Year, Colour, Price, Auto, Air Cond., AM/FM, CD}
  {Vehicle, Distance, Price, Mileage}
  {Year, Make, Model, Trim, Invoice/Retail, Engine, Fuel Economy}
Target database schema:
  {Car, Year, Make, Model, Mileage, Price, PhoneNr}, {Car, Feature}
5
5 Problem Attribute is Value
6
6 Problem Attribute-Value is Value ??
7
7 Problem Value is not Value
8
8 Problem Factored Values
9
9 Problem Split Values
10
10 Problem Merged Values
11
11 Problem Information Behind Links
  List
  Table extending over several pages
12
12 Solution
  Detect the table of interest
  Form attribute-value pairs (adjust if necessary)
  Do extraction
  Infer mappings from extraction patterns
13
13 Solution Detect The Table of Interest
Top-level tables
  Table size: at least 3 rows and columns
  Grid layout: same # of values
  Attributes
  Value density: (# of ontology-extracted values) / (total # of values in the table)
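The top-level checks on this slide (table size, grid layout, value density) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the `recognizes` callback standing in for the ontology's value recognizers, and the 0.5 density threshold are all hypothetical and do not come from the slides, which give no threshold.

```python
from typing import Callable, List


def is_table_of_interest(
    rows: List[List[str]],
    recognizes: Callable[[str], bool],
    min_rows: int = 3,
    min_cols: int = 3,
    min_density: float = 0.5,  # hypothetical threshold, not stated in the slides
) -> bool:
    """Apply size, grid-layout, and value-density checks to one table.

    Assumes rows[0] holds the attribute names and the remaining rows hold values.
    """
    # Table size: at least min_rows rows and min_cols columns.
    if len(rows) < min_rows or len(rows[0]) < min_cols:
        return False
    # Grid layout: every row has the same number of cells.
    if any(len(row) != len(rows[0]) for row in rows):
        return False
    # Value density: recognized values divided by all non-empty values.
    values = [cell for row in rows[1:] for cell in row if cell.strip()]
    if not values:
        return False
    density = sum(1 for v in values if recognizes(v)) / len(values)
    return density >= min_density
```

Here `recognizes` would wrap whatever the extraction ontology uses to decide that a string is a value of interest; a simple stand-in such as `str.isdigit` is enough to try the function out.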
14
14 Solution Detect The Table of Interest
Linked-page tables
  Table size: at least 2 rows and columns
  Attributes
  Attribute-value-pair pattern
Page-spanning tables
15
15 Solution Remove Factoring
  Example factored values: 2001, 2000, 1999
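One plausible reading of "remove factoring" is that a value such as a year appears once, on a row of its own, and applies to every row below it until the next such row. The sketch below copies that value onto each following row. The detection rule (a row with exactly one non-empty cell) and the function name are assumptions made for illustration, not details taken from the slides.

```python
from typing import List, Optional


def remove_factoring(rows: List[List[str]]) -> List[List[str]]:
    """Prepend the most recent factored value to each ordinary data row.

    Assumption: a row with exactly one non-empty cell is a factored value.
    """
    result: List[List[str]] = []
    factored: Optional[str] = None
    for row in rows:
        non_empty = [cell for cell in row if cell.strip()]
        if len(non_empty) == 1:
            # Remember the factored value and drop its row from the output.
            factored = non_empty[0]
            continue
        if factored is not None:
            result.append([factored] + list(row))
        else:
            result.append(list(row))
    return result
```

For example, the rows `["2001"], ["Honda", "Civic"], ["2000"], ["Toyota", "Camry"]` would become `["2001", "Honda", "Civic"]` and `["2000", "Toyota", "Camry"]`.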
16
16 Solution Replace Boolean Values
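A minimal sketch of one way to replace Boolean values, assuming the intent is to handle the "Attribute is Value" problem: a positive cell under a column such as Auto or CD is replaced by the attribute name itself, and a negative cell is emptied. The `YES` and `NO` word lists and the function name are hypothetical.

```python
from typing import List

YES = {"yes", "y", "x"}  # hypothetical set of positive markers
NO = {"no", "n"}         # hypothetical set of negative markers


def replace_boolean_values(attributes: List[str], row: List[str]) -> List[str]:
    """Replace positive Boolean cells with their attribute name, negative ones with ''."""
    replaced = []
    for attribute, value in zip(attributes, row):
        key = value.strip().lower()
        if key in YES:
            replaced.append(attribute)   # e.g. "Yes" under "Auto" becomes "Auto"
        elif key in NO:
            replaced.append("")          # the feature is absent, so no value remains
        else:
            replaced.append(value)       # non-Boolean cells are left unchanged
    return replaced
```

With attributes `["Make", "Auto", "CD"]`, the row `["Honda", "Yes", "No"]` becomes `["Honda", "Auto", ""]`, so the feature name can then be extracted like any other value.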
17
17 Solution Form Attribute-Value Pairs
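Forming attribute-value pairs can be pictured as pairing each cell with the attribute at the head of its column. The sketch below assumes a simple table with one header row; skipping empty cells is an illustrative choice, and the function name is hypothetical.

```python
from typing import List, Tuple


def form_attribute_value_pairs(
    attributes: List[str], rows: List[List[str]]
) -> List[List[Tuple[str, str]]]:
    """Pair every non-empty cell with the attribute of its column, row by row."""
    return [
        [(attribute, value) for attribute, value in zip(attributes, row) if value.strip()]
        for row in rows
    ]
```

For attributes `["Year", "Make"]` and the row `["1999", "Ford"]`, the result for that row is `[("Year", "1999"), ("Make", "Ford")]`.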
18
18 Solution Adjust Attribute-Value Pairs
19
19 Solution Add Information Hidden Behind Links
  Unstructured and semi-structured: concatenate
  Single attribute-value pairs: pair them together
  List: mark the beginning and the end (< >)
20
20 Solution Inferred Mapping Creation {Car, Year, Make, Model, Mileage, Price, PhoneNr}, {Car, Feature}
21
21 Solution Inferred Mapping Creation {Car, Year, Make, Model, Mileage, Price, PhoneNr}, {Car, Feature} Each row is a car.
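Inferring mappings from extraction patterns can be sketched as follows: run the target schema's recognizers over each source column and map the column to the target attribute that recognizes most of its values. This is an illustration only. The `recognizers` dictionary, the `min_share` threshold, and the function name are hypothetical, and the approach in the slides may weigh evidence differently.

```python
from collections import Counter
from typing import Callable, Dict, List


def infer_mappings(
    attributes: List[str],
    rows: List[List[str]],
    recognizers: Dict[str, Callable[[str], bool]],
    min_share: float = 0.5,  # hypothetical threshold
) -> Dict[str, str]:
    """Map each source attribute to the target attribute that recognizes most of its values."""
    mappings: Dict[str, str] = {}
    for col, attribute in enumerate(attributes):
        values = [row[col] for row in rows if col < len(row) and row[col].strip()]
        if not values:
            continue
        # Count, per target attribute, how many values in this column it recognizes.
        hits = Counter(
            target
            for value in values
            for target, recognize in recognizers.items()
            if recognize(value)
        )
        if hits:
            target, count = hits.most_common(1)[0]
            if count / len(values) >= min_share:
                mappings[attribute] = target
    return mappings
```

With a source column headed `Yr` whose values are all recognized as years, the sketch yields the mapping `Yr` to `Year` in the target schema; a column with no recognized values is left unmapped.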
28
28 Experimental Results − Table Location
Car advertisement application domain
Top table location:
  Training set: 7, located 100% (7)
  Testing set: 53, located 87% (46)
  Precision: 100%  Recall: 87%
Linked pages: 28 (figure labels: 13, 15)
Structured linked page location (figure labels: 12, 2):
  Precision: 86%  Recall: 92%
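The reported top-table figures are consistent with the usual definitions of precision (correct / declared) and recall (correct / actual). The sketch below checks this for the testing set, assuming that 46 tables were declared and all 46 were correct, which is what a precision of 100% implies; the function name is hypothetical.

```python
from typing import Tuple


def precision_recall(correct: int, declared: int, actual: int) -> Tuple[float, float]:
    """Return (precision, recall) as fractions between 0 and 1."""
    return correct / declared, correct / actual


# Car advertisement testing set: 46 tables located out of 53, no false positives assumed.
precision, recall = precision_recall(correct=46, declared=46, actual=53)
print(f"Precision: {precision:.0%}  Recall: {recall:.0%}")  # Precision: 100%  Recall: 87%
```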
29
29 Experimental Results − Mapping
Car advertisement application domain
46 recognized tables in the testing set
Total 319 mappings
  Precision: 95.8%  Recall: 92.8%
Of the 296 correct mappings:
  Top-level tables: 77%
  Linked tables: 19.6%
  Both: 3.4%
30
30 Experimental Results − Table Location
Cell-phone sales application domain
Top table location:
  Training set: 5, located 100% (5)
  Testing set: 12, located 92% (11)
  Precision: 100%  Recall: 92%
Linked pages (figure labels: 11, 100%(5), 3)
31
31 Experimental Results − Mapping
Cell-phone sales application domain
11 recognized tables in the testing set
Total 97 mappings
  Precision: 90.1%  Recall: 85.4%
Top-level tables: 85.4% of the 88 correct mappings
Linked tables: 50.5%
Both: 35.9%
32
32 Contribution
  Provides an approach to extract information automatically from HTML tables
  Suggests a different way to solve the problem of schema matching