1
Schema Matching and Data Extraction over HTML Tables from Heterogeneous Sources
Cui Tao
March 2002
Funded by NSF
2
Introduction
Many Web sites present their information in tables.
Ontology-based extraction:
- works for unstructured or semi-structured data;
- does not work for structured data, i.e., tables.
Scope: only tables used to present information, not tables used for page layout.
3
Problems: Different Schemas
Source table schemas:
{Run #, Yr, Make, Model, Tran, Color, Dr}
{Make, Model, Year, Colour, Price, Auto, Air Cond., AM/FM, CD}
{Vehicle, Distance, Price, Mileage}
{Year, Make, Model, Trim, Invoice/Retail, Engine, Fuel Economy}
Target database schema:
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
4
Problems: Attribute-Value Pair Switch?
5
Problems: Attribute/Value Combinations
Year/sty, Cyl., # Dr, Tran, Color
6
Problems: Attribute/Value Split
7
Problems: Information in the Linked Pages
Linked pages may hold tables, lists, unstructured data, …
Header information
8
Methods: Table Understanding
Table recognition: <tr>: row; <td>: data entry; <th>: header.
Attribute/value determination cues: first row, first column; different font style.
Process overview:
Table Understanding: Recognize Attributes and Values -> Form Attribute-Value Pairs -> Adjust Attribute-Value Pairs -> Form Records
Inferred Mapping Creation: Pre-Data Extraction -> Infer General Mapping
Data Extraction
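The tag roles and the first-row/first-column cue can be illustrated with Python's standard `html.parser`. This is a minimal sketch, not the method's actual implementation: `TableGrid` and `attribute_side` are hypothetical names, the parser assumes every cell has an explicit closing tag, it ignores rowspan, colspan and nested tables, and it does not check font style.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect the cells of an HTML table as rows of (text, is_header) pairs.

    <tr> opens a row, <td> a data entry and <th> a header cell.
    Assumes explicit closing tags; ignores rowspan, colspan and nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []           # each row: list of (cell_text, is_header)
        self._cell = None        # text fragments of the open cell, or None
        self._is_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []
            self._is_header = (tag == "th")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            if not self.rows:                              # cell outside any <tr>
                self.rows.append([])
            self.rows[-1].append((text, self._is_header))
            self._cell = None


def attribute_side(rows):
    """Heuristic using only <th> placement (font style is not checked):
    'both' if the first row and the first column are all headers, 'left' if
    only the first column is, otherwise 'top' (first row assumed to hold
    the attributes)."""
    if not rows:
        return "top"
    top = all(is_hdr for _, is_hdr in rows[0])
    left = all(row and row[0][1] for row in rows)
    if top and left:
        return "both"
    return "left" if left else "top"
```

Feeding it `<table><tr><th>Make</th><th>Model</th></tr><tr><td>Mercury</td><td>Sable</td></tr></table>` yields a header row followed by one data row, and `attribute_side` reports `"top"`, i.e. an AOT table.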
9
Methods: Form Attribute-Value Pairs
AOT (Attribute On Top) and AOL (Attribute On Left) tables
Table types: AOT/AOL, ATL, MA
11
Example: attribute-value pairs formed from an AOT table
Run# 1; Year SEE; Make AVAILABLE; Model TRUCKS; Tran ----; Color ----; Dr ----
Run# 2; Year 93; Make Mercury; Model Sable; Tran A; Color Green; Dr 4
Run# 3; Year 94; Make Chevrolet; Model Camaro; Tran A; Color Red; Dr 2
⋮
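Pair formation for AOT and AOL tables can be sketched in Python, assuming the table has already been read into a grid of cell strings (one list per row). The function names are hypothetical, and `zip` silently truncates ragged rows; this is an illustration, not the system's code.

```python
def pairs_aot(grid):
    """Attribute On Top: the first row holds the attributes and every
    later row is one record, paired cell by cell with those attributes."""
    header, *body = grid
    return [list(zip(header, row)) for row in body]


def pairs_aol(grid):
    """Attribute On Left: transpose so the first column becomes the top row,
    then pair exactly as for an AOT table."""
    return pairs_aot([list(column) for column in zip(*grid)])


def render(record):
    """Format one record in the 'Attr Value; Attr Value' style used above."""
    return "; ".join(f"{attr} {value}" for attr, value in record)
```

Because pairing is purely positional, a row whose cells read SEE, AVAILABLE, TRUCKS is paired exactly like a real car, as Run# 1 above shows; such rows presumably need separate handling.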
12
Methods: Form Attribute-Value Pairs
AOT (Attribute On Top) and AOL (Attribute On Left) tables
ATL (Attribute On both Top and Left) tables
MA (Multiple Set Attribute) tables
13
The City Fuel Economy of 2001 Honda Civic DX
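For an ATL table, a phrase like the one above combines a column header with a row header. The sketch below assumes a hypothetical layout in which the top-left cell names the car, the top row holds column headers such as "City" (and a hypothetical "Highway"), and the left column holds "Fuel Economy"; the function name and the column-then-row word order are assumptions, not taken from the slides.

```python
def pairs_atl(grid):
    """Attribute On both Top and Left: an inner cell's attribute joins its
    column header (top row) and its row header (left column). The top-left
    cell is taken to name the item the whole table describes."""
    subject = grid[0][0]
    col_headers = grid[0][1:]
    pairs = []
    for row in grid[1:]:
        row_header, values = row[0], row[1:]
        for col_header, value in zip(col_headers, values):
            pairs.append((f"{col_header} {row_header}", value))
    return subject, pairs
```

With placeholder values X and Y, the grid `[["2001 Honda Civic DX", "City", "Highway"], ["Fuel Economy", "X", "Y"]]` yields the subject "2001 Honda Civic DX" and the pairs ("City Fuel Economy", "X") and ("Highway Fuel Economy", "Y").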
14
Methods: Adjust Attribute-Value Pairs
CD: Yes -> “CD”; Auto: No -> “ ” (blank)
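The two adjustment examples can be expressed as a small rewrite rule. This sketch recognises only the literal words "yes" and "no" (any case); other Boolean spellings such as "Y" or check marks are left untouched, and the function names are hypothetical.

```python
def adjust_pair(attr, value):
    """Adjust a Boolean-style pair: 'Yes' becomes the attribute name itself
    (CD: Yes -> "CD") and 'No' becomes a blank (Auto: No -> " ").
    Only the literal words yes/no (any case) are recognised here."""
    word = value.strip().lower()
    if word == "yes":
        return attr, attr
    if word == "no":
        return attr, " "
    return attr, value


def adjust_record(record):
    """Apply adjust_pair to every (attribute, value) pair of one record."""
    return [adjust_pair(attr, value) for attr, value in record]
```

After the rewrite, a value such as "CD" carries its own meaning, so it can be matched against features in the target schema without knowing the source column it came from.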
15
Methods: Form Records
Each record: Attr 1 Value 1; Attr 2 Value 2; ...; Attr n Value n, plus the detailed information in its linked page(s).
Methods: Pre-Data Extraction
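A record can be modelled as a row's attribute-value pairs plus the text of its linked page(s). In this sketch, `Record` and `form_records` are hypothetical names, and it assumes the linked pages have already been fetched and reduced to text; fetching and parsing them is out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    pairs: list                                       # [(attribute, value), ...] from one row
    linked_text: list = field(default_factory=list)  # text of the row's linked page(s)

    def render(self):
        """One line of pairs followed by any linked-page text, one per line."""
        line = "; ".join(f"{attr} {value}" for attr, value in self.pairs)
        return "\n".join([line, *self.linked_text])


def form_records(rows_of_pairs, linked_pages):
    """Build one Record per table row. linked_pages maps a row index to the
    already-fetched text of the page(s) that row links to."""
    return [Record(pairs, list(linked_pages.get(i, [])))
            for i, pairs in enumerate(rows_of_pairs)]
```

Keeping the linked text next to the pairs lets the later extraction step treat each record as one small document.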
16
Methods: Infer General Mapping; Extract Data
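The slides do not say how the general mapping is inferred. One plausible approach, shown here purely as an assumed stand-in, is a majority vote: pre-data extraction reports, for each source value, which target-schema attribute (e.g. Year, Make, Mileage) the ontology recognised it as, and each source attribute is mapped to the target attribute it matches most often. The threshold `min_share` and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def infer_mapping(observations, min_share=0.5):
    """Infer a source-attribute -> target-attribute mapping by majority vote.

    observations: (source_attribute, target_attribute) pairs, one for each
    value that pre-data extraction recognised as a target-schema concept.
    A source attribute is mapped only if its most frequent target accounts
    for more than min_share of its observations."""
    votes = defaultdict(Counter)
    for source, target in observations:
        votes[source][target] += 1
    mapping = {}
    for source, counter in votes.items():
        target, count = counter.most_common(1)[0]
        if count / sum(counter.values()) > min_share:
            mapping[source] = target
    return mapping


def apply_mapping(record, mapping):
    """Extract data: keep mapped pairs only, renamed to target attributes."""
    return [(mapping[attr], value) for attr, value in record if attr in mapping]
```

Voting over many rows lets one mapping (for example Yr -> Year) be inferred for the whole table even if a few individual values are misrecognised; attributes without a clear majority are left unmapped.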
17
Experiment
Tables of car advertisements from 20 sites:
- 10 training tables, used to develop the ontology;
- 10 testing tables, used to measure recall and precision ratios before table processing, and after table processing both before and after training.
18
Results
Mapping ratios:
- Before table processing: record boundaries are hard to find.
- After table processing, before training: 336/490 = 68.57%
- After table processing, after training: 480/490 = 97.96%
Precision and Recall
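The reported mapping ratios can be rechecked, and precision and recall computed, with the standard definitions (precision = correct / extracted, recall = correct / relevant). The helper names are hypothetical, and the slides do not state exactly what the 490 counts, so the arguments are left generic.

```python
def percent(part, whole):
    """part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


def precision(correct, extracted):
    """Share of extracted values that are correct."""
    return percent(correct, extracted)


def recall(correct, relevant):
    """Share of the relevant values in the source that were correctly extracted."""
    return percent(correct, relevant)


print(percent(336, 490))  # 68.57
print(percent(480, 490))  # 97.96
```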
19
Conclusion and Future Work
Tests so far cover only AOT tables.
Experimental results are encouraging: after table processing and training, the mapping ratio reached 97.96% (480/490).
Next step: table understanding and inferred mapping.