1
Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach. AnHai Doan, Pedro Domingos, Alon Halevy
2
Data Integration
3
Problem & Solution. Problem: large-scale data integration systems; bottleneck: semantic mappings (1-1 mappings). Solution: multi-strategy learning, integrity constraints, XML structure learner.
4
Learning Source Descriptions (LSD). Components: base learners, meta-learner, prediction converter, constraint handler. Operations: training phase, matching phase.
5
Learners. Base learners: Name Matcher (Whirl), Content Matcher (Whirl), Naïve Bayes Learner, County-Name Recognizer, XML Learner. Meta-learner (stacking).
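The slide names stacking as the meta-learner but does not show how base-learner outputs are combined. The sketch below illustrates one common form of the combination step: a per-learner, per-label weighted sum of confidence scores, normalized afterwards. The function name `combine_predictions`, the learner keys, the labels, and all weights and scores are hypothetical values chosen for illustration; in a stacking setup the weights would be learned from training data rather than set by hand.

```python
from typing import Dict

Scores = Dict[str, float]  # label -> confidence score


def combine_predictions(base_scores: Dict[str, Scores],
                        weights: Dict[str, Dict[str, float]]) -> Scores:
    """Combine base-learner confidences with a weighted sum per label.

    Assumption: weights[learner][label] says how much that learner is
    trusted for that label. Missing weights count as zero.
    """
    combined: Scores = {}
    for learner, scores in base_scores.items():
        for label, score in scores.items():
            weight = weights.get(learner, {}).get(label, 0.0)
            combined[label] = combined.get(label, 0.0) + weight * score
    # Normalize so the combined scores sum to 1 (when any score is positive).
    total = sum(combined.values())
    if total > 0:
        combined = {label: s / total for label, s in combined.items()}
    return combined


# Hypothetical scores from two base learners for one source element.
base_scores = {
    "name_matcher": {"ADDRESS": 0.8, "DESCRIPTION": 0.2},
    "naive_bayes": {"ADDRESS": 0.6, "DESCRIPTION": 0.4},
}
# Hypothetical weights; a stacking meta-learner would learn these.
weights = {
    "name_matcher": {"ADDRESS": 0.5, "DESCRIPTION": 0.5},
    "naive_bayes": {"ADDRESS": 0.5, "DESCRIPTION": 0.5},
}
result = combine_predictions(base_scores, weights)
best_label = max(result, key=result.get)
```

Keeping a separate weight per learner and label lets the combination trust, say, a name-based learner more for some labels than for others, which is the motivation for a multi-strategy design.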
6
XML Learner
7
XML Learner (Cont.)
8
Constraint Handler Domain Constraints
9
Constraint Handler (Cont.) Search Heuristic Mapping Cost
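The constraint-handler slides mention domain constraints, a search, and a mapping cost without giving details. As a rough illustration only, the sketch below scores each candidate mapping by the negative log of its prediction confidences plus a fixed penalty per violated constraint, and picks the cheapest mapping by exhaustive enumeration. The cost formula, the penalty value, the constraint `at_most_one_address`, and the exhaustive search are all assumptions made for this example; a heuristic search would be needed for schemas of realistic size.

```python
import itertools
import math


def mapping_cost(mapping, scores, constraints, penalty=10.0):
    """Assumed cost: negative log-confidence plus a penalty per violation."""
    cost = 0.0
    for element, label in mapping.items():
        confidence = scores[element].get(label, 0.0)
        cost += -math.log(max(confidence, 1e-9))  # guard against log(0)
    cost += penalty * sum(1 for violates in constraints if violates(mapping))
    return cost


def best_mapping(scores, labels, constraints):
    """Enumerate every assignment of labels to elements; keep the cheapest."""
    elements = list(scores)
    best, best_cost = None, math.inf
    for assignment in itertools.product(labels, repeat=len(elements)):
        mapping = dict(zip(elements, assignment))
        cost = mapping_cost(mapping, scores, constraints)
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best


def at_most_one_address(mapping):
    """Hypothetical domain constraint: returns True when it is violated."""
    return sum(1 for label in mapping.values() if label == "ADDRESS") > 1


# Hypothetical combined predictions for two source elements.
scores = {
    "location": {"ADDRESS": 0.7, "DESCRIPTION": 0.3},
    "comments": {"ADDRESS": 0.6, "DESCRIPTION": 0.4},
}
labels = ["ADDRESS", "DESCRIPTION"]
unconstrained = best_mapping(scores, labels, [])
constrained = best_mapping(scores, labels, [at_most_one_address])
```

Without the constraint both elements are mapped to ADDRESS; with it, the less confident element falls back to its second-best label.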
10
Training Phase
11
Example1 (Training Phase)
12
Example1 (Cont.)
13
(“location”, ADDRESS) (“Miami, FL”, ADDRESS)
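The two pairs on this slide suggest that one manually mapped source element yields a training example built from its name and further examples built from its data instances. A minimal sketch of that step follows; the function name `build_training_examples` and the split into name examples and content examples are assumptions for illustration, not the authors' code.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (text, mediated-schema label)


def build_training_examples(element_name: str,
                            instances: List[str],
                            label: str) -> Tuple[List[Example], List[Example]]:
    """Turn one manually mapped source element into training examples.

    The element name gives one example for a name-based learner; each
    data instance gives one example for a content-based learner.
    """
    name_examples = [(element_name, label)]
    content_examples = [(value, label) for value in instances]
    return name_examples, content_examples


name_examples, content_examples = build_training_examples(
    "location", ["Miami, FL"], "ADDRESS"
)
```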
14
Matching Phase
15
Example2 (Matching Phase)
16
Example2 (Cont.)
18
Empirical Evaluation
19
Measures: matching accuracy of a source; average matching accuracy of a source; average matching accuracy of a domain.
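The slide lists the measures without defining them. The sketch below assumes that the matching accuracy of a source is the fraction of its matchable elements that receive the correct label, and that the averaged measures are plain arithmetic means over repeated runs or over the sources of a domain. Both definitions, the function names, and the sample data are assumptions for illustration.

```python
from typing import Dict, List


def matching_accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of matchable elements (keys of gold) labeled correctly."""
    if not gold:
        return 0.0
    correct = sum(1 for element, label in gold.items()
                  if predicted.get(element) == label)
    return correct / len(gold)


def average_accuracy(accuracies: List[float]) -> float:
    """Arithmetic mean, used here for averages over runs or over sources."""
    return sum(accuracies) / len(accuracies) if accuracies else 0.0


# Hypothetical gold mapping and predictions for one source.
gold = {"location": "ADDRESS", "comments": "DESCRIPTION", "price": "PRICE"}
predicted = {"location": "ADDRESS", "comments": "ADDRESS", "price": "PRICE"}
source_accuracy = matching_accuracy(predicted, gold)

# Hypothetical per-source accuracies averaged into a domain-level figure.
domain_accuracy = average_accuracy([source_accuracy, 1.0])
```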
20
Experiment Result
21
Experiment Result (Cont.): contributions of base learners and the constraint handler
22
Experiment Result (Cont.): contributions of schema information and data instances
23
Experiment Result (Cont.): performance sensitivity to the amount of data instances
24
Limitations: enough training data; domain-dependent learners; ambiguities in sources; efficiency; overlapping of schemas.
25
Conclusion and Future Work: improve over time; extensible framework; multiple types of knowledge; non 1-1 mapping?