1
Learning Object Identification Rules for Information Integration
Sheila Tejada, Craig A. Knoblock, Steven Minton @ University of Southern California
2
Introduction
When integrating information, data objects can exist in inconsistent text formats across several sources.
Previous methods manually construct mapping rules for object identification.
Active Atlas learns to tailor mapping rules, through limited user input, to a specific application domain.
Active Atlas achieves higher accuracy and requires less user involvement than previous methods.
3
Object Identification Example
4
Ariadne Information Mediator
5
Ariadne Information Mediator (cont’d)
6
Active Atlas Approach to Map Objects
First, determine the text formatting transformations and propose candidate mappings.
Then, learn domain-specific mapping rules.
7
Active Atlas Architecture
8
Mapping Objects (Transformation Functions)
General Transformation Functions
Type I: Stemming, Soundex, Abbreviation
Type II: Equality, Initial, Prefix, Suffix, Substring, Abbreviation, Acronym
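As a rough illustration of how token-level transformation functions like the Type II ones named on this slide might be checked, here is a minimal Python sketch. The function names, signatures, and exact matching semantics (for example, that Prefix excludes exact equality) are assumptions made for this example and are not taken from the Active Atlas implementation. Stemming, Soundex, and the Abbreviation look-up are omitted because they need longer algorithms or dictionaries.

```python
# Illustrative sketch only: names and matching semantics are assumptions,
# not the Active Atlas implementation.

def equality(a: str, b: str) -> bool:
    """Tokens are identical, ignoring case."""
    return a.lower() == b.lower()

def initial(a: str, b: str) -> bool:
    """`a` is a single letter that begins `b`, e.g. "J" vs "John"."""
    return len(a) == 1 and len(b) > 1 and b.lower().startswith(a.lower())

def prefix(a: str, b: str) -> bool:
    """`a` is a proper prefix of `b`, e.g. "Caf" vs "Cafe"."""
    a, b = a.lower(), b.lower()
    return len(a) < len(b) and b.startswith(a)

def suffix(a: str, b: str) -> bool:
    """`a` is a proper suffix of `b`."""
    a, b = a.lower(), b.lower()
    return len(a) < len(b) and b.endswith(a)

def substring(a: str, b: str) -> bool:
    """`a` occurs somewhere inside `b` and is not the whole of `b`."""
    a, b = a.lower(), b.lower()
    return a != b and a in b

def acronym(token: str, words: list[str]) -> bool:
    """`token` spells the first letters of `words`, e.g. "CPK"."""
    letters = "".join(w[0] for w in words if w)
    return bool(letters) and token.lower() == letters.lower()

# Pairwise (token-to-token) checks, in a fixed order.
PAIRWISE = [
    ("Equality", equality),
    ("Initial", initial),
    ("Prefix", prefix),
    ("Suffix", suffix),
    ("Substring", substring),
]

def matching_transformations(a: str, b: str) -> list[str]:
    """Names of all pairwise transformations that relate token `a` to token `b`."""
    return [name for name, check in PAIRWISE if check(a, b)]
```

For instance, `matching_transformations("Caf", "Cafe")` reports both a prefix and a substring relationship, and `acronym("CPK", ["California", "Pizza", "Kitchen"])` holds under these assumed definitions.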
9
Mapping Objects (Transformation Functions Example)
10
Mapping Objects (Compute Attribute Similarity Scores)
11
Mapping Objects (Compute Total Similarity Scores)
The total object similarity score is computed as a weighted sum of the attribute similarity scores.
Each attribute has a uniqueness weight, a heuristic measure of the importance of that attribute.
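The weighted sum described on this slide can be sketched in a few lines of Python. The attribute names, scores, and weights below are made-up values for illustration; the slide does not say how the uniqueness weights are derived, so they are simply passed in.

```python
# Illustrative sketch: attribute names, scores, and weights are hypothetical.

def total_similarity(attribute_scores: dict[str, float],
                     uniqueness_weights: dict[str, float]) -> float:
    """Weighted sum of per-attribute similarity scores.

    Every attribute in `attribute_scores` must have a uniqueness weight.
    """
    return sum(uniqueness_weights[attr] * score
               for attr, score in attribute_scores.items())

# Hypothetical candidate mapping between two restaurant records.
example_scores = {"name": 0.9, "street": 0.5, "phone": 1.0}
example_weights = {"name": 0.5, "street": 0.2, "phone": 0.3}
example_total = total_similarity(example_scores, example_weights)
```

With these made-up numbers the total is 0.5 × 0.9 + 0.2 × 0.5 + 0.3 × 1.0, roughly 0.85; an attribute with a larger uniqueness weight moves the total more.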
12
Mapping Objects (Output of Candidate Generator)
13
Mapping Objects (Mapping-Rule Learning)
Decision Tree Learning
Passive Learning
  Requires a large set of training examples
Active Learning
  Uses the query-by-bagging technique
  Selects a small set of initial training examples
    Includes a variety of training examples
  Creates a diverse set of decision tree learners
  Actively chooses the examples for the user to label
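To make the query-by-bagging idea on this slide concrete, here is a small, dependency-free Python sketch. It is not the Active Atlas learner: one-level decision stumps stand in for full decision trees, each example is assumed to be a tuple of attribute similarity scores with a boolean "mapped" label, and all function names are hypothetical. The committee is trained on bootstrap resamples of the labeled examples, and the unlabeled example the committee disagrees on most is the one proposed to the user for labeling.

```python
import random
from collections import Counter

# Illustrative sketch only: decision stumps stand in for decision trees,
# and the data layout (tuple of attribute scores, bool label) is assumed.

def train_stump(examples):
    """Fit a one-level tree: the (attribute, threshold) split with fewest errors."""
    best = None
    for attr in range(len(examples[0][0])):
        # Candidate thresholds: observed values, plus infinity ("never mapped").
        thresholds = sorted({x[attr] for x, _ in examples}) + [float("inf")]
        for threshold in thresholds:
            errors = sum((x[attr] >= threshold) != y for x, y in examples)
            if best is None or errors < best[0]:
                best = (errors, attr, threshold)
    _, attr, threshold = best
    return lambda x: x[attr] >= threshold

def bagged_committee(labeled, n_learners, rng):
    """Train each learner on a bootstrap resample of the labeled examples."""
    committee = []
    for _ in range(n_learners):
        sample = [rng.choice(labeled) for _ in labeled]
        committee.append(train_stump(sample))
    return committee

def disagreement(committee, x):
    """0.0 when the committee is unanimous, up to 0.5 for an even split."""
    votes = Counter(member(x) for member in committee)
    return 1.0 - max(votes.values()) / len(committee)

def choose_query(committee, unlabeled):
    """Pick the unlabeled example the committee disagrees on most."""
    return max(unlabeled, key=lambda x: disagreement(committee, x))

def active_learning_round(labeled, unlabeled, n_learners=10, seed=0):
    """One round: build the committee, then return the example to ask about."""
    rng = random.Random(seed)
    committee = bagged_committee(labeled, n_learners, rng)
    return choose_query(committee, unlabeled)
```

In a full loop, the user's label for the returned example would be added to `labeled` and the round repeated, so that labeling effort goes to the candidate mappings the learners are least sure about.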
14
Mapping Objects (Active Learning)
15
Experimental Results
Three different domains: Restaurants, Companies, and Airports
Experiments:
Two baseline experiments
  Compare the shared attributes separately
  Compare the object as a whole
  Both require choosing an optimal threshold
Passive learning
Active learning
16
Experimental Results (Restaurants)
Source A: 331 objects
Source B: 533 objects
112 correct mappings
3259 candidate mappings over 10 runs
17
Measurement of Accuracy
Accuracy: the total number of correct classifications divided by the sum of the total number of mappings and the number of correct mappings not proposed.
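Read literally, the accuracy measure on this slide can be written as a short Python function. The parameter names are assumptions chosen for this sketch, and the counts in the example are made up rather than taken from the experiments.

```python
# Illustrative sketch: parameter names are assumed, example counts are made up.

def accuracy(correct_classifications: int,
             total_mappings: int,
             correct_mappings_not_proposed: int) -> float:
    """Correct classifications over (total mappings + correct mappings not proposed)."""
    denominator = total_mappings + correct_mappings_not_proposed
    if denominator == 0:
        raise ValueError("no mappings to evaluate")
    return correct_classifications / denominator

# Hypothetical run: 90 of 95 mappings classified correctly,
# with 5 correct mappings never proposed.
example_accuracy = accuracy(90, 95, 5)
```

Adding the correct mappings that were never proposed to the denominator means a method is penalized for true mappings it never had the chance to classify, not only for the ones it classified wrongly.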
18
Experimental Results
20
Related Work
21
Conclusion
The research addresses the problem of mapping objects between structured web sources.
The experimental results show that Active Atlas can achieve high accuracy while limiting user involvement.
22
Future Work