Learning Object Identification Rules for Information Integration. Sheila Tejada, Craig A. Knoblock, Steven Minton. University of Southern California.


1 Learning Object Identification Rules for Information Integration. Sheila Tejada, Craig A. Knoblock, Steven Minton, University of Southern California

2 Introduction: When integrating information, data objects can exist in inconsistent text formats across several sources. Previous methods manually construct mapping rules for object identification. Active Atlas learns to tailor mapping rules, through limited user input, to a specific application domain. Active Atlas achieves higher accuracy and requires less user involvement than previous methods.

3 Object Identification Example

4 Ariadne Information Mediator

5 Ariadne Information Mediator (cont’d)

6 Active Atlas Approach to Map Objects: First, determine the text formatting transformations and propose candidate mappings. Then, learn domain-specific mapping rules.

7 Active Atlas Architecture

8 Mapping Objects (Transformation Functions): General Transformation Functions. Type I: Stemming, Soundex, Abbreviation. Type II: Equality, Initial, Prefix, Suffix, Substring, Abbreviation, Acronym.
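The slide names the transformation functions but does not define them. As a rough illustration only, a few of the Type II transformations could be sketched in Python as pairwise string tests. The function names, the case-insensitive comparison, and the exact matching rules below are assumptions made for this sketch, not the definitions used in Active Atlas.

```python
def _by_length(a: str, b: str):
    """Return the two strings lowercased, shorter one first."""
    return sorted((a.lower(), b.lower()), key=len)

def equality(a: str, b: str) -> bool:
    """True when the two tokens are identical, ignoring case."""
    return a.lower() == b.lower()

def initial(a: str, b: str) -> bool:
    """True when one token is only the first letter of the other ("A." vs "Art")."""
    short, long_ = _by_length(a.rstrip("."), b.rstrip("."))
    return len(short) == 1 and len(long_) > 1 and long_.startswith(short)

def prefix(a: str, b: str) -> bool:
    """True when the shorter token is a proper prefix of the longer one."""
    short, long_ = _by_length(a, b)
    return bool(short) and len(short) < len(long_) and long_.startswith(short)

def suffix(a: str, b: str) -> bool:
    """True when the shorter token is a proper suffix of the longer one."""
    short, long_ = _by_length(a, b)
    return bool(short) and len(short) < len(long_) and long_.endswith(short)

def substring(a: str, b: str) -> bool:
    """True when the shorter token occurs somewhere inside the longer one."""
    short, long_ = _by_length(a, b)
    return bool(short) and len(short) < len(long_) and short in long_

def acronym(token: str, phrase: str) -> bool:
    """True when the token spells the first letters of the words in the phrase."""
    initials = "".join(word[0] for word in phrase.split())
    return bool(initials) and token.replace(".", "").lower() == initials.lower()
```

For example, `acronym("CPK", "California Pizza Kitchen")` and `prefix("Cal", "California")` both hold under these assumed definitions.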

9 Mapping Objects (Transformation Functions Example)

10 Mapping Objects (Compute Attribute Similarity Scores)

11 Mapping Objects (Compute Total Similarity Scores): The total object similarity score is computed as a weighted sum of the attribute similarity scores. Each attribute has a uniqueness weight that is a heuristic measure of the importance of that attribute.
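The weighted sum described on this slide can be sketched in a few lines of Python. The attribute names, scores, and weights below are made-up values for illustration; the slide does not say how the uniqueness weights are derived, so they are simply passed in here.

```python
from typing import Mapping

def total_similarity(attribute_scores: Mapping[str, float],
                     uniqueness_weights: Mapping[str, float]) -> float:
    """Weighted sum of per-attribute similarity scores."""
    return sum(uniqueness_weights[name] * score
               for name, score in attribute_scores.items())

# Hypothetical scores for one candidate mapping between two restaurant records.
scores = {"name": 0.9, "street": 0.5, "phone": 1.0}
# Hypothetical uniqueness weights (assumed, not taken from the presentation).
weights = {"name": 0.5, "street": 0.2, "phone": 0.3}
total = total_similarity(scores, weights)
```

With these toy numbers the total is 0.5 * 0.9 + 0.2 * 0.5 + 0.3 * 1.0.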

12 Mapping Objects (Output of Candidate Generator)

13 Mapping Objects (Mapping-Rule Learning): Decision Tree Learning. Passive learning: requires a large set of training examples. Active learning: uses the query-by-bagging technique; selects a small set of initial training examples; includes a variety of training examples; creates a diverse set of decision tree learners; actively chooses the examples for the user to label.
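The query-by-bagging loop on this slide can be sketched as follows. This is a simplified illustration, not the Active Atlas implementation: one-level decision trees (stumps) stand in for full decision trees, each example is assumed to be a tuple of attribute similarity scores with a boolean label, and the names `train_stump`, `build_committee`, `vote_split`, and `choose_query` are hypothetical.

```python
import random
from collections import Counter

def train_stump(examples):
    """Fit a one-level decision tree: the (feature, threshold) with fewest training errors."""
    best = None
    for f in range(len(examples[0][0])):
        for x, _ in examples:
            t = x[f]
            errors = sum((xi[f] >= t) != label for xi, label in examples)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, feature, threshold = best
    return lambda x: x[feature] >= threshold

def build_committee(labeled, committee_size=10, seed=0):
    """Train each learner on a bootstrap sample (drawn with replacement) of the labeled set."""
    rng = random.Random(seed)
    committee = []
    for _ in range(committee_size):
        sample = [rng.choice(labeled) for _ in labeled]
        committee.append(train_stump(sample))
    return committee

def vote_split(committee, x):
    """Number of committee members in the minority for x (0 means unanimous)."""
    votes = Counter(member(x) for member in committee)
    return len(committee) - max(votes.values())

def choose_query(committee, unlabeled):
    """Pick the unlabeled example the committee disagrees on most, to show the user."""
    return max(unlabeled, key=lambda x: vote_split(committee, x))

# Toy data: (name similarity, street similarity) -> is this a correct mapping?
labeled = [((0.9, 0.8), True), ((0.8, 0.9), True),
           ((0.2, 0.1), False), ((0.3, 0.4), False)]
unlabeled = [(1.0, 1.0), (0.5, 0.6), (0.0, 0.0)]
committee = build_committee(labeled)
query = choose_query(committee, unlabeled)
```

In a full loop, the user's label for `query` would be added to `labeled` and the committee retrained, repeating until the learners agree or the labeling budget is spent.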

14 Mapping Objects (Active Learning)

15 Experimental Results: Three different domains: Restaurants, Companies, and Airports. Experiments: two baseline experiments (compare the shared attributes separately; compare the object as a whole), both of which require choosing an optimal threshold; passive learning; active learning.
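The slide does not say how the optimal threshold for the baselines was chosen. One plausible reading, sketched here purely as an assumption, is to sweep every observed similarity score as a cutoff and keep the one that classifies the most labeled pairs correctly. The function name and the toy data are hypothetical.

```python
def best_threshold(scored_pairs):
    """scored_pairs: list of (similarity score, is_correct_mapping) tuples.

    Returns the cutoff (taken from the observed scores) that yields the most
    correct classifications when pairs with score >= cutoff are called mappings.
    Ties are broken in favor of the lowest cutoff.
    """
    candidates = sorted({score for score, _ in scored_pairs})

    def correct(cutoff):
        return sum((score >= cutoff) == is_match for score, is_match in scored_pairs)

    return max(candidates, key=correct)

# Toy labeled pairs, not data from the experiments.
pairs = [(0.2, False), (0.4, False), (0.6, True), (0.9, True)]
cutoff = best_threshold(pairs)
```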

16 Experimental Results (Restaurants): Source A: 331 objects; Source B: 533 objects; 112 correct mappings; 3259 candidate mappings over 10 runs

17 Measurement of Accuracy: Accuracy is the total number of correct classifications divided by the sum of the total number of mappings and the number of correct mappings not proposed.
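Written out as code, the accuracy measure on this slide looks roughly like the sketch below. The parameter names are assumptions, "total number of mappings" is read here as the number of candidate mappings that were classified, and the numbers in the example are toy values rather than results from the experiments.

```python
def accuracy(correct_classifications: int,
             total_mappings: int,
             missed_correct_mappings: int) -> float:
    """Correct classifications over (classified mappings + correct mappings never proposed)."""
    denominator = total_mappings + missed_correct_mappings
    if denominator == 0:
        raise ValueError("no mappings to evaluate")
    return correct_classifications / denominator

# Toy example: 98 candidate mappings classified, 90 of them correctly,
# and 2 true mappings that the candidate generator never proposed.
toy_accuracy = accuracy(90, 98, 2)
```

Counting the unproposed correct mappings in the denominator penalizes a candidate generator that misses true matches, which plain classification accuracy over the candidates would not do.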

18 Experimental Results

19

20 Related Work

21 Conclusion: The research addresses the problem of mapping objects between structured web sources. The experimental results show that Active Atlas can achieve high accuracy while limiting user involvement.

22 Future Work

