Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach Bin He, Kevin Chen-Chuan Chang, Jiawei Han Presented by Dayi Zhou.


1 Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach Bin He, Kevin Chen-Chuan Chang, Jiawei Han Presented by Dayi Zhou 10/2005

2 DCM. Problem: schema matching, specifically complex matchings across different deep web data sources; most existing techniques focus on 1:1 matching. Solution: a correlation mining approach. Motivation: grouping attributes tend to be co-present (e.g., first name, last name), while synonym attributes tend to be negatively correlated (e.g., departing, from). The approach considers both positive and negative correlation and, instead of matching two schemas at a time, matches all the schemas at the same time.

3 Formal Schema Matching Problem. Note: a schema is viewed as a transaction, which is a set of items.
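To make the "schema as transaction" view concrete, here is a minimal sketch. The attribute names and the helper `attribute_counts` are hypothetical illustrations, not taken from the slides.

```python
# Each query-interface schema is treated as a transaction: a set of attribute items.
# The attribute names below are made up for illustration.
schemas = [
    {"author", "title", "subject", "isbn"},
    {"first name", "last name", "title", "category"},
    {"author", "title", "category"},
]


def attribute_counts(transactions):
    """Count in how many schemas (transactions) each attribute appears."""
    counts = {}
    for schema in transactions:
        for attr in schema:
            counts[attr] = counts.get(attr, 0) + 1
    return counts


counts = attribute_counts(schemas)
```

With all schemas held as sets, any co-occurrence statistic can be computed over the whole collection at once rather than pair by pair.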

4 DCM framework: automatic data preparation; correlation mining.

5 Correlation Measure. Contingency table: co-presence, co-absence, and only one present.
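The contingency table for a pair of attributes can be sketched as four counts over the schema collection. The names `Contingency`, `contingency`, and the sample schemas are assumptions made for this example.

```python
from collections import namedtuple

# f11: both present, f10: only a present, f01: only b present, f00: both absent.
Contingency = namedtuple("Contingency", ["f11", "f10", "f01", "f00"])


def contingency(transactions, a, b):
    """Build the 2x2 contingency counts for attributes a and b."""
    f11 = f10 = f01 = f00 = 0
    for schema in transactions:
        has_a, has_b = a in schema, b in schema
        if has_a and has_b:
            f11 += 1
        elif has_a:
            f10 += 1
        elif has_b:
            f01 += 1
        else:
            f00 += 1
    return Contingency(f11, f10, f01, f00)


# Hypothetical schemas, for illustration only.
schemas = [
    {"author", "title", "subject", "isbn"},
    {"first name", "last name", "title", "category"},
    {"author", "title", "category"},
]

table = contingency(schemas, "author", "first name")
```

Here `author` and `first name` never appear together, so all of the mass falls in the "only one present" cells.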

6 Correlation Measure (cont.). The sparseness problem: high co-absence. Rare attribute problem: false negative correlation (H-measure). Frequent attribute problem: false positive correlation.
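The slide names the H-measure without giving its formula. The sketch below assumes the form H = f01·f10 / (f+1·f1+), where f1+ and f+1 are the counts of schemas containing the first and second attribute; treat that definition as an assumption to check against the paper. Jaccard is the standard f11 / (f11 + f10 + f01). Neither function takes the co-absence count f00, so adding schemas that contain neither attribute leaves both values unchanged.

```python
def jaccard(f11, f10, f01):
    """Standard Jaccard coefficient: co-presence over presence of either attribute."""
    union = f11 + f10 + f01
    return f11 / union if union else 0.0


def h_measure(f11, f10, f01):
    """Assumed form of the H-measure: f01 * f10 / (f+1 * f1+).

    Returns 0.0 when either attribute never occurs (a choice made for this sketch).
    """
    present_a = f11 + f10  # f1+: schemas containing the first attribute
    present_b = f11 + f01  # f+1: schemas containing the second attribute
    if present_a == 0 or present_b == 0:
        return 0.0
    return (f01 * f10) / (present_b * present_a)


# Attributes that never co-occur score 1.0; attributes that always co-occur score 0.0.
never_together = h_measure(f11=0, f10=2, f01=1)
always_together = h_measure(f11=2, f10=0, f01=0)
```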

7 Matching Discovery. Measure correlations between two groups. C_min: the minimal value of the pairwise correlation measurements. Two groups are positively (negatively) correlated when C_min, computed with the measure for positive (negative) correlation, is greater than some threshold.
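A minimal sketch of the C_min idea, assuming it is taken over all cross-group attribute pairs. The pairwise scores, the `lookup` helper, and the 0.5 threshold are hypothetical values chosen only to make the example run.

```python
from itertools import product


def c_min(group_a, group_b, measure):
    """Minimal pairwise correlation value between attributes of two groups."""
    return min(measure(a, b) for a, b in product(group_a, group_b))


# Hypothetical pairwise negative-correlation scores, for illustration only.
scores = {
    frozenset({"author", "first name"}): 0.9,
    frozenset({"author", "last name"}): 0.8,
}


def lookup(a, b):
    """Symmetric lookup of a precomputed pairwise score."""
    return scores[frozenset({a, b})]


THRESHOLD = 0.5  # assumed value; the slide only says "some threshold"

value = c_min(["author"], ["first name", "last name"], lookup)
is_correlated = value > THRESHOLD
```

Taking the minimum means a group-level correlation holds only if every cross-group pair clears the threshold.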

8 Matching Selection. Rank the discovered matchings: maximal measurement value -> rank; top-k to break ties; if still tied, choose the one with richer semantic information.

9 Data Preparation: form extraction; type recognition; syntactic merging (name-based merging, domain-based merging).

10 Experiments. Databases: TEL-8 (447 deep web sources in 8 domains); BAMM (211 deep web sources in 4 domains). Metrics: target accuracy. Target question: given any attribute, find its synonyms, hyponyms, and hypernyms.

11 Target Accuracy

12 Comparing H-measure and Jaccard

