1
Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach
Bin He, Kevin Chen-Chuan Chang, Jiawei Han
Presented by Dayi Zhou, 10/2005
2
DCM
Problem: schema matching
  Complex matchings across different deep web data sources
  Most existing techniques focus on 1:1 matching
Solution: a correlation mining approach
Motivation
  Grouping attributes -> co-present, e.g. first name, last name
  Synonyms -> negatively correlated, e.g. departing, from
  Considers both positive and negative correlation
  Instead of matching two schemas at a time, matches all schemas at the same time
3
Formal Schema Matching Problem
Note: A schema is viewed as a transaction, which is a set of items.
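The transaction view can be sketched in a few lines. This is a minimal illustration, not code from the paper: the attribute names and the helper `attribute_support` are hypothetical, chosen only to show each schema as a set of items.

```python
# Hypothetical query-interface schemas; the attribute names are illustrative only.
# Each schema is one transaction, i.e. a set of attribute names (items).
schemas = [
    {"first name", "last name", "title"},
    {"author", "title", "isbn"},
    {"author", "title"},
]


def attribute_support(schemas):
    """Count how many schemas (transactions) contain each attribute (item)."""
    counts = {}
    for schema in schemas:
        for attr in schema:
            counts[attr] = counts.get(attr, 0) + 1
    return counts


support = attribute_support(schemas)
```

With the schemas held as sets, membership tests such as `"author" in schema` are all the later correlation counts need.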
4
DCM framework
  Automatic data preparation
  Correlation mining
5
Correlation Measure
Contingency table
  Co-presence, co-absence, only one present
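The four cells of the contingency table for a pair of attributes can be counted directly from the schemas. A minimal sketch, assuming schemas are sets of attribute names; the names `f11`, `f10`, `f01`, `f00` and the sample data are assumptions made for illustration.

```python
def contingency(schemas, a, b):
    """Return (f11, f10, f01, f00) for attributes a and b.

    f11: both present (co-presence)
    f10: only a present
    f01: only b present
    f00: neither present (co-absence)
    """
    f11 = f10 = f01 = f00 = 0
    for schema in schemas:
        has_a, has_b = a in schema, b in schema
        if has_a and has_b:
            f11 += 1
        elif has_a:
            f10 += 1
        elif has_b:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00


# Hypothetical schemas, for illustration only.
schemas = [
    {"first name", "last name", "title"},
    {"author", "title", "isbn"},
    {"author", "title"},
    {"keyword"},
]
table = contingency(schemas, "author", "first name")
```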
6
Correlation Measure (cont.)
The sparseness problem
  High co-absence
Rare attribute problem
  False negative correlation
  H-measure
Frequent attribute problem
  False positive correlation
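The slide names the H-measure without giving its formula. The sketch below assumes the form H = (f01 * f10) / (f+1 * f1+), which should be checked against the original paper before relying on it. The function name and argument order are hypothetical. Under this assumed form the co-absence count f00 does not appear at all, so high co-absence cannot inflate the score.

```python
def h_measure(f11, f10, f01):
    """Assumed H-measure for negative correlation between attributes A and B.

    f11: schemas with both A and B
    f10: schemas with A only
    f01: schemas with B only
    The co-absence count f00 is deliberately not an input.
    """
    f1_plus = f11 + f10  # schemas containing A
    f_plus1 = f11 + f01  # schemas containing B
    if f1_plus == 0 or f_plus1 == 0:
        # An attribute that never occurs gives no evidence either way.
        return 0.0
    return (f01 * f10) / (f_plus1 * f1_plus)


never_together = h_measure(0, 2, 1)   # the attributes never co-occur
always_together = h_measure(5, 0, 0)  # the attributes always co-occur
```

A value near 1 suggests the attributes are alternatives (candidate synonyms), and a value near 0 suggests they tend to appear together.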
7
Matching Discovery
Measure correlations between two groups
C_min: the minimal value of the pairwise correlation measurements
Positively (negatively) correlated: C_min under the measure for positive (negative) correlation is greater than some threshold
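The minimum-over-pairs rule can be sketched as follows. This is an illustration under assumptions: `c_min`, `is_correlated`, the score table, and the thresholds are all hypothetical, and the pairwise measure is passed in as a plain function so either the positive or the negative measure can be plugged in.

```python
from itertools import combinations


def c_min(items, measure):
    """Smallest pairwise score among the items (needs at least two items)."""
    return min(measure(a, b) for a, b in combinations(items, 2))


def is_correlated(items, measure, threshold):
    """The items count as correlated only if every pair clears the threshold."""
    return c_min(items, measure) > threshold


# Hypothetical pairwise scores, for illustration only.
scores = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.6,
    frozenset(("b", "c")): 0.8,
}


def lookup(x, y):
    return scores[frozenset((x, y))]


weakest = c_min(["a", "b", "c"], lookup)
```

Taking the minimum means a single weakly correlated pair is enough to reject the whole candidate.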
8
Matching Selection
Rank the discovered matchings
  Maximal measurement value -> rank
  Top-k values to break ties
  If still tied, choose the matching with richer semantic information
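One possible reading of this ranking rule is a lexicographic comparison: compare the maximal score first, then the next-highest scores, then a richness tie-breaker. The sketch below is an interpretation, not the paper's procedure. The dictionary layout and the use of attribute count as a stand-in for "richer semantic information" are assumptions.

```python
def rank_key(matching):
    """Sort key: scores from high to low, then attribute count as a tie-breaker.

    The first score is the maximal measurement value; the following scores
    only matter when the earlier ones are tied.
    """
    top_scores = sorted(matching["scores"], reverse=True)
    # Assumption: more attributes is treated as richer semantic information.
    return (top_scores, matching["n_attributes"])


def rank_matchings(matchings):
    """Return the matchings ordered from best to worst."""
    return sorted(matchings, key=rank_key, reverse=True)


# Hypothetical candidate matchings, for illustration only.
candidates = [
    {"name": "m1", "scores": [0.9, 0.4], "n_attributes": 2},
    {"name": "m2", "scores": [0.7, 0.9], "n_attributes": 2},
    {"name": "m3", "scores": [0.8], "n_attributes": 2},
    {"name": "m4", "scores": [0.9, 0.7], "n_attributes": 3},
]
ranked_names = [m["name"] for m in rank_matchings(candidates)]
```

Here `m2` and `m4` tie on all scores, so the attribute count decides between them.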
9
Data Preparation
  Form extraction
  Type recognition
  Syntactic merging
    Name-based merging
    Domain-based merging
10
Experiments
Datasets
  TEL-8: 447 deep web sources in 8 domains
  BAMM: 211 deep web sources in 4 domains
Metrics
  Target accuracy
  Target question: given any attribute, find its synonyms, hyponyms, and hypernyms
11
Target Accuracy
12
Comparing the H-measure and Jaccard
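To make the comparison concrete, the sketch below evaluates both measures on the same made-up counts, where one attribute is rare. The Jaccard coefficient is the standard f11 / (f11 + f10 + f01). The H-measure form is an assumption to be verified against the paper, and using 1 - Jaccard as a negative-correlation score is likewise an assumption made for this illustration.

```python
def jaccard(f11, f10, f01):
    """Standard Jaccard coefficient on presence counts."""
    union = f11 + f10 + f01
    return f11 / union if union else 0.0


def h_measure(f11, f10, f01):
    """Assumed H-measure form: (f01 * f10) / (f+1 * f1+)."""
    f1_plus = f11 + f10  # schemas containing A
    f_plus1 = f11 + f01  # schemas containing B
    if f1_plus == 0 or f_plus1 == 0:
        return 0.0
    return (f01 * f10) / (f_plus1 * f1_plus)


# Made-up counts: A appears in 50 schemas, B in only 2, and they co-occur once.
f11, f10, f01 = 1, 49, 1
negative_by_jaccard = 1 - jaccard(f11, f10, f01)
negative_by_h = h_measure(f11, f10, f01)
```

With these counts the Jaccard-based score is close to 1, which would flag the pair as strongly negatively correlated on very thin evidence, while the assumed H-measure stays near 0.5.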