Finding Hidden Correlations and Filtering out Incorrect Matchings with Compatibility Detection across Web Query Interfaces. Lei Lei, June 11, 2004
Introduction: The Deep Web is scaling rapidly; sources with structured information are proliferating; the vocabulary converges to a small size; content is reached through dynamic queries instead of URLs
Complex Matching: Traditional methods focus on 1:1 matching; query schemas form complex matchings (m:n)
Web Query Interfaces: [figure: Web query interfaces; label: Attribute Group]
Problems to Solve: Relations are complicated and multi-ary; how to judge the relations of synonyms? How to pick out incorrect matchings?
Statement: Find the hidden synonyms and build correlations to solve the m:n matching problem; filter out false matchings and partially incorrect ones with the three-step “compatibility detection”
MGSsd and Improved Model: Original hidden model from MGSsd
Find Hidden Synonyms: Assume the existence of hidden synonyms; correlations between synonyms; function: HC(bi, bj); apply HC directly
Example: Synonyms in the air booking domain; set a threshold; HC(b2, b4)
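The slides leave HC(bi, bj) undefined, and the Future Work slide only notes that a “minimum” is feasible. As a minimal sketch under that assumption, the Python below scores two synonym groups by the minimum pairwise correlation between their attributes and keeps the group pairs whose score reaches a threshold. The function names, the attribute names, the pairwise scores and the threshold value are all hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from itertools import combinations


def hc(group_a, group_b, pair_score):
    """Hypothetical HC: the minimum pairwise score between two attribute groups.

    pair_score maps frozenset({attr_x, attr_y}) to a correlation score;
    pairs that are not listed are treated as uncorrelated (0.0).
    """
    return min(
        pair_score.get(frozenset((a, b)), 0.0)
        for a in group_a
        for b in group_b
    )


def correlated_groups(groups, pair_score, threshold):
    """Return the pairs of group names whose HC score reaches the threshold."""
    result = []
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(sorted(groups.items()), 2):
        if hc(attrs_a, attrs_b, pair_score) >= threshold:
            result.append((name_a, name_b))
    return result


# Made-up synonym groups and scores for an air booking interface.
groups = {
    "b2": {"departing", "depart date"},
    "b4": {"leaving"},
    "b5": {"airline"},
}
pair_score = {
    frozenset(("departing", "leaving")): 0.8,
    frozenset(("depart date", "leaving")): 0.6,
    frozenset(("departing", "airline")): 0.1,
}

print(hc(groups["b2"], groups["b4"], pair_score))
print(correlated_groups(groups, pair_score, threshold=0.5))
```

Taking the minimum is a conservative choice: two groups count as correlated only if every cross-group attribute pair is, so a single weak pair is enough to reject the match.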
Compatibility Detection: Not all raw matchings are correct; clean partially correct or inaccurate ones. Three steps: transitivity check, examine confidence, subsumption
Compatibility Detection (Cont.): Raw matching results; 1. Check transitivity; 2. Choose confidence; 3. Subsumption
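The slides name the three steps without defining them, so the Python below is only one possible reading, not the talk's actual procedure. It assumes that a raw matching is a group of attribute names with a confidence score, that two groups fail the transitivity check when they overlap without one containing the other, that the lower-confidence group of a conflicting pair is dropped, and that a group contained in a larger surviving group is subsumed by it. All names and numbers are hypothetical.

```python
def compatibility_detection(raw):
    """Refine raw matchings (assumed form: frozenset of attributes -> confidence)."""
    groups = dict(raw)

    # Step 1 (assumed reading of the transitivity check): synonymy is transitive,
    # so two groups that share an attribute should nest; otherwise they conflict.
    def conflicts(g, h):
        return bool(g & h) and not (g <= h or h <= g)

    # Step 2 (assumed reading of the confidence step): visit groups from the
    # lowest confidence up and drop any group that conflicts with a surviving
    # group of equal or higher confidence.
    for g in sorted(groups, key=lambda grp: groups[grp]):
        if any(h != g and conflicts(g, h) and groups[h] >= groups[g] for h in groups):
            del groups[g]

    # Step 3 (assumed reading of subsumption): drop groups that are proper
    # subsets of another surviving group.
    return {g: c for g, c in groups.items() if not any(g < h for h in groups)}


# Made-up raw matchings for an air booking interface.
raw = {
    frozenset({"departing", "leaving"}): 0.9,
    frozenset({"leaving", "returning"}): 0.4,   # overlaps the first group, lower confidence
    frozenset({"from", "origin"}): 0.7,         # contained in the next group
    frozenset({"from", "origin", "departure city"}): 0.8,
}

refined = compatibility_detection(raw)
for group, confidence in sorted(refined.items(), key=lambda item: -item[1]):
    print(sorted(group), confidence)
```

In this toy input the second group is removed in step 2 and the third in step 3, leaving two matchings.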
Evaluation: Using recall and precision; compare with MGSsd data; perform correlation and compatibility detection on matching results from other research
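For reference, precision and recall over a set of matchings can be computed as in the sketch below. Precision is the fraction of discovered matchings that are correct, and recall is the fraction of correct matchings that were discovered. The representation of a matching as a frozenset of attribute names and the sample data are assumptions made for illustration; the slides do not report any figures.

```python
def precision_recall(found, correct):
    """Return (precision, recall) for a set of found matchings against a gold set."""
    hits = len(found & correct)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Made-up matchings: each one is a group of attribute names judged synonymous.
found = {
    frozenset({"departing", "leaving"}),
    frozenset({"from", "origin"}),
    frozenset({"airline", "class"}),      # incorrect matching
}
correct = {
    frozenset({"departing", "leaving"}),
    frozenset({"from", "origin"}),
    frozenset({"to", "destination"}),
    frozenset({"adults", "passengers"}),
}

precision, recall = precision_recall(found, correct)
print(f"precision={precision:.2f} recall={recall:.2f}")
```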
Contributions: m:n mapping rather than only 1:1 mapping; present a hidden synonym approach to statistically compute the correlation between synonym groups; develop the “Compatibility Detection” approach to refine the raw mapping data; suitable and efficient as the Web scales
Future Work: Figure out the HC function; “minimum” is feasible; distinguish trivial differences in confidence; set up a proper threshold; space complexity; type subsumption (e.g. Departing: datetime, Departing: string)
Questions?