Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration
  Fangjiao Jiang, Renmin University of China
  Joint work with Weiyi Meng & Xiaofeng Meng
Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration (DASFAA 2009)
The previous Web: things are just on the surface
The current Web: getting “deeper”
  A great deal of information is hidden behind query forms
  Deep = not accessible through search engines
Why is it important?
  More than 10 million distinct forms
  Up to 5,000 billion dynamic result pages
A Key Component: Query Translation
  Challenges: large scale, heterogeneity, autonomy
  [Figure labels: integrated query interface, query translation, Web database query interfaces]
The Problem: Selectivity Estimation for Exclusive Query Translation
Example
  [Figure; marks on the figure: √, ??]
Related Work & the Challenge
  A prominent solution for selectivity estimation: histograms [Piatetsky+, Poosala+, Ioannidis+]
    Categorical attribute
    Infinite-value attribute
  Another solution: random sampling [Goodman+, Haas+, Olken+, Vitter+, Dasgupta+]
  Challenge: selectivity estimation of infinite-value attributes
Selectivity Estimation for Exclusive Query Translation
Two Observations
  1. Different attribute pairs have different degrees of correlation [figure labels: weakest, weaker, strongest]
  2. The word frequency of the values of an infinite-value attribute usually follows a Zipf-like distribution
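The Zipf-like observation can be checked on any sample of attribute values. The following is a minimal sketch, not taken from the talk: it assumes whitespace tokenization, counts for each word the number of values that contain it, and fits a plain Zipf law (frequency proportional to rank^(-s)) by least squares in log-log space. The function names `rank_frequencies` and `zipf_exponent` are hypothetical.

```python
import math
from collections import Counter


def rank_frequencies(values):
    """For each word, count the values containing it; return counts in descending order."""
    counts = Counter(word for v in values for word in set(v.lower().split()))
    return sorted(counts.values(), reverse=True)


def zipf_exponent(freqs):
    """Fit log(freq) = a - s * log(rank) by least squares and return s."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

An exponent close to 1 on a large sample of values would be consistent with a Zipf-like distribution; a small sample gives only a rough indication.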
Selectivity Estimation for Exclusive Query Translation
  Attribute correlation calculation for a domain
  Selectivity estimation for a Web database
    Correlation-based sampling
    Word frequency probing
    Zipf equation calculation
    Selectivity estimation
Selectivity Estimation Challenges
  1. Attribute correlation calculation
     Find the least correlated attribute
     Discover the word rank
  2. Zipf equation calculation
     Calculate the parameters of the Zipf equation
     Estimate selectivity
Attribute Correlation Calculation
Goal
  Attribute correlation calculation provides (1) a random sample and (2) the word rank
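The slides do not say which correlation measure is used. As an illustration only, the sketch below uses mutual information, a standard measure that is substituted here as an assumption, to pick the attribute least correlated with a target attribute. Each attribute is treated as a categorical column, which is a simplification: for a text attribute one would work with its words. The names `mutual_information` and `least_correlated` and the record layout (a list of dicts) are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long categorical columns."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts.
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def least_correlated(records, target, candidates):
    """Return the candidate attribute with the lowest mutual information with `target`."""
    target_col = [r[target] for r in records]
    return min(
        candidates,
        key=lambda a: mutual_information(target_col, [r[a] for r in records]),
    )
```

The intuition, under these assumptions, is that records retrieved through an attribute weakly correlated with the target behave more like a random sample with respect to the target attribute.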
Discussion on Word Rank
  Word rank should be computed for each attribute
Zipf Equation Calculation
Zipf equation calculation
  Zipf equation
The parameters of the Zipf equation
Discussion on P, p and E
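The equation itself did not survive in the extracted slide text. The sketch below therefore assumes the Zipf-Mandelbrot form f(r) = P * (r + p)^(-E), reusing the parameter names P, p and E from the slide title above; whether this matches the talk's exact equation is an assumption. Given a few probed (rank, frequency) pairs, it tries each candidate shift p from a grid, fits P and E by least squares in log space, and keeps the best fit. The grid search and the function names are illustrative choices, not the authors' method.

```python
import math


def _fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x; also returns the squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_zipf_mandelbrot(points, p_grid):
    """Fit f(r) = P * (r + p)**(-E) to (rank, frequency) pairs; return (P, p, E)."""
    ys = [math.log(freq) for _, freq in points]
    best = None
    for p in p_grid:
        xs = [math.log(rank + p) for rank, _ in points]
        slope, intercept, sse = _fit_line(xs, ys)
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), p, -slope)
    return best[1:]


def estimate_selectivity(rank, P, p, E, total_records):
    """Estimated fraction of records containing the word at the given rank."""
    return min(1.0, P * (rank + p) ** (-E) / total_records)
```

With the fitted parameters, the selectivity of a keyword at an unprobed rank is the predicted frequency divided by the database size, which is itself assumed to be known or estimated separately.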
Experiments
Data Sets & Evaluation Method
Experimental Results
  The average precision of selectivity estimations is high
Summary
Contributions
  Identify the selectivity estimation problem of infinite-value attributes for exclusive query translation
  Propose a correlation-based sampling approach to obtain a sample that is as random as possible
  Propose a Zipf-based selectivity estimation method
  Verify the accuracy of our approach
Thanks (Q&A)