Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration. Fangjiao Jiang, Renmin University of China. Joint work with Weiyi Meng.


1 Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration. Fangjiao Jiang, Renmin University of China. Joint work with Weiyi Meng & Xiaofeng Meng.

2 Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration (DASFAA 2009). The previous Web: things are just on the surface.

3 The current Web: getting "deeper". A great deal of information is hidden behind query forms. Deep = not accessible through search engines.

4 Why is it important? More than 10 million distinct forms.

5 Why is it important? Up to 5,000 billion dynamic result pages.

6 A Key Component: Query translation (from the integrated query interface to the Web database query interfaces). Challenges: large scale, heterogeneity, autonomy.

7 The Problem: Selectivity Estimation for Exclusive Query Translation

8 Example

9 Related work & the Challenge. A prominent solution for selectivity estimation: histograms [Piatetsky+, Poosala+, Ioannidis+] (slide labels: categorical attribute, infinite-value attribute). Another solution: random sampling [Goodman+, Haas+, Olken+, Vitter+, Dasgupta+]. Challenge: selectivity estimation of infinite-value attributes.

10 Selectivity Estimation for Exclusive Query Translation

11 Two Observations. (1) Different attribute pairs exhibit different degrees of correlation (figure labels: weakest, weaker, strongest). (2) The word frequency of the values on an infinite-value attribute usually has a Zipf-like distribution.

12 Selectivity Estimation for Exclusive Query Translation. For a domain: attribute correlation calculation. For a Web database: correlation-based sampling, word frequency probing, Zipf equation calculation, selectivity estimation.

13 Selectivity Estimation Challenges. 1. Attribute correlation calculation: find the least correlative attribute; discover the word rank. 2. Zipf equation calculation: calculate the parameters of the Zipf equation; estimate selectivity.
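The slides do not say which correlation measure the authors use. As a minimal sketch of "find the least correlative attribute", the block below swaps in mutual information between the words of a target attribute and the values of each candidate attribute, computed over a set of records. The function names, the attribute names (`title`, `subject`, `format`) and the toy records are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information (in bits) between the two components of (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

def least_correlated(records, target, candidates):
    """Return the candidate attribute whose values share the least
    information with the words of the target attribute, plus all scores."""
    scores = {}
    for attr in candidates:
        # One (word, value) pair per word occurring in the target attribute.
        pairs = [(w, r[attr]) for r in records for w in r[target].lower().split()]
        scores[attr] = mutual_information(pairs)
    return min(scores, key=scores.get), scores

# Toy records (assumed data): subject tracks the title words, format does not.
records = [
    {"title": "java programming", "subject": "computing", "format": "hardcover"},
    {"title": "java programming", "subject": "computing", "format": "paperback"},
    {"title": "french cooking", "subject": "food", "format": "hardcover"},
    {"title": "french cooking", "subject": "food", "format": "paperback"},
]
best, scores = least_correlated(records, "title", ["subject", "format"])
print(best, scores)
```

Mutual information is sensitive to the number of distinct values per attribute, so a normalized variant may be a better choice when candidate attributes differ a lot in cardinality.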

14 Attribute Correlation Calculation

15 Goal of attribute correlation calculation: (1) random sample, (2) word rank.

16 Discussion on word rank: word rank should be computed for each attribute.
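A minimal sketch of per-attribute word ranking over a sample, assuming records are dictionaries of attribute strings and that a word is counted once per record. The tokenization rule, the tie-breaking order and the sample data are assumptions made for illustration, not details from the talk.

```python
import re
from collections import Counter

def word_ranks(sample_records, attribute):
    """Rank the words of one attribute by how many sampled records
    contain them (rank 1 = most frequent)."""
    counts = Counter()
    for record in sample_records:
        # A set counts each word at most once per record.
        words = set(re.findall(r"[a-z0-9]+", record.get(attribute, "").lower()))
        counts.update(words)
    # Descending frequency; ties broken alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered, start=1)}

# Hypothetical sample of book records.
sample = [
    {"title": "Data Mining Concepts", "publisher": "Morgan Kaufmann"},
    {"title": "Mining the Web", "publisher": "Morgan Kaufmann"},
    {"title": "Web Data Management", "publisher": "Springer"},
]
title_ranks = word_ranks(sample, "title")          # ranks for the title attribute only
publisher_ranks = word_ranks(sample, "publisher")  # a separate ranking per attribute
```

Keeping one ranking per attribute means the same word can hold different ranks in, say, titles and publisher names.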

17 Zipf Equation Calculation

18 Zipf equation calculation: Zipf equation
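The formula on this slide did not survive extraction. For orientation only, the classic Zipf law and Mandelbrot's generalization are written below, where f(r) is the frequency of the word with rank r. That the talk uses the generalized form is an assumption, suggested by the parameter names P, p and E on a later slide; C and α are the usual textbook symbols, not taken from the slides.

$$
f(r) = \frac{C}{r^{\alpha}} \quad \text{(Zipf)}, \qquad f(r) = P\,(r + p)^{-E} \quad \text{(Zipf-Mandelbrot)}
$$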

19 The parameters of the Zipf equation
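A minimal sketch of how such parameters could be calculated from a few probed (rank, frequency) pairs. It assumes the Zipf-Mandelbrot form f = P * (r + p) ** (-E), which the parameter names P, p and E suggest but the transcript does not confirm. The fitting procedure (grid search over p, least squares in log-log space) is an illustrative choice, not the authors' method, and the probe values are synthetic.

```python
import math

def fit_zipf_mandelbrot(ranks, freqs, p_grid=None):
    """Fit f = P * (r + p) ** (-E) to probed points.
    For a fixed p the model is linear in log-log space, so we grid-search p
    and solve an ordinary least-squares line for log P and E.
    Needs at least three probes with distinct ranks."""
    if p_grid is None:
        p_grid = [i / 10 for i in range(0, 201)]  # assumed search range: p in [0, 20]
    ys = [math.log(f) for f in freqs]
    n = len(ys)
    my = sum(ys) / n
    best = None
    for p in p_grid:
        xs = [math.log(r + p) for r in ranks]
        mx = sum(xs) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), p, -slope)
    _, P, p, E = best
    return P, p, E

def estimate_frequency(rank, P, p, E):
    """Estimated number of records matching the word at the given rank."""
    return P * (rank + p) ** (-E)

# Synthetic probes generated from known parameters (assumed values).
true_P, true_p, true_E = 5000.0, 2.0, 1.2
ranks = [1, 5, 20, 100, 500]
freqs = [true_P * (r + true_p) ** (-true_E) for r in ranks]

P, p, E = fit_zipf_mandelbrot(ranks, freqs)
print(P, p, E, estimate_frequency(50, P, p, E))
```

Once P, p and E are fitted, the frequency of any other word follows from its rank alone, so only a handful of probe queries are needed per Web database.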

20 Discussion on P, p and E

21 Experiments

22 Data Sets & Evaluation Method

23 Experimental Results: the average precision of selectivity estimations is high.

24 Summary

25 Contributions. (1) Identify the selectivity estimation problem of infinite-value attributes for exclusive query translation. (2) Propose a correlation-based sampling approach to obtain a sample that is as random as possible. (3) Propose a Zipf-based selectivity estimation method. (4) Verify the accuracy of our approach.

26 Thanks (Q&A)

