Published by Berenice Carpenter. Modified over 9 years ago.
1
Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration
Fangjiao Jiang, Renmin University of China
Joint work with Weiyi Meng & Xiaofeng Meng
2
Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration (DASFAA2009)
The previous Web: things are just on the surface
3
The current Web: getting "deeper"
A great deal of information is hidden behind query forms
Deep = not accessible through search engines
4
Why is it important?
More than 10 million distinct forms
5
Why is it important?
Up to 5,000 billion dynamic result pages
6
A Key Component: Query translation
Challenge: large scale, heterogeneity, autonomy
Figure labels: integrated query interface, Web database query interfaces, query translation
7
The Problem
Selectivity Estimation for Exclusive Query Translation
8
Example √ ??
9
Related work & the Challenge
A prominent solution for selectivity estimation: histograms [Piatetsky+, Poosala+, Ioannidis+]
  Categorical attribute
  Infinite-value attribute
Another solution: random sampling [Goodman+, Haas+, Olken+, Vitter+, Dasgupta+]
  Random sampling
Challenge: selectivity estimation of infinite-value attributes
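As a point of reference for the sampling-based line of work cited on this slide, selectivity can be estimated as the fraction of a uniform random sample that satisfies the query predicate. The sketch below is only a minimal illustration: the record layout, the `title` field, and the helper names `estimate_selectivity` and `contains_word` are assumptions made for this example and do not come from the slides.

```python
import random


def estimate_selectivity(sample, predicate):
    """Return the fraction of sampled records that satisfy the predicate."""
    if not sample:
        raise ValueError("sample must not be empty")
    matches = sum(1 for record in sample if predicate(record))
    return matches / len(sample)


def contains_word(word):
    """Build a keyword predicate on the (hypothetical) 'title' field."""
    def predicate(record):
        return word in record["title"].lower().split()
    return predicate


if __name__ == "__main__":
    # Toy table standing in for a database we can sample from directly.
    table = [
        {"title": "Data Mining"},
        {"title": "Web Data Integration"},
        {"title": "Query Translation"},
        {"title": "Deep Web Crawling"},
        {"title": "Database Systems"},
        {"title": "Learning Python"},
    ]
    rng = random.Random(0)          # seeded so the run is repeatable
    sample = rng.sample(table, k=4)  # uniform sample without replacement
    print(estimate_selectivity(sample, contains_word("data")))
```

This only works when records can be drawn uniformly at random. A Web database exposes records only through its query form, which is the setting the rest of the talk addresses.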
10
Selectivity Estimation for Exclusive Query Translation
11
Two Observations
Different attribute pairs have different degrees of correlation (labeled in the figure as weakest, weaker and strongest)
The word frequency of the values of an infinite-value attribute usually follows a Zipf-like distribution
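The second observation can be checked on any collection of text values by ranking words by how many values contain them. The sketch below is an illustration under stated assumptions: the whitespace tokenization, the lowercasing and the function name `ranked_word_frequencies` are choices made for this example, not details taken from the slides.

```python
from collections import Counter


def ranked_word_frequencies(values):
    """Rank words by the number of values that contain them.

    Returns a list of (rank, word, frequency) triples, rank 1 being the
    most frequent word. Ties are broken alphabetically so the output is
    deterministic.
    """
    counts = Counter()
    for value in values:
        # Count each word once per value, because a keyword query
        # matches a record no matter how often the word occurs in it.
        counts.update(set(value.lower().split()))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    titles = [
        "data mining",
        "web data integration",
        "deep web data",
        "query translation",
    ]
    for rank, word, freq in ranked_word_frequencies(titles):
        print(rank, word, freq)
```

On a Zipf-like attribute, plotting frequency against rank on log-log axes gives a roughly straight line.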
12
Selectivity Estimation for Exclusive Query Translation
Attribute correlation calculation for a domain
Selectivity estimation for a Web database
  Correlation-based sampling
  Word frequency probing
  Zipf equation calculation
  Selectivity estimation
13
Selectivity Estimation Challenges
1. Attribute correlation calculation
   Find the least correlated attribute
   Discover the word rank
2. Zipf equation calculation
   Calculate the parameters of the Zipf equation
   Estimate selectivity
14
Attribute Correlation Calculation
15
Goal of attribute correlation calculation: (1) random sample, (2) word rank
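The slides do not preserve which correlation measure is used, so the following sketch substitutes mutual information as a stand-in. It scores each candidate attribute against the words of a text attribute and picks the one with the lowest score, on the reasoning that querying by an attribute that says little about the target words should yield a sample that is close to random for those words. The record fields (`title`, `subject`, `format`) and both function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between the two components of (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = p_ab * n * n / (left[a] * right[b])
        total += p_ab * math.log2(ratio)
    return total


def least_correlated_attribute(records, target, candidates):
    """Pick the candidate attribute sharing the least information with the target's words."""
    def score(attribute):
        pairs = [(record[attribute], word)
                 for record in records
                 for word in set(record[target].lower().split())]
        return mutual_information(pairs)
    return min(candidates, key=score)


if __name__ == "__main__":
    # Toy records: 'subject' is determined by the title, 'format' is independent of it.
    records = [
        {"title": "data mining", "subject": "cs", "format": "hardcover"},
        {"title": "data mining", "subject": "cs", "format": "paperback"},
        {"title": "french cooking", "subject": "food", "format": "hardcover"},
        {"title": "french cooking", "subject": "food", "format": "paperback"},
    ]
    print(least_correlated_attribute(records, "title", ["subject", "format"]))
```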
16
Discussion on word rank
Word rank should be computed for each attribute
17
Zipf Equation Calculation
18
Zipf equation calculation
Zipf equation
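The equation on this slide did not survive in the transcript. A common generalized form, and one that matches the parameter names P, p and E used on later slides, is the Zipf-Mandelbrot law. Treating it as the form used in the talk is an assumption. Here f(r) is the frequency of the word at rank r:

```latex
f(r) = \frac{P}{(r + p)^{E}}
\qquad\Longleftrightarrow\qquad
\log f(r) = \log P - E \,\log (r + p)
```

In the logarithmic form, for a fixed p, the relation between log f(r) and log(r + p) is linear with slope -E and intercept log P.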
19
The parameters of the Zipf equation
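One way to obtain such parameters from a handful of probed words is a least-squares fit in log-log space. The sketch below assumes the Zipf-Mandelbrot form f(r) = P / (r + p)^E, which the transcript does not confirm, and assumes that each probe returns a (rank, frequency) pair. The grid of candidate shifts and the function names are choices made for this example, not the authors' procedure.

```python
import math


def fit_zipf(points, shifts=(0.0, 0.5, 1.0, 2.0, 5.0)):
    """Fit f = P / (r + p)**E to (rank, frequency) pairs.

    For each candidate shift p, log f = log P - E * log(r + p) is fitted
    by ordinary least squares; the shift with the smallest squared error
    wins. Needs at least two distinct ranks (>= 1) and positive frequencies.
    """
    if len({rank for rank, _ in points}) < 2:
        raise ValueError("need at least two distinct ranks")
    ys = [math.log(freq) for _, freq in points]
    mean_y = sum(ys) / len(ys)
    best = None
    for p in shifts:
        xs = [math.log(rank + p) for rank, _ in points]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        error = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or error < best[0]:
            best = (error, math.exp(intercept), p, -slope)
    _, P, p, E = best
    return P, p, E


def estimate_frequency(rank, P, p, E):
    """Estimated number of records containing the word at the given rank."""
    return P / (rank + p) ** E


if __name__ == "__main__":
    # Hypothetical probe results: (word rank, number of matching records).
    probes = [(1, 5200), (10, 610), (50, 130), (200, 34), (1000, 7)]
    P, p, E = fit_zipf(probes)
    print(P, p, E)
    print(estimate_frequency(25, P, p, E))
```

Fixing p on a small grid keeps the fit linear. A nonlinear optimizer over all three parameters would be an alternative.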
20
Discussion on P, p and E
21
Experiments
22
Data Sets & Evaluation Method
Data sets
Evaluation method
23
Experimental Results
The average precision of selectivity estimations is high.
24
Summary
25
Contributions
Identify the selectivity estimation problem of infinite-value attributes for exclusive query translation
Propose a correlation-based sampling approach to obtain a sample that is as random as possible
Propose a Zipf-based selectivity estimation method
Verify the accuracy of our approach
26
Thanks (Q&A)