Selectivity Estimation for Optimizing Similarity Query in Multimedia Databases
IDEAL 2003
Paper review
Query optimization in a traditional database
Query: find the employees whose age falls in a given range and who work for the Engineering Faculty.
The running time of different execution plans depends on:
  the number of employees in that age range
  the number of employees who work for the Engineering Faculty
Task: estimate these numbers in advance and select the best execution plan (selectivity estimation).
Statistics are stored in the database as metadata.
Techniques: one dimension
  Parametric: unrealistic
  Curve fitting: negative value problem
  Sampling: large overhead
  Non-parametric (histogram technique): widely used
[Figure: histogram over age]
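To make the histogram technique concrete, here is a minimal sketch of one-dimensional selectivity estimation with an equi-width histogram. The function names, the equi-width bucketing, and the uniform-within-bucket assumption are illustrative choices, not details taken from the paper.

```python
def build_histogram(values, lo, hi, num_buckets):
    """Count values into equi-width buckets covering [lo, hi].

    Assumes every value lies in [lo, hi]; values equal to hi go in the last bucket.
    """
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts


def estimate_selectivity(counts, lo, hi, q_lo, q_hi):
    """Estimate the fraction of rows with q_lo <= value < q_hi.

    Assumes values are spread uniformly inside each bucket, so a bucket
    contributes in proportion to how much of it the query range covers.
    """
    width = (hi - lo) / len(counts)
    total = sum(counts)
    matched = 0.0
    for i, count in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        matched += count * overlap / width
    return matched / total if total else 0.0


# Hypothetical data: one employee for each age from 20 to 59.
ages = list(range(20, 60))
counts = build_histogram(ages, 20, 60, 4)
print(estimate_selectivity(counts, 20, 60, 25, 35))
```

The optimizer would compare such estimates for each predicate and evaluate the more selective one first.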
Problem in multimedia databases
Example query: (Color = ‘red’) ^ (Shape = ‘round’)
Color and shape are feature vectors, so the data are multi-dimensional.
The number of buckets increases exponentially with the dimension, so the histogram technique fails.
Buckets needed with 5 buckets per dimension: 1d: 5, 2d: 25, 3d: 125, 4d: 625
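The bucket counts on this slide follow from raising the number of buckets per dimension to the power of the dimension. A short sketch, where `bucket_count` is a name introduced here for illustration:

```python
def bucket_count(buckets_per_dim, dims):
    """Total buckets in a grid histogram with the same resolution on every axis."""
    return buckets_per_dim ** dims


for d in range(1, 5):
    print(f"{d}d: {bucket_count(5, d)} buckets")
# prints 5, 25, 125 and 625 buckets for 1d through 4d
```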
Previous work: SIGMOD 99
Use the DCT (discrete cosine transform) to compress the information in the histogram.
Store the DCT coefficients.
[Figure: 2D example, histogram values -> DCT -> DCT coefficients]
Reconstruction of histogram values
[Figure: DCT -> zone sampling -> IDCT]
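The compress-and-reconstruct pipeline can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the SIGMOD 99 implementation: it uses an orthonormal DCT-II on a square 2D histogram, and the triangular zone (keep coefficients with u + v < zone) is only one possible zone shape. All function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(hist):
    """2D DCT of a square histogram: C * H * C^T."""
    c = dct_matrix(len(hist))
    return matmul(matmul(c, hist), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT: C^T * G * C (C is orthonormal, so C^T is its inverse)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def zonal_sample(coeffs, zone):
    """Keep low-frequency coefficients with u + v < zone; zero out the rest."""
    return [[g if u + v < zone else 0.0 for v, g in enumerate(row)]
            for u, row in enumerate(coeffs)]


# Hypothetical 4x4 histogram; only the kept coefficients would need to be stored.
hist = [[4, 3, 1, 0], [3, 5, 2, 1], [1, 2, 6, 2], [0, 1, 2, 7]]
approx = idct2(zonal_sample(dct2(hist), zone=3))
for row in approx:
    print([round(v, 2) for v in row])
```

A smaller zone stores fewer coefficients and gives a smoother, less accurate reconstruction; with zone = 1 only the mean of the histogram survives.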
Selectivity estimation
Current work: IDEAL 2003
Extend the range query from a hyper-cube to a hyper-sphere.
Model the hyper-sphere as a combination of hyper-cubes.
Tasks:
  Find a combination of hyper-cubes that represents the hyper-sphere.
  Find the area of overlap (the volume, in higher dimensions) between the hyper-cubes and the hyper-sphere.
Generating the combination of hyper-cubes
Overlap of a hyper-cube with a hyper-sphere
Monte Carlo method:
  Generate uniformly distributed random points inside the hyper-cube.
  Count the number of points that fall within the hyper-sphere.
  Use the ratio to estimate the area of overlap.
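The Monte Carlo steps above can be sketched as follows. The function names, the axis-aligned box representation (`cube_lo`, `cube_hi`), and the default sample size are assumptions made for illustration.

```python
import random


def overlap_fraction(cube_lo, cube_hi, center, radius, num_points=10000, rng=None):
    """Estimate the fraction of an axis-aligned hyper-cube lying inside a hyper-sphere."""
    rng = rng or random.Random()
    radius_sq = radius * radius
    inside = 0
    for _ in range(num_points):
        # Uniform random point inside the hyper-cube, one coordinate per dimension.
        point = [rng.uniform(lo, hi) for lo, hi in zip(cube_lo, cube_hi)]
        if sum((p - c) ** 2 for p, c in zip(point, center)) <= radius_sq:
            inside += 1
    return inside / num_points


def overlap_volume(cube_lo, cube_hi, center, radius, num_points=10000, rng=None):
    """Scale the estimated fraction by the hyper-cube's volume."""
    volume = 1.0
    for lo, hi in zip(cube_lo, cube_hi):
        volume *= hi - lo
    return volume * overlap_fraction(cube_lo, cube_hi, center, radius, num_points, rng)


# 2D check: the unit circle inside the square [-1, 1]^2 has area pi (about 3.14).
print(overlap_volume([-1, -1], [1, 1], [0, 0], 1.0, rng=random.Random(0)))
```

The estimate's error shrinks roughly with the square root of the number of points, so accuracy is traded directly against running time.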
Generating uniformly distributed points inside a hyper-sphere
Accept / reject method:
  Generate points within the hyper-cube.
  Accept those that fall within the hyper-sphere.
Greedy method:
  Generate θ uniformly in [0, 2π].
  Generate r according to F^-1(U(0,1)), where F is the cumulative distribution function of r.
[Figure: a point at angle θ and radius r]
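Both sampling methods can be sketched as below. The second function covers only the 2D case shown on the slide, where the radius of a uniform point in a disc of radius R has CDF F(r) = (r/R)^2, so F^-1(u) = R * sqrt(u). The function names are hypothetical, and the higher-dimensional version of the second method is not attempted here.

```python
import math
import random


def sample_accept_reject(radius, dims, rng):
    """Draw from the bounding hyper-cube until a point lands inside the hyper-sphere."""
    radius_sq = radius * radius
    while True:
        point = [rng.uniform(-radius, radius) for _ in range(dims)]
        if sum(x * x for x in point) <= radius_sq:
            return point


def sample_polar_2d(radius, rng):
    """2D only: theta uniform on [0, 2*pi], r from the inverse CDF R * sqrt(u)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = radius * math.sqrt(rng.random())
    return [r * math.cos(theta), r * math.sin(theta)]


rng = random.Random(0)
print(sample_accept_reject(1.0, 3, rng))
print(sample_polar_2d(1.0, rng))
```

Accept / reject wastes more samples as the dimension grows, because the hyper-sphere fills a shrinking share of its bounding hyper-cube. The inverse-CDF approach uses every draw. Taking r uniformly instead of through F^-1 would cluster points near the center.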
Experiment