Query Sampling Based High Dimensional Hybrid Index. Junqi Zhang, Xiangdong Zhou, Fudan University.


1 Query Sampling Based High Dimensional Hybrid Index. Junqi Zhang, Xiangdong Zhou, Fudan University

2 Nearest Neighbors Query (figure labels: Dims, Overlap, Accessed)

3 Cluster Partitioning Based B+-tree

4 Index Structure. What is the optimal extent to partition? iDistance: determined by experiments. Ours: predicted by a cost model.
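
For readers unfamiliar with iDistance, the following is a minimal sketch of how a cluster-partitioned B+-tree key can be formed: each point is assigned to its nearest cluster centre and mapped to the one-dimensional key `i * c + dist(point, centre_i)`. The `ring_of` helper and its `ring_width` parameter are hypothetical illustrations of splitting a cluster into concentric rings; the slides do not state how ring boundaries are chosen.

```python
import math


def idistance_key(point, centres, c):
    """Map a point to a 1-D key: cluster index times c, plus distance to that centre.

    `c` must exceed the largest cluster radius so key ranges do not overlap.
    """
    i = min(range(len(centres)), key=lambda j: math.dist(point, centres[j]))
    return i * c + math.dist(point, centres[i])


def ring_of(point, centre, ring_width):
    """Hypothetical ring index: equal-width concentric rings around the centre."""
    return int(math.dist(point, centre) // ring_width)


centres = [(0, 0), (10, 0)]
print(idistance_key((3, 4), centres, c=100))   # point in cluster 0
print(idistance_key((9, 0), centres, c=100))   # point in cluster 1
print(ring_of((3, 4), (0, 0), ring_width=2.0))
```

The partitioning "extent" question on this slide corresponds to how many clusters and rings to use, which the sketch leaves as free parameters.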

5 Objective of Cluster Partitioning: Lowest Query Cost. Appropriate M: Distribute M to each cluster. Overall number of clusters:

6 Dimension Curse. dim>10: tree<scan<VA-file. dim<10: tree>scan>VA-file. Non uniform: tree VA-file. VA-file defect. How to improve tree performance?

7 Tree and scan: which is better? Tree. Advantage: filters data instead of linearly scanning the whole file. Disadvantage: the position cost for each data item is the height of the intermediate nodes, which is higher than for scan. Scan. Advantage: the position cost for each data item is 0. Disadvantage: linearly scans the whole file.

8 Cost as viewed from each point. C<1: the tree is useful compared with scan. C>=1: the tree is useless compared with scan.
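
The slides do not preserve the definition of C, so the following is only a sketch under an assumed cost model: every item reached through the tree pays a positioning cost equal to the tree height plus one read, while a scan reads every item once with no positioning cost. The function names and the ratio below are illustrative assumptions, not the authors' formula.

```python
def tree_cost(n_accessed: int, height: int) -> int:
    """Assumed tree cost: each accessed item pays `height` to position plus 1 to read."""
    return n_accessed * (height + 1)


def scan_cost(n_total: int) -> int:
    """Assumed scan cost: every item is read once, with zero positioning cost."""
    return n_total


def cost_ratio(n_accessed: int, n_total: int, height: int) -> float:
    """Hypothetical C: tree cost relative to scan cost (C < 1 means the tree pays off)."""
    return tree_cost(n_accessed, height) / scan_cost(n_total)


# A selective query favours the tree; a query touching half the data does not.
print(cost_ratio(10, 1000, height=3) < 1)
print(cost_ratio(500, 1000, height=3) < 1)
```

Under this assumption the tree stops being useful once the fraction of accessed items reaches 1 / (height + 1).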

9 Data distribution and index performance. Known work: index data in a single index. DIMS tree. Real image data set: non-uniform. Non-uniform data aggregate: tree FAST.

10 Data type. Sparse data: tree<scan. Dense data: tree>scan.

11 Hybrid data type: hybrid index. Sparse data (tree<scan): sequential file. Dense data (tree>scan): B+-tree.

12 How to differentiate data type? Each data item as a unit: difficult. Each cluster ring as a unit: easier.

13 Cluster partitioning. To what extent?

14 Cluster partitioning based B+-tree

15 Cluster partitioning based image retrieval system. Outer rings of clusters are often accessed.

16 Some rings of clusters are often accessed. Treat outer rings as sparse rings?

17 Frequency of being accessed for each ring

18

19 Hybrid index: cut branches (according to the contribution of each ring to the query cost). Expected cost. Cost by linear scan.

20 Standard for rings being cut: Index Capability. IC (index capability): Question: how to determine ?

21 Estimate: query sampling. Question: for a large database, running a large number of queries is expensive. Objective: given confidence a%, make minimum

22 Threshold for rings being cut. When IC equals 0: Rule: when the probability of a ring being accessed by queries is lower than this threshold, the ring should remain in the tree; otherwise, it should be cut into the sequence file for linear scan.
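
The rule above can be sketched as a simple partition of rings by their estimated access probability. The threshold formula itself is not preserved in this transcript, so `threshold` is passed in as a plain parameter; the ring labels and probabilities are made-up illustrative values.

```python
def split_rings(access_probability: dict, threshold: float):
    """Apply the slide's rule: rings below the threshold stay in the tree,
    all others are cut into the sequence file for linear scan."""
    tree_rings = sorted(r for r, p in access_probability.items() if p < threshold)
    scan_rings = sorted(r for r, p in access_probability.items() if p >= threshold)
    return tree_rings, scan_rings


# Hypothetical per-ring access probabilities, e.g. obtained by query sampling.
probabilities = {"c0-r0": 0.05, "c0-r1": 0.20, "c0-r2": 0.70}
tree_rings, scan_rings = split_rings(probabilities, threshold=0.25)
print(tree_rings)   # rings kept in the B+-tree
print(scan_rings)   # rings moved to the sequence file
```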

23 Query sampling algorithm: When [formula] or [formula] or [formula], stop sampling. The user can balance the accuracy and efficiency of sampling by tuning the confidence a%, and the complexity of this algorithm is less than [formula].
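
The stopping conditions of the authors' algorithm are not preserved here, so the following is a generic sketch of confidence-driven sampling rather than their method. It estimates the probability that a ring is accessed by drawing sample queries one at a time and stops once a normal-approximation confidence interval is narrow enough or a sample budget is exhausted. The callables `draw_query` and `ring_is_hit`, and the parameters `half_width`, `min_samples` and `max_samples`, are all assumptions introduced for illustration.

```python
import math
import random
from statistics import NormalDist


def estimate_access_probability(ring_is_hit, draw_query, confidence=0.95,
                                half_width=0.05, min_samples=30,
                                max_samples=10_000):
    """Sample queries until the interval half-width at the given confidence
    drops to `half_width`, or `max_samples` is reached.

    Returns (estimated probability, number of queries sampled).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    hits = 0
    n = 0
    while n < max_samples:
        n += 1
        if ring_is_hit(draw_query()):
            hits += 1
        if n >= min_samples:
            p = hits / n
            if z * math.sqrt(p * (1 - p) / n) <= half_width:
                break
    return hits / n, n


# Toy setup: a "query" is a number in [0, 1) and the ring is hit when it is below 0.3.
rng = random.Random(0)
p_hat, n_used = estimate_access_probability(lambda q: q < 0.3, rng.random)
print(p_hat, n_used)
```

Raising `confidence` or lowering `half_width` increases the number of sampled queries, which mirrors the accuracy versus efficiency trade-off mentioned on the slide.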

24 Query algorithm of hybrid index. Linearly scan the sequence file for sparse data. Retrieve the dense data from the B+-tree.
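
A minimal sketch of the two-part query follows, with several simplifying assumptions: a sorted list searched with `bisect` stands in for the B+-tree, there is a single cluster centre, and the query is a range query rather than the nearest-neighbour query discussed in the slides. The class name `HybridIndex` and its methods are hypothetical.

```python
import bisect
import math


class HybridIndex:
    """Dense points live in a sorted key list (B+-tree stand-in);
    sparse points live in a plain list (the sequence file)."""

    def __init__(self, centre, dense_points, sparse_points):
        self.centre = centre
        # Key each dense point by its distance to the cluster centre.
        self.entries = sorted((math.dist(p, centre), p) for p in dense_points)
        self.keys = [k for k, _ in self.entries]
        self.sequence_file = list(sparse_points)

    def range_query(self, q, r):
        """Return all points within distance r of q."""
        dq = math.dist(q, self.centre)
        # Triangle inequality: matches must have keys in [dq - r, dq + r].
        lo = bisect.bisect_left(self.keys, dq - r)
        hi = bisect.bisect_right(self.keys, dq + r)
        hits = [p for _, p in self.entries[lo:hi] if math.dist(p, q) <= r]
        # Sparse data: linear scan of the whole sequence file.
        hits += [p for p in self.sequence_file if math.dist(p, q) <= r]
        return hits


index = HybridIndex(centre=(0, 0),
                    dense_points=[(1, 0), (0, 1), (2, 0)],
                    sparse_points=[(5, 5), (1.1, 0)])
print(index.range_query((1, 0), r=0.5))
```

The key-range filter only prunes candidates; each surviving candidate is still verified against the true distance, so the result is exact under these assumptions.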

25 Thanks!

