Published by João Vítor Barreto. Modified over 5 years ago.
Ch. 16: Sweep-Zones
Basic Question: Is it possible to compute nearest neighbors in expected time O(n log n)?
Basic Idea: Generalize sweep-lines to sweep-zones.
Def.: The sweep-zone SZ of an area is the set of regions touching the upper boundary of the area from below.
July 20, 2000 R. Bayer, Ch. 16, DWH-SS2000
UB-Tree Insertion 18/19 [figure: UB-tree space partition with numbered regions]
Sweep-Zone Algorithm 1 (iteration i):
{ Z-regions have been read in increasing Z-order up to region R_{i-1}, i.e. area(R_{i-1}) with upper boundary B(R_{i-1}) }
{ the set of cached regions C(R_i) is the set of regions in SZ_{i-1} = SZ(area(R_{i-1})) plus region R_i }
1. For every point p ∈ R_i, let l(p) and h(p) be the lower and higher neighbor of p on the Z-curve; compute l(p) and h(p).
2. Let q = l(p) if dist(p, l(p)) < dist(p, h(p)), and q = h(p) otherwise.
3. Let Q(p) be the query box with center p and side length 2*dist(p, q).
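Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: points are pairs of non-negative integers, the Z-address is obtained by plain bit interleaving, dist is the Euclidean distance, and all points fit in memory. The names z_address, z_neighbors and query_box are hypothetical.

```python
import math

def z_address(x, y, bits=16):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z

def z_neighbors(points):
    """Step 1: map each point to (l(p), h(p)); None marks a missing neighbor."""
    order = sorted(points, key=lambda p: z_address(*p))
    result = {}
    for k, p in enumerate(order):
        lower = order[k - 1] if k > 0 else None
        higher = order[k + 1] if k + 1 < len(order) else None
        result[p] = (lower, higher)
    return result

def query_box(p, lower, higher):
    """Steps 2 and 3: pick the closer Z-neighbor q and build the box Q(p)."""
    # h(p) is listed first so that a tie goes to h(p), as in step 2.
    candidates = [c for c in (higher, lower) if c is not None]
    q = min(candidates, key=lambda c: math.dist(p, c))
    r = math.dist(p, q)  # half of the side length 2*dist(p, q)
    return q, ((p[0] - r, p[1] - r), (p[0] + r, p[1] + r))
```

Because Q(p) contains the circle around p with radius dist(p, q), the true nearest neighbor of p cannot lie outside Q(p) under the Euclidean distance assumed here.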
4. Retrieve Q(p) from cache or disk and compute the nearest neighbor of p. { Note: retrieval of Q(p) should take time O(log n); finding the nearest neighbor of p should take nearly constant time }
5. Cache the regions intersecting Q(p) to enforce linear I/O time.
6. If R_i was the last region in Z-order, then exit.
7. Release all regions from C(R_i) which are not in SZ_i.
8. i := i+1; read the next region R_i in Z-order.
9. Go to step 1.
{ all nearest neighbors are known, now cluster }
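The search in step 4 can be illustrated as follows. This is a minimal in-memory sketch, not the slides' cache and disk access: it assumes Euclidean distance, takes q as already chosen in step 2, and scans a plain list where a real implementation would retrieve only the regions intersecting Q(p). The names in_box and nearest_in_box are hypothetical.

```python
import math

def in_box(c, box):
    """True if point c lies inside the axis-parallel box (borders included)."""
    (x0, y0), (x1, y1) = box
    return x0 <= c[0] <= x1 and y0 <= c[1] <= y1

def nearest_in_box(p, q, points):
    """Step 4: find the nearest neighbor of p, looking only inside Q(p)."""
    r = math.dist(p, q)
    box = ((p[0] - r, p[1] - r), (p[0] + r, p[1] + r))
    candidates = [c for c in points if c != p and in_box(c, box)]
    # q is at distance r from p, so it is the fallback candidate.
    return min(candidates, key=lambda c: math.dist(p, c), default=q)
```

For example, with points (0,0), (5,1), (2,2), (9,9), (3,7), (6,3), p = (5,1) and q = (0,0), only (0,0), (2,2) and (6,3) fall into Q(p), and (6,3) is returned.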
Sweep-Zone Algorithm 2:
Basic Idea: run the algorithm forward to compute the lower (w.r.t. Z-order) nearest neighbor of p and backward to compute the upper (w.r.t. Z-order) nearest neighbor of p; the nearest neighbor of p is then the closer of the two, i.e. modify step 4 in Sweep-Zone Algorithm 1 to compute Q(p) ∩ area(R_i).
Advantages: all pages are read in increasing or decreasing Z-order only (sequential reads), and cache requirements are smaller.
Disadvantage: data must be read twice. What is the tradeoff?
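The way the two passes combine can be shown with a small sketch. It assumes Euclidean distance and at least two points already sorted in Z-order, and it keeps every point read so far in memory, so it illustrates only the forward/backward combination and not the sweep-zone cache behavior. The names one_pass and nearest_neighbors are hypothetical.

```python
import math

def one_pass(order):
    """For each point, the nearest point among those read before it."""
    best = {}
    seen = []
    for p in order:
        if seen:
            best[p] = min(seen, key=lambda c: math.dist(p, c))
        seen.append(p)
    return best

def nearest_neighbors(z_sorted):
    """Combine a forward and a backward pass over points in Z-order."""
    lower = one_pass(z_sorted)        # forward: lower nearest neighbors
    upper = one_pass(z_sorted[::-1])  # backward: upper nearest neighbors
    result = {}
    for p in z_sorted:
        cands = [c for c in (lower.get(p), upper.get(p)) if c is not None]
        result[p] = min(cands, key=lambda c: math.dist(p, c))
    return result
```

The first point in Z-order has no lower neighbor and the last one has no upper neighbor, which is why missing entries are filtered out before taking the minimum.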
Cache Contents for Algorithm 2: [figure: region numbers held in the cache]
Cache Modification
1. Determine the extension of the next region to be read, using the upper part of the UB-index.
2. Determine the regions that can be released, i.e. SZi - SZi-1.
3. Release these regions from the cache.
4. Read the next region, i.e. transfer it from disk to cache.
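One cache modification step might look like the sketch below. The geometric work of steps 1 and 2 is hidden behind a caller-supplied predicate, because the slides do not spell out how the sweep-zone membership is derived from the UB-index. The function advance_cache and its parameters in_sweep_zone and read_region are hypothetical.

```python
def advance_cache(cache, next_region, in_sweep_zone, read_region):
    """Perform one cache modification step and return the released region ids.

    cache          dict mapping region id -> region data
    next_region    id of the next region in Z-order
    in_sweep_zone  callable(region_id) -> bool; True if the region still
                   touches the upper boundary once next_region is added
                   (assumed to be answered from the UB-index)
    read_region    callable(region_id) -> data; stands in for the disk read
    """
    # Step 2: regions that left the sweep zone can be released.
    released = [rid for rid in cache if not in_sweep_zone(rid)]
    # Step 3: release them from the cache.
    for rid in released:
        del cache[rid]
    # Step 4: transfer the next region from disk to cache.
    cache[next_region] = read_region(next_region)
    return released
```

Releasing before reading keeps the peak cache size at the size of the new sweep zone rather than the old sweep zone plus one region.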
Observations:
expected cache size ~ 1.5 * sqrt(18) = 6.4
maximal occurring cache size = 6
average cache size =
Cache Organization: keep the cache organized as a set of regions sorted in Z-order, e.g. as an AVL-tree with the elementary operations "append single element" and "delete set of elements".
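The two elementary cache operations and the size estimate can be tried out with the sketch below. It uses a sorted Python list as a stand-in for the AVL-tree named on the slide, identifies each region by a single Z-address, and reads 18 as the number of regions; all three are assumptions, and the names ZCache and expected_cache_size are hypothetical.

```python
import math
from bisect import bisect_left

def expected_cache_size(n_regions):
    """Estimate from the slide: about 1.5 * sqrt(number of regions)."""
    return 1.5 * math.sqrt(n_regions)

class ZCache:
    """Regions kept sorted by Z-address (a sorted list, not a real AVL-tree)."""

    def __init__(self):
        self._keys = []  # Z-addresses of the cached regions, ascending

    def append(self, z):
        """Append single element; regions arrive in increasing Z-order."""
        if self._keys and z <= self._keys[-1]:
            raise ValueError("regions must arrive in increasing Z-order")
        self._keys.append(z)

    def delete_set(self, zs):
        """Delete set of elements in one pass over the cache."""
        doomed = set(zs)
        self._keys = [k for k in self._keys if k not in doomed]

    def __contains__(self, z):
        i = bisect_left(self._keys, z)
        return i < len(self._keys) and self._keys[i] == z

    def __len__(self):
        return len(self._keys)
```

Since regions are read in increasing Z-order, a new region always belongs at the end of the sorted sequence, which is why a plain append is enough.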
Open Questions:
Which algorithm is faster?
Which algorithm requires fewer resources?
What are the tradeoffs between I/O, cache size, CPU time, total time, etc.?
Is there an analytic comparison of both algorithms?
Algorithm 3: this is a local optimization of Algorithm 2: if Q(p) ⊆ area(R_i), then the nearest neighbor of p equals its lower nearest neighbor, and the computation of the upper nearest neighbor of p can be skipped in the backward phase.
Algorithm 4: if the nearest neighbor of p equals its lower nearest neighbor, then discard p entirely from the backward phase, i.e. reduce the amount of data and computation for the second phase; the non-discarded points must then be written out.
Open Question: under what conditions is Algorithm 4 better than Algorithm 3?
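The containment test between Q(p) and area(R_i) can be sketched with a single Z-address comparison. The sketch assumes non-negative integer grid coordinates, a Z-address built by bit interleaving, and an area that consists of all cells with Z-address up to the upper boundary of R_i. Under these assumptions the upper corner of the box carries the largest Z-address inside the box, because the interleaved address never decreases when a coordinate grows. The names z_address and can_skip_backward are hypothetical.

```python
import math

def z_address(x, y, bits=16):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z

def can_skip_backward(p, r, area_upper_bound, bits=16):
    """True if the box with center p and half side r lies inside the area.

    area_upper_bound is the largest Z-address already swept, i.e. the
    upper boundary of the last region read (an assumption of this sketch).
    """
    limit = (1 << bits) - 1
    # Upper corner of Q(p), rounded down to the grid and clipped to the universe.
    ux = min(limit, math.floor(p[0] + r))
    uy = min(limit, math.floor(p[1] + r))
    return z_address(ux, uy, bits) <= area_upper_bound
```

For p = (1, 1) and r = 1 the upper corner is (2, 2) with Z-address 12, so the backward computation can be skipped once the sweep has passed address 12.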