

Ch. 16: Sweep-Zones

Basic Question: Is it possible to compute nearest neighbors in expected time O(n log n)?
Basic Idea: Generalize sweep-lines to sweep-zones.
Def.: The sweep-zone SZ of an area is the set of regions touching the upper boundary of the area from below.

July 20, 2000 R. Bayer, Ch. 16, DWH-SS2000

UB-Tree Insertion 18/19
[Figure: UB-tree partitioning of the plane into Z-regions numbered 1 to 18]

Sweep-Zone Algorithm 1:

At step i:
{ Z-regions have been read in increasing Z-order up to region Ri-1, i.e. area(Ri-1) with upper boundary B(Ri-1) }
{ the set of cached regions C(Ri) is the set of regions in SZi-1 = SZ(area(Ri-1)) plus region Ri }

1. For every point p ∈ Ri, let l(p) and h(p) be the lower and higher neighbor of p on the Z-curve; compute l(p) and h(p).
2. Let q = l(p) if dist(p, l(p)) < dist(p, h(p)), and q = h(p) otherwise.
3. Let Q(p) be the query box with center p and side length 2*dist(p, q).

[Figure: point p, its Z-curve neighbor q, and the query box Q(p)]

4. Retrieve Q(p) from cache or disk and compute the nearest neighbor of p. { Note: retrieval of Q(p) should take time O(log n); finding the nearest neighbor of p should take nearly constant time }
5. Cache the regions intersecting Q(p) to enforce linear I/O time.
6. If Ri was the last region in Z-order, then exit.
7. Release all regions from C(Ri) which are not in SZi.
8. i := i+1; read the next region Ri in Z-order.
9. Go to step 1.
{ on exit: all nearest neighbors are known, now cluster }
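Steps 1 to 4 of Sweep-Zone Algorithm 1 can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: points are distinct and have non-negative integer coordinates, Z-values are computed by bit interleaving, and a linear scan over all points stands in for the UB-tree range query and the region cache. The names z_value and nearest_neighbors are hypothetical.

```python
import math


def z_value(x, y, bits=16):
    """Interleave the bits of x and y (x in the even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def nearest_neighbors(points):
    """Map every point to its nearest neighbor (needs >= 2 distinct points)."""
    order = sorted(points, key=lambda p: z_value(*p))
    result = {}
    for i, p in enumerate(order):
        # Step 1: lower and higher neighbor of p on the Z-curve.
        candidates = []
        if i > 0:
            candidates.append(order[i - 1])
        if i + 1 < len(order):
            candidates.append(order[i + 1])
        # Step 2: q is the closer of the two Z-curve neighbors.
        q = min(candidates, key=lambda c: math.dist(p, c))
        r = math.dist(p, q)
        # Step 3: query box Q(p) with center p and side length 2 * r.
        lo_x, hi_x = p[0] - r, p[0] + r
        lo_y, hi_y = p[1] - r, p[1] + r
        # Step 4: retrieve Q(p). ASSUMPTION: a linear scan replaces the
        # UB-tree range query and the cache lookup.
        in_box = [c for c in points
                  if c != p and lo_x <= c[0] <= hi_x and lo_y <= c[1] <= hi_y]
        result[p] = min(in_box, key=lambda c: math.dist(p, c))
    return result


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 5), (6, 5), (0, 3)]
    for point, neighbor in nearest_neighbors(pts).items():
        print(point, "->", neighbor)
```

In this sample the Z-curve neighbors of (0, 3) are (1, 0) and (5, 5), but the query box also contains (0, 0), which is the actual nearest neighbor. This is why step 4 searches the whole box rather than trusting q.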

Sweep-Zone Algorithm 2:

Basic Idea: run the algorithm forward to compute the lower (w.r.t. Z-order) nearest neighbor of p, and backward to compute the upper (w.r.t. Z-order) nearest neighbor of p; the nearest neighbor of p is then the closer of the two. I.e., modify step 4 of Sweep-Zone Algorithm 1 to retrieve only Q(p) ∩ area(Ri).
Advantages: all pages are read in increasing or decreasing Z-order only (sequential reads), and cache requirements are smaller.
Disadvantage: the data must be read twice. What is the tradeoff?
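The forward/backward combination of Algorithm 2 can be illustrated with a small in-memory sketch. It only shows why the closer of the lower and upper nearest neighbor is the overall nearest neighbor; it uses a quadratic brute-force search in each pass and says nothing about I/O or cache behaviour. Distinct points with non-negative integer coordinates are assumed, and all function names are hypothetical.

```python
import math


def z_value(x, y, bits=16):
    """Interleave the bits of x and y (x in the even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def closest(p, candidates):
    """Closest candidate to p, or None if there are no candidates."""
    return min(candidates, key=lambda c: math.dist(p, c), default=None)


def two_pass_nearest_neighbors(points):
    order = sorted(points, key=lambda p: z_value(*p))
    # Forward pass: only points that precede p in Z-order are visible.
    lower = {p: closest(p, order[:i]) for i, p in enumerate(order)}
    # Backward pass: only points that follow p in Z-order are visible.
    upper = {}
    for i in range(len(order) - 1, -1, -1):
        upper[order[i]] = closest(order[i], order[i + 1:])
    # Combine: the nearest neighbor is the closer of the two results.
    result = {}
    for p in order:
        found = [c for c in (lower[p], upper[p]) if c is not None]
        result[p] = closest(p, found)
    return result


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 5), (6, 5), (0, 3)]
    print(two_pass_nearest_neighbors(pts))
```

Every other point is either lower or upper in Z-order, so the two passes together see all candidates exactly once.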

Cache Contents for Algorithm 2:
[Figure: regions held in the cache as each of the 18 regions is read in Z-order]

Cache Modification:

1. Determine the extension of the next region to be read, using the upper part of the UB-index.
2. Determine the regions that can be released, i.e. SZi-1 - SZi (the regions of the old sweep-zone that are no longer in the new one).
3. Release these regions from the cache.
4. Read the next region, i.e. transfer it from disk to cache.
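The cache modification can be sketched with plain set operations. The sketch follows step 7 of Sweep-Zone Algorithm 1 (release every cached region that is not in the new sweep-zone) and assumes that the new sweep-zone is already known from the index lookup in step 1; regions are represented by integer ids. The function name advance_cache and its parameters are hypothetical.

```python
def advance_cache(cache, new_sweep_zone, next_region):
    """Return (updated cache, released regions) for one cache modification.

    cache          -- set of region ids currently held in the cache
    new_sweep_zone -- set of region ids in the sweep-zone once next_region
                      has been read (ASSUMPTION: supplied by the index)
    next_region    -- id of the region about to be read from disk
    """
    # Step 2: regions that are cached but no longer in the sweep-zone.
    released = cache - new_sweep_zone
    # Step 3: release them. Step 4: bring the next region into the cache.
    updated = (cache - released) | {next_region}
    return updated, released


if __name__ == "__main__":
    cache, released = advance_cache({1, 2, 3}, {2, 3, 4}, 4)
    print(sorted(cache), sorted(released))
```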

Observations:
expected cache size ~ 1.5 * sqrt(18) = 6.4
maximal occurring cache size = 6
average cache size = 4.28

Cache Organization: keep the cache organized as a set of regions sorted in Z-order, e.g. as an AVL-tree with the elementary operations "append single element" and "delete set of elements".
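A minimal sketch of such a cache is given below. It swaps the AVL-tree for a plain Python list kept in ascending Z-order, which is enough to show the two elementary operations but does not give logarithmic deletion. The class name ZOrderCache, its methods, and the generalization of the size estimate from 18 regions to n regions are assumptions made for this illustration.

```python
import math


class ZOrderCache:
    """Regions kept in ascending Z-order (a list stands in for the AVL-tree)."""

    def __init__(self):
        self._regions = []

    def append(self, region):
        """Append a single region; it must follow all cached ones in Z-order."""
        if self._regions and region <= self._regions[-1]:
            raise ValueError("regions must be appended in increasing Z-order")
        self._regions.append(region)

    def delete_set(self, regions):
        """Delete a set of regions in one operation."""
        doomed = set(regions)
        self._regions = [r for r in self._regions if r not in doomed]

    def regions(self):
        return list(self._regions)

    def __len__(self):
        return len(self._regions)


def expected_cache_size(n_regions):
    """Estimate from the slide, 1.5 * sqrt(n); ASSUMED to hold for general n."""
    return 1.5 * math.sqrt(n_regions)


if __name__ == "__main__":
    cache = ZOrderCache()
    for region in (1, 2, 3, 4):
        cache.append(region)
    cache.delete_set({1, 2})
    print(cache.regions(), round(expected_cache_size(18), 1))
```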

Open Questions:
Which algorithm is faster?
Which algorithm requires fewer resources?
What are the tradeoffs between I/O, cache size, CPU time, total time, etc.?
Is there an analytic comparison of both algorithms?

Algorithm 3: a local optimization of Algorithm 2. If Q(p) ⊆ area(Ri), then the lower nearest neighbor of p is already the nearest neighbor of p, and the computation of the upper nearest neighbor of p in the backward phase can be skipped.

Algorithm 4: if the lower nearest neighbor of p is already the nearest neighbor of p, then discard p entirely from the backward phase, i.e. reduce the amount of data and computation for the second phase; the non-discarded points must then be written out.

Open Question: under what conditions is Algorithm 4 better than Algorithm 3?
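The containment test behind this optimization can be sketched as follows, under several assumptions that are not in the slides: coordinates lie on a non-negative integer grid, the area read so far consists of all cells whose Z-address is at most area_end_z (the last address of region Ri), and the box half-side r has been rounded up to an integer. Because a bit-interleaved Z-value never decreases when x or y increases, the largest Z-address inside the box is at its upper-right corner, so one comparison is enough. The names z_value and box_inside_area are hypothetical.

```python
def z_value(x, y, bits=16):
    """Interleave the bits of x and y (x in the even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def box_inside_area(p, r, area_end_z, bits=16):
    """True if the query box around p lies entirely in the area read so far.

    p          -- (x, y) center of the query box, integer grid coordinates
    r          -- half of the box side length, rounded up to an integer
    area_end_z -- ASSUMPTION: largest Z-address belonging to area(Ri)
    """
    max_coord = (1 << bits) - 1
    # Clip the upper-right corner to the universe; nothing lies beyond it.
    top_x = min(p[0] + r, max_coord)
    top_y = min(p[1] + r, max_coord)
    # The corner carries the largest Z-address of the whole box.
    return z_value(top_x, top_y, bits) <= area_end_z


if __name__ == "__main__":
    # The box around (1, 1) with r = 1 has its upper-right corner at (2, 2).
    print(box_inside_area((1, 1), 1, area_end_z=12, bits=4))
    print(box_inside_area((1, 1), 1, area_end_z=11, bits=4))
```

When the test succeeds in the forward phase, the point needs no work in the backward phase.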