This lecture introduces multi-dimensional queries in databases and addresses how multi-dimensional data can be represented and queried.

“A reasonable man adapts himself to his environment. An unreasonable man persists in attempting to adapt his environment to suit himself … Therefore, all progress depends on the unreasonable man.” George Bernard Shaw

• Definitions
• Basic operations and construction
• Range queries on multiple attributes
• Variants
• Applications

• Rendering
• Surface reconstruction
• Collision detection
• Vision and machine learning
• Intel Interactive technology

Facial expression created by the action of 32 transversely isotropic muscles (top left) and simulated on a finite element tetrahedral mesh (top right). Muscle activations and bone kinematics are automatically estimated to match motion capture markers. “Automatic Determination of Facial Muscle Activations from Sparse Motion Capture Marker Data”, Eftychios Sifakis, Igor Neverov, Ronald Fedkiw

• A kd-tree is a recursive space-partitioning tree.
◦ Partition along the x and y axes in an alternating fashion.
◦ Each internal node stores the splitting coordinate along x (or y).

• Used for point location and for database queries on multiple attributes; k is the number of attributes used in the search.
• Geometric interpretation: a search in 2D space uses a 2-d tree.
• The search components (x, y) alternate from level to level!

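[Figure: example 2-d tree over the points a, b, c, d, e, f, shown as a partition of the plane and as the corresponding tree.]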

• The canonical method of kd-tree construction is the following:
◦ As one moves down the tree, one cycles through the axes used to select the splitting planes. (For example, in three dimensions the root would have an x-aligned plane, the root's children would both have y-aligned planes, the root's grandchildren would all have z-aligned planes, the next level would have an x-aligned plane, and so on.)
◦ Points are inserted by selecting the median of the points being put into the subtree, with respect to their coordinates in the axis being used to create the splitting plane. (Note the assumption that we feed the entire set of points into the algorithm up front.)
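
The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Node`, `build_kdtree`, and `height` are made up for this example, the median is found with a plain sort at every level, and points are stored at internal nodes as well as at leaves.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node(NamedTuple):
    """One kd-tree node (hypothetical layout chosen for this sketch)."""
    point: Point              # the median point stored at this node
    axis: int                 # index of the coordinate used for splitting
    left: Optional["Node"]    # points whose coordinate is <= the split value
    right: Optional["Node"]   # points whose coordinate is >= the split value


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced kd-tree from a point set that is known up front."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k                                   # cycle through the axes
    ordered = sorted(points, key=lambda p: p[axis])    # O(n log n) per level
    mid = len(ordered) // 2                            # median position
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def height(node: Optional[Node]) -> int:
    """Number of levels in the tree (0 for an empty tree)."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


if __name__ == "__main__":
    tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(tree.point, height(tree))
```

Because the list is sorted and cut in the middle, points that tie with the median on the split axis may land on either side; the sketch therefore treats the left subtree as "less than or equal" and the right subtree as "greater than or equal".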

• This method leads to a balanced kd-tree, in which each leaf node is about the same distance from the root. However, balanced trees are not necessarily optimal for all applications.
• Note also that it is not required to select the median point. If another point is used, there is simply no guarantee that the tree will be balanced. A simple heuristic that avoids both coding a complex linear-time median-finding algorithm and running an O(n log n) sort over all points is to sort a fixed number of randomly selected points and use their median as the cut line.
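
The sampling heuristic can be sketched as follows. The function names and the default sample size of 9 are assumptions made for this example; the slides do not prescribe a particular sample size.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def approx_median_point(points: Sequence[Point], axis: int,
                        sample_size: int = 9,
                        rng: Optional[random.Random] = None) -> Point:
    """Return the median of a small random sample along `axis`.

    Sorting a fixed-size sample costs O(1) per call, so the exact median
    is traded for speed. When the input is no larger than the sample,
    the exact median is returned.
    """
    rng = rng or random.Random()
    if len(points) <= sample_size:
        sample = list(points)
    else:
        sample = rng.sample(list(points), sample_size)
    sample.sort(key=lambda p: p[axis])
    return sample[len(sample) // 2]


def split_on(points: Sequence[Point], axis: int,
             pivot: Point) -> Tuple[List[Point], List[Point]]:
    """Partition the remaining points around the pivot's coordinate."""
    rest = list(points)
    rest.remove(pivot)   # drop one occurrence of the pivot itself
    left = [p for p in rest if p[axis] < pivot[axis]]
    right = [p for p in rest if p[axis] >= pivot[axis]]
    return left, right
```

A builder would call `approx_median_point` at each level and recurse on the two halves returned by `split_on`; the resulting tree is only approximately balanced.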

kd-tree partitions of a uniform set of data points, using the mean (left image) and the median (right image) thresholding options. Median: the middle value of a set of values. Mean: the arithmetic average. (Andrea Vedaldi and Brian Fulkerson)

• One adds a new point to a kd-tree in the same way as one adds an element to any other search tree.
• First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane.
• Once you reach the leaf node under which the new point belongs, add the point as either the left or the right child of that leaf, again depending on which side of the leaf's splitting plane contains the new point.
• Adding points in this manner can cause the tree to become unbalanced, which degrades query performance.
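
A minimal sketch of this insertion procedure, assuming a mutable `Node` class invented for this example and the convention that ties go to the right subtree:

```python
from typing import Optional, Tuple

Point = Tuple[float, ...]


class Node:
    """Mutable kd-tree node (hypothetical layout for this sketch)."""

    def __init__(self, point: Point, axis: int) -> None:
        self.point = point
        self.axis = axis
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], point: Point, depth: int = 0) -> Node:
    """Insert `point` and return the (possibly new) root of the subtree."""
    if root is None:
        # Reached an empty slot: the new point becomes a leaf here.
        return Node(point, depth % len(point))
    if point[root.axis] < root.point[root.axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def height(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))
```

Inserting points that are already sorted, for example `(1, 1), (2, 2), (3, 3), (4, 4)`, produces a chain in which every node has only a right child, which is the unbalanced case mentioned above.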

• To remove a point from an existing kd-tree without breaking the invariant, the easiest way is to form the set of all points stored in the subtrees of the target node and rebuild that part of the tree.
• Another approach is to find a replacement for the removed point. First, find the node R that contains the point to be removed. For the base case where R is a leaf node, no replacement is required. For the general case, find a replacement point, say p, from the subtree rooted at R. Replace the point stored at R with p. Then, recursively remove p.
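
The first approach (rebuild the subtree below the removed node) might look like the sketch below. The helper names are hypothetical, and the median-based `build` is only one possible way to recreate the subtree. Because a median split can leave equal coordinates on either side, the search follows both branches when the target ties with the split value.

```python
from typing import Iterator, List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Median-split construction; left holds <=, right holds >=."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def collect(node: Optional[Node]) -> Iterator[Point]:
    """Yield every point stored in the subtree rooted at `node`."""
    if node is not None:
        yield from collect(node.left)
        yield node.point
        yield from collect(node.right)


def remove(node: Optional[Node], target: Point,
           depth: int = 0) -> Tuple[Optional[Node], bool]:
    """Remove one occurrence of `target`; return (new subtree, removed?)."""
    if node is None:
        return None, False
    if node.point == target:
        # Gather the points below the target node and rebuild that part.
        rest = list(collect(node.left)) + list(collect(node.right))
        return build(rest, depth), True
    a = node.axis
    if target[a] <= node.point[a]:
        node.left, done = remove(node.left, target, depth + 1)
        if done:
            return node, True
    if target[a] >= node.point[a]:
        node.right, done = remove(node.right, target, depth + 1)
        if done:
            return node, True
    return node, False
```

Rebuilding costs time proportional to the size of the affected subtree, so removing a point near the root is expensive while removing a point near the leaves is cheap.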

• Balancing a kd-tree requires care. Because kd-trees are sorted in multiple dimensions, the tree rotation technique cannot be used to balance them, since a rotation may break the invariant.
• Several variants of the balanced kd-tree exist. They include the divided kd-tree, pseudo kd-tree, K-D-B-tree, hB-tree and Bkd-tree. Many of these variants are adaptive k-d trees.

• A kd-tree query uses a best-bin-first search heuristic. This is a branch-and-bound technique that maintains an estimate of the smallest distance from the query point to any of the data points down all of the open paths.
• A kd-tree query supports two important operations: nearest-neighbor search and k-nearest-neighbor search. The first returns the nearest neighbor of a query point; the second returns the k nearest neighbors of a given query point Q. For instance:

• Starting with the root node, the algorithm moves down the tree recursively (i.e. it goes right or left depending on whether the point is greater or less than the current node in the split dimension).
• Once the algorithm reaches a leaf node, it saves that node's point as the "current best".
• The algorithm unwinds the recursion of the tree, performing the following steps at each node:

◦ If the current node is closer than the current best, then it becomes the current best.
◦ The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance.
◦ If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search. If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.
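
The descent and unwinding steps can be sketched as a single recursive function. This is an illustrative version using Euclidean distance; `Node`, `build`, and `nearest` are names chosen for the example, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Median-split construction; left holds <=, right holds >=."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to `query` (None for an empty tree)."""
    if node is None:
        return best
    diff = query[node.axis] - node.point[node.axis]
    # Descend first into the side of the splitting plane that holds the query.
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Unwinding: the current node may beat the current best.
    if best is None or math.dist(query, node.point) < math.dist(query, best):
        best = node.point
    # The hypersphere around the query crosses the splitting plane only if
    # the distance to the plane is smaller than the current best distance.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best
```

For example, with the points `(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)` and the query `(9, 2)`, the search settles on `(8, 1)`, which lies at distance √2 from the query.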

• kd-trees are not suitable for efficiently finding the nearest neighbour in high-dimensional spaces.
• In very high-dimensional spaces, the curse of dimensionality causes the algorithm to visit many more branches than in lower-dimensional spaces. In particular, when the number of points is only slightly higher than the number of dimensions, the algorithm is only slightly better than a linear search of all of the points.
• The algorithm can be extended. It can provide the k nearest neighbors of a point by maintaining k current bests instead of just one. Branches are only eliminated when they can't contain points closer than any of the k current bests.
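
The k-nearest-neighbor extension can be sketched by replacing the single current best with a bounded max-heap of the k best candidates. The names below are hypothetical, and the sketch assumes k ≥ 1 and Euclidean distance.

```python
import heapq
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def _search(node, query, k, heap):
    """Fill `heap` with (-distance, point) pairs for the k best candidates."""
    if node is None:
        return
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    _search(near, query, k, heap)
    d = math.dist(query, node.point)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node.point))
    elif d < -heap[0][0]:                 # closer than the worst of the k bests
        heapq.heapreplace(heap, (-d, node.point))
    # Visit the far branch unless it cannot beat the worst of the k bests.
    if len(heap) < k or abs(diff) < -heap[0][0]:
        _search(far, query, k, heap)


def k_nearest(root: Optional[Node], query: Point,
              k: int) -> List[Tuple[float, Point]]:
    """Return up to k (distance, point) pairs, closest first. Assumes k >= 1."""
    heap: List[Tuple[float, Point]] = []
    _search(root, query, k, heap)
    return sorted((-neg_d, p) for neg_d, p in heap)
```

Negated distances are stored because `heapq` is a min-heap; the root of the heap is then always the farthest of the current k bests, which is the only candidate the pruning test needs to look at.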

• The kd-tree provides a convenient tool for range-search queries in databases with more than one key. The search may descend from a node in both directions (left and right), but at each tree level it can be limited by comparing the query bounds with the key value stored at that level.
• The kd-tree is one of the simplest data structures that allow multi-key search.
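
A multi-key range search can be sketched as follows. The query is an axis-parallel box given by its lower and upper corners, with inclusive bounds; the names and the inclusive convention are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Median-split construction; left holds <=, right holds >=."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def range_search(node: Optional[Node], lo: Point, hi: Point,
                 out: Optional[List[Point]] = None) -> List[Point]:
    """Report every point p with lo[i] <= p[i] <= hi[i] on all axes."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.point)
    split = node.point[node.axis]
    # Only the key of the current level decides which branches to visit.
    if lo[node.axis] <= split:      # the box reaches into the left half-space
        range_search(node.left, lo, hi, out)
    if hi[node.axis] >= split:      # the box reaches into the right half-space
        range_search(node.right, lo, hi, out)
    return out
```

With the points `(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)` and the box from `(4, 1)` to `(8, 5)`, the search reports `(5, 4)`, `(7, 2)` and `(8, 1)`.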

• Building a static kd-tree from n points takes O(n log^2 n) time if an O(n log n) sort is used to compute the median at each level.
• The complexity is O(n log n) if a linear-time median-finding algorithm, such as the one described in Cormen et al., is used.
• Inserting a new point into a balanced kd-tree takes O(log n) time.
• Removing a point from a balanced kd-tree takes O(log n) time.
• Querying an axis-parallel range in a balanced kd-tree takes O(n^(1-1/k) + m) time, where m is the number of reported points and k is the dimension of the kd-tree.
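
The construction bounds follow from the usual divide-and-conquer recurrences, and the range-query bound can be checked for the planar case. A short sketch, where T(n) denotes construction time and Q(n) denotes the number of nodes whose region is crossed by one side of the query rectangle (both symbols are introduced here for illustration):

```latex
\begin{align*}
  % median found by sorting at every level
  T(n) &= 2\,T(n/2) + O(n \log n) &&\Rightarrow\quad T(n) = O(n \log^2 n) \\
  % median found by linear-time selection at every level
  T(n) &= 2\,T(n/2) + O(n)        &&\Rightarrow\quad T(n) = O(n \log n) \\
  % k = 2: an axis-parallel line meets 2 of the 4 regions two levels down
  Q(n) &= 2\,Q(n/4) + O(1)        &&\Rightarrow\quad Q(n) = O(\sqrt{n}) = O(n^{1-1/2})
\end{align*}
```

Adding the m reported points to the last line gives the O(n^(1-1/k) + m) query bound for k = 2.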

• Instead of points, a kd-tree can also contain rectangles.
• A 2D rectangle is considered a 4D object (x_low, x_high, y_low, y_high).
• Thus range search becomes the problem of returning all rectangles intersecting the search rectangle.
• The tree is constructed the usual way, with all the rectangles at the leaves. In an orthogonal range search, the opposite coordinate is used when comparing against the median. For example, if the current level is split along x_high, we check the x_low coordinate of the search rectangle. If the median is less than the x_low coordinate of the search rectangle, then no rectangle in the left branch can ever intersect the search rectangle, so the left branch can be pruned. Otherwise both branches should be traversed.
• Note that the interval tree is a 1-dimensional special case.
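
The "opposite coordinate" rule can be sketched as below. This is an illustration under stated assumptions rather than the structure described on the slide: rectangles are stored at every node instead of only at the leaves, they are treated as closed (touching counts as intersecting), and all names are made up for the example.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]   # (x_low, x_high, y_low, y_high)

# For each split axis, the coordinate of the search rectangle to compare with.
OPPOSITE = {0: 1, 1: 0, 2: 3, 3: 2}


class Node:
    def __init__(self, rect, axis, left=None, right=None):
        self.rect, self.axis, self.left, self.right = rect, axis, left, right


def build(rects: List[Rect], depth: int = 0) -> Optional[Node]:
    """Treat each rectangle as a 4D point and split on the median."""
    if not rects:
        return None
    axis = depth % 4
    ordered = sorted(rects, key=lambda r: r[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def intersects(r: Rect, q: Rect) -> bool:
    return r[0] <= q[1] and r[1] >= q[0] and r[2] <= q[3] and r[3] >= q[2]


def search(node: Optional[Node], q: Rect,
           out: Optional[List[Rect]] = None) -> List[Rect]:
    """Report every stored rectangle that intersects the search rectangle q."""
    if out is None:
        out = []
    if node is None:
        return out
    if intersects(node.rect, q):
        out.append(node.rect)
    split = node.rect[node.axis]
    bound = q[OPPOSITE[node.axis]]
    if node.axis in (1, 3):
        # Split on a high coordinate: the left subtree has high <= split,
        # so it can be skipped when split lies below the query's low bound.
        if split >= bound:
            search(node.left, q, out)
        search(node.right, q, out)
    else:
        # Split on a low coordinate: the right subtree has low >= split,
        # so it can be skipped when split lies above the query's high bound.
        search(node.left, q, out)
        if split <= bound:
            search(node.right, q, out)
    return out
```

Only one branch can be pruned per level, which is why rectangle queries generally visit more nodes than point range queries on a tree of the same size.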

• Query processing in sensor networks
• Nearest-neighbor searches
• Optimization
• Ray tracing
• Database search by multiple keys

Developed by Hugues Hoppe, Microsoft Research Inc. Published first in SIGGRAPH 1996.

Problems with Geometric Subdivisions

The basic operating principle of ROAM

• Define the kd-tree.
• How does it differ from a B-tree? An R-tree? A quadtree? A grid file? An interval tree?
• State the complexity of the basic operations.
• What is the difference between a mean kd-tree and a median kd-tree?
• List typical queries: nearest neighbor, k nearest neighbors.
• Provide examples of kd-tree applications.