CSE 326: Data Structures Lecture #22 Multidimensional Search Trees Alon Halevy Spring Quarter 2001.

Slides:



Advertisements
Similar presentations
Nearest Neighbor Search
Advertisements

COL 106 Shweta Agrawal and Amit Kumar
Multidimensional Indexing
Searching on Multi-Dimensional Data
0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
1 CSE 326: Data Structures Lecture #21 Multidimensional Search Trees Henry Kautz Winter Quarter 2002.
QuadTrees 1. 2 x Quad Tree y 3 Quad Trees Split on all (two) dimensions at each level Split key space into equal size partitions (quadrants) Add a new.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 6: Dictionaries; Binary Search Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 7: AVL Trees Tyler Robison Summer
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
Binary Heaps CSE 373 Data Structures Lecture 11. 2/5/03Binary Heaps - Lecture 112 Readings Reading ›Sections
CSE 326: Data Structures Lecture #7 Branching Out Steve Wolfman Winter Quarter 2000.
CSE 326 Multi-Dimensional Search Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001.
CSE 326: Data Structures Binary Search Trees Ben Lerner Summer 2007.
CSE 326: Data Structures Lecture #11 B-Trees Alon Halevy Spring Quarter 2001.
BST Data Structure A BST node contains: A BST contains
Course Review COMP171 Spring Hashing / Slide 2 Elementary Data Structures * Linked lists n Types: singular, doubly, circular n Operations: insert,
CSE 326 Randomized Data Structures David Kaplan Dept of Computer Science & Engineering Autumn 2001.
CSE 326: Data Structures Lecture #7 Binary Search Trees Alon Halevy Spring Quarter 2001.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
CSE 326: Data Structures Lecture #13 Extendible Hashing and Splay Trees Alon Halevy Spring Quarter 2001.
CSE 326: Data Structures Lecture #8 Binary Search Trees Alon Halevy Spring Quarter 2001.
AVL Trees ITCS6114 Algorithms and Data Structures.
(B+-Trees, that is) Steve Wolfman 2014W1
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
End of SQL XML April 22 th, Null Values If x=Null then 4*(3-x)/7 is still NULL If x=Null then x=“Joe” is UNKNOWN Three boolean values: –FALSE =
1 CSE 326: Data Structures Trees Lecture 7: Wednesday, Jan 23, 2003.
1 CSE 326: Data Structures Part 10 Advanced Data Structures Henry Kautz Autumn Quarter 2002.
CSE373: Data Structures & Algorithms Lecture 6: Binary Search Trees Nicki Dell Spring 2014 CSE373: Data Structures & Algorithms1.
CSE373: Data Structures & Algorithms Lecture 6: Binary Search Trees Lauren Milne Summer
CSC 213 – Large Scale Programming. Today’s Goals  Review a new search tree algorithm is needed  What real-world problems occur with old tree?  Why.
Compiled by: Dr. Mohammad Alhawarat BST, Priority Queue, Heaps - Heapsort CHAPTER 07.
ADT Table and Heap Ellen Walker CPSC 201 Data Structures Hiram College.
S EARCHING AND T REES COMP1927 Computing 15s1 Sedgewick Chapters 5, 12.
CSC 213 – Large Scale Programming Lecture 17: Binary Search Trees.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
B-Trees and Red Black Trees. Binary Trees B Trees spread data all over – Fine for memory – Bad on disks.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
Starting at Binary Trees
Lecture 5: XML Tuesday, January 16, Outline XML, DTDs (Data on the Web, 3.1) Semistructured data in XML (3.2) Exporting Relational Data in XML (8.3.1)
School of Engineering and Computer Science Victoria University of Wellington Copyright: Xiaoying Gao, Peter Andreae, VUW B Trees and B+ Trees COMP 261.
CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find Nicki Dell Spring 2014.
CSE373: Data Structures & Algorithms Lecture 6: Binary Search Trees continued Aaron Bauer Winter 2014 CSE373: Data Structures & Algorithms1.
CSC 213 – Large Scale Programming Lecture 18: Zen & the Art of O (log n ) Search.
Binary Search Trees (BSTs) 18 February Binary Search Tree (BST) An important special kind of binary tree is the BST Each node stores some information.
Spatial Indexing Techniques Introduction to Spatial Computing CSE 5ISC Some slides adapted from Spatial Databases: A Tour by Shashi Shekhar Prentice Hall.
Week 10 - Friday.  What did we talk about last time?  Graph representations  Adjacency matrix  Adjacency lists  Depth first search.
ADT Binary Search Tree Ellen Walker CPSC 201 Data Structures Hiram College.
CISC220 Fall 2009 James Atlas Dec 07: Final Exam Review.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
CSE373: Data Structures & Algorithms Lecture 6: Binary Search Trees Linda Shapiro Winter 2015.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
CSE 589 Applied Algorithms Spring 1999 Prim’s Algorithm for MST Load Balance Spanning Tree Hamiltonian Path.
BSTs, AVL Trees and Heaps Ezgi Shenqi Bran. What to know about Trees? Height of a tree Length of the longest path from root to a leaf Height of an empty.
Nov 2, 2001CSE 373, Autumn Hash Table example marking deleted items + choice of table size.
Spatial Data Management
B/B+ Trees 4.7.
Tree-Structured Indexes: Introduction
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find Linda Shapiro Spring 2016.
Multidimensional Indexes
Semi-Structured data (XML Data MODEL)
Steve Wolfman Winter Quarter 2000
Lecture 8: XML Data Wednesday, October
Presentation transcript:

CSE 326: Data Structures Lecture #22 Multidimensional Search Trees Alon Halevy Spring Quarter 2001

Today’s Outline Multi-dimensional search trees Range Queries k-D Trees Quad Trees

Multi-D Search ADT Dictionary operations –create –destroy –find –insert –delete –range queries Each item has k keys for a k-dimensional search tree Searches can be performed on one, some, or all the keys or on ranges of the keys 9,13,64,2 5,78,21,94,4 8,42,5 5,2

Applications of Multi-D Search Astronomy (simulation of galaxies) - 3 dimensions Protein folding in molecular biology - 3 dimensions Lossy data compression - 4 to 64 dimensions Image processing - 2 dimensions Graphics - 2 or 3 dimensions Animation - 3 to 4 dimensions Geographical databases - 2 or 3 dimensions Web searching or more dimensions

Range Query A range query is a search in a dictionary in which the exact key may not be entirely specified. Range queries are the primary interface with multi-D data structures.

Range Query Examples: Two Dimensions Search for items based on just one key Search for items based on ranges for all keys Search for items based on a function of several keys: e.g., a circular range query

x Range Querying in 1-D Find everything in the rectangle…

x Range Querying in 1-D with a BST Find everything in the rectangle…

x y 1-D Range Querying in 2-D

x y 2-D Range Querying in 2-D

k-D Trees Split on the next dimension at each succeeding level If building in batch, choose the median along the current dimension at each level –guarantees logarithmic height and balanced tree In general, add as in a BST k-D tree node dimension leftrightkeysvalue The dimension that this node splits on

x Building a 2-D Tree (1/4) y

x y Building a 2-D Tree (2/4)

x y Building a 2-D Tree (3/4)

x y Building a 2-D Tree (4/4)

k-D Tree a c i h m d e f b j k g l ldkfhg e cjimba

x y 2-D Range Querying in 2-D Trees Search every partition that intersects the rectangle. Check whether each node (including leaves) falls into the range.

x y Other Shapes for Range Querying Search every partition that intersects the shape (circle). Check whether each node (including leaves) falls into the shape.

Find in a k-D Tree find(, root) finds the node which has the given set of keys in it or returns null if there is no such node Node *& find(const keyVector & keys, Node *& root) { int dim = root->dimension; if (root == NULL) return root; else if (root->keys == keys) return root; else if (keys[dim] keys[dim]) return find(keys, root->left); else return find(keys, root->right); } runtime:

k-D Trees Can Be Inefficient (but not when built in batch!) insert( ) 6,9 5,0 6,5 9,3 8,6 7,7 suck factor:

Find Example find( ) 5,78,21,94,4 8,42,5 5,2 9,13,64,2

Quad Trees Split on all (two) dimensions at each level Split key space into equal size partitions (quadrants) Add a new node by adding to a leaf, and, if the leaf is already occupied, split until only one node per leaf quad tree node Quadrants: 0,11,1 0,01,0 quadrant 0,01,00,11,1 keysvalue Center xy Center:

x Building a Quad Tree (1/5) y

x Building a Quad Tree (2/5) y

x Building a Quad Tree (3/5) y

x Building a Quad Tree (4/5) y

x Building a Quad Tree (5/5) y

Quad Tree Example a g b e f d c ga fed cb

x 2-D Range Querying in Quad Trees y

Find in a Quad Tree find(, root) finds the node which has the given pair of keys in it or returns quadrant where the point should be if there is no such node Node *& find(Key x, Key y, Node *& root) { if (root == NULL) return root; // Empty tree if (root->isLeaf) return root; // Key may not actually be here int quad = getQuadrant(x, y, root); return find(x, y, root->quadrants[quad]); } runtime: Compares against center; always makes the same choice on ties.

Quad Trees Can Suck b a suck factor:

Find Example a g b e f d c ga fed cb find( ) (i.e., c) find( ) (i.e., d)

Insert Example a insert(,x) …… a g b e f d c x gx Find the spot where the node should go. If the space is unoccupied, insert the node. If it is occupied, split until the existing node separates from the new one.

Delete Example a g b e f d c ga fed cb delete( )(i.e., c) Find and delete the node. If its parent has just one child, delete it. Propagate!

Nearest Neighbor Search ga fed cb getNearestNeighbor( ) g b f d c Find a nearby node (do a find). Do a circular range query. As you get results, tighten the circle. Continue until no closer node in query. a e Works on k-D Trees, too!

Quad Trees vs. k-D Trees k-D Trees –Density balanced trees –Number of nodes is O(n) where n is the number of points –Height of the tree is O(log n) with batch insertion –Supports insert, find, nearest neighbor, range queries Quad Trees –Number of nodes is O(n(1+ log(  /n))) where n is the number of points and  is the ratio of the width (or height) of the key space and the smallest distance between two points –Height of the tree is O(log n + log  ) –Supports insert, delete, find, nearest neighbor, range queries

326 in a Nutshell Lists, stacks, queues BST’s: AVL Splay, B Multi-D trees Hash tables Priority Q’s Sorting: Quick, radix heap, merge Disjoint sets Graphs: shortest paths spanning trees A*

More Highlights Analysis: –O notation: constants don’t matter. –The little cliff: from log to polynomial –The big cliff: from polynomial to exponential –In practice: constants to matter! (especially for large datasets). Things change when data is on disk. –Key: know your application! Culture: –This stuff applies everywhere! –You are now a computer scientist! Data structures and algorithms are the first thing you think about. –Awards: Nobel, Turing, database guru. –Names: Dijkstra, Knuth, Bayer, …, –Some good jokes; some really lousy ones. –Life is all about databases.

XML eXtensible Markup Language Roots: comes from SGML (very nasty language). After the roots: a format for sharing data Emerging format for data exchange on the Web and between applications

XML Applications Sharing data between different components of an application: no need to share data structures. Format for storing all data in Office EDI: electronic data exchange: –Transactions between banks –Producers and suppliers sharing product data (auctions) –Extranets: building relationships between companies –Scientists sharing data about experiments.

XML Syntax Very simple: <db> <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> </book> < > <title>Transaction Processing</title> <author>Bernstein</author> < >Newcomer</author> </book> <publisher> <name>Morgan Kaufman</name> <state>CA</state> </publisher> </db>

What is XML ? From HTML to XML HTML describes the presentation: easy for humans

HTML Bibliography Foundations of Databases Abiteboul, Hull, Vianu Addison Wesley, 1995 Data on the Web Abiteboul, Buneman, Suciu Morgan Kaufmann, 1999 HTML is hard for applications

XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content: easy for applications