15-826: Multimedia Databases and Data Mining

15-826: Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos

Must-Read Material
MM-Textbook, Chapter 4
[Bentley75] J.L. Bentley: Multidimensional Binary Search Trees Used for Associative Searching, CACM, 18,9, Sept. 1975.
Ramakrishnan+Gehrke, Chapter 28.1-3
15-826 Copyright: C. Faloutsos (2017)

Outline
Goal: ‘Find similar / interesting things’
Intro to DB
Indexing - similarity search
Data Mining

Indexing - Detailed outline
primary key indexing
secondary key / multi-key indexing
spatial access methods
text
...

Sec. key indexing
attributes w/ duplicates (e.g., EMPLOYEES, with ‘job-code’)
Query types:
  exact match
  partial match: ‘job-code’=‘PGM’ and ‘dept’=‘R&D’
  range queries: ‘job-code’=‘ADMIN’ and salary < 50K

Sec. key indexing
Query types - cont’d:
  boolean: ‘job-code’=‘ADMIN’ or salary > 20K
  nn: salary ~ 30K

Solution?
Inverted indices (usually, w/ B-trees)
Q: how to handle duplicates?
[Figure: salary-index with keys 50 and 70]

Solution
A#1: e.g., with postings lists
[Figure: salary-index with keys 50 and 70, each pointing to a postings list]

Solution
A#2: modify B-tree code, to handle dup’s
[Figure: salary-index with keys 50, 50, 70]

How to handle Boolean Queries?
e.g., ‘sal=50 AND job-code=PGM’?
from indices, find lists of qualifying record-ids
merge lists (or check real records)
[Figure: salary-index with keys 50, 50, 70]
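The merge step can be sketched in a few lines. This is only an illustration under stated assumptions: plain Python dictionaries stand in for the B-tree indices, and the EMPLOYEES records, the record-ids, and the helper names `build_index` and `intersect` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE records: record-id -> (salary, job_code).
EMPLOYEES = {
    1: (50, "PGM"),
    2: (50, "ADMIN"),
    3: (70, "PGM"),
    4: (50, "PGM"),
}

def build_index(records, field):
    """Map each attribute value to a sorted postings list of record-ids.
    A dict stands in for the B-tree here (an assumption of this sketch)."""
    index = defaultdict(list)
    for rid in sorted(records):
        index[records[rid][field]].append(rid)
    return index

def intersect(a, b):
    """Merge two sorted postings lists, keeping ids present in both (AND)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

salary_index = build_index(EMPLOYEES, 0)
job_index = build_index(EMPLOYEES, 1)

# 'sal=50 AND job-code=PGM'
result = intersect(salary_index.get(50, []), job_index.get("PGM", []))
print(result)  # [1, 4]
```

Keeping the postings lists sorted by record-id is what makes the linear-time merge possible; an OR query would take the union of the two lists instead.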

Sec. key indexing
easily solved in commercial DBMS:
  create index sal-index on EMPLOYEE (salary);
  select * from EMPLOYEE where salary > 50 and job-code = ‘ADMIN’

Sec. key indexing
can create combined indices:
  create index sj on EMPLOYEE (salary, job-code);
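A rough way to picture a combined index on (salary, job-code) is a list of composite keys kept in sorted order. The sketch below uses Python's `bisect` on a sorted list in place of a real B-tree; the rows and the function names are hypothetical and only meant to show why the leading attribute drives the search.

```python
import bisect

# Hypothetical rows: (record-id, salary, job_code).
rows = [(1, 50, "PGM"), (2, 50, "ADMIN"), (3, 70, "PGM"), (4, 60, "ADMIN")]

# Combined index: entries sorted by (salary, job_code), with the record-id last.
index = sorted((sal, job, rid) for rid, sal, job in rows)

def exact_match(sal, job):
    """Both attributes given: binary-search directly to the matching run."""
    lo = bisect.bisect_left(index, (sal, job, float("-inf")))
    hi = bisect.bisect_right(index, (sal, job, float("inf")))
    return [rid for _, _, rid in index[lo:hi]]

def salary_above_and_job(sal, job):
    """salary > sal AND job-code = job: seek on the leading attribute,
    then filter the remaining entries on the second attribute."""
    lo = bisect.bisect_left(index, (sal,))  # first entry with salary >= sal
    return [rid for s, j, rid in index[lo:] if s > sal and j == job]

print(exact_match(50, "PGM"))             # [1]
print(salary_above_and_job(50, "ADMIN"))  # [4]
```

A query on job-code alone would get no help from this ordering, which is the usual caveat with combined indices: the attribute order matters.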

Indexing - Detailed outline
primary key indexing
secondary key / multi-key indexing
  main memory: quad-trees
  main memory: k-d-trees
spatial access methods
text
...

Quad-trees
problem: find cities within 100mi from Pittsburgh
assumption: all fit in main memory
Q: how to answer such queries quickly?

Quad-trees
A: recursive decomposition of space, e.g.:
[Figure: map with PGH, ATL, PHL. PGH at (30,10) is the root and splits the plane into quadrants; ATL falls in its SW quadrant and PHL at (40,20) in its NE quadrant.]

Quad-trees - search?
find cities with (35<x<45, 15<y<25):
[Figure: the same quad-tree, with PGH (30,10) at the root, ATL in SW and PHL (40,20) in NE]

Quad-trees - search?
pseudocode:
  range-query( tree-ptr, range)
    if (tree-ptr == NULL) return;
    if (tree-ptr->point within range) {
      print tree-ptr->point
    }
    for each quadrant {
      if ( range intersects quadrant ) {
        range-query( tree-ptr->quadrant-ptr, range);
      }
    }
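A runnable version of this range query might look as follows. It is a minimal sketch of a point quad-tree, assuming ties go to the S/W side and that the query rectangle is open, as in (35<x<45, 15<y<25). The coordinates of ATL are not given on the slides, so (20, 5) is a made-up value, and `QuadNode`, `insert`, and `quadrant_intersects` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xlo, xhi, ylo, yhi), open intervals

@dataclass
class QuadNode:
    name: str
    point: Point
    children: Dict[str, Optional["QuadNode"]] = field(
        default_factory=lambda: {"NE": None, "NW": None, "SW": None, "SE": None})

def quadrant(node: QuadNode, p: Point) -> str:
    """Quadrant of p relative to node.point (ties go to S / W)."""
    return ("N" if p[1] > node.point[1] else "S") + ("E" if p[0] > node.point[0] else "W")

def insert(root: Optional[QuadNode], name: str, p: Point) -> QuadNode:
    if root is None:
        return QuadNode(name, p)
    q = quadrant(root, p)
    root.children[q] = insert(root.children[q], name, p)
    return root

def quadrant_intersects(node: QuadNode, q: str, r: Rect) -> bool:
    """True if the open query rectangle r overlaps quadrant q of node."""
    xlo, xhi, ylo, yhi = r
    px, py = node.point
    x_ok = xhi > px if "E" in q else xlo < px
    y_ok = yhi > py if "N" in q else ylo < py
    return x_ok and y_ok

def range_query(node: Optional[QuadNode], r: Rect, out: List[str]) -> List[str]:
    if node is None:
        return out
    xlo, xhi, ylo, yhi = r
    x, y = node.point
    if xlo < x < xhi and ylo < y < yhi:
        out.append(node.name)
    for q, child in node.children.items():
        if quadrant_intersects(node, q, r):
            range_query(child, r, out)
    return out

root = None
for name, p in [("PGH", (30, 10)), ("PHL", (40, 20)), ("ATL", (20, 5))]:  # ATL is assumed
    root = insert(root, name, p)

print(range_query(root, (35, 45, 15, 25), []))  # ['PHL']
```

For this query only the NE quadrant of PGH overlaps the rectangle, so the ATL subtree is never visited.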

Quad-trees - k-nn search?
k-nearest neighbor algo - more complicated:
  find ‘good’ neighbors and put them in a stack
  go to the most promising quadrant, and update the stack of neighbors
  until we hit the leaves

Quad-trees - discussion
great for 2- and 3-d spaces
several variations, like fixed decomposition:
[Figure: ‘adaptive’ vs. ‘fixed’ decomposition of the PGH, ATL, PHL map; the ‘fixed’ variant splits at the middle]
z-ordering (later)

Quad-trees - discussion
but: unsuitable for higher-d spaces (why?)
A: 2^d pointers, per node!
Q: how to solve this problem?
A: k-d-trees!
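To get a feel for how quickly 2^d grows, the small calculation below compares the number of child pointers per node in a quad-tree generalized to d dimensions with the two pointers of a k-d-tree node. The function name and the chosen values of d are just for illustration.

```python
def quad_tree_fanout(d: int) -> int:
    """Child pointers per node when every node splits on all d dimensions at once."""
    return 2 ** d

K_D_TREE_FANOUT = 2  # one discriminator per node: a left and a right child only

for d in (2, 3, 10, 20):
    print(f"d={d:2d}  quad-tree: {quad_tree_fanout(d):>9,} pointers  "
          f"k-d-tree: {K_D_TREE_FANOUT}")
```

At d=10 each node would already carry 1,024 pointers, most of them null for realistic data sizes.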

k-d-trees
Binary trees, with alternating ‘discriminators’
[Figure: the quad-tree rooted at PGH (30,10) with ATL in SW, next to the k-d-tree for the same cities. In the k-d-tree the root PGH (30,10) discriminates on x: ATL goes to x<=30 (W), PHL to x>30 (E). PHL (40,20) then discriminates on y: y<=20 vs. y>20.]

(Several demos/applets, e.g.) http://donar.umiacs.umd.edu/quadtree/points/kdtree.html

Indexing - Detailed outline
primary key indexing
secondary key / multi-key indexing
  main memory: quad-trees
  main memory: k-d-trees
    insertion; deletion
    range query; k-nn query
spatial access methods
text
...

k-d-trees - insertion
Binary trees, with alternating ‘discriminators’
discriminators: may cycle, or ....
Q: which should we put first?
[Figure: k-d-tree with root PGH (30,10) discriminating on x (x<=30: ATL; x>30: PHL) and PHL (40,20) discriminating on y (y<=20, y>20)]

k-d-trees - deletion
How?
Tricky! ‘delete-and-promote’ (or ‘mark as deleted’)
[Figure: k-d-tree with root PGH (30,10) discriminating on x (x<=30: ATL; x>30: PHL) and PHL (40,20) discriminating on y (y<=20, y>20)]
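Of the two options, ‘mark as deleted’ is the easier one to sketch: the node stays in place and keeps routing searches, but is skipped when reporting results. The code below is one possible reading of that idea, not the lecture's implementation; the `deleted` flag, the helper names, and ATL's coordinates are assumptions.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    name: str
    point: Point
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None
    deleted: bool = False  # tombstone: node still routes searches, but is not reported

def insert(node: Optional[KDNode], name: str, point: Point, depth: int = 0) -> KDNode:
    if node is None:
        return KDNode(name, point)
    axis = depth % len(point)
    if point[axis] <= node.point[axis]:
        node.left = insert(node.left, name, point, depth + 1)
    else:
        node.right = insert(node.right, name, point, depth + 1)
    return node

def mark_deleted(root: Optional[KDNode], point: Point) -> bool:
    """Follow the discriminators to the point and flag it; True if a live node was found."""
    node, depth = root, 0
    while node is not None:
        if node.point == point and not node.deleted:
            node.deleted = True
            return True
        axis = depth % len(point)
        node = node.left if point[axis] <= node.point[axis] else node.right
        depth += 1
    return False

def live_names(node: Optional[KDNode]) -> Iterator[str]:
    """Pre-order traversal that skips tombstoned nodes."""
    if node is None:
        return
    if not node.deleted:
        yield node.name
    yield from live_names(node.left)
    yield from live_names(node.right)

root = None
for name, p in [("PGH", (30, 10)), ("ATL", (20, 5)), ("PHL", (40, 20))]:  # ATL is assumed
    root = insert(root, name, p)

mark_deleted(root, (30, 10))   # PGH stays as the root, but only as a routing node
print(list(live_names(root)))  # ['ATL', 'PHL']
```

The price of this shortcut is that tombstones accumulate, so a periodic rebuild would be needed in practice; ‘delete-and-promote’ avoids that at the cost of finding a suitable replacement in a subtree.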

k-d-trees - range query
similar to quad-trees: check the root; proceed to appropriate child(ren).
[Figure: k-d-tree with root PGH (30,10) discriminating on x (x<=30: ATL; x>30: PHL) and PHL (40,20) discriminating on y (y<=20, y>20)]
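A sketch of the range query, under the same assumptions as the figure (values <= the discriminator go left, discriminators cycle x, y, ...) and with an open query box given as two corner tuples. ATL's coordinates and all names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    name: str
    point: Point
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def insert(node: Optional[KDNode], name: str, point: Point, depth: int = 0) -> KDNode:
    if node is None:
        return KDNode(name, point)
    axis = depth % len(point)
    if point[axis] <= node.point[axis]:
        node.left = insert(node.left, name, point, depth + 1)
    else:
        node.right = insert(node.right, name, point, depth + 1)
    return node

def range_query(node: Optional[KDNode], lo: Point, hi: Point,
                depth: int = 0, out: Optional[List[str]] = None) -> List[str]:
    """Report points p with lo[i] < p[i] < hi[i] in every dimension i."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(l < c < h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.name)              # check the node itself
    axis = depth % len(lo)
    if lo[axis] < node.point[axis]:        # box reaches into the '<=' side
        range_query(node.left, lo, hi, depth + 1, out)
    if hi[axis] > node.point[axis]:        # box reaches into the '>' side
        range_query(node.right, lo, hi, depth + 1, out)
    return out

root = None
for name, p in [("PGH", (30, 10)), ("ATL", (20, 5)), ("PHL", (40, 20))]:  # ATL is assumed
    root = insert(root, name, p)

print(range_query(root, (35, 15), (45, 25)))  # ['PHL']
```

Unlike the quad-tree version, each node tests only one coordinate to decide which children to visit; both children are visited when the box straddles the split.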

k-d-trees - k-nn query
e.g., 1-nn: closest city to ‘X’
A: check root; put in stack; proceed to child
[Figure: the same k-d-tree (PGH (30,10) on x, PHL (40,20) on y), with the query point X marked on the map]
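One common way to realize this for 1-nn is a depth-first search that visits the child on the query's side first and enters the other child only if the splitting line is closer than the best candidate so far. In the sketch below the recursion plays the role of the stack mentioned on the slide; this is a standard textbook variant rather than necessarily the exact procedure from the lecture. The position of X and ATL's coordinates are assumptions, and Euclidean distance is used.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class KDNode:
    name: str
    point: Point
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def insert(node: Optional[KDNode], name: str, point: Point, depth: int = 0) -> KDNode:
    if node is None:
        return KDNode(name, point)
    axis = depth % len(point)
    if point[axis] <= node.point[axis]:
        node.left = insert(node.left, name, point, depth + 1)
    else:
        node.right = insert(node.right, name, point, depth + 1)
    return node

def nearest(node: Optional[KDNode], q: Point, depth: int = 0,
            best: Optional[Tuple[float, KDNode]] = None) -> Optional[Tuple[float, KDNode]]:
    """Return (distance, node) of the point closest to q, or None for an empty tree."""
    if node is None:
        return best
    d = math.dist(node.point, q)
    if best is None or d < best[0]:
        best = (d, node)                   # check this node, keep the best so far
    axis = depth % len(q)
    diff = q[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    best = nearest(near, q, depth + 1, best)     # most promising side first
    if abs(diff) < best[0]:                      # far side could still hold a closer point
        best = nearest(far, q, depth + 1, best)
    return best

root = None
for name, p in [("PGH", (30, 10)), ("ATL", (20, 5)), ("PHL", (40, 20))]:  # ATL is assumed
    root = insert(root, name, p)

X = (38, 14)  # hypothetical query point
dist, city = nearest(root, X)
print(city.name)  # PHL
```

For k > 1 the single best candidate would be replaced by a bounded collection of the k best so far (for example a max-heap), with the pruning test using the k-th best distance.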

Indexing - Detailed outline
primary key indexing
secondary key / multi-key indexing
  main memory: quad-trees
  main memory: k-d-trees
    insertion; deletion
    range query; k-nn query
    discussion
spatial access methods
text
...

k-d trees - discussion
great for main memory & low ‘d’ (~<10)
Q: what about high-d?
A: most attributes don’t ever become discriminators
Q: what about disk?
A: Pagination problems, after ins./del. (solutions: next!)

Conclusions
sec. keys: B-tree indices (+ postings lists)
multi-key, main memory methods:
  quad-trees
  k-d-trees

References
[Bentley75] J.L. Bentley: Multidimensional Binary Search Trees Used for Associative Searching, CACM, 18,9, Sept. 1975.
[Finkel74] R.A. Finkel, J.L. Bentley: Quadtrees: A data structure for retrieval on composite keys, ACTA Informatica, 4,1, 1974.
Applet: e.g., http://donar.umiacs.umd.edu/quadtree/points/kdtree.html