CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.

Slides:



Advertisements
Similar presentations
Multidimensional Index Structures One dimensional index structures assume a single search key, and retrieve records that match a given search-key value.
Advertisements

Slide 1 Insert your own content. Slide 2 Insert your own content.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Introduction to Rails.
1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 3.1 Chapter 3.
Combining Like Terms. Only combine terms that are exactly the same!! Whats the same mean? –If numbers have a variable, then you can combine only ones.
0 - 0.
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
5.9 + = 10 a)3.6 b)4.1 c)5.3 Question 1: Good Answer!! Well Done!! = 10 Question 1:
Introduction to Indexes Rui Zhang The University of Melbourne Aug 2006.
Chapter 7 Indexing Structures for Files Copyright © 2004 Ramez Elmasri and Shamkant Navathe.
CMU SCS : Multimedia Databases and Data Mining Lecture #14: Text – Part I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
1 Unit 1 Kinematics Chapter 1 Day
Copyright © 2004 Pearson Education, Inc.. Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#14(b): Implementation of Relational.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications Faloutsos & Pavlo Lecture#11 (R&G ch. 11) Hashing.
CMU SCS : Multimedia Databases and Data Mining Lecture#3: Primary key indexing – hashing C. Faloutsos.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
Spatial Indexing I Point Access Methods. PAMs Point Access Methods Multidimensional Hashing: Grid File Exponential growth of the directory Hierarchical.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part II (more case studies) C. Faloutsos.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture#3: Primary key indexing – hashing C. Faloutsos.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 8.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 8 “How index-learning turns no student pale Yet holds.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
CPSC 404, Laks V.S. Lakshmanan1 Overview of Query Evaluation Chapter 12 Ramakrishnan & Gehrke (Sections )
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
File Organizations and Indexing
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 8 Jianping Fan Dept of Computer Science UNC-Charlotte.
15-826: Multimedia Databases and Data Mining
Diskusi-08 Jelaskan dan berikan contoh penggunaan theta join, equijoin, natural join, outer join, dan semijoin The slides for this text are organized into.
15-826: Multimedia Databases and Data Mining
Spatial Indexing I Point Access Methods.
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Faloutsos & Pavlo Lecture#11 (R&G ch. 11) Hashing
15-826: Multimedia Databases and Data Mining
Multidimensional Indexes
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
File Organizations and Indexing
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2012)#2 Must-read material Textbook, Chapter 5.3 Ramakrinshan+Gehrke, Chapter 28.5

CMU SCS Copyright: C. Faloutsos (2012)#3 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

CMU SCS Copyright: C. Faloutsos (2012)#4 Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods –problem dfn –z-ordering –R-trees –misc text...

CMU SCS Copyright: C. Faloutsos (2012)#5 SAMs - Detailed outline spatial access methods –problem dfn –z-ordering –R-trees –misc topics grid files dimensionality curse; dim. reduction metric trees other nn methods text,...

CMU SCS Copyright: C. Faloutsos (2012)#6 Grid files problem: spatial queries in k-d point-sets Main idea: try to generalize hashing to k-d (how?)

CMU SCS Copyright: C. Faloutsos (2012)#7 Grid files A: put a grid specs: [Nievergelt +, 84] –symmetric to all attributes –2 disk accesses for exact match queries –adaptive to non-uniform distr. Q: details? Jurg Nievergelt

CMU SCS Copyright: C. Faloutsos (2012)#8 Grid files cuts: all the way through cuts: at ½, ¾, ¼ etc; but on demand each cell -> disk page

CMU SCS Copyright: C. Faloutsos (2012)#9 Grid files Thus, we only need: -cut-points for each axis -k-d directory ½ ¼ ½ ¼ ½ ½ x-cuts y-cuts

CMU SCS Copyright: C. Faloutsos (2012)#10 Grid files Search ( for exact match) – eg., (0.3; 0.3) ½ ¼ ½ ¼ ½ ½ x-cuts y-cuts

CMU SCS Copyright: C. Faloutsos (2012)#11 Grid files Search ( for exact match) – eg., (0.3; 0.3) ½ ¼ ½ ¼ ½ ½ x-cuts y-cuts

CMU SCS Copyright: C. Faloutsos (2012)#12 Grid files specs: [Nievergelt +, 84] –symmetric to all attributes –2 disk accesses for exact match queries –adaptive to non-uniform distr. X

CMU SCS Copyright: C. Faloutsos (2012)#13 Grid files partial match – eg., 0<x<0.3 ½ ¼ ½ ¼ ½ ½ x-cuts y-cuts

CMU SCS Copyright: C. Faloutsos (2012)#14 Grid files partial match – eg., 0<x<0.3 ½ ¼ ½ ¼ ½ ½ x-cuts y-cuts

CMU SCS Copyright: C. Faloutsos (2012)#15 Grid files exactly the symmetric algo for eg., 0<y<0.3

CMU SCS Copyright: C. Faloutsos (2012)#16 Grid files specs: [Nievergelt +, 84] –symmetric to all attributes –2 disk accesses for exact match queries –adaptive to non-uniform distr. X X

CMU SCS Copyright: C. Faloutsos (2012)#17 Grid files Q: How to split an overflowing page? ½ ¼ ½ ¼ ½ ½ x-cuts y-cuts

CMU SCS Copyright: C. Faloutsos (2012)#18 Grid files A: pick the ‘best’ axis, and cut all the way through ½ ¼ ½ ¼ ½ ½ x-cuts y-cuts

CMU SCS Copyright: C. Faloutsos (2012)#19 Grid files A: pick the ‘best’ axis, and cut all the way through... ½ ¼ ½ ¼ 3/8 ½ x-cuts y-cuts 3/8 ½

CMU SCS Copyright: C. Faloutsos (2012)#20 Grid files... updating the directory appropriately (ouch!) ½ ¼ ½ ¼ 3/8 ½ x-cuts y-cuts 3/8 ½

CMU SCS Copyright: C. Faloutsos (2012)#21 Grid files specs: [Nievergelt +, 84] –symmetric to all attributes –2 disk accesses for exact match queries –adaptive to non-uniform distr. X X X

CMU SCS Copyright: C. Faloutsos (2012)#22 Grid files it meets the three goals had follow-up work [twin grid files, multi- level; etc] BUT: has some disadvantages (which ones?)

CMU SCS Copyright: C. Faloutsos (2012)#23 Grid files - disadvantages #1: problems in high-d: directory splits can be expensive

CMU SCS Copyright: C. Faloutsos (2012)#24 Grid files - disadvantages #2: even in low-d, suffers on correlated attributes:

CMU SCS Copyright: C. Faloutsos (2012)#25 Grid files - disadvantages (Q: how to fix, for 2-d, linearly correlated points?)

CMU SCS Copyright: C. Faloutsos (2012)#26 Grid files - disadvantages (A1: rotate [Hinrichs+]; A2: triangular cells [Rego+])

CMU SCS Copyright: C. Faloutsos (2012)#27 Grid files - disadvantages #3: how about region data?

CMU SCS Copyright: C. Faloutsos (2012)#28 Grid files - disadvantages #3: how about region data? if we ‘cut’ them, then we have O(volume) pieces (while z-ordering: O(surface)) what to do?

CMU SCS Copyright: C. Faloutsos (2012)#29 Grid files - disadvantages what to do? Translation to 2k – d points! (clever, BUT, still has subtle problems) E.g., 1-d ‘regions’ AB C x-start x-end 0 1 ½ ¼ ¾ 0 1 ½ ¼ ¾ A B C

CMU SCS Copyright: C. Faloutsos (2012)#30 Grid files - disadvantages what to do? Translation to 2k – d points! (clever, BUT, still has subtle problems) E.g., 1-d ‘regions’ AB C x-start x-end 0 1 ½ ¼ ¾ 0 1 ½ ¼ ¾ A B C

CMU SCS Copyright: C. Faloutsos (2012)#31 Grid files - disadvantages what to do? Translation to 2k – d points! (clever, BUT, still has subtle problems) E.g., 1-d ‘regions’ AB C x-start x-end 0 1 ½ ¼ ¾ 0 1 ½ ¼ ¾ A B C

CMU SCS Copyright: C. Faloutsos (2012)#32 Grid files - disadvantages what is the problem, then?

CMU SCS Copyright: C. Faloutsos (2012)#33 Grid files - disadvantages what is the problem, then? A: dimensionality curse; large query regions

CMU SCS Copyright: C. Faloutsos (2012)#34 Grid files – conclusions works OK in low-d un-correlated points but z-ordering/R-trees seem to work better for higher-d smart idea to translate k-d rectangles into 2*k - points (but: dim. curse)

CMU SCS Copyright: C. Faloutsos (2012)#35 SAMs - Detailed outline spatial access methods –problem dfn –z-ordering –R-trees –misc topics grid files dimensionality curse; dim. reduction metric trees other nn methods text,...

CMU SCS Copyright: C. Faloutsos (2012)#36 Dimensionality ‘curse’ Q: What is the problem in high-d?

CMU SCS Copyright: C. Faloutsos (2012)#37 Dimensionality ‘curse’ Q: What is the problem in high-d? A: indices do not seem to help, for many queries (eg., k-nn) –in high-d (& uniform distributions), most points are equidistant -> k-nn retrieves too many near- neighbors –[Yao & Yao, ’85]: search effort ~ O( N (1-1/d) )

CMU SCS Copyright: C. Faloutsos (2012)#38 Dimensionality ‘curse’ (counter-intuitive, for db mentality) Q: What to do, then?

CMU SCS Copyright: C. Faloutsos (2012)#39 Dimensionality ‘curse’ A1: switch to seq. scanning A2: dim. reduction A3: consider the ‘intrinsic’/fractal dimensionality A4: find approximate nn

CMU SCS Copyright: C. Faloutsos (2012)#40 Dimensionality ‘curse’ A1: switch to seq. scanning –X-trees [Kriegel+, VLDB 96] –VA-files [Schek+, VLDB 98], ‘test of time’ award

CMU SCS Copyright: C. Faloutsos (2012)#41 Dimensionality ‘curse’ A1: switch to seq. scanning A2: dim. reduction A3: consider the ‘intrinsic’/fractal dimensionality A4: find approximate nn

CMU SCS Copyright: C. Faloutsos (2012)#42 Dim. reduction a.k.a. feature selection/extraction: SVD (optimal, to preserve Euclidean distances) random projections using the fractal dimension [Traina+ SBBD2000]

CMU SCS Copyright: C. Faloutsos (2012)#43 Singular Value Decomposition (SVD) SVD (~LSI ~ KL ~ PCA ~ spectral analysis...) LSI: S. Dumais; M. Berry KL: eg, Duda+Hart PCA: eg., Jolliffe MANY more details: soon

CMU SCS Copyright: C. Faloutsos (2012)#44 Random projections random projections(Johnson-Lindenstrauss thm [Papadimitriou+ pods98])

CMU SCS Copyright: C. Faloutsos (2012)#45 Random projections pick ‘enough’ random directions (will be ~orthogonal, in high-d!!) distances are preserved probabilistically, within epsilon (also, use as a pre-processing step for SVD [Papadimitriou+ PODS98]

CMU SCS Copyright: C. Faloutsos (2012)#46 Dim. reduction - w/ fractals Main idea: drop those attributes that don’t affect the intrinsic (‘fractal’) dimensionality [Traina+, SBBD 2000]

CMU SCS Copyright: C. Faloutsos (2012)#47 Dim. reduction - w/ fractals global FD=1

CMU SCS Copyright: C. Faloutsos (2012)#48 Dimensionality ‘curse’ A1: switch to seq. scanning A2: dim. reduction A3: consider the ‘intrinsic’/fractal dimensionality A4: find approximate nn

CMU SCS Copyright: C. Faloutsos (2012)#49 Intrinsic dimensionality before we give up, compute the intrinsic dim.: the lower, the better... [Pagel+, ICDE 2000] more details: under ‘fractals’ intr. d = 2intr. d = 1

CMU SCS Copyright: C. Faloutsos (2012)#50 Dimensionality ‘curse’ A1: switch to seq. scanning A2: dim. reduction A3: consider the ‘intrinsic’/fractal dimensionality A4: find approximate nn

CMU SCS Copyright: C. Faloutsos (2012)#51 Approximate nn [Arya + Mount, SODA93], [Patella+ ICDE 2000] Idea: find k neighbors, such that the distance of the k-th one is guaranteed to be within epsilon of the actual.

CMU SCS Copyright: C. Faloutsos (2012)#52 SAMs - Detailed outline spatial access methods –problem dfn –z-ordering –R-trees –misc topics grid files dimensionality curse; dim. reduction metric trees other nn methods text,...

CMU SCS Copyright: C. Faloutsos (2012)#53 Conclusions Dimensionality ‘curse’: –for high-d, indices slow down to ~O(N) If the intrinsic dim. is low, there is hope otherwise, do seq. scan, or sacrifice accuracy (approximate nn)

CMU SCS Copyright: C. Faloutsos (2012)#54 References Sunil Arya, David M. Mount: Approximate Nearest Neighbor Queries in Fixed Dimensions. SODA 1993: ANN library:

CMU SCS Copyright: C. Faloutsos (2012)#55 References Berchtold, S., D. A. Keim, et al. (1996). The X- tree : An Index Structure for High-Dimensional Data. VLDB, Mumbai (Bombay), India. Faloutsos, C. and W. Rego (1989). “Tri-cell: A Data Structure for Spatial Objects.” Information Systems 14(2):

CMU SCS Copyright: C. Faloutsos (2012)#56 References Hinrichs, K. and J. Nievergelt (1983). The Grid File: A Data Structure to Support Proximity Queries on Spatial Objects. Proc. of the WG'83 (Intern. Workshop on Graph Theoretic Concepts in Computer Science), Linz, Austria, Trauner Verlag.

CMU SCS Copyright: C. Faloutsos (2012)#57 References cnt’d Nievergelt, J., H. Hinterberger, et al. (March 1984). “The Grid File: An Adaptable, Symmetric Multikey File Structure.” ACM TODS 9(1): Papadimitriou, C. H., P. Raghavan, et al. (1998). Latent Semantic Indexing: A Probabilistic Analysis. PODS, Seattle, WA.

CMU SCS Copyright: C. Faloutsos (2012)#58 References cnt’d Weber, R., H.-J. Schek, et al. (1998). A Quantitative Analysis and Performance Study for Similarity-Search Methods in high-dimensional spaces. VLDB, New York, NY. Yao, A. C. and F. F. Yao (May 6-8, 1985). A General Approach to d-Dimensional Geometric Queries. Proc. of the 17th Annual ACM Symposium on Theory of Computing (STOC), Providence, RI.