Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl A Cost Model for Interval Intersection Queries on RI-Trees Institute for Computer Science.

Presentation transcript:

Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl. A Cost Model for Interval Intersection Queries on RI-Trees. Institute for Computer Science, University of Munich, Germany, Database Group. SSDBM 2002, Edinburgh.

07/25/02 Martin Pfeifle. Outline of the Talk: 1. Introduction, 2. RI-Tree, 3. Cost Model, 4. Evaluation, 5. Conclusions and Future Work.

Extended Objects in Databases. 1D objects: temporal data, approximate values, interval constraints, … 2D objects: geographic data, VLSI design, bitemporal data, … 3D objects: CAD documents, digital mockup, haptic rendering, … Query types shown: interval query, box query, window query.

Integration of Access Methods. Declarative Embedding: object-relational DML and DDL. Extensible Indexing Framework: query processing via index_open(), index_fetch(), index_close(); maintenance via index_create(), index_drop(), index_insert(), index_delete(), index_update().

Integration of Access Methods. Declarative Embedding: object-relational DML and DDL. User-defined Index Structure, consisting of two layers. Extensible Indexing Framework: object-relational interface for index maintenance and querying functions. Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing. Physical Implementation: block manager, caches, locking, logging, …

Integration of Access Methods. Declarative Embedding: object-relational DML and DDL. User-defined Index Structure, consisting of two layers. Extensible Indexing Framework: object-relational interface for index maintenance and querying functions. Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing. Physical Implementation: block manager, caches, locking, logging, … Extensible Optimization Framework: optimization via stats_collect(), stats_delete(), predicate_sel(), index_io_cost().

Integration of Access Methods. Declarative Embedding: object-relational DML and DDL. User-defined Index Structure, consisting of two layers. Extensible Indexing Framework: object-relational interface for index maintenance and querying functions. Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing. User-defined Cost Model, consisting of two layers. Extensible Optimization Framework: object-relational interface for selectivity estimation and cost prediction functions. Relational Implementation: mapping to built-in statistics facilities; SQL-based evaluation of the cost model. Physical Implementation: block manager, caches, locking, logging, …

Relational Interval Tree (RI-Tree). Foundation: Interval Tree [Edelsbrunner 1980]. Primary structure: binary search tree on possible endpoints. Secondary structure: sorted lists of stored endpoints. Each interval is registered at exactly one node. [Kriegel, Pötke, Seidl VLDB 2000] [Figure: example intervals alice, bob, chris and dave, with endpoint labels 3a, 15a, 12c, 5c, 7b, 1b, 13d.]

RI-Tree: Virtual Primary Structure. First step: virtualize the primary structure. No materialization of the binary tree. Storage cost $O(1)$: only the parameter root is stored. Fixed data space: $\mathit{root} = 2^{h-1}$ covers $[1, 2^h - 1]$. [Figure: virtual binary tree over the data space from 1 to $2^h - 1$ with $\mathit{root} = 2^{h-1}$.]
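
Because the primary structure is virtual, the node at which an interval is registered can be found with integer arithmetic alone. The following is a minimal Python sketch under the assumption that interval bounds are integers in $[1, 2^h - 1]$; the function name fork_node is a hypothetical helper and does not appear on the slides.

```python
def fork_node(lower: int, upper: int, h: int) -> int:
    """Return the node of the virtual tree at which [lower, upper] is registered.

    The tree is never materialized: the descent starts at root = 2**(h-1)
    and halves the step width on every level.
    """
    if not (1 <= lower <= upper <= 2 ** h - 1):
        raise ValueError("interval must lie within [1, 2**h - 1]")
    node = 2 ** (h - 1)   # virtual root
    step = node // 2
    while step >= 1:
        if upper < node:      # interval lies completely to the left
            node -= step
        elif lower > node:    # interval lies completely to the right
            node += step
        else:                 # first node inside the interval
            break
        step //= 2
    return node
```

For a tree of height h = 5 (root 16), fork_node(22, 25, 5) yields 24, which agrees with the fork node shown on the query slides.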

RI-Tree: Relational Secondary Structure. Second step: manage the secondary structure by two B+-trees, lowerIndex (node, lower, id) and upperIndex (node, upper, id). Storage of n intervals: $O(n/b)$ disk blocks of size b. Insert and delete: $O(\log_b n)$ disk block accesses in the indexes. [Figure: example intervals a, b, c, d with their endpoints, and the resulting contents of the two index tables with columns (node, lower, id) and (node, upper, id); the cell values are not recoverable from the transcript.]
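
The two-index layout can be tried out on any relational engine. The sketch below uses Python's sqlite3 module as a stand-in for the ORDBMS; the use of WITHOUT ROWID tables to imitate composite B+-tree indexes, and the helper names create_indexes, insert_interval and delete_interval, are assumptions made for illustration and not part of the talk.

```python
import sqlite3


def fork_node(lower: int, upper: int, h: int) -> int:
    """First node on the descent from root = 2**(h-1) inside [lower, upper]."""
    node = 2 ** (h - 1)
    step = node // 2
    while step >= 1:
        if upper < node:
            node -= step
        elif lower > node:
            node += step
        else:
            break
        step //= 2
    return node


def create_indexes(conn: sqlite3.Connection) -> None:
    # Assumption: tables keyed on the full tuple stand in for the two
    # composite indexes lowerIndex and upperIndex named on the slide.
    conn.executescript(
        """
        CREATE TABLE lowerIndex (node INTEGER, lower INTEGER, id TEXT,
                                 PRIMARY KEY (node, lower, id)) WITHOUT ROWID;
        CREATE TABLE upperIndex (node INTEGER, upper INTEGER, id TEXT,
                                 PRIMARY KEY (node, upper, id)) WITHOUT ROWID;
        """
    )


def insert_interval(conn, h, lower, upper, id_):
    """Register the interval at its fork node in both indexes."""
    node = fork_node(lower, upper, h)
    conn.execute("INSERT INTO lowerIndex VALUES (?, ?, ?)", (node, lower, id_))
    conn.execute("INSERT INTO upperIndex VALUES (?, ?, ?)", (node, upper, id_))
    return node


def delete_interval(conn, h, lower, upper, id_):
    """Remove the interval from both indexes."""
    node = fork_node(lower, upper, h)
    conn.execute(
        "DELETE FROM lowerIndex WHERE node = ? AND lower = ? AND id = ?",
        (node, lower, id_),
    )
    conn.execute(
        "DELETE FROM upperIndex WHERE node = ? AND upper = ? AND id = ?",
        (node, upper, id_),
    )
```

Each insert or delete touches exactly one entry per index, which is where the logarithmic update cost stated on the slide comes from.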

RI-Tree: Interval Intersection Query. [Figure: virtual tree of height h = 5 with root = 16, fork = 24 and node 23 marked; query interval with lower = 22 and upper = 25.]

RI-Tree: Interval Intersection Query. [Figure: virtual tree of height h = 5 with root = 16 and node 20 marked; query interval with lower = 22 and upper = 25.]
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= :lower

RI-Tree: Interval Intersection Query. [Figure: virtual tree of height h = 5 with fork = 24 and node 23 marked; query interval with lower = 22 and upper = 25.]
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= :lower
union all
select id from upperIndex i
where i.node between :lower and :upper

RI-Tree: Interval Intersection Query. [Figure: virtual tree of height h = 5; query interval with lower = 22 and upper = 25.]
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= :lower
union all
select id from upperIndex i
where i.node between :lower and :upper
union all
select id from lowerIndex i, rightNodes right
where i.node = right.node and i.lower <= :upper

RI-Tree: Interval Intersection Query. [Figure: virtual tree of height h = 5 with root = 16, fork = 24 and node 23 marked; query interval with lower = 22 and upper = 25.] I/O complexity: $O(h \cdot \log_b n + r/b)$.
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= :lower
union all
select id from upperIndex i
where i.node between :lower and :upper
union all
select id from lowerIndex i, rightNodes right
where i.node = right.node and i.lower <= :upper
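
The three-branch query can be exercised end to end with a small sketch. It assumes SQLite (via Python's sqlite3) in place of the ORDBMS, fills the transient tables leftNodes and rightNodes by walking the virtual tree from the root towards the query bounds, and uses the aliases l and r because LEFT and RIGHT are reserved words in SQLite. The helper names and the sample intervals are hypothetical.

```python
import sqlite3

QUERY = """
SELECT id FROM upperIndex i, leftNodes l
 WHERE i.node = l.node AND i.upper >= :lower
UNION ALL
SELECT id FROM upperIndex i
 WHERE i.node BETWEEN :lower AND :upper
UNION ALL
SELECT id FROM lowerIndex i, rightNodes r
 WHERE i.node = r.node AND i.lower <= :upper
"""


def path_to(target: int, h: int) -> list:
    """Nodes visited when descending the virtual tree from the root to target."""
    node = 2 ** (h - 1)
    step = node // 2
    path = [node]
    while node != target and step >= 1:
        node += step if target > node else -step
        step //= 2
        path.append(node)
    return path


def fork_node(lower: int, upper: int, h: int) -> int:
    """First node on the path to lower that lies inside [lower, upper]."""
    return next(n for n in path_to(lower, h) if lower <= n <= upper)


def build(conn: sqlite3.Connection, intervals: dict, h: int) -> None:
    """Create the two index tables and register every interval once."""
    conn.executescript(
        """
        CREATE TABLE lowerIndex (node INTEGER, lower INTEGER, id TEXT);
        CREATE TABLE upperIndex (node INTEGER, upper INTEGER, id TEXT);
        CREATE TEMP TABLE leftNodes (node INTEGER);
        CREATE TEMP TABLE rightNodes (node INTEGER);
        """
    )
    for id_, (lo, up) in intervals.items():
        node = fork_node(lo, up, h)
        conn.execute("INSERT INTO lowerIndex VALUES (?, ?, ?)", (node, lo, id_))
        conn.execute("INSERT INTO upperIndex VALUES (?, ?, ?)", (node, up, id_))


def intersect(conn: sqlite3.Connection, lower: int, upper: int, h: int) -> list:
    """Return the sorted ids of all stored intervals intersecting [lower, upper]."""
    left = [(n,) for n in path_to(lower, h) if n < lower]
    right = [(n,) for n in path_to(upper, h) if n > upper]
    conn.execute("DELETE FROM leftNodes")
    conn.execute("DELETE FROM rightNodes")
    conn.executemany("INSERT INTO leftNodes VALUES (?)", left)
    conn.executemany("INSERT INTO rightNodes VALUES (?)", right)
    rows = conn.execute(QUERY, {"lower": lower, "upper": upper}).fetchall()
    return sorted(row[0] for row in rows)


# Hypothetical sample data for a tree of height h = 5 (root = 16).
SAMPLE = {"a": (3, 15), "b": (18, 23), "c": (24, 30),
          "d": (26, 27), "e": (10, 30), "f": (25, 29)}
```

With this sample data and the query interval [22, 25] from the slides, leftNodes holds 16 and 20, rightNodes holds 28 and 26, and the query returns b, c, e and f. The three branches never report the same interval twice, because every interval is registered at exactly one node and the three node sets are disjoint.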

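I/O Cost Model for Interval Intersections. [Figure: virtual tree of height h = 5 over table T with upperIndex(node, upper, id) and lowerIndex(node, lower, id). Several formula symbols were lost in extraction and are marked "?" below.] join I/O (T, ?) = [expression not recoverable; it involves Gaps_left(?), Gaps_right(?) and root]. output I/O (T, ?) = ?(T, ?) · B. Asymptotic I/O complexity: $O(h \cdot \log_b n + r/b)$.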

Selectivity Estimation. Histogram-based (equi-width histogram). Drawbacks: intervals intersecting multiple buckets are replicated; statistics management requires user-defined code. Quantile-based (equi-count histogram). Advantages: better adaptation to the data distribution; exploits built-in statistics of the ORDBMS. [Formula lost in extraction; the surviving fragment reads "analogously to r left".]
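
To illustrate the quantile-based idea, the sketch below estimates the selectivity of an intersection query from equi-count histograms of the lower and upper bounds. It is not the estimator from the talk, whose formula is not recoverable from this transcript. It relies only on the observation that an interval misses the query exactly when its upper bound lies below the query's lower bound or its lower bound lies above the query's upper bound. All function names are hypothetical, and values are assumed to be spread evenly within a bucket.

```python
from bisect import bisect_right


def quantile_bounds(values, k):
    """Boundaries of an equi-count histogram with k buckets (k + 1 values)."""
    s = sorted(values)
    return [s[round(i * (len(s) - 1) / k)] for i in range(k + 1)]


def cdf(bounds, x):
    """Estimated fraction of values <= x, interpolating linearly in a bucket."""
    k = len(bounds) - 1
    if x < bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, x) - 1          # bounds[i] <= x < bounds[i + 1]
    frac = (x - bounds[i]) / (bounds[i + 1] - bounds[i])
    return (i + frac) / k                    # each bucket holds 1/k of the data


def estimate_selectivity(lower_bounds, upper_bounds, q_lower, q_upper):
    """Estimated fraction of stored intervals intersecting [q_lower, q_upper]."""
    missed_left = cdf(upper_bounds, q_lower)          # approx. P(upper < q_lower)
    missed_right = 1.0 - cdf(lower_bounds, q_upper)   # approx. P(lower > q_upper)
    return min(1.0, max(0.0, 1.0 - missed_left - missed_right))
```

In a real system the bucket boundaries would come from the statistics the database already maintains, which is the advantage the slide attributes to the quantile-based approach.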

Experimental Evaluation: Datasets. [Figure panels: UNI, REAL]

Experimental Evaluation: Computation of Statistics.

Experimental Evaluation: Selectivity Estimation. [Figure panels: UNI, REAL]

Experimental Evaluation: Selectivity Estimation.

Experimental Evaluation: Cost Estimation. [Figure panels: UNI, REAL]

Experimental Evaluation: Cost Estimation. [Figure panels: UNI, REAL]

Conclusions and Future Work. Conclusions: Relational access methods employ an ORDBMS as a virtual machine and use an extensible indexing and optimizing framework. Indexing extended objects: Relational Interval Tree. Development of cost models: estimation of selectivity and I/O cost. Future work: cost models for general interval relationships and for interval sequences.

Any questions?