On the analysis of indexing schemes


Written by: Joseph M. Hellerstein, Elias Koutsoupias, Christos H. Papadimitriou
Presented by: Tali Kaufman

Presentation layout
Problem definition - define a framework to measure the efficiency of an index.
Performance factors - access overhead and storage redundancy.
Range queries
    access overhead upper bound
    access overhead lower bound (r = 1)
    access overhead lower bound (r >= 1)
Set queries
    worst-case access overhead
Conclusions
Open problems

The problem
Problem - define a framework for measuring the efficiency of an indexing scheme for a workload, based on two performance factors: storage redundancy and access overhead.
Workload - a definition of a data set and a set of potential queries.
Indexing scheme - a collection of blocks, which store an actual data set instance.
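The slides that define the two performance factors did not survive in this transcript, so the following is only a minimal sketch under assumed definitions: storage redundancy is taken as B times the number of blocks divided by the instance size, and the access overhead of a query as the smallest number of blocks covering it divided by the ideal count ceil(|Q| / B), maximized over all queries. The function names and the toy data are hypothetical, chosen for illustration.

```python
from itertools import combinations
from math import ceil


def redundancy(instance, blocks, B):
    """Assumed definition: total block capacity divided by the instance size."""
    return B * len(blocks) / len(instance)


def min_cover(query, blocks):
    """Smallest number of blocks whose union contains the query (brute force)."""
    for k in range(1, len(blocks) + 1):
        for subset in combinations(blocks, k):
            if query <= set().union(*subset):
                return k
    raise ValueError("the blocks do not cover the query")


def access_overhead(queries, blocks, B):
    """Assumed definition: worst ratio of blocks read to the ideal ceil(|Q| / B)."""
    return max(min_cover(q, blocks) / ceil(len(q) / B) for q in queries)


# Toy workload (hypothetical): six data items, four non-empty queries.
B = 2
instance = {1, 2, 3, 4, 5, 6}
queries = [{1, 2}, {2, 3}, {3, 4, 5}, {1, 6}]

# Toy indexing scheme: four blocks of size B; items 2 and 3 are stored twice.
blocks = [frozenset({1, 2}), frozenset({3, 4}),
          frozenset({5, 6}), frozenset({2, 3})]

r = redundancy(instance, blocks, B)
a = access_overhead(queries, blocks, B)
print(f"redundancy = {r:.3f}, access overhead = {a:.1f}")
```

The extra block {2, 3} raises the redundancy above 1 but lets the query {2, 3} be answered with a single block. The query {1, 6} still needs two blocks where one would be ideal, so it determines the overhead of this scheme. The brute-force cover search is exponential in the number of blocks and is only meant for tiny examples.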

Workload definition

Example - a workload with two dimensional range queries

Indexing scheme definition

Performance factors definition

Access overhead upper bound for two dimensional range queries
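The content of this slide is missing from the transcript, so the bound itself cannot be reproduced here. As a rough illustration of how access overhead can be measured for two-dimensional range queries, the sketch below tiles an n x n grid into square blocks (redundancy 1) and checks every rectangular query by brute force. The grid size, the tiling, and all function names are assumptions made for the example, not the construction from the slide.

```python
from math import ceil


def tiles_touched(x1, x2, y1, y2, s):
    """Number of s x s tiles intersecting the rectangle [x1..x2] x [y1..y2].

    With disjoint tiles every point lives in exactly one block, so this
    count is also the minimum number of blocks needed to answer the query.
    """
    return (x2 // s - x1 // s + 1) * (y2 // s - y1 // s + 1)


def grid_access_overhead(n, s):
    """Worst-case overhead over all range queries on an n x n grid of points."""
    B = s * s  # block size: one tile holds s * s points
    worst = 0.0
    for x1 in range(n):
        for x2 in range(x1, n):
            for y1 in range(n):
                for y2 in range(y1, n):
                    size = (x2 - x1 + 1) * (y2 - y1 + 1)
                    ideal = ceil(size / B)
                    ratio = tiles_touched(x1, x2, y1, y2, s) / ideal
                    worst = max(worst, ratio)
    return worst


# Hypothetical parameters: a 4 x 4 grid tiled into 2 x 2 blocks (B = 4).
overhead = grid_access_overhead(4, 2)
print(overhead)
```

In this toy setting the worst query is a 2 x 2 square that straddles all four tiles: it holds B points, so one block would be ideal, yet four blocks must be read.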

Access overhead lower bound (redundancy = 1)

Access overhead lower bound (redundancy = 1) [cont]

Access overhead lower bound (redundancy >= 1)

Access overhead lower bound (redundancy >= 1) [cont]

Access overhead lower bound (redundancy >= 1) [cont]

Access overhead lower bound (redundancy >= 1) [cont]

Access overhead lower bound (redundancy >= 1) [cont]

Access overhead lower bound (redundancy >= 1) [cont]

Example - Set inclusion workloads

Set inclusion workloads worst-case access overhead

Conclusions
Theory of indexability - the article presents a framework for studying indexability.
Workload and indexing scheme in indexability theory correspond to language and algorithm in complexity theory.
The framework emphasizes the secondary-storage nature of indexing schemes: it examines storage utilization (redundancy) and disk access (access overhead).
It considers range queries and set queries, and focuses on lower bounds and on the trade-off between redundancy and access overhead.
The trade-off is worse for workloads with a large number of queries (set queries - exponential, range queries - polynomial).
Algorithms for finding the best access method (search algorithms) and the best partition into blocks are not considered.
The size of the instance does not affect the results.

Open problems