Access Methods for Next-Generation Database Systems Marcel Kornacker UC Berkeley

Presentation transcript:

Access Methods for Next-Generation Database Systems Marcel Kornacker UC Berkeley

Overview and Motivation this talk’s topic: access method (AM) extensibility –how to support novel AMs in extensible ORDBMSs –not: yet another point, spatial, metric,... AM outline: –AM extensibility architecture for ORDBMSs –concurrency & recovery –AM performance analysis

Overview and Motivation why bother with AMs: –we have B-trees –our customers don’t care well... –new apps need ORDBMS support: GIS, multimedia, genomic sequence databases, etc. –customers do care about fast access to data –B-trees won’t help

Overview and Motivation AM extensibility: what has been done –slew of papers about novel AMs –slew of papers about extensible DBMSs (in 80s!) what needs to be done: –storage-level techniques for ORDBMSs: reconciling functionality with performance and reliability –AMs crucial to performance, deserve special attention

Outline Overview and Motivation High-Performance Extensible Indexing with Generalized Search Trees –AM Support in Commercial DBMSs –GiST Overview –IUS Implementation Overview Concurrency and Recovery Access Method Performance Analysis

AM Support in Commercial DBMSs OR modeling and extensibility very successful in commercial DBMS, but: –AM extensibility has not received same degree of attention –DBMS vendors now struggling to add/improve novel (spatial) AMs State of art: IUS virtual index interface/Oracle extensible indexing interface: –iterator interface: open(), getnext(), close(), insert(), delete() –AM handles internally: locking, recovery, page management, … (same as built-in AMs)
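
The iterator interface named on this slide can be pictured with a small sketch. The five method names come from the slide; the `SortedListAM` class, the inclusive key-range qualification, and the `(key, rid)` entry layout are assumptions made only for illustration and are not the IUS or Oracle API. A real AM behind this interface would also have to handle locking, recovery, and page management itself, which the sketch leaves out.

```python
import bisect
from abc import ABC, abstractmethod


class AccessMethod(ABC):
    """Iterator-style AM interface; method names taken from the slide."""

    @abstractmethod
    def open(self, qual): ...

    @abstractmethod
    def getnext(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def insert(self, key, rid): ...

    @abstractmethod
    def delete(self, key, rid): ...


class SortedListAM(AccessMethod):
    """Hypothetical toy AM: a sorted in-memory list stands in for the index."""

    def __init__(self):
        self._entries = []   # sorted list of (key, rid)
        self._cursor = None  # position of the open scan, if any
        self._hi = None      # upper bound of the open scan

    def open(self, qual):
        lo, hi = qual        # assumed qualification: inclusive key range
        # (lo,) sorts before every (lo, rid), so this finds the first key >= lo.
        self._cursor = bisect.bisect_left(self._entries, (lo,))
        self._hi = hi

    def getnext(self):
        if self._cursor is None or self._cursor >= len(self._entries):
            return None
        entry = self._entries[self._cursor]
        if entry[0] > self._hi:
            return None
        self._cursor += 1
        return entry

    def close(self):
        self._cursor = None
        self._hi = None

    def insert(self, key, rid):
        bisect.insort(self._entries, (key, rid))

    def delete(self, key, rid):
        self._entries.remove((key, rid))
```

A caller drives a scan with `open((lo, hi))`, repeated `getnext()` calls until `None` comes back, and then `close()`.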

AM Support in Commercial DBMSs (2) What’s wrong with this interface –concurrency and recovery need to be (re-) implemented for each new AM –difficult to implement: AM developer = domain expert, rarely also DBMS internals expert –would prefer to deal purely with AM specifics: Generalized Search Tree

Generalized Search Tree Overview Generalized Search Tree (GiST) = template index structure –extensible set of data types and queries –customize tree behavior through extension methods –examples: B-trees, R-trees, … –details: Hellerstein, Naughton, Pfeffer, VLDB ‘95 GiST provides –basic structure: height-balanced tree –template algorithms: search, insert and delete –no assumptions about keys and how they’re arranged –AM developer provides key-specific functions and particular operational properties

Generalized Search Tree Overview [Figure: GiST with internal nodes holding subtree predicates SP 1, SP 2, ..., SP n above the leaf nodes] GiST AM parameters: –subtree predicate (SP) = user-defined key –Consistent() returns true/false –Union() updates SPs –Penalty() returns insertion cost –PickSplit() splits page items into two groups

Generalized Search Tree Overview More suitable basis for AM extensibility than iterator interface, because: –raises level of abstraction, allows developer to focus on AM specifics/performance-relevant AM properties (clustering, internal predicates) –built-in features (e.g., page handling, tree traversal) need not be re-implemented for specific AMs

Generalized Search Trees Overview Goal of thesis work: making the idea industrial-strength: –efficiency: Informix Universal Server implementation (VLDB ’99) –concurrency & recovery (SIGMOD ’97) –analysis framework (demo SIGMOD ’98)

IUS GiST Datatype extensibility: –IUS allows UDTs to be indexed by built-in AMs –require same degree of datatype extensibility for GiST Good performance: –operations on UDTs implemented via UDFs, which are orders of magnitude slower than regular function calls –GiST needs to avoid large number of UDF calls Intra-page storage format: –original GiST design assumed R-tree-like page layout –IUS GiST needs to allow customized formats (e.g., B-tree, hB-tree) to support compression or simplified access

IUS GiST - Summary Extensibility architecture: –GiST core in server –AM extension in DataBlade, implements GiST API Improved GiST API –page-based, not entry-based: few UDF calls –leave page format to AM extension: very flexible

IUS GiST - Performance Comparison of GiST-based and built-in R-trees in IUS: –built-in R-tree: datatype extensible –software engineering: 1,400 lines of C for GiST R-tree vs. 10,000 lines of C for built-in R-tree (GiST core: about 10,000 lines of C) –performance: identical # of I/Os, GiST uses 14 to 40% less CPU time Reasons for GiST performance advantage: –far fewer UDF calls for GiST R-tree: on average 1 per page (built-in: 1 per page entry) –but: higher set-up cost for GiST: needs to set up descriptors for 11 interface UDFs (built-in: only needs 7 UDFs)

Outline Overview and Motivation High-Performance Extensible Indexing with Generalized Search Trees Concurrency and Recovery –Physical CC - concurrent index operations –Logical CC - transaction isolation –Recovery and how it affects concurrency Access Method Performance Analysis

Concurrent Index Ops Problem: concurrent structure modifications (example: B-tree) Lock-coupling strategy with repositioning (ARIES/IM): requires key ordering B-link tree strategy: compensation during traversal

Concurrent Index Ops Navigating linked nodes: –how to detect node split? –when to stop going right? Structural GiST extensions –global sequence counter –sequence number (NSN) for each node

Concurrency Extensions Remember and compare global counter with NSNs NSNs allow split compensation independently of key properties Can be implemented very efficiently in typical WAL environments
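
The NSN mechanism can be sketched as follows. The slides only say that a traversal remembers the global counter and compares it with node NSNs; the sketch additionally assumes that a split stamps the original node with a fresh counter value and hands the node's previous NSN and rightlink to the new sibling. The `Tree`, `Node`, `split`, and `visit` names are hypothetical, and latching is omitted.

```python
class Tree:
    def __init__(self):
        self.counter = 0          # global sequence counter


class Node:
    def __init__(self, items):
        self.items = list(items)
        self.nsn = 0              # node sequence number
        self.rightlink = None     # link to the right sibling


def split(tree, node):
    """Move the upper half of node's items to a new right sibling."""
    mid = len(node.items) // 2
    sibling = Node(node.items[mid:])
    # Assumption: the sibling inherits the old NSN and the old rightlink.
    sibling.nsn = node.nsn
    sibling.rightlink = node.rightlink
    node.items = node.items[:mid]
    tree.counter += 1
    node.nsn = tree.counter       # marks "split after counter value nsn - 1"
    node.rightlink = sibling
    return sibling


def visit(node, memorized):
    """Return every node that holds what `node` held when the pointer was read.

    `memorized` is the global counter value noted when the parent entry was read.
    A node whose NSN is larger was split afterwards, so the traversal keeps going
    right until it reaches a node whose NSN is not larger than the memorized value.
    """
    chain = [node]
    while node.nsn > memorized and node.rightlink is not None:
        node = node.rightlink
        chain.append(node)
    return chain
```

Note that nothing in `visit` looks at key values, which is the point made on the slide: split compensation works independently of key properties.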

A Word on Latch-Coupling Popular technique for B-trees (ARIES/IM): –hold parent latched while going to child –avoid invalid pointers after node deletion Won’t work for GiST: –GiST search might traverse multiple subtrees –either: keep parent latched while traversing all children (low conc.) –or: reposition in parent after traversing each child –but: repositioning requires partitioning

Transaction Isolation SQL Isolation Levels: –most of it: locks on base table items (unrelated to index) –hard part: “serializable” isolation level (i.e., preventing phantoms) Phantom Problem: –SELECT specifies logical range –need to prevent insertions into that range –can’t lock non-existing items

Transaction Isolation: B-Trees B-Trees: Example of Key-Range Locking –partition data space into intervals –each leaf item corresponds to interval –scan from 5 to 9: lock data in range, lock next key –insert 9: check interval 8-10 (check item 10)
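
The slide's example (scan from 5 to 9, then insert 9) can be replayed with a small sketch. The key set `[4, 5, 8, 10, 12]`, the function names, and the use of a plain set in place of a lock manager are assumptions for illustration; lock modes, lock duration, and waiting are left out.

```python
import bisect

END = float("inf")  # sentinel "key" for the interval after the last real key


def scan_locks(keys, lo, hi):
    """Keys locked by a scan of [lo, hi]: every key in range plus the next key."""
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    locked = set(keys[i:j])
    locked.add(keys[j] if j < len(keys) else END)
    return locked


def insert_allowed(keys, locked, new_key):
    """An insert checks the lock on the next existing key above new_key."""
    j = bisect.bisect_right(keys, new_key)
    next_key = keys[j] if j < len(keys) else END
    return next_key not in locked


keys = [4, 5, 8, 10, 12]             # sorted leaf keys (hypothetical data)
locked = scan_locks(keys, 5, 9)      # locks 5 and 8 in range, plus next key 10
blocked = not insert_allowed(keys, locked, 9)   # 9 falls in interval 8-10
```

Inserting 9 is blocked because its next key, 10, was locked by the scan, while inserting 11 (next key 12) would go through. The scheme depends on a total order over keys, which is why the following slide says it does not carry over to GiSTs.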

Transaction Isolation: GiSTs But: this doesn’t work for GiSTs! Example: –2-dim point keys and search rectangle –what are next keys? where do we find them in tree?

Predicate Locking Idea: –readers register shared predicate (search qual.), check exclusive predicates –updaters register exclusive predicate (updated item), check shared predicates Compared to key-range locking –perfectly accurate, maximal concurrency –expensive: evaluate lots of predicates –no gradual expansion: (1) lock entire range, even when cursor stops early (2) nearest-neighbor search locks entire tree
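
A minimal sketch of the registration and checking steps, assuming 2-dim point keys and rectangular search predicates. The `PredicateTable` class and its method names are hypothetical; a real implementation would make a conflicting transaction wait rather than just report the conflict.

```python
def contains(rect, point):
    """rect = (xlo, ylo, xhi, yhi); point = (x, y)."""
    xlo, ylo, xhi, yhi = rect
    x, y = point
    return xlo <= x <= xhi and ylo <= y <= yhi


class PredicateTable:
    def __init__(self):
        self.shared = []     # (txn, search rectangle) registered by readers
        self.exclusive = []  # (txn, updated point) registered by updaters

    def register_read(self, txn, rect):
        """Reader: check exclusive predicates, then register a shared one."""
        conflicts = [t for t, p in self.exclusive
                     if t != txn and contains(rect, p)]
        self.shared.append((txn, rect))
        return conflicts

    def register_update(self, txn, point):
        """Updater: check shared predicates, then register an exclusive one."""
        conflicts = [t for t, r in self.shared
                     if t != txn and contains(r, point)]
        self.exclusive.append((txn, point))
        return conflicts
```

Every call compares against all predicates registered by other transactions, which illustrates the cost noted on the slide: accuracy is perfect, but the number of predicate evaluations grows with the number of active readers and updaters.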

Hybrid Locking Instead: novel hybrid mechanism –2-phase locking of retrieved, inserted and deleted data records –restricted pred locking for phantom avoidance (‘‘covering the holes’’) Restricted predicate locking –search predicate attached to every visited node during traversal –structure mods replicate predicate attachments –no insert/delete predicates –insert only checks target leaf’s predicates
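
The restricted predicate locking part can be sketched as below, assuming 1-dim keys and interval search predicates. All names are hypothetical. The sketch covers only two of the listed rules (search attaches its predicate to every visited node; insert checks only the target leaf's predicates) and deliberately leaves out the 2-phase locking of data records and the replication of predicate attachments during structure modifications.

```python
class Node:
    def __init__(self, sp, children=None, items=None):
        self.sp = sp                          # subtree predicate: (lo, hi)
        self.children = children or []
        self.items = items if items is not None else []
        self.preds = []                       # attached (txn, search predicate)

    @property
    def is_leaf(self):
        return not self.children


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def search(node, txn, qual, out):
    """Attach the search predicate to every visited node, collect matches."""
    node.preds.append((txn, qual))
    if node.is_leaf:
        out.extend(k for k in node.items if qual[0] <= k <= qual[1])
    else:
        for child in node.children:
            if overlaps(child.sp, qual):
                search(child, txn, qual, out)
    return out


def insert_conflicts(leaf, txn, key):
    """An insert checks only the predicates attached to its target leaf."""
    return [t for t, q in leaf.preds if t != txn and q[0] <= key <= q[1]]
```

Because the insert evaluates the actual search predicate rather than a coarser lock, a key that lands on a visited leaf but outside the searched range does not conflict.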

Hybrid Locking Example

Hybrid Locking In comparison to predicate locking: –retains perfect accuracy –fewer predicates to check (only those on leaf) –search and delete don’t check predicates –almost gradual expansion In comparison to key-range locking: –structure mods more expensive (replicating predicate attachments) –high implementation complexity (replicating...): difficult/expensive to implement

Node Locking Simplified hybrid locking: –replace pred. attachments with node locks –block structure mods that would need to replicate node locks Comparison to hybrid locking: –diminished accuracy/concurrency –structure mods cheap –higher implementation complexity

AM Recovery Purpose: –bring AM structure into working condition –restore AM contents to reflect committed xacts Reminder: WAL recovery –every update writes log record with redo and undo info –rollback: apply undo portion in reverse chronological order –restart, phase 1: apply redo portion in chronological order –restart, phase 2: undo all uncommitted xacts
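
The two restart phases from the WAL reminder can be sketched over a toy log. The record layout (dictionaries with `type`, `txn`, `key`, `redo`, `undo`) and the use of a Python dict as the "database" are assumptions for illustration; LSNs, checkpoints, and compensation log records are omitted.

```python
def restart(log, db):
    """Replay a write-ahead log: redo everything, then undo uncommitted xacts."""
    committed = {r["txn"] for r in log if r["type"] == "commit"}

    # Phase 1: apply the redo portion in chronological order.
    for r in log:
        if r["type"] == "update":
            db[r["key"]] = r["redo"]

    # Phase 2: undo uncommitted xacts in reverse chronological order.
    for r in reversed(log):
        if r["type"] == "update" and r["txn"] not in committed:
            if r["undo"] is None:          # the key did not exist before
                db.pop(r["key"], None)
            else:
                db[r["key"]] = r["undo"]
    return db


log = [
    {"type": "update", "txn": "T1", "key": "a", "undo": None, "redo": 1},
    {"type": "update", "txn": "T2", "key": "b", "undo": None, "redo": 2},
    {"type": "update", "txn": "T1", "key": "a", "undo": 1, "redo": 3},
    {"type": "commit", "txn": "T1"},
    {"type": "update", "txn": "T2", "key": "a", "undo": 3, "redo": 9},
]
state = restart(log, {})
```

In this hypothetical log only T1 committed, so after restart `state` keeps T1's final value for `a` and contains no trace of T2.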

GiST Recovery Keys to high concurrency: –separate updates into contents change and structure modification (operation, SMO) –“commit” SMOs separately and immediately –“logical” undo of contents change: re-locate leaf, compensate update, only lock data, not structure Logical undo: –cannot re-traverse tree structure, but can follow rightlinks –in ARIES/IM, need tree-global latch to avoid this situation

Summary & Conclusion GiST concurrency and recovery techniques: –Concurrency: adaptation of link technique –Isolation: hybrid predicate/2-phase locking –Recovery: WAL-based à la ARIES –invisible to AM developer: can write industrial-strength AMs without knowledge of server internals

Outline Overview and Motivation High-Performance Extensible Indexing with Generalized Search Trees Concurrency and Recovery Access Method Performance Analysis –Motivation –Goals of Framework –Analysis Framework with Examples –amdb –Conclusion

Motivation Access method (AM) design and tuning is a black art –Which AM do I use to index my non-traditional data type? –What are contributions of individual design ideas? –How to explain performance differences between AMs? –How to assess AMs on their own? Current practice of performance analysis little help: –aggregate runtime or I/Os numbers provide no explanations –measuring semantic properties (e.g., spatial overlap) is domain-specific, not useful for general analysis framework

Goals of Analysis Framework (1) Measure performance in context of specific workload (data and queries) –recognize workload as part of analysis input –compare workloads by running against same AM –allows tuning of AM for specific workload Performance metrics characterize observed performance (I/Os) –independent of data or query semantics –reflects purpose of performance tuning

Goals of Analysis Framework (2) Metrics express performance loss –loss = difference between observed and optimal performance –shows potential for performance improvement –optimal performance obtained by executing workload in optimal tree –fixed point of reference allows AM to be assessed on its own Loss metrics for each query, node of input tree and structure-shaping aspects of implementation –breakdown allows performance flaws to be traced back, facilitates assessment of individual design aspects

Analysis Framework: Overview (1) Performance-relevant aspects of tree: –Clustering: determines amount of data that query accesses beyond result set –Page Utilization: determines number of pages that data occupy –SPs: excess coverage (covers more data space than is present in subtree) leads to extra traversals subdivide loss along those factors –break-down provides more detailed clues about cause of performance loss

Analysis Framework: Overview (2) Query Metrics: –run each query in actual and optimal tree to obtain performance loss –workload metrics: sum over all queries Node metrics: –node’s contribution to aggregate loss –obtained by computing per-node metrics for each query and aggregating over workload Implementation metrics: –measure how much pickSplit() and penalty() deteriorate tree –obtained by running sample splits and insertions
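
The loss definition (observed minus optimal performance, summed over all queries for the workload metric) amounts to simple arithmetic. The sketch below uses made-up I/O counts and hypothetical function names; it does not attempt the breakdown into clustering, utilization, and excess coverage loss, whose exact formulas are not given on the slides.

```python
def query_loss(observed_ios, optimal_ios):
    """Loss of one query: I/Os in the actual tree minus I/Os in the optimal tree."""
    return observed_ios - optimal_ios


def workload_loss(observed, optimal):
    """Workload metric: sum of per-query losses."""
    return sum(query_loss(o, p) for o, p in zip(observed, optimal))


# Hypothetical I/O counts for three queries.
observed = [12, 30, 7]
optimal = [8, 21, 7]
per_query = [query_loss(o, p) for o, p in zip(observed, optimal)]
total = workload_loss(observed, optimal)
```

Here the third query already runs at its optimum, so all of the potential improvement sits in the first two queries.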

amdb Analysis, visualization and debugging tool that implements analysis framework Accepts AMs written for libgist Available at

amdb Features Tracing of insertions, deletions, and searches Debugging operations: breakpoints on node splits, updates, traversals, and other events Global and structural views of tree allow navigation and provide visual summary of statistics Graphical and textual view of node contents Analysis of workloads and tree structure

Node Visualization Node View Displays bounding predicates (SPs) and items within nodes. Node Contents Provides textual description of node Split Visualization Shows how SPs or data items are divided with PickSplit( ) Highlights BPs on current traversal path.

Leaf-Level Statistics I/O counts and corresponding overheads under various scenarios: –breakdown of losses against optimal clustering –total or per-query breakdown Global View Provides summary of node statistics for entire tree Tree View Also displays node stats

Construction of Optimal Tree Optimal tree: optimal clustering, optimal page utilization, no excess coverage Optimal leaf level: –partition items to minimize number of page accesses –partition size = target page utilization –workload can be modeled as hypergraph, approximate clustering with hypergraph partitioning algorithm Optimal internal levels: –cannot be constructed analogously –but: we assume target page utilization and no excess coverage

Performance Metrics - Sample Query [Figure: pages accessed by a sample query under optimal clustering vs. in the actual tree, with the extra accesses labeled as clustering loss, excess coverage loss, and utilization loss]

Example 1: Unindexability Test Unindexable workload: aggregate leaf accesses in optimal tree take longer than sequential scan for each query –typical ratio of sequential/random I/Os: 14:1 Conclusion: uniformly-distributed multidimensional point sets mostly unindexable
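
One way to read the test is as a cost comparison, sketched below. The interpretation is an assumption: each leaf access in the optimal tree is charged as one random I/O, a random I/O is taken to cost as much as 14 sequential ones (the 14:1 ratio from the slide), and every query is compared against one sequential scan of all leaf pages. The function name and the sample numbers are hypothetical.

```python
def is_unindexable(leaf_accesses_per_query, total_leaf_pages, seq_per_random=14):
    """True if index access in the optimal tree costs more than scanning.

    Costs are expressed in units of one sequential page read.
    """
    index_cost = sum(leaf_accesses_per_query) * seq_per_random
    scan_cost = len(leaf_accesses_per_query) * total_leaf_pages
    return index_cost > scan_cost


# Hypothetical workload over a data set stored on 1000 leaf pages.
selective = is_unindexable([10, 20, 30], 1000)     # few pages touched per query
unselective = is_unindexable([100, 80, 90], 1000)  # about 9% of pages per query
```

Under these assumptions a query only has to touch a bit more than 1/14 of the leaf pages, even in the optimal tree, before a sequential scan becomes the cheaper plan.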

Example 2: Comparison of SP designs Goal: assess effect of SP on nearest-neighbor AMs –R*-trees: bounding rectangles, SS-trees: bounding spheres, SR-trees: combination of BRs and BSs Experiment: –bulk-load 3 trees with identical data (only internal levels differ) –data: 8-dim points, arranged into sphere-shaped clusters –measure excess coverage loss Results: –R*- and SR-tree identical, SS-tree order of magnitude worse at leaf level –SR-tree paper came to contrary conclusion, because it compared insertion-loaded trees

Summary (1) Analysis framework –workload (= data and queries) is part of input –comparison of observed with optimal performance –metrics express performance loss, subdivided into clustering, utilization, excess coverage loss Advantages over current practice: –fixed point of reference allows AM to be assessed on its own –metrics more meaningful than aggregate I/O numbers, facilitate evaluation of individual design ideas –metrics are independent of data or query semantics

Summary (2) amdb –implements framework, along with visualization and debugging features –utilizes hypergraph partitioning to approximate opt. clustering –user experience verifies usefulness of framework and tool

Conclusion GiST-based AM extensibility is effective: –reduces implementation complexity w/o performance/robustness penalty Focus on template data structure leads to general solutions: –generally applicable concurrency & recovery protocols, performance analysis framework

Backup slides

Generalized Search Tree Overview Search: –traverse all subtrees for which Consistent(SP, qual) is true; return all leaf items for which Consistent(item, qual) is true Insert: –find leaf by following single path from root, guided by Penalty() –if leaf is full, call PickSplit() to determine split info, then perform split (recursively) –if SP needs to be updated, call Union(old SP, new item) to determine new SP
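
The template algorithms can be sketched with a toy extension for 1-dim interval keys. `Consistent`, `Union`, and `Penalty` are the extension methods named in the slides; the `Node` layout, the interval key type, and the `choose_leaf` helper are assumptions for illustration. Splitting via PickSplit() and the upward propagation of SP updates are omitted.

```python
class Node:
    """Leaf entries are (key, rid); internal entries are (sp, child)."""

    def __init__(self, leaf, entries):
        self.leaf = leaf
        self.entries = entries


# Extension methods for a hypothetical 1-dim interval key (lo, hi).
def consistent(pred, qual):
    """True if the predicate can match the query: the intervals overlap."""
    return pred[0] <= qual[1] and qual[0] <= pred[1]


def union(sp, key):
    """Smallest interval covering both the old SP and the new key."""
    return (min(sp[0], key[0]), max(sp[1], key[1]))


def penalty(sp, key):
    """Insertion cost: how much the SP has to grow to cover the key."""
    grown = union(sp, key)
    return (grown[1] - grown[0]) - (sp[1] - sp[0])


# Template algorithms: they never look inside keys, only call the extension.
def search(node, qual):
    """Traverse all subtrees whose SP is consistent with the qualification."""
    if node.leaf:
        return [rid for key, rid in node.entries if consistent(key, qual)]
    hits = []
    for sp, child in node.entries:
        if consistent(sp, qual):
            hits.extend(search(child, qual))
    return hits


def choose_leaf(node, key):
    """Follow a single root-to-leaf path, guided by penalty()."""
    while not node.leaf:
        _, node = min(node.entries, key=lambda e: penalty(e[0], key))
    return node
```

Swapping in a different key type (rectangles, for example) would mean replacing only the three extension functions, while `search` and `choose_leaf` stay untouched.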

Generalized Search Tree Overview GiST = simple abstraction of search tree –simple algorithms for basic functionality (search, insert, delete); easy to comprehend and extend –full control over performance-relevant properties of tree: clustering and page util (PickSplit() and Penalty()), SPs –captures essence of indexing: organize data into clusters, build directory structure to locate clusters

Extensibility Architecture: Overview [Figure: three components: GiST Core, AM Extension, Datatype Adapter]

Extensibility Architecture: Overview (2) GiST core: –built into server, written by ORDBMS vendor –defines GiST interface, exports page mgmt interface –implements iterator interface defined by server –calls AM extension during insert(), delete(), getnext() –fully encapsulates locking and logging (to be shown) AM extension: –written by AM developer –defines extension interface –implements GiST interface defined by GiST core –calls GiST’s page mgmt interface functions

Extensibility Architecture: Overview (3) Datatype adapter: –written by (datatype) domain expert –implements extension interface defined by AM extension

GiST Interface Overview Page-based, not entry-based –reduce # of UDF calls from 1 per entry to 1 per page –find_min_pen(): find page entry with smallest insertion penalty –search(): find matching entries on page Leave page format to AM extension: –call AM extension fcts to update or extract data on pages –insert(): add entry to page –update_pred(): update predicate part of entry –get_key(): extract key from page entry –remove(): remove entries from page
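
The effect of a page-based interface on the number of UDF calls can be illustrated with a counting wrapper. Only the name `find_min_pen` comes from the slide; the `CountingUDF` wrapper, the interval predicates, and the scalar keys are assumptions, and the sketch models a UDF call as nothing more than a counted function call.

```python
class CountingUDF:
    """Wraps a function and counts calls that cross the server/extension boundary."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


def interval_penalty(sp, key):
    """Growth of interval sp = (lo, hi) needed to cover a scalar key."""
    lo, hi = sp
    return (max(hi, key) - min(lo, key)) - (hi - lo)


def page_find_min_pen(page, key):
    """Runs inside the extension: loops over the page with plain function calls."""
    return min(range(len(page)), key=lambda i: interval_penalty(page[i], key))


penalty_udf = CountingUDF(interval_penalty)        # entry-based interface
find_min_pen_udf = CountingUDF(page_find_min_pen)  # page-based interface


def choose_entry_based(page, key):
    """Server loops over entries and invokes the penalty UDF once per entry."""
    return min(range(len(page)), key=lambda i: penalty_udf(page[i], key))


def choose_page_based(page, key):
    """Server hands the whole page to the extension in a single UDF call."""
    return find_min_pen_udf(page, key)


page = [(0, 10), (20, 30), (40, 50)]   # hypothetical predicates on one page
entry_choice = choose_entry_based(page, 22)
page_choice = choose_page_based(page, 22)
```

Both variants pick the same entry, but the entry-based one crosses the UDF boundary once per page entry and the page-based one crosses it once per page.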

Conclusion GiST works! –reduces AM implementation complexity considerably by factoring out common operation functionality (no locking and logging) –separation of functionality achieves tight integration of external AM with server: same degree of concurrency and recoverability as built-in AMs –improved GiST interface reduces # of costly UDF calls: more efficient than (naïve) built-in extensible AM –improved GiST interface increases flexibility: customizable page layout

Implementation Details Two mechanisms for synchronization in DBMS Latches: –like mutexes, addressed physically –used for physical mem-resident resources (buffers) –no “latch manager”, no “deadlatch” detection –cheap Locks: –addressed logically (lock name = integer) –used for non-mem-resident resources (pages, tuples) –lock manager (hash table) provides deadlock detection –not as cheap

Implementation Details (cont.) –use latches to synchronize access to index pages (physical concurrency control) –use locks to synchronize access to data (logical concurrency control) –want deadlock-free protocol for AMs: can use latches –want to avoid node latches during I/Os for GiST: high concurrency

Node Deletion –no latch-coupling possible, can’t avoid incorrect pointers –must delay deletion until there are no more traversals with pointer to node How to: –traversal sets S-lock on node when reading pointer –deletion sets X-lock on node –not pretty: replicate S-locks on node split (or prevent them)

Implementing NSNs Global counter can become bottleneck: –read at each visited node –avoid by storing maximum child NSN in parent Global counter needs to be recoverable –end-of-log LSN is a good candidate –if not available, need to write log records

Transaction Isolation SQL Isolation Levels and how to achieve them –read uncommitted: no locks –read committed: instant-duration locks –repeatable read (read set won’t change): xact-dur. locks –serializable (same op, same result, no phantoms): ?? Phantom Problem: –SELECT specifies logical range –need to prevent insertions into that range –can’t lock non-existing items

AM Recovery Recovery affects concurrency: need to lock what you log (until commit) Keys to high concurrency: –separate updates into contents change and structure modification (operation, SMO) –“commit” SMOs separately and immediately –“logical” undo of contents change: re-locate leaf, compensate update, only lock data, not structure

SMOs Short-duration, recoverable, atomic, but outside of transaction “Committed” through nested top action

Logical Undo Example for compensating action: re-traverse tree and delete leaf item to undo insert op Recovery affects concurrency, part II: –at restart, no latches to protect inconsistent structure –in GiST, cannot re-traverse tree, but can follow rightlinks –in B-tree (ARIES/IM), need tree-global latch to avoid this situation

Debugging Operations Tree View Shows structural organization of index. Highlights current traversal path during debugging steps. Stepping Controls Breakpoint Table Defines and enables breakpoints on events Console window Displays search results, debugging output, and other status info.

Subtree Predicate Statistics Views highlight nodes traversed by query Query breakdown in terms of “empty” and required I/O. Excess Coverage Overheads due to loose/wide BPs

Conclusion GiST-based AM extensibility is effective: –reduces implementation complexity w/o performance/robustness penalty Focus on template data structure leads to general solutions: –generally applicable concurrency & recovery protocols, performance analysis framework Open problems: –automatic reorganization –automatic SP refinement –automatic AM design?

Motivation^2 Before we launch into the details... –why worry about databases? –what are ORDBMSs again? –access methods...?

Motivation^2 (2) Why worry about databases –declarative access –transaction - simplicity for application –concurrency - 10,000s of simultaneous users –recovery - restore data after failure –flexibility - separate data from application –scalability - automatic parallelization

Motivation^2 (3) What are ORDBMSs again? –“object-relational DBMS” –combines object-oriented concepts with relational DBMS functionality –in practice: extensibility of DBMS with new data types –now supported by most big vendors (IBM, Informix, Oracle), standardized in SQL3

Motivation^2 (4) Example: Geospatial data in Informix –implementation of user-defined type “GeoObject” provided by dynamic library –register type and functions with system: create opaque type GeoObject (...); create function Contains (GeoObject, GeoObject) returns boolean external name “ ”; –store inside DBMS: create table customers (..., loc GeoObject,...); insert into customers (...) values (..., “ ”,...); –query data: select * from customers where Contains(“ ”, loc);

Motivation^2 (5) Access methods...? –aka indices, e.g., B-trees –more generally: persistent data structure for associative access –B-tree reminder: height-balanced tree structure data at leaves in sorted order directory structure in internal nodes search proceeds from root to leaf predictable performance