SIGMOD 99
Efficient Concurrency Control in Multidimensional Access Methods
  Kaushik Chakrabarti, Sharad Mehrotra
  University of Illinois at Urbana-Champaign, University of California at Irvine
  Presented at ACM SIGMOD Conference, June 1, 1999
Outline of talk
  Introduction
  Background
  Phantom protection in Generalized Search Trees
    – Define granules
    – Describe lock protocols
  Experiments
  Conclusion
Introduction
  An increasing number of applications deal with multidimensional data
  Examples: spatial (CAD, GIS), spatio-temporal (moving objects, weather)
  A DBMS should allow applications to:
    (1) define their own data types and operations
    (2) define multidimensional access methods (AMs) for those data types for efficient query processing
  Object-relational (OR) technology solves (1)
  Generalized Search Trees (GiSTs) address (2)
Introduction
  For successful integration, we need to support concurrent access via GiSTs
  Concurrency control problems:
    (1) Preserve consistency of the data structure
    (2) Prevent phantom anomalies
  (1) has been addressed by Kornacker, Mohan and Hellerstein, SIGMOD 97
  This paper addresses the problem of phantom protection in GiSTs
Phantom
  Definition:
    – T1 reads a set of items satisfying some search condition
    – T2 creates data items that satisfy T1’s search condition and commits
    – T1 repeats its scan with the same search condition, gets a different set of items
  Serializability ⇒ no phantoms
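The three steps of the definition can be replayed in a few lines. This is only a toy sketch: a Python list stands in for the table, a range test stands in for the search condition, and no locking or real DBMS is involved. The names `table` and `scan` are hypothetical.

```python
# Toy illustration of a phantom; no real DBMS or locking is involved.
table = [3, 8, 15]                      # committed data items


def scan(lo, hi):
    """Return the items satisfying the search condition lo <= x <= hi."""
    return sorted(x for x in table if lo <= x <= hi)


first_read = scan(5, 20)    # T1 reads the items satisfying its condition
table.append(12)            # T2 creates a matching item and commits
second_read = scan(5, 20)   # T1 repeats the scan with the same condition

# Items that appear only in the repeated scan are the phantoms.
phantoms = sorted(set(second_read) - set(first_read))
print(first_read, second_read, phantoms)
```

A phantom protection protocol has to make T2's insert wait (or abort) until T1 commits, even though the item 12 did not exist when T1 first scanned and so could not have been locked as an object.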
Example
Solution
  Predicate locks: costly
  Granular locks: efficient
Key Range Locking
  ARIES/KVL (Mohan, 1990)
Phantoms in Spatial/Spatio-temporal Databases
  Compute the average rainfall over all locations in a 2-d region, where the locations are indexed using a GiST
  Get all objects in a given region from a moving-objects database, where the objects are indexed using a GiST
Solutions
  Adapting key range locking (KRL): too costly
  Predicate locking based strategy by Kornacker, Mohan and Hellerstein, SIGMOD 97
  Our granular locking based approach for phantom protection in R-trees, ICDE 98
    – Does not work well when applied to GiSTs (details in paper)
Granular Locking in GiST
  Solution involves
    – Define the granules
    – Define the lock protocol for the operations
  Challenges
    – “nice” granules
    – handling overlap among granules
    – handling the “loss of lock” problem
    – high concurrency and low lock overhead
GiST
  Keys can be arbitrary predicates
  An AM can be implemented by specifying a set of extension methods that dictate the tree operations
Granules in GiST
  Leaf granules: one per leaf node
  Non-leaf granules: one per non-leaf node
  Lock name:
  Lock coverage: defined by the Granule Predicate (GP)
    – GP(N) = BP(N) if N is the root
    – GP(N) = BP(N) ∧ GP(P) otherwise, where P = parent(N)
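The recursive definition of the granule predicate can be sketched as follows. The sketch assumes that bounding predicates are axis-aligned rectangles (as in an R-tree style GiST), so that the conjunction of two predicates is a rectangle intersection. `Node`, `intersect` and `granule_predicate` are hypothetical names, not part of any GiST library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    bp: Rect                          # bounding predicate BP(N), here a rectangle
    parent: Optional["Node"] = None   # None for the root


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Conjunction of two rectangle predicates; None if unsatisfiable."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def granule_predicate(n: Node) -> Optional[Rect]:
    """GP(N) = BP(N) at the root, else BP(N) AND GP(parent(N))."""
    if n.parent is None:
        return n.bp
    gp_parent = granule_predicate(n.parent)
    if gp_parent is None:
        return None
    return intersect(n.bp, gp_parent)


root = Node(bp=(0, 0, 10, 10))
# The child's BP deliberately sticks out of the root's BP, purely to show
# that the conjunction trims the granule to the region reachable via the path.
child = Node(bp=(5, 5, 12, 12), parent=root)
leaf = Node(bp=(4, 6, 8, 9), parent=child)
print(granule_predicate(child))
print(granule_predicate(leaf))
```

Under these assumptions the granule of a node is the part of its bounding predicate that every ancestor on the path also admits, which is the region a lock on that node is taken to cover.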
Locks
Overlap between granules
  Correctness: if p ∧ p’ is satisfiable, then lset(p) ∩ lset(p’) ≠ NULL
  Problem does not arise in KRL
  Policies
    – Overlap-for-Search & Cover-for-Insert (OSCI)
    – Cover-for-Search & Overlap-for-Insert (CSOI)
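One way to read the OSCI policy is sketched below. It assumes, going only by the policy's name, that a search locks every granule overlapping its query while an insert of a point locks a single granule that covers the point. Rectangular granules and the names `granules`, `lset_search` and `lset_insert` are hypothetical, and the slides do not spell out the policy in this detail.

```python
# Granules are rectangles (xmin, ymin, xmax, ymax); g1 and g2 overlap.
granules = {"g1": (0, 0, 6, 6), "g2": (4, 4, 10, 10)}


def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def covers(g, point):
    """True if rectangle g contains the point."""
    x, y = point
    return g[0] <= x <= g[2] and g[1] <= y <= g[3]


def lset_search(query):
    """Overlap-for-Search: lock every granule that overlaps the query."""
    return {name for name, g in granules.items() if overlaps(g, query)}


def lset_insert(point):
    """Cover-for-Insert (assumed reading): lock one granule covering the point."""
    for name in sorted(granules):
        if covers(granules[name], point):
            return {name}
    return set()


query = (5, 5, 7, 7)
point = (5.5, 5.5)      # satisfies the query and lies in both granules
shared = lset_search(query) & lset_insert(point)
print(shared)           # non-empty, so the conflict is detected by the locks
```

Because the search locks all overlapping granules, whichever covering granule the inserter picks is guaranteed to be in the searcher's lock set, so the two lock sets intersect as the correctness condition requires.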
Loss of lock coverage
Search Protocol
  Get a commit duration S lock on the granule corresponding to each index node visited
  Correctness:
    – GP(T) ∧ Q is satisfiable ⇒ ∧_i Consistent(BP(P_i), Q), where each P_i is an ancestor of T
  Note
    – No object locks
    – No extra cost except that of acquiring the lock (no extra checks)
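The search protocol can be sketched as an ordinary tree traversal that records an S lock request for every node it visits. The sketch assumes rectangle predicates with overlap standing in for the GiST Consistent method, and a plain list standing in for the lock manager. `Node`, `consistent` and `search` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def consistent(bp: Rect, q: Rect) -> bool:
    """Stand-in for the GiST Consistent method: rectangle overlap."""
    return bp[0] <= q[2] and q[0] <= bp[2] and bp[1] <= q[3] and q[1] <= bp[3]


@dataclass
class Node:
    name: str
    bp: Rect
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)


def search(node, q, locks, results):
    # Commit duration S lock on the granule of every index node visited.
    locks.append(("S", "commit", node.name))
    if not node.children:
        # Leaf: collect matching objects; note that no object locks are taken.
        results.extend(p for p in node.points
                       if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3])
        return
    for child in node.children:
        if consistent(child.bp, q):
            search(child, q, locks, results)


leaf_a = Node("A", (0, 0, 5, 5), points=[(1, 1), (4, 4)])
leaf_b = Node("B", (6, 6, 10, 10), points=[(7, 7)])
root = Node("root", (0, 0, 10, 10), children=[leaf_a, leaf_b])

locks, results = [], []
search(root, (3, 3, 5, 5), locks, results)
print(locks)
print(results)
```

The lock requests fall out of the traversal itself: the only nodes locked are those the search would visit anyway, which matches the note that the protocol needs no extra checks.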
Insert Protocol
  Correctness:
    – full coverage
    – prevent phantoms due to loss of lock coverage
Insert Protocol
  Case 1: No growth, no split
    – commit duration IX lock on g (target granule)
    – commit duration X lock on O
  Case 2: Growth, no split
    – 2 locks as before
    – short duration IX lock on lowest unchanged node (LU-node)
Example
Insert Protocol
  Case 3: No growth, split
    – instant duration SIX on g
    – commit duration IX on whichever contains O after split; X on O
    – instant duration SIX on each ancestor that splits
  Case 4: Growth, split
    – lock requirements of Cases 2 and 3
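The four insert cases can be summarized as a function from the two conditions (growth, split) to a list of lock requests. This is a sketch of one reading of the slides: for Case 4 it assumes that the Case 3 locks replace the commit duration IX on g from Case 2, and that the short duration IX on the LU-node is added on top. The function name and the string labels for lock targets are hypothetical.

```python
def insert_lock_requests(growth: bool, split: bool):
    """Return (mode, duration, target) triples for inserting object O."""
    requests = []
    if split:
        # Cases 3 and 4: the target granule g splits.
        requests.append(("SIX", "instant", "g"))
        requests.append(("IX", "commit", "granule containing O after split"))
        requests.append(("SIX", "instant", "each ancestor that splits"))
    else:
        # Cases 1 and 2: O goes into the target granule g.
        requests.append(("IX", "commit", "g"))
    # Every case locks the inserted object itself.
    requests.append(("X", "commit", "O"))
    if growth:
        # Cases 2 and 4: a bounding predicate grows (assumed reading of Case 4).
        requests.append(("IX", "short", "LU-node"))
    return requests


for growth in (False, True):
    for split in (False, True):
        print(growth, split, insert_lock_requests(growth, split))
```

Only the IX on the granule that finally holds O and the X on O itself are kept until commit; the SIX and LU-node locks are instant or short duration, which keeps the long-lived lock footprint of an insert small.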
Deletion Protocol
  Problem: g does not cover O after deletion ⇒ commit duration lock on LU-node
  We do:
    – logical deletion (IX on target granule, X on object)
    – defer physical deletion till transaction commits
Protocol for Other Operations
  ReadSingle: S lock on object
  UpdateSingle:
    – if indexed attributes are not changed, IX on g, X on O
    – else, deletion followed by insertion
  UpdateScan: same as Search for the region, same as UpdateSingle for every object updated
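The protocols above use the lock modes S, IX, SIX and X. The sketch below assumes the standard multigranularity compatibility rules for these modes, which the slides do not restate, and shows why a searcher's S lock on a granule blocks an inserter's IX lock on the same granule. `COMPATIBLE_PAIRS` and `compatible` are hypothetical names.

```python
# Mode pairs that two different transactions may hold on the same resource.
# Assumed: standard multigranularity rules restricted to the four modes used
# in the protocols (S-S and IX-IX are the only compatible pairs among them).
COMPATIBLE_PAIRS = {frozenset({"S"}), frozenset({"IX"})}
MODES = {"S", "IX", "SIX", "X"}


def compatible(held: str, requested: str) -> bool:
    """True if `requested` can be granted while another txn holds `held`."""
    if held not in MODES or requested not in MODES:
        raise ValueError("unknown lock mode")
    return frozenset({held, requested}) in COMPATIBLE_PAIRS


print(compatible("S", "S"))     # two searchers on the same granule
print(compatible("S", "IX"))    # searcher vs. inserter on the same granule
print(compatible("IX", "IX"))   # two inserters on the same granule
print(compatible("SIX", "IX"))  # splitter vs. inserter on the same granule
```

Under these rules, concurrent inserters into the same granule do not block each other at the granule level (they conflict only if they take X locks on the same object), while any insert into a granule that a searcher has S-locked waits until the searcher commits.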
Empirical Evaluation
  Data sets:
    – 2-d spatial data: 62,556 2-d points from the Sequoia 2000 benchmark
    – 3-d feature data: first 3 Fourier coefficients from 480,471 Fourier vectors
Measurements & Parameters
  Performance: throughput (tps)
  Concurrency: conflict ratio
  Overhead: # locks, # predicate checks
  Parameters: MPL, transaction size, write probability, query size, external think time (fixed at 3 sec), restart delay (fixed at 3 sec)
Implementation
Performance
  [Figure panels: 2-d data; 3-d data]
Performance/Concurrency
  [Figure panels: Under various loads; Conflict ratio]
Overhead
  [Figure panels: Search; Insert]
Conclusions
  Granular locking (GL) is significantly more efficient than predicate locking (PL)
  We expect the performance gap to increase with a better implementation (mainly LM)
  The dimensionality curse is a problem for GL
  GL can be integrated with a consistency protocol for a complete solution to concurrency control in multidimensional AMs