1
GiST, Concluded, and Query Execution
Zachary G. Ives, University of Pennsylvania
CIS 650 – Implementing Data Management Systems
September 16, 2008
Content on hashing and sorting courtesy of Ramakrishnan & Gehrke
2
Generalizing B+ Trees and R Trees: GiST
Question: can we create an index toolkit that can be customized for any kind of index we'd like?
- GiST doesn't quite get there, but it is a nice framework for understanding the concepts
Observation: several aspects of the B+ Tree and R Tree can be drawn out:
- Height balanced – requires reorganization
- Data on the leaves; intermediate nodes help focus the search on a child
- High fan-out
GiST is actually used in PostgreSQL
3
Similarities and Differences
Recall the differences we saw between B+ Trees and R Trees:
B+ Tree: one-dimensional
- full ordering among the data
R Tree: many-dimensional
- no complete ordering among the data, which means an intermediate node's children may fit in more than one place
- also, an intermediate node's bounding box may contain lots of space that is not occupied by child nodes (relates to the "curse of dimensionality")
- needs to rely on heuristics about where to insert a node
- may need to search many possible paths down the tree
4
Search: Do We Explore a Path?
Each subtree node can be tested with Consistent(node, predicate):
- Returns FALSE only if the node and its children cannot satisfy the predicate – i.e., it may return false positives
What does this correspond to in a B+ Tree? An R Tree?
What additional work needs to be done if Consistent() returns true?
5
Insertion – Simple Case
Union(entrySet) returns predicate
- Returns a predicate (e.g., a bounding box) that holds over a set (e.g., of children) – essentially a disjunction
Compress(entry) returns entry'
- Simplifies the entry's predicate, potentially increasing the chance of false positives
- Examples: simplify a polygon; truncate a string
Decompress(entry) returns entry'
- The inverse of the above – may introduce false positives
Penalty(entry1, entry2)
- Gives the cost of inserting entry2 into entry1
- Example: how much does it expand the bounding rectangle?
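As an illustration, here is a minimal sketch of Union and Penalty for 2-D bounding boxes, the running example above. The tuple-based box representation and the exact area-growth penalty are assumptions for the sketch, not GiST's actual API:

```python
def union(boxes):
    """Smallest bounding box covering all boxes; each box is (x1, y1, x2, y2)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def area(b):
    """Area of a bounding box."""
    return (b[2] - b[0]) * (b[3] - b[1])

def penalty(entry, new):
    """Cost of inserting `new` under `entry`: growth in bounding-box area."""
    return area(union([entry, new])) - area(entry)
```

With boxes (0,0,1,1) and (1,1,2,2), the union is (0,0,2,2), so the penalty of inserting the second under the first is 4 − 1 = 3.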
6
Insertion – Splitting
PickSplit(entrySet) returns two entry sets
- Splits the set of children in an intermediate node into two sets
- Typically split according to a "badness" metric
7
Basic Routines I
How do we search for a node set satisfying a predicate?
Search(node, predicate)
- For every non-leaf node: if Consistent, call recursively on the children; return the union of the children's result sets
- For a leaf: if Consistent, return a singleton set, else the empty set
Ordered domains: FindMin(root, predicate), Next(root, predicate, current)
- Used to do a linear scan of the data at the leaves, if ordered
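The recursive Search routine above can be sketched as follows; the dictionary-based node representation is an assumption for illustration:

```python
def search(node, consistent):
    """GiST-style search. `consistent` may return false positives but never
    false negatives, so pruning an inconsistent subtree is always safe."""
    if node["leaf"]:
        return [e for e in node["entries"] if consistent(e)]
    results = []
    for child in node["entries"]:
        if consistent(child["key"]):                 # may be a false positive
            results.extend(search(child["node"], consistent))
    return results
```

For example, with interval keys and a predicate "overlaps [2, 4]", only the subtree whose key overlaps that range is descended.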
8
Basic Routines II
How do we insert a node?
Insert(node, new, level)
- L = ChooseSubtree(node, new, level)
- If there is room in L, insert new as a child; else invoke Split(node, L, new)
- AdjustKeys(node, L)
ChooseSubtree(node, new, level) returns a node at the given level
- Recursively descend the tree, minimizing Penalty
- If at the desired level, return node
- Otherwise, among the child entries in node, find the one with minimal Penalty and return ChooseSubtree on that child
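The penalty-driven descent in ChooseSubtree can be sketched as below; here the 1-D interval penalty and the node layout are assumptions (the slide leaves the key type abstract), and the level argument is omitted by descending all the way to a leaf:

```python
def interval_penalty(key, new):
    """Growth of a 1-D interval key (lo, hi) if it must absorb `new`."""
    lo, hi = min(key[0], new[0]), max(key[1], new[1])
    return (hi - lo) - (key[1] - key[0])

def choose_subtree(node, new_key, penalty):
    """Descend to the leaf whose key grows least when absorbing new_key."""
    while not node["leaf"]:
        best = min(node["entries"], key=lambda e: penalty(e["key"], new_key))
        node = best["node"]
    return node
```

Inserting an entry keyed (22, 23) into a tree with leaves keyed (0, 10) and (20, 30) chooses the second leaf, since its interval need not grow at all.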
9
Helper Functions
Split(root, node, new)
- Invoke PickSplit on the union of node's children and new
- Put one partition into node and create a new node' with the remainder
- Insert all of node''s children into Parent(node) – if there is insufficient space, invoke PickSplit on this parent
- Modify the predicate describing node
AdjustKeys(root, node)
- If node = root, or the predicate referring to node is correct, return
- Otherwise, modify the predicate for node to contain the union of all of node's keys
- Recursively call AdjustKeys(root, Parent(node))
10
Query Execution
Takes a "physical plan" and uses it to produce query answers
Some considerations in building execution engines:
- Efficiency – minimize copying and comparisons
- Scheduling – make standard code paths fast
- Data layout – how to optimize cache behavior, buffer management, distributed execution, etc.
11
Execution System Architectures
Central vs. distributed vs. parallel vs. mediator
Data partitioning – vertical vs. horizontal
- Monet model – binary relations
- Distributed – data placement
One operation at a time – INGRES
Pipelined
- Iterator-driven
- Dataflow-driven
Hybrid approaches
12
Execution Strategy Issues
Granularity & parallelism:
- Pipelining vs. blocking
- Materialization
[Figure: an example physical plan over Scan PressRel, Scan Clients, and Scan EastCoast, combining Join PressRel.Symbol = Clients.Symbol, Select Client = "Atkins", Project CoSymbol, and Join PressRel.Symbol = EastCoast.CoSymbol.]
13
Iterator-Based Query Execution
Execution begins at the root: open, next, close
- Calls propagate to the children
- An operator may call next on a child multiple times
"Synchronous pipelining":
- Minimizes copies
- Efficient scheduling & resource usage
Can you think of alternatives and their benefits?
[Figure: the same example plan as on the previous slide.]
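The open/next/close protocol can be sketched with two toy operators; the class names and the convention of returning None at end-of-stream are assumptions for illustration, not any particular engine's API:

```python
class Scan:
    """Leaf operator: iterates over an in-memory table."""
    def __init__(self, rows): self.rows = rows
    def open(self): self.i = 0
    def next(self):                          # returns one tuple, or None at end
        if self.i >= len(self.rows): return None
        self.i += 1
        return self.rows[self.i - 1]
    def close(self): pass

class Select:
    """Filter operator: pulls from its child until the predicate matches."""
    def __init__(self, child, pred): self.child, self.pred = child, pred
    def open(self): self.child.open()
    def next(self):
        while (t := self.child.next()) is not None:
            if self.pred(t): return t
        return None
    def close(self): self.child.close()

def run(plan):
    """Drive the root iterator to completion, collecting its output."""
    plan.open()
    out = []
    while (t := plan.next()) is not None:
        out.append(t)
    plan.close()
    return out
```

Calling next on the root pulls tuples through the whole pipeline one at a time, which is the "synchronous pipelining" the slide refers to.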
14
The Simplest Method
Iteration over tables:
- Sequential scan
- Nested loops join
What's the cost? What tricks might we use to speed it up?
Optimizations:
- Double-buffering: overlap I/O and computation
- Prefetch a page into a shadow block while the CPU processes a different block; this requires a second buffer to prefetch into, and we switch to it when the CPU is finished with the alternate buffer
- Alternate the direction of reads in a file scan
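A tuple-at-a-time nested loops join, the simplest join method above, is just two nested iterations; this sketch ignores paging and buffering entirely:

```python
def nested_loops_join(outer, inner, pred):
    """Naive nested loops join: compare every outer tuple with every inner
    tuple, O(|R| * |S|) comparisons; the inner input is rescanned per outer
    tuple, which is what double-buffering and blocking try to cheapen."""
    return [(r, s) for r in outer for s in inner if pred(r, s)]
```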
15
Speeding Operations over Data
Three general data organization techniques:
- Indexing – associative lookup & synopses
- Sorting
- Hashing
16
Indices
GiST and B+ Trees
Alternatives for what to store at the leaves: the data record itself; <key, RID>; <key, list of RIDs>
Clustered vs. unclustered
Bitmapped index – a bit position for each value in the domain
- Requires a domain with discrete values (not necessarily ordinal): Booleans, enumerations, range-bounded integers
- Best for low-update data
- Efficient for AND and OR expressions between different predicates
17
Usefulness of Indices
Where are these structures most useful?
- Sargable predicates
- Covering indices
In many cases, they only help with part of the story
- They filter part of the answer set, but we still need further computation, e.g., an AND or OR of two predicates
General rule of thumb: an unclustered index is only useful if selectivity is under 10–20%
What is the impact of flash memory (SSDs)?
18
Sorting – External Binary Sort
Divide and conquer: sort into subfiles and merge
Each pass: we read & write every page
If there are N pages in the file, we need ⌈log2(N)⌉ + 1 passes to sort the data, yielding a cost of 2N(⌈log2(N)⌉ + 1)
[Figure: pass 0 reads the input file and produces sorted 1-page runs; passes 1–3 merge them pairwise into 2-page, 4-page, and finally 8-page sorted runs.]
19
General External Merge Sort
To sort a file with N pages using B buffer pages:
- Pass 0: use all B buffer pages; produce ⌈N/B⌉ sorted runs of B pages each
- Passes 1, 2, etc.: merge B−1 runs at a time
[Figure: B main-memory buffers – inputs 1 through B−1 merge into one output buffer, streaming between disk and memory.]
How can we utilize more than 3 buffer pages?
20
Cost of External Merge Sort
Number of passes: 1 + ⌈log_{B−1} ⌈N/B⌉⌉
Cost = 2N × (# of passes)
With 5 buffer pages, to sort a 108-page file:
- Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (the last run is only 3 pages)
- Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each (the final run uses only 8 pages)
- Pass 2: ⌈6/4⌉ = 2 sorted runs, of 80 pages and 28 pages
- Pass 3: sorted file of 108 pages
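The pass count and I/O cost above can be computed directly; this small calculator just mirrors the two formulas on the slide:

```python
import math

def merge_sort_cost(n_pages, buffers):
    """Passes and total page I/Os (read + write each pass) for external
    merge sort with `buffers` buffer pages."""
    runs = math.ceil(n_pages / buffers)        # pass 0: B-page sorted runs
    passes = 1
    while runs > 1:                            # each later pass merges B-1 runs
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes, 2 * n_pages * passes

# The slide's example: 108 pages with 5 buffers takes 4 passes, 864 page I/Os.
```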
21
Applicability of Sort Techniques
- Join
- Intersection
- Aggregation
- Duplicate removal as an instance of aggregation
- XML nesting as an instance of aggregation
22
Merge Join
Requires data sorted by the join attributes
Merge and join the sorted files, reading sequentially a block at a time
- Maintain two file pointers; while the tuple at R < the tuple at S, advance R (and vice versa)
- While tuples match, output all possible pairings
- Maintain a "last in sequence" pointer to rewind within a group of matches
Preserves the sorted order of the "outer" relation
Cost: b(R) + b(S), plus sort costs if necessary, plus buffering
In practice, approximately linear: about 3(b(R) + b(S))
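The two-pointer merge can be sketched in memory as below; instead of a "last in sequence" rewind pointer, this sketch gathers each group of equal keys on both sides and emits the cross product, which is the same effect:

```python
def merge_join(r, s, key=lambda t: t[0]):
    """Join two inputs already sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if key(r[i]) < key(s[j]):
            i += 1                             # advance the smaller side
        elif key(r[i]) > key(s[j]):
            j += 1
        else:
            k = key(r[i])
            i2, j2 = i, j                      # find the full matching groups
            while i2 < len(r) and key(r[i2]) == k: i2 += 1
            while j2 < len(s) and key(s[j2]) == k: j2 += 1
            out.extend((a, b) for a in r[i:i2] for b in s[j:j2])
            i, j = i2, j2
    return out
```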
23
Hashing
Several types of hashing:
- Static hashing
- Extendible hashing
- Consistent hashing (used in P2P; we'll see it later)
24
Static Hashing
A fixed number of buckets (and primary pages); overflow pages are chained on when necessary
h(k) mod N = the bucket to which the data entry with key k belongs
Downside: long overflow chains
[Figure: h(key) mod N maps keys to N primary bucket pages (0 … N−1), each with a chain of overflow pages.]
25
Extendible Hashing
If a bucket becomes full, split it in half
- Use a directory of pointers to buckets; double the directory, splitting just the bucket that overflowed
- The directory is much smaller than the file, so doubling it is much cheaper
- Only one page of data entries is split
The trick lies in how the hash function is adjusted!
26
Example
The directory is an array of size 4. To find r's bucket, take the last "global depth" bits of h(r); we denote r by h(r)
- If h(r) = 5 = binary 101, it is in the bucket pointed to by 01
Insert: if the bucket is full, split it (allocate a new page and redistribute the entries)
- If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing the global depth with the local depth of the split bucket.)
[Figure: a directory of size 4 (global depth 2) pointing to buckets A–D, each with local depth 2, and their data pages.]
27
Insert h(r) = 20 (Causes Doubling)
[Figure: before the insert, the directory (global depth 2) points to buckets A–D, each at local depth 2. Inserting 20* overflows bucket A, so the directory doubles to global depth 3; bucket A splits into A and its "split image" A2, both at local depth 3, while buckets B, C, and D remain at local depth 2.]
28
Points to Note
20 = binary 10100
- The last 2 bits (00) tell us r belongs in A or A2; the last 3 bits are needed to tell which
- Global depth of the directory: the maximum number of bits needed to tell which bucket an entry belongs to
- Local depth of a bucket: the number of bits used to determine whether an entry belongs to this bucket
When does a bucket split cause directory doubling?
- Before the insert, the local depth of the bucket equals the global depth; the insert causes the local depth to exceed the global depth, so the directory is doubled by copying it over and fixing the pointer to the split-image page
(Using the least-significant bits enables efficient doubling by copying the directory!)
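The bit arithmetic above is just masking off the low-order bits of the hash; a one-line sketch:

```python
def bucket_index(h, depth):
    """Directory slot for hash value h: its last `depth` bits."""
    return h & ((1 << depth) - 1)

# 20 = 0b10100: the last 2 bits are 00 (bucket A or A2); the last 3 bits
# are 100, which is what distinguishes A2 from A after the split.
# 5 = 0b101: the last 2 bits are 01, matching the example on the prior slide.
```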
29
Comments on Extendible Hashing
If the directory fits in memory, an equality search is answered with one disk access; else two
- The directory grows in spurts and, if the distribution of hash values is skewed, it can grow large
- Multiple entries with the same hash value cause problems!
Delete: if removing a data entry makes a bucket empty, the bucket can be merged with its "split image"
- If every directory element points to the same bucket as its split image, we can halve the directory
30
Relevance of Hashing Techniques
Hash indices use extendible hashing
Uses of static hashing:
- Aggregation
- Intersection
- Joins
31
Hash Join
Read the entire inner relation into a hash table (with the join attributes as the key)
For each tuple from the outer relation, look it up in the hash table & join
Not fully pipelined
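The build-then-probe structure can be sketched as follows; a Python dict stands in for the hash table, which is an assumption of the sketch:

```python
def hash_join(inner, outer, key=lambda t: t[0]):
    """Classic hash join: build on the inner input, probe with the outer.
    The build phase blocks (it must finish before any output), which is why
    the join is not fully pipelined."""
    table = {}
    for t in inner:                               # build phase
        table.setdefault(key(t), []).append(t)
    out = []
    for t in outer:                               # probe phase
        for m in table.get(key(t), []):
            out.append((t, m))
    return out
```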
32
Running out of Memory
Prevention: first partition the data by value into memory-sized groups
- Partition both relations the same way and write the partitions to files
- Recursively join the corresponding partitions
Resolution: similar, but done when the hash table fills
- Split the hash table into files along bucket boundaries
- Partition the remaining data the same way
- Recursively join the partitions with a different hash function!
Hybrid hash join: flush "lazily," a few buckets at a time
Cost: ≤ 3 × (b(R) + b(S))
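The "prevention" strategy above (often called Grace hash join) can be sketched in memory; here each partition is assumed to fit in memory, so no recursion is shown, and Python's built-in hash stands in for the partitioning hash function:

```python
def partitioned_hash_join(r, s, n_parts, key=lambda t: t[0]):
    """Partition both inputs by hash of the join key, then join each pair of
    corresponding partitions independently with an in-memory hash table."""
    def partition(rel):
        parts = [[] for _ in range(n_parts)]
        for t in rel:
            parts[hash(key(t)) % n_parts].append(t)
        return parts
    out = []
    for pr, ps in zip(partition(r), partition(s)):
        table = {}                                # build on the r-partition
        for t in pr:
            table.setdefault(key(t), []).append(t)
        for t in ps:                              # probe with the s-partition
            for m in table.get(key(t), []):
                out.append((m, t))
    return out
```

Matching tuples always hash to the same partition number, so joining partition pairs independently loses no results.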
33
The Duality of Hash and Sort
Different means of partitioning and merging data when comparisons are necessary:
- Sorting breaks on a physical rule (memory size) and merges on a logical step, the merge
- Hashing breaks on a logical rule (hash value) and combines using a physical step (concatenation)
When larger-than-memory sorting is necessary and multiple operators use the same key, we can make all the operators work on the same in-memory portion of the data at the same time
- Can we do this with hashing? Hash teams (Graefe)
35
What If I Want to Distribute Query Processing?
- Where do I put the data in the first place (or do I have a choice)?
- How do we get data from point A to point B?
- What about delays?
- What about "binding pattern" restrictions? This looks kind of like an index join with a sargable predicate
36
Pipelined Hash Join: Useful for Joining Web Sources
Uses two hash tables, one per input
- As a tuple comes in, add it to the appropriate side & join it with the opposite table
Fully pipelined; adaptive to source data rates
Can handle overflow in the same way as hash join
Needs more memory
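This symmetric scheme can be sketched over a single interleaved stream of tagged tuples; the ("L"/"R", tuple) stream encoding is an assumption of the sketch:

```python
def symmetric_hash_join(stream, key=lambda t: t[0]):
    """Pipelined (symmetric) hash join: each arriving tuple is inserted into
    its own side's hash table, then immediately probed against the other
    side's table, so results stream out as soon as both halves have arrived."""
    tables = {"L": {}, "R": {}}
    out = []
    for side, t in stream:
        tables[side].setdefault(key(t), []).append(t)      # insert
        other = "R" if side == "L" else "L"
        for m in tables[other].get(key(t), []):            # probe
            out.append((t, m) if side == "L" else (m, t))
    return out
```

Because neither input blocks the other, the join adapts to whichever source delivers tuples first, at the cost of holding both hash tables in memory.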
37
The Semi-Join/Dependent Join
Take attributes from the left input and feed them to the right source as an input/filter
Important in data integration
Simple method: for each tuple from the left, send its join attributes to the right source, get data back, and join
More complex options:
- Hash "cache" of attributes & mappings – don't resend an attribute value already seen
- Bloom joins (use bit vectors to reduce traffic)
[Figure: Join A.x = B.y, with attribute x fed from input A into source B.]
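The Bloom-join idea above can be sketched as follows; the filter size m, the number of probes k, and the use of Python's built-in hash are all assumptions of the sketch:

```python
def bloom_bits(keys, m, k=3):
    """Build an m-bit Bloom filter over the join keys, setting k bits per key."""
    bits = 0
    for key in keys:
        for i in range(k):
            bits |= 1 << (hash((key, i)) % m)
    return bits

def bloom_filter_pass(rel, bits, m, key=lambda t: t[0], k=3):
    """Keep only tuples whose key might be in the filter. The filter admits
    false positives but never false negatives, so no join result is lost;
    only the bit vector, not the key set, crosses the network."""
    def maybe(x):
        return all(bits >> (hash((x, i)) % m) & 1 for i in range(k))
    return [t for t in rel if maybe(key(t))]
```

The left side ships only the small bit vector; the right side filters its tuples locally before sending them back, reducing traffic.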
38
Wrap-Up
Query execution is all about engineering for efficiency:
- O(1) and O(lg n) algorithms wherever possible
- Avoid looking at or copying data wherever possible
- Note that handling larger-than-memory data is of paramount importance (should that still be so in today's world?)
As we've seen it so far, it's all about pipelining things through as fast as possible
But we may also need to consider other axes:
- Adaptivity/flexibility – we may sometimes need this
- Information flow – to the optimizer and the runtime system
39
Upcoming Readings and Talks
For Thursday:
- Read the Chaudhuri survey as an overview
- Read and review the Selinger et al. paper
For Tuesday:
- Read the Volcano and Starburst papers
- Write one review contrasting the two on the major issues – especially: how do they handle search, and how do they compare costs?
Note that I'll be giving a talk in the Dept. Research Seminar, Levine 101, next…