Download presentation
Presentation is loading. Please wait.
Published byMatthew Kelley Modified over 9 years ago
1
CS 345: Topics in Data Warehousing Tuesday, October 19, 2004
2
Review of Thursday’s Class Bridge tables –Hierarchies –Multi-Valued Dimensions Extraction-Transformation-Load –Data staging area vs. data warehouse –Assigning surrogate keys –Detecting changed rows –Detecting duplicate dimension rows
3
Outline of Today’s Class Database System Architecture –Memory management –Secondary storage (disk) –Query planning process Joins –Nested Loop Join –Merge Join –Hash Join Grouping –Sort vs. Hash
4
Memory Management Modern operating systems use virtual memory –Address space larger than physical main memory –Layer of indirection between virtual addresses and physical address –Physical memory serves as a cache for most recently accessed memory locations –Infrequently accessed memory locations written to disk when cache fills up Memory transferred in large blocks called pages –Disk access: latency is large relative to throughput –Benefit from locality of reference –Transfer overheads amortized over many requests
5
Memory Management Database management systems (DBMSs) use a similar model to virtual memory –Data is stored persistently on disk –Main memory serves as cache for frequently accessed data This memory cache is called the buffer pool –Data organized into blocks called pages DBMSs override built-in virtual memory –Database operations are very data intensive –Careful management of disk access essential for good performance –DBMS has specialized knowledge of its data access patterns –Specialized memory management includes: Prefetching pages likely to be needed in the future Essential / frequently accessed pages pinned in memory Multiple buffer pools with different parameters for different needs –Parameters: Page size, page replacement policy –Needs: Relations, indexes, temporary results, sort space
6
Disk Access Time Seek cost + rotational latency + transfer cost Seek cost –Move the disk head to the right place –Physical machinery Rotational latency –Wait for the disk to spin –Average wait = ½ rotation Transfer cost –Read the data while the disk spins Seek cost Rotational latency Disk drive
7
Random vs. Sequential I/O Seek cost and rotational latency are overhead –Independent of the transfer size –~ 20 ms on commodity disks –One large transfer is much cheaper than many small transfers Random vs. Sequential I/O –I/O = Input/Output –Random I/O: Read/write an arbitrary page –Sequential I/O: Read/write the next page –Sequential I/O is 10-20x faster than random I/O Avoiding random I/O will be a theme –Cost of sequential I/O is less –Prefetching easy when access patterns predictable
8
Query Planning Process Many different query plans are possible –Logically equivalent ways of answering the same query Job of the query optimizer: choose the most efficient plan for a given query –Most efficient = minimize resource consumption –Resources = CPU, disk I/O –Consume less resources → faster query responses Design of query optimizer –Rule-based Apply heuristic transformations (e.g. push selections down) –Cost-based: Estimate cost (resource usage) of each possible query plan Choose the plan with lowest estimated cost Cost estimation uses statistics collected by the optimizer –In practice, optimizers use a combination of the two Search space of possible plans is too large to exhaustively explore Heuristics guide which types of plans are considered for cost-based optimization
9
parse Query rewriting Physical plan generation execute result SQL query parse tree Best logical query plan statistics Best physical query plan
10
Join Processing OLAP queries involve joins –Fact table joins to dimension tables –Primary-key / Foreign-key joins Choice of join algorithm has large impact on query cost Naïve join processing algorithm To join relations A and B: –Generate “Cartesian product” of A and B Each A tuple is combined with each B tuple –Apply selection conditions to each tuple in the Cartesian product Problem: Intermediate result can be huge! –Relation A has 10 million rows –Relation B has 100,000 rows –A x B has 1 trillion rows (many terabytes of data) Need more efficient, logically equivalent procedure
11
NLJ (conceptually) for each r R1 do for each s R2 do if r.C = s.C then output r,s pair Nested Loop Join (NLJ) BC a10 a20 b10 d30 CD 10cat 40dog 15bat 20rat R1 R2
12
Nested Loop Join In theory, multi-way nested loop joins are possible –Joining n relations → n nested “for” loops In practice, only binary (2-way) joins operators are implemented –Joining n relations → Cascade of n-1 binary joins Nested-loop join terminology: –Outer input: scanned in outer loop of join –Inner input: scanned in inner loop of join AB C
13
Analysis of Nested Loop Join Outer has m tuples, inner has n tuples CPU complexity = O(m*n) –For each of m outer tuples, compare with each of n inner tuples –Quadratic running time What about I/O complexity? –Depends on amount of memory available –We’ll assume B tuples per page –Consider four scenarios Use 2 pages of memory Use (m/B + n/B) pages of memory Use (n/B + 1) pages of memory Use (k + 1) pages of memory –For k < m/B
14
Nested Loop Join (2 pages) One page for outer, one page for inner NLJ algorithm: –Load 1 st outer block into memory Load all n/B blocks of inner, one after another –Load 2 nd outer block into memory Load all n/B blocks of inner, one after another –… –Load (m/B)th outer block into memory Load all n/B blocks of inner, one after another Total I/O cost = m/B * (n/B + 1)
15
Nested Loop Join (m/B+n/B pages) Load both inputs entirely into memory All computation takes place in memory No further paging necessary Total I/O cost = m/B + n/B Requires a lot of memory –Practical if both inputs are small
16
Nested Loop Join (n/B + 1 pages) Allocate 1 page for outer Load entire inner into memory (n/B pages) NLJ algorithm: –Load entire inner into memory –Load 1 st outer block into memory Inner is already in memory → no additional I/O needed –Load 2 nd outer block into memory Inner is already in memory → no additional I/O needed –… –Load (m/B)th outer block into memory Inner is already in memory → no additional I/O needed Total I/O cost = m/B + n/B –Same I/O cost as previous slide –Practical if inner input is small
17
Nested Loop Join (k+1 pages) Allocate 1 page for inner Load k pages of outer into memory at a time NLJ algorithm: –Load 1 st group of k outer blocks into memory Load all n/B blocks of inner, one after another –Load 2 nd group of k outer block into memory Load all n/B blocks of inner, one after another –… –Load (m/kB)th group of k outer blocks into memory Load all n/B blocks of inner, one after another Total I/O cost = m/kB * (n/B + k) –Reduction from 2-page case by a factor of k
18
Summary of NLJ Analysis CPU usage is O(m*n) –Quadratic in size of inputs I/O usage depends on memory allocated –If one input is small: Make the small input the inner I/O complexity is O(m/B + n/B) Linear in size of inputs The minimum possible –Otherwise: Make the small input the outer –Slightly better than the other way around I/O complexity is O(m/kB * (n/B + k)) –When k+1 pages of memory are available –Approximately mn/kB 2
19
Improving on Nested Loops Join For each outer tuple, we scan the entire inner relation to find matching tuples Maybe we can do better by using data structures Two main ideas: –Sorting –Hashing
20
Sorting Using sorting to improve NLJ –First we sort the inner relation –Then, for each outer tuple, find matching inner tuple(s) using binary search over sorted inner CPU usage improves –O(mn) → O(m log n) + sort cost –Sub-quadratic This strategy is indexed nested-loop join –To be covered in more detail on Thursday Implementation details –Use B-Tree instead of binary tree as search structure –Larger branching factor → better I/O efficiency
21
Merge Join What if we sort both inputs instead of one? –No need to perform binary search for each outer tuple –Since outer tuples are also in sorted order, access to inner relation is structured Binary search accesses pages in arbitrary order → random I/O Merge join uses sequential I/O CPU usage improves – O(mn) → O(m + n) + sort costs 1234567 5273514 123 45 1234567 123445557 scan Indexed NLJMerge Join Random I/O for each outer tuple
22
External-Memory Sorting How to efficiently sort large data sets? –Consider I/O complexity Sort a relation with n tuples, B tuples per block –Case 1: Use n/B pages of memory –Case 2: Use k+1 pages of memory Case 1: Use n/B pages of memory –Read the relation into memory n/B sequential disk reads –Sort in main memory using quicksort No disk access required –Write the sorted output back to disk n/B sequential disk writes –Total: 2n/B disk I/Os (all sequential) Requires lots of memory → only useful for small inputs
23
External-Memory Sorting Case 2: Use k pages of memory –k ≥ sqrt(n/B) Two phases (run generation + merge) Run generation –Repeat for each group of k blocks: Read group of k blocks into memory Sort using quicksort Write sorted “run” back out to disk Merge –Read 1 st page from each run into memory –Merge to produce globally sorted output –Write output pages to disk as they are filled –As a page is exhausted, replace by next page in run Each block read + written once per phase –Total disk I/Os: 4n/B Sorted Runs Final output MemoryDisk
24
Hashing Store inner input in a hash table –Tuples grouped together into buckets using a hash function Apply hash function to join attributes (called the key) to determine bucket –Idea: all records with the same key must be in the same bucket For each outer tuple –Hash key to determine which bucket it goes in –Scan that bucket for matching inner tuples CPU usage improves – O(mn) → O(m + n) Assuming a good hash function with few collisions Collision = tuples with different keys hash to same bucket Only useful for equality joins –Not for join conditions like WHERE A.foo > B.bar –99% of all joins are equality joins
25
Hash Join Analysis Hash join terminology –Inner input is called build input –Outer input is called probe input –Build a hash table based on one input, probe into that hash table using the other Probe has m tuples, build has n tuples Using (n/B + 1) pages of memory: –Allocate 1 page for probe input –Read build input into memory + create hash table Uses n/B pages –Read 1 st page of probe input Hash each tuple and search bucked for matching build tuples Requires no additional I/O –Repeat for 2 nd and subsequent pages of probe input –I/O complexity = O(n/B + m/B) –Suitable when one input is small
26
Partitioned Hash Join Two-phase algorithm (Partition + Join) –Uses k blocks of memory Phase 1: Partition –Performed separately for build and probe inputs –Read 1 st page into memory –Assign each tuple to a partition 1..k using hash function –Each partition is allocated 1 block of memory Tuples assigned to same partition are placed together When the block fills up, write it to disk Each partition stored in a contiguous region on disk –Repeat for 2 nd and subsequent pages Phase 2: Join phase –Join Partition 1 of probe input with Partition 1 of build input using ordinary hash join Use a secondary hash function –Repeat with Partition 2 of each input, etc. A form of recursion –Partitioning creates smaller problem instances
27
Partition Phase Memory Disk Partitions Unsorted Input (Similar to merge phase of merge join, only backwards)
28
Summary of Join Methods When 1 relation fits entirely in memory, all join methods have linear I/O complexity When both relations are large: –NLJ has quadratic I/O complexity –Merge join and hash join have linear I/O complexity CPU complexity –Quadratic for NLJ –N log N for merge join –Linear for hash join Rules of Thumb: –Use nested loop join for very small inputs –Use merge join for large inputs that are already sorted –Use hash join for large, unsorted inputs
29
Query Plans with Multiple Joins Two kinds of execution: –Pipelined No need to store intermediate results on disk Available memory must be shared between joins –Materialized No memory contention between joins Spend I/Os writing intermediate results to disk Blocking operators –Sort, partition, build hash table –Pipelined execution is disallowed Interesting orders –Sorted input → sorted output NLJ: output is sorted by same key as outer input Merge join: output is sorted by same key as inputs –Partitioned input → partitioned output Hash join: output is partitioned by same key as inputs –Take advantage of interesting orders Eliminate sorting / partitioning phases of later joins AB C
30
Grouping First partition tuples on grouping attributes –Tuples in same group placed together –Tuples in different groups separated Then scan tuples in each partition, compute aggregate expressions Two techniques for partitioning –Sorting Sort by the grouping attributes All tuples with same grouping attributes will appear together in sorted list –Hashing Hash by the grouping attributes All tuples with same grouping attributes will hash to same bucket Sort / re-hash within each bucket to resolve collisions
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.