Published by Claude Wilkins. Modified over 9 years ago.
1
Parallel Databases
2
Introduction
• Basic idea: use multiple disks, memory, and/or processors to speed up querying.
• Measures
  – Throughput: how many tasks can be completed per unit of time.
  – Response time: how long it takes to complete one task.
• Using parallelism to reduce response time is called speedup.
• Using parallelism to increase throughput is called scale-up.
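The two measures can be written as simple ratios. The sketch below uses the usual textbook definitions; the function names and the timings in the example are illustrative assumptions, not figures from the slides.

```python
def speedup(time_small_system, time_large_system):
    """Same problem on both systems; linear speedup on N times the hardware is N."""
    return time_small_system / time_large_system


def scaleup(time_small_problem_small_system, time_large_problem_large_system):
    """Problem and system grow together; linear scale-up is 1.0."""
    return time_small_problem_small_system / time_large_problem_large_system


# Hypothetical timings in seconds, for illustration only.
print(speedup(120, 20))   # 6.0: a query that took 120 s now takes 20 s
print(scaleup(60, 75))    # 0.8: 8x the data on 8x the hardware took 75 s, not 60 s
```

A speedup below the hardware ratio, or a scale-up below 1.0, means the system is falling short of linear behaviour.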
3
Problems
• Ideally, we would like linear scale-up and speedup. This is rarely achieved in practice.
• Why?
  – Start-up costs
  – Interference: different processors need the same resource.
  – Communication costs
  – Some parts of the work cannot be parallelized.
  – Skew: the problem usually cannot be broken into equal-sized parts.
4
Skew Example
• Suppose I have 8 processors to do a query. Ideally, I should be able to do it in 1/8 the time.
• Now suppose the data is distributed this way:
  – P1: 5%
  – P2: 10%
  – P3: 10%
  – P4: 5%
  – P5: 10%
  – P6: 10%
  – P7: 25%
  – P8: 25%
• P7 and P8 each hold 25% of the data, so the query can be cut only to ¼ of the time, not 1/8.
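The arithmetic behind this slide can be sketched in a few lines. It assumes (the slide does not say so explicitly) that each processor's time is proportional to its share of the data and that the query finishes only when the slowest processor does. The function name is hypothetical.

```python
from fractions import Fraction


def skewed_speedup(shares):
    """Speedup over a single processor when the largest partition sets the finish time.

    Assumes work is proportional to data share (an illustrative assumption).
    """
    return sum(shares) / max(shares)


# Data shares for P1..P8 from the slide, in percent.
slide_shares = [Fraction(p) for p in (5, 10, 10, 5, 10, 10, 25, 25)]

print(skewed_speedup(slide_shares))          # 4: P7 and P8 cap the speedup
print(skewed_speedup([Fraction(1)] * 8))     # 8: the evenly split case
```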
5
What Can Be Shared?
• Shared Memory
  – Advantages:
    Dynamic partitioning (any process may be allocated some or all of the available memory).
    Cheaper than each processor having its own memory.
    Lower communication cost between processors.
  – Disadvantages:
    Memory can become a bottleneck.
    Scalability is a problem.
6
Sharing Continued
• Shared Disk
  – Advantages:
    Data need not be replicated, so no synchronization of copies is needed.
    Better scalability.
    Fault tolerance may be built into the system.
  – Disadvantages:
    Single point of failure.
    Communication cost is greater.
7
Sharing III
• Shared Nothing (really a type of distributed DB)
  – Advantages:
    Complete parallel solution.
    Fewer bottlenecks.
    Multiple points of failure (no single point of failure).
    Scalability.
  – Disadvantages:
    Cost, for the bean counters.
    Communication costs are greater.
    Multiple points of failure (more components that can fail).
8
Sharing IV
• Hierarchical
  – Advantages: gains the advantages of speed and scalability.
  – Disadvantages: how to partition?
9
Disk Partitioning
• See Wikipedia: Standard RAID levels
10
Disk Partitioning for DB Usage
• Round-Robin Partitioning: tuples are dealt out to the partitions in turn, like RAID striping (as in RAID 5).
• Range Partitioning: all tuples with a column value within some range go to the same partition.
• Hash Partitioning: all tuples whose column value hashes to the same value go to the same partition.
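A minimal sketch of the three schemes, with in-memory lists standing in for disks. The function names, the `cutoffs` parameter, the use of `crc32` as the hash, and the sample rows are all assumptions made for illustration.

```python
from bisect import bisect_right
from zlib import crc32


def round_robin_partition(rows, n):
    """Deal rows out to n partitions in turn, ignoring their values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def range_partition(rows, key, cutoffs):
    """Partition i holds rows with cutoffs[i-1] <= key(row) < cutoffs[i].

    cutoffs must be sorted; len(cutoffs) + 1 partitions are produced.
    """
    parts = [[] for _ in range(len(cutoffs) + 1)]
    for row in rows:
        parts[bisect_right(cutoffs, key(row))].append(row)
    return parts


def hash_partition(rows, key, n):
    """Rows whose key hashes to the same value land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(repr(key(row)).encode()) % n].append(row)
    return parts


# Hypothetical (name, value) rows, partitioned on the second column.
rows = [("a", 5), ("b", 12), ("c", 25), ("d", 12), ("e", 7)]
by_value = lambda row: row[1]

print(round_robin_partition(rows, 2))
print(range_partition(rows, by_value, cutoffs=[10, 20]))
print(hash_partition(rows, by_value, 3))
```

Round robin spreads the rows evenly but gives no way to find a value without scanning every partition, whereas the other two schemes place each row according to its column value.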
11
Usage
• Which partitioning scheme is best for:
  – Simple selects with a unique match
  – Simple selects with a non-unique match
  – Range queries
  – Print unsorted
  – Print sorted
12
Skew In This Context
• Attribute-Value Skew: many tuples have the same value for the partitioning column.
• Partition Skew: some partitions end up with more tuples, even if the tuples have different values.
  – Remedy: change the ranges, using a histogram to better predict the cut-offs.
• Time-Value Skew: a partitioning that is well balanced at first acquires skew over time.
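One way to "use a histogram to better predict cut-offs" is to choose equal-depth cut-offs from a sorted sample rather than equal-width ones. The sketch below is one possible approach, not necessarily the one the slides have in mind; the function names and sample values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def equal_width_cutoffs(values, n):
    """n - 1 cut-offs that split [min, max] into n ranges of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [lo + i * width for i in range(1, n)]


def equal_depth_cutoffs(values, n):
    """n - 1 cut-offs chosen so each range holds about len(values) / n values."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // n] for i in range(1, n)]


def partition_sizes(values, cutoffs):
    """Number of values falling into each range defined by the cut-offs."""
    counts = Counter(bisect_right(cutoffs, v) for v in values)
    return [counts[i] for i in range(len(cutoffs) + 1)]


# Hypothetical column values: most are small, a few are large.
values = [1, 2, 2, 3, 3, 3, 4, 40, 70, 100, 5, 6]

print(partition_sizes(values, equal_width_cutoffs(values, 4)))  # [9, 1, 1, 1]
print(partition_sizes(values, equal_depth_cutoffs(values, 4)))  # [3, 3, 3, 3]
```

Moving the cut-offs helps with partition skew only. With attribute-value skew, all tuples sharing one value still land in the same range wherever the cut-offs fall.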
13
Parallel Joins
• R ⨝ (A=B) S
  – Range partition R on A and S on B, and send matching ranges to the same partition.
  – Hash partitioning would also work.
• R ⨝ (A<B) S
  – Partition R and replicate S.
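Both strategies can be sketched with lists standing in for processors. The loops below run sequentially, but each iteration is the work that one processor would do on its own. All function names and the sample relations are assumptions made for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key, n):
    """Rows with equal keys always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts


def local_equi_join(r_part, s_part, r_key, s_key):
    """Ordinary hash join of one R partition with the matching S partition."""
    index = defaultdict(list)
    for s in s_part:
        index[s_key(s)].append(s)
    return [(r, s) for r in r_part for s in index.get(r_key(r), [])]


def partitioned_equi_join(R, S, r_key, s_key, n):
    """R join S on A = B: partition both sides the same way, then join pairwise."""
    r_parts = hash_partition(R, r_key, n)
    s_parts = hash_partition(S, s_key, n)
    out = []
    for r_part, s_part in zip(r_parts, s_parts):  # one pair per processor
        out.extend(local_equi_join(r_part, s_part, r_key, s_key))
    return out


def fragment_and_replicate_join(R, S, predicate, n):
    """R join S on a non-equality condition: partition R, give every processor all of S."""
    out = []
    for r_part in (R[i::n] for i in range(n)):    # round-robin fragments of R
        out.extend((r, s) for r in r_part for s in S if predicate(r, s))
    return out


# Hypothetical relations: A is column 0 of R, B is column 0 of S.
R = [(1, "x"), (2, "y"), (3, "z")]
S = [(2, "p"), (3, "q"), (3, "r")]
first = lambda row: row[0]

equi = partitioned_equi_join(R, S, first, first, n=2)
less = fragment_and_replicate_join(R, S, lambda r, s: r[0] < s[0], n=2)
print(len(equi), len(less))   # 3 5
```

Partitioning works for A=B because matching tuples are guaranteed to share a partition. For A<B no such guarantee exists, which is why S has to be replicated.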
14
Example
• Emp(Fn, Minit, LN, SSN, Bdate, Addr, Sex, Salary, SuperSSN, Dno)
  – r = 100,000 records
  – bf = 5 records/block
  – b = 20,000 blocks
• Dept(D#, Dname, MGRSSN, MgrStartDate)
  – r = 1,250 records
  – bf = 10 records/block
  – b = 125 blocks
15
Example Query
• I want to perform Emp ⨝ (Dno=D#) Dept.
• How can I parallelize this, and how much can I save?
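The slide leaves the question open. One way to start reasoning about it is a rough block-read estimate, sketched below under assumptions that are not in the slides: cost is counted in block reads only; Dept (125 blocks) fits in memory, so a serial join reads each relation once; Emp is split evenly across n processors and Dept is replicated to each; start-up, communication, and skew costs are ignored. The function names are hypothetical.

```python
B_EMP = 20_000   # blocks of Emp, from the example slide
B_DEPT = 125     # blocks of Dept, from the example slide


def serial_block_reads():
    """Assumed serial cost: read Dept into memory once, scan Emp once."""
    return B_EMP + B_DEPT


def parallel_block_reads(n):
    """Assumed cost per processor: an even share of Emp plus all of Dept."""
    emp_share = -(-B_EMP // n)   # ceiling division
    return emp_share + B_DEPT


for n in (2, 4, 8):
    per_processor = parallel_block_reads(n)
    print(n, per_processor, round(serial_block_reads() / per_processor, 2))
# 2 10125 1.99
# 4 5125 3.93
# 8 2625 7.67
```

Even in this idealized model the speedup is slightly below linear, because every processor must read all of Dept. The problems listed earlier (start-up, interference, communication, skew) would lower it further.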