1 Parallel Databases

2 Introduction
• Basic idea: use multiple disks, memory, and/or processors to speed up querying.
• Measures
  – Throughput: how many tasks can be completed in some unit of time.
  – Response time: how long it takes to complete one task.
• Using parallelism to reduce response time is called speedup.
• Using parallelism to increase throughput is called scale up.
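The two measures can be written as simple ratios. The sketch below follows the definitions on this slide; the function names and the sample timings are illustrative assumptions, not figures from the presentation.

```python
def speedup(time_one_processor: float, time_n_processors: float) -> float:
    """Same task, more hardware: how many times shorter the response time is."""
    return time_one_processor / time_n_processors


def scaleup(throughput_one_processor: float, throughput_n_processors: float) -> float:
    """Same time window, more hardware: how many times more tasks complete."""
    return throughput_n_processors / throughput_one_processor


# Hypothetical measurements on 1 processor versus 8 processors.
print(speedup(120.0, 20.0))   # a 120 s query now takes 20 s
print(scaleup(50.0, 300.0))   # 50 tasks/min becomes 300 tasks/min
```

With 8 processors, a linear result would be 8 for either ratio; anything lower reflects overheads.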

3 Problems
• Ideally, we would like linear scale up/speedup. This is not usually the case.
• Why?
  – Start-up costs
  – Interference: different processors need the same resource.
  – Communication costs
  – Some parts may not be parallelizable.
  – Skew: the problem is unlikely to break into equal-sized parts.
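The point about parts that cannot be parallelized is commonly quantified with Amdahl's law, which the slide does not name. The sketch below assumes a fixed fraction of the work is parallelizable and ignores start-up, interference, and communication costs; the 90% figure is a made-up example.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Upper bound on speedup when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Hypothetical workload: 90% of the query can run in parallel.
print(round(amdahl_speedup(0.9, 8), 2))
# A fully parallel workload recovers linear speedup.
print(amdahl_speedup(1.0, 8))
```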

4 Skew Example
• Suppose I have 8 processors to do a query. I should be able to do it in 1/8 the time.
• Now suppose the data is distributed this way:
  – P1: 5%
  – P2: 10%
  – P3: 10%
  – P4: 5%
  – P5: 10%
  – P6: 10%
  – P7: 25%
  – P8: 25%
• P7 and P8 each hold 25% of the data, so the query can only be cut to ¼ of the time, not 1/8.
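The arithmetic behind this example can be checked directly. The sketch assumes that each processor's running time is proportional to its share of the data and that the query finishes when the slowest processor finishes; the variable names are illustrative.

```python
# Percentage of the data held by each processor, as listed on the slide.
shares = {"P1": 5, "P2": 10, "P3": 10, "P4": 5,
          "P5": 10, "P6": 10, "P7": 25, "P8": 25}

assert sum(shares.values()) == 100

# The query is done only when the most heavily loaded processor is done.
largest_share = max(shares.values())
elapsed_fraction = largest_share / 100   # fraction of the one-processor time
achieved_speedup = 100 / largest_share
ideal_speedup = len(shares)

print(elapsed_fraction, achieved_speedup, ideal_speedup)
```

Under these assumptions the 8 processors deliver a speedup of 4, half of the ideal 8.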

5 What Can Be Shared?
• Shared Memory
  – Advantages: dynamic partitioning (any process may be allocated all or some of the available memory); cheaper than each processor having its own memory; lower communication cost between processors.
  – Disadvantages: memory can become a bottleneck; scalability is a problem.

6 Sharing Continued
• Shared Disk
  – Advantages: data need not be replicated, so no synchronization is required; better scalability; fault tolerance may be built into the system.
  – Disadvantages: single point of failure; communication cost is greater.

7 Sharing III
• Shared Nothing (really a type of distributed DB)
  – Advantages: complete parallel solution; fewer bottlenecks; multiple points of failure (no single point of failure); scalability.
  – Disadvantages: cost (for the bean counters); communication costs are greater; multiple points of failure (more components that can fail).

8 Sharing IV
• Hierarchical
  – Advantages: gains the advantages of speed and scalability.
  – Disadvantages: how to partition?

9 Disk Partitioning
• See Wikipedia: Standard RAID levels

10 Disk Partitioning for DB Usage
• Round Robin Partitioning: like RAID 5.
• Range Partitioning: all tuples with a column value within some range go to the same partition.
• Hash Partitioning: all tuples whose column value hashes to the same value go to the same partition.
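The three schemes can be sketched as small functions that assign rows to partitions. This is a minimal in-memory illustration, not how any particular DBMS implements partitioning; the function names, the `key` callback, and the boundary convention for ranges (partition i holds keys below `cutoffs[i]`) are assumptions.

```python
import bisect


def round_robin_partition(rows, n):
    """Row i goes to partition i mod n, regardless of its contents."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def range_partition(rows, key, cutoffs):
    """Sorted cutoffs split the key space into len(cutoffs) + 1 ranges."""
    parts = [[] for _ in range(len(cutoffs) + 1)]
    for row in rows:
        parts[bisect.bisect_right(cutoffs, key(row))].append(row)
    return parts


def hash_partition(rows, key, n):
    """Rows whose key hashes to the same value mod n share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts


rows = list(range(10))  # hypothetical rows: each row is just its key
print(round_robin_partition(rows, 3))
print(range_partition(rows, lambda r: r, [3, 7]))
print(hash_partition(rows, lambda r: r, 3))
```

Python's built-in `hash` is used only for illustration; it is randomized per process for strings, so a real system would need a stable hash function.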

11 Usage
• Which is best for:
  – Simple selects (unique match)
  – Simple selects (non-unique match)
  – Range queries
  – Print unsorted
  – Print sorted
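One way to approach these questions is to count how many partitions a query has to touch under each scheme. The sketch below gives the usual textbook reasoning, under the assumption that the selection is on the partitioning column; the function names and the `cutoffs` convention (partition i holds keys below `cutoffs[i]`) are hypothetical.

```python
import bisect


def partitions_for_point_query(scheme, value, n, cutoffs=None):
    """Partitions that may hold tuples with column == value."""
    if scheme == "round_robin":
        return list(range(n))          # placement ignores the value: search everywhere
    if scheme == "hash":
        return [hash(value) % n]       # equal values always hash to one partition
    if scheme == "range":
        return [bisect.bisect_right(cutoffs, value)]
    raise ValueError(f"unknown scheme: {scheme}")


def partitions_for_range_query(scheme, low, high, n, cutoffs=None):
    """Partitions that may hold tuples with low <= column <= high."""
    if scheme in ("round_robin", "hash"):
        return list(range(n))          # neither scheme keeps nearby values together
    if scheme == "range":
        first = bisect.bisect_right(cutoffs, low)
        last = bisect.bisect_right(cutoffs, high)
        return list(range(first, last + 1))
    raise ValueError(f"unknown scheme: {scheme}")


cutoffs = [3, 7]  # three range partitions: below 3, 3 to 6, 7 and above
for scheme in ("round_robin", "hash", "range"):
    print(scheme,
          partitions_for_point_query(scheme, 5, 3, cutoffs),
          partitions_for_range_query(scheme, 4, 8, 3, cutoffs))
```

Printing the whole relation unsorted touches every partition under any scheme, so round robin's even spread helps there, while range partitioning leaves the partitions already ordered relative to each other, which helps sorted output.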

12 Skew In This Context
• Attribute-Value Skew: many tuples with the same value for the partitioning column.
• Partition Skew: some partitions end up with more tuples, even if they have different values.
  – Change the ranges: use a histogram to better predict cut-offs.
• Time-Value Skew: a partitioning that starts out good acquires skew over time.
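Choosing cut-offs from a histogram can be sketched with an equi-depth approach: sort a sample of the partitioning column and take the values at equal-count boundaries. This is one possible technique, not necessarily the one the slide has in mind; the function names and sample values are made up.

```python
import bisect


def equi_depth_cutoffs(values, n_parts):
    """Pick n_parts - 1 cut-offs so each range holds about the same number of values."""
    ordered = sorted(values)
    size = len(ordered)
    return [ordered[(i * size) // n_parts] for i in range(1, n_parts)]


def partition_sizes(values, cutoffs):
    """Count how many values land in each range (partition i holds keys below cutoffs[i])."""
    sizes = [0] * (len(cutoffs) + 1)
    for v in values:
        sizes[bisect.bisect_right(cutoffs, v)] += 1
    return sizes


# Hypothetical column with most values clustered at the low end.
values = [1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 100]

equal_width = [34, 67]                       # naive cut-offs over the range 1..100
balanced = equi_depth_cutoffs(values, 3)

print(partition_sizes(values, equal_width))  # nearly everything lands in the first range
print(balanced, partition_sizes(values, balanced))
```

This addresses partition skew only. With attribute-value skew, many tuples share one value and no choice of cut-offs can split them.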

13 Parallel Joins
• R ⨝ (A=B) S
  – Range partition R on A and S on B. Pass the same ranges off to the same partition.
  – Hash partitioning would also work.
• R ⨝ (A<B) S
  – Partition R and replicate S.
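Both strategies can be simulated on one machine, with each loop iteration standing in for one processor. This is a minimal sketch: the function names, the `a` and `b` key callbacks, and the sample relations are assumptions, and a real system would run the per-partition work concurrently.

```python
from collections import defaultdict


def partitioned_equi_join(R, S, a, b, n):
    """R join S on a(r) == b(s): hash both inputs the same way, join matching partitions."""
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for r in R:
        r_parts[hash(a(r)) % n].append(r)
    for s in S:
        s_parts[hash(b(s)) % n].append(s)

    out = []
    for r_part, s_part in zip(r_parts, s_parts):   # one pair per processor
        index = defaultdict(list)                  # local hash table on S's key
        for s in s_part:
            index[b(s)].append(s)
        for r in r_part:
            for s in index.get(a(r), []):
                out.append((r, s))
    return out


def replicated_less_than_join(R, S, a, b, n):
    """R join S on a(r) < b(s): split R across processors, give each a full copy of S."""
    out = []
    for i in range(n):
        r_part = R[i::n]                           # round-robin slice of R
        for r in r_part:
            for s in S:                            # every processor sees all of S
                if a(r) < b(s):
                    out.append((r, s))
    return out


R = [(1, "x"), (2, "y"), (3, "z")]
S = [(2, "p"), (3, "q"), (3, "r")]
key = lambda t: t[0]
print(sorted(partitioned_equi_join(R, S, key, key, 2)))
print(sorted(replicated_less_than_join(R, S, key, key, 2)))
```

The equi-join can partition both inputs because matching tuples always land in the same partition. For A<B a tuple of R can match tuples of S anywhere in the key space, which is why S is replicated instead.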

14 Example
• Emp(Fn, Minit, LN, SSN, Bdate, Addr, Sex, Salary, SuperSSN, Dno)
  – r = 100,000 records
  – bf = 5 records/block
  – b = 20,000 blocks
• Dept(D#, Dname, MGRSSN, MgrStartDate)
  – r = 1250 records
  – bf = 10 records/block
  – b = 125 blocks

15 Example Query
• I want to perform Emp ⨝ (DNO=D#) Dept.
• How can I parallelize this, and how much can I save?
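A rough way to start on this question is to count block reads per processor. The sketch below is a back-of-the-envelope estimate, not the presentation's answer. It assumes that each processor reads its share of Emp once while holding its share of Dept in memory, that the data is spread with no skew, and that partitioning and communication costs are ignored. Only the block counts come from the example; the function name and the choice of 8 processors are hypothetical.

```python
import math

EMP_BLOCKS = 20_000   # b for Emp, from the example
DEPT_BLOCKS = 125     # b for Dept, from the example


def blocks_read_per_processor(n, replicate_dept=False):
    """Blocks each processor reads under the assumptions stated above."""
    emp = math.ceil(EMP_BLOCKS / n)
    # Either Dept is partitioned on D# like Emp on Dno, or every processor gets all of it.
    dept = DEPT_BLOCKS if replicate_dept else math.ceil(DEPT_BLOCKS / n)
    return emp + dept


serial = blocks_read_per_processor(1)
both_partitioned = blocks_read_per_processor(8)
dept_replicated = blocks_read_per_processor(8, replicate_dept=True)

print(serial, both_partitioned, dept_replicated)
print(round(serial / both_partitioned, 2), round(serial / dept_replicated, 2))
```

Because Dept is small, replicating it costs little under these assumptions. Skew in Dno values, as discussed earlier, would reduce the saving.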

