1
Parallel and distributed databases
R & G Chapter 22
2
What is a distributed database?
3
Why distribute a database
  Scalability and performance
    Throughput
    Data size
  Resilience to failures
4
Why distribute a database
  Data is already distributed
  Or needs to be distributed
  Data is in multiple systems
5
Why not distribute a database
  You must earn your complexity!
  Communication needed
    Must build a complex infrastructure
    Unpredictable latencies must be masked
  More types of failures
    More components to fail
    Network failures
    Congestion, timeouts
  More complex planning
    Communication cost plus I/O cost
  May have to deal with heterogeneity
    Different types of systems
    Different schemas, possibly incompatible
    Different administrative domains
6
Types of distributed databases
7
The old days: mainframes
  Definitely not distributed!
8
Client-server User interaction Data processing Network
9
Parallel database
10
Primary/secondary X
11
Multidatabase
12
How do they work?
  What is shared?
  How to distribute the data?
  How to process the data?
  How to update the data?
13
What is shared? Memory
  CPUs, RAM, Disk
  Most modern DBMSs
14
What is shared? Disk
  RAM
  Oracle RAC
15
What is shared? Nothing
  RAM
  Search engines, Teradata
16
How to distribute the data?
Server 1   Server 2   Server 3   Server 4
Bike       $86      6/2/07     636353
Chair      $10      6/5/07     662113
Couch      $570     6/1/07     424252
Car        $1123    6/1/07     256623
Lamp       $19      6/7/07     121113
Bike       $56      6/9/07     887734
Scooter    $18      6/11/07    252111
Hammer     $8000    6/11/07    116458
17
How to distribute the data?
  Hash partitioning: (key,value) -> Hash()
  Range partitioning: (key,value) -> <= X or > X
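A minimal sketch of the two partitioning schemes named on this slide. The function names, the choice of `zlib.crc32` as the hash, and the split points are assumptions made for illustration; the slide does not specify any of them.

```python
import bisect
import zlib


def hash_partition(key: str, num_servers: int) -> int:
    """Choose a server by hashing the key (crc32 is stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def range_partition(key, boundaries) -> int:
    """Choose a server by comparing the key with sorted split points.

    Keys <= boundaries[0] go to server 0; keys > boundaries[-1] go to
    the last server.
    """
    return bisect.bisect_left(boundaries, key)


# Hypothetical input: (item, price) pairs taken from the example table.
rows = [("Bike", 86), ("Chair", 10), ("Couch", 570), ("Car", 1123)]

by_hash = {item: hash_partition(item, 4) for item, _ in rows}
by_range = {item: range_partition(price, [50, 500, 1000]) for item, price in rows}
```

Hash partitioning spreads keys evenly but scatters neighboring keys, while range partitioning keeps neighboring keys together, which helps range scans but can produce skew if the split points are poorly chosen.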
18
How to distribute the data?
Server 1   Server 2   Server 3   Server 4
Bike       $86        6/2/07     636353
Chair      $10        6/5/07     662113
Couch      $570       6/1/07     424252
Car        $1123      6/1/07     256623
Lamp       $19        6/7/07     121113
Bike       $56        6/9/07     887734
Scooter    $18        6/11/07    252111
Hammer     $8000      6/11/07    116458
19
Query processing
  Intra-operator parallelism
  Inter-operator parallelism
20
Parallel scanning filter Result
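The slide's picture (filter on each server, results merged) can be sketched roughly as follows. Threads stand in for servers, and the assignment of two rows per partition is an assumption; the slides do not say which rows live where.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed layout: the example table split into four partitions of two rows.
partitions = [
    [("Bike", 86), ("Chair", 10)],
    [("Couch", 570), ("Car", 1123)],
    [("Lamp", 19), ("Bike", 56)],
    [("Scooter", 18), ("Hammer", 8000)],
]


def scan_partition(rows, predicate):
    """Apply the filter to one partition, as a single server would."""
    return [row for row in rows if predicate(row)]


def parallel_scan(partitions, predicate):
    """Filter every partition concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial = pool.map(lambda rows: scan_partition(rows, predicate), partitions)
        return [row for part in partial for row in part]


cheap = parallel_scan(partitions, lambda row: row[1] < 50)
```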
21
Sorting
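The slide gives only the title, so the following is one common way to sort in parallel, not necessarily the one pictured: range-partition the values, sort each partition independently, and concatenate the partitions in range order. The split points and function name are hypothetical.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, boundaries):
    """Range-partition values, sort each bucket on its own worker, concatenate."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        buckets[bisect.bisect_left(boundaries, v)].append(v)
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    # Buckets cover disjoint, increasing ranges, so no merge step is needed.
    return [v for bucket in sorted_buckets for v in bucket]


prices = [86, 10, 570, 1123, 19, 56, 18, 8000]
result = parallel_sort(prices, [50, 500, 1000])
```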
23
Parallel hash join Hash()
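A rough sketch of a parallel hash join in the spirit of this slide: both inputs are partitioned with the same hash function on the join key, so matching rows land on the same worker, and each worker then runs an ordinary local hash join. The relation contents and helper names are hypothetical, and the workers run one after another here for simplicity.

```python
import zlib
from collections import defaultdict


def partition_by_key(rows, key_index, num_workers):
    """Route each row to a worker by hashing its join key."""
    parts = [[] for _ in range(num_workers)]
    for row in rows:
        h = zlib.crc32(str(row[key_index]).encode("utf-8"))
        parts[h % num_workers].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key):
    """Build a hash table on the left input, probe it with the right input."""
    table = defaultdict(list)
    for lrow in left:
        table[lrow[left_key]].append(lrow)
    return [lrow + rrow for rrow in right for lrow in table.get(rrow[right_key], [])]


def parallel_hash_join(left, right, left_key, right_key, num_workers=4):
    left_parts = partition_by_key(left, left_key, num_workers)
    right_parts = partition_by_key(right, right_key, num_workers)
    out = []
    # Each (left, right) partition pair is independent and could run on its own server.
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_hash_join(lp, rp, left_key, right_key))
    return out


items = [(636353, "Bike"), (662113, "Chair"), (424252, "Couch")]
buyers = [(636353, "Alice"), (424252, "Bob"), (999999, "Carol")]  # hypothetical relation
joined = sorted(parallel_hash_join(items, buyers, 0, 0))
```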
24
Join
25
Semi-join
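As an illustration of the semi-join idea: the local site ships only its join-key values to the remote site, the remote site sends back just the rows that match, and the full join is completed locally. The two sites are simulated with plain lists, and the remote relation is invented for the example.

```python
def semi_join_reduce(remote_rows, key_index, shipped_keys):
    """Runs at the remote site: keep only rows whose key was shipped over."""
    return [row for row in remote_rows if row[key_index] in shipped_keys]


def distributed_join(local_rows, local_key, remote_rows, remote_key):
    # Step 1: project the join column and ship it to the remote site.
    shipped_keys = {row[local_key] for row in local_rows}
    # Step 2: the remote site reduces its relation and ships the matches back.
    reduced = semi_join_reduce(remote_rows, remote_key, shipped_keys)
    # Step 3: finish the join locally on the reduced relation.
    joined = [
        lrow + rrow
        for lrow in local_rows
        for rrow in reduced
        if lrow[local_key] == rrow[remote_key]
    ]
    return joined, len(reduced)


local_items = [(636353, "Bike"), (662113, "Chair")]
remote_status = [(636353, "shipped"), (424252, "pending"), (116458, "shipped")]  # hypothetical
joined, rows_shipped_back = distributed_join(local_items, 0, remote_status, 0)
```

Only one of the three remote rows crosses the network in this example, which is the point of the technique: it trades an extra round trip for less data shipped.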
26
Inter-operator parallelism
27
Updating distributed data
  Synchronous: read-any-write-all
  Reads are fast
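A toy model of read-any-write-all, assuming in-memory dictionaries as replicas and a hypothetical `available` flag to simulate disconnection. It shows why reads are fast (any one replica will do) and why a single unreachable replica blocks writes.

```python
import random


class ReadAnyWriteAll:
    def __init__(self, num_replicas):
        self.replicas = [dict() for _ in range(num_replicas)]
        self.available = [True] * num_replicas  # illustrative failure switch

    def write(self, key, value):
        """A write must reach every replica, so one outage blocks it."""
        if not all(self.available):
            raise RuntimeError("write blocked: a replica is unreachable")
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        """A read may use any single reachable replica."""
        live = [r for r, up in zip(self.replicas, self.available) if up]
        if not live:
            raise RuntimeError("read blocked: no replica is reachable")
        return random.choice(live)[key]
```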
28
Updating distributed data
  Synchronous: voting
29
Updating distributed data
  Synchronous: voting
  Writes tolerant to disconnection
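A small sketch of voting with version numbers, under assumptions the slides do not state: n replicas, a write quorum w, and a read quorum r chosen so that r + w > n and 2w > n. Any read quorum then overlaps the latest write quorum, and a write succeeds as long as w replicas are reachable. The class and its `available` switch are hypothetical.

```python
class VotingStore:
    def __init__(self, n, w, r):
        if r + w <= n or 2 * w <= n:
            raise ValueError("quorums must overlap: need r + w > n and 2w > n")
        self.replicas = [dict() for _ in range(n)]  # key -> (version, value)
        self.available = [True] * n                 # illustrative failure switch
        self.w, self.r = w, r

    def _live(self):
        return [rep for rep, up in zip(self.replicas, self.available) if up]

    def write(self, key, value):
        live = self._live()
        if len(live) < self.w:
            raise RuntimeError("write blocked: fewer than w replicas reachable")
        # Any w reachable replicas include at least one with the latest version.
        version = 1 + max((rep[key][0] for rep in live if key in rep), default=0)
        for rep in live[: self.w]:
            rep[key] = (version, value)

    def read(self, key):
        live = self._live()
        if len(live) < self.r:
            raise RuntimeError("read blocked: fewer than r replicas reachable")
        copies = [rep[key] for rep in live[: self.r] if key in rep]
        if not copies:
            raise KeyError(key)
        return max(copies)[1]  # the highest version wins
```

Compared with read-any-write-all, reads cost more (r replicas instead of one), but writes keep working while up to n - w replicas are disconnected.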
30
Consistency of distributed data
  Should provide ACID
31
Primary/secondary
32
Two-phase commit PREPARE PREPARED COMMIT
33
Two-phase commit PREPARE PREPARED ABORT
34
Two-phase commit PREPARE PREPARED ABORT
35
Two-phase commit PREPARE PREPARED X
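The message exchange on the two-phase commit slides can be sketched as below. The `Participant` class and its `can_commit` and `crashed` flags are hypothetical knobs for the example. The sketch treats a missing reply to PREPARE as a vote to abort, and it does not model a coordinator failure or logging to stable storage.

```python
class Participant:
    def __init__(self, name, can_commit=True, crashed=False):
        self.name = name
        self.can_commit = can_commit  # illustrative: how this site will vote
        self.crashed = crashed        # illustrative: site never answers
        self.state = "INIT"

    def prepare(self):
        """Phase 1: answer the coordinator's PREPARE message."""
        if self.crashed:
            raise TimeoutError(f"{self.name} did not answer PREPARE")
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return "PREPARED" if self.can_commit else "ABORT"

    def finish(self, decision):
        """Phase 2: apply the coordinator's COMMIT or ABORT decision."""
        if not self.crashed:
            self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"


def two_phase_commit(participants):
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append("ABORT")  # no reply counts as a vote to abort
    # Commit only if every participant replied PREPARED.
    decision = "COMMIT" if all(v == "PREPARED" for v in votes) else "ABORT"
    for p in participants:
        p.finish(decision)
    return decision
```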
36
Conclusion
  Parallelism and distribution very useful
    Performance
    Fault tolerance
    Scale
  But complex!
    Rethink lots of aspects of the system
    Must earn the complexity