Parallel and distributed databases R & G Chapter 22.


What is a distributed database?

Why distribute a database? Scalability and performance (throughput, data size) versus resilience to failures.

Why distribute a database? Data is already distributed, or needs to be distributed. Data is in multiple systems.

Why not distribute a database? You must earn your complexity!
Communication needed: must build a complex infrastructure; unpredictable latencies must be masked.
More types of failures: more components to fail; network failures; congestion, timeouts.
More complex planning: communication cost plus I/O cost.
May have to deal with heterogeneity: different types of systems; different schemas, possibly incompatible; different administrative domains.

Types of distributed databases

The old days: mainframes. Definitely not distributed!

Client-server: user interaction on the client, data processing on the server, connected by a network.

Parallel database

Primary/secondary

Multidatabase

How do they work? What is shared? How to distribute the data? How to process the data? How to update the data?

What is shared? Memory: the CPUs share RAM and disk. Example: most modern DBMSs.

What is shared? Disk: each node has its own RAM, but the disk is shared. Example: Oracle RAC.

What is shared? Nothing: each node has its own RAM and its own disk. Examples: search engines, Teradata.

How to distribute the data? Whole rows are spread across Server 1, Server 2, Server 3, and Server 4:
Bike      $86     6/2/07    636353
Chair     $10     6/5/07    662113
Couch     $570    6/1/07    424252
Car       $1123   6/1/07    256623
Lamp      $19     6/7/07    121113
Bike      $56     6/9/07    887734
Scooter   $18     6/11/07   252111
Hammer    $8000   6/11/07   116458

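How to distribute the data? Hash partitioning: each (key, value) pair is sent to the server chosen by Hash(). Range partitioning: each (key, value) pair is sent to one server if the key is <= X and to another if it is > X.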
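The two partitioning schemes on this slide can be sketched in a few lines. This is a minimal illustration, not the deck's own code: the function names, the choice of `zlib.crc32` as the hash, the cluster size of four, and the split points are all assumptions made for the example.

```python
import zlib
from bisect import bisect_left

NUM_SERVERS = 4  # assumed cluster size, matching the four servers on the slides


def hash_partition(key: str, num_servers: int = NUM_SERVERS) -> int:
    """Pick a server by hashing the key (crc32 is stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def range_partition(key: int, split_points: list[int]) -> int:
    """Pick a server by comparing the key with sorted split points.

    Keys <= split_points[0] go to server 0, keys in
    (split_points[0], split_points[1]] go to server 1, and so on.
    """
    return bisect_left(split_points, key)


# Hypothetical split points on price: <= 20, <= 100, <= 1000, everything else.
SPLITS = [20, 100, 1000]
placement = {price: range_partition(price, SPLITS) for price in (10, 86, 570, 1123)}
```

Hash partitioning tends to spread keys evenly but scatters neighbouring keys, while range partitioning keeps neighbouring keys together at the risk of uneven server load.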

How to distribute the data? Each column is stored on a different server:
Server 1: Bike, Chair, Couch, Car, Lamp, Bike, Scooter, Hammer
Server 2: $86, $10, $570, $1123, $19, $56, $18, $8000
Server 3: 6/2/07, 6/5/07, 6/1/07, 6/1/07, 6/7/07, 6/9/07, 6/11/07, 6/11/07
Server 4: 636353, 662113, 424252, 256623, 121113, 887734, 252111, 116458

Query processing: intra-operator parallelism and inter-operator parallelism.

Parallel scanning: each server scans and filters its own partition, and the partial outputs are combined into the result.
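A parallel scan can be sketched with a thread pool standing in for the servers. The partition layout, the function names, and the price filter below are assumptions for illustration; in a real system each partition would sit on a separate machine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each inner list is the partition stored on one server.
partitions = [
    [("Bike", 86), ("Chair", 10)],
    [("Couch", 570), ("Car", 1123)],
    [("Lamp", 19), ("Bike", 56)],
    [("Scooter", 18), ("Hammer", 8000)],
]


def scan_partition(rows, predicate):
    """Scan one partition and keep the rows that pass the filter."""
    return [row for row in rows if predicate(row)]


def parallel_scan(parts, predicate):
    """Run the same scan-and-filter on every partition, then merge."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial = pool.map(lambda rows: scan_partition(rows, predicate), parts)
        return [row for part in partial for row in part]


# Keep only the items priced under 50.
cheap = parallel_scan(partitions, lambda row: row[1] < 50)
```

Because every partition is filtered independently, the work per server shrinks as servers are added; only the final merge is sequential.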

Sorting

Parallel hash join: rows of both inputs are routed to servers by Hash() of the join key.
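One way to read this slide: both inputs are repartitioned with the same hash function, so matching rows land on the same server, and each server then runs an ordinary hash join on its share. The sketch below follows that reading; the helper names, the crc32 hash, and the sample `owners` relation are assumptions, not taken from the deck.

```python
import zlib
from collections import defaultdict


def route(key, num_servers):
    """Hash a join key to a server number (crc32 is stable across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_servers


def repartition(rows, key_index, num_servers):
    """Send every row to the server chosen by the hash of its join key."""
    buckets = [[] for _ in range(num_servers)]
    for row in rows:
        buckets[route(row[key_index], num_servers)].append(row)
    return buckets


def local_hash_join(left, right, left_key, right_key):
    """Build/probe hash join on one server's share of the data."""
    table = defaultdict(list)
    for row in left:  # build phase
        table[row[left_key]].append(row)
    # probe phase: look up each right row's key in the built table
    return [l + r for r in right for l in table.get(r[right_key], [])]


def parallel_hash_join(left, right, left_key, right_key, num_servers=4):
    left_parts = repartition(left, left_key, num_servers)
    right_parts = repartition(right, right_key, num_servers)
    result = []
    # Each (left, right) pair could be joined on its own server.
    for lp, rp in zip(left_parts, right_parts):
        result.extend(local_hash_join(lp, rp, left_key, right_key))
    return result


items = [("Bike", 86), ("Chair", 10), ("Lamp", 19)]
owners = [("Bike", "Ann"), ("Lamp", "Bob"), ("Desk", "Cy")]  # hypothetical data
joined = sorted(parallel_hash_join(items, owners, 0, 0))
```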

Semi-join
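The slide gives only the title. A common reading of semi-join in a distributed setting is a way to cut communication cost: one site ships just its distinct join-key values, the other site returns only the rows that will find a partner, and the join finishes where it started. The sketch below assumes that reading; the site names, function names, and sample data are hypothetical.

```python
def project_join_keys(r_rows, r_key):
    """Step 1 (site A): ship only the distinct join-key values."""
    return {row[r_key] for row in r_rows}


def reduce_by_keys(s_rows, s_key, keys):
    """Step 2 (site B): keep only rows that will find a join partner."""
    return [row for row in s_rows if row[s_key] in keys]


def join_at_site_a(r_rows, r_key, reduced_s, s_key):
    """Step 3 (site A): join against the reduced relation shipped back."""
    return [r + s for r in r_rows for s in reduced_s if r[r_key] == s[s_key]]


site_a_items = [("Bike", 86), ("Lamp", 19)]
site_b_owners = [("Bike", "Ann"), ("Desk", "Cy"), ("Sofa", "Di"), ("Lamp", "Bob")]

keys = project_join_keys(site_a_items, 0)
shipped_back = reduce_by_keys(site_b_owners, 0, keys)
semi_joined = join_at_site_a(site_a_items, 0, shipped_back, 0)
```

Only two of the four rows at site B cross the network here, which is the point of the technique when the reduced relation is much smaller than the full one.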

Inter-operator parallelism

Updating distributed data. Synchronous: read-any-write-all. Reads are fast.
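Read-any-write-all can be sketched as follows. This is a toy in-memory model under stated assumptions: the class names and the `up` flag that simulates a disconnected replica are invented for the example, and there is no locking or networking.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.up = True  # set to False to simulate a disconnected replica


class ReadAnyWriteAll:
    def __init__(self, replicas):
        self.replicas = replicas

    def read(self, key):
        """Any single live replica can answer, so reads are cheap."""
        for replica in self.replicas:
            if replica.up:
                return replica.data.get(key)
        raise RuntimeError("no replica reachable")

    def write(self, key, value):
        """Every replica must be reachable, or the write is refused."""
        if not all(replica.up for replica in self.replicas):
            raise RuntimeError("write needs all replicas")
        for replica in self.replicas:
            replica.data[key] = value


store = ReadAnyWriteAll([Replica(), Replica(), Replica()])
store.write("Bike", 86)
store.replicas[0].up = False      # one replica drops off the network
price = store.read("Bike")        # still answered by another replica
```

The trade-off is visible in `write`: a single unreachable replica blocks every update.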

Updating distributed data. Synchronous: voting.

Updating distributed data. Synchronous: voting. Writes are tolerant to disconnection.
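Voting is usually realised with read and write quorums plus version numbers: a write updates a quorum of replicas, and a read asks a quorum and trusts the highest version it sees. The sketch below assumes that scheme; the class names, the quorum sizes, and the `up` flag are illustrative choices, not details from the slides.

```python
class VersionedReplica:
    def __init__(self):
        self.store = {}   # key -> (version, value)
        self.up = True    # set to False to simulate a disconnected replica


class QuorumStore:
    def __init__(self, replicas, read_quorum, write_quorum):
        # Every read quorum must overlap every write quorum.
        assert read_quorum + write_quorum > len(replicas)
        # Any two write quorums must overlap, so versions keep increasing.
        assert 2 * write_quorum > len(replicas)
        self.replicas = replicas
        self.read_quorum = read_quorum
        self.write_quorum = write_quorum

    def _live(self, needed):
        live = [r for r in self.replicas if r.up]
        if len(live) < needed:
            raise RuntimeError("quorum not reachable")
        return live[:needed]

    def read(self, key):
        votes = [r.store.get(key, (0, None)) for r in self._live(self.read_quorum)]
        return max(votes, key=lambda vote: vote[0])[1]  # newest version wins

    def write(self, key, value):
        targets = self._live(self.write_quorum)
        version = 1 + max(r.store.get(key, (0, None))[0] for r in targets)
        for r in targets:
            r.store[key] = (version, value)


nodes = [VersionedReplica() for _ in range(3)]
db = QuorumStore(nodes, read_quorum=2, write_quorum=2)
db.write("Bike", 86)
nodes[0].up = False          # a replica disconnects
db.write("Bike", 90)         # the write still succeeds with two of three
nodes[0].up, nodes[2].up = True, False
latest = db.read("Bike")     # stale replica 0 is outvoted by replica 1
```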

Consistency of distributed data: should provide ACID.

Primary/secondary

Two-phase commit: the coordinator sends PREPARE, the participants reply PREPARED, and the coordinator sends COMMIT.

Two-phase commit: the coordinator sends PREPARE; the replies are PREPARED and ABORT.

Two-phase commit: after PREPARE and PREPARED, an ABORT vote means the outcome is ABORT.

Two-phase commit: after PREPARE and PREPARED, a failure (X) occurs.
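The message flow on the two-phase commit slides can be sketched as below. It is a minimal in-memory model: the class and function names are assumptions, the messages are plain method calls, and the failure case (the X on the slide) is deliberately not modelled, since handling it needs logging and timeouts.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        """Phase 1: vote. A PREPARED participant promises it can commit."""
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return "PREPARED" if self.can_commit else "ABORT"

    def finish(self, decision):
        """Phase 2: apply the coordinator's decision."""
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant voted PREPARED."""
    votes = [p.prepare() for p in participants]   # send PREPARE, collect votes
    decision = "COMMIT" if all(v == "PREPARED" for v in votes) else "ABORT"
    for p in participants:                        # broadcast the decision
        p.finish(decision)
    return decision


happy = [Participant("s1"), Participant("s2")]
outcome_ok = two_phase_commit(happy)

mixed = [Participant("s1"), Participant("s2", can_commit=False)]
outcome_bad = two_phase_commit(mixed)
```

A single ABORT vote is enough to abort everywhere, which is what keeps the participants' outcomes identical.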

Conclusion: parallelism and distribution are very useful (performance, fault tolerance, scale). But complex! Rethink lots of aspects of the system. Must earn the complexity.
