Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications C. Faloutsos Distributed DB.

Slides:



Advertisements
Similar presentations
V. Megalooikonomou Distributed Databases (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.
Advertisements

CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications Faloutsos & Pavlo Lecture#11 (R&G ch. 11) Hashing.
ISOM Distributed Databases Arijit Sengupta. ISOM Learning Objectives Understand the concept and necessity of distributed databases Understand the types.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
Transaction.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Chapter 13 (Web): Distributed Databases
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Distributed Databases
ICS 421 Spring 2010 Distributed Transactions Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/16/20101Lipyeow.
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
Chapter 25 Distributed Databases and Client-Server Architectures Copyright © 2004 Pearson Education, Inc.
1 Distributed Databases Chapter Two Types of Applications that Access Distributed Databases The application accesses data at the level of SQL statements.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Distributed Database Management Systems
Overview Distributed vs. decentralized Why distributed databases
Lecture-12 Concurrency Control in Distributed Databases
Institut für Scientific Computing – Universität WienP.Brezany Optimization of Distributed Queries Univ.-Prof. Dr. Peter Brezany Institut für Scientific.
1 Distributed Databases Chapter What is a Distributed Database? Database whose relations reside on different sites Database some of whose relations.
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 17 Client-Server Processing, Parallel Database Processing,
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Distributed Databases
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo How to Scale a Database System.
Distributed databases
04/20/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos OO and OR DBMSs.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
Query Optimization. Query Optimization Query Optimization The execution cost is expressed as weighted combination of I/O, CPU and communication cost.
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Implementation of Database Systems, Jarek Gryz1 Distributed Databases Chapter 21, Part B.
Distributed Database Systems Overview
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
DISTRIBUTED COMPUTING
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Concurrency control.
1 Distributed Databases (DDBs) Chap Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Instructor: Marina Gavrilova. Outline Introduction Types of distributed databases Distributed DBMS Architectures and Storage Replication Synchronous replication.
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
Distributed Database System
CS338Parallel and Distributed Databases11-1 Parallel and Distributed Databases Lecture Topics Multi-CPU and distributed systems Monolithic system Client–server.
1 Distributed Databases Chapter 21, Part B. 2 Introduction v Data is stored at several sites, each managed by a DBMS that can run independently. v Distributed.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Relational domain calculus.
Chapter 19 Distributed Databases. 2 Distributed Database System n A distributed DBS consists of loosely coupled sites that share no physical component.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Recovery.
Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Distributed Databases (based on slides by Silberchatz,Korth, and.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
R*: An overview of the Architecture By R. Williams et al. Presented by D. Kontos Instructor : Dr. Megalooikonomou.
Chapter 12 Distributed Data Bases. Learning Objectives What a distributed database management system (DDBMS) is and what its components are How database.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
1 Distributed Databases architecture, fragmentation, allocation Lecture 1.
Distributed DBMS, Query Processing and Optimization
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Relational tuple calculus.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Indexing and Hashing – part II.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
CMS Advanced Database and Client-Server Applications Distributed Databases slides by Martin Beer and Paul Crowther Connolly and Begg Chapter 22.
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Distributed Databases and Client-Server Architectures
C. Faloutsos Concurrency control - deadlocks
Database System Implementation CSE 507
R*: An Overview of the Architecture
Distributed Database Systems
Distributed Databases
A View over Distributed databases
Distributed Databases
C. Faloutsos Transactions
Distributed Databases
Temple University – CIS Dept. CIS661 – Principles of Data Management
Presentation transcript:

Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Distributed DB

Carnegie Mellon C. Faloutsos2 General Overview - rel. model Relational model - SQL Functional Dependencies & Normalization Physical Design; Indexing Query optimization Transaction processing Advanced topics –Distributed Databases

Carnegie Mellon C. Faloutsos3 Problem – definition centralized DB: LANY CHICAGO

Carnegie Mellon C. Faloutsos4 Problem – definition Distr. DB: DB stored in many places... connected LA NY

Carnegie Mellon C. Faloutsos5 Problem – definition LA NY EMP EMPLOYEE connect to LA; exec sql select * from EMP;... connect to NY; exec sql select * from EMPLOYEE;... now: DBMS1 DBMS2

Carnegie Mellon C. Faloutsos6 Problem – definition LA NY EMP EMPLOYEE connect to distr-LA; exec sql select * from EMPL; ideally: DBMS1 DBMS2 D-DBMS

Carnegie Mellon C. Faloutsos7 Pros + Cons Pros –. Cons –.

Carnegie Mellon C. Faloutsos8 Pros + Cons Pros –Data sharing –reliability & availability –speed up of query processing Cons –software development cost –more bugs –may increase processing overhead (msg)

Carnegie Mellon C. Faloutsos9 Overview Problem – motivation Design issues Query optimization – semijoins transactions (recovery, conc. control)

Carnegie Mellon C. Faloutsos10 Design of Distr. DBMS what are our choices of storing a table?

Carnegie Mellon C. Faloutsos11 Design of Distr. DBMS replication fragmentation (horizontal; vertical; hybrid) both

Carnegie Mellon C. Faloutsos12 Design of Distr. DBMS ssnnameaddress 123smithwall str johnsonsunset blvd horiz. fragm. vertical fragm.

Carnegie Mellon C. Faloutsos13 Transparency & autonomy Issues/goals: naming and local autonomy replication and fragmentation transp. location transparency i.e.:

Carnegie Mellon C. Faloutsos14 Problem – definition LA NY EMP EMPLOYEE connect to distr-LA; exec sql select * from EMPL; ideally: DBMS1 DBMS2 D-DBMS

Carnegie Mellon C. Faloutsos15 Overview Problem – motivation Design issues Query optimization – semijoins transactions (recovery, conc. control)

Carnegie Mellon C. Faloutsos16 Distributed Query processing issues (additional to centralized q-opt) –cost of transmission –parallelism / overlap of delays (cpu, disk, #bytes-transmitted, #messages-trasmitted) minimize elapsed time? or minimize resource consumption?

Carnegie Mellon C. Faloutsos17 Distr. Q-opt – semijoins s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S2 S3 SUPPLIER Join SHIPMENT = ?

Carnegie Mellon C. Faloutsos18 semijoins choice of plans?

Carnegie Mellon C. Faloutsos19 semijoins choice of plans? plan #1: ship SHIP -> S2; join; ship -> S3 plan #2: ship SHIP->S3; ship SUP->S3; join... others?

Carnegie Mellon C. Faloutsos20 Distr. Q-opt – semijoins s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S2 S3 SUPPLIER Join SHIPMENT = ?

Carnegie Mellon C. Faloutsos21 Semijoins Idea: reduce the tables before shipping s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S3 SUPPLIER Join SHIPMENT = ?

Carnegie Mellon C. Faloutsos22 Semijoins How to do the reduction, cheaply? Eg., reduce ‘SHIPMENT’:

Carnegie Mellon C. Faloutsos23 Semijoins Idea: reduce the tables before shipping s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S3 SUPPLIER Join SHIPMENT = ? (s1,s2,s5,s11)

Carnegie Mellon C. Faloutsos24 Semijoins Formally: SHIPMENT’ = SHIPMENT SUPPLIER express semijoin w/ rel. algebra

Carnegie Mellon C. Faloutsos25 Semijoins Formally: SHIPMENT’ = SHIPMENT SUPPLIER express semijoin w/ rel. algebra

Carnegie Mellon C. Faloutsos26 Semijoins – eg: suppose each attr. is 4 bytes Q: transmission cost (#bytes) for semijoin SHIPMENT’ = SHIPMENT semijoin SUPPLIER

Carnegie Mellon C. Faloutsos27 Semijoins Idea: reduce the tables before shipping s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S3 SUPPLIER Join SHIPMENT = ? (s1,s2,s5,s11) 4 bytes

Carnegie Mellon C. Faloutsos28 Semijoins – eg: suppose each attr. is 4 bytes Q: transmission cost (#bytes) for semijoin SHIPMENT’ = SHIPMENT semijoin SUPPLIER A: 4*4 bytes

Carnegie Mellon C. Faloutsos29 Semijoins – eg: suppose each attr. is 4 bytes Q1: give a plan, with semijoin(s) Q2: estimate its cost (#bytes shipped)

Carnegie Mellon C. Faloutsos30 Semijoins – eg: A1: –reduce SHIPMENT to SHIPMENT’ –SHIPMENT’ -> S3 –SUPPLIER -> S3 –do S3 Q2: cost?

Carnegie Mellon C. Faloutsos31 Semijoins s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S3 (s1,s2,s5,s11) 4 bytes

Carnegie Mellon C. Faloutsos32 Semijoins – eg: A2: –4*4 bytes - reduce SHIPMENT to SHIPMENT’ –3*8 bytes - SHIPMENT’ -> S3 –4*8 bytes - SUPPLIER -> S3 –0 bytes - do S3 72 bytes TOTAL

Carnegie Mellon C. Faloutsos33 Other plans?

Carnegie Mellon C. Faloutsos34 Other plans? P2: reduce SHIPMENT to SHIPMENT’ reduce SUPPLIER to SUPPLIER’ SHIPMENT’ -> S3 SUPPLIER’ -> S3

Carnegie Mellon C. Faloutsos35 Other plans? P3: reduce SUPPLIER to SUPPLIER’ SUPPLIER’ -> S2 do S2 ship results -> S3

Carnegie Mellon C. Faloutsos36 A brilliant idea: ‘Bloom-joins’ (not in the book – not in the final exam) how to ship the projection, say, of SUPPLIER.s#, even cheaper? A: Bloom-filter [Lohman+] = –quick&dirty membership testing

Carnegie Mellon C. Faloutsos37 Semijoins s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S3 (s1,s2,s5,s11) 4 bytes

Carnegie Mellon C. Faloutsos38 Another brilliant idea: two-way semijoins (not in book, not in final exam) reduce both relations with one more exchange: [Kang, ’86] ship back the list of keys that didn’t match CAN NOT LOSE! (why?) further improvement: –or the list of ones that matched – whatever is shorter!

Carnegie Mellon C. Faloutsos39 Two-way Semijoins s#... s1 s2 s5 s11 S1 SUPPLIER s#p# s1p1 s2p1 s3p5 s2p9 SHIPMENT S3 (s1,s2,s5,s11) (s5,s11) S2

Carnegie Mellon C. Faloutsos40 Overview Problem – motivation Design issues Query optimization – semijoins transactions (recovery, conc. control)

Carnegie Mellon C. Faloutsos41 Transactions – recovery Problem: eg., a transaction moves $100 from NY -> $50 to LA, $50 to Chicago 3 sub-transactions, on 3 systems, with 3 W.A.L.s how to guarantee atomicity (all-or-none)? Observation: additional types of failures (links, servers, delays, time-outs....)

Carnegie Mellon C. Faloutsos42 Transactions – recovery Problem: eg., a transaction moves $100 from NY -> $50 to LA, $50 to Chicago

Carnegie Mellon C. Faloutsos43 Distributed recovery NY CHICAGO LA NY T1,1:-$100 T1,2: +$50 T1,3: +$50 How?

Carnegie Mellon C. Faloutsos44 Distributed recovery NY CHICAGO LA NY T1,1:-$100 T1,2: +$50 T1,3: +$50 Step1: choose coordinator

Carnegie Mellon C. Faloutsos45 Distributed recovery Step 2: execute a protocol, eg., “2 phase commit”

Carnegie Mellon C. Faloutsos46 2 phase commit time T1,1 (coord.)T1,2 T1,3 prepare to commit

Carnegie Mellon C. Faloutsos47 2 phase commit time T1,1 (coord.)T1,2 T1,3 prepare to commit Y Y

Carnegie Mellon C. Faloutsos48 2 phase commit time T1,1 (coord.)T1,2 T1,3 prepare to commit Y Y commit

Carnegie Mellon C. Faloutsos49 2 phase commit (eg., failure) time T1,1 (coord.)T1,2 T1,3 prepare to commit

Carnegie Mellon C. Faloutsos50 2 phase commit time T1,1 (coord.)T1,2 T1,3 prepare to commit Y N

Carnegie Mellon C. Faloutsos51 2 phase commit time T1,1 (coord.)T1,2 T1,3 prepare to commit Y N abort

Carnegie Mellon C. Faloutsos52 Distributed recovery Many, many additional details (what if the coordinator fails? what if a link fails? etc) and many other solutions (eg., 3-phase commit)

Carnegie Mellon C. Faloutsos53 Overview Problem – motivation Design issues Query optimization – semijoins transactions (recovery, conc. control)

Carnegie Mellon C. Faloutsos54 Distributed conc. control also more complicated: distributed deadlocks!

Carnegie Mellon C. Faloutsos55 Distributed deadlocks NY CHICAGO LA NY T 1,la T 2,la T 1,ny T 2,ny

Carnegie Mellon C. Faloutsos56 Distributed deadlocks LANY T 1,la T 2,la T 1,ny T 2,ny

Carnegie Mellon C. Faloutsos57 Distributed deadlocks LANY T 1,la T 2,la T 1,ny T 2,ny

Carnegie Mellon C. Faloutsos58 Distributed deadlocks LA NY T 1,la T 2,la T 1,ny T 2,ny cites need to exchange wait-for graphs clever algorithms, to reduce # messages

Carnegie Mellon C. Faloutsos59 Conclusions Distr. DBMSs: not deployed BUT: produced clever ideas: –semijoins –distributed recovery / conc. control which can be useful for –parallel db / clusters –‘active disks’ –replicated db (e-commerce servers)