CS 347: Parallel and Distributed Data Management. Notes 02: Distributed DB Design. Hector Garcia-Molina.

Presentation transcript:


Distributed DB Design
Top-down approach:
- have DB…
- how to split it and allocate it to the sites
Multi-DBs (or bottom-up): no design issues!
Chapter 5, Ozsu & Valduriez

Two issues in DDB design:
- Fragmentation
- Allocation
Note: the issues are not independent, but we will cover them separately.


Example
Employee relation E(#, name, loc, sal, …)
40% of queries: Qa: select * from E where loc=Sa and …
40% of queries: Qb: select * from E where loc=Sb and …
Motivation: two sites, Sa and Sb; Qa is directed at Sa, Qb at Sb.

It does not take a rocket scientist to figure out fragmentation...

Relation E:
  #   NM     Loc  Sal
  5   Joe    Sa   10
  7   Sally  Sb   25
  8   Tom    Sa   15

Fragmentation F:
  At Sa:
  #   NM     Loc  Sal
  5   Joe    Sa   10
  8   Tom    Sa   15

  At Sb:
  #   NM     Loc  Sal
  7   Sally  Sb   25


$F = \{F_1, F_2\}$
$F_1 = \sigma_{loc=Sa}(E)$
$F_2 = \sigma_{loc=Sb}(E)$
This is called primary horizontal fragmentation.
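A minimal Python sketch of primary horizontal fragmentation, assuming relations are lists of dictionaries. The helper `select` and the sample rows are illustrative assumptions, not part of the original notes.

```python
# Toy Employee relation E(#, NM, Loc, Sal); the rows are illustrative assumptions.
E = [
    {"id": 5, "nm": "Joe", "loc": "Sa", "sal": 10},
    {"id": 7, "nm": "Sally", "loc": "Sb", "sal": 25},
    {"id": 8, "nm": "Tom", "loc": "Sa", "sal": 15},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Each fragment is a selection on a local attribute of E.
F1 = select(E, lambda t: t["loc"] == "Sa")  # intended for site Sa
F2 = select(E, lambda t: t["loc"] == "Sb")  # intended for site Sb
```

With this split, a query that filters on loc=Sa only needs to read `F1`.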


Fragmentation
- Horizontal (relation R split by rows)
  - Primary: depends on local attributes
  - Derived: depends on foreign relation
- Vertical (relation R split by columns)
Fragmentation is also called sharding.

Three common horizontal partitioning techniques:
- Round robin
- Hash partitioning
- Range partitioning

Round robin
Tuples t1, t2, t3, t4, t5, ... of R are dealt out to disks D0, D1, D2 in turn: D0 gets t1, t4; D1 gets t2, t5; D2 gets t3.
- Evenly distributes data
- Good for scanning full relation
- Not good for point or range queries

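Hash partitioning
Each tuple of R goes to the disk (D0, D1, D2) given by hashing its key:
- t1: $h(k_1) = 2$, so t1 goes to D2
- t2: $h(k_2) = 0$, so t2 goes to D0
- t3: $h(k_3) = 0$, so t3 goes to D0
- t4: $h(k_4) = 1$, so t4 goes to D1
Good for point queries on key; also for joins.
Not good for range queries, or for point queries not on key.
If the hash function is good, the distribution is even.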

Range partitioning
Partitioning vector: $V_0 = 4$, $V_1 = 7$
- t1: A=5, so t1 goes to D1
- t2: A=8, so t2 goes to D2
- t3: A=2, so t3 goes to D0
- t4: A=3, so t4 goes to D0
Good for some range queries on A.
Need to select a good vector; otherwise: unbalance, leading to data skew, leading to execution skew.
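The three techniques can be sketched in a few lines of Python. This is a sketch under stated assumptions: disks are modeled as lists, the function names are made up for illustration, and a value equal to a boundary of the partitioning vector is sent to the higher-numbered disk (the notes do not say which side a boundary value belongs to).

```python
from bisect import bisect_right


def round_robin(tuples, n):
    """Tuple number i goes to disk i mod n."""
    disks = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        disks[i % n].append(t)
    return disks


def hash_partition(tuples, n, key, h=hash):
    """A tuple goes to disk h(key) mod n."""
    disks = [[] for _ in range(n)]
    for t in tuples:
        disks[h(key(t)) % n].append(t)
    return disks


def range_partition(tuples, vector, key):
    """A sorted partitioning vector with k entries defines k + 1 disks.

    Assumption: a value equal to a boundary goes to the higher disk.
    """
    disks = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        disks[bisect_right(vector, key(t))].append(t)
    return disks


# The range example from the notes: A values 5, 8, 2, 3 with vector (4, 7).
by_range = range_partition([5, 8, 2, 3], [4, 7], key=lambda a: a)
by_turn = round_robin(["t1", "t2", "t3", "t4", "t5"], 3)
by_hash = hash_partition([0, 1, 2, 3, 4, 5], 3, key=lambda k: k, h=lambda k: k)
```

A point query on the key touches one disk under `hash_partition`, while a range query on A touches only the disks whose ranges overlap the query under `range_partition`.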


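Which are good fragmentations?
Example: $F = \{F_1, F_2\}$
$F_1 = \sigma_{sal < \ldots}(E)$, $F_2 = \sigma_{sal > 20}(E)$
Problem: some tuples are lost! (Tuples whose sal falls between the two bounds belong to neither fragment.)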


Which are good fragmentations?
Second example: $F = \{F_3, F_4\}$
$F_3 = \sigma_{sal < 10}(E)$, $F_4 = \sigma_{sal > 5}(E)$
Tuples with $5 < sal < 10$ are duplicated...

Prefer to deal with replication explicitly.
Example: $F = \{F_5, F_6, F_7\}$
$F_5 = \sigma_{sal \le 5}(E)$
$F_6 = \sigma_{5 < sal < 10}(E)$
$F_7 = \sigma_{sal \ge 10}(E)$
Then replicate $F_6$ if convenient (part of the allocation problem).

Desired properties for horizontal fragmentation
$R \Rightarrow F = \{F_1, F_2, \ldots\}$
(1) Completeness: $\forall t \in R,\ \exists F_i \in F$ such that $t \in F_i$

(2) Disjointness: $\forall t \in F_i,\ \lnot\exists F_j$ such that $t \in F_j$, $i \ne j$, $F_i, F_j \in F$
(3) Reconstruction - ignore
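Both properties can be checked mechanically on sample data. The following Python sketch assumes tuples are hashable and fragments are lists; the function names, the sample relation, and the thresholds are illustrative assumptions.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of the relation is in at least one fragment."""
    return all(any(t in f for f in fragments) for t in relation)


def is_disjoint(fragments):
    """Disjointness: no tuple appears in two different fragments."""
    for i, fi in enumerate(fragments):
        for fj in fragments[i + 1:]:
            if set(fi) & set(fj):
                return False
    return True


# Toy relation of (id, sal) pairs; values are illustrative assumptions.
R = [(1, 3), (2, 7), (3, 12), (4, 25)]

gap = [[t for t in R if t[1] < 10], [t for t in R if t[1] > 20]]
overlap = [[t for t in R if t[1] < 10], [t for t in R if t[1] > 5]]
good = [[t for t in R if t[1] < 10], [t for t in R if t[1] >= 10]]

results = {
    "gap": (is_complete(R, gap), is_disjoint(gap)),
    "overlap": (is_complete(R, overlap), is_disjoint(overlap)),
    "good": (is_complete(R, good), is_disjoint(good)),
}
```

A check on sample data can only refute a fragmentation; proving the properties for all possible instances requires reasoning about the predicates themselves.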

How do we get completeness and disjointness?
(1) Check it "manually"!
e.g., $F_1 = \sigma_{sal < 10}(E)$; $F_2 = \sigma_{sal \ge 10}(E)$

How do we get completeness and disjointness?
(2) "Automatically" generate fragments with these properties:
Desired simple predicates $\rightarrow$ Fragments

Example of generation
Say queries use predicates: $A < 10$, $A > 5$, $Loc = S_A$, $Loc = S_B$
Next:
- generate "minterm" predicates
- eliminate useless ones


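Minterm predicates (part I)
(1) $A<10 \land A>5 \land Loc=S_A \land Loc=S_B$
(2) $A<10 \land A>5 \land Loc=S_A \land \lnot(Loc=S_B)$
(3) $A<10 \land A>5 \land \lnot(Loc=S_A) \land Loc=S_B$
(4) $A<10 \land A>5 \land \lnot(Loc=S_A) \land \lnot(Loc=S_B)$
(5) $A<10 \land \lnot(A>5) \land Loc=S_A \land Loc=S_B$
(6) $A<10 \land \lnot(A>5) \land Loc=S_A \land \lnot(Loc=S_B)$
(7) $A<10 \land \lnot(A>5) \land \lnot(Loc=S_A) \land Loc=S_B$
(8) $A<10 \land \lnot(A>5) \land \lnot(Loc=S_A) \land \lnot(Loc=S_B)$
In (1)-(4) the conditions on A reduce to $5 < A < 10$; in (5)-(8) they reduce to $A \le 5$.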


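Minterm predicates (part II)
(9) $\lnot(A<10) \land A>5 \land Loc=S_A \land Loc=S_B$
(10) $\lnot(A<10) \land A>5 \land Loc=S_A \land \lnot(Loc=S_B)$
(11) $\lnot(A<10) \land A>5 \land \lnot(Loc=S_A) \land Loc=S_B$
(12) $\lnot(A<10) \land A>5 \land \lnot(Loc=S_A) \land \lnot(Loc=S_B)$
(13) $\lnot(A<10) \land \lnot(A>5) \land Loc=S_A \land Loc=S_B$
(14) $\lnot(A<10) \land \lnot(A>5) \land Loc=S_A \land \lnot(Loc=S_B)$
(15) $\lnot(A<10) \land \lnot(A>5) \land \lnot(Loc=S_A) \land Loc=S_B$
(16) $\lnot(A<10) \land \lnot(A>5) \land \lnot(Loc=S_A) \land \lnot(Loc=S_B)$
In (9)-(12) the conditions on A reduce to $A \ge 10$. Predicates (13)-(16) would need $A \ge 10$ and $A \le 5$ at once, so they select no tuples.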

Final fragments:
F2: $5 < A < 10 \land Loc=S_A$
F3: $5 < A < 10 \land Loc=S_B$
F6: $A \le 5 \land Loc=S_A$
F7: $A \le 5 \land Loc=S_B$
F10: $A \ge 10 \land Loc=S_A$
F11: $A \ge 10 \land Loc=S_B$

Note: elimination of useless fragments depends on application semantics.
E.g., if Loc could take values other than $S_A$ and $S_B$, we need to add fragments:
F4: $5 < A < 10 \land Loc \ne S_A \land Loc \ne S_B$
F8: $A \le 5 \land Loc \ne S_A \land Loc \ne S_B$
F12: $A \ge 10 \land Loc \ne S_A \land Loc \ne S_B$

Why does this work?
Predicates:
$p_1 \land p_2 \land p_3 \land p_4$
$p_1 \land p_2 \land p_3 \land \lnot p_4$
...
$\lnot p_1 \land \lnot p_2 \land \lnot p_3 \land \lnot p_4$

(1) Completeness: take $t \in R$. Each $p_i(t)$ must be T or F!
Say $p_1(t) = T$, $p_2(t) = T$, $p_3(t) = F$, $p_4(t) = F$.
Then t is in the fragment with predicate $p_1 \land p_2 \land \lnot p_3 \land \lnot p_4$.

(2) Disjointness: say $t \in$ fragment $p_1 \land p_2 \land \lnot p_3 \land \lnot p_4$.
Then $p_1(t) = T$, $p_2(t) = T$, $p_3(t) = F$, $p_4(t) = F$,
so t cannot be in any other fragment!

Summary
Given simple predicates $P_r = \{p_1, p_2, \ldots, p_m\}$, the minterm predicates are
$M = \{\, m \mid m = \bigwedge_{p_k \in P_r} p_k^*,\ 1 \le k \le m \,\}$
where $p_k^*$ is $p_k$ or $\lnot p_k$.
The fragments $\sigma_m(R)$ for all $m \in M$ are complete and disjoint.
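The generate-then-eliminate procedure can be sketched in Python. One loud assumption: this sketch drops a minterm when it selects no tuple of the sample relation, which is a data-driven stand-in for the semantic elimination described in the notes. The function name and the sample data are illustrative.

```python
from itertools import product


def minterm_fragments(relation, simple_predicates):
    """Build one fragment per minterm (each predicate either holds or not).

    Returns a dict from a tuple of truth values (one per simple predicate)
    to the tuples selected by that minterm. Empty fragments are dropped.
    """
    fragments = {}
    for signs in product([True, False], repeat=len(simple_predicates)):
        selected = [
            t for t in relation
            if all(p(t) == s for p, s in zip(simple_predicates, signs))
        ]
        if selected:
            fragments[signs] = selected
    return fragments


# The four simple predicates of the example: A<10, A>5, Loc=SA, Loc=SB.
preds = [
    lambda t: t["A"] < 10,
    lambda t: t["A"] > 5,
    lambda t: t["loc"] == "SA",
    lambda t: t["loc"] == "SB",
]
# Sample tuples (an assumption) covering each range of A at both sites.
R = [{"A": a, "loc": loc} for a in (3, 7, 12) for loc in ("SA", "SB")]
fragments = minterm_fragments(R, preds)
```

Every tuple has exactly one truth-value signature, so it lands in exactly one fragment, which is the completeness and disjointness argument in code form. Of the 16 minterms, only 6 survive on this data.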

Another desired fragmentation property: match access patterns
If data A, data B, and data C are frequently accessed together, try to place them in the same fragment.

Return to example: E(#, NM, LOC, SAL, …)
Common queries:
Qa: select * from E where LOC=Sa and …
Qb: select * from E where LOC=Sb and …

Three choices:
(1) $P_r = \{\,\}$: $F_1 = \{ E \}$
(2) $P_r = \{LOC=Sa, LOC=Sb\}$: $F_2 = \{ \sigma_{loc=Sa}(E),\ \sigma_{loc=Sb}(E) \}$
(3) $P_r = \{LOC=Sa, LOC=Sb, Sal<10\}$: $F_3 = \{ \sigma_{loc=Sa \land sal<10}(E),\ \sigma_{loc=Sa \land sal \ge 10}(E),\ \sigma_{loc=Sb \land sal<10}(E),\ \sigma_{loc=Sb \land sal \ge 10}(E) \}$


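In other words:
- $F_1$: E as a single piece
- $F_2$: two pieces, $Loc=Sa$ and $Loc=Sb$
- $F_3$: four pieces, $Loc=Sa \land sal<10$, $Loc=Sa \land sal \ge 10$, $Loc=Sb \land sal<10$, $Loc=Sb \land sal \ge 10$
Qa: select … loc = Sa ...
Qb: select … loc = Sb ...
$F_2$ is good (not $F_1$, $F_3$): its pieces match the loc predicates of Qa and Qb.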

Derived horizontal fragmentation
Example:
E(#, NM, SAL, LOC), F = {E1, E2} by LOC
J(#, DES, …)
Common query for project: [Given employee name, list projects (s)he works in]


E1 is at Sa and E2 is at Sb. J is split into J1 and J2 to follow them:
$J_1 = J \ltimes E_1$
$J_2 = J \ltimes E_2$

Derived horizontal fragmentation
$R,\ F = \{F_1, F_2, \ldots, F_n\} \;\Rightarrow\; S,\ D = \{D_1, D_2, \ldots, D_n\}$ where $D_i = S \ltimes F_i$
Convention: R is the owner, S is the member.
F could be primary or derived.

Checking completeness and disjointness of derived fragmentation
Example: say J contains a tuple with # = 33, but there is no # = 33 in E1 nor in E2!
This J tuple will be in neither J1 nor J2.
The fragmentation is not complete.

To get completeness, we need to enforce a referential integrity constraint: every join attribute (#) value of the member relation must appear as a join attribute (#) value of the owner relation.

Example (figure showing E1, E2, J, J1, J2): if the same # value occurs in both E1 and E2, the J tuples with that # land in both J1 and J2.
Fragmentation is not disjoint!

To get disjointness, the join attribute (#) should be a key of the owner relation.
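A small Python sketch of derived fragmentation by semijoin, together with the two failure cases. The `semijoin` helper, the `des` values, and the extra owner row used to break the key property are illustrative assumptions.

```python
def semijoin(member, owner_fragment, attr="id"):
    """Keep the member tuples whose join attribute appears in the owner fragment."""
    owner_keys = {t[attr] for t in owner_fragment}
    return [t for t in member if t[attr] in owner_keys]


# Owner E fragmented by location (primary fragmentation).
E1 = [{"id": 5, "loc": "Sa"}, {"id": 8, "loc": "Sa"}]
E2 = [{"id": 7, "loc": "Sb"}]

# Member J; the tuple with id 33 has no matching owner tuple.
J = [
    {"id": 5, "des": "d1"},
    {"id": 7, "des": "d2"},
    {"id": 8, "des": "d3"},
    {"id": 33, "des": "d4"},
]

J1 = semijoin(J, E1)
J2 = semijoin(J, E2)

# Completeness fails when referential integrity is violated.
placed = {t["id"] for t in J1 + J2}
lost = [t["id"] for t in J if t["id"] not in placed]

# Disjointness fails when the join attribute is not a key of the owner:
# here id 5 (hypothetically) also appears in the fragment at Sb.
E2_not_key = E2 + [{"id": 5, "loc": "Sb"}]
J2_overlapping = semijoin(J, E2_not_key)
duplicated = {t["id"] for t in J1} & {t["id"] for t in J2_overlapping}
```

The two conditions in the notes map directly onto `lost` being empty (referential integrity) and `duplicated` being empty (join attribute is a key of the owner).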

Summary: horizontal fragmentation
- Type: primary, derived
- Properties: completeness, disjointness

Vertical fragmentation
Example (figure): E is split by columns into E1 and E2.

$R[T] \Rightarrow R_1[T_1], \ldots, R_n[T_n]$, with $T_i \subseteq T$
Just like normalization of relations...

Properties: $R[T] \Rightarrow R_i[T_i]$
(1) Completeness: $\bigcup_{\text{all } i} T_i = T$


(2) Disjointness: $T_i \cap T_j = \emptyset$ for all $i, j$ with $i \ne j$
Example: E(#, LOC, SAL) $\Rightarrow$ E1(#, LOC), E2(SAL)
Not a desirable property!! (We could not reconstruct R!)

(3) Lossless join: $\bowtie_{\text{all } i} R_i = R$
One way to achieve lossless join: repeat the key in all fragments, i.e., $Key \subseteq T_i$ for all i.
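The effect of repeating the key can be seen on a toy relation. This Python sketch assumes rows are dictionaries and uses simple set-semantics `project` and `natural_join` helpers written for illustration; the sample rows are assumptions.

```python
def project(relation, attrs):
    """Projection with duplicate elimination."""
    out = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def natural_join(r, s):
    """Join rows that agree on all shared attributes (cross product if none)."""
    joined = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                row = {**a, **b}
                if row not in joined:
                    joined.append(row)
    return joined


E = [
    {"id": 5, "loc": "Sa", "sal": 10},
    {"id": 7, "loc": "Sb", "sal": 25},
    {"id": 8, "loc": "Sa", "sal": 15},
]

# Key repeated in both fragments: the join gives E back.
with_key = natural_join(project(E, ["id", "loc"]), project(E, ["id", "sal"]))

# Disjoint attribute sets: no shared attribute, so the join is a cross product.
without_key = natural_join(project(E, ["id", "loc"]), project(E, ["sal"]))
```

Without the shared key the join pairs every (id, loc) with every sal, so the original association between employees and salaries cannot be recovered.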

How do we decide which attributes are grouped together?
Example: E(#, NM, LOC, SAL)
- E1(#, NM, LOC), E2(#, SAL)?
- E1(#, NM), E2(#, LOC), E3(#, SAL)?


Attribute affinity matrix (figure: rows and columns A1, A2, A3, A4, A5)
Resulting fragments: R1[K, A1, A2, A3], R2[K, A4, A5]

Textbook (Ozsu & Valduriez) discusses:
- How to build affinity matrix
- How to identify attribute clusters
- How to partition relation
You are not responsible for:
- Clustering and partitioning algorithms (i.e., skip pages …)

Allocation
Example: E(#, NM, LOC, SAL) $\Rightarrow$ $F_1 = \sigma_{loc=Sa}(E)$; $F_2 = \sigma_{loc=Sb}(E)$
At Site a: Qa: select … where loc=Sa ...
At Site b: Qb: select … where loc=Sb …
Where do $F_1$, $F_2$ go?

Issues
- Where do queries originate?
- What is the communication cost? And the size of answers, relations, …?
- What is the storage capacity and cost at sites? And the size of fragments?
- What is the processing power at sites?

More issues
What is the query processing strategy?
- How are joins done?
- Where are answers collected?

Do we replicate fragments?
- Cost of updating copies?
- Writes and concurrency control?
- ...


Optimization problem:
What is the best placement of fragments and/or best number of copies to:
- minimize query response time
- maximize throughput
- minimize "some cost"
- ...
Subject to constraints?
- Available storage
- Available bandwidth, power, …
- Keep 90% of response time below X
- ...
This is an incredibly hard problem.

Example: single fragment F
Read cost: $\sum_{i=1}^{m} \left[ t_i \times \min_{j} C_{ij} \right]$
- $i$: originating site of request
- $t_i$: read traffic at $S_i$
- $C_{ij}$: retrieval cost of accessing fragment F at $S_j$ from $S_i$

Scenario - read cost (figure): site i issues a stream of read requests for F at $t_i$ req/sec. The sites that hold a copy of F are reached at costs $c_{i,1}$, $c_{i,2}$, $c_{i,3}$; sites without a copy have C = inf.

Write cost: $\sum_{i=1}^{m} \sum_{j=1}^{m} X_j \, u_i \, C'_{ij}$
- $i$: originating site of request
- $j$: site being updated
- $X_j$: 0 if F not stored at $S_j$, 1 if F stored at $S_j$
- $u_i$: write traffic at $S_i$
- $C'_{ij}$: write cost of updating F at $S_j$ from $S_i$

Scenario - write cost (figure): site i issues updates at $u_i$ updates/sec, and each update goes to every site that stores a copy of F.

Storage cost: $\sum_{i=1}^{m} X_i \, d_i$
- $X_i$: 0 if F not stored at $S_i$, 1 if F stored at $S_i$
- $d_i$: storage cost at $S_i$

Target function:
$\min \left\{ \sum_{i=1}^{m} \left[ t_i \times \min_{j} C_{ij} + \sum_{j=1}^{m} X_j \times u_i \times C'_{ij} \right] + \sum_{i=1}^{m} X_i \times d_i \right\}$
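For a handful of sites the target function can be evaluated by brute force. The Python sketch below assumes the inner minimum ranges only over sites that hold a copy of F (matching the C = inf convention for sites without one) and that at least one copy must exist; the function names and all numeric values are illustrative assumptions.

```python
from itertools import product


def total_cost(X, t, u, C, Cw, d):
    """Read + write + storage cost of placement X (X[j] = 1 if F is at site j)."""
    m = len(X)
    holders = [j for j in range(m) if X[j]]
    read = sum(t[i] * min(C[i][j] for j in holders) for i in range(m))
    write = sum(X[j] * u[i] * Cw[i][j] for i in range(m) for j in range(m))
    storage = sum(X[i] * d[i] for i in range(m))
    return read + write + storage


def best_placement(t, u, C, Cw, d):
    """Try every non-empty set of sites; exponential in the number of sites."""
    m = len(t)
    candidates = [X for X in product([0, 1], repeat=m) if any(X)]
    return min(candidates, key=lambda X: total_cost(X, t, u, C, Cw, d))


# Two sites; site 0 reads heavily, both sites write a little.
t = [10, 1]                 # read traffic per site
u = [1, 1]                  # write traffic per site
C = [[0, 5], [5, 0]]        # C[i][j]: read cost at site j from site i
Cw = [[1, 6], [6, 1]]       # Cw[i][j]: write cost at site j from site i
d = [2, 2]                  # storage cost per site

best = best_placement(t, u, C, Cw, d)
best_cost = total_cost(best, t, u, C, Cw, d)
both_cost = total_cost((1, 1), t, u, C, Cw, d)
```

Here a single copy at the read-heavy site wins: replicating to both sites removes all remote reads but doubles the write and storage terms.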

Can add more complications. Examples:
- Multiple fragments
- Fragment sizes
- Concurrency control cost

Case Study: PNUTS
"Where in the World is My Data?" Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, Hector Garcia-Molina; VLDB 2011
Distributed object/tuple store for Yahoo!

Case Study: PNUTS
- Issue: where to locate data
- Issue: what and where to replicate

PNUTS Discussion
- Dynamic vs. static fragment placement
- Caching vs. replication

Policy Constraints
- MIN_COPIES: the minimum number of full replicas of the record that must exist.
- INCL_LIST: an inclusion list, i.e., the locations where a full replica of the record must exist.
- EXCL_LIST: an exclusion list, i.e., the locations where a full replica of the record cannot exist.

Example Rule
Rule 1: IF TABLE_NAME = "Users" THEN SET 'MIN_COPIES' = 2 CONSTRAINT_PRI = 0

Another Example Rule
Rule 2: IF TABLE_NAME = "Users" AND FIELD STR('home location') = 'France' THEN SET 'MIN_COPIES' = 3 AND SET 'EXCL_LIST' = 'USWest, USEast' CONSTRAINT_PRI = 1
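To make the three constraints concrete, here is a minimal Python sketch that checks whether a set of replica locations satisfies them. It is not PNUTS code: the function name and the site names other than USWest and USEast are hypothetical, and rule priorities (CONSTRAINT_PRI) are not modeled.

```python
def satisfies_policy(replica_sites, min_copies=1, incl_list=(), excl_list=()):
    """Check a placement of full replicas against the three constraints."""
    sites = set(replica_sites)
    enough_copies = len(sites) >= min_copies       # MIN_COPIES
    required_present = set(incl_list) <= sites     # INCL_LIST
    forbidden_absent = sites.isdisjoint(excl_list)  # EXCL_LIST
    return enough_copies and required_present and forbidden_absent


# The constraints that Rule 2 sets for a user whose home location is France.
rule2 = {"min_copies": 3, "excl_list": ["USWest", "USEast"]}

ok = satisfies_policy(["Europe", "Asia", "India"], **rule2)
uses_excluded = satisfies_policy(["Europe", "Asia", "USWest"], **rule2)
too_few = satisfies_policy(["Europe", "Asia"], **rule2)
```

A placement passes only if all three checks hold, so a third copy in an excluded region does not help satisfy MIN_COPIES.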

Summary
- Description of fragmentation
- Good fragmentations
- Design of fragmentation
- Allocation