ICS 214B: Transaction Processing and Distributed Data Management. Lecture 9: Fragmentation and Distributed Query Processing. Professor Chen Li.

Presentation transcript:

1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li

ICS214BNotes 092 Which simple predicates should we use in Pr? Desired property of Pr: - minimality - uniformity

ICS214BNotes 093 Return to example: E(#, NM, LOC, SAL,…) Common queries:
Qa: select * from E where LOC=Sa and …
Qb: select * from E where LOC=Sb and ...

ICS214BNotes 094 Three choices:
(1) Pr = { }, F1 = { E }
(2) Pr = {LOC=Sa, LOC=Sb}, F2 = { σ_{loc=Sa}(E), σ_{loc=Sb}(E) }
(3) Pr = {LOC=Sa, LOC=Sb, Sal<10}, F3 = { σ_{loc=Sa ∧ sal<10}(E), σ_{loc=Sa ∧ sal≥10}(E), σ_{loc=Sb ∧ sal<10}(E), σ_{loc=Sb ∧ sal≥10}(E) }
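
The three choices can be reproduced with a minimal Python sketch that groups tuples by the truth value of each simple predicate, so that each group corresponds to one minterm predicate. The sample rows of E, the dictionary keys, and the function name `minterm_fragments` are assumptions made for illustration; they do not come from the lecture.

```python
# Hypothetical sample of E(#, NM, LOC, SAL); the values are made up for illustration.
E = [
    {"no": 1, "nm": "Ann", "loc": "Sa", "sal": 5},
    {"no": 2, "nm": "Bob", "loc": "Sa", "sal": 20},
    {"no": 3, "nm": "Cy", "loc": "Sb", "sal": 8},
    {"no": 4, "nm": "Di", "loc": "Sb", "sal": 15},
]


def minterm_fragments(tuples, predicates):
    """Group tuples by the truth value of every simple predicate.

    Each distinct truth-value vector corresponds to one minterm predicate,
    so each group is one fragment. Minterms that no tuple satisfies
    simply never show up.
    """
    fragments = {}
    for t in tuples:
        signature = tuple(p(t) for p in predicates)
        fragments.setdefault(signature, []).append(t)
    return fragments


# Simple predicates from the slide.
loc_sa = lambda t: t["loc"] == "Sa"
loc_sb = lambda t: t["loc"] == "Sb"
sal_lt_10 = lambda t: t["sal"] < 10

F1 = minterm_fragments(E, [])                           # choice (1)
F2 = minterm_fragments(E, [loc_sa, loc_sb])             # choice (2)
F3 = minterm_fragments(E, [loc_sa, loc_sb, sal_lt_10])  # choice (3)

print(len(F1), len(F2), len(F3))  # 1 2 4
```

With this sample, choice (1) yields one fragment, choice (2) two, and choice (3) four, matching the fragment counts listed on the slide.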

ICS214BNotes 095 In other words: [Figure: E drawn as four regions, Loc=Sa ∧ sal<10, Loc=Sa ∧ sal≥10, Loc=Sb ∧ sal<10 and Loc=Sb ∧ sal≥10, with the fragmentations F1, F2 and F3 marked.] Qa: Select … loc = Sa ... Qb: Select … loc = Sb ... F2 is good… (not F1, F3)

ICS214BNotes 096 Informal definition: Set of predicates Pr is uniform if: for every Fi ∈ F[Pr], every t ∈ Fi has equal probability of access by every major application. Note: F[Pr] is the fragmentation defined by the minterm predicates generated by Pr.

ICS214BNotes 097 Back to example: [Figure: the four regions Loc=Sa ∧ sal<10, Loc=Sa ∧ sal≥10, Loc=Sb ∧ sal<10 and Loc=Sb ∧ sal≥10, with F1 marked.] Qa: Select … loc = Sa ... Qb: Select … loc = Sb ... Within F1, tuples in some regions have higher probability of access and tuples in others have lower probability of access, so F1 is not “good”...

ICS214BNotes 098 Back to example: [Figure: the same four regions, with F2 marked.] Qa: Select … loc = Sa ... Qb: Select … loc = Sb ... Within each fragment of F2, tuples have the same probability of access, so F2 is “good”... so is F3...

ICS214BNotes 099 Informal definition: Set of predicates Pr is minimal if no Pr' ⊂ Pr is uniform

ICS214BNotes 0910 Back to example: (1) Pr = { } N (2) Pr = {LOC=Sa, LOC=Sb} Y (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} N uniform? Pr(2) is a subset of Pr(3), so Pr(3) is not minimal...

ICS214BNotes 0911 Is P r uniform and minimal a good thing? Not necessarily! But it does simplify allocation problem...

ICS214BNotes 0912 Derived horizontal fragmentation: E(ENO, NAME, SAL, LOC) (Owner), J(ENO, DESCRIPTION,…) (Member). Common query: Given an employee name, list the projects (s)he works in. E ⇒ F = { E1, E2 } by LOC

ICS214BNotes 0913 [Figure: E1 (at Sa), E2 (at Sb), and the member relation J]

ICS214BNotes 0914 [Figure: E1 (at Sa) with J1, E2 (at Sb) with J2] J1 = J ⋉ E1, J2 = J ⋉ E2

ICS214BNotes 0915 Derived horizontal fragmentation: R, F = { F1, F2, ..., Fn } ⇒ S, D = { D1, D2, …, Dn } where Di = S ⋉ Fi. Convention: R is owner, S is member. F could be primary or derived
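
A minimal Python sketch of derived horizontal fragmentation follows. It assumes the operator lost from the slide is a semijoin, that is, each Di keeps the member tuples that join with some tuple of Fi. The sample rows and the helper name `semijoin` are hypothetical.

```python
def semijoin(member, owner_fragment, attr):
    """Return the member tuples whose join attribute matches some owner tuple."""
    owner_values = {t[attr] for t in owner_fragment}
    return [t for t in member if t[attr] in owner_values]


# Hypothetical data: E is the owner, J the member, ENO the join attribute.
E = [
    {"eno": 1, "name": "Ann", "loc": "Sa"},
    {"eno": 2, "name": "Bob", "loc": "Sb"},
]
J = [
    {"eno": 1, "description": "design"},
    {"eno": 2, "description": "testing"},
    {"eno": 1, "description": "review"},
]

# Primary horizontal fragmentation of the owner by LOC.
E1 = [t for t in E if t["loc"] == "Sa"]
E2 = [t for t in E if t["loc"] == "Sb"]

# Derived fragments of the member: each J tuple follows its employee.
J1 = semijoin(J, E1, "eno")
J2 = semijoin(J, E2, "eno")

print([t["description"] for t in J1])  # ['design', 'review']
print([t["description"] for t in J2])  # ['testing']
```

Placing J1 with E1 and J2 with E2 lets the common query (projects of a given employee) run at a single site.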

ICS214BNotes 0916 Checking completeness and disjointness of derived fragmentation. Example: Say J is: [Table: J, including a tuple with # = 33] But no # = 33 in E1 nor in E2! ⇒ This J tuple will not be in J1 nor J2 ⇒ Fragmentation not complete

ICS214BNotes 0917 To get completeness: Need to enforce referential integrity constraint: every join attribute (#) value of the member relation must appear as a join attribute (#) value of the owner relation

ICS214BNotes 0918 Example: [Figure: tables E1, E2, J, J1 and J2] Fragmentation is not disjoint!

ICS214BNotes 0919 To get disjointness: Join attribute(#) should be key of owner relation
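
Both conditions can be checked mechanically. The sketch below is one possible way to do it, again assuming the derived fragments are semijoins; the data, including the dangling employee number 33 and the duplicated employee number 1, is invented to trigger each failure.

```python
def semijoin(member, owner_fragment, attr):
    """Return the member tuples whose join attribute matches some owner tuple."""
    owner_values = {t[attr] for t in owner_fragment}
    return [t for t in member if t[attr] in owner_values]


def is_complete(member, fragments):
    """Every member tuple lands in at least one derived fragment."""
    return all(any(t in f for f in fragments) for t in member)


def is_disjoint(member, fragments):
    """No member tuple lands in more than one derived fragment."""
    return all(sum(t in f for f in fragments) <= 1 for t in member)


# Case 1: dangling reference. eno 33 is in neither owner fragment,
# so referential integrity is violated and completeness fails.
E1 = [{"eno": 1, "loc": "Sa"}]
E2 = [{"eno": 2, "loc": "Sb"}]
J_dangling = [
    {"eno": 1, "description": "design"},
    {"eno": 33, "description": "audit"},
]
frags = [semijoin(J_dangling, owner, "eno") for owner in (E1, E2)]
complete_with_dangling = is_complete(J_dangling, frags)

# Case 2: join attribute is not a key of the owner. eno 1 appears in
# both owner fragments, so its J tuple is copied into both J1 and J2.
E1_dup = [{"eno": 1, "loc": "Sa"}]
E2_dup = [{"eno": 1, "loc": "Sb"}, {"eno": 2, "loc": "Sb"}]
J_ok = [
    {"eno": 1, "description": "design"},
    {"eno": 2, "description": "testing"},
]
frags_dup = [semijoin(J_ok, owner, "eno") for owner in (E1_dup, E2_dup)]
disjoint_with_dup = is_disjoint(J_ok, frags_dup)

print(complete_with_dangling, disjoint_with_dup)  # False False
```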

ICS214BNotes 0920 Summary: horizontal fragmentation Type: primary, derived Properties: completeness, disjointness Predicates: minimal, uniform

ICS214BNotes 0921 Vertical fragmentation. Example: [Figure: relation E split by columns into E1 and E2]

ICS214BNotes 0922 R[T] ⇒ R1[T1], …, Rn[Tn], with Ti ⊆ T. Just like normalization of relations...

ICS214BNotes 0923 Properties: R[T] ⇒ Ri[Ti] (1) Completeness: ∪_{all i} Ti = T

ICS214BNotes 0924 (2) Disjointness: Ti ∩ Tj = ∅ for all i, j with i ≠ j. Example: E(#,LOC,SAL) ⇒ E1(#,LOC), E2(SAL). Not a desirable property!! (could not reconstruct R!)

ICS214BNotes 0925 (3) Reconstruction: Lossless join: ⋈_{all i} Ri = R. One way to achieve lossless join: Repeat key in all fragments, i.e., Key ⊆ Ti for all i
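
The difference between a disjoint split and a split that repeats the key can be seen on a two-row relation. The following is a small sketch with hypothetical data and helper names (`project`, `natural_join`, `same_relation`); rows are plain dictionaries rather than a real DBMS representation.

```python
def project(relation, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen = []
    for row in relation:
        p = {a: row[a] for a in attrs}
        if p not in seen:
            seen.append(p)
    return seen


def natural_join(r, s):
    """Natural join on all shared attributes (a cross product if none are shared)."""
    out = []
    for a in r:
        for b in s:
            shared = set(a) & set(b)
            if all(a[k] == b[k] for k in shared):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out


def same_relation(r, s):
    """Compare two duplicate-free relations as sets of rows."""
    return len(r) == len(s) and all(t in s for t in r)


# Hypothetical instance of E(#, LOC, SAL) with key "no".
E = [
    {"no": 1, "loc": "Sa", "sal": 10},
    {"no": 2, "loc": "Sb", "sal": 20},
]

# Disjoint fragments E1(#, LOC), E2(SAL): no shared attribute, so the
# join degenerates to a cross product with spurious rows.
lossy = natural_join(project(E, ["no", "loc"]), project(E, ["sal"]))

# Key repeated in both fragments E1(#, LOC), E2(#, SAL): the join is lossless.
lossless = natural_join(project(E, ["no", "loc"]), project(E, ["no", "sal"]))

print(len(lossy), same_relation(lossy, E))        # 4 False
print(len(lossless), same_relation(lossless, E))  # 2 True
```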

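ICS214BNotes 0926 Hybrid Fragmentation [Figure: tree with R at the root, R1 and R2 below it, and R11, R12, R21, R22 at the leaves; the levels are labelled Horizontal and Vertical]

ICS214BNotes 0927 Hybrid Fragmentation -- Reconstruction [Figure: reconstruction tree over R11, R12, R21, R22, with levels labelled Horizontal and Vertical and a ∪ at the root]

ICS214BNotes 0928 Allocation Example: E(#,NM,LOC,SAL) ⇒ F1 = σ_{loc=Sa}(E); F2 = σ_{loc=Sb}(E) Qa: select … where loc=Sa... Qb: select … where loc=Sb… [Figure: Site a and Site b] Where do F1, F2 go?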

ICS214BNotes 0929 Issues Where do queries originate? What is communication cost? and size of answers, relations,… What is storage capacity, cost at sites? and size of fragments? What is processing power at sites? What is query processing strategy? –How are joins done? –Where are answers collected?

ICS214BNotes 0930 Do we replicate fragments? Cost of updating copies? Writes and concurrency control? ...

ICS214BNotes 0931 Optimization problem: What is best placement of fragments and/or best number of copies to: –minimize query response time –maximize throughput –minimize “some cost” –... Subject to constraints? –Available storage –Available bandwidth, power,… –Keep 90% of response time below X –... Often, can use common sense –Place fragments where they are most heavily accessed This is an incredibly hard problem
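
The "common sense" rule on this slide can be written as a tiny greedy heuristic. This is only a sketch: the access counts and the names `access_counts` and `place_by_heaviest_access` are invented, and the heuristic ignores storage limits, replication, and update costs, which the full optimization problem would have to weigh.

```python
# Hypothetical access counts: how often queries issued at each site
# touch each fragment.
access_counts = {
    "F1": {"site_a": 90, "site_b": 10},
    "F2": {"site_a": 5, "site_b": 70},
}


def place_by_heaviest_access(counts):
    """Assign each fragment to the site that accesses it most often."""
    return {frag: max(sites, key=sites.get) for frag, sites in counts.items()}


placement = place_by_heaviest_access(access_counts)
print(placement)  # {'F1': 'site_a', 'F2': 'site_b'}
```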

ICS214BNotes 0932 Summary Horizontal and vertical fragmentation Designing good fragmentations and allocation Next: Query processing in distributed databases

ICS214BNotes 0933 Query → (1) Decomposition → Algebraic query tree on relations → (2) Localization → Algebraic query tree on relation fragments → (3) Optimization → Query Plan

ICS214BNotes 0934 Decomposition Same as in centralized system –Normalization –Eliminating redundancy –Algebraic rewriting

ICS214BNotes 0935 Normalization Convert from query language to relational algebra

ICS214BNotes 0936 Example: SELECT R.A, S.D FROM R, S WHERE (R.B=1 and S.C=2) and (R.A = S.A) ⇒ π_{R.A, S.D} σ_{R.B=1 ∧ S.C=2} (R ⋈_{R.A=S.A} S)

ICS214BNotes 0937 Eliminate redundancy. E.g.: in conditions: (S.A=1) ∧ (S.A>5) ⇒ False; (S.A<10) ∧ (S.A<5) ⇒ S.A<5

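ICS214BNotes 0938 E.g.: Common sub-expressions [Figure: a query tree that combines S, T and two copies of σ_{cond}(R) under ∪ is rewritten so that σ_{cond}(R) appears only once]

ICS214BNotes 0939 Algebraic rewriting. E.g.: Push conditions down [Figure: σ_{cond} above a join of R and S is rewritten as σ_{cond3} above the join of σ_{cond1}(R) and σ_{cond2}(S)]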
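
Pushing conditions down does not change the answer; it only shrinks the join inputs. The sketch below checks this on the earlier example query (R.B=1, S.C=2, join on A) using made-up rows for R and S; the helper `join_on_a` is hypothetical.

```python
# Hypothetical instances of R(A, B) and S(A, C, D).
R = [{"A": 1, "B": 1}, {"A": 2, "B": 1}, {"A": 3, "B": 0}]
S = [
    {"A": 1, "C": 2, "D": "x"},
    {"A": 2, "C": 9, "D": "y"},
    {"A": 3, "C": 2, "D": "z"},
]


def join_on_a(r, s):
    """Equi-join on attribute A, returned as pairs of matching rows."""
    return [(x, y) for x in r for y in s if x["A"] == y["A"]]


# Plan 1: join first, then apply the whole condition.
late = [
    (x["A"], y["D"])
    for x, y in join_on_a(R, S)
    if x["B"] == 1 and y["C"] == 2
]

# Plan 2: push each single-relation condition below the join.
R_filtered = [x for x in R if x["B"] == 1]
S_filtered = [y for y in S if y["C"] == 2]
early = [(x["A"], y["D"]) for x, y in join_on_a(R_filtered, S_filtered)]

print(late, early)  # [(1, 'x')] [(1, 'x')]
```

In the second plan the join sees two rows from each side instead of three, which is the point of the rewrite.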

ICS214BNotes 0940 Query → (1) Decomposition → Algebraic query tree on relations → (2) Localization → Algebraic query tree on relation fragments

ICS214BNotes 0941 Localization steps (1) Start with query tree (2) Replace relations by fragments (3) Push ∪ up; σ, π down (4) Simplify – eliminate unnecessary operations

ICS214BNotes 0942 Notation for fragment: [R: cond], where R is the fragment and cond is the condition its tuples satisfy

ICS214BNotes 0943 Example A (1) σ_{E=3}(R)

ICS214BNotes 0944 (2) σ_{E=3}( [R1: E < 10] ∪ [R2: E ≥ 10] )

ICS214BNotes 0945 (3) σ_{E=3}[R1: E < 10] ∪ σ_{E=3}[R2: E ≥ 10], where the R2 branch reduces to Ø

ICS214BNotes 0946 (4) σ_{E=3}[R1: E < 10]

ICS214BNotes 0947 Rule 1 (A) σ_{c1}[R: c2] ⇒ σ_{c1}[R: c1 ∧ c2] (B) [R: False] ⇒ Ø

ICS214BNotes 0948 In example A: σ_{E=3}[R2: E ≥ 10] ⇒ [σ_{E=3} R2: E=3 ∧ E ≥ 10] ⇒ [σ_{E=3} R2: False] ⇒ Ø
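
Rule 1 amounts to asking whether the selection condition and the fragment condition can hold at the same time. A real optimizer would reason symbolically; the sketch below substitutes a brute-force check over a small integer domain, which is an assumption made purely for illustration (as are the names `DOMAIN` and `satisfiable`).

```python
# Assumption: attribute E takes integer values in a small, known domain.
DOMAIN = range(0, 21)


def satisfiable(*conds):
    """True if some value in DOMAIN satisfies every condition."""
    return any(all(c(v) for c in conds) for v in DOMAIN)


# Fragment conditions from Example A.
fragments = {
    "R1": lambda e: e < 10,
    "R2": lambda e: e >= 10,
}
selection = lambda e: e == 3

# Keep only fragments whose condition is compatible with the selection;
# the others reduce to [R: False], i.e. the empty set.
relevant = [name for name, cond in fragments.items() if satisfiable(selection, cond)]
print(relevant)  # ['R1']
```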

ICS214BNotes 0949 Example B (1) R ⋈_A S, where A = common attribute

ICS214BNotes 0950 (2) ( [R1: A<5] ∪ [R2: 5 ≤ A ≤ 10] ∪ [R3: A>10] ) ⋈_A ( [S1: A<5] ∪ [S2: A ≥ 5] )

ICS214BNotes 0951 (3) ( [R1: A<5] ⋈_A [S1: A<5] ) ∪ ( [R1: A<5] ⋈_A [S2: A ≥ 5] ) ∪ ( [R2: 5 ≤ A ≤ 10] ⋈_A [S1: A<5] ) ∪ ( [R2: 5 ≤ A ≤ 10] ⋈_A [S2: A ≥ 5] ) ∪ ( [R3: A>10] ⋈_A [S1: A<5] ) ∪ ( [R3: A>10] ⋈_A [S2: A ≥ 5] )

ICS214BNotes 0952 (4) ( [R1: A<5] ⋈_A [S1: A<5] ) ∪ ( [R2: 5 ≤ A ≤ 10] ⋈_A [S2: A ≥ 5] ) ∪ ( [R3: A>10] ⋈_A [S2: A ≥ 5] )

ICS214BNotes 0953 Rule 2: [R: C1] ⋈_A [S: C2] ⇒ [R ⋈_A S: C1 ∧ C2 ∧ R.A = S.A]

ICS214BNotes 0954 In step 4 of Example B: [R1: A<5] ⋈_A [S2: A ≥ 5] ⇒ [R1 ⋈_A S2: R1.A < 5 ∧ S2.A ≥ 5 ∧ R1.A = S2.A] ⇒ [R1 ⋈_A S2: False] ⇒ Ø
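
The pruning in Example B can be replayed by testing every pair of fragments under Rule 2. As a stand-in for symbolic reasoning, the sketch below checks satisfiability by brute force over an assumed small integer domain for A; `DOMAIN` and `can_join` are hypothetical names.

```python
# Assumption: the join attribute A takes integer values in a small domain.
DOMAIN = range(0, 21)

# Fragment conditions from Example B.
R_frags = {
    "R1": lambda a: a < 5,
    "R2": lambda a: 5 <= a <= 10,
    "R3": lambda a: a > 10,
}
S_frags = {
    "S1": lambda a: a < 5,
    "S2": lambda a: a >= 5,
}


def can_join(c1, c2):
    """Rule 2: because R.A = S.A, one value must satisfy both conditions."""
    return any(c1(v) and c2(v) for v in DOMAIN)


surviving = [
    (r, s)
    for r, c1 in R_frags.items()
    for s, c2 in S_frags.items()
    if can_join(c1, c2)
]
print(surviving)  # [('R1', 'S1'), ('R2', 'S2'), ('R3', 'S2')]
```

Three of the six fragment joins survive, which agrees with step (4) of the example.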

ICS214BNotes 0955 Localization with derived fragmentation. Example C (2) ( [R1: A<10] ∪ [R2: A ≥ 10] ) ⋈_K ( [S1: K=R.K ∧ R.A<10] ∪ [S2: K=R.K ∧ R.A ≥ 10] )

ICS214BNotes 0956 (3) ( [R1] ⋈_K [S1] ) ∪ ( [R1] ⋈_K [S2] ) ∪ ( [R2] ⋈_K [S1] ) ∪ ( [R2] ⋈_K [S2] )

ICS214BNotes 0957 (4) ( [R1] ⋈_K [S1] ) ∪ ( [R2] ⋈_K [S2] )

ICS214BNotes 0958 In step 3 of Example C: [R1: A<10] ⋈_K [S2: K=R.K ∧ R.A ≥ 10] ⇒ [R1 ⋈_K S2: R1.A<10 ∧ S2.K=R.K ∧ R.A ≥ 10 ∧ R1.K = S2.K] ⇒ [R1 ⋈_K S2: False] (K is key of R, R1) ⇒ Ø

ICS214BNotes 0959 Localization with vertical fragmentation. Example D (1) π_{A}(R), where R is vertically fragmented into R1(K, A, B) and R2(K, C, D)

ICS214BNotes 0960 (2) π_{A}( R1(K, A, B) ⋈_K R2(K, C, D) )

ICS214BNotes 0961 (3) π_{A}( π_{K,A}(R1(K, A, B)) ⋈_K R2(K, C, D) ), where the join with R2 is not really needed

ICS214BNotes 0962 (4) π_{A}( R1(K, A, B) )

ICS214BNotes 0963 Rule 3: Given vertical fragmentation of R: Ri = π_{Ai}(R), Ai ⊆ A. Then for any B ⊆ A: π_{B}(R) = π_{B}[ ⋈_i Ri | B ∩ Ai ≠ Ø ]
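
Rule 3 is a simple set test: a vertical fragment takes part in the join only if it holds at least one projected attribute. A short sketch, using the fragment schemas of Example D and a hypothetical helper `fragments_needed`:

```python
# Attribute sets of the vertical fragments from Example D.
fragments = {
    "R1": {"K", "A", "B"},
    "R2": {"K", "C", "D"},
}


def fragments_needed(projection_attrs, fragments):
    """Rule 3: keep the fragments Ri whose attributes intersect B."""
    wanted = set(projection_attrs)
    return [name for name, attrs in fragments.items() if wanted & attrs]


print(fragments_needed(["A"], fragments))       # ['R1']
print(fragments_needed(["A", "C"], fragments))  # ['R1', 'R2']
```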

ICS214BNotes 0964 Localization with hybrid fragmentation. Example E: R1 = σ_{k<5}[ π_{k,A} R ], R2 = σ_{k≥5}[ π_{k,A} R ], R3 = π_{k,B} R

ICS214BNotes 0965 Query: π_{A} σ_{k=3} R

ICS214BNotes 0966 Reduced Query: π_{A} σ_{k=3} R1
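
The reduction in Example E can be sanity-checked on a small instance: the query over the full relation and the reduced query over R1 alone should return the same rows. The rows of R below are invented for illustration.

```python
# Hypothetical instance of R(k, A, B), with k as the key.
R = [
    {"k": 1, "A": "a1", "B": "b1"},
    {"k": 3, "A": "a3", "B": "b3"},
    {"k": 7, "A": "a7", "B": "b7"},
]

# Hybrid fragmentation from Example E.
R1 = [{"k": t["k"], "A": t["A"]} for t in R if t["k"] < 5]
R2 = [{"k": t["k"], "A": t["A"]} for t in R if t["k"] >= 5]
R3 = [{"k": t["k"], "B": t["B"]} for t in R]

# Original query: project A from the rows of R with k = 3.
full = [t["A"] for t in R if t["k"] == 3]

# Reduced query: R2 is excluded by its condition k >= 5 (Rule 1) and
# R3 holds no projected attribute (Rule 3), so only R1 is read.
reduced = [t["A"] for t in R1 if t["k"] == 3]

print(full, reduced)  # ['a3'] ['a3']
```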

ICS214BNotes 0967 Distributed Query Processing: Decomposition (done), Localization (done), Optimization (next)