1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li
ICS214BNotes 092 Which simple predicates should we use in Pr? Desired property of Pr: - minimality - uniformity
ICS214BNotes 093 Return to example: E(#, NM, LOC, SAL,…) Common queries: Qa: select *Qb: select * from Efrom E where LOC=Sawhere LOC=Sb and …and...
ICS214BNotes 094 Three choices: (1) Pr = { } F 1 ={ E } (2) Pr = {LOC=Sa, LOC=Sb} F 2 ={ loc=Sa E, loc=Sb E } (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} F 3 ={ loc=Sa sal<10 E, loc=Sa sal 10 E, loc=Sb sal<10 E, loc=Sb sal 10 E }
ICS214BNotes 095 In other words: Loc=Sa sal < 10 Loc=Sa sal 10 Loc=Sb sal < 10 Loc=Sb sal 10 F1F1 F3F3 F2F2 Q a : Select … loc = S a... Q b : Select … loc = S b... F 2 is good… (not F 1, F 3 )
ICS214BNotes 096 Informal definition Set of predicates Pr is uniform if: for every F i F [P r ], every t F i has equal probability of access by every major application. Note: F [P r ] is fragmentation defined by minterm predicates generated by P r.
ICS214BNotes 097 Back to example: Loc=Sa sal < 10 Loc=Sa sal 10 Loc=Sb sal < 10 Loc=Sb sal 10 F1F1 Q a : Select … loc = S a... Q b : Select … loc = S b... tuples here have higher probability of access tuples here have lower probability of access so F 1 is not “good”...
ICS214BNotes 098 Back to example: Loc=Sa sal < 10 Loc=Sa sal 10 Loc=Sb sal < 10 Loc=Sb sal 10 F2F2 Q a : Select … loc = S a... Q b : Select … loc = S b... tuples here have same probability of access so F 2 is “good”... so is F 3...
ICS214BNotes 099 Informal definition Set of predicates P r is minimal if no P r’ P r is uniform
ICS214BNotes 0910 Back to example: (1) Pr = { } N (2) Pr = {LOC=Sa, LOC=Sb} Y (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} N uniform? Pr(2) is a subset of Pr(3), so Pr(3) is not minimal...
ICS214BNotes 0911 Is P r uniform and minimal a good thing? Not necessarily! But it does simplify allocation problem...
ICS214BNotes 0912 Derived horizontal fragmentation E(ENO, NAME, SAL, LOC) J(ENO, DESCRIPTION,…) (Owner) (Member) Common query: Given an employee name, list projects (s)he works in E F ={ E 1, E 2 } by LOC
ICS214BNotes 0913 E1E1 (at S a ) (at S b ) E2E2 J
ICS214BNotes 0914 E1E1 (at S a ) (at S b ) E2E2 J1J1 J2J2 J 1 = J E 1 J 2 = J E 2
ICS214BNotes 0915 Derived horizontal fragmentation R, F = { F 1, F 2,... F n } S, D = {D 1, D 2, …D n } where D i =S F i Convention: R is owner S is member F could be primary or derived
ICS214BNotes 0916 Checking completeness and disjointness of derived fragmentation But no #= 33 in E 1 nor in E 2 ! Example: Say J is: This J tuple will not be in J 1 nor J 2 Fragmentation not complete
ICS214BNotes 0917 To get completeness: Need to enforce referential integrity constraint: join attr(#) of member relation join attr(#) of owner relation
ICS214BNotes 0918 Example: E1E1 E2E2 J1J1 J J2J2 Fragmentation is not disjoint!
ICS214BNotes 0919 To get disjointness: Join attribute(#) should be key of owner relation
ICS214BNotes 0920 Summary: horizontal fragmentation Type: primary, derived Properties: completeness, disjointness Predicates: minimal, uniform
ICS214BNotes 0921 Vertical fragmentation E1E1 E E2E2 Example:
ICS214BNotes 0922 R[T] R 1 [T 1 ] T i T R n [Tn] Just like normalization of relations...
ICS214BNotes 0923 Properties: R[T] R i [T i ] (1) Completeness U Ti = T all i
ICS214BNotes 0924 (2) Disjointness Ti Tj = for all i,j i j E(#,LOC,SAL) E 1 (#,LOC) E 2 (SAL) Not a desirable property!! (could not reconstruct R!)
ICS214BNotes 0925 (3) Reconstruction: Lossless join R i = R all i One way to achieve lossless join: Repeat key in all fragments, i.e., Key T i for all i
ICS214BNotes 0926 Hybrid Fragmentation R R1 R2 R11 R12R21 R22 Horizontal Vertical
ICS214BNotes 0927 Hybrid Fragmentation -- Reconstruction R11 R12R21 R22 Horizontal Vertical U
ICS214BNotes 0928 Allocation Example: E(#,NM,LOC,SAL) F 1 = loc=Sa E ; F 2 = loc=Sb E Qa: select … where loc=Sa... Qb: select … where loc=Sb… Site a Site b Where do F 1,F 2 go? ?
ICS214BNotes 0929 Issues Where do queries originate? What is communication cost? and size of answers, relations,… What is storage capacity, cost at sites? and size of fragments? What is processing power at sites? What is query processing strategy? –How are joins done? –Where are answers collected?
ICS214BNotes 0930 Cost of updating copies? Writes and concurrency control?... Do we replicate fragments?
ICS214BNotes 0931 Optimization problem: What is best placement of fragments and/or best number of copies to: –minimize query response time –maximize throughput –minimize “some cost” –... Subject to constraints? –Available storage –Available bandwidth, power,… –Keep 90% of response time below X –... Often, can use common sense –Place fragments where they are most heavily accessed This is an incredibly hard problem
ICS214BNotes 0932 Summary Horizontal and vertical fragmentation Designing good fragmentations and allocation Next: Query processing in distributed databases
ICS214BNotes 0933 Query Query Plan Algebraic query tree on relations (1) Decomposition Algebraic query tree on relation fragments (2) Localization (3) Optimization
ICS214BNotes 0934 Decomposition Same as in centralized system –Normalization –Eliminating redundancy –Algebraic rewriting
ICS214BNotes 0935 Normalization Convert from query language to relational algebra
ICS214BNotes 0936 Example SELECT R.A, S.D FROM R, S WHERE (R.B=1 and S.C=2) and (R.A = S.A) R.A,S. D (R.B=1 S.C=2) R S (R.A = S.A)
ICS214BNotes 0937 Eliminate redundancy E.g.: in conditions: (S.A=1) (S.A>5) False (S.A<10) (S.A<5) S.A<5
ICS214BNotes 0938 E.g.: Common sub-expressions UU S cond cond T S cond T RRR
ICS214BNotes 0939 Algebraic rewriting E.g.: Push conditions down cond3 cond cond1 cond2 RS R S
ICS214BNotes 0940 Query Algebraic query tree on relations (1) Decomposition Algebraic query tree on relation fragments (2) Localization
ICS214BNotes 0941 Localization steps (1) Start with query tree (2) Replace relations by fragments (3) Push : up , : down (4) Simplify – eliminate unnecessary operations
ICS214BNotes 0942 Notation for fragment [R: cond] fragment conditions its tuples satisfy
ICS214BNotes 0943 Example A (1) E=3 R
ICS214BNotes 0944 (2) E=3 [R 1 : E < 10] [R 2 : E 10]
ICS214BNotes 0945 (3) E=3 [R 1 : E < 10] [R 2 : E 10] ØØ
ICS214BNotes 0946 (4) E=3 [R 1 : E < 10]
ICS214BNotes 0947 Rule 1 C1 [R: c2] C1 [R: c1 c2] [R: False] Ø A B
ICS214BNotes 0948 In example A: E=3 [R 2 : E 10] [ E=3 R 2 : E=3 E 10] [ E=3 R 2 : False] Ø
ICS214BNotes 0949 Example B (1)A=common attribute RS A
ICS214BNotes 0950 (2) A [S 1 : A<5] [S 2 : A 5] [R 1 : A 10]
ICS214BNotes 0951 (3) [R 1 :A<5][S 1 :A<5] [R 1 :A<5][S 2 :A 5] [R 2 :5 A 10][S 1 :A<5] [R 2 :5 A 10][S 2 :A 5] [R 3 :A>10][S 1 :A 10][S 2 :A 5] AAAAAA
ICS214BNotes 0952 (4) [R 1 :A<5][S 1 :A<5] [R 2 :5 A 10][S 2 :A 5] AAA [R 3 :A>10][S 2 :A 5]
ICS214BNotes 0953 Rule 2 [R: C 1 ] [S: C 2 ] [R S: C 1 C 2 R.A = S.A] AA
ICS214BNotes 0954 In step 4 of Example B: [R 1 : A<5] [S 2 : A 5] [R 1 S 2 : R 1.A < 5 S 2.A 5 R 1.A = S 2.A ] [R 1 S 2 : False] Ø AAA
ICS214BNotes 0955 Localization with derived fragmentation Example C (2) [R 1 :A<10][R 2 10] S 1 :K=R.K S 2 :K=R.K R.A<10 R.A 10 K
ICS214BNotes 0956 (3) [R 1 ][S 1 ] [R 1 ][S 2 ] [R 2 ][S 1 ] [R 2 ][S 2 ] KKKK
ICS214BNotes 0957 (4) [R 1 ][S 1 ] [R 2 ][S 2 ] KK
ICS214BNotes 0958 In step 3 of Example C: [R 1 :A<10] [S 2 :K=R.K R.A 10] [R 1 S 2: R 1.A<10 S 2.K=R.K R.A 10 R 1.K= S 2.K] [R 1 S 2 :False ] (K is key of R, R 1 ) Ø KKK
ICS214BNotes 0959 Localization with vertical fragmentation Example D (1) A R 1 (K, A, B) R R 2 (K, C, D)
ICS214BNotes 0960 (2) A R 1 R 2 (K, A, B) (K, C, D) K
ICS214BNotes 0961 (3) A K,A R 1 R 2 (K, A, B) (K, C, D) K not really needed
ICS214BNotes 0962 (4) A R 1 (K, A, B)
ICS214BNotes 0963 Rule 3 Given vertical fragmentation of R: R i = Ai (R), Ai A Then for any B A: B (R) = B [ R i | B A i Ø ] i
ICS214BNotes 0964 Localization with hybrid fragmentation Example E R 1 = k<5 [ k,A R] R 2 = k 5 [ k,A R] R 3 = k,B R
ICS214BNotes 0965 Query: A k=3 R
ICS214BNotes 0966 Reduced Query: A k=3 R 1
ICS214BNotes 0967 Distributed Query Processing Decomposition Localization Optimization next