Distributed Database Management Systems Lecture 20
In the Previous Lecture: continued with VF; computed CA; partitioning algorithm.
In this Lecture: continue with VF; hybrid fragmentation; allocation problem; replication.
[Figure: clustered affinity matrix CA over attributes A1, A3, A2, A4; attribute usage refj(qi) of queries q1 to q4; access frequencies accj(qi) of q1 to q4 at sites S1, S2, S3.]
Split-point values: z1 = 0 - 45², z2 = 3311, z3 = 0 - 78²
A1 = jNo, A2 = jName, A3 = budget, A4 = loc
V1 = {jNo, budget}, V2 = {jNo, jName, loc}
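The choice of split point can be checked with a short sketch. It assumes the usual split-quality measure z = CTQ × CBQ - COQ², where CTQ, CBQ and COQ are the total access frequencies of the queries that touch only the top block, only the bottom block, or both. The slide's usage matrix did not survive, so the `use` sets below are assumptions, as is q3 = 25 accesses at each of three sites; they were picked to be consistent with the z values on the slide. The names `use`, `acc` and `split_quality` are illustrative only.

```python
# Attribute order in the clustered affinity matrix CA (from the slide).
order = ["A1", "A3", "A2", "A4"]

# ASSUMPTION: attributes used by each query (usage matrix lost in the slide).
use = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}

# Total access frequency of each query over all sites.
# q1 = 15 + 20 + 10, q2 = 5 and q4 = 3 come from the slide;
# ASSUMPTION: q3 = 25 at each of the three sites.
acc = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}


def split_quality(point):
    """z for a split that puts order[:point] on top and the rest below."""
    top, bottom = set(order[:point]), set(order[point:])
    ctq = sum(acc[q] for q, attrs in use.items() if attrs <= top)
    cbq = sum(acc[q] for q, attrs in use.items() if attrs <= bottom)
    coq = sum(acc[q] for q, attrs in use.items()
              if not (attrs <= top or attrs <= bottom))
    return ctq * cbq - coq ** 2


# One z value per possible split point along the diagonal.
z = [split_quality(p) for p in range(1, len(order))]
best = max(range(len(z)), key=z.__getitem__) + 1

print(z)
print(order[:best], order[best:])
```

Under these assumptions the largest z falls between A3 and A2, which gives the blocks {A1, A3} and {A2, A4}. Adding the key jNo to the second block yields V1 and V2 as listed above.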
VF: Two Problems
1. Clusters may form not at the sides but in the middle of CA
2. m-way partitioning
VF Correctness
A relation R, defined over attribute set A and key K, generates the vertical partitioning FR = {R1, R2, …, Rr}.
Completeness: the following should be true for A: A = ∪ Ri, i.e. every attribute of A appears in at least one fragment.
Reconstruction: can be achieved by R = ⋈K Ri, ∀Ri ∈ FR.
Disjointness: TIDs are not considered to be overlapping since they are maintained by the system; the primary key, which is repeated in every fragment, is also an exception.
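The three conditions can be tested at the attribute level with a minimal sketch. It uses the fragments V1 and V2 from the earlier example; the variable names are illustrative, and "reconstructible" here only checks that every fragment carries the key, which is what the join on K needs.

```python
# Attribute set A, key K and the vertical fragments from the example.
A = {"jNo", "jName", "budget", "loc"}
K = {"jNo"}
fragments = {
    "V1": {"jNo", "budget"},
    "V2": {"jNo", "jName", "loc"},
}

# Completeness: the union of the fragments' attributes equals A.
complete = set().union(*fragments.values()) == A

# Reconstruction: every fragment carries the key, so R can be
# rebuilt by joining the fragments on K.
reconstructible = all(K <= attrs for attrs in fragments.values())

# Disjointness: any two fragments overlap at most in the key.
names = list(fragments)
disjoint = all(
    (fragments[a] & fragments[b]) <= K
    for i, a in enumerate(names)
    for b in names[i + 1:]
)

print(complete, reconstructible, disjoint)
```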
Hybrid Fragmentation
In practice, applications require both types of fragmentation to be combined.
The fragmentations are nested, i.e., one follows the other, so the result becomes a sort of tree.
Disjointness and completeness have to be assured at each step, and reconstruction can be obtained by applying Join and Union in reverse order.
CUST
A/C#    Name    Bal        Branch
AB101   Saeed   4535       MTN
AB202   Laeeq   45632.34   LHR
AB203   Salma   67839.87   LHR
AB109   Shaan   45.32      MTN

Beta = ΠA/C#, Bal (CUST)
Delta1 = σBranch = “MTN” (ΠA/C#, Name, Branch (CUST))
Delta2 = σBranch = “LHR” (ΠA/C#, Name, Branch (CUST))

Beta
A/C#    Bal
AB101   4535
AB202   45632.34
AB203   67839.87
AB109   45.32

Delta1
A/C#    Name    Branch
AB101   Saeed   MTN
AB109   Shaan   MTN

Delta2
A/C#    Name    Branch
AB202   Laeeq   LHR
AB203   Salma   LHR
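The CUST example can be replayed with a small sketch that builds the three fragments and then reconstructs the relation in reverse order (Union first, then Join on the key). Rows are modelled as plain dictionaries; the helpers `project` and `select` are illustrative stand-ins for Π and σ, not part of any library.

```python
# The CUST relation from the example, one dictionary per row.
CUST = [
    {"A/C#": "AB101", "Name": "Saeed", "Bal": 4535, "Branch": "MTN"},
    {"A/C#": "AB202", "Name": "Laeeq", "Bal": 45632.34, "Branch": "LHR"},
    {"A/C#": "AB203", "Name": "Salma", "Bal": 67839.87, "Branch": "LHR"},
    {"A/C#": "AB109", "Name": "Shaan", "Bal": 45.32, "Branch": "MTN"},
]


def project(rows, attrs):
    """Vertical step: keep only the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]


def select(rows, pred):
    """Horizontal step: keep only the rows satisfying the predicate."""
    return [r for r in rows if pred(r)]


# Vertical fragmentation first, then horizontal on one branch of the tree.
beta = project(CUST, ["A/C#", "Bal"])
delta = project(CUST, ["A/C#", "Name", "Branch"])
delta1 = select(delta, lambda r: r["Branch"] == "MTN")
delta2 = select(delta, lambda r: r["Branch"] == "LHR")

# Reconstruction in reverse order: Union of Delta1 and Delta2,
# then Join with Beta on the key A/C#.
union = delta1 + delta2
bal_by_key = {r["A/C#"]: r["Bal"] for r in beta}
rebuilt = [{**r, "Bal": bal_by_key[r["A/C#"]]} for r in union]


def by_key(rows):
    return sorted(rows, key=lambda r: r["A/C#"])


same = by_key(rebuilt) == by_key(CUST)
print(same)
```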
Allocation
Given
F = {F1, F2, …, Fn} fragments
S = {S1, S2, …, Sm} network sites
Q = {q1, q2, …, qq} applications
Find the "optimal" distribution of F to S.
Optimality: minimize the processing cost and maximize the system throughput at each site.
This is a complex problem to solve mathematically. To keep things simple, consider the allocation of a single fragment Fk.
Read-only queries: T = {t1, t2, …, tm}, where ti is the read-only query traffic on Fk from site Si.
Update queries: U = {u1, u2, …, um}, where ui is the update query traffic on Fk from site Si.
Communication cost: C(T) = {c1,2, c1,3, …, c1,m, …, cm-1,m} for read-only queries and C’(T) = {c’1,2, c’1,3, …, c’1,m, …, c’m-1,m} for updates, where ci,j and c’i,j are the costs between sites Si and Sj.
Storage cost: D = {d1, d2, …, dm}, where dj is the cost of storing Fk at site Sj.
The allocation problem is to find the sites, out of the set of sites S, where a copy of Fk will be stored.
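The specification of the allocation problem is to choose the xj that minimize (min) the total cost, i.e. the communication cost of the read-only and update traffic plus the storage cost of the copies, where
xj = 1 if the fragment Fk is assigned to site Sj
xj = 0 otherwise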
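For a handful of sites the single-fragment problem can be solved by exhaustive search. The sketch below assumes a common textbook cost model, which the slide does not spell out: each site reads from its cheapest copy, each update is sent to every copy, and every copy incurs its storage cost. The function names and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def allocation_cost(sites, t, u, c, c_upd, d):
    """Total cost of storing copies of Fk at the given set of sites.

    ASSUMED cost model: reads go to the cheapest copy, updates go to
    every copy, and each copy pays its storage cost.
    """
    storage = sum(d[j] for j in sites)
    traffic = 0
    for i in range(len(t)):
        traffic += t[i] * min(c[i][j] for j in sites)      # read-only
        traffic += sum(u[i] * c_upd[i][j] for j in sites)  # updates
    return storage + traffic


def best_allocation(t, u, c, c_upd, d):
    """Try every non-empty subset of sites and keep the cheapest one."""
    m = len(t)
    candidates = (set(s) for r in range(1, m + 1)
                  for s in combinations(range(m), r))
    return min(candidates,
               key=lambda s: allocation_cost(s, t, u, c, c_upd, d))


# Hypothetical data for three sites (indices 0, 1, 2).
t = [10, 8, 1]                         # read-only traffic per site
u = [1, 1, 1]                          # update traffic per site
c = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]  # unit cost for reads
c_upd = c                              # same unit cost for updates
d = [3, 3, 3]                          # storage cost per site

best = best_allocation(t, u, c, c_upd, d)
print(sorted(best), allocation_cost(best, t, u, c, c_upd, d))
```

With this data the two read-heavy sites each get a copy. The search visits 2^m - 1 subsets, so it is only practical for small m; that growth is one reason the general problem is treated as hard.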
That concludes our discussion on Fragmentation. Let's summarize it.
Fragmentation is splitting a table into smaller tables.
Alternatives: Horizontal, Vertical, Hybrid
Horizontal Fragmentation
Splits a table into horizontal subsets (row-wise).
Two kinds: Primary and Derived Horizontal Fragmentation.
We need the major simple predicates (Pr), which should be complete and minimal.
Pr is transformed into Pr’.
Minterm predicates (M) are derived from Pr’.
Correctness of PHF depends on Pr’.
Derived Horizontal Fragmentation is based on the owner-member link.
Vertical Fragmentation is more complicated due to more options.
It is based on attributes’ affinities.
AA is calculated using usage data and access frequencies from different sites.
AA is transformed into CA using BEA.
CA establishes clusters of attributes that are split for efficient access.
Hybrid Fragmentation combines HF and VF.
That concludes Fragmentation.
Thanks