Distributed Database Management Systems Lecture 17
Virtual University of Pakistan

In this lecture we continue with vertical fragmentation (VF): its information requirements and attribute affinities.
Replicating the key attributes in each vertical fragment does not violate the disjointness condition.
Vertical Fragmentation: Information Requirements
The basic goal of VF is access efficiency. Its information requirements are application based: the main input is attribute affinity, which is obtained from more primitive usage data.
Attribute usage values: let Q = {q1, q2, …, qq} be the set of queries that will run on the relation R(A1, A2, …, An). Following the 80-20 rule, Q only needs to contain the most active queries, since roughly 20% of the queries account for 80% of the data accesses.
Attribute usage value: for each query qi and attribute Aj,

$$\text{use}(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$$

The vector use(qi, •) for each query is defined accordingly, with one entry per attribute.
Example: consider the relation PROJ(jNo, jName, budget, loc) and four queries:

q1: SELECT BUDGET FROM PROJ WHERE JNO=Value
q2: SELECT JNAME, BUDGET FROM PROJ
q3: SELECT JNAME FROM PROJ WHERE LOC=Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value

Let A1 = jNo, A2 = jName, A3 = budget, A4 = loc.
Attribute Usage Matrix (AUM)
       A1  A2  A3  A4
q1      1   0   1   0
q2      0   1   1   0
q3      0   1   0   1
q4      0   0   1   1
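The AUM can be derived mechanically once we know which attributes each query references. The Python sketch below is an illustration, not part of the lecture: the dictionary `QUERY_ATTRS` and the helper names are hypothetical, and the attribute sets are read off q1 to q4 by hand, assuming an attribute counts as referenced whenever it appears in the SELECT list or the WHERE clause.

```python
# Hypothetical sketch: build the attribute usage matrix (AUM) from the set of
# attributes each query references (read off the SQL of q1..q4 by hand).
ATTRIBUTES = ["A1", "A2", "A3", "A4"]  # A1=jNo, A2=jName, A3=budget, A4=loc

QUERY_ATTRS = {
    "q1": {"A1", "A3"},  # SELECT BUDGET ... WHERE JNO=Value
    "q2": {"A2", "A3"},  # SELECT JNAME, BUDGET ...
    "q3": {"A2", "A4"},  # SELECT JNAME ... WHERE LOC=Value
    "q4": {"A3", "A4"},  # SELECT SUM(BUDGET) ... WHERE LOC=Value
}


def use(query: str, attr: str) -> int:
    """use(q, A) = 1 if query q references attribute A, otherwise 0."""
    return 1 if attr in QUERY_ATTRS[query] else 0


def attribute_usage_matrix() -> list[list[int]]:
    """One row per query (q1..q4), one column per attribute (A1..A4)."""
    return [[use(q, a) for a in ATTRIBUTES] for q in QUERY_ATTRS]


if __name__ == "__main__":
    for q, row in zip(QUERY_ATTRS, attribute_usage_matrix()):
        print(q, row)
```

Each printed row should match the corresponding row of the AUM.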
The AUM only records whether a query references an attribute; it does not capture how often each query runs at the different sites. To account for this, the attribute affinity between two attributes Ai and Aj of a relation R(A1, A2, …, An) with respect to the query set Q = {q1, q2, …, qq} is defined as

$$\text{aff}(A_i, A_j) = \sum_{k \,\mid\, \text{use}(q_k, A_i) = 1 \,\wedge\, \text{use}(q_k, A_j) = 1} \;\; \sum_{\forall S_l} \text{ref}_l(q_k)\, \text{acc}_l(q_k)$$

where $\text{ref}_l(q_k)$ is the number of accesses to attributes (Ai, Aj) for each execution of qk at site Sl, and $\text{acc}_l(q_k)$ is the access frequency of qk at site Sl.
Together with the attribute usage matrix, the access frequency matrix gives the access frequency of each query at sites S1, S2 and S3:

Access Frequency Matrix
       S1  S2  S3
q1     15  20  10
q2      5   0   0
q3     25  25  25
q4      3   0   0
acc1(q1) = 15, acc2(q1) = 20, acc3(q1) = 10
acc1(q2) = 5, acc2(q2) = 0, acc3(q2) = 0
acc1(q3) = 25, acc2(q3) = 25, acc3(q3) = 25
acc1(q4) = 3, acc2(q4) = 0, acc3(q4) = 0
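Assuming $\text{ref}_l(q_k) = 1$ for every query and site, only q4 references both A3 and A4, so

$$\text{aff}(A_3, A_4) = \sum_{l=1}^{3} \text{ref}_l(q_4)\,\text{acc}_l(q_4) = 3 \cdot 1 + 0 + 0 = 3$$

aff(A1, A2) = 0, since no query accesses both A1 and A2.

A2 is referenced by q2 and q3, so

$$\text{aff}(A_2, A_2) = \underbrace{(5 \cdot 1 + 0 + 0)}_{q_2} + \underbrace{(25 \cdot 1 + 25 \cdot 1 + 25 \cdot 1)}_{q_3} = 5 + 75 = 80$$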
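The worked calculations can be checked with a direct transcription of the affinity formula. This Python sketch is illustrative only: `QUERY_ATTRS`, `ACC` and the `ref` helper are hypothetical names, and `ref` returns 1 everywhere to match the assumption $\text{ref}_l(q_k) = 1$ used in the arithmetic above.

```python
# Hypothetical sketch of the attribute affinity formula:
#   aff(Ai, Aj) = sum over queries qk with use(qk, Ai) = use(qk, Aj) = 1
#                 of sum over sites Sl of ref_l(qk) * acc_l(qk)
QUERY_ATTRS = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}

# acc_l(qk): access frequency of each query at sites S1, S2, S3
ACC = {
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
    "q4": [3, 0, 0],
}


def ref(query: str, site: int) -> int:
    """ref_l(qk); assumed to be 1 for every query and site."""
    return 1


def aff(ai: str, aj: str) -> int:
    total = 0
    for q, attrs in QUERY_ATTRS.items():
        if ai in attrs and aj in attrs:  # use(qk, Ai) = 1 and use(qk, Aj) = 1
            total += sum(ref(q, l) * acc for l, acc in enumerate(ACC[q]))
    return total


if __name__ == "__main__":
    print(aff("A3", "A4"), aff("A1", "A2"), aff("A2", "A2"))  # 3 0 80
```

With a non-trivial `ref`, only the helper would need to change; the double sum stays the same.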
Attribute affinity matrix (AA)
       A1  A2  A3  A4
A1     45   0  45   0
A2      0  80   5  75
A3     45   5  53   3
A4      0  75   3  78
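Since every query adds the same weight to each attribute pair it uses (under the assumption $\text{ref}_l(q_k) = 1$), the whole AA matrix can also be computed in one pass: with weight $w_k = \sum_l \text{acc}_l(q_k)$, each entry is $\sum_k \text{use}(q_k, A_i)\,\text{use}(q_k, A_j)\,w_k$, i.e. $U^\top W U$ for the usage matrix $U$ and $W = \text{diag}(w)$. The Python sketch below is a hedged illustration of that reformulation; the function and variable names are hypothetical.

```python
# Hypothetical sketch: compute the full AA matrix as U^T * diag(w) * U,
# assuming ref_l(qk) = 1 so that each query qk has weight w_k = sum_l acc_l(qk).
USE = [  # attribute usage matrix: rows q1..q4, columns A1..A4
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
ACC = [  # access frequencies: rows q1..q4, columns S1..S3
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]


def affinity_matrix(use, acc):
    weights = [sum(row) for row in acc]  # w_k = total accesses of qk over all sites
    n = len(use[0])
    return [
        [sum(u[i] * u[j] * w for u, w in zip(use, weights)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in affinity_matrix(USE, ACC):
        print(row)
```

The result is symmetric by construction, which is why only one triangle of AA carries independent information.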
Clustering Algorithm
VF is based on identifying groups of attributes from the AA matrix. Vertical clustering uses the Bond Energy Algorithm (BEA) to identify groups of similar items.
Attributes with high mutual affinity are grouped together, and attributes with low affinity are grouped together as well. BEA takes the AA matrix as input and generates the clustered affinity matrix CA, which is AA with its rows and columns permuted.
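Producing CA from AA is only a matter of applying one attribute ordering to both rows and columns; choosing that ordering is the part BEA performs. The Python sketch below shows just the permutation mechanics. The ordering A1, A3, A2, A4 is a hypothetical choice for illustration, not a claimed BEA output, and `reorder` is a hypothetical helper name.

```python
# Hypothetical sketch: CA is AA with rows and columns permuted by the same
# attribute ordering. The ordering used below is illustrative only.
NAMES = ["A1", "A2", "A3", "A4"]
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def reorder(matrix, order):
    """Apply the same permutation `order` (a list of indices) to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]


if __name__ == "__main__":
    order = [0, 2, 1, 3]  # hypothetical ordering A1, A3, A2, A4
    print([NAMES[k] for k in order])
    for row in reorder(AA, order):
        print(row)
```

Permuting rows and columns together keeps each diagonal entry aff(Ai, Ai) on the diagonal and keeps the matrix symmetric.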
Global Affinity Measure (AM)
The global affinity measure is a single value computed from the position of each element of AA and the values of its neighboring elements (left, right, above and below):
$$\text{AM} = \sum_{i=1}^{n} \sum_{j=1}^{n} \text{aff}(A_i, A_j)\left[\text{aff}(A_i, A_{j-1}) + \text{aff}(A_i, A_{j+1}) + \text{aff}(A_{i-1}, A_j) + \text{aff}(A_{i+1}, A_j)\right]$$

with the boundary conditions

$$\text{aff}(A_0, A_j) = \text{aff}(A_i, A_0) = \text{aff}(A_{n+1}, A_j) = \text{aff}(A_i, A_{n+1}) = 0$$
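The AM formula translates directly into code once out-of-range neighbors are treated as 0. BEA looks for an ordering of the attributes that makes AM large; for a four-attribute example it is feasible to simply try every ordering. The Python sketch below does exactly that. Note that the exhaustive search is swapped in purely for illustration: it is not BEA, and it does not scale because there are n! orderings. All function names are hypothetical.

```python
# Hypothetical sketch: global affinity measure (AM) of a matrix, plus an
# exhaustive search over attribute orderings (NOT the Bond Energy Algorithm).
from itertools import permutations

NAMES = ["A1", "A2", "A3", "A4"]
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def global_affinity(matrix):
    n = len(matrix)

    def a(i, j):
        # Boundary conditions: neighbors outside the matrix count as 0.
        return matrix[i][j] if 0 <= i < n and 0 <= j < n else 0

    return sum(
        a(i, j) * (a(i, j - 1) + a(i, j + 1) + a(i - 1, j) + a(i + 1, j))
        for i in range(n)
        for j in range(n)
    )


def reorder(matrix, order):
    """Permute rows and columns by the same ordering."""
    return [[matrix[i][j] for j in order] for i in order]


def best_ordering(matrix):
    """Try all n! orderings; max() keeps the first ordering with the highest AM."""
    return max(
        permutations(range(len(matrix))),
        key=lambda order: global_affinity(reorder(matrix, order)),
    )


if __name__ == "__main__":
    print(global_affinity(AA))  # original order A1, A2, A3, A4: 7532
    best = best_ordering(AA)
    print([NAMES[k] for k in best], global_affinity(reorder(AA, best)))
```

For this AA, the search returns the ordering A1, A3, A2, A4 with AM = 68660, compared with 7532 for the original order, which places the high-affinity pairs (A1, A3) and (A2, A4) next to each other.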