H.Lu/HKUST L04: Physical Database Design (2): Introduction, Index Selection, Partitioning & Denormalization

1 H.Lu/HKUST L04: Physical Database Design (2)
• Introduction
• Index Selection
• Partitioning & Denormalization

2 Tuning a Relational Schema
• The choice of relational schema should be guided by the workload, in addition to redundancy issues:
  • We may settle for a 3NF schema rather than BCNF.
  • Workload may influence the choice we make in decomposing a relation into 3NF or BCNF.
  • We might denormalize (i.e., undo a decomposition step), or we might add fields to a relation.
  • We may further decompose a BCNF schema!
  • We might consider horizontal partitioning.
• If such changes are made after a database is in use, this is called schema evolution; we might want to mask some of these changes from applications by defining views.

3 Example Schemas
  Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)
  Depts (Did, Budget, Report)
  Suppliers (Sid, Address)
  Parts (Pid, Cost)
  Projects (Jid, Mgr)
• We will concentrate on Contracts, denoted as CSJDPQV. The following ICs are given to hold: JP → C, SD → P, and C is the primary key.
• What are the candidate keys for CSJDPQV?
• What normal form is this relation schema in?
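One way to explore the candidate-key question is to compute attribute closures by brute force. The sketch below is illustrative only: the helper names `closure` and `candidate_keys` are hypothetical, and "C is the primary key" is encoded as the FD C → CSJDPQV.

```python
from itertools import combinations

ATTRS = "CSJDPQV"
# FDs from the slide; "C is the primary key" is encoded as C -> CSJDPQV.
FDS = [("JP", "C"), ("SD", "P"), ("C", ATTRS)]

def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys

keys = sorted("".join(sorted(k)) for k in candidate_keys(ATTRS, FDS))
print(keys)
```

Under these FDs the enumeration yields C, JP and SDJ (printed as `DJS`). Because P in SD → P is part of the key JP, the schema satisfies 3NF; because SD is not a superkey, it is not in BCNF.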

4 Denormalization
• Suppose that the following query is important:
  • Is the value of a contract less than the budget of the department?
• To speed up this query, we might add a budget field B to Contracts.
  • This introduces the FD D → B wrt Contracts.
  • Thus, Contracts is no longer in 3NF.
• We might choose to modify Contracts in this way if the query is sufficiently important and we cannot obtain adequate performance otherwise (i.e., by adding indexes or by choosing an alternative 3NF schema).
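The trade-off can be tried out with an in-memory SQLite database. This is a minimal sketch with made-up sample rows; the column types and the way the copied budget is populated are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE Depts (did INTEGER PRIMARY KEY, budget REAL, report TEXT);
    CREATE TABLE Contracts (cid INTEGER PRIMARY KEY, sid INTEGER, jid INTEGER,
                            did INTEGER, pid INTEGER, qty INTEGER, val REAL);
    INSERT INTO Depts VALUES (1, 5000, 'r1'), (2, 20000, 'r2');
    INSERT INTO Contracts VALUES
        (10, 1, 1, 1, 1, 5, 7000),
        (11, 1, 2, 2, 2, 3, 9000),
        (12, 2, 1, 2, 3, 1, 25000);
""")

# Normalized schema: the query needs a join with Depts.
joined = cur.execute("""
    SELECT c.cid FROM Contracts c JOIN Depts d ON c.did = d.did
    WHERE c.val < d.budget ORDER BY c.cid
""").fetchall()

# Denormalized schema: copy budget into Contracts (this introduces D -> B).
cur.execute("ALTER TABLE Contracts ADD COLUMN budget REAL")
cur.execute("""
    UPDATE Contracts
    SET budget = (SELECT d.budget FROM Depts d WHERE d.did = Contracts.did)
""")
denormalized = cur.execute(
    "SELECT cid FROM Contracts WHERE val < budget ORDER BY cid"
).fetchall()

print(joined, denormalized)
```

Both queries return the same contracts, but the second reads only Contracts. The price is redundancy: every change to a department's budget must now be propagated to all of its contracts.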

5 Partitioning
• Horizontal Partitioning: distributing the rows of a table into several separate files
  • Useful for situations where different users need access to different rows
• Vertical Partitioning: distributing the columns of a table into several separate files
  • Useful for situations where different users need access to different columns
  • The primary key must be repeated in each file
• Combinations of horizontal and vertical partitions often correspond to user schemas (user views)

6 Partitioning
• Advantages of partitioning:
  • Records used together are grouped together
  • Each partition can be optimized for performance
  • Security, recovery
  • Partitions stored on different disks: less contention
  • Take advantage of parallel processing capability
• Disadvantages of partitioning:
  • Slow retrievals across partitions
  • Complexity
• Issue: need to find a suitable level of partitioning
  • Too little partitioning: too much irrelevant data access
  • Too much partitioning: too much processing cost

7 Horizontal Decompositions
• Our definition of decomposition: a relation is replaced by a collection of relations that are projections. This is the most important case.
• Sometimes, we might want to replace a relation by a collection of relations that are selections.
  • Each new relation has the same schema as the original, but a subset of the rows.
  • Collectively, the new relations contain all rows of the original. Typically, the new relations are disjoint.

8 Horizontal Decompositions (Contd.)
• Suppose that contracts with value > 10000 are subject to different rules. This means that queries on Contracts will often contain the condition val > 10000.
• One way to deal with this is to build a clustered B+ tree index on the val field of Contracts.
• A second approach is to replace Contracts by two new relations, LargeContracts and SmallContracts, with the same attributes (CSJDPQV).
  • This performs like an index on such queries, but without the index overhead.
  • We can build clustered indexes on other attributes, in addition!

9 Masking Conceptual Schema Changes
• The replacement of Contracts by LargeContracts and SmallContracts can be masked by the following view:
    CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val)
    AS SELECT * FROM LargeContracts
    UNION
    SELECT * FROM SmallContracts
• However, queries with the condition val > 10000 must be asked wrt LargeContracts for efficient execution, so users concerned with performance have to be aware of the change.
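The masking view can be exercised in an in-memory SQLite database. This is a sketch with made-up rows; the untyped columns are an assumption made for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LargeContracts (cid INTEGER PRIMARY KEY, sid, jid, did, pid, qty, val);
    CREATE TABLE SmallContracts (cid INTEGER PRIMARY KEY, sid, jid, did, pid, qty, val);
    INSERT INTO LargeContracts VALUES (1, 1, 1, 1, 1, 10, 50000);
    INSERT INTO SmallContracts VALUES (2, 1, 2, 1, 2, 5, 800), (3, 2, 1, 2, 1, 7, 10000);
    CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val) AS
        SELECT * FROM LargeContracts
        UNION
        SELECT * FROM SmallContracts;
""")

# Applications that still query Contracts see every row of both partitions.
all_ids = [r[0] for r in conn.execute("SELECT cid FROM Contracts ORDER BY cid")]

# Same answer either way, but the view may scan both partitions,
# while the direct query touches only LargeContracts.
via_view = [r[0] for r in conn.execute("SELECT cid FROM Contracts WHERE val > 10000")]
direct = [r[0] for r in conn.execute("SELECT cid FROM LargeContracts")]

print(all_ids, via_view, direct)
```

Since the two partitions are disjoint, `UNION ALL` would return the same rows as `UNION` here while skipping duplicate elimination.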

10 Decomposition of a BCNF Relation
• Suppose that we choose { SDP, CSJDQV }. This is in BCNF, and there is no reason to decompose further (assuming that all known ICs are FDs).
• However, suppose that these queries are important:
  • Find the contracts held by supplier S.
  • Find the contracts that department D is involved in.
• Decomposing CSJDQV further into CS, CD and CJQV could speed up these queries. (Why?)
• On the other hand, the following query is slower:
  • Find the total value of all contracts held by supplier S.

11 Vertical Partitioning
• Vertical partitioning of a relation R produces partitions R1, R2, ..., Rm, each of which contains a subset of R's attributes as well as the primary key of R
• The object of vertical partitioning is to reduce irrelevant attribute access, and thus irrelevant data access
• "Optimal" vertical partitioning minimizes the irrelevant data access for user applications
• For a relation with m non-primary-key attributes, the number of possible partitions is approximately equal to m^m
  • Hard to find an optimal solution
  • Resort to heuristic approaches

12 VP: Heuristic Approaches
• Grouping:
  • Assign each attribute to one fragment
  • Join fragments until some criterion is satisfied
• Splitting (our focus):
  • Start with the original relation
  • Generate partitions based on access behavior
  • Closer to optimal; fewer overlapping fragments
• Basic idea: affinity of attributes
  • A measure of how closely the attributes are related through the queries that use them together

13 Attribute Usage Matrices
• Q = {q1, q2, ..., qm}: set of user queries
• R(A1, A2, ..., An): relation R with n attributes
• Usage matrix U = [Uij], of size m×n
  • Uij = 1 if attribute Aj is referenced by qi
  • Uij = 0 otherwise
• Access matrix [acc_i]: acc_i is the access frequency of qi

14 VP - Matrices Examples
• Relation PROJ(PNO, PNAME, BUDGET, LOC); four SQL queries sent to three sites:
    q1: SELECT BUDGET FROM PROJ WHERE PNO = val;
    q2: SELECT PNAME, BUDGET FROM PROJ;
    q3: SELECT PNAME FROM PROJ WHERE LOC = val;
    q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC = val;

15 Attribute Affinity Matrix
• AA = [aff_ij], of size n×n: aff_ij is the affinity between two attributes Ai and Aj
  $aff_{ij} = \sum_{k \,:\, U_{ki} = 1 \,\wedge\, U_{kj} = 1} acc_k$
[Figure: AA matrix]
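A small Python sketch of this definition, applied to the PROJ example. The usage matrix follows from the four queries on the previous slide (A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC). The access frequencies 45, 5, 75 and 3 are an assumption: they are not in the transcript, and were chosen because they reproduce the affinity values that appear in the bond calculations later in the deck.

```python
# Rows are queries q1..q4, columns are attributes A1..A4.
USAGE = [
    [1, 0, 1, 0],  # q1 references PNO and BUDGET
    [0, 1, 1, 0],  # q2 references PNAME and BUDGET
    [0, 1, 0, 1],  # q3 references PNAME and LOC
    [0, 0, 1, 1],  # q4 references BUDGET and LOC
]
ACC = [45, 5, 75, 3]  # assumed total access frequency of each query

def affinity_matrix(usage, acc):
    """aff[i][j] = sum of acc[k] over all queries k that use both Ai and Aj."""
    n = len(usage[0])
    aa = [[0] * n for _ in range(n)]
    for row, freq in zip(usage, acc):
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aa[i][j] += freq
    return aa

AA = affinity_matrix(USAGE, ACC)
for line in AA:
    print(line)
```

The diagonal entry aff_ii is simply the total frequency of all queries that reference Ai, and the matrix is symmetric by construction.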

16 Bond Energy Clustering Algorithm
• Determines groups of similar items (clusters of attributes with larger affinity values, and ones with smaller affinity values)
• Final groupings are insensitive to the order in which items are presented to the algorithm
• The computation time is O(n^2), where n is the number of attributes
• Secondary interrelationships between clustered attribute groups are identifiable

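17 Main Idea of BEA
Permute the attribute affinity matrix (AA) and generate a clustered affinity matrix (CA) to maximize the global affinity measure (AM):
  $AM = \sum_{i=1}^{n} \sum_{j=1}^{n} aff(A_i, A_j)\,[aff(A_i, A_{j-1}) + aff(A_i, A_{j+1}) + aff(A_{i-1}, A_j) + aff(A_{i+1}, A_j)]$
where
  $aff(A_0, A_j) = aff(A_i, A_0) = aff(A_{n+1}, A_j) = aff(A_i, A_{n+1}) = 0$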

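18 AM in Terms of Bond
Because the affinity matrix is symmetric, the row-neighbor terms mirror the column-neighbor terms, and AM can be written as
  $AM = \sum_{i=1}^{n} \sum_{j=1}^{n} aff(A_i, A_j)\,[aff(A_i, A_{j-1}) + aff(A_i, A_{j+1})]$
or
  $AM = \sum_{j=1}^{n} \left[\sum_{i=1}^{n} aff(A_i, A_j)\,aff(A_i, A_{j-1}) + \sum_{i=1}^{n} aff(A_i, A_j)\,aff(A_i, A_{j+1})\right]$
Let
  $bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\,aff(A_z, A_y)$
then
  $AM = \sum_{j=1}^{n} [bond(A_j, A_{j-1}) + bond(A_j, A_{j+1})]$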
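To see what the measure rewards, the bond form of AM can be evaluated for two column orderings of the PROJ affinity matrix. This is a sketch: the matrix values are an assumption reconstructed from the bond and contribution figures quoted on the following slides, and `global_affinity` is a hypothetical helper name.

```python
# Affinity matrix for A1..A4 (assumed; consistent with the deck's bond values).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(aa, x, y):
    """Sum over all rows z of aff(Az, Ax) * aff(Az, Ay); 0 at the matrix border."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)

def global_affinity(aa, order):
    """AM = sum over columns of bond with the left and the right neighbor."""
    total = 0
    for pos, col in enumerate(order):
        left = order[pos - 1] if pos > 0 else None
        right = order[pos + 1] if pos + 1 < len(order) else None
        total += bond(aa, col, left) + bond(aa, col, right)
    return total

am_original = global_affinity(AA, [0, 1, 2, 3])   # A1 A2 A3 A4
am_clustered = global_affinity(AA, [0, 2, 1, 3])  # A1 A3 A2 A4
print(am_original, am_clustered)
```

Placing strongly related columns next to each other (A1 beside A3, A2 beside A4) raises AM substantially, which is exactly what the permutation step tries to achieve.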

19 Bond Energy Algorithm
• Initialization: place and fix one of the columns of AA arbitrarily into CA
• Iteration:
  • Pick one of the remaining n-i columns of AA and try placing it in each of the i+1 possible positions in CA
  • Choose the placement that makes the greatest contribution to the global affinity measure
• Row ordering:
  • Change the placement of the rows to match the column ordering

20 Contribution of a Placement
Contribution of placing attribute Ak between Ai and Aj:
  cont(Ai, Ak, Aj) = 2 bond(Ai, Ak) + 2 bond(Ak, Aj) - 2 bond(Ai, Aj)
Example:
  bond(A1, A2) = 45*0 + 0*80 + 45*5 + 0*75 = 225
  bond(A1, A4) = 45*0 + 0*75 + 45*3 + 0*78 = 135
  bond(A4, A2) = 0*0 + 75*80 + 3*5 + 78*75 = 11865
If we place A4 between A1 and A2:
  cont(A1, A4, A2) = 2 bond(A1, A4) + 2 bond(A4, A2) - 2 bond(A1, A2)
                   = 2*135 + 2*11865 - 2*225 = 23550

21 BEA Example
  cont(A0, A3, A1) = 8820
  cont(A1, A3, A2) = 10150
  cont(A2, A3, A4) = 1780
[Figure: the clustered affinity matrix built step by step, with column orderings A1 A2, then A1 A3 A2, then A1 A3 A2 A4; the rows are finally reordered to match]
• Two clusters: the upper left corner of the smaller affinity values, and the lower right corner of the larger affinity values
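The placement loop can be sketched in a few lines of Python. Several choices here are assumptions rather than the deck's own method: the affinity matrix is reconstructed from the bond and contribution values quoted above, columns are taken in index order, only the first column seeds CA, and ties are broken toward the rightmost position.

```python
# Affinity matrix for A1..A4 (assumed; it reproduces the slide's numbers).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(aa, x, y):
    """Inner product of columns x and y; None stands for the empty border."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)

def cont(aa, left, mid, right):
    """Net gain in AM from placing column mid between left and right."""
    return (2 * bond(aa, left, mid) + 2 * bond(aa, mid, right)
            - 2 * bond(aa, left, right))

def bea_order(aa):
    """Greedy column ordering: insert each column where cont is largest."""
    order = [0]
    for k in range(1, len(aa)):
        best_pos, best_gain = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = cont(aa, left, k, right)
            if best_gain is None or gain >= best_gain:  # ties go right
                best_pos, best_gain = pos, gain
        order.insert(best_pos, k)
    return order

order = bea_order(AA)
print(["A%d" % (i + 1) for i in order])
```

With these assumptions the loop reproduces the ordering A1 A3 A2 A4 from the example, and `cont` returns the three values listed for A3 as well as the 23550 from the previous slide. Breaking ties toward the left instead gives the mirror-image ordering, which has the same global affinity.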

22 VP - Splitting
• The basic idea: given a set of attributes {A1, A2, ..., An} and a set of applications, partition the attributes into two or more sets such that there are no (or minimal) applications that access more than one of the sets.
[Figure: the clustered affinity matrix with rows and columns A1 ... An, split after Ai into an upper-left block TA and a lower-right block BA]
• Two attribute sets:
  • TA: {A1, A2, ..., Ai}
  • BA: {Ai+1, Ai+2, ..., An}
• Three sets of applications:
  • TQ: access TA only
  • BQ: access BA only
  • OQ: access both

23 VP - Splitting Problem
Define:
• CTQ = total number of accesses to attributes by applications that access only TA
• CBQ = total number of accesses to attributes by applications that access only BA
• COQ = total number of accesses to attributes by applications that access both TA and BA
Find a split point x (1 ≤ x < n) which maximizes z:
  $z = CTQ \cdot CBQ - COQ^2$

24 VP - The Splitting Algorithm
• Input: relation R, and the CA and acc matrices
• Output: a set of fragments
• For each split point x (1 ≤ x < n), compute z
• Choose the split point with the maximum z value and construct the fragments
• Notation: XQ ∈ {TQ, BQ, OQ}

25 VP: Splitting Example
[Table: z value for each split point x = 1, 2, 3 on the clustered matrix with ordering A1 A3 A2 A4]
• Partition: (A1, A3) (A2, A4)
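The split-point search can be sketched as follows for the PROJ example. The usage matrix comes from the four example queries; the access frequencies are an assumption (they are consistent with the affinity values used in the bond calculations), and `split_quality` and `best_split` are hypothetical helper names.

```python
# Rows are queries q1..q4, columns are attributes A1..A4.
USAGE = [
    [1, 0, 1, 0],  # q1: PNO, BUDGET
    [0, 1, 1, 0],  # q2: PNAME, BUDGET
    [0, 1, 0, 1],  # q3: PNAME, LOC
    [0, 0, 1, 1],  # q4: BUDGET, LOC
]
ACC = [45, 5, 75, 3]   # assumed access frequencies
ORDER = [0, 2, 1, 3]   # clustered ordering A1 A3 A2 A4

def split_quality(usage, acc, order, x):
    """z = CTQ * CBQ - COQ^2 when splitting after the first x attributes."""
    top, bottom = set(order[:x]), set(order[x:])
    ctq = cbq = coq = 0
    for row, freq in zip(usage, acc):
        used = {j for j, flag in enumerate(row) if flag}
        if used <= top:
            ctq += freq      # query touches TA only
        elif used <= bottom:
            cbq += freq      # query touches BA only
        else:
            coq += freq      # query touches both fragments
    return ctq * cbq - coq ** 2

def best_split(usage, acc, order):
    """Evaluate every split point 1 <= x < n and return the best one."""
    scores = {x: split_quality(usage, acc, order, x)
              for x in range(1, len(order))}
    return max(scores, key=scores.get), scores

x, scores = best_split(USAGE, ACC, ORDER)
fragments = (ORDER[:x], ORDER[x:])
print(x, scores, fragments)
```

Under these assumptions only x = 2 gives a positive z, which yields the partition (A1, A3) (A2, A4). In an actual vertical partitioning the primary key would additionally be repeated in every fragment.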

26 Complications in VP Partitioning Algorithm
• Cluster forming in the middle of the CA matrix
  • Shift a row up and a column left and apply the algorithm to find the "best" partitioning point
  • Do this for all possible shifts
  • Cost O(n^2)
• More than two clusters
  • m-way partitioning
  • Try 1, 2, ..., m-1 split points along the diagonal and try to find the best point for each of these
  • Cost O(2^m)

