H.Lu/HKUST L04: Physical Database Design (2)  Introduction  Index Selection  Partitioning & Denormalization.

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

Database Tuning. Overview v After ER design, schema refinement, and the definition of views, we have the conceptual and external schemas for our database.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
Schema Refinement and Normal Forms Given a design, how do we know it is good or not? What is the best design? Can a bad design be transformed into a good.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Physical Database Design Section Chapter 20.
Manajemen Basis Data Pertemuan 7 Matakuliah: M0264/Manajemen Basis Data Tahun: 2008.
Physical Database Design R&G Chapter 16 Lecture 26.
Normalization DB Tuning CS186 Final Review Session.
Normalization DB Tuning CS186 Final Review Session.
1 Physical Database Design and Tuning Module 5, Lecture 3.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Physical Database Design Chapter 20.
IS 4420 Database Fundamentals Chapter 6: Physical Database Design and Performance Leon Chen.
1 Normalization Chapter What it’s all about Given a relation, R, and a set of functional dependencies, F, on R. Assume that R is not in a desirable.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
PARTITIONING “ A de-normalization practice in which relations are split instead of merger ”
Physical Database Design and Tuning R&G - Chapter 20 Although the whole of this life were said to be nothing but a dream and the physical world nothing.
Functional Dependencies CS 186, Spring 2006, Lecture 21 R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another.
Chapter 6 Physical Database Design. Introduction The purpose of physical database design is to translate the logical description of data into the technical.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
CSC271 Database Systems Lecture # 30.
1 © Prentice Hall, 2002 Physical Database Design Dr. Bijoy Bordoloi.
Physical Database Design and Database Tuning, R. Ramakrishnan and J. Gehrke, modified by Ch. Eick 1 Physical Database Design Part II.
TM 7-1 Copyright © 1999 Addison Wesley Longman, Inc. Physical Database Design.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Physical Database Design and Tuning.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
© Pearson Education Limited, Chapter 15 Physical Database Design – Step 7 (Consider Introduction of Controlled Redundancy) Transparencies.
Chapter 16 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
Physical Database Design I, Ch. Eick 1 Physical Database Design I About 25% of Chapter 20 Simple queries:= no joins, no complex aggregate functions Focus.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Physical Database Design and Tuning Chapter 20.
Database Management COP4540, SCS, FIU Physical Database Design (ch. 16 & ch. 3)
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
1 Schema Refinement and Normal Forms Week 6. 2 The Evils of Redundancy  Redundancy is at the root of several problems associated with relational schemas:
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Functional Dependencies R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another. Thomas Hobbes ( )
Copyright, Harris Corporation & Ophir Frieder, The Process of Normalization.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Normalization.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Lec 7 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
CS542 1 Schema Refinement Chapter 19 (part 1) Functional Dependencies.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Distributed Database Design Bayu Adhi Tama, MTI Fasilkom-Unsri Adapted from Connolly, et al., Database Systems 4 th Edition, Pearson Education Limited,
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Physical Database Design Section Chapter 20.
CS742 – Distributed & Parallel DBMSPage 2. 1M. Tamer Özsu Outline Introduction & architectural issues  Data distribution  Fragmentation  Data Allocation.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 CS122A: Introduction to Data Management Lecture #15: Physical DB Design Instructor: Chen Li.
Practical Database Design and Tuning
Methodology – Monitoring and Tuning the Operational System
Physical Database Design for Relational Databases Step 3 – Step 8
Modeling Your Data Chapter 2 cs542
Functional Dependencies
Physical Database Design
Practical Database Design and Tuning
Relational Algebra Chapter 4, Sections 4.1 – 4.2
Functional Dependencies
Physical Database Design
Vertical Fragmentation
The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited)
Distributed Database Management Systems
Methodology – Monitoring and Tuning the Operational System
Physical Database Design and Tuning
Physical Database Design
Distributed Database Management Systems
Physical Database Design and Tuning
Presentation transcript:

H.Lu/HKUST L04: Physical Database Design (2)  Introduction  Index Selection  Partitioning & Denormalization

L04: Physical Database Design (2) -- 2 H.Lu/HKUST Tuning a Relational Schema  The choice of relational schema should be guided by the workload, in addition to redundancy issues:  We may settle for a 3NF schema rather than BCNF.  Workload may influence the choice we make in decomposing a relation into 3NF or BCNF.  We might denormalize (i.e., undo a decomposition step), or we might add fields to a relation  We may further decompose a BCNF schema!  We might consider horizontal partitioning.  If such changes are made after a database is in use, called schema evolution; might want to mask some of these changes from applications by defining views.

L04: Physical Database Design (2) -- 3 H.Lu/HKUST Example Schemas  We will concentrate on Contracts, denoted as CSJDPQV. The following ICs are given to hold: JP  C, SD  P, C is the primary key.  What are the candidate keys for CSJDPQV?  What normal form is this relation schema in? Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val) Depts (Did, Budget, Report) Suppliers (Sid, Address) Parts (Pid, Cost) Projects (Jid, Mgr)

L04: Physical Database Design (2) -- 4 H.Lu/HKUST Denormalization  Suppose that the following query is important:  Is the value of a contract less than the budget of the department?  To speed up this query, we might add a field budget B to Contracts.  This introduces the FD D  B wrt Contracts.  Thus, Contracts is no longer in 3NF.  We might choose to modify Contracts thus if the query is sufficiently important, and we cannot obtain adequate performance otherwise (i.e., by adding indexes or by choosing an alternative 3NF schema.)

L04: Physical Database Design (2) -- 5 H.Lu/HKUST Partitioning  Horizontal Partitioning: Distributing the rows of a table into several separate files  Useful for situations where different users need access to different rows  Vertical Partitioning: Distributing the columns of a table into several separate files  Useful for situations where different users need access to different columns  The primary key must be repeated in each file  Combinations of Horizontal and Vertical Partitions often correspond with User Schemas (user views)

L04: Physical Database Design (2) -- 6 H.Lu/HKUST Partitioning  Advantages of Partitioning:  Records used together are grouped together  Each partition can be optimized for performance  Security, recovery  Partitions stored on different disks: contention  Take advantage of parallel processing capability  Disadvantages of Partitioning:  Slow retrievals across partitions  Complexity  Issues: Need to find suitable level  Too little ­ too much of irrelevant data access.  Too much ­ too much processing cost

L04: Physical Database Design (2) -- 7 H.Lu/HKUST Horizontal Decompositions  Our definition of decomposition: Relation is replaced by a collection of relations that are projections. Most important case.  Sometimes, might want to replace relation by a collection of relations that are selections.  Each new relation has same schema as the original, but a subset of the rows.  Collectively, new relations contain all rows of the original. Typically, the new relations are disjoint.

L04: Physical Database Design (2) -- 8 H.Lu/HKUST Horizontal Decompositions (Contd.)  Suppose that contracts with value > are subject to different rules. This means that queries on Contracts will often contain the condition val>  One way to deal with this is to build a clustered B+ tree index on the val field of Contracts.  A second approach is to replace contracts by two new relations: LargeContracts and SmallContracts, with the same attributes (CSJDPQV).  Performs like index on such queries, but no index overhead.  Can build clustered indexes on other attributes, in addition!

L04: Physical Database Design (2) -- 9 H.Lu/HKUST Masking Conceptual Schema Changes  The replacement of Contracts by LargeContracts and SmallContracts can be masked by the view.  However, queries with the condition val>10000 must be asked wrt LargeContracts for efficient execution: so users concerned with performance have to be aware of the change. CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val) AS SELECT * FROM LargeContracts UNION SELECT * FROM SmallContracts

L04: Physical Database Design (2) H.Lu/HKUST Decomposition of a BCNF Relation  Suppose that we choose { SDP, CSJDQV }. This is in BCNF, and there is no reason to decompose further (assuming that all known ICs are FDs).  However, suppose that these queries are important:  Find the contracts held by supplier S.  Find the contracts that department D is involved in.  Decomposing CSJDQV further into CS, CD and CJQV could speed up these queries. (Why?)  On the other hand, the following query is slower:  Find the total value of all contracts held by supplier S.

L04: Physical Database Design (2) H.Lu/HKUST Vertical Partitioning  Vertical partitioning of a relation R produces partitions R 1, R 2,..., R m, each of which contains a subset of R's attributes as well as the primary key of R  The object of vertical partitioning is to reduce irrelevant attribute access, and thus irrelevant data access  ``Optimal'' vertical partitioning minimizes the irrelevant data access for user applications  For a relation with m non-primary key attributes, the number of possible partitions is approximately equal to m m  Hard to find an optimal solution  Resort to heuristic approaches

L04: Physical Database Design (2) H.Lu/HKUST VP: Heuristic Approaches  Grouping:  Assign each attribute to one fragment  Join fragments until some criteria is satisfied  Splitting (our focus):  Start with the original relation  Generate partitions based on access behavior  Closer to optimal; less overlapping fragments  Basic idea: Affinity of attributes  A measure of closeness of these attributes

L04: Physical Database Design (2) H.Lu/HKUST Attribute Usage Matrices  Q = {q1, q2,..., qm}  Set of user queries  R (A1, A2,..., An)  Relation R with n attributes  Usage matrix |Uij| m×n  Uij = 1 if attribute Aj is referenced by qi;  Uij = 0 otherwise.  Access matrix |acc i |  access frequency of q i

L04: Physical Database Design (2) H.Lu/HKUST VP - Matrices Examples  Relation PROJ(PNO,PNAME,BUDGET,LOC), four SQL queries sent to three sites: q 1 : SELECT BUDGET FROM PROJ WHERE PNO = val; q 2 : SELECT PNAME,BUDGET FROM PROJ; q 3 : SELECT PNAME FROM PROJWHERE LOC = val; q 4 : SELECT SUM(BUDGET) FROM PROJ WHERE LOC=val;

L04: Physical Database Design (2) H.Lu/HKUST Attribute Affinity Matrix  |aff ij | n×n : Affinity between two attributes A i and A j aff ij =  { k|U ki = 1  U kj =1} acc k AA Matrix

L04: Physical Database Design (2) H.Lu/HKUST Bond Energy Clustering Algorithm  Determines groups of similar items (clusters of attributes with larger affinity values, and ones with smaller affinity values)  Final groupings are insensitive to the order in which items are presented to the algorithm  The computation time is O(n 2 ) where n is the number of attributes  Secondary interrelationships between clustered attribute groups are identifiable

L04: Physical Database Design (2) H.Lu/HKUST Main Idea of BEA Permute the attribute affinity matrix (AA) and generate a clustered affinity matrix (CA) to maximize the global affinity measure (AM) where

L04: Physical Database Design (2) H.Lu/HKUST AM in Terms of Bond Because the affinity matrix is symmetric,, or Let then AM = ∑[bond(A j, A j-1 ) + bond(A j, A j+1 )]

L04: Physical Database Design (2) H.Lu/HKUST Bond Energy Algorithm  Initialization : place and fix one of the columns of AA arbitrarily into CA  Iteration :  Pick one of the remaining n­i columns of AA and place it in one of the i+1 positions in CA  Choose the placement that makes greatest contribution.  Row ordering :  Change the placement of the rows accordingly

L04: Physical Database Design (2) H.Lu/HKUST Contribution of a Placement Contribution of placing attribute A k between A i and A j : cont(A i, A k, A j ) = 2bond(A i, A k ) + 2bond(A k, A j ) – 2bond(A i, A j ) bond(A 1, A 2 ) = 45*0+0*80+45*5+0*75=225 bond(A 1, A 4 ) = 45*0+0*75+45*3+0*78=135 bond(A 4, A 2 ) = 0*0+ 75*80+3*5+78*75=11865 If we place A 4 between A 1 and A 2, cont(A 1, A 4, A 2 ) = 2bond( A 1, A 4 ) + 2bond( A 4, A 2 ) – 2bond( A 1, A 2 ) = 2* * *225 = 23550

L04: Physical Database Design (2) H.Lu/HKUST BEA Example cont(A 0, A 3, A 1 ) = 8820 cont(A 1, A 3, A 2 ) = cont(A 2, A 3, A 4 ) = 1780 A 1 A 2 A3A3 A 1 A 3 A 2 A 1 A 3 A 2 A 4 A1A2A3A4A1A2A3A4 A1A3A2A4A1A3A2A4 A1A2A3A4A1A2A3A4 A1A2A3A4A1A2A3A4 Two clusters: the upper left corner of the smaller affinity values, and the lower right corner of the larger affinity values

L04: Physical Database Design (2) H.Lu/HKUST BQ OQ VP ­ Splitting A 1 A 2 A 3 … A i A i+1 A n A 1 A 2 A 3. A i A i+1 A n TA BA Two attribute sets: TA : {A 1, A 2,..., A i } BA : {A i+1, A i+2,..., A n } TQ Three sets of apps: TQ : access TA only BQ : access BA only OQ: access both The basic idea Given a set of attributes {A 1, A 2,..., A n } and a set of applications, partition the attributes into two or more sets such that there are no (or minimal) applications that access more than one of the sets.

L04: Physical Database Design (2) H.Lu/HKUST VP ­ Splitting Problem Define:  CTQ = total number of accesses to attributes by applications that access only TA  CBQ = total number of accesses to attributes by applications that access only BA  COQ = total number of accesses to attributes by applications that access both TA & BA Find a split point x (1≤x<n) which maximizes z z = CTQ * CBQ ­ COQ 2

L04: Physical Database Design (2) H.Lu/HKUST VP – The Splitting Algorithm  Input: Relation R, and CA, acc matrices  Output: a set of fragments  For each split point x (1≤x<n), compute z  Choose the split point with the maximum z value and construct fragments XQ  {TQ, BQ, OQ}

L04: Physical Database Design (2) H.Lu/HKUST VP: Splitting Example x123x123  Partition: (A1,A3) (A2,A4) A 1 A 3 A 2 A 4 A1A3A2A4A1A3A2A4

L04: Physical Database Design (2) H.Lu/HKUST Complications in VP Partitioning Algorithm  Cluster forming in the middle of the CA matrix  Shift a row up and a column left and apply the algorithm to find the “best” partitioning point  Do this for all possible shifts  Cost O(n 2 )  More than two clusters  M-way partitioning  Try 1, 2, …, m-1 split points along the diagonal and try to find the best point for each of these  Cost O(2 m )