IMS 4212: Database Implementation
Dr. Lawrence West, Management Dept., University of Central Florida

Physical Database Implementation: Topics
  – Denormalization
  – Partitioning tables (relations)
  – Parallel processing and RAID

Denormalization
Denormalizing is the process of reshuffling attributes, and sometimes entities, to create entities that violate the rules of normalization
We are trading away (again) some storage efficiency and anomaly avoidance in exchange for better retrieval efficiency
Denormalizing includes:
  – Storing derived attributes explicitly
  – Allowing partial or transitive dependencies (violating second, third, or Boyce-Codd normal form)
  – Merging entities in 1:1 relationships

Denormalization (cont.)
Derived attributes
  – Storing derived attributes is one of the most common means of improving processing efficiency
  – How many table and row examinations are avoided by storing total grade points and total credit hours with the STUDENT entity?
  – What new operations must be introduced to keep the data current?
  – Explicitly storing derived attributes gives rise to new operational business rules to enforce accuracy
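The stored-totals idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the ENROLLMENT table, its columns, the trigger, and the sample courses are hypothetical and not part of the course material. The trigger is one possible form of the "new operation" that keeps the derived values current; a fuller version would need matching triggers for updates and deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id         INTEGER PRIMARY KEY,
    name               TEXT NOT NULL,
    total_grade_points REAL NOT NULL DEFAULT 0,    -- derived, stored explicitly
    total_credit_hours INTEGER NOT NULL DEFAULT 0  -- derived, stored explicitly
);
CREATE TABLE enrollment (                          -- hypothetical detail table
    student_id   INTEGER NOT NULL REFERENCES student(student_id),
    course_id    TEXT NOT NULL,
    credit_hours INTEGER NOT NULL,
    grade_points REAL NOT NULL,                    -- points per credit hour
    PRIMARY KEY (student_id, course_id)
);
-- Operational rule: every new enrollment row also updates the stored totals.
CREATE TRIGGER enrollment_ai AFTER INSERT ON enrollment
BEGIN
    UPDATE student
       SET total_grade_points = total_grade_points
                                + NEW.grade_points * NEW.credit_hours,
           total_credit_hours = total_credit_hours + NEW.credit_hours
     WHERE student_id = NEW.student_id;
END;
""")

conn.execute("INSERT INTO student (student_id, name) VALUES (1, 'Pat')")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [(1, "IMS4212", 3, 4.0), (1, "MAN3025", 3, 3.0)],  # sample data (made up)
)


def gpa(student_id):
    """Compute GPA from the STUDENT row alone; ENROLLMENT is never read."""
    points, hours = conn.execute(
        "SELECT total_grade_points, total_credit_hours "
        "FROM student WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    return points / hours if hours else None


print(gpa(1))
```

The GPA lookup touches one row in one table, at the cost of extra work on every insert into the detail table.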

Denormalization (cont.)
1:1 Relationships
  – It may be possible to collapse data from one entity in a 1:1 relationship into the other
  – Usually the pervasive entity survives
  – Alternatively, both entities may be retained, with data from one copied into the other to avoid a table look-up

Denormalization (cont.)
1:M Relationships
  – You may consider moving or duplicating attributes from the "one" side of a 1:M relationship into the "many" side
  – This will result in considerable data duplication
  – Considerations
      – There should be many records on the "one" side
      – Frequent access should be directly into the "many" side

Denormalization (cont.)
1:M Relationships (cont.)
  – A similar technique may be used for M:M relationships: collapse or copy attributes into the associative entity that sits between the two entities
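A minimal sketch of the 1:M case, again using sqlite3. The CUSTOMER and ORDERS tables, their columns, and the trigger are assumptions made up for illustration. The customer name is duplicated onto each order so that the frequent query reads only the "many" side; the trigger stands in for the extra business rule that keeps the copies accurate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (                 -- the "one" side (hypothetical)
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE orders (                   -- the "many" side (hypothetical)
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
    order_total   REAL NOT NULL,
    customer_name TEXT NOT NULL         -- duplicated from customer
);
-- Business rule: a name change must be pushed to every duplicated copy.
CREATE TRIGGER customer_au AFTER UPDATE OF customer_name ON customer
BEGIN
    UPDATE orders
       SET customer_name = NEW.customer_name
     WHERE customer_id = NEW.customer_id;
END;
""")

conn.execute("INSERT INTO customer VALUES (7, 'Acme Supply')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(100, 7, 25.0, "Acme Supply"), (101, 7, 40.0, "Acme Supply")],
)

# Frequent query: no join, only the "many" side is read.
query = "SELECT order_id, customer_name FROM orders ORDER BY order_id"
rows_before = conn.execute(query).fetchall()

# One logical change on the "one" side now rewrites many rows.
conn.execute(
    "UPDATE customer SET customer_name = 'Acme Supply Co.' WHERE customer_id = 7"
)
rows_after = conn.execute(query).fetchall()
print(rows_before, rows_after)
```

The same pattern would apply to an associative entity in an M:M relationship, with attributes copied in from either parent.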

Denormalization (cont.)
The goal of denormalizing is to avoid accessing a (large) table for high-frequency, critical transactions
Denormalizing usually requires additional business rules to guarantee that data remains accurate in the face of updates

Partitioning
Partitioning entities divides one table into many
  – Horizontal partitioning
      – Each table has all fields from the original table
      – Each table has a subset of records
  – Vertical partitioning
      – Each table has the PK of the original table
      – Each table has all records
      – Each table has a subset of fields
  – May partition both vertically and horizontally
Very powerful technique with historical data

Partitioning (cont.)
Horizontal Partitioning
  – How many records are in the STUDENT table?
  – How many of them are currently enrolled?
  – How frequently do we need to access both current and former students in the same query or operation?
  – It may make sense to partition tables based on a historical context
      – Active records vs. archived records
  – May also partition based on geographic considerations
  – The whole table can be reconstructed using a UNION query
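As a rough sketch of horizontal partitioning, the STUDENT table can be split into an active and an archive table with identical columns, and a view can rebuild the whole table. The table names, the view name, the columns, and the sample rows are all assumptions for illustration. The view uses UNION ALL; a plain UNION would return the same rows here because the two partitions share no records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Both partitions carry every column of the original table.
CREATE TABLE student_active (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    last_term  TEXT NOT NULL
);
CREATE TABLE student_archive (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    last_term  TEXT NOT NULL
);
-- The whole logical table, rebuilt on demand.
CREATE VIEW student_all AS
    SELECT student_id, name, last_term FROM student_active
    UNION ALL
    SELECT student_id, name, last_term FROM student_archive;
""")

conn.executemany(
    "INSERT INTO student_active VALUES (?, ?, ?)",
    [(4, "Dana", "Current"), (5, "Eli", "Current")],
)
conn.executemany(
    "INSERT INTO student_archive VALUES (?, ?, ?)",
    [(1, "Ana", "Past"), (2, "Ben", "Past"), (3, "Cy", "Past")],
)

# Day-to-day work reads only the small active partition.
active_count = conn.execute("SELECT COUNT(*) FROM student_active").fetchone()[0]

# The occasional all-students query goes through the view.
all_ids = [
    row[0]
    for row in conn.execute("SELECT student_id FROM student_all ORDER BY student_id")
]
print(active_count, all_ids)
```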

Partitioning (cont.)
Vertical Partitioning
  – Librarian, Registrar, Athletic Department, and Health Center may all need a different subset of fields from the STUDENT entity
  – It may make sense to create separate tables containing the necessary attributes for each view
  – Common PK creates 1:1 cardinality between all tables
  – Whole logical record can be assembled using SQL when needed
  – We are actually backing into a supertype/subtype relationship
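Vertical partitioning can be sketched the same way. The three tables below and their non-key columns are hypothetical; the only property taken from the slide is that every partition shares the STUDENT primary key, which lets a join reassemble the full logical record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each partition holds the same PK and a different subset of fields.
CREATE TABLE student_registrar (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    major      TEXT NOT NULL
);
CREATE TABLE student_health (
    student_id        INTEGER PRIMARY KEY,
    immunization_date TEXT
);
CREATE TABLE student_library (
    student_id INTEGER PRIMARY KEY,
    fines_due  REAL NOT NULL DEFAULT 0
);
""")

conn.execute("INSERT INTO student_registrar VALUES (1, 'Pat', 'MIS')")
conn.execute("INSERT INTO student_health VALUES (1, '2001-08-15')")
conn.execute("INSERT INTO student_library VALUES (1, 2.5)")

# The Health Center reads only its own narrow table.
health_row = conn.execute(
    "SELECT immunization_date FROM student_health WHERE student_id = ?", (1,)
).fetchone()

# The whole logical record is assembled through the shared primary key.
full_row = conn.execute(
    """
    SELECT r.student_id, r.name, r.major, h.immunization_date, l.fines_due
      FROM student_registrar AS r
      JOIN student_health    AS h ON h.student_id = r.student_id
      JOIN student_library   AS l ON l.student_id = r.student_id
     WHERE r.student_id = ?
    """,
    (1,),
).fetchone()
print(health_row, full_row)
```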

RAID Storage Devices
In conventional drives, data is laid down sequentially along a track on the disk
  – Read/write head must move along the track to read the data
  – Each read/write operation must finish before the next can begin
  – A drive failure can result in loss of all data

RAID Storage Devices (cont.)
RAID stands for Redundant Array of Inexpensive Disks
  – Multiple disks appear as a single logical drive to the computer
  – May be implemented in hardware or software (OS)
Various RAID levels provide different levels of performance and redundancy
Most RAID levels allow an entire lost physical drive to be rebuilt from stored redundancy (parity or, in RAID 1, a mirror copy)

RAID Storage Devices: RAID 3
Records are striped across multiple physical devices
  – Part of each record is laid down on each of several physical drives
  – Much faster read/write time, since the disk rotation needed to read a whole record/block is much shorter
  – However, only one request can be serviced at a time
  – Not commonly used in practice
A single parity disk allows reconstruction of the data on one damaged drive
(Image source: Wikipedia)
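The striping and parity ideas can be sketched in a few lines of Python. This is a toy model, not how a controller is implemented: the function names are made up, three data "drives" are simulated as byte strings, and the record is zero-padded to fill the last stripe. Parity is the byte-wise XOR of the data drives, so XOR-ing the survivors with the parity yields the missing drive.

```python
from functools import reduce
from operator import xor


def stripe_bytes(data: bytes, n_data_drives: int) -> list[bytes]:
    """Byte-level striping: byte i goes to data drive i % n_data_drives."""
    padded = data + bytes(-len(data) % n_data_drives)  # zero-pad last stripe
    return [padded[d::n_data_drives] for d in range(n_data_drives)]


def xor_parity(chunks: list[bytes]) -> bytes:
    """XOR the chunks position by position to produce one parity chunk."""
    return bytes(reduce(xor, column) for column in zip(*chunks))


record = b"STUDENT RECORD 0001"
drives = stripe_bytes(record, 3)   # three simulated data drives
parity = xor_parity(drives)        # contents of the single parity drive

# Simulate losing data drive 1, then rebuild it from the rest plus parity.
survivors = [drives[0], drives[2]]
rebuilt = xor_parity(survivors + [parity])
print(rebuilt == drives[1])
```

Because every drive holds a slice of the same record, all drives take part in each read, which is why only one request is serviced at a time.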

RAID Storage Devices: RAID 4
Blocks are stored independently on the drives
  – Block A1 can be serviced just by Drive 0
  – Simultaneous requests for Blocks B2 or D3 can also be serviced
A single parity drive enables recovery of lost data
Write operations may be slower: simultaneous write operations to Drives 0-2 must wait on the parity calculation and writing on Drive 3
(Image source: Wikipedia)
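A small sketch of why every write involves the parity drive. The block contents and the helper name are invented for illustration; the block labels follow the slide (A1, A2, A3 on Drives 0-2, parity on Drive 3). Replacing one data block means the parity block must also be recomputed, using the standard identity new parity = old parity XOR old data XOR new data.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Stripe A: one block on each of Drives 0-2, parity on Drive 3.
a1, a2, a3 = b"AAAA", b"BBBB", b"CCCC"
parity_a = xor_blocks(a1, a2, a3)

# Reading A1 involves Drive 0 only. Writing A2 involves Drive 1 and Drive 3:
# read old A2 and old parity, then write new A2 and new parity.
new_a2 = b"ZZZZ"
new_parity_a = xor_blocks(parity_a, a2, new_a2)

# The shortcut gives the same result as recomputing parity over the stripe.
parity_matches = new_parity_a == xor_blocks(a1, new_a2, a3)

# Recovery: if Drive 0 is lost, A1 is the XOR of everything that survives.
recovered_a1 = xor_blocks(new_a2, a3, new_parity_a)
print(parity_matches, recovered_a1)
```

Since every stripe keeps its parity on the same drive, two writes to different data drives still queue up behind that one drive.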

RAID Storage Devices: RAID 5
Similar to RAID 4 except that parity storage is distributed across multiple drives
  – Rotating allocation
  – Lessens the chance that writes on two drives will wait on parity updates on a single parity drive
(Image source: Wikipedia)
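Rotating allocation can be illustrated by computing which drive holds the parity block for each stripe. The rotation rule below is one simple scheme chosen for illustration; actual controllers may use a different layout, and the function names are hypothetical.

```python
from collections import Counter


def raid4_parity_drive(stripe: int, n_drives: int) -> int:
    """RAID 4: parity always lives on the last drive."""
    return n_drives - 1


def raid5_parity_drive(stripe: int, n_drives: int) -> int:
    """RAID 5 (one possible rotation): parity moves back one drive per stripe."""
    return (n_drives - 1 - stripe) % n_drives


n_drives, n_stripes = 4, 8
raid4_layout = [raid4_parity_drive(s, n_drives) for s in range(n_stripes)]
raid5_layout = [raid5_parity_drive(s, n_drives) for s in range(n_stripes)]

# Count how many parity blocks (and so parity writes) each drive absorbs.
raid4_load = Counter(raid4_layout)
raid5_load = Counter(raid5_layout)
print(raid4_layout, raid5_layout)
```

With the rotation, writes to two different stripes usually update parity on two different drives, so they are less likely to wait on each other.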

Parallel Processing
More and more computers support parallel processing (multiple CPUs on the same computer)
Some tasks can be split among multiple processors
In an SQL SELECT query, the usual method requires the RDBMS to scan each record to determine whether it matches the WHERE clause or JOIN criteria
In parallel processing, a part of the whole table is passed to each processor
Availability depends on hardware, OS, and RDBMS
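The idea of handing part of a table to each processor can be sketched with a worker pool. This only illustrates how the scan is divided and the partial results merged; an RDBMS does this internally, and Python threads in the standard interpreter do not give true CPU parallelism for this kind of work. The function names, the sample rows, and the state column are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk, predicate):
    """Scan one slice of the table, keeping rows that satisfy the predicate."""
    return [row for row in chunk if predicate(row)]


def parallel_select(rows, predicate, n_workers=4):
    """Split the rows into roughly equal slices and scan each in a worker."""
    size = max(1, -(-len(rows) // n_workers))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in chunk order, so row order is preserved.
        parts = list(pool.map(scan_chunk, chunks, [predicate] * len(chunks)))
    return [row for part in parts for row in part]


# Toy "table" of (student_id, state) rows; every third student is from FL.
table = [(i, "FL" if i % 3 == 0 else "GA") for i in range(1, 13)]

# Rough equivalent of: SELECT * FROM table WHERE state = 'FL'
matches = parallel_select(table, lambda row: row[1] == "FL")
matching_ids = [row[0] for row in matches]
print(matching_ids)
```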