© 2011 Pearson Education, Inc. Publishing as Prentice Hall
Chapter 5 Part 2: File Organization and Performance
Modern Database Management, 10th Edition
Jeffrey A. Hoffer, V. Ramesh, Heikki Topi
Objectives
- Define terms
- Select appropriate file organizations
- Describe three types of file organization
- Describe indexes and their appropriate use
- Translate a database model into efficient structures
- Know when and how to use denormalization
Physical Records
- Physical record: a group of fields stored in adjacent memory locations and retrieved together as a unit
- Page: the amount of data read or written in one I/O operation
- Blocking factor: the number of physical records per page
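As a rough illustration of how page size and record size determine the blocking factor, the Python sketch below assumes fixed-length records that may not span pages. The 4,096-byte page, 200-byte record, and 10,000-record file are hypothetical values, and `blocking_factor` and `pages_needed` are illustrative helpers, not part of any DBMS.

```python
import math

def blocking_factor(page_size: int, record_size: int) -> int:
    """Whole records that fit on one page (assumes records never span pages)."""
    return page_size // record_size

def pages_needed(num_records: int, page_size: int, record_size: int) -> int:
    """Pages needed to hold every record, i.e. page I/Os for a full scan."""
    bf = blocking_factor(page_size, record_size)
    return math.ceil(num_records / bf)

# Hypothetical sizes: 4,096-byte pages, 200-byte records, 10,000 records.
bf = blocking_factor(4096, 200)            # 20 records per page, 96 bytes unused
pages = pages_needed(10_000, 4096, 200)    # 500 pages
print(bf, pages)
```

Any space left over on a page (96 bytes per page here) is wasted, which is one reason record size and page size are usually considered together.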
Designing Physical Files
- Physical file: a named portion of secondary memory allocated for the purpose of storing physical records
- Tablespace: a named set of disk storage elements in which physical files for database tables can be stored
- Extent: a contiguous section of disk space
File Organizations
- File organization: a technique for physically arranging the records of a file on secondary storage
- Factors for selecting a file organization:
  - Fast data retrieval and throughput
  - Efficient storage space utilization
  - Protection from failure and data loss
  - Minimizing the need for reorganization
  - Accommodating growth
  - Security from unauthorized use
- Types of file organizations:
  - Sequential
  - Indexed
  - Hashed
Figure 5-7a Sequential file organization (records 1, 2, …, n stored one after another)
- Records of the file are stored in sequence by the primary key field values
- If not sorted: average time to find the desired record = n/2 record reads, for a file of n records
- If sorted: every insert or delete requires a re-sort
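The two costs on this slide can be checked with a small Python sketch, assuming an in-memory list stands in for the file and every stored key is equally likely to be requested. A successful linear scan averages (n + 1)/2 comparisons, which is the slide's n/2 rounded; keeping the file sorted means each insert must shift every later record. The helpers `linear_search`, `average_comparisons`, and `sorted_insert` are hypothetical names used only for illustration.

```python
import bisect
import random

def linear_search(records, key):
    """Scan an unsorted file front to back; return (position, comparisons made)."""
    for i, rec_key in enumerate(records):
        if rec_key == key:
            return i, i + 1
    return -1, len(records)

def average_comparisons(records):
    """Average comparisons for a successful search, each stored key equally likely."""
    total = sum(linear_search(records, k)[1] for k in records)
    return total / len(records)

def sorted_insert(records, key):
    """Insert key into a sorted file; return how many records had to move."""
    pos = bisect.bisect_left(records, key)
    records.insert(pos, key)  # every record after pos shifts one place
    return len(records) - 1 - pos

# Hypothetical unsorted file of n = 1,000 distinct primary-key values.
keys = list(range(1, 1001))
random.shuffle(keys)
print(average_comparisons(keys))  # 500.5, i.e. (n + 1) / 2, roughly n/2

sorted_file = [10, 20, 30, 40]
print(sorted_insert(sorted_file, 25), sorted_file)  # 2 [10, 20, 25, 30, 40]
```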
Figure 5-7b Indexed file organization
- Uses a tree search
- Average time to find the desired record = depth of the tree
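To see why lookup cost tracks tree depth, here is a minimal Python sketch of a static multi-level index over sorted keys. It is a simplification, not a full B+-tree: there are no inserts, node splits, or disk pointers, and the fanout of 10 and the helper names `build_index` and `search` are hypothetical. Each level stores the first key of every node below it, so a search visits exactly one node per level.

```python
from bisect import bisect_right

def build_index(keys, fanout):
    """Build index levels over sorted keys; levels[0] is the data, levels[-1] the root."""
    level = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
    levels = [level]
    while len(level) > 1:
        separators = [node[0] for node in level]  # first key of each child node
        level = [separators[i:i + fanout] for i in range(0, len(separators), fanout)]
        levels.append(level)
    return levels

def search(levels, key, fanout):
    """Descend from the root to a data node; return (found, nodes_visited)."""
    node_idx = 0
    visited = 0
    for depth in range(len(levels) - 1, 0, -1):
        node = levels[depth][node_idx]
        visited += 1
        slot = max(bisect_right(node, key) - 1, 0)  # child whose range holds key
        node_idx = node_idx * fanout + slot
    visited += 1  # the data node itself
    return key in levels[0][node_idx], visited

keys = list(range(1, 1001))
levels = build_index(keys, fanout=10)
print(len(levels))                     # 3 levels for 1,000 keys with fanout 10
print(search(levels, 537, fanout=10))  # (True, 3)
```

Every lookup, successful or not, costs one node visit per level, so growing the file tenfold adds only one more level here.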
Indexed File Organizations
- Indexed file organization: the storage of records with an index that allows software to locate individual records
- Index: a table or other data structure used to determine (within a file) the location of records that satisfy some condition
- Primary keys are automatically indexed (in most relational DBMSs)
- Other fields or combinations of fields can also be indexed; these are called secondary keys (or nonunique keys)
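The difference between a primary-key index and a secondary (nonunique) key index can be sketched in Python, assuming a list stands in for the file and each list position acts as a record address. The customer IDs and cities are hypothetical; a primary key maps to exactly one address, while a secondary key maps to a list of addresses.

```python
from collections import defaultdict

# Hypothetical CUSTOMER file: (cust_id, name, city); list position = record address.
customers = [
    (101, "Jones", "Boston"),
    (102, "Smith", "Denver"),
    (103, "Zale", "Boston"),
    (104, "Gaines", "Austin"),
]

# Primary-key index: each unique key points to a single record address.
pk_index = {cust_id: addr for addr, (cust_id, _, _) in enumerate(customers)}

# Secondary-key index on city: a nonunique key points to a list of addresses.
city_index = defaultdict(list)
for addr, (_, _, city) in enumerate(customers):
    city_index[city].append(addr)

def find_by_city(city):
    """Fetch only the matching records instead of scanning the whole file."""
    return [customers[addr] for addr in city_index.get(city, [])]

print(customers[pk_index[103]])  # (103, 'Zale', 'Boston')
print(find_by_city("Boston"))    # [(101, 'Jones', 'Boston'), (103, 'Zale', 'Boston')]
```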
Figure 5-7c Hashed file organization
- A hash algorithm, usually the division-remainder method, determines each record's position
- Records with the same position are grouped in lists
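A minimal Python sketch of the figure's scheme: the division-remainder method computes a record's position, and records that land on the same position are kept together in a list (chaining). The class name `ChainedHashFile`, the 7-slot size, and the sample keys are hypothetical choices for illustration.

```python
class ChainedHashFile:
    """Division-remainder hashing; records with the same position share a list."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = [[] for _ in range(num_slots)]

    def address(self, key):
        return key % self.num_slots  # division-remainder method

    def insert(self, key, record):
        self.slots[self.address(key)].append((key, record))

    def find(self, key):
        # Only the one list at the hashed position is searched.
        for k, record in self.slots[self.address(key)]:
            if k == key:
                return record
        return None

hf = ChainedHashFile(num_slots=7)  # hypothetical size
for key, name in [(10, "Ames"), (17, "Bundy"), (24, "Clark"), (5, "Doan")]:
    hf.insert(key, name)
print(hf.address(17), hf.slots[3])  # 10, 17 and 24 all leave remainder 3
print(hf.find(24))                  # Clark
```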
CUSTOMER HASH STORAGE STRUCTURE

Bucket #   Records (customer ID, name)
0          33 Jones …, 11 Smith …
1          23 Zale …, 12 Gaines …, 1 Dane …
2          35 Allen …, 2 Hafter …
3          14 Norris …, 25 Harris …
4          4 Caine …, 15 Elder …
5          16 Doan …, 5 Moen …, 38 Raines …
6          39 Vale …, 27 Hale …, 28 Tyne …
7          18 Clark …, 29 Kent …
8          8 Ames …
9          20 Lord …, 9 Cowell …, 42 Hart …
10         32 Bundy …, 31 Madoff …

Assume that a set of Customer records has been stored using the hashing method, where the storage location is determined by the remainder from dividing the customer ID by 11 (the number of buckets). Each bucket has enough room (slots) for 3 customer records. If a bucket is full and a new record that belongs in that bucket is added, the record is placed in the next (higher) available bucket. If the last bucket is full, we wrap around to the first bucket (bucket 0) and store the record in the first available space.
1. Which bucket(s) would be accessed to retrieve data for customer IDs 38, 27, and 49, and which of these records would be found?
2. In which bucket would each of the following records be stored if they were added to this hashed structure in the order shown: 36, 3, 10, 21?
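The exercise can be checked with a short Python simulation, assuming the bucket contents listed for this CUSTOMER structure and the overflow rule described (next higher bucket, wrapping from bucket 10 to bucket 0). One rule is an assumption not stated on the slide: a search stops at the first bucket that still has a free slot, since with no deletions a record would never have been pushed past free space. The helpers `home_bucket`, `probe_sequence`, `lookup`, and `insert` are hypothetical names.

```python
NUM_BUCKETS = 11
SLOTS_PER_BUCKET = 3

# Customer IDs per bucket, copied from the CUSTOMER hash storage structure.
buckets = [
    [33, 11], [23, 12, 1], [35, 2], [14, 25], [4, 15], [16, 5, 38],
    [39, 27, 28], [18, 29], [8], [20, 9, 42], [32, 31],
]

def home_bucket(cust_id):
    return cust_id % NUM_BUCKETS  # remainder after dividing by 11

def probe_sequence(cust_id):
    """Home bucket, then each next higher bucket, wrapping from 10 back to 0."""
    start = home_bucket(cust_id)
    return [(start + i) % NUM_BUCKETS for i in range(NUM_BUCKETS)]

def lookup(cust_id):
    """Return (buckets accessed, found?). Assumes no deletions have occurred."""
    accessed = []
    for b in probe_sequence(cust_id):
        accessed.append(b)
        if cust_id in buckets[b]:
            return accessed, True
        if len(buckets[b]) < SLOTS_PER_BUCKET:  # free slot: record cannot be further on
            return accessed, False
    return accessed, False

def insert(cust_id):
    """Store the record in the first bucket along the probe sequence with room."""
    for b in probe_sequence(cust_id):
        if len(buckets[b]) < SLOTS_PER_BUCKET:
            buckets[b].append(cust_id)
            return b
    raise RuntimeError("hash file is full")

for cid in (38, 27, 49):
    print(cid, lookup(cid))  # 38 ([5], True), 27 ([5, 6], True), 49 ([5, 6, 7], False)
for cid in (36, 3, 10, 21):
    print(cid, "stored in bucket", insert(cid))  # buckets 3, 4, 10, 0
```

Note that 3 and 21 overflow only because earlier inserts in the same run (36 and 10) filled their home buckets, which is why the order of insertion matters.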
Figure 6-8 Join indexes: speed up join operations
(a) Join index for common non-key columns
(b) Join index for matching foreign key (FK) and primary key (PK)
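Panel (b) can be sketched in Python: a join index is a precomputed list of row-identifier pairs whose FK and PK values match, so a later join reads the pairs instead of re-matching keys. The CUSTOMER and ORDER rows, the use of list positions as RowIDs, and the helper `build_join_index` are hypothetical; a real DBMS would also maintain the index as rows change. Panel (a) works the same way, with a shared non-key column in place of FK and PK.

```python
# Hypothetical CUSTOMER (PK: cust_id) and ORDER (FK: cust_id) tables;
# list positions act as row identifiers (RowIDs).
customers = [(1, "Jones"), (2, "Smith"), (3, "Zale")]
orders = [(500, 2), (501, 1), (502, 2), (503, 3)]  # (order_id, cust_id)

def build_join_index(customers, orders):
    """Precompute (customer RowID, order RowID) pairs where FK matches PK."""
    pk_to_rowid = {cust_id: rid for rid, (cust_id, _) in enumerate(customers)}
    return [
        (pk_to_rowid[cust_id], order_rid)
        for order_rid, (_, cust_id) in enumerate(orders)
        if cust_id in pk_to_rowid
    ]

join_index = build_join_index(customers, orders)

# At query time the matching rows come straight from the index.
joined = [(customers[c][1], orders[o][0]) for c, o in join_index]
print(join_index)  # [(1, 0), (0, 1), (1, 2), (2, 3)]
print(joined)      # [('Smith', 500), ('Jones', 501), ('Smith', 502), ('Zale', 503)]
```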
Cluster Storage
- Store records from two (or more) tables together on the same physical record
- E.g., I may almost always retrieve the set of ITEM_SOLD records associated with a SALE every time I retrieve a SALE record
- If so, I will store them as a cluster to speed up retrieval
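A rough Python sketch of why clustering helps, assuming a hypothetical page capacity of 4 records and ITEM_SOLD rows that arrive interleaved across sales. It counts how many pages must be read to fetch one SALE with its ITEM_SOLD records when the tables are stored separately versus clustered by sale. The exact counts depend entirely on these assumed sizes and arrival order; `paginate` and `pages_touched` are illustrative helpers.

```python
from collections import defaultdict

PAGE_CAPACITY = 4  # hypothetical: records per page

def paginate(records):
    """Split a record sequence into fixed-size pages."""
    return [records[i:i + PAGE_CAPACITY] for i in range(0, len(records), PAGE_CAPACITY)]

def pages_touched(pages, wanted):
    """Count pages holding at least one record the query needs."""
    return sum(1 for page in pages if any(r in wanted for r in page))

sales = [("SALE", s) for s in range(1, 5)]
# Three ITEM_SOLD rows per sale, arriving interleaved across sales.
items = [("ITEM_SOLD", s, n) for n in range(1, 4) for s in range(1, 5)]

# Unclustered: each table stored separately, items in arrival order.
unclustered = paginate(sales) + paginate(items)

# Clustered: each SALE record stored next to its own ITEM_SOLD records.
by_sale = defaultdict(list)
for item in items:
    by_sale[item[1]].append(item)
clustered = paginate([rec for s in sales for rec in [s] + by_sale[s[1]]])

wanted = {("SALE", 2)} | set(by_sale[2])
print(pages_touched(unclustered, wanted), pages_touched(clustered, wanted))  # 4 1
```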
Clustering Files
- In some relational DBMSs, related records from different tables can be stored together in the same disk area
- Useful for improving performance of join operations
- Primary key records of the main table are stored adjacent to associated foreign key records of the dependent table
- E.g., Oracle has a CREATE CLUSTER command
Rules for Using Indexes
1. Use indexes on larger tables
2. Index the primary key of each table
3. Index search fields (fields frequently used in WHERE clauses); consider foreign keys as well
4. Index fields used in SQL ORDER BY and GROUP BY clauses
5. Index a field when it has more than 100 distinct values, but not when it has fewer than 30 values
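Rules 3 and 5 can be observed with Python's built-in `sqlite3` module: a query filtering on a field in its WHERE clause is planned as a full table scan until an index exists on that field. The `customer` table, its 10,000 rows, and the index name `idx_customer_city` are hypothetical; the `city` column is given 200 distinct values so it sits above the >100 threshold in rule 5. The wording of SQLite's plan text varies between versions, so only the presence of the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(i, f"Customer {i}", f"City {i % 200}") for i in range(1, 10_001)],  # 200 cities
)

def plan(sql, params=()):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT * FROM customer WHERE city = ?"  # city is a search field
before = plan(query, ("City 7",))
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after = plan(query, ("City 7",))
print(before)  # a full scan of customer
print(after)   # a search using idx_customer_city
```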