File Organizations and Indexes ISYS 464

Disk Devices
Disk drive: read/write head and access arm.
Single-sided, double-sided, disk pack.
Track, sector, cylinder (the tracks with the same diameter on the various disks).
Page, block, or physical record: the unit of transfer between disk and primary storage, and vice versa.
Blocking factor: the number of records in a block.

Disk Speed
RPM: revolutions per minute
–2400, 3600, 7200 rpm
Ex: at 2400 rpm, each revolution takes 1/2400 min.
–60*1000/2400 = 25 msec per revolution

Time Required to Read One Block
Seek time
Rotational delay
–On average, half a revolution
Block transfer time
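
The three components above can be added up in a few lines. This is a minimal sketch, not part of the original slides: the function names and the sample seek and transfer times are assumptions chosen for illustration, and the rotational delay is taken as half a revolution.

```python
def ms_per_revolution(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60 * 1000 / rpm


def block_read_time_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Seek time + average rotational delay (half a revolution) + transfer time."""
    rotational_delay_ms = ms_per_revolution(rpm) / 2
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive: 10 ms seek, 2400 rpm, 1 ms block transfer.
print(ms_per_revolution(2400))          # 25.0
print(block_read_time_ms(10, 2400, 1))  # 23.5
```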

Example
A student file contains 20,000 records of 113 bytes each. If each block is 512 bytes, how many blocks are needed?
–Blocking factor = floor(block size/record size) = floor(512/113) = 4
–Number of blocks = ceiling(number of records/blocking factor) = ceiling(20,000/4) = 5,000 blocks
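
The same arithmetic can be checked with a short sketch. The function names are illustrative only, and the sketch assumes records are not split across blocks.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """Whole records that fit in one block (records do not span blocks)."""
    return block_size // record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """Blocks required to hold all records at the given blocking factor."""
    return math.ceil(num_records / bfr)


bfr = blocking_factor(512, 113)
print(bfr)                         # 4
print(blocks_needed(20_000, bfr))  # 5000
```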

Linear Search, Binary Search, and Direct Access
Assume seek = s, rotational delay = r, block transfer time = tr, and the file size is 5000 blocks.
Linear search: the average time is s + r + tr*(half of the blocks) = s + r + 2500*tr
Binary search: if the file is ordered by a key field, the time is (s + r + tr) * log2(5000), about 13 block accesses
Direct access: if an index is available to enable direct access, the time is s + r + tr

Linear Search and Binary Search
Assume seek = s, rotational delay = r, block transfer time = tr, and the file size is 5000 blocks.
Linear search: the average time is s + r + tr*(half of the blocks) = s + r + 2500*tr
Binary search: if the file is ordered by a key field:
–Number of blocks accessed given n blocks: log2(n)
–Time = (s + r + tr) * log2(5000)
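
To compare the search strategies numerically, the formulas can be written as small functions. This is a sketch under stated assumptions: the values s = 10 ms, r = 12.5 ms and tr = 1 ms are hypothetical, and the binary search count is rounded up to whole block accesses.

```python
import math


def linear_search_ms(s: float, r: float, tr: float, n_blocks: int) -> float:
    """One seek and rotational delay, then on average half the blocks are read."""
    return s + r + tr * (n_blocks / 2)


def binary_search_ms(s: float, r: float, tr: float, n_blocks: int) -> float:
    """Each probe pays seek + rotational delay + transfer; about log2(n) probes."""
    return (s + r + tr) * math.ceil(math.log2(n_blocks))


def direct_access_ms(s: float, r: float, tr: float) -> float:
    """A single block read."""
    return s + r + tr


s, r, tr = 10.0, 12.5, 1.0  # hypothetical timings in milliseconds
print(linear_search_ms(s, r, tr, 5000))  # 2522.5
print(binary_search_ms(s, r, tr, 5000))  # 305.5
print(direct_access_ms(s, r, tr))        # 23.5
```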

Updating a Record Read the block into main memory. Change the record in main memory. Write the block back to disk.

File Organizations
Technique for physically arranging the records of a file on secondary storage
Factors for selecting a file organization:
–Fast data retrieval and throughput
–Efficient storage space utilization
–Protection from failure and data loss
–Minimizing need for reorganization
–Accommodating growth
–Security from unauthorized use
Types of file organizations:
–Sequential
–Indexed
–Hashed

Access Method The steps involved in storing and retrieving records from a file. –Searching and updating

Unordered Files (Heap Files)
Records are placed in the file in the same order as they are inserted.
Searching: must do a linear search if an index is not available.
Updating:
–Insertion: Read the last page, append to the last page, then write the page back.
–Modification: Search and read the block into main memory. Write the block back after making changes.
–Deletion: Mark the record for deletion (deletion flag) and periodically reorganize the file.

Ordered Files
Enable binary search.
Insertion: May need a temporary overflow file, which is periodically merged with the ordered file.
Deletion: May need periodic reorganization.

Hash Files (Direct Files)
The page in which a record is stored is determined by a hash function.
The hash function calculates the address of the page from the key field of the file:
–Address = H(Key)
Typical hash function: division/remainder:
–0 <= Key Mod M <= M-1
–Where M is the number of blocks allocated to this file.

Disk blocks and block addresses: H(K) -> block number. Block address: the physical address of the block.

Hash File Example
8 blocks, each block holds 2 records
Hash function: Key Mod 8
Record keys:
–Key = 1821, Key Mod 8 = 5
–7115, 3
–2428, 4
–4750, 6
–1620, 4
–4692, 4

Collision Resolution Collision: When a record’s home block is full. Open addressing (linear probing): Place the record in the first available block.

Searching a Hash File
Home block = H(SearchKey)
If found in the home block, the search is successful.
Else
–Search the next block until the record is found or a block with empty space is reached

Hash File Performance
Average Search Length = (total # of blocks accessed to find all records)/(number of records in the file)
Using the previous example:
–(1+1+1+1+1+2)/6 = 7/6
Time needed to find a record in this file:
–(s + r + tr) * 7/6
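
The hash file example and its average search length can be reproduced with a small simulation. This is a sketch rather than the course's own code: the function names are illustrative, keys are inserted in the order listed in the example, and wrapping from the last block back to block 0 is an assumption the slides do not state.

```python
def build_hash_file(keys, num_blocks=8, capacity=2):
    """Place each key in its home block, probing forward when that block is full."""
    blocks = [[] for _ in range(num_blocks)]
    for key in keys:
        home = key % num_blocks
        for step in range(num_blocks):
            b = (home + step) % num_blocks  # assumption: wrap around at the end
            if len(blocks[b]) < capacity:
                blocks[b].append(key)
                break
        else:
            raise RuntimeError("hash file is full")
    return blocks


def search_length(blocks, key, capacity=2):
    """Return the number of blocks read to find key, or None if it is absent."""
    num_blocks = len(blocks)
    home = key % num_blocks
    for step in range(num_blocks):
        b = (home + step) % num_blocks
        if key in blocks[b]:
            return step + 1
        if len(blocks[b]) < capacity:  # a block with empty space ends the search
            return None
    return None


keys = [1821, 7115, 2428, 4750, 1620, 4692]
blocks = build_hash_file(keys)
total = sum(search_length(blocks, k) for k in keys)
print(blocks[4], blocks[5])   # [2428, 1620] [1821, 4692]
print(total, "/", len(keys))  # 7 / 6
```

Key 4692 hashes to block 4, which is already full, so it lands in block 5 and costs two block reads; every other key is found in its home block.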

Factors Affecting Hash File Performance
The hash function should spread the records evenly over the disk space.
Use a low load factor:
–Load factor = (# of records)/(# of available spaces)
Allow each block to hold more records.

Limitations of Hash File
Cannot be accessed in any other order:
–Direct access only
Fixed amount of space allocated to the file:
–Static hashing
–Wastes space, hard to grow
Inappropriate for retrievals based on ranges of values:
–Find EmpID = 123 (supported)
–Find EmpID > 123 (not supported)

Index A data structure that allows the DBMS to locate particular records in a file more quickly. Index file: –IndexField + RecordPointer –Ordered according to the indexing field

Types of Index Primary index: Index on the primary key field. Secondary index: Index on a non-key field.

Index on Ordering Key Field
Figure: a data file ordered by SID (records S05, S07, S10, S12, S15, S20, S25, S27, S30) and an index of (SID, block ptr) entries S05, S12, S25.
Note: The number of index entries equals the number of file blocks.
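
A lookup through an index with one entry per block can be sketched with a binary search over the index keys. The layout below is an assumption for illustration: three blocks of three records each, with each index entry holding the first SID of its block, and list positions standing in for block pointers.

```python
from bisect import bisect_right

# Assumed layout: the data file is ordered by SID, three records per block.
data_blocks = [
    ["S05", "S07", "S10"],
    ["S12", "S15", "S20"],
    ["S25", "S27", "S30"],
]
# One index entry per block: the first SID stored in that block.
index_keys = [block[0] for block in data_blocks]


def find_block(sid):
    """Return the block number holding sid, or None if sid is not in the file."""
    pos = bisect_right(index_keys, sid) - 1  # last index entry <= sid
    if pos < 0:
        return None
    return pos if sid in data_blocks[pos] else None


print(find_block("S15"))  # 1
print(find_block("S11"))  # None
```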

Index on NonOrdering Key Field
Figure: a data file not ordered by SID (records S12, S25, S47, S20, S22, S05, S30, S33, S27) and an index of (SID, record ptr) entries in SID order (S05, S12, S20, S22, ...).
Note: The number of index entries equals the number of records.

Index on Ordering NonKey Field (Cluster Index)
Figure: a data file of (SID, Major) records ordered by Major (ACCT, CIS, FIN) and an index of (Major, block ptr) entries ACCT, CIS, FIN.

Index on NonOrdering NonKey Field
Figure: a data file of (SID, Major) records not ordered by Major and an index of (Major, record ptr) entries in Major order (ACCT, CIS, FIN, MKT).

Physical Pointer vs Logical Pointer
When an index on the key field is available, an index on a nonkey field can use record keys as logical pointers.
Figure: the index on Major stores (Major, SID) entries (for example S12, S22, S25, S05, S27, S47) instead of record pointers.
SID is a logical pointer: the location of S12 can be found by searching the primary index.

Searching with Index
A file with 30,000 records, each record has 100 bytes, block size is 1024 bytes:
–Data file blocking factor = floor(1024/100) = 10
–Data file blocks = ceiling(30,000/10) = 3000 blocks
If the key field has 9 bytes and a physical pointer has 6 bytes, each index entry has 15 bytes:
–Index file blocking factor = floor(1024/15) = 68
–Index file blocks = ceiling(30,000/68) = 442 blocks
Time to search for a record with the index:
–Binary search of the index = log2(442), about 9 block accesses
–One data file access
–Time = (s + r + tr) * (1 + log2(442))
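
The index sizing and search cost can be recomputed step by step. This is a sketch only: the function name is illustrative, the index is assumed to hold one entry per record as in the example, and the logarithm is rounded up to whole block accesses.

```python
import math


def index_search_cost(num_records, record_size, block_size, key_size, ptr_size):
    """Return (data blocks, index blocks, block accesses for an indexed lookup)."""
    data_bfr = block_size // record_size
    data_blocks = math.ceil(num_records / data_bfr)
    index_bfr = block_size // (key_size + ptr_size)
    index_blocks = math.ceil(num_records / index_bfr)  # one entry per record
    # Binary search over the index blocks, plus one access to the data file.
    accesses = math.ceil(math.log2(index_blocks)) + 1
    return data_blocks, index_blocks, accesses


data_blocks, index_blocks, accesses = index_search_cost(30_000, 100, 1024, 9, 6)
print(data_blocks, index_blocks, accesses)  # 3000 442 10
print(math.ceil(math.log2(data_blocks)))    # 12 accesses for binary search on the data file
```

Under these assumptions the indexed lookup needs about 10 block accesses, compared with about 12 for a binary search directly on the 3000-block data file.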

Tree
Nodes:
–Regular nodes (internal nodes): nodes with a parent and children
–Root node: node with no parent
–Leaf nodes: nodes with no children
Level: length of the path from the root to a node.
–Root: level 0
Balanced tree: All leaf nodes are at the same level.

B-Trees
If a node can store n pointers (n-1 keys), then each node except the root and leaf nodes has at least ceiling(n/2) pointers.
Each key in the tree represents (key + RecordPointer).
All leaf nodes are at the same level.
When a node splits, it splits into two nodes at the same level, and the middle key is moved up to the parent node.

B-Tree Examples
A B-Tree with 3 pointers (2 keys) in a node, insert keys: 8, 5, 1, 7, 3, 12, 9, 6, 4
A B-Tree with 4 pointers (3 keys) in a node, insert keys: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 100, 95

B+ Trees
Record pointers are stored only at the leaf nodes.
–More keys in a node, shorter path
Every key must exist at the leaf nodes.
Every leaf node contains a pointer to the next leaf node.
Node split:
–Leaf node split: keep the middle key in the left node and duplicate it in the parent node.
–Internal node split: move the middle key up to the parent, as in a B-Tree.

B+ Tree Examples
A B+ Tree with 3 pointers (2 keys) in a node, insert keys: 8, 5, 1, 7, 3, 12, 9, 6
A B+ Tree with 4 pointers (3 keys) in a node, insert keys: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 100, 95

B+ Tree Advantages
Shorter tree: Because internal nodes do not include record pointers, internal nodes can have more keys.
All keys in the leaf nodes are already in sorted order.
A B+ Tree can be used to store the data file.

Figure 6-8 Bitmap index organization
Bitmap saves on space requirements
Rows - possible values of the attribute
Columns - table rows
Bit indicates whether the attribute of a row has the value
The bitmap index is used where the values of a field repeat very frequently; it is not used for a primary key index.
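
The bitmap layout described above (one bitmap per attribute value, one bit per table row) can be sketched with plain lists. The column of Major values and the function name are hypothetical, and a real DBMS would store compressed bit vectors rather than Python lists.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bitmap with one bit per table row."""
    index = {}
    for row, value in enumerate(values):
        if value not in index:
            index[value] = [0] * len(values)
        index[value][row] = 1
    return index


# Hypothetical Major column of a six-row student table.
majors = ["ACCT", "CIS", "FIN", "CIS", "ACCT", "CIS"]
index = build_bitmap_index(majors)
print(index["CIS"])  # [0, 1, 0, 1, 0, 1]

# Rows where Major is CIS or FIN: bitwise OR of the two bitmaps.
cis_or_fin = [a | b for a, b in zip(index["CIS"], index["FIN"])]
rows = [row for row, bit in enumerate(cis_or_fin) if bit]
print(rows)          # [1, 2, 3, 5]
```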

Too many indexes will slow down update operations.

Rules for Using Indexes
1. Use on larger tables
2. Index the primary key of each table
3. Index search fields (fields frequently in WHERE clause)
4. Fields in SQL ORDER BY and GROUP BY commands
5. When there are >100 values but not when there are <30 values

Rules for Using Indexes (cont.)
6. Avoid use of indexes for fields with long values; perhaps compress values first
7. DBMS may have a limit on the number of indexes per table and the number of bytes per indexed field(s)
8. Null values will not be referenced from an index
9. Use indexes heavily for non-volatile databases; limit the use of indexes for volatile databases
Why? Because modifications (e.g. inserts, deletes) require updates to occur in index files

Redundant Arrays of Inexpensive (Independent) Disks RAID is a method to group more than one drive and make them appear as a single drive.

RAID 0
Creating a stripe set without parity: spreads the data out over several disks
No redundancy
Best write performance: disks can be accessed in parallel
Unreliable

Disk 0  Disk 1  Disk 2  Disk 3
1A      2A      3A      4A
1B      2B      3B      4B
1C      2C      3C      4C

Figure 6-10 RAID with four disks and striping Here, pages 1-4 can be read/written simultaneously

RAID 1
Mirror set
–Primary disk and mirror disk
–2 writes
–Data can be accessed from either disk.
–Fault tolerance

RAID 5
Creating a stripe set with parity

Disk 0   Disk 1   Disk 2   Disk 3
ParityA  1A       2A       3A
1B       ParityB  2B       3B
1C       2C       ParityC  3C
1D       2D       3D       ParityD

Exclusive OR, XOR

Condition 1  Condition 2  Condition 1 XOR Condition 2
T            T            F
T            F            T
F            T            T
F            F            F

Creating Parity with XOR

Disk 0   Disk 1  Disk 2  Disk 3
ParityA  1A      2A      3A

1A = 1010, 2A = 0100, 3A = 1100
ParityA = (1A XOR 2A) XOR 3A = 0010
If Disk 0 fails: recover by using ParityA = (1A XOR 2A) XOR 3A
If Disk 1 fails: recover by using 1A = (ParityA XOR 2A) XOR 3A
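
The parity calculation and the recovery of a lost block can be verified with integer XOR. This is a minimal sketch using the 4-bit values from the slide; the variable and function names are illustrative.

```python
from functools import reduce
from operator import xor


def xor_all(blocks):
    """XOR a sequence of equally sized blocks, represented here as integers."""
    return reduce(xor, blocks)


block_1a, block_2a, block_3a = 0b1010, 0b0100, 0b1100

# The parity block is the XOR of the data blocks in the stripe.
parity_a = xor_all([block_1a, block_2a, block_3a])
print(format(parity_a, "04b"))   # 0010

# If the disk holding 1A fails, XOR the parity with the surviving blocks.
recovered = xor_all([parity_a, block_2a, block_3a])
print(format(recovered, "04b"))  # 1010
```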