Indexing
Physical Disk Structure
Disk Example Four platters providing eight surfaces 213 = 8192 tracks per surface 28 = 256 sectors per track 29 = 512 bytes per sector Sector is physical unit while block is logical unit dependent on the DBMS Typically a block is as large as a sector
Disk Access Characteristics To read a block Head has to move to the track containing the block (seek time) The block has rotate under the head (rotational latency) Transfer time (negligible)
I/O model of computation If a block needs to be moved between disk and main memory then the time taken to perform the read or write is much larger than the time likely to be used manipulating the data in main memory. Thus number of blocks accessed is a good approximation of the time needed by the algorithm and should be minimized Minimize seek time + rotational latency
Sorting data in Secondary Storage Suppose a relation consists of 10,000,000 records Want to sort this relation on a “key” 100 bytes per record. Approx 1 gigabyte Assume 50 MB of main memory available Disk block is 4096 bytes. Relation takes 250,000 blocks. 12,800 blocks can fit in memory.
Sorting…cont If data fits in main memory, the fastest algorithms for sorting are variants of Quicksort. Preferred method is to minimize the number of times a block is brought into main memory
Merge-Sort example Step List 1 List 2 Output Start 1,3,4,9 2,5,7,8 none 1) 3,4,9 1 2) 5,7,8 1,2 3) 4,9 1,2,3 4) 9 1,2,3,4 5) 7,8 1,2,3,4,5 6) 8 1,2,3,4,5,7 7) 1,2,3,4,5,7,8 8) 1,2,3,4,5,7,8,9
Two-Phase Multiway Merge Sort Phase 1: Sort main-memory sized pieces of data and store them as sorted lists. Fill all memory with blocks from orig. list Time:read and write 250,000 blocks; 15 millisecond per block; 7500 seconds, 125 min Phase 2: Merge all the sorted lists into a single sorted list.
Phase 2 If we use the main-memory merge then we would have to read the data 2log(n) times where n is the number of sorted lists. The common strategy is Bring the first block of each sorted list into memory and have one output buffer Find smallest key among remaining keys in all lists Move the smallest element to the first available position in the buffer If output block is full write buffer to disk and reinitialize to empty If the block from where the smallest element was taken is full, then bring in the next block from the same sorted list into the buffer Cost of phase 2 is again is 125 min; Total cost is 250 min
Motivation SQL is declarative How should the following queries be processed: SELECT * from R SELECT * from R where R.A = ’10’
Index Function Block Holding records IIndex value Matching Records
Book Analogy Just remember the book index Index is a set of pages (a separate file) with pointers (page numbers) to the data page which contains the value Also note difference between “search key” and “primary key”
Types of Indexes Simple indexes on sorted files Secondary indexes on unsorted files B-trees on any type of file
Sequential File A file sorted on the attribute(s) of the index. Very useful when the search attribute is the primary key. Build a dense index on the file. Called dense because every key from the data file is represented in the index. Note the index only contains the key and pointer of the data file and thus is usually much smaller than the data file.
Example Suppose a relation has 1,000,000 records A block is of size 4096 bytes. 10 records fit in one block. Thus size of data is > 400 MB An index will have a 12 byte representation for each record. Thus will fit 100 index entries in a block. Index will fit in 10000 blocks (40 MB). Log2(10000) ~= 14 (Thus 14 I/O’s for lookup)
Sparse Index Instead of one index record per data record, use one index record per block of data record. This is called a sparse index. Suppose query is “Is there a record with key value K”. Just check in dense index For sparse index a data block has to be retrieved
Secondary Indexes When you go to a library, books are sorted by Call Number. The call numbers ranges on the shelves are like a sparse index. Now what if you want to search by “last name” and not call number. You build a secondary structure on the books. Secondary structure does not determine or influence the place of data records. You can have several secondary structures on one file.
Example SELECT book.title FROM books WHERE books.lastname=“Codd” Create secondary index by SQL statement CREATE INDEX LNIndex on Books (lastname)’
Question? Can a secondary index be sparse?
B-Trees B-Trees are the most commonly used indexing structure in commercial systems. Several variants are available, but the most popular is called the B+ tree. Roughly speaking Ord Array (search: O(Log(n)), update:O(n)) Linked List (search:O(n), update: O(1)) Tree (search: Log(n), update:Log(n))
Structure of B-Tree Organizes its blocks into a tree Tree is Balanced: all paths from leaf to root have the same length There are three types of nodes Root, Interior Leaf Associated with each tree is a layout parameter n (n search keys, n+1 pointers)
Example Interior Node Leaf Node 20 31 52 57 81 95 Next leaf in sequence To keys K>=52 To record with key 57 To keys K < 20 To keys 31 <= K < 52 To record with key 81 To record with key 95 To keys 20 <= K < 31 Suppose block size is 4096 bytes. Each key is 4 bytes and pointer is 8 bytes. Want 4*n + 8*(n+1) <= 4096; n = 340
Example 13 7 23 31 43 2 3 5 11 17 19 29 37 41 47
17 7 - 37 43 2 3 5 7 13 13 17 23 23 23 23 37 41 43 47
Efficiency of B-Trees Earlier we saw up to 340 key-pointer pairs. Assume average block has half-occupancy =~ 255 1 root block, 255 child nodes and 255*255 leaf nodes. In the leaf we will have 2553 = 16.6 million records Thus upto 4 I/O to access any record!
B-Tree Animation Go to http://slady.cz/java/bt/ to see B-Tree animation.