Oct 29, 2001 CSE 373, Autumn
External Storage
For large data sets, the computer will have to access the disk. A disk access can take 200,000 times longer than a machine instruction. The RAM model does not account for disk I/O.
memory: 128 MB, fast, expensive
disk: 60 GB, slow, cheap

Disks, continued
The difference between memory speed and disk speed is increasing.
Example: State of Florida driving records (256 bytes each), 10,000,000 items, 6 disk accesses per second on a time-sharing system.
Unbalanced binary search tree: possibly 10,000,000 accesses.
BST: on average 32 accesses (5 sec.).
AVL: worst case 1.44 log n; typical case log n, 25 accesses (4 sec.).
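
The access counts on this slide can be checked with a few lines of arithmetic. The sketch below is only a sanity check: N and the 6 accesses per second come from the slide, while the factor 1.38 for the average depth of a randomly built BST is a standard estimate assumed here because it reproduces the slide's figure of 32.

```python
import math

N = 10_000_000           # number of records (from the slide)
ACCESSES_PER_SEC = 6     # disk accesses per second (from the slide)

def seconds(accesses):
    """Time needed for the given number of disk accesses."""
    return accesses / ACCESSES_PER_SEC

log_n = math.log2(N)          # roughly 23.25
bst_avg = 1.38 * log_n        # assumed estimate for a random BST
avl_worst = 1.44 * log_n      # AVL worst-case bound quoted on the slide

cases = [
    ("degenerate BST", N),
    ("BST average", bst_avg),
    ("AVL worst case", avl_worst),
    ("AVL typical (slide)", 25),
]
for name, accesses in cases:
    print(f"{name}: {accesses:,.0f} accesses, {seconds(accesses):,.1f} s")
```

Even the balanced trees cost several seconds per lookup, and the degenerate tree would need on the order of weeks, which is why the next slides try to cut the number of accesses rather than the work per access.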

Disk accesses
Goal: reduce the number of disk accesses. We are willing to do more complicated computations in memory in order to save disk time.
Idea: increase the branching of the tree so that the height is decreased.
Definition: An M-ary search tree allows up to M children per node.
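
To see how much the height shrinks as the branching grows, here is a minimal sketch. It assumes an idealized tree in which every node has exactly M children and the data sits at the leaves; `min_height` is a hypothetical helper name, and the branching factors tried are arbitrary.

```python
def min_height(n, m):
    """Smallest h such that m**h >= n, i.e. the height of a full
    m-ary tree that has room for n items at its leaves."""
    height, capacity = 0, 1
    while capacity < n:
        capacity *= m
        height += 1
    return height

N = 10_000_000  # record count from the Florida example
for m in (2, 16, 128, 256):
    print(f"M = {m:3d}: height {min_height(N, m)}")
```

Each level costs one disk access, so going from M = 2 to M = 256 turns 24 accesses into 3 in this idealized setting.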

B-Trees
1. All the data items are stored at the leaves.
2. The non-leaf nodes store up to M-1 keys. The ith key represents the smallest key in subtree i+1.
3. The root is either a leaf or has between 2 and M children.
4. All non-leaf nodes (except the root) have between ⌈M/2⌉ and M children.
5. All leaves are at the same depth and have between ⌈L/2⌉ and L data items.
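
The key convention in rule 2 determines how a search descends the tree. The following is a minimal in-memory sketch of a lookup only (no insertion or splitting); the class names `Leaf` and `Internal`, the function `find`, and the tiny example tree with M = 3 and L = 3 are all illustrative assumptions, not something taken from the slides.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, items):
        self.items = sorted(items)   # up to L data items (keys only here)

class Internal:
    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1
        self.keys = keys             # keys[i] = smallest key in children[i + 1]
        self.children = children

def find(node, key):
    """Return (found, nodes_visited). Each visited node stands in
    for one disk access."""
    visited = 1
    while isinstance(node, Internal):
        # bisect_right counts the keys <= key, which is exactly the
        # index of the child whose range contains key.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return key in node.items, visited

# Example tree with M = 3, L = 3: a root with three leaf children.
root = Internal([10, 20],
                [Leaf([2, 5, 7]), Leaf([10, 14]), Leaf([20, 25, 30])])
print(find(root, 14))   # found in the middle leaf
print(find(root, 15))   # not present, same number of nodes visited
```

Because all leaves are at the same depth, a successful and an unsuccessful search visit the same number of nodes.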

B-Trees: Choices
Choose M and L based on the size of the keys K and on the size of the record R. Suppose a disk block is of size B (bytes).
Choose M so that a non-leaf node fits into one block: B ≥ (M-1) · K + M · 4
Choose L so that a leaf node fits into one block: B ≥ L · R
Accesses: log_2 N vs. log_{M/2} N
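
As a worked instance of these two inequalities, the sketch below solves them for the largest M and L. The record size of 256 bytes comes from the Florida example; the block size of 8192 bytes and key size of 32 bytes are assumed values chosen only for illustration, and `choose_m` and `choose_l` are hypothetical helper names.

```python
import math

def choose_m(block_size, key_size, pointer_size=4):
    """Largest M with (M - 1) * K + M * pointer_size <= B.
    Rearranged: M <= (B + K) / (K + pointer_size)."""
    return (block_size + key_size) // (key_size + pointer_size)

def choose_l(block_size, record_size):
    """Largest L with L * R <= B."""
    return block_size // record_size

B, K, R = 8192, 32, 256      # B and K are assumptions; R is from the slides
N = 10_000_000

M = choose_m(B, K)
L = choose_l(B, R)
print(f"M = {M}, L = {L}")
print(f"log_2 N       = {math.log2(N):.1f}")
print(f"log_(M/2) N   = {math.log(N, M // 2):.1f}")
```

With these numbers a non-leaf node holds a couple of hundred children, so the worst-case logarithm drops from about 23 levels to between 3 and 4.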

Hash Tables
Constant time accesses!
A hash table is an array of some fixed size; the size is usually a prime number.
General idea: a hash function h(K) maps the key space (e.g., strings) to the cells 0 … TableSize - 1 of the hash table.

Desirable Properties
We want a hash function to:
1. be simple/fast to compute,
2. map different keys to different cells (impossible: why?),
3. have keys distributed evenly among cells.
Idea: If #1 and #3 are true and the hash table is not very full, then it should be fast to do a find.

Example
key space = integers
h(K) = K mod
We lose all ordering information: findMin, findMax, inorder traversal, printing items in sorted order.
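
The loss of ordering is easy to see on a tiny table. In the sketch below the modulus 10 is an assumption (the slide's value did not survive), and the five keys are made up and picked so that none of them collide.

```python
TABLE_SIZE = 10   # assumed modulus, not taken from the slide

def h(k):
    """Hash an integer key to a cell index in 0 .. TABLE_SIZE - 1."""
    return k % TABLE_SIZE

keys = [7, 18, 41, 34, 100]          # made-up keys with distinct cells
table = [None] * TABLE_SIZE
for k in keys:
    table[h(k)] = k

# Reading the cells left to right does not give sorted order.
stored = [k for k in table if k is not None]
print(stored)

# findMin has no shortcut: every cell must be examined.
print(min(stored))
```

A find for one key touches a single cell, but anything that depends on order, such as findMin or a sorted printout, has to scan or sort the whole table.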

Example 2
key space = strings
s = s_0 s_1 s_2 … s_{k-1}
h(s) = s_0 mod TableSize (BAD HASH FUNCTION)
h(s) = mod TableSize (BETTER HASH FUNCTION)
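
To illustrate why using only s_0 is a bad idea, the sketch below compares it with a function that uses every character. The second function is an assumption: it is a common polynomial hash (characters treated as digits in base 128, evaluated with Horner's rule), not necessarily the formula that was on the slide. The function names, the table size 101, and the word list are likewise made up.

```python
def hash_first_char(s, table_size):
    """The bad function: only the first character matters.
    Assumes s is non-empty."""
    return ord(s[0]) % table_size

def hash_polynomial(s, table_size, base=128):
    """Assumed 'better' function: every character contributes.
    The running value is reduced at each step so it stays small."""
    value = 0
    for ch in s:
        value = (value * base + ord(ch)) % table_size
    return value

TABLE_SIZE = 101
words = ["stack", "stark", "start", "state", "string"]

for w in words:
    print(w, hash_first_char(w, TABLE_SIZE), hash_polynomial(w, TABLE_SIZE))
```

Every word that begins with the same letter lands in the same cell under the first function, so at most as many cells are ever used as there are distinct first characters, no matter how large the table is.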

Collision Resolution
Separate chaining: All keys that map to the same hash value are kept in a list.
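
A minimal sketch of separate chaining follows. It assumes Python lists as the chains and Python's built-in `hash` as the hash function; the class name `ChainedHashTable`, its methods, and the default table size of 11 are illustrative choices, not part of the slides.

```python
class ChainedHashTable:
    """Hash table in which each cell holds a list (chain) of
    (key, value) pairs whose keys hash to that cell."""

    def __init__(self, table_size=11):
        self.buckets = [[] for _ in range(table_size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def find(self, key):
        """Return the stored value, or None if the key is absent."""
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

t = ChainedHashTable()
for key in (1, 12, 23):               # all three equal 1 mod 11
    t.insert(key, f"record {key}")
print(t.find(12), t.find(34))
```

A find only walks the one chain selected by the hash value, so it stays fast as long as the chains stay short, which is what an evenly distributing hash function and a table that is not very full provide.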