M.P. Johnson, DBMS, Stern/NYU, Sp2004

Database Management Systems
Lecture #25
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004
Agenda
- Previously: hardware & sorting
- Next: indices, failover/recovery, data warehousing & mining, web search
- Hw3 due Thursday – no extensions!
- 1-minute responses
- XML links up
Let's get physical
[DBMS architecture diagram: the user/application issues queries and updates; the query compiler/optimizer produces a query execution plan for the execution engine; the index/record manager turns record and index requests into page commands for the buffer manager, which reads/writes pages via the storage manager; the transaction manager handles transaction commands, concurrency control, and logging/recovery.]
Hardware/memory review
- DBs won't fit in RAM
- Disk access is O(100,000) times slower than RAM access
- RAM model of computation: a single op costs about the same as a single memory access
- I/O model of computation: we read/write one block (4KB) at a time; measure time in # disk accesses; ignore processor operations – they're O(100,000) times faster
- Regular mergesort: divide in half each time and recurse
Hardware/memory review
- Big problem: how to sort 1GB with 1MB of RAM?
- Can use mergesort, but must read/write all the data 19+ times
- Soln: TPMMS (two-phase multiway merge sort – external mergesort)
  1. Sort the data in 1MB chunks
  2. Merge 249 of those chunks into a 249MB chunk
  3. Merge 249 of the 249MB chunks…
- Each iteration multiplies the run size by (RAM size/block size – 1)
External Merge-Sort
- Phase one: load 1MB into memory, sort it
- Result: SIZE/M sorted runs of length M bytes (1MB) each
[Diagram: M bytes of main memory; each run on disk holds M/R records.]
Phase Two
- Merge M/B – 1 runs into a new run
- M/B – 1 = 1MB/4KB – 1 = 249
- Result: runs of size M × (M/B – 1) bytes; 249 × 1MB ≈ 250MB
[Diagram: M/B – 1 input buffers and one output buffer in the M bytes of main memory; runs on disk.]
Phase Three
- Merge M/B – 1 runs into a new run
- Result: runs of size M × (M/B – 1)^2 bytes; 249 × 250MB ≈ 62,500MB ≈ 62.5GB
[Diagram: as in phase two – M/B – 1 input buffers, one output buffer.]
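The two phases above can be sketched in miniature. This is a toy illustration, not the on-disk algorithm: MEM and BLOCK are scaled-down stand-ins for the 1MB of RAM and 4KB blocks, and runs live in Python lists rather than on disk.

```python
import heapq

BLOCK = 4   # records per "block" (toy value standing in for 4KB)
MEM = 16    # records that fit in "RAM" (standing in for 1MB)

def phase_one(records):
    """Phase 1: read MEM records at a time, sort each chunk into a run."""
    return [sorted(records[i:i + MEM]) for i in range(0, len(records), MEM)]

def phase_two(runs):
    """Phase 2: repeatedly merge up to M/B - 1 runs at a time
    (one input buffer per run, plus one output buffer)."""
    fan_in = MEM // BLOCK - 1   # the M/B - 1 from the slides
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0]

data = [27, 3, 88, 14, 61, 5, 42, 9, 73, 30, 18, 96, 1, 55, 40, 7, 66, 2, 81, 12]
runs = phase_one(data)         # 20 records -> 2 sorted runs
result = phase_two(runs)
assert result == sorted(data)
```

With the real parameters, fan_in would be 249 and each pass would read and write the whole file once, which is why the disk-access count, not the CPU work, dominates.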
Next topic: File organization 1
- Heap files: an unordered list of rows – one damn row after another
- All-rows queries are easy: SELECT * FROM T;
- Insert is easy: just add to the end
- Unique/subset queries are hard: must test each row
File organization 2
- Sorted file: sort the rows on some fields
- Since the datafile is likely to be large, must use an external sort like external mergesort
- Equality and range selects are now easier:
  - do binary search to find the first match
  - walk through the rows until one fails the test
- Insert and delete are now hard: must move an average of half the rows forward or back
- Possible solns: leave empty space; use "overflow" pages
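The binary-search-then-scan pattern for a range select can be sketched as follows, with rows reduced to bare keys in a Python list (an assumption for illustration; a real file holds full records grouped into blocks):

```python
import bisect

# Toy "sorted file": rows sorted on the search field.
rows = [3, 7, 7, 12, 19, 19, 19, 25, 31, 40]

def range_select(rows, lo, hi):
    """Binary search to the first row >= lo, then walk forward
    until a row fails the test (row > hi)."""
    out = []
    i = bisect.bisect_left(rows, lo)
    while i < len(rows) and rows[i] <= hi:
        out.append(rows[i])
        i += 1
    return out

print(range_select(rows, 7, 19))   # -> [7, 7, 12, 19, 19, 19]
```

The binary search touches O(log N) rows, and the scan touches only rows that (except for the last one) qualify, so a sorted file answers range queries far more cheaply than a heap file.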
Modifications
- Insert:
  - File is unsorted: easy
  - File is sorted: is there space in the right block? Then store it there; if all else fails, create an overflow block
- Delete:
  - Free the space in the block
  - May be able to eliminate an overflow block
  - If not, use a tombstone (null record)
- Update:
  - New record is shorter than the previous one: easy
  - If it's longer, need to shift records or create overflow blocks
Overflow Blocks
After a while the file starts being dominated by overflow blocks: time to reorganize.
[Diagram: Block n-1, Block n with an overflow chain, Block n+1.]
File organization 3
- Datafile (un/sorted) + index
- Speeds searches based on its fields
  - any subset/list of the table's fields
  - these are called the search key – not to be confused with the table's keys/superkeys
- Idea: trade disk space for disk time (may also cost processor/RAM time)
- Downsides: takes up more space; must reflect changes in the data
Classification of indices
- Primary vs. secondary
- Clustered vs. unclustered
- Dense vs. sparse
- Index data structures: B-trees, hash tables
- More advanced types: function-based indices, R-trees, bitmap indices
Dense indices
- Index has an entry for each row
- NB: index entries are smaller than rows, so more index entries fit per block than rows
Sparse indices
- Why make an index sparse? Fewer disk accesses:
  - binary search on a shorter list – log(shorter N)
  - analogy: the "thumb" index in large dictionaries
- Trades disk space for RAM space and computation time
- May fit in RAM
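A minimal sketch of a sparse-index lookup, assuming (as is typical) one index entry per data block, holding that block's first key. The block size and keys are toy values for illustration:

```python
import bisect

BLOCK = 4   # rows per block (toy value)

# Sorted datafile split into blocks.
blocks = [[2, 5, 8, 11], [14, 17, 20, 23], [26, 29, 32, 35]]

# Sparse index: one entry per block -- the first key in that block.
index = [b[0] for b in blocks]          # [2, 14, 26]

def lookup(key):
    """Binary search the (short) index for the last entry <= key,
    then read and scan just that one block."""
    i = bisect.bisect_right(index, key) - 1
    return key in blocks[i] if i >= 0 else False

print(lookup(20), lookup(21))   # -> True False
```

The index here has one entry per block instead of one per row, which is exactly why it is shorter, cheaper to search, and more likely to fit in RAM.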
Secondary/unclustered indices
- To index attributes other than the primary key
- Always dense (why?)
Clustered vs. unclustered
- Clustered means: data and index are sorted the same way
  - sorted on the fields the index is indexing
  - each index entry is stored "near" its data entry
- Sparse indices must be clustered
- Unclustered indices must be dense
- Clustered indices can reduce disk latency: related data is stored together – less far to go
- Good for range queries
Primary vs. secondary
- Primary indexes: usually clustered; only one per table; use PRIMARY KEY
- Secondary indexes: usually unclustered; many allowed per table; use UNIQUE or CREATE INDEX
Partial-key searches
- Situ: index on fields a1, a2, a3; we search on fields ai, aj
- When will this work?
  1. i and j must be 1 and 2 (in either order): the searched fields must be a prefix of the indexed fields
     - E.g.: lastname, firstname in a phone book
  2. The index must be clustered
New topic: Hash Tables
- I/O-model hash tables are much like main-memory ones
- Hash basics:
  - there are n buckets
  - a hash function f(k) maps a key k to {0, 1, …, n-1}
  - store in bucket f(k) a pointer to the record with key k
- Differences in the I/O model/DBMS:
  - bucket size = 1 block
  - use overflow blocks when needed
Example hash table
- Assume: 10 buckets, each storing 5 keys and pointers (only 2 shown)
- h(0)=0; h(25)=h(5)=5; h(83)=h(43)=3; h(99)=h(9)=9
Hash table search
- Search for 82: compute h(82)=2, read bucket 2
- 1 disk access
Hash table insertion
- Insert 42…
- Place it in the corresponding bucket, if there is space
Hash table insertion
- Insert 91…
- Create an overflow block if there is no space
- More overflow blocks may be added as necessary
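A toy sketch of insertion and search with overflow blocks. BLOCK_CAP = 2 is an assumption to force an overflow quickly (the slides' example buckets hold 5 keys); each bucket is modeled as a chain of fixed-capacity lists, one list per "block":

```python
BLOCK_CAP = 2    # keys per bucket block (toy value; the slides use 5)
N_BUCKETS = 10

def h(k):
    return k % N_BUCKETS

# Each bucket is a chain of blocks; blocks after the first are overflow blocks.
buckets = [[[]] for _ in range(N_BUCKETS)]

def insert(k):
    chain = buckets[h(k)]
    if len(chain[-1]) == BLOCK_CAP:   # no space: create an overflow block
        chain.append([])
    chain[-1].append(k)

def search(k):
    """Read the bucket and any overflow blocks: one disk access per block."""
    return any(k in block for block in buckets[h(k)])

# Keys from the slides' example, plus 15 to overflow bucket 5.
for k in [0, 25, 5, 83, 43, 99, 9, 15]:
    insert(k)
print(search(15), len(buckets[5]))   # -> True 2 (bucket 5 grew an overflow block)
```

Note that search cost is one disk access plus one per overflow block in the chain, which is why performance degrades as chains grow.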
Hash table performance
- Excellent if there are no overflow blocks
- For in-memory indices, hash tables are usually preferred
- Performance degrades as the ratio keys/(n × blocksize) increases
Hash functions
- Lots of ideas for "good" functions, depending on the situation
- One obvious idea: h(x) = x mod n
  - every x is mapped to one of 0, 1, …, n-1
  - roughly 1/n of the x's are mapped to each bucket
- Does this work for equality search?
- Does this work for range search?
- Does this work for partial-key search?
- Good functions for hashing passwords?
- What was the point of hashing in that case?
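A quick demonstration of the first two questions above, using h(x) = x mod n with n = 10 (values chosen to match the earlier example): equality search reads one bucket, but a range query's keys scatter across the buckets, so a hash index doesn't help it.

```python
n = 10
h = lambda x: x % n

# Equality search: the key determines exactly one bucket to read.
print(h(82))   # -> 2 (read bucket 2 only, as in the search slide)

# Range search: the keys in [10, 20] land in many different buckets,
# so the hash index cannot narrow the search at all.
buckets_hit = {h(x) for x in range(10, 21)}
print(len(buckets_hit))   # -> 10 (every bucket would have to be read)
```

This is the key contrast with sorted files and B-trees, which keep nearby keys physically near each other and therefore do support range queries.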
Extensible hash table
- The number of buckets grows to prevent overflows
- Hashing is also used for crypto, hashing passwords, etc.
- And: Java's HashMap and object.hashCode()
New topic: B-trees
- We saw connected, rooted graphs before: XML graphs
- Trees are connected, acyclic graphs
- We saw rooted trees before: XML docs, the directory structure on a hard drive, organizational/management charts
- B-trees are one kind of rooted tree
Twenty Questions
- What am I thinking of?
- Large space of possible choices; can ask only yes/no questions
- Each question gives <= 1 bit
- Strategy: ask questions that divide the search space in half – gain a full bit from each question
- log2(1,000,000 ≈ 2^20) = 20
BSTs
- A very simple data structure in CS: BSTs (Binary Search Trees)
- Keep balanced
- Each node ~ one item
- Each node has two children: left subtree: <; right subtree: >=
- Can search, insert, delete in log time: log2(1M = 2^20) = 20
Search for DBMS
- Big improvement: log2(1M) = 20
- Each op divides the remaining range in half!
- But recall: all that matters is # disk accesses
- 20 is better than 2^20, but: can we do better?
BSTs → B-trees
- Like BSTs, except each node ~ one block
- Branching factor is >> 2: each access divides the remaining range by, say, 300
- B-trees = BSTs + blocks
- B+ trees are a variant of B-trees:
  - data is stored only in the leaves
  - leaves form a (sorted) linked list
  - better supports range queries
- Consequences:
  - much shorter depth, many fewer disk reads
  - must find the element within a node – trades CPU/RAM time for disk time
B-tree search efficiency
- With params: block = 4KB, integer = 4B, pointer = 8B, the largest n satisfying 4n + 8(n+1) <= 4096 is n = 340
- Each node has 170–340 keys (nodes are at least half full); assume on avg (170+340)/2 = 255
- Then:
  - 255 rows → depth = 1
  - 255^2 ≈ 64K rows → depth = 2
  - 255^3 ≈ 16M rows → depth = 3
  - 255^4 ≈ 4G rows → depth = 4
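The arithmetic above can be checked directly; this sketch just replays the slide's capacity calculation:

```python
# Parameters from the slide: 4KB block, 4-byte keys, 8-byte pointers.
BLOCK, KEY, PTR = 4096, 4, 8

# Largest n with n keys and n+1 pointers per block: 4n + 8(n+1) <= 4096.
n = (BLOCK - PTR) // (KEY + PTR)
print(n)                 # -> 340

# Nodes are kept at least half full, so assume (170 + 340)/2 = 255 keys on avg.
avg = (n // 2 + n) // 2
print(avg)               # -> 255

# Rows reachable at each depth: 255^d.
for depth in range(1, 5):
    print(depth, avg ** depth)   # 255; ~64K; ~16M; ~4G
```

So a tree of depth 3 or 4 already covers any realistic table, and each level costs only one disk access (fewer still if the upper levels are cached in RAM).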
Next time
- Next: failover
- For next time: reading online
- Hw3 due next time – no extensions!
- Now: one-minute responses