CS 405G: Introduction to Database Systems 20. Concurrency control, Storage
10/25/2015 Chen University of Kentucky

Conflicting operations
Two operations on the same data item conflict if at least one of the operations is a write.
r(X) and w(X) conflict; w(X) and r(X) conflict; w(X) and w(X) conflict.
r(X) and r(X) do not conflict; r/w(X) and r/w(Y) do not conflict.
Order of conflicting operations matters. E.g., if T1.r(A) precedes T2.w(A), then conceptually, T1 should precede T2.

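The conflict rule can be sketched in a few lines of Python. This is only an illustration: the `Op` type and the function names are made up here, and the sketch assumes the usual convention that the two operations come from different transactions, which the slide leaves implicit.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str   # transaction name, e.g. "T1"
    kind: str  # "r" for read, "w" for write
    item: str  # data item, e.g. "A"


def conflicts(a: Op, b: Op) -> bool:
    """Same item, different transactions (assumed), at least one write."""
    return a.item == b.item and a.txn != b.txn and "w" in (a.kind, b.kind)


def precedence_edges(schedule):
    """Edge (Ti, Tj): some Ti operation precedes a conflicting Tj operation."""
    edges = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if conflicts(a, b):
                edges.add((a.txn, b.txn))
    return edges


# The slide's example: T1.r(A) precedes T2.w(A), so T1 should precede T2.
schedule = [Op("T1", "r", "A"), Op("T2", "w", "A")]
print(precedence_edges(schedule))  # {('T1', 'T2')}
```
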
Locking
Rules:
If a transaction wants to read an object, it must first request a shared lock (S mode) on that object.
If a transaction wants to modify an object, it must first request an exclusive lock (X mode) on that object.
Allow one exclusive lock, or multiple shared locks.
Compatibility matrix (grant the lock?):

Mode of lock(s) currently held     Mode of the lock requested
by other transactions              S        X
S                                  Yes      No
X                                  No       No

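As a rough sketch, the compatibility matrix can be encoded as a lookup table. The names `COMPATIBLE` and `can_grant` are illustrative and not taken from any particular DBMS.

```python
# Compatibility matrix from the slide, indexed as COMPATIBLE[held][requested].
COMPATIBLE = {
    "S": {"S": True, "X": False},
    "X": {"S": False, "X": False},
}


def can_grant(requested, held_by_others):
    """Grant only if the request is compatible with every lock held by others."""
    return all(COMPATIBLE[held][requested] for held in held_by_others)


print(can_grant("S", ["S", "S"]))  # True: multiple shared locks are allowed
print(can_grant("X", ["S"]))       # False: a writer must wait for readers
print(can_grant("X", []))          # True: nobody else holds a lock
```
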
Basic locking is not enough
T1: add 1 to both A and B (preserves A=B). T2: multiply both A and B by 2 (preserves A=B).
Possible schedule under locking:

T1                          T2
lock-X(A)
r(A)    read 100
w(A)    write 100+1
unlock(A)
                            lock-X(A)
                            r(A)    read 101
                            w(A)    write 101*2
                            unlock(A)
                            lock-X(B)
                            r(B)    read 100
                            w(B)    write 100*2
                            unlock(B)
lock-X(B)
r(B)    read 200
w(B)    write 200+1
unlock(B)

But still not conflict-serializable! A ≠ B!

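The arithmetic behind this schedule can be replayed directly. The sketch below ignores the locks themselves and only applies the updates in the interleaved order the slide's read values imply (T1 on A, T2 on A, T2 on B, T1 on B); the helper names are illustrative.

```python
def t1_step(value):
    """T1 adds 1 to the item."""
    return value + 1


def t2_step(value):
    """T2 multiplies the item by 2."""
    return value * 2


db = {"A": 100, "B": 100}
# Interleaved order: T1 on A, T2 on A, T2 on B, T1 on B.
for item, step in [("A", t1_step), ("A", t2_step), ("B", t2_step), ("B", t1_step)]:
    db[item] = step(db[item])
print(db)  # {'A': 202, 'B': 201}

# Either serial order keeps A = B.
serial_t1_then_t2 = t2_step(t1_step(100))  # 202 for both A and B
serial_t2_then_t1 = t1_step(t2_step(100))  # 201 for both A and B
print(serial_t1_then_t2, serial_t2_then_t1)  # 202 201
```

The interleaved result matches neither serial order, which is why releasing locks early is not sufficient.
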
Two-phase locking (2PL)
All lock requests precede all unlock requests.
Phase 1: obtain locks; phase 2: release locks.

T1                          T2
lock-X(A)
r(A)
w(A)
lock-X(B)
unlock(A)
                            lock-X(A)
                            r(A)
                            w(A)
                            lock-X(B)    cannot obtain the lock on B until T1 unlocks
r(B)
w(B)
unlock(B)
                            r(B)
                            w(B)

Resulting schedule: T1: r(A) w(A); T2: r(A) w(A); T1: r(B) w(B); T2: r(B) w(B).
2PL guarantees a conflict-serializable schedule.

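A small check of the two-phase rule, as a sketch: it scans one transaction's actions and rejects any lock request that follows an unlock. The action strings mirror the slide's notation; the function name is illustrative.

```python
def is_two_phase(actions):
    """True if no lock request follows an unlock in one transaction's actions."""
    unlocked = False
    for action in actions:
        if action.startswith("unlock"):
            unlocked = True           # phase 2 (releasing) has begun
        elif action.startswith("lock") and unlocked:
            return False              # a lock request after an unlock violates 2PL
    return True


t1_basic = ["lock-X(A)", "r(A)", "w(A)", "unlock(A)", "lock-X(B)", "r(B)", "w(B)", "unlock(B)"]
t1_2pl = ["lock-X(A)", "r(A)", "w(A)", "lock-X(B)", "unlock(A)", "r(B)", "w(B)", "unlock(B)"]
print(is_two_phase(t1_basic))  # False
print(is_two_phase(t1_2pl))    # True
```
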
Problem of 2PL
Schedule: T1: r(A) w(A); T2: r(A) w(A); T1: r(B) w(B); T2: r(B) w(B); then T1 aborts (Abort!).
T2 has read uncommitted data written by T1.
If T1 aborts, then T2 must abort as well.
Cascading aborts are possible if other transactions have read data written by T2.
Even worse, what if T2 commits before T1? The schedule is not recoverable if the system crashes right after T2 commits.

Strict 2PL
Only release locks at commit/abort time.
A writer will block all other readers until the writer commits or aborts.
Used in most commercial DBMS.

Deadlock Detection
Create and maintain a “waits-for” graph.
Periodically check for cycles in the graph.

Deadlock Detection (Continued)
Example:
T1: S(A), S(D), S(B)
T2: X(B), X(C)
T3: S(D), S(C), X(A)
T4: X(B)
Waits-for graph over T1, T2, T3, T4 (figure): Deadlock!

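Checking a waits-for graph for cycles can be sketched with a depth-first search. The edges below are an assumption: they correspond to one interleaving of the example in which T1 waits for T2 (on B), T2 waits for T3 (on C), T4 waits for T2 (on B), and T3 waits for T1 (on A). The slide's figure may have been drawn from a different interleaving.

```python
from typing import Optional


def find_cycle(waits_for: dict) -> Optional[list]:
    """Return one cycle as a list of transactions, or None if there is none."""
    nodes = set(waits_for) | {u for targets in waits_for.values() for u in targets}
    state = dict.fromkeys(nodes, "unvisited")
    path = []

    def visit(t):
        state[t] = "in_progress"
        path.append(t)
        for u in waits_for.get(t, []):
            if state[u] == "in_progress":        # back edge: cycle found
                return path[path.index(u):] + [u]
            if state[u] == "unvisited":
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        state[t] = "done"
        return None

    for t in sorted(nodes):
        if state[t] == "unvisited":
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# Assumed edges: Ti -> Tj means Ti waits for a lock that Tj holds.
waits_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T2"]}
print(find_cycle(waits_for))  # ['T1', 'T2', 'T3', 'T1']
```
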
Deadlock Prevention
Assign priorities based on timestamps (an older transaction has higher priority).
Say Ti wants a lock that Tj holds. Two policies are possible:
Wait-Die: if Ti has higher priority, Ti waits for Tj; otherwise Ti aborts.
Wound-Wait: if Ti has higher priority, Tj aborts; otherwise Ti waits.

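The two policies differ only in what happens to the lower-priority transaction. A minimal sketch, assuming that a smaller timestamp means an older transaction and therefore higher priority (function names and return strings are illustrative):

```python
def wait_die(ts_requester, ts_holder):
    """Ti requests a lock held by Tj; smaller timestamp = higher priority (assumed)."""
    return "Ti waits" if ts_requester < ts_holder else "Ti aborts"


def wound_wait(ts_requester, ts_holder):
    """Same setup; under Wound-Wait a higher-priority requester aborts the holder."""
    return "Tj aborts" if ts_requester < ts_holder else "Ti waits"


# Ti is older (timestamp 1) than Tj (timestamp 2):
print(wait_die(1, 2), "/", wound_wait(1, 2))  # Ti waits / Tj aborts
# Ti is younger (timestamp 2) than Tj (timestamp 1):
print(wait_die(2, 1), "/", wound_wait(2, 1))  # Ti aborts / Ti waits
```

In both policies only the lower-priority transaction ever aborts, so a cycle of waiting transactions cannot form.
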
Storage
It’s all about disks!
That’s why we always draw databases as (figure: database icon).
And why the single most important metric in database processing is the number of disk I/Os performed.

The Storage Hierarchy
(Figure: storage hierarchy, from smaller and faster to bigger and slower. Source: Operating Systems Concepts 5th Edition.)
Main memory (RAM) for currently used data.
Disk for the main database (secondary storage).
Tapes for archiving older versions of the data (tertiary storage).

Jim Gray’s Storage Latency Analogy: How Far Away is the Data?

A typical disk
(Figure labels: spindle, spindle rotation, platter, tracks, cylinders, disk arm, arm movement, disk head.)
“Moving parts” are slow.

Disk access time
Sum of:
Seek time: time for disk heads to move to the correct cylinder.
Rotational delay: time for the desired block to rotate under the disk head.
Transfer time: time to read/write data in the block (= time for disk to rotate over the block).

Random disk access
Seek time + rotational delay + transfer time.
Average seek time: time to skip one half of the cylinders? Not quite; it should be the time to skip a third of them (why?). “Typical” value: 5 ms.
Average rotational delay: time for a half rotation (a function of RPM). “Typical” value: 4.2 ms (7200 RPM).
Typical transfer time: 0.08 ms per 8K block.

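The slide's numbers can be checked with a short calculation. This sketch derives the average rotational delay from the RPM and adds the "typical" seek and transfer values quoted above; the variable names are illustrative.

```python
def full_rotation_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000 / rpm


avg_seek_ms = 5.0                                   # "typical" value from the slide
avg_rotational_delay_ms = full_rotation_ms(7200) / 2  # half a rotation on average
transfer_ms = 0.08                                  # per 8K block, from the slide

random_access_ms = avg_seek_ms + avg_rotational_delay_ms + transfer_ms
print(round(avg_rotational_delay_ms, 2))  # 4.17, which the slide rounds to 4.2
print(round(random_access_ms, 2))         # 9.25
```

Almost all of the cost comes from seek and rotation; the transfer itself is under 1% of the total.
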
Sequential Disk Access Improves Performance
Seek time + rotational delay + transfer time.
Seek time: 0 (assuming data is on the same track).
Rotational delay: 0 (assuming data is in the next block on the track).
Easily an order of magnitude faster than random disk access!

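To make the gap concrete, here is a rough comparison using the "typical" values quoted for random access (5 ms seek, 4.2 ms rotational delay, 0.08 ms transfer per block). The sequential model is a simplifying assumption: one initial seek and rotational delay, then back-to-back transfers with no track or cylinder switches.

```python
SEEK_MS = 5.0
ROTATIONAL_DELAY_MS = 4.2
TRANSFER_MS = 0.08


def random_read_ms(num_blocks):
    """Every block pays seek + rotational delay + transfer."""
    return num_blocks * (SEEK_MS + ROTATIONAL_DELAY_MS + TRANSFER_MS)


def sequential_read_ms(num_blocks):
    """Position the head once, then only transfer time per block (assumed model)."""
    return SEEK_MS + ROTATIONAL_DELAY_MS + num_blocks * TRANSFER_MS


n = 1000
print(round(random_read_ms(n), 1))      # 9280.0
print(round(sequential_read_ms(n), 1))  # 89.2
print(round(random_read_ms(n) / sequential_read_ms(n)))  # 104
```

Under these assumptions the gap for 1000 blocks is about two orders of magnitude; a real disk would also pay occasional track switches during the sequential read.
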
Performance tricks
Disk layout strategy: keep related things (what are they?) close together: same sector/block → same track → same cylinder → adjacent cylinder.
Double buffering: while processing the current block in memory, prefetch the next block from disk (overlap I/O with processing).
Disk scheduling algorithm.
Track buffer: read/write one entire track at a time.
Parallel I/O: more disk heads working at the same time.

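Double buffering can be sketched with a single background I/O thread. This is an illustration only: `read_block` and `process` are hypothetical callables standing in for a real disk read and real per-block work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_with_double_buffering(read_block, num_blocks, process):
    """Process blocks 0..num_blocks-1, fetching block i+1 while block i is processed."""
    results = []
    if num_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, 0)              # fill the first buffer
        for i in range(num_blocks):
            current = pending.result()                  # wait until block i has arrived
            if i + 1 < num_blocks:
                pending = io.submit(read_block, i + 1)  # prefetch into the other buffer
            results.append(process(current))            # overlaps with the prefetch
    return results


# Hypothetical stand-ins: "disk" blocks in a list, processing = measuring length.
blocks = [b"alpha", b"beta", b"gamma"]
print(process_with_double_buffering(lambda i: blocks[i], len(blocks), len))  # [5, 4, 5]
```

At most two blocks are in memory at any time: the one being processed and the one being fetched.
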
Records and Files
A row in a table, when located on disks, is called a record.
Two types of record: fixed-length and variable-length.

In an abstract sense, a file is a set of “records” on a disk.
In reality, a file is a set of disk pages.
Each record lives on a page.
Physical Record ID (RID): a tuple of

Files
Higher levels of DBMS operate on records, and files of records.
FILE: a collection of pages, containing a collection of records. Record = row in a table.
Must support:
insert/delete/modify record;
fetch a particular record (specified using record id);
scan all records (possibly with some conditions on the records to be retrieved).

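The required operations map naturally onto a small interface. The sketch below is a toy in-memory model, not how a DBMS lays out pages on disk: the class name, the fixed number of slots per page, and the choice to model a record id as a (page number, slot number) pair are all assumptions made for illustration.

```python
class RecordFile:
    """Toy file of records; a record id is modeled as (page_no, slot_no)."""

    def __init__(self, slots_per_page=2):
        self.slots_per_page = slots_per_page
        self.pages = []  # each page is a list of slots; None marks a free slot

    def insert(self, record):
        for page_no, page in enumerate(self.pages):
            for slot_no, slot in enumerate(page):
                if slot is None:
                    page[slot_no] = record
                    return (page_no, slot_no)
        page = [None] * self.slots_per_page  # no free slot: allocate a new page
        page[0] = record
        self.pages.append(page)
        return (len(self.pages) - 1, 0)

    def fetch(self, rid):
        page_no, slot_no = rid
        return self.pages[page_no][slot_no]

    def modify(self, rid, record):
        page_no, slot_no = rid
        self.pages[page_no][slot_no] = record

    def delete(self, rid):
        page_no, slot_no = rid
        self.pages[page_no][slot_no] = None

    def scan(self, predicate=lambda record: True):
        for page in self.pages:
            for record in page:
                if record is not None and predicate(record):
                    yield record


f = RecordFile()
rid_alice = f.insert(("alice", 30))
rid_bob = f.insert(("bob", 25))
rid_carol = f.insert(("carol", 41))
f.delete(rid_bob)
print(rid_carol)                          # (1, 0)
print(list(f.scan(lambda r: r[1] > 28)))  # [('alice', 30), ('carol', 41)]
```
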
Unordered (Heap) Files
Simplest file structure; contains records in no particular order.
As the file grows and shrinks, disk pages are allocated and de-allocated.
To support record-level operations, we must:
keep track of the pages in a file;
keep track of free space on pages;
keep track of the records on a page.
There are many alternatives for keeping track of this. We’ll consider 2.

Heap File Implemented as a List
The header page id and heap file name must be stored someplace: the database “catalog”.
Each page contains 2 “pointers” plus data.
(Figure: a header page linked to two lists of data pages, one of pages with free space and one of full pages.)

Heap File Using a Page Directory
The DBMS must remember where the first directory page is located.
The entry for a page can include the number of free bytes on the page.
The directory itself is a collection of pages, shown as a linked list. Much smaller than a linked list of all HF pages!
(Figure: a directory starting at the header page, with entries pointing to Data Page 1, Data Page 2, ..., Data Page N.)

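A directory that records free bytes per page lets an insert pick a page without touching the data pages themselves. The sketch below keeps the directory as an in-memory dictionary; the class name, the first-fit policy, and the 4096-byte page size are assumptions for illustration, not details from the slides.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes


class PageDirectory:
    """One entry per data page: page id -> number of free bytes on that page."""

    def __init__(self):
        self.free_bytes = {}

    def add_page(self):
        page_id = len(self.free_bytes)
        self.free_bytes[page_id] = PAGE_SIZE
        return page_id

    def place_record(self, record_size):
        """Return the id of a page with room for the record (first fit)."""
        if record_size > PAGE_SIZE:
            raise ValueError("record does not fit on a single page")
        for page_id, free in self.free_bytes.items():
            if free >= record_size:
                break
        else:
            page_id = self.add_page()  # no existing page has enough room
        self.free_bytes[page_id] -= record_size
        return page_id


d = PageDirectory()
print(d.place_record(3000))  # 0: first page is allocated
print(d.place_record(2000))  # 1: page 0 has only 1096 bytes free
print(d.place_record(1000))  # 0: fits in the space left on page 0
print(d.free_bytes)          # {0: 96, 1: 2096}
```
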
Trade-off
Compared to the linked list, the directory-based approaches have:
Pros: fast access to a particular record.
Cons: extra space.

Summary
Disks provide cheap, non-volatile storage.
Random access, but cost depends on the location of the page on disk; it is important to arrange data sequentially to minimize seek and rotation delays.