Download presentation
Presentation is loading. Please wait.
Published byAldous Harmon Modified over 8 years ago
1
CS 245Notes 21 Database System Principles Notes 02: Hardware
2
CS 245Notes 22 Outline Hardware: Disks Access Times Example - Megatron 747 Optimizations Other Topics: –Storage costs –Using secondary storage –Disk failures
3
CS 245Notes 23 10 nanoseconds (10 -8 seconds) 10-100 nanoseconds (10 -8 to 10 -7 seconds) 10-30 miliseconds (0.1 to 0.3 seconds) 1000 times slower than secondary memory Volatile Non-volatile The Memory Hierarchy
4
CS 245Notes 24 Storage Cost 10 -9 10 -6 10 -3 10 -0 10 3 access time (sec) 10 15 10 13 10 11 10 9 10 7 10 5 10 3 cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape typical capacity (bytes) from Gray & Reuter
5
CS 245Notes 25 Storage Cost 10 -9 10 -6 10 -3 10 -0 10 3 access time (sec) 10 4 10 2 10 0 10 -2 10 -4 cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape dollars/MB from Gray & Reuter
6
CS 245Notes 26 DBMS Data Storage APP1 APPn ……
7
CS 245Notes 27 P MC Typical Computer Secondary Storage...
8
CS 245Notes 28 Processor Fast, slow, reduced instruction set, with cache, pipelined… Speed: 100 500 1000 MIPS Memory Fast, slow, non-volatile, read-only,… Access time: 10 -6 10 -9 sec. 1 s 1 ns
9
CS 245Notes 29 Secondary storage Many flavors: - Disk: Floppy (hard, soft) Removable Packs Winchester Ram disks Optical, CD-ROM… Arrays - TapeReel, cartridge Robots
10
CS 245Notes 210 Focus on: “Typical Disk” Terms: Platter, Head, Actuator Cylinder, Track Sector (physical), Block (logical), Gap …
11
CS 245Notes 211
12
CS 245Notes 212 Top View 0 track Sector is unit of error correction/detection
13
CS 245Notes 213 Disk Controller 1. Buffer data in and out of disk 2. Schedule the disk heads 3. Manage the “bad blocks” so they are not used
14
CS 245Notes 214 “Typical” Numbers Diameter: 1 inch 15 inches Cylinders:100 2000 Surfaces:1 (CDs) (Tracks/cyl) 2 (floppies) 30 Sector Size:512B 50K Capacity:360 KB (old floppy) 400 GB (I use)
15
CS 245Notes 215 Disk Access Time block x in memory ? I want block X
16
CS 245Notes 216 Time = Seek Time + Rotational Delay + Transfer Time + Other
17
CS 245Notes 217 Seek Time 3 or 5x x 1N Cylinders Traveled Time
18
CS 245Notes 218 Average Random Seek Time SEEKTIME (i j) S = N(N-1) N N i=1 j=1 j i
19
CS 245Notes 219 Average Random Seek Time SEEKTIME (i j) S = N(N-1) N N i=1 j=1 j i “Typical” S: 10 ms 40 ms
20
CS 245Notes 220 Rotational Delay Head Here Block I Want
21
CS 245Notes 221 Average Rotational Delay R = 1/2 revolution “typical” R = 8.33 ms (3600 RPM)
22
CS 245Notes 222 Transfer Rate: r “typical” r: 1 3 MB/second transfer time: block size r
23
CS 245Notes 223 Other Delays CPU time to issue I/O Contention for controller Contention for bus, memory
24
CS 245Notes 224 Other Delays CPU time to issue I/O Contention for controller Contention for bus, memory “Typical” Value: 0
25
CS 245Notes 225 So far: Random Block Access What about: Reading “Next” block?
26
CS 245Notes 226 If we do things right (e.g., Double Buffer, Stagger Blocks…) Time to get = Block Size + Negligible block r - skip gap - switch track - once in a while, next cylinder
27
CS 245Notes 227 Rule ofRandom I/O: Expensive Thumb Sequential I/O: Much less Ex:1 KB Block »Random I/O: 10 ms. »Sequential I/O: 1 ms.
28
CS 245Notes 228 Cost for Writing similar to Reading …. unless we want to verify! need to add (full) rotation + Block size r
29
CS 245Notes 229 To Modify a Block?
30
CS 245Notes 230 To Modify a Block? To Modify Block: (a) Read Block (b) Modify in Memory (c) Write Block [(d) Verify?]
31
CS 245Notes 231 Block Address: Physical Device Cylinder # Surface # Sector
32
CS 245Notes 232 Complication: Bad Blocks Messy to handle May map via software to integer sequence 1 2.Map Actual Block Addresses. m Logical block address (LBA)
33
CS 245Notes 233 The Megatron 747 16 Surfaces, 3.5 Inch diameter – outer 1 inch used 2 16 = 65536 Tracks/surface 2 8 = 256 Sectors/track 2 12 = 4096 Bytes/sector 2 40 Bytes = 1 TB Disk An Example Example 13.1, 13.2
34
CS 245Notes 234 7200 RPM ~ 8.33ms/rotation Take 1ms for the head to start 1ms for every 4000 cylinders – the head move one track in 1.00025ms – 65536 tracks in about 17.38ms 10% overhead between blocks Megatron 747 Disk (cont.)
35
CS 245Notes 235 7200 RPM 120 revolutions / sec 1 rev. = 8.33 msec. One track:...
36
CS 245Notes 236 7200 RPM 120 revolutions / sec 1 rev. = 8.33 msec. One track:... Time over useful data:(8.33)(0.9)=7.5 ms. Time over gaps: (8.33)(0.1)= 0.83 ms. Transfer time 1 sector = 7.5/256=0.029 ms. Trans. time 1 sector+gap=8.3/256=0.032ms.
37
CS 245Notes 237 Timing for Megatron 747 Seek time: –MIN: 0 ms –MAX: 17.38 ms –AVE: 6.46 ms (why not 17.38/2=8.69 ms?)
38
CS 245Notes 238 0 32768 65536 0 32768 16384 Average travel distance as a function of initial head position
39
CS 245Notes 239 Timing for Megatron 747 Rotational delay: –MIN: 0 ms –MAX: 8.33 ms –AVE: 4.17 ms
40
CS 245Notes 240 T 1 = Time to read one random sector T 1 = seek + rotational delay + TT = 6.46 + 4.17 +.029 = 10.66 ms.
41
CS 245Notes 241 Suppose OS deals with 16 KB blocks T 4 = 6.46 + 4.17 + (.029) x 1 + (.032) X 3 = 10.76 ms [Compare to T 1 = 10.66 ms]... 1 2 3 4 1 block
42
CS 245Notes 242 T T = Time to read a full track (start at any block) T T = 6.46 + (0.032/2) + 8.33 * = 14.81 ms to get to first block * Actually, a bit less; do not have to read last gap.
43
CS 245Notes 243 Block Size Selection? Big Block Amortize I/O Cost Big Block Read in more useless stuff! and takes longer to read Unfortunately...
44
CS 245Notes 244 Trend As memory prices drop, blocks get bigger... Trend
45
CS 245Notes 245 Outline Hardware: Disks Access Times Example: Megatron 747 Optimizations Other Topics –Storage Costs –Using Secondary Storage –Disk Failures here
46
CS 245Notes 246 The I/O Model of Computation Access of a block is very expensive Random block access is the norm Reasonable to count only the disk I/O
47
CS 245Notes 247 External Sorting 10M records of 100 bytes = 1Gb file –4Kb block, each holding 40 records –Entire file takes 2 18 blocks 50Mb available main memory –Holds 12800 blocks ~ 1/20 th of file Requirement – sort by primary key field
48
CS 245Notes 248 External Sorting Main memory sorting algorithms may not be good candidates here Time complexity in this case is estimated based on disk I/O numbers, instead of CPU times
49
CS 245Notes 249 Merge Sort Take two sorted list and repeatedly choose the smaller of the “heads” of the lists Merge 1, 3, 5 with 2, 4, 6 = 1,2,3,4,5,6 A recursive algorithm: divide records into two parts; recursively mergesort the parts, and merge the resulting list Each record is R/W from disk log 2 n times
50
CS 245Notes 250 Two-Phase, Multiway Merge Sort (2PPMS) Phase 1 –Fill main memory with records –Sort with favorite main memory sorting algorithms –Write sorted list to disk –Repeat until all records have been put into one of the sorted lists
51
CS 245Notes 251 Phase 2
52
CS 245Notes 252 Two-Phase, Multiway Merge Sort (2PPMS) Phase 2 –Initially load input buffers with the first block of their respective sorted lists –Repeated run a competition among the first unchosen records of each of the buffered blocks –If an input block is exhausted, get the next block from the same file –If the output block is full, write it to disk
53
CS 245Notes 253 Analysis of Naïve Implementation 2 reads and 2 writes for each block Recall that the file is stored on 2 18 blocks Assume a random block access is 10ms Overall time for 2PPMS = 2 20 *.01s ~ 174 minutes ~ 3 hours
54
CS 245Notes 254 Optimizations (in controller or O.S.) Disk Scheduling Algorithms –e.g., elevator algorithm Cylinderization Track (or larger) Buffer Pre-fetch Arrays Mirrored Disks On Disk Cache
55
CS 245Notes 255 Cylinderization Place data by cylinder so that we can read block after block, with no seek time or/and rotational delay Assume records on 1000 consecutive cylinders = 1Mb data per cylinder = 256 blocks per cylinder Assume transfer time for one block is 0.05ms (actually 0.032 for 747)
56
CS 245Notes 256 Application to Phase 1 Load main memory from 50 consecutive cylinders (50M data=12800 blocks) –Time to transfer 12800 blocks = 0.05ms * 12800 = 0.64 sec –One random seek and 49 1-cylinder seeks cost 0.06 sec (actually 0.056 for 747) –Write sorted list onto 50 consecutive cylinders, so write time is also about 0.7 sec Overall cost = 20*2*0.7 = 28 sec
57
CS 245Notes 257 But in Phase 2… –Blocks are read from the 20 sublists in a data-dependent order (random disk I/O requests), so we cannot take much advantage of cylinderization.
58
CS 245Notes 258 Track Buffer If we have extra space for main-memory buffer, consider loading buffers in advance of need Similarly, keep output blocks buffered in main-memory until it is convenient to write them to disk
59
CS 245Notes 259 Application to Phase 2 Buffer one cylinder for each sublist and the output –Recall that one cylinder (1Mb) = 256 blocks –Time to read 1000 cylinder = 1000*(256*0.05ms+6.5ms) ~ 19.3 sec Similarly, time to write the output buffer = 19.3 sec
60
CS 245Notes 260 Double Buffering Problem: Have a File » Sequence of Blocks B1, B2 Have a Program » Process B1 » Process B2 » Process B3...
61
CS 245Notes 261 Single Buffer Solution (1) Read B1 Buffer (2) Process Data in Buffer (3) Read B2 Buffer (4) Process Data in Buffer...
62
CS 245Notes 262 SayP = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R)
63
CS 245Notes 263 Double Buffering Memory: Disk: ABCDGEF process
64
CS 245Notes 264 Double Buffering Memory: Disk: ABCDGEF B done process A
65
CS 245Notes 265 Double Buffering Memory: Disk: ABCDGEF A C process B done
66
CS 245Notes 266 Double Buffering Memory: Disk: ABCDGEFA B done process A C B done
67
CS 245Notes 267 Say P R What is processing time? P = Processing time/block R = IO time/block n = # blocks
68
CS 245Notes 268 Say P R What is processing time? P = Processing time/block R = IO time/block n = # blocks Double buffering time = R + nP Single buffering time = n(R+P)
69
CS 245Notes 269 Multiple Disks If the sublists are stored on multiple (cheap) different disks, then we can read or write multiple blocks at once Better performance at the cost of $ Example 13.5
70
CS 245Notes 270 Mirror Disks If we use n disks and mirror the original disk with n-1 copies, then we can read n blocks at once Writing requires a write on each disk, thus not speed up Obvious minus: extra disk cost
71
CS 245Notes 271 Disk Scheduling Algorithms Concurrent processes generate (random) disk I/O requests Disk controller (or OS, DMBS) decides in which order these requests are serviced Elevator algorithm –Head moves from center to rim, then back again –Idea is to honor “nearby” requests whenever possible
72
CS 245Notes 272 Example 13.6 P572
73
CS 245Notes 273 Disk Failures Partial Total Intermittent Permanent 1. Intermittent failure 2. Media decay 3. Disk crash
74
CS 245Notes 274 Coping with Disk Failures Detection –e.g. Checksum (example 13.7 p576) Correction Redundancy
75
CS 245Notes 275 Disk Arrays RAIDs (various flavors) Block Striping Mirrored logically one disk
76
CS 245Notes 276 RAID Level 1 Mirror the disk Redundancy allows data to survive a disk crash
77
CS 245Notes 277 MTTF for Mirrored disks Assume –10% failure per year; MTTF = 10 years –3 hours to replace and restore failed disk If a fail to one disk occurs, then the other better not fail in the next three hours Example 13.7, p579
78
CS 245Notes 278 MTTF for Mirrored disks (cont.) Probability of any disk having a failure in one year = 1/10 (by def.) 3 hours = 1/2920 year Probability that the mirror disk fails = (1/10)*(1/2920)
79
CS 245Notes 279 MTTF for Mirrored disks (cont.) Probability that one of the two disks fail in one year = 1/10 + 1/10 = 1/5 Probability that the mirror disk fails during the restoring of the other crashed disk= (1/5)*(1/10)*(1/2920) MTTF for mirror disks = 5*10*2920=146000 years
80
CS 245Notes 280 RAID Level 4 Use n data disks and one redundant disk Each block of the redundant disk is the parity check for the corresponding blocks of the data disks
81
CS 245Notes 281 RAID Level 4 - Example
82
CS 245Notes 282 RAID Level 4 - Recovery
83
CS 245Notes 283 RAID Level 4 - Reading
84
CS 245Notes 284 RAID Level 4 - Writing Example 13.10, p581
85
CS 245Notes 285 RAID Level 5 The redundant disk is the bottleneck for writing in RAID 4 Solution: vary the redundant disks for different blocks –Example: n disks; block j is redundant on disk i if i = remainder of j /n Example 13.12, p583
86
CS 245Notes 286 At what level do we cope? Single Disk –e.g., Error Correcting Codes Disk Array Logical Physical
87
CS 245Notes 287 Operating System e.g., Stable Storage Logical BlockCopy A Copy B
88
CS 245Notes 288 Database System e.g., Log Current DBLast week’s DB
89
CS 245Notes 289 Hardware: Disks Access Times Example: Megatron 747 Optimizations Other Topics –Storage Costs –Using Secondary Storage –Disk Failures here Summary
90
CS 245Notes 290 Assignments (First Edition) 1.Ex. 2.2.1, p39 2.Ex. 2.3.1, p48, Ex. 2.3.4, p49 3.Ex. 2.4.1, p61, Ex. 2.4.5, p63, refer to tips on p58 4.Ex. 2.5.2, p67 5.Ex. 2.6.1, p77, Ex. 2.6.5, p78
91
CS 245Notes 291 Assignments (Second Edition) 1.Ex. 13.2.1, 13.2.2, p567 2.Ex. 13.3.1, 13.3.4, p575 3.Ex. 13.4.2, 13.4.3, 13.4.5, 13.4.7, p589 4.Ex. 13.5.1, 13.5.2, p593 5.Ex. 13.6.5, 13.6.7, 13.6.9, p603 6.Ex. 13.7.1, 13.7.3, p611 7.Ex. 13.8.1, p615
92
CS 245Notes 292 Assignments (Second Edition in Chinese ) 1.Ex. 2.2.1, 2.2.3, p17 2.Ex. 2.3.1, 2.3.4, p21 3.Ex. 2.4.2, 2.4.3, 2.4.5, 2.4.7, p29 4.Ex. 2.5.1, 2.5.2, p31 5.Ex. 2.6.5, 2.6.7, 2.6.9, p37 6.Ex. 2.7.1, 2.7.3, p41 7.Ex. 2.8.1, p44
93
CS 245Notes 193 Reading Materials 1.D.A. Patterson, G.A. Gibson, and R.H. Katz, “A case for redundant arrays of inexpensive disks,” Proc. ACM SIGMOD, 1988 2.Bianca Schroeder Garth A. Gibson, “Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?” 5th USENIX Conference on File and Storage Technologies
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.