CS 245Notes 21 CS 245: Database System Principles Notes 02: Hardware Hector Garcia-Molina.

Slides:



Advertisements
Similar presentations
CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.
Advertisements

CS 245Notes 21 CS 245: Database System Principles Notes 02: Hardware Hector Garcia-Molina.
- Dr. Kalpakis CMSC Dr. Kalpakis 1 Outline In implementing DBMS we need to answer How should the system store and manage very large amounts of data?
CS 277 – Spring 2002Notes 21 CS 277: Database System Implementation Notes 02: Hardware Arthur Keller.
Data Storage John Ortiz. Lecture 17Data Storage2 Overview  Database stores data on secondary storage  Disk has distinct storage and access characteristics.
The Memory Hierarchy fastest, perhaps 1Mb
Storage. The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 2Gb Typically magnetic disks, magneto­ optical (erasable),
CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #6.
Disk Drivers May 10, 2000 Instructor: Gary Kimura.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner.
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
1 CS143: Disks and Files. 2 System Architecture CPU Main Memory Disk Controller... Disk Word (1B – 64B) ~ x GB/sec Block (512B – 50KB) ~ x MB/sec System.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CS4432: Database Systems II Lecture 2 Timothy Sutherland.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #5.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
12/3/2004EE 42 fall 2004 lecture 391 Lecture #39: Magnetic memory storage Last lecture: –Dynamic Ram –E 2 memory This lecture: –Future memory technologies.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Using the Disk, and Disk Optimizations Professor Elke A. Rundensteiner.
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
Secondary Storage Management Hank Levy. 8/7/20152 Secondary Storage • Secondary Storage is usually: –anything outside of “primary memory” –storage that.
1 Introduction to Computers Day 4. 2 Storage device A functional unit into which data can be –placed –retained(stored) –retrieved(accessed)
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 5 – Storage Organization.
CS4432: Database Systems II Data Storage (Better Block Organization) 1.
CS 346 – Chapter 10 Mass storage –Advantages? –Disk features –Disk scheduling –Disk formatting –Managing swap space –RAID.
Chapter 2 Data Storage How does a computer system store and manage very large volumes of data ?
Physical Storage and File Organization COMSATS INSTITUTE OF INFORMATION TECHNOLOGY, VEHARI.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 29 Database Systems II Secondary Storage.
IT 344: Operating Systems Winter 2010 Module 13 Secondary Storage Chia-Chi Teng CTB 265.
Chapter 111 Chapter 11: Hardware (Slides by Hector Garcia-Molina,
Disks Chapter 5 Thursday, April 5, Today’s Schedule Input/Output – Disks (Chapter 5.4)  Magnetic vs. Optical Disks  RAID levels and functions.
1 Data Storage (Chap. 11) Based on Hector Garcia-Molina’s slides.
Programming for GCSE Topic 5.1: Memory and Storage T eaching L ondon C omputing William Marsh School of Electronic Engineering and Computer Science Queen.
CS4432: Database Systems II Data Storage 1. Storage in DBMSs DBMSs manage large amounts of data How does a DBMS store and manage large amounts of data?
CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
Disk Basics CS Introduction to Operating Systems.
CS 101 – Sept. 28 Main vs. secondary memory Examples of secondary storage –Disk (direct access) Various types Disk geometry –Flash memory (random access)
Section 13.2 – Secondary storage management (Former Student’s Note)
DBMS 2001Notes 2: Hardware1 Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
1 CSE232A: Database System Principles Hardware. Data + Indexes Database System Architecture Query ProcessingTransaction Management SQL query Parser Query.
COSC 6340: Disks 1 Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! » READ: transfer data from disk.
CPS216: Advanced Database Systems Notes 03: Data Access from Disks Shivnath Babu.
Data Storage COMP3017 Advanced Databases Dr Nicholas Gibbins
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
CS 245Notes 21 Database System Principles Notes 02: Hardware.
1 Query Processing Exercise Session 1. 2 The system (OS or DBMS) manages the buffer Disk B1B2B3 Bn … … Program’s private memory An application program.
CS422 Principles of Database Systems Disk Access Chengyu Sun California State University, Los Angeles.
Data Storage and Querying in Various Storage Devices.
File organization Secondary Storage Devices Lec#7 Presenter: Dr Emad Nabil.
Query Processing Exercise Session 1.
TYPES OF MEMORY.
Database Management Systems (CS 564)
Computer Science 210 Computer Organization
CS 554: Advanced Database System Notes 02: Hardware
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
CSE 451: Operating Systems Winter 2006 Module 13 Secondary Storage
CSE 451: Operating Systems Autumn 2003 Lecture 12 Secondary Storage
CSE 451: Operating Systems Secondary Storage
CSE 451: Operating Systems Winter 2003 Lecture 12 Secondary Storage
CSE 451: Operating Systems Spring 2005 Module 13 Secondary Storage
CSE 451: Operating Systems Autumn 2004 Secondary Storage
CSE 451: Operating Systems Winter 2004 Module 13 Secondary Storage
CPS216: Advanced Database Systems Notes 04: Data Access from Disks
CS 245: Database System Principles Notes 02: Hardware
Introduction to Operating Systems
Presentation transcript:

CS 245Notes 21 CS 245: Database System Principles Notes 02: Hardware Hector Garcia-Molina

CS 245Notes 22 Outline Hardware: Disks Access Times Example - Megatron 747 Optimizations Other Topics: –Storage costs –Using secondary storage –Disk failures

CS 245Notes 23 Hardware DBMS Data Storage

CS 245Notes 24 P MC Typical Computer Secondary Storage...

CS 245Notes 25 Processor Fast, slow, reduced instruction set, with cache, pipelined… Speed: 100  500  1000 MIPS Memory Fast, slow, non-volatile, read-only,… Access time:  sec. 1  s  1 ns

CS 245Notes 26 Secondary storage Many flavors: - Disk: Floppy (hard, soft) Removable Packs Winchester SSD disks Optical, CD-ROM… Arrays - TapeReel, cartridge Robots

CS 245Notes 27 Focus on: “Typical Disk” Terms: Platter, Head, Actuator Cylinder, Track Sector (physical), Block (logical), Gap …

CS 245Notes 28 Top View

CS 245Notes 29 Disk Access Time block x in memory ? I want block X

CS 245Notes 210 Time = Seek Time + Rotational Delay + Transfer Time + Other

CS 245Notes 211 Seek Time 3 or 5x x 1N Cylinders Traveled Time

CS 245Notes 212 Average Random Seek Time   SEEKTIME (i  j) S = N(N-1) N N i=1 j=1 j  i

CS 245Notes 213 Average Random Seek Time   SEEKTIME (i  j) S = N(N-1) N N i=1 j=1 j  i “Typical” S: 10 ms  40 ms

Typical Seek Time Ranges from –4ms for high end drives –15ms for mobile devices Typical SSD: ranges from –0.08ms –0.16ms Source: Wikipedia, "Hard disk drive performance characteristics" CS 245Notes 214

CS 245Notes 215 Rotational Delay Head Here Block I Want

CS 245Notes 216 Average Rotational Delay R = 1/2 revolution HDD Spindle [rpm] Average rotational latency [ms] 4, , , , , Typical HDD figures Source: Wikipedia, "Hard disk drive performance characteristics" R=0 for SSDs

CS 245Notes 217 Transfer Rate: t value of t ranges from –up to 1000 Mbit/sec –432 Mbit/sec 12x Blu-Ray disk –1.23 Mbits/sec 1x CD –for SSDs, limited by interface e.g., SATA 3000 Mbit/s transfer time: block size t

CS 245Notes 218 Other Delays CPU time to issue I/O Contention for controller Contention for bus, memory

CS 245Notes 219 Other Delays CPU time to issue I/O Contention for controller Contention for bus, memory “Typical” Value: 0

CS 245Notes 220 So far: Random Block Access What about: Reading “Next” block?

CS 245Notes 221 If we do things right (e.g., Double Buffer, Stagger Blocks…) Time to get = Block Size + Negligible block t - skip gap - switch track - once in a while, next cylinder

CS 245Notes 222 Rule ofRandom I/O: Expensive Thumb Sequential I/O: Much less Ex:1 KB Block »Random I/O:  10 ms. »Sequential I/O:  <1 ms.

CS 245Notes 223 Cost for Writing similar to Reading …. unless we want to verify! need to add (full) rotation + Block size t

CS 245Notes 224 To Modify a Block?

CS 245Notes 225 To Modify a Block? To Modify Block: (a) Read Block (b) Modify in Memory (c) Write Block [(d) Verify?]

CS 245Notes 226 Block Address: Physical Device Cylinder # Surface # Sector

CS 245Notes 227 Complication: Bad Blocks Messy to handle May map via software to integer sequence 1 2.Map Actual Block Addresses. m

CS 245Notes in diameter 3600 RPM 1 surface 16 MB usable capacity (16 X 2 20 ) 128 cylinders seek time: average = 25 ms. adjacent cyl = 5 ms. An Example Megatron 747 Disk (old)

Kibibytes 1 kibibyte = 2 10 bytes = 1024 bytes. CS 245Notes 229 from Wikipedia

CS 245Notes KB blocks = sectors 10% overhead between blocks capacity = 16 MB = (2 20 )16 = 2 24 # cylinders = 128 = 2 7 bytes/cyl = 2 24 /2 7 = 2 17 = 128 KB blocks/cyl = 128 KB / 1 KB = 128

CS 245Notes RPM 60 revolutions / sec 1 rev. = msec. One track:...

CS 245Notes RPM 60 revolutions / sec 1 rev. = msec. One track:... Time over useful data:(16.66)(0.9)=14.99 ms. Time over gaps: (16.66)(0.1) = 1.66 ms. Transfer time 1 block = 14.99/128=0.117 ms. Trans. time 1 block+gap=16.66/128=0.13ms.

CS 245Notes 233 Burst Bandwith 1 KB in ms. BB = 1/0.117 = 8.54 KB/ms. or BB =8.54KB/ms x 1000 ms/1sec x 1MB/1024KB = 8540/1024 = 8.33 MB/sec

CS 245Notes 234 Sustained bandwith (over track) 128 KB in ms. SB = 128/16.66 = 7.68 KB/ms or SB = 7.68 x 1000/1024 = 7.50 MB/sec.

CS 245Notes 235 T 1 = Time to read one random block T 1 = seek + rotational delay + TT = 25 + (16.66/2) = ms.

CS 245Notes 236 Suppose OS deals with 4 KB blocks T 4 = 25 + (16.66/2) + (.117) x 1 + (.130) X 3 = ms [Compare to T 1 = ms] block

CS 245Notes 237 T T = Time to read a full track (start at any block) T T = 25 + (0.130/2) * = ms to get to first block * Actually, a bit less; do not have to read last gap.

CS 245Notes 238 The NEW Megatron 747 (Example 2.1 book) 8 Surfaces, 3.5 Inch diameter – outer 1 inch used 2 13 = 8192 Tracks/surface 256 Sectors/track 2 9 = 512 Bytes/sector

CS 245Notes GB Disk If all tracks have 256 sectors Outermost density: 100,000 bits/inch Inner density: 250,000 bits/inch 1.

CS 245Notes 240 Outer third of tracks: 320 sectors Middle third of tracks: 256 Inner third of tracks: 192 Density: 114,000  182,000 bits/inch

CS 245Notes 241 Timing for new Megatron 747 (Ex 2.3) Time to read 4096-byte block: –MIN: 0.5 ms –MAX: 33.5 ms –AVE: 14.8 ms

CS 245Notes 242 Outline Hardware: Disks Access Times Example: Megatron 747 Optimizations Other Topics –Storage Costs –Using Secondary Storage –Disk Failures here

CS 245Notes 243 Optimizations (in controller or O.S.) Disk Scheduling Algorithms –e.g., elevator algorithm Track (or larger) Buffer Pre-fetch Arrays Mirrored Disks On Disk Cache

CS 245Notes 244 Double Buffering Problem: Have a File » Sequence of Blocks B1, B2 Have a Program » Process B1 » Process B2 » Process B3...

CS 245Notes 245 Single Buffer Solution (1) Read B1  Buffer (2) Process Data in Buffer (3) Read B2  Buffer (4) Process Data in Buffer...

CS 245Notes 246 SayP = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R)

CS 245Notes 247 Double Buffering Memory: Disk: ABCDGEF process

CS 245Notes 248 Double Buffering Memory: Disk: ABCDGEF B done process A

CS 245Notes 249 Double Buffering Memory: Disk: ABCDGEF A C process B done

CS 245Notes 250 Double Buffering Memory: Disk: ABCDGEFA B done process A C B done

CS 245Notes 251 Say P  R What is processing time? P = Processing time/block R = IO time/block n = # blocks

CS 245Notes 252 Say P  R What is processing time? P = Processing time/block R = IO time/block n = # blocks Double buffering time = R + nP Single buffering time = n(R+P)

CS 245Notes 253 Disk Arrays RAIDs (various flavors) Block Striping Mirrored logically one disk

CS 245Notes 254 On Disk Cache P MC... cache

SSDs storage is block oriented (not random access) lots of errors –e.g., write of one block may cause an error of nearby block –e.g., a block can only be written a limited number of times logic masks most issues –e.g., using log structure sequential writes improve throughput (less bookkeeping) –latency for seq. writes = random writes –performance seq. reads = random reads CS 245Notes 255 SSD on-device logic interface HDD or other Source: Reza Sadri, STEC ("the SSD Company")

Add slides on Flash disk CS 245Notes 256

CS 245Notes 257 Block Size Selection? Big Block  Amortize I/O Cost Big Block  Read in more useless stuff! and takes longer to read Unfortunately...

CS 245Notes 258 Trend As memory prices drop, blocks get bigger... Trend

CS 245Notes 259 Storage Cost access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape typical capacity (bytes) from Gray & Reuter

CS 245Notes 260 Storage Cost access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape dollars/MB from Gray & Reuter

CS 245Notes 261 Using secondary storage effectively (Sec. 2.3) Example: Sorting data on disk Conclusion: –I/O costs dominate –Design algorithms to reduce I/O Also: How big should blocks be?

CS 245Notes 262 Five Minute Rule THE 5 MINUTE RULE FOR TRADING MEMORY FOR DISC ACCESSES Jim Gray & Franco Putzolu May 1985 The Five Minute Rule, Ten Years Later Goetz Graefe & Jim Gray December 1997

CS 245Notes 263 Five Minute Rule Say a page is accessed every X seconds CD = cost if we keep that page on disk –$D = cost of disk unit –I = numbers IOs that unit can perform –In X seconds, unit can do XI IOs –So CD = $D / XI

CS 245Notes 264 Five Minute Rule Say a page is accessed every X seconds CM = cost if we keep that page on RAM –$M = cost of 1 MB of RAM –P = numbers of pages in 1 MB RAM –So CM = $M / P

CS 245Notes 265 Five Minute Rule Say a page is accessed every X seconds If CD is smaller than CM, –keep page on disk –else keep in memory Break even point when CD = CM, or $D P I $M X =

CS 245Notes 266 Using ‘97 Numbers P = 128 pages/MB (8KB pages) I = 64 accesses/sec/disk $D = 2000 dollars/disk (9GB + controller) $M = 15 dollars/MB of DRAM X = 266 seconds (about 5 minutes) (did not change much from 85 to 97)

CS 245Notes 267 Disk Failures (Sec 2.5) Partial  Total Intermittent  Permanent

CS 245Notes 268 Coping with Disk Failures Detection –e.g. Checksum Correction  Redundancy

CS 245Notes 269 At what level do we cope? Single Disk –e.g., Error Correcting Codes Disk Array Logical Physical

CS 245Notes 270 Operating System e.g., Stable Storage Logical BlockCopy A Copy B

CS 245Notes 271 Database System e.g., Log Current DBLast week’s DB

CS 245Notes 272 Summary Secondary storage, mainly disks I/O times I/Os should be avoided, especially random ones….. Summary

CS 245Notes 273 Outline Hardware: Disks Access Times Example: Megatron 747 Optimizations Other Topics –Storage Costs –Using Secondary Storage –Disk Failures here