Data Storage John Ortiz

Lecture 17: Data Storage

Slide 2: Overview
• A database stores its data on secondary storage.
• Disk has distinct storage and access characteristics.
• The DBMS must bring data from disk into a main-memory buffer for processing, and write data from the memory buffer back to disk for storage.
• Low-level DBMS software must manage disk and buffer space effectively.
• How are disks accessed?
• How is a buffer managed?

Slide 3: Storage Hierarchy
• Architecture of a typical computer
• Processor: fast, slow, RISC, cache, pipelined. Typical speed: 100 → 500 → 1000 MIPS
• Main memory: fast, slow, volatile, read-only. Access time: 1 µs to 1 ns
[Figure: processor (P), cache (C), main memory (M), and secondary storage]

Slide 4: Secondary Storage Devices
• Disk:
  • Floppy (hard, soft)
  • Removable packs
  • Winchester
  • RAM disks
  • Optical, CD-ROM, DVD-ROM
  • Disk arrays
• Tape:
  • Reel, cartridge, robots

Slide 5: Disks
• The DBMS stores information on ("hard") disks.
• This has major implications for DBMS design!
  • Data must be transferred from disk to main memory (for a read) and vice versa (for a write).
  • Transfer unit: block (= 1 or more sectors)
• Why not store everything in main memory?
  • It costs too much. $147 will buy you 128 MB of RAM or 40 GB of disk (Oct. 2000).
  • Main memory is volatile. We want data to be saved between runs.

Slide 6: Components of a Disk
[Figure: top view of a disk, labeling the platters, spindle, disk head, arm assembly, arm movement, tracks, and a sector]

Slide 7: Components of a Disk (cont.)
• Terms: platter, head, actuator, cylinder, track, sector, block (page), gap, cluster
• Common measurements
  • Diameter: 1 inch → 15 inches
  • Cylinders: 100 →
  • Surfaces: 1 (CDs) → 10
  • Tracks/cyl: 2 (floppies) → 30
  • Sector size: 512 B → 64 KB
  • Capacity: 360 KB (old floppy) → 200 GB

Slide 8: Components of a Disk (cont.)
• The division of tracks into sectors is hard-coded on the disk surface.
• A sector is subdivided into one or more blocks.
• Formatting sets the block size.
• Interblock gaps account for the difference between formatted and unformatted capacity.

Slide 9: Disk Access Time
[Figure: a request "I want block X" is sent to the disk; is block X in memory?]
• Access Time = Seek Time + Rotational Delay + Transfer Time + Other

Slide 10: Seek Time
• Time to move the arms so that the disk head is positioned on a given track
[Figure: seek time plotted against cylinders traveled; the time rises from x for 1 cylinder to 3x or 5x for N cylinders]

Slide 11: Average Random Seek Time
• Typical S: 8 ms → 40 ms
• N = number of tracks/surface (= # cylinders)

Slide 12: Rotational Delay
• Time to wait for the block to rotate under the head; also called rotational latency
• Average R = 1/2 revolution
• Typically R = 4.2 ms (7200 RPM)
• Complication: may need to wait for the start of the track
[Figure: a track showing the head position ("Head Here"), the start of the track, and the wanted block ("Block I Want")]

Slide 13: Transfer Time
• Time for actually moving data to/from the disk surface
• Typical transfer rate tr: 16 → 100 MB/s
• Block transfer time btt = block size / tr
  • Typically, 4 KB: about 1 ms
• Other delay
  • CPU time to issue I/O
  • Contention for controller
  • Contention for bus, memory
  • Typical value: 0

Slide 14: Formulas
• tr = track size * rpm (transfer rate)
• rd = 1/2 * 1/rpm (rotational delay)
• btt = B / tr (block transfer time)
  • B = block size
• btr = (B / (B + G)) * tr (bulk transfer rate)
  • G = interblock gap size
• Read K consecutive blocks:
  • s + rd + (B / btr) * K
• Read K random blocks:
  • (s + rd + (B / btr)) * K
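
The formulas on this slide can be tried out with a short Python sketch. The drive parameters below (7200 RPM, 32,768-byte tracks, 4 KB blocks, 128-byte gaps, 8 ms seek, K = 100) are hypothetical values chosen only for illustration, and the function names are not from the lecture. Times are kept in milliseconds, so rpm is converted with a factor of 60,000 ms per minute.

```python
def transfer_rate(track_size_bytes, rpm):
    """tr in bytes per millisecond: one full track passes per revolution."""
    return track_size_bytes * rpm / 60_000.0


def rotational_delay_ms(rpm):
    """rd: half a revolution on average, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def bulk_transfer_rate(tr, block_size, gap_size):
    """btr: useful bytes per millisecond once interblock gaps are counted."""
    return block_size / (block_size + gap_size) * tr


def read_consecutive_ms(k, s, rd, block_size, btr):
    """One seek and one rotational delay, then k blocks back to back."""
    return s + rd + (block_size / btr) * k


def read_random_ms(k, s, rd, block_size, btr):
    """Every block pays its own seek and rotational delay."""
    return (s + rd + block_size / btr) * k


# Hypothetical drive, for illustration only.
RPM, TRACK, B, G, S, K = 7200, 32_768, 4096, 128, 8.0, 100

tr = transfer_rate(TRACK, RPM)
rd = rotational_delay_ms(RPM)
btr = bulk_transfer_rate(tr, B, G)
seq = read_consecutive_ms(K, S, rd, B, btr)
rnd = read_random_ms(K, S, rd, B, btr)

print(f"rd = {rd:.2f} ms")
print(f"{K} consecutive blocks: {seq:.1f} ms")
print(f"{K} random blocks:      {rnd:.1f} ms")
```

With these assumed numbers the random read is roughly ten times slower than the consecutive read, which is the point of the two formulas.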

Slide 15: Formulas
• Log base N of X = log X / log N
  • If no base is indicated, assume base 10
  • Log base 2 of 500 = log 500 / log 2
• bfr = floor(B / record size)
• File size (in blocks) = ceiling(# records / bfr) (unspanned)
•                       = ceiling(# bytes in file / B) (spanned)

Slide 16: Units!!!
• The units are just as important as the value.
• Use the units to check your result.
• Example: tr = track size * rpm
  • Given track size = 32,768 bytes and 3600 rpm, what is the transfer rate?
  • 32,768 * 3600 = 117,964,800, which SEEMS very unreasonable!
  • However, it is 117,964,800 bytes per minute!
  • That is 1,966,080 bytes per second = 1.875 MB/sec (with 1 MB = 1,048,576 bytes)
  • Pretty slow for a modern disk!
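
A small Python sketch of the same unit check. The track size of 32,768 bytes is an assumption: it is the value implied by 117,964,800 / 3600, and the MB figure assumes 1 MB = 1,048,576 bytes.

```python
TRACK_SIZE_BYTES = 32_768   # assumed: implied by 117,964,800 / 3600
RPM = 3600                  # revolutions per minute

# bytes/revolution * revolutions/minute = bytes/minute
bytes_per_minute = TRACK_SIZE_BYTES * RPM

# bytes/minute / (60 seconds/minute) = bytes/second
bytes_per_second = bytes_per_minute // 60

# bytes/second / (2**20 bytes/MB) = MB/second
mb_per_second = bytes_per_second / 2**20

print(bytes_per_minute, "bytes per minute")
print(bytes_per_second, "bytes per second")
print(mb_per_second, "MB per second")
```

Writing the unit next to each step makes it obvious that the first product is per minute, not per second.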

Slide 17: How many bytes?
• 1 kilobyte = 1024 bytes (most of the time!)
• 1 megabyte = 1024^2 = 1,048,576 bytes
• 1 gigabyte = 1024^3 = 1,073,741,824 bytes
• 1 terabyte = 1024^4 = 1,099,511,627,776 bytes
• Disk manufacturers often use 1000 bytes as a kilobyte for marketing purposes!
  • My 200 GB disk drive is actually 186 GB, though it's 200 billion bytes.
• RAM marketing has so far not followed suit.
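
The 200 GB versus 186 GB gap follows directly from the two definitions of a gigabyte, as this short Python sketch shows (variable names are illustrative only):

```python
# Binary units: each step is a factor of 1024.
KB, MB, GB, TB = (1024**n for n in range(1, 5))

# A drive marketed as "200 GB" using decimal gigabytes (10**9 bytes each).
marketed_bytes = 200 * 10**9

# The same capacity expressed in binary gigabytes (2**30 bytes each).
binary_gb = marketed_bytes / GB

print(f"1 MB = {MB:,} bytes, 1 GB = {GB:,} bytes, 1 TB = {TB:,} bytes")
print(f"200 billion bytes = {binary_gb:.2f} binary GB")
```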

Slide 18: What about Reading the Next Block?
• If we do things right (double buffer, stagger blocks, …): Time = T + Negligible
  • Negligible: skip gap, change track, change cylinder
• Rule of thumb:
  • Random I/O: expensive
  • Sequential I/O: much less
• Example, for a 1 KB block:
  • Random I/O: about 20 ms
  • Sequential I/O: about 1 ms

Slide 19: What About Writing and Updating?
• The cost of writing is similar to reading, unless we need to verify.
  • To verify, add a full rotation + T
• Cost of modifying a block:
  (a) Read block
  (b) Modify in memory
  (c) Write block
  [(d) Verify?]
• Block address: device, cylinder #, surface #, sector

Slide 20: Disk Example: IBM Ultrastar 36LP
• Formatted capacity: 36.9 GB
• Sector size: 512 to 528 bytes, variable (2-byte increments)
• Platters: 10
• Max. recording density: BPI
• Track density: TPI (per surface)
• Rotation speed: 7200 RPM
• Average rotational delay: 4.17 ms
• Sustained data rate: MB
• Seek time: average 6.8 ms, next track 0.6 ms

Slide 21: Arranging Pages on Disk
• Blocks in a file should be arranged sequentially on disk (by next block), to minimize seek and rotational delay.
• The concept of next block:
  1. Next block on the same track
  2. 1st block on the next track of the same cylinder
  3. 1st block on the 1st track of the next cylinder
• For a sequential scan, pre-fetching several pages at a time is a big win!
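
The three "next block" rules can be sketched as a small Python function. The (cylinder, track, block) address tuple and the parameter names are assumptions made for this illustration; a real drive's geometry and addressing are more involved.

```python
def next_block(cylinder, track, block, blocks_per_track, tracks_per_cylinder):
    """Return the address of the block that follows (cylinder, track, block).

    All indices are zero-based. The geometry is idealized: every track
    holds the same number of blocks.
    """
    # Rule 1: next block on the same track.
    if block + 1 < blocks_per_track:
        return cylinder, track, block + 1
    # Rule 2: first block on the next track of the same cylinder.
    if track + 1 < tracks_per_cylinder:
        return cylinder, track + 1, 0
    # Rule 3: first block on the first track of the next cylinder.
    return cylinder + 1, 0, 0


# Hypothetical geometry: 4 blocks per track, 2 tracks per cylinder.
addr = (0, 0, 2)
for _ in range(6):
    addr = next_block(*addr, blocks_per_track=4, tracks_per_cylinder=2)
    print(addr)
```

Only the third rule moves the arm, which is why laying a file out in this order keeps seeks to a minimum during a sequential scan.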

Slide 22: Disk Space Management
• The lowest layer of DBMS software manages space on disk.
• Higher levels call upon this layer to:
  • allocate/de-allocate a page
  • read/write a page
• One such "higher level" is the buffer manager, which receives a request to bring a page into memory and then, if needed, requests the disk space layer to read the page into the buffer pool.

Slide 23: Buffer Management in a DBMS
• Data must be in RAM for the DBMS to operate on it!
• A table of <frame#, pageid> pairs is maintained.
[Figure: page requests from higher levels arrive at the buffer pool in main memory; disk pages are read from the DB on disk into free frames; the choice of frame is dictated by the replacement policy]

Slide 24: When a Page is Requested...
• If the requested page is not in the pool:
  • Choose a frame for replacement
  • If the frame is dirty, write it to disk
  • Read the requested page into the chosen frame
• Then, whether or not the page was already in the pool, pin the frame (so it cannot be replaced) and return its address.
• If requests can be predicted (e.g., sequential scans), several pages can be pre-fetched at a time!
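
A minimal Python sketch of the page-request steps, under several assumptions: a dictionary stands in for the disk, the replacement policy is LRU among unpinned frames, and the names `Frame`, `BufferPool`, `request_page`, and `release_page` are hypothetical, not taken from the lecture or from any particular DBMS.

```python
class Frame:
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.pin_count = 0    # number of requestors currently using the page
        self.dirty = False    # True if modified since it was read
        self.last_used = 0    # logical timestamp used by LRU


class BufferPool:
    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk      # dict page_id -> data, a stand-in for the disk
        self.frames = {}      # page_id -> Frame (the page table)
        self.tick = 0
        self.reads = 0
        self.writes = 0

    def request_page(self, page_id):
        self.tick += 1
        frame = self.frames.get(page_id)
        if frame is None:                      # page is not in the pool
            if len(self.frames) >= self.num_frames:
                self._evict()                  # choose a frame for replacement
            self.reads += 1                    # read page into the frame
            frame = Frame(page_id, self.disk[page_id])
            self.frames[page_id] = frame
        frame.pin_count += 1                   # pin so it cannot be replaced
        frame.last_used = self.tick
        return frame

    def release_page(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self):
        candidates = [f for f in self.frames.values() if f.pin_count == 0]
        if not candidates:
            raise RuntimeError("all frames are pinned")
        victim = min(candidates, key=lambda f: f.last_used)
        if victim.dirty:                       # write back before reuse
            self.disk[victim.page_id] = victim.data
            self.writes += 1
        del self.frames[victim.page_id]


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(num_frames=2, disk=disk)
frame = pool.request_page(1)
frame.data = "A"
pool.release_page(1, dirty=True)
pool.request_page(2)
pool.release_page(2)
pool.request_page(3)          # evicts page 1 and writes it back first
print(disk[1], pool.reads, pool.writes)
```

Pinning happens on every request, hit or miss; only unpinned frames are candidates for replacement.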

Slide 25: Buffer Replacement Policy
• A frame is chosen for replacement by a replacement policy:
  • Least-recently-used (LRU), Clock, MRU, etc.
• The policy can have a big impact on the number of I/Os; it depends on the access pattern.
• Sequential flooding: a nasty situation caused by LRU + repeated sequential scans.
  • # buffer frames < # pages in file means each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
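
Sequential flooding is easy to reproduce in a few lines of Python. This sketch simply counts page reads; the function name `count_ios` and the sizes used (3 frames, a 4-page file, 3 scans) are assumptions for illustration, and pinning is ignored.

```python
from collections import OrderedDict


def count_ios(requests, num_frames, policy):
    """Count page reads for a request sequence under "LRU" or "MRU"."""
    pool = OrderedDict()   # keys ordered from least to most recently used
    ios = 0
    for page in requests:
        if page in pool:
            pool.move_to_end(page)      # hit: mark as most recently used
            continue
        ios += 1                        # miss: the page must be read
        if len(pool) >= num_frames:
            # MRU evicts from the "most recent" end, LRU from the other end.
            pool.popitem(last=(policy == "MRU"))
        pool[page] = True
    return ios


# Three sequential scans of a 4-page file with only 3 buffer frames.
scan = [0, 1, 2, 3] * 3
lru_ios = count_ios(scan, num_frames=3, policy="LRU")
mru_ios = count_ios(scan, num_frames=3, policy="MRU")
print("LRU I/Os:", lru_ios)
print("MRU I/Os:", mru_ios)
```

Under LRU every one of the 12 requests misses, because each page is evicted just before it is needed again; MRU keeps most of the file resident and needs only 6 reads here.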

Slide 26: Representing Records
• A record is a collection of related data items (called fields).
  • Fields of an Employee record: id, name, salary, date-of-hire, ...
• Main choices:
  • Format: fixed vs. variable
  • Length: fixed vs. variable
• A schema (of a record) specifies the number of fields, the type of each field, the order in the record, and the meaning of each field.

Slide 27: Example: Fixed Format and Length
• Employee record schema:
  (1) Eid, 2-byte integer
  (2) Ename, 10 characters
  (3) Dept, 2-byte code
[Figure: sample records laid out field by field; the surviving values are 55, "smith", "jones", and 01]
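
A fixed-format, fixed-length record like this one can be sketched with Python's `struct` module. The byte order, the zero-byte padding of the name, and the sample department code "02" are assumptions made for the illustration; the slide does not specify them.

```python
import struct

# Eid: 2-byte unsigned integer, Ename: 10 bytes, Dept: 2-byte code.
# ">" selects big-endian byte order with no alignment padding (an assumption).
EMPLOYEE = struct.Struct(">H10s2s")


def pack_employee(eid, ename, dept):
    """Encode one record; struct pads the name with zero bytes to 10 bytes."""
    return EMPLOYEE.pack(eid, ename.encode("ascii"), dept.encode("ascii"))


def unpack_employee(raw):
    """Decode one record and strip the padding from the name."""
    eid, ename, dept = EMPLOYEE.unpack(raw)
    return eid, ename.rstrip(b"\x00").decode("ascii"), dept.decode("ascii")


record = pack_employee(55, "smith", "02")   # "02" is a hypothetical code
print(EMPLOYEE.size, "bytes per record")
print(unpack_employee(record))
```

Because every record has the same size, the i-th record in a buffer starts at byte offset i * EMPLOYEE.size.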

Slide 28: File of Records
• Logically, a file is a sequence of records.
• Physically, a file is a set of blocks.
• Assume fixed-length blocks.
• Assume a single file (for now).
[Figure: a file drawn as a sequence of records R1, R2, R3, R4, ..., Rn]

Slide 29: Packing Records into Blocks
• Unspanned: each record must fit within one block.
• Spanned: a record may be split between blocks.
[Figure: unspanned layout, with records R1 to R5 stored whole in Block 1, Block 2, …; spanned layout, with R1, R2, and R3(a) in Block 1, then R3(b), R4, R5, R6, and R7(a) in Block 2, …]
• Unspanned: simple, but may waste space
• Spanned: necessary if record size > block size

Slide 30: File Size: Spanned vs Unspanned
• Blocking factor (bfr): number of records per block
• Example:
  • Block size B = 2048 bytes
  • Number of records R = 200,000
  • Record size = 600 bytes (fixed length)
  • Blocking factor = floor(2048/600) = 3 records per block (unspanned, 248 bytes unused)
  • File size = ceiling(200,000/3) = 66,667 blocks (unspanned)
  •           = ceiling(200,000 * 600/2048) = 58,594 blocks (spanned)
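
The same example can be checked in a few lines of Python, using the block size, record count, and record size from the slide (variable names are illustrative only):

```python
import math

B = 2048              # block size in bytes
R = 200_000           # number of records
RECORD_SIZE = 600     # bytes per record (fixed length)

# Unspanned: whole records only, so round the blocking factor down.
bfr = B // RECORD_SIZE
unused_per_block = B - bfr * RECORD_SIZE
unspanned_blocks = math.ceil(R / bfr)

# Spanned: records may cross block boundaries, so only total bytes matter.
spanned_blocks = math.ceil(R * RECORD_SIZE / B)

print("bfr =", bfr, "records per block,", unused_per_block, "bytes unused")
print("unspanned:", unspanned_blocks, "blocks")
print("spanned:  ", spanned_blocks, "blocks")
```

The unspanned file is larger because each block leaves 248 bytes unused.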

Slide 31: Summary
• Disks provide cheap, non-volatile storage.
  • Random access, but the cost depends on the location of the page on disk; it is important to arrange data sequentially to minimize seek and rotation delays.
• The buffer manager brings pages into RAM.
  • A page stays in RAM until released by its requestor(s).

Slide 32: Summary (cont.)
• A page is written to disk when its frame is chosen for replacement (which is after all requestors release the page), or earlier.
• The choice of frame to replace is based on the replacement policy.
• The file layer keeps track of the pages in a file, and supports the abstraction of a collection of records.

Slide 33: Look Ahead
• Next topic: Hashing and Indexing
• Read textbook:
  • Chapter