Lecture 5: Wrap-up RAID Flash memory Prof. Shahram Ghandeharizadeh Computer Science Department University of Southern California

Mental Block from Last Lecture

Question! Level 4 vs. Level 5: Why?

Last Lecture’s Discussion With RAID level 4, why is the performance of small writes D/2G? To write block b: (1) read the old block b and the old parity block ECC1; (2) compute the new parity from the old block b, the new block b, and the old parity: new parity = (old block xor new block) xor old parity; (3) write the new block b and the new parity block ECC1. Layout: Disk 1 holds block a, Disk 2 block b, Disk 3 block c, Disk 4 block d, and the parity disk holds ECC 1.

Small Write with RAID 4 Note that a write to a block on Disk 3 cannot proceed in parallel because the parity disk is busy! A single disk would perform the write of block b without reading it first. With 1 group, level 4 RAID performs half as many small writes as 1 disk. With nG groups, level 4 RAID performs nG/2 as many, where nG = D/G. Schedule: Disk 2 reads old b and then writes new b; the parity disk reads old ECC1 and then writes new ECC1; in between, new ECC1 = (old b xor new b) xor old ECC1.
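A minimal Python sketch (not from the slides) of the read-modify-write parity update described above; the block contents and the 4-byte block size are made up for illustration.

```python
# Sketch of the RAID 4/5 small-write parity update: new parity =
# (old block XOR new block) XOR old parity. Block values are illustrative.

def xor_blocks(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(a ^ b for a, b in zip(x, y))

def new_parity(old_block: bytes, new_block: bytes, old_parity: bytes) -> bytes:
    """Parity after a small write: the 4 I/Os are read old block, read old
    parity, then write the new block and the parity returned here."""
    return xor_blocks(xor_blocks(old_block, new_block), old_parity)

old_b = bytes([0x01, 0x02, 0x03, 0x04])
new_b = bytes([0xFF, 0x02, 0x03, 0x04])
ecc1  = bytes([0xA0, 0xB0, 0xC0, 0xD0])
print(new_parity(old_b, new_b, ecc1).hex())   # parity block to write back
```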

RAID 4 Two groups may perform write operations independently.

RAID 5: Resolve the Bottleneck With level 5 RAID, different disks may perform different small write operations simultaneously because the parity blocks are rotated across all five disks:
Disk 1  | Disk 2  | Disk 3  | Disk 4  | Disk 5
Block a | Block b | Block c | Block d | ECC 1
Block e | Block f | Block g | ECC 2   | Block h
Block i | Block j | ECC 3   | Block k | Block l
Block m | ECC 4   | Block n | Block o | Block p
ECC 5   | Block q | Block r | Block s | Block t

RAID 5 Example: Write blocks a and f simultaneously and initiate a part of the write for block j. All disks are busy reading data and parity blocks; a write requires 4 I/Os.
Disk 1: Read a, then Write a
Disk 2: Read f, then Write f
Disk 3: Read ECC 3 (the start of the write for block j)
Disk 4: Read ECC 2, then Write ECC 2
Disk 5: Read ECC 1, then Write ECC 1
Between the reads and the writes, compute the new parity blocks for a and f.

RAID 5 For each small write, level 5 RAID performs 4 times as many I/Os as one disk. To compare with one disk, divide the total number of operations supported by the disks by 4. Total number of small writes for 1 group: D/4 + 1/4 (the single check disk also services accesses, since parity is distributed). With nG groups there are nG check disks: D/4 + nG*C/4; since nG = D/G, this equals D/4 + (D/4 * C/G).

Level 5 RAID: D/4 + (D/4 * C/G)
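A small Python sketch (not from the paper) that evaluates the small-write expression above; the disk counts in the example are made-up values.

```python
# Level 5 RAID small-write capacity relative to a single disk:
# D/4 + (D/4 * C/G), where D = data disks, C = check disks per group,
# G = data disks per group. Each small write costs 4 I/Os.

def raid5_small_writes(D: int, C: int, G: int) -> float:
    return D / 4 + (D / 4) * (C / G)

# Illustrative configuration: 100 data disks in groups of 10, 1 check disk each.
print(raid5_small_writes(D=100, C=1, G=10))   # 27.5 single-disk equivalents
```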

A Comment The definitions may appear somewhat arbitrary and far-fetched, but they are applied consistently.

Flash Memory Goetz Graefe. The Five-Minute Rule Twenty Years Later, and How Flash Memory Changes the Rules. DaMoN 2007.

Alternative Storage Media Magnetic disk drive, flash memory, and Dynamic Random Access Memory (DRAM).

Flash Memory [Kim et al. 02] Nonvolatile storage medium: stored data persists after power is turned off. Supports random access to data. Comes in two types:   NOR: can read/erase/write 1 byte individually.   NAND: optimized for mass storage and supports read/erase/write of a block.   A block consists of multiple pages. A page is typically 512 bytes; a block is somewhere between 4 KB and 128 KB.   Write performance of NAND is an order of magnitude higher than that of NOR.

Flash Storage Comes with different interfaces:   UFD: USB Flash Disk. Throughput is price dependent; typically quoted at:   Read throughput of 8 to 16 MBps.   Write throughput of 6 to 12 MBps.   Flash memory card:   Accessed as memory.   Typically byte-accessible.   Flash disk:   Accessed through a disk interface.   Block-accessible.   Focus of this paper is on flash disk.

Flash Memory Reads are faster than writes because a write of a page (512 bytes) requires a block to be erased. Sequential writes are fast because the interface has a cache and manages write operations intelligently. Random write operations are slow because of the erase operations and a small cache.
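A back-of-the-envelope Python sketch (an assumption-laden illustration, not a model from the paper) of why an in-place random page write is expensive: the whole erase block containing the page must be read, erased, and rewritten. The sizes are the ones quoted on the earlier flash-memory slide.

```python
# Why random page writes are slow on NAND flash: updating one page in place
# touches the whole erase block. Sizes from the earlier slide; a real flash
# translation layer reduces this cost by remapping pages, which is why
# sequential writes remain fast.
page_bytes = 512
block_bytes = 128 * 1024
print(block_bytes // page_bytes)   # 256 pages rewritten to update a single page
```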

Flash: Sequential Reads/Writes [Gray’08] Read/write performance is sensitive to request size. Read performance is significantly better than write performance. Throughput plateaus at 53 MBps for reads and 35 MBps for writes.   Note the higher throughput of the flash disk when compared with the UFD.

Flash: Random Read/Write [Gray’08] Read performance is comparable to sequential reads. Write performance is very poor: 216 KBps with 8 KB writes (27 requests per second). The poor performance of random writes is being addressed; it might have been addressed already (a fast-moving field!).

Disk & Flash [Gray 08] Disk provides a higher bandwidth with sequential reads/writes. With random reads, flash blows disk away!   Why? When one considers power consumption, IOPS/watt of flash is very impressive!

Flash Reliability of flash suffers after 100,000 to 1,000,000 erase-and-write cycles.   Less reliable (a lower MTTF) than a magnetic disk assuming a write-intensive workload.

Characteristics RAM is faster than the other two storage media. The flash disk consumes less power than the magnetic disk because it has no moving parts.

Disk and DRAM Question: When does it make economic sense to make a piece of data resident in DRAM, and when does it make sense to have it resident on disk, where it must be moved to main memory prior to reading or writing it?   Assumptions:   Fixed-size disk pages, say 4 KB.   A 250 GB disk costs $80 and supports 83 page reads per second, so the price per page read per second is about $1.   1 MB of DRAM holds 256 disk pages and costs $0.047 per megabyte, so the cost of a disk page occupying DRAM is about $0.00018.   If making a page memory resident saves 1 page a/s, it saves $1. A good deal. If it saves 0.1 page a/s, it saves 10 cents, still a good deal.   The break-even point is one access every $1/$0.00018 ≈ 5,400 seconds, which is roughly 90 minutes.   In 1987, this break-even point was 2 minutes.
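A minimal Python sketch of the break-even arithmetic on this slide, following the Gray/Putzolu formulation revisited by Graefe (2007); the dollar and performance figures are the ones quoted above.

```python
# Break-even interval between accesses at which keeping a page in DRAM costs
# the same as leaving it on the slower device and reading it on demand.

def break_even_seconds(pages_per_mb: float, accesses_per_sec: float,
                       device_price: float, dram_price_per_mb: float) -> float:
    return (pages_per_mb / accesses_per_sec) * (device_price / dram_price_per_mb)

# Disk vs. DRAM with the slide's numbers: 4 KB pages (256 per MB),
# an $80 disk doing 83 page reads/s, DRAM at $0.047 per MB.
secs = break_even_seconds(pages_per_mb=256, accesses_per_sec=83,
                          device_price=80, dram_price_per_mb=0.047)
print(secs / 60)   # about 88 minutes, i.e., the "roughly 90 minutes" above
```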

Disk and DRAM: Moral of the story In 2007, pages referenced every 90 minutes should be DRAM resident. In 1987, pages referenced every 5 minutes should be DRAM resident. Key observation:   Focus is on memory space and disk bandwidth!   Is something missing from this analysis?

Assumed Page Size Matters A larger page size enhances the throughput of a magnetic disk drive.   How?

Assumed Page Size Matters A larger page size enhances the throughput of a magnetic disk drive.   With small page sizes (1 KB), seek and rotational latency result in a lower disk throughput, and a higher cost per a/s.

Flash and DRAM Question: When does it make economic sense to make a piece of data resident in DRAM, and when does it make sense to have it resident on flash, where it must be moved to main memory prior to reading or writing it?   Assumptions:   Fixed-size disk pages, say 4 KB.   A 32 GB flash disk costs $999 and supports 6,200 page reads per second, so the price per page read per second is about $0.16.   1 MB of DRAM holds 256 disk pages and costs $0.047 per megabyte, so the cost of a disk page occupying DRAM is about $0.00018.   If making a page memory resident saves 1 page a/s, it saves $0.16. A good deal. If it saves 0.1 page a/s, it saves $0.016, still a good deal.   The break-even point is one access every $0.16/$0.00018 ≈ 880 seconds, which is roughly 15 minutes.   If the price of flash drops to $400, the break-even point is 6 minutes.
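The same arithmetic as the disk/DRAM sketch above, repeated with the flash numbers quoted on this slide:

```python
# Flash vs. DRAM break-even with the slide's numbers: 4 KB pages (256 per MB),
# a $999 flash disk doing 6,200 page reads/s, DRAM at $0.047 per MB.
pages_per_mb, reads_per_sec, dram_price_per_mb = 256, 6200, 0.047
print((pages_per_mb / reads_per_sec) * (999 / dram_price_per_mb) / 60)  # ~15 minutes
print((pages_per_mb / reads_per_sec) * (400 / dram_price_per_mb) / 60)  # ~6 minutes at $400
```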

Flash and DRAM: Moral of the story With the 2007 price of $999, pages referenced every 15 minutes should be DRAM resident. With the anticipated price of $400, pages referenced every 6 minutes should be DRAM resident. The focus is on DRAM space and flash bandwidth! Is something missing from this analysis?

What is Missing? Page size matters (same discussion as DRAM) With flash disk, throughput of reads and writes is asymmetrical – even with sequential reads and writes.   A 32 GB Flash disk costs $999 and supports 30 page writes per second. So the price per page write per second is about $33. (For reads, it is 16 cents.)

Disk and Flash Memory With flash memory, the available flash is accessible in the same manner as DRAM.   The read and write performance of flash memory is different from that of DRAM. One may repeat the analysis to establish a Δ-minute rule for magnetic disk and flash memory; see the discussion of Table 3.

Possible Software Architectures? Extended buffer pool: Flash is an extension of DRAM. Extended disk: Flash is an extension of disk. Treat DRAM, Flash, and magnetic disk independently using a new cache management technique.   Trojan storage manager. This paper focuses on the first two possibilities using LRU to manage their content.

Architecture Choice The choice of an architecture depends on the pattern of usage. This study claims:   File systems and operating systems prefer the “extended buffer pool” architecture.   DBMSs prefer the “extended disk” architecture. Why?

Usage Pattern File system/OS:   Pointer pages maintain data pages or runs of contiguous pages.   Movement of a page requires writing the page and the entire pointer page.   During recovery, the system checks the entire storage.   Many random I/Os!   Extended buffer pool architecture. DBMS, assuming logging with immediate database modification:   Data is stored in B-tree indexes.   Writing a page requires appending a few bytes to the log file.   The log file is flushed using large sequential write operations.   During recovery, the system replays log records sequentially.   Large I/Os!   Extended disk architecture.

LOG-BASED RECOVERY Example transaction: transfer $50 from A to B, with initial values A=1000 and B=10. Steps: (1) Read(A); (2) A = A - 50; (3) Write(A); (4) Read(B); (5) B = B + 50; (6) Write(B); (7) Commit. After the transaction, A=950 and B=60; the log records the old and new values written in steps (3) and (6) so the transaction can be redone or undone after a failure.

Checkpointing Motivation: In the presence of failures, the system consults the log file to determine which transactions should be redone and which should be undone. There are two major difficulties: (1) the search process is time consuming, and (2) most transactions are okay because their updates have already made it to the database (the system performs wasteful work by searching through and redoing these transactions). Approach: perform a checkpoint, which requires the following operations: output all log records from main memory to the disk; output all modified (dirty) pages in the buffer pool to the disk; output a checkpoint log record onto the log file on disk.
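A minimal Python sketch of the three checkpoint steps listed above; the buffer-pool and log structures are hypothetical stand-ins, not a real DBMS API.

```python
# Checkpoint: force the log tail, force the dirty pages, then append a
# checkpoint record so recovery can bound its backward scan of the log.

class BufferManager:
    def __init__(self):
        self.log_tail = []       # log records still in main memory
        self.dirty_pages = {}    # page_id -> page contents modified in the buffer pool

    def checkpoint(self, log_file, data_file):
        # 1. Output all log records from main memory to the disk.
        for record in self.log_tail:
            log_file.write(record + "\n")
        log_file.flush()
        self.log_tail.clear()
        # 2. Output all modified (dirty) pages in the buffer pool to the disk.
        for page_id, page in sorted(self.dirty_pages.items()):
            data_file.write(f"{page_id}:{page}\n")
        data_file.flush()
        self.dirty_pages.clear()
        # 3. Output a <checkpoint> record, only after steps 1 and 2 are durable,
        #    so recovery never needs to scan past it.
        log_file.write("<checkpoint>\n")
        log_file.flush()
```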

Checkpointing (Cont…) Dirty pages and log records stored on flash storage persist across a failure, so there is no need to flush them to the disk drive. If the DBMS assumes the extended buffer pool architecture, the checkpoint operation will flush data to disk unnecessarily! This is the motivation for the extended disk architecture with transaction-processing applications!

Checkpointing Unsure about the following argument:

B+-TREE A B+-tree is a multi-level tree-structured directory. A node is a page. A larger node has a higher fan-out, reducing the depth of the tree. The utility of a node is measured by the logarithm of the number of records in the node, so a larger node has a higher utility. [Figure: a B+-tree with a root, internal nodes, and leaf nodes pointing into the data file.]

B+-tree Using the Flash-Disk hardware combination, a page size of 256/512 KB maximizes the utility/time value. Note that the access time does not change as a function of page size. The 3rd column is a log function of the 2nd column.

B+-tree Using the DRAM-Flash combination, a small page size (2 KB) provides the highest utility.
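A small Python sketch of the utility-per-access-time metric behind the last two slides; the 100-byte record size and the latency/bandwidth figures are illustrative assumptions, not values from the paper.

```python
import math

# Utility of a B+-tree node = log2(records per node); divide by the time to
# read a node of that size. With disk-like parameters the metric peaks at a
# few hundred KB; with flash-like parameters it peaks at a small page (~2 KB).

def utility_per_ms(page_kb: float, latency_ms: float, mb_per_sec: float,
                   record_bytes: int = 100) -> float:
    records = page_kb * 1024 / record_bytes
    access_ms = latency_ms + page_kb / 1024 / mb_per_sec * 1000
    return math.log2(records) / access_ms

disk_best = max((16, 64, 128, 256, 512, 1024), key=lambda kb: utility_per_ms(kb, 12, 100))
flash_best = max((1, 2, 4, 8, 16, 64), key=lambda kb: utility_per_ms(kb, 0.1, 50))
print(disk_best, flash_best)   # 256 (KB) for the disk-like case, 2 (KB) for the flash-like case
```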

Summary With an extended-disk architecture that requires a page to migrate from DRAM to flash and then to disk, different B+-tree page sizes should be used with flash and disk. The SB-tree [O’Neil 1992] supports the concept of extents and different page sizes.

EXTERNAL SORTING Sort a 20-page relation assuming a five-page buffer pool. Merge sort.

EXTERNAL SORTING Use flash to store intermediate runs:   Large sequential reads/writes to flash memory.   More energy efficient! Merge sort
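A minimal external merge-sort sketch in Python (an illustration, not the paper's implementation): sorted runs are written to a scratch directory, which in the architecture above would live on the flash device. The one-integer-per-line file format and the run_dir and page_records parameters are assumptions.

```python
import heapq
import os

# External merge sort: pass 0 produces sorted runs no larger than the buffer
# pool; the merge pass streams all runs into one sorted output. With the
# slide's numbers (20 pages, 5 buffer pages), pass 0 yields 4 runs, merged in
# a single pass using 4 input pages and 1 output page.

def external_sort(input_path: str, output_path: str, run_dir: str,
                  buffer_pages: int = 5, page_records: int = 100) -> None:
    capacity = buffer_pages * page_records      # records that fit in the buffer pool
    runs = []

    # Pass 0: read buffer-sized chunks, sort in memory, write each as a run.
    with open(input_path) as f:
        while True:
            chunk = [int(line) for _, line in zip(range(capacity), f)]
            if not chunk:
                break
            chunk.sort()
            run_path = os.path.join(run_dir, f"run_{len(runs)}.txt")
            with open(run_path, "w") as run:
                run.writelines(f"{v}\n" for v in chunk)
            runs.append(run_path)

    # Merge pass: stream the runs (large sequential reads of the scratch storage).
    files = [open(p) for p in runs]
    streams = [(int(line) for line in f) for f in files]
    with open(output_path, "w") as out:
        for value in heapq.merge(*streams):
            out.write(f"{value}\n")
    for f in files:
        f.close()
```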

References
Gray and Fitzgerald. Flash Disk Opportunity for Server Applications. ACM Queue, July 2008.
Kim et al. A Space-Efficient Flash Translation Layer for CompactFlash Systems. IEEE Transactions on Consumer Electronics, Vol. 48, No. 2, May 2002.
O'Neil, P. The SB-Tree: An Index-Sequential Structure for High-Performance Sequential Access. Acta Informatica, 29(3), 1992.