1 The Five-Minute Rule 20 Years Later (And How Flash Memory Changes The Rules) Goetz Graefe Presented By Abhinav Parate
2 Storage Hierarchy FLASH
3 Comparing Flash with Disks
4 When should we increase main memory? Metrics to decide: – Cost of infrastructure – Cost of maintenance – Mean Time to Failure – Performance improvement Simplest answer: Increase RAM size if it is insufficient to hold frequently accessed data items. What time period counts as frequent?
5 Cost of accessing a data item A disc provides N accesses per second and costs $D. – D_A = D/N : Cost of one disc access per second – M : Cost of 1 byte of main memory – I : Expected interval before the same data is accessed again (in seconds) – B : Size of data in bytes
6 Cost of accessing a data item Number of accesses per second for data item = 1/I Cost if item is accessed from disc = D_A / I Cost if item is available in memory = M * B Keep data item in memory if main memory cost is less than disc access cost: M * B < D_A / I, i.e. I < D_A / (M * B) For a 1 KB data item, I < 400 s ≈ 5 minutes at 1987 costs
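The break-even formula on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name and the prices passed in are illustrative assumptions, chosen only so that D_A comes to $2000 and 1 KB of memory costs $5, which reproduces the 400 s figure above.

```python
def break_even_interval(disc_price, accesses_per_sec, mem_price_per_byte, item_bytes):
    """Return the re-access interval I (in seconds) below which keeping
    the item in memory is cheaper than fetching it from disc."""
    d_a = disc_price / accesses_per_sec             # D_A = D / N
    return d_a / (mem_price_per_byte * item_bytes)  # I = D_A / (M * B)


# Assumed, illustrative prices (not taken from the slides).
interval = break_even_interval(
    disc_price=30_000,            # $D for one disc (assumption)
    accesses_per_sec=15,          # N accesses per second (assumption)
    mem_price_per_byte=5 / 1024,  # $5 per KB of main memory (assumption)
    item_bytes=1024,              # B: a 1 KB data item
)
print(interval)  # 400.0
```

Plugging in current prices for D, N and M gives the break-even interval for any other year or page size.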
7 The Five-Minute Rule In 1987: keep a 1 KB data item in main memory if it is accessed again within 5 minutes. In 1967, the frequent period was 0.5 s. For 2007, the authors of the original rule predicted a five-hour rule. At actual 2007 prices, the period turned out to be a little under 6 hours.
8 Sample Case A database consists of 500,000 records of 1000 bytes each. Peak load consists of 600 transactions per second. Only 6% of the data receives 96% of the accesses and is accessed again within 5 minutes. This 6% of the data resides in main memory. The remaining data is accessed via two hard disks, which support a 1-second access time. The design saved $3.5m at 1987 costs compared with an entirely main-memory design.
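The sizing behind this sample case can be sketched as simple arithmetic. The function name is hypothetical, and the code assumes one record access per transaction, which the slide does not state.

```python
RECORDS = 500_000
RECORD_BYTES = 1000
PEAK_TPS = 600
HOT_DATA_PERCENT = 6      # share of data kept in main memory
HOT_ACCESS_PERCENT = 96   # share of accesses served by that data


def sample_case_sizing(accesses_per_txn=1):
    """Return (total bytes, memory-resident bytes, disc accesses per second).

    accesses_per_txn=1 is an assumption made for this sketch.
    """
    total_bytes = RECORDS * RECORD_BYTES
    hot_bytes = total_bytes * HOT_DATA_PERCENT // 100
    peak_accesses = PEAK_TPS * accesses_per_txn
    disc_accesses = peak_accesses * (100 - HOT_ACCESS_PERCENT) // 100
    return total_bytes, hot_bytes, disc_accesses


print(sample_case_sizing())  # (500000000, 30000000, 24)
```

Under that assumption, about 30 MB of the 500 MB database stays in memory and the discs only need to absorb roughly 24 accesses per second at peak load.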
9 Back to the Present Technology changed Multiple cores Virtualization Size of data increased tremendously Gap between RAM and disk performance increased FLASH memory comes into the picture!
10 Flash memory characteristics Purchase cost Access Latency Bandwidth Density Power consumption Cooling costs Everything lies in between RAM and rotating hard disks!
11 Comparison: Flash and Disks
12 Desirability of Flash Memory Disk I/O is increasingly becoming the bottleneck, as the number of CPU instructions that can execute during one disk I/O is steadily increasing. A faster intermediate memory in the storage hierarchy is highly desirable.
13 Limitations of Flash Memory Write bandwidth is lower than read bandwidth. Rewriting a block requires erasing the entire block. Reliability: 100,000 to 1M erase-and-write cycles. Requires a wear-levelling mechanism. Requires an agent to erase blocks as soon as they are written to hard disk.
14 The presentation ahead... Key challenges in using flash memory Addressing the challenges Lots of open questions Implications for greening the computing infrastructure.
15 #1: Which hardware interface to use? Use DIMM? Use Serial-ATA? Use a new hardware interface? Defining and developing a new hardware interface is a time-consuming exercise. Use one of the existing interfaces.
16 #2: Use as Buffer or Persistent Storage? Database systems are concerned with providing consistency. Databases have a large number of small updates and must maintain recovery logs. Logs must be written to persistent storage quickly. Use Flash as Persistent Storage!
17 #2: Use as Buffer or Persistent Storage? File systems manipulate file contents in memory and write each file to disk in its entirety. Consistency is achieved via careful write ordering, quick write-back and expensive file-system checks. Page movement between flash and disks is expensive if flash is considered persistent storage. Use Flash memory as buffer pool!
18 #3: How to track Frequent Pages? In current systems, frequent pages are estimated and administered through LRU. Maintain two LRU chains in RAM.
19 Least Recently Used Chain [Figure: one LRU chain for RAM and one LRU chain for flash memory; entries labelled T(N), T(N-1), T(1)]
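One way to picture the two LRU chains is the sketch below. The slides only say that two chains are kept in RAM; the class name, the promotion of a flash hit into RAM, and the demotion of RAM victims into flash are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoLevelLRU:
    """Hypothetical bookkeeping for one LRU chain per layer (RAM, flash)."""

    def __init__(self, ram_pages, flash_pages):
        self.ram_pages = ram_pages
        self.flash_pages = flash_pages
        self.ram = OrderedDict()    # least recently used entry comes first
        self.flash = OrderedDict()  # least recently used entry comes first

    def access(self, page_id):
        """Touch a page and report where it was found: 'ram', 'flash' or 'disk'."""
        if page_id in self.ram:
            self.ram.move_to_end(page_id)  # now the most recently used page
            return "ram"
        found = "flash" if page_id in self.flash else "disk"
        self.flash.pop(page_id, None)      # assumed: a flash hit is promoted
        self.ram[page_id] = True
        if len(self.ram) > self.ram_pages:
            victim, _ = self.ram.popitem(last=False)
            self.flash[victim] = True      # assumed: RAM victim moves to flash
            if len(self.flash) > self.flash_pages:
                self.flash.popitem(last=False)  # page now lives on disk only
        return found


cache = TwoLevelLRU(ram_pages=2, flash_pages=2)
print([cache.access(p) for p in [1, 2, 3, 1, 3]])
# ['disk', 'disk', 'disk', 'flash', 'ram']
```

Only page identifiers are tracked here; a real buffer manager would also hold the page frames, dirty bits and write-back logic.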
20 #4: How to decide size of RAM and Flash? Use Five-Minute Rule!
21 #5: How to move pages among layers in hierarchy? RAM and flash – DMA Transfer Flash and Disk – DMA (hardware) – Transfer buffer in RAM (software)
22 #6: How to track Page Locations? File systems – Maintain pointer pages – A pointer points to a data page or a run of contiguous data pages – Moving an individual page may require breaking up a run and updating pointer pages
23 #6: How to track Page Locations? Database systems – Use B-Tree indexes – Other kinds of indexes have been implemented on B-Trees efficiently – Page movement requires updating pointers in parent node and neighbors
24 Benefits to Database Systems Checkpoint Processing – provides consistency in databases – writes dirty pages to persistent storage – persistent flash storage is faster – need to write to disk only if the page-replacement policy requires it Recovery Logs – quick writes
25 Benefits to Database Systems Query Processing – Index-based selection is faster – Need to consider index-based query plans – Index joins and intersections Example: A table scan reads 100M rows in 100 s. An index on disk fetches 10K rows in 100 s, so the table scan is more efficient if the result has more than 10K rows. With flash, the index scan fetches 500K rows in the same time!
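The break-even point between a table scan and index fetches in the example can be sketched as follows. The function names are hypothetical, and the code assumes that fetch time grows linearly with the number of rows and that the 500K-row flash figure refers to the same 100 s as the table scan.

```python
SCAN_SECONDS = 100                   # time to scan the 100M-row table
DISK_ROWS_PER_SEC = 10_000 / 100     # index fetches from disk
FLASH_ROWS_PER_SEC = 500_000 / 100   # index fetches from flash (assumed same 100 s)


def index_break_even_rows(scan_seconds, index_rows_per_sec):
    """Rows an index plan can fetch in the time one table scan takes."""
    return scan_seconds * index_rows_per_sec


def prefer_index(result_rows, scan_seconds, index_rows_per_sec):
    """True if fetching result_rows through the index beats the table scan."""
    return result_rows < index_break_even_rows(scan_seconds, index_rows_per_sec)


print(index_break_even_rows(SCAN_SECONDS, DISK_ROWS_PER_SEC))   # 10000.0
print(index_break_even_rows(SCAN_SECONDS, FLASH_ROWS_PER_SEC))  # 500000.0
print(prefer_index(50_000, SCAN_SECONDS, DISK_ROWS_PER_SEC))    # False
print(prefer_index(50_000, SCAN_SECONDS, FLASH_ROWS_PER_SEC))   # True
```

A query returning 50K rows would use the table scan on disk but the index on flash, which is why index-based plans need to be reconsidered.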
26 Problem of Optimal B-tree Page Size Two different optimal page sizes
27 Implications for Green Computing This work's focus is infrastructure cost. Energy optimization may lead to different optimal page sizes for B-trees. Infrastructure cost optimization can lead to a significant reduction in RAM size and hence lower energy consumption. It also introduces a large flash memory into the system.
28 Implications for Green Computing Let P_flash be the power consumption with flash memory and P_noflash be the power consumption without flash. Let T_flash, T_noflash denote system throughput with/without flash. System is green if – P_flash / P_noflash < 1 – T_flash / T_noflash > 1
29 Implications for Green Computing What if P_flash / P_noflash > 1? In this case, system is green if – T_flash / T_noflash > P_flash / P_noflash – Gain in throughput is higher than extra power spent
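The two cases for a green system can be written as one small check. This is a sketch with a hypothetical function name; it follows the slide conditions literally and says nothing beyond them (for example, it treats a system with lower power but lower throughput as not green).

```python
def is_green(p_flash, p_noflash, t_flash, t_noflash):
    """Apply the slide conditions for a 'green' flash-based system."""
    power_ratio = p_flash / p_noflash
    throughput_ratio = t_flash / t_noflash
    if power_ratio < 1:
        # Less power: green if throughput also improves.
        return throughput_ratio > 1
    # More (or equal) power: throughput gain must exceed the extra power.
    return throughput_ratio > power_ratio


print(is_green(80, 100, 120, 100))   # True: less power, more throughput
print(is_green(120, 100, 150, 100))  # True: 1.5x throughput for 1.2x power
print(is_green(120, 100, 110, 100))  # False: 1.1x throughput for 1.2x power
```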
30 Some calculations Assume a linear relation between the number of frequently accessed pages and the frequent period. If M is the RAM used in the no-flash system – M/15 is the RAM in the flash-based system – 4M is the flash memory P_flash = (M/15) x p_ram + 4M x p_flash P_noflash = M x p_ram P_flash < P_noflash if p_flash < (14/60) x p_ram The relationship holds true.
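The inequality on this slide can be checked numerically. The sketch below uses hypothetical function names and made-up per-unit power values for RAM and flash; only the M/15 and 4M sizing comes from the slide.

```python
def power_with_flash(ram_size, p_ram, p_flash):
    """P_flash: RAM shrinks to M/15 and 4M of flash is added."""
    return (ram_size / 15) * p_ram + 4 * ram_size * p_flash


def power_without_flash(ram_size, p_ram):
    """P_noflash: all M units are RAM."""
    return ram_size * p_ram


def flash_saves_power(ram_size, p_ram, p_flash):
    return power_with_flash(ram_size, p_ram, p_flash) < power_without_flash(ram_size, p_ram)


# With p_ram = 60 (arbitrary units), the threshold (14/60) * p_ram is 14.
print(flash_saves_power(ram_size=15, p_ram=60, p_flash=13))  # True
print(flash_saves_power(ram_size=15, p_ram=60, p_flash=14))  # False
```

The threshold follows from dividing both sides by M: p_ram/15 + 4 p_flash < p_ram, so 4 p_flash < (14/15) p_ram.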
31 Conclusions A faster intermediate memory in the storage hierarchy is desirable. Database systems are likely to benefit a lot. The picture is less clear for file systems. Flash can improve system throughput and reduce power consumption. Reduction in RAM usage can lead to significant power savings.
32 Thank You!