ExLRU: A Unified Write Buffer Cache Management for Flash Memory EMSOFT '11 Liang Shi 1,2, Jianhua Li 1,2, Chun Jason Xue 1, Chengmo Yang 3 and Xuehai Zhou 2.

Presentation transcript:

ExLRU: A Unified Write Buffer Cache Management for Flash Memory EMSOFT '11 Liang Shi 1,2, Jianhua Li 1,2, Chun Jason Xue 1, Chengmo Yang 3 and Xuehai Zhou 2 1 Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong 2 Department of Computer Science, University of Science and Technology of China, Hefei, China 3 Department of Electrical & Computer Engineering, University of Delaware Kwangwoon Univ. System Software Lab. HoSeok Seo 1

Introduction  Propose  Write buffer management scheme for Flash memory  Purpose of write buffer?  Increase the write performance  Reduce the number of erase operations on flash memory  Why consider NAND Flash characteristics?  Write operation time is longer than read operation  NAND Flash has limited erase operation count  Out-place-update 2

Background  Previous study for NAND flash & Access patterns  FAB, BPLRU, etc 3 Weak from sequential write patterns Weak from random write patterns

Motivation  Previous schemes is  Managed with the block-level information.  Lack of the page-level information.  Result in inappropriate eviction decisions, as follows:  Slow retirement of large cold blocks. -Block size is big, but pages is cold.  Early eviction of small hot blocks. -Block size is small, but pages is hot.  Cold page retention in heat-imbalanced blocks. -Few pages is hot, but most pages is cold.  Thus, ExLRU takes the page-level access information and the characteristics of flash memory. 4

Cost Model of ExLRU  The page-level information and the block-level information 5

Cost Model of ExLRU  Averaged Frequency of Pages (AFP)  Averaged Frequency of Block (AFB)  Unified eviction cost of block x (UC) 6

Cost Model of ExLRU  Example  Cost is 7

Efficient ExLRU  The cost model of ExLRU has the overhead of O(n).  Efficient ExLRU  Proposed to reduce the overhead.  Identify the block with UC value low enough, not the lowest.  Pre-identify blocks during the idle time between two write requests  Cost is 8

Efficient ExLRU  Two processes  Scanning process and victim block selection process 9 (WR)(ER)

Efficient ExLRU  In scanning process.  Work in time between two write requests, if the number of blocks in ER is smaller than a threshold N min.  Move blocks into ER, if UC < T UC  In victim block selection process.  Select a block at LRU position of ER when the buffer is full.  If block miss in ER, UC values of blocks decrease.  If hit in ER, re-compute UC, and move a block WR or not.  If page miss in ER, add pages a block, and re-compute UC, and move a block MRU position of ER 10

Experimental Methodology  Uses an event-driven simulator  SSD capacity: 8 GB  Page size: 2 KB  Block size: 64 pages  FTL algorithm: FAST  Traces: Financial, PC 11
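For reference, the simulated device and workload parameters from this slide can be collected into a small configuration sketch; the key names and the derived block count are assumptions of this sketch, only the values come from the slide.

SIM_CONFIG = {
    "ssd_capacity_bytes": 8 * 1024**3,   # 8 GB SSD
    "page_size_bytes": 2 * 1024,         # 2 KB pages
    "pages_per_block": 64,               # 64 pages per block (128 KB per block)
    "ftl": "FAST",                       # FTL algorithm used underneath the buffer
    "traces": ["Financial", "PC"],       # workloads replayed by the simulator
}

# Derived geometry: total number of flash blocks on the simulated SSD.
blocks_total = SIM_CONFIG["ssd_capacity_bytes"] // (
    SIM_CONFIG["page_size_bytes"] * SIM_CONFIG["pages_per_block"]
)  # 65,536 blocks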

Experimental Results  Average Size and Number of Evicted Blocks 12 ExLRU_S : T UC is static ( 0.1 x 10-5 ) ExLRU_D : T UC is dynamic 19.7% decrease19.1% increase

Experimental Results  Write and Erase Reduction in Financial trace 13 Best case 10.4% decrease Average 3% decrease

Experimental Results  Write and Erase Reduction in Financial trace 14

Parameter Sensitivity Studies 15  N_min: minimum number of blocks kept in ER  T_UC: maximum UC value for entering ER  T_SCAN: maximum number of blocks examined per scan

Conclusion  This scheme is designed to improve the write performance and reduce the number of erase operations  Care about diverse type of access patterns.  Exploit the page-level information and the block size 16