Informed Prefetching and Caching. Appears in Proc. of the 15th ACM Symp. on Operating Systems Principles. Presented by Hsu Hao Chen.
Outline: Introduction, Cost-benefit analysis, Implementation, Experimental testbed, Conclusions.
Introduction: Aggressive prefetching, driven by application-disclosed hints, overlaps I/O with computation to mask disk latency.
Introduction: Hints. LRU (least-recently-used) cache replacement algorithm. Sequential read-ahead: prefetching up to 64 blocks ahead when it detects long sequential runs. Disclosure: hints based on advance knowledge.
Introduction: Cost-benefit analysis is used both to balance buffer usage between prefetching and caching, and to integrate this proactive management with traditional LRU (least-recently-used) cache management for non-hinted accesses.
Cost-benefit analysis: benefit (decrease in I/O service time) versus cost (increase in I/O service time).
An example: demand miss. [Figure: LRU queue ordered from most recent to least recent, plus prefetched blocks. The application requests block x, which the cache does not hold, causing a demand miss.]
System model (1/3): Assumptions. Buffer cache running on a uniprocessor with sufficient memory to make a substantial number of cache buffers available. Workload emphasizes read-intensive applications. All application I/O accesses request a single file block. Enough disk parallelism for there never to be any congestion (there is no disk queuing).
System model (2/3): TCPU is the inter-access application CPU time; TI/O is the time it takes to service an I/O access.
System model (3/3): Elapsed time. The latency of a fetch covers allocating a buffer, queuing the request at the drive, and servicing the interrupt when the I/O completes (the driver overhead, Tdriver), plus the disk access itself (Tdisk).
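As a rough guide to how these per-access terms combine (a hedged sketch of the model, not the paper's exact equations; Tmiss here names the service time of a demand miss):

    elapsed time ≈ (number of accesses) × (TCPU + TI/O)
    TI/O = Thit on a cache hit
    TI/O = Tmiss ≈ Tdriver + Tdisk + Thit on a demand miss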
Benefit of allocating a buffer to a consumer (1/6): We know the access sequence b0, b1, b2, …, bx. Prefetching is meant to mask disk latency. For each block, the processing time is TCPU + Thit + Tdriver.
Benefit of allocating a buffer to a consumer (2/6): the average stall per access as a function of the prefetch depth x, for 0 < x < P(TCPU).
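The formula this slide refers to can be reconstructed from the model above (a hedged statement of the paper's result, written in the deck's notation):

    Tstall(x) = Tdisk/x - (TCPU + Thit + Tdriver)   for 0 < x < P(TCPU)
    Tstall(x) = 0                                   for x >= P(TCPU)

where the prefetch horizon P(TCPU) is the smallest prefetch depth at which the stall vanishes, roughly Tdisk/(TCPU + Thit + Tdriver) rounded up.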
Benefit of allocating a buffer to a consumer (3/6)
Benefit of allocating a buffer to a consumer (4/6)
Benefit of allocating a buffer to a consumer (5/6): TCPU is fixed, and P(TCPU) = 5. At time T = 0 the fourth access stalls for Tstall = Tdisk - 3(TCPU + Thit + Tdriver).
Benefit of allocating a buffer to a consumer (6/6)
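Differencing the stall formula gives the marginal benefit of one more prefetch buffer (again a hedged summary following the model above):

    △Tpf(x) = Tstall(x) - Tstall(x+1) = Tdisk/(x(x+1))   for x < P(TCPU), and 0 otherwise

With the slide's example of P(TCPU) = 5, going from 1 to 2 prefetch buffers saves Tdisk/2 of stall per access, while going from 4 to 5 saves only Tdisk/20, so the benefit of deeper prefetching falls off quickly.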
Cost of shrinking the LRU cache (1/2): H(n) is the hit ratio when the LRU cache has n blocks. Average response time: TLRU(n) = H(n) Thit + (1 - H(n)) Tmiss. Shrinking cost: △TLRU(n) = TLRU(n-1) - TLRU(n) = (H(n) - H(n-1))(Tmiss - Thit) = △H(n)(Tmiss - Thit), where △H(n) = H(n) - H(n-1).
Cost of shrinking the LRU cache (2/2)
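A small worked example with hypothetical numbers (not taken from the paper): if Thit = 0.25 ms, Tmiss = 16 ms, and giving up the n-th buffer lowers the hit ratio by △H(n) = 0.01, then

    △TLRU(n) = △H(n)(Tmiss - Thit) = 0.01 × 15.75 ms ≈ 0.16 ms

of added average response time per access; that is the cost the LRU cache reports for losing that buffer.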
Cost of ejecting a hinted block: an ejected hinted block must be prefetched back before its hinted access; the cost is the extra driver overhead of the second fetch, plus any stall incurred if the block cannot be fetched back far enough ahead of its use.
Implementation: Local value estimates.
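A minimal sketch of how local value estimates could drive buffer allocation, assuming the formulas above; this is an illustration in Python, not the paper's TIP implementation, and all names and numbers are hypothetical:

    from math import ceil

    # Hypothetical per-access times in milliseconds (illustration only).
    T_CPU, T_HIT, T_DRIVER, T_DISK = 1.0, 0.25, 0.5, 15.0

    def prefetch_horizon():
        # Prefetch depth beyond which deeper prefetching cannot reduce stall.
        return ceil(T_DISK / (T_CPU + T_HIT + T_DRIVER))

    def prefetch_benefit(x):
        # Estimated decrease in I/O service time from adding an (x+1)-th
        # prefetch buffer; zero beyond the prefetch horizon.
        x = max(x, 1)
        if x >= prefetch_horizon():
            return 0.0
        return T_DISK / (x * (x + 1))

    def lru_shrink_cost(delta_hit_ratio):
        # Estimated increase in I/O service time from giving up one LRU
        # buffer, given the marginal hit-ratio loss for that buffer.
        t_miss = T_DRIVER + T_DISK + T_HIT  # assumed miss-time decomposition
        return delta_hit_ratio * (t_miss - T_HIT)

    def should_take_lru_buffer_for_prefetch(prefetch_depth, marginal_hit_ratio):
        # Global comparison in a common currency: reallocate a buffer only
        # when the estimated benefit exceeds the estimated cost.
        return prefetch_benefit(prefetch_depth) > lru_shrink_cost(marginal_hit_ratio)

    if __name__ == "__main__":
        print(prefetch_horizon())                            # 9 with the numbers above
        print(should_take_lru_buffer_for_prefetch(2, 0.01))  # True: stall savings dominate

Each consumer and supplier keeps such an estimate up to date, so buffers can flow toward whichever use promises the largest net decrease in I/O service time.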
Experimental testbed: Implemented in the OSF/1 operating system. The system had 15 disks of 1 GB each. Experiments ran single applications and multiple applications.
Single applications (1/2)
Single applications (2/2)
Multiple applications (1/3): Elapsed time for both applications to complete.
Multiple applications (2/3): Elapsed time for one of a pair of applications.
Multiple applications (3/3): Elapsed time for the other of a pair of applications.
Conclusions: Use hints from I/O-intensive applications to prefetch aggressively enough to eliminate I/O stall time while maximizing buffer availability for caching. Allocate cache buffers dynamically among competing hinting and non-hinting applications for the greatest performance benefit.