Informed Prefetching and Caching. Appears in Proc. of the 15th ACM Symp. on Operating Systems Principles. Presented by Hsu Hao Chen.
Outline: Introduction, Cost-benefit analysis, Implementation, Experimental testbed, Conclusions.
Introduction: Aggressive prefetching, driven by application-disclosed hints, overlaps I/O with computation to mask disk latency.
Introduction: Hints. LRU (least-recently-used) cache replacement algorithm. Sequential read-ahead: prefetching up to 64 blocks ahead when it detects long sequential runs. Disclosure: hints based on advance knowledge.
Introduction: Cost-benefit analysis is used both to balance buffer usage between prefetching and caching, and to integrate this proactive management with traditional LRU (least-recently-used) cache management for non-hinted accesses.
Cost-benefit analysis: benefit (decrease in I/O service time) versus cost (increase in I/O service time).
An example: demand miss. [Figure: LRU queue ordered from most recent to least recent, plus prefetched blocks. The application requests block x, which the cache does not hold, causing a demand miss.]
System model (1/3): Assumptions. Buffer cache running on a uniprocessor with sufficient memory to make a substantial number of cache buffers available. Workload emphasizes read-intensive applications. All application I/O accesses request a single file block. Enough disk parallelism for there never to be any congestion (there is no disk queuing).
System model (2/3): TCPU is the inter-access application CPU time; TI/O is the time it takes to service an I/O access.
System model (3/3): Elapsed time. The latency of a fetch covers allocating a buffer, queuing the request at the drive, and servicing the interrupt when the I/O completes (the driver overhead, Tdriver), plus the disk access itself (Tdisk).
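As a rough guide to how these per-access terms combine (a hedged sketch of the model, not the paper's exact equations; Tmiss here names the service time of a demand miss):

    elapsed time ≈ (number of accesses) × (TCPU + TI/O)
    TI/O = Thit on a cache hit
    TI/O = Tmiss ≈ Tdriver + Tdisk + Thit on a demand miss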
Benefit of allocating a buffer to a consumer (1/6): We know the access sequence b0, b1, b2, …, bx. Prefetching is meant to mask disk latency. For each block, the processing time is TCPU + Thit + Tdriver.
Benefit of allocating a buffer to a consumer (2/6): the average stall per access as a function of the prefetch depth x, for 0 < x < P(TCPU).
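The formula this slide refers to can be reconstructed from the model above (a hedged statement of the paper's result, written in the deck's notation):

    Tstall(x) = Tdisk/x - (TCPU + Thit + Tdriver)   for 0 < x < P(TCPU)
    Tstall(x) = 0                                   for x >= P(TCPU)

where the prefetch horizon P(TCPU) is the smallest prefetch depth at which the stall vanishes, roughly Tdisk/(TCPU + Thit + Tdriver) rounded up.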
Benefit of allocating a buffer to a consumer (3/6)
Benefit of allocating a buffer to a consumer (4/6)
Benefit of allocating a buffer to a consumer (5/6): TCPU is fixed, and P(TCPU) = 5. At time T = 0 the fourth access stalls for Tstall = Tdisk - 3(TCPU + Thit + Tdriver).
Benefit of allocating a buffer to a consumer (6/6)
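Differencing the stall formula gives the marginal benefit of one more prefetch buffer (again a hedged summary following the model above):

    △Tpf(x) = Tstall(x) - Tstall(x+1) = Tdisk/(x(x+1))   for x < P(TCPU), and 0 otherwise

With the slide's example of P(TCPU) = 5, going from 1 to 2 prefetch buffers saves Tdisk/2 of stall per access, while going from 4 to 5 saves only Tdisk/20, so the benefit of deeper prefetching falls off quickly.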
Cost of shrinking the LRU cache (1/2): H(n) is the hit ratio when the LRU cache has n blocks. Average response time: TLRU(n) = H(n) Thit + (1 - H(n)) Tmiss. Shrinking cost: △TLRU(n) = TLRU(n-1) - TLRU(n) = (H(n) - H(n-1))(Tmiss - Thit) = △H(n)(Tmiss - Thit), where △H(n) = H(n) - H(n-1).
Cost of shrinking the LRU cache (2/2)
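A small worked example with hypothetical numbers (not taken from the paper): if Thit = 0.25 ms, Tmiss = 16 ms, and giving up the n-th buffer lowers the hit ratio by △H(n) = 0.01, then

    △TLRU(n) = △H(n)(Tmiss - Thit) = 0.01 × 15.75 ms ≈ 0.16 ms

of added average response time per access; that is the cost the LRU cache reports for losing that buffer.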
Cost of ejecting a hinted block: an ejected hinted block must be prefetched back before its hinted access; the cost is the extra driver overhead of the second fetch, plus any stall incurred if the block cannot be fetched back far enough ahead of its use.
Implementation: Local value estimates.
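A minimal sketch of how local value estimates could drive buffer allocation, assuming the formulas above; this is an illustration in Python, not the paper's TIP implementation, and all names and numbers are hypothetical:

    from math import ceil

    # Hypothetical per-access times in milliseconds (illustration only).
    T_CPU, T_HIT, T_DRIVER, T_DISK = 1.0, 0.25, 0.5, 15.0

    def prefetch_horizon():
        # Prefetch depth beyond which deeper prefetching cannot reduce stall.
        return ceil(T_DISK / (T_CPU + T_HIT + T_DRIVER))

    def prefetch_benefit(x):
        # Estimated decrease in I/O service time from adding an (x+1)-th
        # prefetch buffer; zero beyond the prefetch horizon.
        x = max(x, 1)
        if x >= prefetch_horizon():
            return 0.0
        return T_DISK / (x * (x + 1))

    def lru_shrink_cost(delta_hit_ratio):
        # Estimated increase in I/O service time from giving up one LRU
        # buffer, given the marginal hit-ratio loss for that buffer.
        t_miss = T_DRIVER + T_DISK + T_HIT  # assumed miss-time decomposition
        return delta_hit_ratio * (t_miss - T_HIT)

    def should_take_lru_buffer_for_prefetch(prefetch_depth, marginal_hit_ratio):
        # Global comparison in a common currency: reallocate a buffer only
        # when the estimated benefit exceeds the estimated cost.
        return prefetch_benefit(prefetch_depth) > lru_shrink_cost(marginal_hit_ratio)

    if __name__ == "__main__":
        print(prefetch_horizon())                            # 9 with the numbers above
        print(should_take_lru_buffer_for_prefetch(2, 0.01))  # True: stall savings dominate

Each consumer and supplier keeps such an estimate up to date, so buffers can flow toward whichever use promises the largest net decrease in I/O service time.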
Experimental testbed: Implemented in the OSF/1 operating system. The system had 15 disks of 1 GB each. Experiments ran single applications and multiple applications.
Single applications (1/2)
Single applications (2/2)
Multiple applications (1/3): Elapsed time for both applications to complete.
Multiple applications (2/3): Elapsed time for one of a pair of applications.
Multiple applications (3/3): Elapsed time for the other of a pair of applications.
Conclusions: Use hints from I/O-intensive applications to prefetch aggressively enough to eliminate I/O stall time while maximizing buffer availability for caching. Allocate cache buffers dynamically among competing hinting and non-hinting applications for the greatest performance benefit.