Improving Disk Throughput in Data-Intensive Servers
Enrique V. Carrera and Ricardo Bianchini
Department of Computer Science, Rutgers University
Introduction
- Disk drives are often bottlenecks
- Several optimizations have been proposed:
  - Disk arrays
  - Fewer disk reads using fancy buffer cache management
  - Optimized disk writes using logs
  - Optimized disk scheduling
- Disk throughput is still a problem for data-intensive servers
Modern Disk Drives
- Substantial processing and memory capacity
- Disk controller cache:
  - Independent segments = sequential streams
  - If #streams > #segments, the LRU segment is replaced
  - On an access, blocks are read ahead to fill the segment
- Disk arrays:
  - The array controller may also cache data
  - Striping affects read-ahead
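The segment behavior described above can be sketched as a toy model. The class name and bookkeeping are illustrative assumptions, not the drive's actual firmware logic:

```python
# Toy model of a conventional controller cache: a fixed number of
# segments, each holding one sequential stream, with LRU replacement
# once streams outnumber segments.
class SegmentCache:
    def __init__(self, num_segments):
        self.num_segments = num_segments
        self.segments = []  # stream ids, least recently used first

    def access(self, stream):
        if stream in self.segments:
            self.segments.remove(stream)      # stream already has a segment
        elif len(self.segments) >= self.num_segments:
            self.segments.pop(0)              # evict the LRU segment
        self.segments.append(stream)          # this segment is now MRU
```

With more concurrent streams than segments, every new stream evicts an entire segment, which is exactly the behavior the next slide identifies as a problem for servers.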
Key Problem
- Controller caches are not designed for servers:
  - Sequential access to a small number of large files
  - Read-ahead of consecutive blocks
  - The segment is the unit of allocation and replacement
- Data-intensive servers:
  - Small files
  - Large number of concurrent accesses
  - Large numbers of blocks often miss in the controller cache
This Work
- Goal: management techniques for disk controller caches that are efficient for servers
- Techniques:
  - File-Oriented Read-ahead (FOR)
  - Host-guided Device Caching (HDC)
- Exploit the processing and memory capacity of the drives
Architecture
[architecture diagram; not included in this transcript]
File-Oriented Read-ahead
- The disk controller has no notion of file layout
- Read-ahead can be useless for small files:
  - Disk utilization is not amortized
  - Useless blocks pollute the controller cache
- FOR only reads ahead blocks of the same file
File-Oriented Read-ahead
- FOR needs to know the layout of files on disk:
  - A bitmap of disk blocks is kept by the controller
  - A 1 means the block is the logical continuation of the previous block
  - Initialized at boot, updated on metadata writes
- Number of blocks to read ahead = number of consecutive 1s, up to the maximum read-ahead size
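A minimal sketch of the bitmap-driven read-ahead decision, assuming `bitmap[i] == 1` means block i+1 is the logical continuation of block i within the same file; the cap and the function name are illustrative, not from the paper's implementation:

```python
MAX_READ_AHEAD = 16  # assumed cap on read-ahead size, in blocks

def read_ahead_count(bitmap, block):
    """Number of blocks to prefetch after `block`: the run of
    consecutive 1s in the bitmap, capped at MAX_READ_AHEAD."""
    count = 0
    i = block
    while i < len(bitmap) and bitmap[i] == 1 and count < MAX_READ_AHEAD:
        count += 1
        i += 1
    return count
```

A small file whose last block has a 0 in the bitmap stops the prefetch immediately, which is how FOR avoids reading useless blocks from the next file.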
File-Oriented Read-ahead
- FOR could underutilize segments, so allocation and replacement are based on blocks
- Replacement policy: MRU
- FOR benefits:
  - Lower disk utilization
  - Higher controller cache hit rates
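The block-granularity cache with MRU replacement could look like this toy model (the class and its bookkeeping are assumptions for illustration, not the controller's actual data structures):

```python
# Controller cache managed at block granularity with MRU replacement:
# on a miss with a full cache, the most recently used block is evicted.
class BlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []  # most recently used block at the end

    def access(self, block):
        """Return True on a cache hit, False on a miss."""
        if block in self.blocks:
            self.blocks.remove(block)     # hit: move to the MRU position
            self.blocks.append(block)
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.pop()             # miss: evict the MRU block
        self.blocks.append(block)
        return False
```

MRU eviction protects blocks that were read ahead but not yet used, which tend to be accessed soon, rather than evicting them the way LRU would under a long scan.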
Host-guided Device Caching
- Data-intensive servers rely on disk arrays, so there is a non-trivial amount of controller cache space
- Current disk controller caches act only as speed-matching and read-ahead buffers
- They would be more useful if each cache could be managed directly by the host processor
Host-guided Device Caching
- Our evaluation: disk controllers permanently cache the data with the most misses in the buffer cache
- Each controller caches data stored on its own disk
- Assumes a block-based organization
- Support for three simple commands:
  - pin_blk()
  - unpin_blk()
  - flush_hdc()
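The three commands could be modeled on the controller side roughly as follows. The command names come from the slides, but the semantics sketched here are an assumption; the real interface may differ:

```python
# Toy controller-side model of the three HDC commands: the host pins
# blocks it wants kept in the controller cache, unpins blocks it no
# longer needs, and flushes all pinned blocks at once.
class HDCController:
    def __init__(self):
        self.pinned = set()

    def pin_blk(self, block):
        self.pinned.add(block)       # block stays cached until unpinned

    def unpin_blk(self, block):
        self.pinned.discard(block)   # block becomes replaceable again

    def flush_hdc(self):
        self.pinned.clear()          # release all host-pinned blocks
```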
Host-guided Device Caching
- Execution is divided into periods to determine:
  - How many blocks to cache
  - Which blocks those are
  - When to cache them
- HDC benefits:
  - Higher controller cache hit rate
  - Lower disk utilization
- Tradeoff: controller cache space for HDC vs. read-aheads
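One plausible host-side policy for these periods, assuming per-block buffer-cache miss counters; the selection heuristic here is a sketch, not necessarily the exact policy the authors evaluated:

```python
from collections import Counter

def pick_blocks_to_pin(miss_counts, hdc_capacity):
    """At the end of a period, return the blocks with the most
    buffer-cache misses, up to the controller cache space the host
    has set aside for HDC (in blocks)."""
    ranked = Counter(miss_counts).most_common(hdc_capacity)
    return [block for block, _ in ranked]
```

The host would then issue pin_blk() for blocks entering this set and unpin_blk() for blocks leaving it, trading controller cache space away from read-ahead buffering.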
Methodology
- Simulation of 8 IBM Ultrastar 36Z15 drives attached to a non-caching Ultra160 SCSI card
- Logical disk blocks are striped across the array
- Contention for buses, memories, and other components is simulated in detail
- Synthetic and real traces (Web, proxy, and file servers)
Real Workloads
- Web: I/O time as a function of striping unit size (HDC space: 2 MB)
Real Workloads
- Web: I/O time as a function of HDC memory size (stripe size: 16 KB)
Real Workloads Summary
- Consistent and significant performance gains
- The combination of FOR and HDC achieves the best overall performance
Related Work
- Techniques external to disk controllers:
  - The controller cache is different from other caches:
    - Lack of temporal locality
    - Orders of magnitude smaller than main memory
    - Read-ahead restricted to sequential blocks
- Explicit grouping:
  - Groupings need to be found and maintained
  - Segment replacements may eliminate the benefits
Related Work
- Controller read-ahead and caching techniques:
  - None considered file system information, host-guided caching, or block-based organizations
- Other disk controller optimizations:
  - Scheduling of requests
  - Utilizing free bandwidth
  - Data replication
- FOR and HDC are orthogonal to these
Conclusions
- Current controller cache management is inappropriate for servers
- FOR and HDC can achieve significant and consistent increases in server throughput
- Real workloads show improvements of 47%, 33%, and 21% (Web, proxy, and file server, respectively)
Extensions
- Strategies for servers that use raw I/O
- A better approach than the bitmap
- Array controllers that cache data and hide the individual disks
- Impact of other buffer cache replacement policies and sizes
More Information
http://www.darklab.rutgers.edu
Synthetic Workloads
- I/O time as a function of file size
Synthetic Workloads
- I/O time as a function of the number of simultaneous streams
Synthetic Workloads
- I/O time as a function of access frequency
Synthetic Workloads Summary
- No read-ahead hurts performance for files larger than 16 KB
- No effect from simply replacing segments with blocks
- FOR gains increase as file size decreases and the number of simultaneous streams increases
- HDC gains increase as requests are shifted toward a small number of blocks
- FOR gains decrease as the percentage of writes increases
Synthetic Workloads
- I/O time as a function of the percentage of writes