Improving Disk Throughput in Data-Intensive Servers
Enrique V. Carrera and Ricardo Bianchini
Department of Computer Science, Rutgers University
Introduction
- Disk drives are often bottlenecks
- Several optimizations have been proposed:
  - Disk arrays
  - Fewer disk reads using fancy buffer cache management
  - Optimized disk writes using logs
  - Optimized disk scheduling
- Disk throughput is still a problem for data-intensive servers
Modern Disk Drives
- Substantial processing and memory capacity
- Disk controller cache:
  - Independent segments = sequential streams
  - If #streams > #segments, the LRU segment is replaced
  - On an access, blocks are read ahead to fill the segment
- Disk arrays:
  - The array controller may also cache data
  - Striping affects read-ahead
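The segment behavior described above can be sketched as a toy model. The class name and bookkeeping are illustrative assumptions, not the drive's actual firmware logic:

```python
# Toy model of a conventional controller cache: a fixed number of
# segments, each holding one sequential stream, with LRU replacement
# once streams outnumber segments.
class SegmentCache:
    def __init__(self, num_segments):
        self.num_segments = num_segments
        self.segments = []  # stream ids, least recently used first

    def access(self, stream):
        if stream in self.segments:
            self.segments.remove(stream)      # stream already has a segment
        elif len(self.segments) >= self.num_segments:
            self.segments.pop(0)              # evict the LRU segment
        self.segments.append(stream)          # this segment is now MRU
```

With more concurrent streams than segments, every new stream evicts an entire segment, which is exactly the behavior the next slide identifies as a problem for servers.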
Key Problem
- Controller caches are not designed for servers:
  - Sequential access to a small number of large files
  - Read-ahead of consecutive blocks
  - The segment is the unit of allocation and replacement
- Data-intensive servers:
  - Small files
  - Large number of concurrent accesses
  - Large numbers of blocks often miss in the controller cache
This Work
- Goal: management techniques for disk controller caches that are efficient for servers
- Techniques:
  - File-Oriented Read-ahead (FOR)
  - Host-guided Device Caching (HDC)
- Exploit the processing and memory capacity of the drives
Architecture
[architecture diagram; not included in this transcript]
File-Oriented Read-ahead
- The disk controller has no notion of file layout
- Read-ahead can be useless for small files:
  - Disk utilization is not amortized
  - Useless blocks pollute the controller cache
- FOR only reads ahead blocks of the same file
File-Oriented Read-ahead
- FOR needs to know the layout of files on disk:
  - A bitmap of disk blocks is kept by the controller
  - A 1 means the block is the logical continuation of the previous block
  - Initialized at boot, updated on metadata writes
- Number of blocks to read ahead = number of consecutive 1s, up to the maximum read-ahead size
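A minimal sketch of the bitmap-driven read-ahead decision, assuming `bitmap[i] == 1` means block i+1 is the logical continuation of block i within the same file; the cap and the function name are illustrative, not from the paper's implementation:

```python
MAX_READ_AHEAD = 16  # assumed cap on read-ahead size, in blocks

def read_ahead_count(bitmap, block):
    """Number of blocks to prefetch after `block`: the run of
    consecutive 1s in the bitmap, capped at MAX_READ_AHEAD."""
    count = 0
    i = block
    while i < len(bitmap) and bitmap[i] == 1 and count < MAX_READ_AHEAD:
        count += 1
        i += 1
    return count
```

A small file whose last block has a 0 in the bitmap stops the prefetch immediately, which is how FOR avoids reading useless blocks from the next file.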
File-Oriented Read-ahead
- FOR could underutilize segments, so allocation and replacement are based on blocks
- Replacement policy: MRU
- FOR benefits:
  - Lower disk utilization
  - Higher controller cache hit rates
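The block-granularity cache with MRU replacement could look like this toy model (the class and its bookkeeping are assumptions for illustration, not the controller's actual data structures):

```python
# Controller cache managed at block granularity with MRU replacement:
# on a miss with a full cache, the most recently used block is evicted.
class BlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []  # most recently used block at the end

    def access(self, block):
        """Return True on a cache hit, False on a miss."""
        if block in self.blocks:
            self.blocks.remove(block)     # hit: move to the MRU position
            self.blocks.append(block)
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.pop()             # miss: evict the MRU block
        self.blocks.append(block)
        return False
```

MRU eviction protects blocks that were read ahead but not yet used, which tend to be accessed soon, rather than evicting them the way LRU would under a long scan.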
Host-guided Device Caching
- Data-intensive servers rely on disk arrays, so there is a non-trivial amount of controller cache space
- Current disk controller caches act only as speed-matching and read-ahead buffers
- They would be more useful if each cache could be managed directly by the host processor
Host-guided Device Caching
- Our evaluation: disk controllers permanently cache the data with the most misses in the buffer cache
- Each controller caches data stored on its own disk
- Assumes a block-based organization
- Support for three simple commands:
  - pin_blk()
  - unpin_blk()
  - flush_hdc()
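The three commands could be modeled on the controller side roughly as follows. The command names come from the slides, but the semantics sketched here are an assumption; the real interface may differ:

```python
# Toy controller-side model of the three HDC commands: the host pins
# blocks it wants kept in the controller cache, unpins blocks it no
# longer needs, and flushes all pinned blocks at once.
class HDCController:
    def __init__(self):
        self.pinned = set()

    def pin_blk(self, block):
        self.pinned.add(block)       # block stays cached until unpinned

    def unpin_blk(self, block):
        self.pinned.discard(block)   # block becomes replaceable again

    def flush_hdc(self):
        self.pinned.clear()          # release all host-pinned blocks
```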
Host-guided Device Caching
- Execution is divided into periods to determine:
  - How many blocks to cache
  - Which blocks those are
  - When to cache them
- HDC benefits:
  - Higher controller cache hit rate
  - Lower disk utilization
- Tradeoff: controller cache space for HDC vs. read-aheads
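One plausible host-side policy for these periods, assuming per-block buffer-cache miss counters; the selection heuristic here is a sketch, not necessarily the exact policy the authors evaluated:

```python
from collections import Counter

def pick_blocks_to_pin(miss_counts, hdc_capacity):
    """At the end of a period, return the blocks with the most
    buffer-cache misses, up to the controller cache space the host
    has set aside for HDC (in blocks)."""
    ranked = Counter(miss_counts).most_common(hdc_capacity)
    return [block for block, _ in ranked]
```

The host would then issue pin_blk() for blocks entering this set and unpin_blk() for blocks leaving it, trading controller cache space away from read-ahead buffering.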
Methodology
- Simulation of 8 IBM Ultrastar 36Z15 drives attached to a non-caching Ultra160 SCSI card
- Logical disk blocks are striped across the array
- Contention for buses, memories, and other components is simulated in detail
- Synthetic and real traces (Web, proxy, and file servers)
Real Workloads
- Web: I/O time as a function of striping unit size (HDC space: 2 MB)
Real Workloads
- Web: I/O time as a function of HDC memory size (stripe size: 16 KB)
Real Workloads Summary
- Consistent and significant performance gains
- The combination of FOR and HDC achieves the best overall performance
Related Work
- Techniques external to disk controllers:
  - The controller cache is different from other caches:
    - Lack of temporal locality
    - Orders of magnitude smaller than main memory
    - Read-ahead restricted to sequential blocks
- Explicit grouping:
  - Groupings need to be found and maintained
  - Segment replacements may eliminate the benefits
Related Work
- Controller read-ahead and caching techniques:
  - None considered file system information, host-guided caching, or block-based organizations
- Other disk controller optimizations:
  - Scheduling of requests
  - Utilizing free bandwidth
  - Data replication
- FOR and HDC are orthogonal to these
Conclusions
- Current controller cache management is inappropriate for servers
- FOR and HDC can achieve significant and consistent increases in server throughput
- Real workloads show improvements of 47%, 33%, and 21% (Web, proxy, and file server, respectively)
Extensions
- Strategies for servers that use raw I/O
- A better approach than the bitmap
- Array controllers that cache data and hide the individual disks
- Impact of other buffer cache replacement policies and sizes
More Information
http://www.darklab.rutgers.edu
Synthetic Workloads
- I/O time as a function of file size
Synthetic Workloads
- I/O time as a function of the number of simultaneous streams
Synthetic Workloads
- I/O time as a function of access frequency
Synthetic Workloads Summary
- No read-ahead hurts performance for files larger than 16 KB
- No effect from simply replacing segments with blocks
- FOR gains increase as file size decreases and the number of simultaneous streams increases
- HDC gains increase as requests are shifted toward a small number of blocks
- FOR gains decrease as the percentage of writes increases
Synthetic Workloads
- I/O time as a function of the percentage of writes