NFS Tricks and Benchmarking Traps Daniel Ellard and Margo Seltzer FREENIX 2003 - June 12, 2003.

Presentation transcript:

NFS Tricks and Benchmarking Traps Daniel Ellard and Margo Seltzer FREENIX June 12, 2003

Daniel Ellard - Freenix 2003, June 12, 2003
Outline
Motivation
  – Research questions
  – Benchmarking traps
New NFS Read-Ahead Heuristics
  – Optimize sequential reads
  – Improve non-sequential reads
Results
Conclusions

Goal - Improve NFS Read Throughput
We are interested in improving the throughput of data accessed from disk via NFS.
  – Example: workload
Our approach: improve the heuristics that control the amount of read-ahead done by the server.

Why Improve Read-Ahead Heuristics?
With busy NFS clients, 5-10% of NFS requests arrive at the server out of order.
nfsiods are the primary source of reordering.
  – nfsiod is a client daemon that marshals and schedules NFS requests.
  – Many implementations use multiple nfsiods.
  – Contention for resources and process scheduling effects can cause reordering.

Why Improve Read-Ahead Heuristics?
Sequential access patterns may appear non-sequential if requests are reordered.
Servers do less (or no) read-ahead for non-sequential access patterns.
Read-ahead is necessary for good performance.

Research Questions
Can we improve performance for sequential reads by improving the way the NFS sequentiality-detection heuristic handles “slightly” out-of-order requests?
Can we detect non-sequential access patterns that have sequential components and therefore can benefit from read-ahead?

A Micro-Benchmark for NFS Reads
Long sequential reads
Many concurrent readers
Inspired by observed workloads
All tests begin with a cold cache on client and server.
  – All data is brought from disk during the benchmark.

The Testbed
FreeBSD
Commodity PCs
  – Note: PCI bus transfer speed of 54 MB/s
Intel PRO/1000 TX gigabit Ethernet
  – em device driver
  – MTU=1500
  – Raw TCP transfer rate of 49 MB/s
IDE and SCSI drives
  – Paper discusses SCSI, this talk focuses on IDE

Preliminary Results
Before measuring the effect of our changes to the NFS server, we must understand the default system.
Results of our benchmarks were frustrating:
  – Large variance
  – Strange effects
We decided to investigate these effects before proceeding.

Benchmarking Traps
Properties of disks and their drivers:
  – ZCAV/disk geometry effects
  – Disk scheduling algorithms
  – Tagged command queues
Arbitrary limits in the NFS implementation
Network issues
  – TCP vs UDP for RPC

ZCAV Effects
ZCAV - “Zoned Constant Angular Velocity”
  – Disk tracks are grouped into zones.
  – Within each zone, each track has the same number of sectors.
  – The number of sectors is roughly proportional to the length of the track.
Tracks in the outer zones hold times more data
  – Outer zone has a higher transfer rate
  – Outer zone requires fewer seeks

Controlling for ZCAV Effects
To minimize the ZCAV effect, minimize the difference between the innermost and outermost zones you use.
  – Use a large disk.
  – Run your benchmark in a small partition.
To measure the effect, create several partitions and repeat your benchmark in each.
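One rough way to see a ZCAV effect is to time the same sequential read at several positions on a disk. The sketch below is only an illustration and is not the benchmark used in the talk: `region_throughput`, its parameters, and the idea of reading at byte offsets of a single device or file (instead of creating separate partitions) are assumptions made here. A meaningful run would also need a cold cache and a raw disk device rather than an ordinary file.

```python
import time


def region_throughput(path, offsets, length, block_size=1 << 16):
    """Return {offset: MB/s} for a sequential read of `length` bytes at each offset.

    `path` is assumed to be a raw disk device or a large file (hypothetical setup).
    """
    results = {}
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            remaining = length
            start = time.perf_counter()
            while remaining > 0:
                chunk = f.read(min(block_size, remaining))
                if not chunk:  # reached the end of the device or file
                    break
                remaining -= len(chunk)
            elapsed = time.perf_counter() - start
            done = length - remaining
            results[off] = done / elapsed / 1e6 if elapsed > 0 else float("inf")
    return results
```

Comparing the figures for low and high offsets gives a first estimate of how much of a benchmark's variation comes from disk position alone.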

Disk Scheduler Issues
BSD systems use the CSCAN scheduler.
CSCAN trades fairness for disk utilization.
  – Some requests are serviced much sooner than others.
  – It is not hard to create request streams that starve other requests for the disk.
  – Overall throughput is very good.
Many scheduling algorithms are unfair.
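The unfairness can be seen in a simplified model of CSCAN ordering. This is a sketch of the textbook algorithm, not the FreeBSD disk queue code; `cscan_order` and the block numbers are made up for illustration.

```python
import bisect


def cscan_order(pending, head):
    """Return the order in which a simplified C-SCAN services `pending` blocks.

    The head sweeps upward from `head`, then jumps back to the lowest
    pending block and sweeps upward again.
    """
    blocks = sorted(pending)
    split = bisect.bisect_left(blocks, head)
    return blocks[split:] + blocks[:split]


# Head at block 50: block 40 is nearby but just behind the head,
# so it is serviced last.
print(cscan_order([10, 95, 40, 60, 5], head=50))  # [60, 95, 5, 10, 40]
```

In this model, a reader that keeps issuing requests just ahead of the head is always served in the current sweep, while requests behind the head wait for the next one.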

Controlling for Scheduler Effects
Application specific!
For our purposes:
  – Total throughput for concurrent readers
  – Measure the total time it takes for all the concurrent readers to finish their tasks, instead of the time of each individual reader.
There is large variation in the time each reader takes, but the time required by the slowest reader is reasonably consistent.
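A minimal sketch of this measurement, assuming each reader is a thread that reads one file from start to end. `run_concurrent_readers` and its parameters are hypothetical names, and the sketch does not flush client or server caches, which the benchmark in the talk requires.

```python
import threading
import time


def run_concurrent_readers(paths, block_size=8192):
    """Read every file in `paths` concurrently.

    Returns (total_seconds, per_reader_seconds). The total covers the
    span from starting the first reader to the last reader finishing.
    """
    per_reader = [0.0] * len(paths)

    def reader(i, path):
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_size):
                pass
        per_reader[i] = time.perf_counter() - start

    threads = [
        threading.Thread(target=reader, args=(i, p)) for i, p in enumerate(paths)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = time.perf_counter() - start
    return total, per_reader
```

Reporting `total` rather than the individual entries of `per_reader` keeps scheduler unfairness between readers from dominating the result.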

Tagged Command Queues
SCSI drives have tagged command queues.
  – Disk requests are sent to the drive as soon as they reach the front of the scheduler queue.
  – The drive schedules the requests according to its own scheduling algorithm.
For our benchmarks and hardware:
  – Tagged command queues increase fairness.
  – Unfortunately, throughput is reduced (almost 50% in the worst case).

Back to the Experiments…
Q: What is the potential for improvement in the read-ahead algorithm?
  – Compare the default system to AlwaysReadAhead, a system that always does as much read-ahead as it can.
A: There is benefit when the degree of concurrency is high and requests arrive out of order.

The SlowDown Heuristic
Default Heuristic
  If the access is sequential relative to the previous access:
    seqCount++
  else
    seqCount = small const
SlowDown Heuristic
  If the access is sequential relative to the previous access:
    seqCount++
  else if the access is “close” to the previous access:
    seqCount is unchanged
  else
    seqCount = seqCount / 2
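The two update rules can be compared on a slightly reordered stream with a short sketch. The slide does not give the value of the "small const" or the threshold for "close", so `reset_value=1` and `close_window=8` are placeholders chosen here, as is the one-block-per-request model.

```python
def default_update(seq_count, prev_end, offset, reset_value=1):
    """Default heuristic: any non-sequential access resets the score."""
    if offset == prev_end:
        return seq_count + 1
    return reset_value  # the "small const" (assumed value)


def slowdown_update(seq_count, prev_end, offset, close_window=8):
    """SlowDown: keep the score for near misses, halve it for far jumps."""
    if offset == prev_end:
        return seq_count + 1
    if abs(offset - prev_end) <= close_window:  # "close" (assumed threshold)
        return seq_count
    return seq_count // 2


def score_stream(offsets, update, start=1):
    """Feed block numbers (one block per request) through an update rule."""
    seq_count = start
    prev_end = offsets[0]
    for off in offsets:
        seq_count = update(seq_count, prev_end, off)
        prev_end = off + 1
    return seq_count


reordered = [0, 1, 2, 4, 3, 5, 6, 7]  # blocks 3 and 4 arrive swapped
print(score_stream(reordered, default_update))   # 3
print(score_stream(reordered, slowdown_update))  # 6
```

With these placeholder values, one swapped pair costs the default rule most of its accumulated score, while SlowDown keeps counting.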

Why Doesn’t SlowDown Help?
The problem is not SlowDown.
In FreeBSD, the sequentiality scores are stored in a fixed-size hash table.
When the table is full, adding a new entry forces the ejection of another.
The hash table is too small to support more than a few readers.
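A toy model shows why a small table defeats any scoring heuristic. It is not the FreeBSD data structure: the least-recently-used ejection policy, the capacity of 8, and the names `SeqTable` and `final_scores` are all assumptions made for this sketch.

```python
from collections import OrderedDict


class SeqTable:
    """Fixed-capacity table of per-file sequentiality scores (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # file id -> seqCount, oldest first

    def touch(self, file_id):
        """Record one sequential read of `file_id`; return its new score."""
        if file_id in self.entries:
            self.entries.move_to_end(file_id)
            self.entries[file_id] += 1
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # eject least recently used
            self.entries[file_id] = 1
        return self.entries[file_id]


def final_scores(num_readers, capacity, reads_per_reader=100):
    """Readers take turns reading sequentially; return each one's last score."""
    table = SeqTable(capacity)
    last = {}
    for _ in range(reads_per_reader):
        for reader in range(num_readers):
            last[reader] = table.touch(reader)
    return last


print(set(final_scores(4, 8).values()))   # {100}: every reader keeps its entry
print(set(final_scores(16, 8).values()))  # {1}: every entry is ejected before reuse
```

Once the readers outnumber the slots in this model, each entry is ejected before its reader returns, so every request looks like the first access to a file and no read-ahead score ever builds up.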

The Effect of Increasing the Table Size
Increasing the hash table size makes SlowDown as fast as AlwaysReadAhead.
Fixing the table also makes the default algorithm as fast as AlwaysReadAhead.
  – For our current testbed, it is enough simply to have a reasonable value for seqCount.
  – Perhaps in the future having a more accurate value will become important.

Improving Non-Sequential Reads
Some read patterns are non-sequential, but do contain sequential components.
One example is two threads reading sequentially from the same file:
  – Thread 1 reads blocks 0, 1, 2, 3, 4 …
  – Thread 2 reads blocks 1000, 1001, 1002, 1003 …
  – Server sees 0, 1000, 1, 1001, 2, 1002, 3, 1003 …
This pattern is not sequential according to the default or SlowDown read-ahead heuristics.

Using Cursors to Find Components
For each active file, maintain a set of cursors.
  – Each cursor is a position and sequentiality score.
For each read access to the file, choose the cursor with the closest position:
  – If there is no “close” cursor, create one.
  – If there are already too many cursors for this file, eject the least recently used.
  – Update the sequentiality score for the cursor.
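The steps above can be sketched for a single file as follows. The limits `max_cursors=4` and `close_window=8`, the one-block-per-request model, and the rule that a close but non-sequential access leaves the score unchanged are assumptions for illustration, not values taken from the talk.

```python
class CursorSet:
    """Per-file set of cursors, each a [next_block, seq_count] pair (sketch)."""

    def __init__(self, max_cursors=4, close_window=8):
        self.max_cursors = max_cursors
        self.close_window = close_window
        self.cursors = []  # least recently used first

    def access(self, block):
        """Record a one-block read; return the score of the cursor used."""
        best_i = None
        for i, (next_block, _) in enumerate(self.cursors):
            dist = abs(block - next_block)
            if dist <= self.close_window and (
                best_i is None or dist < abs(block - self.cursors[best_i][0])
            ):
                best_i = i
        if best_i is None:
            # No "close" cursor: create one, ejecting the LRU cursor if needed.
            if len(self.cursors) >= self.max_cursors:
                self.cursors.pop(0)
            cursor = [block, 0]
        else:
            cursor = self.cursors.pop(best_i)
        if block == cursor[0]:
            cursor[1] += 1  # exactly sequential for this cursor
        cursor[0] = block + 1
        self.cursors.append(cursor)  # now the most recently used
        return cursor[1]


cs = CursorSet()
print([cs.access(b) for b in [0, 1000, 1, 1001, 2, 1002, 3, 1003]])
# [1, 1, 2, 2, 3, 3, 4, 4]
```

On the interleaved two-thread pattern, the sketch ends up with two cursors whose scores both grow steadily, which is the behaviour the default and SlowDown heuristics cannot provide for this stream.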

Conclusions
The SlowDown heuristic does not help much, at least not for our system.
  – Fixing the hash table does help.
Cursors work well for access patterns that are the composition of sequential access patterns.
Benchmarking is hard, even for simple changes.

Obtaining Our Code
Daniel Ellard