Rethink the Sync
Ed Nightingale, Kaushik Veeraraghavan, Peter Chen, Jason Flinn
University of Michigan

Problem
Asynchronous I/O is a poor abstraction for:
- Reliability
- Ordering
- Durability
- Ease of programming
Synchronous I/O is superior but 100x slower: the caller is blocked until the operation is complete. (A sketch of the two models follows.)
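A minimal sketch of the two models on POSIX (filenames are illustrative, error checking omitted). With O_SYNC, write() blocks until the data is durable, as if each write were followed by fsync(); without it, write() returns as soon as the kernel has buffered the data. Note the caveat on the next slide: "durable" may still mean only the disk's volatile cache.

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char *msg = "important data\n";

        /* Synchronous: each write blocks until the data reaches
         * stable storage, so the caller pays the full disk latency. */
        int fd_sync = open("log.sync", O_WRONLY | O_CREAT | O_SYNC, 0644);
        write(fd_sync, msg, strlen(msg));   /* blocks on disk I/O */
        close(fd_sync);

        /* Asynchronous: write returns once the data is in the page
         * cache; the kernel writes it back later, so a crash before
         * writeback loses the data. */
        int fd_async = open("log.async", O_WRONLY | O_CREAT, 0644);
        write(fd_async, msg, strlen(msg));  /* returns immediately */
        close(fd_async);
        return 0;
    }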

Solution
Synchronous I/O can be fast.
- New model for synchronous I/O: external synchrony
- Same guarantees as synchronous I/O
- Only 8% slower than asynchronous I/O

When a sync() is really async
On sync(), data is often written only to the disk's volatile cache: a 10x performance penalty, and the data is still NOT safe. [Diagram: the operating system writes to the disk's volatile cache, which later drains to the cylinders.] If the volatile cache is disabled, synchronous I/O is 100x slower than asynchronous I/O.

To whom are guarantees provided?
Synchronous I/O definition: the caller is blocked until the operation completes. [Diagram: applications sit above the OS kernel, which sits above the disk, screen, and network.] The guarantee is provided to the application.

To whom are guarantees provided?
[Same diagram.] The guarantee is really provided to the user, who observes the disk, screen, and network.

Providing the user a guarantee
The user observes that an operation has completed, and may examine the screen, network, disk… The guarantee provided by synchronous I/O: data is durable when the operation is observed to complete. To observe output, it must be externally visible, i.e., visible on an external device.

Why do applications block?
[Diagram: the applications and OS kernel are internal to the system; the disk, screen, and network are external.] If we treat the application as external, we must block on the syscall. But the application is internal, so there is no need to block.

A new model of synchronous I/O
Provide the guarantee directly to the user, rather than via the application. We call this externally synchronous I/O: it is indistinguishable from traditional synchronous I/O to the user, yet approaches the speed of asynchronous I/O.

Example: Synchronous I/O

    write(buf_1);
    write(buf_2);
    print("work done");
    foo();

With synchronous I/O, the application blocks on each write() until the data reaches the disk; only after both writes complete does "work done" appear on the screen.

Observing synchronous I/O

    write(buf_1);
    write(buf_2);        /* depends on 1st write */
    print("work done");  /* depends on 1st & 2nd write */
    foo();

Synchronous I/O externalizes output according to causal ordering, and it enforces that ordering by blocking the application. External synchrony provides the same causal ordering without blocking applications.

Example: External synchrony

    write(buf_1);
    write(buf_2);
    print("work done");
    foo();

With external synchrony, the application does not block: the OS commits the writes asynchronously and buffers the "work done" output, releasing it to the screen only after the writes it depends on are durable.

Tracking causal dependencies
Applications may communicate via IPC (sockets, pipes, FIFOs, etc.), so dependencies must be propagated through IPC. We build upon Speculator [SOSP '05] to:
- Track and propagate causal dependencies
- Buffer output to the screen and network

Tracking causal dependencies

    Process 1:              Process 2:
    print("hello");         write(file1);
    read(file1);            do_something();
    print("world");

"hello" has no dependencies, so it is externalized immediately. Process 1's read of file1 makes it dependent on Process 2's uncommitted write, so "world" is buffered and released to the screen only after that write commits to disk. (A sketch of this bookkeeping follows.)
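A minimal sketch of this mechanism, with hypothetical types and hooks; Speculator's actual in-kernel implementation differs in detail. Each process and each buffered output carries the set of uncommitted journal transactions it depends on:

    #include <stdint.h>

    #define MAX_DEPS 64

    struct dep_set {
        uint64_t txn_ids[MAX_DEPS];  /* uncommitted transactions */
        int      count;
    };

    /* Hypothetical hook: when process p reads data written by process
     * q over a pipe or socket, p inherits q's dependencies. */
    void propagate_on_ipc(struct dep_set *p, const struct dep_set *q) {
        for (int i = 0; i < q->count && p->count < MAX_DEPS; i++)
            p->txn_ids[p->count++] = q->txn_ids[i];
    }

    /* Hypothetical check: buffered screen/network output may be
     * externalized only once all of its transactions have committed. */
    int can_externalize(const struct dep_set *out,
                        int (*committed)(uint64_t txn_id)) {
        for (int i = 0; i < out->count; i++)
            if (!committed(out->txn_ids[i]))
                return 0;
        return 1;
    }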

Output-triggered commits
Maximize throughput until output is buffered; when output is buffered, trigger a commit. This way, latency is minimized only when it matters, i.e., when a user is waiting on visible output. (A sketch follows.)
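Continuing the sketch above (the journal hook is hypothetical): buffering user-visible output is the signal that latency now matters, so the transactions that output depends on are committed immediately instead of waiting for the periodic batch flush.

    #include <stdint.h>

    typedef void (*commit_fn)(uint64_t txn_id);

    /* Hypothetical hook, called when output is buffered: request an
     * early commit of every transaction the output depends on. */
    void on_output_buffered(const uint64_t *dep_txns, int ndeps,
                            commit_fn request_commit) {
        for (int i = 0; i < ndeps; i++)
            request_commit(dep_txns[i]);
    }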

Evaluation
Implemented an externally synchronous file system, xsyncfs:
- Based on the ext3 file system
- Uses journaling to preserve the order of writes
- Uses write barriers to flush the volatile cache
Compared xsyncfs to three other file-system configurations: default asynchronous ext3, default synchronous ext3, and synchronous ext3 with write barriers (illustrative mount options below).
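For concreteness, the three ext3 configurations correspond roughly to the following mount options (device and mount point are hypothetical; the paper does not give its exact commands):

    mount -t ext3 /dev/sda1 /mnt                    # default: asynchronous
    mount -t ext3 -o sync /dev/sda1 /mnt            # synchronous
    mount -t ext3 -o sync,barrier=1 /dev/sda1 /mnt  # synchronous + write barriers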

When is data safe?

    File system configuration      | Durable on write()?   | Durable on fsync()?
    Asynchronous                   | No                    | Not on power failure
    Synchronous                    | Not on power failure  | Not on power failure
    Synchronous w/ write barriers  | Yes                   | Yes
    External synchrony             | Yes                   | Yes

PostMark benchmark
[Throughput chart omitted.] Xsyncfs is within 7% of ext3 mounted asynchronously.

The MySQL benchmark
[Throughput chart omitted.] Xsyncfs can group commit transactions from a single client.

SPECweb99 throughput
[Throughput chart omitted.] Xsyncfs is within 8% of ext3 mounted asynchronously.

SPECweb99 latency

    Request size   | ext3-async      | xsyncfs
    0-1 KB         | 0.064 seconds   | 0.097 seconds
    1-10 KB        | 0.150 seconds   | 0.180 seconds
    10-100 KB      | 1.084 seconds   | 1.094 seconds
    100-1000 KB    | 10.253 seconds  | 10.072 seconds

Xsyncfs adds no more than 33 ms of delay.

Conclusion
Synchronous I/O can be fast: external synchrony performs within 8% of asynchronous I/O.
Questions?