SPECULATIVE EXECUTION IN A DISTRIBUTED FILE SYSTEM E. B. Nightingale, P. M. Chen, J. Flinn, University of Michigan

Motivation
Distributed file systems are often much slower than local file systems
– Due to synchronous operations required for cache coherence and data safety
– Even true for file systems that weaken consistency and safety guarantees
    Close-to-open consistency for AFS and most versions of NFS

A better solution
Most of these synchronous operations have predictable outcomes
– We can bet on the outcome and let the client process go forward (speculation)
    Makes the operation asynchronous
– Must first take a checkpoint of the process
    Can restart the operation if the speculation fails

Why it works
1. Clients can correctly predict the outcome of many operations
    Few concurrent accesses to files
2. Time to take a lightweight checkpoint is often less than network round-trip time
    52 µs for a small process thanks to copy-on-write
3. Most clients have free cycles

Speculator
File system controls when speculations start, succeed and fail
Speculator provides a mechanism to ensure correct execution of speculative code
No application changes are required
Speculative state is never visible from the outside

Correctness rules (I)
A process that executes in speculative mode cannot externalize output
– Speculator blocks the process if it tries to do so
Speculator tracks causal dependencies between kernel objects
– Kernel objects modified by a speculative process will be put in a speculative state

Correctness rules (II)
Speculator tracks causal dependencies between processes
– Processes receiving a message or a signal from a speculative process will be checkpointed and become speculative
In case of doubt, Speculator will block the execution of the speculative process

An example: conventional NFS

Linux NFSv3 implements close-to-open consistency
– At close time, client sends to server:
    1. Asynchronous write calls with the modified data
    2. A synchronous commit call once it has received replies for all write calls

An example: SpecNFS

All calls are non-blocking but force the calling process to become speculative
If a call returns an unexpected result, the calling process is rolled back to its checkpoint and the call is executed again
– A new speculation starts
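The speculate-then-verify pattern on this slide can be sketched in user space. This is a toy model, not SpecNFS code: `speculative_call`, `do_remote_call` and `continue_with` are hypothetical names, and `copy.deepcopy` stands in for the copy-on-write checkpoint.

```python
import copy

def speculative_call(state, predicted, do_remote_call, continue_with):
    """Continue with a predicted result; roll back and redo on a misprediction."""
    checkpoint = copy.deepcopy(state)   # stand-in for a copy-on-write checkpoint
    continue_with(state, predicted)     # go forward without waiting for the server
    actual = do_remote_call()           # the real reply arrives later
    if actual == predicted:
        return state, False             # speculation succeeded, nothing to undo
    state = checkpoint                  # speculation failed: restore the checkpoint
    continue_with(state, actual)        # execute again with the real result
    return state, True                  # True means a rollback happened
```

In this model the work done after a wrong guess is simply discarded, which is why speculative state must not become visible outside the process before the reply arrives.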

Speculation interface
Three new system calls:
– create_speculation(): Returns a unique spec_id and a list of previous speculations on which the speculation depends
– commit_speculation(spec_id)
– fail_speculation(spec_id)
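A minimal user-space model of these three calls, for one process. The class `SpeculationTable` and its bookkeeping are assumptions made for illustration. In particular, the rule that a failure also discards every later speculation is a simplification chosen for this sketch, not a statement about the kernel implementation.

```python
import itertools

class SpeculationTable:
    """Toy model of the speculation interface for a single process."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.active = []                 # unresolved speculation ids, oldest first

    def create_speculation(self):
        deps = list(self.active)         # earlier speculations this one depends on
        spec_id = next(self._ids)
        self.active.append(spec_id)
        return spec_id, deps

    def commit_speculation(self, spec_id):
        self.active.remove(spec_id)      # the prediction turned out to be right

    def fail_speculation(self, spec_id):
        # Assumption: everything started at or after the failed speculation
        # depends on it, so all of it is discarded together.
        idx = self.active.index(spec_id)
        discarded = self.active[idx:]
        del self.active[idx:]
        return discarded
```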

Implementing checkpoints
Checkpoints are implemented through copy-on-write fork
– Speculator also saves the state of any open file descriptor and copies all pending signals
Forked child is not placed on the ready queue
– It just waits
If speculation fails, forked child assumes the identity of the failed parent

New kernel structures
Speculation structure:
– Created during create_speculation()
– Tracks the set of kernel objects that depend on the speculation
Undo log:
– Associated with each kernel object that has a speculative state
– Ordered list of speculative modifications
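The undo log can be pictured as an ordered list of (speculation, previous value) pairs attached to an object. The sketch below is a hypothetical user-space model: `SpeculativeObject` and its methods are invented names, and `commit` assumes speculations resolve oldest first.

```python
class SpeculativeObject:
    """Toy kernel object carrying an ordered undo log."""

    def __init__(self, value):
        self.value = value
        self.undo_log = []               # (spec_id, previous value), oldest first

    def write(self, spec_id, new_value):
        self.undo_log.append((spec_id, self.value))
        self.value = new_value

    def commit(self, spec_id):
        # Assumes oldest-first resolution: drop the entries of the committed speculation.
        self.undo_log = [(s, v) for s, v in self.undo_log if s != spec_id]

    def fail(self, spec_id):
        # Restore the value from before the first write made under spec_id;
        # later entries were built on that write, so they are dropped too.
        for i, (s, old) in enumerate(self.undo_log):
            if s == spec_id:
                self.value = old
                del self.undo_log[i:]
                return
```

Keeping the log ordered is what lets a failure undo exactly the modifications made since the failed speculation began.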

Sharing checkpoints
Letting successive speculations share the same checkpoint reduces the speculation overhead
Two limitations
– Speculator limits the amount of rollback work by not letting a speculation share a checkpoint that is more than 500 ms old
– Cannot let a speculation share a checkpoint with a previous speculation that changes the state of the file system
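The two limitations amount to a simple predicate. The function and parameter names below are hypothetical; only the 500 ms bound and the file-system-state rule come from the slide.

```python
MAX_SHARED_CHECKPOINT_AGE_MS = 500   # bound stated on the slide

def can_share_checkpoint(checkpoint_age_ms, previous_changed_fs_state):
    """Return True if a new speculation may reuse the previous checkpoint."""
    if checkpoint_age_ms > MAX_SHARED_CHECKPOINT_AGE_MS:
        return False                 # too much work would be redone on rollback
    if previous_changed_fs_state:
        return False                 # previous speculation changed file system state
    return True
```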

Correctness invariants
1. Speculative state should never be visible to the user or to any external device
– Speculator prevents all speculative processes from externalizing output to any interface
2. A process should never view speculative state unless it is already speculatively dependent upon that state

Invariant implementations (I)
First implementation: Block speculative processes whenever they try to perform a system call
– Always correct
– Limits the amount of work that can be done by a process in a speculative state

Invariant implementations (II)
Second implementation: Allow speculative processes to perform system calls that
– Do not modify state
    "Read-only" calls such as getpid()
– Only modify state that is private to the calling process
    It will be rolled back if speculation fails

Invariant implementations (III)
Third implementation: Allow speculative processes to perform operations on files in speculative file systems
– With VFS, can have multiple file systems on the same machine
    Typically NFS plus FFS or ext3
Must check type of file system
– Have a special bit in superblock
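Taken together, the three implementations describe a policy for deciding whether a system call may proceed. The sketch below is illustrative only: the call sets are assumptions chosen for the example (only getpid() is named on the slides), and `may_proceed` is a hypothetical function, not part of Speculator.

```python
# Illustrative call sets; assumptions, not Speculator's actual tables.
READ_ONLY_CALLS = {"getpid"}               # do not modify any state
PRIVATE_STATE_CALLS = {"dup2"}             # assumed to touch only per-process state
FILE_CALLS = {"read", "write", "mkdir"}    # operate on files in some file system

def may_proceed(syscall, is_speculative, fs_is_speculative=False):
    """Return True if the call may run now, False if the process must block."""
    if not is_speculative:
        return True                        # ordinary processes are never restricted
    if syscall in READ_ONLY_CALLS or syscall in PRIVATE_STATE_CALLS:
        return True                        # second implementation
    if syscall in FILE_CALLS and fs_is_speculative:
        return True                        # third implementation
    return False                           # first implementation: block by default
```

Blocking by default keeps the policy safe: any call not known to be harmless is treated as one that could externalize output.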

Multiprocess speculation (I)
Whenever a speculative process P participates in interprocess communication with a process Q
– Process Q must become speculatively dependent on the speculative state of process P and get checkpointed

Multiprocess speculation (II)
Whenever a speculative process P modifies an object X
– Object X must become speculatively dependent on the speculative state of process P and get an undo list
You are not responsible for the implementation details
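Both propagation rules can be modeled with sets of speculation ids. This is a rough sketch with invented names (`Process`, `KernelObject`, `send`, `modify`); a counter stands in for taking a real checkpoint.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.deps = set()        # speculation ids this process depends on
        self.checkpoints = 0     # count of checkpoints taken (stand-in for real ones)

class KernelObject:
    def __init__(self, value):
        self.value = value
        self.deps = set()        # speculation ids this object depends on
        self.undo = []           # previous values, oldest first

def send(sender, receiver):
    """IPC from sender to receiver: the receiver inherits new dependencies."""
    new_deps = sender.deps - receiver.deps
    if new_deps:
        receiver.checkpoints += 1        # checkpoint before becoming dependent
        receiver.deps |= new_deps

def modify(proc, obj, new_value):
    """A write by proc: a speculative writer makes the object speculative."""
    if proc.deps:
        obj.undo.append(obj.value)       # remember the value to restore on failure
        obj.deps |= proc.deps
    obj.value = new_value
```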

Performance: PostMark benchmark

SpecNFS is
– 2.5 times faster than NFS with no latency between client and server
– 41 times faster than NFS with a 30 ms round-trip time delay between client and server
A version of BlueFS providing single-copy semantics is 49 times faster than NFS with the same 30 ms round-trip time delay

Performance: Apache benchmark

Building the Apache server from a tar file
SpecNFS is
– 2 times faster than NFS with no latency between client and server
– 14 times faster than NFS with a 30 ms round-trip time delay between client and server
– Always better than BlueFS and Coda

Performance: impact of rollbacks

Repeated the Apache benchmark while marking a varying fraction of the files out-of-date
– Will result in speculation failures
– Percentage of out-of-date files has little impact on SpecNFS performance

Performance: other

Impact of group commits and sharing state
– Mostly affects BlueFS
    When speculative processes cannot propagate their state, BlueFS performs worse than NFS with no latency between client and server
    Impact magnified at 30 ms latency

Conclusion
Speculation enables the development of distributed file systems that are
– Safe
– Consistent
– Fast
Generic kernel support for speculative execution and causal dependency tracking could have many other applications