Toward Common Patterns for Distributed, Concurrent, Fault-Tolerant Code
Ryan Stutsman and John Ousterhout
Stanford University



Introduction
More developers writing more code that is distributed, concurrent, fault-tolerant (DCFT)
Hard to get right:
  1000s of logical threads of execution
  Failures require highly adaptive control flow
No commonly accepted patterns: threads versus events, but then what?
A pattern from our experiences: “rules”
  Small steps whose execution order is based on state
  Potential for correct code more quickly
Fumbling in the dark; interested in others’ ideas

DCFT Code Examples
HDFS chunk replication
Coordinating Hadoop jobs
RAMCloud:
  Replicate 1000s of chunks across 1000s of servers
  Coordinate 1000s of servers working to recover a failed server
  Coordinate many ongoing recoveries at the same time

Example: Distributed Log Replication
Segmented log replicates client data
Failures require recreation of lost segments
Failures can occur at any time
[Figure: a master server handles a client write request, appending to in-memory log segments that are replicated to stable storage servers]

This Type of Code is Hard
Traditional imperative programming doesn’t work
Must “go back” after failures
Result: spaghetti code, brittle, buggy
PC doesn’t matter, only state matters

Our Approach: Rules
Rule: condition + action
  If unreplicated data and no RPC outstanding and prior segment footer is replicated
  Then start write RPC containing unreplicated data
[Figure: state variables feed rule conditions, which trigger actions]

Our Approach: Rules
Execution order determined by state
Actions are short, non-blocking, atomic
Failures handled between actions, not within
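A minimal C++ sketch of the write rule above, assuming a hypothetical ReplicaState holding the state variables the rule reads and a stubbed startWriteRpc() helper; none of these names are from the actual RAMCloud code:

```cpp
#include <cstddef>
#include <memory>

struct WriteRpc {};  // stand-in for an outstanding replication RPC (hypothetical)

// Hypothetical state variables for one segment replica.
struct ReplicaState {
    std::size_t queuedBytes = 0;         // bytes appended to the segment so far
    std::size_t sentBytes = 0;           // bytes already sent to the backup
    bool priorFooterReplicated = false;  // prior segment's footer is durable
    std::unique_ptr<WriteRpc> writeRpc;  // non-null while an RPC is outstanding
};

// Stub: a real implementation would issue an asynchronous network RPC.
std::unique_ptr<WriteRpc> startWriteRpc(std::size_t offset, std::size_t length) {
    (void)offset;
    (void)length;
    return std::make_unique<WriteRpc>();
}

// One rule: a condition over state variables plus a short, non-blocking action.
void applyWriteRule(ReplicaState& s) {
    if (s.sentBytes < s.queuedBytes &&   // unreplicated data
        !s.writeRpc &&                   // no RPC outstanding
        s.priorFooterReplicated) {       // prior segment footer is replicated
        // Action: start the RPC and return immediately; a separate rule
        // reaps the completed RPC and advances sentBytes.
        s.writeRpc = startWriteRpc(s.sentBytes, s.queuedBytes - s.sentBytes);
    }
}
```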

Segment Replication Rules

1. Condition: no backup server selected
   Action: choose available server to hold replica
2. Condition: header unreplicated, no RPC outstanding
   Action: start RPC containing the header
3. Condition: header unreplicated, RPC completed
   Action: mark header replicated; mark prior segment to allow footer replication
4. Condition: unreplicated data, no RPC outstanding, prior footer is replicated
   Action: start write RPC containing up to 1 MB of unreplicated data
5. Condition: unreplicated data, RPC completed
   Action: mark sent data as replicated
6. Condition: segment finalized, following header replicated, footer not sent, no RPC outstanding
   Action: start RPC containing the footer
7. Condition: segment finalized, RPC completed
   Action: mark footer replicated; mark following segment to allow data replication

On failure: reset sent/replicated bytes and RPCs
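The failure footnote is where the pattern pays off: recovery is just a state reset, sketched below by continuing the hypothetical ReplicaState from the previous example (field names are illustrative, not RAMCloud's):

```cpp
// On a backup failure, reset the state variables so that the ordinary
// rules simply re-replicate the segment; no special "go back" control
// flow is needed. (A fuller sketch would also clear the chosen-server
// field so that rule 1 fires and picks a new backup.)
void handleBackupFailure(ReplicaState& s) {
    s.writeRpc.reset();  // abandon any outstanding RPC
    s.sentBytes = 0;     // per the footnote: reset sent/replicated bytes
}
```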

Structuring Rules
How should hundreds of rules be organized?
  Need modularity and clear visualization
How can rules be evaluated efficiently?
  Polling to test conditions for all rules won’t scale

One Approach
Used by HDFS chunk replication
Structures paired with a thread that polls to find work needed due to failure
State records queued between threads/actions
No modularity; code scattered across threads
[Figure: steps (Step 1, Step 2, Step 1-fail) as separate threads passing queues of state records]

One Approach
Complicates concurrency:
  CPU parallelism unneeded
  Need fine-grained locking on structures, but critical sections must be carefully broken up for liveness/responsiveness

Tasks
Task: rules, state, and a goal
  Similar to how monitors group locks with state
Implemented as a C++ object:
  State: fields of the object
  Rules: applied via virtual method on the object
  Goal: invariant the task is intended to attain/retain
Log segment replication:
  One task per segment
  Rules send/reap RPCs, reset state on failures
  Goal is met when 3 complete replicas are made
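A sketch of that shape, with a hypothetical Task interface (the real RAMCloud class may differ):

```cpp
// Hypothetical Task: state lives in the subclass's fields, rules are
// applied via a virtual method, and the goal is an invariant the task
// works to attain and then retain.
class Task {
public:
    virtual ~Task() = default;
    virtual void applyRules() = 0;     // evaluate conditions, fire actions
    virtual bool goalMet() const = 0;  // e.g., 3 complete replicas exist
};

// One task per log segment, as on the slide.
class SegmentReplicationTask : public Task {
public:
    void applyRules() override {
        // Rules 1-7 from the table: each checks the fields below and,
        // if its condition holds, performs one short non-blocking action
        // (start an RPC, reap a completed RPC, reset state on failure).
    }
    bool goalMet() const override { return completeReplicas >= 3; }

private:
    int completeReplicas = 0;
    // ... sentBytes, outstanding RPCs, and the other state variables ...
};
```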

Pools
Pool: applies rules to tasks with unmet goals
A pool for each independent set of tasks
Serializes execution of rules in each pool
  Simplifies synchronization
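A pool can be sketched as a loop over scheduled tasks, continuing the hypothetical Task interface above; the single-threaded pass over each pool is what serializes rule execution:

```cpp
#include <memory>
#include <vector>

// Hypothetical pool: repeatedly applies rules to tasks whose goals are
// unmet. Rules within one pool never run concurrently with each other,
// which is what simplifies synchronization.
class Pool {
public:
    void schedule(std::shared_ptr<Task> task) {
        tasks.push_back(std::move(task));
    }

    // One pass: let every task with an unmet goal make progress;
    // retire tasks whose goals are met.
    void runOnePass() {
        std::vector<std::shared_ptr<Task>> remaining;
        for (auto& task : tasks) {
            if (!task->goalMet()) {
                task->applyRules();
                remaining.push_back(task);
            }
        }
        tasks.swap(remaining);
    }

private:
    std::vector<std::shared_ptr<Task>> tasks;
};
```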

Benefits of Rules, Tasks, and Pools
Directly determines code structure
Eases local reasoning:
  Write a rule at a time
  Condition clearly documents expected state
  Actions are short with simple control flow
Rules are amenable to model checking, but less restrictive than modeling languages
Flexible; easy to abuse with gross performance hacks and odd concurrency needs
Question: fundamental, or just a mismatch of kernel scheduling/concurrency abstractions to the app?

Conclusion
Code pattern from our experiences: “rules”
  Small steps whose order is based on state
  Easy to adapt on failures: PC doesn’t matter, only state matters
In DCFT code non-linear flow is unavoidable
Interesting question: how to structure rules
  This is one way; we’d love to hear others


Isn’t this just state machines?
Explicit states explode or hide detail:
  Similar to code flowcharts of the 70s
  Mental model doesn’t scale well to complex code
Collate on state rather than on events+state:
  Convert all events to state
  Reason about next step based on state alone
Conditions (implicit states) serve as documentation:
  Provide strong hint about what steps are needed

Isn’t this just events?
Rules take actions based on state, rather than events+state:
  Event-based code: handler triggers all needed actions
  Rules-based code: events just modify state
Decouples events from rules that react to them:
  Event handler unaware of needed reactive steps
  Add reactions without modifying event handler
  Improves modularity
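The contrast in a short sketch, reusing the hypothetical ReplicaState from earlier and inventing stand-in helpers for the event-based reactions:

```cpp
// Hypothetical helpers standing in for reactive steps in the event style.
void abortRpc(ReplicaState&) {}
void chooseNewBackup(ReplicaState&) {}
void resendData(ReplicaState&) {}

// Event-based style: the handler must know about and trigger every
// reaction, so adding a reaction means editing this handler.
void onBackupFailedEvents(ReplicaState& s) {
    abortRpc(s);
    chooseNewBackup(s);
    resendData(s);
}

// Rules style: the handler only folds the event into state; the rules
// that watch this state carry out the reactions on a later pass. A new
// reaction is a new rule; this handler never changes.
void onBackupFailedRules(ReplicaState& s) {
    s.writeRpc.reset();
    s.sentBytes = 0;
}
```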

Don’t user-level threads solve this?
They help:
  Support 1000s of lightweight contexts
  Limit interruption to well-defined points (cooperatively scheduled)
Stack trace is still of limited benefit, though:
  Threads must recheck for failures after resuming
  Code devolves into small, non-blocking, atomic actions just as with rules

Isn’t this hard to debug?
Loss of stack context makes debugging hard
  Yes, but it would be lost even with threads
  Fundamental limitation of the need to break code into reactive, reorderable blocks
Best we’ve got so far:
  Dump state variables when a goal goes unmet for a long period
  Log aggressively for debugging; can add causality tracking to log messages
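A tiny sketch of the first aid, as an invented extension of the hypothetical Task interface above: track how long a goal has been unmet and dump the state variables when it exceeds a threshold.

```cpp
#include <chrono>
#include <iostream>

// Invented extension: a task that can print its state variables, plus a
// watchdog that dumps them when the goal has gone unmet for too long.
struct DebuggableTask : Task {
    virtual void dumpState(std::ostream& out) const = 0;
};

void watchGoal(const DebuggableTask& task,
               std::chrono::steady_clock::time_point goalUnmetSince,
               std::chrono::seconds limit) {
    if (!task.goalMet() &&
        std::chrono::steady_clock::now() - goalUnmetSince > limit) {
        // The state, not a stack trace, explains where the task is stuck.
        task.dumpState(std::cerr);
    }
}
```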