Algorithms for Synchronization and Consistency in Concurrent System Services Anders Gidenstam Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers University of Technology


Algorithms for Synchronization and Consistency in Concurrent System Services Anders Gidenstam Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers University of Technology

Anders Gidenstam, Chalmers 2 Research Overview Optimistic Synchronization Plausible Clocks Atomic Register NBmalloc: Lock-free Memory Allocation LFthreads: Lock-free Thread Library Lock-free Memory Reclamation Dynamic Cluster Management Causal Cluster Consistency

Anders Gidenstam, Chalmers 3 Outline Overview Sample of results Scalable information dissemination Causal Cluster Consistency Plausible Clocks Lock-free algorithms in system services Memory Management Threading and thread synchronization library Conclusions Future work

Anders Gidenstam, Chalmers 4 Causal Cluster Consistency Goal: Support Distributed Collaborative Applications Provide Consistency (order of updates matters) Scalable communication media Dynamically changing set of participants Focus: Group communication Propagate events (updates) to all interested processes Event delivery in causal order

Anders Gidenstam, Chalmers 9 Causal Cluster Consistency: Optimistic causally ordered delivery [figure: cluster of core processes A–F, each holding a timestamp vector over A–F] Basic algorithm: Vector clocks + queue of incoming events [Ahamad et al. 1996] Improvements: Delivery with high probability Limited per-user domain of interest: Nobody is interested in changing everything at once Often more observers than updaters Events have lifetimes/deadlines [Baldoni et al. 1998]
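The delivery condition of the basic vector-clock algorithm [Ahamad et al. 1996] can be sketched as follows. This is a conceptual model in Python, not the thesis implementation, and the function name is ours:

```python
def causally_ready(msg_ts, sender, local_ts):
    """Check whether an event timestamped msg_ts, sent by `sender`,
    is the next event deliverable in causal order at a process whose
    local timestamp vector is local_ts."""
    return all(
        ts == local_ts[k] + 1 if k == sender else ts <= local_ts[k]
        for k, ts in enumerate(msg_ts)
    )

# The first event from process 0 is deliverable at a fresh process...
assert causally_ready([1, 0, 0], 0, [0, 0, 0])
# ...but an event that causally depends on an unseen event is queued.
assert not causally_ready([1, 1, 0], 0, [0, 0, 0])
```

Events failing the check stay in the queue of incoming events until their causal predecessors have been delivered (or, with the improvements above, until their deadline expires).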

Anders Gidenstam, Chalmers 13 Event recovery & bounds Recovery buffer of observed events “Pull” missing events: request from k processes Analysis of the relation between the adversary’s queue Q and the buffer size: |Buffer| > 2npT => availability of an event w.h.p. > 1-(e/4)^(npT) If processes fail independently with probability q < k/(2n), requesting recovery from k processes yields a response w.h.p. (> 1-(e/4)^(nq)) [figure: timeline of Enq/Deq of events e and e’ against the size of Q]
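The "pull" recovery step can be sketched like this. All names are ours, and peers are modelled simply as objects carrying a recovery buffer of recently observed events:

```python
import random

class Peer:
    def __init__(self, buffer):
        self.buffer = buffer  # recovery buffer: event id -> event

def recover(missing_id, peers, k):
    """Request a missing event from k randomly chosen peers;
    return the first copy found, or None if no peer still has it."""
    for p in random.sample(peers, k):
        if missing_id in p.buffer:
            return p.buffer[missing_id]
    return None

# If enough non-faulty peers still buffer the event, the pull succeeds.
peers = [Peer({"e7": "payload"}) for _ in range(5)]
assert recover("e7", peers, k=3) == "payload"
```

The bound above says that with a buffer larger than 2npT, a requested event is still buffered somewhere with high probability, so such a pull returns the event w.h.p.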

Anders Gidenstam, Chalmers 14 Cluster Core Management [figure: core processes p1–p4 on a ring, each pointing to its successor] Fault-tolerant cluster management algorithm: Dynamic who-is-who in a smaller group (combined with dissemination) Assigns a unique id/ticket/entry to each core process Inspired by algorithms for distributed hash tables Tolerates a bounded number of failures: Process stop failures Communication failures Reclaims tickets (ids) from crashed processes

Anders Gidenstam, Chalmers 16 Cluster Management Algorithm [figure: p5 sends a CJOIN request to a coordinator on the ring] Recall: Each coordinator manages the tickets between itself and its successor Joining the core: Contact any coordinator Coordinator assigns a ticket Inform the other processes about the new core member

Anders Gidenstam, Chalmers 20 Cluster Management Algorithm: Dealing with Failures Failure detection: In each round, each process Sends ALIVE messages to its 2k+1 closest successors Needs to receive k+1 ALIVE messages from known predecessors to stay alive Failure handling: A process that suspects its successor to have failed runs the exclusion algorithm and contacts the next closest successor [figure: p5 sends an EXCLUDE request after suspecting its successor] Tolerates k failures in a 2k+1 neighbourhood.
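The per-round failure-detection rule can be sketched as follows; this is our own simplified model of the two conditions (who receives ALIVE, and when a process may consider itself alive), not the thesis protocol:

```python
def alive_targets(my_index, ring, k):
    """ALIVE messages go to the 2k+1 closest successors on the ring."""
    n = len(ring)
    return [ring[(my_index + i) % n] for i in range(1, 2 * k + 2)]

def stays_alive(alive_received, known_predecessors, k):
    """A process stays alive if at least k+1 of its known
    predecessors reported ALIVE this round."""
    return len(set(alive_received) & set(known_predecessors)) >= k + 1

ring = ["p1", "p2", "p3", "p4", "p5"]
# With k = 1, p1 sends ALIVE to its 3 closest successors...
assert alive_targets(0, ring, k=1) == ["p2", "p3", "p4"]
# ...and needs ALIVE from at least 2 known predecessors to stay alive.
assert stays_alive({"p4", "p5"}, {"p3", "p4", "p5"}, k=1)
assert not stays_alive({"p5"}, {"p3", "p4", "p5"}, k=1)
```

Since every process is monitored by 2k+1 predecessors and needs only k+1 of them, up to k failures in any 2k+1 neighbourhood are tolerated.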

Anders Gidenstam, Chalmers 21 Summary cluster consistency Causal Cluster Consistency Interesting for collaborative applications and environments Preserves optimistic causal order relations Good, predictable delivery guarantees Scalability Event Recovery Algorithm Can decrease delivery latency Good match with protocols providing delivery w.h.p. Fault-tolerant cluster management algorithm Can support scalable and reliable peer-to-peer services: Resource and parallelism control Good availability of tickets in the presence of failures Low message overhead

Anders Gidenstam, Chalmers 22 Outline Overview Sample of results Scalable information dissemination Causal Cluster Consistency Plausible Clocks Lock-free algorithms in system services Memory Management Threading and thread synchronization library Conclusions Future work

Anders Gidenstam, Chalmers 23 Memory Management for Concurrent Applications Applications want to Allocate memory dynamically Use it Return/deallocate it In concurrent applications Memory is a shared resource Memory requests are concurrent Potential problems: contention, blocking, lock convoys etc [figure: threads A, B, C accessing a lock-based service vs a lock-free service]

Anders Gidenstam, Chalmers 27 Memory Management for Concurrent Applications [figure: with a lock-based service, threads B and C wait while a slow thread A holds the lock; with a lock-free service they proceed] Why lock-free system services? Scalability/fault-tolerance potential Prevents a delayed thread from blocking other threads Scheduler decisions Page faults etc Many non-blocking algorithms use dynamic memory allocation => non-blocking memory management needed

Anders Gidenstam, Chalmers 28 The Lock-Free Memory Reclamation Problem Concurrent shared data structures with Dynamic use of shared memory Concurrent and overlapping operations by threads or processes Can nodes be deleted and reused safely? [figure: linked structure Base -> A -> B -> C]

Anders Gidenstam, Chalmers 29 ABC The Lock-Free Memory Reclamation Problem Thread X Base Local variables

Anders Gidenstam, Chalmers 30 ABC The Lock-Free Memory Reclamation Problem Thread X Base X has de-referenced the link (pointer) to B

Anders Gidenstam, Chalmers 31 ABC The Lock-Free Memory Reclamation Problem Thread X Base Another thread, Y, finds and deletes (removes) B from the active structure Thread Y

Anders Gidenstam, Chalmers 32 ABC The Lock-Free Memory Reclamation Problem Thread X Base Thread Y wants to reclaim(/free) B Thread Y ? Property I: A (de-)referenced node is not reclaimed

Anders Gidenstam, Chalmers 33 ACD The Lock-Free Memory Reclamation Problem Thread X Base ? The nodes B and C are deleted from the active structure. B Property II: Links in a (de-)referenced node should always be de-referencable.

Anders Gidenstam, Chalmers 34 The Lock-Free Memory Reclamation Problem Thread X Base A 1 D 2 B 1 C 1 Reference counting can guarantee Property I Property II But it needs to be lock-free!

Anders Gidenstam, Chalmers 35 Reference counting issues A slow thread might prevent reclamation Cyclic garbage Implementation practicality issues: The reference-count field MUST remain writable forever [Valois, Michael & Scott 1995] Needs double-word CAS [Detlefs et al. 2001] Needs double-width CAS [Herlihy et al. 2002] Large overhead Problems not fully addressed by previous solutions

Anders Gidenstam, Chalmers 37 Our approach – The basic idea Combine the best of Hazard pointers [Michael 2002] Tracks references from threads Fast de-reference Upper bound on the number of unreclaimed deleted nodes Compatible with standard memory allocators Reference counting Tracks references from links in shared memory Manages links within dynamic nodes Safe to traverse links (also) in deleted nodes [figure: thread X’s hazard pointers and deletion list over reference-counted nodes]
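The reclamation decision at the heart of the combined scheme can be sketched as: a node on a thread's deletion list is reclaimable only when no hazard pointer protects it and no link in shared memory still counts a reference to it. This is a simplified sequential model with names of our choosing, not the thesis code:

```python
def scan(deletion_list, hazard_pointers, refcount):
    """Partition a thread's deletion list into nodes that are safe to
    reclaim and nodes that must stay deleted-but-unreclaimed."""
    protected = set(hazard_pointers)
    reclaimable = [n for n in deletion_list
                   if n not in protected and refcount[n] == 0]
    remaining = [n for n in deletion_list if n not in reclaimable]
    return reclaimable, remaining

# B is protected by a hazard pointer, D is still referenced by a link
# (count 1); only C can be handed back to the memory allocator.
rec, rem = scan(["B", "C", "D"], ["B"], {"B": 0, "C": 0, "D": 1})
assert rec == ["C"] and rem == ["B", "D"]
```

Because each thread holds a bounded number of hazard pointers and link references, the `remaining` list (and hence the number of unreclaimed deleted nodes) stays bounded, as the next slide states.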

Anders Gidenstam, Chalmers 38 Bound on #unreclaimed nodes The total number of unreclaimable deleted nodes is bounded [figure: thread X’s deletion list (B, C, D) checked against thread Y’s hazard pointers; unprotected, unreferenced nodes are reclaimable]

Anders Gidenstam, Chalmers 40 Concurrent Memory Allocators Provide dynamic memory to the application Allocate / Deallocate interface Maintains a pool of memory (a.k.a. heap) Online problem – requests are handled at once and in order Performance goals: Scalability Avoiding false sharing (threads using data in the same cache line) Avoiding heap blowup (memory freed on one CPU is not made available to the others) Low fragmentation Low runtime overhead

Anders Gidenstam, Chalmers 41 NBmalloc architecture Based on the Hoard architecture [Berger et al, 2000] [figure: per-processor heaps, each holding superblocks (SB) organized by size-class] Avoiding false-sharing: Per-processor heaps Fixed set of allocatable block sizes/classes Avoiding heap blowup: Superblocks
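Serving allocations from a fixed set of size classes means rounding each request up to the smallest class that fits. A minimal sketch, with illustrative class sizes rather than NBmalloc's actual ones:

```python
SIZE_CLASSES = (8, 16, 32, 64, 128, 256, 512, 1024)

def size_class(request):
    """Round a request up to the smallest block size class that fits;
    a superblock of that class on the current processor heap serves it."""
    for c in SIZE_CLASSES:
        if request <= c:
            return c
    raise ValueError("request larger than the largest size class")

assert size_class(1) == 8
assert size_class(33) == 64
```

The fixed classes trade some internal fragmentation for simple, contention-free lookup of the right superblock in the per-processor heap.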

Anders Gidenstam, Chalmers 42 The lock-free challenge Finding and moving superblocks Within a per-processor heap Between per-processor heaps Unless “Remove + Insert” appears atomic an item may get stuck in “limbo”. New lock-free data structure: The flat-set. Stores superblocks Operations: Finding an item in a set Moving an item from one set to another atomically [figure: an item moving between two lock-free sets via Remove + Insert]

Anders Gidenstam, Chalmers 43 From Moving a shared pointer Issues One atomic CAS is not enough! We’ll need several steps. Any interfering threads need to help unfinished operations To New_pos From Old_pos - New_pos To Old_pos From New_pos To Old_pos - From

Anders Gidenstam, Chalmers 44 Experimental results Larson benchmark (emulates a multithreaded server application).

Anders Gidenstam, Chalmers 45 Summary memory management First lock-free memory reclamation algorithm ensuring Safety of local and global references An upper bound on the number of deleted but unreclaimed nodes Safe arbitrary reuse of reclaimed memory Lock-free memory allocator Scalable Behaves well on both UMA and NUMA architectures Lock-free flat-sets New lock-free data structure Allows lock-free inter-object operations Implementation freely available (GPL)

Anders Gidenstam, Chalmers 46 Conclusions and future work Presented algorithms for consistency in information dissemination services and for scalable memory management and synchronization Optimistic synchronization is feasible and aids scalability Some new research problems opened by this thesis: Use of plausible clocks in cluster consistency Other consistency models in the cluster consistency framework A generalized method for building lock-free composite objects from “smaller” lock-free objects Making other system services lock-free