CAFÉ: Scalable Task Pool with Adjustable Fairness and Contention
Dmitry Basin, Rui Fan, Idit Keidar, Ofer Kiselov, Dmitri Perelman
Technion, Israel Institute of Technology

Task Pools

- Exist in most server applications:
  - Web servers, e.g., as a building block of the SEDA architecture
  - Handling asynchronous requests
- A ubiquitous programming pattern for parallel programs
- Scalability is essential!

[Figure: producers insert tasks into a task pool in shared memory; consumers retrieve them]

Typical Implementation: FIFO Queue

- Has an inherent scalability problem: the queue's endpoints are contention points in shared memory
- Do we really need FIFO? In many cases, no!
- We would like to:
  - Relax the FIFO requirement
  - Control the degree of relaxation

CAFÉ: Contention and Fairness Explorer

- An ordered list of scalable, bounded, non-FIFO pools (TreeContainers)
- TreeContainer size controls the contention-fairness trade-off:
  - Taller trees: less fairness, less contention
  - Shorter trees: more fairness, more contention
  - Tree height 0: pure FIFO
- Empty TreeContainers are garbage collected by the Java VM

TreeContainer (TC) Specification

- A bounded container
- A put operation may fail if no free space is found
- A get operation returns a task, or null if the TC is empty
- Randomized algorithms for the put and get operations
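The contract above can be sketched as a small Java interface. The names below are illustrative assumptions, not the paper's actual API, and the array-backed class is only a trivial bounded stand-in used to demonstrate the put/get semantics; the real TC is a tree.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical TC contract: a bounded container whose put may fail.
interface TaskContainer<T> {
    boolean put(T task); // may return false when no free space is found
    T get();             // returns a task, or null when the container is empty
}

// Trivial bounded stand-in (illustration only; not the paper's tree).
class BoundedArrayContainer<T> implements TaskContainer<T> {
    private final AtomicReferenceArray<T> slots;

    BoundedArrayContainer(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    @Override public boolean put(T task) {
        for (int i = 0; i < slots.length(); i++)
            if (slots.compareAndSet(i, null, task)) return true; // occupied a free slot
        return false; // bounded: every slot is taken
    }

    @Override public T get() {
        for (int i = 0; i < slots.length(); i++) {
            T t = slots.get(i);
            if (t != null && slots.compareAndSet(i, t, null)) return t;
        }
        return null; // empty
    }
}
```

Note that, unlike a queue, nothing here promises FIFO order; that is exactly the relaxation CAFÉ exploits.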

TreeContainer Data Structure

- A complete binary tree; each node is in one of three states:
  - Free
  - Used (no longer holds a task)
  - Occupied (contains a task)
- Each node keeps routing metadata about its sub-trees, e.g., "the right sub-tree doesn't contain tasks; the left sub-tree has tasks"
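One possible node layout matching the description above; this is an assumption made for illustration, not the paper's actual data structure.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative tree-node sketch: three states plus routing metadata.
final class TreeNode<T> {
    enum State { FREE, USED, OCCUPIED }

    final AtomicReference<T> task = new AtomicReference<>();
    volatile State state = State.FREE;

    // Routing metadata consulted during navigation: does either
    // sub-tree still contain tasks?
    volatile boolean leftHasTasks;
    volatile boolean rightHasTasks;

    TreeNode<T> left, right;

    /** Occupy this node with a task; succeeds only if the node is free. */
    boolean occupy(T t) {
        if (!task.compareAndSet(null, t)) return false;
        state = State.OCCUPIED;
        return true;
    }

    /** Extract the task; the node becomes "used" rather than free. */
    T extract() {
        T t = task.getAndSet(null);
        if (t != null) state = State.USED;
        return t;
    }
}
```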

Get/Put Operations

- Step 1: find a target node by navigating the tree
- Step 2: perform the put/get on that node (need to handle races)
- Step 3: update routes, i.e., the metadata on the path to the node (tricky!)

Get() Operation

- Start from the root
- Step 1: navigate to a task by a random walk on the "arrows" graph (the routing metadata)
- Step 2: extract the task with a CAS
- Step 3: update routes up to the root (close to the root, updates are rare)
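The walk described above can be sketched as follows. This is a self-contained toy version under assumed names; the real algorithm also performs step 3 (updating routes on the path back to the root), which is omitted here.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;

// Toy random walk toward a task, guided by "has tasks" arrows.
final class ToyTree {
    static final class Node {
        final AtomicReference<String> task = new AtomicReference<>();
        Node left, right;
        volatile boolean leftHasTasks, rightHasTasks;
    }

    /** Step 1: random walk along arrows; step 2: extract the task with CAS. */
    static String get(Node n) {
        while (n != null) {
            String t = n.task.get();
            if (t != null && n.task.compareAndSet(t, null)) return t; // step 2
            boolean l = n.leftHasTasks, r = n.rightHasTasks;
            if (l && r) n = ThreadLocalRandom.current().nextBoolean() ? n.left : n.right;
            else if (l) n = n.left;
            else if (r) n = n.right;
            else return null; // no task reachable from this node
        }
        return null;
    }
}
```

The CAS handles races between consumers that reach the same occupied node: only one of them extracts the task.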

Put() Operation

- Every level of the tree is implemented by an array of nodes
- put() probes a random node at a level; if the chosen node is occupied, it tries another random node until it finds a free one

Put() Operation (cont.)

- Step 1: having found a free node, go to its highest free predecessor
- Step 2: occupy the free node with a CAS operation

Put() Operation (cont.)

- Step 3: update routes (close to the root, updates are rare)
- put() returns true

When the Put() Operation Fails

- If k random trials at the last level all hit occupied nodes, put() returns false
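The probing scheme from the put() slides above can be sketched on a single tree level. Climbing to the highest free predecessor and the route updates are simplified away; all names are illustrative assumptions, not the paper's code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;

// One tree level as an array of nodes; put() probes random slots with CAS
// and gives up (returns false) after k failed trials.
final class ToyLevel {
    private final AtomicReference<String>[] nodes;
    private final int k; // maximum number of random trials

    @SuppressWarnings("unchecked")
    ToyLevel(int width, int k) {
        nodes = new AtomicReference[width];
        for (int i = 0; i < width; i++) nodes[i] = new AtomicReference<>();
        this.k = k;
    }

    boolean put(String task) {
        for (int trial = 0; trial < k; trial++) {
            int i = ThreadLocalRandom.current().nextInt(nodes.length);
            if (nodes[i].compareAndSet(null, task)) return true; // occupied a free node
        }
        return false; // k trials hit occupied nodes; CAFÉ moves on to the next TC
    }
}
```

A failed put is what triggers CAFÉ to allocate and connect a new TreeContainer, as the later slides show.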

Races

- Concurrency issues are not trivial :)
- Challenge: guarantee linearizability while avoiding updating all the metadata up to the root upon each operation
- See the paper

TreeContainer Properties

- Put/get operations are linearizable and wait-free
- Under worst-case thread scheduling:
  - Good step complexity of puts: when N nodes are occupied, O(log² N) w.h.p., independent of the TC size
  - Good tree density: arbitrarily close to 2^h occupied nodes w.h.p. (for tree height h)

CAFÉ Data Structures

[Figure: an ordered list of TreeContainers, with a get pointer (GT) and a put pointer (PT) into the list]

CAFÉ: Put() Operation

- Call TC.Put(task) on the TC referenced by PT
- If it returns false, allocate and connect a new TC, advance PT, and retry until TC.Put(task) returns true

CAFÉ: Get() from an Empty TC

- If TC.Get on the TC referenced by GT returns null, advance GT and retry until TC.Get returns a task
- The bypassed empty TC is garbage collected
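The put and get paths through the TC list can be sketched as follows. This is a hypothetical single-threaded sketch under assumed names: the TC is stood in for by a trivial bounded container, and the lock-free/wait-free pointer movement and race handling from the following slides are omitted.

```java
// Ordered list of TCs with a producer pointer (PT) and a consumer pointer (GT).
final class ToyCafe {
    static final class TC {
        final String[] slots;
        int used;
        TC next; // link in the ordered list of TCs
        TC(int cap) { slots = new String[cap]; }
        boolean put(String t) {
            if (used == slots.length) return false; // bounded: TC is full
            slots[used++] = t;
            return true;
        }
        String get() { return used == 0 ? null : slots[--used]; }
    }

    private static final int CAPACITY = 2;
    private TC pt = new TC(CAPACITY); // PT: where producers insert
    private TC gt = pt;               // GT: where consumers extract

    void put(String task) {
        while (!pt.put(task)) {               // TC full: allocate and connect a new TC
            if (pt.next == null) pt.next = new TC(CAPACITY);
            pt = pt.next;                     // advance PT
        }
    }

    String get() {
        while (true) {
            String t = gt.get();
            if (t != null) return t;
            if (gt.next == null) return null; // whole pool is empty
            gt = gt.next;                     // empty TC: advance GT; old TC becomes garbage
        }
    }
}
```

Order is preserved among TCs (tasks in an earlier TC are consumed before GT moves past it), while order inside a TC is deliberately relaxed; that is the adjustable fairness knob.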

CAFÉ: Races

- A producer thread may be suspended in the middle of a put, while GT bypasses its TC
- When the producer resumes and inserts its task, the task is lost for consumers

CAFÉ: Handling Races – Try 1

- After a put, check whether GT bypassed the TC; if so, move GT back

CAFÉ: Races (2)

- A consumer thread sees TC.Get return null and is about to move GT forward
- Meanwhile, a producer thread completes TC.Put(task) on that TC (returning true); consumers can access the task and can terminate
- If GT then moves forward, the task is lost

CAFÉ: Handling Races – Try 2

- GT references two TCs, prev and cur: read prev, and if it is empty, read cur
- This scheme is lock-free; to make it wait-free we do additional tricks

CAFÉ: Properties

- Safety: put()/get() operations are linearizable
- Wait-freedom:
  - get() operations are deterministically wait-free
  - put() operations are wait-free with probability 1
- Fairness: preserves order among trees

Evaluation Setup

- Compared pools:
  - LBQ: Java 6 FIFO blocking queue
  - CLQ: Java 6 FIFO non-blocking queue (Michael & Scott)
  - EDQ: non-FIFO elimination-diffraction tree queue
- Evaluation server: 8 AMD Opteron quad-cores, 32 cores in total

CAFÉ Evaluation: Throughput

- CAFÉ-13: CAFÉ with tree height 13

[Figure: throughput as a function of the number of threads for CAFÉ-13, LBQ, CLQ, and EDQ; CAFÉ-13 outperforms the lock-free implementations by a factor of 30]

CAFÉ Evaluation: Throughput vs. Tree Height

[Figure: CAFÉ throughput as a function of TreeContainer height, with LBQ, CLQ, and EDQ as baselines]

CAFÉ Evaluation: CAS Failures

[Figure: CAS failures per operation as a function of TreeContainer height]

Summary

- CAFÉ is efficient and wait-free, with adjustable fairness and contention

Thank you