Work Stealing Scheduler (6/16/2010)

Announcements
- Textbooks
- Assignment 1
- Assignment 0 results
- Upcoming guest lectures

Recommended Textbooks
- The Art of Multiprocessor Programming, Maurice Herlihy and Nir Shavit
- Patterns for Parallel Programming, Timothy Mattson et al.
- Parallel Programming with Microsoft .NET

Available on Website
- Assignment 1 (due two weeks from now)
- Paper for Reading Assignment 1 (due next week)

Assignment 0
- Thanks for submitting
- Your submissions provided important feedback to us

Percentage of Students Answering Yes (chart)

Number of Yes Answers per Student (chart)

Upcoming Guest Lectures
- Apr 12: Todd Mytkowicz, How to (and not to) measure performance
- Apr 19: Shaz Qadeer, Correctness Specifications and Data Race Detection

Last Lecture Recap
- Parallel computation can be represented as a DAG
- Nodes represent sequential computation
- Edges represent dependencies
(Diagram: merge sort as a DAG of split, sequential sort, and merge nodes laid out over time)


This Lecture
- Design of a greedy scheduler
- Task abstraction
- Translating high-level abstractions to tasks
(Diagram: .NET programs, Intel TBB programs, … all feed into a common scheduler)

(Simple) Tasks
- A task is a node in the DAG
- Executing a task generates dependent subtasks
- Note: task C is generated by A or B, whichever finishes last
(Diagram: the merge-sort task DAG, Split(16), Split(8), Sort(4), Merge(8), Merge(16), with the final Merge task labeled C and its two predecessors labeled A and B)
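
To make the abstraction concrete, here is a minimal C++ sketch (not from the lecture; Task, pendingJoins, and run are hypothetical names). An atomic join counter captures the note above: whichever predecessor finishes last decrements the counter to zero and thereby generates the join task.

```cpp
#include <atomic>
#include <functional>
#include <vector>

// Hypothetical sketch of the task abstraction on this slide.
struct Task {
    std::function<void()> body;        // the sequential computation (a DAG node)
    std::vector<Task*> successors;     // outgoing dependency edges
    std::atomic<int> pendingJoins{0};  // predecessors that have not yet finished

    // Run the body, then release any successor whose last predecessor we are.
    void run(std::vector<Task*>& readyList) {
        body();
        for (Task* s : successors) {
            // fetch_sub returns the previous value: if it was 1, this task
            // finished last (like A or B above) and generates s (task C).
            if (s->pendingJoins.fetch_sub(1) == 1)
                readyList.push_back(s);
        }
    }
};
```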

Design Constraints of the Scheduler
- The DAG is generated dynamically, based on inputs and program control flow; the graph is not known ahead of time
- The amount of work done by a task is dynamic; the weight of each node is not known ahead of time
- The number of processors P can change at runtime; hardware processors are shared with other processes and the kernel

Design Requirements of the Scheduler
- Should be greedy: a processor cannot be idle when tasks are pending
- Should limit communication between processors
- Should schedule related tasks on the same processor (tasks that are likely to access the same cache lines)

Attempt 0: Centralized Scheduler
“Manager distributes tasks to others”
- Manager: assigns tasks to workers, ensures no worker is idle
- Workers: on task completion, submit generated tasks to the manager

Attempt 1: Centralized Work Queue
“All processors share a common work queue”
- Every processor dequeues a task from the work queue
- On task completion, it enqueues the generated tasks to the work queue
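
A minimal sketch of this attempt (not from the lecture; CentralQueue, worker, and execute are hypothetical names). Because every push and pop goes through one lock-protected queue, the queue itself becomes the point of contention as the number of processors grows:

```cpp
#include <deque>
#include <mutex>
#include <vector>

struct Task;                                           // as sketched above
void execute(Task* t, std::vector<Task*>& generated);  // hypothetical

// One queue shared by all processors: every enqueue/dequeue takes the
// same lock, so the queue becomes the communication bottleneck.
class CentralQueue {
    std::mutex m;
    std::deque<Task*> q;
public:
    void enqueue(Task* t) { std::lock_guard<std::mutex> g(m); q.push_back(t); }
    Task* dequeue() {
        std::lock_guard<std::mutex> g(m);
        if (q.empty()) return nullptr;
        Task* t = q.front(); q.pop_front();
        return t;
    }
};

// Every processor runs this same loop against the single shared queue.
void worker(CentralQueue& shared) {
    while (Task* t = shared.dequeue()) {   // contends with all other workers
        std::vector<Task*> generated;
        execute(t, generated);
        for (Task* g : generated) shared.enqueue(g);
    }
}
```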

Attempt 2: Work Sharing
“Loaded workers share”
- Every processor pushes and pops tasks on a local work queue
- When the local work queue gets large, it sends tasks to other processors
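
A rough sketch of the sharing rule (hypothetical names: THRESHOLD, sendTo, pickTarget). Note who pays the cost: the busy processor itself does the offloading work, which is exactly what the next slide criticizes:

```cpp
#include <cstddef>
#include <deque>

struct Task;
void sendTo(int target, Task* t);      // hypothetical communication primitive
int  pickTarget(int self);             // hypothetical: choose another processor

constexpr std::size_t THRESHOLD = 64;  // hypothetical "queue gets large" limit

// Called by processor `self` after it pushes newly generated tasks.
void maybeShare(int self, std::deque<Task*>& local) {
    // The *busy* processor performs the communication, whether or not
    // any other processor actually needs the tasks.
    while (local.size() > THRESHOLD) {
        Task* t = local.front();       // give away the oldest tasks
        local.pop_front();
        sendTo(pickTarget(self), t);
    }
}
```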

Disadvantages of Work Sharing
- If all processors are busy, each will spend time trying to offload: “perform communication when busy”
- It is difficult to know the load on processors: a processor with two large tasks might take longer than a processor with five small tasks
- Tasks might get shared multiple times before being executed
- Some processors can be idle while others are loaded: not greedy

Attempt 3: Work Stealing
“Idle workers steal”
- Each processor maintains a local work queue
- It pushes generated tasks into the local queue
- When the local queue is empty, it steals a task from another processor
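
A sketch of the per-processor loop (hypothetical names; termination detection omitted). It assumes the push/pop/steal queue interface described on the later slides: a processor executes tasks from its own queue when possible and steals from a randomly chosen victim only when idle:

```cpp
#include <cstdlib>
#include <vector>

struct Task;
void execute(Task* t, std::vector<Task*>& generated);  // hypothetical

// Interface of the per-processor queue (sketched after the queue slides below).
class WorkStealingQueue {
public:
    void  push(Task* t);  // local end
    Task* pop();          // local end; returns nullptr if empty
    Task* steal();        // remote end; returns nullptr if empty
};

// Each processor runs this loop; termination detection is omitted.
void worker(int self, std::vector<WorkStealingQueue>& queues) {
    for (;;) {
        Task* t = queues[self].pop();          // fast path: local work
        while (t == nullptr) {                 // local queue empty: steal
            int victim = std::rand() % static_cast<int>(queues.size());
            if (victim != self)
                t = queues[victim].steal();
        }
        std::vector<Task*> generated;
        execute(t, generated);
        for (Task* g : generated)
            queues[self].push(g);              // new work stays local
    }
}
```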


Work Stealing Queue Data Structure
A specialized deque (double-ended queue) with three operations:
- Push: the local processor adds newly created tasks
- Pop: the local processor removes a task to execute
- Steal: remote processors remove tasks
(Diagram: Push and Pop operate on one end of the deque, Steal on the opposite end)
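
For concreteness, here is a minimal lock-based sketch of such a deque. This is not the thread-safe implementation provided for Assignment 1; production deques (e.g. the Chase-Lev deque used by many work-stealing runtimes) avoid taking a lock on the common push/pop path.

```cpp
#include <deque>
#include <mutex>

struct Task;

// Simplified sketch: one lock keeps it short and obviously correct.
class WorkStealingQueue {
    std::mutex m;
    std::deque<Task*> dq;
public:
    // Local processor only: push and pop at the same end (LIFO),
    // so the most recently pushed task is executed first.
    void push(Task* t) { std::lock_guard<std::mutex> g(m); dq.push_back(t); }
    Task* pop() {
        std::lock_guard<std::mutex> g(m);
        if (dq.empty()) return nullptr;
        Task* t = dq.back(); dq.pop_back();
        return t;
    }
    // Remote thieves: take from the opposite end (FIFO), i.e. the oldest task.
    Task* steal() {
        std::lock_guard<std::mutex> g(m);
        if (dq.empty()) return nullptr;
        Task* t = dq.front(); dq.pop_front();
        return t;
    }
};
```

The asymmetry (local LIFO end, remote FIFO end) is what yields the advantages listed on the next slide.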

Advantages
- Stealers don’t interact with the local processor when the queue has more than one task
- Popping recently pushed tasks improves locality
- Stealers take the oldest tasks, which are likely to be the largest (in practice)

For Assignment 1
- We provide an implementation of the work stealing queue
  - The implementation is thread-safe: clients can concurrently call the push, pop, and steal operations
  - You don’t need to use additional locks or other synchronization
- Implement the scheduler logic and stealing strategy
  - Hint: this implementation is single threaded