Task Scheduling for Multicore CPUs and NUMA Systems
Martin Kruliš, 10. 12. 2015 (v1.1)


 Thread Scheduling in OS
  ◦ Operating systems have multiple requirements
     Fairness (regarding multiple processes)
     Throughput (maximizing CPU utilization)
     Latency (minimizing response time)
     Efficiency (minimizing overhead)
     Additional constraints (I/O-bound operations)
  ◦ Threads are scheduled on the available cores
     Preemptively (a thread can be removed from a core at any time)
  ◦ An optimal solution does not exist
     A compromise between the requirements is established

 Task Scheduling in Parallel Applications
  ◦ A completely different problem
     Tasks have common objective(s)
     Possibly much more information is available about the tasks and their structure (than the OS has about threads)
  ◦ Task (typical definition)
     A portion of work (code + data)
     Sufficiently small and indivisible
     Typically scheduled non-preemptively
     May have dependencies (one task must finish before another task can be executed)
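To make the definition concrete, here is a minimal C++ sketch of such a task abstraction. The names (Task, unfinishedDeps) are illustrative and not taken from any particular library.

```cpp
#include <atomic>
#include <functional>

// Illustrative sketch only: a task as "code + data" that is executed
// non-preemptively and may have dependencies on other tasks.
struct Task {
    std::function<void()> work;          // the portion of work (code + captured data)
    std::atomic<int> unfinishedDeps{0};  // how many predecessor tasks are still running

    bool ready() const { return unfinishedDeps.load() == 0; }
    void run() { work(); }               // runs from start to end, never preempted
};
```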

 Task Scheduling Issues
  ◦ Task spawning
     All tasks are created at the beginning, or
     tasks are spawned dynamically by other tasks
  ◦ Predictable time complexity
     The number of instructions is fixed, or depends on the data
  ◦ Blocking operations
     Computing tasks vs. I/O (disk, net, GPU, …) tasks
  ◦ Optimization issues
     Task dependencies may lead to various orderings
     Data produced by one task are used by another task

 Task Scheduling Strategies
  ◦ Static Scheduling
     When the number and length of the tasks are predictable
     Tasks are assigned to the threads at the beginning
     Virtually no scheduling overhead (after the assignment; see the sketch below)
  ◦ Dynamic Scheduling
     When tasks are spawned ad hoc, or their lengths are unpredictable and vary greatly
     Oversubscription – many more tasks than threads
     The task-to-thread assignment may not be determined directly (when the task is created) and it may change over time
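The static case is easy to show in code. The sketch below splits an iteration space into one fixed, contiguous block per thread; process() is a hypothetical per-item function. The dynamic case corresponds to the queue-based schemes on the next slide.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-item work; stands in for the real computation.
void process(std::size_t /*i*/) { /* ... */ }

// Static scheduling: the work is divided among the threads once, up front.
void staticSchedule(std::size_t n, unsigned threads) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([=] {
            // Each thread receives one fixed, contiguous block of iterations;
            // after this assignment there is no further scheduling overhead.
            std::size_t begin = n * t / threads;
            std::size_t end   = n * (t + 1) / threads;
            for (std::size_t i = begin; i < end; ++i)
                process(i);
        });
    }
    for (auto& th : pool) th.join();
}
```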

 Scheduling Algorithms
  ◦ Many different approaches, each suitable for different specific scenarios
  ◦ Global task queue
     Threads atomically pop tasks (or push tasks)
     The queue may become a bottleneck (see the sketch below)
  ◦ Private task queues per thread
     Each thread processes/spawns its own tasks
     What should a thread do when its queue is empty?
  ◦ Combined solutions
     Local and shared queues
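A minimal sketch of the global-queue variant follows: all workers share one queue guarded by a mutex, which is exactly the serialization point that can become a bottleneck. The class name and interface are illustrative.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>

// All worker threads push/pop through one shared, mutex-protected queue.
class GlobalTaskQueue {
public:
    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Blocks until a task is available; returns nullopt once the queue
    // has been closed and fully drained.
    std::optional<std::function<void()>> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) return std::nullopt;
        auto task = std::move(tasks_.front());
        tasks_.pop_front();
        return task;
    }

    void close() {
        { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool closed_ = false;
};
```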

 Modern Multicore CPUs
  (figure: block diagram of a modern multicore CPU; not preserved in the transcript)

 Non-Uniform Memory Architecture (NUMA)
  ◦ First-touch Physical Memory Allocation
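First-touch means the OS places a physical page on the NUMA node of the thread that first writes to it. The sketch below (assuming a NUMA-aware OS such as Linux) therefore initializes the data with the same partitioning that the worker threads will use later.

```cpp
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// new double[n] allocates without writing, so no page is touched yet;
// each worker's first write then decides the home node of its pages.
// The workers should later be bound to the same nodes to keep accesses local.
std::unique_ptr<double[]> numaAwareAlloc(std::size_t n, unsigned threads) {
    std::unique_ptr<double[]> data(new double[n]);  // pages not yet touched
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            std::size_t begin = n * t / threads;
            std::size_t end   = n * (t + 1) / threads;
            for (std::size_t i = begin; i < end; ++i)
                data[i] = 0.0;  // first write places the page
        });
    }
    for (auto& th : pool) th.join();
    return data;
}
```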

 Memory Coherency Problem
  ◦ Implemented at the cache level
  ◦ All cores must perceive the same data
 MESI Protocol
  ◦ Each cache line has a special flag
     Modified
     Exclusive
     Shared
     Invalid
  ◦ Memory bus snooping + update rules
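The update rules can be made concrete as a toy state machine for a single cache line. Real hardware implements these transitions in the cache controllers, so the code below is purely illustrative.

```cpp
// Toy model of the MESI update rules for one cache line.
enum class Mesi { Modified, Exclusive, Shared, Invalid };

// The local core writes the line: it must gain exclusive ownership first
// (a bus request invalidates all other copies), then the line is Modified.
Mesi onLocalWrite(Mesi) { return Mesi::Modified; }

// The local core reads the line.
Mesi onLocalRead(Mesi s) {
    if (s == Mesi::Invalid)    // miss: fetch from memory or another cache
        return Mesi::Shared;   // (or Exclusive, if no other cache holds it)
    return s;                  // hits do not change Modified/Exclusive/Shared
}

// Another core's read is snooped on the memory bus.
Mesi onSnoopRead(Mesi s) {
    if (s == Mesi::Modified)   // we hold the only up-to-date copy:
        return Mesi::Shared;   // write it back, then both caches share it
    if (s == Mesi::Exclusive)
        return Mesi::Shared;
    return s;
}

// Another core's write (request for ownership) is snooped on the bus.
Mesi onSnoopWrite(Mesi) { return Mesi::Invalid; }
```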

 MESI Protocol
  (figure: MESI state-transition diagram; not preserved in the transcript)

 Intel Threading Building Blocks Scheduler
  ◦ Thread pool with private task queues
  ◦ A thread gets/inserts tasks from/to the bottom of its own queue
  ◦ A stealing thread takes tasks from the top of another thread's queue
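A much simplified sketch of such a per-thread deque is shown below. TBB's real implementation is a lock-free (Chase-Lev-style) structure; a mutex is used here only to keep the example short.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>

class WorkStealingDeque {
public:
    using Task = std::function<void()>;

    // Owner thread: LIFO end. The newest tasks are pushed and popped at the
    // bottom, which keeps the data they touch hot in the owner's cache.
    void pushBottom(Task t) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(t));
    }
    std::optional<Task> popBottom() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.back());
        tasks_.pop_back();
        return t;
    }

    // Thief threads: FIFO end. Stealing the oldest (topmost) task tends to
    // grab a large, unexpanded subtree, amortizing the cost of the steal.
    std::optional<Task> stealTop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return std::nullopt;
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }

private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};
```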

 Task Dependency Tree
  ◦ Stack-like local processing leads to DFS expansion of the tree within one thread
     Reduces memory consumption
     Improves caching
  ◦ Queue-like stealing leads to BFS expansion of the tree
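The two expansion orders can be seen in a recursive computation written with tbb::task_group (a real TBB class): the owning thread keeps descending into the newest subtask (DFS), while idle threads steal the oldest tasks near the root (BFS). The tiny cutoff is for illustration only; a real program would stop spawning much earlier.

```cpp
#include <tbb/task_group.h>

long fib(long n) {
    if (n < 2) return n;            // cutoff would normally be much larger
    long a = 0, b = 0;
    tbb::task_group g;
    g.run([&] { a = fib(n - 1); }); // spawned to the bottom of the local queue
    b = fib(n - 2);                 // continue depth-first in this thread
    g.wait();                       // may execute/steal other tasks while waiting
    return a + b;
}
```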

 Challenges
  ◦ Maintaining NUMA locality
  ◦ Efficient cache utilization vs. thread affinity
  ◦ Avoiding false sharing (illustrated below)
 Key Ideas
  ◦ Separate requests on different NUMA nodes
  ◦ Task scheduling considers cache sharing
     Related tasks – on cores that are close to each other
  ◦ Minimize the overhead of task stealing
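False sharing deserves a concrete illustration: if independent per-thread counters sit in one cache line, the MESI protocol bounces the line between cores on every write. Padding each counter out to its own line (64 bytes is a typical x86 line size, assumed here) removes the problem.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // typical x86 line size (assumption)

struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
    // alignas makes each counter start a new cache line, and the compiler
    // pads the struct to a multiple of kCacheLine bytes, so array elements
    // never share a line.
};

PaddedCounter counters[8];  // e.g. one per worker thread: no false sharing
```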

 Locality Aware Scheduler (Z. Falt)
  ◦ Key ideas
     Queues are associated with cores (not with threads)
     Threads are bound (by affinity) to a NUMA node
     Two methods of task spawning:
      Immediate task – related/follow-up work
      Deferred task – unrelated work
     Task stealing reflects the CPU core distance:
      NUMA distance – number of NUMA hops
      Cache distance – level of the shared cache (L1, L2, …)
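A hypothetical sketch of the distance-driven part: when a core runs out of work, it tries victim queues ordered by closeness (cache-sharing cores first, then the same NUMA node, then remote nodes). coreDistance() is a stand-in for a table built from the real topology (e.g. via hwloc), not a real API; the stub hard-codes a small example topology.

```cpp
#include <algorithm>
#include <vector>

// Stub topology: pretend cores 0-3 and 4-7 form two NUMA nodes, with
// adjacent pairs sharing a cache. Smaller result = closer core.
int coreDistance(int a, int b) {
    if (a / 2 == b / 2) return 1;   // share a cache
    if (a / 4 == b / 4) return 2;   // same NUMA node
    return 3;                       // remote NUMA node
}

// Victim cores to try when our queue is empty, cheapest steals first.
std::vector<int> stealOrder(int self, int numCores) {
    std::vector<int> victims;
    for (int c = 0; c < numCores; ++c)
        if (c != self) victims.push_back(c);
    std::sort(victims.begin(), victims.end(), [&](int a, int b) {
        return coreDistance(self, a) < coreDistance(self, b);
    });
    return victims;
}
```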

 Locality Aware Scheduler (Z. Falt)
  (figure: scheduler architecture; not preserved in the transcript)
