Martin Kruliš 10. 12. 2015 by Martin Kruliš (v1.1)1.


1 Martin Kruliš 10. 12. 2015 by Martin Kruliš (v1.1)1

2 Thread Scheduling in OS
◦ Operating systems have multiple requirements
  ▪ Fairness (regarding multiple processes)
  ▪ Throughput (maximizing CPU utilization)
  ▪ Latency (minimizing response time)
  ▪ Efficiency (minimizing overhead)
  ▪ Additional constraints (I/O bound operations)
◦ Threads are scheduled on available cores
  ▪ Preemptively (a thread can be removed from a core)
◦ Optimal solution does not exist
  ▪ A compromise between requirements is established

3 Task Scheduling in Parallel Applications
◦ Completely different problem
  ▪ Tasks have common objective(s)
  ▪ Possibly much more information about the tasks and their structure (than the OS has about threads)
◦ Task (typical definition)
  ▪ A portion of work (code + data)
  ▪ Sufficiently small and indivisible
  ▪ Typically scheduled non-preemptively
  ▪ May have dependencies (one task must finish before another task can be executed)

4 Task Scheduling Issues
◦ Task spawning
  ▪ All tasks are created at the beginning
  ▪ Tasks are spawned dynamically by other tasks
◦ Predictable time complexity
  ▪ Number of instructions is fixed or depends on the data
◦ Blocking operations
  ▪ Computing tasks vs. I/O (disk, net, GPU, …) tasks
◦ Optimization issues
  ▪ Task dependencies may lead to various orderings
  ▪ Data produced by a task are used by another task

5 Task Scheduling Strategies
◦ Static Scheduling
  ▪ When the number and length of tasks are predictable
  ▪ Tasks are assigned to the threads at the beginning
  ▪ Virtually no scheduling overhead (after assignment)
◦ Dynamic Scheduling
  ▪ When tasks are spawned ad hoc or their length is unpredictable and varies greatly
  ▪ Oversubscription – many more tasks than threads
  ▪ The task-to-thread assignment may not be determined immediately (when the task is created), and it may change over time
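The difference between the two strategies can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: task durations are known numbers, scheduling overhead is ignored, and the function names `static_makespan` and `dynamic_makespan` are made up for this example.

```python
import heapq

def static_makespan(durations, n_threads):
    """Static scheduling: split the task list into contiguous blocks up front.

    Returns the finish time of the slowest thread.
    """
    chunk = max(1, -(-len(durations) // n_threads))  # ceiling division
    loads = [sum(durations[i:i + chunk]) for i in range(0, len(durations), chunk)]
    return max(loads, default=0)

def dynamic_makespan(durations, n_threads):
    """Dynamic scheduling: a thread takes the next pending task once it is idle."""
    free_at = [0] * n_threads          # min-heap of times when threads become idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # the thread that becomes idle first
        heapq.heappush(free_at, start + d)
    return max(free_at)

# Four long tasks followed by four short ones, two threads.
uneven = [8, 8, 8, 8, 1, 1, 1, 1]
print(static_makespan(uneven, 2), dynamic_makespan(uneven, 2))
```

With uniform task lengths both functions return the same value, which is the case where static scheduling is attractive because it has no run-time cost. The simulation deliberately leaves out the overhead that dynamic scheduling pays for each task-to-thread assignment.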

6 Scheduling Algorithms
◦ Many different approaches that are suitable for different specific scenarios
◦ Global task queue
  ▪ Threads atomically pop tasks (or push tasks)
  ▪ The queue may become a bottleneck
◦ Private task queues per thread
  ▪ Each thread processes/spawns its own tasks
  ▪ What should a thread do when its queue is empty?
◦ Combined solutions
  ▪ Local and shared queues
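A global task queue might look roughly like the following sketch. It uses Python's standard `queue.Queue` as the shared queue; the helper names `run_with_global_queue` and `split` are hypothetical and exist only for this illustration. Every push and pop goes through the queue's single internal lock, which is where the bottleneck mentioned above would come from.

```python
import queue
import threading

def run_with_global_queue(initial_tasks, n_threads, process):
    """Run tasks from one shared queue.

    `process(task)` does the work and returns a list of newly spawned tasks.
    Returns the list of processed tasks (in completion order).
    """
    tasks = queue.Queue()            # thread-safe FIFO shared by all workers
    done = []
    done_lock = threading.Lock()

    def worker():
        while True:
            task = tasks.get()       # blocks while the queue is empty
            if task is None:         # sentinel: no more work will arrive
                tasks.task_done()
                return
            for child in process(task):
                tasks.put(child)     # spawned tasks go back to the shared queue
            with done_lock:
                done.append(task)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for task in initial_tasks:
        tasks.put(task)
    tasks.join()                     # waits for all tasks, including spawned ones
    for _ in threads:
        tasks.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return done

def split(depth):
    """Hypothetical task: a node of a binary task tree spawns two children."""
    return [depth - 1, depth - 1] if depth > 0 else []

processed = run_with_global_queue([3], n_threads=4, process=split)
print(len(processed))
```

Children are pushed before the parent is marked as finished, so the count of unfinished tasks cannot drop to zero while work is still being spawned.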

7 Modern Multicore CPUs

8 Non-Uniform Memory Architecture
◦ First-touch Physical Memory Allocation

9 Memory Coherency Problem
◦ Implemented on cache level
◦ All cores must perceive the same data
MESI Protocol
◦ Each cache line has a special flag
  ▪ Modified
  ▪ Exclusive
  ▪ Shared
  ▪ Invalid
◦ Memory bus snooping + update rules
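The four flags and the update rules can be sketched as a tiny simulation of one cache line that is tracked by several cores. This follows the textbook form of MESI only; real processors use variants with additional states, and the functions `read` and `write` are names chosen for this example.

```python
def read(states, core):
    """Core `core` reads the line; `states` holds one flag per core and is updated in place."""
    if states[core] != "I":
        return                        # M, E, S: read hit, no state change
    others_have_copy = False
    for i, s in enumerate(states):
        if i != core and s != "I":
            others_have_copy = True
            # A snooped read downgrades M/E copies to S (an M copy is written back first).
            states[i] = "S"
    states[core] = "S" if others_have_copy else "E"

def write(states, core):
    """Core `core` writes the line; all other copies are invalidated."""
    for i in range(len(states)):
        if i != core:
            states[i] = "I"
    states[core] = "M"

line = ["I", "I", "I"]    # three cores, nobody has the line cached
read(line, 0)             # only copy in the system -> Exclusive
read(line, 1)             # second reader -> both Shared
write(line, 2)            # writer gets Modified, others Invalid
print(line)
```

After every operation at most one core holds the line in the Modified or Exclusive state, and if one does, every other copy is Invalid. That invariant is what lets all cores perceive the same data.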

10 MESI Protocol

11 Intel Threading Building Blocks Scheduler
◦ Thread pool with private task queues
◦ Local thread gets/inserts tasks from/to the bottom of its queue
◦ A thread steals tasks from the top of another thread's queue
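The bottom/top discipline described above can be sketched with a double-ended queue. This is not the TBB implementation, which relies on much finer-grained synchronization; it is a minimal lock-based illustration, and `WorkStealingQueue` and `next_task` are hypothetical names.

```python
import threading
from collections import deque

class WorkStealingQueue:
    """Private task queue: the owner works at the bottom, thieves take from the top."""

    def __init__(self):
        self._tasks = deque()         # left end = top (oldest), right end = bottom (newest)
        self._lock = threading.Lock()

    def push_bottom(self, task):
        with self._lock:
            self._tasks.append(task)

    def pop_bottom(self):
        """Called by the owning thread; returns None when the queue is empty."""
        with self._lock:
            return self._tasks.pop() if self._tasks else None

    def steal_top(self):
        """Called by other threads; returns None when there is nothing to steal."""
        with self._lock:
            return self._tasks.popleft() if self._tasks else None

def next_task(own, victims):
    """Take local work first; only if there is none, try to steal from the victims in order."""
    task = own.pop_bottom()
    if task is None:
        for victim in victims:
            task = victim.steal_top()
            if task is not None:
                break
    return task

q0, q1 = WorkStealingQueue(), WorkStealingQueue()
for name in ("a", "b", "c"):
    q0.push_bottom(name)
print(next_task(q0, [q1]), next_task(q1, [q0]))
```

The owner gets the most recently inserted task, while the idle thread receives the oldest one. The sketch uses `None` to signal an empty queue, so `None` itself cannot be stored as a task.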

12 Task Dependency Tree
◦ Stack-like local processing leads to DFS tree expansion within one thread
  ▪ Reduces memory consumption
  ▪ Improves caching
◦ Queue-like stealing leads to BFS tree expansion
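The memory argument can be checked with a short single-threaded experiment. The sketch assumes a complete binary task tree in which every inner task spawns two children; `peak_pending` is a name invented for this example and it reports the largest number of tasks waiting at any moment.

```python
from collections import deque

def peak_pending(depth, lifo):
    """Expand a binary task tree of the given depth and return the peak number of pending tasks.

    lifo=True  -> stack-like processing (DFS expansion)
    lifo=False -> queue-like processing (BFS expansion)
    """
    pending = deque([depth])          # each task is represented by its remaining depth
    peak = 1
    while pending:
        d = pending.pop() if lifo else pending.popleft()
        if d > 0:                     # an inner task spawns two children
            pending.append(d - 1)
            pending.append(d - 1)
        peak = max(peak, len(pending))
    return peak

print(peak_pending(10, lifo=True), peak_pending(10, lifo=False))
```

Stack-like processing keeps the number of pending tasks linear in the tree depth, while queue-like processing eventually holds an entire tree level at once. Taking the oldest task when stealing tends to hand the thief a node near the root, that is, a large subtree of work per steal.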

13 Challenges
◦ Maintaining NUMA locality
◦ Efficient cache utilization vs. thread affinity
◦ Avoiding false sharing
Key ideas
◦ Separate requests on different NUMA nodes
◦ Task scheduling considers cache sharing
  ▪ Related tasks – on cores that are close
◦ Minimize overhead of task stealing

14 Locality Aware Scheduler (Z. Falt)
◦ Key ideas
  ▪ Queues are associated with cores (not threads)
  ▪ Threads are bound (by affinity) to a NUMA node
  ▪ Two methods for task spawning
    • Immediate task – related/follow-up work
    • Deferred task – unrelated work
  ▪ Task stealing reflects CPU core distance
    • NUMA distance – number of NUMA hops
    • Cache distance – level of shared cache (L1, L2, …)
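One way to picture distance-aware stealing is to sort the potential victims by the two distances listed above. The following is not the scheduler's actual code. The topology description (a tuple of NUMA node, L3 id, and L2 id per core), the hop matrix, and the function `steal_order` are all assumptions made for this sketch.

```python
def steal_order(core, cores, numa_hops):
    """Return the other cores ordered from the closest victim to the most distant one.

    cores:     dict core_id -> (numa_node, l3_id, l2_id)   (hypothetical topology description)
    numa_hops: matrix where numa_hops[a][b] is the number of hops between NUMA nodes a and b
    """
    node, l3, l2 = cores[core]

    def distance(other):
        o_node, o_l3, o_l2 = cores[other]
        hops = numa_hops[node][o_node]
        if (node, l3, l2) == (o_node, o_l3, o_l2):
            cache_level = 2           # the cores share an L2 cache
        elif (node, l3) == (o_node, o_l3):
            cache_level = 3           # the cores share only an L3 cache
        else:
            cache_level = 4           # no shared cache at all
        return (hops, cache_level)    # NUMA distance first, cache distance second

    others = [c for c in cores if c != core]
    return sorted(others, key=lambda c: (distance(c), c))

# Assumed machine: 2 NUMA nodes, 4 cores per node, one L3 per node, one L2 per core pair.
topology = {c: (c // 4, c // 4, c // 2) for c in range(8)}
hops = [[0, 1], [1, 0]]
print(steal_order(5, topology, hops))
```

An idle core would then try the victims in this order, so that stolen tasks are likely to find their data in a shared cache or at least on the local NUMA node.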

15 Locality Aware Scheduler (Z. Falt)


