Process Scheduling III (5.4, 5.7) CPE Operating Systems
Multiple-Processor Scheduling (5.4)
Asymmetric vs Symmetric Processing
Asymmetric Multiprocessing Cell Processor
The Cell Processor
EIB – A Ring Bus Topology. The EIB connects the PPE and the SPEs in a ring and supports concurrent transmissions.
How the OS can manage the Cell (PPE + SPEs): Job Queue vs. Stream Processing
Job Queue vs Stream Processing
Symmetric Multiprocessing (SMP)
The Xenon Processor Xenon is actually a modified PPE unit of the Cell Processor. IBM designed it for Microsoft.
Broadway CPU Single Core 729 MHz
4-Way SMP
Has 750 Million Transistors. What do 750 million objects look like?
Garth Brooks in Central Park New York, 1997
750,000 Viewers
Biggest Concert in History?
Rod Stewart
3,500,000 Viewers
Symmetric Multithreading (SMT) – known commercially as Intel Hyperthreading
SMT Architecture (Figure 5.8). Each logical CPU has its own registers and can handle interrupts. Similar to virtual machines, but done at the hardware level.
CPU Affinity (a process staying on one processor)
[Figure: Core 1 and Core 2, each with its own cache, sharing main memory]
Soft Affinity – the process may be migrated to a different processor
Hard Affinity – the process is locked to one processor
Load Balancing: Push Migration – the kernel periodically checks the load on each core and pushes tasks from an overloaded ready queue to a less busy one.
Load Balancing: Pull Migration – a core whose ready queue is empty notifies the kernel and pulls a waiting task from a busy core's ready queue.
Scheduling Domains in the Linux Kernel (v and later)
[Figure: CPU 0 and CPU 1, each with Core 0 and Core 1, grouped into scheduling domains at Levels 0–2; load balancing runs at each level]
Takes CPU affinity into consideration: the scheduler tries to migrate tasks only within the same group.
Benefits of Scheduling Domains: migration stays local when possible, so there are fewer cache misses; the scheduler can also optimize for power saving by scheduling within only one domain when possible.
Future trend of multi-CPU processors? AMP (Asymmetric Multi-Processing): a few high-speed serial cores plus many slower parallel cores.
The Cell Processor PPE – Serial Core SPE – Parallel Cores
Turbo Boost Technology (Intel), Core i5/i7 processors: 3–4 cores at 2.26 GHz (parallel tasks), 2 cores at 3.06 GHz, 1 core at 3.2 GHz (sequential tasks). The processor can turn any core on/off and adjust its speed.
Scheduling for AMP: (1) performance asymmetry, (2) handling a high number of cores
Example of Performance Asymmetry: Core 0 has Performance Index (PI) = 2; Cores 1–3 have PI = 1. Scaled load – Core 0's ready queue should be twice as long as the other cores'.
Handling a High Number of Cores. Non-preemptive scheduling – less need to share the CPU, saving context-switch time. Smart barrier – a thread can tell the OS what resources it is waiting for, so the OS does not need to schedule the thread until those resources are ready.
[Figure: Job 0 calls printf() and sends the OS the message "Waiting for the display"; the OS runs Jobs 1 and 2 in the meantime]
Parallel Processing Exercise
1. NewData[x, y] = OldData[x, y] ^ 2
2. NewData[x, y] = (D[x, y] + D[x-1, y] + D[x+1, y] + D[x, y-1] + D[x, y+1]) / 5