Resource Aware Scheduler – Initial Results
Tomer Morad, Noam Shalev, Avinoam Kolodny, Idit Keidar, Uri Weiser
May 8, 2013

Main Message: Balance Systems to Avoid Bottlenecks
Motivation
  Different programs have different resource requirements: number of cores, cache, memory bandwidth, energy, branch prediction, etc.
  Hence, no single computing system can be balanced for every program
  Heterogeneous systems are even more unbalanced
  Contention on resources wastes energy and usually degrades performance (for example, cache contention)
Proposal
  Dynamically tune the workload to the (dynamically tuned) hardware, balancing the system to minimize contention on resources
  The OS scheduler can do this

CMP Shared Resource Effects
Examples of shared resources: last-level cache, memory bus, network bandwidth, disk bandwidth, etc.
Three effects are observed when several threads access a shared resource:
  Wasted Peripheral Energy (⬆ Energy)
    Observed when threads are added in the presence of a bottleneck
    For example: many floating-point programs running in parallel on a Niagara processor (many cores with a shared floating-point unit)
  Collisions (⬆ Energy, ⬇ Throughput)
    Observed when several threads access a shared resource and their requests are queued
    In the example above, requests are serviced more slowly
  Destructive Interference (⬆ Energy, ⬇ Throughput)
    Observed when threads evict each other's cached data

Resource Aware OS scheduler
Main components:
  Sampling: sample the resource usage of tasks that have run, so that the information is available to the prediction stage
  Prediction: predict each task's resource usage from its past resource usage
  Scheduling: schedule only tasks for which the system has enough resources (idle cores are OK)
Implemented in Linux 3.2.0
Uses performance counters for sampling

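The sampling, prediction and scheduling stages can be illustrated with a small user-space sketch. It is not the kernel implementation described on the slide: the exponential moving average used for prediction, the greedy admission rule, and all names (`Task`, `update_prediction`, `pick_tasks`, `bw_capacity`) are assumptions made for illustration, and the sampled bandwidth is passed in as a plain number rather than read from performance counters.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    predicted_bw: float = 0.0  # predicted memory bandwidth demand (hypothetical unit: GB/s)


def update_prediction(task, sampled_bw, alpha=0.5):
    """Prediction stage (assumed): blend the newest sample into the running estimate."""
    task.predicted_bw = alpha * sampled_bw + (1 - alpha) * task.predicted_bw
    return task.predicted_bw


def pick_tasks(runnable, num_cores, bw_capacity):
    """Scheduling stage (assumed): admit tasks in queue order while bandwidth fits.

    A task whose predicted demand would exceed the remaining bandwidth is
    skipped, so some cores may stay idle.
    """
    chosen, used = [], 0.0
    for task in runnable:
        if len(chosen) == num_cores:
            break
        if used + task.predicted_bw <= bw_capacity:
            chosen.append(task)
            used += task.predicted_bw
    return chosen


if __name__ == "__main__":
    queue = [Task("a", 8.0), Task("b", 8.0), Task("c", 1.0)]
    running = pick_tasks(queue, num_cores=4, bw_capacity=10.0)
    print([t.name for t in running])
```

In this example the second bandwidth-hungry task is held back even though cores are free, which mirrors the "idle cores are OK" policy.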
Memory Bandwidth – An Example
Core count is increasing
Core frequency does not decrease
Pin count is not increasing
Chip bandwidth demand is increasing, but chip bandwidth to memory is not increasing
We are approaching the memory bandwidth wall!
No real remedies in the near future

Memory Bus Usage

SPEC-CPU2006 on the baseline scheduler
(figure: one panel per number of concurrent instances)

BW hungry program – Initial results
Implemented a resource aware scheduler in Linux 3.2.0
BW hungry program, single run: 5.58 sec, 132 Joules
Run 4 times sequentially: 22.3 sec, 526 Joules
Run 4 times in parallel (4-core i5-2500): 27.86 sec (+25%), 1368 Joules (+160%) over sequential
With the new scheduler enforcing a memory bandwidth limit: 23.71 sec (+6%), 569 Joules (+8%) over sequential
Baseline scheduler vs. resource aware scheduler: 17.5% speedup, 58% energy reduction
Disclaimers: (a) initial results; (b) energy was read from the MSR_PKG_ENERGY_STATUS counter, which reports the energy consumed by the package. Results are consistent with Wattsup measurements.

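The percentages on this slide follow directly from the reported times and energies. The short check below recomputes them; the variable and function names are illustrative only, and the inputs are the figures quoted above.

```python
def pct_change(new, base):
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Figures as reported on the slide (seconds, Joules).
seq_time, seq_energy = 22.3, 526.0    # 4 runs, sequential
par_time, par_energy = 27.86, 1368.0  # 4 runs in parallel, baseline scheduler
ras_time, ras_energy = 23.71, 569.0   # 4 runs in parallel, resource aware scheduler

# Overheads relative to the sequential run.
par_time_overhead = pct_change(par_time, seq_time)
par_energy_overhead = pct_change(par_energy, seq_energy)
ras_time_overhead = pct_change(ras_time, seq_time)
ras_energy_overhead = pct_change(ras_energy, seq_energy)

# Resource aware scheduler relative to the baseline scheduler.
speedup = (par_time / ras_time - 1.0) * 100.0
energy_reduction = (1.0 - ras_energy / par_energy) * 100.0

if __name__ == "__main__":
    print(f"parallel vs sequential: {par_time_overhead:+.1f}% time, {par_energy_overhead:+.1f}% energy")
    print(f"new scheduler vs sequential: {ras_time_overhead:+.1f}% time, {ras_energy_overhead:+.1f}% energy")
    print(f"new vs baseline: {speedup:.1f}% speedup, {energy_reduction:.1f}% energy reduction")
```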
SPEC-CPU2006 – Initial results
Each run included four instances of the same SPEC-CPU2006 benchmark
Average: +3.3% throughput, -3.5% energy
Notable results:
  429: +106% throughput, -43% energy
  473: +3.3% throughput, -13% energy
Out of 25 benchmarks:
  16 consumed less energy (9 consumed more)
  10 ran faster (11 ran slower)
Other results:
  Energy efficiency improved by 11% on average
  15 benchmarks' energy efficiency improved, by 20% on average
  10 benchmarks' energy efficiency degraded, by 3% on average
A soft limit for the bandwidth is anticipated to improve the results
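The three energy-efficiency figures fit together if the 11% average is read as a plain mean over all 25 benchmarks. That reading is an assumption, since the slide does not say how the average was computed; the sketch below only checks the arithmetic, and its names are illustrative.

```python
def mean_of_groups(groups):
    """Plain mean over all items, given (count, group_average) pairs."""
    total_count = sum(count for count, _ in groups)
    weighted_sum = sum(count * avg for count, avg in groups)
    return weighted_sum / total_count


# (number of benchmarks, average energy-efficiency change in percent)
efficiency_groups = [(15, 20.0), (10, -3.0)]
overall_efficiency_change = mean_of_groups(efficiency_groups)

if __name__ == "__main__":
    print(f"overall change: {overall_efficiency_change:.1f}%")
```

Under that assumption the overall change comes to 10.8%, which rounds to the 11% quoted on the slide.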