Haishan Zhu, Mattan Erez


1 Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems
Haishan Zhu, Mattan Erez Department of Electrical and Computer Engineering The University of Texas at Austin

2 Ubiquitous Latency-Sensitive Tasks in the Cloud
- Data centers suffer from low utilization
- Latency-sensitive tasks have stringent QoS goals
- The literature reports average utilization between 10% and 50%

3 Overprovisioning of hardware resources
- Diurnal traffic fluctuation
- Nondeterministic user behavior [Maier, SIGCOMM 2009]

4 Addressing Low Utilization in Data Centers
- Throttle / shut down resources to save power [Lo, ISCA 2014], [Hsu, HPCA 2015]
  - Latency constraints, data partitioning, efficiency
  - Not the best use of compute resources
- Backfill with throughput-oriented tasks [Mars, MICRO 2011], [Lo, ISCA 2015]
  - Average and percentile performance degradation
  - Performance variation of latency-critical tasks

5 Performance Variation Causes Wasted Resources
Performance variation of latency-sensitive tasks quantifies the amount of resources wasted

6 Dirigent: a software runtime mechanism
- Assume compute resources are shared between:
  - Latency-sensitive (foreground or FG) tasks
  - Throughput-oriented (background or BG) tasks
- Dynamically reconfigure hardware resources at fine time granularity
  - FG tasks finish just before the deadline
  - Spare resources are fully used to maximize BG throughput
- Components (diagram):
  - Offline: Latency-Critical Task Profiler
  - Runtime: Execution Time Predictor, Performance Controller

7 Execution Time Prediction
[Figure: three timelines over segments S1, S2, S3. Panels: Offline Profiling; Online Average Execution Time Penalty; Execution Time Prediction. Horizontal axis: Time.]
- $\Delta T$: time duration of a sample segment
- $S_n$: program progress during segment $n$
- $\alpha_i = \dfrac{\text{profiled\_progress}_i}{\text{measured\_progress}_i}$
- $P_i$: average time penalty of the $i$-th segment
- $MA(\cdot)$: moving average
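The definitions on this slide can be read as a small predictor. The Python sketch below is one possible reading, not the authors' implementation. It assumes progress is a countable quantity (for example, retired instructions) sampled once per ΔT, computes α_i as profiled over measured progress, and stretches the remaining standalone time by a moving average of recent α values. The function name, the window size, and the way remaining time is estimated are assumptions made for illustration.

```python
from collections import deque


def predict_total_time(profiled, measured, delta_t, window=3):
    """Predict total execution time of a foreground task (illustrative sketch).

    profiled: progress made in each delta_t segment when the task runs alone
    measured: progress observed in each delta_t segment of the current run
    delta_t:  duration of one sample segment, in seconds
    window:   moving-average window over recent slowdown ratios (assumed value)
    """
    if not profiled or not measured:
        raise ValueError("need at least one profiled and one measured segment")
    if any(m <= 0 for m in measured):
        raise ValueError("measured progress must be positive in every segment")

    # alpha_i = profiled_progress_i / measured_progress_i (from the slide).
    # Assumption: if the run outlasts the profile, reuse the last profiled segment.
    recent = deque(maxlen=window)
    for i, m in enumerate(measured):
        p = profiled[min(i, len(profiled) - 1)]
        recent.append(p / m)
    slowdown = sum(recent) / len(recent)  # MA(alpha) over the last `window` segments

    total_work = sum(profiled)
    remaining_work = max(total_work - sum(measured), 0.0)

    # Assumption: alone, the remaining work would proceed at the average
    # profiled rate; under contention it is stretched by the recent slowdown.
    standalone_rate = total_work / (len(profiled) * delta_t)
    elapsed = len(measured) * delta_t
    return elapsed + (remaining_work / standalone_rate) * slowdown
```

For example, with a profile of four segments of 100 units each, ΔT of 1 s, and two observed segments of 50 units, every α is 2, so the sketch adds 2 s elapsed to 3 s of standalone work stretched to 6 s.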

8 Dirigent: a software runtime mechanism
- Offline: Latency-Critical Task Profiler
- Runtime: Execution Time Predictor, Performance Controller

9 Performance Controller – Fine Time Scale
- Use dynamic per-core DVFS and task pausing
- Quickly respond to contention changes
- Flowchart nodes: Pause most intrusive BG; Continue paused BG; Use slowest speed for BG; Predict exec time; Speed up BG cores; Use fastest speed for FG; Slow down FG cores
- Conservative scaling when multiple FGs are present
  - Each FG task may exhibit a different slowdown
  - Any throttling decision impacts all tasks
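The flowchart labels on this slide suggest a simple feedback loop around the predicted execution time. The Python sketch below is a guess at such a loop for a single FG task, not the authors' controller. The order in which actions are tried, the one-step frequency changes, the `slack` margin, and all names (`CoreState`, `control_step`) are assumptions. Frequency levels are plain indices, based on the five frequency steps mentioned later in the deck.

```python
from dataclasses import dataclass

MAX_LEVEL = 4  # five frequency steps, indexed 0 (slowest) to 4 (fastest)


@dataclass
class CoreState:
    fg_level: int = MAX_LEVEL   # frequency level of the FG core
    bg_level: int = MAX_LEVEL   # frequency level of the BG cores
    bg_paused: bool = False     # whether the most intrusive BG task is paused


def control_step(state, predicted, deadline, slack=0.05):
    """Apply one control action and return its name (illustrative sketch)."""
    if predicted > deadline:
        # FG is predicted to miss its deadline: favor FG, then throttle BG.
        if state.fg_level < MAX_LEVEL:
            state.fg_level = MAX_LEVEL
            return "fg_fastest"
        if state.bg_level > 0:
            state.bg_level -= 1
            return "bg_slow_down"
        if not state.bg_paused:
            state.bg_paused = True
            return "bg_pause"
        return "none"
    if predicted < deadline * (1.0 - slack):
        # FG has time to spare: hand resources back to BG, then slow FG.
        if state.bg_paused:
            state.bg_paused = False
            return "bg_resume"
        if state.bg_level < MAX_LEVEL:
            state.bg_level += 1
            return "bg_speed_up"
        if state.fg_level > 0:
            state.fg_level -= 1
            return "fg_slow_down"
    return "none"
```

Calling `control_step` once per prediction interval moves the system one step at a time, which keeps each decision easy to undo when contention changes.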

10 Performance Controller – Coarse Time Scale
- Cache partitioning
  - Large caches are slow to warm up due to cache inertia
- Based on three heuristics:
  - Strong correlation between FG exec time and LLC misses: increase FG LLC partition size
  - Increased partition size doesn't improve LLC misses: decrease FG LLC partition size
  - BG is heavily throttled and shows low core utilization
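The first two heuristics can be sketched as a decision function over recent measurements. This Python sketch is illustrative only: the correlation threshold, the minimum miss reduction, the way limits, and the function names are assumptions, and the third heuristic is left out because the slide does not state which action it triggers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def adjust_fg_ways(fg_ways, exec_times, llc_misses, misses_before_growth=None,
                   corr_threshold=0.8, min_gain=0.05, max_ways=19):
    """Return the new FG partition size in cache ways (illustrative sketch).

    exec_times, llc_misses: per-execution FG samples from the last interval
    misses_before_growth:   average misses before the last increase, if the
                            partition was grown in the previous interval
    """
    current = sum(llc_misses) / len(llc_misses)
    # Heuristic 2: the last increase did not reduce misses enough, so shrink.
    if misses_before_growth is not None and misses_before_growth > 0:
        gain = (misses_before_growth - current) / misses_before_growth
        if gain < min_gain:
            return max(fg_ways - 1, 1)
    # Heuristic 1: FG time tracks LLC misses closely, so grow the partition.
    if pearson(exec_times, llc_misses) >= corr_threshold:
        return min(fg_ways + 1, max_ways)
    return fg_ways
```

Because the slide notes that large caches are slow to warm up, a function like this would be called far less often than the frequency controller.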

11 Implementation
- Per-core DVFS
  - Linux CPUFreq governors
  - Dirigent uses 5 frequency steps (1.2 to 2.0 GHz)
- Intel Cache Allocation Technology (CAT)
  - Model-Specific Registers (MSRs)
  - Use "way masks" to specify partition sizes
- Data collected from performance counters
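As a small illustration of way masks: a CAT capacity bitmask marks the cache ways a class of service may fill, and the set bits must be contiguous. The Python sketch below only computes non-overlapping FG and BG masks. It does not write any MSR, the helper names are hypothetical, and the 20-way cache in the usage note is an assumed value rather than one taken from the slides.

```python
def way_mask(num_ways, shift=0):
    """Return a bitmask with `num_ways` contiguous bits set, starting at bit `shift`."""
    if num_ways < 1:
        raise ValueError("a partition needs at least one way")
    return ((1 << num_ways) - 1) << shift


def split_llc(total_ways, fg_ways):
    """Split the LLC into disjoint FG and BG way masks (illustrative sketch).

    BG takes the low-order ways and FG the high-order ways; which side gets
    which end is an arbitrary choice made here.
    """
    if not 0 < fg_ways < total_ways:
        raise ValueError("both partitions need at least one way")
    bg_ways = total_ways - fg_ways
    bg_mask = way_mask(bg_ways)
    fg_mask = way_mask(fg_ways, shift=bg_ways)
    return fg_mask, bg_mask
```

With an assumed 20-way cache, `split_llc(20, 8)` gives the FG task the top 8 ways and leaves the remaining 12 to BG tasks.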

12 Experimental Setup
Machine configuration:
- Intel Xeon E5-2618L v GHz
- 16 GB DDR4 memory
- Linux at runlevel S

Workload Type | Name                                                    | Description
FG            | Bodytrack, Ferret, Fluidanimate, Raytrace, Streamcluster | PARSEC 2.0
Single BG     | Bwaves, PCA, RS                                         | SPEC 2006 and MLPack
Rotate BG     | Namd, Soplex, Libquantum, Lbm                           | SPEC 2006

13 Execution Time Predictor Accuracy
50 consecutive executions of raytrace (FG) and RS (BG) Predictions are made halfway through each FG execution

14 Execution Time Predictor Accuracy
Dirigent predictor achieves an error rate of 2.4% across all 35 workload mixes

15 Methodology
Five configurations:
- Baseline: Unmanaged resource contention
- StaticFreq: Static frequency setting prioritizing FG tasks
- StaticBoth: Set to the best static frequency and partitioning
- DirigentFreq: Uses fine time scale controller only
- Dirigent: Full Dirigent implementation
Definition of deadline: $\mu_{baseline} + 0.3 \cdot \sigma_{baseline}$
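The deadline on this slide is the mean Baseline execution time plus 0.3 standard deviations, which can be computed directly from a list of Baseline run times. In the Python sketch below, the use of the population standard deviation and the definition of the QoS satisfaction rate as the fraction of runs finishing by the deadline are assumptions, since the slide does not spell them out.

```python
import statistics


def deadline(baseline_times, k=0.3):
    """Deadline = mean + k * standard deviation of Baseline execution times."""
    mu = statistics.mean(baseline_times)
    sigma = statistics.pstdev(baseline_times)  # assumption: population std dev
    return mu + k * sigma


def qos_satisfaction(times, limit):
    """Fraction of executions that finish by the deadline (assumed definition)."""
    return sum(t <= limit for t in times) / len(times)
```

For Baseline times of 1.0 s and 3.0 s, the mean is 2.0 s and the population standard deviation is 1.0 s, so the deadline is 2.3 s.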

16 Tradeoff between FG Throughput and BG Performance
- Precise control over the range of target deadlines
- Convert FG time slack to BG performance
- Consistent QoS satisfaction rate

17 Dirigent Results: Single FG Workload

18 Dirigent Results: Single FG Workload

19 Dirigent Results: Single FG Workload

20 Dirigent Results: Single FG Workload
Dirigent can finish almost all FG tasks while achieving 91% of the BG throughput of the Baseline configuration

21 Dirigent Results: Concurrent FG Workloads
[Figure: one panel per workload mix; axis tick values not recoverable.]

FG            | BG
Bodytrack     | Libquantum + Soplex
Ferret        | Bwaves
Fluidanimate  | Lbm + Soplex
Raytrace      | RS
Streamcluster | Lbm + Namd

Multiple FG tasks introduce two complications:
- Fine time scale controller has to be conservative
- Higher contention within FG partition

22 Dirigent Results: Concurrent FG Workloads
Despite the complications caused by concurrent FG workloads, Dirigent achieves 98% of QoS goal and 87% of BG throughput

23 Conclusion
- Performance contention from collocation causes variation of latency-critical tasks, even when batch tasks are carefully chosen
- Minimizing such variation offers significant opportunities in improving BG throughput without QoS compromises
- Dirigent is a lightweight software runtime system that reconfigures resources at fine time granularity
  - Give FG tasks just enough resources to finish on time
  - Maximize BG throughput by fully using spare resources
- Evaluation on a real machine shows
  - 85% reduction in standard deviation of execution time
  - 30% better BG throughput compared to the coarse time scale approach, yet higher FG completion rate

24 Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems
Haishan Zhu, Mattan Erez Department of Electrical and Computer Engineering The University of Texas at Austin

