Heracles: Improving Resource Efficiency at Scale (ISCA'15). Stanford University and Google, Inc.
Outline Introduction Design ◦ Isolation Mechanisms ◦ Controllers Evaluation Conclusion
Motivation Average server utilization in most datacenters is low, ranging between 10% and 50%. ◦ It is difficult to consolidate latency-critical services onto a subset of highly utilized servers. ◦ Instead, increase server utilization by launching best-effort tasks on the same servers as latency-critical jobs.
Motivation (Cont.) Prior work tends to protect LC workloads but, in doing so, reduces the opportunities for higher utilization through co-location.
Goal Eliminate SLO violations at all levels of load for the LC job while maximizing the throughput for BE tasks.
Heracles A real-time, feedback-based controller ◦ Enables the safe co-location of best-effort (BE) tasks alongside a latency-critical (LC) service. ◦ Ensures that the LC job meets its latency target (SLO) while maximizing the resources given to BE tasks.
Heracles (Cont.) ◦ Four hardware and software isolation mechanisms. Hardware: shared-cache partitioning and fine-grained power/frequency settings. Software: core isolation and network traffic control.
Isolation Mechanisms (Software) Core isolation ◦ Pin each workload to a set of cores using cpuset cgroups. ◦ Speed of (re)allocation: tens of milliseconds. Network traffic ◦ Limit the outgoing bandwidth of BE tasks using Linux traffic control; the LC job is not limited. ◦ Takes effect in less than hundreds of milliseconds.
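Both software mechanisms map onto standard Linux interfaces. Below is a minimal sketch in Python, assuming a cpuset cgroup for BE tasks and HTB-based traffic control; the cgroup path, interface name, rate, and class IDs are illustrative assumptions, not values from the paper.

    import subprocess

    BE_CPUSET = "/sys/fs/cgroup/cpuset/be_tasks"   # hypothetical cpuset cgroup holding all BE tasks

    def pin_be_cores(cores: str, mems: str = "0") -> None:
        """Restrict every task in the BE cgroup to the given cores and memory nodes."""
        with open(f"{BE_CPUSET}/cpuset.cpus", "w") as f:
            f.write(cores)                         # e.g. "8-15"
        with open(f"{BE_CPUSET}/cpuset.mems", "w") as f:
            f.write(mems)

    def cap_be_egress(iface: str = "eth0", rate: str = "2gbit") -> None:
        """Cap outgoing BE bandwidth with an HTB qdisc; LC traffic stays uncapped."""
        subprocess.run(["tc", "qdisc", "replace", "dev", iface, "root",
                        "handle", "1:", "htb", "default", "10"], check=True)
        subprocess.run(["tc", "class", "replace", "dev", iface, "parent", "1:",
                        "classid", "1:20", "htb", "rate", rate], check=True)
        # BE traffic would then be steered into class 1:20, e.g. via net_cls cgroup marks.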
Isolation Mechanisms (Hardware) LLC isolation ◦ Cache Allocation Technology (CAT) in recent Intel chips uses way-partitioning to define non-overlapping partitions of the LLC. Takes effect in a few milliseconds. ◦ A software monitor tracks the DRAM bandwidth usage of the LC and BE jobs and scales down the number of cores for BE jobs if the LC job does not receive sufficient bandwidth.
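As a rough illustration, the sketch below drives CAT through the Linux resctrl filesystem and shows the bandwidth check in software form; the group names, way masks, and DRAM bandwidth threshold are assumptions rather than the paper's values.

    RESCTRL = "/sys/fs/resctrl"                    # Linux resctrl mount point exposing CAT

    def set_llc_ways(group: str, way_mask: int) -> None:
        """Assign a resctrl group a bitmask of LLC ways (socket 0 only, for brevity)."""
        with open(f"{RESCTRL}/{group}/schemata", "w") as f:
            f.write(f"L3:0={way_mask:x}\n")

    # Non-overlapping partitions, e.g. 16 ways for the LC job and 4 ways for BE tasks:
    # set_llc_ways("lc_group", 0xFFFF0)
    # set_llc_ways("be_group", 0x0000F)

    def throttle_be_if_starved(lc_bw_gbs: float, lc_bw_needed_gbs: float,
                               total_bw_gbs: float, be_cores: int,
                               dram_limit_gbs: float = 60.0) -> int:
        """Shrink the BE core count when DRAM bandwidth saturates and the LC job falls short."""
        if total_bw_gbs > dram_limit_gbs and lc_bw_gbs < lc_bw_needed_gbs:
            return max(0, be_cores - 1)            # hand one core back to the LC job
        return be_cores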
Isolation Mechanisms (Hardware) (Cont.) Power isolation ◦ Uses CPU frequency monitoring, Running Average Power Limit (RAPL), and per-core DVFS. ◦ Takes effect within a few milliseconds.
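A hedged sketch of how these knobs are commonly exposed on Linux: package power is derived from the RAPL energy counter under powercap, and per-core DVFS is applied through cpufreq. The sysfs paths and the example frequency are assumptions, not details from the paper.

    import time

    RAPL_PKG = "/sys/class/powercap/intel-rapl:0"  # package-0 RAPL domain

    def package_power_watts(interval_s: float = 0.1) -> float:
        """Estimate package power by sampling the RAPL energy counter twice."""
        def energy_uj() -> int:
            with open(f"{RAPL_PKG}/energy_uj") as f:
                return int(f.read())
        e0 = energy_uj()
        time.sleep(interval_s)
        e1 = energy_uj()
        return (e1 - e0) / 1e6 / interval_s

    def cap_core_freq(core: int, khz: int) -> None:
        """Per-core DVFS: cap one core's maximum frequency through cpufreq."""
        path = f"/sys/devices/system/cpu/cpu{core}/cpufreq/scaling_max_freq"
        with open(path, "w") as f:
            f.write(str(khz))

    # e.g. if package power nears the TDP while LC cores run below nominal frequency,
    # the controller could cap each BE core: cap_core_freq(core, 1_200_000)  # 1.2 GHz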
Design Approach An optimization problem ◦ Maximize utilization under the constraint that the SLO must be met. Heracles ◦ Decomposes the high-dimensional optimization problem into many smaller, independent problems by decoupling the interference sources. ◦ Monitors latency, latency slack, and load, and adjusts the BE job allocation accordingly.
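A minimal sketch of that decomposition, assuming one top-level measurement step and three independent sub-controllers (cores/memory, power, network); the class and function names are illustrative, not the paper's.

    class SubController:
        """Each sub-controller grows or shrinks the BE allocation of a single resource."""
        def adjust(self, slack: float, load: float) -> None:
            raise NotImplementedError

    class CoreMemoryController(SubController):
        def adjust(self, slack: float, load: float) -> None:
            pass                                   # move cores / LLC ways between LC and BE

    class PowerController(SubController):
        def adjust(self, slack: float, load: float) -> None:
            pass                                   # cap BE core frequencies when power is tight

    class NetworkController(SubController):
        def adjust(self, slack: float, load: float) -> None:
            pass                                   # tighten or relax the BE egress limit

    def heracles_step(measure_lc, subcontrollers) -> None:
        """One iteration: measure the LC job once, then let each small problem solve itself."""
        latency, slo_target, load = measure_lc()
        slack = (slo_target - latency) / slo_target  # positive slack = headroom under the SLO
        for sc in subcontrollers:
            sc.adjust(slack, load)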
System Diagram
High-level Controller
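A hedged sketch of the kind of policy the high-level controller applies, driven by the latency slack and load it monitors; the specific thresholds (10% slack, 85% load) are assumptions for illustration.

    def high_level_decision(slack: float, load: float) -> str:
        """Decide what the sub-controllers may do with BE tasks in this interval."""
        if slack < 0.0:
            return "DISABLE_BE"    # SLO violated: remove BE tasks immediately
        if load > 0.85:
            return "DISALLOW_BE"   # near peak LC load: leave all resources to the LC job
        if slack < 0.10:
            return "HOLD"          # little headroom: keep BE tasks but do not grow them
        return "GROW_BE"           # comfortable slack: sub-controllers may grow BE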
Core & Memory Sub-controller
Max Load under SLO
Power and Network Sub-controller
Evaluation Two sets of experiments ◦ Co-locating LC applications with BE tasks on a single server. ◦ Measuring the end-to-end latency of websearch on tens of servers while BE tasks are also running. Effective Machine Utilization (EMU) ◦ LC throughput + BE throughput
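For concreteness, a hedged reading of the EMU metric: LC and BE throughput are summed after each is normalized to that workload's maximum throughput on the machine, so EMU can exceed 100%. The normalization detail is an assumption about how the slide's one-line definition is applied.

    def emu(lc_tput: float, lc_peak: float, be_tput: float, be_peak: float) -> float:
        """Effective Machine Utilization: normalized LC throughput plus normalized BE throughput."""
        return lc_tput / lc_peak + be_tput / be_peak

    # e.g. an LC job at 60% of its peak co-located with BE work at 50% of its peak
    # gives an EMU of 1.10, i.e. 110%.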
Workloads Three Google production LC workloads: ◦ websearch ◦ ml_cluster: real-time text clustering using machine learning ◦ memkeyval: in-memory key-value store Run the LC workloads with BE benchmarks that each stress a single shared resource: ◦ Stream-LLC, Stream-DRAM, cpu-pwr, iperf, brain, and streetview.
Latency of LC Applications
EMU
Shared Resource Utilization
Websearch in Cluster
Conclusion Heracles ◦ a heuristic feedback-based system that manages four isolation mechanisms to enable a latency-critical workload to be co-located with batch jobs without SLO violations. ◦ Evaluation on real hardware demonstrates an average utilization of 90% across all evaluated scenarios without any SLO violations for the latency-critical job.
Interference Analysis Three Google production LC workloads: ◦ websearch ◦ ml_cluster: real-time text clustering using machine learning ◦ memkeyval: in-memory key-value store Run the LC workloads with synthetic benchmarks that stress each shared resource in isolation.