vCAT: Dynamic Cache Management using CAT Virtualization — Meng Xu, Linh Thi Xuan Phan, Hyon-Young Choi, Insup Lee — Department of Computer and Information Science, University of Pennsylvania
Trend: Multicore & Virtualization Cyber-physical systems are becoming increasingly complex Require high performance and strong isolation Virtualization on multicore helps handle such complexity Increases performance and reduces cost Challenge: Harder to achieve timing isolation [Figure: Collision avoidance, adaptive cruise control, pedestrian detection, and infotainment running in VM1–VM4 on a hypervisor]
Problem: Shared cache interference A task uses the cache to reduce its execution time Concurrent tasks may access the same cache area Extra cache misses Increased WCET Intra-VM interference Inter-VM cache interference [Figure: Tasks of VM1 and VM2 colliding in the shared cache below the hypervisor]
Existing approach: Static management Statically assign non-overlapping cache areas to tasks (VMs) Pros: Simple to implement Cons: Low cache resource utilization Unused cache area of one task (VM) cannot be reused by another Cons: Not always feasible e.g., when the whole task set does not fit into the cache [Figure: Tasks of VM1 and VM2 pinned to disjoint areas of the shared cache]
Our approach: Dynamic management Dynamically assign disjoint cache areas to tasks (VMs) Pros: Enables cache reuse Better utilization of the cache Running tasks (VMs) can have larger cache areas, and thus smaller WCETs Challenge: Account for the cache overhead in schedulability analysis The cache overhead scenarios under dynamic cache allocation are more complex than those under static cache allocation [Figure: Tasks of VM1 and VM2 sharing cache areas over time]
Our approach: Dynamic management Challenge: How to achieve efficient dynamic cache management while guaranteeing isolation? Efficiency: The dynamic management should incur small overhead Solution: Hardware-based Increasingly many CPUs support cache partitioning Benefit: Cache reconfiguration can be done very efficiently Example: Intel processors that support cache partitioning (processor family — COTS processors with support): Intel(R) Xeon(R) processor E5 v3: 6 out of 48; Intel(R) Xeon(R) processor D: 15 out of 15; Intel(R) Xeon(R) processor E3 v4: 5 out of 5; Intel(R) Xeon(R) processor E5 v4: 117 out of 117 Source: https://github.com/01org/intel-cmt-cat and http://www.intel.com/
Contribution: vCAT vCAT: Dynamic cache management by virtualizing CAT First work that achieves dynamic cache management for tasks in virtualization systems on commodity multicore hardware Achieves strong shared-cache isolation for tasks and VMs Supports dynamic cache management for tasks and VMs The OS in a VM can dynamically allocate cache partitions for its tasks The hypervisor can dynamically reconfigure cache partitions for VMs Supports cache sharing among best-effort VMs and tasks
Outline Introduction Background: Intel CAT Design & Implementation Evaluation
Intel Cache Allocation Technology (CAT) Divide the shared cache into α partitions (α = 20) Similar to way-based cache partitioning Provide two types of model-specific registers Each core has a PQR register K Class of Service (COS) registers shared by all cores (K = 4) [Figure: PQR register (COS id in bits 63–32, reserved bits down to bit 9) and a COS register holding a 20-bit cache bit mask over the shared cache]
Intel Cache Allocation Technology (CAT) Divide the shared cache into α partitions (α = 20) Similar to way-based cache partitioning Provide two types of model-specific registers Each core has a PQR register K Class of Service (COS) registers shared by all cores (K = 4) Configure cache partitions for a core Step 1: Set the cache bit mask of a COS register Step 2: Link the core to that COS by setting its PQR register [Figure: PQR of the core pointing to COS 1, whose cache bit mask 0x0000F selects partitions of the shared cache]
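The two configuration steps above can be sketched as plain bit arithmetic. This is an illustrative model, not vCAT's code: the MSR addresses (IA32_PQR_ASSOC at 0xC8F, the L3 mask MSRs starting at 0xC90 for COS0) come from Intel's SDM, but `cat_setup` and its return format are hypothetical, and real code would issue privileged `wrmsr` instructions.

```python
NUM_PARTITIONS = 20       # alpha on the slide
IA32_PQR_ASSOC = 0xC8F    # per-core register linking the core to a COS
IA32_L3_CBM_BASE = 0xC90  # COS0 cache-bit-mask MSR; COSn lives at base + n

def cache_bit_mask(first, count):
    """Bit mask covering `count` contiguous partitions starting at `first`.
    CAT requires the mask to be non-empty and contiguous."""
    assert 0 < count and first + count <= NUM_PARTITIONS
    return ((1 << count) - 1) << first

def cat_setup(cos_id, first, count):
    """Return the two (msr, value) writes for the steps on this slide:
    step 1 programs the COS cache bit mask, step 2 points the core's PQR
    at the COS (the COS id occupies bits 63:32 of IA32_PQR_ASSOC)."""
    return [(IA32_L3_CBM_BASE + cos_id, cache_bit_mask(first, count)),
            (IA32_PQR_ASSOC, cos_id << 32)]
```

For example, `cat_setup(1, 4, 4)` yields the mask write `(0xC91, 0xF0)` followed by the PQR write selecting COS 1.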
Intel CAT: Software support The Xen hypervisor supports Intel CAT System operators can allocate cache partitions for VMs only Pros: Mitigates the interference among VMs Cons: Does not provide strong isolation among VMs Cons: Does not allow a VM to manage partitions for its tasks Tasks in the same VM can still interfere with each other Cons: Only supports a limited number of VMs with different cache-partition settings e.g., the number of VMs with different cache-partition settings supported by Xen is ≤ 4 on our machine (Intel Xeon 2618L v3 processor).
Outline Introduction Background: Intel CAT Design & Implementation Evaluation
Goals Dynamically control cache allocations for tasks and VMs Each VM should control the cache allocation for its own tasks The hypervisor should control the cache allocation for the VMs Preserve the virtualization abstraction layer Physical resources should not be exposed to VMs Guarantee cache isolation among tasks and VMs Tasks should not interfere with each other after the reconfiguration
Dynamic cache allocation for tasks To modify the cache configuration of a task, the VM needs to modify the cache control registers BUT cache control registers are only available to the hypervisor One possible approach: Expose the registers to VMs [Figure: VM modifies the COS register (0xF) of core P2 directly, selecting partitions of the physical cache]
Dynamic cache allocation for tasks To modify the cache configuration of a task, the VM needs to modify the cache control registers BUT cache control registers are only available to the hypervisor One possible approach: Expose the registers to VMs Problem: Potential cache interference among VMs e.g., a VM may overwrite the hypervisor's allocation decision [Figure: VM overwrites the COS register of core P2 from 0xF to 0xF00, grabbing another VM's partitions of the physical cache]
Dynamic cache allocation for tasks To modify the cache configuration of a task, the VM needs to modify the cache control registers BUT cache control registers are only available to the hypervisor One possible approach: Expose the registers to VMs Problem: Potential cache interference among VMs e.g., a VM may overwrite the hypervisor's allocation decision [Figure: Hypervisor validates the VM's COS-register operation before it reaches core P2]
Dynamic cache allocation for tasks To modify the cache configuration of a task, the VM needs to modify the cache control registers BUT cache control registers are only available to the hypervisor One possible approach: Expose the registers to VMs Problem: Potential cache interference among VMs e.g., a VM may overwrite the hypervisor's allocation decision Problem: The hypervisor needs to notify VMs of any changes [Figure: VM modifies the COS register of core P2; the hypervisor validates the operation]
vCAT: Key insight Virtualize cache partitions and expose virtual caches to VMs The hypervisor assigns virtual and physical cache partitions to VMs A VM controls the allocation of its assigned virtual partitions to tasks The hypervisor translates the VM's operations on virtual partitions into operations on the physical partitions [Figure: VM operates on its virtual cache; the hypervisor translates the operation into a COS-register value (0xF0) for core P2 over the physical cache]
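The translation step above can be sketched as a bitmask shift, assuming (as an illustration, not vCAT's actual API) that the hypervisor gives each VM a contiguous range of physical partitions [base, base + size) and the VM addresses its virtual partitions as [0, size):

```python
def translate_vcache_op(virtual_mask, vm_base, vm_size):
    """Translate a VM's virtual-partition bitmask into a physical bitmask,
    rejecting masks that fall outside the VM's assigned partitions."""
    if virtual_mask == 0 or virtual_mask >> vm_size:
        raise ValueError("virtual mask outside the VM's assigned partitions")
    return virtual_mask << vm_base  # shift into the VM's physical range
```

With `vm_base = 4`, a virtual mask of 0xF becomes the physical mask 0xF0 shown in the slide's COS register; any mask the VM requests beyond its own range is rejected, so one VM cannot touch another VM's partitions.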
Challenge 1: No control over cache hit requests A task's contents stay in the cache until they are evicted Problem: A task can access its content in its previous partitions via cache hits and interfere with another task Not explicitly documented in Intel's SDM We confirmed this limitation with experiments (available in the paper) [Figure: A task hitting stale content in its previous partitions of the physical cache, colliding with another task]
Solution: Cache flushing Ensure the task's content in its previous partitions is no longer valid Approach 1: Flush each memory address of the task Pros: Does not affect the other tasks' cache content Cons: Slow when a task's working set size is large (> 8.46 MB) Approach 2: Flush the entire cache Pros: Efficient when a task's working set size is large (> 8.46 MB) Cons: Flushes the other tasks' cache content as well vCAT provides both approaches to system operators Discussion of the tradeoffs and flushing heuristics is in the paper
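The tradeoff above amounts to comparing a per-address cost that grows with working-set size against a roughly constant full-flush cost. A minimal chooser sketch; the 8.46 MB crossover is the value reported on this slide for one specific machine, and the function name is illustrative:

```python
WSS_THRESHOLD_MB = 8.46  # crossover point reported on this slide

def choose_flush(wss_mb):
    """Pick per-address flushing (e.g., clflush) for small working sets,
    and a full-cache flush (e.g., wbinvd) once per-address flushing
    would take longer than flushing everything."""
    return "per-address" if wss_mb <= WSS_THRESHOLD_MB else "full-cache"
```

A system operator could instead pin the policy: per-address flushing never disturbs other tasks' cache content, which may matter more than raw flush latency.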
Challenge 2: Contiguous allocation constraint CAT requires each cache bit mask to cover contiguous partitions, but unallocated partitions may NOT be contiguous Fragmentation of cache partitions in dynamic allocation Low cache resource utilization [Figure: VM3's request cannot be satisfied because the free physical partitions left between VM1 and VM2 are not contiguous — the resulting mask would be invalid]
Solution: Partition defragmentation Rearrange the partitions to form contiguous regions The hypervisor rearranges physical cache partitions for VMs A VM rearranges virtual cache partitions for its tasks [Figure: VM1's and VM2's physical partitions repacked so that VM3 receives a contiguous range]
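The constraint and the fix can both be sketched as bit arithmetic: a CAT mask is valid only if its set bits form one contiguous run, and defragmentation repacks the surviving allocations into contiguous ranges. Helper names here are hypothetical, not vCAT's API:

```python
def is_contiguous(mask):
    """True iff the set bits of `mask` form one contiguous run
    (the validity requirement CAT imposes on cache bit masks)."""
    if mask == 0:
        return False
    # Shift out trailing zeros, then check the remainder is 2^n - 1.
    shifted = mask >> ((mask & -mask).bit_length() - 1)
    return (shifted & (shifted + 1)) == 0

def defragment(alloc_sizes):
    """Repack allocations of the given sizes into contiguous masks
    packed from bit 0 upward, leaving one contiguous free region."""
    masks, base = [], 0
    for size in alloc_sizes:
        masks.append(((1 << size) - 1) << base)
        base += size
    return masks
```

For example, three surviving allocations of 4, 2, and 4 partitions repack to the contiguous masks 0xF, 0x30, and 0x3C0, freeing the top of the cache as one contiguous region. (After such a move, the flushing step from the previous slides still applies, since tasks' contents remain in their old partitions.)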
vCAT: Design summary Introduce virtual cache partitions Enables a VM to control the cache allocation for its tasks without breaking the virtualization abstraction Flush the cache when the cache partitions of tasks (VMs) change Guarantees cache isolation among tasks and VMs under dynamic cache management Defragment non-contiguous cache partitions Enables better cache utilization Refer to the paper for technical details and other design considerations e.g., how to allocate and de-allocate partitions for tasks and VMs e.g., how to support an arbitrary number of tasks and VMs with different cache-partition settings
Implementation Hardware: Intel Xeon 2618L v3 processor The design works for any processor that supports both virtualization and hardware-based cache partitioning Implementation based on Xen 4.8 and LITMUSRT 2015.1 LITMUSRT: Linux Testbed for Multiprocessor Scheduling in Real-Time Systems ~5K lines of code (LoC) in total Hypervisor (Xen): 3264 LoC VM (LITMUSRT): 2086 LoC Easy to extend with new cache management policies
Outline Introduction Background: Intel CAT Design & Implementation Evaluation
vCAT Evaluation: Goals How much overhead is introduced by vCAT? How much WCET reduction is achieved through cache isolation? How much real-time performance improvement does vCAT enable? Static management vs. No management Dynamic management vs. Static management
vCAT Evaluation: Goals How much overhead is introduced by vCAT? How much WCET reduction is achieved through cache isolation? How much real-time performance improvement does vCAT enable? Static management vs. No management Dynamic management vs. Static management The rest of the evaluation is available in the paper
vCAT Evaluation: Goals How much overhead is introduced by vCAT? How much WCET reduction is achieved through cache isolation? How much real-time performance improvement does vCAT enable? Static management vs. No management Dynamic management vs. Static management
vCAT run-time overhead Static cache management Overhead occurs only when a task/VM is created Negligible overhead: ≤ 1.12 us Dynamic cache management Overhead occurs whenever the partitions of a task/VM are changed Reasonably small overhead: ≤ 27.1 ms The value depends on the workload's working set size (WSS): Overhead = min{3.23 ms/MB × WSS, 27.1 ms} More details can be found in the paper Computation of the overhead value based on the WSS Experiments that show the factors contributing to the overhead
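The dynamic-management overhead model above is a one-line formula: per-address flushing scales with the working-set size until it is cheaper to flush the whole cache. The constants (3.23 ms/MB and the 27.1 ms cap) are measured values reported on this slide for the authors' machine, not general constants:

```python
PER_MB_MS = 3.23      # measured per-address flush cost (ms per MB of WSS)
FULL_FLUSH_MS = 27.1  # measured full-cache flush cost (ms)

def dynamic_overhead_ms(wss_mb):
    """Reconfiguration overhead: per-address flushing grows linearly with
    the working set, capped by the cost of flushing the entire cache."""
    return min(PER_MB_MS * wss_mb, FULL_FLUSH_MS)
```

For instance, a 2 MB working set costs about 6.5 ms to flush per-address, while anything much beyond ~8.4 MB hits the 27.1 ms full-flush ceiling, matching the crossover discussed earlier.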
vCAT Evaluation: Goals How much overhead is introduced by vCAT? How much WCET reduction is achieved through cache isolation? How much real-time performance improvement does vCAT enable? Static management vs. No management Dynamic management vs. Static management
Static management: Evaluation setup PARSEC benchmarks Converted to LITMUSRT-compatible real-time tasks Real-time parameters for the benchmarks are randomly generated [Figure: Benchmark VM running PARSEC benchmarks and a pollute VM running a cache-intensive task; VCPUs VP1–VP4 pinned to cores P1–P4 above the shared cache]
Static management vs. No management Static management improves system utilization significantly Improves system utilization by 1.0 / 0.3 ≈ 3.3x [Figure: Fraction of schedulable task sets vs. VCPU utilization, for no management and static management; real-time performance of the streamcluster benchmark]
Static management vs. No management The more cache-sensitive the workload is, the more performance benefit is achieved
vCAT Evaluation: Goals How much overhead is introduced by vCAT? How much WCET reduction is achieved through cache isolation? How much real-time performance improvement does vCAT enable? Static management vs. No management Dynamic management vs. Static management
Dynamic management: Evaluation setup Create workloads that have dynamic cache demand Dual-mode tasks: Switch from mode 1 to mode 2 after 1 min Type 1: Task increases its utilization by decreasing its period Type 2: Task decreases its utilization by increasing its period [Figure: Benchmark VM running type-1 and type-2 dual-mode tasks and a pollute VM running a cache-intensive task; VCPUs VP1–VP4 pinned to cores P1–P4 above the shared cache]
Dynamic management vs. Static management Dynamic management significantly outperforms static management Improves system utilization by 0.6 / 0.2 = 3x [Figure: Fraction of schedulable task sets vs. VCPU utilization, for static and dynamic management]
Conclusion vCAT: A dynamic cache management framework for virtualization systems using CAT virtualization Provides strong isolation among tasks and VMs Supports both static and dynamic cache allocation for both real-time and best-effort tasks Evaluation shows that dynamic management substantially improves schedulability compared to static management Future work Develop more sophisticated cache resource allocation policies for tasks and VMs in virtualization systems Apply vCAT to real systems, e.g., automotive systems and cloud computing