vCAT: Dynamic Cache Management using CAT Virtualization


vCAT: Dynamic Cache Management using CAT Virtualization
Meng Xu, Linh Thi Xuan Phan, Hyon-Young Choi, Insup Lee
Department of Computer and Information Science, University of Pennsylvania

Trend: Multicore & Virtualization

- Cyber-physical systems are becoming increasingly complex
  - They require high performance and strong isolation
- Virtualization on multicore helps handle such complexity
  - Increases performance and reduces cost
  - Challenge: harder to achieve timing isolation

[Figure: VMs for collision avoidance, adaptive cruise control, pedestrian detection, and infotainment consolidated on one hypervisor]

Problem: Shared cache interference

- A task uses the cache to reduce its execution time
- Concurrent tasks may access the same cache area
  - Extra cache misses → increased WCET
- Two kinds of interference: intra-VM and inter-VM

[Figure: tasks 1-4 of VM1 and VM2 colliding in the shared cache across cores P1-P4]

Existing approach: Static management

- Statically assign non-overlapping cache areas to tasks (VMs)
- Pros: simple to implement
- Cons: low cache resource utilization
  - Unused cache area of one task (VM) cannot be reused by another
- Cons: not always feasible
  - e.g., when the whole task set does not fit into the cache

[Figure: static, non-overlapping cache areas assigned to tasks 1-4 of VM1 and VM2]

Our approach: Dynamic management

- Dynamically assign disjoint cache areas to tasks (VMs)
- Pros: enables cache reuse → better utilization of the cache
  - Running tasks (VMs) can have larger cache areas, and thus smaller WCETs
- Challenge: account for the cache overhead in schedulability analysis
  - The cache overhead scenarios under dynamic cache allocation are more complex than those under static allocation

[Figure: dynamic cache management]

Our approach: Dynamic management

- Challenge: how to achieve efficient dynamic cache management while guaranteeing isolation?
  - Efficiency: the dynamic management should incur small overhead
- Solution: hardware-based
  - Increasingly many CPUs support cache partitioning
  - Benefit: cache reconfiguration can be done very efficiently
- Example: Intel processors that support cache partitioning

  Processor family                    COTS processors supporting CAT
  Intel(R) Xeon(R) processor E5 v3    6 out of 48
  Intel(R) Xeon(R) processor D        15 out of 15
  Intel(R) Xeon(R) processor E3 v4    5 out of 5
  Intel(R) Xeon(R) processor E5 v4    117 out of 117

  Source: https://github.com/01org/intel-cmt-cat and http://www.intel.com/

Contribution: vCAT

- vCAT: dynamic cache management by virtualizing CAT
- First work to achieve dynamic cache management for tasks in virtualization systems on commodity multicore hardware
  - Achieves strong shared-cache isolation for tasks and VMs
  - Supports dynamic cache management for tasks and VMs
    - The OS in a VM can dynamically allocate cache partitions for its tasks
    - The hypervisor can dynamically reconfigure cache partitions for VMs
  - Supports cache sharing among best-effort VMs and tasks

Outline

- Introduction
- Background: Intel CAT
- Design & Implementation
- Evaluation

Intel Cache Allocation Technology (CAT)

- Divides the shared cache into α partitions (α = 20 on our machine)
  - Similar to way-based cache partitioning
- Provides two types of model-specific registers
  - Each core has a PQR register
  - K Class of Service (COS) registers shared by all cores (K = 4)
- Configuring cache partitions for a core (a sketch follows):
  - Step 1: set the cache bit mask of a COS register
  - Step 2: link the core to that COS by setting its PQR register

[Figure: the per-core PQR register selects one of the shared COS registers via a COS ID; the 20-bit cache bit mask in that COS register (e.g., 0x0000F) selects partitions of the shared cache]
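As a concrete illustration, here is a minimal sketch of the two-step configuration (not vCAT's code) from a privileged Linux context, using the /dev/cpu/<n>/msr interface; the MSR addresses (IA32_PQR_ASSOC = 0xC8F, IA32_L3_MASK_0 = 0xC90) are documented in Intel's SDM, while the core, COS, and mask values below are illustrative assumptions:

/* Program Intel CAT for one core via Linux's /dev/cpu/<cpu>/msr
 * interface (requires root and the msr kernel module). */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int wrmsr_on_cpu(int cpu, uint32_t msr, uint64_t val)
{
    char path[64];
    int fd, ok;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    /* a pwrite at offset <msr> writes that MSR on this CPU */
    ok = pwrite(fd, &val, sizeof(val), msr) == sizeof(val);
    close(fd);
    return ok ? 0 : -1;
}

int main(void)
{
    int cpu = 2;             /* core to configure (assumption)    */
    uint32_t cos_id = 1;     /* class of service to use           */
    uint64_t cbm = 0x0000F;  /* Step 1: give COS 1 the low 4 ways */

    if (wrmsr_on_cpu(cpu, 0xC90 + cos_id, cbm))  /* IA32_L3_MASK_n */
        return 1;
    /* Step 2: link the core to COS 1; the COS ID lives in the upper
     * bits of IA32_PQR_ASSOC (the low bits hold the RMID, left 0). */
    return wrmsr_on_cpu(cpu, 0xC8F, (uint64_t)cos_id << 32);
}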

Intel CAT: Software support

- The Xen hypervisor supports Intel CAT
  - System operators can allocate cache partitions for VMs only
- Pros: mitigates the interference among VMs
- Cons: does not provide strong isolation among VMs
- Cons: does not allow a VM to manage partitions for its tasks
  - Tasks in the same VM can still interfere with each other
- Cons: supports only a limited number of VMs with different cache-partition settings
  - e.g., Xen supports ≤ 4 VMs with different cache-partition settings on our machine (Intel Xeon 2618L v3 processor)

Outline

- Introduction
- Background: Intel CAT
- Design & Implementation
- Evaluation

Goals

- Dynamically control cache allocations for tasks and VMs
  - Each VM should control the cache allocation for its own tasks
  - The hypervisor should control the cache allocation for the VMs
- Preserve the virtualization abstraction layer
  - Physical resources should not be exposed to VMs
- Guarantee cache isolation among tasks and VMs
  - Tasks should not interfere with each other after reconfiguration

Dynamic cache allocation for tasks

- To modify the cache configuration of a task, the VM needs to modify the cache control registers
- BUT the cache control registers are available only to the hypervisor
- One possible approach: expose the registers to the VMs
  - Problem: potential cache interference among VMs
    - e.g., a VM may overwrite the hypervisor's allocation decision
  - Even if the hypervisor validates each operation, it still needs to notify the VMs of any changes it makes

[Figure: a VM writing a COS register directly (e.g., changing 0xF to 0xF00) manipulates the physical cache partitions]

vCAT: Key insight

- Virtualize cache partitions and expose virtual caches to VMs
  - The hypervisor assigns virtual and physical cache partitions to VMs
  - A VM controls the allocation of its assigned virtual partitions to its tasks
  - The hypervisor translates the VM's operations on virtual partitions into operations on the physical partitions (sketched below)

[Figure: the VM operates on its virtual cache; the hypervisor translates the operation into a COS register update (e.g., 0xF0) on the physical cache]
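The translation step can be as simple as shifting the VM's virtual bit mask into the block of physical partitions assigned to that VM, and rejecting masks that fall outside it. The sketch below is hypothetical (the struct and function names are ours, not vCAT's):

/* Hypothetical sketch of virtual-to-physical partition translation. */
#include <stdint.h>

struct vm_cache_alloc {
    unsigned base;   /* first physical partition owned by this VM */
    unsigned count;  /* number of partitions assigned to this VM  */
};

/* Translate a virtual cache bit mask set by the guest into a physical
 * one. Returns 0 on success, -1 if the guest asked for partitions it
 * does not own (which would break isolation). */
static int translate_cbm(const struct vm_cache_alloc *vm,
                         uint32_t virt_cbm, uint32_t *phys_cbm)
{
    uint32_t allowed = (1u << vm->count) - 1;

    if (virt_cbm == 0 || (virt_cbm & ~allowed))
        return -1;                    /* outside the VM's virtual cache */
    *phys_cbm = virt_cbm << vm->base; /* shift into the VM's block      */
    return 0;
}

For example, a guest setting virtual mask 0x3 in a VM whose block starts at physical partition 4 would yield the physical mask 0x30.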

Challenge 1: No control for cache hit requests

- A task's contents stay in the cache until they are evicted
- Problem: a task can still access its contents in its previous partitions via cache hits → it interferes with another task
  - Not explicitly documented in Intel's SDM
  - We confirmed this limitation with experiments (available in the paper)

[Figure: a task hitting lines left in its previous partitions collides with the task that now owns them]

Solution: Cache flushing

- Flush so that a task's content in its previous partitions is no longer valid
- Approach 1: flush each memory address of the task
  - Pros: does not affect the other tasks' cache contents
  - Cons: slow when a task's working set size is large (> 8.46 MB)
- Approach 2: flush the entire cache
  - Pros: efficient when a task's working set size is large (> 8.46 MB)
  - Cons: flushes the other tasks' cache contents as well
- vCAT provides both approaches to system operators (a sketch of approach 1 follows)
  - Discussion of the tradeoffs and flushing heuristics is in the paper
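A sketch of approach 1, assuming a 64-byte line size and a user-visible working-set buffer; approach 2 relies on the privileged WBINVD instruction and therefore must run in the hypervisor or kernel:

/* Approach 1: flush a task's working set line by line with clflush,
 * leaving other tasks' cache contents intact. */
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */
#include <stddef.h>

#define CACHE_LINE 64   /* line size assumed for illustration */

static void flush_working_set(const void *buf, size_t len)
{
    const char *p = (const char *)buf;
    size_t off;

    for (off = 0; off < len; off += CACHE_LINE)
        _mm_clflush(p + off);
    _mm_mfence();  /* order the flushes before the lines are reused */
}

/* Approach 2 (hypervisor/kernel only): flush the entire cache.
 * Faster for large working sets, but evicts everyone's data:
 *     asm volatile("wbinvd");
 */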

Challenge 2: Contiguous allocation constraint

- CAT requires each allocation to use contiguous partitions, but unallocated partitions may NOT be contiguous
- Fragmentation of cache partitions in dynamic allocation → low cache resource utilization

[Figure: after VMs 1-3 allocate and free partitions, the remaining free partitions are non-contiguous, so a new contiguous request is invalid]

Solution: Partition defragmentation

- Rearrange the partitions to form contiguous regions (a sketch follows)
  - The hypervisor rearranges physical cache partitions for VMs
  - Each VM rearranges its virtual cache partitions for its tasks

[Figure: VMs 1-3 repacked into contiguous physical partitions]
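A hypothetical sketch of the repacking idea (names are ours, not vCAT's): slide every VM's block leftward so the allocated partitions become contiguous. A real implementation must also rewrite the affected COS masks and flush the moved partitions, as discussed in the paper:

/* Check CAT's constraint that a cache bit mask be contiguous, and
 * recompute a fragmentation-free layout for a set of VMs. */
#include <stddef.h>
#include <stdint.h>

struct vm_cache_alloc {
    unsigned base;   /* first physical partition of the VM's block */
    unsigned count;  /* number of partitions owned by the VM       */
};

/* A mask is valid for CAT iff its set bits are contiguous: strip the
 * trailing zeros, then adding 1 must clear every remaining bit. */
static int cbm_is_contiguous(uint32_t cbm)
{
    uint32_t x;

    if (cbm == 0)
        return 0;
    x = cbm / (cbm & -cbm);  /* shift out trailing zero bits */
    return (x & (x + 1)) == 0;
}

/* Defragment: pack all VM blocks next to each other from partition 0,
 * so the free partitions form one contiguous region at the top. */
static void defragment(struct vm_cache_alloc *vms, size_t n)
{
    unsigned next = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        vms[i].base = next;   /* slide this VM's block leftward */
        next += vms[i].count;
    }
}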

vCAT: Design summary

- Introduce virtual cache partitions
  - Enables a VM to control the cache allocation for its tasks without breaking the virtualization abstraction
- Flush the cache when the cache partitions of tasks (VMs) change
  - Guarantees cache isolation among tasks and VMs under dynamic cache management
- Defragment non-contiguous cache partitions
  - Enables better cache utilization
- Refer to the paper for technical details and other design considerations
  - e.g., how to allocate and deallocate partitions for tasks and VMs
  - e.g., how to support an arbitrary number of tasks and VMs with different cache-partition settings

Implementation

- Hardware: Intel Xeon 2618L v3 processor
  - The design works for any processor that supports both virtualization and hardware-based cache partitioning
- Implementation based on Xen 4.8 and LITMUSRT 2015.1
  - LITMUSRT: Linux Testbed for Multiprocessor Scheduling in Real-Time Systems
- ~5K lines of code (LoC) in total
  - Hypervisor (Xen): 3264 LoC
  - VM (LITMUSRT): 2086 LoC
- Flexible to add new cache management policies

Outline

- Introduction
- Background: Intel CAT
- Design & Implementation
- Evaluation

vCAT Evaluation: Goals

- How much overhead is introduced by vCAT?
- How much WCET reduction is achieved through cache isolation?
- How much real-time performance improvement does vCAT enable?
  - Static management vs. no management
  - Dynamic management vs. static management
- The rest of the evaluation is available in the paper

vCAT run-time overhead

- Static cache management
  - Overhead occurs only when a task/VM is created
  - Negligible overhead: ≤ 1.12 µs
- Dynamic cache management
  - Overhead occurs whenever the partitions of a task/VM are changed
  - Reasonably small overhead: ≤ 27.1 ms
  - The value depends on the workload's working set size (WSS):
    Overhead = min{3.23 ms/MB × WSS, 27.1 ms}
    e.g., a 4 MB WSS gives min{12.9 ms, 27.1 ms} ≈ 12.9 ms
- More details can be found in the paper
  - Computation of the overhead value based on the WSS
  - Experiments that show the factors contributing to the overhead


Static management: Evaluation setup

- PARSEC benchmarks
  - Converted into LITMUSRT-compatible real-time tasks
  - Real-time parameters are randomly generated for the benchmarks to create real-time task sets

[Figure: a benchmark VM running PARSEC tasks and a pollute VM running a cache-intensive task, with VCPUs VP1-VP4 pinned to cores P1-P4 under the hypervisor, sharing the cache]

Static management vs. No management

- Static management improves system utilization significantly
  - On the streamcluster benchmark, it improves schedulable utilization by 1.0 / 0.3 ≈ 3.3x over no management

[Figure: fraction of schedulable task sets vs. VCPU utilization for the streamcluster benchmark, with and without static management]

Static management vs. No management

- The more cache-sensitive the workload is, the larger the performance benefit


Dynamic management: Evaluation setup

- Create workloads with dynamic cache demand
  - Dual-mode tasks: switch from mode 1 to mode 2 after 1 min
  - Type 1: the task increases its utilization by decreasing its period
  - Type 2: the task decreases its utilization by increasing its period

[Figure: a benchmark VM running type 1 and type 2 dual-mode tasks and a pollute VM running a cache-intensive task, with VCPUs VP1-VP4 pinned to cores P1-P4, sharing the cache]

Dynamic management vs. Static management

- Dynamic management significantly outperforms static management
  - It improves schedulable utilization by 0.6 / 0.2 = 3x

[Figure: fraction of schedulable task sets vs. VCPU utilization under static and dynamic management]

Conclusion

- vCAT: a dynamic cache management framework for virtualization systems based on CAT virtualization
  - Provides strong isolation among tasks and VMs
  - Supports both static and dynamic cache allocation, for both real-time and best-effort tasks
  - Evaluation shows that dynamic management substantially improves schedulability compared to static management
- Future work
  - Develop more sophisticated cache resource allocation policies for tasks and VMs in virtualization systems
  - Apply vCAT to real systems, e.g., automotive systems and cloud computing