Memory Resource Management in VMware ESX Server


Memory Resource Management in VMware ESX Server. Carl A. Waldspurger, VMware, Inc. Presented by Wesley Coomber. EECS 582 – W16

Agenda
Motivation for VMware ESX Server
Ballooning
Idle Memory Tax
Content-based Page Sharing
Hot I/O Page Remapping
Discussion

Motivations
Why server virtualization? Individual servers are often underutilized, so it makes sense to consolidate them as VMs on a single physical server to simplify management and reduce costs.
Present industry trends: server consolidation plus plenty of cheap shared-memory multiprocessors mean server virtualization has room for improvement through smarter multiplexing of physical resources.

What is VMware ESX Server?
A thin software layer that multiplexes hardware resources efficiently among a number of virtual machines.
How is this different? Traditional virtual machine systems ran a hypervisor on top of a typical host operating system and intercepted I/O device calls from VMs, handling them as host OS system calls. ESX Server runs directly on the system hardware, which provides better I/O performance and more control over resource management.
It can run multiple operating systems without requiring any OS modifications.

What is VMware ESX Server? (2)
ESX adds another layer of abstraction by virtualizing real physical memory (machine addresses) into a 'physical address' software abstraction that makes each VM think it has access to hardware memory.
ESX Server maintains a pmap for each VM that translates the VM's 'physical' page numbers (PPNs) into machine page numbers (MPNs).
Separate shadow page tables, which map virtual pages directly to machine pages, are maintained for the processor.
The server can transparently remap 'physical' pages and monitor or manipulate guest memory accesses.
Over-commitment of memory: the sum of the maximum memory sizes of all VMs can be greater than the actual total machine memory.
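The pmap/shadow split above can be sketched with a toy dict-based model. The class and method names here are illustrative assumptions, not ESX internals, and the coarse whole-table flush on remap is a simplification.

```python
class VMMemoryMap:
    """Per-VM mapping state: a pmap (PPN -> MPN) plus a shadow table (VPN -> MPN)."""

    def __init__(self):
        self.pmap = {}    # guest 'physical' page number -> machine page number
        self.shadow = {}  # guest virtual page number -> machine page number

    def map_physical(self, ppn, mpn):
        self.pmap[ppn] = mpn

    def fill_shadow(self, vpn, ppn):
        # Compose the guest page table entry (VPN -> PPN) with the pmap
        # (PPN -> MPN) so the hardware MMU walks a direct VPN -> MPN mapping.
        self.shadow[vpn] = self.pmap[ppn]

    def remap_physical(self, ppn, new_mpn):
        # Transparent remapping: the guest's PPN stays the same while the
        # backing machine page changes; stale shadow entries are flushed
        # (coarsely, for simplicity) and refilled on the next access.
        self.pmap[ppn] = new_mpn
        self.shadow.clear()
```

Because the guest only ever sees PPNs, `remap_physical` can move a page anywhere in machine memory without the guest noticing, which is what enables over-commitment tricks like sharing and remapping later in the talk.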

Ballooning
A technique to implicitly reclaim memory.
How is this different? The traditional approach is to add another level of paging and move a VM's 'physical' pages to disk. This is bad because it requires a meta-level page replacement policy (e.g., which VM do I take memory from, AND which of its pages do I reclaim?). A complicated meta-level policy is likely to introduce unexpected behavior and poor performance due to unintended interactions between the guest OS's native memory management policies and the meta-level policy.
Every VM is given the illusion of having a fixed 'max size' of machine memory that it can be allocated; a VM gets its maximum size when memory is not overcommitted.
When ballooning is impossible or too slow, the system falls back to random page replacement.

Ballooning (2)
ESX Server controls a balloon module running in every guest OS; balloon drivers poll the server once a second.
Inflating the balloon increases memory pressure, causing the guest OS to invoke its own memory management algorithms and pin the allocated pages in 'physical' memory for reclamation. The guest OS may page out to its virtual disk when memory is scarce.
The server can also deflate the balloon to decrease pressure and free guest memory.
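The inflate/deflate protocol above can be sketched as a simple polling loop. The driver interface and names here are assumptions for illustration, not VMware's actual balloon module.

```python
class BalloonDriver:
    def __init__(self):
        self.pinned_ppns = []  # guest-'physical' pages the balloon holds pinned

    def poll(self, target, alloc_page, free_page):
        """Called ~once per second: inflate or deflate toward the server's target."""
        while len(self.pinned_ppns) < target:      # inflate: raise memory pressure
            # The guest OS decides which page to give up, possibly paging
            # other data out to its own virtual disk to satisfy the request.
            self.pinned_ppns.append(alloc_page())
        while len(self.pinned_ppns) > target:      # deflate: release pressure
            free_page(self.pinned_ppns.pop())
        # The server may now reclaim the machine pages backing pinned_ppns.
        return list(self.pinned_ppns)
```

The key design point: the guest's own replacement policy chooses which pages to sacrifice, so ESX never needs the meta-level policy criticized on the previous slide.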

Ballooning Benchmarks
Dbench benefits a lot from extra memory. The black bars show performance when the VM is configured with main memory sizes of 128 to 256 MB. The gray bars show performance of a VM configured with a 256 MB maximum and then ballooned down to each size.
Ballooned VM performance is right behind normal VM performance; the overhead is mostly due to guest OS data structures that are sized based on the amount of 'physical' memory the system has. So a 256 MB VM ballooned down to 128 MB has slightly less free memory than a VM configured with 128 MB.

How memory management works in ESX
Reminder: ESX Server gives each guest OS the illusion of a 'physical' address space that starts at 0, while each address is actually mapped to non-contiguous hardware machine addresses.
Three parameters go into the allocation of memory to each VM: min (guaranteed) size, max size, and memory shares.
Min is guaranteed even when memory is overcommitted. Max size is the amount of 'physical' memory configured for use by the guest OS, and the VM is allocated this max size as long as memory is NOT overcommitted.
Memory shares entitle each VM to a fraction of real machine memory based on its proportion of shares to total shares. So a VM with 2x as many shares as another VM will get 2x the memory (subject to each VM's min and max constraints), and this 2x amount is kept only if it is actively used.
Because min is a guaranteed size, machine memory must be reserved for the guaranteed min, plus the additional overhead needed for virtualization. The remaining space (i.e., max minus min) must be reserved as disk swap space so that the system is always capable of preserving the VM's memory.
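The three-parameter allocation above can be sketched as share-proportional division with min/max clamping. Real ESX adjusts allocations dynamically and redistributes memory freed by clamping; this illustrative function (names are assumptions) just computes one static target split.

```python
def target_allocations(machine_mem, vms):
    """vms: list of dicts with 'min', 'max', and 'shares' keys (sizes in MB)."""
    total_shares = sum(vm['shares'] for vm in vms)
    targets = []
    for vm in vms:
        # Share-based entitlement: fraction of machine memory proportional
        # to this VM's share of the total shares.
        proportional = machine_mem * vm['shares'] / total_shares
        # The allocation is clamped to the VM's [min, max] range.
        targets.append(max(vm['min'], min(vm['max'], proportional)))
    return targets
```

For example, with 300 MB of machine memory, a VM holding twice the shares of its neighbor is entitled to twice the memory, subject to its min and max.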

Idle Memory Tax
A technique that trades off some performance isolation for efficient memory utilization.
How is this different? The traditional approach is a pure proportional-share algorithm that maintains specific ratios of memory between VMs, where each client is guaranteed a minimum resource fraction equal to its fraction of the total shares. This is bad because it lets idle clients hoard memory and waste it, while working clients with meager shares suffer severe memory famine.

Idle Memory Tax (2)
Charge a client more for its idle pages than for the pages it is actively using. When the system needs memory, pages are claimed first from clients that are not actively using their full allocation.
Tax rate = the maximum fraction of idle pages that can be taken from a client (defaults to 75%).
ESX Server measures idle memory by statistically sampling each VM's working set to estimate the amount of actively used memory.
A 75% tax rate is a good 'magic number': it allows most idle memory in the system to be reclaimed while still providing a buffer against rapid working-set growth, which hides the latency of reclamation activity such as ballooning and swapping to disk.
The system responds rapidly to increases in memory usage and more gradually to decreases. So a VM that had been idle and suddenly starts using all of its allocated memory is allowed to ramp up to its max quite quickly, while a VM whose working set shrinks has its idle memory slowly reclaimed by the idle memory tax.
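The tax is implemented in the paper via an adjusted shares-per-page ratio, rho = S / (P * (f + k * (1 - f))) with k = 1 / (1 - tax_rate), so each idle page costs k times as much as an active one; when memory is needed, the system reclaims from the client with the lowest ratio. A minimal sketch (the helper names are illustrative):

```python
def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Adjusted shares-per-page ratio: idle pages are charged k times more."""
    k = 1.0 / (1.0 - tax_rate)   # with the default 75% tax rate, k = 4
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))

def pick_victim(clients):
    """clients: dict of name -> (shares, pages, active_fraction).
    The client with the lowest adjusted ratio loses memory first."""
    return min(clients, key=lambda name: shares_per_page(*clients[name]))
```

With equal shares and allocations, a fully idle client's ratio drops to a quarter of a fully active client's, so the idle client is reclaimed from first, which is exactly the behavior the benchmark on the next slide demonstrates.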

Idle Memory Tax Benchmarks
Two VMs with identical share allocations of 256 MB in an overcommitted system. VM1 (gray) is an idle Windows instance; VM2 (black) is a Linux instance executing a memory-intensive workload.
When the tax rate is increased to 75% at time 33 min, idle memory is taken from the Windows VM and given to the Linux VM, boosting its performance by over 30%.

Content-based Page Sharing
A technique to safely share memory between virtual machines.
How is this different? The traditional approach is to identify redundant copies of pages, eliminate them, and map the 'physical' pages to a single copy marked CoW (copy on write: a writer gets its own unique copy when it writes to the page). The traditional way is bad because it requires several modifications to the guest OS to work.

Content-based Page Sharing (2)
Content-based page sharing removes sharing concerns from the guest OS entirely (no modifications needed), but the server must scan for sharing opportunities.
ESX Server hashes the contents of candidate pages, and the hash is used to index a table of other scanned pages. If the hash matches a hint frame, a full comparison is done. If the pages are indeed identical, the server maps each 'physical' page to a single machine page and marks it CoW (copy on write: a writer gets its own unique copy when it writes to the page).
A high-quality hash function means we can assume that all shared pages have unique hash values.
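The scan-match-compare flow above can be sketched as follows. Note the assumptions: SHA-1 from hashlib stands in for ESX's 64-bit page hash, and the hint table is a plain dict keyed by digest.

```python
import hashlib

def scan_page(page_id, contents, hint_table, read_page):
    """hint_table maps digest -> page id of a previously scanned (hint) frame;
    read_page returns the current contents of a page by id."""
    digest = hashlib.sha1(contents).digest()
    if digest in hint_table:
        candidate = hint_table[digest]
        if read_page(candidate) == contents:   # full byte-for-byte comparison
            # Map both 'physical' pages to one machine page, marked CoW.
            return ('share_cow', candidate)
        return ('collision', None)             # rare false hash match
    hint_table[digest] = page_id               # record a hint for later scans
    return ('hint', None)
```

The full comparison step is what makes sharing safe even if two different pages ever hash to the same value.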

Content-based Page Sharing Benchmarks
Identical Linux VMs running SPEC95 benchmarks. The top graph shows absolute amounts of memory shared and saved, which increase linearly with the number of VMs. The bottom graph depicts the same metrics as a percentage of aggregate VM memory: for large numbers of VMs, the sharing percentage approaches 67%, and almost 60% of all VM memory is reclaimed.

Hot I/O Page Remapping
Modern processors can address up to 64 GB of memory, but many devices that use DMA (direct memory access) for I/O transfers can only address up to 4 GB.
The traditional approach is to copy 'high' (>4 GB) memory into a temporary buffer in 'low' memory. This is expensive, and even worse for VMs, since a VM that thinks it has 'low' memory might actually be mapped to high memory.
ESX Server tracks 'hot' pages that are involved in frequent I/O, and when a page's count reaches a certain threshold, the page is transparently remapped to low memory.
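The threshold scheme above can be sketched as a per-page I/O counter; the counter, threshold value, and helper names here are illustrative assumptions, not ESX's actual parameters.

```python
LOW_MEM_LIMIT = 4 << 30   # the 4 GB boundary many DMA devices cannot address past

class HotPageTracker:
    def __init__(self, threshold=10):
        self.threshold = threshold
        self.io_counts = {}   # 'physical' page number -> I/O count

    def on_io(self, ppn, machine_addr, remap_to_low):
        """Return the machine address this I/O should use for the page."""
        if machine_addr < LOW_MEM_LIMIT:
            return machine_addr                  # already DMA-addressable
        self.io_counts[ppn] = self.io_counts.get(ppn, 0) + 1
        if self.io_counts[ppn] >= self.threshold:
            return remap_to_low(ppn)             # page runs hot: move it low
        return machine_addr                      # cold: copy via bounce buffer
```

Because ESX already owns the PPN-to-MPN mapping, the remap is invisible to the guest, so only genuinely hot pages pay the one-time cost of moving.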

Discussion
Is the 'hot' page remapping feature (published 2002) still useful for modern virtualized servers?
What can be done to alleviate the limitations of ballooning (the balloon driver can be disabled or unavailable while the guest OS is booting)?
Is there a better page replacement policy for this system than the implemented randomized page replacement policy?