Memory Resource Management in VMware ESX Server
Carl A. Waldspurger, VMware Inc. Presented by Wesley Coomber (wcoomber). EECS 582 – W16
Agenda
- Motivation for VMware ESX Server
- Ballooning
- Idle Memory Tax
- Content-based Page Sharing
- Hot I/O Page Remapping
- Discussion
Motivations: Why Server Virtualization?
Individual servers are often underutilized, so it makes sense to consolidate them as VMs on a single physical server to simplify management and reduce costs. Current industry trends, namely server consolidation and the availability of cheap shared-memory multiprocessors, mean server virtualization has room for improvement through smarter multiplexing of physical resources.
What is VMware ESX Server?
A thin software layer that efficiently multiplexes hardware resources among a number of virtual machines. How is this different? Traditional virtual machine systems ran a hypervisor on top of a typical operating system, intercepting I/O device calls from VMs and handling them as host OS system calls. ESX Server runs directly on the system hardware, which provides better I/O performance and more control over resource management. It can run multiple operating systems without any OS modification.
What is VMware ESX Server? (2)
ESX adds another layer of abstraction by virtualizing real physical memory (machine addresses) into a 'physical address' software abstraction that makes the VM think it has access to hardware memory. ESX Server keeps a pmap for each VM that translates its 'physical' page numbers into machine page numbers. Separate shadow page tables, which map virtual pages directly to machine pages, are maintained for the processor. The server can therefore transparently remap 'physical' pages and monitor or manipulate guest memory accesses. This enables over-commitment of memory: the sum of the maximum memory sizes of all VMs can be greater than the actual total size of machine memory.
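To make the pmap idea concrete, here is a minimal sketch (not VMware's actual code) of the extra translation layer: each VM maps guest 'physical' page numbers (PPNs) to machine page numbers (MPNs), and the server can change that mapping without the guest noticing. All class and method names here are invented for illustration.

```python
class VM:
    def __init__(self):
        self.pmap = {}  # PPN -> MPN; one translation table per VM

class ESXHost:
    def __init__(self, total_machine_pages):
        self.free_mpns = list(range(total_machine_pages))
        self.vms = []

    def create_vm(self):
        vm = VM()
        self.vms.append(vm)
        return vm

    def back_page(self, vm, ppn):
        """Allocate a machine page to back the guest's 'physical' page."""
        mpn = self.free_mpns.pop()
        vm.pmap[ppn] = mpn
        return mpn

    def remap(self, vm, ppn, new_mpn):
        """Transparently move a guest page to a different machine page;
        the guest's PPN never changes, so the guest notices nothing."""
        vm.pmap[ppn] = new_mpn

host = ESXHost(total_machine_pages=1024)
vm = host.create_vm()
host.back_page(vm, ppn=0)
host.remap(vm, ppn=0, new_mpn=999)  # guest still sees PPN 0
assert vm.pmap[0] == 999
```

The shadow page tables mentioned above would cache the composed virtual-to-machine mapping for the hardware MMU; they are omitted from this sketch.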
Ballooning: a technique to implicitly reclaim memory
Every VM is given the illusion of a fixed 'max size' of machine memory that it can be allocated, and a VM gets its maximum size when memory is not overcommitted. How is this different? The traditional approach is to add another level of paging and move a VM's 'physical' pages to disk. This is bad because it necessitates a meta-level page replacement policy (e.g., which VM do I take memory from, AND which of its pages do I reclaim?). A complicated meta-level policy is likely to introduce strange behavior and poor performance due to unintended interactions between the guest OS's native memory management policies and the meta-level policy. When ballooning is impossible or too slow, the system falls back to random page replacement.
Ballooning (2): balloon drivers poll the server once per second
ESX Server controls a balloon module running in every guest OS. Inflating the balloon increases memory pressure, causing the guest OS to invoke its own memory management algorithms and pin the balloon's pages in 'physical' memory for reclamation; the guest OS may page out to its virtual disk when memory is scarce. The server can also deflate the balloon to decrease pressure and free guest memory.
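The inflate/deflate cycle can be sketched as a toy guest-side driver (hypothetical names; the real driver talks to the guest kernel's allocator and to the hypervisor): inflating pins guest pages, which forces the guest's own policies to pick what to evict, and the pinned pages are handed back to the server for reclamation.

```python
class ToyGuest:
    """Stand-in for the guest OS memory allocator (assumed API)."""
    def __init__(self, pages):
        self.free = list(range(pages))
    def alloc_pinned_page(self):
        return self.free.pop()       # guest decides what to evict to satisfy this
    def free_page(self, ppn):
        self.free.append(ppn)

class BalloonDriver:
    def __init__(self, guest):
        self.guest = guest
        self.pinned = []             # PPNs currently owned by the balloon

    def poll(self, target):
        """Called ~once per second: grow or shrink toward the server's target."""
        while len(self.pinned) < target:          # inflate: raise memory pressure
            self.pinned.append(self.guest.alloc_pinned_page())
        while len(self.pinned) > target:          # deflate: release pressure
            self.guest.free_page(self.pinned.pop())
        return list(self.pinned)     # server may reclaim the backing machine pages

g = ToyGuest(64)
d = BalloonDriver(g)
assert len(d.poll(target=16)) == 16  # inflated: 16 guest pages pinned
assert len(d.poll(target=4)) == 4    # deflated: 12 pages returned to the guest
```

The key point the sketch shows: the hypervisor never chooses victim pages itself; the guest's `alloc_pinned_page` path applies the guest's own replacement policy.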
Ballooning Benchmarks
Dbench benefits a lot from extra memory. Black bars show performance when the VM is configured with main memory sizes of 128 to 256 MB. Gray bars show performance of a VM configured with a max of 256 MB and then ballooned down to each size. Ballooned VM performance is only slightly behind normal VM performance; the overhead is mostly due to guest OS data structures that are sized based on the amount of 'physical' memory the system has, so a 256 MB VM ballooned down to 128 MB has a little less free memory than a VM configured with 128 MB.
How memory management works in ESX
Reminder: ESX Server gives each guest OS the illusion of a 'physical' address space that starts at 0, while each address is actually mapped to non-contiguous hardware 'machine' addresses. Three parameters govern the allocation of memory to each VM: a min (guaranteed) size, a max size, and memory shares. The min is guaranteed even when memory is overcommitted. The max size is the amount of 'physical' memory configured for use by the guest OS, and the VM is allocated this max size as long as memory is NOT overcommitted. Memory shares entitle each VM to a fraction of real machine memory based on its proportion of the total shares: a VM with 2x as many shares as another gets 2x the memory (subject to each VM's min and max constraints), and this amount is kept only if it is actively used. Because the min is guaranteed, machine memory must be reserved for the guaranteed min size, plus the additional overhead needed for virtualization. The remaining space (max − min) must be reserved as disk swap space so that the system is always capable of preserving the VM's memory.
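A toy version of the share-based split described above (ignoring the idle memory tax, and not redistributing pages freed by clamping, which a real allocator would do):

```python
def share_allocation(vms, machine_pages):
    """Proportional-share split: each VM gets machine memory in
    proportion to its shares, clamped to its [min, max] range."""
    total_shares = sum(vm["shares"] for vm in vms)
    allocs = {}
    for vm in vms:
        raw = machine_pages * vm["shares"] / total_shares
        allocs[vm["name"]] = min(vm["max"], max(vm["min"], raw))
    return allocs

vms = [
    {"name": "A", "shares": 200, "min": 64, "max": 512},
    {"name": "B", "shares": 100, "min": 64, "max": 512},
]
a = share_allocation(vms, machine_pages=300)
assert a["A"] == 200 and a["B"] == 100  # 2x shares -> 2x memory
```

When memory is not overcommitted, the clamp to `max` is what gives each VM its full configured size.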
Idle Memory Tax: a technique that trades off some performance isolation for efficient memory utilization. How is this different? The traditional approach is a pure proportional-share algorithm that maintains specific memory ratios between VMs. This is bad because it lets idle clients hoard and waste memory while working clients with meager shares suffer severe memory pressure. Each client is still guaranteed a minimum resource fraction equal to its fraction of the total shares.
Idle Memory Tax (2): charge a client more for its idle pages than for the pages it is actively using. When the system needs memory, pages are reclaimed first from clients that are not actively using their full allocation. The tax rate is the maximum fraction of idle pages that can be taken from a client (default 75%). ESX Server measures idle memory by statistically sampling VM working sets to estimate the amount of actively used memory in each VM. A 75% tax rate is a good 'magic number': it allows most of the idle memory in the system to be reclaimed while still providing a buffer against rapid working-set increases, which hides the latency of reclamation activity such as ballooning and swapping to disk. The system responds rapidly to increases in memory usage and more gradually to decreases: a VM that had been idle and suddenly starts using all of its allocated memory is allowed to ramp up to its max quickly, while a winding-down VM that shrinks its working set has its idle memory slowly reclaimed by the tax.
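The paper expresses the tax through an adjusted shares-per-page ratio: idle pages are charged at an idle-cost factor k = 1/(1 − tax_rate), and pages are reclaimed from the client with the lowest ratio. A small sketch of that computation (parameter names are mine):

```python
def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Adjusted shares-per-page ratio: idle pages cost k = 1/(1 - tax)
    times as much as active ones, so a mostly idle client ends up with
    the lowest ratio and becomes the first victim for reclamation."""
    k = 1.0 / (1.0 - tax_rate)
    f = active_fraction  # estimated from working-set sampling
    return shares / (pages * (f + k * (1.0 - f)))

busy = shares_per_page(shares=100, pages=100, active_fraction=1.0)
idle = shares_per_page(shares=100, pages=100, active_fraction=0.2)
assert idle < busy  # reclaim from the idle client first
```

With the default 75% tax rate, k = 4, so an entirely idle page weighs four times as much as an active one when deciding whom to reclaim from; a 0% tax rate (k = 1) degenerates to pure proportional sharing.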
Idle Memory Tax Benchmarks
Two VMs with identical share allocations of 256 MB each run in an overcommitted system. VM1 (gray) is an idle Windows OS; VM2 (black) is an instance of Linux executing a memory-intensive workload. When the tax rate is increased to 75% at time 33 min, idle memory is taken from the Windows VM and given to the Linux VM, boosting its performance by over 30%.
Content-based Page Sharing
A technique to safely share memory between virtual machines. How is this different? The traditional approach identifies redundant copies of pages, deletes them, and maps the 'physical' pages to a single copy-on-write (CoW) copy. The traditional way is bad because it requires several modifications to the guest OS in order to work. ('CoW' = copy-on-write: a writer gets its own unique copy of the page on first write.)
Content-based Page Sharing (2)
Content-based page sharing removes sharing concerns from the guest OS entirely (no modifications needed), but the server must scan for sharing opportunities. ESX Server hashes the contents of candidate pages and indexes each hash into a table of other scanned pages. If the hash matches a hint frame, a full comparison is done; if the pages are indeed identical, the server maps each 'physical' page to a single machine page and marks it CoW (copy-on-write: a writer gets its own unique copy). A high-quality hash function means all shared pages can be assumed to have unique hash values.
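The hash-then-verify scan can be sketched as follows (an illustrative toy, not ESX's implementation; SHA-1 stands in for whatever high-quality hash the server uses, and CoW marking is reduced to a reference count):

```python
import hashlib

class PageSharer:
    def __init__(self):
        self.table = {}  # content hash -> [page_bytes, refcount]

    def scan(self, page):
        """Return True if this page was collapsed into a shared CoW copy."""
        h = hashlib.sha1(page).hexdigest()
        entry = self.table.get(h)
        if entry is None:
            self.table[h] = [page, 1]  # first sighting: record as a hint frame
            return False
        stored, _ = entry
        if stored == page:             # confirm the hash match byte-for-byte
            entry[1] += 1              # share: both PPNs now map here, marked CoW
            return True
        return False                   # hash collision: do not share

s = PageSharer()
zeros = bytes(4096)                    # e.g. two VMs with zero-filled pages
assert s.scan(zeros) is False          # first copy just recorded
assert s.scan(zeros) is True           # second copy shared with the first
```

The full byte comparison is what makes the scheme safe even if the "unique hash values" assumption ever fails; a collision simply means the page is not shared.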
Content-based Page Sharing Benchmarks
Identical Linux VMs run SPEC95 benchmarks. The top graph shows the absolute amounts of memory shared and saved, which increase linearly with the number of VMs. The bottom graph depicts the same metrics as a percentage of aggregate VM memory: for large numbers of VMs, the shared fraction approaches 67% and almost 60% of all VM memory is reclaimed.
Hot I/O Page Remapping. Modern processors can address up to 64 GB of memory, but many devices that use DMA (direct memory access) for I/O transfers can only address the first 4 GB. The traditional approach is to copy "high" (>4 GB) memory into a temporary buffer in "low" memory. This is expensive, and even worse for VMs, since a VM that thinks it has "low" memory might actually be mapped to high machine memory. ESX Server tracks 'hot' pages that are involved in a lot of I/O, and when a page's count reaches a certain threshold, the page is transparently remapped to low memory.
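A minimal sketch of the hot-page tracking idea (the threshold, page size, and allocator are illustrative stand-ins, not ESX's actual values or code):

```python
PAGE_SIZE = 4096
LOW_BOUNDARY_MPN = (4 << 30) // PAGE_SIZE  # first machine page above 4 GB

class HotPageTracker:
    def __init__(self, threshold=100):
        self.threshold = threshold
        self.io_counts = {}  # PPN -> number of DMA I/Os observed

    def record_io(self, ppn, mpn):
        """Count an I/O on this guest page; return a low MPN to remap to
        once the page is 'hot' and currently backed by high memory."""
        self.io_counts[ppn] = self.io_counts.get(ppn, 0) + 1
        if mpn >= LOW_BOUNDARY_MPN and self.io_counts[ppn] >= self.threshold:
            return self.allocate_low_mpn()
        return None  # cold page, or already in low memory: keep bounce-copying

    def allocate_low_mpn(self):
        return 42  # stand-in: real code would pull from a low-memory free list

t = HotPageTracker(threshold=3)
high = LOW_BOUNDARY_MPN + 7
assert t.record_io(ppn=5, mpn=high) is None   # still cold: bounce buffer
assert t.record_io(ppn=5, mpn=high) is None
assert t.record_io(ppn=5, mpn=high) == 42     # hot: remap below 4 GB
```

Because the guest only ever sees its PPN, the remap (via the pmap) is invisible to it; all subsequent DMA to that page avoids the copy.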
Discussion. Is the 'hot' page remapping feature (published 2002) still useful for modern virtualized servers? What can be done to alleviate the limitations of ballooning (the balloon driver can be disabled, and is unavailable while the guest OS is booting)? Is there a better page replacement policy for this system than the implemented randomized page replacement policy?