Published by Reynold Logan. Modified over 9 years ago.
1
A Comparison of Software and Hardware Techniques for x86 Virtualization
Keith Adams, Ole Agesen
Presented by Chwa Hoon Sung, Kang Joon Young
1st October 2009
2
Virtualization
3
Outline
- Classic Virtualization
- Software Virtualization
- Hardware Virtualization
- Comparison and Results
- Discussion
4
Classic Virtualization (Trap-and-Emulate)
- De-privilege the OS: execute the guest operating system directly, but at a lesser privilege level (user level).
[Figure: native system, with apps in user mode and the OS in kernel mode]
5
Classic Virtualization (Trap-and-Emulate)
- De-privilege the OS: execute the guest operating system directly, but at a lesser privilege level (user level).
[Figure: virtualized system, with the virtual machine monitor in kernel mode and the guest OS and its apps in user mode]
6
Trap-and-Emulate
- Runs the guest operating system deprivileged.
- All privileged instructions trap into the VMM.
- The VMM emulates the instruction against virtual state.
- Direct execution resumes from the next guest instruction.
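The four steps above can be sketched as a toy interpreter loop. This is an illustration only, not the VMM described in the talk: the instruction names, the `VirtualCPU` fields, and the `PRIVILEGED` set are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Toy subset of privileged instructions (assumption for this sketch).
PRIVILEGED = {"cli", "sti"}


@dataclass
class VirtualCPU:
    """Virtual state that the VMM emulates against (hypothetical fields)."""
    interrupts_enabled: bool = True
    acc: int = 0    # stand-in for the effect of ordinary instructions
    pc: int = 0     # index of the next guest instruction
    traps: int = 0  # number of traps taken into the VMM


def emulate(vcpu: VirtualCPU, instr: str) -> None:
    """VMM side: apply a privileged instruction to the virtual state."""
    if instr == "cli":
        vcpu.interrupts_enabled = False
    elif instr == "sti":
        vcpu.interrupts_enabled = True


def run_guest(program: List[str], vcpu: VirtualCPU) -> None:
    """Run deprivileged guest code; privileged instructions trap to the VMM."""
    while vcpu.pc < len(program):
        instr = program[vcpu.pc]
        if instr in PRIVILEGED:
            vcpu.traps += 1       # trap into the VMM
            emulate(vcpu, instr)  # emulate against virtual state
        else:
            vcpu.acc += 1         # stand-in for direct execution
        vcpu.pc += 1              # resume at the next guest instruction
```

Every privileged instruction costs one trap in this model, which is why trap cost matters so much for the classical approach.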
7
Classic Virtualization (Cont’d)
Architectural obstacles:
- Traps are expensive (~3000 cycles).
- Many traps are unavoidable (e.g., page faults).
- Not all architectures support trap-and-emulate (e.g., x86).
8
[Figure: taxonomy of system virtualization]
- Classic virtualization (Popek & Goldberg): trap-and-emulate
- Enhancements
  - Software VMM: full virtualization (VMware), para-virtualization (Xen)
  - Hardware VMM: hardware support for virtualization (Intel VT and AMD SVM)
10
Software Virtualization
- Until recently, the x86 architecture did not permit classical trap-and-emulate virtualization.
  - Some privileged state is visible in user mode: the guest OS can observe the current privilege level (CPL) in the code segment selector (%cs).
  - Not all privileged operations trap when run in user mode: dual-purpose instructions such as popf do not trap.
- Software VMMs for x86 have instead used binary translation of the guest code.
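Both obstacles can be modeled in a few lines. The sketch below is a deliberately simplified model, not a full description of the instruction semantics: it tracks only the interrupt-enable flag (bit 9 of EFLAGS), ignores the IOPL field and all other special cases, and the function names are made up for this example.

```python
IF_FLAG = 1 << 9  # interrupt-enable flag, bit 9 of EFLAGS


def visible_cpl(cs_selector: int) -> int:
    """The low two bits of the %cs selector expose the current privilege level."""
    return cs_selector & 0x3


def popf(eflags: int, popped: int, cpl: int, iopl: int = 0) -> int:
    """Toy model of popf: IF is updated only when CPL <= IOPL.

    When the code is deprivileged, the attempted IF change is silently
    dropped. No trap is raised, so a trap-and-emulate VMM never sees it.
    """
    if cpl <= iopl:
        return popped
    # Deprivileged: keep the old IF bit, take the remaining bits as popped.
    return (popped & ~IF_FLAG) | (eflags & IF_FLAG)
```

A guest kernel running at CPL 3 that tries to disable interrupts with popf ends up with IF unchanged, and reading %cs tells it that it is not running in ring 0.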
11
Binary Translation
- Translates kernel code, replacing privileged instructions with new instruction sequences that have the intended effect on the virtual hardware.
- The software VMM uses a translator with these properties:
  - Binary: input is machine-level code.
  - Dynamic: translation occurs at runtime.
  - On demand: code is translated only when needed for execution.
  - System level: makes no assumptions about the guest code.
  - Subsetting: translates from the full instruction set to a safe subset.
  - Adaptive: adjusts the translated code based on guest behavior to improve efficiency.
12
Binary Translation (Cont’d)
- The translator's input is the full x86 instruction set, including all the privileged instructions; its output is a safe subset of user-mode instructions.
13
Binary Translation
[Figure: components of the binary translator: guest code, translator, TC index, translation cache, callouts, CPU emulation routines]
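How the TC index and translation cache cooperate for on-demand translation can be sketched as a lookup table keyed by guest program counter. This is a minimal sketch under stated assumptions: the class, its fields, and `fake_translate` are hypothetical names, and real translated code is machine code rather than lists of strings.

```python
from typing import Callable, Dict, List


class TranslationCache:
    """Toy translation cache: maps a guest PC to its translated block."""

    def __init__(self, translate: Callable[[int], List[str]]) -> None:
        self.translate = translate
        self.tc_index: Dict[int, List[str]] = {}  # guest PC -> translated block
        self.translations = 0                     # how often the translator ran

    def lookup(self, guest_pc: int) -> List[str]:
        """Return the translated block, translating only on first use."""
        block = self.tc_index.get(guest_pc)
        if block is None:
            block = self.translate(guest_pc)      # on demand
            self.tc_index[guest_pc] = block
            self.translations += 1
        return block


def fake_translate(guest_pc: int) -> List[str]:
    """Placeholder translator used only to exercise the cache."""
    return [f"translated block for guest pc {guest_pc:#x}"]
```

Repeated lookups of the same guest PC reuse the cached block, so translation cost is paid once per block rather than once per execution.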
14
Guest Code
vPC -> mov ebx, eax
       cli
       and ebx, ~0xfff
       mov ebx, cr3
       sti
       ret
[Figure labels: straight-line code, control flow, basic block]
15
Guest Code                Translation Cache
vPC -> mov ebx, eax       start -> mov ebx, eax
       cli                         call HANDLE_CLI
       and ebx, ~0xfff             and ebx, ~0xfff
       mov ebx, cr3                mov [CO_ARG], ebx
                                   call HANDLE_CR3
       sti                         call HANDLE_STI
       ret                         jmp HANDLE_RET
16
Guest Code                Translation Cache
vPC -> mov ebx, eax       start -> mov ebx, eax
       cli                         mov [CPU_IE], 0
       and ebx, ~0xfff             and ebx, ~0xfff
       mov ebx, cr3                mov [CO_ARG], ebx
                                   call HANDLE_CR3
       sti                         mov [CPU_IE], 1
                                   test [CPU_IRQ], 1
                                   jne
                                   call HANDLE_INTS
       ret                         jmp HANDLE_RET
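The two translations of this basic block (callouts first, then in-TC sequences for cli and sti) can be reproduced with a small table-driven rewriter. This is only a sketch: it operates on instruction strings rather than machine code, `translate_block` and `GUEST_BLOCK` are names invented for the example, and the bare `jne` is copied as it appears on the slide because its target is not shown there.

```python
from typing import List

GUEST_BLOCK = [
    "mov ebx, eax", "cli", "and ebx, ~0xfff", "mov ebx, cr3", "sti", "ret",
]

# Replacement sequences taken from the slides: callout version first.
CALLOUT = {
    "cli": ["call HANDLE_CLI"],
    "mov ebx, cr3": ["mov [CO_ARG], ebx", "call HANDLE_CR3"],
    "sti": ["call HANDLE_STI"],
    "ret": ["jmp HANDLE_RET"],
}

# In-TC version: cli and sti become short inline sequences.
IN_TC = dict(CALLOUT)
IN_TC["cli"] = ["mov [CPU_IE], 0"]
IN_TC["sti"] = [
    "mov [CPU_IE], 1",
    "test [CPU_IRQ], 1",
    "jne",  # target not shown on the slide, left as-is
    "call HANDLE_INTS",
]


def translate_block(block: List[str], in_tc: bool = False) -> List[str]:
    """Copy innocuous instructions unchanged; rewrite the sensitive ones."""
    table = IN_TC if in_tc else CALLOUT
    out: List[str] = []
    for instr in block:
        out.extend(table.get(instr, [instr]))
    return out
```

Calling `translate_block(GUEST_BLOCK)` gives the callout version and `translate_block(GUEST_BLOCK, in_tc=True)` gives the in-TC version; in both, `mov ebx, eax` and `and ebx, ~0xfff` pass through untouched.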
17
Performance Advantages of BT
- Avoids privileged-instruction traps.
- Example: rdtsc (read time-stamp counter), a privileged instruction
  - Trap-and-emulate: 2030 cycles
  - Callout-and-emulate: 1254 cycles (emulation outside the TC)
  - In-TC emulation: 216 cycles
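The relative gains follow directly from the three cycle counts on this slide. The short calculation below uses only those numbers; the dictionary and function names are made up for the example.

```python
# Cycle counts for emulating rdtsc, as quoted on the slide.
RDTSC_CYCLES = {
    "trap-and-emulate": 2030,
    "callout-and-emulate": 1254,
    "in-TC emulation": 216,
}


def speedup(baseline: str, variant: str) -> float:
    """How many times fewer cycles `variant` needs than `baseline`."""
    return RDTSC_CYCLES[baseline] / RDTSC_CYCLES[variant]


for variant in ("callout-and-emulate", "in-TC emulation"):
    ratio = speedup("trap-and-emulate", variant)
    print(f"{variant}: {ratio:.1f}x fewer cycles than trap-and-emulate")
```

Replacing the trap with a callout saves roughly a third of the cycles, while emulating rdtsc inside the TC is about nine times cheaper than trapping.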
19
Hardware Virtualization
- Recent x86 extensions
  - 1998-2005: software-only VMMs using binary translation.
  - 2005: Intel and AMD start extending x86 to support virtualization.
- First-generation hardware
  - Allows classical trap-and-emulate VMMs.
  - Intel VT (Virtualization Technology)
  - AMD SVM (Secure Virtual Machine)
- Performance
  - VT/SVM help avoid BT, but do not help with MMU operations (the hardware VMM is actually slower there).
  - The main problem is efficient virtualization of the MMU and I/O, not execution of the virtual instruction stream.
20
New Hardware Features
- VMCB (Virtual Machine Control Block)
  - In-memory data structure.
  - Contains the state of the guest virtual CPU.
- Modes
  - Non-root mode: the guest OS runs at its intended privilege level (ring 0), but is not fully privileged.
  - Root mode: the VMM runs at a new ring with an even higher privilege level and is fully privileged.
- Instructions
  - vmrun: transfers from root to non-root mode.
  - exit: transfers from non-root to root mode.
21
Intel VT-x Operations
[Figure: ring 0 in VMX root mode; VM 1, VM 2, ..., VM n in VMX non-root mode, each with its own ring 3 and ring 0 and its own control block (VMCB 1, VMCB 2, ..., VMCB n); transitions labeled VMLAUNCH, VM Run, and VM Exit]
22
Benefits of Hardware Virtualization
- Hardware VMM reduces guest OS dependency
  - Eliminates the need for binary translation.
  - Facilitates support for legacy OSes.
- Hardware VMM improves robustness
  - Eliminates the need for complex software techniques.
  - Simpler and smaller VMMs.
- Hardware VMM improves performance
  - Fewer unwanted guest/VMM transitions.
24
Software VMM vs. Hardware VMM
- BT tends to win in these areas:
  - Trap elimination: BT can replace most traps with faster callouts.
  - Emulation speed: callouts jump to a predecoded emulation routine.
  - Callout avoidance: for frequent cases, BT may use in-TC emulation routines, avoiding even the callout cost.
- The hardware VMM wins in these areas:
  - Code density: preserved, since there is no translation.
  - Precise exceptions: BT performs extra work to recover guest state for faults.
  - System calls: run without VMM intervention.
25
Experiments
- Software VMM: VMware Player 1.0.1.
- Hardware VMM: an experimental hardware-assisted VMM implemented by VMware.
- Host: HP workstation with a VT-enabled 3.8 GHz Intel Pentium 4.
- All experiments are run natively, on the software VMM, and on the hardware-assisted VMM.
26
Forkwait Test
- Stresses process creation and destruction: system calls, context switching, page table modifications, page faults, etc.
- Results: time to create and destroy 40,000 processes
  - Host: 6.0 seconds
  - Software VMM: 36.9 seconds
  - Hardware VMM: 106.4 seconds
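The slide does not show the test program itself. A minimal sketch of a fork-and-wait loop in the same spirit might look like the following; it is Unix-only, the function name and the timing wrapper are assumptions for this example, and it is not the code used for the measurements above.

```python
import os
import time


def forkwait(n: int) -> int:
    """Create and destroy n processes one at a time; return how many were reaped."""
    reaped = 0
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)      # child: exit immediately, skipping interpreter cleanup
        os.waitpid(pid, 0)   # parent: wait for the child to terminate
        reaped += 1
    return reaped


def timed_forkwait(n: int) -> float:
    """Wall-clock seconds needed to run forkwait(n)."""
    start = time.perf_counter()
    forkwait(n)
    return time.perf_counter() - start
```

Each iteration exercises the operations the test is meant to stress: a system call to fork, address-space setup with its page table updates, a context switch into the child, and teardown on exit.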
27
Nanobenchmarks
- Benchmark
  - Custom guest OS: FrobOS.
  - Each test measures the performance of a single virtualization-sensitive operation.
- Observations (orderings compare cost, so the leftmost entry is fastest)
  - syscall (Native == HW << SW)
    - Hardware: no VMM intervention, so near-native performance.
    - Software: traps.
  - in (SW << Native << HW)
    - Native: accesses an off-CPU register.
    - Software VMM: translates "in" into a short sequence of instructions that access a virtual model of the same register.
    - Hardware: requires VMM intervention.
28
Nanobenchmarks (Cont’d)
- Observations (Cont’d)
  - ptemod (Native << SW << HW)
    - Both VMMs use shadowing to implement guest paging, with traces for coherency.
    - PTE writes cause significant overhead compared to native.
30
Opportunities
- Microarchitecture
  - Hardware overheads will shrink over time as implementations mature.
  - Measurements on a desktop system using a pre-production version of Intel's Core microarchitecture.
- Hardware VMM algorithmic changes
  - Drop trace faults upon guest PTE modification, allowing temporary incoherency with the shadow page tables to reduce costs.
- Hybrid VMM
  - Dynamically selects the execution technique, combining:
    - the hardware VMM's superior system call performance
    - the software VMM's superior MMU performance
- Hardware MMU support
  - Trace faults, context switches, and hidden page faults can be handled effectively with hardware assistance for MMU virtualization.
31
Conclusion
- Hardware extensions allow classical virtualization on the x86 architecture.
- The extensions remove the need for binary translation and simplify VMM design.
- The software VMM fares better than the hardware VMM in many cases, such as context switches, page faults, trace faults, and I/O.
- New MMU algorithms might narrow the performance gap.
32
Server Workload
- Benchmarks
  - Apache ab benchmarking tool, run against Apache HTTP server installations on Linux and on Windows.
  - Tests I/O efficiency.
- Observations
  - Both VMMs perform poorly.
  - Performance on Windows and Linux differs widely.
- Reason: Apache configuration
  - Windows: single address space (less paging), so the hardware VMM is better.
  - Linux: multiple address spaces (more paging), so the software VMM is better.
33
Desktop-Oriented Workload
- Benchmark
  - PassMark on Windows XP Professional.
  - A suite of microbenchmarks that test various aspects of workstation performance.
- Observations
  - Large RAM test
    - Exhausts memory; intended to test paging capability.
    - The software VMM is better.
  - 2D Graphics test
    - Involves system calls.
    - The hardware VMM is better.
34
Less Synthetic Workload
- Compilation times
  - Linux kernel and Apache (on Cygwin).
- Observations
  - Big compilation jobs cause lots of page faults.
  - The software VMM is better at handling page faults.