Scott Devine, VMware, Inc.

Slides:



Advertisements
Similar presentations
E Virtual Machines Lecture 3 Memory Virtualization
Advertisements

© 2012 VMware Inc. All rights reserved CPU and Memory Virtualization Prashanth Bungale, Sr. Member of Technical Staff, Mobile Virtualization January 23.
Computer Organization and Architecture
Computer Organization and Architecture
Architectural Support for OS March 29, 2000 Instructor: Gary Kimura Slides courtesy of Hank Levy.
1 ICS 51 Introductory Computer Organization Fall 2006 updated: Oct. 2, 2006.
Disco Running Commodity Operating Systems on Scalable Multiprocessors.
1 Last Class: Introduction Operating system = interface between user & architecture Importance of OS OS history: Change is only constant User-level Applications.
Introduction to Virtual Machines
Virtual Machine Monitors CSE451 Andrew Whitaker. Hardware Virtualization Running multiple operating systems on a single physical machine Examples:  VMWare,
E Virtual Machines Lecture 4 Device Virtualization
Keith Adams, Ole Agesen 1st October 2009 Presented by Chwa Hoon Sung, Kang Joon Young A Comparison of Software and Hardware Techniques for x86 Virtualization.
Virtualization Concepts Presented by: Mariano Diaz.
Xen I/O Overview. Xen is a popular open-source x86 virtual machine monitor – full-virtualization – para-virtualization para-virtualization as a more efficient.
Architecture Support for OS CSCI 444/544 Operating Systems Fall 2008.
CHAPTER 2: COMPUTER-SYSTEM STRUCTURES Computer system operation Computer system operation I/O structure I/O structure Storage structure Storage structure.
DAIMIHenrik Bærbak Christensen1 Virtualization … from a reliability perspective.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto OS-Related Hardware.
 Virtual machine systems: simulators for multiple copies of a machine on itself.  Virtual machine (VM): the simulated machine.  Virtual machine monitor.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Lecture 26 Virtual Machine Monitors. Virtual Machines Goal: run an guest OS over an host OS Who has done this? Why might it be useful? Examples: Vmware,
Lecture 12 Virtualization Overview 1 Dec. 1, 2015 Prof. Kyu Ho Park “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, White.
Protection of Processes Security and privacy of data is challenging currently. Protecting information – Not limited to hardware. – Depends on innovation.
CSE 451: Operating Systems Winter 2015 Module 25 Virtual Machine Monitors Mark Zbikowski Allen Center 476 © 2013 Gribble, Lazowska,
© 2010 VMware Inc. All rights reserved Introduction to Virtual Machines Carl Waldspurger (SB SM ’89, PhD ’95), VMware R&D.
E Virtual Machines Lecture 2 CPU Virtualization Scott Devine VMware, Inc.
CS 695 Topics in Virtualization and Cloud Computing, Autumn 2012 CS 695 Topics in Virtualization and Cloud Computing More Introduction + Processor Virtualization.
Introduction to Operating Systems Concepts
Chapter Overview General Concepts IA-32 Processor Architecture
Translation Lookaside Buffer
Virtualization.
Virtual Machine Monitors
Introduction to Operating Systems
Virtualization D. J. Foreman 2009.
Architectures of Digital Information Systems Part 1: Interrupts and DMA dr.ir. A.C. Verschueren Eindhoven University of Technology Section of Digital.
Introduction to Operating Systems
Chapter 2: Computer-System Structures(Hardware)
Chapter 2: Computer-System Structures
Assembly language.
William Stallings Computer Organization and Architecture 8th Edition
Lecture 24 Virtual Machine Monitors
x86 segmentation, page tables, and interrupts
Morgan Kaufmann Publishers Large and Fast: Exploiting Memory Hierarchy
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 2: Computer-System Structures Computer System Operation I/O Structure Storage.
Disco: Running Commodity Operating Systems on Scalable Multiprocessors
OS Virtualization.
Computer-System Architecture
Module 2: Computer-System Structures
A Survey on Virtualization Technologies
Architectural Support for OS
Computer Architecture
CSE 451: Operating Systems Autumn 2003 Lecture 2 Architectural Support for Operating Systems Hank Levy 596 Allen Center 1.
Module 2: Computer-System Structures
CSE 451: Operating Systems Autumn 2001 Lecture 2 Architectural Support for Operating Systems Brian Bershad 310 Sieg Hall 1.
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
CSE 451: Operating Systems Winter 2007 Module 2 Architectural Support for Operating Systems Brian Bershad 562 Allen Center 1.
Introduction to Virtual Machines
CSE 451: Operating Systems Winter 2003 Lecture 2 Architectural Support for Operating Systems Hank Levy 412 Sieg Hall 1.
Architectural Support for OS
Introduction to Virtual Machines
Lecture 13 Harvard architecture Coccone OS demonstrator
Chapter 2: Computer-System Structures
Chapter 2: Computer-System Structures
Chapter 11 Processor Structure and function
Module 2: Computer-System Structures
Module 2: Computer-System Structures
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
Presentation transcript:

Scott Devine, VMware, Inc. CPU Virtualization Scott Devine, VMware, Inc.

Outline CPU Background Virtualization Techniques System ISA Virtualization Instruction Interpretation Trap and Emulate Binary Translation Hybrid Models

Computer System Organization NIC LAN CPU MMU Memory Controller Local Bus Interface High-Speed I/O Bus Bridge Frame Buffer Low-Speed USB CD-ROM

CPU Organization Instruction Set Architecture (ISA) Defines: the state visible to the programmer registers and memory the instruction that operate on the state ISA typically divided into 2 parts User ISA Primarily for computation System ISA Primarily for system resource management

User ISA - State User Virtual Memory Reg 0 Reg 1 Reg n-1 FP 0 FP 1 Program Counter Condition Codes Reg 0 Reg 1 Reg n-1 FP 0 FP 1 FP n-1 Special-Purpose Registers General-Purpose Registers Floating Point Registers

User ISA – Instructions Add Sub And Compare … Load byte Load Word Store Multiple Push Jump Jump equal Call Return Add single Mult. double Sqrt double Integer Memory Control Flow Floating Point Fetch Registers Issue FP Typical Instruction Pipeline Decode Instruction Groupings

System ISA Privilege Levels Control Registers Traps and Interrupts Hardcoded Vectors Dispatch Table System Clock MMU Page Tables TLB I/O Device Access System User Extension Kernel Level 0 Level 1 Level 2

Outline CPU Background Virtualization Techniques System ISA Virtualization Instruction Interpretation Trap and Emulate Binary Translation Hybrid Models

Si Sj Si’ Sj’ Isomorphism Formally, virtualization involves the construction of an isomorphism from guest state to host state. Guest Si Sj Host Si’ Sj’ e(Si) e’(Si’) V(Si) V(Sj) Virtualization software constructs an isomorphism from guest to host. All guest state S is mapped onto host state S’ through some function V(S). Additionally for every state changing operation e(S) in the guest there is a corresponding state changing operation e’(S’) in the host. Virtualization software must implement V() and e().

Virtualizing the System ISA Hardware needed by monitor Ex: monitor must control real hardware interrupts Access to hardware would allow VM to compromise isolation boundaries Ex: access to MMU would allow VM to write any page So… All access to the virtual System ISA by the guest must be emulated by the monitor in software. System state kept in memory. System instructions are implemented as functions in the monitor.

Example: CPUState Goal for CPU virtualization techniques Process normal instructions as fast as possible Forward privileged instructions to emulation routines static struct { void CPU_CLI(void) uint32 GPR[16]; { uint32 LR; CPUState.IE = 0; uint32 PC; } int IE; int IRQ; void CPU_STI(void) } CPUState; CPUState.IE = 1;

Instruction Interpretation Emulate Fetch/Decode/Execute pipeline in software Postives Easy to implement Minimal complexity Negatives Slow!

Example: Virtualizing the Interrupt Flag w/ Instruction Interpreter void CPU_Run(void) { while (1) { inst = Fetch(CPUState.PC); void CPU_CLI(void) CPUState.PC += 4; switch (inst) { case ADD: void CPU_STI(void) CPUState.GPR[rd] = GPR[rn] + GPR[rm]; CPUState.IE = 1; break; … case CLI: void CPU_Vector(int exc) CPU_CLI(); CPUState.LR = CPUState.PC; case STI: CPUState.PC = disTab[exc]; CPU_STI(); } if (CPUState.IRQ && CPUState.IE) { CPUState.IE = 0; CPU_Vector(EXC_INT);

Guest OS + Applications Virtual Machine Monitor Trap and Emulate Guest OS + Applications Virtual Machine Monitor Page Fault Undef Instr vIRQ MMU Emulation CPU I/O Unprivileged Privileged Direct execution

“Strictly Virtualizable” A processor or mode of a processor is strictly virtualizable if, when executed in a lesser privileged mode: all instructions that access privileged state trap all instructions either trap or execute identically … Again another operational definition. This idea behind strictly virtualizable is the whether trap and emulate will work. The x86 is not strictly virtualizable. One reason is the popf instruction. popf takes a word off the stack and puts in into the flags register. One flag in that register is the interrupt enable flag. At system level the flag is updated by popf. When the designers of the x86 introduced user mode they realized that modifications of this flag by the user would break the OS. So they solved this by silently dropping updates to the IF at user level. This works for OS’s put breaks VMMs. When a VMM runs the OS by boosting it into user level all modifications the OS make to the IF are silently dropped and the VMM looses track of whether the OS whats interrupts to be enabled or disabled. The way to do this would be to make popf trap, or better yet make popf’s that modify IF to trap. Note that these conditions are necessary but not sufficient. For example on the x86 there are several other reasons why trap and emulate will not work.

Issues with Trap and Emulate Not all architectures support it Trap costs may be high Monitor uses a privilege level Need to virtualize the protection levels

Guest Code Translator Callouts Binary Translator CPU Emulation TC Translation Cache Callouts TC Index CPU Emulation Routines

Guest Code Basic Blocks vPC Straight-line code Basic Block mov ebx, eax cli and ebx, ~0xfff mov ebx, cr3 sti ret Guest Code Straight-line code Control flow Basic Block

Guest Code Translation Cache Binary Translation vPC mov ebx, eax cli and ebx, ~0xfff mov ebx, cr3 sti ret call HANDLE_CLI mov [CO_ARG], ebx call HANDLE_CR3 call HANDLE_STI jmp HANDLE_RET start Guest Code Translation Cache Dynamic binary translation is used in virtualizing the CPU by translating potentially dangerous (or non-virtualizable) instruction sequences one-by-one into safe instruction sequences. It works like this: The monitor inspects the next sequence of instructions. An instruction sequence is typically defined as the next basic block, that is all instructions up to the next control transfer instruction such as a branch. There may be reasons to end a sequence earlier or go past a branch but for now lets assume we go to the next branch. Each instruction is translated and the translation is copied into a translation cache. Instructions are translated as follows: Instructions which pose no problems can be copied into the translation cache with modification. We call these “ident” translations. Some simple dangerous instructions can be translated into a short sequence emulation code. This code is placed directly into the translation cache. We call this “inline” translation. An example is the modification of the Interrupt Enable flag. Other dangerous instructions need be performed by emulation code in the monitor. For these instructions calls to the monitor are made. These are called “Call-outs”. An example of these is a change to the page table base. The branch ending the basic block needs a call out. The monitor can now jump to the start of the translated basic block with the virtual registers in the hardware registers. So dangerous instructions can be privileged instructions, non-virtualizable instructions, control flow, memory accesses.

Guest Code Translation Cache Binary Translation vPC mov ebx, eax cli and ebx, ~0xfff mov ebx, cr3 sti ret mov [CPU_IE], 0 mov [CO_ARG], ebx call HANDLE_CR3 mov [CPU_IE], 1 test [CPU_IRQ], 1 jne call HANDLE_INTS jmp HANDLE_RET start Guest Code Translation Cache Dynamic binary translation is used in virtualizing the CPU by translating potentially dangerous (or non-virtualizable) instruction sequences one-by-one into safe instruction sequences. It works like this: The monitor inspects the next sequence of instructions. An instruction sequence is typically defined as the next basic block, that is all instructions up to the next control transfer instruction such as a branch. There may be reasons to end a sequence earlier or go past a branch but for now lets assume we go to the next branch. Each instruction is translated and the translation is copied into a translation cache. Instructions are translated as follows: Instructions which pose no problems can be copied into the translation cache with modification. We call these “ident” translations. Some simple dangerous instructions can be translated into a short sequence emulation code. This code is placed directly into the translation cache. We call this “inline” translation. An example is the modification of the Interrupt Enable flag. Other dangerous instructions need be performed by emulation code in the monitor. For these instructions calls to the monitor are made. These are called “Call-outs”. An example of these is a change to the page table base. The branch ending the basic block needs a call out. The monitor can now jump to the start of the translated basic block with the virtual registers in the hardware registers. So dangerous instructions can be privileged instructions, non-virtualizable instructions, control flow, memory accesses.

Basic Binary Translator void BT_Run(void) void *BTTranslate(uint32 pc) { CPUState.PC = _start; void *start = TCTop; BT_Continue(); uint32 TCPC = pc; } while (1) { void BT_Continue(void) inst = Fetch(TCPC); TCPC += 4; void *tcpc; if (IsPrivileged(inst)) { tcpc = BTFindBB(CPUState.PC); EmitCallout(); } else if (IsControlFlow(inst)) { if (!tcpc) { EmitEndBB(); tcpc = BTTranslate(CPUState.PC); break; } else { /* ident translation */ RestoreRegsAndJump(tcpc); EmitInst(inst); return start;

Basic Binary Translator – Part 2 void BT_CalloutSTI(BTSavedRegs regs) } { CPUState.PC = BTFindPC(regs.tcpc); return; CPUState.GPR[] = regs.GPR[]; CPU_STI(); CPUState.PC += 4; if (CPUState.IRQ && CPUState.IE) { CPUVector(); BT_Continue(); /* NOT_REACHED */

Controlling Control Flow vEPC test eax, 1 jeq add ebx, 18 mov ecx, [ebx] mov [ecx], eax call END_BB start Guest Code Translation Cache ret

Controlling Control Flow vEPC test eax, 1 jeq add ebx, 18 mov ecx, [ebx] mov [ecx], eax call END_BB Guest Code Translation Cache ret call HANDLE_RET eax == 0 find next

Controlling Control Flow vEPC test eax, 1 jeq add ebx, 18 mov ecx, [ebx] mov [ecx], eax jmp call END_BB Guest Code Translation Cache ret call HANDLE_RET eax == 0

Controlling Control Flow vEPC test eax, 1 jeq add ebx, 18 mov ecx, [ebx] mov [ecx], eax jmp call END_BB Guest Code Translation Cache ret call HANDLE_RET eax == 1 find next

Controlling Control Flow vEPC test eax, 1 jeq add ebx, 18 mov ecx, [ebx] mov [ecx], eax jmp Guest Code Translation Cache ret call HANDLE_RET eax == 1

Issues with Binary Translation Translation cache index data structure PC Synchronization on interrupts Self-modifying code Notified on writes to translated guest code

Other Uses for Binary Translation Cross ISA translators Digital FX!32 Optimizing translators H.P. Dynamo High level language byte code translators Java .NET/CLI

Hybrid Approach Yes Trap No Callout Direct Execution Jump to Guest PC Binary Translation for the Kernel Direct Execution (Trap-and-emulate) for the User U.S. Patent 6,397,242 DirectExec OK? Direct Execution Jump to Guest PC Yes Trap Handle Priv. Instruction No TC Validate Execute In TC Callout

Homework 1: Binary Patching Binary Patching for profiling and code coverage Process Virtualization Patching code be compiled into program Follow execution by patching control flow instructions Patch instructions in-place No need for translation or copying Instruction decoding need only determine Length of instruction Control flow points Callouts call through asm linkage Saves EFLAGS Saves registers Calls C code