Distributed Systems CS 15-440 Virtualization - Part II Lecture 24, Dec 7, 2011 Majd F. Sakr, Mohammad Hammoud and Vinay Kolar



Today…
 Last session: Virtualization - Part I
 Today's session: Virtualization - Part II
 Announcements:
 PS4 is due today by 11:59 PM
 Project 4 is due on Dec 13 by 11:59 PM (no deadline extension)
 On Monday, Dec 12, each team will present its work on Project 4

Objectives
Discussion on Virtualization:
 Why virtualization, and virtualization properties
 Virtualization, para-virtualization, virtual machines and hypervisors
 Virtual machine types
 Partitioning and multiprocessor virtualization
 Resource virtualization (today's topic)

Computer System Hardware
[Figure: a computer system's hardware: a CPU with MMU, memory controller, and local bus interface; a high-speed I/O bus with a NIC, controller, bridge, and frame buffer connecting to the LAN; and a low-speed I/O bus with USB and CD-ROM]

Resource Virtualization
 CPU Virtualization
 Memory Virtualization
 I/O Virtualization

CPU Virtualization
 Interpretation and Binary Translation
 Virtualizable ISAs


Instruction Set Architecture
 Typically, the architecture of a processor defines:
 1. A set of storage resources (e.g., registers and memory)
 2. A set of instructions that manipulate data held in those storage resources
 The definitions of the storage resources and of the instructions that manipulate them are documented in what is referred to as the Instruction Set Architecture (ISA)
 Two parts of the ISA are important in the definition of VMs:
 1. User ISA: visible to user programs
 2. System ISA: visible to supervisor software (e.g., the OS)

Ways to Virtualize CPUs
 The key to virtualizing a CPU lies in the execution of the guest instructions, both system-level and user-level
 Virtualizing a CPU can be achieved in one of two ways:
 1. Emulation: the only processor virtualization mechanism available when the ISA of the guest differs from the ISA of the host
 2. Direct native execution: possible only if the ISA of the host is identical to the ISA of the guest

Emulation
 Emulation is the process of implementing the interface and functionality of one system (or subsystem) on a system (or subsystem) that has a different interface and functionality
 In other words, emulation allows a machine implementing one ISA (the target) to reproduce the behavior of software compiled for another ISA (the source)
 Emulation can be carried out using:
 1. Interpretation
 2. Binary translation
[Figure: the guest's source ISA is emulated by the host's target ISA]

Basic Interpretation
 Interpretation involves a 4-step cycle (all in software):
 1. Fetching a source instruction
 2. Analyzing it
 3. Performing the required operation
 4. Fetching the next source instruction
[Figure: interpreter code alongside the source memory state (code, data, stack) and a source context block holding the program counter, condition codes, and registers Reg 0 through Reg n-1]

Decode-And-Dispatch
 A simple interpreter, referred to as decode-and-dispatch, operates by stepping through the source program (instruction by instruction), reading and modifying the source state
 Decode-and-dispatch is structured around a central loop that decodes an instruction and then dispatches it to an interpretation routine
 It uses a switch statement to call a number of routines that emulate individual instructions
[Figure: in native execution, source code runs directly; in decode-and-dispatch interpretation, a dispatch loop invokes interpreter routines for the source code]
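The central loop can be sketched as follows. This is a minimal illustration using a hypothetical three-opcode toy ISA (not the slides' PowerPC example); each instruction is a tuple of opcode and register numbers:

```python
# Decode-and-dispatch interpreter for a hypothetical toy ISA.
# Each instruction is a tuple: (opcode, dest_reg, src_reg).

def interp_add(state, inst):
    _, d, s = inst
    state["regs"][d] += state["regs"][s]

def interp_sub(state, inst):
    _, d, s = inst
    state["regs"][d] -= state["regs"][s]

def run(program, regs):
    state = {"regs": regs, "pc": 0}
    while state["pc"] < len(program):   # central dispatch loop
        inst = program[state["pc"]]
        op = inst[0]                    # decode the opcode field
        if op == "add":                 # switch-like dispatch to a routine
            interp_add(state, inst)
        elif op == "sub":
            interp_sub(state, inst)
        elif op == "halt":
            break
        state["pc"] += 1                # fetch the next source instruction
    return state["regs"]
```

For example, `run([("add", 0, 1), ("halt",)], [1, 2])` leaves register 0 holding 3. Note how every iteration repeats the full decode, illustrating why the branches in this loop dominate the cost.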

Decode-And-Dispatch - Drawbacks
 The central dispatch loop of a decode-and-dispatch interpreter contains a number of branch instructions:
 An indirect branch for the switch statement
 A branch to the interpreter routine
 A second register-indirect branch to return from the interpreter routine
 A branch that terminates the loop
 These branches tend to degrade performance

Indirect Threaded Interpretation
 To avoid some of these branches, a portion of the dispatch code can be appended (threaded) to the end of each interpreter routine
 To locate the interpreter routines, a dispatch table and a jump instruction are used when stepping through the source program
 This scheme is referred to as indirect threaded interpretation
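A sketch of the same toy ISA with threaded dispatch: there is no central loop, and each routine ends by looking up the next instruction's routine in a dispatch table and invoking it (a Python call stands in for the register-indirect jump, so very long programs would hit the recursion limit; this is only an illustration):

```python
# Indirect threaded interpretation sketch for a hypothetical toy ISA:
# dispatch code is appended to the end of every interpreter routine.

def make_interpreter():
    state = {"regs": [], "pc": 0, "prog": []}

    def dispatch():
        # the threaded dispatch code: table lookup + jump to next routine
        if state["pc"] < len(state["prog"]):
            inst = state["prog"][state["pc"]]
            table[inst[0]](inst)

    def op_add(inst):
        _, d, s = inst
        state["regs"][d] += state["regs"][s]
        state["pc"] += 1
        dispatch()                      # tail-dispatch appended to routine

    def op_halt(inst):
        pass                            # no further dispatch: execution ends

    table = {"add": op_add, "halt": op_halt}

    def run(prog, regs):
        state["prog"], state["regs"], state["pc"] = prog, regs, 0
        dispatch()
        return state["regs"]

    return run
```

The central loop's branches are gone, but every step still pays for the dispatch-table lookup, which motivates the next refinement.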

Indirect Threaded Interpretation - Drawbacks
 The dispatch table causes overhead every time it is looked up: it requires a memory access and a register-indirect branch
 An interpreter routine is invoked every time the same instruction is encountered
 Thus, the work of examining the instruction and extracting its various fields is repeated every time

Predecoding (1)
 It would be more efficient to perform a repeated operation only once
 We can save the extracted information about an instruction in an intermediate form
 The intermediate form can then simply be reused whenever the instruction is re-encountered for emulation
 However, a Target Program Counter (TPC) is needed to step through the intermediate code
PowerPC source code:
 lwz r1, 8(r2)  // load word and zero
 add r3, r3, r1 // r3 = r3 + r1
 stw r3, 0(r4)  // store word
[Figure: the same PowerPC program in predecoded intermediate form: (load word and zero), (add), (store word)]

Predecoding (2)
 To avoid a memory lookup every time the dispatch table is accessed, the opcode in the intermediate form can be replaced with the address of the interpreter routine
 This leads to a scheme referred to as direct threaded interpretation
[Figure: each predecoded entry (load word and zero, add, store word) now holds the address of its interpreter routine instead of an opcode]
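Predecoding plus direct threading can be sketched as below, again on the hypothetical toy ISA. A Python function reference stands in for the interpreter routine's address; the predecode pass runs once, and execution then needs neither decoding nor a table lookup:

```python
# Direct threaded interpretation sketch: the intermediate form stores a
# direct reference to each instruction's interpreter routine.

def op_add(regs, d, s):
    regs[d] += regs[s]

def op_sub(regs, d, s):
    regs[d] -= regs[s]

ROUTINES = {"add": op_add, "sub": op_sub}

def predecode(source):
    # done once per program: fields are extracted and the opcode is
    # replaced by the routine reference (its "address")
    return [(ROUTINES[op], d, s) for (op, d, s) in source]

def run(intermediate, regs):
    for routine, d, s in intermediate:  # TPC steps through intermediate code
        routine(regs, d, s)             # no decode, no table lookup
    return regs
```

For example, predecoding `[("add", 0, 1), ("sub", 0, 1)]` and running it on registers `[5, 3]` adds then subtracts register 1 from register 0.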

Direct Threaded Interpretation
[Figure: in indirect threaded interpretation, the source code drives the interpreter routines directly; in direct threaded interpretation, a predecoder first produces intermediate code, which then drives the interpreter routines]

Direct Threaded Interpretation - Drawbacks
 Direct threaded interpretation still suffers from major drawbacks:
 1. It limits portability because the intermediate form depends on the exact locations of the interpreter routines
 2. The size of the predecoded memory image is proportional to that of the original source memory image
 3. All source instructions of the same type are still emulated with the same interpretation routine

Binary Translation
 Performance can be significantly enhanced by mapping each individual source binary instruction to its own customized target code
 This process of converting the source binary program into a target binary program is referred to as binary translation
 Binary translation attempts to amortize fetch and analysis costs by:
 1. Translating a block of source instructions to a block of target instructions
 2. Caching the translated code for repeated use
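The two ideas above can be sketched together: a block of toy-ISA source instructions is translated once into customized target code (here, a dynamically compiled Python function stands in for a block of target binary instructions) and the result is cached for reuse. This is an illustration of the principle, not how a real translator emits machine code:

```python
# Binary translation sketch: each source instruction gets its own
# customized target code; translated blocks are cached for repeated use.

CODE_CACHE = {}

def translate_block(block_id, source_block):
    if block_id in CODE_CACHE:
        return CODE_CACHE[block_id]      # reuse: fetch/analysis already paid
    lines = ["def block(regs):"]
    for op, d, s in source_block:        # customized code per instruction
        sign = "+" if op == "add" else "-"
        lines.append(f"    regs[{d}] {sign}= regs[{s}]")
    lines.append("    return regs")
    namespace = {}
    exec("\n".join(lines), namespace)    # "compile" the target block once
    CODE_CACHE[block_id] = namespace["block"]
    return CODE_CACHE[block_id]
```

Executing a cached block, e.g. `translate_block(0, [("add", 0, 1), ("add", 0, 1)])([1, 2])`, runs straight-line code with no per-instruction decode at all.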

Binary Translation
[Figure: in direct threaded interpretation, a predecoder produces intermediate code for the interpreter routines; in binary translation, a binary translator produces binary translated target code directly from the source code]

Static Binary Translation
 It is possible to binary translate a program in its entirety before executing it
 This approach is referred to as static binary translation
 However, in real code using conventional ISAs, especially CISC ISAs, such a static approach can cause problems due to:
 Variable-length instructions
 Register-indirect jumps
 Data interspersed with instructions
 Pads to align instructions
[Figure: an instruction stream containing a register-indirect jump to an unknown target, data embedded in the instruction stream, and pad bytes inserted for alignment]

Dynamic Binary Translation
 A general solution is to translate the binary while the program is operating on actual input data (i.e., dynamically), interpreting and translating new sections of code incrementally as the program reaches them
 This scheme is referred to as dynamic binary translation
[Figure: an emulation manager consults a map table from Source Program Counter (SPC) to Target Program Counter (TPC); on a hit it jumps into the code cache, and on a miss it invokes the interpreter and translator]

Dynamic Binary Translation
[Flowchart: start with an SPC; look up the SPC-to-TPC mapping in the map table; on a hit, branch to the TPC and execute the translated block; on a miss, use the SPC to read instructions from the source memory image, then interpret, translate, and place the block into the code cache, writing the new SPC-to-TPC mapping into the map table; finally, get the SPC for the next block and repeat]
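The emulation-manager flow above can be sketched as a loop over a map table. Block boundaries and the toy translator are illustrative assumptions (a real manager discovers blocks by scanning the source memory image):

```python
# Emulation-manager sketch: look up SPC -> translated code; translate and
# cache on a miss; execute the translated block and continue at the next SPC.

def toy_translate(block):
    # stand-in translator: a block is (instructions, next_spc)
    insts, next_spc = block
    def code(regs):
        for op, d, s in insts:
            regs[d] += regs[s]          # toy ISA: only "add"
        return regs, next_spc
    return code

def emulate(blocks, start_spc, regs, translate):
    map_table = {}                      # SPC -> translated code (the "TPC")
    spc = start_spc
    while spc is not None:
        if spc not in map_table:        # miss: translate, place in cache
            map_table[spc] = translate(blocks[spc])
        translated = map_table[spc]     # hit: branch to translated block
        regs, spc = translated(regs)    # execute; get SPC of next block
    return regs
```

Re-entering a block already in `map_table` skips translation entirely, which is where dynamic translation recovers its cost.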

CPU Virtualization
 Interpretation and Binary Translation
 Virtualizable ISAs

Privilege Rings in a System
 In the ISA, special privileges to system resources are granted by defining modes of operation
 Usually an ISA specifies at least two modes of operation:
 1. System (also called supervisor, kernel, or privileged) mode: all resources are accessible to software
 2. User mode: only certain resources are accessible to software
[Figure: privilege rings, with the kernel at Level 0 and user-level applications at Level 3; simple systems have 2 rings, while Intel's IA-32 allows 4 rings]

Conditions for ISA Virtualizability
 In a native system VM, the VMM runs in system mode, and all other software runs in user mode
 A privileged instruction is defined as one that traps if the machine is in user mode and does not trap if the machine is in system mode
 Examples of privileged instructions:
 Load PSW: if it could be executed in user mode, a malicious user program could put itself in system mode and get control of the system
 Set CPU Timer: if it could be executed in user mode, a malicious user program could change the amount of time allocated to it before being context switched

Types of Instructions
 Instructions that interact with hardware can be classified into three categories:
 1. Control-sensitive: instructions that attempt to change the configuration of resources in the system (e.g., the memory assigned to a program)
 2. Behavior-sensitive: instructions whose behavior or results depend on the configuration of resources
 3. Innocuous: instructions that are neither control-sensitive nor behavior-sensitive

Virtualization Theorem
 Virtualization Theorem: for any conventional third-generation computer, a VMM may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions
[Figure: when some sensitive instructions are nonprivileged (critical instructions), the theorem is not satisfied; when every sensitive instruction is privileged, it is]
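The theorem's condition is a simple set check, sketched below. The instruction names are illustrative stand-ins, not a real ISA inventory:

```python
# Virtualization-theorem check sketch: an ISA satisfies the condition when
# every sensitive instruction is also privileged. Any sensitive-but-not-
# privileged instruction is "critical" and breaks the condition.

def virtualizable(sensitive, privileged):
    critical = sensitive - privileged   # sensitive but not privileged
    return len(critical) == 0, critical
```

For instance, an ISA whose sensitive instructions all trap in user mode passes the check, while an IA-32-style ISA with a non-trapping sensitive instruction (POPF, discussed below) fails it.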

Efficient VM Implementation
 An OS running in a guest VM should not be allowed to change hardware resources (e.g., by executing Load PSW or Set CPU Timer)
 Therefore, guest OSs are all forced to run in user mode
 An efficient VM implementation can be constructed if instructions that could interfere with the correct or efficient functioning of the VMM always trap in user mode

Trapping To VMM
[Figure: when an instruction traps, a dispatcher in the VMM routes it either to interpreter routines, for privileged instructions that do not change machine resources but access privileged ones (e.g., IN, OUT, Write TLB), or to the allocator, for privileged instructions that seek to change machine resources (e.g., loading the relocation-bounds register)]

Handling Privileged Instructions
[Figure: guest OS code in the VM runs in user mode; executing a privileged instruction (e.g., LPSW) traps to the VMM's dispatcher, which runs in privileged mode. The LPSW routine checks the privilege level inside the VM, emulates the instruction, computes the target, restores user mode, and jumps to the target, i.e., the next instruction after LPSW]

Critical Instructions
 Critical instructions are sensitive but not privileged: they do not generate traps in user mode
 Intel IA-32 has several critical instructions
 An example is POPF in IA-32 (Pop Stack into Flags Register), which pops the flag registers from a stack held in memory
 One of the flags is the interrupt-enable flag, which can be modified only in privileged mode
 In user mode, POPF can overwrite all flags except the interrupt-enable flag (for this flag it acts as a no-op)
 Can an efficient VMM be constructed in the presence of critical instructions?

Handling Critical Instructions
 Critical instructions are problematic: they inhibit the creation of an efficient VMM
 However, if an ISA is not efficiently virtualizable, this does not mean we cannot create a VMM
 The VMM can scan the guest code before execution, discover all critical instructions, and replace them with traps (system calls) to the VMM
 This replacement process is known as patching
 Even if an ISA contains only ONE critical instruction, patching will be required
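The scan-and-replace step can be sketched as below. The guest code is modeled as a simple list of mnemonics, with "POPF" standing in for an IA-32 critical instruction; a real patcher works on binary instruction streams:

```python
# Patching sketch: scan guest code before execution and replace each
# critical instruction with a trap to the VMM, which emulates it later.

CRITICAL = {"POPF"}                       # illustrative critical-instruction set

def patch(guest_code):
    patched = []
    for inst in guest_code:
        if inst in CRITICAL:
            # replacement: a trap carrying the original instruction so the
            # VMM's emulation routine knows what to emulate
            patched.append(("TRAP_TO_VMM", inst))
        else:
            patched.append(inst)          # innocuous code is left untouched
    return patched
```

After patching, every critical instruction transfers control to the VMM just as a privileged instruction would, restoring the trap behavior the hardware fails to provide.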

Patching of Critical Instructions
[Figure: a scanner and patcher transforms the original code into patched code, replacing each discovered critical instruction with a code patch that traps to the VMM]

Code Caching
 Some of the critical instructions that trap to the VMM might require interpretation
 Interpretation overhead might slow down the VMM, especially if the frequency of critical instructions requiring interpretation increases
 To reduce this overhead, interpreted instructions can be cached, using a strategy known as code caching
 Code caching is done on a block of instructions surrounding the critical instruction (larger blocks lend themselves better to optimization)

Caching Interpreted Code
[Figure: control transfers (e.g., traps) from blocks of the patched program enter the VMM, which uses a translation table to reach specialized emulation routines held in a code cache; one code section is emulated in the code cache, and two critical instructions are combined into a single block]

Resource Virtualization
 CPU Virtualization
 Memory Virtualization
 I/O Virtualization

 Virtual memory makes a distinction between the logical view of memory as seen by a program and the actual hardware memory as managed by the OS
 The virtual memory support in traditional OSs is sufficient for providing guest OSs with the view of having (and managing) their own real memories
 Such an illusion is created by the underlying VMM
[Figure: in a real machine, a virtual memory address (seen by a program running on the OS) maps to a physical memory address; in a virtual machine, a virtual memory address (seen by a program running on the guest OS) maps to a real memory address, which the VMM in turn maps to a physical memory address]

An Example
[Figure: programs 1 and 2 run on VM1, and program 3 runs on VM2. Each program's page table maps its virtual pages to real pages of its VM, with some pages not mapped; the VMM's real map table for each VM then maps those real pages to physical pages of the system, again with some real pages not mapped]
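The two-level mapping in the example can be sketched as two chained table lookups. The page numbers used in the test are illustrative, not the ones from the slide's figure:

```python
# Two-level address translation sketch: the guest's page table maps
# virtual pages to "real" pages of the VM, and the VMM's real map table
# maps real pages to physical pages of the system.

def translate_addr(vpage, guest_page_table, vmm_real_map):
    rpage = guest_page_table.get(vpage)   # guest OS mapping (level 1)
    if rpage is None:
        return None                       # not mapped: guest page fault
    return vmm_real_map.get(rpage)        # VMM mapping (level 2);
                                          # None means the VMM must page in
```

A miss at either level surfaces as `None`: the first is handled by the guest OS as an ordinary page fault, the second transparently by the VMM.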

Resource Virtualization
 CPU Virtualization
 Memory Virtualization
 I/O Virtualization

I/O Virtualization
 The virtualization strategy for a given I/O device type consists of:
 1. Constructing a virtual version of the device
 2. Virtualizing the I/O activities directed to the device
 A virtual device given to a guest VM is typically (but not necessarily) supported by a similar underlying physical device
 When a guest VM makes a request to use the virtual device, the request is intercepted by the VMM
 The VMM converts the request to the equivalent request understood by the underlying physical device and sends it out

Virtualizing Devices
 The technique used to virtualize an I/O device depends on whether the device is shared and, if so, the ways in which it can be shared
 The common categories of devices are:
 Dedicated devices
 Partitioned devices
 Shared devices
 Spooled devices

Dedicated Devices
 Some I/O devices must be dedicated to a particular guest VM, or at least switched from one guest to another on a very long time scale
 Examples of dedicated devices: the display, mouse, and speakers of a VM user
 A dedicated device does not necessarily have to be virtualized
 Requests to and from a dedicated device in a VM could theoretically bypass the VMM
 In practice, however, these requests go through the VMM because the guest OS runs in non-privileged user mode

Partitioned Devices
 For some devices it is convenient to partition the available resources among VMs
 For example, a disk can be partitioned into several smaller virtual disks that are then made available to VMs as dedicated devices
 A location on a magnetic disk is defined in terms of cylinders, heads, and sectors (CHS)
 The physical properties of the disk are virtualized by the disk firmware
 The disk firmware transforms the CHS addresses into consecutively numbered logical blocks for use by host and guest OSs
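The firmware's CHS-to-LBA flattening follows the conventional formula, sketched below; the geometry values in the examples (heads per cylinder, sectors per track) are illustrative:

```python
# CHS -> LBA sketch: flatten a (cylinder, head, sector) address into a
# consecutively numbered logical block, as disk firmware does.

def chs_to_lba(c, h, s, heads_per_cyl, sectors_per_track):
    # sectors are conventionally numbered from 1, hence the (s - 1)
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)
```

With a geometry of 16 heads and 63 sectors per track, the first sector of cylinder 1 lands at logical block 1008 (16 * 63); the guest and host OSs only ever see these consecutive block numbers.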

Disk Virtualization
 To emulate an I/O request for a virtual disk:
 The VMM uses a map to translate the virtual parameters into real parameters
 The VMM then reissues the request to the disk controller
[Figure: guest OSs in VM1 and VM2 issue requests against logical block addresses (LBAs); the VMM maps these to real block addresses, and the host OS's CHS-to-LBA mapping reaches the physical disk drive (CHS)]

Shared Devices
 Some devices, such as a network adapter, can be shared among a number of guest VMs at a fine time granularity
 For example, every VM can have its own virtual network address, maintained by the VMM
 A request by a VM to use the network is translated by the VMM into a request on a physical network port
 To make this happen, the VMM uses its own physical network address and a virtual device driver
 Similarly, incoming requests through various ports are translated into requests for the virtual network addresses associated with different VMs

Network Virtualization - Scenario I
 In this example, we assume that the virtual network interface card (NIC) is of the same type as the physical NIC in the host system
 User on VM1: the user sends a message to an external machine (e.g., using send())
 OS on VM1: the OS converts the message into I/O instructions for the virtual NIC (e.g., OUTS 0xf0, …)
 VMM: the VMM sends the packet on a virtual bridge to the device driver of the physical NIC (e.g., OUTS 0x280, …)
 Device driver: the NIC device driver launches the packet on the network using wire signals

Network Virtualization - Scenario II
 In this scenario, we assume that the desired communication is between two virtual machines on the same platform
 User on VM1: the user sends a message to a local virtual machine (e.g., using send())
 OS on VM1: the OS converts the message into I/O instructions (e.g., OUTS 0xf0, …)
 VMM: the VMM sends the packet on a virtual bridge to the device driver of the physical NIC (e.g., OUTS 0x280, …); the NIC device driver converts the send message into a receive message for the receiving VM, and the VMM raises an interrupt in the receiver's OS
 OS on VM2: the interrupt handler in the OS generates I/O instructions to receive the packet
 User on VM2: the receiver gets the packet

Spooled Devices
 A spooled device, such as a printer, is shared, but at a much coarser granularity than a device such as a network adapter
 Virtualization of spooled devices can be performed using a two-level spool table approach:
 Level 1 is within the guest OS, with one table for each active process
 Level 2 is within the VMM, with one table for each guest OS
 A request from a guest OS to print a spool buffer is intercepted by the VMM, which copies the buffer into one of its own spool buffers
 This allows the VMM to schedule requests from different guest OSs on the same printer
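The VMM's side of the two-level scheme can be sketched as below. The class and method names are hypothetical, and real spooling adds ordering and completion notification; this only shows the intercept-copy-schedule structure:

```python
# Two-level spool sketch (VMM side): one spool table per guest OS
# (level 2); an intercepted print buffer is copied into the VMM's own
# spool so requests from different guests can share one printer.

class VMMSpooler:
    def __init__(self):
        self.tables = {}                 # level 2: one table per guest OS

    def intercept_print(self, guest_id, buffer):
        # copy the guest's buffer, so the guest can immediately reuse
        # its own level-1 spool buffer
        self.tables.setdefault(guest_id, []).append(bytes(buffer))

    def schedule(self):
        # drain all guests' tables into one job list for the real printer
        jobs = []
        for guest_id, bufs in self.tables.items():
            while bufs:
                jobs.append((guest_id, bufs.pop(0)))
        return jobs
```

Because the VMM holds its own copy of every buffer, it is free to interleave jobs from different guest OSs without any guest noticing the sharing.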

Thank You!