Lecture 12: Virtualization
COSC6376 Cloud Computing. Instructor: Weidong Shi (Larry), PhD, Computer Science Department, University of Houston
Outline Today Virtualization
Intel virtualization technology evolution 4/13/2018 8:55 AM
Vector 1: Processor Focus. VT-x and VT-i establish the foundation for virtualization in the IA-32 and Itanium architectures, followed by ongoing evolution of support: micro-architectural (e.g., lower VM switch times) and architectural (e.g., Extended Page Tables).
Vector 2: Platform Focus. VT-d provides hardware support for I/O-device virtualization: device DMA remapping, direct assignment of I/O devices to VMs, and interrupt routing and remapping.
Vector 3: I/O Focus. PCI-SIG standards for I/O-device sharing: multi-context I/O devices and endpoint address translation caching (under definition in the PCI-SIG* IOVWG).
VMM software evolution over time with hardware support: in the past (no hardware support), software-only VMMs relied on binary translation and paravirtualization; today, simpler and more secure VMMs are built on the foundation of virtualizable ISAs; and CPU and I/O virtualization performance and functionality keep improving as I/O devices and VMMs exploit the infrastructure provided by VT-x, VT-i, and VT-d.
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
VT-x Overview: Intel Virtualization Technology for IA-32 Processors
VT-x overview: operating modes; guest SW/VMM transitions; virtual-machine control structure; principal causes of VM exits; benefits
Operating modes
VMX root operation: fully privileged, intended for the VM monitor.
VMX non-root operation: not fully privileged, intended for guest software. Reduces guest SW privilege without relying on rings, which solves the ring-aliasing problem.
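VM entry and VM exit
VM entry: transition from the VMM to the guest (VMLAUNCH or VMRESUME). Enters VMX non-root operation; loads guest state and exit criteria from the VMCS.
VM exit: transition from the guest to the VMM, caused by certain guest instructions (some unconditionally, e.g., CPUID) or by events selected in the VMCS controls; VT-x has no VMEXIT instruction, although a guest can request an exit explicitly with VMCALL. Enters VMX root operation; saves guest state in the VMCS; loads VMM state from the VMCS.
[Figure: VM0, VM1, ... each run apps on a guest OS (Guest OS0, Guest OS1) above the VM Monitor on the physical host hardware; VM entry and VM exit connect the guests and the monitor.]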
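The entry/exit cycle above can be sketched as a VMM run loop. This is only a user-space C model under stated assumptions: `simulate_vm_entry()` is a hypothetical stand-in for VMLAUNCH/VMRESUME that returns a scripted exit reason, and the handlers merely count exits. The basic exit-reason numbers (10 = CPUID, 12 = HLT, 30 = I/O instruction) follow the Intel SDM's exit-reason table; check them against the SDM before reusing them.

```c
#include <assert.h>
#include <stdio.h>

/* A few basic VM-exit reasons (Intel SDM, basic exit reasons table). */
enum exit_reason {
    EXIT_REASON_CPUID = 10,
    EXIT_REASON_HLT   = 12,
    EXIT_REASON_IO    = 30,
};

/* Hypothetical stand-in for VMLAUNCH/VMRESUME: "runs" the guest until its
 * next VM exit and returns the exit reason from a scripted sequence. */
static const int scripted_exits[] = { EXIT_REASON_CPUID, EXIT_REASON_IO,
                                      EXIT_REASON_HLT };
static int exit_index = 0;

static int simulate_vm_entry(void) {
    return scripted_exits[exit_index++];
}

static int cpuid_exits, io_exits;

/* VMM run loop: enter the guest, handle the VM exit, re-enter.
 * Stops when the guest executes HLT (treated here as "guest idle"). */
static void vmm_run_loop(void) {
    for (;;) {
        int reason = simulate_vm_entry();   /* VM entry ... until VM exit */
        switch (reason) {
        case EXIT_REASON_CPUID:
            cpuid_exits++;                  /* emulate CPUID, advance guest RIP */
            break;
        case EXIT_REASON_IO:
            io_exits++;                     /* emulate the port access */
            break;
        case EXIT_REASON_HLT:
            printf("guest halted after %d exits\n", cpuid_exits + io_exits + 1);
            return;                         /* a real VMM would schedule another VM */
        default:
            assert(!"unhandled exit reason");
            return;
        }
    }
}
```

Each `break` corresponds to a VM exit handled in VMX root operation followed by another VM entry; a real VMM would use VMREAD to fetch the exit reason and qualification from the VMCS instead of a scripted array.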
VT-x operations
[Figure: VMXON moves the processor from ordinary IA-32 operation into VMX root operation, where the VMM runs (rings 0 to 3). VM 1 to VM n each run in VMX non-root operation with their own rings 0 to 3 and their own VMCS (VMCS 1 to VMCS n). VMLAUNCH and VMRESUME perform VM entry; a VM exit returns control to VMX root operation.]
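Virtual machine control structure (VMCS)
VMCSs are control structures in memory. At any given time, at most one VMCS is current on a logical processor (a VMM typically keeps one VMCS per virtual processor).
VMCS payload: VM-execution, VM-exit, and VM-entry controls; guest and host state; VM-exit information fields.
The VMCS format is not architecturally defined and may vary between implementations, so software accesses it only through dedicated instructions:
VMPTRLD: establishes a pointer to the desired VMCS, making it current.
VMREAD/VMWRITE: new VMCS access instructions.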
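A minimal model of the VMCS bookkeeping described above, assuming simplified semantics: a logical processor has at most one current VMCS (set by VMPTRLD), VMLAUNCH is valid only on a current VMCS in the "clear" launch state (set by VMCLEAR), and VMRESUME is valid only after that VMCS has been launched. The `model_*` functions and structs are hypothetical; a real VMCLEAR also flushes processor-cached VMCS data to memory, which is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model of VMCS launch state; no real VMX instructions. */
enum launch_state { LAUNCH_CLEAR, LAUNCHED };

struct vmcs_model {
    enum launch_state launch;
};

struct lcpu_model {                 /* one logical processor */
    struct vmcs_model *current;     /* at most one current VMCS */
};

/* VMCLEAR: set launch state to clear; the VMCS stops being current. */
static void model_vmclear(struct lcpu_model *cpu, struct vmcs_model *v) {
    v->launch = LAUNCH_CLEAR;
    if (cpu->current == v)
        cpu->current = NULL;
}

/* VMPTRLD: make v the current VMCS of this logical processor. */
static void model_vmptrld(struct lcpu_model *cpu, struct vmcs_model *v) {
    assert(v != NULL);
    cpu->current = v;
}

/* VMLAUNCH succeeds only for a current VMCS in the clear state. */
static bool model_vmlaunch(struct lcpu_model *cpu) {
    if (cpu->current == NULL || cpu->current->launch != LAUNCH_CLEAR)
        return false;               /* real hardware: VMfail */
    cpu->current->launch = LAUNCHED;
    return true;
}

/* VMRESUME succeeds only for a current VMCS that was already launched. */
static bool model_vmresume(struct lcpu_model *cpu) {
    return cpu->current != NULL && cpu->current->launch == LAUNCHED;
}
```

This is why a VMM uses VMLAUNCH for the first entry into a freshly cleared VMCS and VMRESUME for every later entry.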
Principal causes of VM exits
Paging-state exits allow page-table control: CR3 accesses and INVLPG cause exits; exits on page faults are selective; CR0/CR4 controls allow exiting on changes to selected bits.
State-based exits allow function virtualization: CPUID, RDMSR, WRMSR, RDPMC, RDTSC, MOV DRx.
Selective exception and I/O exiting reduce unnecessary exits: 32-entry exception bitmap, I/O-port access bitmap.
Controls are provided for asynchronous events: host interrupt control allows delivery to the VMM even when the guest is blocking interrupts.
Detection of guest inactivity supports VM scheduling: HLT, MWAIT, PAUSE.
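The "selective exiting" bitmaps can be modeled directly. In this sketch, bit n of the 32-bit exception bitmap decides whether exception vector n exits, and two 4 KB I/O bitmaps (A for ports 0x0000-0x7FFF, B for 0x8000-0xFFFF) hold one bit per port, assuming the "use I/O bitmaps" control is enabled. Function names are hypothetical; page-fault filtering by error-code mask/match and multi-byte accesses that span several ports are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exception bitmap: bit n set => guest exception with vector n causes a VM exit.
 * (Page faults, vector 14, are further filtered by PFEC mask/match; not modeled.) */
static bool exception_causes_exit(uint32_t exception_bitmap, unsigned vector) {
    assert(vector < 32);
    return (exception_bitmap >> vector) & 1u;
}

/* I/O bitmap A covers ports 0x0000-0x7FFF, bitmap B covers 0x8000-0xFFFF;
 * each is 4 KB = 32768 bits, one bit per port. */
static uint8_t io_bitmap_a[4096], io_bitmap_b[4096];

static uint8_t *bitmap_for(uint16_t port) {
    return (port < 0x8000) ? io_bitmap_a : io_bitmap_b;
}

/* VMM side: request a VM exit on accesses to this port. */
static void intercept_port(uint16_t port) {
    unsigned bit = port & 0x7FFF;               /* index within A or B */
    bitmap_for(port)[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Does a 1-byte IN/OUT to this port cause a VM exit? */
static bool port_access_causes_exit(uint16_t port) {
    unsigned bit = port & 0x7FFF;
    return (bitmap_for(port)[bit / 8] >> (bit % 8)) & 1u;
}
```

Intercepting only the ports a VMM actually emulates (for example, 0xCF8 for PCI configuration) lets every other port access run without an exit.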
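Running in a Virtual Machine
Mechanisms to determine if software is running in a VMware virtual machine: check the CPUID "hypervisor present" bit (leaf 1, ECX bit 31), then read the hypervisor vendor signature from leaf 0x40000000 (x86 with GCC or Clang):
```c
#include <cpuid.h>   /* GCC/Clang __cpuid macro (x86 only) */
#include <stdio.h>
#include <string.h>

/* Returns 1 if running under VMware, 0 otherwise. */
int cpuid_check(void)
{
    unsigned int eax, ebx, ecx, edx;
    char hyper_vendor_id[13];

    /* Leaf 1: ECX bit 31 is set by hypervisors ("hypervisor present"). */
    __cpuid(0x1, eax, ebx, ecx, edx);
    if (ecx & (1u << 31)) {
        /* Leaf 0x40000000: vendor signature in EBX, ECX, EDX. */
        __cpuid(0x40000000, eax, ebx, ecx, edx);
        memcpy(hyper_vendor_id + 0, &ebx, 4);
        memcpy(hyper_vendor_id + 4, &ecx, 4);
        memcpy(hyper_vendor_id + 8, &edx, 4);
        hyper_vendor_id[12] = '\0';
        if (!strcmp(hyper_vendor_id, "VMwareVMware"))
            return 1; /* Success - running under VMware */
    }
    return 0;
}

int main(void)
{
    printf("running under VMware: %s\n", cpuid_check() ? "yes" : "no");
    return 0;
}
```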
Benefits: VT helps improve VMMs
VT reduces guest OS dependency: eliminates the need for binary patching/translation; facilitates support for legacy OSes.
VT improves robustness: eliminates the need for complex SW techniques; simpler and smaller VMMs; smaller trusted computing base.
VT improves performance: fewer unwanted guest-VMM transitions.
Extended page tables (EPT)
A VMM must protect host physical memory, because multiple guest operating systems share the same host physical memory. The VMM typically implements protections through "page-table shadowing" in software, and page-table shadowing accounts for a large portion of virtualization overheads. The goal of EPT is to reduce these overheads.
What is EPT?
[Figure: guest linear address → guest IA-32 page tables (rooted at CR3) → guest physical address → extended page tables (rooted at the EPT base pointer, EPTP) → host physical address.]
Extended Page Table: a new page-table structure under the control of the VMM. It defines the mapping between guest-physical and host-physical addresses. The EPT base pointer (a new VMCS field) points to the EPT page tables. The guest has full control over its own IA-32 page tables.
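To illustrate the guest-physical to host-physical step, here is a toy 4-level EPT walk in C. Assumptions: "host memory" is an 8-page array, the EPT pointer is just the host-physical address of the top-level table (real EPTP values also carry memory-type and walk-length bits), an entry counts as present when any of its read/write/execute bits (2:0) is set, and large pages are ignored. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define EPT_RWX    0x7ULL                     /* bits 2:0: read/write/execute */
#define ADDR_MASK  0x000FFFFFFFFFF000ULL      /* bits 51:12: next table or page */
#define NPAGES     8                          /* toy host memory: 8 pages */

/* Page i lives at host-physical address i << 12 and is viewed as
 * an array of 512 64-bit EPT entries. */
static uint64_t host_mem[NPAGES][512];

static uint64_t *table_at(uint64_t hpa) {
    uint64_t frame = hpa >> PAGE_SHIFT;
    assert(frame < NPAGES);
    return host_mem[frame];
}

/* Walk a 4-level EPT rooted at eptp_root. Returns false where real
 * hardware would take an EPT-violation VM exit (entry not present). */
static bool ept_translate(uint64_t eptp_root, uint64_t gpa, uint64_t *hpa) {
    uint64_t table = eptp_root;
    for (int level = 4; level >= 1; level--) {
        unsigned shift = PAGE_SHIFT + 9 * (unsigned)(level - 1); /* 39, 30, 21, 12 */
        unsigned index = (unsigned)((gpa >> shift) & 0x1FF);
        uint64_t entry = table_at(table)[index];
        if ((entry & EPT_RWX) == 0)
            return false;
        table = entry & ADDR_MASK;            /* next table, or final page frame */
    }
    *hpa = table | (gpa & 0xFFF);             /* append the page offset */
    return true;
}

/* Helper that installs a present (RWX) entry pointing at target_hpa. */
static void set_entry(uint64_t table_hpa, unsigned index, uint64_t target_hpa) {
    table_at(table_hpa)[index] = (target_hpa & ADDR_MASK) | EPT_RWX;
}
```

The guest never sees these tables: it keeps editing its own IA-32 page tables, while the VMM controls where each guest-physical page really lives.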
EPT translation: details
All guest-physical memory addresses go through the EPT tables, including the guest-physical addresses held in CR3, PDEs, PTEs, etc. The slide's example (figure not reproduced) is a 2-level table for a 32-bit address space; translation is also possible for other page-table formats (e.g., PAE).
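A rough worked cost of this two-dimensional walk, assuming no TLB or paging-structure-cache hits: with an n-level guest page table and an m-level EPT, each of the n guest page-table entries sits at a guest-physical address that first needs an m-level EPT walk plus the entry read itself, and the final guest-physical address needs one more EPT walk:

$$
\text{memory references}(n, m) = n\,(m+1) + m = (n+1)(m+1) - 1
$$

Assuming a 4-level EPT (the slide does not state its depth), the 2-level 32-bit guest gives 2·5 + 4 = 14 references, and a 4-level 64-bit guest gives 4·5 + 4 = 24, which is why caching translations matters so much under EPT.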
Malware
Malware may escape from a guest OS and infect the VMM.
VT-d Overview: Intel Virtualization Technology for Directed I/O
Q35 chipset system block diagram
PCI Express
Third-generation high-performance I/O bus used to interconnect peripheral devices, built from point-to-point connections rather than a shared bus. A PCIe interconnect consists of an x1, x2, x4, x8, x12, x16, or x32 point-to-point link; an x16 link has 64 physical signal wires (16 lanes × 2 directions × 2 wires for differential signaling).
1st generation: ISA, EISA, VESA, and Micro Channel buses. 2nd generation: PCI, PCI-X, and AGP.
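The lane arithmetic above, plus one added data point, sketched in C. The wire count follows the slide (lanes × 2 directions × 2 wires per differential pair). The throughput function assumes PCIe 1.x signaling (2.5 GT/s per lane with 8b/10b encoding), which the slide does not state; later generations use different rates and encodings. Function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Link widths listed on the slide. */
static bool valid_link_width(unsigned lanes) {
    switch (lanes) {
    case 1: case 2: case 4: case 8: case 12: case 16: case 32:
        return true;
    default:
        return false;
    }
}

/* Each lane = one transmit pair + one receive pair; each pair = 2 wires. */
static unsigned signal_wires(unsigned lanes) {
    assert(valid_link_width(lanes));
    return lanes * 2 /* directions */ * 2 /* wires per differential pair */;
}

/* PCIe 1.x assumption: 2500 Mb/s raw per lane, 8 data bits per 10 line bits,
 * so 2500 * 8 / 10 / 8 = 250 MB/s per lane in each direction. */
static unsigned gen1_mbytes_per_sec_per_direction(unsigned lanes) {
    assert(valid_link_width(lanes));
    return lanes * 2500u * 8u / 10u / 8u;
}
```

Under these assumptions an x16 link has 64 signal wires and carries about 4000 MB/s in each direction.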
Rod Canion
Extended Industry Standard Architecture (EISA)
A bus standard for IBM PC-compatible computers, announced in September 1988 and managed by a consortium of PC clone vendors as a counter to IBM's use of its proprietary Micro Channel Architecture (MCA).
PCIe-based system topology
Root complex: denotes the root of the I/O hierarchy that connects the CPU/memory subsystem to I/O; may support one or more PCIe ports, as shown.
Endpoints: devices other than the root complex and switches that are requesters or completers of PCIe transactions.
Source: PCIe specification 2.0
Three IA-32 address spaces
Memory space (4 GB): accessed using a large variety of processor instructions (mov, add, or, shr, push, etc.) and virtual-to-physical address translation.
I/O space (64 KB): accessed only by using the processor's special in and out instructions (without any translation of port addresses).
PCI configuration space (16 MB): I/O ports 0x0CF8-0x0CFF are dedicated to accessing it.
PCIe supports the same address spaces as PCI (memory, I/O, and configuration), but provides 4 KB of configuration space per function instead of 256 bytes in PCI.
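The 16 MB figure and the role of ports 0x0CF8-0x0CFF can be made concrete. Under PCI configuration mechanism #1, software writes a 32-bit address (enable bit 31, bus in bits 23:16, device in 15:11, function in 10:8, dword-aligned register offset in 7:2) to port 0xCF8 and then accesses the data through 0xCFC-0xCFF. This sketch only builds that address and computes the space sizes; it performs no port I/O (which on Linux would need privileged `outl`/`inl` calls), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build the value written to port 0xCF8 to select one config dword. */
static uint32_t pci_cf8_address(unsigned bus, unsigned dev, unsigned func,
                                unsigned offset) {
    assert(bus < 256 && dev < 32 && func < 8 && offset < 256);
    return (1u << 31)                 /* enable bit */
         | ((uint32_t)bus << 16)      /* bits 23:16 */
         | ((uint32_t)dev << 11)      /* bits 15:11 */
         | ((uint32_t)func << 8)      /* bits 10:8 */
         | (offset & 0xFCu);          /* bits 7:2, dword-aligned */
}

/* Total configuration space: 256 buses x 32 devices x 8 functions x bytes. */
static unsigned long config_space_bytes(unsigned long bytes_per_function) {
    return 256UL * 32UL * 8UL * bytes_per_function;
}
```

With 256 bytes per function this gives 16,777,216 bytes (16 MB); with the PCIe 4 KB per function it gives 256 MB.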
PCI configuration header (type 0): 16 doublewords (DW); each DW lists bits 31 to 0 from left to right
DW 0: Device ID | Vendor ID
DW 1: Status Register | Command Register
DW 2: Class Code (Class/SubClass/ProgIF) | Revision ID
DW 3: BIST | Header Type | Latency Timer | Cache Line Size
DW 4-9: Base Address 0 to Base Address 5
DW 10: CardBus CIS Pointer
DW 11: Subsystem Device ID | Subsystem Vendor ID
DW 12: Expansion ROM Base Address
DW 13: reserved | Capabilities Pointer
DW 14: reserved
DW 15: Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line
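As an illustration of this layout (little-endian, byte offsets = 4 × DW index), the following C sketch extracts a few fields from a 64-byte type-0 header buffer. The function and struct names are hypothetical. Bit 7 of the Header Type byte (offset 0x0E) marks a multi-function device, and the 24-bit class code occupies bits 31:8 of DW 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Little-endian reads from a 64-byte configuration header. */
static uint16_t cfg_read16(const uint8_t hdr[64], unsigned off) {
    assert(off + 1 < 64);
    return (uint16_t)(hdr[off] | (hdr[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t hdr[64], unsigned off) {
    assert(off + 3 < 64);
    return (uint32_t)hdr[off] | ((uint32_t)hdr[off + 1] << 8) |
           ((uint32_t)hdr[off + 2] << 16) | ((uint32_t)hdr[off + 3] << 24);
}

struct pci_header_info {
    uint16_t vendor_id, device_id;
    uint32_t class_code;       /* class, subclass, prog-IF (24 bits) */
    uint8_t  header_type;      /* layout type, bits 6:0 */
    bool     multi_function;   /* bit 7 of the Header Type byte */
    uint32_t bar[6];
};

static struct pci_header_info parse_header(const uint8_t hdr[64]) {
    struct pci_header_info h;
    h.vendor_id      = cfg_read16(hdr, 0x00);        /* DW0 bits 15:0 */
    h.device_id      = cfg_read16(hdr, 0x02);        /* DW0 bits 31:16 */
    h.class_code     = cfg_read32(hdr, 0x08) >> 8;   /* DW2 bits 31:8 */
    h.header_type    = hdr[0x0E] & 0x7F;             /* DW3, third byte */
    h.multi_function = (hdr[0x0E] & 0x80) != 0;
    for (int i = 0; i < 6; i++)
        h.bar[i] = cfg_read32(hdr, 0x10 + 4 * i);    /* DW4-DW9 */
    return h;
}
```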
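Typical NIC
[Figure: the CPU and main memory (holding the packet buffer) are connected by the bus to the NIC, whose TX FIFO and RX FIFO sit between the bus and the transceiver attached to the LAN cable.]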
PCI devices and functions
A PCI device may include between 1 and 8 functions. Function numbers range from 0 to 7, and function 0 must always be present. Devices are classified as single-function or multi-function.
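These rules lead to the usual enumeration logic: probe function 0; if its vendor ID reads as 0xFFFF, there is no device; probe functions 1 to 7 only when bit 7 of function 0's Header Type is set, since some single-function devices do not decode the function number and would otherwise appear eight times. The toy table standing in for configuration reads, and all function names, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for configuration reads of one device's functions 0-7
 * (real code would read through ports 0xCF8/0xCFC). */
struct toy_function { uint16_t vendor_id; uint8_t header_type; };
static struct toy_function toy_dev[8];

static void toy_reset(void) {
    for (int f = 0; f < 8; f++)
        toy_dev[f] = (struct toy_function){ 0xFFFF, 0 };  /* 0xFFFF = absent */
}

static void toy_set(unsigned func, uint16_t vendor_id, uint8_t header_type) {
    assert(func < 8);
    toy_dev[func] = (struct toy_function){ vendor_id, header_type };
}

/* Returns a bitmask of the functions present (bit f set = function f exists). */
static unsigned enumerate_functions(void) {
    if (toy_dev[0].vendor_id == 0xFFFF)
        return 0;                           /* no function 0 => no device */
    unsigned present = 1u;                  /* function 0 */
    if (!(toy_dev[0].header_type & 0x80))
        return present;                     /* single-function: stop here */
    for (unsigned f = 1; f < 8; f++)        /* multi-function: may be sparse */
        if (toy_dev[f].vendor_id != 0xFFFF)
            present |= 1u << f;
    return present;
}
```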
DMA (Direct Memory Access)
DMA can transfer large blocks of data directly to/from memory without involving the processor. The processor initiates the transfer by supplying the source and destination addresses and the number of bytes to transfer. The DMA controller manages the entire transfer (possibly thousands of bytes long), arbitrating for the bus. When the transfer is complete, the DMA controller interrupts the processor to signal completion.
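A toy model of this DMA sequence: the "CPU" fills in a descriptor (source, destination, length), the "controller" performs the whole copy on a shared memory array, and completion is signaled through a callback standing in for the interrupt. The descriptor layout and function names are hypothetical. Note that nothing checks which memory the addresses belong to; that is the gap DMA remapping (VT-d, later in this lecture) closes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy physical memory shared by the "CPU" and the "DMA controller". */
static uint8_t phys_mem[4096];

/* What the processor programs into the controller (hypothetical layout). */
struct dma_descriptor {
    size_t src, dst;   /* physical addresses (offsets into phys_mem) */
    size_t len;        /* number of bytes to transfer */
};

typedef void (*irq_handler)(const struct dma_descriptor *);

/* The controller copies the whole block without the CPU, then raises a
 * completion "interrupt" by calling the handler. */
static void dma_controller_run(const struct dma_descriptor *d, irq_handler on_done) {
    assert(d->src + d->len <= sizeof phys_mem && d->dst + d->len <= sizeof phys_mem);
    memmove(&phys_mem[d->dst], &phys_mem[d->src], d->len);
    on_done(d);
}

static int completions;

static void dma_complete_irq(const struct dma_descriptor *d) {
    (void)d;
    completions++;     /* a driver would now reuse or free the buffer */
}
```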
Options for I/O virtualization
Monolithic model: device drivers and I/O services for shared devices run inside the hypervisor, beneath VM0 ... VMn (guest OS and apps). Pro: higher performance. Pro: I/O device sharing. Pro: VM migration. Con: larger hypervisor.
Service VM model: device drivers and I/O services for shared devices run in service VMs; guest VMs (VM0 ... VMn) reach devices through them. Pro: high security. Pro: I/O device sharing. Pro: VM migration. Con: lower performance.
Pass-through model: devices are assigned to VMs, and device drivers run inside the guest OS. Pro: highest performance. Pro: smaller hypervisor. Pro: device-assisted sharing. Con: migration challenges.
VT-d goal: support all models.
VT-d overview
VT-d is platform infrastructure for I/O virtualization: it defines an architecture for DMA remapping, is implemented as part of the platform core logic, and will be supported broadly in Intel server and client chipsets.
[Figure: the CPU connects over the system bus to the north bridge, which contains VT-d and attaches DRAM, integrated devices, and the PCIe* root ports; the south bridge attaches PCI, LPC, and legacy devices.]
How does VT-d work?
Each VM thinks its memory is 0-address based (guest physical address, GPA), but that memory is mapped to a different address in system memory (host physical address, HPA). VT-d performs the address mapping between GPA and HPA and catches any DMA attempt to cross a VM's memory boundary.
[Figure: VM0, VM1, and VM2, each with 0-based GPA ranges, mapped to different regions of system memory.]
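The GPA-to-HPA idea can be sketched with a deliberately simplified model in which each VM owns one contiguous host-physical region (base plus size). Real VT-d uses per-device page tables, as the hardware page-walk slide shows; the struct, the function name, and any region values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: one contiguous host-physical region per VM. */
struct vm_region {
    uint64_t hpa_base;   /* where guest-physical address 0 lives in host memory */
    uint64_t size;       /* bytes of guest-physical memory */
};

/* Translate a DMA target (GPA, length) for the VM that owns the device.
 * Returns false (a DMA fault) if any byte would fall outside the VM. */
static bool remap_dma(const struct vm_region *vm, uint64_t gpa, uint64_t len,
                      uint64_t *hpa) {
    assert(vm && hpa);
    if (len == 0 || gpa >= vm->size || len > vm->size - gpa)
        return false;    /* blocked and reported, never performed */
    *hpa = vm->hpa_base + gpa;
    return true;
}
```

The check is written as `len > size - gpa` rather than `gpa + len > size` so that a huge length cannot wrap around and slip past it.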
VT-d usage
Basic infrastructure for I/O virtualization: enables direct assignment of I/O devices to unmodified or paravirtualized VMs.
Improves system reliability: contains errant DMA and reports it to software.
Enhances security: supports multiple protection domains under SW control; provides a foundation for building trusted I/O capabilities.
Other usages: generic facility for DMA scatter/gather; overcomes addressability limitations on legacy devices.
VT-d architecture detail
[Figure: DMA requests (device ID, virtual address, length) from devices D1, D2, ... on buses 0 to 255 (e.g., Bus 0 Dev 0 Func 0; Bus N Dev P Func 1 and Func 2; Bus 255 Dev 31 Func 7) enter the DMA remapping engine. The engine consults memory-resident partitioning and translation structures (device assignment structures, and address translation structures made of 4 KB page tables pointing to page frames), caches them in a context cache and a translation cache, generates faults for disallowed requests, and otherwise performs the memory access with a system physical address.]
VT-d: hardware page walk
[Figure: the requester ID (bus in bits 15:8, device in bits 7:3, function in bits 2:0) indexes the device assignment tables, starting from the device assignment tables base. The example device assignment table entry specifies a 4-level page table. The DMA virtual address is split into a level-4 table offset (bits 47:39), level-3 table offset (38:30), level-2 table offset (29:21), level-1 table offset (20:12), and page offset (11:0); the address bits above bit 47 must be zero. The walk proceeds level-4 → level-3 → level-2 → level-1 page table → page.]
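The bit fields in this walk can be decoded in a few lines of C: the requester ID splits into bus/device/function, and a 48-bit DMA address splits into four 9-bit table indices and a 12-bit page offset. Function names are hypothetical; following each index through its table would look like any other 4-level page-table walk and is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct requester_id { unsigned bus, dev, func; };

/* Requester ID: bus in bits 15:8, device in 7:3, function in 2:0. */
static struct requester_id decode_rid(uint16_t rid) {
    struct requester_id r = { rid >> 8, (rid >> 3) & 0x1F, rid & 0x7 };
    return r;
}

/* Split a DMA address for a 4-level walk:
 * idx[0] = level-4 offset (47:39), ..., idx[3] = level-1 offset (20:12). */
static bool split_dma_address(uint64_t addr, unsigned idx[4], unsigned *page_off) {
    assert(idx && page_off);
    if (addr >> 48)
        return false;                  /* bits above 47 must be zero */
    for (int level = 4; level >= 1; level--)
        idx[4 - level] = (unsigned)(addr >> (12 + 9 * (level - 1))) & 0x1FF;
    *page_off = (unsigned)(addr & 0xFFF);
    return true;
}
```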
VT-d applied to the pass-through model
[Figure: assigned devices; device drivers run inside VM0 ... VMn (guest OS and apps) above the hypervisor.]
Direct device assignment to the guest OS: the guest OS directly programs the physical device. For legacy guests, the hypervisor sets up the guest-to-host-physical DMA mapping; for remapping-aware guests, the hypervisor is involved in mapping/unmapping DMA buffers.
PCI-SIG I/O Virtualization Working Group: activity toward standardizing natively sharable I/O devices. IOV devices provide virtual interfaces, each independently assignable to VMs.
Pro: highest performance. Pro: smaller hypervisor. Pro: device-assisted sharing. Con: VM migration limits.
DMA remapping: IOTLB scaling
Address Translation Services (ATS) extensions to PCIe* enable IOTLB scaling: an ATS endpoint implements "Device-IOTLBs".
Device-IOTLBs can be used to improve performance, e.g., by caching only static translations (such as command buffers) or by prefetching translations to reduce latency. They minimize dependency on root-complex caching and support device-specific demand I/O paging.
*Other names and brands may be claimed as the property of others
Address Translation Services (ATS)
ATS translation flows:
1. The device issues a Translation Request to the root complex.
2. The root complex's remap hardware translates the address (using its IOTLB) and returns a Translation Response.
3. The device caches the translation locally in its "Device-IOTLB".
4. The device can then issue DMA with the translated address; translated DMA from enabled devices bypasses address translation.
VT-d supports per-device control of ATS.
VT-x & VT-d working together
[Figure: virtual machines run on the Virtual Machine Monitor (VMM), whose software techniques (binary translation, paravirtualization, page-table shadowing, I/O-device emulation) are complemented by hardware virtualization mechanisms under VMM control (interrupt virtualization, DMA remapping) provided by VT-x and VT-d over the logical processors, physical memory, and I/O devices.]
Mapping to VMM software challenges
[Figure: virtual machines VM0 to VMn (apps on OSes) run on the VMM (a.k.a. hypervisor), which provides higher-level functions (resource discovery, provisioning, scheduling, user interface) plus processor, memory, and I/O device virtualization over the physical platform resources (processors CPU0 to CPUn, memory, and storage and network I/O devices). Software techniques (binary translation, page-table shadowing, I/O DMA remapping, interrupt remapping, I/O device emulation) are matched by hardware support: ring deprivileging and virtual CPU configuration (VT-x, VT-x2), EPT configuration, DMA and interrupt remapping configuration (VT-d, VT-d2), and VMDq and PCI-SIG IOV for I/O devices.]
PCI-E Endpoint Sharing
I/O Virtualization Intermediary
Terminology:
I/O Virtualization (IOV): the capability for a single physical I/O unit to be shared by more than one System Image.
I/O Virtualization Intermediary (IOVI): software or firmware used to support IOV by intervening on one or more of the following: configuration, I/O, and memory operations from a System Image; and DMA, completion, and interrupt operations to a System Image.
[Figure: System Image 1 and System Image 2 run in Virtual System 1 and Virtual System 2 on one physical system; the I/O Virtualization Intermediary maps their virtual I/O onto the physical I/O.]
PCI endpoint (EP) sharing
Legend: SI = System Image; IOVI = IO Virtualization Intermediary; RCVE = RC Virtualization Enablers (ATPT = Address Translation and Protection Table, plus an Interrupt Table).
EP shared through intermediary: the RC has no virtualization enablers; there are one or more System Images; PCIe EPs are shared through the IOVI. The IOVI is involved in all I/O transactions and performs all I/O virtualization functions; for example, it multiplexes the SIs' I/O queues onto a single queue in the adapter. The PCIe EP is not required to support any virtualization functions.
Natively shared endpoints: the RC has virtualization enablers; there are one or more System Images. Ordinary PCIe EPs are still shared through the IOVI, the same as in the intermediary model without ATPT. PCIe IOV-enabled EPs are directly shared: the IOVI is involved in configuration operations, and data transfers are direct.
[Figure: two host CPU sets, each running SI 1, SI 2, and an IOVI above a PCI root (the second with RCVE) and a PCIe switch, connecting to PCIe endpoints in the first case and PCIe IOV endpoints in the second.]
Adapter IOV mechanisms within a single physical system
Dedicated adapter (no virtualization): intermediary role: none. Configuration path: SI direct to adapter. Data transfer path: SI direct to adapter.
Adapter shared through intermediary: the intermediary virtualizes physical I/O by intervening on configuration and data transfer operations. Configuration and data transfer paths: the VI serves as proxy (SI to VI; VI to adapter).
Natively shared adapter: the intermediary manages assignment of virtual resources by intervening on configuration operations. Configuration path: through the VI. Data transfer path: SI direct to adapter.
[Figure: graphic depiction of each case: System Image 1 (and System Image 2 in the shared cases) on a physical system, with an I/O Virtualization Intermediary between the System Images and the physical I/O in the two shared cases.]
Single Root IOV
Single RC PCIe IOV-enabled endpoint requirements
Only PCIe endpoints shall be specified as IOV-enabled endpoints.
Native PCI SR-IOV-enabled endpoints shall be backward compatible, in a non-virtualized mode, with the PCIe Base 1.x specification.
[Figure: a base PCIe 1.x system (host CPU set with one SI, PCI root, PCIe switches, PCI and PCI-X bridges with their adapters, PCIe endpoints) beside an IOV-enabled PCIe system (host CPU set with SI 1 to SI N and a PCIM, the same topology plus PCIe IOV endpoints).]
Single root PCIe IOV endpoint requirements
A mechanism shall be provided to allow a VF to be associated with an SI, such that data movement operations are enabled and can be performed directly between the SI and its associated VF, without VI involvement.
The virtualization mechanisms defined in this specification may require a VI (such as a PCI Configuration Manager, PCIM) to be involved in configuration operations performed on a VF.
[Figure: in an IOV-enabled PCIe system, SI 1 to SI N and the PCIM sit above the PCI root and a PCIe switch; the PCIe IOV endpoint's port routes internally to VF 1 to VF N, each with its own physical resources, alongside configuration management for non-separable resources and a sharable resource pool.]
SR-IOV
Physical Functions (PFs) are full PCIe devices that include the SR-IOV capabilities. PFs configure and manage the SR-IOV functionality by assigning Virtual Functions.
Virtual Functions (VFs) are simple PCIe functions that only process I/O. A single Ethernet port, the physical device, may map to many VFs that can be assigned to virtualized guests. The number of VFs a device may have is limited by the device hardware.
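One concrete consequence of VFs being separate PCIe functions is that each VF gets its own Routing ID (bus/device/function). The SR-IOV capability describes them with two fields, First VF Offset and VF Stride: VF n (counting from 1) has Routing ID = PF Routing ID + First VF Offset + (n - 1) × VF Stride. The sketch below computes that; the function names are hypothetical, and the offset and stride a device reports are device-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Routing ID layout: bus in bits 15:8, device in 7:3, function in 2:0. */
static uint16_t make_rid(unsigned bus, unsigned dev, unsigned func) {
    assert(bus < 256 && dev < 32 && func < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | func);
}

/* Routing ID of VF n (1-based), truncated to the 16-bit Routing ID space. */
static uint16_t vf_rid(uint16_t pf_rid, uint16_t first_vf_offset,
                       uint16_t vf_stride, unsigned n) {
    assert(n >= 1);
    return (uint16_t)(pf_rid + first_vf_offset + (n - 1) * vf_stride);
}

static unsigned rid_bus(uint16_t rid)  { return rid >> 8; }
static unsigned rid_dev(uint16_t rid)  { return (rid >> 3) & 0x1F; }
static unsigned rid_func(uint16_t rid) { return rid & 0x7; }
```

Because the formula is plain addition over the whole 16-bit ID, a device with many VFs can spill past function 7 into new device numbers and even onto the next bus number.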
Multi-Root IOV
Multi-Root PCIe IOV endpoint requirements
The multi-root solution shall give each RC (Root Complex) its own Virtual Hierarchy, and shall enable each switch, bridge, function, and VF to be uniquely represented in the configuration space of each RC.
[Figure: physical view vs. virtual view. Physical view: a host CPU set, PCIe root, PCIe switch, a PCI-X bridge with a PCI-X device, a PCI bridge with a PCI device, and a PCIe IOV endpoint. Virtual view: the hierarchy seen by one host CPU set, with a PCIe root, PCIe switches, a PCI bridge, a PCIe IOV endpoint, a PCIe endpoint, and a PCI device.]
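A Virtual Hierarchy can be pictured as a per-RC address map. Each RC enumerates its own bus/device/function numbers, and the multi-root fabric maps them onto physical functions. The sketch below is a simplified, hypothetical model. The class, field names, and "nic0/vf0" labels are invented for illustration, and the model ignores switches and bridges. It shows that two RCs can reuse the same local address because their hierarchies are independent. A duplicate address inside one hierarchy is rejected.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

BDF = Tuple[int, int, int]  # (bus, device, function) as seen by one root complex

@dataclass
class VirtualHierarchy:
    """Hypothetical model of one RC's private view of a shared multi-root fabric."""
    root_complex: str
    functions: Dict[BDF, str] = field(default_factory=dict)  # local BDF -> physical function

    def add(self, bdf: BDF, physical_fn: str) -> None:
        # Each function must be uniquely addressable within this RC's configuration space.
        if bdf in self.functions:
            raise ValueError(f"{self.root_complex}: {bdf} is already in use")
        self.functions[bdf] = physical_fn

# Two RCs each see a different VF of the same endpoint at the same local address.
rc_a = VirtualHierarchy("RC-A")
rc_b = VirtualHierarchy("RC-B")
rc_a.add((1, 0, 0), "nic0/vf0")
rc_b.add((1, 0, 0), "nic0/vf1")
```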
Multi-Root PCIe IOV endpoint requirements
The multi-root solution shall:
provide the same characteristics to its IOV-enabled endpoints as the single-root solution, relative to separate SIs;
enable use of existing PCIe 1.x or later RCs;
enable existing PCIe 1.x switches, endpoints, and PCIe-to-PCI/PCI-X bridges each to be bound to a single RC;
enable an IOV-enabled endpoint to be shared among multiple RCs using a Multi-Root Aware (MRA) PCIe switch.
[Figure: multi-root topology. Two host CPU sets, optionally joined by an SMP fabric, run SIs and a PCIM. An MRA PCIe root and a PCIe 1.x root connect through MRA PCIe switches and an MRA PCI-X bridge to shared PCIe endpoints and to PCI and PCI-X adapters.]
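The binding rules in the last two requirements can be expressed as a simple consistency check. Legacy devices must be bound to exactly one RC. Only IOV-enabled endpoints behind an MRA switch may be shared. This is a hypothetical sketch, not part of any specification. `check_bindings` and its inputs are invented names. A device counts as shareable only if the caller lists it as such.

```python
from typing import Dict, Set

def check_bindings(bindings: Dict[str, Set[str]], mr_shareable: Set[str]) -> None:
    """Raise ValueError if a device-to-RC binding breaks the multi-root rules.

    bindings maps each device name to the set of root complexes it is bound to.
    mr_shareable names the IOV-enabled endpoints reachable through an MRA switch.
    """
    for device, rcs in bindings.items():
        if not rcs:
            raise ValueError(f"{device} is not bound to any root complex")
        if len(rcs) > 1 and device not in mr_shareable:
            # Existing PCIe 1.x switches, endpoints, and bridges may serve only one RC.
            raise ValueError(f"{device} may be bound to only one RC, got {sorted(rcs)}")
```

For example, an IOV endpoint bound to both RCs passes when it is listed as shareable. A PCI-X adapter bound to two RCs is rejected.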
PLX
Network Device IOV
Network virtualization
Virtual I/O in “native” and guest OS
Limitations of sharing an I/O device in software prevent HPC applications from running in VMs: virtualization overhead, loss of “native” features, and lack of SLA guarantees.
Evolution of hardware IOV
PCIe multi-port and multi-function solutions
Single-Root IOV (SR-IOV) PCI-SIG specification
Direct hardware access (pass-through)
I/O virtualization and hardware SLA
[Figure: two per-VM bandwidth timelines over T0, T1, T2. Left panel, "No or limited QoS": VM1 to VM5 at 2 Gbps each. VM1, VM4, and VM5 have high priority, and VM2 and VM3 have low priority. Right panel, "Bandwidth over-provisioning": fewer VMs, with a bandwidth "buffer" reserved.]
Example: 5 virtual machines, with SLAs that require 2 Gbps per VM.
Low-priority VM2 receives a 5 Gbps spike.
With no QoS, the incoming traffic surge causes SLAs to be violated.
Traditional solution: over-provision bandwidth to try to meet the SLAs, which reduces the VM count by 50% or more.
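The arithmetic behind the example can be checked with a small model. Two assumptions are not stated on the slide. First, the link is 10 Gbps (5 × 2 Gbps). Second, "no QoS" is modeled as scaling every VM back in proportion to its demand. The QoS alternative shown is one common policy: first give each VM its guarantee, then split any leftover capacity in proportion to excess demand. This policy only works if the guarantees add up to no more than the link. The function names are hypothetical.

```python
from typing import Dict

LINK_GBPS = 10.0  # assumed link capacity: 5 VMs x 2 Gbps
SLA_GBPS = 2.0    # per-VM guarantee from the example
DEMAND = {"VM1": 2.0, "VM2": 5.0, "VM3": 2.0, "VM4": 2.0, "VM5": 2.0}  # VM2 spikes to 5 Gbps

def share_without_qos(demand: Dict[str, float], capacity: float) -> Dict[str, float]:
    """No QoS: when oversubscribed, every VM is scaled back in proportion to its demand."""
    scale = min(1.0, capacity / sum(demand.values()))
    return {vm: d * scale for vm, d in demand.items()}

def share_with_guarantees(demand: Dict[str, float], capacity: float,
                          guarantee: float) -> Dict[str, float]:
    """QoS: grant min(demand, guarantee) first, then share leftover capacity by excess demand."""
    alloc = {vm: min(d, guarantee) for vm, d in demand.items()}
    left = capacity - sum(alloc.values())
    excess = {vm: d - alloc[vm] for vm, d in demand.items()}
    total_excess = sum(excess.values())
    if left > 0 and total_excess > 0:
        scale = min(1.0, left / total_excess)
        for vm in alloc:
            alloc[vm] += excess[vm] * scale
    return alloc
```

With 13 Gbps of demand on a 10 Gbps link, the proportional model leaves each well-behaved VM with 2 × 10/13, or about 1.54 Gbps, so their SLAs are broken by VM2's spike. The guarantee-first model keeps every VM at 2 Gbps and caps VM2 instead.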
Network virtualization with IOQoS
Optimum resource utilization: a single X3120 replaces multiple HBAs and NICs with one dual-port 10GbE adapter carrying clustering (RoCE), IP storage, and data (10GbE) traffic.
IOQoS™: easy management of prioritization and bandwidth allocation.
Direct access for the latency-sensitive (clustering) fabric.
[Figure: VM1 … VMn, each running an application over MPI and a guest O/S with an Exar guest OS driver and a VMxnet driver. The KVM host runs a virtual L2 switch and the Exar host driver. A Neterion X3120 is configured with multiple PCI functions (NIC 0 to NIC 4) joined by an integrated L2 switch.]
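The adapter's integrated L2 switch lets several PCI functions (NIC 0 to NIC 4 in the figure) share one physical port. To do this, each received frame must be steered to the function that owns its destination MAC address. The sketch below is a hypothetical model of that steering step only. The class, method names, and MAC addresses are invented for illustration, and it is not Exar/Neterion's design. Unknown unicast frames are simply dropped here, whereas a real switch may forward them elsewhere.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"

class IntegratedL2Switch:
    """Hypothetical model: steer received frames to the PCI function owning the destination MAC."""

    def __init__(self) -> None:
        self.mac_table: Dict[str, str] = {}  # MAC address -> PCI function name

    def attach(self, function: str, mac: str) -> None:
        self.mac_table[mac.lower()] = function

    def deliver(self, dst_mac: str) -> List[str]:
        dst = dst_mac.lower()
        if dst == BROADCAST:
            return sorted(set(self.mac_table.values()))  # flood to every attached function
        fn = self.mac_table.get(dst)
        return [fn] if fn else []  # unknown unicast is dropped in this sketch
```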