Lecture 12: Virtualization


Lecture 12: Virtualization
COSC6376 Cloud Computing
Instructor: Weidong Shi (Larry), PhD, Computer Science Department, University of Houston

Outline
Today: Virtualization

Intel virtualization technology evolution
Vector 1: Processor focus (VT-x, VT-i). Establish the foundation for virtualization in the IA-32 and Itanium architectures, followed by ongoing evolution of support: micro-architectural (e.g., lower VM switch times) and architectural (e.g., Extended Page Tables).
Vector 2: Platform focus (VT-d). Hardware support for I/O-device virtualization: device DMA remapping, direct assignment of I/O devices to VMs, and interrupt routing and remapping.
Vector 3: I/O focus. PCI-SIG standards for I/O-device sharing: multi-context I/O devices and endpoint address-translation caching, under definition in the PCI-SIG IOV workgroup.
VMM software evolution: in the past, with no hardware support, VMMs were software-only (binary translation, paravirtualization). Today, virtualizable ISAs allow simpler and more secure VMMs, with increasingly better CPU and I/O virtualization performance and functionality as I/O devices and VMMs exploit the infrastructure provided by VT-x, VT-i, and VT-d.

VT-x Overview: Intel Virtualization Technology for IA-32 Processors

VT-x overview
Operating modes
Guest SW ↔ VMM transitions
Virtual-machine control structure (VMCS)
Principal causes of VM exits
Benefits

Operating modes
VMX root operation: fully privileged, intended for the VM monitor.
VMX non-root operation: not fully privileged, intended for guest software. Reduces guest-software privilege without relying on rings, solving the ring-aliasing problem.

VM entry and VM exit
VM Entry: transition from the VMM to a guest. The processor enters VMX non-root operation and loads guest state and exit criteria from the VMCS.
VM Exit: transition from a guest back to the VMM, triggered by privileged events or instructions in the guest. The processor enters VMX root operation, saves guest state in the VMCS, and loads VMM state from the VMCS.
[Figure: applications on Guest OS0 (VM0) and Guest OS1 (VM1) run above the VM monitor on the physical host hardware, with VM Entry and VM Exit transitions between guest and VMM.]

VT-x operations
[Figure: the processor starts in IA-32 operation and enters VMX root operation via VMXON; the VMM launches or resumes guests (VMLAUNCH/VMRESUME) into VMX non-root operation, with one VMCS per virtual machine (VMCS 1 … VMCS n); each VM has its own ring 0 and ring 3; VM exits return control to VMX root operation.]

Virtual machine control structure (VMCS)
VMCSs are control structures in memory; only one VMCS is active per virtual processor at any given time.
VMCS payload: VM-execution, VM-exit, and VM-entry controls; guest and host state; VM-exit information fields.
The VMCS format is not architecturally defined and may vary between implementations, so software accesses it through dedicated instructions:
VMPTRLD establishes a pointer to the desired VMCS.
VMREAD and VMWRITE are the new VMCS access instructions.
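As a rough illustration, the sketch below wraps VMREAD and VMWRITE in C with GCC inline assembly. This is a fragment, not a working VMM: it assumes 64-bit kernel-mode code already running in VMX root operation with a current VMCS loaded via VMPTRLD, and the field encoding 0x4402 (exit reason) is taken from the Intel SDM for illustration only.

#include <stdint.h>

/* Read a VMCS field; 'field' is the VMCS-component encoding. */
static inline uint64_t vmcs_read(uint64_t field)
{
    uint64_t value;
    __asm__ volatile("vmread %1, %0" : "=rm"(value) : "r"(field) : "cc");
    return value;
}

/* Write a VMCS field. */
static inline void vmcs_write(uint64_t field, uint64_t value)
{
    __asm__ volatile("vmwrite %1, %0" : : "r"(field), "rm"(value) : "cc");
}

#define VMCS_EXIT_REASON 0x4402ULL   /* encoding per the Intel SDM */

/* Example: after a VM exit, ask the VMCS why the guest exited. */
static inline uint32_t read_exit_reason(void)
{
    return (uint32_t)vmcs_read(VMCS_EXIT_REASON);
}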

Principal causes of VM exit
Paging-state exits allow page-table control: CR3 accesses and INVLPG cause exits; page faults can be exited on selectively; CR0/CR4 controls allow exiting on changes to selected bits.
State-based exits allow function virtualization: CPUID, RDMSR, WRMSR, RDPMC, RDTSC, MOV DRx.
Selective exception and I/O exiting reduce unnecessary exits: a 32-entry exception bitmap and an I/O-port access bitmap.
Controls are provided for asynchronous events: host interrupt control allows delivery to the VMM even when the guest is blocking interrupts.
Detection of guest inactivity supports VM scheduling: HLT, MWAIT, PAUSE.
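The exception and I/O bitmaps named above are plain bit arrays that the VMM fills in before VM entry. A minimal, self-contained sketch (helper names invented; the two 4 KB I/O bitmaps are modeled as one 8 KB array covering ports 0x0000-0xFFFF) showing how a VMM might mark what should cause exits:

#include <stdint.h>
#include <string.h>

/* 32-entry exception bitmap: bit n = 1 means exception vector n causes a VM exit. */
uint32_t exception_bitmap_with(unsigned vector)
{
    return 1u << vector;                 /* e.g., vector 14 is the page fault (#PF) */
}

/* One bit per I/O port; setting a port's bit makes IN/OUT on that port exit. */
void io_bitmap_intercept(uint8_t bitmap[8192], uint16_t port)
{
    bitmap[port / 8] |= (uint8_t)(1u << (port % 8));
}

int main(void)
{
    static uint8_t io_bitmap[8192];
    memset(io_bitmap, 0, sizeof io_bitmap);

    uint32_t xbm = exception_bitmap_with(14);   /* exit on guest page faults     */
    io_bitmap_intercept(io_bitmap, 0x60);       /* exit on keyboard-port accesses */

    (void)xbm;
    return 0;   /* the values would then be written into the VMCS and bitmap pages */
}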

Running in a Virtual Machine
Mechanism to determine whether software is running in a VMware virtual machine: check the CPUID hypervisor-present bit, then read the hypervisor vendor string.

#include <string.h>
#include <cpuid.h>   /* GCC/Clang __cpuid macro */

int cpuid_check(void)
{
    unsigned int eax, ebx, ecx, edx;
    char hyper_vendor_id[13];

    __cpuid(0x1, eax, ebx, ecx, edx);
    if (ecx & (1u << 31)) {                /* bit 31 of ECX: hypervisor present */
        /* Leaf 0x40000000 returns the hypervisor vendor string in EBX:ECX:EDX. */
        __cpuid(0x40000000, eax, ebx, ecx, edx);
        memcpy(hyper_vendor_id + 0, &ebx, 4);
        memcpy(hyper_vendor_id + 4, &ecx, 4);
        memcpy(hyper_vendor_id + 8, &edx, 4);
        hyper_vendor_id[12] = '\0';
        if (!strcmp(hyper_vendor_id, "VMwareVMware"))
            return 1;                      /* Success: running under VMware */
    }
    return 0;
}
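A minimal usage sketch, assuming the function above is compiled on an x86 machine with GCC or Clang:

#include <stdio.h>

int cpuid_check(void);   /* defined above */

int main(void)
{
    printf("Running under VMware: %s\n", cpuid_check() ? "yes" : "no");
    return 0;
}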

Benefits: VT helps improve VMMs
VT reduces guest-OS dependency: it eliminates the need for binary patching/translation and facilitates support for legacy OSes.
VT improves robustness: it eliminates the need for complex software techniques, yielding simpler and smaller VMMs and a smaller trusted computing base.
VT improves performance: fewer unwanted guest ↔ VMM transitions.

Extended page tables (EPT)
A VMM must protect host physical memory, since multiple guest operating systems share the same host physical memory.
A VMM typically implements this protection through page-table shadowing in software.
Page-table shadowing accounts for a large portion of virtualization overhead; the goal of EPT is to reduce that overhead.

What is EPT?
The extended page table is a new page-table structure, under the control of the VMM, that defines the mapping between guest-physical and host-physical addresses. The EPT base pointer (EPTP, a new VMCS field) points to the EPT page tables. The guest retains full control over its own IA-32 page tables (CR3).
[Figure: a guest linear address is translated by the guest IA-32 page tables (rooted at CR3) to a guest-physical address, which the EPT (rooted at the EPT base pointer) translates to a host-physical address.]

EPT translation: details
All guest-physical memory addresses (CR3, PDE, PTE, etc.) go through the EPT tables.
The example in the original figure uses a 2-level table for a 32-bit address space; translation is also possible for other page-table formats (e.g., PAE).
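To make the two-stage translation concrete, here is a toy software model. It is not the hardware format: single-level lookup tables stand in for multi-level page tables, and all names and sizes are invented. It only illustrates that every guest-physical address produced by the guest's own page walk is itself translated through the EPT before reaching host memory.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGES      256

typedef struct {
    bool     present;
    uint64_t frame;                       /* physical frame number */
} toy_pte;

static toy_pte guest_pt[TOY_PAGES];       /* guest virtual  -> guest physical */
static toy_pte ept[TOY_PAGES];            /* guest physical -> host physical  */

/* Stage 1: the guest's own page tables (the guest keeps full control of these). */
static bool guest_translate(uint64_t gva, uint64_t *gpa)
{
    toy_pte e = guest_pt[(gva >> TOY_PAGE_SHIFT) % TOY_PAGES];
    if (!e.present) return false;         /* guest page fault */
    *gpa = (e.frame << TOY_PAGE_SHIFT) | (gva & 0xFFF);
    return true;
}

/* Stage 2: the EPT, controlled by the VMM. */
static bool ept_translate(uint64_t gpa, uint64_t *hpa)
{
    toy_pte e = ept[(gpa >> TOY_PAGE_SHIFT) % TOY_PAGES];
    if (!e.present) return false;         /* EPT violation: VM exit */
    *hpa = (e.frame << TOY_PAGE_SHIFT) | (gpa & 0xFFF);
    return true;
}

/* Every guest memory access conceptually composes the two stages. */
static bool full_translate(uint64_t gva, uint64_t *hpa)
{
    uint64_t gpa;
    return guest_translate(gva, &gpa) && ept_translate(gpa, hpa);
}

int main(void)
{
    /* Map guest virtual page 2 -> guest physical frame 7 -> host frame 42. */
    guest_pt[2] = (toy_pte){ .present = true, .frame = 7 };
    ept[7]      = (toy_pte){ .present = true, .frame = 42 };

    uint64_t hpa;
    if (full_translate((2ULL << TOY_PAGE_SHIFT) | 0xABC, &hpa))
        printf("host physical address: 0x%llx\n", (unsigned long long)hpa);
    return 0;
}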

Malware
Malware can escape from a guest OS and infect the VMM.

VT-d Overview: Intel Virtualization Technology for Directed I/O

Q35 chipset system block diagram

PCI Express
A third-generation high-performance I/O bus used to interconnect peripheral devices.
Point-to-point connections, as opposed to a shared bus.
A PCIe interconnect consists of an x1, x2, x4, x8, x12, x16, or x32 point-to-point link. An x16 link, for example, has 64 physical wires: 16 lanes * 2 directions * 2 wires (differential signaling).
First generation: ISA, EISA, VESA, and Micro Channel buses. Second generation: PCI, PCI-X, and AGP.
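A one-liner makes the lane arithmetic explicit (illustrative only):

#include <stdio.h>

/* Each PCIe lane is a pair of unidirectional differential pairs:
 * 2 directions * 2 wires = 4 physical wires per lane. */
int pcie_wires(int lanes) { return lanes * 2 * 2; }

int main(void)
{
    int widths[] = {1, 4, 8, 16, 32};
    for (unsigned i = 0; i < sizeof widths / sizeof widths[0]; i++)
        printf("x%-2d link: %3d wires\n", widths[i], pcie_wires(widths[i]));
    return 0;
}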

Rod Canion

Extended Industry Standard Architecture (EISA)
A bus standard for IBM PC-compatible computers, announced in September 1988 and managed by a consortium of PC clone vendors, as a counter to IBM's use of its proprietary Micro Channel Architecture (MCA).

PCIe-based system topology
Root complex: the root of the I/O hierarchy that connects the CPU/memory subsystem to the I/O; it may support one or more PCIe ports.
Endpoints: devices other than the root complex and switches that are requesters or completers of PCIe transactions.
Source: PCIe specification 2.0

Three IA-32 address spaces
Memory space (4 GB): accessed using a large variety of processor instructions (mov, add, or, shr, push, etc.), with virtual-to-physical address translation.
I/O space (64 KB): accessed only with the processor's special in and out instructions, without any translation of port addresses.
PCI configuration space (16 MB): I/O ports 0x0CF8-0x0CFF are dedicated to accessing PCI configuration space.
PCIe supports the same address spaces as PCI (memory, I/O, and configuration), but provides a 4 KB configuration space per function, as opposed to 256 B in PCI.

PCI configuration header (16 doublewords)
Dword 0: Device ID (31:16) | Vendor ID (15:0)
Dword 1: Status register (31:16) | Command register (15:0)
Dword 2: Class code (class/subclass/prog IF, 31:8) | Revision ID (7:0)
Dword 3: BIST (31:24) | Header type (23:16) | Latency timer (15:8) | Cache line size (7:0)
Dwords 4-9: Base Address registers 0-5
Dword 10: CardBus CIS pointer
Dword 11: Subsystem Device ID (31:16) | Subsystem Vendor ID (15:0)
Dword 12: Expansion ROM base address
Dword 13: reserved (31:8) | capabilities pointer (7:0)
Dword 14: reserved
Dword 15: Maximum latency (31:24) | Minimum grant (23:16) | Interrupt pin (15:8) | Interrupt line (7:0)
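For reference, the legacy configuration mechanism mentioned earlier (I/O ports 0x0CF8/0x0CFC) reads these dwords directly. The sketch below is Linux/x86 specific, needs root privileges (ioperm), and reads dword 0 of bus 0, device 0, function 0; it is illustrative rather than portable.

#include <stdint.h>
#include <stdio.h>
#include <sys/io.h>     /* ioperm, outl, inl (Linux, x86) */

/* Build the CONFIG_ADDRESS value: enable bit, bus, device, function,
 * and a dword-aligned register offset. */
uint32_t pci_cfg_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11)
                       | ((uint32_t)fn << 8) | (reg & 0xFC);
}

uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    outl(pci_cfg_addr(bus, dev, fn, reg), 0xCF8);   /* CONFIG_ADDRESS */
    return inl(0xCFC);                              /* CONFIG_DATA    */
}

int main(void)
{
    if (ioperm(0xCF8, 8, 1)) { perror("ioperm"); return 1; }

    /* Dword 0: Device ID (31:16), Vendor ID (15:0). */
    uint32_t dw0 = pci_cfg_read32(0, 0, 0, 0x00);
    printf("vendor 0x%04x, device 0x%04x\n",
           (unsigned)(dw0 & 0xFFFF), (unsigned)(dw0 >> 16));
    return 0;
}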

Typical NIC
[Figure: a typical NIC with TX and RX FIFOs and a transceiver on the LAN-cable side, connected over the bus to the CPU and to packet buffers in main memory.]

PCI devices and functions
A PCI device may include between 1 and 8 functions; function numbers range from 0 to 7, and function 0 must always be present. Devices are classified as single-function or multi-function.

DMA (Direct Memory Access)
DMA is the ability to transfer large blocks of data directly to or from memory without involving the processor.
The processor initiates a DMA transfer by supplying the source and destination addresses and the number of bytes to transfer.
The DMA controller then manages the entire transfer (possibly thousands of bytes), arbitrating for the bus.
When the transfer is complete, the DMA controller interrupts the processor to report completion.
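That handshake can be sketched against a purely hypothetical memory-mapped DMA controller; the register layout and names below are invented for illustration and do not correspond to any real device.

#include <stdint.h>

/* Hypothetical DMA-controller register block, memory-mapped somewhere. */
typedef struct {
    volatile uint64_t src;      /* source physical address      */
    volatile uint64_t dst;      /* destination physical address */
    volatile uint32_t count;    /* number of bytes to transfer  */
    volatile uint32_t control;  /* bit 0: start                 */
    volatile uint32_t status;   /* bit 0: done (set by device)  */
} dma_regs;

/* Processor side: program the transfer and start it; the controller then
 * moves the data and raises an interrupt when it sets the done bit. */
void dma_start(dma_regs *dma, uint64_t src, uint64_t dst, uint32_t bytes)
{
    dma->src = src;
    dma->dst = dst;
    dma->count = bytes;
    dma->control = 1u;          /* kick off the transfer */
}

/* Interrupt-handler side: acknowledge completion. */
int dma_irq_handler(dma_regs *dma)
{
    if (dma->status & 1u) {
        dma->status = 1u;       /* write-1-to-clear, by assumption */
        return 1;               /* transfer complete */
    }
    return 0;
}

int main(void)
{
    dma_regs fake = {0};                       /* simulate the device in memory */
    dma_start(&fake, 0x100000, 0x200000, 4096);
    fake.status = 1u;                          /* pretend the device finished */
    return dma_irq_handler(&fake) ? 0 : 1;
}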

Options for I/O virtualization
Monolithic model: I/O services and device drivers run inside the hypervisor, which shares the devices among guest VMs. Pros: higher performance, I/O device sharing, VM migration. Con: larger hypervisor.
Service-VM model: I/O services and device drivers run in dedicated service VMs, keeping the hypervisor small. Pros: high security, I/O device sharing, VM migration. Con: lower performance.
Pass-through model: devices are assigned directly to guest VMs, which run the device drivers themselves. Pros: highest performance, smaller hypervisor, device-assisted sharing. Con: VM migration challenges.
VT-d goal: support all three models.

VT-d overview
VT-d is platform infrastructure for I/O virtualization: it defines an architecture for DMA remapping, is implemented as part of the platform core logic, and will be supported broadly in Intel server and client chipsets.
[Figure: the VT-d DMA-remapping logic sits in the north bridge, between the system bus (CPU, DRAM) and the I/O fabric: PCIe root ports, integrated devices, and the south bridge with PCI, LPC, and legacy devices.]

How does VT-d work?
Each VM believes its memory starts at address 0 (guest physical addresses, GPAs), but each VM's memory is mapped to a different region of system memory (host physical addresses, HPAs).
VT-d performs the address mapping between GPA and HPA for device DMA and catches any DMA attempt that crosses a VM's memory boundary.
[Figure: several VMs whose zero-based guest address ranges map to disjoint host physical regions.]

VT-d usage
Basic infrastructure for I/O virtualization: enables direct assignment of I/O devices to unmodified or paravirtualized VMs.
Improves system reliability: contains and reports errant DMA to software.
Enhances security: supports multiple protection domains under software control and provides a foundation for building trusted I/O capabilities.
Other usages: a generic facility for DMA scatter/gather, and a way to overcome addressability limitations of legacy devices.

VT-d architecture detail
[Figure: DMA requests arrive tagged with a device ID (bus, device, function), a DMA virtual address, and a length. The DMA-remapping engine uses a context cache and a translation cache, backed by memory-resident device-assignment structures and 4 KB page tables, to translate each request; faults are reported through fault-generation logic, and the resulting memory access uses the system physical address.]

VT-d: hardware page walk
The requester ID identifies the device: function (bits 0-2), device (bits 3-7), and bus (bits 8-15). It selects an entry in the device-assignment tables, which in this example specifies a 4-level page table.
The DMA virtual address is then walked level by level: bits 39-47 index the level-4 table, bits 30-38 the level-3 table, bits 21-29 the level-2 table, and bits 12-20 the level-1 table; bits 0-11 are the page offset, and the upper address bits (48-63 in this example) are zero.
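A small illustration of the bit slicing described above, using the field offsets from the slide; the walk itself is schematic and does not use real VT-d table formats.

#include <stdint.h>
#include <stdio.h>

/* Requester ID fields (PCIe bus/device/function). */
unsigned rid_func(uint16_t rid) { return rid & 0x7; }          /* bits 0-2  */
unsigned rid_dev(uint16_t rid)  { return (rid >> 3) & 0x1F; }  /* bits 3-7  */
unsigned rid_bus(uint16_t rid)  { return (rid >> 8) & 0xFF; }  /* bits 8-15 */

/* 9-bit table index for each level of the 4-level walk:
 * level 4 -> bits 39-47, level 3 -> 30-38, level 2 -> 21-29, level 1 -> 12-20. */
unsigned dma_level_index(uint64_t dma_addr, int level)
{
    return (unsigned)((dma_addr >> (12 + 9 * (level - 1))) & 0x1FF);
}

uint64_t page_offset(uint64_t dma_addr) { return dma_addr & 0xFFF; }

int main(void)
{
    uint64_t dma_addr = 0x0000123456789ABCULL;
    for (int level = 4; level >= 1; level--)
        printf("level %d index: %u\n", level, dma_level_index(dma_addr, level));
    printf("page offset: 0x%llx\n", (unsigned long long)page_offset(dma_addr));
    return 0;
}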

VT-d applied to the pass-through model
Direct device assignment to the guest OS: the guest OS directly programs the physical device.
For legacy guests, the hypervisor sets up the guest-to-host-physical DMA mapping; for remapping-aware guests, the hypervisor is involved in mapping and unmapping DMA buffers.
PCI-SIG I/O Virtualization Working Group: activity toward standardizing natively sharable I/O devices; IOV devices provide virtual interfaces, each independently assignable to VMs.
Pros: highest performance, smaller hypervisor, device-assisted sharing. Con: VM migration limits.

DMA remapping: IOTLB scaling
Address Translation Services (ATS) extensions to PCIe enable IOTLB scaling: an ATS endpoint implements device IOTLBs.
Device IOTLBs can improve performance, e.g., by caching only static translations (such as command buffers) or by prefetching translations to reduce latency; they minimize the dependency on root-complex caching and support device-specific demand I/O paging.

Address Translation Services (ATS)
ATS translation flow: the device issues a translation request to the root complex; the root complex (remapping hardware with its IOTLB) returns a translation response; the device caches the translation locally in its device IOTLB; the device can then issue DMA with the translated address, and translated DMA from enabled devices bypasses address translation in the root complex.
VT-d supports per-device control of ATS.
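The caching behavior can be sketched as a tiny direct-mapped software cache; everything below, including the translate callback standing in for the ATS request/response, is invented for illustration.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define IOTLB_ENTRIES 64
#define PAGE_SHIFT    12

typedef struct {
    bool     valid;
    uint64_t io_page;    /* untranslated I/O page number */
    uint64_t host_page;  /* translated host page number  */
} iotlb_entry;

/* Stand-in for an ATS translation request to the root complex. */
typedef uint64_t (*translate_fn)(uint64_t io_page);

/* Look up an I/O address in the device IOTLB; on a miss, issue a
 * translation request and cache the response. */
uint64_t device_iotlb_translate(iotlb_entry tlb[IOTLB_ENTRIES],
                                uint64_t io_addr, translate_fn translate)
{
    uint64_t page = io_addr >> PAGE_SHIFT;
    iotlb_entry *e = &tlb[page % IOTLB_ENTRIES];

    if (!e->valid || e->io_page != page) {  /* miss: ask the root complex */
        e->io_page   = page;
        e->host_page = translate(page);     /* translation request/response */
        e->valid     = true;
    }
    /* DMA can now be issued with the translated (host) address. */
    return (e->host_page << PAGE_SHIFT) | (io_addr & 0xFFF);
}

/* Trivial stand-in translation: host page = I/O page plus a fixed offset. */
static uint64_t fake_translate(uint64_t io_page) { return io_page + 0x100; }

int main(void)
{
    static iotlb_entry tlb[IOTLB_ENTRIES];
    uint64_t host = device_iotlb_translate(tlb, 0x5123, fake_translate);
    printf("I/O 0x5123 -> host 0x%llx\n", (unsigned long long)host);
    return 0;
}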

VT-x and VT-d working together
[Figure: virtual machines run on a VMM that combines software techniques (binary translation, paravirtualization, page-table shadowing, I/O-device emulation, interrupt virtualization) with hardware virtualization mechanisms under VMM control: VT-x on the logical processors and physical memory, and VT-d DMA remapping for the I/O devices.]

Mapping to VMM software challenges
Higher-level VMM functions: resource discovery, provisioning, scheduling, and the user interface.
Processor virtualization: binary translation in software; ring deprivileging and virtual-CPU configuration with VT-x/VT-x2.
Memory virtualization: page-table shadowing in software; EPT configuration in hardware.
I/O device virtualization: I/O device emulation, I/O DMA remapping, and interrupt remapping in software; DMA and interrupt-remapping configuration with VT-d/VT-d2, VMDq, and PCI-SIG IOV.
Physical platform resources: processors, memory, storage, network, and other I/O devices.

PCIe Endpoint Sharing

Terminology
I/O Virtualization (IOV): the capability for a single physical I/O unit to be shared by more than one system image (SI).
I/O Virtualization Intermediary (IOVI): software or firmware that supports IOV by intervening on one or more of the following: configuration, I/O, and memory operations from a system image; and DMA, completion, and interrupt operations to a system image.
[Figure: two virtual systems, each with its own system image and virtual I/O, share a physical system and physical I/O through the I/O virtualization intermediary.]

PCI endpoint (EP) sharing
Acronyms: RCVE = root-complex virtualization enablers; ATPT = address translation and protection table; IOVI = I/O virtualization intermediary; SI = system image.
EP shared through an intermediary: the root complex has no virtualization enablers; one or more system images share PCIe endpoints through the IOVI; the IOVI is involved in all I/O transactions and performs all I/O virtualization functions (for example, multiplexing the SIs' I/O queues onto a single queue in the adapter); the PCIe EP is not required to support any virtualization functions.
Natively shared endpoints: the root complex has virtualization enablers (RCVE, ATPT, interrupt table); PCIe IOV-enabled endpoints are directly shared; the IOVI is involved only in configuration operations, and data transfers are direct.

Adapter IOV mechanisms within a single physical system
Dedicated adapter (no virtualization): the intermediary plays no role; the system image configures the adapter directly and exchanges data with it directly.
Adapter shared through an intermediary: the intermediary virtualizes physical I/O by intervening on both configuration and data-transfer operations, serving as a proxy (SI to intermediary, intermediary to adapter) on both paths.
Natively shared adapter: the intermediary manages the assignment of virtual resources by intervening on configuration operations only; data transfers pass directly between each system image and the adapter.

Single-Root IOV

Single-RC PCIe IOV-enabled endpoint requirements
Only PCIe endpoints shall be specified as IOV-enabled endpoints.
PCIe SR-IOV-enabled endpoints shall be backward compatible, in a non-virtualized mode, with the PCIe Base 1.x specification.
[Figure: a base PCIe 1.x system (a single SI above a PCI root, PCIe switches, bridges, and conventional PCI/PCI-X and PCIe adapters) compared with an IOV-enabled PCIe system (multiple SIs plus a PCI configuration manager, PCIM, above the PCI root, PCIe switches, and PCIe IOV endpoints).]

Single-root PCIe IOV endpoint requirements
A mechanism shall be provided to allow a VF (virtual function) to be associated with an SI, such that data-movement operations are enabled and can be performed directly between the SI and its associated VF, without VI involvement.
The virtualization mechanisms defined in the specification may require a VI (such as a PCI Configuration Manager) to be involved in configuration operations performed on a VF.
[Figure: an IOV-enabled PCIe system in which SI 1..SI N and the PCIM sit above the PCI root and a PCIe switch; the PCIe IOV endpoint exposes virtual functions VF1..VFN backed by physical resources, with internal routing, configuration management, non-separable resources, and a sharable resource pool.]

SR-IOV
Physical Functions (PFs) are full PCIe functions that include the SR-IOV capability; they configure and manage the SR-IOV functionality, including allocating Virtual Functions.
Virtual Functions (VFs) are lightweight PCIe functions that only process I/O. A single Ethernet port (the physical device) may map to many Virtual Functions that can be shared with virtualized guests. The number of Virtual Functions a device may expose is limited by the device hardware.
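On Linux, for example, VFs are typically enabled by writing the desired count to the PF's sriov_numvfs attribute in sysfs. A minimal sketch; the PCI address "0000:03:00.0" is a placeholder, and the call needs root, an SR-IOV capable device, and a driver that supports it.

#include <stdio.h>

/* Enable 'num_vfs' virtual functions on a physical function via sysfs. */
int enable_vfs(const char *pci_addr, int num_vfs)
{
    char path[256];
    snprintf(path, sizeof path,
             "/sys/bus/pci/devices/%s/sriov_numvfs", pci_addr);

    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%d\n", num_vfs);
    return fclose(f);
}

int main(void)
{
    return enable_vfs("0000:03:00.0", 4) ? 1 : 0;   /* placeholder device address */
}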

Multi-Root IOV

Multi-root PCIe IOV endpoint requirements
The multi-root solution shall give each root complex (RC) its own virtual hierarchy, and shall enable each switch, bridge, function, and VF to be uniquely represented in the configuration space of each RC.
[Figure: a physical view of a host CPU set with its PCIe root, PCIe switches, PCI/PCI-X bridges, and IOV and non-IOV endpoints, alongside the corresponding per-RC virtual view of the same hierarchy.]

Multi-root PCIe IOV endpoint requirements (continued)
The multi-root solution shall:
Provide the same characteristics to its IOV-enabled endpoints as the single-root solution, relative to separate SIs.
Enable use of an existing PCIe 1.x or later root complex.
Enable existing PCIe 1.x switches, endpoints, and PCIe-to-PCI/PCI-X bridges to each be bound to a single RC.
Enable an IOV-enabled endpoint to be shared among multiple RCs using a Multi-Root Aware (MRA) PCIe switch.
[Figure: two host CPU sets, one with an MRA PCIe root and one with a PCIe 1.x root, optionally connected by an SMP fabric, sharing MRA PCIe switches, an MRA PCI-X bridge, and PCIe IOV endpoints alongside legacy PCI/PCI-X adapters and PCIe endpoints; the SIs and PCIM run above the roots.]

PLX

Network Device IOV

Network virtualization
Virtual I/O in “native” and guest OSes.
The limitations of sharing an I/O device in software prevent HPC applications from running in VMs: virtualization overhead, loss of “native” features, and lack of SLA guarantees.

Evolution of hardware IOV
PCIe multi-port and multi-function solutions
Single-Root IOV (SR-IOV), a PCI-SIG specification
Direct hardware access (pass-through)

I/O virtualization and hardware SLAs
Example: five virtual machines whose SLAs require 2 Gbps each. With no (or limited) QoS, a 5 Gbps receive spike on a low-priority VM (VM2) causes an incoming traffic surge, and the SLAs of the other VMs are violated.
The traditional solution is to over-provision bandwidth, keeping a bandwidth 'buffer' to attempt to meet the SLAs, which reduces the VM count per server by 50% or more.
[Figure: bandwidth timelines for VM1-VM5 over time, with and without over-provisioning.]

Network virtualization with IOQoS
Optimum resource utilization: a single Exar/Neterion X3120 dual-port 10 GbE adapter replaces multiple HBAs and NICs, carrying clustering traffic (RoCE) and IP storage and data traffic (10 GbE).
IOQoS provides easy management of prioritization and bandwidth allocation, and direct access for latency-sensitive (clustering) fabrics.
[Figure: guest VMs (application/MPI/OS stacks) use either the VMxnet driver through the host's virtual L2 switch or the Exar guest-OS driver directly; the X3120, configured with multiple PCI functions (NIC 0-4) and an integrated L2 switch, sits below the KVM host.]