1
Lecture 14: CPU and I/O Virtualization
COSC6376 Cloud Computing Lecture 14: CPU and I/O Virtualization Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston
2
Outline
CPU virtualization
I/O virtualization
3
Types of virtualization
Container virtualization
Para-virtualization
Full virtualization
4
Xen architecture
5
Clustered Xen environment
6
Network flow in Xen
7
Linux bridge
Older versions of Citrix XenServer (before v5.6 FP1) used a simple Linux bridge.
Many hypervisor-based virtualization stacks, such as KVM managed through libvirt, also use the Linux bridge model.
All of the bridge configuration is done with 'brctl'.
The bridge provides simple L2 switching functions.
Devices by network layer:
  Layer 1: network hub or repeater
  Layer 2: bridge
  Layer 3: router, or Layer 3 switch (typically optimized for Ethernet)
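The "simple L2 switching functions" mentioned above can be illustrated with a toy learning bridge. This is only a sketch of the general learn/forward/flood idea, not the Linux bridge implementation; the class name, port names, and MAC strings are made up for illustration, and broadcast frames, VLANs, ageing, and spanning tree are ignored.

```python
class LearningBridge:
    """Toy Layer-2 bridge: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port      # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:                   # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:                # destination on same port: drop
            return []
        return [out_port]                      # known destination: forward


# Hypothetical port names, chosen only for the example.
bridge = LearningBridge(["peth0", "vif0.0", "vif1.0"])
first = bridge.receive("vif1.0", "aa:aa", "bb:bb")  # bb:bb not learned yet
reply = bridge.receive("peth0", "bb:bb", "aa:aa")   # aa:aa was learned above
print(first, reply)
```

The first frame is flooded because the destination is unknown; the reply goes out of a single port because the bridge has already learned where the original sender is attached.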
8
Xen network environment
peth0: the port that connects to the physical network interface in your system.
vif0.0: the bridge port that is used by traffic to/from Domain 0.
vifX.0: the bridge port that is used by traffic to/from Domain X.
9
VMware Infrastructure 3
VMware Infrastructure 3 provides a rich set of networking capabilities.
Virtual switches are the key networking components; each ESX Server 3 host supports up to 248 of them.
Virtual switches provide core Layer 2 forwarding engines.
Physical Ethernet adapters (uplinks) serve as bridges between virtual and physical networks.
10
VMware vSphere's vDS
Each vNIC is logically connected to a dvPort (shown as black squares in the slide's figure).
Each dvPort is implemented by the proxy switch on the host where the VM runs.
vSphere's vNetwork distributed switch (vDS) functions as a single switch across all associated hosts.
This lets you set network configurations that span all member hosts, and lets virtual machines keep a consistent network configuration as they migrate across hosts (the vDS is centrally managed by vCenter).
11
Intel virtualization technology evolution
Vector 1: Processor Focus
  VT-x and VT-i establish the foundation for virtualization in the IA-32 and Itanium architectures,
  followed by ongoing evolution of support: micro-architectural (e.g., lower VM switch times) and architectural (e.g., Extended Page Tables)
Vector 2: Platform Focus
  VT-d: hardware support for I/O-device virtualization
  Device DMA remapping
  Direct assignment of I/O devices to VMs
  Interrupt routing and remapping
Vector 3: I/O Focus
  PCI-SIG standards for I/O-device sharing (under definition in the PCI-SIG IOVWG)
  Multi-context I/O devices
  Endpoint address translation caching
VMM software evolution over time with hardware support
  Past (no hardware support): software-only VMMs using binary translation and paravirtualization
  Today: simpler and more secure VMMs through the foundation of virtualizable ISAs
  Going forward: increasingly better CPU and I/O virtualization performance and functionality as I/O devices and VMMs exploit the infrastructure provided by VT-x, VT-i, and VT-d
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
12
VT-x Overview: Intel Virtualization Technology For IA-32 Processors
13
VT-x overview
Operating modes
Guest software / VMM transitions
Virtual-machine control structure
Principal causes of VM exits
Benefits
14
Operating modes
VMX root operation: fully privileged, intended for the VM monitor
VMX non-root operation: not fully privileged, intended for guest software
  Reduces guest software privilege without relying on rings
  Solution to ring aliasing
15
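VM entry and VM exit
VM entry: transition from VMM to guest
  Initiated by the VMLAUNCH and VMRESUME instructions
  Enters VMX non-root operation
  Loads guest state and exit criteria from the VMCS
VM exit: transition from guest to VMM
  Triggered by the events and instructions selected in the VMCS; VT-x has no VMEXIT instruction, although a guest can request an exit explicitly with VMCALL
  Enters VMX root operation
  Saves guest state in the VMCS
  Loads VMM state from the VMCS
[Figure: VM0 and VM1, each running apps on a guest OS, above the VM monitor on the physical host hardware; VM entry and VM exit arrows connect the guests and the monitor.]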
16
VT-x operations
[Figure: VMXON takes the processor from ordinary IA-32 operation (rings 0 and 3) into VMX root operation. VMLAUNCH and VMRESUME enter VMX non-root operation, where VM 1 through VM n each run with their own rings 0 and 3 and their own VMCS (VMCS 1 through VMCS n). A VM exit returns control to VMX root operation.]
17
Virtual machine control structure (VMCS)
VMCSs are control structures in memory
  Only one VMCS is active per virtual processor at any given time
VMCS payload:
  VM-execution, VM-exit, and VM-entry controls
  Guest and host state
  VM-exit information fields
VMCS format is not defined and may vary
VMPTRLD: establishes a pointer to the desired VMCS
VMREAD/VMWRITE: new VMCS access instructions
18
Principal causes of VMEXIT
Paging-state exits allow page-table control
  CR3 accesses and INVLPG cause exits
  Selectively exit on page faults
  CR0/CR4 controls allow exiting on changes to selected bits
State-based exits allow function virtualization
  CPUID, RDMSR, WRMSR, RDPMC, RDTSC, MOV DRx
Selective exception and I/O exiting reduce unnecessary exits
  32-entry exception bitmap, I/O-port access bitmap
Controls provided for asynchronous events
  Host interrupt control allows delivery to the VMM even when the guest is blocking interrupts
Detection of guest inactivity to support VM scheduling
  HLT, MWAIT, PAUSE
Instruction notes:
  RDMSR: loads the contents of the 64-bit model-specific register (MSR) specified in the ECX register into registers EDX:EAX
  RDPMC: read performance-monitoring counters
  RDTSC: read time-stamp counter
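The selective exiting described above comes down to bit tests: one bit per exception vector, one bit per I/O port. The following is a minimal sketch of that check, assuming a single flat 8192-byte I/O bitmap (real hardware splits it into two 4 KB bitmaps) and ignoring the extra page-fault error-code filtering that VT-x also applies. The function names are invented for this example.

```python
PAGE_FAULT = 14   # #PF exception vector
BREAKPOINT = 3    # #BP exception vector


def exception_causes_exit(exception_bitmap, vector):
    """Bit n of the 32-bit exception bitmap selects exits for vector n."""
    return bool((exception_bitmap >> vector) & 1)


def io_access_causes_exit(io_bitmap, port):
    """One bit per I/O port; a set bit means the access exits to the VMM."""
    byte_index, bit = divmod(port, 8)
    return bool((io_bitmap[byte_index] >> bit) & 1)


# Example policy: trap page faults and accesses to port 0xCF8 only.
exception_bitmap = 1 << PAGE_FAULT
io_bitmap = bytearray(65536 // 8)          # 64K ports, one bit each
io_bitmap[0xCF8 // 8] |= 1 << (0xCF8 % 8)

print(exception_causes_exit(exception_bitmap, PAGE_FAULT))   # trapped
print(exception_causes_exit(exception_bitmap, BREAKPOINT))   # handled in guest
print(io_access_causes_exit(io_bitmap, 0xCF8))               # trapped
print(io_access_causes_exit(io_bitmap, 0x60))                # handled in guest
```

Everything whose bit is clear runs in the guest without a transition, which is how the bitmaps reduce unnecessary exits.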
19
Benefits: VT helps improve VMMs
VT reduces guest OS dependency
  Eliminates the need for binary patching / translation
  Facilitates support for legacy OSes
VT improves robustness
  Eliminates the need for complex software techniques
  Simpler and smaller VMMs
  Smaller trusted-computing base
VT improves performance
  Fewer unwanted guest-to-VMM transitions
20
Extended page tables (EPT)
A VMM must protect host physical memory
  Multiple guest operating systems share the same host physical memory
  The VMM typically implements protections through "page-table shadowing" in software
Page-table shadowing accounts for a large portion of virtualization overheads
  The goal of EPT is to reduce these overheads
21
What is EPT?
Extended Page Table
  A new page-table structure, under the control of the VMM
  Defines the mapping between guest-physical and host-physical addresses
  The EPT base pointer (EPTP, a new VMCS field) points to the EPT page tables
  The guest has full control over its own IA-32 page tables
[Figure: the guest's CR3 and IA-32 page tables translate a guest linear address to a guest physical address; the extended page tables, located through the EPT base pointer (EPTP), translate that to a host physical address.]
22
EPT translation: details
All guest-physical addresses go through the EPT tables, including the addresses the guest holds in CR3, PDEs, PTEs, etc.
The example in the slide's figure is for a 2-level table for a 32-bit address space
Translation is possible for other page-table formats (e.g., PAE)
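The point that every guest-physical address passes through EPT, including the ones used during the guest's own page walk, can be made concrete with a small model. This is a sketch under strong simplifying assumptions: the EPT is a flat dictionary from guest frame number to host frame number (the real EPT is itself a multi-level table), page tables are dictionaries, and present bits and permissions are ignored. All names and frame numbers are made up.

```python
PAGE_SHIFT = 12


def ept_lookup(ept, guest_frame, counter):
    """Map a guest-physical frame to a host-physical frame, counting lookups."""
    counter[0] += 1
    return ept[guest_frame]        # a KeyError here models an EPT violation


def translate(linear, guest_cr3_frame, ept, host_mem):
    """Walk a 2-level guest page table; each guest frame goes through EPT."""
    counter = [0]
    dir_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF

    pd_host = ept_lookup(ept, guest_cr3_frame, counter)   # CR3 value
    pt_guest = host_mem[pd_host][dir_index]               # read the PDE
    pt_host = ept_lookup(ept, pt_guest, counter)          # PDE target
    page_guest = host_mem[pt_host][table_index]           # read the PTE
    page_host = ept_lookup(ept, page_guest, counter)      # PTE target
    return (page_host << PAGE_SHIFT) | offset, counter[0]


# Guest frames 1, 2, 3 hold the page directory, a page table, and a data page.
ept = {1: 100, 2: 200, 3: 300}
host_mem = {100: {0: 2}, 200: {5: 3}}   # host frame -> {index: guest frame}

host_address, ept_lookups = translate(0x5034, 1, ept, host_mem)
print(hex(host_address), ept_lookups)
```

In this model a single guest access costs three EPT lookups, one each for the CR3, PDE, and PTE targets, which is why hardware caches these translations.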
23
VT-d Overview: Intel Virtualization Technology For Directed I/O
24
Q35 chipsets system block diagram
25
PCI Express
Third-generation high-performance I/O bus
  1st generation: ISA, EISA, VESA, and Micro Channel buses
  2nd generation: PCI, PCI-X, and AGP
Used to interconnect peripheral devices
Point-to-point connection as opposed to a shared bus
A PCIe interconnect consists of an x1, x2, x4, x8, x12, x16, or x32 point-to-point link
  An x16 link has 64 physical lines: 16 lanes * 2 (both directions) * 2 (differential signaling)
26
PCIe-based system topology
Root complex
  Denotes the root of the I/O hierarchy that connects the CPU/memory subsystem to the I/O
  May support one or more PCIe ports
Endpoint
  Devices other than the root complex and switches that are requesters or completers of PCIe transactions
Source: PCIe specification 2.0
27
Three IA-32 address-spaces
Memory space (4GB): accessed using a large variety of processor instructions (mov, add, or, shr, push, etc.) and virtual-to-physical address translation
I/O space (64KB): accessed only by using the processor's special 'in' and 'out' instructions (without any translation of port addresses)
PCI configuration space (16MB): I/O ports 0x0CF8-0x0CFF are dedicated to accessing PCI configuration space
PCIe supports the same address spaces as PCI
  Memory space
  I/O space
  Configuration space
PCIe provides a 4KB configuration space per function, as opposed to 256B in PCI
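As an illustration of how ports 0x0CF8-0x0CFF reach the 16MB configuration space, the sketch below builds the 32-bit value that the legacy PCI access mechanism writes to the address port at 0xCF8 before reading or writing the data port at 0xCFC. It only computes the value and performs no port I/O; the function name is an assumption made for this example.

```python
CONFIG_ADDRESS_PORT = 0x0CF8   # address register
CONFIG_DATA_PORT = 0x0CFC      # data window


def pci_config_address(bus, device, function, offset):
    """Encode bus/device/function/register into a CONFIG_ADDRESS value."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < 256:
        raise ValueError("legacy PCI config space is 256 bytes per function")
    enable = 1 << 31                       # enable bit
    return (enable | (bus << 16) | (device << 11) |
            (function << 8) | (offset & 0xFC))   # dword-aligned register


# 256 buses * 32 devices * 8 functions * 256 bytes = 16 MB of config space.
total_config_bytes = 256 * 32 * 8 * 256

address = pci_config_address(bus=0, device=3, function=0, offset=0x10)
print(hex(address), total_config_bytes // 2**20, "MB")
```

The bit widths in the encoding (8 bits of bus, 5 of device, 3 of function, 8 of register offset) are what give the 16MB total.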
28
PCI configuration header
16 doublewords (dwords), two per row, most significant field first:

Dwords 1-0:    Status Register | Command Register | Device ID | Vendor ID
Dwords 3-2:    BIST | Header Type | Latency Timer | Cache Line Size | Class Code (Class/SubClass/ProgIF) | Revision ID
Dwords 5-4:    Base Address 1 | Base Address 0
Dwords 7-6:    Base Address 3 | Base Address 2
Dwords 9-8:    Base Address 5 | Base Address 4
Dwords 11-10:  Subsystem Device ID | Subsystem Vendor ID | CardBus CIS Pointer
Dwords 13-12:  reserved | Capabilities Pointer | Expansion ROM Base Address
Dwords 15-14:  Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line | reserved
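To make the layout concrete, here is a small sketch that decodes the first four dwords of a configuration header from raw little-endian bytes. The sample bytes are fabricated for the example and are not read from a real device; the dictionary keys are illustrative names, not an established API.

```python
import struct


def parse_common_header(raw):
    """Decode dwords 0-3 of a PCI configuration header (little-endian)."""
    (vendor_id, device_id, command, status,
     revision_id, prog_if, subclass, class_code,
     cache_line_size, latency_timer, header_type, bist) = struct.unpack_from(
        "<HHHHBBBBBBBB", raw, 0)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "command": command,
        "status": status,
        "revision_id": revision_id,
        "class_code": class_code,
        "subclass": subclass,
        "prog_if": prog_if,
        "cache_line_size": cache_line_size,
        "latency_timer": latency_timer,
        "header_layout": header_type & 0x7F,          # low 7 bits: layout
        "multi_function": bool(header_type & 0x80),   # bit 7: multi-function
        "bist": bist,
    }


# Made-up header: class 0x02 (network controller), multi-function bit set.
sample = struct.pack("<HHHHBBBBBBBB",
                     0x8086, 0x1234, 0x0007, 0x0010,
                     0x01, 0x00, 0x00, 0x02,
                     0x10, 0x00, 0x80, 0x00) + bytes(48)
header = parse_common_header(sample)
print(hex(header["vendor_id"]), header["class_code"], header["multi_function"])
```

Bit 7 of the Header Type field is how software tells whether a device implements functions beyond function 0.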
29
Typical NIC
[Figure: the CPU and main memory (holding the packet buffer) connect over the bus to the NIC, which contains TX and RX FIFOs and a transceiver attached to the LAN cable.]
30
PCI devices and functions
A PCI device may include between 1 and 8 functions
Function numbers range from 0 to 7
Function 0 must always be present
Devices are classified as single-function or multi-function
31
DMA (Direct Memory Access)
DMA can transfer large blocks of data directly to/from memory without involving the processor
The processor initiates the DMA transfer by supplying the source address, the destination address, and the number of bytes to transfer
The DMA controller manages the entire transfer (possibly thousands of bytes in length), arbitrating for the bus
When the DMA transfer is complete, the DMA controller interrupts the processor to report completion
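The sequence above (program source, destination, and count; let the controller copy; receive a completion interrupt) can be mimicked in a few lines. This is a toy software model only: memory is a bytearray, the "interrupt" is a callback, and bus arbitration is not modeled. The function and variable names are invented for the example.

```python
def dma_transfer(memory, src, dst, count, on_complete):
    """Copy count bytes inside memory, then signal completion."""
    if count < 0 or min(src, dst) < 0 or max(src, dst) + count > len(memory):
        raise ValueError("transfer falls outside memory")
    # The slice on the right is copied first, so overlapping ranges are safe.
    memory[dst:dst + count] = memory[src:src + count]
    on_complete(count)              # stands in for the completion interrupt


memory = bytearray(b"hello" + bytes(11))   # 16 bytes of toy memory
interrupts = []                            # records completion "interrupts"

# The processor supplies only source, destination, and byte count.
dma_transfer(memory, src=0, dst=8, count=5, on_complete=interrupts.append)
print(bytes(memory[8:13]), interrupts)
```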
33
Options for I/O virtualization
Monolithic model: shared devices; I/O services and device drivers live in the hypervisor, below VM0 through VMn (guest OS and apps)
  Pro: higher performance
  Pro: I/O device sharing
  Pro: VM migration
  Con: larger hypervisor
Service VM model: shared devices; I/O services and device drivers live in service VMs that run on the hypervisor alongside the guest VMs
  Pro: high security
  Pro: I/O device sharing
  Pro: VM migration
  Con: lower performance
Pass-through model: assigned devices; device drivers live in each guest OS, above the hypervisor
  Pro: highest performance
  Pro: smaller hypervisor
  Pro: device-assisted sharing
  Con: migration challenges
VT-d goal: support all models
34
VT-d overview
VT-d is platform infrastructure for I/O virtualization
  Defines the architecture for DMA remapping
  Implemented as part of the platform core logic
  Will be supported broadly in Intel server and client chipsets
[Figure: the CPU connects over the system bus to the north bridge, which contains VT-d and connects to DRAM, integrated devices, and PCIe root ports (PCI Express); the south bridge connects PCI, LPC, and legacy devices.]
35
How VT-d works
Each VM thinks its memory starts at address 0: this is the GPA (guest physical address)
The GPA is mapped to a different address in system memory: the HPA (host physical address)
VT-d does the address mapping between GPA and HPA
VT-d catches any DMA attempt to cross a VM memory boundary
[Figure: memory ranges of VM0, VM1, and VM2, each addressed from 0 by its guest, mapped to separate regions of system memory.]
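The GPA-to-HPA mapping and the boundary check can be sketched as a per-domain lookup table keyed by the device that issues the DMA. This is a toy model, not the VT-d table format: devices and domains are plain strings, translation is at 4 KB page granularity, and a Python exception stands in for the hardware fault report. All names and page numbers are made up.

```python
PAGE_SIZE = 4096


class DmaFault(Exception):
    """Raised when a device touches memory outside its domain's mapping."""


class ToyDmaRemapper:
    def __init__(self):
        self.domain_of = {}     # device name -> domain (VM) name
        self.page_tables = {}   # domain name -> {guest page: host page}

    def assign(self, device, domain):
        self.domain_of[device] = domain
        self.page_tables.setdefault(domain, {})

    def map_page(self, domain, guest_page, host_page):
        self.page_tables.setdefault(domain, {})[guest_page] = host_page

    def translate(self, device, gpa):
        """Translate a device's DMA address (a GPA) to an HPA, or fault."""
        domain = self.domain_of.get(device)
        table = self.page_tables.get(domain, {})
        guest_page, offset = divmod(gpa, PAGE_SIZE)
        if guest_page not in table:
            raise DmaFault(f"{device}: GPA {gpa:#x} not mapped for {domain}")
        return table[guest_page] * PAGE_SIZE + offset


remapper = ToyDmaRemapper()
remapper.assign("nic0", "VM0")
remapper.assign("nic1", "VM1")
remapper.map_page("VM0", 0, 600)    # both guests use guest page 0 ...
remapper.map_page("VM1", 0, 1000)   # ... backed by different host pages

hpa_vm0 = remapper.translate("nic0", 0x10)
hpa_vm1 = remapper.translate("nic1", 0x10)
print(hex(hpa_vm0), hex(hpa_vm1))
```

The same guest address resolves to different host addresses depending on which device issued the request, and any address without a mapping in that device's domain is rejected.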
36
VT-d usage
Basic infrastructure for I/O virtualization
  Enables direct assignment of I/O devices to unmodified or paravirtualized VMs
Improves system reliability
  Contains and reports errant DMA to software
Enhances security
  Supports multiple protection domains under software control
  Provides a foundation for building trusted I/O capabilities
Other usages
  Generic facility for DMA scatter/gather
  Overcomes addressability limitations on legacy devices
37
VT-d architecture detail
[Figure: DMA requests from devices D1 and D2 carry a device ID, a virtual address, and a length. The DMA remapping engine, with its context cache and translation cache, consults memory-resident partitioning and translation structures: device assignment structures indexed by bus (0 to 255), device, and function (up to Dev 31, Func 7), and address translation structures made of 4KB page tables that lead to a page frame. The result is either a memory access with a system physical address or fault generation.]
38
VT-d: hardware page walk
[Figure: the requester ID (bus in bits 15-8, device in bits 7-3, function in bits 2-0) indexes the device assignment tables, starting from the device assignment tables base. The example device assignment table entry specifies a 4-level page table. The DMA virtual address is split into four 9-bit table offsets (bits 47-39 for level 4, 38-30 for level 3, 29-21 for level 2, 20-12 for level 1) and a 12-bit page offset (bits 11-0), with the upper bits shown as zero; the walk ends at the page.]
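The field boundaries used in the page walk can be checked with a few shifts and masks. This sketch only splits the requester ID and a DMA virtual address into the fields named on the slide (8-bit bus, 5-bit device, 3-bit function; four 9-bit table offsets and a 12-bit page offset); it does not model the tables themselves, and the function names are made up.

```python
def split_requester_id(requester_id):
    """Return (bus, device, function) from a 16-bit requester ID."""
    bus = (requester_id >> 8) & 0xFF
    device = (requester_id >> 3) & 0x1F
    function = requester_id & 0x7
    return bus, device, function


def split_dma_address(address):
    """Return the four table indexes (level 4 first) and the page offset."""
    indexes = [(address >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indexes, address & 0xFFF


bdf = split_requester_id(0x0119)
address = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x56
indexes, page_offset = split_dma_address(address)
print(bdf, indexes, hex(page_offset))
```

Each 9-bit index selects one of 512 entries in a 4KB table, so four levels plus the 12-bit offset cover a 48-bit DMA address.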
39
VT-d applied to pass-through model
[Figure: pass-through model, in which devices are assigned directly to VM0 through VMn and the device drivers run in each guest OS, above the hypervisor.]
Direct device assignment to guest OS
  Guest OS directly programs the physical device
  For legacy guests, the hypervisor sets up the guest-physical to host-physical DMA mapping
  For remapping-aware guests, the hypervisor is involved in map/unmap of DMA buffers
PCI-SIG I/O Virtualization Working Group
  Activity towards standardizing natively sharable I/O devices
  IOV devices provide virtual interfaces, each independently assignable to VMs
Pro: highest performance
Pro: smaller hypervisor
Pro: device-assisted sharing
Con: VM migration limits
40
DMA remapping: IOTLB scaling
Address Translation Services (ATS) extensions to PCIe* enable IOTLB scaling
  An ATS endpoint implements 'Device IOTLBs'
Device IOTLBs can be used to improve performance
  E.g., cache only static translations (e.g., command buffers)
  Pre-fetch translations to reduce latency
  Minimize dependency on root-complex caching
Support device-specific demand I/O paging
*Other names and brands may be claimed as the property of others
41
Address Translation Services (ATS)
ATS translation flows
  The device issues Translation Requests to the root complex
  The root complex provides a Translation Response
  The device caches the translation locally in its 'Device IOTLB'
  The device can then issue DMA with the translated address
  Translated DMA from enabled devices bypasses address translation
VT-d supports per-device control of ATS
[Figure: the endpoint device, with its Device IOTLB, sends a Translation Request to the root complex; the remap hardware (with its IOTLB) translates the address and returns a Translation Response; the device then issues a translated DMA request using the translated address.]
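The flow above is essentially a cache in front of the root complex. A minimal sketch, assuming page-granular translations and ignoring invalidation, permissions, and the per-device ATS enable control; the class names and page numbers are hypothetical:

```python
PAGE_SIZE = 4096


class ToyRootComplex:
    """Answers Translation Requests from a fixed page map."""

    def __init__(self, page_map):
        self.page_map = page_map          # untranslated page -> translated page
        self.translation_requests = 0

    def translate(self, page):
        self.translation_requests += 1
        return self.page_map[page]


class ToyAtsEndpoint:
    """Device that caches translations in its own Device IOTLB."""

    def __init__(self, root_complex):
        self.root_complex = root_complex
        self.device_iotlb = {}

    def translated_address(self, address):
        page, offset = divmod(address, PAGE_SIZE)
        if page not in self.device_iotlb:          # miss: ask the root complex
            self.device_iotlb[page] = self.root_complex.translate(page)
        return self.device_iotlb[page] * PAGE_SIZE + offset


root = ToyRootComplex({2: 700})
nic = ToyAtsEndpoint(root)
first = nic.translated_address(2 * PAGE_SIZE + 0x40)   # miss, one request
second = nic.translated_address(2 * PAGE_SIZE + 0x80)  # hit, no new request
print(hex(first), hex(second), root.translation_requests)
```

Only the first access to a page reaches the root complex; later accesses are served from the device's own cache, which is the scaling benefit the slide describes.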
42
VT-x & VT-d working together
[Figure: virtual machines run on a virtual machine monitor (VMM). The VMM's software techniques (binary translation, paravirtualization, page-table shadowing, I/O-device emulation, interrupt virtualization, DMA remap) sit above the hardware virtualization mechanisms under VMM control: VT-x for the logical processors and VT-d for the I/O devices, alongside physical memory.]
43
Mapping to VMM software challenges
[Figure: layered view of a virtualized platform.
Virtual machines (VMs) through VMn: apps running on guest OSes.
VMM (a.k.a. hypervisor), higher-level functions: resource discovery, provisioning, scheduling, user interface.
VMM core functions: processor virtualization (binary translation, ring deprivileging, virtual CPU configuration), memory virtualization (page-table shadowing, EPT configuration), and I/O device virtualization (I/O device emulation, DMA and interrupt remapping configuration).
Hardware support: VT-x, VT-x2, VT-d, VT-d2, VMDq, PCI-SIG.
Physical platform resources: processors (CPU0 to CPUn), memory, and I/O devices (storage, network).]