1
Fernando Martins, Director, Virtualization Strategy and Planning
Tom Adelmeyer, Principal Engineer, Virtualization Performance and Benchmarking
2
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel, the Intel logo, Intel Leap ahead, Intel Leap ahead logo, Intel vPro, Intel vPro logo, Intel VIIV, Intel VIIV logo, Intel Centrino Duo, Intel Centrino Duo logo, Intel Xeon, Intel Xeon Inside logo, Intel Itanium 2 and Intel Itanium 2 Inside logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2006 Intel Corporation.
Throughout this presentation: VT-x refers to Intel® VT for IA-32 and Intel® 64, VT-i refers to Intel® VT for IA-64, and VT-d refers to Intel® VT for Directed I/O and its extensions.
3
The confluence of compelling usage models and robust solutions is driving virtualization to mainstream adoption. New usage models require radically new approaches to performance measurement and capacity planning. This session will describe Intel’s portfolio of virtualization technologies and, through practical examples, provide a deep technical dive into the challenging problem of meaningful benchmarking in a virtualized environment. We will discuss Intel’s research in the space and share our latest results, including vConsolidate, Intel’s seed contribution to a vendor-agnostic standard virtualization benchmark currently being developed by SPEC.
4
Intel’s Strategy for Virtualization
Intel® Virtualization Technology Evolution
Current and Emerging Usage Models
Usage-Model-Based Benchmarking
5
41% of new x86 servers purchased in 2007 will be virtualized - IDC End User Study, Jun-06
Server virtualization is now considered a mainstream technology among IT buyers. IT professionals are bullish on future use, projecting 45% server use within 12 months - IDC Directions 2007, Feb-07
>81% of businesses are using virtualization in production environments - 451 Group Special Report, Dec-06
6
Platform of Choice for Virtualization
Broad Ecosystem Support
Remove Adoption Barriers
7
Leadership in HW assists for virtualization: CPU virtualization (VT-x and VT-i), I/O virtualization (VT-d), network virtualization (IOAT and VMDq)
Better platform reliability: leader in reliability features; proven platform architecture, with 40X more Intel servers shipped (Q4’05 IDC Server Tracker, 1996-2005 total systems shipped)
More power/performance headroom: quad-core, 4-way platforms, NICs
8
IA-based System Virtualization Today Requires Frequent VMM Software Intervention
9
Software-only VMMs: binary translation, paravirtualization
Simpler and more secure VMMs through a foundation of virtualizable ISAs: establish the foundation for virtualization in the IA-32 and Itanium architectures… followed by on-going evolution of support, both micro-architectural (e.g., lower VM switch times) and architectural (e.g., Extended Page Tables)
Hardware support for I/O virtualization: device DMA remapping, direct assignment of I/O devices to VMs, interrupt routing and remapping
Standards for I/O-device sharing: multi-context I/O devices, endpoint address translation caching; under definition in the PCI-SIG* IOV WG
Result: increasingly better CPU and I/O virtualization performance and functionality as I/O devices and VMMs exploit the infrastructure provided by VT-x, VT-i, and VT-d
*Other names and brands may be claimed as the property of others
10
New CPU operating mode: VMX root operation (for the VMM) and non-root operation (for guests); eliminates ring deprivileging
New transitions: VM entry to the guest OS, VM exit to the VMM
VM Control Structure (VMCS): configured by VMM software; specifies guest Operating System (OS) state; controls when VM exits occur (eliminating over- and under-exiting); supports on-die CPU state caching
[Diagram: a VMM in VMX root operation hosts VMs in non-root operation (e.g., WinXP apps and Linux apps on their guest OSes) above the hardware; the VMM configures the VMCS, which governs VM entry and VM exit transitions.]
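The exit/entry cycle above is what a VMM’s main loop is built around: on a VM exit the hardware records an exit reason in the VMCS, the VMM services the event, and re-enters the guest. Below is a minimal, self-contained sketch of such a dispatch routine; the basic exit-reason codes follow the Intel SDM encoding, but the handler structure and printed actions are illustrative stand-ins for real VMM logic.

```c
#include <stdint.h>
#include <stdio.h>

/* Basic VM-exit reason codes (per the Intel SDM encoding). */
enum {
    EXIT_REASON_CPUID          = 10,
    EXIT_REASON_HLT            = 12,
    EXIT_REASON_INVLPG         = 14,
    EXIT_REASON_CR_ACCESS      = 28,
    EXIT_REASON_IO_INSTRUCTION = 30,
};

/* Dispatch one VM exit. In a real VMM the reason comes from a VMREAD
 * of the VMCS exit-reason field; here it is passed in to keep the
 * sketch self-contained. */
static void handle_vm_exit(uint32_t exit_reason)
{
    switch (exit_reason & 0xFFFF) {  /* low 16 bits hold the basic reason */
    case EXIT_REASON_CPUID:
        printf("emulate CPUID, advance guest RIP\n");
        break;
    case EXIT_REASON_HLT:
        printf("guest halted: deschedule the virtual CPU\n");
        break;
    case EXIT_REASON_CR_ACCESS:
        printf("emulate control-register access\n");
        break;
    case EXIT_REASON_IO_INSTRUCTION:
        printf("emulate port I/O via the virtual device model\n");
        break;
    default:
        printf("unhandled exit %u: stop the VM\n", exit_reason);
        break;
    }
    /* A real VMM would then VMRESUME back into non-root operation. */
}

int main(void)
{
    handle_vm_exit(EXIT_REASON_CPUID);  /* e.g., guest executed CPUID */
    return 0;
}
```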
11
Extended Page Table (EPT): a new page-table structure under the control of the VMM; maps guest-physical to host-physical addresses (accesses memory)
Performance benefit: the guest OS can freely modify its own page tables, eliminating VM exits due to page faults, INVLPG, or CR3 accesses (with VT-x plus EPT: no VM exits)
Memory savings: without EPT, shadow page tables are required for each guest user process; a single EPT supports an entire VM
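To make the memory-savings bullet concrete, here is a rough back-of-envelope comparison. The VM count, per-VM process count, and per-address-space table footprint are assumptions chosen only for illustration, not measured values:

```latex
% Illustrative assumption: 8 VMs, 100 guest processes per VM,
% and roughly 1 MB of page-table structures per shadowed address space.
\text{shadow tables} \approx 8 \times 100 \times 1\,\text{MB} = 800\,\text{MB}
\qquad
\text{EPT} \approx 8 \times 1\,\text{MB} = 8\,\text{MB}
```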
12
Platform implementation for I/O virtualization: defines an architecture for DMA remapping; implemented as part of the core-logic chipset; will be supported broadly in Intel server and client chipsets
Improves system reliability: contains and reports errant DMA to software
Basic infrastructure for I/O virtualization: enables direct assignment of I/O devices to unmodified or paravirtualized VMs
13
DMA remapping: improves reliability and security through device isolation; improves I/O performance through direct assignment of devices; improves I/O performance of 32-bit devices that would otherwise hit the bounce-buffer condition
Interrupt remapping: interrupt isolation (isolate interrupts across VMs) and interrupt migration (efficiently migrate interrupts across CPUs)
Address Translation Services (ATS): support for ATS-capable endpoint devices; DMA-remapping performance improvements
14
Processor, chipset, and network: Intel’s holistic design approach delivers platforms built to excel in virtualization
15
Hardware Virtualization Mechanisms under VMM Control
16
[Diagram: consolidation — n OS/application stacks, each formerly on dedicated hardware (HW0…HWn), run as VMs (VM1…VMn) on a single hardware platform under a VMM.]
17
[Diagram: flexible resource management — VMs (VM1…VMn) under VMM control can be migrated between hardware platforms (HW0…HWn), split across machines, or rebalanced onto a single machine.]
18
Traditional benchmarking covers performance, power, and scalability
Metrics: throughput (MB/s), response time, number of users, etc.
Micro-architecture focus: cache sizing, frequency, bandwidth, etc.
New technology requires new areas of analysis and new metrics; the areas of focus are driven by usage models, e.g., VM migration time, VM utilization
We need to measure how Intel® Virtualization Technology benefits end users and ISVs
19
Virtualization presents unique challenges
Which configurations to focus on: homogeneous or heterogeneous OS; number of virtual machines; configuration of individual VMs (CPU, memory, NIC, HBA, HDD)
Measuring performance: virtual clock accuracy introduces platform-dependent error; availability of performance-monitoring capabilities
The consolidation use case adds further testing challenges: synchronicity (use automation scripts), utilization (avoid harmonic bottlenecks), steady state (easy, repeatable measurements)
The only way to overcome these challenges is to develop the benchmarks: tier consolidation using SAP SD, and vConsolidate, a server application consolidation benchmark
20
SAP SD (Sales and Distribution): an OLTP-style benchmark that measures the performance of a server running the Enterprise Resource Planning (ERP) solution from SAP AG
Tier consolidation: database and application server run in VMs; the benefits of 3-tier (isolation, maintainability) at the cost of 2-tier
Benchmark value: reuses existing metrics; new focus area: inter-VM communication
[Diagram: database and application-server VMs (VM1…VMn) running on one VMM/hardware platform.]
21
Description: a benchmark that represents the predominant use case, server application consolidation; the application types selected for consolidation are guided by market data
vConsolidate provides: a methodology for measuring performance in a consolidated environment; a means for fellow travelers to publish virtualization performance proof points; the ability to analyze performance across VMMs and hardware platforms
Knowledge obtained feeds into the SPEC virtualization workload
22
5 Virtual Machines
3 Clients: Controller, Mail, and Web
23
*Other names and brands may be claimed as the property of others
24
Consolidation Stack Unit (CSU): the smallest granule in vConsolidate
Consists of 5 virtual machines: database, commercial mail, web server, Java application server, and idle
Each CSU produces a single score; the final score is an aggregate of the individual CSU scores
26
Running vConsolidate
Controller application: starts the tests via helper scripts, runs for 30 minutes, then stops the test and reports the score; time is measured by an external timer on the “controller client”
Scoring
The controller application calculates the final score: SPECjbb, SysBench, and LoadSim report transactions/second; WebBench reports throughput
CSU final score = GEOMEAN(VM relative perf[i])
27
VM relative scores = measured/reference (e.g., WebBench = 3.52)
1 CSU score: GEOMEAN(3.52, 1.04, 1.14, 1.16) = 1.48
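Since a CSU score is just the geometric mean of the per-VM relative scores, the computation is easy to reproduce. A small sketch using the example numbers above (the helper name is ours; compile with -lm):

```c
#include <math.h>
#include <stdio.h>

/* Geometric mean of n relative performance scores. Summing logarithms
 * instead of multiplying keeps the intermediate values well-behaved. */
static double geomean(const double *scores, int n)
{
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(scores[i]);
    return exp(log_sum / n);
}

int main(void)
{
    /* Per-VM relative scores from the example: measured / reference. */
    double vm_rel[] = { 3.52, 1.04, 1.14, 1.16 };
    printf("CSU score = %.2f\n", geomean(vm_rel, 4));  /* prints 1.48 */
    return 0;
}
```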
28
[Chart: vConsolidate results; response-time metrics are lower-is-better, throughput metrics are higher-is-better.]
29
Seeding the industry with benchmark workloads: vConsolidate, a consolidated stack of business workloads consisting of server-side Java, a commercial database, commercial mail, and a commercial web server on 4 VMs
Collaborating with virtualization leaders: Microsoft and OEMs on consolidation workloads, methodology & metrics; VMware on the VMmark* consolidation stack
Establishing benchmarks with ISVs/OSVs
Contributing to standard benchmarks through SPEC (long term)
*Other names and brands may be claimed as the property of others.
30
Platform of choice for virtualization: dedicated HW support, reliability leadership, high performance / energy efficiency
Broader ecosystem support: VMM vendors, ISVs, OEMs, SIGs, standards
Removing adoption barriers: education programs / best practices, new benchmarks
33
Dual-port 10/100/1000 x4 PCI Express* Gigabit Ethernet controller
External interfaces: dual 1000BASE-T, SerDes, and SGMII interfaces; PCIe ver 1.1 x4
Intel® I/O Acceleration Technology (IOAT2): MSI-X, low-latency interrupts, Direct Cache Access, header splitting and replication
Virtualization support (VMDq): 4 TX/RX queues (per port)
I/O enhancements: offloads compatible with IPv4, IPv6 & multiple VLAN tags; Receive Side Scaling
Manageability: PXE, iSCSI boot; RMII, SMBus interfaces
ECC on all memory; 25mm x 25mm FCBGA
Schedule: sampling now; production Q2’07
34
A Better Business Foundation: less downtime, higher service availability, and improved confidence with Intel Xeon processor-based servers vs. other x86-based servers. Enabled by a combination of processor, chipset, and platform memory technologies. (Data as of March 6, 2006.)
Memory Sparing: predicts a “failing” DIMM and copies its data to a spare memory DIMM, maintaining server availability and uptime
Memory Mirroring: data is written to 2 locations in system memory so that if a DRAM device fails, mirrored memory enables continued operation and data availability
Symmetric access to all CPUs: enables a system to restart and operate if the primary processor fails
Memory CRC (FBD): address and command transmissions are automatically retried if a transient error occurs, vs. the potential of silent data corruption
Enhanced Memory ECC: retries double-bit errors, vs. standard memory ECC that handles single-bit errors only
Memory ECC: detects and corrects single-bit errors
Together these features provide data protection, data integrity and availability, data availability, continued operation and availability, and server continuity.
36
Monolithic model: the hypervisor contains shared devices, I/O services, and device drivers, with guest VMs (VM 0 … VM n: guest OS and apps) on top. Pros: higher performance; I/O device sharing; VM migration. Con: larger hypervisor.
Service VM model: a small hypervisor, with service VMs hosting I/O services and device drivers for shared devices alongside the guest VMs. Pros: high security; I/O device sharing; VM migration. Con: lower performance.
Pass-through model: devices assigned directly to guest VMs, which run their own device drivers above a small hypervisor. Pros: highest performance; smaller hypervisor; device-assisted sharing. Con: migration challenges.
VT-d goal: support all models
37
VT-d is platform infrastructure for I/O virtualization: it defines an architecture for DMA remapping, is implemented as part of the platform core logic, and will be supported broadly in Intel server and client chipsets
[Diagram: VT-d sits in the north bridge, remapping DMA from the PCIe* root ports, integrated devices, and south-bridge devices (PCI, LPC, legacy devices, …) before it reaches DRAM over the system bus.]
38
Basic infrastructure for I/O virtualization: enables direct assignment of I/O devices to unmodified or paravirtualized VMs
Improves system reliability: contains and reports errant DMA to software
Enhances security: supports multiple protection domains under SW control; provides a foundation for building trusted I/O capabilities
Other usages: generic facility for DMA scatter/gather; overcomes addressability limitations of legacy devices
39
Memory-resident partitioning and translation structures
[Diagram: a DMA request from a device (e.g., D1, D2) carries a device ID plus virtual address and length. The DMA remapping engine, backed by a context cache, translation cache, and fault-generation logic, indexes the device-assignment structures (bus 0…255, then device/function, e.g., dev 31 func 7) to find that device’s address translation structures (page tables), then issues the memory access with the resulting system physical address to a 4 KB page frame.]
40
VT-d device-assignment entry (128 bits): present bit, controls, page-table root pointer, address width, domain ID, and extended controls, with reserved fields
VT-d page-table entry (64 bits): read/write permissions, super-page bit, page-frame / page-table address, plus available, reserved, and extended-control bits
VT-d supports hierarchical page tables for address translation: page directories and page tables are 4 KB in size; 4 KB base page size with support for larger page sizes; support for DMA snoop control through page-table entries
VT-d hardware selects the page table based on the source of the DMA request: the requestor ID (bus / device / function) in the request identifies the DMA source
41
Example: a device-assignment table entry specifying a 4-level page table
Requestor ID: bus (bits 15:8), device (bits 7:3), function (bits 2:0); it indexes the device-assignment tables, which yield the base of the level-4 page table
DMA virtual address: bits 47:39 index the level-4 table, bits 38:30 the level-3 table, bits 29:21 the level-2 table, and bits 20:12 the level-1 table, which points to the page; bits 11:0 are the page offset; the upper address bits (63:48) are zero in this example
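The bit-field split above is straightforward to express in code. A small sketch of the field extraction a VT-d implementation performs on the requestor ID and DMA address before walking the tables (the function names are ours; no actual table walk or hardware access is shown):

```c
#include <stdint.h>
#include <stdio.h>

/* Decompose a PCIe requestor ID: bus[15:8], device[7:3], function[2:0]. */
static void decode_requestor_id(uint16_t rid)
{
    unsigned bus  = (rid >> 8) & 0xFF;
    unsigned dev  = (rid >> 3) & 0x1F;
    unsigned func =  rid       & 0x07;
    printf("bus %02x dev %02x func %x\n", bus, dev, func);
}

/* Split a 48-bit DMA address into four 9-bit table indices plus the
 * 12-bit page offset, as used for a 4-level page walk. */
static void decode_dma_address(uint64_t dma)
{
    unsigned l4  = (dma >> 39) & 0x1FF;  /* bits 47:39 */
    unsigned l3  = (dma >> 30) & 0x1FF;  /* bits 38:30 */
    unsigned l2  = (dma >> 21) & 0x1FF;  /* bits 29:21 */
    unsigned l1  = (dma >> 12) & 0x1FF;  /* bits 20:12 */
    unsigned off =  dma        & 0xFFF;  /* bits 11:0  */
    printf("L4=%u L3=%u L2=%u L1=%u offset=0x%x\n", l4, l3, l2, l1, off);
}

int main(void)
{
    decode_requestor_id(0x0310);          /* bus 3, device 2, function 0 */
    decode_dma_address(0x12345678ABCULL); /* arbitrary 48-bit example    */
    return 0;
}
```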
42
The architecture supports caching of remapping structures:
Context cache: caches frequently used device-assignment entries
IOTLB: caches frequently used translations (results of page walks)
Non-leaf cache: caches frequently used page-directory entries
When updating VT-d translation structures, software enforces consistency of these caches. The architecture supports global, domain-selective, and page-range invalidations; the primary invalidation interface is through MMIO registers for synchronous invalidations, with an extended interface for queued invalidations.
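The consistency rule matters most when tearing down a mapping: the VMM must not recycle a page frame while a device could still hit a stale cached translation. A hedged sketch of the required ordering, with hypothetical stand-ins for the chipset-specific helpers:

```c
#include <stdint.h>

/* Hypothetical helpers standing in for chipset-specific code; a real
 * implementation would program the MMIO invalidation registers. */
static void write_pte(volatile uint64_t *pte, uint64_t val) { *pte = val; }
static void memory_barrier(void) { __sync_synchronize(); }
static void iotlb_invalidate_page(uint16_t domain, uint64_t iova)
{
    (void)domain; (void)iova;  /* would issue a page-range invalidation */
}

/* Order matters: clear the in-memory translation, make the write
 * visible, then invalidate cached copies. Only after that is it safe
 * to reuse the old frame; otherwise a device could still DMA through
 * a stale IOTLB entry. */
void unmap_dma_page(volatile uint64_t *pte, uint16_t domain_id, uint64_t iova)
{
    write_pte(pte, 0);                       /* clear the leaf entry     */
    memory_barrier();                        /* ensure visibility to HW  */
    iotlb_invalidate_page(domain_id, iova);  /* page-range invalidation  */
    /* Now the underlying page frame may be recycled. */
}

static volatile uint64_t pte = 0x2000 | 1;   /* toy present mapping */
int main(void) { unmap_dma_page(&pte, 1, 0x5000); return 0; }
```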
43
PCI Express protocol extensions for Address Translation Services (ATS) are being defined by the PCI-SIG: ATS enables scaling of translation caches out to devices; devices may request translations from the root complex and cache them; protocol extensions invalidate the translation caches on devices
VT-d extended capabilities: support for ATS (enables VMM software to control device participation in ATS, returns translations for valid ATS translation requests, supports ATS invalidations); the capability to isolate, remap, and route interrupts to VMs; support for device-specific demand paging by ATS-capable devices
VT-d extended features utilize PCI Express enhancements being pursued within the PCI-SIG
44
A VMM must protect host physical memory: multiple guest operating systems share the same host physical memory
A VMM typically implements protections through “page-table shadowing” in software; page-table shadowing accounts for a large portion of virtualization overheads (VM exits due to #PF, INVLPG, and MOV CR3)
The goal of EPT is to reduce these overheads
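For intuition, here is a toy sketch of what page-table shadowing obliges the VMM to do on those exits: mirror guest page-table entries into hardware-visible shadow entries, one VM exit at a time. The arrays, names, and trivial address translation are all illustrative, not any real VMM’s code:

```c
#include <stdint.h>
#include <stdio.h>

#define NPTES 1024

/* Toy shadow paging: hardware walks shadow_pte, not guest_pte, so the
 * VMM must keep the two in sync, translating guest-physical frames to
 * host-physical frames as it copies. */
static uint64_t guest_pte[NPTES];   /* guest-managed page table      */
static uint64_t shadow_pte[NPTES];  /* VMM-managed, hardware-visible */

static uint64_t gpa_to_hpa(uint64_t gpa) { return gpa + 0x100000; } /* toy */

/* VM exit for a guest #PF: lazily sync one shadow entry. */
void on_shadow_page_fault(unsigned idx)
{
    uint64_t gpte = guest_pte[idx];
    if (gpte & 1)   /* guest entry present: mirror it with a host frame */
        shadow_pte[idx] = gpa_to_hpa(gpte & ~0xFFFULL) | (gpte & 0xFFF);
    else
        shadow_pte[idx] = 0;  /* reflect the fault back to the guest */
}

/* VM exit for guest MOV CR3: the whole shadow hierarchy is stale. */
void on_guest_cr3_write(void)
{
    for (unsigned i = 0; i < NPTES; i++)
        shadow_pte[i] = 0;    /* flush; entries re-sync on later faults */
}

int main(void)
{
    guest_pte[3] = 0x4000 | 1;     /* guest maps entry 3, present      */
    on_shadow_page_fault(3);       /* one VM exit to sync one entry    */
    printf("shadow[3] = 0x%llx\n", (unsigned long long)shadow_pte[3]);
    return 0;
}
```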
45
Extended Page Table: a new page-table structure, under the control of the VMM
Defines the mapping between guest-physical and host-physical addresses
An EPT base pointer (a new VMCS field) points to the EPT page tables
EPT is (optionally) activated on VM entry and deactivated on VM exit
The guest has full control over its own IA-32 page tables: no VM exits due to guest page faults, INVLPG, or CR3 changes
[Diagram: CR3 roots the guest IA-32 page tables, translating guest linear addresses to guest-physical addresses; the EPT base pointer (EPTP) roots the extended page tables, translating guest-physical to host-physical addresses.]
46
All guest-physical memory addresses go through the EPT tables (CR3, PDE, PTE, etc.)
The example above uses a 2-level table for a 32-bit address space; translation is possible for other page-table formats (e.g., PAE)
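A toy sketch of the two-stage lookup: the guest walk (guest-controlled) produces a guest-physical address, which the EPT walk (VMM-controlled) turns into a host-physical address. Flat one-level arrays stand in for the real multi-level radix tables, and in real hardware each guest page-table access is itself EPT-translated; all names are ours:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define NPAGES     16   /* toy address spaces: 16 pages each */

/* Toy single-level "page tables": index = page number, value = page
 * number in the next address space. */
static uint64_t guest_pt[NPAGES];  /* guest-virtual  -> guest-physical */
static uint64_t ept[NPAGES];       /* guest-physical -> host-physical  */

/* EPT translation applies to every guest-physical address, including
 * (in real hardware) the guest's own page-table accesses, which this
 * toy collapses into a direct array lookup. */
static uint64_t ept_translate(uint64_t gpa)
{
    return (ept[gpa >> PAGE_SHIFT] << PAGE_SHIFT) | (gpa & 0xFFF);
}

static uint64_t translate(uint64_t gva)
{
    uint64_t gpa = (guest_pt[gva >> PAGE_SHIFT] << PAGE_SHIFT)
                   | (gva & 0xFFF);   /* guest walk (guest-controlled) */
    return ept_translate(gpa);        /* EPT walk (VMM-controlled)     */
}

int main(void)
{
    guest_pt[1] = 5;  /* guest maps its virtual page 1 at g-phys page 5 */
    ept[5]      = 9;  /* VMM maps guest-physical page 5 at host page 9  */
    printf("hpa = 0x%llx\n",
           (unsigned long long)translate((1ULL << PAGE_SHIFT) | 0x2A));
    /* prints hpa = 0x902a: host page 9, offset 0x2a */
    return 0;
}
```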