Storage Virtualization. VMware, Inc. © 2010 VMware Inc. All rights reserved. Confidential.
2 Agenda: Introduction to Storage; Storage Basics; “Enterprise” Storage; Storage Management; Storage Virtualization; Storage Virtualization in a Hypervisor; General Storage Virtualization Technologies; Current Industry Trends.
3 Storage Basics - Simplistic View: An IDE or SATA disk drive directly connected to a computer. IDE – Integrated Device Electronics. ATA – PC/AT bus attachment, or Advanced Technology Attachment. SATA – Serial ATA.
4 Storage Basics – Low-end Storage: Direct Attached Storage (DAS). Typically not shared between hosts. Typically offers less device connectivity and tighter transmission-distance limits. Typically used in small and entry-level solutions that don’t have specific reliability requirements. Examples: IDE, SATA, SAS.
5 Storage Basics – Enterprise Storage: Computers connected through a switch fabric to a storage array. [Diagram: computers -> switch -> storage array]
6 Storage Basics – Enterprise Storage: Block-based SAN protocols, e.g. FC, iSCSI. Allows multiple physical machines to access the same storage across multiple paths. Disk arrays on SANs can provide reliable storage that can easily be divided into arbitrary-sized logical disks (SCSI logical units). Avoids the situation where a computer has enough CPU power for a workload, but not enough disk. Makes it much easier to migrate VMs between hosts (no copying of large virtual disks, just copy the VM memory contents). Greatly enhances the flexibility provided by VMs.
7 Storage Basics – Enterprise Storage: File-based Network Attached Storage, e.g. NFS, CIFS. Many of the same benefits as SAN storage, but at a lower price point (with potential performance penalties). SAN/NAS hybrids: bandwidth scaling using parallel data access paths (pNFS); Object-based Storage Devices (OSD), e.g. Panasas’ cluster file system.
8 Storage Management: Storage platforms add significant functionality to just a bunch of disks (JBODs). SCSI Logical Unit (LUN or LU) virtualization: provides abstractions for the physical media that data is stored on, allowing easier management of the ‘devices’ seen by servers. RAID: provides recovery from hardware failure (striping, parity, mirroring, nested levels).
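The parity idea behind the RAID bullet can be illustrated with a minimal sketch. It assumes a single XOR parity block per stripe (as in RAID 5 style layouts); the function names `xor_parity` and `rebuild_missing` are hypothetical and are not taken from any storage product.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and equally sized")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])

# One stripe of three data blocks (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the disk holding data[1] failed and rebuild it.
recovered = rebuild_missing([data[0], data[2]], parity)
```

XOR works here because XOR-ing the parity with all surviving blocks cancels them out and leaves only the missing block. Striping alone adds no redundancy; it is the parity or mirror copy that makes recovery possible.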
9 Storage Management (Contd.): Volume/Virtual Device Management: provides further abstraction and capabilities for the devices exposed by the storage platform. Local replication ((split) mirror, clone, snapshot): provides point-in-time backup/restore points. Provisioning: thin, thick, pass-through. Policy-based device provisioning and mobility.
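Thin provisioning, mentioned above, can be sketched as a volume that advertises a large virtual size but allocates backing space only when a block is first written. This is an illustrative model only; the class `ThinVolume` and its 4 KiB block size are assumptions, not a description of any particular array.

```python
class ThinVolume:
    """Toy thin-provisioned volume: space is allocated on first write."""

    def __init__(self, virtual_blocks, block_size=4096):
        self.virtual_blocks = virtual_blocks
        self.block_size = block_size
        self._mapping = {}  # virtual block number -> stored bytes

    def write(self, block_no, data):
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block number outside the virtual device")
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        self._mapping[block_no] = bytes(data)

    def read(self, block_no):
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block number outside the virtual device")
        # Never-written blocks read back as zeros and consume no space.
        return self._mapping.get(block_no, bytes(self.block_size))

    def allocated_bytes(self):
        return len(self._mapping) * self.block_size

# A volume that looks like ~4 GB to the server but starts out empty.
vol = ThinVolume(virtual_blocks=1_000_000)
vol.write(42, b"x" * 4096)
```

A thick-provisioned volume would instead reserve all `virtual_blocks * block_size` bytes up front.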
10 Storage Management (Contd.): Disaster Recovery: allows datacenters/servers to recover from catastrophic environmental/infrastructure failures. Progression: offline backups, online backups w/snapshots, synchronous remote mirrors, asynchronous remote mirrors, CDP. Continuous Data Protection (CDP): archives all changes to the protected storage, allowing information to be restored from any point in time; dependent write synchronization; integration w/application-level coherency; reduces recovery time objectives (RTO) to zero.
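The "restore from any point in time" property of CDP can be sketched as a journal of timestamped writes that is replayed up to the requested instant. The class `CdpJournal` and its methods are hypothetical names for illustration; real CDP products also handle write ordering across devices and application consistency, which this sketch ignores.

```python
class CdpJournal:
    """Toy continuous-data-protection journal for a single volume."""

    def __init__(self):
        self._entries = []  # (timestamp, block_no, data), in arrival order

    def record_write(self, timestamp, block_no, data):
        if self._entries and timestamp < self._entries[-1][0]:
            raise ValueError("writes must be journaled in time order")
        self._entries.append((timestamp, block_no, bytes(data)))

    def image_at(self, point_in_time):
        """Replay the journal and return {block_no: data} as of that time."""
        image = {}
        for timestamp, block_no, data in self._entries:
            if timestamp > point_in_time:
                break  # later writes are not part of the requested image
            image[block_no] = data
        return image

journal = CdpJournal()
journal.record_write(10, 0, b"v1")
journal.record_write(20, 0, b"v2")
journal.record_write(30, 1, b"new")

before_second_write = journal.image_at(15)
latest = journal.image_at(30)
```

Unlike a snapshot, which captures only the instants at which it was taken, the journal lets the restore point be chosen after the fact.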
11 Storage Management (Contd.): Storage Platform Virtualization: storage-platform- and fabric-based abstraction of storage platforms. Adds generic abstractions for heterogeneous array farms that allow many of the previous features.
12 Virtualizing Storage Resources: Store a VM’s virtual disk as a file. IO scheduling between multiple VMs. Provide multipathing, snapshots. VM1.vmdk: file backing the virtual disk for VM1. VMFS: VMware’s SAN file system, an example cluster file system.
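Storing a virtual disk as a file means a guest block address becomes a file offset. The sketch below assumes a flat, uncompressed backing file with 512-byte sectors; real virtual disk formats such as VMDK have headers and other layouts that are not modelled here, and the helper names are hypothetical.

```python
import tempfile

SECTOR_SIZE = 512  # assumed sector size for this sketch

def read_virtual_sectors(backing_file, lba, count):
    """Read `count` sectors starting at guest LBA `lba` from a flat backing file."""
    if lba < 0 or count < 0:
        raise ValueError("lba and count must be non-negative")
    backing_file.seek(lba * SECTOR_SIZE)
    return backing_file.read(count * SECTOR_SIZE)

def write_virtual_sectors(backing_file, lba, data):
    """Write whole sectors at guest LBA `lba` into the flat backing file."""
    if lba < 0 or len(data) % SECTOR_SIZE != 0:
        raise ValueError("lba must be non-negative and data sector-aligned")
    backing_file.seek(lba * SECTOR_SIZE)
    backing_file.write(data)

# A temporary file stands in for the file backing VM1's virtual disk.
with tempfile.TemporaryFile() as vm1_disk:
    write_virtual_sectors(vm1_disk, 3, b"\xab" * SECTOR_SIZE)
    sector3 = read_virtual_sectors(vm1_disk, 3, 1)
    sector0 = read_virtual_sectors(vm1_disk, 0, 1)  # never written: zeros
```

Because the disk is just a file, features such as snapshots and migration can be built on ordinary file operations.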
13 Virtualizing Storage Resources (Implications): Several differences are introduced by the new layer. Virtualization isn’t free. The fast path to commonly used devices is highly optimized. Still, the storage virtualization stack is longer than the native stack. The extra features gained significantly outweigh the extra stack depth.
14 Virtualizing Storage Resources (Implications): The guest is oblivious to the real hardware complexities: the complexity of different storage device types and transport protocols is hidden. The hypervisor can be a single up-to-date place where the storage stack is well-maintained: no need to build drivers for every conceivable type of operating system. The hypervisor provides a reliable connection: the guest doesn’t have to worry about multipathing, path failover, or even device failover.
15 Linked Clones Example. [Diagram: VM 1 (Alice) and VM 2 (Bob) each see a guest filesystem with Microsoft Office (outlook.exe). Each linked clone keeps its own specialized blocks (redo logs) on top of a common OS base disk, all stored on the physical disk.]
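The linked-clone arrangement in the diagram can be sketched as a shared read-only base disk plus a per-VM redo log that holds only the blocks that VM has changed. The class `LinkedClone` and the block contents are illustrative assumptions, not VMware's on-disk format.

```python
BLOCK_SIZE = 4  # tiny blocks keep the example readable

class LinkedClone:
    """Toy linked clone: reads fall through to a shared base disk."""

    def __init__(self, base_disk):
        self._base = base_disk  # shared {block_no: bytes}, never modified here
        self._redo = {}         # this clone's specialized blocks

    def write(self, block_no, data):
        # Writes always land in the clone's own redo log.
        self._redo[block_no] = bytes(data)

    def read(self, block_no):
        if block_no in self._redo:
            return self._redo[block_no]
        return self._base.get(block_no, bytes(BLOCK_SIZE))

    def private_blocks(self):
        return len(self._redo)

# Common OS base disk shared by both users.
base = {0: b"BOOT", 1: b"OS  ", 2: b"OFFC"}
alice = LinkedClone(base)
bob = LinkedClone(base)

alice.write(2, b"ALIC")  # Alice customizes one block
```

Both clones share every unmodified block, so each additional clone costs only the space of its own changes.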
16 Sharing Storage Resources: Highly scalable cluster file system (e.g. VMFS). Concurrent accesses from multiple hosts. No centralized control. Essential for: live migration of VMs; high availability (VM restarts in case of host failure).
17 Proportional Sharing of Storage Resources: Provide differentiated QoS for IO resources. If we assign per-VM disk shares for a shared VMFS LUN, how do we provide proportional sharing of storage resources without centralized control? Reference: A. Gulati, I. Ahmad, and C. Waldspurger. PARDA: Proportional Allocation of Resources for Distributed Storage Access. In Proc. of FAST, Feb
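One decentralized approach to the question above is for each host to adjust its own issue-queue window using only the latency it observes, in the spirit of the latency-based control described in the cited PARDA paper. The sketch below is a simplified illustration: the exact update rule, the smoothing constant `gamma`, the clamping bounds, and the use of `beta` as a per-host share weight are assumptions, not a reproduction of the paper's algorithm.

```python
def next_window(window, observed_latency_ms, target_latency_ms, beta,
                gamma=0.2, w_min=1.0, w_max=64.0):
    """Return the host's next issue-queue window size.

    The window shrinks when observed latency exceeds the target and grows
    when it is below; `beta` biases the result toward hosts with more shares.
    """
    if observed_latency_ms <= 0:
        raise ValueError("observed latency must be positive")
    proposed = (target_latency_ms / observed_latency_ms) * window + beta
    smoothed = (1 - gamma) * window + gamma * proposed
    return max(w_min, min(w_max, smoothed))

def settle(beta, observed_latency_ms=50.0, target_latency_ms=25.0, steps=200):
    """Iterate the update under a fixed observed latency (illustration only)."""
    window = 32.0
    for _ in range(steps):
        window = next_window(window, observed_latency_ms, target_latency_ms, beta)
    return window

# Two hosts see the same array latency but hold different share weights.
low_share_window = settle(beta=2.0)
high_share_window = settle(beta=4.0)
```

Under the same observed latency, the settled windows come out in proportion to `beta`, so no host needs to know the others' settings; the shared array latency is the only signal they have in common.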
18 Efficient Sharing of Storage Resources: Deduplicate identical blocks in a cluster FS*. Efficient block layout for multi-VM workloads. Virtualized storage power management. Hierarchical storage. * A. Clements, I. Ahmad, M. Vilayannur and J. Li. Decentralized Deduplication in a SAN Cluster Filesystem. In Proc. of USENIX Annual Technical Conference, June 2009.
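Deduplicating identical blocks can be sketched with a content-addressed store: each block is keyed by a hash of its contents, so identical blocks from different VMs are stored once. This is a single-process illustration; the cited paper addresses the harder decentralized, multi-host case, which is not modelled here. `DedupStore` is a hypothetical name.

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store with reference counts."""

    def __init__(self):
        self._blocks = {}    # digest -> block contents
        self._refcount = {}  # digest -> number of logical references

    def put(self, data):
        """Store a block and return the digest that identifies it."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blocks:
            self._blocks[digest] = bytes(data)
        self._refcount[digest] = self._refcount.get(digest, 0) + 1
        return digest

    def get(self, digest):
        return self._blocks[digest]

    def unique_bytes(self):
        return sum(len(block) for block in self._blocks.values())

store = DedupStore()
os_block = b"common OS block " * 256  # 4096 bytes shared by many VMs

vm1_ref = store.put(os_block)
vm2_ref = store.put(os_block)         # identical content: no new space used
vm2_private = store.put(b"bob's data".ljust(4096, b"\0"))
```

Three logical blocks were written, but only two distinct blocks occupy space.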
19 Intelligent Sharing of Storage Resources: Open research questions: bridging the semantic gap of the SCSI/block interface; enhanced security against guest rootkits or viruses; virtual I/O speculation.
20 Live Migration of VM Storage: State-of-the-art solution to perform live migration of virtual machine disk files across heterogeneous storage arrays with complete transaction integrity. No interruption in service for critical applications.
21 Architecture for Storage in Virtualization
22 Contrast Virtualization Architectures: Hosted System Virtualization: general-purpose OS in the parent partition; all shared-device I/O traffic goes through the parent partition. Bare-metal System Virtualization: ultra-small, virtualization-centric kernel; embedded driver model optimized for VMs. [Diagram labels: Xen/Viridian; Dom0 (Linux) or Parent VM (Windows); General Purpose OS; Drivers; Virtual Machine]
23 Contrast Virtualization Architectures: Passthrough Disks: preserves complex SAN management; each VM has dedicated LUN(s); provisioning a VM requires provisioning a LUN. Clustered Storage: “extra” storage virtualization layer; storage independence and portability; instant provisioning. [Diagram labels: Virtual Machine (Guest OS, Application); Physical Disks; Clustered Virtual Volume; Virtual Disks; Physical Storage]
24 Typical Operating System I/O Path: Read the contents of a file. (1) The application opens a file and issues a read() syscall. (2) The file system maps the read request to a location within a volume. (3) The LVM maps the logical location to a block on the physical mass storage device. (4) The device driver issues the read of a block to the physical storage device. (5) The block of data is returned up the stack to the application. [Diagram: Application -> read() syscall -> File System -> FS operation -> Logical Volume Manager -> block operation -> Device Driver -> SCSI request -> Storage Platform]
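The two mapping steps in this path (file offset to volume block, then volume block to physical block) can be sketched as two lookups. The data structures are simplified assumptions: a per-file list of volume blocks and a list of LVM extents, with hypothetical names throughout.

```python
BLOCK_SIZE = 4096  # assumed file system block size

def file_offset_to_volume_block(file_block_map, offset):
    """File system step: map a byte offset in a file to a volume block."""
    return file_block_map[offset // BLOCK_SIZE]

def volume_block_to_physical(extents, volume_block):
    """LVM step: map a volume block to (device, block on that device).

    Each extent is (volume_start, length, device, device_start).
    """
    for vol_start, length, device, dev_start in extents:
        if vol_start <= volume_block < vol_start + length:
            return device, dev_start + (volume_block - vol_start)
    raise LookupError("volume block is not mapped to any extent")

# The file's three blocks live at these volume blocks.
file_block_map = [120, 121, 500]
# The logical volume concatenates pieces of two physical disks.
extents = [(0, 256, "disk0", 1000), (256, 256, "disk1", 0)]

# read() at byte offset 9000 falls in the file's third block.
volume_block = file_offset_to_volume_block(file_block_map, 9000)
target = volume_block_to_physical(extents, volume_block)
```

The device driver would then issue a read for the resulting block on the resulting device.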
25 Add in a hypervisor (VMware ESX example): Tracing a read request from the guest OS through the VMM to ESX. (1) The guest OS device driver enqueues a SCSI read CDB within an IOCB to the HBA via a PCI I/O space instruction or PCI memory space reference. (2) The emulated PCI bus adapter in the virtual machine monitor traps the PCI I/O space instruction or PCI memory space reference. (3) The emulated adapter parses the IOCB, retrieves the SCSI CDB, and remaps the IOCB S/G list. (4) The emulated adapter passes the SCSI CDB and remapped S/G list to ESX. [Diagram: Virtual Machine (Guest OS, Device Driver) -> PCI Mem/IO space ref -> VM Emulated PCI Adapter -> SCSI command -> ESX]
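The S/G (scatter/gather) list remapping step can be sketched as translating each guest-physical buffer address to a machine address through a page map. The 4 KiB page size, the dictionary-based page map, and the function name `remap_sg_list` are assumptions for illustration and do not describe the ESX implementation.

```python
PAGE_SIZE = 4096  # assumed page size

def remap_sg_list(sg_list, guest_to_machine_page):
    """Translate (guest_address, length) elements to machine addresses.

    Elements are split at page boundaries, because pages that are
    contiguous in the guest need not be contiguous in machine memory.
    """
    remapped = []
    for guest_addr, length in sg_list:
        while length > 0:
            page, offset = divmod(guest_addr, PAGE_SIZE)
            chunk = min(length, PAGE_SIZE - offset)
            machine_addr = guest_to_machine_page[page] * PAGE_SIZE + offset
            remapped.append((machine_addr, chunk))
            guest_addr += chunk
            length -= chunk
    return remapped

# Guest pages 1 and 2 are adjacent in the guest but not in machine memory.
page_map = {1: 10, 2: 7}
# One 512-byte buffer that straddles the boundary between guest pages 1 and 2.
guest_sg = [(0x1F00, 0x200)]
machine_sg = remap_sg_list(guest_sg, page_map)
```

A single guest element can therefore become several machine elements, which is why the list is rebuilt rather than patched in place.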
26 Virtualized I/O End-to-end (VMware ESX example): Tracing a read request from the virtual machine to the storage platform. (1) I/O is issued by the virtual machine to the emulation layer. (2) Emulation converts the request to the format used by ESX and issues a file system request. (3) The file system converts it to a block operation and issues the request to a logical device. (4) The storage stack maps the request to a ‘physical’ device and issues it to the HBA. (5) The HBA initiates and completes the request, and data traverses back up the stack. [Diagram: Virtual Machine -> SCSI command -> SCSI Virtualization Engine -> FS operation -> VMFS -> block operation -> Logical Volume Manager -> Storage Core -> Device Driver -> Storage Platform]
27 Virtualized I/O – Advanced Topics: Potential performance optimizations. Accelerate guest code: idealized virtual device with paravirtualized guest drivers. Accelerate with a variety of I/O assists: Intel VT-d & AMD IOMMU: DMA remapping for directly assigned devices; PCI-SIG SR-IOV: passthrough I/O; RDMA: accelerate VMotion, NFS, iSCSI. [Diagram labels: Guest OS (Device Driver, I/O Stack); Hypervisor (Device Emulation, Device Driver)]
28 Differences in VMs: Virtualized deployments: large set of physical machines consolidated; diverse set of applications. Workload characteristics: different I/O patterns to the same volume; I/O from one app split across different volumes; provisioning operations running alongside applications. Hypervisor and the storage subsystem: clustered file system locking; CPU and virtual device emulation, CPU and memory affinity settings, new hardware-assist technology. System setup can affect performance: partition alignment; Raw Device Mapping or file system; protocol conversion (e.g. SCSI “over” NFS). Virtualization file systems are often optimized very differently. Standard benchmarks are not always sufficient.
29 Other Storage Virtualization Technologies: Technologies: NPIV (allows multiple VMs to have unique identifiers while sharing a single physical port); pass-through I/O; dedicated I/O or device/driver domains. Overall implications: mobility; performance.
30 Technologies – NPIV: ANSI T11 standard for multi-homed fabric-attached N-ports. Enables end-to-end visibility of storage I/O on a per-VM basis. Facilitates per-VM storage QoS at the target. Improves WWN zoning effectiveness at the cost of increased SAN administration. Requires NPIV support in the FC driver, HBA, and switch hardware. V-port (virtualized N-port) identified by a unique node/port WWN pair. One V-port per VM for as long as the VM is powered on. Overall implications: the V-port migrates with the VM; no significant performance cost or benefit.
31 Technologies – Passthrough I/O: The idealized I/O devices seen by the guest hide any special features of the underlying physical controllers. Passthrough I/O: virtualization extension to PCIe from the standards body (PCI-SIG). Virtualizes a single PCI device into multiple virtual PCI devices. Enables direct guest OS access to PCIe I/O or memory space. Improves performance by eliminating some of the virtualization overheads. However, live migration of VMs becomes harder since VMs are now tied to a particular type of hardware. Requires a host platform IOMMU and PCI MSI/MSI-X. Overall implications: yields improvements in I/O efficiency; migration of VMs using passthrough I/O is complex to support.
32 Technologies – I/O Domains: Basics: provides isolation via a dedicated address space (domain) for all devices; can provide flexibility by leveraging existing device drivers, e.g. Xen uses Dom0 to perform all I/O on behalf of VMs. Implications: performance can suffer due to scheduling latency for the I/O domain. In this model, a VM issuing I/O doesn’t result in the hypervisor immediately putting it on the wire; instead, the hypervisor wakes up the I/O domain and passes the request on to it. I/O domains can become a bottleneck.
33 Conclusions: Storage plays a critical role in virtualization. The choice of architecture has implications for the complexity-versus-feature tradeoff. Many problems in this area of research remain unsolved.