IBM eServer xSeries Technical Conference
Lake Buena Vista, FL
September 8-12, 2003
Session ID: O24
Enterprise Volume Management System for Linux
Steve Dobbelstein
© IBM Corporation 2003

Trademark Notes
- The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: IBM, AIX, OS/2, System/390
- Linux is a trademark of Linus Torvalds
- Other company, product, and service names may be trademarks or service marks of others.

Agenda
- Volume management basics
  - Overview of volume management schemes
- Enterprise Volume Management System (EVMS)
  - Overview
  - Demonstration

What is Volume Management?
- Provide a logical abstraction of the physical storage devices
- File systems and applications do not need to know about the organization of the physical devices

Where To Use Volume Management
- Systems with lots of physical storage
  - Lots of disks (tens, hundreds, thousands)
  - Combine many disks into a single pool of storage
  - Increased total storage space
  - Redundancy to protect against hardware failures
- Systems with little physical storage
  - Single disk (most PCs, laptops)
  - Divide up disk to provide logically separate storage for different uses

Disk Partitioning
- Divide a disk into one or more logical sections
- Simple, widely used
- Fixed sizes, difficult to resize
- No redundancy

RAID
- Redundant Array of Inexpensive Disks
- Combine several disks
  - Increase total storage space
  - Provide redundancy
  - Improve performance
- Can be done in hardware or software

RAID Linear
- Linear concatenation of several disks
- Increased total storage space
- No redundancy or performance improvement

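Linear concatenation can be pictured as a lookup from a logical block number to a disk and an offset on that disk. The sketch below is illustrative only: `linear_map` and the disk sizes are made-up names and values, not part of any RAID driver.

```python
from bisect import bisect_right
from itertools import accumulate


def linear_map(block, disk_sizes):
    """Map a logical block number to (disk index, block offset on that disk).

    disk_sizes lists the size of each member disk in blocks (hypothetical values).
    """
    ends = list(accumulate(disk_sizes))  # logical block number where each disk ends
    if not 0 <= block < ends[-1]:
        raise ValueError("block is outside the concatenated device")
    disk = bisect_right(ends, block)       # first disk whose end lies past the block
    start = ends[disk] - disk_sizes[disk]  # logical block number where that disk starts
    return disk, block - start


sizes = [100, 50, 200]         # three disks of different sizes
print(linear_map(120, sizes))  # block 120 falls on the second disk
```

Because each block lives on exactly one disk and consecutive blocks stay on the same disk, the layout adds space but neither redundancy nor parallel I/O.
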
RAID 0
- Striping
  - Data are interleaved across all disks
- Increased total storage space
- Improved performance with parallel I/O
- No redundancy

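Striping interleaves fixed-size chunks across the member disks in round-robin order. A minimal sketch, assuming equal-sized disks; the function name and the chunk size are illustrative assumptions.

```python
def stripe_map(block, n_disks, chunk_blocks):
    """Map a logical block to (disk index, block offset on that disk) for RAID 0.

    chunk_blocks is the number of blocks per chunk (an assumed parameter).
    """
    chunk, within = divmod(block, chunk_blocks)  # which chunk, and where inside it
    stripe, disk = divmod(chunk, n_disks)        # chunks rotate across the disks
    return disk, stripe * chunk_blocks + within


# With 3 disks and 4-block chunks, consecutive chunks land on different disks,
# which is what allows the I/O to proceed in parallel.
for block in (0, 4, 8, 12, 13):
    print(block, stripe_map(block, n_disks=3, chunk_blocks=4))
```
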
RAID 1
- Mirroring
- Redundancy
  - Multiple copies of all data
- No extra storage space
  - Device size is equal to a single disk
- Improved read performance
- Reduced write performance

RAID 4
- Striping with parity
- Redundancy, but less than with mirroring
  - One "chunk" of parity bits per stripe
- Increased storage space (total of all disks minus the size of one disk)
- Improved performance, but at cost of CPU overhead

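The parity chunk is the bytewise XOR of the data chunks in a stripe, so any single lost chunk can be rebuilt by XOR-ing the survivors with the parity. A minimal sketch with made-up chunk contents:

```python
from functools import reduce
from operator import xor


def xor_chunks(chunks):
    """Bytewise XOR of equal-length chunks."""
    return bytes(reduce(xor, column) for column in zip(*chunks))


data = [b"\x0f\x10", b"\xf0\x01", b"\xaa\x55"]  # one stripe on three data disks
parity = xor_chunks(data)                       # stored on the parity disk

# Simulate losing the second data disk and rebuilding its chunk.
rebuilt = xor_chunks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Computing this XOR on every write is the CPU overhead the slide refers to.
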
RAID 5
- RAID 4 creates a bottleneck on the parity disk
- Spread parity among all disks for better performance
- Same total size as RAID 4

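Spreading parity means the disk that holds the parity chunk changes from stripe to stripe. The rotation below is one simple possibility chosen for illustration; real implementations offer several layouts, and the function names here are hypothetical.

```python
def raid5_parity_disk(stripe, n_disks):
    """Disk holding the parity chunk for a stripe, under an assumed simple rotation."""
    return (n_disks - 1 - stripe) % n_disks


def raid5_capacity(n_disks, disk_size):
    """Usable size: one disk's worth of space goes to parity, as with RAID 4."""
    return (n_disks - 1) * disk_size


for stripe in range(5):
    print("stripe", stripe, "parity on disk", raid5_parity_disk(stripe, 4))
print("usable blocks:", raid5_capacity(4, 1000))
```

Over any run of `n_disks` consecutive stripes, each disk holds parity exactly once, so parity writes no longer queue up on a single disk.
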
Volume Groups
- A collection of devices (disks, partitions, RAID)
- The space of all devices is combined in the group, but not directly available as a device
- Combined space is divided into fixed-size chunks
  - Physical Extents (PEs)
  - Similar to memory page frames
- Create volumes from free space in the group
  - Volumes consist of Logical Extents (LEs)
  - Each LE maps to a PE
- Simple resizing of groups
  - Add new devices to the group to expand total available free space
  - Remove devices that aren't used by any volumes
- Simple resizing of volumes
  - Add or remove extents at the end of the volume

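The extent scheme can be sketched as a pool of free PEs plus, per volume, a list whose index is the LE number and whose value is the PE it maps to. This is a toy model under assumed names (`VolumeGroup`, `add_device`, and so on), not the LVM or EVMS on-disk format.

```python
class VolumeGroup:
    """Toy volume group: pools physical extents (PEs) and hands them to volumes."""

    def __init__(self, pe_size):
        self.pe_size = pe_size
        self.free = []     # unallocated PEs as (device, pe_index)
        self.volumes = {}  # volume name -> list of PEs; the list index is the LE number

    def add_device(self, device, size):
        """Expand the group by carving a new device into whole PEs."""
        self.free.extend((device, i) for i in range(size // self.pe_size))

    def _take(self, n_extents):
        if not 0 <= n_extents <= len(self.free):
            raise ValueError("not enough free extents in the group")
        taken, self.free = self.free[:n_extents], self.free[n_extents:]
        return taken

    def create_volume(self, name, n_extents):
        self.volumes[name] = self._take(n_extents)

    def expand(self, name, n_extents):
        """Grow a volume by appending extents at its end."""
        self.volumes[name].extend(self._take(n_extents))

    def shrink(self, name, n_extents):
        """Shrink a volume by returning extents from its end to the free pool."""
        extents = self.volumes[name]
        if not 0 <= n_extents <= len(extents):
            raise ValueError("volume does not have that many extents")
        keep = len(extents) - n_extents
        self.free.extend(extents[keep:])
        del extents[keep:]

    def map(self, name, offset):
        """Translate a volume offset to (device, offset on that device)."""
        le, within = divmod(offset, self.pe_size)
        device, pe = self.volumes[name][le]
        return device, pe * self.pe_size + within


vg = VolumeGroup(pe_size=4)
vg.add_device("sda", 8)     # contributes 2 PEs
vg.add_device("sdb", 12)    # contributes 3 PEs
vg.create_volume("home", 3)
print(vg.map("home", 9))    # LE 2 lives on the second device
```

Resizing only touches the mapping list, which is why growing or shrinking a volume is simple compared with moving partition boundaries.
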
Snapshots
- Frozen image of a live volume
- Useful for performing consistent backups without needing to take file system off-line
- Copy-On-Write to save old data
  - "Origin" volume is always up-to-date
  - Snapshot capacity can be smaller than origin
- Multiple simultaneous snapshots of same origin
- Origin volume and snapshot storage are divided into equal-sized “chunks”
- Write to the origin volume
  - Write request is queued to be finished later
  - Chunk is read from the origin
  - Write chunk to snapshot storage
  - New COW table entry: Map chunk 4 to chunk 1
  - Release all I/Os waiting on this copy
- Read from the snapshot volume
  - Unmapped chunk: get data from origin volume
  - Re-mapped chunk: get data from snapshot

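The copy-on-write steps above can be sketched in memory as follows. The `Snapshot` class and its method names are assumptions for illustration; the queuing and release of in-flight I/O is not modeled, since everything here runs synchronously.

```python
class Snapshot:
    """Toy copy-on-write snapshot over an origin held as a list of chunks."""

    def __init__(self, origin, capacity):
        self.origin = origin      # live volume, modified in place
        self.capacity = capacity  # chunks of snapshot storage (may be < len(origin))
        self.store = []           # snapshot storage
        self.cow = {}             # COW table: origin chunk -> chunk in snapshot storage

    def write_origin(self, chunk, data):
        if chunk not in self.cow:                   # first write since the snapshot
            if len(self.store) >= self.capacity:
                raise RuntimeError("snapshot storage is full")
            self.store.append(self.origin[chunk])   # save the old data
            self.cow[chunk] = len(self.store) - 1   # new COW table entry
        self.origin[chunk] = data                   # now let the write through

    def read_snapshot(self, chunk):
        if chunk in self.cow:                       # re-mapped: data from snapshot
            return self.store[self.cow[chunk]]
        return self.origin[chunk]                   # unmapped: data from origin


origin = [b"A", b"B", b"C", b"D", b"E", b"F"]
snap = Snapshot(origin, capacity=2)
snap.write_origin(2, b"x")
snap.write_origin(4, b"y")
print(snap.cow)               # chunk 4 maps to chunk 1 of snapshot storage
print(snap.read_snapshot(4))  # the frozen data, not the new write
```

Only chunks that change after the snapshot is taken consume snapshot storage, which is why the snapshot can be smaller than the origin.
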
Bad Block Relocation
- Detect I/O errors
- Remap bad blocks to a reserved area of the device

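A rough sketch of the remapping idea, with I/O errors simulated by a set of "bad" physical blocks. The class name, the reserved-area layout at the end of the device, and the in-memory remap table are all assumptions, not the EVMS implementation.

```python
class BBRDevice:
    """Toy device that relocates failing blocks into a reserved spare area."""

    def __init__(self, n_blocks, n_reserved, bad=()):
        self.data_blocks = n_blocks - n_reserved              # blocks visible to users
        self.spares = list(range(self.data_blocks, n_blocks)) # reserved area
        self.remap = {}                                       # logical -> spare block
        self.bad = set(bad)                                   # blocks that fail (simulated)
        self.media = {}                                       # physical block -> data

    def _physical(self, block):
        return self.remap.get(block, block)

    def write(self, block, data):
        if not 0 <= block < self.data_blocks:
            raise IndexError("block is outside the device")
        phys = self._physical(block)
        while phys in self.bad:          # simulated I/O error detected
            if not self.spares:
                raise OSError("no spare blocks left")
            phys = self.spares.pop(0)    # pick the next reserved block
            self.remap[block] = phys     # remember the relocation
        self.media[phys] = data

    def read(self, block):
        return self.media[self._physical(block)]


dev = BBRDevice(n_blocks=10, n_reserved=2, bad={3})
dev.write(3, b"hello")
print(dev.remap, dev.read(3))
```
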
Enterprise Volume Management System
- Modular, extensible system for managing storage on Linux
- Integrates all aspects of volume management into a single package
  - Disk Partitioning (fdisk, Disk Druid)
  - Volume Groups (LVM)
  - Software RAID (MD)
  - File Systems (mkfs, fsck, resizefs)
  - More

EVMS Architecture
- Engine
  - Core library
  - Provides API for user interfaces
  - Coordinates all activities with the plug-ins
  - Defines common set of possible tasks
    - Creation
    - Deletion
    - Resize
    - Configuration changes

EVMS Engine Architecture
- Volume discovery bubbles up through the layers
  - Local Disk Manager plug-in discovers all disk devices
  - Each plug-in examines current list of devices
  - Claims a device by removing it from the list
  - Creates new devices and adds them to the list
- Plug-ins
  - Each plug-in recognizes a specific volume format
    - Partitions
    - Volume Groups
    - Software RAID
    - EVMS Features
    - File Systems

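The discovery pass can be pictured as a device list handed from plug-in to plug-in, each one claiming what it recognizes and adding what it builds. The sketch below is a loose illustration only: the dictionary device format and the two plug-in functions are invented for the example and do not reflect the EVMS plug-in API.

```python
def partition_plugin(devices):
    """Claim disks that carry a (fake) partition count and expose the partitions."""
    remaining, created = [], []
    for dev in devices:
        count = dev.get("partitions", 0)
        if count:  # claimed: the disk itself leaves the list
            created += [{"name": f"{dev['name']}{i}"} for i in range(1, count + 1)]
        else:
            remaining.append(dev)
    return remaining + created


def raid_plugin(devices):
    """Claim devices tagged as members of a (fake) RAID array and expose the array."""
    remaining, arrays = [], {}
    for dev in devices:
        array = dev.get("raid")
        if array:
            arrays.setdefault(array, []).append(dev["name"])
        else:
            remaining.append(dev)
    created = [{"name": name, "members": members} for name, members in arrays.items()]
    return remaining + created


def discover(disks, plugins):
    """Run each plug-in over the current device list, layer by layer."""
    devices = list(disks)
    for plugin in plugins:
        devices = plugin(devices)
    return devices


disks = [
    {"name": "sda", "partitions": 2},
    {"name": "sdb", "raid": "md0"},
    {"name": "sdc", "raid": "md0"},
]
for dev in discover(disks, [partition_plugin, raid_plugin]):
    print(dev)
```

Whatever is left in the list after the last plug-in has run is what the layers above get to see.
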
EVMS Plug-ins

EVMS User Interfaces
- Communicate with Engine through well-defined API
- Graphical User Interface (evmsgui)
  - Point-and-click
  - Intuitive displays
- Text-Mode (evmsn)
  - Graphical-like for terminal windows
  - Same look and feel as the GUI
- Command-Line (evms)
  - Create scripts for common tasks

Clustering Support in EVMS
- EVMS can operate in a cluster of machines
- Assign ownership to a group of disks
  - Private to one cluster node
  - Shared by all cluster nodes
  - Reassign ownership during fail-over
- Remote Administration
  - Ability to administer other nodes in the cluster from a single machine

Clustering Support in EVMS
- Cluster Managers
  - Provide membership and communication services
  - Used to specify fail-over policies
  - Linux-HA: open-source, High-Availability cluster manager
  - RSCT: Reliable Scalable Cluster Technology
- OpenGFS (in development)
  - Open-source cluster file system
  - Mount a file system on all cluster nodes simultaneously

EVMS Platform Support
- EVMS has been tested on Linux running on:
  - xSeries: x86, IA64
  - pSeries: PPC, PPC64
  - zSeries: s390, s390x

Summary of EVMS Capabilities
- EVMS integrates all aspects of disk, partition, and volume management into a single, easy-to-use system.
- EVMS has no design limits on the number of disks, partitions, or volumes that it can handle.
- EVMS minimizes down time due to configuration changes.
- EVMS is very extensible, due to its modular design and support of plug-in modules.
- EVMS is designed to be scalable, including use in clustering environments.
- EVMS can read, write, and manipulate volumes created by other volume managers (given the proper plug-ins).
- EVMS can reduce the costs associated with migrating data to Linux.

EVMS Project Information
- Hosted on SourceForge
  - Live CVS tree
  - Mailing lists
  - Installation instructions
  - Documentation
- IRC: irc.freenode.net, #evms

EVMS Demonstration