Grant Cohoe IMPACT OF DISK ALIGNMENT IN VIRTUALIZED ENVIRONMENTS
WHY SHOULD YOU CARE? Performance: misalignment causes more I/Os than you need. Shared storage issues.
UNDERSTAND YOUR STUFF Hard disk: geometry, sector size (logical and physical). Operating system: what does it want? What does it do by default? Sometimes silly things…
LAYERS Disk Geometry/Partitions LVM Host File System VMDK Geometry/Partitions VMFS
DISK GEOMETRY/PARTITIONS
TERMINOLOGY Sector: unit of disk storage. Partition: logical group of sectors. Track: ring of sectors on a single side of a platter. Cylinder: 3D track (all platters at one track location).
MASTER BOOT RECORD (MBR) That thing that boots your OS. First 512 bytes of the disk: 440 bytes of boot loader, 64 bytes of partition information (four 16-byte entries), and the remaining 8 bytes for the disk signature, a reserved field, and the boot signature. 4 primary partitions, max size 2TB each (with 512-byte sectors).
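To make the MBR layout concrete, here is a minimal sketch that reads the partition table out of a 512-byte sector. It assumes the commonly documented classic layout (partition table at byte offset 446, four 16-byte entries, 0x55AA signature in the last two bytes). The function name `parse_mbr` and the synthetic sector built at the bottom are hypothetical illustrations, not something from the talk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # assumed start of the partition table in the classic layout
ENTRY_SIZE = 16      # four entries of 16 bytes each


def parse_mbr(sector: bytes):
    """Return (type, start_lba, sector_count) for each used partition entry."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status (1), CHS start (3), type (1), CHS end (3), start LBA (4), count (4)
        _status, _chs_first, ptype, _chs_last, start_lba, count = struct.unpack(
            "<B3sB3sII", entry
        )
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append((ptype, start_lba, count))
    return partitions


# Synthetic sector with one Linux (0x83) partition starting at LBA 2048.
entry = struct.pack("<B3sB3sII", 0x80, b"\x00" * 3, 0x83, b"\x00" * 3, 2048, 204800)
mbr = bytes(TABLE_OFFSET) + entry + bytes(3 * ENTRY_SIZE) + b"\x55\xaa"
print(parse_mbr(mbr))

# The start LBA and sector count are 32-bit fields, which is where the
# 2TB limit comes from when sectors are 512 bytes.
MAX_BYTES = 2**32 * SECTOR_SIZE
```

On a real system the sector would come from reading the first 512 bytes of the block device, which normally requires elevated privileges.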
MASTER BOOT RECORD (MBR) DOS Compatibility: partitions cannot span cylinders (because DOS was silly). Number of sectors per track = 63; 63 - 1 (MBR) = 62 sectors before the first usable sector. This is deprecated. Layout: MBR, LBA 1 to LBA 62 (unused), first partition at LBA 63.
MASTER BOOT RECORD (MBR) 1MB Alignment: align all partitions to 1MB. 1MB = 1,048,576 B / 512 B sectors = sector 2048 (first sector of the first partition). Improves performance. Ensures compatibility with 4K “Advanced Format” disks. This is the new standard (since Windows Vista). Layout: MBR, alignment space, first partition at LBA 2048.
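The 1MB arithmetic can be sketched as a small check. This is an illustration only: the helper names `first_aligned_sector` and `is_aligned` are hypothetical, and 512-byte logical sectors are assumed.

```python
SECTOR_SIZE = 512            # logical sector size in bytes (assumed)
ALIGNMENT = 1024 * 1024      # 1MB alignment target in bytes


def first_aligned_sector(alignment=ALIGNMENT, sector_size=SECTOR_SIZE):
    """First sector number that sits on the alignment boundary."""
    return alignment // sector_size


def is_aligned(start_lba, alignment=ALIGNMENT, sector_size=SECTOR_SIZE):
    """True if a partition starting at start_lba begins on the boundary."""
    return (start_lba * sector_size) % alignment == 0


print(first_aligned_sector())             # 2048
print(is_aligned(2048), is_aligned(63))   # True False
print(is_aligned(2048, alignment=4096))   # True: 1MB alignment implies 4K alignment
```

Because 1MB is a multiple of 4K, a partition that passes the 1MB check also starts on a 4K physical sector boundary.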
RESULTING DISK 512B MBR, then alignment space, then the 1st partition starting at sector 2048. This is good!
LOGICAL VOLUME MANAGEMENT (LVM)
TERMINOLOGY Physical Volume (PV): container of data stored as a partition on disk. Logical Volume (LV): virtualized storage structure stored as data in a PV. pe_start: offset of LV data within a PV.
LVM PHYSICAL VOLUMES (LVM PV) pe_start specifies the start of LV data. LVM is very intelligent about this, so it is usually not a problem. Needs to be aligned to your sectors!
LVM PHYSICAL VOLUMES (LVM PV) Bad: pe_start does not line up with a sector boundary. Going to hurt performance later. [Diagram: MBR | Physical Volume: pe_start, PV Data Region]
LVM PHYSICAL VOLUMES (LVM PV) Good: as long as pe_start is a multiple of your sector size (usually 512B), you’re good! [Diagram: MBR | Physical Volume: pe_start, PV Data Region]
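The good and bad cases can be checked with a little arithmetic. The sketch below is not an LVM API: the function names are hypothetical, the inputs are plain numbers you would look up yourself, and it assumes pe_start is measured in bytes from the start of the partition that holds the PV.

```python
def pv_data_offset(partition_start_lba, pe_start_bytes, logical_sector=512):
    """Absolute byte offset on the disk where PV data begins."""
    return partition_start_lba * logical_sector + pe_start_bytes


def pe_start_is_aligned(partition_start_lba, pe_start_bytes,
                        boundary=512, logical_sector=512):
    """True if PV data begins on a multiple of `boundary` bytes."""
    offset = pv_data_offset(partition_start_lba, pe_start_bytes, logical_sector)
    return offset % boundary == 0


MIB = 1024 * 1024

# Partition at sector 2048 with a 1MB pe_start: fine for 512B and 4K sectors.
print(pe_start_is_aligned(2048, MIB, boundary=512))    # True
print(pe_start_is_aligned(2048, MIB, boundary=4096))   # True

# A made-up pe_start of 1000 bytes does not land on a sector boundary.
print(pe_start_is_aligned(2048, 1000, boundary=512))   # False

# A partition at LBA 63 stays 512B-aligned but misses every 4K boundary.
print(pe_start_is_aligned(63, MIB, boundary=512))      # True
print(pe_start_is_aligned(63, MIB, boundary=4096))     # False
```

The last case shows why the check uses the absolute offset: a well-chosen pe_start cannot compensate for a partition that itself starts off the boundary.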
LVM PHYSICAL VOLUMES (LVM PV) PE Size: Physical Extent, the LVM “block” size. The default is usually fine. Must be a multiple of the sector size (512B).
RESULTING VOLUME LV starting point aligned (pe_start). PV aligned to sectors on disk. [Diagram: MBR | Physical Volume: pe_start, PV Data Region containing the Logical Volume]
HOST FILE SYSTEM
HOST FILE SYSTEM Not much to do here (RAID would be a different story…). Ext is good at picking sane defaults. Block size: smallest unit of data for the filesystem.
RESULTING FILESYSTEM [Diagram: MBR | Physical Volume: pe_start, PV Data Region containing the Logical Volume, which holds the Filesystem]
VMDK GEOMETRY & PARTITIONS
VMDK GEOMETRY/PARTITIONS Same principles as host disks: DOS compatibility sucks, 1MB alignment is good. The performance impact of misalignment is bigger.
VM FILE SYSTEM
VM FILE SYSTEM Don’t use RAID/LVM in VMs, unless you really need it for some reason or you did a P2V. [Diagram: MBR | VM File System]
VM ALIGNMENT
PERFECTLY ALIGNED VM [Diagram: guest disk (MBR | VM File System) stored inside the host stack (MBR | Physical Volume: pe_start, PV Data Region containing the Logical Volume and Filesystem)]
PERFECTLY ALIGNED VM [Diagram: layers stacked top to bottom: VM FS Block, VMDK Sectors, Host FS Block, LVM PE*, Host Disk Blocks] * PE shown as 1K (1024B) for example.
MISALIGNED VM The VM disk image sits across two host FS blocks, thus requiring more reads of the host disks to get all the data: 4096B of VM data requires reading 8192B of host disk data.
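The 4096B versus 8192B figure follows from counting how many host blocks a guest read overlaps. A minimal sketch, assuming 4096-byte host filesystem blocks and using hypothetical function names:

```python
def host_blocks_touched(offset, length, host_block=4096):
    """Number of host blocks overlapped by a read of `length` bytes at `offset`."""
    first = offset // host_block
    last = (offset + length - 1) // host_block
    return last - first + 1


def host_bytes_read(offset, length, host_block=4096):
    """Bytes the host must read, since it reads whole blocks."""
    return host_blocks_touched(offset, length, host_block) * host_block


# Aligned: a 4096B guest block that starts on a host block boundary.
print(host_bytes_read(0, 4096))      # 4096

# Misaligned: the same guest block shifted by one 512B sector.
print(host_bytes_read(512, 4096))    # 8192
```

Any non-zero shift smaller than a host block gives the same doubling for a single-block read, so the size of the misalignment matters less than the fact that it exists.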
END GOAL [Diagram: guest disk (MBR | VM File System) aligned inside the host stack (MBR | Physical Volume: pe_start, PV Data Region containing the Logical Volume and Filesystem)]
Grant Cohoe QUESTIONS?
MODERN STUFF BONUS MATERIAL
ADVANCED FORMAT DISKS 4K sectors. Old: 512B. New: 4096B. Much more efficient with today’s data usage. 512e emulation mode lets old stuff still work with new disks. Logical (OS): 512B. Physical (disk): 4096B.
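ADVANCED FORMAT DISKS & MBR Regular disks (512-byte sectors): first partition at LBA 63. Advanced Format (4K sectors) with 512e: LBA 63 falls inside 4K physical sector 7 (63 / 8 = 7.875) instead of on a boundary, so 4K blocks straddle physical sectors 7, 8, 9 and so on. PROBLEM LATER ON.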
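To see why LBA 63 is a problem on a 4K disk, the logical-to-physical mapping can be sketched as follows. This assumes 512-byte logical sectors emulated on 4096-byte physical sectors; the function names are hypothetical.

```python
LOGICAL = 512                  # sector size the OS sees (assumed)
PHYSICAL = 4096                # sector size the disk really uses (assumed)
RATIO = PHYSICAL // LOGICAL    # 8 logical sectors per physical sector


def physical_sector(lba):
    """Physical sector that contains the given logical sector."""
    return lba // RATIO


def offset_within_physical(lba):
    """Byte offset of the logical sector inside its physical sector."""
    return (lba % RATIO) * LOGICAL


def physical_sectors_for_block(start_lba, block_bytes=4096):
    """Physical sectors overlapped by one block that starts at start_lba."""
    first = physical_sector(start_lba)
    last = (start_lba * LOGICAL + block_bytes - 1) // PHYSICAL
    return list(range(first, last + 1))


print(physical_sector(63), offset_within_physical(63))   # 7 3584
print(physical_sectors_for_block(63))                    # [7, 8]
print(physical_sectors_for_block(2048))                  # [256]
```

A 4K filesystem block that starts at LBA 63 spans two physical sectors, while the same block starting at LBA 2048 fits in exactly one.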
GUID PARTITION TABLE (GPT) That new thing that boots your OS. First 17K of the disk. Lots of stuff. On disk: GPT, then alignment space, then the first partition at sector 2048.
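The 17K figure can be reproduced with a rough calculation. The sketch assumes the common defaults of a protective MBR in LBA 0, a one-sector GPT header, and 128 partition entries of 128 bytes each on 512-byte logical sectors. Those values and the function name are assumptions for illustration, not taken from the slides.

```python
SECTOR_SIZE = 512             # logical sector size in bytes (assumed)
PROTECTIVE_MBR_SECTORS = 1    # LBA 0
GPT_HEADER_SECTORS = 1        # LBA 1
ENTRY_COUNT = 128             # common default number of entries (assumed)
ENTRY_SIZE = 128              # bytes per partition entry (assumed)


def gpt_front_sectors(entry_count=ENTRY_COUNT, entry_size=ENTRY_SIZE,
                      sector_size=SECTOR_SIZE):
    """Sectors used at the start of the disk before any partition data."""
    table_bytes = entry_count * entry_size
    table_sectors = -(-table_bytes // sector_size)   # round up to whole sectors
    return PROTECTIVE_MBR_SECTORS + GPT_HEADER_SECTORS + table_sectors


sectors = gpt_front_sectors()
print(sectors, sectors * SECTOR_SIZE)   # 34 17408 (17K)
print(2048 - sectors)                   # 2014 sectors of alignment space
```

With the first partition at sector 2048, everything between the end of the GPT structures and that sector is alignment space.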
RAID IMPLICATIONS If a RAID volume is misaligned, the entire array is affected. RAID in VMs is BAD!
RAID TERMINOLOGY Data Disk: a disk that has real data (not parity). Stripe: RAID unit of IO (“block”), also called “Chunk”. Stride: amount of data from a stripe before moving to the next disk. Stripe Width: length of a stripe.
RAID MATH Constants: DATA_DISKS = 3 (let’s say this is RAID5 with 4 disks), BLOCK_SIZE = 4K (from the filesystem), CHUNK_SIZE = 512K. Calculate Stride: STRIDE = CHUNK_SIZE / BLOCK_SIZE = 512K / 4K = 128 blocks. Calculate Stripe Width: STRIPE_WIDTH = STRIDE * DATA_DISKS = 128 * 3 = 384 blocks. What this means: one unit of RAID IO will write 128 blocks (512K) to the first disk, then move on to the next one.
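The same calculation as a small sketch, using the constants from the slide. The function name is hypothetical. Note that both results are counts of filesystem blocks, not sizes in kilobytes.

```python
def raid_stride_and_width(chunk_size, block_size, data_disks):
    """Return (stride, stripe_width), both measured in filesystem blocks."""
    if chunk_size % block_size != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_size // block_size        # blocks written per disk per stripe
    stripe_width = stride * data_disks       # blocks across all data disks
    return stride, stripe_width


K = 1024
stride, width = raid_stride_and_width(chunk_size=512 * K,
                                      block_size=4 * K,
                                      data_disks=3)   # RAID5 with 4 disks
print(stride, width)                 # 128 384
print(stride * 4 * K // K, "K")      # 512 K written to each disk per stripe
```

Keeping the inputs in bytes and the outputs in blocks avoids the easy mistake of reading the stride as 128K.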