Analysis of Virtual Tape Subsystems Ned Diehl The Information Systems Manager, Inc NCACMG Vienna, Virginia 5 June 2002
Trademarks (Omissions are unintentional) ISM PerfMan IBM Parallel Sysplex RMF OS/390 z/OS Magstar VSM StorageTek HSC Nearline
Objectives Discuss sources of performance analysis and capacity planning data for virtual tape subsystems (VT) in an OS/390 environment Primary focus on hardware implementations –IBM VTS –StorageTek VSM Present graphical examples System rather than application focus
Contents Introduction Virtual Tape Structure Data Sources Key Performance Metrics Recommendations Summary References
Introduction Solves many traditional tape problems –Small data sets –Low activity open data sets –Allocation wait for tape drives Weakest with “good” tape activity –Full volumes –High transfer rates Exploits high capacity tapes –VT helps remove implementation impediments
Introduction Consider disaster recovery –VTS Peer-to-Peer –Clustered VTSS Configuration Mount management considerations –Deferred mount should generally be avoided –Premount can be desirable Not yet everything for everyone – but getting there –Fewer problem datasets with current implementations
VT Structure General Looks (almost) like 3490E to host Combination of hardware and software –RISC processor –Library of high capacity tapes –RAID DASD buffer or CACHE –Tape drives Amount of host software varies with implementation
VT Structure Logical Components
VT Structure VTS - Physical Components
VT Structure VSM - Physical Architecture [Diagram labels: HSC; Control Path; CDS; VTCS; VSM Control & Data Path; Migrate/Recall; VTSS; MVC; VTV; physical tape drives; physical tape volumes; physical libraries & slots; virtual tape drives; virtual tape volumes]
Data Sources Application –SMF 30, 72, 14, & 15 Hardware –SMF 94 - tape library statistics –STK user SMF record –SMF 14 & 15 – data set activity –SMF 21 – error statistics by volume –RMF 73 - channel path activity –RMF device activity –RMF I/O queuing –Real time controller interface
Data Sources SMF 94 Easy to work with though not flexible Hourly summary from VTS or Library –Difficult to synchronize (do not use SMF94HHI) –Identical data to all attached recording images Base segments reflect all library statistics –ATL, mount, dismount, eject, & insert –Native and VTS activity Identical data in multiple records with a different serial number if VTS and native drives attached VTS segments reflect single VTS –VTS, import / export, enhanced statistics
Data Sources SMF 94 One record produced for each –Library with native drives, identified by ATL serial (SMF94SNO). Contains base segments. –VTS, identified by VTS serial (SMF94SNO & SMF94VLS). Contains base and VTS segments. With peer-to-peer, three (local) or four (remote) VTSs involved –Only two have physical tape –No easy way to associate them Generally consistent with RMF –RMF allows finer detail of comparable metrics
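Because identical SMF 94 data goes to every attached recording image, records merged from several images usually need deduplicating before they are summarized. The following is a minimal sketch, assuming the records have already been parsed into dictionaries; the keys `serial` (taken from SMF94SNO) and `interval_end` are hypothetical names chosen for illustration, not part of the record layout.

```python
def dedupe_smf94(records):
    """Keep the first record seen for each (serial, interval end) pair.

    `records` is an iterable of dicts; `serial` and `interval_end`
    are hypothetical keys filled in by whatever parser is used.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["serial"], rec["interval_end"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

This does not address base segments that appear under a different serial number when both a VTS and native drives are attached; that case needs a site-specific mapping between the serial numbers.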
Data Sources SMF 94 Data Issues Hour index (SMF94HHI) is interval end Drives available (SMF94VTA) not always reset after service –Drives used (SMF94VTV, VTN, & VTX) can be greater than available Alignment problem with early Import/Export statistics (SMF94ACA & SMF94ACB) Duplicate library data with a different serial number if VTS and native drives attached Average age in cache (SMF94VCA) has changed definition over time
Data Sources SMF 94 Data Issues Tape data transfer (SMF94VTR & SMF94VTW) reported as zero in some recent samples Max cache volume age (S94MTVCA) recorded as seconds, documented as minutes Backstore compression ratio (S94BSRAT) sometimes reported as less than one Recall throttle percent (S94RCPRT) sometimes greater than 100
Data Sources SMF 94 Data Issues Reference Flash dated 20 November Recalls might be recorded as hits. Affected metrics: –S94MAXCH Maximum Cache Hit Mount Time –S94AVGCH Average Cache Hit Mount Time –SMF94VMH Number of Cache Hit Mounts –S94MAXRM Maximum Recall Mount Time –S94AVGRM Average Recall Mount Time –SMF94VMS Number of Recall Mounts –SMF94VPS Number of Physical Mounts for Recall
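Several of the SMF 94 data issues listed above can be flagged automatically before the data reaches a report. The sketch below assumes each hourly record has been parsed into a dictionary keyed by the field names used on these slides, with values already converted to plain numbers; that representation is an assumption, not the raw record format.

```python
def smf94_issues(rec):
    """Return a list of warnings for one parsed hourly VTS record (a dict)."""
    issues = []

    # Drives used (average, minimum, maximum) should not exceed drives available.
    available = rec.get("SMF94VTA")
    used = [rec[k] for k in ("SMF94VTV", "SMF94VTN", "SMF94VTX") if k in rec]
    if available is not None and used and max(used) > available:
        issues.append("drives used exceed drives available")

    # Backstore compression ratio is sometimes reported as less than one.
    ratio = rec.get("S94BSRAT")
    if ratio is not None and ratio < 1:
        issues.append("backstore compression ratio below one")

    # Recall throttle percent is sometimes greater than 100.
    throttle = rec.get("S94RCPRT")
    if throttle is not None and throttle > 100:
        issues.append("recall throttle percent above 100")

    # Tape data transfer has been reported as zero in some samples.
    if rec.get("SMF94VTR") == 0 and rec.get("SMF94VTW") == 0:
        issues.append("tape data transfer reported as zero")

    return issues
```

Flagged intervals can then be excluded or annotated rather than silently averaged into a trend.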
Data Sources STK User Record Complex to work with –Many subtypes –Repeating segments Allows detail analysis –Configuration info –CU and device busy –Volume level data –VTSS interval of 15 minutes –Grouping and summarization required –Some logical metrics require multiple subtypes
Data Sources STK User Record Interval metrics by VTSS (not image) to each recording OS/390 –Subsystem, channel interface, and RTD performance –Values vary with OS/390 recording time –Logically identical but physically different Event metrics recorded once –VTV mount, dismount, delete, migrate, recall, movement, replicate –RTD mount, dismount, vary –MVC status
Data Sources STK Data Issues Must process event records from all recording images Mount events not necessarily recorded to original requesting image Dismount events not necessarily recorded to same image as mount Might be extra mounts or dismounts Calculated times might be very large or negative Some time values in complex formats
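Since a mount and its dismount may be recorded on different images, event records from all images have to be merged before they are paired. The following is a rough sketch of that step, assuming events are already parsed into dictionaries with hypothetical keys `time` (a sortable number), `volser`, and `kind` (`"mount"` or `"dismount"`). Extra mounts or dismounts are set aside rather than forced into a pair.

```python
def pair_mounts(events):
    """Pair mount and dismount events per volume across all recording images.

    Returns (pairs, unmatched): pairs is a list of (mount, dismount)
    tuples; unmatched holds extra mounts or dismounts.
    """
    open_mounts = {}   # volser -> most recent unpaired mount event
    pairs = []
    unmatched = []
    for ev in sorted(events, key=lambda e: e["time"]):
        vol = ev["volser"]
        if ev["kind"] == "mount":
            if vol in open_mounts:
                # A second mount with no dismount in between: extra mount.
                unmatched.append(open_mounts[vol])
            open_mounts[vol] = ev
        else:
            start = open_mounts.pop(vol, None)
            if start is None:
                unmatched.append(ev)   # dismount with no known mount
            else:
                pairs.append((start, ev))
    # Mounts still open at the end of the data have no dismount yet.
    unmatched.extend(open_mounts.values())
    return pairs, unmatched
```

Keeping the unmatched events separate makes it easier to see how much the extra mounts or dismounts could distort counts and averages.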
Data Sources STK Data Issues RMF consistency varies –Most metrics close across multiple samples: I/O rates, connect time, allocated time, total mount time –Significant variation with mount counts and thus average mount time Probably a code level issue
Key Performance Metrics Application –Service levels Hardware –Data transfer –Device usage –Virtual mount time –Mount miss (recall) rate –Storage usage –Tape volume age (in DASD buffer or cache)
Key Performance Metrics Application Performance Should have reasonable and measurable performance objectives Objectives will vary with –Installation –Application –VT –Time & day
Key Performance Metrics Data Transfer Good for reporting work performed –Mounts and I/O rates also options Saturation varies with environment SMF 14, 15, and 21 provide read and write RMF 73 provides read and write for FICON. VTS provides data transfer –Host read (SMF94VBR) and write (SMF94VBW) are good throughput indicators –Read (SMF94VTR) and write (SMF94VTW) between cache and real tapes
Key Performance Metrics Data Transfer VTSS provides many data transfer metrics but assumptions are required –RTD bytes read (SMF20BTR) and written (SMF20BTW) –RTD connect (SMF20DCT) and utilization (SMF20DUT) –Host and RTD channel interface busy (SMF11CUB) –VTV dismount has VTV size (SMF14VSZ) Probably best throughput indicator
Key Performance Metrics Data Transfer
Key Performance Metrics Device Usage RMF 74-1 provides virtual device usage by device or group. –Allows separation of mount and allocation components (e.g. connect, pend, wait). VTS provides minimum, maximum, average, and configured for both virtual and physical devices. All values are integers. VTS average (SMF94VTV) and maximum (SMF94VTX) physical are good for trending While not normally a problem, virtual device use should not be ignored.
Key Performance Metrics Device Usage Difficult to calculate VTSS minimum, maximum, average, and available for either virtual or physical devices –Mount, dismount, and vary subtypes must be grouped and summarized –RTDs can be statically shared (operator command) by multiple VTSSs and OS/390 VTSS RTD connect and utilization, which were previously discussed, are useful
Key Performance Metrics Device Usage
Key Performance Metrics Device Usage (RMF)
Key Performance Metrics Virtual Mount Time RMF 74-1 provides average mount time by device or group VTS provides minimum, maximum, and average virtual and physical mount times Average virtual (SMF94VRA) is good for trending High maximum virtual (SMF94VRX) often correlates with problems
Key Performance Metrics Virtual Mount Time VTSS mount time calculated from mount event records (SMF13MET - SMF13MST) –Min, max, average and counts require grouping and summarization –Average might be distorted by extra mounts VTSS provides several potentially interesting identifiers –Job, step, DSN, VTV management class –Scratch, existing
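The grouping and summarization of VTSS mount times can be sketched as follows. This assumes each mount event is a dictionary in which SMF13MST and SMF13MET have already been converted to seconds on a common clock; the conversion, the grouping key, and the `max_seconds` cutoff are illustrative assumptions. The cutoff simply drops the negative or very large calculated times that can occur in this data.

```python
from collections import defaultdict

def mount_time_stats(mounts, group_key, max_seconds=3600):
    """Summarize mount time (SMF13MET - SMF13MST) per group.

    Elapsed times that are negative or above `max_seconds` (a hypothetical
    cutoff) are treated as bad data and skipped.
    """
    groups = defaultdict(list)
    for m in mounts:
        elapsed = m["SMF13MET"] - m["SMF13MST"]
        if 0 <= elapsed <= max_seconds:
            groups[m[group_key]].append(elapsed)
    return {
        name: {
            "count": len(times),
            "min": min(times),
            "max": max(times),
            "avg": sum(times) / len(times),
        }
        for name, times in groups.items()
    }
```

For example, `mount_time_stats(mounts, "vtss")` would summarize by VTSS if the parser stores a `vtss` key; the same function can group by job, management class, or hour.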
Key Performance Metrics Virtual Mount Time Mount times tend to increase with: –Reduced tape volume age on DASD (SMF94VCA) –Increased mount miss rate (SMF94VMS) –Increased average real drives mounted (SMF94VTV) Published targets: –Daily average less than 30 seconds –Hourly average less than 300 seconds –Maximum less than 900 seconds
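The published targets above lend themselves to a simple daily check. The sketch below assumes one day of hourly data supplied as `(mount_count, average_seconds, maximum_seconds)` tuples, and it weights the daily average by mount count; both the input layout and the weighting are assumptions made for illustration.

```python
def check_mount_targets(hourly):
    """Compare one day of hourly virtual mount times with the published targets.

    `hourly` is a list of (mount_count, avg_seconds, max_seconds) tuples.
    """
    total = sum(count for count, _, _ in hourly)
    # Daily average weighted by the number of mounts in each hour (assumption).
    daily_avg = sum(count * avg for count, avg, _ in hourly) / total if total else 0.0
    return {
        "daily_avg": daily_avg,
        "daily_avg_ok": daily_avg < 30,                              # daily average < 30 s
        "hourly_avg_ok": all(avg < 300 for _, avg, _ in hourly),     # hourly average < 300 s
        "maximum_ok": all(mx < 900 for _, _, mx in hourly),          # maximum < 900 s
    }
```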
Key Performance Metrics Virtual Mount Time (RMF)
Key Performance Metrics Virtual Mount Time
Key Performance Metrics Max Virtual Mount Time
Key Performance Metrics Average Virtual Mount Time
Key Performance Metrics Mount Miss Rate VTS provides mounts by type –Fast ready (SMF94VFR) –Specific mount hits (SMF94VMH) –Specific mount misses (SMF94VMS) Related to average time on DASD (SMF94VCA) and application cycles Published target of miss percentage less than 20% –Miss % = 100 * Misses / Total Virtual Mounts –Miss % = 100 * VMS / (VFR + VMH + VMS)
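The miss percentage formula above translates directly into code. This small sketch takes the three SMF 94 mount counts as plain integers; returning zero for an interval with no mounts is a choice made here, not something the source specifies.

```python
def miss_percent(vfr, vmh, vms):
    """Miss % = 100 * VMS / (VFR + VMH + VMS).

    vfr: fast ready mounts (SMF94VFR)
    vmh: specific mount hits (SMF94VMH)
    vms: specific mount misses (SMF94VMS)
    """
    total = vfr + vmh + vms
    return 100.0 * vms / total if total else 0.0

def meets_miss_target(vfr, vmh, vms, target=20.0):
    """True when the miss percentage is below the published 20% target."""
    return miss_percent(vfr, vmh, vms) < target
```

With 60 fast ready mounts, 30 hits, and 10 misses, `miss_percent(60, 30, 10)` gives 10.0, which is within the target.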
Key Performance Metrics Mount Miss Rate VTSS mount hit indicator added in recent update (SMF13RCI) –Scratch mounts indicated (SMF13VMT) –If older data, could assume a hit if calculated mount time less than a threshold
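For older VTSS data without the hit indicator, the threshold approach might look like the sketch below. It assumes the parser sets a hypothetical boolean key `hit` when the indicator (SMF13RCI) is present and leaves it absent otherwise, and that mount start and end times are already in seconds. The 10-second default threshold is purely illustrative and would need tuning per installation.

```python
def classify_mount(mount, threshold_seconds=10.0):
    """Classify a VTSS mount as 'hit', 'miss', or 'unknown'.

    Uses the recorded indicator when available (hypothetical key `hit`);
    otherwise assumes a hit when the calculated mount time is below
    `threshold_seconds` (an illustrative value).
    """
    if mount.get("hit") is not None:
        return "hit" if mount["hit"] else "miss"
    elapsed = mount["SMF13MET"] - mount["SMF13MST"]
    if elapsed < 0:
        return "unknown"   # calculated times can be negative in this data
    return "hit" if elapsed < threshold_seconds else "miss"
```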
Key Performance Metrics Mount Miss Rate
Key Performance Metrics Specific Mount Miss Rate
Key Performance Metrics Mount Miss Rate
Key Performance Metrics Storage Usage VTS provides estimate of data on stacked volumes (SMF94VBA) and available capacity on empty cartridges (SMF94VEC) –Reconciliation will probably free space High recalls (SMF94VMS) and low tape volume age on DASD (SMF94VCA) indicate need for more DASD More DASD can relieve saturated physical tape drives
Key Performance Metrics Storage Usage
Key Performance Metrics Storage Usage VTSS provides cache and DASD metrics –Configuration info: DASD capacity (SMF10TCP), cache & NVS size, channels –Used (SMF10FCP) and contiguous (SMF10CFP) free DASD space No easy way to report VTSS MVC metrics –Must produce & process MVC status records
Key Performance Metrics Storage Usage
Key Performance Metrics Tape Volume Age VTSS delete reference age can sometimes be related to application cycles –SMF delete time stamp – SMF15LTR –Consider selective summarization –Separate migrate immediate delete: Subtype 18 with SMF18MTI set, closely followed by related Subtype 15 (same virtual VOLSER) Occasionally very high values
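Separating migrate-immediate deletes from ordinary deletes could be done along these lines. The sketch assumes subtype 18 (migrate) and subtype 15 (delete) records are parsed into dictionaries with hypothetical keys `volser` and `time` (seconds), plus a boolean SMF18MTI on the migrate records. "Closely followed" is not defined in the source, so the 60-second window is an arbitrary placeholder.

```python
from collections import defaultdict

def split_deletes(migrates, deletes, window_seconds=60):
    """Split delete records into (migrate_immediate, ordinary).

    A delete counts as migrate-immediate when a subtype 18 record with
    SMF18MTI set exists for the same VOLSER no more than `window_seconds`
    (an illustrative value) before the delete.
    """
    immediate_times = defaultdict(list)   # volser -> times of immediate migrates
    for m in migrates:
        if m["SMF18MTI"]:
            immediate_times[m["volser"]].append(m["time"])

    migrate_immediate, ordinary = [], []
    for d in deletes:
        times = immediate_times.get(d["volser"], ())
        close = any(0 <= d["time"] - t <= window_seconds for t in times)
        (migrate_immediate if close else ordinary).append(d)
    return migrate_immediate, ordinary
```

Reference age statistics can then be computed on the ordinary deletes alone, so that immediate deletes do not pull the average down.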
Key Performance Metrics Tape Volume Age
Recommendations Report and track application service levels Host data transfer is good workload measure Virtual mount time (average and maximum) is good response measure Keep history Vary period and interval size with need –Longer (e.g. monthly) for trends –Shorter (e.g. daily) for glitches –Hour is good interval for most purposes
Summary Many existing analysis tools useful Some valuable data only available in unique VT records Packages probably needed for VT metrics VT will become more common and important
References WP IBM Magstar Model B16 Virtual Tape Server Elements of Performance, IBM WP VTS and Logical Paths, IBM WP IBM Magstar 3494 Peer-to-peer VTS Performance White Paper, IBM WP IBM Magstar 3494 Virtual Tape Server Performance White Paper, IBM WP IBM TotalStorage Virtual Tape Server Performance, IBM WP IBM TotalStorage Peer-to-peer Virtual Tape Server Performance, IBM SG IBM Virtual Tape Server & Enhancements to Magstar, IBM SG IBM Magstar VTS: Planning, Implementing, and Monitoring, IBM SG Guide to Sharing and Partitioning IBM Tape Library Data Servers, IBM SC OS/390 RMF Report Analysis, IBM GC OS/390 System Management Facilities (SMF), IBM
References Cheryl Watson’s Tuning Letter 2000, No. 5 - Focus: VTS Measurements Cheryl Watson’s Tuning Letter 2001, No. 6 - Focus: StorageTek VSM Introduction to Virtual Storage Manager, StorageTek StorageTek Virtual Tape Control System - Installation, Configuration & Administration Guide, StorageTek PerfMan for Tape Libraries, ISM PerfMan for OS/390, ISM CMG Proceedings Share Proceedings Vendor Web sites