Developing Highly Available Multipath Solutions and Device-Specific Modules

Developing Highly Available Multipath Solutions and Device-Specific Modules
Jaivir Aithal
Senior Software Development Engineer, Device & Storage Technologies
jaivira@microsoft.com

AGENDA
  - Microsoft Multipath IO (MPIO)
      - Deployment and Configuration
  - Key Enhancements for Windows Server 2008 R2
      - Configuration in the absence of storage
      - Performance optimizations
      - Health monitoring
  - Best Practices for MPIO tuning
      - Registry settings
  - Tips & Tricks for Device-Specific Module (DSM) writers
      - How to get a DSM to best work with MPIO's management UI
      - Lessons learned through the Microsoft DSM (MSDSM)
      - Common pitfalls for DSM writers and tips for how to address them

MPIO Deployment and Configuration
  - MPIO Optional Component (OC)
      - Using dism.exe:
            dism /online /quiet /enable-feature:MultipathIo
  - Claiming DSM Support
      - Using MSDSM vs. Vendor DSM
          - SPC-3 compliance
          - Migration requirements
  - Registry restrictions
      - HKLM\System\CurrentControlSet\Services\<DSM>\Parameters
  - x86 vs. x64
  - System class vs. SCSI Adapter class
  - Driver signing

DSM Installation
[Flowchart: decision nodes and actions, listed in the order they appear on the slide]
  - Determine OS: Server 2008 or upwards?
  - Use HardwareID Root\MPIO
  - MPIO installed?
  - Windows Server 2008?
  - Use HardwareID Detected\MPIO
  - Enable Optional Component using DISM
  - Install MPIO, DSM & MPDEV
  - Enable Optional Component using PKGMGR
  - Restart storage stack
  - Install only the DSM

Enabling Pre-Configuration
  - Problem Definition
      - Ability to configure multipath settings without requiring external storage to be physically attached
  - Scenarios
      - Datacenter automation (preconfigure servers, connect storage later)
      - Configuration utility that sets tunables
      - Management utility that sets operation settings
  - Architecture changes
      - WMI registration by MPIO Control object (FDO)
      - WMI registration piggy-backing on pseudo-LUN (PDO)
  - Supported only on Windows Server 2008 R2 and upwards

DSM Changes Required
Implementation Details
  - MOF changes
      - Distinguish DSM-centric classes from Device-centric ones
      - Split WMI classes into two files to avoid common mistakes
      - Generate the binary data during compile time
      - Remember to specify the resource name of the new binary MOF
  - Registration details
      - Update DsmType to DsmType5
      - Pass the structure size as the size of the updated DSM_INIT_DATA
      - Specify DSM-centric WMI GUIDs using DsmWmiGlobalInfo
      - Continue specifying Device-centric GUIDs using DsmWmiInfo

DSM-centric MOF example - msdsmdsm.mof

    // This is information that should be available even if no storage is physically present.
    //
    // Example: Supported devices list class.
    [WMI, Dynamic, Provider("WmiProv"),
     Description("Retrieve MSDSM's supported devices list.") : amended,
     Locale("MS\\0x409"),
     guid("{c362d67c-371e-44d8-8bba-044619e4f245}")]
    class MSDSM_SUPPORTED_DEVICES_LIST
    {
        [key, read] string InstanceName;
        [read] boolean Active;

        [WmiDataId(1), read, Description("Number of supported devices.") : amended]
        uint32 NumberDevices;

        [WmiDataId(2)]
        uint32 Reserved;

        [WmiDataId(3), read, MaxLen(31),
         Description("Array of device hardware identifiers.") : amended,
         WmiSizeIs("NumberDevices")]
        string DeviceId[];
    };

Device-centric MOF example - msdsm.mof

    // This is information that pertains to a specific instance of the device. Here's an example:
    //
    // Embedded basic-statistics class.
    [WMI, guid("{a34d03ec-6b0b-46a1-9178-82525f41133f}")]
    class MSDSM_DEVICEPATH_PERF
    {
        [WmiDataId(1)] uint64 PathId;
        [WmiDataId(2)] uint32 NumberReads;
        [WmiDataId(3)] uint32 NumberWrites;
        [WmiDataId(4)] uint64 BytesRead;
        [WmiDataId(5)] uint64 BytesWritten;
    };

    // Statistics provider class
    [WMI, Dynamic, Provider("WmiProv"),
     Description("Retrieve MSDSM Performance Information.") : amended,
     Locale("MS\\0x409"),
     guid("{875b8871-4889-4114-93f6-cd064c001cea}")]
    class MSDSM_DEVICE_PERF
    {
        [key, read] string InstanceName;
        [read] boolean Active;

        [WmiDataId(1), read, Description("Number of paths.") : amended]
        uint32 NumberPaths;

        [WmiDataId(2), read,
         Description("Array of Performance Information per path for the device.") : amended,
         WmiSizeIs("NumberPaths")]
        MSDSM_DEVICEPATH_PERF PerfInfo[];
    };

DSM WMI Registration

    typedef struct _DSM_INIT_DATA {
        // Size, in bytes.
        ULONG InitDataSize;

        // DSM entry points.
        DSM_INQUIRE_DRIVER DsmInquireDriver;
        . . .
        DSM_BROADCAST_SRB DsmBroadcastSrb;

        // Wmi entry point and guid information.
        DSM_WMILIB_CONTEXT DsmWmiInfo;

        // Version 2 starts here...
        DSM_TYPE DsmType;

        // Version 5 starts here...
        // Wmi entry point and guid information for DSM-centric classes.
        DSM_WMILIB_CONTEXT DsmWmiGlobalInfo;
    } DSM_INIT_DATA, *PDSM_INIT_DATA;

DSM-centric WMI Registration

    // DsmTypeUnknown == mustn't be used.
    // DsmType1 == first version
    // DsmType2 == indicates that DSM uses InterpretErrorEx() and handles WMI calls with
    //             DSM_IDS passed in as extra parameter
    // DsmType3 == indicates that DSM handles cases where completion routine can be called
    //             with NULL DsmId
    // DsmType4 == indicates that DSM provides version info
    // DsmType5 == indicates that DSM provides additional DSM-centric (global) WMI classes
    // DsmType6 == not used
    typedef enum _DSM_TYPE {
        DsmTypeUnknown = 0,
        DsmType1,
        DsmType2,
        DsmType3,
        DsmType4,
        DsmType5,
        DsmType6
    } DSM_TYPE, *PDSM_TYPE;

    #define DSM_INIT_DATA_TYPE_1_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, Reserved))
    #define DSM_INIT_DATA_TYPE_2_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, DsmType))
    #define DSM_INIT_DATA_TYPE_3_SIZE DSM_INIT_DATA_TYPE_2_SIZE
    #define DSM_INIT_DATA_TYPE_4_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, DsmVersion))
    #define DSM_INIT_DATA_TYPE_5_SIZE (sizeof(DSM_INIT_DATA))
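The size macros above version a structure by "size through the last field of that version". The following is a minimal user-mode sketch of that idea, not the MPIO headers: `mock_init_data`, its fields, `SIZEOF_THROUGH_FIELD` and both functions are hypothetical stand-ins, and the real DSM_INIT_DATA layout differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DSM_INIT_DATA: each version appends fields. */
typedef struct mock_init_data {
    uint32_t init_data_size;   /* size the caller claims to have filled in */
    uint32_t reserved;         /* last field of the version 1 layout */
    uint32_t type;             /* added by version 2 (version 3 adds nothing) */
    uint32_t version;          /* added by version 4 */
    uint64_t global_wmi_info;  /* added by version 5 */
} mock_init_data;

/* User-mode equivalent of the kernel's RTL_SIZEOF_THROUGH_FIELD. */
#define SIZEOF_THROUGH_FIELD(type, field) \
    (offsetof(type, field) + sizeof(((type *)0)->field))

#define MOCK_TYPE_1_SIZE SIZEOF_THROUGH_FIELD(mock_init_data, reserved)
#define MOCK_TYPE_2_SIZE SIZEOF_THROUGH_FIELD(mock_init_data, type)
#define MOCK_TYPE_4_SIZE SIZEOF_THROUGH_FIELD(mock_init_data, version)
#define MOCK_TYPE_5_SIZE sizeof(mock_init_data)

/* Caller side: claim the newest layout by passing the full structure size. */
void mock_init_latest(mock_init_data *data)
{
    assert(data != NULL);
    data->init_data_size = (uint32_t)MOCK_TYPE_5_SIZE;
    data->reserved = 0;
    data->type = 5;
    data->version = 0;
    data->global_wmi_info = 0;
}

/* Consumer side: the highest layout the claimed size covers, 0 if none.
 * Versions 2 and 3 share a size, so the size alone reports 2 for both. */
int mock_layout_version(size_t init_data_size)
{
    if (init_data_size >= MOCK_TYPE_5_SIZE) return 5;
    if (init_data_size >= MOCK_TYPE_4_SIZE) return 4;
    if (init_data_size >= MOCK_TYPE_2_SIZE) return 2;
    if (init_data_size >= MOCK_TYPE_1_SIZE) return 1;
    return 0;
}
```

Because versions 2 and 3 are the same size, a consumer of such a structure still needs the explicit type field to tell them apart, which is presumably why the slide carries both DsmType and InitDataSize.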

Performance Enhancements
  - Improvements in Core MPIO stack
      - Elimination of unnecessary use of spinlocks
      - Conversion of a spinlock into a reader-writer lock
      - Minimizing unnecessary memory write operations
      - Re-laying out members of a data structure to minimize CPU reads
  - MSDSM enhancements
      - Make gathering statistics optional
      - Eliminate unnecessary use of processor-intensive operations
      - New load balance policy: Least Blocks
  - Performance Gains in the MPIO stack (i.e. mpio.sys and msdsm.sys)
      - Preliminary results indicate up to 15% improvement on certain configurations under certain loads. (However, pre-Beta and Beta builds might not indicate what is expected for RTM performance numbers.)

MPIO Health Monitoring
  - Common interface for basic statistical data
  - Querying interface is WMI
  - Granularity at three levels
      - LUN
      - Path
      - Device Instance (i.e. LUN-Path pairing)
  - Health packets maintained even after monitored entity has gone offline
  - Potential advantages
      - Improve diagnosability
      - Reduce DSM's overhead for maintaining these counts
      - Consumers can implement custom triggers
      - Consistent interface for management applications, regardless of underlying DSM

MPIO Health Monitoring
[Diagram: MPIO tracks Reads, Writes, Path Failures, Retries and IO Errors; Consumer A and Consumer B each receive a WMI event.]

Health Monitoring WMI Class For LUN

    // Embedded Disk Health Class
    [WMI, guid("{6453c476-0499-42ab-9825-5133282b0b56}")]
    class MPIO_DISK_HEALTH_CLASS
    {
        [WmiDataId(1), read, Description("Number of read requests sent to this device.") : amended]
        uint64 NumberReads;
        [WmiDataId(2), read, Description("Number of write requests sent to this device.") : amended]
        uint64 NumberWrites;
        [WmiDataId(3), read, Description("Cumulative number of bytes read by requests sent to this device.") : amended]
        uint64 NumberCharsRead;
        [WmiDataId(4), read, Description("Cumulative number of bytes written by requests sent to this device.") : amended]
        uint64 NumberCharsWritten;
        [WmiDataId(5), read, Description("Number of requests sent to this device that were retried.") : amended]
        uint64 NumberRetries;
        [WmiDataId(6), read, Description("Number of requests sent to this device that failed.") : amended]
        uint64 NumberIoErrors;
        [WmiDataId(7), read, Description("System time at which this health packet was created for this device.") : amended]
        uint64 CreateTime;
        [WmiDataId(8), read, Description("Number of path failures experienced by this device.") : amended]
        uint64 PathFailures;
        [WmiDataId(9), read, Description("System time at which this device went offline/failed.") : amended]
        uint64 FailTime;
        [WmiDataId(10), read, Description("Flag that indicates if the device is offline/failed.") : amended]
        boolean DeviceDisabled;
        [WmiDataId(11), read, Description("Count of the number of times that the NumberReads field wrapped.") : amended]
        uint8 NumberReadsWrap;

Health Monitoring WMI Class For LUN - Contd.

        [WmiDataId(12), read, Description("Count of the number of times that the NumberWrites field wrapped.") : amended]
        uint8 NumberWritesWrap;
        [WmiDataId(13), read, Description("Count of the number of times that the NumberCharsRead field wrapped.") : amended]
        uint8 NumberCharsReadWrap;
        [WmiDataId(14), read, Description("Count of the number of times that the NumberCharsWritten field wrapped.") : amended]
        uint8 NumberCharsWrittenWrap;
        [WmiDataId(15), read]
        uint8 Pad1[3];
    };

    // Provider Health Information Class
    [WMI, Dynamic, Provider("WmiProv"),
     Description("MPIO Psuedo-LUN Health Information.") : amended,
     Locale("MS\\0x409"),
     guid("{ef04568a-782b-443c-a3db-966ab43775f9}")]
    class MPIO_DISK_HEALTH_INFO
    {
        [key, read] string InstanceName;
        [read] boolean Active;

        [WmiDataId(1), read, Description("Number of Psuedo-LUN Health Packets.") : amended]
        uint32 NumberPlPackets;
        [WmiDataId(2), read, Description("Reserved for future use.") : amended]
        uint32 Reserved;
        [WmiDataId(3), read, Description("MPIO Pseudo-LUN Health Info Array.") : amended,
         WmiSizeIs("NumberPlPackets")]
        MPIO_DISK_HEALTH_CLASS PlHealthPackets[];
    };
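A management application that implements triggers on these counters has to account for the wrap fields. The sketch below shows one way a consumer might compute the change between two samples of a counter such as NumberReads and its NumberReadsWrap companion. It assumes the 64-bit counter wraps modulo 2^64 and the 8-bit wrap count modulo 256; `health_sample` and `health_delta` are hypothetical names, not part of any MPIO interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One sampled counter, e.g. NumberReads with NumberReadsWrap. */
typedef struct health_sample {
    uint64_t value;  /* 64-bit counter as read from WMI */
    uint8_t  wraps;  /* number of times the counter has wrapped */
} health_sample;

/* Computes later - earlier into *delta.
 * Returns false when the true difference does not fit in 64 bits or the
 * two samples are inconsistent (for example, the counters were reset). */
bool health_delta(health_sample earlier, health_sample later, uint64_t *delta)
{
    assert(delta != NULL);

    /* Unsigned 8-bit subtraction handles the wrap count itself rolling over. */
    uint8_t wrap_diff = (uint8_t)(later.wraps - earlier.wraps);

    if (wrap_diff == 0 && later.value >= earlier.value) {
        *delta = later.value - earlier.value;
        return true;
    }
    if (wrap_diff == 1 && later.value < earlier.value) {
        /* Exactly one wrap: unsigned subtraction modulo 2^64 is the delta. */
        *delta = later.value - earlier.value;
        return true;
    }
    return false;
}
```

A trigger such as "more than N retries since the last poll" can then compare `*delta` against a threshold and treat a `false` return as "resample before alerting".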

Health Monitoring WMI Classes - Path & Device Instance
  - Path Health Information
      - Embedded class: MPIO_PATH_HEALTH_CLASS
      - Provider class: MPIO_PATH_HEALTH_INFO
  - Device Instance Health Information
      - Embedded class: MPIO_DEVINSTANCE_HEALTH_CLASS
      - Provider class: MPIO_DEVINSTANCE_HEALTH_INFO
  - Health Packet Cleanup
      - Registry value: FlushHealthInterval
      - Default: 24 hours
  - Turning OFF Health Monitoring
      - Registry value: GatherHealthStats
      - Default: TRUE (i.e. ON)

MPIO Health Reporting – Example 1 Path Health, Disk (pseudo-LUN) Health and DeviceInstance Health Statistics

MPIO Health Reporting - Example 2
Health Statistics output after the user-specified "Health Flush" period has expired and the "orphan" Health packets (associated with failed path 000000077030001) have been discarded.

MPIO Configuration Snapshot
  - Uses existing WMI classes
  - Exports the existing MPIO configuration to a text file
  - Can be used by administrators for troubleshooting
  - Can be used by DSM writers during development and testing phases
  - Information written to a file in reverse chronological order (i.e. history maintained)
  - Default output file used: HKLM\System\CurrentControlSet\Services\mpio\Parameters, DefaultConfigOutputFile

MPIO Tunables

  Tunable                  MIN     MAX        DEFAULT
  PathVerifyEnabled        FALSE   TRUE       FALSE
  PathVerificationPeriod   0       MAXULONG   30s
  RetryCount               0       500        3
  RetryInterval            0       MAXULONG   1s
  PDORemovePeriod          0       MAXULONG   20s

[Animated diagram: IRPs flow from the Application through NTFS and DISK to MPIO and the DSM, then to HBA 0 / HBA 1 (Adapter 0 / Adapter 1, on PCI), which expose the same LUN over paths A and B as DsmID(0) and DsmID(1). Annotations on the diagram:]
  - PathVerifyEnabled: PathVerify is sent when PathVerificationPeriod expires
  - InterpretError returns Retry = TRUE: the IRP is retried while the number of times retried <= RetryCount
  - LUN continues residing in memory, waiting for a path to come back online, until PDORemovePeriod expires
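A configuration utility that sets these tunables may want to check values against the table before writing them to the registry. The sketch below encodes only the defaults and limits listed above; the struct, field names and functions are hypothetical, and it does not show how MPIO itself treats out-of-range values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only RetryCount has a maximum below MAXULONG in the table. */
#define RETRY_COUNT_MAX 500u

/* Hypothetical in-memory view of the five tunables (periods in seconds). */
typedef struct mpio_tunables {
    bool     path_verify_enabled;
    uint32_t path_verification_period_s;
    uint32_t retry_count;
    uint32_t retry_interval_s;
    uint32_t pdo_remove_period_s;
} mpio_tunables;

/* Returns the DEFAULT column of the table. */
mpio_tunables mpio_tunables_defaults(void)
{
    mpio_tunables t = { false, 30u, 3u, 1u, 20u };
    return t;
}

/* Checks values against the MIN and MAX columns. Every uint32_t already
 * lies in 0..MAXULONG, so RetryCount is the only field that can fail. */
bool mpio_tunables_valid(const mpio_tunables *t)
{
    assert(t != NULL);
    return t->retry_count <= RETRY_COUNT_MAX;
}
```

Keeping the limits in one place lets a utility reject a bad value before it touches the registry instead of discovering it after a restart of the storage stack.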

DSM Tips & Tricks

Getting a DSM To Work With MPIO UI
  - DSM needs to implement Version 2 (i.e. xyz_V2) of the WMI classes defined in mpioLBPo.mof
  - DSM needs to implement DSM_QuerySupportedLBPolicies_V2 at the very least
  - DSM should preferably implement DsmType4 requirements

    // List of supported GUIDs
    GUID DSM_QuerySupportedLBPoliciesV2GUID = DSM_QuerySupportedLBPolicies_V2Guid;
    ...
    #define DSM_QuerySupportedLBPoliciesV2GUID_Index 0

    WMIGUIDREGINFO DsmGuidList[] = {
        {&DSM_QuerySupportedLBPoliciesV2GUID, 1, 0},
    };

    NTSTATUS DriverEntry(. . .)
    {
        DSM_INIT_DATA dsmInitData;

        // Get DSM's version information
        DsmpGetVersion(&dsmInitData.DsmVersion);

        // Set up the init data
        dsmInitData.InitDataSize = DSM_INIT_DATA_TYPE_4_SIZE;
        dsmInitData.DsmType = DsmType4;

        // Send the IOCTL to mpio.sys to register.
        DsmSendDeviceIoControlSynchronous(IOCTL_MPDSM_REGISTER, ..., &dsmInitData);
    }

Getting MPIO UI To Restrict Allowable Path States
  - DSM needs to implement DSM_QueryLBPolicy_V2
  - DSM should return an OR'd flag of the supported path states in MPIO_DSM_Path_V2's Reserved field

    // List of supported GUIDs
    GUID DSM_QueryLBPolicyV2GUID = DSM_QueryLBPolicy_V2Guid;
    ...
    #define DSM_QueryLBPolicyV2GUID_Index 0

    WMIGUIDREGINFO DsmGuidList[] = {
        {&DSM_QueryLBPolicyV2GUID, 1, 0},
    };

    NTSTATUS DsmQueryData(...)
    {
        if (GuidIndex == DSM_QueryLBPolicyV2GUID_Index) {
            PDSM_Load_Balance_Policy_V2 LBPolicy =
                &(((PDSM_QueryLBPolicy_V2)Buffer)->LoadBalancePolicy);

            for (ULONG inx = 0; inx < DsmIds->Count; inx++) {
                LBPolicy->DSM_Paths[inx]->Reserved = DSM_STATE_ACTIVE_OPTIMIZED_SUPPORTED;

                // Depending on supported states, OR them in
                if (activeUnoptimizedSupported) {
                    LBPolicy->DSM_Paths[inx]->Reserved |= DSM_STATE_ACTIVE_UNOPTIMIZED_SUPPORTED;
                }
                ...
            }
        }
        ...
    }

Ensuring DSM Works With Virtual Disk Service
  - It is mandatory for a DSM to implement DSM_QuerySupportedLBPolicies_V2 at a minimum
  - To work with VDS, the DSM must also implement DSM_QueryUniqueId

    // List of supported GUIDs
    GUID DSM_QuerySupportedLBPoliciesV2GUID = DSM_QuerySupportedLBPolicies_V2Guid;
    GUID DSM_QueryDsmUniqueIdGUID = DSM_QueryUniqueIdGuid;
    ...
    #define DSM_QuerySupportedLBPoliciesV2GUID_Index 0
    #define DSM_QueryDsmUniqueIdGUID_Index 1

    WMIGUIDREGINFO DsmGuidList[] = {
        {&DSM_QuerySupportedLBPoliciesV2GUID, 1, 0},
        {&DSM_QueryDsmUniqueIdGUID, 1, 0},
    };

    NTSTATUS DsmQueryData(...)
    {
        if (GuidIndex == DSM_QueryDsmUniqueIdGUID_Index) {
            PDSM_QueryUniqueId dsmQueryUniqueId = Buffer;

            // Ensure that the 64-bit returned value will be unique
            dsmQueryUniqueId->DsmUniqueId = (ULONGLONG)((ULONG_PTR)DsmContext);
        }
        ...
    }

Avoiding Immediate LUN Tear-down Post-Initialization
  - It is possible that Port, Bus and Target are all zero for a given device instance, based on its location information
  - If the DSM generates PathId (in SetDeviceInfo) from the SCSI address, it needs to ensure that PathId is not returned as zero (NULL) in such cases

    NTSTATUS DsmSetDeviceInfo(
        __in IN PVOID DsmContext,
        __in IN PDEVICE_OBJECT TargetObject,
        __in IN PVOID DsmId,
        __inout IN OUT PVOID *PathId
        )
    {
        PDSM_DEVICE_INFO deviceInfo = DsmId;
        PSCSI_ADDRESS scsiAddress = deviceInfo->ScsiAddress;

        // It is possible that Port, Bus and Target are all zero.
        // Ensure that the returned PathId is never zero (since MPIO
        // will treat that as NULL)
        pathId = DSM_PATHID_PREFIX;
        pathId <<= 8;
        pathId |= scsiAddress->PortNumber;  // Port
        pathId <<= 8;
        pathId |= scsiAddress->PathId;      // Bus
        pathId <<= 8;
        pathId |= scsiAddress->TargetId;    // Target

        *PathId = ((PVOID)((ULONG_PTR)(pathId)));
        ...
        return status;
    }
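The packing can be tried out in user mode. This is a minimal sketch, assuming a prefix value of 0x77; that value is inferred from the example path 000000077030001 in the health reporting example, not taken from any header. `make_path_id` and `PATHID_PREFIX` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed non-zero prefix, inferred from the example path 000000077030001. */
#define PATHID_PREFIX 0x77u

/* Packs prefix, port, bus and target into one byte each, so the result is
 * non-zero even when port, bus and target are all zero. */
uintptr_t make_path_id(uint8_t port, uint8_t bus, uint8_t target)
{
    uintptr_t id = PATHID_PREFIX;
    id = (id << 8) | port;    /* Port */
    id = (id << 8) | bus;     /* Bus */
    id = (id << 8) | target;  /* Target */
    assert(id != 0);          /* a zero id would be treated as NULL */
    return id;
}
```

Each component gets its own byte, so two device instances that differ only in bus or target still produce different path identifiers.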

Avoiding Bogus Path Flagging On Path Recovery
  - DSM should ensure that an Active/Optimized (A/O) path is NOT returned for handling I/O until SetDeviceInfo, PathVerify, and IsPathActive have been called
  - Until then, return a non-Active/Optimized path and handle any check conditions this causes

    BOOLEAN DsmIsPathActive(...)
    {
        ...
        // Set a flag that IsPathActive was successfully called
        deviceInfo->Usable = TRUE;
        return TRUE;
    }

    PVOID DsmLBGetPath(...)
    {
        for (inx = 0; inx < DsmList->Count; inx++) {
            deviceInfo = DsmList->IdList[inx];

            // Don't consider paths that aren't yet usable
            if (deviceInfo->Usable == FALSE) continue;

            // Find the best candidate to return, even if not in A/O
            // Prefer: Active/Unoptimized > StandBy > Unavailable
            pathId = DsmpCheckIfIsBetterCandidatePath(deviceInfo,...);
        }
        return pathId;
    }

    ULONG DsmInterpretErrorEx(...,PBOOLEAN Retry,PLONG RetryInterval)
    {
        // If SenseData indicates non-A/O path was chosen, retry IO
        if (addSenseQ == 0xA || addSenseQ == 0xB || addSenseQ == 0xC) {
            *Retry = TRUE;
            *RetryInterval = ALUA_STATE_CHANGE_TIME_TAKEN;
        }
        ...
    }
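The selection rule in DsmLBGetPath (skip paths that are not yet usable, then prefer the best remaining access state) can be sketched in user mode as follows. The types and the function are hypothetical simplifications; a real DSM works on its own device-info structures and load-balance policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Access states ordered so that a lower value is preferred. */
typedef enum path_state {
    PATH_ACTIVE_OPTIMIZED = 0,
    PATH_ACTIVE_UNOPTIMIZED = 1,
    PATH_STANDBY = 2,
    PATH_UNAVAILABLE = 3
} path_state;

/* Hypothetical per-path record. */
typedef struct path_info {
    uintptr_t  path_id;  /* non-zero path identifier */
    path_state state;
    bool       usable;   /* set once IsPathActive has succeeded */
} path_info;

/* Returns the id of the usable path with the most preferred state,
 * or 0 when no path is usable yet. */
uintptr_t pick_best_path(const path_info *paths, size_t count)
{
    assert(paths != NULL || count == 0);

    const path_info *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (!paths[i].usable) {
            continue;  /* not yet through SetDeviceInfo/PathVerify/IsPathActive */
        }
        if (best == NULL || paths[i].state < best->state) {
            best = &paths[i];
        }
    }
    return best != NULL ? best->path_id : 0;
}
```

With this rule, a recovering Active/Optimized path is ignored until it is marked usable, and I/O keeps flowing on the best alternative in the meantime.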

Handling IO In The Absence of Active/Optimized Path
  - Due to path failure or implicit state transition, there may be no path in Active/Optimized state to return back to MPIO
  - Return a non-A/O path. The request may fail with a check condition suggesting state is StandBy, Unavailable or Transitioning
  - Use InterpretErrorEx to retry after a specific period of time, with the DSM_RETRY_DONT_DECREMENT flag to prevent the request from being failed back due to retries getting exhausted

    PVOID DsmLBGetPath(...)
    {
        ...
        // Find the best candidate to return, even if not in A/O
        return DsmpFindBestCandidatePath(...);
    }

    ULONG DsmInterpretErrorEx(...,PBOOLEAN Retry,PLONG RetryInterval,...)
    {
        // If SenseData indicates access state changed, or implicit
        // transition failed, or TPG in non-active state, retry IO
        if ((sKey==0x6 && addSn==0x2A && (aSQ==0x6 || aSQ==0x7)) ||
            (sKey==2 && addSn==4 && (aSQ==0xA || aSQ==0xB || aSQ==0xC))) {
            sendTPG = TRUE;
            *Retry = TRUE;
            *RetryInterval = ALUA_STATE_CHANGE_TIME_TAKEN;
            errorMask = DSM_RETRY_DONT_DECREMENT;
        }

        // Send an RTPG asynchronously to get updated TPG states
        // If explicit-only transitions supported, this routine will
        // send an STPG first to make one of the TPGs Active/Optimized
        if (sendTPG) DsmpSetPathForIoRetryALUA(...);

        return errorMask;
    }
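The sense-data test in DsmInterpretErrorEx can be exercised outside the kernel. The sketch below applies the same key/ASC/ASCQ combinations to a fixed-format sense buffer (sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13, per SPC-3). The function name is hypothetical, and descriptor-format sense data is deliberately not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when fixed-format sense data reports one of the ALUA
 * conditions the slide retries on:
 *   06h/2Ah/06h  asymmetric access state changed
 *   06h/2Ah/07h  implicit asymmetric access state transition failed
 *   02h/04h/0Ah  asymmetric access state transition in progress
 *   02h/04h/0Bh  target port in standby state
 *   02h/04h/0Ch  target port in unavailable state */
bool sense_is_alua_retry(const uint8_t *sense, size_t len)
{
    assert(sense != NULL);

    if (len < 14) {
        return false;  /* too short to hold ASC and ASCQ */
    }
    uint8_t response_code = sense[0] & 0x7F;
    if (response_code != 0x70 && response_code != 0x71) {
        return false;  /* not fixed-format sense data */
    }

    uint8_t key  = sense[2] & 0x0F;
    uint8_t asc  = sense[12];
    uint8_t ascq = sense[13];

    if (key == 0x06 && asc == 0x2A && (ascq == 0x06 || ascq == 0x07)) {
        return true;   /* UNIT ATTENTION: access state changed */
    }
    if (key == 0x02 && asc == 0x04 &&
        (ascq == 0x0A || ascq == 0x0B || ascq == 0x0C)) {
        return true;   /* NOT READY: port group not in an active state */
    }
    return false;
}
```

Keeping the classification in one small function makes it easy to table-test against the sense codes an array actually returns during controller failover.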

Reducing ALUA Storage Device Initialization Time
  - ReportTargetPortGroups (RTPG) can be used to implement PathVerify for ALUA storage
  - The DSM will likely be sending down RTPG during DsmInquire
  - Do NOT send it again for the first PathVerify received after SetDeviceInfo is called

    NTSTATUS DsmInquire(...)
    {
        PDSM_DEVICE_INFO deviceInfo; // Represents this DeviceInstance
        ...
        // For ALUA storage, get the Target Port Groups (TPG) info
        status = DsmpReportTargetPortGroups(TargetDevice, ...);
        if (NT_SUCCESS(status))
            deviceInfo->IgnorePathVerify = TRUE;
        return status;
    }

    NTSTATUS DsmPathVerify(...)
    {
        // If storage is ALUA, and this is the first time PathVerify
        // is being called, we may be able to skip doing it
        if (deviceInfo->ALUASupport != DSM_DEVINFO_ALUA_NOT_SUPPORTED) {
            if (deviceInfo->IgnorePathVerify == TRUE) {
                status = STATUS_SUCCESS;
                // From now on, we should send PathVerify if asked to
                deviceInfo->IgnorePathVerify = FALSE;
            }
        }
        ...
    }

Avoid Preventing Cluster Disk Resource Coming Online
  - It is possible that a Persistent Reservation (PR) command (like PR_OUT for Register) is failing with a "retry-able" error
  - Retry the PR command down the same path for retry-able errors

    ULONG DsmCategorizeRequest(...)
    {
        if (DsmpReservationCommand(Irp, Srb)) return DSM_WILL_HANDLE;
        ...
    }

    NTSTATUS DsmSrbDeviceControl(...)
    {
        if (opCode == SCSIOP_PERSISTENT_RESERVE_OUT) {
            status = DsmpPersistentReserveOut(...);
        }
        ...
    }

    NTSTATUS DsmpPersistentReserveOut(...)
    {
        if (serviceAction == RESERVATION_ACTION_RESERVE) {
    __RetryRequest:
            status = DsmSendRequest(...);
            if (!NT_SUCCESS(status)) {
                if (Srb->SrbStatus & SRB_STATUS_AUTOSENSE_VALID &&
                    Srb->SrbStatus & SRB_STATUS_ERROR &&
                    Srb->ScsiStatus == SCSISTAT_CHECK_CONDITION) {
                    // check if the error is retry-able
                    if (DsmpShouldRetryPRcommand(senseData)) {
                        goto __RetryRequest;
                    }
                }
            }
        }
        ...
    }
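The slide's goto loop retries for as long as the error stays retry-able. The following user-mode sketch shows the same "retry down the same path" pattern with an attempt cap added; the cap is an assumption of this sketch, not something the slide specifies, and every name here (`send_result`, `send_fn`, `send_with_retry`, `demo_flaky_send`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum send_result {
    SEND_OK,         /* command succeeded */
    SEND_RETRYABLE,  /* check condition with a retry-able sense code */
    SEND_FATAL       /* any other failure */
} send_result;

/* Sends the command once down a fixed path; ctx identifies that path. */
typedef send_result (*send_fn)(void *ctx);

/* Re-issues the command on the same path while the error is retry-able,
 * up to max_attempts tries. Reports the number of tries made. */
send_result send_with_retry(send_fn send, void *ctx,
                            unsigned max_attempts, unsigned *attempts_used)
{
    assert(send != NULL && attempts_used != NULL && max_attempts > 0);

    send_result result = SEND_FATAL;
    *attempts_used = 0;
    while (*attempts_used < max_attempts) {
        result = send(ctx);
        (*attempts_used)++;
        if (result != SEND_RETRYABLE) {
            break;  /* success or a non-retry-able error ends the loop */
        }
    }
    return result;
}

/* Demo sender: fails with a retry-able error until *remaining hits zero. */
send_result demo_flaky_send(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return SEND_RETRYABLE;
    }
    return SEND_OK;
}
```

The same context pointer is passed on every attempt, which mirrors the slide's point that the PR command is retried down the same path and not failed over to another one.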

Ensuring DSM Can Be Uninstalled Using MPIOCPL
  - The control panel applet assumes that the DSM provides an Uninstall point in its INF
  - Moreover, the assumption is that the uninstall point is specified as DefaultUninstall

    ...
    [Contoso_Install.Services]
    AddService=contosodsm,%SPSVCINST_ASSOCSERVICE%,Contosodsm_Service

    [Contosodsm_Service]
    AddReg = Contosodsm_Addreg

    [Contosodsm_Addreg]
    HKR, Parameters, DsmSupportedDeviceList, %REG_MULTI_SZ%,\
        "Vendor 8Product 16"
    ; The following cannot be grouped (as above)
    HKLM, SYSTEM\CurrentControlSet\Control\MPDEV,\
        MPIOSupportedDeviceList, %REG_MULTI_SZ_APPEND%, "Vendor 8Product 16"

    ; Uninstall Section
    [DefaultUninstall]
    DelReg = Contosodsm_Delreg

    [DefaultUninstall.Services]
    DelService = contosodsm

    [Contosodsm_Delreg]
    HKLM, SYSTEM\CurrentControlSet\Control\MPDEV, MPIOSupportedDeviceList, %REG_MULTI_SZ_DELETE%, "Vendor 8Product 16"

Ensuring DSM Is Presented a Device before MSDSM
  - When registering with MPIO, ensure that the DSM_INIT_DATA Reserved field is set to zero
  - Specify the devices that the DSM supports in the INF, by updating the MPIOSupportedDeviceList

    NTSTATUS DriverEntry(...)
    {
        DSM_INIT_DATA dsmInitData;
        ...
        // Ensure this DSM is presented the device before MSDSM
        dsmInitData.Reserved = 0;

        // Send dsmInitData to mpio.sys via the IOCTL to register.
        DsmSendDeviceIoControlSynchronous(IOCTL_MPDSM_REGISTER, ...);
    }

    <File: CONTOSODSM.INF>
    [Contoso_Install.Services]
    AddService=contosodsm,%SPSVCINST_ASSOCSERVICE%,Contosodsm_Service

    [Contosodsm_Service]
    AddReg = Contosodsm_Addreg

    [Contosodsm_Addreg]
    HKR, Parameters, DsmSupportedDeviceList, %REG_MULTI_SZ%,\
        "Vendor 8Product 16"
    ; The following cannot be grouped (as above)
    HKLM, SYSTEM\CurrentControlSet\Control\MPDEV,\
        MPIOSupportedDeviceList, %REG_MULTI_SZ_APPEND%, "Vendor 8Product 16"

Call To Action
  - Revisit existing DSM WMI classes to determine whether the preconfiguration feature needs to be implemented
  - Assess whether any of the performance-related changes can be implemented in your DSM
  - Consider modifying management applications to implement new health WMI classes
      - Implement triggers
  - Implement Version 2 of the classes defined in mpioLBPo.mof
  - Test your storage with inbox MSDSM
  - Encourage adoption of SPC-3 ALUA for your storage

RESOURCES
  - Web Resources
      - Microsoft Storage Technologies - Multipath I/O
        http://www.microsoft.com/MPIO
      - SCSI Specifications (SPC-3), ratified version
        http://t10.org/ftp/t10/drafts/spc3/spc3r23.pdf
      - Microsoft Windows Server Failover Clustering (WSFC)
        http://www.microsoft.com/downloads/details.aspx?familyid=75566F16-627D-4DD3-97CB-83909D3C722B&displaylang=en
      - Windows Management Instrumentation on MSDN
        http://msdn.microsoft.com/en-us/library/aa394572.aspx
  - Contact Information (for feedback, future feature asks)
      - mpiopm@microsoft.com

Related Sessions

  Session                                                                       Day / Time
  iSCSI Management and Tuning                                                   Mon. 5:15-6:15 and Tues. 4-5
  Storport Smorgasbord                                                          Tues. 4-5 and Wed. 11-12
  Developing Highly Available Multipath Solutions and Device-Specific Modules   Wed. 1:30-2:30

Questions?