
WSV309

Agenda
Cluster Validate
What, why, and where to look
Scenario 1: CNO / VCO Recovery
Scenario 2: CSV Troubleshooting
Other Troubleshooting Items
Summary

New Validation Tests in R2
Cluster Configuration
o List Information (Core Group, Networks, Resources, Storage, Services and Applications)
o Validate Quorum Configuration
o Validate Resource Status
o Validate Service Principal Name
o Validate Volume Consistency
Network
o List Network Binding Order
o Validate Multiple Subnet Properties
System Configuration
o Validate Cluster Service and Driver Settings
o Validate Memory Dump Settings
o Validate OS Installation Options
o Validate System Drive Variable

Validate: Storage

Validate Tips

PowerShell

Where to find Cluster events

Operational Channel

New Diagnostic Logging
Capture snap-in pop-ups
o Even before cluster creation
New debug logging channels
o Disabled by default
o Enabled for advanced troubleshooting
Cluster.log converted to an ETW channel; it now appears in Event Viewer as well
Tip: Be sure to click View / Show Analytic and Debug Logs

Understanding Cluster Events
Every cluster event has been edited with improved descriptive text and error codes
Online troubleshooting steps for all cluster events:

Viewing Events Cluster Wide
Failover Cluster Manager provides an aggregated view of cluster events from all nodes.
Click “Recent Cluster Events” to see all errors and warnings, cluster-wide, from the last 24 hours.

Built-in Event Queries
The 'Actions' pane on the right-hand side of Failover Cluster Management has links that open filtered event views:
o Application level: events associated with all resources in the group
o Resource level: events related to that specific resource

Troubleshooting Tips

Cluster Debug Logging
All cluster debug logging is done to an event trace session: Microsoft-Windows-FailoverClustering
A Cluster.log file is no longer written continuously.
You must generate it manually to get a “snapshot in time”.

Configuring Debug Logging
Logging is enabled by default
Log files are stored as .ETL in: %WinDir%\System32\winevt\logs\Microsoft-Windows-FailoverClustering
Default log size is 100 MB
    Set-ClusterLog -Size 100
Default log level is 3
    Set-ClusterLog -Level 3

Cluster Output Levels
Level          Error  Warning  Info  Verbose  Debug
0 (disabled)
1              x
2              x      x
3 (default)    x      x        x
4              x      x        x     x
5              x      x        x     x        x
Note: the more verbose levels can have a performance impact.

How it works
An ETL file lasts for the uptime of a node
A new ETL file is used each time you restart the node
o When you restart, you move on to the next file. After you have restarted 3 times, you return to the first file.
Each ETL file has a log size of 100 MB and wraps on itself, but only within its own file
The Get-ClusterLog cmdlet merges all the .ETL logging data into a single contiguous text file
o The output can be confusing, and a common question is where the data went
(Diagram: the log rotates through ETL.001, ETL.002, and ETL.003, moving to the next file on each reboot.)
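The rotation described above can be sketched in a few lines of Python. This is only an illustration of the "three files, next file on each restart" rule from the slide; the function name and the `ETL.001`-style labels (taken from the slide diagram) are assumptions for the example, not the names of files on disk.

```python
ETL_FILE_COUNT = 3  # the slide describes three ETL files used in rotation


def etl_file_after_restarts(restarts: int) -> str:
    """Return the label of the ETL file in use after `restarts` node restarts.

    Hypothetical helper: 0 restarts means the first file is still in use,
    and after 3 restarts the rotation is back at the first file.
    """
    if restarts < 0:
        raise ValueError("restarts must be non-negative")
    index = restarts % ETL_FILE_COUNT + 1
    return f"ETL.{index:03d}"


if __name__ == "__main__":
    for n in range(5):
        print(n, etl_file_after_restarts(n))
```

The practical consequence is that data from a given uptime period is overwritten once the node has been restarted three more times, which is one reason a merged log can appear to be missing data.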

Troubleshooting Tips
The cluster log is verbose and complex!
o It should be the last place you go, not the first
Make sure your cluster.log captures at least 72 hours of data
o Mileage will vary depending on how noisy apps are
Cluster log timestamps are in GMT, while event log timestamps are in local time
Start at the bottom and work your way upwards, searching for:
o [ERR]
o -->failed
Use NET HELPMSG to decipher error codes
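Two of the tips above (search from the bottom for `[ERR]` and `-->failed`, and remember that cluster log timestamps are GMT) can be sketched as a small Python helper for a log that has already been generated as text. The timestamp layout in the regular expression is an assumption made for illustration, and the function names are hypothetical; adjust both to match the log you actually have.

```python
import re
from datetime import datetime, timezone

# Assumed timestamp layout, e.g. 2011/05/16-18:32:10.123 (illustrative only).
TIMESTAMP = re.compile(r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})")
MARKERS = ("[ERR]", "-->failed")  # the two search strings from the slide


def find_failures(lines):
    """Yield (utc_time_or_None, line) for matching lines, bottom of the log first."""
    for line in reversed(lines):
        if any(marker in line for marker in MARKERS):
            match = TIMESTAMP.search(line)
            when = None
            if match:
                when = datetime.strptime(
                    match.group(1), "%Y/%m/%d-%H:%M:%S.%f"
                ).replace(tzinfo=timezone.utc)  # cluster log times are GMT
            yield when, line.rstrip("\n")


def to_local(utc_time):
    """Convert a GMT cluster-log time to local time for comparison with event logs."""
    return utc_time.astimezone()


if __name__ == "__main__":
    sample = [
        "2011/05/16-18:32:09.000 INFO  [RES] resource is healthy",
        "2011/05/16-18:32:10.123 [ERR] disk went away",
        "2011/05/16-18:32:11.000 INFO  call -->failed",
    ]
    for when, text in find_failures(sample):
        print(to_local(when) if when else "(no timestamp)", text)
```

Converting to local time before comparing against Event Viewer avoids chasing events that appear to be hours apart but are in fact the same incident.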

What you need to know

demo CNO / VCO Recovery

Troubleshooting Tips

CSV in action
(Diagram: a VM running on Node 2 loses SAN connectivity to its VHD; its I/O is redirected via the network through the coordination node.)

What you need to know
Possible causes:
o One or more nodes have lost direct connection to the SAN/LUN
o A CSV-aware backup is in progress
o The volume was manually put into “Redirected access”

demo Troubleshooting Redirected Access

demo Troubleshooting hanging CSV accessibility

Troubleshooting Tips

Troubleshooting RHS Terminations
How clustering deals with unresponsive resources:
1. RHS makes calls to resources (IsAlive, LooksAlive, Online, Offline, Terminate, etc.)
2. If a resource does not respond, cluster health detection attempts to recover
3. The RHS process is restarted, so the resource can be restarted
Events generated:
Event 1230: Cluster resource 'Resource Name' (resource type '', DLL 'xxx.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
Event 1146: The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.

Troubleshooting RHS Terminations (cont)
The problem is that the resource did not respond to a cluster call within the timeout period.
What was the resource trying to do? Look for underlying core failures / events:
o Physical Disk: look for storage issues
o Network Name: look for networking issues
See these blogs for more details:
http://blogs.technet.com/askcore/archive/2009/11/23/resource-hosting-subsystem-rhs-in-windows-server-2008-failover-clusters.aspx

User Mode Problems Caught by Cluster
Bugcheck: USER_MODE_HEALTH_MONITOR (9e)
Clustering conducts health monitoring from kernel mode to a user mode process to detect when user mode becomes unresponsive or hung. To recover from this condition, clustering will bugcheck the box. This is configurable via the following properties.
    PS C:\> Get-Cluster | fl ClusSvcHangTimeout, HangRecoveryAction
    ClusSvcHangTimeout : 60
    HangRecoveryAction : 3
ClusSvcHangTimeout = This property controls how long we wait between heartbeats before determining that the Cluster Service has stopped responding.
HangRecoveryAction = This property controls the action to take if the user-mode processes have stopped responding.
o 0 = Disables the heartbeat and monitoring mechanism.
o 1 = Logs an Event ID: 4870 in the System Event Log.
o 2 = Terminates the Cluster Service.
o 3 = Causes a Stop error (Bugcheck) on the cluster node.
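As a reading aid, the property output shown above can be turned into a plain-language description with a short Python sketch. The parser assumes the simple `Name : value` layout printed by `fl` in the slide, and both function names are hypothetical; the value descriptions are copied from the list above.

```python
# Descriptions taken from the HangRecoveryAction list on the slide.
HANG_RECOVERY_ACTIONS = {
    0: "Disables the heartbeat and monitoring mechanism",
    1: "Logs Event ID 4870 in the System Event Log",
    2: "Terminates the Cluster Service",
    3: "Causes a Stop error (bugcheck) on the cluster node",
}


def parse_property_list(text: str) -> dict:
    """Parse 'Name : value' lines (assumed layout) into a dict of integers."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            result[name.strip()] = int(value.strip())
    return result


def describe_hang_recovery(action: int) -> str:
    """Return the documented meaning of a HangRecoveryAction value."""
    if action not in HANG_RECOVERY_ACTIONS:
        raise ValueError(f"unknown HangRecoveryAction value: {action}")
    return HANG_RECOVERY_ACTIONS[action]


if __name__ == "__main__":
    output = "ClusSvcHangTimeout : 60\nHangRecoveryAction : 3"
    props = parse_property_list(output)
    print(props["ClusSvcHangTimeout"], describe_hang_recovery(props["HangRecoveryAction"]))
```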

User Mode Problems Caught by Cluster (cont)
This is not a cluster problem; the cluster is reporting a problem.
Check memory.dmp for evidence of what caused the hang, such as locks, memory, handles, etc.
See this blog for more details:
Why is my 2008 Failover Clustering node blue screening with a Stop 0x E?
failover-clustering-node-blue-screening-with-a-stop-0x e.aspx

Check WMI
A very common error is due to WMI being offline
o Affects Create Cluster, Add Node, Migration
To test whether WMI is online:
1. From a remote server
    PS > get-wmiobject mscluster_resourcegroup -computer W2K8-R2-NODE1 -namespace "ROOT\MSCluster"
If an error is returned, you must re-enable WMI by rebooting. If that doesn't work, try:
o Stop the WMI service to ensure that dependent services are stopped
o Start the WMI service again
o PS > winmgmt /salvagerepository
2. Directly on the node/machine
    CMD > Wbemtest
o Select: root\mscluster
o Use authentication level: Packet Privacy
o Select 'query' and type: SELECT * from MSCluster_Resource

Performance Counters
Some components in the cluster deal with lots of calls or traffic going through them, and some buffer information in memory before it can be processed. We have added performance counters to several such components:
o Cluster API Calls
o Cluster API Handles
o Cluster Checkpoint Manager
o Cluster Database
o Cluster Global Update Manager Messages
o Cluster Multicast Request-Response Messages
o Cluster Network Messages
o Cluster Network Reconnections
o Cluster Resource Control Manager
o Cluster Resources
o Cluster Shared Volumes

Summary
Validate, Validate, Validate. Use it for troubleshooting. Use it for best practices. Use it when changes are made to your system.
Since we are reliant on Active Directory objects, protect yourself. Enable the Recycle Bin in AD and protect the objects from accidental deletion.
Everything is headed in the PowerShell direction. Invite her in and she can be a good friend.
When troubleshooting, take a step back and look at everything that can be affected. Then start narrowing your focus.
Failover Cluster is designed to detect, recover from, and report problems. The fact that the cluster is telling you there is/was a problem does not mean the cluster caused it. Don't shoot the messenger.
