Module 7: Server Cluster Maintenance and Troubleshooting
Overview: Cluster Maintenance; Troubleshooting Cluster Service
Cluster Maintenance: Backup; Restoring the First Node; Restoring Cluster Disks; Restoring the Second Node; Evicting a Node
Backup: Backing Up the System State; Backing Up the Local Disk; Backing Up the Cluster Disk
Restoring the First Node. Steps for restoring a server cluster: (1) Restore the first node; (2) Restore the cluster disks; (3) Restore the second node; (4) Perform node testing
Restoring Cluster Disks: Restoring Disk Signature Files; Restoring the Data on the Cluster Disk; Restoring the Cluster Configuration Files
Restoring the Second Node: Restoring the Remaining Node(s) of a Cluster; Perform Node Testing
Evicting a Node. Steps for evicting a node: (1) Back up both nodes; (2) Verify the backup; (3) Move all groups to the remaining node; (4) Stop the Cluster service on the node to be removed; (5) Evict the node; (6) Unplug the server from the shared bus
Troubleshooting Cluster Service: Troubleshooting Tools; Examining the Cluster Log; Troubleshooting Network Communications; SCSI Configuration Problems; Group and Resource Failures; Quorum Log Corruption
Troubleshooting Tools: Disk Manager; Task Manager; Performance Monitor; Network Monitor; Dr. Watson; Services Snap-in
Examining the Cluster Log
[Screenshot: "Copy of cluster" opened in WordPad]
000003b8.000003b4::2000/10/02-19:44:12.946 [CS] Cluster Service started – Cluster Node Vers
000003b8.000003b4::2000/10/02-19:44:12.946 OS Version 5.0.21
000003b8.000002f0::2000/10/02-19:44:12.957 [CS] Service Starting…
000003b8.000002f0::2000/10/02-19:44:13.007 [EP] Initialization…
000003b8.000002f0::2000/10/02-19:44:13.057 [DM]: Initialization
000003b8.000002f0::2000/10/02-19:44:13.097 [DM]: Loading cluster database from D:\WINNT\clu
000003b8.000002f0::2000/10/02-19:44:13.397 [DM] DmpStartFlusher: Entry
000003b8.000002f0::2000/10/02-19:44:13.397 [DM] DmpStartFlusher: thread created
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Initializing…
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Local node name = SERVER1.
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Local node ID = 1.
000003b8.000002f0::2000/10/02-19:44:13.427 [NM] Creating object for node 1 (SERVER1)
000003b8.000002f0::2000/10/02-19:44:13.437 [NM] Initializing networks.
000003b8.000002f0::2000/10/02-19:44:13.447 [NM] Initializing network interfaces.
000003b8.000002f0::2000/10/02-19:44:13.788 [NM] Initializing complete.
000003b8.000002f0::2000/10/02-19:44:13.848 [NM] Starting worker thread…
000003b8.000002f0::2000/10/02-19:44:13.848 [API] Initializing
000003b8.000002f0::2000/10/02-19:44:13.848 [FM] Worker thread running
000003b8.000002f0::2000/10/02-19:44:13.878 [LM] :LMInitialize Entry.
000003b8.000002f0::2000/10/02-19:44:13.878 [LM] :TimerActInitialize Entry.
000003b8.000002f0::2000/10/02-19:44:13.878 [CS] Service Domain Account = clusservice@mocmoc
000003b8.000002f0::2000/10/02-19:44:13.878 [CS] Initializing RPC server.
000003b8.000002f0::2000/10/02-19:44:14.038 [INIT] Attempting to join cluster MYCLUSTER
000003b8.000002f0::2000/10/02-19:44:14.048 [JOIN] Spawning thread to connect to sponsor 10.
000003b8.000002f0::2000/10/02-19:44:14.048 [JOIN] Spawning thread to connect to sponsor 169
Callouts: the IDs of the process and thread issuing the log entry; timestamp; event description
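The entry layout called out on this slide (process and thread IDs, timestamp, event description) can be split apart with a short script when a log is too long to read by eye. The following is a minimal sketch, not a Microsoft tool: the names LogEntry and parse_entry are hypothetical, and the pattern assumes only the layout visible in the sample above, that is, two 8-digit hexadecimal IDs, a timestamp with milliseconds, an optional bracketed component tag such as [NM], and free text.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout assumed from the sample entries on the slide:
#   <process id>.<thread id>::<yyyy/mm/dd-hh:mm:ss.mmm> [<component>] <description>
ENTRY_PATTERN = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?:\[(?P<component>[A-Za-z]+)\]\s*:?\s*)?"  # tag is optional; some entries add a colon
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    """One parsed cluster log entry (hypothetical helper type)."""
    process_id: int
    thread_id: int
    timestamp: datetime
    component: Optional[str]
    message: str


def parse_entry(line: str) -> Optional[LogEntry]:
    """Return the parsed entry, or None if the line does not match the layout."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        process_id=int(match.group("pid"), 16),  # IDs are written in hexadecimal
        thread_id=int(match.group("tid"), 16),
        timestamp=datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        component=match.group("component"),
        message=match.group("message"),
    )
```

Calling parse_entry on each line of a copied log and grouping the results by component (for example, keeping only the NM or JOIN entries) narrows the search when a node fails to join the cluster. Lines that do not follow the layout return None and can be skipped.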
Troubleshooting Network Communications: Troubleshooting Node-to-Node Communication (Verify RPC Communications; Verify Cluster Heartbeats); Troubleshooting Client-to-Node Communications (Check NetBT Cache with Nbtstat; Ping IP Address; WINS Static Mappings)
SCSI Configuration Problems: SCSI Controllers; SCSI Termination; SCSI Cabling
Group and Resource Failures
[Screenshot: Cluster Administrator - [MYCLUSTER (MYCLUSTER)]]
Tree view: MYCLUSTER; Groups (Cluster Group, Mygroup, SQL Group); Resources; Cluster Configuration; SERVER1; SERVER2
Name                State    Owner    Reso
Cluster IP Address  Online   SERVER2  IP Ad
Cluster Name        Online   SERVER2  Netw
Disk W:             Online   SERVER2  Physi
Printer Spooler     Online   SERVER2  Print
Public              Failed   SERVER2  File S
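The resource list on this slide can also be checked mechanically once it has been copied out by hand. The sketch below is only an illustration under that assumption: it calls no cluster API, the rows are transcribed from the slide, and the names Resource and not_online are hypothetical.

```python
from typing import List, NamedTuple


class Resource(NamedTuple):
    """One row of the Cluster Administrator resource list (hypothetical type)."""
    name: str
    state: str
    owner: str


# Rows transcribed by hand from the slide; nothing is queried from a live cluster.
RESOURCES = [
    Resource("Cluster IP Address", "Online", "SERVER2"),
    Resource("Cluster Name", "Online", "SERVER2"),
    Resource("Disk W:", "Online", "SERVER2"),
    Resource("Printer Spooler", "Online", "SERVER2"),
    Resource("Public", "Failed", "SERVER2"),
]


def not_online(resources: List[Resource]) -> List[Resource]:
    """Return every resource whose state is anything other than Online."""
    return [r for r in resources if r.state != "Online"]


for resource in not_online(RESOURCES):
    print(f"{resource.name}: {resource.state} on {resource.owner}")
```

For the slide's data this singles out the Public file share, the one resource shown as Failed, as the place to begin troubleshooting.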
Quorum Log Corruption: Reset the Quorum Log (clussvc -debug -resetquorumlog); Delete the Quorum Log (-noquorumlogging)
Lab A: Cluster Maintenance
Review: Cluster Maintenance; Troubleshooting Cluster Service