Best Practices for Implementing Unicenter NSM r11 in an HA MSCS Environment Part II -Last Revision April 24, 2006.


1 Best Practices for Implementing Unicenter NSM r11 in an HA MSCS Environment Part II -Last Revision April 24, 2006

2 © 2005 Computer Associates International, Inc. (CA). All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. Agenda -This presentation will cover the following topics: -Agent Technology -Management Command Center (MCC) -Job Management Option (JMO) -Event Management -Interoperability -Failback -Uninstallation -Unicenter Desktop & Server Management (DSM) -FAQs

3 Disclaimer -Although Unicenter NSM r11 supports other vendor clusters for High Availability, this presentation focuses only on Microsoft Cluster Server (MSCS). -MSCS supports more than 2 server nodes; however, the concepts that apply to 2-node clusters in this presentation also apply to multi-node clusters. -The topics and procedures in this presentation pertain to Unicenter NSM r11, which uses an Ingres-based MDB. -MS SQL-based MDBs are supported in Unicenter NSM r11.1 only. Best practices for r11.1 are provided in a separate presentation.

4 References -For additional information, review “Appendix A: Making Components Cluster Aware and Highly Available” in the Unicenter NSM r11 Administrator Guide

5 Agent Technology

6 DSM IP Address Scoping -For cluster nodes, update the DSM IP Address Scope from LOCALHOST or the real cluster node names to the Cluster Name. For example: -If real node names = I14YClust1, I14YClust2 -And cluster name = I14YCluster -Update DSM Server to I14YCLUSTER -If real node names are specified, DSM will not manage those hosts.
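The scoping rule above can be sketched as a small check. This is a hypothetical helper (not a CA API) that flags the entries the slides warn about, using the example cluster and node names:

```python
# Hypothetical sketch: validate DSM IP Address Scope entries for an HA cluster.
# The rule from the slides: scope must name the cluster, never LOCALHOST or
# real node names. Names below mirror the slide example; not a real CA call.

def validate_dsm_scope(scope_entries, cluster_name, real_nodes):
    """Return a list of problems found in the DSM Server scope entries."""
    problems = []
    upper = [e.upper() for e in scope_entries]
    if cluster_name.upper() not in upper:
        problems.append(f"missing cluster name entry {cluster_name.upper()}")
    for node in real_nodes:
        if node.upper() in upper:
            problems.append(f"real node {node} listed; DSM will not manage it")
    if "LOCALHOST" in upper:
        problems.append("LOCALHOST entry will not work in HA")
    return problems

# The default configuration fails both checks:
print(validate_dsm_scope(["LOCALHOST"], "I14YCluster", ["I14YClust1", "I14YClust2"]))
# The corrected configuration passes:
print(validate_dsm_scope(["I14YCLUSTER"], "I14YCluster", ["I14YClust1", "I14YClust2"]))
```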

7 DSM IP Address Scoping -The default of LOCALHOST will not work in HA. Change this to your cluster name, or add another DSM Server entry for your cluster name. In this example, “I14YCLUSTER” is the cluster name.

8 World View Objects -DSM runs under the virtual node. -Agents run on both nodes. -The dsmMonitor World View object is displayed as “critical” on the inactive nodes because dsmMonitor only runs on the active node.

9 Classic GUI Display of Cluster Nodes

10 Remote DSM -A remote DSM may connect to the available HA MDB. This DSM need not itself be HA. -aws_dsm will retry the connection until the MDB is available on the new active node. -When the MDB is available after failover, the DSM will reconnect.
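The retry-until-available behavior can be illustrated with a minimal sketch. This is not the aws_dsm implementation, just the general retry pattern it is described as following, with a simulated MDB that comes back after failover:

```python
# Illustrative retry loop (an assumption about the pattern, not aws_dsm code):
# keep attempting the connection until the MDB is reachable on the new
# active node, with a short wait between attempts.
import time

def connect_with_retry(try_connect, retries=5, delay=0.01):
    """Call try_connect() until it succeeds or retries are exhausted."""
    for attempt in range(1, retries + 1):
        try:
            return try_connect()
        except ConnectionError:
            if attempt == retries:
                raise
            time.sleep(delay)  # back off before the next attempt

# Simulate an MDB that only becomes reachable once failover completes.
state = {"calls": 0}
def mdb():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("MDB not yet online on new active node")
    return "connected"

print(connect_with_retry(mdb))  # prints "connected"
```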

11 Remote DSM After Failover -The following shows a remote DSM reconnecting to an HA MDB after failover.

12 Management Command Center (MCC)

13 MCC -MCC is not installed on the HA server. -MCC can be installed on remote servers and can connect to the cluster’s virtual node.

14 Global Catalog -In an HA setup, the AIS Catalog is created on the shared disk. -The catalog is shared by all cluster nodes. -Address spaces in the catalog are registered for the cluster name, not the real nodes. -The MCC client uses the virtual node.

15 MCC Client -The MCC client connects to the virtual node name, not the real node names.

16 Failover Considerations -During failover, active remote MCC clients may be connected to the virtual cluster node. -As part of the HA concept, the cluster will fail over to another cluster node. -The MCC client will detect the failover and reconnect, as the active node has changed. It may issue a rollback message and then reconnect.

17 Classic GUI -The Classic 2D Map GUI connects to the Cluster Name, eliminating the need to know the active node.

18 Job Management Option (JMO)

19 JMO -If the JMO Agent is installed and active, update the JMO option to move the checkpoint file to the shared disk. -Identify the shared disk where the following directory is created by the install process: - \Program files\ca\SharedComponent\CCS\WVEM Note: This must be on the shared disk, not a local disk.

20 Shared Disk -This shows the shared disk cluster resource.

21 Shared Disk -Create the TMP subdirectory as shown.

22 Update JMO – Temp Directory Option -Default location for the checkpoint file.

23 Update JMO – Temp Directory Option -To update the option from the command line, enter: cautenv setlocal CAISCHD0008

24 Update JMO – Temp Directory Option -Repeat the CAISCHD0008 change on all cluster nodes. -Stop and start the Unicenter service to pick up the changes. This should create a checkpoint file on the shared disk, which can then be shared by all cluster nodes.

25 Station -A station is automatically defined for the cluster name. (In non-HA mode, this is defined as the real node.) -This enables job definitions to be shared across all nodes of the cluster.

26 Test 1 - JMO - HA Manager

27 Test 1 – Failover Test Plan -Define a jobset as follows: -Station – remote node (the job will be submitted to a JMO Agent running on a non-cluster node) -Define a long-running job with a Sleep value of 15 minutes -Define a second job which is dependent on the Sleep job -While the Sleep job is active, move the group over to simulate failover of the Workload Manager (JMO). -This should move the JMO Manager to another cluster node. -Review the status of this job and the dependent job on the new active node.

28 Test 1 – Active Node

29 Test 1 – Job Definition -The HAJTestB job is dependent on the HAJTestA job.

30 Test 1 – Demand Job -The HAJTestB job is waiting on the HAJTestA job to complete.

31 Test 1 – Simulate Failover -Move Group. I14YClust2 is now the new active node.

32 Test 1 – Job Status After Failover -This shows the status of the job correctly displayed after failover.

33 Test 1 – Job Completion -The job completes after failover. The dependent job starts, and the jobset status changes to completed.

34 Test 2 -HA JMO Agent

35 Test 2 – Agent Failover Test Plan -The JMO Manager runs on a non-cluster node. -It submits a job to the JMO Agent running as HA. -The HA Agent node fails over. -Review the job status after failover of the HA JMO Agent.

36 Test 2 – Station Definition -The station node name for the HA Agent is defined as the Cluster Name. This eliminates the need to know the active node.

37 Test 2 – Submit Job -HAJTestB is dependent on HAJTestA.

38 Test 2 – Simulate Failover -This simulates failover of the HA Agent node. Failing Node → New Active Node.

39 Test 2 – Simulate Failover -The station is not reachable for a short period while failover takes place.

40 Test 2 – Failed Node -The Workload Agent service stopped on the failed node. -The server did not crash, as it was an application failure. The job continues to run on the failed server node.

41 Test 2 – Active Node -When the JMO Agent is started on the new active node, it syncs status with the JMO Manager. -The active job is flagged as aborted.

42 Test 2 – Active Node -The checkpoint file is synchronized with the Job Manager from the new active node.

43 Checkpoint File -JOBTERM is issued due to failover. -Node Name = Cluster Name, which permits failover.

44 Event Management

45 HA Environment Variables -CA_OPR_MONITOR_STATE -Specifies whether the Event Management daemon tracks actions that it is in the middle of processing. The default is Yes. -CA_OPR_MONITOR_INTERVAL -Specifies the interval, in seconds, for saving the Event Management state table to a flat file. The default is 30 seconds.
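The idea behind these two variables — periodically flushing an in-memory table of in-flight actions to a flat file on the shared disk so the new active node can pick them up — can be sketched as follows. This is hypothetical illustration code, not the Event Management implementation; the file name and table layout are invented:

```python
# Sketch of periodic state persistence (the concept behind
# CA_OPR_MONITOR_STATE / CA_OPR_MONITOR_INTERVAL). Hypothetical code:
# writes the state table atomically so a failover never reads a torn file.
import json
import os
import tempfile

def save_state(state_table, path):
    """Write the state table to a temp file, then atomically replace."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state_table, f)
    os.replace(tmp, path)  # atomic rename on the same volume

def load_state(path):
    """What the new active node would do after failover: reload the table."""
    with open(path) as f:
        return json.load(f)

state_file = os.path.join(tempfile.gettempdir(), "state_sav.json")
# One in-flight MRA: next action to run is sequence 40.
save_state({"HAFAIL-test1": {"next_action": 40}}, state_file)
print(load_state(state_file))
```

In the real daemon this save runs every CA_OPR_MONITOR_INTERVAL seconds (30 by default), which is why the test plan later waits 30 seconds before simulating failover.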

46 CA_OPR_MONITOR_STATE -Defaults to Yes in HA mode. For a non-HA install, the default is No.

47 CA_OPR_MONITOR_INTERVAL

48 Log Files -Unicenter Event Management log files reside on the shared disk. -They are shared by all cluster nodes. For example: -NodeA is active, and the Event Management daemon running on NodeA writes to the log file. -NodeA fails, and NodeB is now active. -NodeB continues to write to the same log file used by NodeA, so the log also contains events from NodeA.

49 Windows Events -Unicenter runs on the active node only. Thus, Event Management runs on the active node only. -In a cluster environment, Microsoft forwards all Windows events from all cluster nodes to the active node. -A Unicenter Event Management MRA (Message Record Action) can process events from other nodes because they are forwarded by Microsoft to the active node. However, as Event Management is not running on the other nodes, the MRA node cannot be specified as a non-active node.

50 MRA – Node -When defining an MRA, do not specify a real node name in the Node field. You must not use a real node name of the cluster.
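Why the Node field must not name a real node can be shown with a toy matcher. Since all cluster events are forwarded to whichever node is currently active, a record pinned to one real node stops matching after failover. The field names below are illustrative, not the actual MRA schema:

```python
# Hypothetical sketch of MRA node matching (not the real record schema).
# A record scoped to a real node only matches while that node is active;
# a record left unscoped (or scoped to the cluster name) always matches.

def mra_matches(record_node, event_node, cluster_name):
    """Decide whether a message record applies to a forwarded event."""
    if record_node in (None, "", cluster_name):
        return True  # cluster-wide record: matches events from any node
    return record_node == event_node

# Pinned to a real node: stops matching once the other node is active.
assert mra_matches("I14YClust1", "I14YClust2", "I14YCluster") is False
# Scoped to the cluster name (or left blank): keeps matching after failover.
assert mra_matches("I14YCluster", "I14YClust2", "I14YCluster") is True
assert mra_matches(None, "I14YClust2", "I14YCluster") is True
print("node-scoping checks passed")
```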

51 Windows Events from Non-Active Nodes -Windows events generated on a non-active node are written to the Unicenter Event log. This shows CA Event Management is not running on the non-active node.

52 Windows Events from Non-Active Nodes -Windows events generated on a non-active node are written to the Unicenter Event log.

53 Windows Events from Non-Active Nodes

54 Test 1 – HA MRA -Main objective: demonstrate MRA failover. -By default, MRAs that were active at the time of failover will continue after failover. -After failover, the actions following the active MRA will continue on the new node. -If this feature is not required, set the Event state monitor option to No.

55 Test 1 – MRA Failover Tasks -Define 2 Message Record Actions: -One with a “Delay” of 5 minutes -One with “HIGHLITE” -Generate an event to trigger the above MRA. -Wait 30 seconds for STATE_SAV to be updated. -While waiting on the “Delay” action to complete, simulate failover. -Verify that the HIGHLITE message is displayed on the new active node.

56 Test 1 – Define MRA -MRA sequence 30 will wait for 5 minutes. -After 30 seconds, verify the STATE table is updated. -Simulate failover. -Verify subsequent actions are executed on the new active node.

57 Test 1 – STATE Table -State table at start.

58 Trigger MRA

59 Review STATE Table -This shows the STATE table has been updated to log the HAFAIL – test1 MRA.

60 After Failover – New Active Node -A restarting message ID is displayed. Processing then continues with the remaining actions (e.g., after the “Delay”).
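The resume step can be sketched as a selection over the saved state: actions whose sequence number was already completed are skipped, and everything from the in-flight action onward runs on the new active node. This is an illustrative assumption about the replay logic, not the daemon's actual code:

```python
# Sketch of resuming an MRA action chain from a saved state table after
# failover. Illustrative only; sequence numbers follow the slide example
# (sequence 30 is the 5-minute Delay).

def resume_actions(actions, last_completed_seq):
    """Return the action sequence numbers still to run, in order."""
    return [seq for seq, _ in sorted(actions) if seq > last_completed_seq]

actions = [(10, "HIGHLITE intro"), (30, "DELAY 5 min"), (40, "HIGHLITE done")]
# Failover hit while sequence 30 (the Delay) was in flight; 10 had completed,
# so the new active node restarts at 30 and then runs 40.
print(resume_actions(actions, 10))  # -> [30, 40]
```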

61 After Failover -Actions after the Delay are executed on the new active node.

62 Test 2 – Event Variables in the HA Setup -This shows some of the event variables that are set to the cluster name in an HA setup: -NodeDomain [&NODEDOMAIN] – ClusterName -NodeId [&NODEID] – ClusterName -NodeName [&NODENAME] – ClusterName -ComputerName [$COMPUTERNAME] – ClusterName

63 Test 2

64 Interoperability

65 Ingres HA -Ingres can be installed in HA mode and shared by other distributed NSM r11 installs. These distributed NSM installs do not have to be installed in HA mode. -Non-HA solutions (e.g., Unicenter Management Portal) can be installed in a non-cluster environment with an HA MDB. -NSM r11 requires the MDB to be “locked down.” This means: -If the MDB is installed by other non-NSM r11 components, then it must be configured by the NSM install. -If it is NOT configured by the NSM install, it cannot be used.

66 Ingres – Interoperability -Diagram: an HA MDB server (MDB installed on a Microsoft Cluster as HA) serving Ingres clients UMP and NSM Performance.

67 BrightStor High Availability (BHA)

68 BHA vs. MSCS -BHA and MSCS cannot coexist on the same server. -Choose MSCS if: -The client is already using MSCS for other HA applications -Failover does not occur across different distributed locations -Consider BHA if: -The client has no MSCS -HA is required for a geographically federated deployment -A less expensive solution is required and the mean-time-to-recover (MTTR) target is not that aggressive

69 Failback -Application Failures

70 Cleanup -If the failover was the result of an application failure, then you should first clean up processes that may not have stopped on the failed node BEFORE failback. -The two processes to review are: -sevpropcom -rmi_server -If these processes are running on the failed node, they should be killed prior to failback.

71 sevpropcom -If the sevpropcom process did not stop on the failed node, severity propagation will not come up cleanly upon failback. -To avoid this, end the sevpropcom process prior to failback. -To determine whether sevpropcom is eligible to be killed before failback, verify whether sevpropcom is running without sevprop. If so, it should be killed prior to failback.
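The eligibility rule above — kill sevpropcom only when it is running orphaned, without sevprop alongside it — can be sketched like this. The process names come from the slides; the scan itself is simulated with a plain list rather than a real process table, and nothing is actually killed:

```python
# Hypothetical helper for the failback cleanup rule: sevpropcom should be
# killed before failback only when it runs without its sevprop partner.
# Process names are from the slides; the process list here is simulated.

def eligible_for_kill(running_processes):
    """Return True if sevpropcom is running orphaned (no sevprop present)."""
    procs = {p.lower() for p in running_processes}
    return "sevpropcom" in procs and "sevprop" not in procs

assert eligible_for_kill(["sevpropcom"]) is True              # orphaned: kill before failback
assert eligible_for_kill(["sevpropcom", "sevprop"]) is False  # healthy pair: leave alone
assert eligible_for_kill(["notepad"]) is False                # not running at all
print("cleanup eligibility checks passed")
```

On a real failed node the process list would come from Task Manager or a process-listing tool; the decision logic is the same.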

72 rmi_server -If the rmi_server process did not stop on the failed node, then stop rmi_server prior to failback. -To determine whether rmi_server is running, execute rmi_monitor.

73 Stop Enterprise Management Subcomponents -unicntrl

74 unicntrl stop -In an HA setup, unicntrl stop is not a valid command. It displays information telling you to either issue a stop for all subcomponents or take the CA-Unicenter cluster resource offline. -If aws_dsm is running, it will be stopped when the cluster resource is offline.

75 Uninstall

76 Uninstall -The uninstall needs to be performed on all cluster nodes. -Data on the shared disk should be removed with the uninstall of NSM from the last cluster node. -Uninstall Ingres after NSM has been uninstalled from ALL cluster nodes.

77 Uninstall – Node 1 -Do not remove the shared disk. Select the No option and click Finish.

78 Uninstall – Node 2 -Remove the shared disk with the last cluster node. Select Yes and click Finish. This drops the MDB, but Ingres is still installed.

79 Ingres -Uninstall Ingres from all cluster nodes. -Start with the active node, Move Group, and then uninstall from the other cluster nodes.

80 Cluster Resources -Uninstalling NSM r11 from all cluster nodes should remove the cluster resources. -If the uninstall fails or some components are not removed, you will have to remove them manually. -Take extra care to ensure you do not delete other cluster resources. Microsoft Cluster will remove dependent resources when you delete a resource on which other resources depend.
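The cascade warning can be made concrete with a small dependency sketch: deleting a resource also removes every resource that (directly or transitively) depends on it. The resource names and dependencies below are invented for illustration, not an actual NSM resource layout:

```python
# Sketch of MSCS dependent-resource removal: deleting a resource removes
# everything that depends on it, transitively. Dependency data is hypothetical.

def cascade_delete(target, depends_on):
    """Return the set of resources removed when target is deleted.

    depends_on maps each resource to the single resource it requires
    (None if it requires nothing)."""
    removed = {target}
    changed = True
    while changed:  # sweep until no new dependents are discovered
        changed = False
        for res, req in depends_on.items():
            if req in removed and res not in removed:
                removed.add(res)
                changed = True
    return removed

deps = {"CA-Unicenter": "Shared Disk", "MDB": "Shared Disk", "IP Address": None}
# Deleting the shared disk silently takes CA-Unicenter and the MDB with it:
print(sorted(cascade_delete("Shared Disk", deps)))  # -> ['CA-Unicenter', 'MDB', 'Shared Disk']
```

This is exactly why the slide says to check which resources depend on a resource before deleting it by hand.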

81 NSM Cluster Resources

82 Unicenter Desktop & Server Management -(Unicenter DSM)

83 Installing Unicenter DSM in a Cluster Environment -Unicenter DSM is not cluster aware. -If the MDB is installed on a different server, then it can be highly available. -In this case the MDB should be installed from the NSM media. -If the MDB is installed from the NSM media, it will not create the CA_Itrm operating system userid. This will have to be created manually. -If failover occurs and the MDB is moved to another cluster node, the CAF (Common Application Framework) service running on the remote server will require a restart, or a test fix to correct the problem. -If the fix is not applied or CAF is not restarted, Unicenter DSM Explorer will not work correctly.

84 Unicenter DSM MDB -If the MDB is installed with Unicenter DSM, then the MDB will not be highly available. -Unicenter DSM can use an existing HA MDB, but its other services will not be highly available.

85 Configuration Server -Does not recognize the virtual node.

86 Unicenter DSM Install with HA NSM -If NSM is installed as HA and Unicenter DSM is installed on top of that NSM, Unicenter DSM will not be HA.

87 FAQ

88 UMP -Can Unicenter Management Portal be installed with an NSM HA setup? -UMP is not classified as HA. If it is installed prior to NSM, it will be installed in non-HA mode. -If NSM r11 is installed first in HA mode, UMP will still be installed in non-HA mode. -UMP can continue to use an MDB which is HA.

89 Exchange Agent -We are using the 3.x Exchange agent, which is cluster aware. How do we integrate this with the r11 NSM HA install? -The Exchange agent is not part of r11. Review the Migration Guide for more details. -Or wait for UME 11.1, which is currently in Beta status.

90 Event Management Run ID -(Not specific to HA mode.) If a run ID is used to define an MRA, ensure that the userid has the “Log on as a batch job” privilege.

91 Event Management – Run ID -To grant the “Log on as a batch job” privilege, simply add the user to the TNDUsers security group. -If the “Log on as a batch job” privilege is not granted, a Logon Type: 4 failure will be encountered.

