1
BMC ProactiveNet Performance Management v8.6 Best Practice Deployment Session 1 1
2
Overview
- First Level Training - Basic Deployment Knowledge
- Best Practice vs. How To
- Covers Core BPPM Components
- Does not address every scenario
- Prior knowledge of BPPM components and terms

This training does not cover every scenario. It is basic training. Best Practices provide guidance on what to do and what not to do, much more than on how to do something. They are focused much more on decisions than on how to execute actions.

General examples:
- Step-by-step procedures for configuring a PATROL Agent to send availability events from PATROL up to the BPPM server are not a best practice.
- Sending availability events from PATROL Agents to the BPPM server instead of trending availability parameters in the BPPM server is a best practice.
- The method used to send events from PATROL Agents to the BPPM server is also a best practice.

Some details of Best Practices are not necessarily mandatory requirements.
3
Agenda
- Phased Deployment
- Basic Architecture
- Architecture Examples
- BPPM Database Considerations
- BPPM Application Servers
- PATROL Data Collection
- Firewalls & Protocols
- Questions & Feedback

This session and the next session are focused on Deployment. The number of components to deploy is based on Scalability. Scalability will be discussed in detail in future sessions. We decided to present deployment first because we think the Scalability sessions will make more sense if you first understand the deployment concepts that you need to scale for.
4
Phased Deployment – Customer First
- BSM involves many moving parts and dependencies
  - People
  - Process
  - Products
- Understand the environment first
  - Business
  - Technical
- Design and document initial plan first
  - Deployment depends on Implementation Architecture
  - Implementation Architecture depends on Scalability
  - This session is focused on Deployment
- Deployment must be completed in phases.
  - Do Not "Boil the Ocean"
  - You must plan
  - You must consider all dependencies

Implementing a BSM oriented solution like BPPM involves many moving parts and dependencies. It must be planned and dependencies must be considered. Some of the dependencies are technical, some are business oriented, some are product related, some are process related, and some are people related. The order of importance is listed here. Nothing happens without people. Processes support people. Products support processes.

Often one of the biggest mistakes made is focusing on the technology before clearly defining the business needs and the overall technical goals that support those needs. A key step is to understand the environment that will be managed as much as possible before discussing or planning the actual implementation/deployment. We should understand the environment and business needs completely first. From that information we should define and document an initial plan before installing any software. After we determine and document the initial plan we can then review and edit it based on feedback.
5
Phased Deployment – Major Considerations
- Drivers
  - Business
    - Critical apps to be monitored
    - Other Projects
    - Time Lines
  - Technical
    - What is installed vs. not installed
    - CMDB? ITSM?
    - Environment Readiness
    - Hardware
    - Technology to be monitored
    - Access
    - Choose initial App(s) with greatest potential for success
  - People
    - Vacations, other assignments, etc.
    - Training (AO, Service Modeling, etc.)
    - Concurrence

Business parameters drive importance and order of implementation. Technical and people topics drive order and timeline for implementation.
6
Phased Deployment – Recommended Order
- Phase 1 – Implementation Architecture
  - Determine and document implementation architecture
  - Depends on Scalability
  - Must be done first for every project, big or small
- Phase 2 – Deploy Central Management infrastructure and integrations
  - BMC Atrium CMDB
  - Atrium Orchestrator
  - ADDM
  - BMC Remedy ITSM
  - BMC ProactiveNet Server
  - BMC PATROL Consoles
  - IBRSD
  - TM ART Central Server
  - Order and content will vary depending on the project

Some order here is flexible, some is not. For example, determining the implementation architecture first is not optional. Configuring CMDB before BPPM vs. the opposite is optional.
7
Phased Deployment – Recommended Order
- Phase 3 – Deploy monitoring agents, data collectors, and Integration Services
  - BMC PATROL Agents
  - BMC TM ART Workbench & Execution Servers
  - Tip: Do not configure Integration Services in this phase
- Phase 4 – Configure Data Collection
  - Configure availability monitoring first
  - Configure event processing for availability monitoring second
  - Configure collection of trended performance data third
- Phase 5 – Configure Integration Services
  - Configure trended data integration from BMC PATROL to BMC ProactiveNet Server.
- Phase 6 – Identify and Configure KPIs
  - Collect data for a period of at least two weeks
  - Configure analytics thresholds for performance KPIs
  - Application response times should be the first performance metrics configured
- Phase 7 – Configure Service Modeling
8
Phased Deployment – Last Key Points
- Always keep the Business in mind
  - Solution Acceptance / Perception
  - Work to show and prove basic value early
    - Availability Monitoring
    - Application Response Times
- Major Mistakes to Avoid
  - Incomplete Availability Monitoring
    - Inaccurate / unknown state
    - Perceived solution failure
  - Incomplete Application Response Time Monitoring
    - Service Model is green while users report the app is slow

All availability metrics must be monitored first. Do not publish a business application CI to the service models until you have an application response time metric already trended for it with historical data and thresholds set for the related KPI(s).
9
Terminology
- Device
  - A monitored instance
  - Examples: Server, Database, Application, Middleware, etc.
- Integration Service Node
  - A dedicated server acting as a gateway for data and event collection/consolidation
  - Supports multiple processes
    - Integration Service
    - Event Management Cell
    - ProactiveNet Agent
    - Adapters (event & data)
    - Event Forwarder Process (BII4P3)
    - PATROL Notification Server (PATROL Event Consolidator)
- Parameter
  - A monitored data point
  - Can be availability or performance data
  - Examples: Total CPU utilization, Process Status (up/down), Memory Usage
  - Synonyms: Attribute, Metric

Various components were added to the solution by acquisition. Each acquisition included various terms. Documentation carried forward has included these different terms. Understanding the terminology as you read detailed product documentation is critical.
10
Basic Architecture – Data Processing Components
Core Components
- BPPM Application Server *D
- BPPM Database *D
  - Sybase – Application Server node
  - Oracle – Separate node
- Integration Service Node *D
  - ProactiveNet Agent *
  - Integration Service *
  - Event Management Cell *
  - Event Adapters *
  - BII4Patrol *
  - PATROL Notification Server *
- PATROL Agents
  - Remote Monitoring *D
  - Virtual Server KM *D
- IBRSD *P

Extended Components (deploy when necessary)
- Distributed Event Cells *P
- Distributed Event Adapters *P
- Distributed Impact Administration Server *P
- PATROL Console Server *D
- PATROL RT Server *P
- TM ART Central Server *D
- TM ART Database *D
- TM ART Workbench *D
- TM ART Execution Server *D
- BMC Capacity Optimization
- Atrium Orchestrator

Legend
* – A component on the standard Integration Service node.
*D – Should be installed on dedicated nodes.
*P – May be installed on non-dedicated nodes.
11
Basic Architecture – Consoles & Administration
Core Components
- BPPM Application Server Web console (BPPM app server) *D
- BPPM Administration Java console *P
- PATROL Configuration Manager *P
- PATROL Classic Console *P

Optional Components (deploy when necessary)
- PATROL Operations Windows Central Console (installed on desktops)
- Central Web Console *D
- Impact Explorer *P
- PATROL Distribution Server *D

Legend
*D – Should be installed on dedicated nodes.
*P – May be installed on non-dedicated nodes.

NOTE: The Distribution Server is being replaced by the repository capability.
12
Basic Architecture – Dedicated Nodes
Core Components
- BPPM Application Server
- Integration Service Nodes

Data Collection Scenarios
- PATROL Agents for remote collection
- Virtual Server Monitoring
- Large Scale Event Processing (for example central SNMP trap collection)

Extended Components (deploy when necessary)
- PATROL Central Console Server
- PATROL Central Web Server
- TM ART Central
- TM ART Workbench
- TM ART Execution Servers
- PATROL Distribution Server

Although not technically required, these components should always be installed on dedicated nodes.
13
Basic Architecture – Server Nodes
14
Architecture Examples – Mapping Business Small Environment
15
Architecture Examples – Mapping Business Small Environment
Performance Reporting – BMC Performance Manager Reporting
16
Architecture Examples – Mapping Business Small Environment
- 1000 Servers
  - 80% Red Hat Enterprise Linux (5.x)
  - 15% Windows 2008
  - 5% Solaris
- File system capacity metrics on 3-5 logical mount points per server.
- Basic OS monitoring of all servers is needed.
- Custom scripts to monitor 4 Custom App servers leveraged by PATROL (10 metrics)
- 20 JBoss instances need monitoring via JMX.
- Zones
  - DMZ ~ 50 servers (5%)
  - Internal ~ 950 servers
- Connectivity / Latency
  - Connectivity between datacenters and within datacenters is reliable, provides low latency and high bandwidth.
- Number of concurrent users
  - 10 super users on average
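The figures on this slide can be turned into rough counts before any sizing discussion. The following is a minimal sketch of that arithmetic only; the function and variable names are made up for illustration and are not part of any BMC tool.

```python
# Rough counts derived from the small-environment figures on the slide.
# All names here are illustrative assumptions, not BMC terminology.
TOTAL_SERVERS = 1000
OS_SHARE_PERCENT = {
    "Red Hat Enterprise Linux 5.x": 80,
    "Windows 2008": 15,
    "Solaris": 5,
}
MOUNT_POINTS_PER_SERVER = (3, 5)  # low and high estimate per server
DMZ_SERVERS = 50


def servers_per_os(total, shares):
    """Convert percentage shares into whole server counts."""
    return {name: total * pct // 100 for name, pct in shares.items()}


def filesystem_instances(total, per_server_range):
    """Return the low and high estimate of monitored mount points."""
    low, high = per_server_range
    return total * low, total * high


os_counts = servers_per_os(TOTAL_SERVERS, OS_SHARE_PERCENT)
fs_low, fs_high = filesystem_instances(TOTAL_SERVERS, MOUNT_POINTS_PER_SERVER)
internal_servers = TOTAL_SERVERS - DMZ_SERVERS

print(os_counts)
print(f"File system instances: {fs_low} to {fs_high}")
print(f"Internal servers: {internal_servers}")
```

Counts like these feed the scalability discussion; they say nothing by themselves about how many Integration Services are needed.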
17
Architecture Examples – Healthcare Company Large Environment
18
Architecture Examples – Healthcare Company Large Environment - PATROL
19
Architecture Examples – Healthcare Company Large Environment - BPM Reporting
20
Architecture Examples – Healthcare Company Large Environment
- Datacenter locations
  - North America
    - Data Center 1 – Central US
    - Data Center 2 – Eastern US
  - Europe
    - Data Center 3
    - Data Center 4
  - Asia-Pacific
    - Data Center 5 – Australia
    - Data Center 6 – Japan
- Network zones and remote locations
  - Larger business units have about servers in regional sites.
  - Medium sized business units have about servers in regional sites.
  - Small business units have about
  - No hardware is to be installed in regional sites; the customer would like to stick with infrastructure in data centers only.
  - Except for the DMZ, the network can be considered as a large flat network without concern for secure zones regarding the monitoring and management solutions.
  - The DMZ contains approximately 200 nodes in different locations.
21
Architecture Examples – Healthcare Company Large Environment
- Approximate Nodes per Datacenter
  - Data Center 1 – 5500
  - Data Center 2 – 1750
  - Data Center 3 – 2100
  - Data Center 4 – 500
  - Data Center 5 – 1000
  - Data Center 6 – 500
- Connectivity / Latency between regional sites and data centers
  - Connectivity is slower in Europe and Asia Pacific but MPLS is global.
- Total monitored parameters = ~4,200,000
- Number of concurrent users per console
  - BMC Performance Manager Reporting – 10
  - BMC ProactiveNet Performance Management (BPPM) trends – 75
  - BPPM Event & Impact Management views – 70 (figuring 10 per data center and 10 for Engineering/Operations)
  - PATROL Central – 75
  - BMC Capacity Optimization (BCO) – 10
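As a quick cross-check of the large-environment figures, the sketch below totals the nodes, derives an average parameter count per node, and reproduces the 70-user estimate for the event views. It assumes the per-datacenter counts are the full node population; whether regional and DMZ nodes are included in them is not stated on the slide.

```python
# Cross-check of the large-environment figures from the slide.
# Assumption: the per-datacenter counts cover the whole node population.
NODES_PER_DATACENTER = {1: 5500, 2: 1750, 3: 2100, 4: 500, 5: 1000, 6: 500}
TOTAL_PARAMETERS = 4_200_000  # approximate, as given on the slide

total_nodes = sum(NODES_PER_DATACENTER.values())
avg_parameters_per_node = TOTAL_PARAMETERS / total_nodes

# 10 users per data center plus 10 for Engineering/Operations.
event_console_users = 10 * len(NODES_PER_DATACENTER) + 10

print(f"Total nodes: {total_nodes}")
print(f"Average parameters per node: {avg_parameters_per_node:.0f}")
print(f"Event & Impact Management users: {event_console_users}")
```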
22
BPPM Database Considerations
- Sybase
  - Installed with the BPPM App Server
  - Cannot be installed on separate node
  - Use if:
    - Oracle License is not available
    - No Oracle DBA is available
    - Robust Database availability is not required
    - Small & medium environments
- Oracle
  - Must be installed on a separate node from the BPPM App Server
  - Must be a dedicated Oracle Instance
  - Requires Oracle License
  - Large environment
  - Oracle License is available
  - Customer has Oracle DBA expertise
  - Oracle is the standard
  - Robust database availability is required
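The criteria above read like a small decision table. The sketch below is one possible reading of it, not an official selection rule: it recommends Oracle when the environment is large, robust availability is required, or Oracle is the customer standard, and it reports which Oracle prerequisites are still missing. The function name and parameters are hypothetical.

```python
def recommend_database(environment_size, has_oracle_license, has_oracle_dba,
                       needs_robust_availability, oracle_is_standard=False):
    """Return (database, unmet_prerequisites) under one reading of the slide.

    environment_size is "small", "medium", or "large" (illustrative values).
    """
    oracle_indicated = (
        environment_size == "large"
        or needs_robust_availability
        or oracle_is_standard
    )
    if not oracle_indicated:
        return "Sybase", []

    # Oracle is indicated; list what the customer still has to arrange.
    gaps = []
    if not has_oracle_license:
        gaps.append("Oracle license")
    if not has_oracle_dba:
        gaps.append("Oracle DBA expertise")
    return "Oracle", gaps


print(recommend_database("small", False, False, False))
print(recommend_database("large", False, True, False))
```

Returning the gaps instead of silently falling back to Sybase keeps the decision visible: a large environment without a license is a planning issue, not an automatic Sybase deployment.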
23
BPPM Database Considerations - Oracle
- Use Oracle RDBMS v
- Create at least two BMC ProactiveNet users.
  - one for data storage
  - one for data views
  - Consider a third "backend user" for issues like locked accounts
- Physically co-locate the BPPM App Server and the DB Server on the same subnet.
- The backup and restore process must be executed by BMC ProactiveNet users.
- Use BMC Database Recovery Management or an Oracle tool such as RMAN.
- Enable archive logging.
- Recommended initial database size (tablespace size)
  - Small deployments – 15 GB
  - Medium and large deployments – 30 GB
- Use Oracle RAC for High Availability
- Use Oracle Data Guard for Disaster Recovery
- Use Oracle Storage Area Network (SAN).
24
BPPM Application Servers
- Single BPPM Server
- Multiple BPPM Server
- Scalability Considerations
  - Environment Size
  - Environment Segregation
    - Geographic
    - Secure Zones
    - Multi Tenancy & Access Control
- Install in a high-speed SAN
  - Especially important for Sybase implementations
  - Follow SAN vendor best practices
- Implementation Architecture
  - Physical
    - where solution components are physically deployed
    - how the management nodes are physically connected
  - Logical
    - depicts data collection
    - performance data flows
    - event data flows
25
BPPM Application Servers – Physical Architecture Single Server
26
BPPM Application Servers – Single BPPM Server Logical Architecture
The blocks in this diagram DO NOT represent separate machines. They represent segregation of performance and data collection/processing flows.

Notification Server – PATROL Agent & KM
1) Provides a consolidation point for PATROL availability events.
2) Provides a standard and centralized way to manage configuration of availability monitoring.
3) Reduces data sent over the network to a single message per event.
4) Allows BII4P3 to connect to fewer PATROL Agents.
5) Do not limit availability events to only process and server up/down events. Think about errors in log files and other metrics that do not make sense for trending.

Performance data collection from third-party sources should be deployed to dedicated Integration Service nodes, separate from PATROL data collection. The same is true for monitoring VMware metrics.

Large scale external event collection should also be deployed to dedicated nodes, for example large numbers of SNMP traps or significant numbers of events from another event source like TEC or Netcool.
27
BPPM Application Servers – Single Server
- Three Logical Tiers
  - Presentation tier
  - Data integration tier
  - Event integration tier

- Do manage thresholds for trended KPI metrics.
- Do visualize the performance data instead of using other consoles such as BMC PATROL or TM ART.
- Do manage users with LDAP and Microsoft Active Directory.
- Do use only the KPI mode of operation.
- Do promote non-KPI metrics to KPI only when needed.
- Do limit the number of reports.
- Do not operate in non-KPI mode.
28
BPPM Application Servers – Single Server
Data Integration tier
- Do install the Integration Service nodes close to the data sources.
- Do deploy by geography, department, business, or applications, especially if multiple Integration Services are required from a single source.
- Do limit data collection to key performance indicators and other supportive metrics only.
- Do manage enabling and disabling of the performance data at the source rather than filtering out at the adapter.
- Do make use of the automated workflow feature to manage BMC PATROL data collection.
- Do collect data at 5-minute polling intervals by default. Avoid faster polling frequencies.
- Do not mix multiple data sources on the same Integration Service.
- Do not collect excessive or unnecessary data. Review the need for shorter polling intervals considering server performance and database size.
- Do not collect trends for Availability metrics.

If higher polling frequencies are needed, evaluate why technically. If the condition you are trying to detect comes and goes faster than every five minutes, look for alternatives. For example, find out if the condition can be written to a log file by the application or to the OS event log, so that the data for the condition you are looking for is persistent and does not require a faster polling frequency to detect.
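To make the cost of faster polling concrete, the sketch below compares the number of samples collected per parameter per day at different intervals. It is plain arithmetic under the assumption that every poll produces one stored sample; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(polling_interval_seconds):
    """Samples collected per parameter per day at a fixed polling interval."""
    return SECONDS_PER_DAY // polling_interval_seconds


def relative_load(polling_interval_seconds, baseline_seconds=300):
    """Data volume relative to the default 5-minute interval."""
    return samples_per_day(polling_interval_seconds) / samples_per_day(baseline_seconds)


for interval in (300, 60, 30):
    print(interval, samples_per_day(interval), relative_load(interval))
```

Moving from a 5-minute to a 1-minute interval multiplies the collected samples by five for every affected parameter, which is why the slide asks for a technical justification first.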
29
BPPM Application Servers – Single Server
Event Integration tier
- Configure event integration and monitoring for Availability first.
- Distribute event collection cells as required, based on event loads and event sources.
- Deploy cells close to or on the same node as the event sources.
- Filter, enrich, normalize, de-duplicate and correlate as much as possible before propagating to the next level in the event flow path.
- Do not integrate trended data for up/down availability metrics.
- Do not collect unnecessary data/events. Limit event messages sent from the data sources to messages that require action or analysis.
- Do not try to use the event processors as a high volume SNMP trap forwarding mechanism.
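The idea of filtering, normalizing, and de-duplicating before propagation can be illustrated outside the product. The sketch below is generic Python, not cell rule syntax; the event fields (`host`, `object`, `message`, `severity`) and the set of actionable severities are assumptions chosen for the example.

```python
# Generic illustration of "filter, normalize, de-duplicate before propagating".
# Field names and the severity set are assumptions, not a BMC event schema.
ACTIONABLE_SEVERITIES = {"CRITICAL", "MAJOR", "MINOR", "WARNING"}


def preprocess(events):
    """Return the events that would be propagated to the next tier."""
    merged = {}
    for event in events:
        severity = event["severity"].upper()
        if severity not in ACTIONABLE_SEVERITIES:
            continue  # filter: drop messages that need no action or analysis
        host = event["host"].strip().lower()  # normalize the host name
        key = (host, event["object"], event["message"])
        if key in merged:
            merged[key]["repeat_count"] += 1  # de-duplicate: count repeats
        else:
            merged[key] = {
                "host": host,
                "object": event["object"],
                "message": event["message"],
                "severity": severity,
                "repeat_count": 1,
            }
    return list(merged.values())


raw_events = [
    {"host": "App01.example.com", "object": "httpd", "message": "process down", "severity": "critical"},
    {"host": "app01.example.com ", "object": "httpd", "message": "process down", "severity": "CRITICAL"},
    {"host": "app02.example.com", "object": "cron", "message": "job finished", "severity": "INFO"},
]
propagated = preprocess(raw_events)
print(propagated)
```

Three raw events become one propagated event with a repeat count, which is the reduction the upper tier benefits from.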
30
BPPM Application Servers – Multiple Server Logical Architecture
- Two Logical tiers
  - Enterprise Event Console
  - Single server deployment instances
31
BPPM Application Servers – Multiple Server
Enterprise Event Console
- Use this tier for event consolidation only.
- Correlate events from other BMC ProactiveNet single server deployments across the enterprise.
- Cross-launch from events into the lower tier BMC ProactiveNet single server deployment instances for root cause and probable cause analysis.
- Do not use this tier for raw event collection directly from event sources.
- Do not use this tier for performance data collection or visualization of data in graphs.
- Do not create devices.
- Do not publish service models into this tier.
- Do not integrate BMC Remedy Action Request System with this tier.
- Do not integrate with BMC Atrium Orchestrator.
32
BPPM Application Servers – Multiple Server
- Includes single server deployment instances
- Deploy separate instances based on
  - Domains
  - Security zones
  - Business needs
  - Geographic requirements
- Propagate all surviving events to the Enterprise Event Console
- Deploy a separate BMC ProactiveNet Server to manage shared infrastructure
- Enable integration with ITSM and BMC Atrium Orchestrator in this tier.
- Deploy service models in this tier.
- VMware
  - Enable the Unlimited memory setting for the VM allocated to the BPPM Server
  - CPU and memory resources must be dedicated to the BPPM server VM
  - Failure to follow these recommendations will result in performance issues.
33
PATROL Data Collection
- Complete work in phases
  - Based on one business application at a time, or…
  - Based on monitored technologies, or…
  - Based on geography, etc.

Deployment Steps
- Determine the type and volume of data to be collected.
- Determine the number of PATROL Agents, Integration Services, and BMC ProactiveNet Servers required.
- Determine the location of all PATROL Agents and the respective Integration Services.
  - At least one Integration Service should exist for each network.
  - The Integration Service should be close to the PATROL Agents connecting to it.
  - Follow a standard for assigning the PATROL Agents to each Integration Service.
  - The BPPM Server does not auto-balance the load between PATROL Agents and Integration Services.
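Because the load is not balanced automatically, the assignment standard has to be explicit and repeatable. The sketch below shows one possible standard: each agent goes to an Integration Service in its own network, and agents are spread round-robin within that network. The host names, network labels, and function name are hypothetical; this is a planning aid, not a product feature.

```python
def assign_agents(agents, services_by_network):
    """Map each agent to an Integration Service in the same network.

    agents: iterable of (agent_fqdn, network_name) tuples.
    services_by_network: dict of network_name -> list of service names.
    Agents are processed in sorted order so the result is repeatable.
    """
    assignment = {}
    next_index = {}
    for agent, network in sorted(agents):
        services = services_by_network.get(network)
        if not services:
            raise ValueError(f"no Integration Service defined for network {network!r}")
        index = next_index.get(network, 0)
        assignment[agent] = services[index % len(services)]  # round-robin
        next_index[network] = index + 1
    return assignment


agents = [
    ("a1.example.com", "dmz"),
    ("a2.example.com", "internal"),
    ("a3.example.com", "internal"),
    ("a4.example.com", "internal"),
]
services = {"dmz": ["is-dmz-1"], "internal": ["is-int-1", "is-int-2"]}
plan = assign_agents(agents, services)
print(plan)
```

Raising an error for a network without an Integration Service enforces the "at least one per network" point during planning rather than at deployment time.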
34
PATROL Data Collection
Deployment Steps (continued)
- Configure and deploy PATROL Agents.
  - Limit PATROL KMs to collect only data which is needed.
  - Ensure that only required instances are being discovered. Disable discovery of instances that are short lived (for example, instances that are created and then deleted within the span of one to two days).
  - Ensure that all KMs used for data collection are preloaded.
  - Consider whether high availability (HA) is needed for the PATROL Agents used for collection.
  - PATROL Agent devices must be named using a fully qualified domain name (FQDN).
  - Validate that data collection and event generation are occurring on the PATROL Agent before connecting it to the Integration Service.
- Configure and deploy the PATROL Integration Services as required.
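The FQDN naming requirement can be checked across an agent inventory before anything is connected. The following is a heuristic sketch only: it treats a name as an FQDN when it has at least two dot-separated labels made of letters, digits, and hyphens. It does not resolve the name in DNS, and the function name is made up for this example.

```python
import re

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending
# with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_fqdn(name):
    """Heuristic check that a device name is fully qualified."""
    name = name.rstrip(".")  # tolerate a trailing root dot
    labels = name.split(".")
    return (
        len(name) <= 253
        and len(labels) >= 2
        and all(_LABEL.match(label) for label in labels)
    )


for candidate in ("dbserver01.example.com", "dbserver01", "bad-.example.com"):
    print(candidate, looks_like_fqdn(candidate))
```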
35
PATROL Data Collection
Deployment Steps (continued)
- Configure and deploy the PATROL Integration Services as required.
  - Install on a dedicated computer separate from the BPPM Server.
  - Add PATROL Event integration support to each Integration Service node.
    - BMC Impact Integration for BMC PATROL and the BMC PATROL Event KM Notification Server must be installed and configured.
    - Configure all the nodes consistently to support future event integration.
  - Use PATROL Configuration Manager (PCM) to manage PATROL Agent & KM mass configuration.
  - Configure data collection at the PATROL Agents and ensure only necessary data is collected before connecting them to the Integration Services.
  - PATROL Agents should be connected to the Integration Service using the automated workflow feature.
  - Work with a subset of agents (100 at a time) to avoid overloading the BMC ProactiveNet Server or Integration Service.
  - Filtering at the Integration Service level usually requires at least one instance from the KM to be present.
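Working through agents 100 at a time is easy to plan from an inventory list. The sketch below only splits a list into batches; how each batch is actually connected (for example through the automated workflow feature) is outside its scope, and the agent names are placeholders.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder inventory of 250 agent names.
inventory = [f"agent{n:03d}.example.com" for n in range(250)]
batch_sizes = [len(batch) for batch in batches(inventory)]
print(batch_sizes)
```

Each batch is a natural checkpoint for confirming that the BMC ProactiveNet Server and Integration Service are still handling the load before the next batch is connected.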
36
PATROL Data Collection
Deployment Steps (continued)
- Configure and deploy the PATROL Integration Services as required.
  - If two different PATROL Agents are monitoring the same target, both agents must be tied to the same Integration Service.
  - Consider high availability (HA) as part of the Integration Service node deployment.
    - VMware HA is a recommended option if the Integration Service is being run on a VMware VM.
    - If VMware HA is not an option, the out-of-the-box HA options can be enabled.
  - Validate that data is being sent from the PATROL Agents to the Integration Service for both the performance metrics and events.
- Connect the Integration Services to the BMC ProactiveNet Server.
37
PATROL Data Collection
Deployment Steps (continued)
- Connect the Integration Services to the BMC ProactiveNet Server.
  - Add the Integration Service nodes to the BMC ProactiveNet Server. (Note that Integration Service nodes are added as remote ProactiveNet Agents.)
  - Determine if additional filtering at the Integration Service level is required. The following are common scenarios that require filtering at the Integration Service:
    - Events are all that is required from an application instance (such as process up or down). In this scenario the PATROL Agent must still collect the data, but the data for those instances does not need to be brought into the BMC ProactiveNet Server.
    - Existing PATROL customers have a large amount of data being collected at each PATROL Agent (the PATROL Agents were previously configured as independent entities and not with the BMC ProactiveNet Server in mind).
    - KM application instances that are very dynamic (created and deleted within the span of 1 or 2 days). These instances must not be brought into the BMC ProactiveNet Server because they will not generate a baseline pattern and will only cause overhead to the Integration Service and BMC ProactiveNet Server.
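The rule about very dynamic instances can be expressed as a lifetime check. The sketch below is a generic illustration, not Integration Service filter syntax: it keeps only instances that have existed for at least two days, using a made-up record layout with `name`, `created`, and an optional `deleted` timestamp.

```python
from datetime import datetime, timedelta

# Threshold taken from the "1 or 2 days" wording; adjust as needed.
MIN_LIFETIME = timedelta(days=2)


def stable_instances(instances, now):
    """Return names of instances that have lived at least MIN_LIFETIME."""
    keep = []
    for instance in instances:
        end = instance.get("deleted") or now  # still present: measure up to now
        if end - instance["created"] >= MIN_LIFETIME:
            keep.append(instance["name"])
    return keep


now = datetime(2024, 1, 10, 12, 0)
observed = [
    {"name": "fs_root", "created": datetime(2023, 6, 1)},
    {"name": "tmp_job_17", "created": datetime(2024, 1, 9), "deleted": datetime(2024, 1, 9, 6)},
    {"name": "new_db", "created": datetime(2024, 1, 10, 8)},
]
kept = stable_instances(observed, now)
print(kept)
```

One consequence of this design is that a brand-new long-lived instance is also held back until it passes the threshold; whether that delay is acceptable is a judgment call for each environment.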
38
PATROL Data Collection
Deployment Steps (continued)
- Confirm proper operation of the BMC ProactiveNet Server and Integration Services.
  - For each batch of PATROL Agents and Integration Services deployed and configured ensure:
    - BMC ProactiveNet Server and Integration Services are performing well and can still manage the load. Performance diagnostics for the BMC ProactiveNet Server and the respective remote ProactiveNet Agent nodes where the Integration Service is running are available in the web console.
    - Scalability limitations of the Integration Services are not exceeded.
- Work in phases and repeat these steps as needed per group of monitored nodes, devices, etc.
39
Firewalls & Protocols
- Limit the usage of HTTPS between the Integration Service nodes and the BPPM Server(s).
  - HTTPS is not as scalable as HTTP
  - HTTPS requires more administration
- Non-HTTP/S Connections
  - Communicate the importance of efficient administration
  - Request that all central administration nodes have access across the environment where needed.
  - If central administration management nodes are not allowed access across multiple secure zones, consider the following:
    - Install administration components on the Integration Service nodes and request remote access (RDP).
      - PATROL Configuration Manager
      - PATROL Classic Console
      - Impact Explorer
      - Etc.
40
Additional Resources & Information
Online Documentation BPPM Best Practices Product Documentation BMC Communities (public forum) BMC website documents discussions whitepapers additional information