Stephen Vaillancourt Fellow - PTC Technical Support Tim Atwood

PTC System Monitor Deep-Dive: PTC Windchill Pro-Active Monitoring and Performance Troubleshooting
Stephen Vaillancourt, Fellow - PTC Technical Support
Tim Atwood, Senior Technical Consultant - PTC Enterprise Deployment Center
June 2014

Agenda
PSM 3.0 New Features
Which dashboards to start from for initial analysis?
Self-help in diagnosing and resolving problems
Transition from reactive to proactive monitoring of your PTC environment
Tips for adjusting incident actions and thresholds
Tips for creating reports
How to better monitor your company’s critical “business transactions” using new dashboards and optional AutoBaselines
Examples of business transactions: checkin, checkout, search, workspace refresh, browse product

PSM 3.0 New Features

PTC System Monitor (PSM) 3.0 Enhancements
RTM: January 6th, 2014
Business Transactions
Infrastructure Monitoring
Ease of Deployment
Rich Notifications
65
Reduced Post Install Configuration
See for more details.
Not on slide: better handling of confidential strings, database aggregation, upcoming new Oracle plugin that can get SQL explain plans
PSM Incident to PTC KB

Content Download Business Transaction
Content Download Dashboard
Track content download activity by count and content size per user
Works for File Server sites as well
Query Builder report on
Screenshot labels: Username, Queue Name, Top 10 User Activity, All Activity View
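As a rough illustration of what "by count and content size per user" means, here is a minimal Python sketch that ranks users from a list of download records. The record layout, the user names and the `top_users` helper are assumptions made for this example; they are not PSM's export format or API.

```python
from collections import defaultdict

# Hypothetical download records: (username, content size in bytes).
# This layout is an assumption for illustration, not a PSM export format.
downloads = [
    ("alice", 5_000_000),
    ("bob", 1_200_000),
    ("alice", 750_000),
    ("carol", 20_000_000),
    ("bob", 300_000),
]

def top_users(records, limit=10):
    """Return (user, count, total_bytes) tuples, largest total size first."""
    totals = defaultdict(lambda: [0, 0])  # user -> [count, total_bytes]
    for user, size in records:
        totals[user][0] += 1
        totals[user][1] += size
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(user, count, size) for user, (count, size) in ranked[:limit]]

for user, count, size in top_users(downloads):
    print(f"{user}: {count} downloads, {size / 1_000_000:.1f} MB")
```

Sorting by total size rather than by count is a choice made here; a "Top 10" view could just as well rank by count.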

Host Health monitoring is automatic from the Java agent
Monitors Disk, CPU, Memory and Network for each host in the system.
Thresholds and alerts are preconfigured and adjustable.
Red indicators show issues.
Incidents are generated as well and can be emailed.
More details in an upcoming slide.
No separate Monitor configuration is required; data is gathered automatically by the Java agent.
Quickly check disk, network, CPU and memory using the indicators.
Automatic, bidirectional correlation of host and application process health problems to business transactions and PurePaths.
Tip for reports: chart the host measures in a separate dashboard.
Timeline of issue(s)

Where to Start?

PSM looks Powerful, where do I start?
Is the system ok?
System Health
Incidents
Infrastructure
Transactions
Overall evaluation:
Add a popup for “Application”
Add use cases for each screenshot

Where to Start: System Health Evaluation at a Glance
Callouts: Dashboard, Open, Host, “01 – Monitoring(M)100 – System Health 10.x”
Rollup Incident Indicators
Response Times from Server
To investigate: Right Mouse, Drill Down, Web Requests
Overall evaluation: Time Spent in Web Requests versus RMI

Where to Start: System Health: Web Requests
Sort by columns to identify operations of concern
10+ columns to sort results with
Like transactions are grouped together
Tip 1: Right Mouse click on column headers to add columns
Tip 2: Search KB for URI
Tip 3: Export Data for TS case
Add sorting bubble

Where to Start: Incidents
Notification: you will likely start with an email notification
Tip 1: Edit Incident actions and thresholds so notifications are worthwhile
Tip 2: Export Data for TS case from the incident
Tip 3: Create Downtime for maintenance windows

Where to Start: Infrastructure – Machine health
Network, CPU, Disks & Memory are all monitored
Green will turn red when there is a problem

Where to Start: Monitoring Business Transactions
User Operation Dashboards
M108- Windchill Application Overview
D101- User Activity 10x
D20#- Workspace Operation Dashboards
D2##- User Operations

Database Drill Down: Identify problem SQL statements
Sort: Click on column headings to sort.
Problems: The circled values are the ones that represent problems, and the corresponding SQL should be investigated further.

Identify a Problem SQL statement from the DB View
Search the Knowledge Base for a solution
Search the knowledge base for known solutions (indexes or SPRs) using parts of the SQL statement taken from the “FROM” and/or “WHERE” clause.
Note: Searching using the columns in the “SELECT” part of the statement could result in a lot of matches.
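The search tip can be sketched in a few lines of Python: keep only the FROM/WHERE portion of a statement and mask literal values so the text is generic enough to paste into a search box. The `kb_search_terms` helper and the sample statement (including its table and column names) are made up for this example; real Windchill SQL will look different, and the regular expressions here are a simplification that does not handle subqueries or quoted identifiers.

```python
import re

def kb_search_terms(sql):
    """Return the FROM/WHERE part of a SQL statement with literals masked."""
    flat = " ".join(sql.split())  # collapse newlines and repeated spaces
    match = re.search(r"\bFROM\b.*", flat, flags=re.IGNORECASE)
    if match is None:
        return ""
    tail = match.group(0)
    # Drop trailing clauses that rarely help a knowledge base search.
    tail = re.split(r"\b(?:ORDER BY|GROUP BY)\b", tail, flags=re.IGNORECASE)[0]
    # Mask quoted strings and standalone numbers with '?'.
    tail = re.sub(r"'[^']*'", "?", tail)
    tail = re.sub(r"\b\d+\b", "?", tail)
    return tail.strip()

# Made-up statement for illustration only.
sql = """SELECT A0.idCol, A0.name
         FROM PartTable A0
         WHERE A0.name = 'BRACKET' AND A0.masterRef = 12345
         ORDER BY A0.name"""

print(kb_search_terms(sql))
```

For the sample statement this prints `FROM PartTable A0 WHERE A0.name = ? AND A0.masterRef = ?`, which leaves out the SELECT column list that the note above warns about.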

Using PSM and the Knowledge Base to get a solution

Proactive Approach

Why transition to pro-active monitoring on PDMLink?
Who can relate to some of these?
“I don’t want the director of engineering standing in my office”
I need to restart the system sometimes; maybe the problem will go away on its own … Riiiight
My life becomes more difficult when I need to explain to the business owners why productivity is suffering because of the tool …
Downtime on our system costs $ a minute …
I need to spend as little time administrating Windchill as possible …
Who’s on call this weekend?
Animate this

The “old” way: reactive and time-consuming
Windchill is considered a “black box” by many Administrators
Tuning and monitoring stability is often “reactive” versus “preventative” due to a lack of standardized procedures, monitoring and diagnostics
Troubleshooting can be difficult as there are many “moving parts” involved: Application, Relational Database, Operating System Resources, Storage, etc.
Advanced troubleshooting knowledge is often required due to lack of deployed tools
Multiple diagnostic efforts frequently occur before the correct information is acquired
Need to reproduce an issue several times in order to log it properly
Built-in monitoring and diagnostic tools within Windchill have improved in 10.x but are light. Troubleshooting is often reactive versus proactive prevention; logging levels must be tweaked and issues reproduced, etc.
Sudden Issue, then Reproduce with Logging, then Resolve

The “new” way: Make PSM Pro-Active
Step 1.0 – Size your PSM Server for Good Performance
  Check the PSM Sizing Calculator to ensure you have the proper resources allocated to your PSM server.
Step 1.1 – Deploy PTC System Monitor
  Install and configure PSM in your Development, Test and Production environments. Follow the instructions in the Install Guide for detailed steps.
  Ensure that you use a database (Oracle or SQL Server) for your Performance Warehouse and not the embedded database built into the dynaTrace Server.
  Ensure you configure the Mail Server settings to receive notifications on important Infrastructure and Application incidents.
Step 1.2 – Identify Critical Business Transactions
  Use OOTB Business Transaction Dashboards to identify common and slow transactions.
  Engage your Business Unit leadership to identify the critical set of end-user transactions, based on the following metrics:
    Frequency of use
    Business-critical transactions
    Historically slow-running transactions
Step 1.3 – Define Service Level Agreement
  Based on feedback from the Business Unit, define the acceptable thresholds for your system from a Business Transaction and System availability perspective.
  Make adjustments to the OOTB incidents and thresholds as needed.
Step 1.4 – Update Thresholds & Dashboards
Step 1.5 – Configure Reporting
  Based on output from Step 1.3, create custom report dashboards, set the frequency for reporting and define the target audience for reports.
Step 1.6 – Configure Auto Baseline
  Configure PSM to create a rolling average for any Business Transactions identified as critical from Step 1.3 that may not be readily modeled with a static threshold.
Flowchart: 1.0 Size your PSM Server; 1.1 Deploy PTC System Monitor; 1.2 Identify Critical Transactions; 1.3 Define Service Level Agreement; 1.4 OOTB Thresholds? (Yes/No); Update Thresholds & Dashboards; 1.5 Configure Reporting; 1.6 Configure Auto Baseline
Reporting: if a tree falls in a forest, does anyone hear it? A system with no reports is similar: no one knows what is going on.

Tools for Proactive, Automated Monitoring
Proactive monitoring tools, mix and match as needed:
Automatic Host Health monitoring (new)
AutoBaseline (new)
Enhanced Alerts (new)
Custom Performance Incidents from PTC: Request Health, Active Contexts, Long Queue Operations
Incident advanced actions trigger thread dumps, CPU samples, memory dumps
Optional tools (see presentations from 2012 & 2013):
  URL Monitor for availability
  Web Transaction Monitor (using a JSP test page)
  Tagging automated client tests
The mix of features and tools available gives customers the ability to mix and match in order to achieve their performance and availability goals with better monitoring and faster diagnostics.

Proactive: Configure Incident Definitions
View in the Incidents Dashboard rather than in System Profile. Add columns (Sensitivity, Thresholds, Actions) for best view of definitions, sort by Actions if necessary, then Edit incidents to adjust the thresholds and actions for your deployment.

Edit the Incident Actions

Reports increase ROI from PTC System Monitor
Reports are generated based on dashboards
Improving ROI from deploying PSM:
  Show throughput and trends for key Business Transactions
  Show utilization and/or adoption of the system over time
  Show which locations are critical to the business
  Show trends for future growth: users, hardware resources, disk space, etc.
  Keep your managers informed
Generate on demand or from a schedule (uses the last saved version of the dashboard)
Tip: use a custom “_report” dashboard for best formatting as PDF or XLS
  Add legends and details to charts that are not on the regular dashboard
  May not display properly in the client without a really large screen
  Can have multiple layers of dashlets not shown in the client but shown on separate pages in the report
  Keeps reporting dashboards from being modified accidentally by clients with admin privileges
Tip: use Excel format to allow for further formatting or data manipulation
Yes, there are reports in Windchill available OOTB from Report Management and from Windchill Business Reporting. However, Report Management has a steep learning curve, and WBR requires a separate license and a somewhat complex deployment. If you decide to use PSM, you should be aware of this capability.

Example: Dashboard Created for Reporting
Too busy for viewing on a regular screen, but contains lots of information
Hourly access & activity report?
Weekly top users?

Example: Generate Report from Dashboard
Tip: Use Excel (XLS) for easiest formatting & customization with macros or templates
Hourly access & activity report?
Weekly top users?

Business Transaction Thresholds

Business Transactions
Filters that help you process the large amount of monitoring data
PSM 3.0 has over 60 new BTs to pick and choose from
Thresholds can be individually tailored for your most important BTs
Additional custom BTs can be created as needed
A new feature called AutoBaselines allows for dynamic thresholds based on the past week’s usage data.

Application Health (new)
PSM 3.0 makes knowing where to look a lot easier with the default Monitoring view
Red indicates problems
Choose Business Transactions to monitor
Overall evaluation: Select a Business Transaction to drill into

Application Health & AutoBaselines (new)
PSM automatically tracked these actions and shows violations occurring.
This is an example of the dashboard when an incident has occurred. Top left shows Web Page Requests being monitored, but you usually would not want to have these included. Include specific BTs of interest instead, as in the bottom right example.

AutoBaseline for Business Transactions
Application Monitoring dashboard and Incidents dashboard
Response Time
  Median (50th percentile) and Slowest 10% (90th percentile)
  7 days' worth of data evaluated, updated continuously
  2 violations in a row trigger an incident with Smart Alert
Failure Rate
  Checks for statistically significant errors using the binomial distribution
Throughput
  No alerts, just charts
  Expected range based on the same 15-minute interval 7 days ago OR 1 day ago OR 1 hour ago OR the first hour when we start baselining
Looks for statistical significance based on expected throughput before alerting for response time or failure rate.
The baseline for Failure Rate uses the binomial distribution, as compared to the statistical approach used for Response Time. For detecting violations we use the same concept of Significant Measurements. Here is an example: during low-traffic night hours, 1 out of 5 failing requests would mean a 20% failure rate, but is this significant with that low traffic? 100 failing out of 1000 requests during high load is significant, therefore we alert.
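The two statistical ideas on this slide can be sketched in Python. This is not PSM's or dynaTrace's actual algorithm. It only illustrates a median/90th-percentile baseline and a binomial significance check for the failure-rate example. The 5% expected failure rate, the 0.01 significance level and all function names are assumptions chosen for this example.

```python
import math
import statistics

def response_time_baseline(samples_ms):
    """Median and 90th percentile of historical response times."""
    deciles = statistics.quantiles(samples_ms, n=10, method="inclusive")
    return statistics.median(samples_ms), deciles[8]  # deciles[8] = 90th percentile

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (0 < p < 1)."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)

def binomial_tail(failures, total, expected_rate):
    """P(X >= failures): chance of seeing this many failures at the expected rate."""
    return sum(binomial_pmf(k, total, expected_rate) for k in range(failures, total + 1))

def failure_rate_is_significant(failures, total, expected_rate=0.05, alpha=0.01):
    """True when the observed failures are unlikely at the expected rate.

    expected_rate and alpha are assumed values, not PSM settings.
    """
    return alpha > binomial_tail(failures, total, expected_rate)

# The slide's example: 1 of 5 at night versus 100 of 1000 under load.
print(failure_rate_is_significant(1, 5))       # low traffic
print(failure_rate_is_significant(100, 1000))  # high load
```

With these assumed values, 1 failure out of 5 is not flagged (it happens by chance roughly 23% of the time at a 5% failure rate), while 100 out of 1000 is flagged.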

AutoBaseline: Business Transaction
A single occurrence does not trigger an incident; it takes two in a row.
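A minimal Python sketch of the "two in a row" rule follows, assuming each value is one evaluation interval's measurement compared against a single threshold. The function name, the sample values and the threshold are illustrative assumptions; PSM's own evaluation is not shown in the slides.

```python
def incident_indexes(values, threshold, required_in_a_row=2):
    """Return the indexes at which an incident would be opened."""
    incidents = []
    streak = 0  # consecutive violations seen so far
    for i, value in enumerate(values):
        if value > threshold:
            streak += 1
            if streak == required_in_a_row:
                incidents.append(i)
        else:
            streak = 0  # a normal interval resets the count
    return incidents

# Hypothetical response times in seconds, one per evaluation interval.
response_times = [1.2, 3.5, 1.1, 3.6, 3.9, 4.0, 1.0]
print(incident_indexes(response_times, threshold=3.0))
```

In this sample the isolated spike at index 1 is ignored, and one incident opens at index 4, the second consecutive violation.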

Tips to Configure AutoBaselines
Copy and paste OOTB Business Transactions, then rename them with a naming convention (ex: AutoBaseline)
Edit to remove all splittings (especially User Tracking), add a splitting for Application, and save
Select BTs to monitor in the Application Monitoring dashboard using the naming convention filter
Do not monitor the Web Page Requests BT; select specific BTs instead.
Note: this is a complicated setup; in the future we would like to get at least some of this into PSI.

Tips for Configuring AutoBaseline
Can override automatic baselines
Option: Add duration as criteria to prevent false positives
Option: Turn On/Off Incident Alert notifications
This is another tool in your toolbox for proactive monitoring. It may or may not work for your deployment, depending on how much activity you have, how consistent your performance is, and whether you have certain critical BTs to model. Some deployments do not want to make use of the 5/30/180 buckets for alerts, but do want performance alerts for certain BTs. They have the choice of creating manual thresholds for the BTs, or letting the system determine them dynamically.

AutoBaseline: Incident
Triggered by two occurrences in a row. Incident will have links to other analysis dashlets.

Static Thresholds or AutoBaseline?
Each has its uses; the choice depends on the Business Transaction SLA and behavior. Keep in mind that the main goal is to avoid “crying wolf” due to natural variations in response time.

Reference: ExtendedEmailAction plugin
Available OOTB in 3.0
Supports quiet time intervals
Uses the source as information in the email
Filters out certain agents from alerts
Links to dashboards in the email
Uses additional variables for the plugin for better filtering
Can embed predefined variables into dashboard names; they are substituted with their runtime values in the link
Supports multiple email formats
Supports Java regular expressions to filter by Agent, Monitor, Collector and Server
Provides REST filtering of the PDF reports by agent names/hosts, group names, and/or custom timeframe
Example: admins may not want a notification if one of multiple foreground method servers exits, since it is automatically restarted by the server manager, but they would want emails regarding any server manager that exits and needs a manual restart.
Example: admin1 is at the main site and admin2 at remote site B1. Incident notifications can be filtered based on the collector, so B1 incident alerts will be sent only to admin2.
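The two examples can be sketched as regular-expression routing rules. This is only an illustration in Python. The plugin itself is configured with Java regular expressions, and its real configuration fields are not shown in the slides. The agent and collector names, the addresses, and the `recipients_for` helper are all hypothetical.

```python
import re

# Hypothetical rule: method server exits are suppressed because the
# server manager restarts them automatically.
SUPPRESSED_AGENTS = re.compile(r"^MethodServer.*$")

# Hypothetical routing rules: (pattern on collector name, recipients).
ROUTES = [
    (re.compile(r"^collector-b1.*$"), ["admin2@example.com"]),
    (re.compile(r"^collector-main.*$"), ["admin1@example.com"]),
]

def recipients_for(incident):
    """Return the addresses to notify for an incident dict, or [] for none."""
    if SUPPRESSED_AGENTS.match(incident["agent"]):
        return []
    for pattern, recipients in ROUTES:
        if pattern.match(incident["collector"]):
            return recipients
    return []

# A server manager exit at remote site B1 goes only to admin2.
print(recipients_for({"agent": "ServerManager_1", "collector": "collector-b1-01"}))
# A method server exit is not reported at all.
print(recipients_for({"agent": "MethodServer_2", "collector": "collector-main-01"}))
```

Checking the suppress rule before the routing rules keeps the "do not notify" decision independent of which site the incident came from.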

Transition to Proactive
Tweak PSM to go from reactive, log-based monitoring to dashboards, then to incidents and thresholds, then to host health and automated baselines
No PSM
  React to user complaints
  Diagnosis from logs and text output
  No history for performance of system
  Reproduce issues
  Lack of remote tools
PSM 2.0
  Incidents and alerts for performance and system health (error %)
  Thresholds for response time
  Business Transactions help filter data
  Dashboards for faster diagnosis
  Performance Warehouse for long-term history (months)
  Session Cache (2.1) for short-term history (weeks)
PSM 3.0
  Automatic Host Health with adjustable thresholds for CPU, memory, disk, network
  Many more Business Transactions
  Optional dynamic thresholds (AutoBaseline)
  Enhanced alerts

Export and Upload to a TS case
Select “red” MethodServer(s) and Export
Look for problem MethodServers to export data from

Thresholds are adjustable
Host Health Criteria Thresholds are adjustable

Is Transaction affected by host or process health?
Shown in Application Overview, Transaction Flow and/or Process Health

Maturity Model for System Monitoring and Maintenance
Levels: Level 1 (Fragmented), Level 2 (Domain-integrated), Level 3 (Enterprise-integrated), Level 4 (Extended)
Axis labels: TCO $$; Pre-PSM, With-PSM
Characteristics, from the lowest level to the highest:
  System availability and performance is unmeasured
  Tedious log analysis
  Often required reproduction of issues with additional verbosity
  Most issues discovered by users
  System availability and performance metrics defined
  More advanced scripting, additional monitoring (JMX) and profiling tools
  3rd party monitoring tools
  System availability and performance metrics captured which generate actionable improvements
  End-to-end, transaction-centric monitoring capabilities
  Production-friendly and “always-on” monitoring and diagnostic capabilities
  One monitoring system that would cover all Windchill components across the enterprise
  Plug into enterprise-wide toolset and maintenance solution, or outsourced
Level 2: starting to use better, advanced tools and define performance metrics. In 10.x, there are some historical records for JVMs.
Level 3: all the benefits of PSM available.
Level 4: can include managed services and integration of PSM into an overall approach.

Summary
Many new features in PSM 3.0 make Administrators’ lives easier
There are several key dashboards that are good starting places for both monitoring and digging into a problem
Utilize PSM diagnostic results to perform targeted searches of the TS Knowledge Base
PSM can be configured to be more proactive using both new and existing features
Use both static thresholds (with new dashboards) and self-adjusting AutoBaselines to proactively monitor the user transactions which are most important to your company
Use Reports to monitor and track KPIs (such as throughput and logins) and broadcast that information to key stakeholders

System Administrator Sessions
PTC, Customer and Partner Led Sessions
Columns: Title; Presenter(s); Day; Time; Location
PTC103 – PTC Windchill Business and System Admin Roadmap; Walid Saad; Monday; 10:30AM – 11:15AM; 102A
CUST104 – We’re Watching You: Improving PTC Windchill Performance with PSM and UEM; John English (BAE Systems), Todd Votapka (BAE Systems); 104A
PTC108 – PTC System Monitor Deep-Dive: PTC Windchill Proactive Monitoring and Performance Troubleshooting; Stephen Vaillancourt; 11:30AM – 12:15PM; 102B
PTC113 – Improving PTC Windchill Performance; Ram Krishnamurthy; 1:30PM – 2:15PM; 103
PART100 – PTC System Monitor: Extending to the End User Desktop; Dan Breslin (Compuware), Dan Betry (Solar Turbines); 4:15PM – 5:00PM; 251
PTC121 Guerrilla Oracle Tuning for PTC Windchill
PTC206 – Windchill Architecture Deployment and Security; Steve Dertien; Tuesday; 11:00AM – 11:45AM
CUST215 – Remote Sites, WANs and PTC Windchill Replicas: Put the Data Close to the User; Mark Grothe (ITT Corp.), Ray Schussler (ITT Corp.); 1:15PM – 2:00PM
PTC210 – Ask the Experts: PTC Windchill; Will Kohler
PTC217 – Protecting Your Intellectual Property with PTC Windchill; Steve Shaw; 2:15PM – 3:00PM
PART213 – PSM PLUS: Realizing and Extending the Value of PTC System Monitor (PSM); Chris Stark (Compuware); 2:40PM – 3:00PM; 256
CUST317 – Implementing PTC Windchill PDMLink with Security Labels: Pitfalls and Successes; Jeff Brodsky (Raytheon), Danny Poisson (Raytheon); Wednesday; 10:15AM – 11:00AM
PTC311 – Windchill Rehosting Utility: Present and Future; Chris Watson; 11:15AM – 12:00PM
