Turning Practice into Perfect: Implementing Fathom 2.0
Adam Backman, White Star Software
Notes
- All of the information in this presentation is also covered in a new portion of the Progress documentation called "OpenEdge Revealed: Mastering the Progress Database with Fathom".
- A system works as a whole, not as a sum of its parts, so the presentation is written the same way. With this in mind, please hold your questions until the end.
Presentation Goal
- Explain how to use Fathom
- Implementing best practices
- Not here to teach in-depth system administration
- Point you to other presentations for more in-depth information
The Operation’s Challenge
- Constant reactive mode
- Manual processes
- Poor reporting
- Unplanned downtime
- Cannot plan for growth
- Poor system performance
Resulting in:
- Unpredictable operations
- Exposure to errors
- Incomplete information
- Frustrated end users
- Frustrated administrators
Goals of a Well Maintained System
- Resiliency – The ability to recover
- Availability – Provide maximum uptime
- Performance – Consistency despite system load
Fathom can help achieve these goals.
Roadmap
- What are best practices?
- What is Fathom?
- Providing a resilient system
- Making your system highly available
- Providing consistent performance
What are Best Practices?
- Defined processes to follow
- Consistent, verifiable outcome
- End result – a well maintained system
Defined Process to Follow
- Must have clear goals
  - Functional
  - Business
- Document where you are now and how you are going to achieve your goals
[Figure: map labeled Munich, Paris, London, Amsterdam, Prague]
Defined Verifiable Outcome
- Know what you expect
- Know what you are getting
- Test completely prior to implementation
  - Unit testing
  - End-to-end testing
Well Maintained System
- Ability to support 24-hour operations with only scheduled outages for upgrades and maintenance
- Ability to recover from disaster with little or no data loss and minimal interruption to operations
- Ability to support the changing needs of the business with little or no performance degradation during times of heavy processing
What is Fathom?
- Java-based management console and agent
- Management console
  - Provides an interface to the agent
  - Provides an interface to the Fathom Database
  - Allows for definition of alerts
- Fathom agent
  - Collector of operating system resource information
  - Collector of Progress database management information
Dictionary
- Resource – Anything Fathom can monitor or trend
- Schedule – A defined timeframe when a resource is available for monitoring, alerting, and trending
- Poll – The process of gathering information about a resource
- Rule – A performance requirement that can be evaluated
- Alert – A response to a rule being broken
- Action – A process to be performed in response to an alert
- Trending – The process of storing performance and audit data in the Fathom Trend Database
- Monitoring – Performing polling, evaluating rules, generating alerts, executing actions, and trending of a resource within a scheduled timeframe
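The way these terms fit together can be sketched in a few lines of Python. This is only an illustration of the vocabulary, not Fathom's implementation or API: the class names, the `monitor` function, and the in-memory `trend_store` list are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]

@dataclass
class Rule:
    name: str
    check: Callable[[Sample], bool]   # True when the requirement is met
    action: Callable[[str], None]     # run in response to an alert

@dataclass
class Resource:
    name: str
    poll: Callable[[], Sample]        # gathers information about the resource
    start: time                       # schedule: start of the monitored window
    end: time                         # schedule: end of the monitored window
    rules: List[Rule] = field(default_factory=list)

def monitor(resource: Resource, now: time,
            trend_store: List[Tuple[str, Sample]]) -> List[str]:
    """One monitoring pass: poll, trend, evaluate rules, alert, act."""
    if not (resource.start <= now <= resource.end):
        return []                     # outside the schedule: do nothing
    sample = resource.poll()
    trend_store.append((resource.name, sample))   # trending
    alerts = []
    for rule in resource.rules:
        if not rule.check(sample):    # rule broken, so raise an alert
            message = f"{resource.name}: rule '{rule.name}' broken"
            alerts.append(message)
            rule.action(message)      # action taken in response to the alert
    return alerts
```

A resource polled outside its schedule produces no sample, no trend record, and no alerts, which matches the definition of monitoring as something that happens within a scheduled timeframe.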
Progress Fathom Architecture
[Diagram labels: Fathom DB, Production DB, Memory, Disk, CPU, Net*, Log File System]
Fathom Architecture: Multiple Sites
[Diagram labels: Fathom DB]
Fathom Architecture: Monitor Locally / Trend Remotely
[Diagram labels: Fathom DB]
Fathom Architecture 2.0: Monitor/Trend Database Remotely
[Diagram labels: DB Agent, Fathom DB]
Fathom Architecture 2.0: Monitor & Trend Anywhere
[Diagram labels: DB Agent, Fathom DB, Fathom]
Fathom Architecture: Manage from One Browser
[Diagram labels: DB Agent, Fathom DB]
Resiliency
- Redundancy
- Developing an effective recovery plan
- Monitoring for problem avoidance
Redundancy
- Disk
  - RAID
    - RAID levels
    - Dos and Don’ts
  - After imaging
- Memory
RAID
- Redundant Array of Inexpensive Disks: Patterson, Gibson and Katz at the University of California, Berkeley (1987)
- Common RAID levels
  - RAID 0 – Striping
  - RAID 1 – Mirroring
  - RAID 10 or 0+1 – Striped with mirrors
  - RAID 5 – Striped with calculated parity
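To make "calculated parity" concrete, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It models a single stripe with three data blocks and one parity block; the function name and the tiny four-byte blocks are illustrative assumptions, and real controllers also rotate parity across disks and work on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus the parity computed from them.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the disk holding data[1]: XOR of the surviving
# data blocks and the parity block reproduces the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because parity has to be recalculated and rewritten whenever a data block in the stripe changes, every small write costs extra I/O, which is one commonly cited reason to be cautious with RAID 5 under write-heavy workloads.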
RAID: Dos and Don’ts
- Do:
  - Use RAID 10 for randomized storage
  - Use RAID 1 for sequential storage
  - Use RAID 5 for READ-ONLY data
- Don’t:
  - Use RAID 5 for OLTP
  - Use RAID 0 for data storage
Memory Interleaving
Memory interleaving works like RAID 0 for memory. While interleaving memory offers significant potential performance gains, you run the risk of one faulty memory chip bringing down your application.
Resiliency: Recovery Planning
- Who is involved in the process?
- What gets backed up?
- Where do we back up our data?
- Where do we store the physical backup?
- When do we do a backup?
- Why do a backup at all?
- How can Fathom help?
Who is Involved in Recovery Planning?
- Technical people
  - They understand what is possible
- Business people
  - They understand what is needed and the cost of downtime
- Management
  - They understand where the business is headed and what can be afforded
What is Included in the Backup?
- More than just a database backup
  - Database
  - Application
  - Other files
- Physical backup
  - Secondary machine room
  - Additional hardware
  - Infrastructure
Where Do We Back Up To?
- Capacity – How much do you need to store?
- Removable – To allow off-site archival
- Reliable – It must work every time
- Compatible – Keeps your options open
Where to Store Your Backup?
- Formal service
  - 24-hour access
  - Secure
  - Highly disaster resistant
- Separate location (different building)
  - Inexpensive
  - Greater need for planning (access, security, disaster, etc.)
When to Do a Backup?
- As often as practical
- A once-a-day backup will cause you to lose up to 24 hours of processing in the worst case
- Fill in with after imaging
  - Store AI on a different disk
  - Archive AI files throughout the day
- Keep a warm standby to reduce downtime
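The arithmetic behind the 24-hour figure, and what archiving after-image (AI) files changes, can be sketched as below. This is a simplified model under stated assumptions: the function name is hypothetical, the failure is assumed to destroy both the database and any AI notes not yet archived elsewhere, and the archived AI files are assumed to roll forward cleanly onto the last backup.

```python
from typing import Optional

def worst_case_loss_hours(backup_interval_hours: float,
                          ai_archive_interval_minutes: Optional[float] = None) -> float:
    """Upper bound on lost processing time if the failure hits at the worst moment."""
    if ai_archive_interval_minutes is None:
        # No after imaging: everything since the last backup is gone.
        return float(backup_interval_hours)
    # With archived AI files, only work since the last archive is exposed.
    return min(float(backup_interval_hours), ai_archive_interval_minutes / 60)

print(worst_case_loss_hours(24))        # daily backup only
print(worst_case_loss_hours(24, 30))    # daily backup plus AI archived every 30 minutes
```

If the disk holding the current AI extent survives the failure, the actual loss can be smaller than this bound, which is the reason for storing AI on a different disk.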
Why Do a Backup?
- Reduce data loss
- Build user confidence
- Keep your job
How Can Fathom Help? Scheduling
- Consistent schedule that is not forgotten
- Proactive notification if there is a problem
- Fathom 2.0 Job Templates
How Can Fathom Help? Reporting
- Processing time is captured
- Historical trend report of backups
- Audit trail
Resiliency: Problem Avoidance
- Common problem areas:
  - Disk full problems
  - Database extents filling fast
Fathom: Disk Monitoring
- Disk view
- Monitoring disks other than database disks
- Graphical view of what disks look like
Availability
- Reducing the impact of unplanned events
- Planning for system growth
- Reducing the impact of change on the user
- Scheduling online utilities
Planning for System Growth
- Trending allows patterns to be viewed and acted upon
- Trending allows operational thresholds to be established
- Trending allows advance planning, so maintenance can be scheduled when convenient for the business
Fathom: Disk Trending
- Correlating database and disk trends
- Month by month, week by week, or day by day: it is your choice
- Fill rates and activity of each disk
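As an illustration of how fill-rate trend data supports planning, the Python sketch below fits a straight line to daily usage samples and projects how many days remain before a disk fills. It is not how Fathom reports trends: the function name and the sample figures are invented for the example, and a linear fit is an assumption that will not hold for bursty or seasonal growth.

```python
from typing import List, Optional, Tuple

def days_until_full(samples: List[Tuple[float, float]],
                    capacity_gb: float) -> Optional[float]:
    """Project days until full from (day_number, used_gb) samples.

    Uses a least-squares slope as the fill rate and measures from the
    most recent sample. Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    fill_rate = sxy / sxx             # GB per day
    if fill_rate <= 0:
        return None
    _, last_used = samples[-1]
    return (capacity_gb - last_used) / fill_rate

# Four daily samples growing by 2 GB per day on a 200 GB disk.
print(days_until_full([(0, 100.0), (1, 102.0), (2, 104.0), (3, 106.0)], 200.0))
```

Comparing the projected date against the lead time needed to add storage gives a simple operational threshold for scheduling maintenance.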
Fathom: Storage Area Trending
- Fill rate
- Activity by area
- This information can show a need to spread data even further
Fathom: Table and Index Trending – Database Analysis
- Predicting table growth
- Predicting index growth
- Index compaction rates can be monitored, and actions can be taken if compaction drops below a certain level
- Utilization of each table and index can also be tracked and viewed in other areas of Fathom
Fathom: Memory Trending
- Focus on paging and swapping rather than utilization
- This is currently a weak area within the Fathom product
Fathom: CPU Trending
- Look at idle time
- Look at the ratio between user and system time
- High system time can indicate an incorrect value for -spin, or high paging or swapping
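A check along these lines can be sketched in Python as follows. The function name and both thresholds (`min_idle_pct`, `max_system_share`) are placeholders chosen for illustration, not values from Fathom or from this presentation; sensible limits depend on your own baseline.

```python
from typing import List

def cpu_flags(user_pct: float, system_pct: float, idle_pct: float,
              min_idle_pct: float = 10.0,        # assumed threshold, tune to your baseline
              max_system_share: float = 0.3      # assumed threshold, tune to your baseline
              ) -> List[str]:
    """Flag a CPU sample that deserves a closer look."""
    flags = []
    if idle_pct < min_idle_pct:
        flags.append("low idle")
    busy = user_pct + system_pct
    # Share of busy time spent in system (kernel) mode rather than user mode.
    if busy > 0 and system_pct / busy > max_system_share:
        flags.append("high system share")
    return flags

print(cpu_flags(user_pct=40, system_pct=40, idle_pct=20))
```

A "high system share" flag is only a prompt to investigate further, for example by reviewing the -spin setting and the paging and swapping trends for the same period.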
Performance
- Performance is relative
- Fast is overrated
- Fathom can help find tough problems
Performance is Relative
- What is a baseline?
- Determining your baselines
- How Fathom can help
- Important indicators
- Who is your canary?
Determining Your Baseline
- Good baseline guidelines
  - Often-accessed portions of the application
  - High customer impact
  - End to end (time to enter an order)
- Bad baselines
  - Year-end process
  - Management reporting (in most cases)
  - Little-used portions of the application
Components of Performance
- Network
- Disk
- Memory
- CPU
Issues: Network
- Check your network capacity BEFORE adding any additional applications
- Baseline response times with Fathom
- Routed vs. switched networks
- Location of Progress files
- Program libraries
Issues: Disk
- Storage capacity vs. throughput capacity
- Remember your RAID levels
- Location of data
Issues: Memory
- Memory acts as a buffer between the user processes and disk
- Use memory for the common good
  - Increase broker memory first
  - Increase client memory (-Bt, …)
  - Then get creative
Issues: CPU
- Good CPU usage vs. bad CPU usage
- The -spin parameter
- Have a CPU problem? Look at your disks
Monitoring Performance
- Spot checks
- My Fathom Views
- Trend reporting
- Getting out of the forest
Conclusion
- Start slow
- Remember your goals
  - Resiliency
  - Availability
  - Performance
- Consider the cost/benefit before adding monitoring or trending to a resource
Questions