The Road to Hell is Paved with Average

Slides:



Advertisements
Similar presentations
Week 3 Outline Post-Mortem By: Jamaral Johnson. 2 After Actions Review In this presentation I will do my best to highlight what went wrong. This is just.
Advertisements

CS 113 Welcome to Unit 8 Academic Strategies for Business Professionals.
Noticing When Students are Not Engaged
Ian F. C. Smith Writing a Journal Paper. 2 Disclaimer / Preamble This is mostly opinion. Suggestions are incomplete. There are other strategies. A good.
Classic Connections: Innovative Methods for Making Education Work.
Warm up 1 Take a syllabus from the front table marked with your hour by it. Read through. Write 3 sentences on what you learned from the syllabus.
CS 113 Welcome to Unit 8 Academic Strategies for Business Professionals Jennifer Hooker.
MEDIATION. What is your conflict style? How do you resolve conflicts? Are you aggressive (my way of the highway) Compromising (let’s work it out) Appeasing.
Day One Every Day per lb You will have 15 minutes to work with your partner to help me solve my problem. Then you will have another 10 minutes.
MRS. HAYNES ENGLISH II ● ROOM 311 Period 1 - 7:30-8:17 English II Period 2 - 8:23-9:10 PLC Period 3 - 9:16-10:03 English II Period :09-10:56 Conference.
What is Brainstorming? Brainstorming is a process when you focus on a problem and come up with as many solutions as possible. One of the reasons it is.
Section 2 Effective Groupwork Online. Contents Effective group work activity what is expected of you in this segment of the course: Read the articles.
Bell Schedule Developing the school bell schedule can be a very complex task especially when incorporating shortened days for staff development, assembly.
TODAY YOU WILL NEED YOUR QUOTE OF ARMS YOUR READING LOG
Why Metrics in Software Testing?
GIVING FEEDBACK ON PERFORMANCE CONCERNS IN A 1:1 MEETING -
Mobile Testing - Bug Report
HISTORY 10 Mr. Hambleton.
*The claim is your topic/main idea of essay
Using the Templates in Salesforce
Using the Templates in Salesforce
Primary Lesson Designer(s):
Origin Myths Monday, August 15, 2016.
Exploring Values.
Tips for Taking the Computer-Based
QlikView Licensing.
Introduction to Project Management
Good Morning Everyone!! Our Warm Up today is finishing the exam we began on Monday. You will have exactly 30 mins in class today before we need to move.
Entry Task #1 – Date Self-concept is a collection of facts and ideas about yourself. Describe yourself in your journal in a least three sentences. What.
Go Math! Chapter 1 Lesson 1.3, day 1 Comparing Numbers
Academic representative Committee CHAIR training
Adding Assignments and Learning Units to Your TSS Course
NEW Behavior Log is filled out for 10/10/16
Have you ever conducted an experiment before
SCIENCE FAIR Mini-Lesson #5
2018 Test Administrator training
Agenda 9/2: Fill out profile Introductions Syllabus/Procedures Debrief
Formative Feedback The single most powerful influence on enhancing achievement is feedback. Hattie, 2009 At best, students receive ‘moments’ of feedback.
Communicating with caregivers about IPV and multiple injections
Mental and Emotional Health
Welcome! January 26th, 2018 Friday
Write this on your colorful Unit 1 Homework Calendar
Warm-Up #4 (pg 91_#1) Here is one of Spiro the Spectacular’s favorite number tricks. Choose a number. Add 6. Multiply by 3. Subtract 10. Multiply by 2.
Welcome to 3rd Grade.
Unit 1 The History of Earth Overview and Unit Guide
Working in groups Roles.
Partnered or Group Projects
Do You Know? Convert the following:
Simple computer problems to solve
Presentation Mastery Stop Presenting – Start Connecting
GIVING FEEDBACK ON PERFORMANCE CONCERNS IN A 1:1 MEETING -
GCSE Revision In response to a large number of Y11 students asking for advice on how to revise….. Introduction & revision planning Revision techniques.
Adventures with Charge Back
Data Quality: Why it Matters
1) 2) 3) 4) 5) Warm Up/Do Now Evaluate: OBJECTIVE: Try these problems
Ticket In the Door d a c b 1) 2) 3) 4) a) the product of 9 and 3
I've Got To Write A Research Paper ! ! !.
The of and to in is you that it he for was.
Test Administrator training — Part C
Topic Leader Training 2012.
Welcome to Science!!! Things to know!.
Near Miss & Hazard Reporting The S.T.A.R. Approach Managers’ briefing
Tech Que: “Hide n Seek” Title Graphic
The Troubleshooting theory
Fahrig, R. SI Reorg Presentation: DCSI
Tips for taking the Grades 9 and 10 FCAT 2.0 Reading Test
Article of the Week – A.O.W.
The Road to Hell is Paved with Average
Use of Chinese 普通话 English must be used for general conversation.
The Road to Hell is Paved with Average
Presentation transcript:

The Road to Hell is Paved with Average The road to Actionable Intelligence is paved with Minimum, Average, 95th Percentile and Maximum Good afternoon. Welcome to the CMG impact session 2L5c "The Road to Actionable Intelligence is paved with Minimum, Average, 95th Percentile and Maximum“. This is a slight change from the original title “The Road to Hell is Paved with Average”. This is a conversation where, using a real-world example, we will explore how reliance on average on 15 second data slices, failed to reveal actionable intelligence. We will encourage the use of other metrics that seem neglected such as minimum, maximum, and 95th Percentile. The example data is minimum idle workers, an overlooked JVM metric that caused problems in our production environment for months, but once found and monitored effectively, eliminated dozens of support incidents a week. Note that a paper was submitted for publication in the CMG Journal. So look out for that, or come talk to us. We have a booth in the vendor area, and would love to talk to you about this or any of our other presentations. === Congratulations again on the acceptance of your paper/presentation "The road to Actionable Intelligence is paved with Minimum, Average, 95th Percentile and Maximum" for the CMG imPACt 2017 Conference to be held November 6 - 9, 2017 in New Orleans.    Author(s):  Benjamin Davies Session Number:  2L5c Subject Area:  CAP Session Date and Time: 11/7/2017, 3:10 PM-3:25 PM Room Assignment: Lafourche https://www2.eventsxd.com/event/3865/cmgimpact2017/sessions THE ROAD TO ACTIONABLE INTELLIGENCE IS PAVED WITH MINIMUM, AVERAGE, 95TH PERCENTILE AND MAXIMUM Tuesday, November 07 - 03:10PM to 03:25PM Ben Davies Using a real-world example, we will explore how reliance on average on 15 second data slices, failed to reveal actionable intelligence. We will encourage the use of other metrics that seem neglected such as minimum, maximum, and 95th Percentile. The example data is minimum idle workers, an overlooked JVM metric that caused problems in our production environment for months but once found and monitored effectively, eliminated dozens of support incidents a week. 2017/11/09 Mr. Ben Davies – ben.davies@moviri.com

Agenda Average, on average, is a below average metric Configured workers; Idle and Busy Average is better in small time slices Example where average failed Summary and call to action In the next few minutes we will discuss why on average, average is a below average metric. Then explain configured workers, idle workers and busy workers. Then discuss that average becomes a better metric at smaller time slices. then show an example where even at 15 second time slices, average failed to show actionable intelligence. We will finish with a summary and call to action.

Average, On Average, is a Below Average Metric The road to hell is paved with AVERAGE The road to ACTIONABALE INTELLIGENCE is paved with minimum, average, maximum and 95th percentile Lets be clear on the message of this conversation. Average, on average, is a below average metric. I find that significant actionable intelligence is missed with our obsession with average. It is not that average is not a good metric, but if that is ALL you use, then you are on the road to (capacity) hell. Today is an effort to encourage you to include maximum, 95th percentile, average and minimum in your capacity analysis arsenal.

Average is Better in small time slices Small in this case depends on the monitoring target. Hourly monitoring targets Average of a day of hourly data is characterizing 24 discrete events to 1 number X per second monitoring targets are much different An average of hourly connections at five per second, is characterizing 18,000 discrete events to 1 number. Average is usually an acceptable metric, but looses usefulness as the ‘time slices’ become thicker, as compared to the monitoring target. Average of a day of hourly data is characterizing 24 discrete events to 1 number. That may be fine for hourly counts, but most capacity metrics, are events that are measured is in tens or hundreds of milliseconds. An average of hourly connections at five per second, is characterizing 18,000 discrete events to 1 number. === 5 per second * 60 seconds (in a minute) * 60 minutes (in an hour) = 18,000 transactions

Workers What is the metric showing? Configured workers Busy workers Idle workers Introduction to “JVM idle workers”, an under appreciated JVM metric we monitored with Wily. (Computer Associates, CA Wily) What is the metric showing? When the JVM service is started, there are a number of configured workers that get started. Configured worker is in one of two states. Idle Worker or Busy Worker. We are focusing in idle worker. When ‘nothing’ is happening on the device, the idle workers are many (up on the chart), however as the system gets busier, the idle workers are doing work, becoming busy workers, so there are fewer available idle workers to do new work. When idle workers get to zero, there are no idle workers to service new work, and you are effectively down, that is, unable to process new work, until workers become idle again. In our use case, idle workers are independent of all other resources. It matters not one bit that you have extra terabytes of storage, gigs of network, megs of ram and plenty of CPU. When the system as no idle workers to service new work, our application is effectively down.

Average on 15 second data If average is better as time resolution shrinks, surely average will a suitable on 15 second data. 5 transactions a second * 15 seconds = 75 distinct events summarized to one number While I agree, in theory, in practice you should use other metrics to CONFIRM that average is appropriate, especially when troubleshooting a system.

Example where average failed Got an alert and researched to see this chart. This looks ‘fine’ Turned on Min and saw this. Much more interesting. This IS actionable intelligence. We get these alerts on idle workers that support this super important application. We research and see this chart (left). This is a perfectly fine looking chart. Basically half of the resource is ‘un used’ as indicated by the white space under the dots. Remember this is a negative chart, meaning up is good and down the chart is bad. But the chart on the right has minimum turned on. This is much more interesting. Not only do we not have half of our resources unused, but on that one day, we have ZERO resources for HOURS. This is actionable intelligence

A bit about this Application problem The result of this little find was the application owners and technician now look at minimum idle workers as a leading indicator. As idle workers gets below some threshold, 10 lets say, they look to see what the environment is doing. This gives some early warning and lets them know when a problem actually started, not when it was seen by customers. === In the example, on average these behaved in a completely normal way even when there was a problem. What the technicians did during the troubleshooting was to restart the JVMs, which resolved the problem. The conventional wisdom was that the JVMs are misbehaving in some way that no one could figure out, but once it was found that resetting the JVMs resolved the issues investigation stopped. When we pursued the minimum idle worker alerts we interrogated the monitoring data and ‘found’ the show minimum and maximum check box in Wily. This revealed the other chart. Once this was observed we told everyone about it and got very hostile responses, from all but one important application owner. They used the data to determine which device was ‘freaking out’ and causing the burst of connections that ‘flooded’ the idle workers but did not get anything done. A code fix or two mitigated the issue and the application went from three a week ‘all hands on deck calls’ about poor performance, to zero ‘all hands’ calls for a year. Simply by having more complete information that pointed them to the real problem. For the record, the JVMs had nothing to do with the actual problems. They were victims of the upstream device flooding them with bogus connections or the downstream not releasing connections appropriately. Resetting the JVMs forced resets of these bogus connections both upstream and downstream which seemed to clear the problem, at least for the moment. As they were where the corrective action seemed to happened, they were blamed as the problem.

AVERAGE MEAN MAX MIN 95TH PERCENTILE The Root Failure The analysis rule of thumb should be to use the best metric, at the best resolution, of the best type (min, average, max, etc.). And you will not be able to determine this without looking at the options. AVERAGE MEAN MAX The root failure of the initial analysis is the ‘love of average’. The application troubleshooter said: “Average is all we look at. Especially in a 15 second slice, minimum and maximum would be little different. So there is no use to look.” Rubbish I say. The analysis rule of thumb should be to use the best metric, at the best resolution, of the best type (min, average, max, etc.). And you will not be able to determine what this is without looking at the options. MIN 95TH PERCENTILE

averaged to a single point really the metric you wish Is 18,000 events averaged to a single point really the metric you wish to base your career on? That is what you get with hourly average of the typical transaction based application Is 18 thousand events averaged to a single point really the metric you wish to base your career on? That is what you get with hourly average of the typical transaction based application

Summary and Call To Action We are at the end of our time so …

The road to hell is paved with AVERAGE Lets be clear on the message of this conversation. Average, on average, is a below average metric. If you ONLY use average as a capacity planner, you are firmly on the road to (capacity) hell.

ACTIONABLE INTELLIGENCE This is your exit Maximum 95th Percentile Mean Average Minimum ACTIONABLE INTELLIGENCE However, if you incorporate other metrics such as minimum, 95th percentile, and maximum, you are on the road to actionable intelligence. The Call To Action is to take this exit. The exit to actionable intelligence. As you will see the roads here are paved with max, min, mean, 95th percentile and yes even some average. We are not against average, but we, as professional capacity planners, we should be for finding actionable intelligence. I submit this is one road we should be on to get there. There are several presentations that review some of the topics we just breezed over. So give me your card and I will send you the links. Movìri is here to help with these and other tools and techniques. Give us a call. === These ideas are used in this presentation http://mlcu.com/Session533-TSCO-MotherLodeForActionableIntelligenceV03.pptx CMG Proceedings. http://mlcu.com/articles http://moviri.com/blog

Questions & Answers