Analyzing Performance test data (or how to convert your numbers to information) Carles Roch-Cunill Test Lead for System Performance McKesson Medical Imaging.

Analyzing Performance test data (or how to convert your numbers to information) Carles Roch-Cunill Test Lead for System Performance McKesson Medical Imaging Group

6/1/2009 Agenda
- Performance testing as an experimental activity
- Very fast review of Scientific Method
- Errors, forget them at your own risk
- About the meaning of data
- Some statistical concepts
- Analyzing data
- Adjusting your data to a model
- Summary

Performance testing as an experimental activity
There are two approaches to testing:
a) Without added value
- This feature does not work
- This requirement is not met
b) With added value
- This feature does not work, and this module/component/software artifact is the culprit
- This requirement is not met, and it fails for this reason
Usually things are not so clear, and testers' statements fall somewhere in the middle.
Because performance testing gathers data that can be analyzed, the performance tester is well positioned to provide added-value information to the team.

Performance testing as an experimental activity
If you want to provide added value and explain why the requirement is not met, you will:
- Formulate a hypothesis: “My performance degrades due to component X”
- Test the hypothesis by developing an appropriate test environment
- Gather results
- Analyze the results to see if they confirm or reject your hypothesis
If you are lucky and your guess (the hypothesis) was good, you will have explained at least part of the performance behaviour. However, other factors usually influence your performance as well, so you may only have caught one low-hanging fruit.

Performance testing as an experimental activity
You can create different tests that put more emphasis on one of the components of the system. For example, you may want to measure specifically the performance of the data repository tier, or the network, or only the UI. Depending on where your focus is, your methodology and your tools will change. In all cases, you need to fix all the parameters but one.
For example, if you want to study the influence of the network on your system, you need to do the following:
- Determine the parameters that characterize the network (latency, bandwidth, utilization…)
- Identify whether they are independent or not (utilization and latency may not be independent)
- Modify one parameter at a time while keeping the others constant
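
As a rough sketch of the "modify one parameter at a time" idea, the following Python builds a list of test configurations in which each run differs from a baseline in exactly one parameter. The function name `one_factor_at_a_time`, the parameter names and all values are hypothetical, chosen only for illustration; they do not come from the presentation.

```python
def one_factor_at_a_time(baseline, variations):
    """Build test configurations that change one parameter at a time.

    baseline:   dict of parameter name -> baseline value
    variations: dict of parameter name -> list of values to try
    """
    plan = [dict(baseline)]  # the baseline run is the reference point
    for name, values in variations.items():
        for value in values:
            if value == baseline[name]:
                continue  # already covered by the baseline run
            config = dict(baseline)  # every other parameter stays constant
            config[name] = value
            plan.append(config)
    return plan


# Hypothetical network parameters and values, for illustration only.
baseline = {"latency_ms": 10, "bandwidth_mbps": 100}
variations = {"latency_ms": [10, 50, 200], "bandwidth_mbps": [10, 100]}

plan = one_factor_at_a_time(baseline, variations)
for config in plan:
    print(config)
```

A plan like this only makes sense for parameters you can really hold constant in your test environment; if two parameters are not independent (utilization and latency, for instance), changing one will move the other too.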

Very fast review of Scientific Method
- An effect has been observed. Example: performance degradation in your application
- You try to reproduce it and learn the conditions needed to reproduce it at will
- You may gather some data through testing
- To explain the data you formulate a model (hypothesis)
- You refine your testing and tailor it around your model
- You analyze the new data and check whether your model fits the data
- If the model fits the data, you are on a good footing
- If the model partially fits the data, you either refine your model or discard it
- If the model does not fit the data, you formulate another model
- In every case, new data obtained from other tests may force you to modify, rethink or even dump your model
- Once your data fits the model, you draw conclusions based on the framework provided by the model

Very fast review of Scientific Method
Unstated principles:
- Simpler is better
- Same procedure and system, you get the same results
- A model should not introduce more questions than it answers
- Usually, newer models include the older models as particular cases
- Models are dynamic

Errors, forget them at your own risk
Errors happen… so take them into account. There are two main kinds of errors:
- Human errors: stopping the watch at the wrong moment, confusing digits…
- Instrument errors: your watch is not precise, has a mechanical defect…

Errors, forget them at your own risk
In the graph alongside: if your error bar is ±1, we can say the trend is towards a larger value. However, if the error bar is ±3, then we cannot say anything about the trend of this data.
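
The graph itself is not part of this transcript, so here is a minimal sketch of the same reasoning with made-up numbers. It assumes a deliberately simple criterion: the first and last points are considered different only if their error bars do not overlap. The function name `trend_is_distinguishable` and the series values are hypothetical.

```python
def trend_is_distinguishable(values, error):
    """Return True when the first and last points differ by more than their error bars.

    Two measurements a ± error and b ± error have non-overlapping
    error bars only when |b - a| > 2 * error.
    """
    return abs(values[-1] - values[0]) > 2 * error


series = [10, 11, 12, 13, 14]  # made-up measurements, not taken from the slides

print(trend_is_distinguishable(series, error=1))  # True: 4 > 2
print(trend_is_distinguishable(series, error=3))  # False: 4 is not > 6
```

Comparing only the end points is a simplification; it is enough to show why the same series supports a conclusion with a ±1 error bar and supports none with ±3.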

About the meaning of data
Performance testing generates a lot of data. But what does all the data mean? To explain this data you need to take into account:
- Hardware
- Network characteristics
- Network topology
- Physical support for the data tier (storage, database…)
- The architecture of your application
- How your application is coded
- …

About the meaning of data
In addition, you need to analyze the results in the context of the requirement or the question you are trying to answer. For example: “Event A should not take more than x seconds”.
In most circumstances involving computer systems, you will have a stochastic component in your distribution. Assuming a normal one, you will have something like:

About the meaning of data
But what exactly does the requirement mean? Strictly, it means that no single occurrence of Event A may take more than x seconds:

About the meaning of data
However, the requirement is usually interpreted as:
From a formal point of view, the requirement “Event A should not take more than x seconds” would have failed with the above distribution. However, the statement “The average of Event A should not take more than x seconds” would pass.

About the meaning of data
The requirement can also be expressed as a percentile. In this case the requirement will be stated as “Event A should not take more than X seconds 50% of the time”.
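
The three readings of the same requirement (strict, average, 50% of the time) can give different verdicts on the same data. The sketch below checks one made-up sample against all three. The function name `check_requirement`, the sample values and the limit are assumptions for illustration, and "50% of the time" is read here as "the median does not exceed the limit".

```python
import statistics


def check_requirement(samples, limit):
    """Evaluate 'Event A should not take more than `limit` seconds' three ways."""
    return {
        # Strict reading: no single measurement may exceed the limit.
        "strict": max(samples) <= limit,
        # Usual reading: the average must not exceed the limit.
        "average": statistics.mean(samples) <= limit,
        # Percentile reading at 50%: the median must not exceed the limit.
        "median": statistics.median(samples) <= limit,
    }


# Made-up timings in seconds, not taken from the slides.
samples = [2.1, 2.4, 2.8, 3.0, 3.1, 3.3, 3.9]
limit = 3.0

result = check_requirement(samples, limit)
print(result)  # {'strict': False, 'average': True, 'median': True}
```

Agreeing on which reading applies before the test run avoids arguing about pass or fail afterwards.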

Some statistical concepts
Once we have defined the question, we can provide the answer. The answer will be obtained through measurements (either manual or automated). The more measurements you take, the better your statistics and the better your answers.
However, the measurements need to be statistically significant, meaning that each measurement is good enough to be included in your statistics. All the measurements that are included in your statistics need to be statistically equivalent.

Some statistical concepts
How do you determine whether your data is statistically equivalent? You can apply some complex mathematical analysis or apply common sense. Some rules of thumb:
- If, in a single set of measurements, 20% of your data is very different, you either have a problem in your test system or you are observing different phenomena.
- If you have done several runs and the 90th percentile of a new test is bigger (smaller) than the maximum (minimum) of the previous tests, then the new data is not statistically similar and has no statistical significance for your results.
- If you are expecting a specific distribution and you are not getting it, the current set cannot be compared (is not statistically equivalent) to the data you were expecting.
- Outliers are not statistically equivalent to the rest of the set.

Some statistical concepts
Example of the 90th percentile for Test 3 being bigger than the maximum of the other sets of measurements. In this context Test 3 is not statistically equivalent and will be rejected.
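
The 90th percentile rule of thumb could be checked along these lines. This is only a sketch: the helper names `percentile` and `is_statistically_similar` are hypothetical, the percentile uses the nearest-rank convention (other conventions give slightly different values), and the three test runs are made-up numbers rather than the data behind the original chart.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_statistically_similar(new_run, previous_runs):
    """Rule of thumb: the new run's 90th percentile must fall inside the
    [minimum, maximum] range of all previous measurements."""
    previous = [value for run in previous_runs for value in run]
    p90 = percentile(new_run, 90)
    return min(previous) <= p90 <= max(previous)


# Made-up timings in seconds for three runs.
test1 = [2.0, 2.1, 2.2, 2.3, 2.1, 2.0, 2.4, 2.2, 2.3, 2.5]
test2 = [2.1, 2.2, 2.0, 2.4, 2.3, 2.2, 2.1, 2.3, 2.4, 2.2]
test3 = [2.2, 3.1, 3.4, 3.0, 3.3, 3.5, 3.2, 3.6, 3.1, 3.4]

print(percentile(test3, 90))                            # 3.5
print(is_statistically_similar(test3, [test1, test2]))  # False: 3.5 > 2.5
```

A rejected run is still information: it usually means the test system changed or a different phenomenon is being measured.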

Some statistical concepts
Outliers are usually defined as:
- A measurement outside the overall pattern of a distribution (Moore and McCabe 1999).
- More precisely, a point that lies more than 1.5 times the interquartile range above the third quartile or below the first quartile.
Usually, the presence of an outlier indicates either an error in the measurement or an incomplete model.
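
The 1.5 × interquartile range definition translates directly into code. The sketch below uses `statistics.quantiles` from the Python standard library with its default quartile method; other quartile conventions move the fences slightly. The function name `find_outliers` and the sample values are made up for illustration.

```python
import statistics


def find_outliers(samples):
    """Return the points more than 1.5 * IQR above Q3 or below Q1."""
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in samples if x < low_fence or x > high_fence]


# Made-up timings in seconds; 9.5 is clearly outside the overall pattern.
samples = [2.1, 2.2, 2.0, 2.3, 2.2, 2.1, 2.4, 9.5]

print(find_outliers(samples))  # [9.5]
```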

Analyzing data
- While testing a non-deterministic system you will always get a distribution of values, all of them valid in principle.
- For example, if the average of a measure is 3 and you sample again and get 6, this ‘6’ is also correct and you cannot discard it (unless you determine that this point is an outlier).
- The good news is that you can extract information from this succession of different numbers.

Analyzing data
For example, we may have the following collection of raw data, in seconds, for a measure that we will generically describe as “query database”:
4.18; 2.1; 1.9; 2.23; 4.5; 4.2; 2.19; 2.21; 4.24; 2.23; 1.99; 2.01; 2.39; 4.19; 2.42; 2.08; 2.27; 3.98; 2.21; 2.45; 4.32 → average: 2.9
These results seem to be a mix of two series:
2.1; 1.9; 2.23; 2.19; 2.21; 2.23; 1.99; 2.01; 2.39; 2.42; 2.08; 2.27; 2.21; 2.45 → average: 2.2
and
4.18; 4.24; 4.19; 3.98; 4.32; 4.5; 4.2 → average: 4.2

Analyzing data
What is the previous slide telling us? Averaging all the results tells us nothing. The results point to a hidden effect: the system executes the query in different ways. One possible cause could be that one query joins more tables and thus takes more time to return the results.
So, if you want to answer the question “What is the time to execute this query?”, you need to be more nuanced, or you need to know the frequency of each kind of query so that you can compute a weighted average.
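
The split into two series and the weighted average can be reproduced with a few lines of Python. The raw values are the ones listed in the “query database” example; the 3.0 second threshold is an assumption picked by eye from the gap between 2.45 and 3.98, and the 90%/10% mix in the last line is a purely hypothetical production frequency.

```python
import statistics

# Raw "query database" timings in seconds, as listed in the example.
raw = [4.18, 2.1, 1.9, 2.23, 4.5, 4.2, 2.19, 2.21, 4.24, 2.23, 1.99,
       2.01, 2.39, 4.19, 2.42, 2.08, 2.27, 3.98, 2.21, 2.45, 4.32]

THRESHOLD = 3.0  # assumption: chosen by eye, the data has a gap around here

fast = [t for t in raw if t < THRESHOLD]
slow = [t for t in raw if t >= THRESHOLD]
fast_mean = statistics.fmean(fast)
slow_mean = statistics.fmean(slow)


def weighted_average(fast_share):
    """Expected query time when `fast_share` of the queries follow the fast path."""
    return fast_share * fast_mean + (1 - fast_share) * slow_mean


print(round(statistics.fmean(raw), 1))         # 2.9, the misleading overall average
print(round(fast_mean, 1), round(slow_mean, 1))  # 2.2 4.2
print(round(weighted_average(0.9), 2))         # hypothetical 90% fast / 10% slow mix
```

With the observed mix (14 fast out of 21) the weighted average is just the overall average again; the number only becomes useful once the shares reflect how often each kind of query really runs.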

Adjusting your data to a model
The most common model is the usual Gaussian or normal distribution, f(x) = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²)), where σ is the standard deviation and μ is the average.
The importance of this distribution lies in the Central Limit Theorem, which indicates that the sum (or average) of a large number of independent random variables tends towards a normal distribution.
Example: if we assume that the latency experienced by users in a wireless network depends only on the distance to the hub, μ can be interpreted as the average distance of the users to the hub and σ will indicate how spread out the users are around the hub.
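
A small simulation can make the Central Limit Theorem tangible. In this sketch each "measurement" is the average of 30 draws from a uniform distribution, which is not bell-shaped at all, yet the averages cluster around 0.5 with roughly 68% of them within one standard deviation, as a normal distribution would predict. The function name `sample_means`, the seed and the sample sizes are arbitrary choices for illustration.

```python
import random
import statistics


def sample_means(draws_per_mean, number_of_means, rng):
    """Average `draws_per_mean` uniform draws, repeated `number_of_means` times."""
    return [
        statistics.fmean([rng.random() for _ in range(draws_per_mean)])
        for _ in range(number_of_means)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
means = sample_means(draws_per_mean=30, number_of_means=2000, rng=rng)

mu = statistics.fmean(means)     # expected to be close to 0.5
sigma = statistics.stdev(means)  # expected to be close to sqrt(1/12/30), about 0.053
within_one_sigma = sum(abs(m - mu) <= sigma for m in means) / len(means)

print(f"mu={mu:.3f} sigma={sigma:.3f} within one sigma={within_one_sigma:.2f}")
```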

Adjusting your data to a model
Another example of analysis: the Chi distribution. To a first approximation it resembles the Gaussian distribution; however, it applies when a phenomenon depends on K independent parameters, each of which individually would follow a Gaussian distribution.
Example: the observed latency in a city-wide ADSL network may depend on the network utilization and on the latency induced by the distance to the nearest hub. If we want to improve the performance of the system, then we need to tackle both problems.
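
Formally, the Chi distribution with K degrees of freedom is the distribution of the square root of the sum of squares of K independent standard normal variables. The sketch below simulates it for K = 2 and compares the empirical mean with the known theoretical mean. Treating the two contributions from the ADSL example as standard normal variables is a simplifying assumption made only for this illustration; the function names, the seed and the sample size are arbitrary.

```python
import math
import random
import statistics


def chi_sample(k, rng):
    """One draw: Euclidean norm of k independent standard normal variables."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)))


def chi_mean(k):
    """Theoretical mean of the Chi distribution with k degrees of freedom."""
    return math.sqrt(2) * math.gamma((k + 1) / 2) / math.gamma(k / 2)


rng = random.Random(7)  # fixed seed so the run is repeatable
k = 2                   # two independent contributions, as in the ADSL example
samples = [chi_sample(k, rng) for _ in range(20000)]
empirical = statistics.fmean(samples)

print(f"empirical mean={empirical:.3f} theoretical mean={chi_mean(k):.3f}")
```

Unlike the Gaussian, this distribution has no negative values and is skewed to the right, which is one way to tell the two apart in measured data.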

Adjusting your data to a model
This would be an example of two uniform distributions

Adjusting your data to a model
- If your model cannot explain the results well, you need to change or improve the model
- A useful model should have predictive capabilities, so you can design new tests to prove or disprove the model
- Negative results (model disproved) can be as useful as positive results
- The analysis of the performance data can help to prevent future bottlenecks and problems
- The analyzed results will have a range of validity. Do not force too many consequences from them

Summary
- Performance testers provide information beyond requirement compliance
- Performance testing should be treated like an experimental activity
- As an experimental activity, the scientific method is the most appropriate method of enquiry
- In tune with the scientific method, you need to make assumptions, design your experiment accordingly and reduce the error bars
- Data should be subject to a statistical analysis
- After the analysis, you should try to explain your data with a model
- If the model does not do a good job of explaining your data, you should change or refine the model
- Your analysis should help to make the software better

Analyzing Performance test data
Questions?