Handbook for Health Care Research, Second Edition Chapter 10 © 2010 Jones and Bartlett Publishers, LLC CHAPTER 10 Basic Statistical Concepts
Preliminary Concepts
The following terms are used frequently in this chapter:
- Population: Collection of data or objects that describes some phenomenon of interest.
- Sample: Subset of a population that is accessible for measurement.
- Variable: Characteristic or entity that can take on different values.
- Qualitative variable: Categorical variable not placed on a meaningful number scale.
- Quantitative variable: One that is measurable using a meaningful scale of numbers.
- Discrete variable: Quantitative variable with gaps or interruptions in the values it may assume.
- Continuous variable: Quantitative variable that can take on any value, including fractional ones, with precision limited only by instrumentation or application.
Levels of Measurement
Numbers (0, 1, 2, …) have the following properties:
- Distinguishability: 0, 1, 2, and so on, are different numbers.
- Ranking (greater than or less than): 1 is less than 2.
- Equal intervals: Between 1 and 2, we assume the same distance as between 3 and 4.
The levels of measurement differ in which of these properties they use:
- Nominal: Named categories without any particular order.
- Ordinal: Discrete categories that have an order, with no indication of equal intervals.
- Continuous (interval): Can assume any value, rather than just whole numbers, and assumes equal, uniform intervals.
- Continuous (ratio): The mathematically strongest level, where numbers represent equal intervals and the scale starts at a true zero.
Significant Figures
The number of digits used to express a measured number is a rough indication of the error in the measurement.
Zeros as significant figures:
- Final zeros to the right of the decimal point that are used to indicate accuracy are significant (2.50 has three significant figures).
- For numbers less than one, zeros between the decimal point and the first nonzero digit are not significant (0.0025 has two significant figures).
Calculations using significant figures: The least precise measurement used in a calculation determines the number of significant figures in the answer.
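As a rough illustration of the calculation rule, the sketch below multiplies two measurements and reports the product to the significant figures of the less precise one. The helper name round_sig and the sample values are assumptions made for this example, not part of the chapter.

```python
import math

def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Power of ten of the leading digit: 0 for 1 to 9.99, -3 for 0.001 to 0.00999.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)

# 4.56 has three significant figures and 1.4 has two,
# so the product is reported to two significant figures.
raw_product = 4.56 * 1.4
reported = round_sig(raw_product, 2)
print(raw_product, "->", reported)
```

The rule is applied only to the final answer here; intermediate values are usually carried at full precision and rounded once at the end.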
Rounding
Rounding is done so that the result does not imply accuracy that was not present in the measurements.
Universal rounding rules:
- If the final digit of a number is 0, 1, 2, 3, or 4, the number is rounded down (the digit is dropped, and the preceding figure is retained unaltered).
- If the final digit is 5, 6, 7, 8, or 9, the number is rounded up (the digit is dropped, and the preceding figure is increased by one).
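The rule above is "round half up". As a minimal sketch, it can be reproduced in Python with the decimal module; the helper name round_half_up is an assumption for this example. Note that Python's built-in round() follows a different convention (ties go to the nearest even digit), so it does not match the slide's rule for a final digit of 5.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimal places, with 5 rounding up.

    Hypothetical helper; values are passed as strings so that the digits
    are exactly the ones written down.
    """
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("2.344", 2))  # final digit 4: rounded down
print(round_half_up("2.345", 2))  # final digit 5: rounded up
print(round(2.5))                 # built-in round: tie goes to the even digit
```

For negative values ROUND_HALF_UP rounds ties away from zero, which the slide does not address.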
Descriptive Statistics
Methods for organizing data and reducing a large set of numbers to a few informative numbers.
Data representation: The data set should be organized for inspection through use of a frequency distribution.
- Histogram: A bar graph in which the height of the bar indicates the frequency of occurrence of a value or class of values.
- Frequency polygon: A graph in which a point indicates the frequency of a value, and the points are connected to form a broken line (hence a polygon).
- Percentage: In this form of the polygon, the numerical frequency on the Y-axis is replaced with the percentage of occurrence.
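A frequency distribution can be tabulated in a few lines. The sketch below counts how often each value occurs, converts the counts to percentages, and prints a crude text histogram; the data values and the function name frequency_table are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, percentage) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100.0 * counts[v] / n) for v in sorted(counts)]

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical measurements
table = frequency_table(data)
for value, freq, pct in table:
    # One '#' per observation gives a sideways histogram.
    print(f"{value:>3} {'#' * freq:<4} {freq} ({pct:.0f}%)")
```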
Descriptive Statistics
- Percentile: The value of a variable in a data set below which a certain percent of observations fall.
- Cumulative percentage curve: A graph that plots the cumulative percentage on the Y-axis against the values of the variable on the X-axis. The curve describes the rate of accumulation for the values of the variable.
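The points of a cumulative percentage curve can be computed directly from the data. This is a minimal sketch with made-up values; it accumulates the percentage of observations at or below each value, which is one of several percentile conventions and is assumed here for simplicity.

```python
from collections import Counter

def cumulative_percentages(values):
    """Return (value, cumulative percent at or below value) pairs."""
    counts = Counter(values)
    total = len(values)
    running = 0
    curve = []
    for v in sorted(counts):
        running += counts[v]
        curve.append((v, 100.0 * running / total))
    return curve

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical data
curve = cumulative_percentages(scores)
for value, cum_pct in curve:
    print(value, cum_pct)
```

Plotting the second column against the first gives the curve; the last point is always 100 percent.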
Measures of the Typical Value of a Set of Numbers
Summation operator: Denoted by the Greek capital letter sigma (∑); it simply indicates addition over the values of a variable.
Three statistics are used to represent the typical value (also called the central tendency):
- Mean: The sum of all observations divided by the number of observations.
- Median: The 50th percentile of a distribution, or the point that divides the distribution into equal halves.
- Mode: The most frequently occurring observation in the distribution.
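The three measures can be checked on a small data set with Python's standard statistics module. The data values are made up for illustration; the mean is also computed by hand to show the summation explicitly.

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical observations

# Mean written out as the summation divided by the number of observations.
mean_by_hand = sum(data) / len(data)

mean = statistics.mean(data)
median = statistics.median(data)  # middle of the sorted values
mode = statistics.mode(data)      # most frequent value

print(mean_by_hand, mean, median, mode)
```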
Measures of Dispersion
Measures of dispersion indicate the variability of the data, or how spread out the data are.
- Range: The distance between the smallest and the largest values of the variable.
- Variance: A measure of how different the values in a set of numbers are from each other. It is calculated as the average squared deviation of the values from the mean.
- Standard deviation: The square root of the variance; loosely, the typical deviation of the values from the mean.
- Coefficient of variation: Expresses the standard deviation as a percentage of the mean.
- Standard score (or z score): The deviation from the mean expressed in units of standard deviation.
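The sketch below computes each measure for a small, made-up data set using the standard statistics module. It shows both common variance conventions: dividing by n (the "average squared deviation" wording above) and dividing by n - 1, which is the usual choice when the data are a sample. Which one the chapter intends is not stated on the slide.

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical observations

data_range = max(data) - min(data)
mean = statistics.mean(data)
pop_variance = statistics.pvariance(data)    # sum of squared deviations / n
sample_variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(data)                  # square root of the sample variance
cv_percent = 100 * sd / mean                 # standard deviation as a percent of the mean

def z_score(x, mean, sd):
    """Deviation of x from the mean in units of standard deviation."""
    return (x - mean) / sd

print(data_range, pop_variance, sample_variance, sd, cv_percent)
print(z_score(7, mean, sd))
```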
Propagation of Errors in Calculations; Correlation and Regression
- Propagation of errors in calculations: Arises when the physical quantity of interest is not measured directly but is instead a function of one or more measurements made in an experiment.
- Correlation: A descriptive measure of the relationship or association between two variables.
- Regression: Describes the linear relationship between two variables, using the value of one variable to predict the value of the other.
- When we measure X and predict Y, Y is said to be regressed on X.
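As a sketch of correlation and regression, the function below computes the Pearson correlation coefficient and the least-squares line for Y regressed on X. Pearson's r and ordinary least squares are assumed here as the standard choices; the slide does not name a specific method, and the data are made up.

```python
import math

def pearson_r_and_line(x, y):
    """Return (r, slope, intercept) for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)     # strength of the linear association
    slope = sxy / sxx                  # change in y per unit change in x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

x = [1, 2, 3, 4, 5]                # hypothetical measured values
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical responses
r, slope, intercept = pearson_r_and_line(x, y)

predicted_y = intercept + slope * 6  # predict y for a new x of 6
print(r, slope, intercept, predicted_y)
```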
Inferential Statistics
Although measuring a sample rather than the whole population is economical, we still wish to use the sample measurements (statistics) to make inferences about the population measures (parameters).
Concept of probability: The probability of an event can be defined as the relative frequency, or proportion, of occurrence of that event out of some total number of events.
- Probabilities take values between 0 and 1.
Normal distribution and standard scores: For a normally distributed variable, the mean is at the center of the distribution, and therefore the mean is also the median and the mode.
- The z score for the mean must always be zero.
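Standard scores connect a measurement to a probability under the normal curve. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the mean of 100 and standard deviation of 15 are made-up values.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Deviation of x from the mean in units of standard deviation."""
    return (x - mean) / sd

standard_normal = NormalDist()  # mean 0, standard deviation 1

z = z_score(130, 100, 15)
p_below = standard_normal.cdf(z)  # proportion of values below x
p_within_1sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(z, p_below, p_within_1sd)
print(z_score(100, 100, 15))  # the mean itself has a z score of zero
```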
[Figure: Normal Curve]
Inferential Statistics
- Sampling distribution: The probability distribution of a statistic, and the most important concept in inferential statistics.
- Confidence interval: The range of values that is believed to contain the true parameter value.
- Error intervals: Describe the combined effects of systematic and random errors on individual measurements.
- We can also say something about how much confidence should be placed in the estimate.
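A confidence interval for a mean can be sketched as the sample mean plus or minus a multiple of its standard error. The version below uses the normal approximation, which is an assumption of this example; for small samples a t distribution is normally used instead, and the standard library does not provide one. The sample values are made up.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

sample = [98, 102, 101, 97, 100, 103, 99, 100, 102, 98]  # hypothetical data
low, high = mean_confidence_interval(sample)
print(low, high)
```

Raising the confidence level widens the interval, and increasing the sample size narrows it.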
Inferential Statistics
Data Analysis for Device Evaluation Studies
- Step 1: Create a scatter plot of the raw data to get a subjective impression of their validity.
- Step 2: Make sure the data comply with the assumption of normality.
- Step 3: Once the data are judged to conform to the underlying assumptions, use the mean and standard deviation to calculate error intervals.
- Step 4: Present the data in graphic form, labeled with the numerical values for the error intervals.
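Step 3 can be sketched numerically. The slide does not give the formula for the error intervals, so this example assumes one common convention: the mean difference between device and reference plus or minus 1.96 standard deviations of the differences. The multiplier, the variable names, and the readings are all assumptions for illustration; the plotting in steps 1 and 4 and the normality check in step 2 are not shown.

```python
import statistics

# Hypothetical paired readings: reference standard vs. device under test.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
device = [10.4, 19.8, 30.9, 40.3, 50.6]

differences = [d - r for d, r in zip(device, reference)]

bias = statistics.mean(differences)          # systematic error
imprecision = statistics.stdev(differences)  # random error

k = 1.96  # assumed multiplier, not taken from the chapter
lower = bias - k * imprecision
upper = bias + k * imprecision

print(f"bias = {bias:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```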
Inferential Statistics
Interpreting manufacturers' error specifications: When evaluating a new device, a major concern is how much error can be expected in normal use.
- Knowing that any specification of error is just an estimate, we want to know how much confidence to place in it.
- Manufacturers can be rather obscure about their error specifications.
Hypothesis testing: A technique for quantifying our guess about a hypothesis. We never know the "real" situation.
- Does drug X cause Y or not? We can figure the odds and quantify our probability of being right or wrong.
- Chance difference
Inferential Statistics
Type I and II errors:
- Type I: The error of rejecting the null hypothesis when it is true.
- Type II: The error of accepting a false null hypothesis.
Power analysis and sample size: Power is the probability of correctly rejecting a false null hypothesis.
- The most practical means to control power is to manipulate sample size.
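The link between sample size and power can be sketched with a normal approximation for comparing two group means. The formula below is an assumption of this example (a two-sided test, equal group sizes, known standard deviation, and the negligible opposite tail ignored); the effect size is the difference in means divided by the standard deviation.

```python
import math
from statistics import NormalDist

def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)             # about 1.96 for alpha = 0.05
    shift = effect_size * math.sqrt(n_per_group / 2)  # expected size of the test statistic
    return norm.cdf(shift - z_crit)

for n in (16, 32, 64, 128):
    print(n, round(approximate_power(0.5, n), 3))
```

With the significance level and effect size held fixed, power rises as the number of subjects per group increases.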
Inferential Statistics
Rules of Thumb for Estimating Sample Size
- Estimates Based on Mean and Standard Deviation
- Estimates Based on Proportionate Change and Coefficient of Variation
- Estimates for Confidence Intervals
- Sample Size for Binomial Test
- Unequal Sample Sizes
- Rule of Threes
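The slide lists the Rule of Threes without stating it. In its usual form, assumed here, the rule says that if no events are observed in n independent trials, the upper 95 percent confidence bound on the event rate is approximately 3/n. The sketch compares that shortcut with the exact bound obtained by solving (1 - p)^n = 0.05 for p.

```python
def rule_of_threes_upper_bound(n):
    """Approximate upper 95% bound on the event rate after 0 events in n trials."""
    return 3 / n

def exact_upper_bound(n, confidence=0.95):
    """Exact bound: the rate p at which observing 0 events has probability 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

for n in (30, 100, 1000):
    print(n, rule_of_threes_upper_bound(n), exact_upper_bound(n))
```

The shortcut is slightly conservative and the two values converge as n grows.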
Inferential Statistics
Clinical Importance Versus Statistical Significance
- The size of the test statistic for a given difference is determined by the standard error, which in turn is determined by the sample size.
- If the difference between two mean values (treatment group vs. control group) is statistically significant but so small that it does not have any practical effect, then we must conclude that the results are not clinically important.
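The dependence of the test statistic on sample size can be shown with a short calculation. The sketch assumes a two-sample z statistic with equal group sizes and a common standard deviation; the difference of 1 unit and standard deviation of 10 are made-up values.

```python
import math

def two_sample_z(mean_difference, sd, n_per_group):
    """z statistic for a difference in means with equal group sizes."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_difference / standard_error

# The same 1-unit difference, tested with increasingly large groups.
for n in (50, 500, 5000):
    print(n, round(two_sample_z(1.0, 10.0, n), 2))
```

The difference never changes, yet it crosses the usual 1.96 threshold once the groups are large enough. Whether a 1-unit difference matters to patients is a separate, clinical judgment.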
Inferential Statistics
Matched Versus Unmatched Data
- Unmatched data (or unpaired or independent): Data values in one group are unrelated in any way to the data values in the other group.
- Matched data (or paired or dependent): Data are selected so that the two groups will be as nearly identical as possible.
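The distinction matters because it changes how the data are analyzed. As a sketch, the code below computes a t statistic for the same made-up before and after readings in two ways: treating them as matched pairs (analyzing the within-subject differences) and treating them as two unrelated groups. The use of t statistics, and of the unequal-variance form for the unmatched case, are assumptions of this example.

```python
import math
import statistics

before = [120, 135, 150, 142, 128]  # hypothetical readings, same five subjects
after = [115, 131, 144, 139, 122]

def paired_t(a, b):
    """t statistic computed on the within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def unpaired_t(a, b):
    """t statistic treating the two groups as independent (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

print(paired_t(before, after))
print(unpaired_t(before, after))
```

Pairing removes the large subject-to-subject variation from the comparison, so the matched analysis yields a much larger statistic for the same mean difference.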