Copyright © 2005 by Lippincott Williams and Wilkins. PowerPoint Presentation to Accompany Statistical Methods for Health Care Research by Barbara Hazard.

Presentation transcript:

Copyright © 2005 by Lippincott Williams and Wilkins. PowerPoint Presentation to Accompany Statistical Methods for Health Care Research by Barbara Hazard Munro. Chapter 2 UNIVARIATE DESCRIPTIVE STATISTICS

Objectives for Chapter 2  Define measures of central tendency & dispersion  Select appropriate measures to use for a particular dataset  Discuss methods to identify & manage outliers  Discuss methods to handle missing data

Basic Characteristics of a Distribution  Central Tendency  Variability  Skewness  Kurtosis

Measures of Central Tendency  When assessing the central tendency of your measurements, you are attempting to identify the “average” measurement.  Mean: best known & most widely used average, describing the center of a frequency distribution  Median: the middle value/point of a set of ordered numbers, below which 50% of the distribution falls  Mode: the most frequent value or category in a distribution

Comparison of Central Tendency Measures  In a perfect world, the mean, median & mode would be the same.  However, the world is not perfect & very often, the mean, median and mode are not the same

Central Tendency - Graphed  [Figure: distribution curve with the mean, median & mode marked]

Comparison of Central Tendency Measures  Use Mean when distribution is reasonably symmetrical, with few extreme scores and has one mode.  Use Median with nonsymmetrical distributions because it is not sensitive to skewness.  Use Mode when dealing with frequency distribution for nominal data
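
The three measures, and the reason the median is preferred for skewed data, can be illustrated with a short sketch. The scores below are hypothetical and were invented for this example; only Python's standard `statistics` module is assumed.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 3, 3, 4, 5, 5, 5, 7, 9]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Add one extreme score to see which measures move.
with_extreme = scores + [90]
mean_extreme = statistics.mean(with_extreme)
median_extreme = statistics.median(with_extreme)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"with extreme score: mean={mean_extreme:.2f} median={median_extreme}")
```

In this sample the single extreme score pulls the mean upward while the median stays where it was, which is the behavior the slide describes.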

Variability  A quantitative measure of the degree to which scores in a distribution are spread out or clustered together  Measures of variability include:  Standard Deviation: a measure of the dispersion of scores around the mean  Range: highest value minus the lowest value  Interquartile Range: range of values extending from the 25th percentile to the 75th percentile

Variability - Graphed  [Figure: distribution curve with the range and -1 SD & +1 SD marked]

Standard Deviation  Most widely reported measure of variability  Commonly used to calculate other statistical measures  Indicates dispersion, or spread, of scores in a distribution

Standard Deviation  The smaller the standard deviation, the more tightly clustered the scores  The larger the standard deviation, the more spread out the scores  Report SD when you report mean of a continuous variable’s distribution

Range  Simplest measure of variability  Difference between the maximum value in distribution and the minimum value.  Unstable because it is based only on two values  Sensitive to extreme scores  Usually reported as the minimum and maximum scores, not as the difference between them

Percentiles  Percentile is a score above which & below which a certain percentage of values fall.  Symbolized by the letter P  Ex: P40 = 55 means that 40% of values in the distribution fall below the score 55

Interpercentile Measures  Interquartile Range (IQR): range of values extending from P25 to P75  Like the median, IQR is not sensitive to extreme scores  Most common use of the IQR is for growth charts
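
As a minimal sketch of the three variability measures, the snippet below computes the SD, the range (reported as minimum and maximum) and the IQR for a hypothetical sample. It assumes Python 3.8 or later for `statistics.quantiles`. Statistics packages define percentiles in slightly different ways, so the quartiles here may differ a little from those of another program.

```python
import statistics

# Hypothetical scores, invented for illustration; 40 is an extreme score.
scores = [4, 7, 8, 10, 12, 13, 15, 18, 21, 40]

sd = statistics.stdev(scores)              # sample standard deviation
value_range = (min(scores), max(scores))   # reported as minimum and maximum

# Quartiles P25, P50, P75 using the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(f"SD={sd:.2f} range={value_range} P25={q1} P50={q2} P75={q3} IQR={iqr}")
```

Removing the extreme score changes the SD and the range considerably but moves the quartiles much less, which is why the IQR is paired with the median for skewed data.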

Comparison of Measures of Variability Standard Deviation  Most widely used measure of variability  Most reliable estimate of population variability  Best with symmetrical distributions with only one mode

Comparison of Measures of Variability Range  Main use is to call attention to the two extreme values of a distribution  Quick, rough estimate of variability  Greatly influenced by sample size: the larger the sample, the larger the range

Comparison of Measures of Variability Interpercentile Measures  Easy to understand  Can be used with distributions of any shape  Especially useful in very skewed distributions  Use IQR when reporting median of distribution

Shape of the Distribution  The shape of the distribution provides information about the central tendency and variability of measurements.  Three common shapes of distributions are:  Normal: bell-shaped curve; symmetrical  Skewed: non-normal; nonsymmetrical; can be positively or negatively skewed  Multimodal: has more than one peak (mode)

Normal Distribution

Positively Skewed Distribution

Negatively Skewed Distribution

Bimodal Distribution

Variable Distribution Symmetry  Normal Distribution is symmetrical & bell-shaped; often called “bell-shaped curve”  When a variable’s distribution is nonsymmetrical, it is skewed  This means that the mean is not in the center of the distribution

Skewness  Skewness is the measure of the shape of a nonsymmetrical distribution  Two sets of data can have the same mean & SD but different skewness  Two types of skewness: Positive skewness Negative skewness

Relative Locations for Measures of Central Tendency  Negatively Skewed: Mean < Median < Mode  Symmetric (Not Skewed): Mean = Median = Mode  Positively Skewed: Mode < Median < Mean

Positively Skewed Distribution

Positive Skewness  Has pileup of cases to the left & the right tail of distribution is too long

Negatively Skewed Distribution

Negative Skewness  Has pileup of cases to the right & the left tail of distribution is too long

Measures of Symmetry  Pearson’s Skewness Coefficient  Formula = (mean - median) / SD  Skewness values > 0.2 or < -0.2 indicate severe skewness
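
Pearson's coefficient, (mean - median) / SD, is simple enough to compute by hand or in a few lines. The sketch below uses the sample SD, which is an assumption because the slide does not say which SD is meant. Both datasets are hypothetical.

```python
import statistics

def pearson_skewness(values):
    """Pearson's skewness coefficient: (mean - median) / SD (sample SD assumed)."""
    return (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

symmetric = [1, 2, 3, 4, 5]                 # hypothetical, symmetric
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]    # hypothetical, long right tail

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    coef = pearson_skewness(data)
    severe = abs(coef) > 0.2                # cutoff quoted on the slide
    print(f"{name}: coefficient={coef:.3f} severe={severe}")
```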

Measures of Symmetry  Fisher’s Skewness Coefficient  Formula = Skewness coefficient / Standard error of skewness  Values above +1.96 or below -1.96 indicate severe skewness  NB: Calculating the skewness coefficient & its standard error is an option in most descriptive statistics modules in statistics programs
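
The slide leaves the skewness coefficient and its standard error to the statistics program. As a sketch of what such a program might compute, the code below uses the adjusted sample skewness and the standard error formula commonly reported by statistics packages. Both formulas are assumptions not taken from the slides, and the data are hypothetical.

```python
import math
import statistics

def fisher_skewness(values):
    """Return (skewness, standard error, skewness / standard error)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    # Adjusted sample skewness (assumed formula, requires n >= 3).
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    # Standard error of skewness; depends only on the sample size.
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]    # hypothetical data
skew, se, ratio = fisher_skewness(right_skewed)
print(f"skewness={skew:.3f} SE={se:.3f} ratio={ratio:.2f} severe={abs(ratio) > 1.96}")
```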

Data Transformation  With skewed data, the mean is not a good measure of central tendency because it is sensitive to extreme scores  May need to transform skewed data to make distribution appear more normal or symmetrical  Must determine the degree & type of skewness prior to transformation

Data Transformation  If positive skewness, can apply either square root (moderate skew) or log transformations (severe skew) directly  If negative skewness, must “reflect” variable to make the negative skewness a positive skewness, then apply transformations for positive skew

Data Transformation  Reflecting a variable changes the meaning of the scores.  Ex. If high scores on a self-esteem total score meant high self-esteem before reflection, they mean low self-esteem after reflection
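
The transformations can be sketched as follows. The reflection rule used here, (largest score + 1) minus each score, is one common convention and is an assumption, since the slides do not give a formula. The scores are hypothetical and strictly positive, which the log transformation requires.

```python
import math

def reflect(values):
    """Reflect scores so a negative skew becomes a positive skew: (max + 1) - x."""
    constant = max(values) + 1
    return [constant - x for x in values]

# Hypothetical positively skewed scores.
positive_skew = [1, 2, 2, 3, 4, 9, 25]
sqrt_scores = [math.sqrt(x) for x in positive_skew]    # moderate skew
log_scores = [math.log10(x) for x in positive_skew]    # severe skew

# Hypothetical negatively skewed scores: reflect first, then transform.
negative_skew = [1, 17, 22, 23, 24, 24, 25]
reflected = reflect(negative_skew)
reflected_sqrt = [math.sqrt(x) for x in reflected]

print(reflected)
```

After reflection the highest original score (25) becomes the lowest (1), so interpretations of "high" and "low" must be reversed.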

Data Transformation  As a rule, it is best to transform skewed variables, but keep in mind that transformed variables may be harder to interpret  Once transformed, always check that the transformed variable is normally or nearly normally distributed  If transformation does not work, may need to dichotomize variable for use in subsequent analyses

Kurtosis A measure of whether the curve of a distribution is:  Bell-shaped -- Mesokurtic  Peaked -- Leptokurtic  Flat -- Platykurtic

Fisher’s Measure of Kurtosis  Formula = Kurtosis coefficient / Standard error of kurtosis  Values above +1.96 or below -1.96 indicate severe kurtosis  NB: Calculating the kurtosis coefficient & its standard error is an option in most descriptive statistics modules in statistics programs
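
As with skewness, the slide delegates the calculation to software. The sketch below uses the adjusted sample excess kurtosis and the standard error formula commonly reported by statistics packages. Both formulas are assumptions not taken from the slides, and the data are hypothetical.

```python
import math
import statistics

def fisher_kurtosis(values):
    """Return (excess kurtosis, standard error, kurtosis / standard error)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    # Adjusted sample excess kurtosis (assumed formula, requires n >= 4).
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    # Standard error of kurtosis, built from the standard error of skewness.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return kurt, se_kurt, kurt / se_kurt

flat = list(range(1, 11))    # hypothetical evenly spread (flat) scores
kurt, se_kurt, ratio = fisher_kurtosis(flat)
print(f"kurtosis={kurt:.3f} SE={se_kurt:.3f} ratio={ratio:.2f}")
```

A negative coefficient points toward a flat (platykurtic) shape and a positive one toward a peaked (leptokurtic) shape.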

Types of Charts/Graphs  Line Chart: frequently used to display longitudinal trends  Box Plot: graphic display using descriptive statistics based on percentiles  Simultaneously shows median, IQR, & smallest & largest values for a group  Sometimes called “box-and-whiskers” plot

LINE CHART Medication Error Tracking

BOX PLOT Examples

Outliers  Outlier: value that is extreme relative to bulk of scores in the distribution  May be due to:  data recording error  failure in data collection  actual extreme value from an unusual respondent

Handling Outliers  Try analyzing data with outliers included in distribution & with outliers removed; if results are similar, outliers are not a problem  Could use a trimmed mean (remove a certain percentage of respondents from the data, then calculate a new mean)  Ex. 5% trimmed mean is calculated on middle 90% of respondents’ scores (top 5% & bottom 5% of scores dropped prior to calculation)
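
A trimmed mean can be sketched in a few lines. The function name `trimmed_mean` and the data are hypothetical. The number of cases dropped from each tail is rounded down, which is an assumption because packages differ in how they handle fractional counts.

```python
import statistics

def trimmed_mean(values, percent=5):
    """Mean after dropping `percent` percent of cases from each tail."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100       # cases dropped per tail, rounded down
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# Hypothetical scores: 1 through 19 plus one extreme value.
scores = list(range(1, 20)) + [200]

print(statistics.mean(scores))    # ordinary mean, pulled up by the outlier
print(trimmed_mean(scores, 5))    # 5% trimmed mean on the middle 90%
```

With 20 cases, a 5% trim drops one score from each end, so the extreme value no longer enters the calculation.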

Handling Outliers  Move the outlier scores closer to the bulk of scores in distribution via recoding them  This makes outliers less deviant while they keep the same position (rank order) in the distribution.  Sometimes this method can reduce a serious skewness problem
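
One way to recode outliers is to pull any score beyond a chosen limit back to that limit. The limits below are hypothetical values picked by the analyst for this example, not a rule from the slides.

```python
def recode_outliers(values, lower, upper):
    """Recode scores below `lower` to `lower` and above `upper` to `upper`."""
    return [min(max(x, lower), upper) for x in values]

# Hypothetical scores with one extreme high value.
scores = [12, 14, 15, 15, 16, 18, 19, 60]

# Hypothetical limits, just beyond the bulk of the scores.
recoded = recode_outliers(scores, lower=11, upper=20)
print(recoded)
```

The recoded case is still the highest score in the distribution, but it now sits next to the bulk of the data.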

Missing Data  Especially problematic in longitudinal & repeated measures studies  Data analyst must:  Identify pattern & amount of missing data  Assess why it is missing  Determine what to do about it

Pattern & Amount of Missing Data  Pattern is more important than amount of missing data  Two basic patterns:  Random pattern -- values missing in an unplanned or haphazard fashion throughout dataset  Systematic pattern -- values missing in a methodical, nonrandom way throughout data

Pattern & Amount of Missing Data  If only a few data values are missing in a random pattern from large dataset, no problem  If many data missing from small or moderate sized sample, serious problems can ensue

Random Missing Data Categories  Missing Completely at Random (MCAR)  Missing at Random (MAR)  Not Missing at Random (NMAR)

Missing Completely at Random (MCAR)  Have highest degree of randomness, showing no underlying reason that would contribute to biased data  MCAR data are randomly distributed across all cases & completely unrelated to other variables in dataset

Missing at Random (MAR)  Display some randomness to pattern of missing data that can be traced or predicted from cases with no missing data  Occurs when probability of a missing value is not dependent on the value itself but may rely on values of other variables in dataset

Not Missing at Random (NMAR)  Occurs when missing values are systematically different from those observed, even from respondents with other similar characteristics  Systematic missing data, even in a few cases, should always be treated seriously because they affect generalizability of results

Testing for Patterns of Missing Data  Create grouping variable with two levels: 1 = cases with missing values on variable; 0 = cases with no missing values on variable  Perform test of difference (t-test, Chi square) using this grouping variable on the dependent variable(s)  If serious differences noted, systematic missing (NMAR) data are present and must be handled
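
The grouping-variable approach can be sketched as below. The variable names (`income`, a depression score as the dependent variable) and the records are hypothetical. The sketch computes only a Welch t statistic with the standard library. A statistics package would also supply the p-value.

```python
import math
import statistics

# Hypothetical records: (income, depression score); None marks a missing income.
records = [(None, 22), (30, 10), (None, 25), (45, 12),
           (50, 9), (None, 21), (38, 11), (41, 13)]

# Grouping variable: 1 = income missing, 0 = income present.
missing_flag = [1 if income is None else 0 for income, _ in records]

group_missing = [dv for (_, dv), flag in zip(records, missing_flag) if flag == 1]
group_complete = [dv for (_, dv), flag in zip(records, missing_flag) if flag == 0]

def welch_t(a, b):
    """t statistic for a difference in means without assuming equal variances."""
    standard_error = math.sqrt(statistics.variance(a) / len(a)
                               + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

t_stat = welch_t(group_missing, group_complete)
print(f"t = {t_stat:.2f}")
```

A large t statistic, as in this invented example, suggests that cases with missing income differ systematically on the dependent variable.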

Assessing Why Data Are Missing  Missing Data Process (MDP): any systematic event external to respondent (data entry error or data collection problem) or action on respondent’s part (refusal to answer) that leads to missing data  If MDP is under researcher’s control & can be explicitly defined, then missing data can be ignored & no specific remedies needed

Assessing Why Data Are Missing  Often, researcher has no idea why data are missing  Thus, need to examine pattern of missing data  Major Question: Are respondents with missing data on some variables different from respondents with no missing data on these variables?

Handling Missing Data  Complete-Case Deletion (Listwise deletion)  Available-Case Deletion (Pairwise deletion)  Deleting Cases or Variables  Weighting Techniques  Estimating Missing Data through Imputation

Listwise Deletion  Analyzes only those cases with complete data  Easiest method for handling missing data  Often default option in most statistics programs  Use if amount of missing data is small, sample is sufficiently large & relationships in data are strong enough to survive deleting cases

Pairwise Deletion  Use only those cases with no missing data on the variables for a specific analysis  Commonly an option in most statistics programs  Often used for correlations, linear regression & factor analysis
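
The difference between the two deletion methods is easiest to see in the number of cases each one keeps. The dataset and variable names below are hypothetical.

```python
# Hypothetical dataset; None marks a missing value.
rows = [
    {"age": 34, "bmi": 22.1, "sbp": 118},
    {"age": 51, "bmi": None, "sbp": 135},
    {"age": None, "bmi": 27.4, "sbp": 128},
    {"age": 45, "bmi": 30.2, "sbp": None},
    {"age": 29, "bmi": 24.8, "sbp": 121},
]

def listwise(data):
    """Keep only cases with complete data on every variable."""
    return [r for r in data if all(v is not None for v in r.values())]

def pairwise(data, var_a, var_b):
    """Keep cases with data on the two variables in this particular analysis."""
    return [(r[var_a], r[var_b]) for r in data
            if r[var_a] is not None and r[var_b] is not None]

print(len(listwise(rows)))                 # cases available to every analysis
print(len(pairwise(rows, "age", "sbp")))   # cases for the age-sbp correlation
print(len(pairwise(rows, "bmi", "sbp")))   # cases for the bmi-sbp correlation
```

Pairwise deletion keeps more cases per analysis, but each correlation is then based on a different subset of respondents.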

Deleting Cases or Variables  Have a preset amount of missing data that can be tolerated (5% - 10%)  Remove all cases or variables that exceed that amount  Good solution if sample size is large enough

Weighting Techniques  Disregard missing values & assign a weight to cases with complete data  Weight cases with no missing data higher than those with missing data  Decreases bias from case deletion methods as well as sample variance  Less common procedure than other missing data handling methods

Missing Data Estimation Via Imputation  Process of estimating missing data based on valid values of other variables or cases in sample  Goal is to use known relationship that can be identified in the valid values of the sample to help estimate the missing data

Missing Data Estimation Methods  Prior Knowledge: replace missing value with value based on educated guess  Mean/Median Replacement: Replace missing value with variable mean or median  Regression: Use other variables in dataset as independent variables to develop regression equation for variable with missing data (dependent variable)
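
Mean replacement and regression imputation can be compared on a small hypothetical dataset. The sketch fits a simple one-predictor least-squares line by hand. A real analysis would normally use a statistics package and more than one predictor.

```python
import statistics

# Hypothetical data: systolic blood pressure is missing for the oldest case.
ages = [30, 40, 50, 60, 70]
sbp = [110, 120, 130, 140, None]

# Mean replacement: fill the gap with the mean of the observed values.
observed = [y for y in sbp if y is not None]
mean_filled = [statistics.mean(observed) if y is None else y for y in sbp]

# Regression: predict sbp from age using only the complete cases.
pairs = [(x, y) for x, y in zip(ages, sbp) if y is not None]
x_bar = statistics.mean(x for x, _ in pairs)
y_bar = statistics.mean(y for _, y in pairs)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
         / sum((x - x_bar) ** 2 for x, _ in pairs))
intercept = y_bar - slope * x_bar
regression_filled = [intercept + slope * x if y is None else y
                     for x, y in zip(ages, sbp)]

print(mean_filled[-1], regression_filled[-1])
```

Mean replacement ignores the relationship with age, whereas the regression estimate follows the trend in the complete cases.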

Missing Data Estimation Methods  Expectation Maximization (EM): iterative process that can be used with randomly missing data; SPSS Missing Values Analysis performs EM to produce imputed values  Multiple Imputation (MI): iterative process that produces several datasets (3 - 5) with imputed values for missing data, then averages the resulting estimates & standard errors