Descriptive Statistics


1 Descriptive Statistics
Chapter 4: Descriptive Statistics
Numerical Description
Central Tendency
Dispersion
Standardized Data
Percentiles, Quartiles, and Box Plots
Correlation
McGraw-Hill/Irwin. Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved.

2 Numerical Description
Three key characteristics of numerical data:
Central Tendency: Where are the data values concentrated? What seem to be typical or middle data values?
Dispersion: How much variation is there in the data? How spread out are the data values? Are there unusual values?
Shape: Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?

3 Central Tendency
Six Measures of Central Tendency (statistic, formula, Excel formula, pro, con):
Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; Excel: =AVERAGE(Data). Pro: familiar and uses all the sample information. Con: influenced by extreme values.
Median: middle value in the sorted array; Excel: =MEDIAN(Data). Pro: robust when extreme data values exist. Con: ignores extremes and can be affected by gaps in data values.
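As a minimal sketch of how the mean and median differ when an extreme value is present, the Python below computes both from scratch. The function names and the sample values are hypothetical, chosen only for illustration.

```python
def mean(data):
    """Arithmetic mean: sum of the values divided by n (what =AVERAGE computes)."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

# Hypothetical sample with one extreme value (95)
sample = [10, 12, 13, 15, 95]
print(mean(sample))    # 29.0, pulled upward by the extreme value
print(median(sample))  # 13, unaffected by how large the extreme value is
```

The single value of 95 drags the mean well above most of the data, while the median stays at the middle of the sorted array.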

4 Central Tendency
Six Measures of Central Tendency (continued):
Mode: most frequently occurring data value; Excel: =MODE(Data). Pro: useful for attribute data or discrete data with a small range. Con: may not be unique, and is not helpful for continuous data.
Midrange: $\text{Midrange} = \frac{x_{\min} + x_{\max}}{2}$; Excel: =0.5*(MIN(Data)+MAX(Data)). Pro: easy to understand and calculate. Con: influenced by extreme values and ignores most data values.
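The sketch below illustrates why the mode "may not be unique": it returns every value tied for the highest frequency, whereas a single-valued spreadsheet function reports only one. The helper names and the shoe-size data are hypothetical.

```python
from collections import Counter

def modes(data):
    """Return every value tied for the highest frequency (the mode may not be unique)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def midrange(data):
    """Average of the smallest and largest values, like =0.5*(MIN(Data)+MAX(Data))."""
    return 0.5 * (min(data) + max(data))

sizes = [7, 8, 8, 9, 9, 10, 12]  # hypothetical shoe sizes (discrete, small range)
print(modes(sizes))     # [8, 9]: two modes
print(midrange(sizes))  # 9.5: depends only on the two extreme values
```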

5 Central Tendency
Six Measures of Central Tendency (continued):
Geometric mean (G): $G = \sqrt[n]{x_1 x_2 \cdots x_n}$; Excel: =GEOMEAN(Data). Pro: useful for growth rates and mitigates high extremes. Con: less familiar and requires positive data.
Trimmed mean: same as the mean except that the highest and lowest k% of data values are omitted (e.g., 5%); Excel: =TRIMMEAN(Data, Percent), where Percent is the total fraction excluded from both ends combined (so trimming 5% from each end is =TRIMMEAN(Data, 0.1)). Pro: mitigates effects of extreme values. Con: excludes some data values that could be relevant.
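A minimal sketch of both measures follows. The geometric mean is computed through logarithms, which avoids overflow on long products, and is cross-checked against Python's `statistics.geometric_mean`. The trimmed-mean helper is hypothetical: it is written to mimic Excel's TRIMMEAN convention as documented by Microsoft, where `percent` is the total fraction dropped and the count removed is rounded down to an even number split between the two tails. The data sets are invented for illustration.

```python
import math
import statistics

def geometric_mean(data):
    """n-th root of the product of n positive values, computed via logs."""
    if any(x <= 0 for x in data):
        raise ValueError("geometric mean requires positive data")
    return math.exp(sum(math.log(x) for x in data) / len(data))

def trimmed_mean_excel_style(data, percent):
    """Hypothetical helper: `percent` is the TOTAL fraction excluded,
    split evenly between the low and high tails (rounded down per tail)."""
    s = sorted(data)
    k = math.floor(len(s) * percent / 2)  # values dropped from EACH end
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

growth = [1.10, 1.05, 0.95, 1.20]  # hypothetical yearly growth factors
print(geometric_mean(growth))
print(statistics.geometric_mean(growth))  # standard-library cross-check

scores = [2, 50, 52, 53, 55, 58, 60, 61, 63, 99]  # hypothetical
print(trimmed_mean_excel_style(scores, 0.2))  # drops 2 and 99 -> 56.5
```

For growth rates, the geometric mean of the growth factors gives the constant per-period factor that would produce the same overall growth.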

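6 Central Tendency: Skewness
Compare the mean and median, or look at a histogram, to judge the direction and degree of skewness. As a rule of thumb, mean > median suggests right (positive) skew, mean < median suggests left (negative) skew, and mean ≈ median suggests a roughly symmetric distribution.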
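The mean-versus-median comparison can be sketched as a small function. This is only the rule of thumb from the slide, not a formal skewness coefficient; the `tolerance` parameter is a hypothetical threshold for deciding when the gap is small enough to call the data roughly symmetric.

```python
import statistics

def skew_direction(data, tolerance=0.0):
    """Rule-of-thumb skew check comparing the mean with the median.
    `tolerance` is a hypothetical cutoff below which the gap is ignored."""
    gap = statistics.mean(data) - statistics.median(data)
    if gap > tolerance:
        return "right-skewed (mean > median)"
    if gap < -tolerance:
        return "left-skewed (mean < median)"
    return "roughly symmetric"

print(skew_direction([1, 2, 2, 3, 3, 3, 4, 20]))  # right-skewed (mean > median)
```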

7 Dispersion
Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion (statistic, formula, Excel formula, pro, con):
Range: $R = x_{\max} - x_{\min}$; Excel: =MAX(Data)-MIN(Data). Pro: easy to calculate. Con: sensitive to extreme data values.
Variance (s²): $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$; Excel: =VAR(Data). Pro: plays a key role in mathematical statistics. Con: non-intuitive meaning.
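A minimal sketch of the range and the sample variance follows, using the n − 1 divisor that Excel's =VAR applies to sample data. Python's `statistics.variance` uses the same divisor and serves as a cross-check. The sample is hypothetical.

```python
import statistics

def sample_range(data):
    """x_max - x_min, like =MAX(Data)-MIN(Data)."""
    return max(data) - min(data)

def sample_variance(data):
    """s^2 = sum((x - mean)^2) / (n - 1), the sample variance."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

data = [4, 8, 6, 5, 7]  # hypothetical sample, mean 6
print(sample_range(data))         # 4
print(sample_variance(data))      # 2.5, i.e. (4 + 4 + 0 + 1 + 1) / 4
print(statistics.variance(data))  # standard-library cross-check
```

The variance is in squared units (for example, dollars squared), which is one reason its meaning is non-intuitive.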

8 Dispersion
Measures of Variation (continued):
Standard deviation (s): $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$; Excel: =STDEV(Data). Pro: most common measure; uses the same units as the raw data ($, £, ¥, etc.). Con: non-intuitive meaning.
Coefficient of variation (CV): $CV = 100 \times \frac{s}{\bar{x}}$; Excel: none. Pro: measures relative variation in percent, so data sets can be compared. Con: requires non-negative data.
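The sketch below shows why the CV is useful for comparing data sets. Two hypothetical samples have the same standard deviation in their own units, but very different variation relative to their means. The helper names are illustrative, and the CV function assumes non-negative data with a positive mean.

```python
import math

def sample_std(data):
    """s = square root of the sample variance (n - 1 divisor), like =STDEV."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

def coefficient_of_variation(data):
    """CV = 100 * s / mean, in percent (assumes non-negative data, positive mean)."""
    xbar = sum(data) / len(data)
    return 100 * sample_std(data) / xbar

prices = [10, 12, 14]         # hypothetical, in dollars
weights = [1000, 1002, 1004]  # hypothetical, in grams
print(sample_std(prices), sample_std(weights))  # 2.0 2.0: same absolute spread
print(coefficient_of_variation(prices))   # about 16.67 percent
print(coefficient_of_variation(weights))  # about 0.2 percent
```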

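9 Dispersion
Measures of Variation (continued):
Mean absolute deviation (MAD): $MAD = \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|$; Excel: =AVEDEV(Data). Pro: easy to understand. Con: lacks “nice” theoretical properties.
Standardized Data: Chebyshev’s Theorem
For any data set, regardless of its shape, at least $1 - 1/k^2$ of the values lie within k standard deviations of the mean (k > 1). For example, at least 75% lie within ±2 standard deviations and at least 88.9% within ±3.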
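Both ideas can be sketched briefly: the MAD as the average absolute distance from the mean (the quantity Excel's =AVEDEV reports), and Chebyshev's theorem, which guarantees that at least 1 − 1/k² of any data set lies within k standard deviations of the mean. The function names and sample are hypothetical.

```python
def mean_absolute_deviation(data):
    """MAD = sum(|x - mean|) / n."""
    n = len(data)
    xbar = sum(data) / n
    return sum(abs(x - xbar) for x in data) / n

def chebyshev_min_fraction(k):
    """Chebyshev lower bound on the fraction of values within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

print(mean_absolute_deviation([4, 8, 6, 5, 7]))  # 1.2
print(chebyshev_min_fraction(2))  # 0.75
print(chebyshev_min_fraction(3))  # about 0.889
```

Chebyshev's bound is deliberately conservative because it makes no assumption about the shape of the distribution.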

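10 Standardized Data: The Empirical Rule
For data from a normal (bell-shaped) distribution, about 68% of values lie within μ ± 1σ, about 95% within μ ± 2σ, and about 99.7% within μ ± 3σ.
Are there any unusual values or outliers? [Figure: number line centered at a mean of 22.72, with ticks at 8.6 and 36.8 (±1 standard deviation), −5.4 and 50.9 (±2), and −19.5 and 65.0 (±3); values beyond ±2 standard deviations are labeled unusual, and values beyond ±3 are labeled outliers.]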
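The sketch below computes the exact normal-distribution coverage behind the empirical rule with `math.erf`, and labels a value by its distance from the mean. The labeling helper is hypothetical and assumes the slide's convention that values beyond 2 standard deviations are unusual and values beyond 3 are outliers. The mean of 22.72 comes from the slide's number line, while the standard deviation of about 14.1 is an approximation inferred from the spacing of its ticks.

```python
import math

def normal_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def classify(x, mean, sd):
    """Hypothetical helper: beyond 3 sd -> 'outlier', beyond 2 sd -> 'unusual'."""
    z = (x - mean) / sd
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "unusual"
    return "typical"

for k in (1, 2, 3):
    print(k, f"{normal_within(k):.4f}")  # 1 0.6827, 2 0.9545, 3 0.9973

# Mean from the slide; sd is an approximate value read off the tick spacing.
print(classify(30, 22.72, 14.1))  # typical
print(classify(55, 22.72, 14.1))  # unusual
print(classify(70, 22.72, 14.1))  # outlier
```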

11 Standardized Data: Defining a Standardized Variable
A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.
Standardization formula for a population: $z_i = \frac{x_i - \mu}{\sigma}$
Standardization formula for a sample: $z_i = \frac{x_i - \bar{x}}{s}$
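A minimal sketch of the sample standardization formula follows. It uses `statistics.mean` and `statistics.stdev` (the n − 1 sample standard deviation). For the population version, `statistics.pstdev` would replace `stdev`. The sample values are hypothetical.

```python
import statistics

def z_scores(data):
    """Standardize a sample: z_i = (x_i - xbar) / s, with s the sample standard deviation."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - xbar) / s for x in data]

zs = z_scores([4, 8, 6, 5, 7])
print(zs)                    # the value equal to the mean (6) maps to 0
print(statistics.stdev(zs))  # standardized values have standard deviation 1
```

Excel provides =STANDARDIZE(x, mean, standard_dev) for the same single-value calculation.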

12 Percentiles and Quartiles
Percentiles divide sorted data into 100 groups. For example, if you score in the 83rd percentile on a standardized test, 83% of the test-takers scored below you. Deciles divide the data into 10 groups, quintiles into 5 groups, and quartiles into 4 groups.
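Quartiles and percentile ranks can be sketched with Python's `statistics.quantiles`. Several quartile conventions exist. The `method="inclusive"` option used here interpolates the same way as Excel's QUARTILE.INC/PERCENTILE.INC, but a textbook may use a different rule, so treat this as one reasonable choice. The scores and the `percentile_rank` helper are hypothetical.

```python
import statistics

scores = [55, 61, 64, 68, 70, 72, 75, 78, 81, 84, 88, 95]  # hypothetical test scores

# Three cut points split the sorted data into 4 groups (n=10 would give deciles,
# n=5 quintiles, n=100 percentiles).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

def percentile_rank(data, value):
    """Percent of observations strictly below `value` (the 'scored below you' reading)."""
    return 100 * sum(x < value for x in data) / len(data)

print(q1, q2, q3)                    # 67.0 73.5 81.75
print(percentile_rank(scores, 84))   # 75.0
```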

13 Box Plots
A box plot, also called a box-and-whisker plot, is a useful tool of exploratory data analysis (EDA). It is based on a five-number summary: Xmin, Q1, Q2, Q3, Xmax.
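The five-number summary behind a box plot can be sketched in a few lines. The quartiles use the "inclusive" interpolation method, which is an assumption, since textbooks and software differ in how they compute quartiles. The data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (Xmin, Q1, Q2, Q3, Xmax) using 'inclusive' quartile interpolation."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

print(five_number_summary([55, 61, 64, 68, 70, 72, 75, 78, 81, 84, 88, 95]))
# (55, 67.0, 73.5, 81.75, 95)
```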

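14 Box Plots
The box plot displays the five-number summary visually: a box spans Q1 to Q3 with a line at the median (Q2), and whiskers extend toward Xmin and Xmax. A box plot shows central tendency, dispersion, and shape.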

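15 Correlation: Correlation Coefficient
The sample correlation coefficient (r) is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y:
$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
Its range is −1 ≤ r ≤ +1.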
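A from-scratch sketch of the sample correlation coefficient follows, computed from deviations about each variable's mean. In Excel, =CORREL(Array1, Array2) returns the same statistic. The function name and the paired data are hypothetical. Perfectly linear increasing and decreasing pairs show the two ends of the −1 to +1 range.

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r for paired observations (x_i, y_i)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive linearity)
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative linearity)
```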

