Describing Data Descriptive Statistics: Central Tendency and Variation.

1 Describing Data Descriptive Statistics: Central Tendency and Variation

2 Lecture Objectives
You should be able to:
1. Compute and interpret appropriate measures of centrality and variation.
2. Recognize distributions of data.
3. Apply properties of normally distributed data based on the mean and variance.
4. Compute and interpret covariance and correlation.

3 Summary Measures
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: Range, Percentile, Variance, Standard Deviation
3. Measures of Association: Covariance, Correlation

4 Measures of Central Location: The Arithmetic Mean
The mean is the arithmetic average of the data values. For a sample, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
It is the most common measure of central tendency.
It is affected by extreme values (outliers).
[Figure: two dot plots, one on a 0 to 10 axis with Mean = 5 and one on a 0 to 14 axis with Mean = 6.]

5 Median
An important measure of central tendency.
In an ordered array, the median is the "middle" number: if n is odd, it is the middle number; if n is even, it is the average of the two middle numbers.
Not affected by extreme values.
[Figure: two dot plots, on 0 to 10 and 0 to 14 axes; Median = 5.]

6 Mode
A measure of central tendency: the value that occurs most often.
Not affected by extreme values.
There may be no mode, or there may be several modes.
Used for either numerical or categorical data.
[Figure: two dot plots, one on a 0 to 14 axis with Mode = 9 and one on a 0 to 6 axis with no mode.]
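The three measures of central location can be sketched with Python's standard `statistics` module. The data values here are illustrative assumptions (the slides' plotted points are not recoverable from the transcript), chosen so that the means come out to 5 and 6.

```python
from statistics import mean, median, multimode

# Illustrative values only (NOT the slides' data): the second list swaps
# the largest value for an outlier.
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]

mean_base, mean_outlier = mean(base), mean(with_outlier)
median_base, median_outlier = median(base), median(with_outlier)
print("means:  ", mean_base, mean_outlier)      # the mean moves with the outlier
print("medians:", median_base, median_outlier)  # the median does not

# With an even number of values, the median averages the two middle values.
even_median = median([1, 2, 3, 4])

# multimode returns every most-frequent value, so several modes are handled.
modes = multimode([1, 2, 2, 3, 3, 4])

# When no value repeats, every value ties for "most frequent";
# this sketch treats that case as "no mode".
no_repeat = [0, 1, 2, 3, 4, 5, 6]
has_mode = len(multimode(no_repeat)) < len(set(no_repeat))
print("modes:", modes, "| no_repeat has a mode:", has_mode)
```

The outlier pulls the mean upward while the median stays where it was, which is the contrast the slides draw between the two measures.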

7 Measures of Variability
Range: the simplest measure.
Percentile: used with the median.
Variance/Standard Deviation: used with the mean.

8 Range
The difference between the largest and smallest observations: Range = largest - smallest.
The range ignores how the data are distributed.
[Figure: two dot plots on a 7 to 12 axis with different distributions; both have Range = 12 - 7 = 5.]

9 Percentile
2008 Olympic medal tally for the top 55 nations (? marks a value missing from the transcript):
Obs  Medals   Obs  Medals   Obs  Medals   Obs  Medals   Obs  Medals
  1     110    12      24    23      10    34       6    45       3
  2     100    13      19    24       9    35       6    46       3
  3      72    14      18    25       8    36       6    47       2
  4       ?    15      18    26       8    37       5    48       2
  5      46    16       ?    27       7    38       5    49       2
  6      41    17      15    28       7    39       5    50       2
  7      40    18      14    29       7    40       4    51       2
  8      31    19      13    30       6    41       4    52       1
  9      28    20      11    31       6    42       4    53       1
 10      27    21      10    32       6    43       4    54       1
 11      25    22      10    33       6    44       3    55       1
What is the percentile score for a country with 9 medals? What is the 50th percentile?

10 Percentile - solutions
Order all data (ascending or descending).
1. The country with 9 medals ranks 24th out of 55. There are 31 nations (56.36%) below it and 23 nations (41.82%) above it. Hence it can be considered a 57th or 58th percentile score.
2. The medal tally that corresponds to the 50th percentile is the one in the middle of the group, or the 28th country, with 7 medals. Hence the 50th percentile (median) is 7.
Now compute the first and third quartile values.
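The rank arithmetic in this solution can be reproduced in a few lines. This is a minimal sketch, assuming ranks are counted from the top and the item of interest has no ties; the function names are hypothetical.

```python
def percent_below_above(rank, n):
    """Return (% of items below, % of items above) for the item at
    1-based `rank` from the top of `n` ordered items, assuming no ties."""
    below = n - rank   # items ranked after this one
    above = rank - 1   # items ranked before this one
    return 100 * below / n, 100 * above / n


def median_rank(n):
    """Rank of the middle item when n is odd."""
    return (n + 1) // 2


# The country with 9 medals ranks 24th out of 55.
below, above = percent_below_above(rank=24, n=55)
print(f"{below:.2f}% below, {above:.2f}% above")
print("rank of the median:", median_rank(55))
```

The median rank of 28 points to the country whose tally is the 50th percentile.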

11 Box Plot
The box plot shows 5 points, as follows: Smallest, Q1, Median, Q3, Largest.

12 Outliers
Interquartile range (IQR) = Q3 - Q1 = 60 - 40 = 20
1 step = 1.5 * IQR = 1.5 * 20 = 30
Lower limit = Q1 - 30 = 40 - 30 = 10
Upper limit = Q3 + 30 = 60 + 30 = 90
Any point outside the limits (10, 90) is considered an outlier.
[Figure: box plot with labeled values 20, 40, 50, 60, 80 and an outlier at 105.]
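The 1.5 * IQR rule can be sketched as a small helper. The function names are hypothetical; the quartile values (Q1 = 40, Q3 = 60) and the test points come from the slide.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier limits: one 'step' of k * IQR beyond each quartile."""
    step = k * (q3 - q1)
    return q1 - step, q3 + step


lower, upper = iqr_fences(40, 60)


def is_outlier(x):
    """True when x falls outside the limits computed above."""
    return x < lower or x > upper


print("limits:", lower, upper)
print("105 is an outlier:", is_outlier(105))
print("80 is an outlier:", is_outlier(80))
```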

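13 Variance
For the population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
For the sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Variance is in squared units, and can be difficult to interpret. For instance, if data are in dollars, variance is in "squared dollars".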

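14 Standard Deviation
For the population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
For the sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Standard deviation is the square root of the variance.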

15 Computing Standard Deviation
Computing sample variance and standard deviation (mean of X = 6):
X    Deviation from mean    Squared
3    -3                     9
4    -2                     4
6     0                     0
8     2                     4
9     3                     9
Sum of squares (SS) = 26
Variance = SS/(n - 1) = 6.50
Stdev = sqrt(Variance) = 2.55
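The same computation, step by step, using the five X values from the slide (3, 4, 6, 8, 9). The population variance is included only for contrast and is not part of the slide's worked example.

```python
from math import sqrt

x = [3, 4, 6, 8, 9]
n = len(x)

x_bar = sum(x) / n                        # mean of X
deviations = [xi - x_bar for xi in x]     # deviation of each value from the mean
ss = sum(d ** 2 for d in deviations)      # sum of squared deviations (SS)

sample_var = ss / (n - 1)                 # sample variance divides by n - 1
sample_sd = sqrt(sample_var)              # standard deviation = sqrt(variance)
pop_var = ss / n                          # population formula, for contrast

print(f"mean={x_bar}, SS={ss}, variance={sample_var}, stdev={sample_sd:.2f}")
```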

16 The Normal Distribution
A property of normally distributed data is as follows:
Distance from mean           Percent of observations included in that range
± 1 standard deviation       Approximately 68%
± 2 standard deviations      Approximately 95%
± 3 standard deviations      Approximately 99.73%
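These coverage figures can be checked numerically. The sketch below uses the standard identity that, for a normal distribution, the share of observations within k standard deviations of the mean is erf(k / sqrt(2)); the function name `coverage` is a hypothetical helper.

```python
from math import erf, sqrt


def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {100 * coverage(k):.2f}%")
```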

17 Comparing Standard Deviations
[Figure: three dot plots (Data A, Data B, Data C), each on an 11 to 21 axis.]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

18 Outliers
Typically, a number beyond a certain number of standard deviations from the mean is considered an outlier. In many cases, a number beyond 3 standard deviations (about a 0.27% chance of occurring in normally distributed data) is considered an outlier. If identifying an outlier is more critical, one can make the rule more stringent, and consider 2 standard deviations as the limit.

19 Coefficient of Variation
Standard deviation relative to the mean: $CV = \frac{s}{\bar{X}} \times 100\%$
Helps compare deviations for samples with different means.

20 Computing CV
Stock A: Average price last year = $50, Standard deviation = $5
Stock B: Average price last year = $100, Standard deviation = $5
Coefficient of variation:
Stock A: CV = 10%
Stock B: CV = 5%
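A minimal sketch of the CV calculation for the two stocks, with a hypothetical helper name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B
print(f"Stock A: CV = {cv_a:.0f}%, Stock B: CV = {cv_b:.0f}%")
```

Both stocks have the same $5 standard deviation, but relative to its average price Stock A varies twice as much as Stock B.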

21 Standardizing Data
Obs       Age      Income      Z-Age    Z-Income
1         25       25000       -1.05    -1.13
2         28       52000       -0.86    -0.63
3         35       63000       -0.41    -0.43
4         36       74000       -0.34    -0.22
5         39       69000       -0.15    -0.31
6         45       80000        0.23    -0.11
7         48       125000       0.42     0.72
8         75       200000       2.15     2.11
Mean      41.38    86000.00
Std Dev   15.63    53973.54
Which of the two numbers for person 8 is farther from the mean: the age of 75 or the income of 200,000?
Z scores tell us the distance from the mean, measured in standard deviations.
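The Z columns can be reproduced from the Age and Income values of the eight observations. This sketch uses the sample standard deviation (`statistics.stdev`), which matches the Std Dev row of 15.63 and 53973.54; `z_scores` is a hypothetical helper.

```python
from statistics import mean, stdev

ages = [25, 28, 35, 36, 39, 45, 48, 75]
incomes = [25000, 52000, 63000, 74000, 69000, 80000, 125000, 200000]


def z_scores(values):
    """Distance of each value from the mean, in sample standard deviations."""
    m = mean(values)
    s = stdev(values)   # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]


z_age = z_scores(ages)
z_income = z_scores(incomes)

# Person 8 is the last observation.
print(f"person 8: z-age = {z_age[-1]:.2f}, z-income = {z_income[-1]:.2f}")
```

Because both values are now expressed in standard deviations, they can be compared directly: for person 8 the age is slightly farther from its mean than the income is.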

22 Measures of Association
Covariance and Correlation
          X     Dev    Product    Dev    Y
          1     -1     3          -3     6
          2      0     0          -1     8
          3      1     4           4     13
Mean      2                              9
Stdev     1                              3.6
Sum of products = 7
Covariance = 3.5
Correlation = 0.97
Covariance measures the average product of the deviations of two variables from their means (for a sample, the sum of products divided by n - 1).
Correlation is the standardized form of covariance (covariance divided by the product of the two standard deviations).
Correlation is always between -1 and +1.
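The covariance and correlation figures can be recomputed from the slide's three pairs (X = 1, 2, 3 and Y = 6, 8, 13). This is a sketch using the sample formulas (dividing by n - 1), which is the assumption that reproduces the slide's 3.5 and 0.97.

```python
from math import sqrt

x = [1, 2, 3]
y = [6, 8, 13]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Product of the paired deviations from each mean, summed over all pairs.
sum_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
covariance = sum_products / (n - 1)

# Sample standard deviations of X and Y.
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation standardizes the covariance by both standard deviations.
correlation = covariance / (s_x * s_y)

print(f"covariance = {covariance}, correlation = {correlation:.2f}")
```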

