Published by Ann Miller. Modified over 9 years ago.
1
Univariate Descriptive Statistics
Dr. Shane Nordyke, University of South Dakota
This material is distributed under an Attribution-NonCommercial-ShareAlike 3.0 Unported Creative Commons License, the full details of which may be found online here: http://creativecommons.org/licenses/by-nc-sa/3.0/. You may re-use, edit, or redistribute the content provided that the original source is cited, it is for non-commercial purposes, and it is distributed under a similar license.
CC BY-NC-SA Nordyke 2010
2
Why do we need descriptive statistics? We use the label univariate descriptive statistics to refer to a variety of measures of center and variation that are useful for understanding the nature and distribution of a single variable. They allow us to quickly understand a large amount of information about a single variable. They make data meaningful!
3
Making Data Meaningful
Age of Volunteer: 15 19 22 17 39 17 26
A relatively small sample of the ages of volunteers at a local non-profit agency in the community. What does this list tell us about the age of volunteers in the agency?
4
Making Data Meaningful
Age of Volunteer: 15 17 17 19 22 26 39
Sorting the list can provide a starting place. What do we know now?
5
Making Data Meaningful
What if the sample is larger?
39 25 22 40 37 15 30 16 25 28
16 31 50 46 30 15 25 20 17 22
43 27 42 43 17 16 33 26 31 30
38 43 40 22 19 15 24 19 26 40
39 27 35 28 26 28 41 43 47 22
36 41 25 38 25 36 38 18 45
16 30 40 21 16 48 46 30 31
16 26 49 24 44 39 15 21 24
41 42 49 44 24 18 28 22 38
22 47 44 20 31 24 27 34 33
17 49 33 44 27 43 49 16 23 25
35 34 20 26 29 44 17 42 43 29
32 33 18 24 45 50 21 39 40 21
28 31 19 16 26 16 45 22 21
47 15 39 49 33 29 40 20 18 37
49 16 19 23 34 37 18 15 19 41
6
The Menu of Basic Descriptive Statistics
Measures of central tendency: Mean, Median, Mode, Midrange
Measures of distribution: Range, Min, Max, Percentiles
Measures of variation: Standard Deviation, Variance, Coefficient of Variation
7
Some initial notation
Σ indicates the addition of a set of values
y is the variable used to represent the individual data values
n represents the number of values in a sample
N represents the number of values in a population
8
Measures of Central Tendency - Mean The sample mean is the mathematical average of the data and is the measure of central tendency we use most often.
9
Measures of Central Tendency - Mean
Observation # | Age of Volunteer
1 | 15
2 | 17
3 | 17
4 | 19
5 | 22
6 | 26
7 | 39
Sum | 155
155 is the sum of all of the observations; n = 7 is the number of observations. The mean is 155 / 7 = 22.14 years.
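The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming the sample is the seven ages listed earlier with 17 appearing twice (which is what the sum of 155 implies); the variable names are illustrative and not part of the original slides.

```python
# Ages of the seven volunteers (17 appears twice, consistent with the sum of 155).
ages = [15, 17, 17, 19, 22, 26, 39]

total = sum(ages)   # Σy: the sum of all of the observations
n = len(ages)       # n: the number of observations
mean = total / n    # sample mean = Σy / n

print(total, n, round(mean, 2))  # 155 7 22.14
```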
10
Measures of Central Tendency - Median The sample median is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. If there isn’t one value in the middle we take the average of the two middle values. The median is not affected by extreme values.
11
Measures of Central Tendency - Median The median is often denoted by ỹ, which is pronounced “y-tilde”.
12
Measures of Central Tendency - Median
15 17 17 19 22 26 39
Sample ages are arranged in ascending order. The middle value is the median. ỹ = 19
13
Measures of Central Tendency - Median
15 17 17 19 22 26 34 39
If there are two values in the middle, we take the average of the two. ỹ = (19 + 22) / 2 = 20.5
14
Measures of Central Tendency - Median
15 17 17 19 22 26 34 99
Note that the presence of an extreme value doesn’t change the median. ỹ = (19 + 22) / 2 = 20.5
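The three median cases above (odd count, even count, and an extreme value) can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that each sample contains 17 twice, as the earlier sum of 155 implies; the list names are illustrative. The mean is printed alongside to show that it does move when the extreme value is introduced.

```python
from statistics import mean, median

odd_sample = [15, 17, 17, 19, 22, 26, 39]        # seven values: one middle value
even_sample = [15, 17, 17, 19, 22, 26, 34, 39]   # eight values: two middle values
with_outlier = [15, 17, 17, 19, 22, 26, 34, 99]  # 39 replaced by an extreme value

print(median(odd_sample))    # 19
print(median(even_sample))   # 20.5
print(median(with_outlier))  # 20.5, unchanged by the extreme value

# The mean, by contrast, is pulled upward by the extreme value.
print(mean(even_sample), mean(with_outlier))  # 23.625 31.125
```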
15
Measures of Central Tendency - Mode The mode is the value that occurs most frequently.
– Not every sample has a distinct mode. Sometimes it is bimodal (two modes) or multimodal (three or more modes) or sometimes there is no mode at all.
– The mode is the only measure of central tendency we can use for nominal data.
16
Measures of Central Tendency - Mode
15 17 17 19 22 26 39
17 is the only value that occurs more than once, so it is the value that occurs most frequently and the mode. The mode is often denoted with the symbol M. M = 17
17
Measures of Central Tendency - Mode
Blue Green Purple Red Yellow: M = Red
20 29 33 34 41 42 43 45: Multimodal
1.1 2.3 4.1 5.3 4.3 6.7 8.2 8.3 8.7 8.9 10.3: No mode
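A small helper makes the single-mode, multimodal, and no-mode cases concrete. This is a sketch, not the author's method: the function `modes` is hypothetical, and it adopts the slide's convention that a sample in which nothing repeats has no mode. The repeated entries in the slide's colour and multimodal lists did not survive in this transcript, so the `colors` and `two_modes` lists below are made-up data for illustration; only `ages` and `no_repeats` come from the slides.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # every value occurs once: treat as "no mode"
        return []
    return [value for value, count in counts.items() if count == top]

ages = [15, 17, 17, 19, 22, 26, 39]
colors = ["Blue", "Green", "Purple", "Red", "Red", "Yellow"]  # hypothetical nominal data
two_modes = [20, 29, 29, 33, 41, 41]                          # hypothetical data with two modes
no_repeats = [1.1, 2.3, 4.1, 5.3, 4.3, 6.7, 8.2, 8.3, 8.7, 8.9, 10.3]

print(modes(ages))        # [17]
print(modes(colors))      # ['Red']
print(modes(two_modes))   # [29, 41]
print(modes(no_repeats))  # []
```

The standard library's `statistics.multimode` behaves similarly, except that it returns every value when nothing repeats rather than an empty list.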
18
Measures of Central Tendency - Midrange The midrange, or middle of the range, is the average of the highest and lowest values. There is no distinct symbol for the midrange.
19
Measures of Central Tendency - Midrange
15 17 17 19 22 26 39
Midrange = (15 + 39) / 2 = 27
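Because the midrange uses only the two end values, it is straightforward to compute and also easy to distort. The sketch below assumes the seven-value sample of ages used throughout; the second list, with 99 in place of 39, is an illustrative variation borrowed from the median slides.

```python
ages = [15, 17, 17, 19, 22, 26, 39]
midrange = (min(ages) + max(ages)) / 2
print(midrange)  # 27.0

# Replacing the largest value with an extreme one moves the midrange a long way.
ages_with_outlier = [15, 17, 17, 19, 22, 26, 99]
midrange_with_outlier = (min(ages_with_outlier) + max(ages_with_outlier)) / 2
print(midrange_with_outlier)  # 57.0
```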
20
Comparing Measures of Central Tendency
15 17 17 19 22 26 39
Mean = 22.14, Median = 19, Mode = 17, Midrange = 27
21
Comparing Measures of Center
Measure of center (listed from most used to least used) | Does it always exist? | Does it take into account every value? | Is it affected by extreme values?
Mean | Always | Yes | Yes
Median | Always | No | No
Mode | Might not exist; may have more than one | No | No
Midrange | Always | No | Yes
22
The Range The range of a sample is the difference between the highest value and the lowest value.
15 17 17 19 22 26 39
In our example the Range = 39 – 15 = 24; there are 24 years between our youngest and oldest volunteers in the sample.
23
Measures of Variance Where measures of central tendency try to give us an idea of where the middle of the data lies, measures of variance (or variation) tell us about how the data are distributed around that center. Our three primary measures of variance are:
– Standard Deviation,
– Variance and
– Coefficient of Variation
24
Measures of Variance – Standard Deviation The Standard Deviation is a measure of the variation of values around the mean.
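The formula shown on this slide did not survive in the transcript. As a hedged reconstruction in the deck's notation (y for values, n for the sample size, N for the population size), the usual definitions are given below. The n − 1 denominator for a sample is the common textbook convention and is an assumption here, as are the symbols \bar{y} for the sample mean and \mu for the population mean.

```latex
% Sample standard deviation (assumed n - 1 convention)
s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}

% Population standard deviation
\sigma = \sqrt{\frac{\sum (y - \mu)^2}{N}}
```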
25
Some Key Points for Understanding Standard Deviation
– The standard deviation is never negative; it is zero only when every value in the sample is the same.
– The standard deviation of a sample will always be in the same units as the observations in the sample.
– Extreme values or outliers can change the value of the standard deviation substantially.
– The size of the sample affects how well the sample standard deviation estimates the population standard deviation; the standard deviation itself does not systematically decrease as the sample size increases.
26
Measures of Variance - Variance The variance is the square of the standard deviation, so it is expressed in squared units (for our example, years squared).
27
Standard Deviation and Variance Notation
Sample | Population
s = standard deviation | σ = standard deviation
s² = variance | σ² = variance
28
Seeing Standard Deviations Once I figure out how to draw the curves, this will be a slide that shows the difference between a distribution with a small standard deviation (tall and narrow) and a large one (broad and flat).
29
Back to our example In our sample of volunteer ages, the mean was 22.14 years. We can calculate the standard deviation to better understand how the values are distributed around that mean.
15 17 17 19 22 26 39
30
Back to our example
y | ȳ | y - ȳ | (y - ȳ)²
15 | 22.14 | -7.14 | 50.9796
17 | 22.14 | -5.14 | 26.4196
17 | 22.14 | -5.14 | 26.4196
19 | 22.14 | -3.14 | 9.8596
22 | 22.14 | -0.14 | 0.0196
26 | 22.14 | 3.86 | 14.8996
39 | 22.14 | 16.86 | 284.2596
Sum of (y - ȳ)² | | | 412.8572
31
Back to our example
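The final calculation on this slide is missing from the transcript. The sketch below carries the table of squared deviations through to a standard deviation, assuming the sample (n − 1) denominator; that denominator is an assumption, not something the surviving text confirms. It uses the unrounded mean, so the sum of squares comes out as 412.8571 rather than the 412.8572 obtained with the rounded mean of 22.14.

```python
from math import sqrt
from statistics import stdev

ages = [15, 17, 17, 19, 22, 26, 39]
n = len(ages)
mean = sum(ages) / n

# The table's last column: (y - mean) squared for each observation.
squared_deviations = [(y - mean) ** 2 for y in ages]
sum_of_squares = sum(squared_deviations)

# Assumed sample convention: divide by n - 1, then take the square root.
sample_variance = sum_of_squares / (n - 1)
sample_sd = sqrt(sample_variance)

print(round(sum_of_squares, 2), round(sample_variance, 2), round(sample_sd, 2))
# 412.86 68.81 8.3

# Cross-check against the standard library, which uses the same n - 1 convention.
print(round(stdev(ages), 2))  # 8.3
```

Under that assumption, the ages vary by roughly 8.3 years around the mean of 22.14 years.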
32
Copyright © 2004 Pearson Education, Inc.
How are standard deviations helpful? The Empirical Rule
When data sets have distributions that are approximately bell shaped, the following is true:
– About 68% of all values fall within 1 standard deviation of the mean
– About 95% of all values fall within 2 standard deviations of the mean
– About 99.7% of all values fall within 3 standard deviations of the mean
33
The Empirical Rule
68% of values fall within 1 standard deviation of the mean (34% on each side of the mean).
34
The Empirical Rule
68% of values fall within 1 standard deviation of the mean (34% on each side of the mean).
95% of values fall within 2 standard deviations of the mean (a further 13.5% on each side, between 1 and 2 standard deviations).
35
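The Empirical Rule
68% of values fall within 1 standard deviation of the mean (34% on each side of the mean).
95% of values fall within 2 standard deviations of the mean (a further 13.5% on each side, between 1 and 2 standard deviations).
99.7% of values fall within 3 standard deviations of the mean (a further 2.4% on each side, between 2 and 3 standard deviations).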
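The 68%, 95%, and 99.7% figures are rounded values for an exactly normal (bell-shaped) distribution. The sketch below recovers them from the standard normal cumulative distribution function using `statistics.NormalDist`, available in Python 3.8 and later; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Real data are only approximately bell shaped, so observed proportions will usually differ somewhat from these values.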
36
Measures of Variance – Coefficient of Variation The Coefficient of Variation (CV) is a measure of the standard deviation of a sample relative to its mean, usually expressed as a percentage. CVs can be useful when you are comparing the standard deviations of variables that are in two different units.
37
Measures of Variance – Coefficient of Variation An example: You are comparing the heights and weights of fourth graders. Which variable has greater variance? How can we compare a standard deviation of 4” in height to one of 10 lbs in weight?
38
Measures of Variance – Coefficient of Variation The standard deviation of height is 8% of the mean of height, whereas the standard deviation of weight is 12.5% of the mean of weight, so there is greater variation in the weight of the fourth graders than in the height.
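The comparison can be written out as a short calculation. The slide's table of means is missing from the transcript, so the means used below (50 inches and 80 lbs) are assumptions back-calculated from the stated standard deviations (4” and 10 lbs) and percentages (8% and 12.5%); the function name is also illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean

# Assumed means, implied by the slide's percentages: 4 / 0.08 = 50 and 10 / 0.125 = 80.
height_cv = coefficient_of_variation(sd=4, mean=50)   # inches
weight_cv = coefficient_of_variation(sd=10, mean=80)  # lbs

print(height_cv)  # 8.0
print(weight_cv)  # 12.5
```

Because each CV is a ratio of two quantities in the same unit, the units cancel, which is what makes inches and pounds comparable.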