Presentation is loading. Please wait.

Presentation is loading. Please wait.

Statistics 11 The mean The arithmetic average: The “balance point” of the distribution: X=2 -3 X=6+1 X=10+5 0 An error or deviation is the distance from.

Similar presentations


Presentation on theme: "Statistics 11 The mean The arithmetic average: The “balance point” of the distribution: X=2 -3 X=6+1 X=10+5 0 An error or deviation is the distance from."— Presentation transcript:

1 Statistics 11 The mean The arithmetic average: The “balance point” of the distribution: X=2 -3 X=6+1 X=10+5 0 An error or deviation is the distance from a score to the mean: X-μ

2 Statistics 12 The sum of the errors or deviations around the mean is always 0. Advantage: More informative than median & mode Takes all the observation/scores into account. Takes the distance & direction of deviations/errors into account.

3 Statistics 13 Advantage: More uses than median & mode Necessary for calculating many inferential statistics. Limitation: Not always possible to calculate a mean (scale) Mean can only be calculated for interval/ratio level data Need a different measure for nominal or ordinal level data

4 Statistics 14 Limitation: Not always appropriate to use the mean to describe the middle of a distribution (distribution) Mean is sensitive to extreme values or “outliers” Mean does not always reflect where the scores “pile up” Need a different measure for asymmetrical distributions Use the mean for unimodal, symmetrical distributions of interval/ratio level data.

5 Statistics 15 The median Divides the distribution exactly in half; 50th percentile Odd # of scores & no “pileup” or ties at the middle: median = the middle score

6 Statistics 16 Even # of scores & no “pileup” or ties at the middle: median = the average of the 2 middle scores

7 Statistics 17 Advantage: Insensitive to extreme values Can be used when extreme values distort the mean 3, 5, 5, 6, 7, 8, 9median=6mean= 6.1 3, 5, 5, 6, 7, 8, 50median=6mean=12 Is the most central, representative value in skewed distributions Advantage: Can be calculated when the mean cannot Can be used with ranks (as well as interval/ratio data) Can be used with open-ended distributions Example: # of siblings (5+ siblings?)

8 Statistics 18 Limitation: Not as informative as the mean Takes only the observations/scores around the 50th %ile into account. Provides no information about distances between observations. Limitation: Fewer uses than the mean Median is purely descriptive.

9 Statistics 19 Limitation: Not always possible to calculate a median (scale) Median can only be calculated for ordinal & interval/ratio data Use the median when you cannot calculate a mean or when the distributions of interval/ratio data are skewed by extreme values.

10 Statistics 110 The mode The most frequently occurring score(s) Advantage: Simple to find mode = “coke”

11 Statistics 111 Advantage: Can be used with any scale of measurement Median can only be calculated for ordinal & interval/ratio data Mean can only be calculated for interval/ratio data Advantage: Can be used to indicate >1 most frequent value Use to indicate bimodality, multimodality

12 Statistics 112 M = 5 median = 5 Neither reflects where the scores actually “pile up.” Modes = 2 & 8.

13 Statistics 113 Use to indicate major & minor modes major mode = 6am minor mode = 6pm

14 Statistics 114 Limitation: Not as informative as the mean or median Takes only the most frequently observed X values into account. Provides no information about distances between observations or the # of observations above/below the mode. Limitation: Fewer uses than the mean Mode is purely descriptive. Need to calculate a mean to use with inferential statistics. Use the mode when you cannot compute a mean or median, or with the mean/median to describe a bimodal/multimodal distribution.

15 Statistics 115 Describing distributions: Measures of variability or dispersion To describe/summarize a distribution of scores efficiently, you need: A measure of central tendency + a measure of variability. Which measure of central tendency is most appropriate? Why?

16 Statistics 116 Central tendency & variability measures are “partners.” Mode  range Median  interquartile range, semi-interquartile range Mean  SS, variance (σ²or s² ), standard deviation (σ or s) These measures describe distributions & indicate how well individual scores or samples of scores represent the population.

17 Statistics 117 Variability measures used with the mode & median Range Based on the distance between the highest & lowest observations on the X scale. Only takes the 2 most extreme observations into account.

18 Statistics 118 For interval/ratio data, range = data highest score or X max =11 URL X max =11.5 lowest score or X min =2 LRL X min =1.5 highest score – lowest score +1 = 11 - 2 + 1 = 10 OR URL X max – LRL X min =11.5 - 1.5 = 10

19 Statistics 119 Range can also be used for ordered categories: Range = from “agree” to “disagree strongly,” with modal response = “disagree.” The range is typically used with the mode, when the mean & median are inappropriate or impossible to calculate (but may be reported along with a median or a mean).

20 Statistics 120 Interquartile range (IQR) & semi-interquartile range (SIQR) Based on distances between scores corresponding to percentiles on the X scale. Only take the middle 50% of the distribution into account. Use only with interval/ratio data. Interquartile range = distance between 1st & 3rd quartiles IQR = Q 3 - Q 1 1st quartile [Q 1 ] is the score at the 25th percentile 2nd quartile [Q 2 ] is the score at the 50th percentile—the median 3rd quartile [Q 3 ] is the score at the 75th percentile

21 Statistics 121 IQR provides information about how much distance on the X scale covers or contains the middle 50% of the distribution.

22 Statistics 122 N= Q1= Q3= IQR= SIQR = Semi-Interquartile range = half the interquartile range SIQR = For a symmetrical distribution, SIQR tells you the distance from the median up to Q3 or down to Q1—the distance covering the 25% of the distribution to each side of the median. The interquartile range & semi-interquartile range are typically used with the median (but may be reported along with a mean).

23 Statistics 123 Variability measures used with the mean SS, variance & standard deviation are based on distances between each of the scores & the mean on the X scale. All scores are taken into account, as with the mean. Use only with interval/ratio data. Most useful for symmetrical distributions, when the mean is the best measure of central tendency.

24 Statistics 124 Notation:Population parameter Sample statistic Sum of squared deviations SS Variance σ²σ²s² Standard deviation (s.d.) σs For now, we will be working with the population values.

25 Statistics 125 Suppose you want to summarize how far the scores in a distribution typically deviate from the mean. Averages are a convenient way to summarize information, BUT: REMEMBER: An error or deviation is the distance from a score to the mean: X- μ REMEMBER: The sum of the deviations around the mean is always 0.

26 Statistics 126 You can’t sum the deviations & divide by the number of scores to get a useful average amount of deviation: 0/N will always = 0. What can you do to summarize the deviations? SS = the sum of the squared deviations or errors around the mean (definitional formula—conceptual) Squaring the deviations first allows you to sum them.

27 Statistics 127 Computing, squaring, & summing all N deviations is tedious, so there is a “shortcut.” Plug these values into the following formula: (computational formula—use this one) For the distribution of N=4,  X= 8  X ² = 38

28 Statistics 128 Computing, squaring, & summing all N deviations is tedious, so there is a “shortcut.” Plug these values into the following formula: (computational formula—use this one) For the distribution of N=4,  X= 8  X ² = 38 SS=

29 Statistics 129 So, what does __ tell you about the variability of the distribution of scores? By itself, not much… SS summarizes the amount of deviation & is useful for further analyses. In general, we CAN say that:

30 Statistics 130 As variability increases (more differences between scores, larger deviations) SS gets larger. Extreme scores farther from μ contribute proportionately more to SS because they produce larger deviations. As N increases (more squared deviations to sum) SS gets larger. Because SS increases with N, SS is NOT a good descriptive statistic You can’t compare SS between groups of different sizes.

31 Statistics 131 How can you use SS to create a measure that will allow you to compare different-sized groups?

32 Statistics 132 How can you use SS to create a measure that will allow you to compare different-sized groups? Variance = the average squared deviation or “mean squared deviation” Variance is not affected by N, because it is an average— the mean squared deviation. Variance summarizes the amount of deviation, allows for comparisons between different-sized groups, & is useful for further analyses. Since variance is a mean of squared deviations, it is not on the same scale as our original variable

33 Statistics 133 Because variance does NOT allow you to describe typical variation among scores in terms of the original scale, it is still NOT a good descriptive statistic. The relationship between variance & distances or units on the I/E scale is difficult to visualize or understand. How can you use  ² to create a descriptive measure of variability on the same scale as the original scores? SS=the sum of the squared deviations=  (X-  ) ²  ² =the average squared deviation = SS/N What we’d really like to have is a measure of the typical or average deviation from the mean that is NOT based on squared quantities.

34 Statistics 134 Standard deviation = the typical or expected deviation The typical, average, or “expected” distance that scores deviate from the mean. population s.d. Taking the square root of the variance “returns” the measure of variability to the original units of measurement. This allows you to represent standard deviation as a distance on the X axis. This also allows you to make statements about how extreme or unusual an observation is.

35 Statistics 135 Note: As with the mean, standard deviation is most useful for describing symmetrical distributions. Standard deviation is the best descriptive measure of variability around a mean; SS & variance are important concepts for understanding & for use in further analyses.

36 Statistics 136 Sample variance & standard deviation We often want to make statements about population parameters. How extroverted are male U.S. citizens, on average? Parameter of interest = μ. BUT, much of the time we only have access to sample statistics. How extroverted are males from the PSY 1 subject pool, on average? Our best estimate of μ = M calculated using sample data.

37 Statistics 137 Notation:Population parameter Sample statistic MeanμM # of observationsNn Sum of squared deviations SS Variance σ²σ² s²s² Standard deviation (s.d.) σs We use statistics s & s ² as estimates of σ & σ² when the population parameters are unknown.

38 Statistics 138 Formulae PopulationSampleDifference? M instead of , “N”  “n” AND n-1 instead of N “N”  “n” AND n-1 instead of N

39 Statistics 139 Sample & population SS are the same Calculations do NOT change from population to sample. Population:Sample: “N” has just been relabeled as “n.” If you use the definitional formula, use the correct mean. Population: Sample: This will matter later on…

40 Statistics 140 Comparing sample & population variances Calculations DO change from population to sample. Population: Sample: “N” has been relabeled as “n.” AND Use n-1 instead of N in the denominator. Sample formula will always yield a larger value.

41 Statistics 141 Why (n-1) instead of N? Because (n-1) instead of N corrects for bias in calculating s & s ². Remember: Sample statistics are only useful to the extent that they provide unbiased estimates of population parameters. What is an unbiased statistic?

42 Statistics 142 One that on average = the population parameter. M is an unbiased estimate of  : The average of many sample means = the population mean. (Each box is a sample mean.) What is a biased statistic? One that systematically over or underestimates the parameter.

43 Statistics 143 SS/N tends to underestimate population variance when using sample data. Why doesn’t the SS/N formula work with sample data? Samples usually contain less variability than the populations they come from. Samples tend to contain observations from the center of the population distribution. These samples do not reflect the extremes of the population, so we underestimate the true variability.

44 Statistics 144 Dividing SS by a smaller number corrects for the tendency to underestimate true population variability. The n-1 correction makes s ² & s unbiased estimators of σ ² & σ. n-1 is also referred to as “degrees of freedom.” Sample variances have n-1 degrees of freedom—they are calculated from n-1 independent scores. The last score is determined by the other scores & by M.

45 Statistics 145 For the following set of data, compute the value for SS. –Scores: 5, 2, 2, 7, 9 ANS: SS = Calculate the variance and the standard deviation for the following date (Population & Sample) –Scores: 2, 3, 2, 4, 7, 5, 3, 6, 4

46 Statistics 146 Data: xf xf X²X²f( X ²) 153 145 136 124 112 101 95 84 75 64 56 45 33 25 SS = σ ² = σ = s ² = s =


Download ppt "Statistics 11 The mean The arithmetic average: The “balance point” of the distribution: X=2 -3 X=6+1 X=10+5 0 An error or deviation is the distance from."

Similar presentations


Ads by Google