Making Sense of Statistics: A Conceptual Overview Sixth Edition PowerPoints by Pamela Pitman Brown, PhD, CPG Fred Pyrczak Pyrczak Publishing
Part B: Descriptive Statistics
FREQUENCIES, PERCENTAGES, AND PROPORTIONS Section 6
Frequency Number of participants or cases. Symbol for frequency is f (lowercase f, italicized). N, meaning number of participants, is also used to stand for frequency.
Frequency If a report says f = 23 for a score of 99, then you know that 23 participants had a score of 99.
Percentage Indicates number per 100 who have a certain characteristic. So when we say “percent” we are really saying “per 100.” Symbol %
Percentage 44 % of the registered voters in a town are registered Democrats. You know that for each 100 registered voters, 44 are Democrats.
Calculating Frequencies Example: 44% of registered voters in town are Democrats. Convert 44% to decimal form (move the decimal 2 places to the left): 44% = .44. If the town has 2,200 registered voters, to determine the f (# of Democrats), do this: .44 × 2,200 = 968 Democrats.
Calculating Frequencies Example: The statement that 44% of registered voters are Democrats tells you that for every 100 registered voters there are 44 Democrats.
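The conversion from a percentage to a frequency can be sketched in Python. This is a minimal illustration rather than part of the original slides, and the function name `frequency_from_percentage` is made up for this example.

```python
def frequency_from_percentage(percent, total):
    """Return the frequency (count) implied by a percentage of a total N."""
    proportion = percent / 100  # move the decimal two places to the left
    return proportion * total


# 44% of 2,200 registered voters are Democrats.
democrats = frequency_from_percentage(44, 2200)
print(round(democrats))  # 968
```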
Calculating Percentage Example: There are 84 gifted children in a school. Of those 84 gifted children, 22 are afraid of the dark.
Calculating Percentage What percentage of the gifted children in the school are afraid of the dark? Divide the # of children afraid of the dark by the total # of gifted children, then multiply by 100: 22 ÷ 84 = .2619, and .2619 × 100 = 26.19%. Based on the information given by the sample, if you asked 100 children from the same population (gifted), you would expect about 26 of them to be afraid of the dark!
Proportion 22 ÷ 84 = .26 is the proportion of gifted children afraid of the dark. REMEMBER: A proportion is part of 1 (one). The above statement tells us that twenty-six hundredths of the children are afraid of the dark. PROPORTIONS ARE KIND OF HARD TO INTERPRET!
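The same two numbers give both the proportion and the percentage. A short sketch in Python, using the counts from the gifted-children example (the variable names are chosen only for illustration):

```python
afraid = 22  # gifted children who are afraid of the dark
total = 84   # all gifted children in the school

proportion = afraid / total    # part of 1
percentage = proportion * 100  # per 100

print(round(proportion, 2))  # 0.26
print(round(percentage, 2))  # 26.19
```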
Reporting f Along With % A news article says that 8% of foreign language students at Whatsamatta U are Russian majors. BUT what if you also knew that f = 12 (8%)? Only 12 students study Russian out of 150 total foreign language students at Whatsamatta U.
Reporting f Along With % Whatsamatta U has 150 foreign language students (FLS). Big State U has 350 foreign language students (FLS).
                  Whatsamatta U    Big State U
Total # of FLS    N = 150          N = 350
Russian majors    N = 12 (8%)      N = 14 (4%)
So who has the most FLS? Looking at f (indicated by N), who has the most Russian majors? Looking at %, who has the most Russian majors?
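The comparison between the two schools can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the helper name `percent` are assumptions made for the example, while the counts come from the slide.

```python
schools = {
    "Whatsamatta U": {"russian_majors": 12, "total_fls": 150},
    "Big State U": {"russian_majors": 14, "total_fls": 350},
}


def percent(f, n):
    """Percentage of n cases represented by frequency f."""
    return f / n * 100


# Report the frequency and the percentage side by side for each school.
for name, counts in schools.items():
    f = counts["russian_majors"]
    n = counts["total_fls"]
    print(f"{name}: f = {f} ({percent(f, n):.0f}%) of N = {n}")
```

Big State U has more Russian majors by frequency (14 versus 12), while Whatsamatta U has more by percentage (8% versus 4%), which is why reporting both is informative.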
Section 6 Questions
1. What does frequency mean?
2. What is the symbol for frequency?
3. For what does N stand?
4. If 21% of kindergarten children are afraid of monsters, how many out of each 100 are afraid?
5. Suppose you read that 20% of a population of 1,000 was opposed to a city council resolution. How many are opposed?
6. What statistic is part of 1?
7. According to this section, are “percentages” or “proportions” easier to interpret?
8. Why is it a good idea to report the underlying frequencies when reporting percentages?
SHAPES OF DISTRIBUTIONS Section 7
Frequency Distribution A frequency distribution is a table that shows how many participants have each score. The frequency (f) associated with each score (X) is shown. [Table: Distribution of Depression Scores, with columns X and f; N = 24]
Frequency Polygon 3 participants had score of 21 8 participants had score of 19
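A frequency distribution can be tallied with Python's `collections.Counter`. The scores below are hypothetical: only N = 24 and two of the frequencies (3 participants with a score of 21, 8 with a score of 19) come from the slides, and the remaining values were invented to fill out the example.

```python
from collections import Counter

# Hypothetical depression scores (N = 24). Only two frequencies are taken
# from the slides: f = 3 for a score of 21 and f = 8 for a score of 19.
scores = [22] + [21] * 3 + [20] * 5 + [19] * 8 + [18] * 4 + [17] * 2 + [16]

distribution = Counter(scores)  # maps each score X to its frequency f

print("X   f")
for x in sorted(distribution, reverse=True):
    print(f"{x}  {distribution[x]}")
print(f"N = {sum(distribution.values())}")
```

Plotting X on the horizontal axis and f on the vertical axis, then connecting the points, would give the frequency polygon.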
Normal Distribution Also called a bell-shaped curve
Positive Skew Many people have low income, so curve is high on the left Only a few people have higher income, so curve is lower on the right
Negative Skew Only a few people have low scores, so the curve is low on the left. More people have higher scores, so the curve is high on the right.
Bimodal Distribution Bimodal distributions are rare in research.
Section 7 Questions
1. What is the name of a table that shows how many participants have each score?
2. What does a frequency polygon show?
3. What is the most important type of curve?
4. Which type of distribution is often found in nature?
5. In a distribution with a negative skew, is the long tail pointing to the “left” or to the “right”?
6. When plotted, income in large populations usually has what type of skew?
7. What is the name of the type of distribution that has two high points?
8. Which type of distribution is found much less frequently in research than the others?
THE MEAN: AN AVERAGE Section 8
MEAN Most frequently used average. So widely used that people refer to the mean as the average. Symbols for the mean: M and m are used in academic journals; M is used for the mean of a population, and m is used for the mean of a sample drawn from the population. The symbol called X-bar is usually used in mathematical statistics. The mean is the balance point in a distribution of scores. Specifically, it is the point around which all the deviations sum to zero. **The mean is sensitive to extreme scores!
Computing the Mean Computing the mean is easy! You probably learned it in 4th–6th grade, BUT it has been a while. So how is it done? Mean = Average: SUM (add all scores) and divide by the number of scores. Let’s look at an example: Scores: 5, 6, 7, 10, 12, 15. Sum of scores: 55. Number of scores: 6. Computation of mean: 55/6 = 9.1666..., which rounds to 9.17. *Notice that the answer was rounded to 2 decimal places.
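The sum-and-divide rule translates directly into Python. A minimal sketch using the six scores from the example:

```python
scores = [5, 6, 7, 10, 12, 15]

total = sum(scores)   # add all scores
count = len(scores)   # number of scores
mean = total / count  # divide the sum by the number of scores

print(total, count, round(mean, 2))  # 55 6 9.17
```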
Deviation From the Mean This is easy, too! Subtract the mean from each score to find the deviation from the mean. The mean is the balance point in a distribution of scores. Specifically, it is the point around which all the deviations sum to zero. Example: Scores: 7, 11, 11, 14, 17 Sum of scores is 60 Number of scores: 5 Computation of the mean: 60/5 = 12
Compute Deviations From the Mean Scores and Their Deviations From Their Mean
Score   Mean   Deviation
7       12     -5
11      12     -1
11      12     -1
14      12     2
17      12     5
Sum of deviations = 0
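Substitution of Another Number for Mean [Table with columns Score, Mean, Deviation] When a number other than the mean is substituted, the deviations no longer sum to zero: Sum of deviations = 10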
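The balance-point property can be checked numerically. The sketch below uses the five scores from the deviation example (7, 11, 11, 14, 17). The substituted value of 10 is an assumption: the slide's table did not survive, and 10 is the value that yields a sum of 10 for these particular scores. The helper name `sum_of_deviations` is made up for the example.

```python
scores = [7, 11, 11, 14, 17]
mean = sum(scores) / len(scores)  # 60 / 5 = 12.0


def sum_of_deviations(values, point):
    """Sum of (score - point) over all scores."""
    return sum(x - point for x in values)


print(sum_of_deviations(scores, mean))  # 0.0
# Assumed substitute: any number other than the mean gives a nonzero sum.
print(sum_of_deviations(scores, 10))    # 10
```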
Major Drawback of the Mean Pulled in the direction of extreme scores. Extremely high scores will pull the mean higher. Extremely low scores will pull the mean lower.
Example: Group A: 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11 Mean for Group A = 5.52 Group B: 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200 Mean for Group B = 21.24
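The two group means can be verified in Python. As a further illustration that is not in the slides, the sketch also recomputes the Group B mean without its two extreme scores (150 and 200) to show how far they pull it.

```python
group_a = [1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11]
group_b = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200]


def mean(values):
    """Sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


print(round(mean(group_a), 2))  # 5.52
print(round(mean(group_b), 2))  # 21.24

# Illustration only: drop the two extreme scores from Group B.
without_extremes = [x for x in group_b if x < 100]
print(round(mean(without_extremes), 2))  # 5.05
```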
Extreme Scores A distribution that has some extreme scores at one end but not the other is called a skewed distribution.
Limitations of the Mean Mean is almost always inappropriate for describing the average of a highly skewed distribution. The mean is only appropriate for interval or ratio scales of measurement.
Synonym of Average Measure of central tendency
Section 8 Questions
1. How is the mean computed?
2. What are the most commonly used symbols for the mean in academic journals?
3. For a given distribution, if you subtract the mean from each score to get deviations and then sum the deviations, what will the sum of the deviations equal?
4. Refer to the example in this section of contributions given to charity. Explain why the mean for Group B is much higher than the mean for Group A.
5. If most participants have similar scores but there are a few very high scores, what effect will the very high scores have on the mean?
6. Is the mean usually appropriate for describing the average of a highly skewed distribution?
7. For which scales of measurement is the mean appropriate?
8. The term measure of central tendency is synonymous with what other term?
MEAN, MEDIAN, AND MODE Section 9
Mean Most frequently used average The mean is the balance point in a distribution of scores. Specifically, it is the point around which all the deviations sum to zero. **The mean is sensitive to extreme scores!!
Median Alternative average 50% of scores are above the median, and 50% of scores are below the median Middle point of the distribution **Unlike the mean, the median is insensitive to extreme scores! Median is an appropriate average for describing the typical participants in a highly skewed distribution.
Determine the Median Scores (in order from low to high) There are 11 scores. Which one is the middle score?
Determine the Median Scores (in order from low to high) There are 6 scores. Which one is the middle score? With an even number of scores, there are 2 middle scores: take their sum ( = 17) and divide by 2 (17 ÷ 2 = 8.5).
Determine the Median Scores (in order from low to high) There are 6 scores. Which one is the middle score? Take the sum of the 2 middle scores ( = 17) and divide by 2 (17 ÷ 2 = 8.5). *The median is insensitive to extreme scores!
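Finding the median can be sketched with Python's `statistics.median`. All three score lists below are hypothetical, since the slides' own scores did not survive. They were chosen so that the even-numbered list has two middle scores that sum to 17, as in the slide.

```python
from statistics import median

# Hypothetical scores, already in order from low to high.
odd_scores = [61, 61, 72, 77, 79, 81, 85, 88, 89, 90, 92]  # 11 scores
even_scores = [3, 5, 8, 9, 12, 14]                         # 6 scores
with_extreme = [3, 5, 8, 9, 12, 140]                       # top score replaced

print(median(odd_scores))    # 81, the single middle score
print(median(even_scores))   # 8.5, i.e., (8 + 9) / 2
print(median(with_extreme))  # 8.5, unchanged by the extreme score
```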
Mode Most frequently occurring score This is easy, too!
What is the mode for the following scores? Scores (arranged from low to high) Seven occurs the most, so 7 is the mode for the scores.
What is the mode for the following scores? Scores (arranged from low to high) There are 2 modes! This is one disadvantage of using mode; there may be more than one mode in the distribution.
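The mode can be found with Python's `statistics` module (`multimode` requires Python 3.8 or later). Both score lists are hypothetical stand-ins for the lost slide data. The first was chosen so that 7 is the mode, as on the slide.

```python
from statistics import mode, multimode

# Hypothetical scores, arranged from low to high.
one_mode = [2, 4, 7, 7, 7, 9, 10]
two_modes = [2, 4, 4, 7, 7, 9]

print(mode(one_mode))        # 7
print(multimode(two_modes))  # [4, 7]
```

`multimode` returns every score tied for the highest frequency, which makes the "more than one mode" situation visible.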
How do I choose which average to use? Mean Usually the most appropriate. NOT appropriate for nominal or ordinal data. Can ONLY be used with interval or ratio levels of measurement. Almost always inappropriate for use with highly skewed distributions.
How do I choose which average to use? Median Choose the median when the mean is not appropriate. NOT appropriate for nominal data.
How do I choose which average to use? Mode APPROPRIATE for nominal data
Positions of Averages in Skewed Distributions
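The relative positions of the mean and the median can be illustrated numerically. The sketch below reuses the Group B scores from Section 8, which have a few extreme high scores (a positive skew). The mirrored list is constructed only for this illustration to produce a negative skew.

```python
from statistics import mean, median

# Group B from Section 8: a few extremely high scores (positive skew).
group_b = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200]

# Mirror image of Group B, constructed only to illustrate a negative skew.
mirrored = [201 - x for x in group_b]

print(round(mean(group_b), 2), median(group_b))    # 21.24 5
print(round(mean(mirrored), 2), median(mirrored))  # 179.76 196
```

In both cases the mean is pulled toward the long tail: above the median in the positively skewed list and below it in the negatively skewed one.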
Section 9 Questions
1. Which average always has 50% of the cases below it?
2. Which average is defined as the most frequently occurring score?
3. Which average is defined as the middle point in a distribution?
4. If you read that the median equals 42 on a test, what percentage of the participants have scores higher than 42?
5. What is the mode of the following scores? 11, 13, 16, 16, 18, 21,
6. Is the mean appropriate for describing highly skewed distributions?
7. This is a guideline from this section: "Choose the median when the mean is inappropriate." What is the exception to this guideline?
8. For describing nominal data, what is an alternative to reporting the mode?
9. In a distribution with a negative skew, does the "mean" or the "median" have a higher value?
10. In a distribution with a positive skew, does the "mean" or the "median" have a higher value?
RANGE AND INTERQUARTILE RANGE Section 10
Variability differences among the scores of participants Synonyms are spread & dispersion
NO Variability
Measures of Variability Range Interquartile range
Range Difference between the highest score and the lowest score To calculate the range, we can subtract the lowest score from the highest score. 20 – 2 = 18 So we say “the range is 18”
Range Difference between the highest score and the lowest score OR we can simply state that “the scores range from 2 to 20!”
What is the weakness of using the range? EXTREME SCORES We also call these extreme scores OUTLIERS
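The range and its weakness can be sketched in Python. The scores are hypothetical: only the lowest score (2) and the highest score (20) come from the slide, and the outlier of 95 was invented for the illustration.

```python
# Hypothetical scores; only the lowest (2) and highest (20) are from the slide.
scores = [2, 5, 7, 10, 11, 13, 20]

score_range = max(scores) - min(scores)  # highest score minus lowest score
print(score_range)  # 18

# Adding a single invented outlier changes the range dramatically.
with_outlier = scores + [95]
print(max(with_outlier) - min(with_outlier))  # 93
```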
Interquartile Range (IQR) The range of the middle 50% of the participants. [Figure: scores divided into the lowest 25%, the MIDDLE 50%, and the highest 25%] The range for the middle 50% is only 3 points: 5.5 – 2.5 = 3
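One way to compute an interquartile range is sketched below. It assumes the median-of-halves convention for locating the quartiles. Other conventions exist and can give slightly different values, and the slides do not say which one they use. The scores are hypothetical, chosen so that the quartiles land at 2.5 and 5.5 as on the slide, and the function name `interquartile_range` is made up for the example.

```python
from statistics import median


def interquartile_range(values):
    """IQR by the median-of-halves convention (needs at least 2 scores)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # lowest half of the scores
    upper = ordered[-half:]  # highest half of the scores
    return median(upper) - median(lower)


scores = [1, 2, 3, 4, 4, 5, 6, 12]  # hypothetical scores

print(interquartile_range(scores))  # 3.0, i.e., 5.5 - 2.5
print(max(scores) - min(scores))    # 11, the full range
```

The score of 12 stretches the full range to 11 points, while the range of the middle 50% stays at 3.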
Section 10 Questions
1. What is the name of the group of statistics designed to concisely describe the amount of variability in a set of scores?
2. What are the two synonyms for variability?
3. If all participants have the same score on a test, what should be said about the variability in the set of scores?
4. If the differences among a set of scores are great, do we say that there is "much variability" or "little variability"?
5. What is the definition of the range?
6. What is a weakness of the range?
7. What is the outlier in the following set of scores? 2, 31, 33, 35, 36, 38,
8. What is the outlier in the following set of scores? 50, 50, 52, 53, 56, 57,
9. As a general rule, is the range appropriate for describing a distribution of scores with outliers?
10. What is the definition of the interquartile range?
11. Is the interquartile range unduly affected by outliers?
12. When the median is reported as the average, it is also customary to report which measure of variability?
STANDARD DEVIATION Section 11
Standard Deviation Most frequently used measure of variability (variability is AKA spread and dispersion). Symbol: S (uppercase S, italicized) (population). Symbol: s (lowercase s, italicized) (sample). AKA sd or SD.
Standard Deviation Statistic that provides an overall measurement of how much participants’ scores differ from the mean score of their group. Special type of average of the deviation of the scores from their mean.
Standard Deviation The more spread out participants’ scores are around the mean, the larger the standard deviation.
Standard Deviation Example 1: Scores for Group A: 0, 0, 5, 5, 10, 15,15, 20, 20 M= 10.00, S= 7.45 Example 2: Scores for Group B: 8, 8, 9, 9, 10, 11, 11, 12, 12 M= 10.00, S= 1.49 Greater variability, larger S
Standard Deviation Example 3: Scores for Group C: 10, 10, 10, 10, 10, 10, 10, 10, 10 M = 10.00, S = 0.00 NO variability, so S = 0.00
Scores for Group A: 0, 0, 5, 5, 10, 15, 15, 20, 20 M = 10.00, S = 7.45 Scores for Group B: 8, 8, 9, 9, 10, 11, 11, 12, 12 M = 10.00, S = 1.49 Scores for Group C: 10, 10, 10, 10, 10, 10, 10, 10, 10 M = 10.00, S = 0.00 Here is what we CAN say: Each group has an M = 10.00. 1. Group A has more variability than Group B or C. 2. Group B has more variability than Group C. 3. Group C has NO variability.
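The three standard deviations can be reproduced with Python's `statistics.pstdev`, which divides by N (the population formula). That choice is an inference: the slides do not show the formula, but the population version matches the reported values of 7.45, 1.49, and 0.00.

```python
from statistics import mean, pstdev

groups = {
    "A": [0, 0, 5, 5, 10, 15, 15, 20, 20],
    "B": [8, 8, 9, 9, 10, 11, 11, 12, 12],
    "C": [10, 10, 10, 10, 10, 10, 10, 10, 10],
}

# pstdev is the population standard deviation (divides by N).
for name, scores in groups.items():
    print(f"Group {name}: M = {mean(scores):.2f}, S = {pstdev(scores):.2f}")
```

All three groups share the same mean, so the standard deviation is what distinguishes them.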
S & Normal Curve REMEMBER THIS!! About 2/3 of the cases (68%) lie within one standard deviation unit of the mean in a normal distribution. AND DON’T FORGET THIS!! “Within one standard deviation unit” means one unit on both sides of the mean!!
Normal Curve S=10
Normal Curve S=5
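The 68% rule can be sketched with Python's `statistics.NormalDist` (Python 3.8 or later). The mean of 50 is a hypothetical value chosen for the illustration. The standard deviations of 10 and 5 are the ones shown on the two normal curve slides, and the function name `one_sd_interval` is made up for the example.

```python
from statistics import NormalDist


def one_sd_interval(m, s):
    """Scores one standard deviation unit below and above the mean."""
    return (m - s, m + s)


print(one_sd_interval(50, 10))  # (40, 60)
print(one_sd_interval(50, 5))   # (45, 55)

# Share of cases within one standard deviation unit of the mean.
curve = NormalDist(mu=50, sigma=10)  # hypothetical M = 50, S = 10
low, high = one_sd_interval(50, 10)
share = curve.cdf(high) - curve.cdf(low)
print(round(share * 100, 1))  # 68.3
```

The share is the same for any normal curve: a smaller S narrows the interval, but about 68% of the cases still fall inside it.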
Sample Statement Reporting M & S Group A has a higher mean (M = 67.89, S = 8.77) than Group B (M = 60.23, S = 8.54).
Section 11 Questions
1. The term variability refers to what?
2. Is the standard deviation a frequently used measure of variability?
3. The standard deviation provides an overall measurement of how much participants' scores differ from what other statistic?
4. If the differences among a set of scores are small, this indicates which of the following? A. There is much variability B. There is little variability
5. What is the symbol for the standard deviation when a population has been studied?
6. Will the scores for "Group D" or "Group E" below have a larger standard deviation if the two standard deviations are computed? Group D: 23, 24, 25, 27, 27, 26 Group E: 10, 19, 20, 21, 25, 30, 40
7. If all the participants in a group have the same score, what is the value of the standard deviation of the scores?
8. If you read the following statistics in a research report, which group should you conclude has the greatest variability? Group F: M = 30.23, S = 2.14; Group G: M = 25.99, S = 3.0; Group H: M = 22.43, S =
9. What percentage of the cases in a normal curve lies within one standard deviation unit of the mean (i.e., within one unit above and one unit below the mean)?
10. Suppose M = 30 and S = 3 for a normal distribution of scores. What percentage of the cases lies between scores of 27 and 30?
11. Suppose M = 80 and S = 10 for a normal distribution of scores. About 68% of the cases lie between what two scores?