The Normal Distribution Ch. 6
Average
Any continuous quantitative variable that we measure will have a statistical average, as we learned in Chapter 2. It will also have a data distribution, with values spread above and below the average. Extreme values will be outliers, but there is plenty of room for people to float around average.
Normal Curve (p. 245)
This distribution of all population data around the mean is known as the normal curve, sometimes called the bell curve due to its shape.
- Centered at µ (lowercase Greek letter mu, the population mean)
- Range is roughly µ - 3σ to µ + 3σ, which covers nearly all of the data (σ is the lowercase Greek letter sigma, the population standard deviation)
Normal Curve IMPORTANT POINT: We aren’t talking about a simple sampling of data. We are talking about the distribution of ALL individuals in a population. For example, not a sample of 100 Gannon students. Instead, we are talking about ALL Gannon students.
[Figures: normal curve for heights of American men (short people on the left, average height in the center, tall people on the right); normal curve for SAT scores from 2010 (below-average scores on the left, the average SAT score in the center, above-average scores on the right)]
Normal Curve
Remember, the curve refers to the distribution of actual data. Here, heights of female college students (p. 246).
Standardizing the data
In order to apply the rules of the normal curve to a data set, we need to translate a variable into a language that speaks in terms of µ and σ. This is known as standardization. No matter what scale you measured your variable on, you can translate it into terms of the mean and standard deviation. A z-score tells you how many standard deviations a given data point is from the mean.
Standardization Formula
z = (x - µ) / σ
- z = standardized variable, also known as the z-score
- x = raw data point
- µ = population mean
- σ = population standard deviation
This translates your raw data into a z-score, or “standard deviation-ese.” This is also helpful because it provides a CONTEXT to your data that you don’t get from raw data…
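The formula can be sketched as a small Python function. This is only an illustration: the function name `z_score` and the example values (µ = 50, σ = 10) are made up here and do not come from the text.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Subtract the mean FIRST, then divide by the standard deviation.
    return (x - mu) / sigma


# Hypothetical values chosen only for illustration: mu = 50, sigma = 10.
print(z_score(65, 50, 10))  # 1.5  (1.5 standard deviations above the mean)
print(z_score(40, 50, 10))  # -1.0 (1 standard deviation below the mean)
```

Note the parentheses around `x - mu`: without them, only µ would be divided by σ, which gives the wrong answer.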
When would standardization be helpful?
When your data isn’t all on the same scale. Imagine that you are collecting data from nurses about their satisfaction with their training during college. At 20 different colleges, nursing alumni are asked, “How prepared were you for your first job?” Half of your participants answer on a scale of 1 to 5, the other half on a scale of 1 to 9. Standardization transforms this data so that it is all on the same scale.
When would standardization be helpful?
GREs (Graduate Record Exam, like the SATs for graduate school): the maximum score on the Verbal portion is 800 pts. You get a 630. Is that a good score? How does that compare to all students taking the GREs? How does it compare to students in your given field? Standardization puts this score into the context of overall GRE performance (if you have μ and σ for all test-takers) or into the context of other people with the same undergraduate major (if you have μ and σ for all test-takers with a certain major).
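A minimal sketch of the two-scale situation, assuming we treat each group of responses as its own population and standardize it with its own mean and standard deviation. The responses below are invented for illustration; the 1-to-9 answers were deliberately built as a stretched copy of the 1-to-5 answers so the effect is easy to see.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert every raw score to a z-score using the group's own mean and SD."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    return [(x - mu) / sigma for x in scores]


# Hypothetical survey responses (not from the text).
five_point = [2, 3, 3, 4, 5]   # answered on a 1-to-5 scale
nine_point = [3, 5, 5, 7, 9]   # answered on a 1-to-9 scale

z_five = standardize(five_point)
z_nine = standardize(nine_point)

# After standardizing, both groups are expressed in standard deviations,
# so the responses can be compared or pooled directly.
for a, b in zip(z_five, z_nine):
    print(f"{a:6.2f}  {b:6.2f}")
```

Because the second list is just a rescaled version of the first, the two columns of z-scores come out the same, even though the raw numbers look very different.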
Figure 5.4: Transformation of a Population of Scores
The shape of the distribution DOES NOT change. All that changes is that all of the values are now in terms of the standard deviation.
5.2 Locations and Distributions
Every single data point is transformed (via a formula) into a z-score. The z-score gives the exact location of the data point in terms of the standard deviation:
- Positive z-score: above the mean
- Negative z-score: below the mean
- Numeric value of the z-score: number of standard deviations away from the mean
More on z-scores
For each z-score: above or below the mean? How many standard deviations from the mean?
1) z-score = 2: above the mean, two standard deviations from the mean.
2) z-score = -.3: below the mean, .3 standard deviations from the mean.
Applying the z-scores
Use Excel or a calculator… A bit trickier: the average IQ is 100. The SD for IQ is 15.
1) If you have an IQ of 88, what is your z-score? z = (x - μ)/σ = (88 - 100)/15 = -12/15 = -.8, or .8 of a standard deviation below the mean.
2) If you have an IQ of 140, what is your z-score? z = (x - μ)/σ = (140 - 100)/15 = 40/15 = 2.67, or 2.67 standard deviations above the mean.
3) You are told that your IQ has a z-score of -.02. What is your IQ? x = μ + zσ = 100 + (-.02)(15) = IQ of 99.7
4) You are told that your IQ has a z-score of 1.15. What is your IQ? x = μ + zσ = 100 + (1.15)(15) = IQ of 117.25
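If you would rather check the IQ answers with a few lines of Python than with Excel, here is one way to sketch it. The helper names `to_z` and `to_raw` are invented for this example; the mean of 100 and SD of 15 come from the slide. Note that 100 + (-.02)(15) works out to 99.7.

```python
MU = 100     # average IQ, from the slide
SIGMA = 15   # standard deviation of IQ, from the slide


def to_z(x, mu=MU, sigma=SIGMA):
    """Raw score -> z-score: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def to_raw(z, mu=MU, sigma=SIGMA):
    """z-score -> raw score: x = mu + z * sigma."""
    return mu + z * sigma


print(round(to_z(88), 2))       # -0.8
print(round(to_z(140), 2))      # 2.67
print(round(to_raw(-0.02), 2))  # 99.7
print(round(to_raw(1.15), 2))   # 117.25
```

The two functions undo each other: converting an IQ to a z-score and back returns the original IQ.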
Tried and true population data!
Applying standardized scores…
Let’s say that you take two exams in two different courses. By chance, you get an 88 on both exams. However, are these 88s equivalent? Find out by obtaining z-scores for each.
1) English Exam: μ = 87, σ = 3. z = (x - μ)/σ = (88 - 87)/3 = .33, or .33 of a standard deviation above the mean.
2) Math Exam: μ = 85, σ = 6. z = (x - μ)/σ = (88 - 85)/6 = .50, or .5 of a standard deviation above the mean.
Would you rather be 1/3 of a standard deviation above average or ½ of a standard deviation above average? I would prefer to be ½ a standard deviation above average! Take that, classmates!
See what I mean by context? A “B” is nice, but it is more satisfying when it demonstrates greater mastery. Like you guys don’t check the average grades for HW and exams…
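The exam comparison can be sketched in Python as follows. The means and standard deviations are the ones given on the slide; the function and variable names are just for illustration.

```python
def z_score(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma


score = 88  # same raw score on both exams

english_z = z_score(score, mu=87, sigma=3)
math_z = z_score(score, mu=85, sigma=6)

print(round(english_z, 2))  # 0.33
print(round(math_z, 2))     # 0.5

# Both z-scores are positive, so both 88s are ABOVE their class means.
# The larger z-score marks the stronger performance relative to classmates.
better = "Math" if math_z > english_z else "English"
print(better)  # Math
```

Same raw score, different context: the math 88 sits farther above its class average when measured in standard deviations.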
5.3 Standardizing a Distribution
Every X value can be transformed to a z-score.
Characteristics of the z-score transformation:
- Same shape as the original distribution
- Mean of the z-score distribution is always 0
- Standard deviation is always 1.00
A z-score distribution is called a standardized distribution.
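Why the mean is always 0 and the standard deviation always 1 follows directly from the formula z = (x - µ)/σ. A short derivation, assuming a population of N scores x_1, ..., x_N (the symbols N, x_i, µ_z, and σ_z are notation introduced here, not from the text):

```latex
\begin{align*}
\mu_z &= \frac{1}{N}\sum_{i=1}^{N} z_i
       = \frac{1}{N}\sum_{i=1}^{N} \frac{x_i - \mu}{\sigma}
       = \frac{1}{\sigma}\left(\frac{1}{N}\sum_{i=1}^{N} x_i - \mu\right)
       = \frac{\mu - \mu}{\sigma} = 0 \\[6pt]
\sigma_z^2 &= \frac{1}{N}\sum_{i=1}^{N} \left(z_i - \mu_z\right)^2
            = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{\sigma^2}
            = \frac{\sigma^2}{\sigma^2} = 1
\end{align*}
```

Subtracting µ slides the whole distribution so it is centered at 0, and dividing by σ rescales it so one unit equals one standard deviation. Neither step changes the shape.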
Back to the normal curve
Remember, this is a visualization of the statistical distribution of a variable. All of the area underneath the curve equals 100% of the data in the distribution. Since the shape is defined by σ and µ, if we know these values for a given distribution, we can determine the area (or percentage of data) associated with a given z-score. Using the tables on p. A-6 (for negative z-scores) and p. A-7 (for positive z-scores) in the Appendixes of the text, we can find what % of the data falls between any two given z-scores.
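The text uses the printed tables in the Appendixes. As a substitute for the table lookup (not the textbook's method), Python's standard library can compute the same areas with `statistics.NormalDist`. The helper names below are invented for this sketch.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (the z-score scale).
STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def area_below(z):
    """Proportion of the data that falls below a given z-score."""
    return STANDARD_NORMAL.cdf(z)


def area_between(z_low, z_high):
    """Proportion of the data between two z-scores."""
    if z_low > z_high:
        raise ValueError("z_low must not exceed z_high")
    # Area below the upper z-score minus area below the lower z-score.
    return area_below(z_high) - area_below(z_low)


print(round(area_between(-1, 1), 4))  # 0.6827
print(round(area_between(-3, 3), 4))  # 0.9973
```

The second result shows why µ - 3σ to µ + 3σ is treated as the practical range of the curve: about 99.7% of the data falls inside it.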
An example (p. 253)
z = 1.23
That means that 89.07% of the data falls below a z-score of 1.23. If you got a score back on a particular test, and the z-score for your score was 1.23, you did better than 89.07% of your peers.
MORE EXAMPLES…sketch them out!
1) z = 1.35, what is the portion of the data below this z-score?
On the table on page A-7, first look up 1.3 in the column on the left, then look up .05 in the row along the top of the table. The point at which they intersect is .9115. This means that 91.15% of a population falls below a z-score of 1.35.
2) z = -.07, what is the portion of the data above this z-score?
On the table on page A-6 (since this is a negative z-score), find -0.0 in the column on the far left and .07 in the row along the top of the table. The point at which they intersect is .4721. This means that 47.21% of the data is below the z-score of -.07. However, we are looking for the portion of the data ABOVE -.07. Since the total area under the curve equals 100%, we need to take 100% - 47.21%, which gives us 52.79% (the portion of data ABOVE a z-score of -.07).
3) z = -2.6, what is the portion of the data below this z-score?
On the table on page A-6, find -2.6 in the column on the far left and .00 in the row along the top. The point at which they intersect is .0047, or .47%. This is the percent of the data that falls below a z-score of -2.6.
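The three table lookups can be double-checked in Python. This sketch uses `statistics.NormalDist` from the standard library in place of the printed table, so it is a cross-check rather than the method the text teaches. The variable names are illustrative only.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, SD 1

# 1) Portion of the data BELOW z = 1.35
below_1_35 = std.cdf(1.35)

# 2) Portion of the data ABOVE z = -.07: total area (1.0) minus area below
above_neg_0_07 = 1 - std.cdf(-0.07)

# 3) Portion of the data BELOW z = -2.6
below_neg_2_6 = std.cdf(-2.6)

print(round(below_1_35, 4))      # 0.9115
print(round(above_neg_0_07, 4))  # 0.5279
print(round(below_neg_2_6, 4))   # 0.0047
```

The table always gives the area BELOW a z-score, which is why question 2 needs the extra subtraction step.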
How do we know if a variable is normal?
Typically, we assume normality if a sample is large enough.
Create a normal probability plot:
- Put the z-scores on one axis
- Put the raw scores on the other
- Create a scatter plot: is the relationship fairly linear? That is to say, do the dots form a line?
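The plot coordinates can be sketched in Python without any plotting library. Several things here are assumptions that go beyond the text: the heights are invented, the expected z-score for the i-th smallest value is taken at the percentile (i - 0.5)/n (one common convention among several), and a correlation coefficient is added as a rough numeric stand-in for eyeballing whether the dots form a line.

```python
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted raw score with the z-score expected at its rank."""
    ordered = sorted(data)
    n = len(ordered)
    std = NormalDist()
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    expected_z = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected_z, ordered))


def pearson_r(pairs):
    """Correlation between the two coordinates; near 1 means nearly linear."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical heights in inches (not from the text).
heights = [61.2, 62.5, 63.1, 63.8, 64.0, 64.4, 64.9, 65.3, 66.0, 67.4]

points = normal_plot_points(heights)
for z, x in points:
    print(f"{z:6.2f}  {x}")  # z-score on one axis, raw score on the other

r = pearson_r(points)
print(round(r, 3))
```

Feeding `points` into any scatter plot tool (Excel included) gives the normal probability plot. If the dots bend away from a straight line, the variable is probably not normal.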