Chapter 1-6 Review
Chapter 1: The mean, variance, and minimizing error
To calculate SS, the variance, and the standard deviation: find the deviations from mu, square and sum them (SS), divide by N (σ²), and take the square root (σ).

Example: Scores on a Psychology quiz

Student     X    X - μ    (X - μ)²
John        7    +1.00     1.00
Jennifer    8    +2.00     4.00
Arthur      3    -3.00     9.00
Patrick     5    -1.00     1.00
Marie       7    +1.00     1.00

ΣX = 30    N = 5    μ = 6.00
Σ(X - μ) = 0.00
Σ(X - μ)² = SS = 16.00
σ² = SS/N = 16.00/5 = 3.20
σ = √3.20 = 1.79
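The quiz example can be checked with a short Python sketch (not part of the original slides); it follows the slide's steps literally rather than using a statistics library:

```python
# SS, the variance, and the standard deviation for the five quiz scores,
# computed exactly as the slide describes: deviations from the mean,
# squared and summed, divided by N, then square-rooted.
scores = [7, 8, 3, 5, 7]  # John, Jennifer, Arthur, Patrick, Marie
n = len(scores)
mu = sum(scores) / n                      # 6.00
ss = sum((x - mu) ** 2 for x in scores)   # sum of squared deviations = 16.00
variance = ss / n                         # sigma^2 = 3.20
sd = variance ** 0.5                      # sigma, about 1.79
print(mu, ss, variance, round(sd, 2))
```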
If you must make a prediction of someone's score, say everyone will score precisely at the population mean, mu. Without any other information, the mean is the best prediction. The mean is an unbiased predictor or estimate, because the deviations around the mean sum to zero [Σ(X - μ) = 0.00]. The mean also has the smallest average squared distance from the other numbers in the distribution, so it is called a least squares predictor.
Error is the squared amount you are wrong. When you predict that everyone will score at the mean, you are wrong. The amount you are wrong is the difference between each score and the mean (X - μ). But in statistics, we square the amount that we are wrong when we measure error.
σ² is precisely how much error we make, on the average, when we predict that everyone will score right at the mean. Another name for the variance (σ²) is the "mean square for error".
Why doesn't everyone score precisely at the mean? Two sources of error:
– Random individual differences
– Random measurement problems
Because people will always be different from each other and there are always random measurement problems, there will always be some error inherent in our predictions.
Theoretical histograms
Rolling a die – Rectangular distribution
The mean provides no information. In 120 rolls, how many of each number do you expect?
(Figure: rectangular histogram over faces 1-6, each face with the same expected frequency, 120/6 = 20.)
Normal Curve
J Curve Occurs when socially normative behaviors are measured. Most people follow the norm, but there are always a few outliers.
Principles of Theoretical Curves
Expected frequency = theoretical relative frequency × N.
Expected frequencies are your best estimates because they are closer, on the average, than any other estimate when we square the error.
Law of Large Numbers – the more observations we have, the closer the relative frequencies should come to the theoretical distribution.
The Normal Curve
The Z table and the curve
The Z table shows a cumulative relative frequency distribution. That is, the Z table lists the proportion of the area under a normal curve between the mean and points further and further from the mean. Because the two sides of the normal curve are exactly the same, the Z table shows only the cumulative proportion in one half of the curve. The highest proportion possible on the Z table is therefore .5000.
KEY CONCEPT The proportion of the curve between any two points on the curve represents the relative frequency of scores between those points.
(Figure: the normal curve, frequency plotted against the measure, marked off in standard deviations with Z scores from -3.00 to +3.00. 34.13% of the area lies between the mean and 1 SD on either side, and 47.72% between the mean and 2 SDs, so a score 2 SDs above the mean falls at the 97.72nd percentile.)
Z scores A Z score indicates the position of a raw score in terms of standard deviations from the mean on the normal curve. In effect, Z scores convert any measure (inches, miles, milliseconds) to a standard measure of standard deviations. Z scores have a mean of 0 and a standard deviation of 1.
Calculating Z scores
Z = (score - mean) / standard deviation
What is the Z score for someone 6' tall, if the mean is 5'8" and the standard deviation is 3 inches?
Z = (6' - 5'8") / 3" = (72 - 68) / 3 = 4/3 = 1.33
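The formula is a one-liner in code. A minimal Python helper (illustrative, not from the slides), applied to both worked examples in this section:

```python
# Z = (score - mean) / standard deviation, per the slide's formula.
def z_score(score, mean, sd):
    return (score - mean) / sd

# Height example: 6'0" = 72 inches, mean 5'8" = 68 inches, sd = 3 inches.
print(round(z_score(72, 68, 3), 2))        # 1.33

# Production example: 2100 units, mean 2180, sd 50.
print(round(z_score(2100, 2180, 50), 2))   # -1.6
```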
What is the Z score for a daily production of 2100 units, given a mean of 2180 units and a standard deviation of 50 units?
Z = (2100 - 2180) / 50 = -80 / 50 = -1.60
(Figure: normal curve of daily production, mean 2180, marked at 50-unit intervals from 2030 to 2330.)
Common Z table scores
Z        Proportion mu to Z
0.00     .0000
1.00     .3413
1.96     .4750   (× 2 = 95%)
2.00     .4772
2.576    .4950   (× 2 = 99%)
3.00     .4987
We have already seen these!
CPE - 3.4 - Calculate percentiles
Add the mu-to-Z area to .5000 if Z > 0; subtract it from .5000 if Z < 0.

Z        Area mu to Z   Proportion                Percentile
-2.22    .4868          .5000 - .4868 = .0132     1st
-0.68    .2517          .5000 - .2517 = .2483     25th
+2.10    .4821          .5000 + .4821 = .9821     98th
+0.33    .1293          .5000 + .1293 = .6293     63rd
 0.00    .0000          .5000 + .0000 = .5000     50th
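The same percentiles can be computed directly from the normal CDF instead of a printed Z table. A Python sketch (an aside, not from the slides) using math.erf:

```python
import math

# Standard normal cumulative distribution function, built from math.erf.
def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Equivalent to the slide's rule: the CDF already adds the mu-to-Z area
# to .5000 when Z > 0 and subtracts it when Z < 0.
def percentile(z):
    return round(100 * normal_cdf(z))

print(percentile(-2.22), percentile(-0.68), percentile(2.10), percentile(0.33))
# 1 25 98 63, matching the table
```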
Proportion of scores between two points on opposite sides of the mean
Proportion mu to Z for -1.06 = .3554; proportion mu to Z for +0.37 = .1443. Because the two Z scores lie on opposite sides of the mean, add the two areas:
.3554 + .1443 = .4997, so 49.97% of scores fall between Z = -1.06 and Z = +0.37.
Proportion of scores between two points on the same side of the mean
Proportion mu to Z for +1.12 = .3686; proportion mu to Z for +1.50 = .4332. Because the two Z scores lie on the same side of the mean, subtract the smaller area from the larger:
.4332 - .3686 = .0646, so 6.46% of scores fall between Z = +1.12 and Z = +1.50.
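Both cases – opposite sides (add) and same side (subtract) – collapse into a single CDF difference. A Python sketch, not from the slides; note that direct computation can differ from the table answers in the fourth decimal because the printed table is rounded:

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Proportion of the curve between any two Z scores. The CDF difference
# handles the add case and the subtract case from the slides in one step.
def proportion_between(z1, z2):
    lo, hi = sorted((z1, z2))
    return normal_cdf(hi) - normal_cdf(lo)

print(proportion_between(-1.06, 0.37))  # about .4997 (opposite sides)
print(proportion_between(1.12, 1.50))   # about .0646 (same side)
```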
Translating to and from Z scores, the standard error of the mean and confidence intervals
Definition
If we know mu and sigma, any score can be translated into a Z score:
Z = (X - μ) / σ
Definition
Conversely, as long as you know mu and sigma, a Z score can be translated into any other type of score:
Score = μ + (Z × σ)
Scale scores
Z scores have been standardized so that they always have a mean of 0.00 and a standard deviation of 1.00. Other scales use other means and standard deviations. Examples:
– IQ: μ = 100, σ = 15
– SAT/GRE: μ = 500, σ = 100
– Normal scores: μ = 50, σ = 10
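The two translation formulas plus a table of scale parameters are enough to convert in either direction. A Python sketch (illustrative; the scale names and dictionary are my own framing of the slide's list):

```python
# (mean, sd) pairs for the scales listed on the slide.
SCALES = {"IQ": (100, 15), "SAT": (500, 100), "Normal": (50, 10)}

def to_scale(z, scale):
    mu, sigma = SCALES[scale]
    return mu + z * sigma          # Score = mu + (Z * sigma)

def to_z(score, scale):
    mu, sigma = SCALES[scale]
    return (score - mu) / sigma    # Z = (X - mu) / sigma

print(to_scale(-0.60, "IQ"))       # 91.0
print(to_scale(2.67, "IQ"))        # 140.05, about 140
print(to_z(600, "SAT"))            # 1.0
```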
Convert Z scores to IQ scores
Z        Z × σ         μ + (Z × σ)
-0.60    -9.00         100 - 9.00 = 91
+2.67    +40.05        100 + 40.05 ≈ 140
Translate to a Z score first, then to any other type of score
Convert IQ scores of 120 and 80 to percentiles.
Z = (120 - 100) / 15 = 1.33; mu-to-Z = .4082; .5000 + .4082 = .9082 = 91st percentile.
Similarly, 80 gives Z = -1.33; .5000 - .4082 = .0918 = 9th percentile.
Convert an IQ score of 100 to a percentile: an IQ of 100 is right at the mean, and that's the 50th percentile.
SAT / GRE scores – Examples
How many people out of 400 can be expected to score between 550 and 650 on the SAT?
Z for 550 = (550 - 500) / 100 = 0.50; proportion mu to Z = .1915
Z for 650 = (650 - 500) / 100 = 1.50; proportion mu to Z = .4332
Proportion difference = .4332 - .1915 = .2417
Expected people = .2417 × 400 = 96.68
Midterm type problems: Double translations
On the verbal portion of the Wechsler IQ test, John scores 35 correct responses. The mean on this part of the IQ test is 25.00 and the standard deviation is 6.00. What is John's verbal IQ score?
Z score = (35 - 25.00) / 6.00 = 10.00 / 6.00 = 1.67
Scale score = 100 + (1.67 × 15) = 125
The standard error of the mean equals the standard deviation divided by the square root of n, the sample size: σX̄ = σ / √n
Let's see how it works
We know that the mean of SAT/GRE scores = 500 and sigma = 100. So 68.26% of individuals will score between 400 and 600, and 95.44% will score between 300 and 700.
But if we take random samples of SAT scores, with 4 people in each sample, the standard error of the mean is sigma divided by the square root of the sample size: 100/√4 = 100/2 = 50.
68.26% of the sample means will be within 1.00 standard error of mu and 95.44% will be within 2.00 standard errors of mu. So 68.26% of the sample means (n = 4) will be between 450 and 550, and 95.44% will fall between 400 and 600.
What happens as n increases?
The sample means get closer to each other and to mu. Their average squared distance from mu equals the variance divided by the sample size (σ²/n); equivalently, the standard error is σ/√n.
The law of large numbers operates – the pattern of actual means approaches the theoretical frequency distribution. In this case, the sample means fall into a more and more perfect normal curve.
These facts are called "the Central Limit Theorem" and can be proven mathematically.
Let's make the samples larger
Take random samples of SAT scores with 400 people in each sample. The standard error of the mean is sigma divided by the square root of 400: 100/20 = 5.00. 68.26% of the sample means will be within 1.00 standard error of mu and 95.44% will be within 2.00 standard errors of mu. So 68.26% of the sample means (n = 400) will be between 495 and 505, and 95.44% will fall between 490 and 510.
Take random samples of SAT scores with 2500 people in each sample. The standard error of the mean is sigma divided by the square root of 2500: 100/50 = 2.00. So 68.26% of the sample means (n = 2500) will be between 498 and 502, and 95.44% will fall between 496 and 504.
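A quick Python sketch of the pattern across the three sample sizes used in these examples (not part of the original slides):

```python
import math

# Standard error of the mean: sigma divided by the square root of n.
def standard_error(sigma, n):
    return sigma / math.sqrt(n)

# SAT/GRE scores: mu = 500, sigma = 100, at the slides' three sample sizes.
for n in (4, 400, 2500):
    se = standard_error(100, n)
    # 68.26% of sample means fall within 1 SE of mu, 95.44% within 2 SE.
    print(n, se, (500 - 2 * se, 500 + 2 * se))
```

Larger samples shrink the standard error, so the interval holding 95.44% of sample means tightens from (400, 600) at n = 4 to (496, 504) at n = 2500.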
CONFIDENCE INTERVALS
We want to define two intervals around mu: One interval into which 95% of the sample means will fall. Another interval into which 99% of the sample means will fall.
95% of sample means will fall in a symmetrical interval around mu that goes from 1.960 standard errors below mu to 1.960 standard errors above mu. A way to write that fact in statistical language is:
CI.95: μ ± 1.960 σX̄
or CI.95: μ - 1.960 σX̄ < X̄ < μ + 1.960 σX̄
As I said, 95% of sample means will fall in a symmetrical interval around mu that goes from 1.960 standard errors below mu to 1.960 standard errors above mu.
Take samples of SAT/GRE scores (n = 400). The standard error of the mean is sigma divided by the square root of n: 100/√400 = 100/20.00 = 5.00. 1.960 standard errors of the mean with such samples = 1.960 × 5.00 = 9.80.
So 95% of the sample means can be expected to fall in the interval 500 ± 9.80: 500 - 9.80 = 490.20 and 500 + 9.80 = 509.80.
CI.95: μ ± 1.960 σX̄ = 500 ± 9.80, or CI.95: 490.20 < X̄ < 509.80
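Both confidence intervals for this example can be reproduced with a few lines of Python (a sketch, not from the slides):

```python
import math

# CI around mu for sample means: mu +/- z_crit * sigma / sqrt(n),
# following the slides' formula.
def confidence_interval(mu, sigma, n, z_crit):
    se = sigma / math.sqrt(n)
    return (mu - z_crit * se, mu + z_crit * se)

lo95, hi95 = confidence_interval(500, 100, 400, 1.960)  # 95% interval
lo99, hi99 = confidence_interval(500, 100, 400, 2.576)  # 99% interval
print(lo95, hi95)  # 490.2 509.8
print(lo99, hi99)  # 487.12 512.88
```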
99% of sample means will fall within 2.576 standard errors of mu.
Take the same samples of SAT/GRE scores (n = 400). The standard error of the mean is sigma divided by the square root of n: 100/20.00 = 5.00. 2.576 standard errors of the mean with such samples = 2.576 × 5.00 = 12.88.
So 99% of the sample means can be expected to fall in the interval 500 ± 12.88: 500 - 12.88 = 487.12 and 500 + 12.88 = 512.88.
CI.99: μ ± 2.576 σX̄ = 500 ± 12.88, or CI.99: 487.12 < X̄ < 512.88
Chapter 5: Samples
REPRESENTATIVE ON EVERY MEASURE
The mean of the random sample will be similar to the mean of the population. The same holds for weight, IQ, ability to remember faces or numbers, the size of their livers, self-confidence, etc., etc., etc.
ON EVERY MEASURE THAT EVER WAS OR CAN BE, AND ON EVERY STATISTIC WE COMPUTE, SAMPLE STATISTICS ARE LEAST SQUARED, UNBIASED, CONSISTENT ESTIMATES OF THEIR POPULATION PARAMETERS.
The sample mean
The sample mean is called X-bar and is represented by X̄. X̄ is the best estimate of μ, because it is a least squares, unbiased, consistent estimate.
X̄ = ΣX / n
Consistent estimation
The population is 1320 students taking a test. μ is 72.00, σ = 12. Let's randomly sample one student at a time and see what happens.
(Figure: histogram of test scores from 36 to 108, μ = 72, σ = 12. As scores are sampled one at a time, the running sample means – e.g. 76.4, 76.7, 75.6, 74.0 – drift toward μ = 72.00.)
More scores that are free to vary = better estimates
Each time you add a score to your sample, it is most likely to pull the sample mean closer to mu, the population mean. Any particular score may pull it further from mu. But, on the average, as you add more and more scores, the odds are that you will be getting closer to mu. Remember, if your sample were everybody in the population, then the sample mean would be exactly mu.
Consistent estimators
We call estimates that improve when you add scores to the sample consistent estimators. Recall that the statistics we will learn are consistent, least squares, and unbiased.
Estimated variance
Our best estimate of σ² is called the mean square for error and is represented by MSW. MSW is a least squares, unbiased, consistent estimate.
SSW = Σ(X - X̄)²
MSW = Σ(X - X̄)² / (n - k)
Estimated standard deviation
The least squares, unbiased, consistent estimate of σ is called s.
s = √MSW
Estimating mu and sigma – single sample
S#   X    X - X̄    (X - X̄)²
A    6     0.00     0.00
B    8    +2.00     4.00
C    4    -2.00     4.00

ΣX = 18    n = 3    X̄ = 6.00
Σ(X - X̄) = 0.00
Σ(X - X̄)² = SSW = 8.00
MSW = SSW / (n - k) = 8.00 / 2 = 4.00
s = √MSW = 2.00
Why n-k? This has to do with “degrees of freedom.” Each time you add a score to a sample, you pull the sample statistic toward the population parameter.
Any score that isn’t free to vary does not tend to pull the sample statistic toward the population parameter. When calculating the estimated average squared deviation from the mean, we base our estimate on the deviation of each score from its group mean. So there are as many df for MS W and s as there are deviation scores that are free to vary. One deviation in each group is constrained by the rule that deviations around the mean must sum to zero. So one score in each group is not free to vary.
Group 1     X     X - X̄₁    (X - X̄₁)²
1.1         50    -21.00     441.00
1.2         77     +6.00      36.00
1.3         69     -2.00       4.00
1.4         88    +17.00     289.00
X̄₁ = 71.00   Σ(X - X̄₁) = 0.00   Σ(X - X̄₁)² = 770.00

Group 2     X     X - X̄₂    (X - X̄₂)²
2.1         78     +8.00      64.00
2.2         57    -13.00     169.00
2.3         82    +12.00     144.00
2.4         63     -7.00      49.00
X̄₂ = 70.00   Σ(X - X̄₂) = 0.00   Σ(X - X̄₂)² = 426.00

Group 3     X     X - X̄₃    (X - X̄₃)²
3.1         74     +2.00       4.00
3.2         70     -2.00       4.00
3.3         63     -9.00      81.00
3.4         81     +9.00      81.00
X̄₃ = 72.00   Σ(X - X̄₃) = 0.00   Σ(X - X̄₃)² = 170.00

SSW = 770.00 + 426.00 + 170.00 = 1366.00
MSW = SSW / (n - k) = 1366.00 / 9 = 151.78
s = √MSW = √151.78 = 12.32
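The pooled computation can be sketched in a few lines of Python (not part of the original slides): sum squared deviations from each group's own mean, then divide by n - k.

```python
# MS_W and s for the slide's three groups: SS_W is pooled across groups,
# then divided by n - k (total scores minus number of group means).
groups = [
    [50, 77, 69, 88],   # Group 1, mean 71
    [78, 57, 82, 63],   # Group 2, mean 70
    [74, 70, 63, 81],   # Group 3, mean 72
]
ss_w = 0.0
for g in groups:
    m = sum(g) / len(g)
    ss_w += sum((x - m) ** 2 for x in g)   # deviations from each group mean
n = sum(len(g) for g in groups)            # 12 scores in total
k = len(groups)                            # 3 groups
ms_w = ss_w / (n - k)                      # 1366 / 9 = 151.78
s = ms_w ** 0.5                            # about 12.32
print(ss_w, round(ms_w, 2), round(s, 2))
```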
n - k is the number of degrees of freedom for MSW
Since one deviation score in each group is not free to vary, you lose one degree of freedom for each group – with k groups you lose k degrees of freedom. There are n deviation scores in total; k are not free to vary. That leaves n - k that are free to vary: n - k degrees of freedom for MSW, your estimate of σ².
t distribution, estimated standard errors and CIs with t
t curves The more degrees of freedom for MS W, the better our estimate of sigma 2. The better our estimate, the more t curves resemble Z curves.
t curves and degrees of freedom
With 1 df, to capture 95% of the population you need to go out over 12 estimated standard deviations from the mean (t = 12.706).
With 5 df, you need to go out only about 2.6 estimated standard deviations (t = 2.571).
(Figure: t curves for 1 df and 5 df plotted against the normal curve, from -3 to +3 standard deviations.)
Critical values of the t curves
Each curve is defined by how many estimated standard deviations you must go from the mean to define a symmetrical interval that contains a proportion of .9500 or .9900 of the curve, leaving a proportion of .0500 or .0100 in the two tails of the curve (combined). Values for .9500/.0500 appear on the .05 rows, and values for .9900/.0100 on the .01 rows, for each number of degrees of freedom.
df      1        2        3        4        5        6        7        8
.05   12.706    4.303    3.182    2.776    2.571    2.447    2.365    2.306
.01   63.657    9.925    5.841    4.604    4.032    3.707    3.499    3.355

df      9       10       11       12       13       14       15       16
.05    2.262    2.228    2.201    2.179    2.160    2.145    2.131    2.120
.01    3.250    3.169    3.106    3.055    3.012    2.977    2.947    2.921

df     17       18       19       20       21       22       23       24
.05    2.110    2.101    2.093    2.086    2.080    2.074    2.069    2.064
.01    2.898    2.878    2.861    2.845    2.831    2.819    2.807    2.797

df     25       26       27       28       29       30       40       60
.05    2.060    2.056    2.052    2.048    2.045    2.042    2.021    2.000
.01    2.787    2.779    2.771    2.763    2.756    2.750    2.704    2.660

df    100      200      500     1000     2000    10000
.05    1.984    1.972    1.965    1.962    1.961    1.960
.01    2.626    2.601    2.586    2.581    2.578    2.576
To compute the standard error of the mean, we divide sigma by the square root of n, the size of the sample. Similarly, to estimate the standard error of the mean, we divide s by the square root of n, the size of the sample in which we are interested: sX̄ = s / √n.
The estimated standard error of the mean is our best (least squares, unbiased, consistent) estimate of the average unsquared distance of sample means from mu.
Confidence intervals around mu T
Confidence intervals and hypothetical means We frequently have a theory about what the mean of a distribution should be. To be scientific, that theory about mu must be able to be proved wrong (falsified). One way to test a theory about a mean is to state a range where sample means should fall if the theory is correct. We usually state that range as a 95% confidence interval.
To test our theory, we take a random sample from the appropriate population and see if the sample mean falls where the theory says it should – inside the confidence interval. If the sample mean falls outside the 95% confidence interval established by the theory, the evidence suggests that our theoretical population mean, and the theory that led to its prediction, are wrong. When that happens our theory has been falsified. We must discard it and look for an alternative explanation of our data.
Testing a theory
SO WE MUST CONSTRUCT A 95% CONFIDENCE INTERVAL AROUND MU T AND SEE WHETHER OUR SAMPLE MEAN FALLS INSIDE OR OUTSIDE THE CI.
If the sample mean falls inside the CI.95, you must accept mu T as the most probable mean for the population from which the sample was drawn.
If the sample mean falls outside the CI.95, you falsify the theory that the population mean equals mu T. You then turn around and ask what the relevant population parameter is. And there is the sample mean, a least squares, unbiased estimate of mu. If the mean is not mu T, then we use the sample mean as our estimate of mu.
To create a confidence interval around mu T, we must estimate sigma from a sample. For example, we randomly select a group of 16 healthy individuals from the population, administer a standard clinical dose of our new drug for 3 days, and carefully measure body temperature.
RESULTS: The average body temperature in our sample is 99.5°F, with an estimated standard deviation of 1.40° (s = 1.40).
IS 99.5°F IN THE 95% CI AROUND MU T?
Knowing s and n, we can easily compute the estimated standard error of the mean. Let's say that s = 1.40° and n = 16:
sX̄ = s / √n = 1.40 / √16 = 1.40 / 4.00 = 0.35
(See the t table above: for df = 15 at the .05 level, tCRIT = 2.131.)
So mu T = 98.6, dfW = 15, tCRIT = 2.131, s = 1.40, n = 16, sX̄ = 1.40/√16 = 0.35.
Here is the confidence interval:
CI.95: mu T ± tCRIT × sX̄ = 98.60 ± (2.131)(0.35) = 98.60 ± 0.75
CI.95: 97.85 < X̄ < 99.35
Our sample mean (99.5) fell outside the CI.95. This falsifies the theory that our drug has no effect on body temperature. Our drug may cause a slight fever.
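The whole drug example fits in a short Python sketch (an illustration, not from the slides): build the 95% CI around the theoretical mean and check whether the sample mean lands inside it.

```python
import math

# Drug example: t_crit = 2.131 is the .05 critical value for df = 15,
# read from the t table in this review.
mu_T, s, n, t_crit = 98.6, 1.40, 16, 2.131
se = s / math.sqrt(n)                   # 1.40 / 4 = 0.35
margin = t_crit * se                    # about 0.75
lo, hi = mu_T - margin, mu_T + margin   # about 97.85 .. 99.35
x_bar = 99.5
print(round(lo, 2), round(hi, 2), lo < x_bar < hi)
# 99.5 lies outside the interval, so mu_T = 98.6 is falsified
```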