Central Limit Theorem, z-tests, & t-tests PSY440 June 19, 2008
New Topic: Sample Distributions & The Central Limit Theorem
Distribution of sample means
The distribution of sample means is a “virtual” distribution that sits between the sample and the population (Population, Distribution of sample means, Sample).
Properties of the distribution of sample means: Shape
If the population is Normal, then the distribution of sample means will be Normal.
If the sample size is large (n > 30), the distribution of sample means will be approximately Normal regardless of the shape of the population.
Properties of the distribution of sample means: Center
The mean of the distribution of sample means is equal to the mean of the population: the same numeric value, but different conceptual values.
Consider our earlier example, with population scores 2, 4, 6, 8 and all possible samples of size n = 2:
Population mean: $\mu = (2 + 4 + 6 + 8) / 4 = 5$
Mean of the distribution of sample means: $(2+3+4+5+3+4+5+6+4+5+6+7+5+6+7+8) / 16 = 5$
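The small example above can be checked by brute force. The following is a minimal sketch, assuming the 16 sample means come from every ordered sample of size 2 drawn with replacement from the population 2, 4, 6, 8 (the variable names are illustrative, not from the slides).

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]

# Every possible ordered sample of size n = 2, drawn with replacement
# (4 x 4 = 16 samples). This sampling scheme is an assumption that
# reproduces the 16 means listed on the slide.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

population_mean = mean(population)
mean_of_sample_means = mean(sample_means)

print("population mean:", population_mean)
print("mean of the distribution of sample means:", mean_of_sample_means)
```

Both printed values should agree, which is the "Center" property in numeric form.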
Properties of the distribution of sample means: Spread
The standard deviation of the distribution of sample means depends on two things: the standard deviation of the population and the sample size.
Standard deviation of the population: the smaller the population variability, the closer the sample means are to the population mean μ.
Sample size (compare n = 1, n = 10, and n = 100): the larger the sample size, the smaller the spread.
Putting them together we get the standard deviation of the distribution of sample means, commonly called the standard error: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
Standard error
The standard error is the average amount that you’d expect a sample mean (from a sample of size n) to deviate from the population mean. In other words, it is an estimate of the error that you’d expect by chance (or by sampling).
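To see how sample size drives the standard error, here is a short sketch using the standard formula σ / √n. The value σ = 8 is only an illustrative assumption, and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the distribution of sample means: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sigma / math.sqrt(n)


sigma = 8.0  # illustrative population standard deviation (assumption)
for n in (1, 10, 100):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.3f}")
```

With n = 1 the standard error equals the population standard deviation, and it shrinks as n grows.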
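Distribution of sample means: keep your distributions straight by taking care with your notation
Population: mean μ, standard deviation σ
Distribution of sample means: mean $\mu_{\bar{X}}$, standard deviation (standard error) $\sigma_{\bar{X}}$
Sample: mean X̄, standard deviation s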
Properties of the distribution of sample means
All three of these properties are combined to form the Central Limit Theorem:
For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will approach a normal distribution with a mean of μ and a standard deviation of $\sigma / \sqrt{n}$ as n approaches infinity (good approximation if n > 30).
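A quick simulation can make the theorem concrete. This is only a sketch under stated assumptions: the population is a skewed exponential distribution (mean 1, standard deviation 1), the sample size is n = 36, and the seed and number of samples are arbitrary choices, none of which come from the slides.

```python
import random
from statistics import mean, stdev

random.seed(440)  # fixed seed so the run is repeatable (arbitrary choice)

n = 36               # sample size (> 30)
num_samples = 5000   # number of simulated samples

# Exponential population with rate 1: strongly skewed, mean = 1, sd = 1.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

observed_center = mean(sample_means)    # should be close to mu = 1
observed_spread = stdev(sample_means)   # should be close to sigma / sqrt(n)
predicted_spread = 1.0 / n ** 0.5

print(f"mean of sample means: {observed_center:.3f} (predicted 1.000)")
print(f"sd of sample means:   {observed_spread:.3f} (predicted {predicted_spread:.3f})")
```

Even though the population is far from Normal, the simulated center and spread should land near μ and σ / √n.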
Performing your statistical test: what are we doing when we test the hypotheses?
Computing a test statistic (generic test): test statistic = observed difference / difference expected by chance.
The observed difference could be between a sample and a population, or between different samples.
The difference expected by chance is based on the standard error or an estimate of the standard error.
Hypothesis Testing With a Distribution of Means
The distribution of means is the comparison distribution when a sample has more than one individual.
Find the Z score of your sample’s mean on the distribution of means: $Z = (\bar{X} - \mu) / \sigma_{\bar{X}}$
“Generic” statistical test: an example of the one-sample z-test
Memory experiment example: We give n = 16 memory patients a memory improvement treatment. After the treatment they have an average score of X̄ = 55 memory errors. How do they compare to the general population of memory patients, who have a distribution of memory errors that is Normal with μ = 60, σ = 8?
Step 1: State your hypotheses
H0: the memory treatment sample is the same as (or worse than) the population of memory patients: $\mu_{Treatment} \geq \mu_{pop} = 60$
HA: their memory is better than the population of memory patients: $\mu_{Treatment} < \mu_{pop} = 60$
Step 2: Set your decision criteria: one-tailed, α = 0.05
Step 3: Collect your data: n = 16, X̄ = 55
Step 4: Compute your test statistic: $z = (\bar{X} - \mu) / (\sigma / \sqrt{n}) = (55 - 60) / (8 / \sqrt{16}) = -5 / 2 = -2.5$
Step 5: Make a decision about your null hypothesis: z = -2.5 falls in the lower 5% of the comparison distribution (beyond the one-tailed cutoff of z = -1.645).
- Reject H0
- Support for our HA: the evidence suggests that the treatment decreases the number of memory errors
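The five steps of the memory example can be sketched in a few lines. This is an illustration rather than the course's own procedure: `one_sample_z` is a hypothetical helper, and the cutoff is taken from Python's `statistics.NormalDist` instead of the unit normal table.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """z = (sample mean - population mean) / standard error."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Values from the memory experiment example.
z = one_sample_z(sample_mean=55, pop_mean=60, pop_sd=8, n=16)

alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail cutoff for a one-tailed test
p_value = NormalDist().cdf(z)         # probability of a z this low if H0 is true
reject_h0 = z < z_crit

print(f"z = {z:.2f}, z_crit = {z_crit:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the alternative hypothesis predicts fewer errors, only the lower tail is used for the cutoff.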
Other sampling distributions
The distribution of sample means is one of the main distributions that underlies inferential statistics, and can be used to test hypotheses about population means (or about the relation between a sample and a population with known parameters). Other distributions are used in a similar manner:
- Sampling distribution of differences between means (often conceptualized as having mean = 0 and se = sqrt(var1 + var2), where var1 and var2 are taken here to be the variances of the two distributions of sample means).
- Sampling distributions of correlation coefficients and of differences between correlation coefficients. Correlation coefficients need to be log-transformed to create a sampling distribution that has a normal shape; see Table I in the text for conversion of Pearson’s r into “Fisher Z”.
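Both ideas can be written as one-line functions. This sketch assumes the usual definition of the Fisher transformation, z = atanh(r) = 0.5 * ln((1 + r) / (1 - r)), and reads the standard error formula as combining the two standard errors of the means; the function names are hypothetical and the text's Table I is not reproduced here.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher transformation of a Pearson correlation r (requires -1 < r < 1)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return math.atanh(r)  # same as 0.5 * log((1 + r) / (1 - r))


def se_of_difference(se1: float, se2: float) -> float:
    """Standard error of a difference between two independent means,
    given the standard error of each mean (assumed interpretation)."""
    return math.sqrt(se1 ** 2 + se2 ** 2)


print(f"Fisher Z for r = 0.50: {fisher_z(0.5):.4f}")
print(f"se of difference for se1 = 3, se2 = 4: {se_of_difference(3, 4):.1f}")
```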
Effect size and power
Effect size: Cohen’s d
Error types
Statistical Power Analysis
Performing your statistical test
Real world (‘truth’): H0 is correct (there really isn’t an effect) or H0 is wrong (there really is an effect).
Experimenter’s conclusions: Reject H0 or Fail to reject H0.
Rejecting H0 when H0 is correct is a Type I error. Failing to reject H0 when H0 is wrong is a Type II error.
If H0 is correct, there is only one distribution: the original (null) distribution.
If H0 is wrong, there are two distributions: the original (null) distribution and the new (treatment) distribution.
Effect Size
A hypothesis test tells us whether the observed difference is probably due to chance or not. It does not tell us how big the difference is.
When H0 is wrong there are two distributions: the original (null) distribution and the new (treatment) distribution. Effect size tells us how much the two populations don’t overlap.
Figuring effect size: the raw effect is the difference between the two population means, $\mu_{treatment} - \mu_{null}$, but this is tied to the particular units of measurement.
Standardized effect size, Cohen’s d: $d = (\mu_{treatment} - \mu_{null}) / \sigma$. This puts the difference into neutral units for comparison (same logic as z-scores).
Effect size conventions: small d = .2, medium d = .5, large d = .8
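As a worked illustration, Cohen's d can be computed for the memory example. Treating the sample result of 55 errors as the predicted treatment population mean is an assumption made only for this sketch, and `cohens_d` is a hypothetical helper name.

```python
def cohens_d(mean_treatment: float, mean_null: float, sigma: float) -> float:
    """Standardized effect size: difference between means in standard deviation units."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mean_treatment - mean_null) / sigma


# Memory example values: null mean 60, sigma 8, treatment mean 55 (assumed).
d = cohens_d(mean_treatment=55, mean_null=60, sigma=8)

print(f"d = {d:.3f} (size of effect: {abs(d):.3f})")
```

The sign only shows direction (fewer errors); the size of the effect is judged from the absolute value against the .2 / .5 / .8 conventions.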
Error types
Real world (‘truth’): H0 is correct (there really isn’t an effect) or H0 is wrong (there really is an effect).
Experimenter’s conclusions: Reject H0 (I conclude that there is an effect) or Fail to reject H0 (I can’t detect an effect).
Type I error (α): concluding that there is a difference between groups (“an effect”) when there really isn’t; that is, rejecting H0 when H0 is correct.
Type II error (β): concluding that there isn’t an effect when there really is; that is, failing to reject H0 when H0 is wrong.
Statistical Power
The probability of making a Type II error is related to Statistical Power.
Statistical Power: the probability that the study will produce a statistically significant result if the research hypothesis is true (there is an effect).
So how do we compute this?
Statistical Power
When H0 is true (there is no treatment effect), there is only the original (null) distribution. With α = 0.05, the 5% of that distribution beyond the cutoff is the Reject H0 region (a Type I error); the rest is the Fail to reject H0 region.
When H0 is false (there is a treatment effect), there is also a new (treatment) distribution, and the same α = 0.05 cutoff divides it into Reject H0 and Fail to reject H0 regions.
β = probability of a Type II error: failing to reject H0 even though there is a treatment effect. This is the part of the treatment distribution in the Fail to reject H0 region.
Power = 1 - β: the probability of (correctly) rejecting H0. This is the part of the treatment distribution in the Reject H0 region.
Statistical Power: steps for figuring power
1) Gather the needed information: the mean and standard deviation of the Null Population and the predicted mean of the Treatment Population.
2) Figure the raw-score cutoff point on the comparison distribution to reject the null hypothesis. From the unit normal table, for α = 0.05 (one-tailed): Z = -1.645. Transform this z-score to a raw score.
3) Figure the Z score for this same point, but on the distribution of means for the Treatment Population. Transform this raw score to a z-score. Remember to use the properties of the treatment population!
4) Use the normal curve table to figure the probability of getting a score more extreme than that Z score. In the example, the cutoff falls at Z = 0.355 on the treatment distribution, and the unit normal table gives an area of 0.3594 beyond it, so β = probability of a Type II error = 0.3594 and Power = 1 - β ≈ 0.64. The probability of detecting an effect of this size from these populations is 64%.
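The four steps can be sketched as a single function for a one-tailed test in the lower tail. The slide's own population values did not survive extraction, so the numbers below are assumptions: a null mean of 60, a predicted treatment mean of 55, n = 16, and σ = 10, chosen because they put the treatment mean exactly 2 standard errors below the null mean, which reproduces the Z = 0.355 and roughly 64% power quoted above. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_lower_tail(mu_null, mu_treatment, sigma, n, alpha=0.05):
    """Power of a one-tailed (lower-tail) z-test, following the four steps."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)                       # step 1: standard error
    z_crit = std_normal.inv_cdf(alpha)              # step 2: about -1.645 for alpha = 0.05
    raw_cutoff = mu_null + z_crit * se              #         z-score -> raw score
    z_on_treatment = (raw_cutoff - mu_treatment) / se  # step 3: raw score -> z-score
    power = std_normal.cdf(z_on_treatment)          # step 4: area in the Reject H0 region
    beta = 1 - power                                #         area beyond the cutoff
    return z_on_treatment, beta, power


# Assumed values (not from the slides): difference of 2 standard errors.
z_treat, beta, power = power_lower_tail(mu_null=60, mu_treatment=55, sigma=10, n=16)
print(f"Z on treatment distribution = {z_treat:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

The computed β differs slightly from the table value 0.3594 because the table rounds Z to two decimals.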
Statistical Power: factors that affect Power
- α-level
- Sample size
- Population standard deviation
- Effect size
- 1-tailed vs. 2-tailed
Statistical Power: factors that affect Power: α-level
Change from α = 0.05 to α = 0.01. The cutoff moves further into the tail, so the Fail to reject H0 region (β) grows and the Reject H0 region (Power = 1 - β) shrinks.
So as the α level gets smaller, so does the Power of the test.
Statistical Power: factors that affect Power: Sample size
Change from n = 25 to n = 100, keeping α = 0.05. Recall that sample size is related to the spread of the distribution of sample means.
As the sample gets bigger, the standard error gets smaller and the Power gets larger.
Statistical Power: factors that affect Power: Population standard deviation
Change from σ = 25 to σ = 20, keeping α = 0.05. Recall that standard error is related to the spread of the distribution.
As σ gets smaller, the standard error gets smaller and the Power gets larger.
Statistical Power: factors that affect Power: Effect size
Compare a small effect (a small difference between $\mu_{treatment}$ and $\mu_{no\ treatment}$) to a big effect, keeping α = 0.05.
As the effect gets bigger, the Power gets larger.
Statistical Power: factors that affect Power: 1-tailed vs. 2-tailed
Change from α = 0.05 one-tailed to α = 0.05 two-tailed. The two-tailed test puts p = 0.025 in each tail.
Two-tailed functionally cuts the α-level in half, which decreases the power.
Statistical Power: factors that affect Power (summary)
- α-level: as the α level gets smaller, so does the Power of the test
- Sample size: as the sample gets bigger, the standard error gets smaller and the Power gets larger
- Population standard deviation: as the population standard deviation gets smaller, the standard error gets smaller and the Power gets larger
- Effect size: as the effect gets bigger, the Power gets larger
- 1-tailed vs. 2-tailed: two-tailed functionally cuts the α-level in half, which decreases the power
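Each of these factors can be checked numerically by changing one input at a time. The sketch below assumes a z-test with known σ and uses hypothetical baseline values (null mean 60, treatment mean 55, σ = 10, n = 16, α = 0.05, one-tailed); the `power` function is an illustration, not a formula from the slides.

```python
import math
from statistics import NormalDist


def power(mu_null, mu_treatment, sigma, n, alpha=0.05, two_tailed=False):
    """Power of a z-test with known sigma (one-tailed tests point toward the effect)."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)
    shift = (mu_treatment - mu_null) / se  # treatment mean in standard-error units
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        # Probability of landing in either rejection tail under the treatment distribution.
        return std_normal.cdf(-z_crit - shift) + (1 - std_normal.cdf(z_crit - shift))
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - abs(shift))


# Hypothetical baseline scenario (assumption, not from the slides).
base = dict(mu_null=60, mu_treatment=55, sigma=10, n=16, alpha=0.05, two_tailed=False)

results = {
    "baseline": power(**base),
    "smaller alpha (0.01)": power(**{**base, "alpha": 0.01}),
    "bigger sample (n = 64)": power(**{**base, "n": 64}),
    "smaller sigma (8)": power(**{**base, "sigma": 8}),
    "bigger effect (mean 52)": power(**{**base, "mu_treatment": 52}),
    "two-tailed": power(**{**base, "two_tailed": True}),
}

for label, value in results.items():
    print(f"{label:26s} power = {value:.3f}")
```

Relative to the baseline, the smaller α and the two-tailed test should lower power, while the bigger sample, smaller σ, and bigger effect should raise it.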
Why care about Power?
Determining your sample size: using an estimate of effect size and population standard deviation, you can determine how many participants you need to achieve a particular level of power (see Table M in the textbook).
When a result is not statistically significant: is it because there is no effect, or not enough power?
When a result is significant: statistical significance versus practical significance.
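For the sample-size question, a rough calculation is possible for a one-tailed z-test with known σ. This sketch uses the standard normal-approximation formula n = ((z_alpha + z_power) / d)^2, which is an assumption here; Table M in the textbook may be built on different tests and can give different numbers. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_n(effect_size_d: float, alpha: float = 0.05, target_power: float = 0.80) -> int:
    """Smallest n giving at least target_power for a one-tailed z-test with known sigma."""
    if effect_size_d == 0:
        raise ValueError("effect size must be non-zero")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_power = std_normal.inv_cdf(target_power)
    return math.ceil(((z_alpha + z_power) / abs(effect_size_d)) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: need about n = {required_n(d)} participants")
```

Smaller expected effects call for much larger samples to reach the same power.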
Next Topic: t-tests
One sample, related samples, independent samples
Additional assumptions
Levene’s test
Statistical analysis follows design
The one-sample z-test can be used when: there is 1 sample, with one score per subject, and the population mean (μ) and standard deviation (σ) are known.
The one-sample t-test can be used when: there is 1 sample, with one score per subject, and the population mean (μ) is known but the standard deviation (σ) is NOT known.
Testing Hypotheses
Hypothesis testing: a five step program
Step 1: State your hypotheses
Step 2: Set your decision criteria
Step 3: Collect your data
Step 4: Compute your test statistics
- Compute your estimated standard error
- Compute your t-statistic
- Compute your degrees of freedom
Step 5: Make a decision about your null hypothesis
Performing your statistical test: what are we doing when we test the hypotheses?
Consider a variation of our memory experiment example. For the population of memory patients, the memory test mean μ is known. The memory treatment patients take the test and give a sample mean X̄. Compare these two means.
Conclusions:
H0: the memory treatment sample are the same as those in the population of memory patients.
HA: they aren’t the same as those in the population of memory patients.
Real world (‘truth’):
H0 is true (no treatment effect): one population. The memory treatment sample are the same as those in the population of memory patients.
H0 is false (there is a treatment effect): two populations. They aren’t the same as those in the population of memory patients.
Performing your statistical test: computing a test statistic
Generic test: test statistic = observed difference / difference expected by chance. The observed difference could be between a sample and a population, or between different samples. The difference expected by chance is based on the standard error or an estimate of the standard error.
One-sample z: $z = (\bar{X} - \mu) / \sigma_{\bar{X}}$, with standard error $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
One-sample t: $t = (\bar{X} - \mu) / s_{\bar{X}}$, with estimated standard error $s_{\bar{X}} = s / \sqrt{n}$ and degrees of freedom $df = n - 1$
The observed difference (numerator) is identical in the two tests. The difference expected by chance (denominator) is different: for the t-test we don’t know the population standard deviation σ, so we need to estimate the standard error from the sample standard deviation s.
One sample t-test: the t-statistic distribution
The t-statistic distribution is a transformation of the distribution of sample means. It varies in shape according to the degrees of freedom.
To reject H0, you want a computed test statistic that is large (in absolute value). The alpha level gives us the decision criterion: if the test statistic falls in the tail beyond the critical value, reject H0; if it falls short of the critical value, fail to reject H0.
New table: the t-table (Table C in text). To find the critical value of t (tcrit), look up the α level (one-tailed or two-tailed) and the degrees of freedom (df).
One sample t-test: finding tcrit
What is the tcrit for a two-tailed hypothesis test with a sample size of n = 6 and an α-level of 0.05? α = 0.05, two-tailed, n = 6, df = n - 1 = 5, so tcrit = ±2.571.
What is the tcrit for a one-tailed hypothesis test with a sample size of n = 6 and an α-level of 0.05? α = 0.05, one-tailed, n = 6, df = n - 1 = 5, so tcrit = +2.015 (or -2.015 if the predicted effect is in the lower tail).
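The table lookups above can be cross-checked without a t-table. The sketch below is one possible approach using only the standard library: it integrates the t density numerically (Simpson's rule) and searches for the cutoff by bisection. The function names, step count, and search range are arbitrary choices for illustration, and the approach assumes moderate df (math.gamma overflows for very large df).

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), using Simpson's rule on [0, |t|] plus symmetry (steps must be even)."""
    x = abs(t)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_crit(alpha: float, df: int, two_tailed: bool) -> float:
    """Positive critical value of t, found by bisection on the upper-tail area."""
    tail = alpha / 2 if two_tailed else alpha
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"two-tailed, alpha = 0.05, df = 5: {t_crit(0.05, 5, True):.3f}")
print(f"one-tailed, alpha = 0.05, df = 5: {t_crit(0.05, 5, False):.3f}")
```

For a two-tailed test the cutoff applies in both directions (plus or minus the printed value).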
One sample t-test: an example
Memory experiment example: We give n = 16 memory patients a memory improvement treatment. After the treatment they have an average score of X̄ = 55, s = 8 memory errors. How do they compare to the general population of memory patients, who have a distribution of memory errors that is Normal with μ = 60?
Step 1: State your hypotheses
H0: the memory treatment sample is the same as (or worse than) the population of memory patients: $\mu_{Treatment} \geq \mu_{pop} = 60$
HA: their memory is better than the population of memory patients: $\mu_{Treatment} < \mu_{pop} = 60$
Step 2: Set your decision criteria: one-tailed, α = 0.05
Step 3: Collect your data: n = 16, X̄ = 55, s = 8
Step 4: Compute your test statistics
- Estimated standard error: $s_{\bar{X}} = s / \sqrt{n} = 8 / \sqrt{16} = 2$
- t-statistic: $t = (\bar{X} - \mu) / s_{\bar{X}} = (55 - 60) / 2 = -2.5$
- Degrees of freedom: $df = n - 1 = 15$
Step 5: Make a decision about your null hypothesis: from the t-table, tcrit = -1.753 (one-tailed, α = 0.05, df = 15). Because tobs = -2.5 is more extreme than tcrit = -1.753:
- Reject H0
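The arithmetic of this example can be sketched as follows. The helper name `one_sample_t` is hypothetical, and the critical value is simply the t-table value quoted in the example (one-tailed, α = 0.05, df = 15) rather than something computed here.

```python
import math


def one_sample_t(sample_mean: float, pop_mean: float, sample_sd: float, n: int):
    """Return (t, df) using the estimated standard error s / sqrt(n)."""
    estimated_se = sample_sd / math.sqrt(n)
    t = (sample_mean - pop_mean) / estimated_se
    df = n - 1
    return t, df


# Values from the memory experiment example.
t_obs, df = one_sample_t(sample_mean=55, pop_mean=60, sample_sd=8, n=16)

t_crit = -1.753  # t-table value quoted in the example for df = 15, one-tailed
reject_h0 = t_obs < t_crit  # lower-tail test: reject if t is more negative than the cutoff

print(f"t({df}) = {t_obs:.2f}, t_crit = {t_crit}, reject H0: {reject_h0}")
```

The t value matches the earlier z value because s happens to equal the σ used in the z-test example; only the comparison distribution and the cutoff change.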