Advanced Quantitative Techniques Lab 3 Sept 22nd
LAB 3: Hypothesis testing #1
- Review from last week: confidence intervals (skim over)
- Hypothesis testing #1
  - Hypotheses
  - Proportion tests (prtest): single sample, multiple samples
  - T-tests (ttest): single sample
- Read more with policy examples here: http://www.urban.org/research/data-methods/data-analysis/quantitative-data-analysis/impact-analysis/paired-testing
- Also: importing from Excel, groups (by), errors
Inferential Statistics
- What can we infer about the population based on a sample?
- From now on, we’re estimating the population mean (μ) with the sample mean (x̄).
- We are no longer talking about individual behavior; we’re talking about average behavior.
Sampling Distributions: The Key Points
- We generally don’t know anything about the population distribution.
- We have a sample of data from the population.
- We assume that the average/mean is the most appropriate description of the population (no more median, because we assume a normal distribution).
- The sample must be random and representative (“large enough”).
Distribution of Means
- Take a random sample over, and over, and over again (random means each data point has an equal chance of being chosen).
- You get many sample means.
- Plot the sampling distribution of these means: you get a distribution of averages (not raw data points!).
Distribution of Means
- Sampling Distribution of Means: frequency distribution (histogram) of the sample means, not of the data themselves.
- It is the distribution of all possible sample means.
- **This is not the distribution of x**
[Figure: histogram of sample means; vertical axis: Frequency]
Remember . . .
- If we draw large enough random samples from a population, the distribution of the sample averages (not of the population data) is approximately a bell curve (normal distribution).
- This is the case regardless of what the population distribution looks like.
Example Question We take a random sample of 450 UP graduates. The average salary is $64,800. The standard deviation of this sample is $29,882. What is the probability that if we randomly gather another group of UP grads, their average salary will be greater than $67,000?
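Solution
- n = 450 (sample size)
- x̄ = $64,800 (sample mean)
- s = $29,882 (sample standard deviation)
- Standard deviation of the distribution of all sample means (the standard error, SE) = ? This describes the sample means, not the data.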
Solution Continued. . .
Given x̄ = $64,800 and s = $29,882, SE = ?
1) Calculate the standard error: \( SE = \sigma / \sqrt{n} \)
2) Substitute in s: \( SE \approx s / \sqrt{n} = 29{,}882 / \sqrt{450} \)
3) \( SE \approx 1{,}409 \)
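Solution Continued. . .
- n = 450, x̄ = $64,800, s = $29,882, SE = 1,409
- \( z = (67{,}000 - \bar{x}) / SE = (67{,}000 - 64{,}800) / 1{,}409 \approx 1.5614 \)
- Now, look up 1.5614 in the z-table: the upper-tail area is 5.9%.
- There is a 5.9% chance that the average salary of our new sample group of UP grads is > $67,000.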
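The three solution steps can also be checked in Stata without loading any data. This is a minimal sketch using only the numbers from the example; the scalar names (se_mean, z_stat) are made up for illustration, and normal() is Stata's standard normal cumulative distribution function.

```stata
* Salary example: n = 450, sample mean = 64,800, s = 29,882 (values from the slides)
* Scalar names below are illustrative, not part of the lab files
scalar se_mean = 29882 / sqrt(450)
scalar z_stat  = (67000 - 64800) / se_mean

display "Standard error = " se_mean
display "z = " z_stat
* Upper-tail area: probability that a new sample mean exceeds 67,000
display "P(sample mean > 67,000) = " 1 - normal(z_stat)
```

The results should land close to the hand calculation (SE near 1,409, z near 1.56, probability near 0.059); small differences come from rounding the standard error before dividing.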
Confidence Intervals
- The goal of calculating confidence intervals is to determine how sure we are that the true population mean, μ, is approximated by the sample mean, x̄.
- We build a confidence interval around the sample mean.
- Confidence intervals are only for averages, not for individual data points.
How to Form a Confidence Interval
To form a confidence interval we need to know:
1) x̄: the mean of the sample
2) σ: the standard deviation of the population (this can be approximated by the standard deviation of the sample, s, if σ is unknown)
3) n: the size of the sample
4) α: the estimation error = 1 − confidence level (for example, α = .05 for a 95% CI)
One vs. two-tail? Estimation Error
- α is the total estimation error (or error allowance).
- α/2 on the left is the over-estimation error.
- α/2 on the right is the under-estimation error.
[Figure: α/2 tails labeled Overestimation Error and Underestimation Error]
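The CI Formula
- We then use the following formula: \( \bar{x} \pm z_{\alpha/2} \cdot \sigma / \sqrt{n} \)
- If we only have the sample standard deviation, then the interval can be approximated by: \( \bar{x} \pm z_{\alpha/2} \cdot s / \sqrt{n} \)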
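As an illustration, the interval can be computed by hand in Stata for the salary example (n = 450, x̄ = $64,800, s = $29,882). The 95% level is an assumption chosen for this sketch, and the scalar names are made up; invnormal() returns the z value for a given cumulative probability.

```stata
* Sketch: 95% CI for the salary example, using s in place of sigma
* alpha = 0.05 is assumed here; scalar names are illustrative
scalar ci_alpha = 0.05
scalar se_mean  = 29882 / sqrt(450)
scalar z_crit   = invnormal(1 - ci_alpha/2)

display "z critical value = " z_crit
display "Lower bound = " 64800 - z_crit * se_mean
display "Upper bound = " 64800 + z_crit * se_mean
```

With these inputs the interval runs from roughly $62,040 to $67,560. Changing ci_alpha to 0.10 gives the narrower 90% interval.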
Comparing Two CIs
- The two CIs must have the same error allowance, but they can have different n and different s.
- If two confidence intervals do not overlap, then they are statistically different (regardless of their n and s).
- If two confidence intervals do overlap, then n and s become important for judgment.
- Do not compare an interval and a single average (point estimate) from two different samples unless the standard deviations and the sample sizes are the same.
Comparing Two CIs
- Compare x̄ from Sample 2 to the confidence interval from Sample 1
  - s and n from both samples are the same: if x̄ falls within the CI, the population means are statistically equal; if x̄ falls outside the CI, the population means are statistically significantly different.
  - s and n from the two samples are different: DO NOT COMPARE!
- The two CIs overlap (partially, or one CI fully covers the other)
  - s and n from both samples are the same: the population means are statistically equal (i.e., no difference between the two means).
  - s and n from the two samples are different: CAN’T TELL! …because the overlap could be caused by a change in mean OR a higher variability in one of the datasets.
- The CIs from the two samples do not overlap
  - The population means are statistically significantly different.
Single-mean hypothesis test
Hypothesis testing with a single sample enables us to make an inference about the mean (μ) of a population.
- Which variable are you interested in?
- What is the null hypothesis?
- What is your alpha?
- What is the sample size?
- State appropriate assumptions.
Null and Alternative Hypotheses
- Null Hypothesis (H0): prior belief or default belief (usually a statement of “no effect” or “no difference”)
- Alternative Hypothesis (H1): new way of thinking or researcher’s claim (usually what we are interested in finding evidence for)
- H0 and H1 are always stated in terms of population mean behavior (μ).
- H0 and H1 never overlap and are exhaustive.
Proportion testing
PR test: one sample
- Does less than half of the population support school prayer?
- H0 = ? H1 = ? [one tail or two tail?]
- Download gss2002_chapter7 and open in Stata
- recode prayer (1 = 1) (2 = 0), gen(schpray)
- tab prayer schpray, missing
- prtest schpray == .5
- Note that the mean of a 0/1 variable equals the proportion of 1s.
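Put together as a do-file, the one-sample steps might look like the sketch below. It assumes the download is saved as gss2002_chapter7.dta in the working directory (the file extension is not stated on the slide) and that prayer is coded 1 and 2 as the recode implies.

```stata
* Sketch: one-sample proportion test for school prayer
* Assumes gss2002_chapter7.dta sits in the current working directory
use gss2002_chapter7, clear

* Build a 0/1 indicator; recode needs a comma before its gen() option
recode prayer (1 = 1) (2 = 0), gen(schpray)

* Check the recode, keeping missing values visible
tab prayer schpray, missing

* Test H0: proportion = 0.5
prtest schpray == .5
```

The prtest output reports p-values for all three alternatives (less than, not equal to, greater than), so the one-tail versus two-tail choice determines which p-value to read rather than how the command is typed.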
Proportion testing
PR test cont.: treatment vs. control
- ‘Success’ = 1
- Policy example from each person!
- Import pr_test_lab3.xlsx
- Data layout: two columns, control (no change) and treat (your program), each holding the success indicator for households 1, 2, 3, 4, …40
- prtest treat == control
- Interpret results
Proportion testing
PR test cont.: 2-sample
- Treatment vs. control: another way that data might be stored is one row per household (1, 2, 3, 4, …40) with the columns household, Success?, and Treated?
- Does support for school prayer vary by gender?
- prtest schpray, by(sex)
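The two storage layouts lead to the same two-sample test. The sketch below assumes pr_test_lab3.xlsx is in the working directory with treat and control as first-row column names, and it uses Stata's stack command to convert the side-by-side layout into the one-column-plus-group layout; the new variable name success is made up for illustration.

```stata
* Sketch: the same two-sample proportion test under both data layouts
* Assumes pr_test_lab3.xlsx has columns named treat and control
import excel using "pr_test_lab3.xlsx", firstrow clear

* Layout 1: one column per group
prtest treat == control

* Layout 2: stack both columns into one outcome column
* stack creates _stack (1 = treat, 2 = control) to identify the group
stack treat control, into(success) clear
prtest success, by(_stack)
```

Both prtest calls compare the same two proportions, so the test statistic should agree in size; its sign can flip depending on which group Stata treats as the first.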
Testing means: z vs. t statistics
- General rule of thumb (not always)
Stata Command: ttest
ttest “variable” == “null hypothesis” if “the condition”, level(?)
- Note that one or two “=” signs are OK in the first part of the command.
- Two “=” signs are required in the “if” clause.
- Stata defaults to the 95% level.
- Level is the “confidence” level.
- Even though your alternative hypothesis may be one-tailed, the Stata command ALWAYS uses “=”.
- Note that putting in > or < for the ttest command will cause Stata errors.
Import Excel sheet to Stata
- Download Lab_3_Data.xls
- File -> Import -> Excel spreadsheet -> Lab_3_Data.xls
- Select “Import first row as variable names”
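The same import can be done from the command line or a do-file instead of the menus. This sketch assumes Lab_3_Data.xls is in the current working directory and that the data are on the first worksheet; the firstrow option corresponds to the “Import first row as variable names” checkbox.

```stata
* Sketch: command-line equivalent of the File -> Import menu steps
* Assumes Lab_3_Data.xls is in the working directory (check with pwd)
import excel using "Lab_3_Data.xls", firstrow clear

* Confirm that variable names and storage types came through as expected
describe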
Working Hours
- In the 1990s, the average workweek was 42.5 hours. In 1999, the legislature passed a bill to limit the average workweek to 40 hours. In 2000, are average work hours equal to 42.5? Use alpha 5%.
- sum hrs1
- n = 1818
- H0: μ = 42.5; H1: μ ≠ 42.5 (two-tailed)
- Assumptions: the dataset is a sample of the population; the sample is representative; we assume the sample is random; the distribution of means is normal.
- α = 5% = .05
ttest command
ttest hrs1 == 42.5 if wrkstat == "working"
Result: fail to reject the null hypothesis
Properties of p-values
- The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
- If the p-value is smaller than alpha, we reject H0 in favor of H1; this is evidence for H1, not proof that H1 is true.
- We cannot compare one p-value to that of another sample.
- The p-value is not dependent on alpha.
- Compare the p-value to α and decide whether or not to reject the null:
  - p-value < α: reject H0
  - p-value >= α: fail to reject H0 (reserve judgment on H0)
Interpreting Stata
- Key values: hypothesized population mean, sample mean, t-value, d.f., one/two tailed, p-value
- Decide whether the p-value is greater than or less than your alpha.
- Reject or fail to reject the null hypothesis accordingly.
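The decision rule can also be applied to the results that ttest leaves behind in r(). The sketch below assumes the lab data are already in memory and is meant to be run from a do-file; the scalar names are made up, and r(t), r(df_t), and r(p) are the stored t-value, degrees of freedom, and two-tailed p-value.

```stata
* Sketch: read ttest's stored results and apply the alpha rule
* Assumes Lab_3_Data.xls has already been imported
ttest hrs1 == 42.5 if wrkstat == "working"

* Copy the stored results before running anything else
scalar t_stat    = r(t)
scalar t_df      = r(df_t)
scalar p_two     = r(p)
scalar alpha_lvl = 0.05

display "t = " t_stat ", d.f. = " t_df ", two-tailed p = " p_two

if p_two < alpha_lvl {
    display "Reject H0"
}
else {
    display "Fail to reject H0"
}
```

For a one-tailed alternative, r(p_l) and r(p_u) hold the lower-tail and upper-tail p-values in place of r(p).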
Working Hours: conclusions
- Based on this sample of 1818 workers taken in 2000, we were unable to say that the average hours worked per week was not 42.5.
- We cannot conclude that the bill to reduce work hours has lowered the average workweek since 1999.
- However, our sample could by chance have over-represented people who work unusually long or unusually short hours (two-tailed), when the reality of the population is different. In that case, we would have failed to reject the null when we should have rejected it (Type II error).
Errors: Alpha and Beta
- α = alpha = probability of a Type I error, “false positives” (see Reinhart, p. 11)
  - You rejected the null hypothesis when you shouldn’t have.
  - Example: jury convicted an innocent person.
  - Used for making decisions about null hypotheses.
  - Example: You found that the average number of cigarettes consumed this year is different from last year, when in fact average cigarette consumption did not change.
- β = beta = probability of a Type II error, “false negatives” (see Reinhart, p. 11)
  - You failed to reject the null hypothesis when you should have.
  - Example: jury frees a guilty person.
  - Difficult to compute; we will not quantify it in this course.
  - Example: You found that the average number of cigarettes consumed this year is the same as last year, when in fact average cigarette consumption has changed.
Do you reject the null with an alpha of 10%?
Result: reject the null hypothesis
- Based on this sample of 1818 workers taken in 2000, I found that the average hours worked per week was not equal to 42.5.
- I can conclude that the average hours worked are statistically significantly different since the bill was passed in 1999.
- However, if the null hypothesis is in fact true, there was a 10% chance of rejecting it anyway. For example, I might have asked a lot of people working fewer hours when in reality most people work more than the ones that I talked to. In this case, I would have rejected the null when I should not have rejected it (Type I error).
- Note that we only quantify Type I error.
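One point worth checking when moving from alpha 5% to alpha 10%: the level() option changes only the confidence interval that ttest prints, not the p-value. The sketch below assumes the lab data are in memory; the scalar names are made up for illustration.

```stata
* Sketch: the p-value is the same at level(95) and level(90)
* Assumes Lab_3_Data.xls has already been imported
quietly ttest hrs1 == 42.5 if wrkstat == "working", level(95)
scalar p_at_95 = r(p)

quietly ttest hrs1 == 42.5 if wrkstat == "working", level(90)
scalar p_at_90 = r(p)

display "p-value with level(95) = " p_at_95
display "p-value with level(90) = " p_at_90

* The decision changes only because the threshold changes
* Each line prints 1 if H0 is rejected and 0 if not
display "Reject at alpha = .05? " (p_at_90 < 0.05)
display "Reject at alpha = .10? " (p_at_90 < 0.10)
```

A p-value between .05 and .10 is what produces the pattern in this lab: fail to reject at 5%, reject at 10%.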
Relationships / formulas
- standard error [population estimate] = SD [sample] / square root of sample size [sample]
- t statistic = (sample mean − population mean) / standard error
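Both formulas can be reproduced from summarize results and compared against the ttest output. This sketch assumes the lab data are in memory and uses 42.5 as the hypothesized population mean from the Working Hours example; the scalar names are made up, and ttail() gives the upper-tail area of the t distribution.

```stata
* Sketch: standard error and t statistic by hand from summarize
* Assumes Lab_3_Data.xls has already been imported
quietly summarize hrs1 if wrkstat == "working"
scalar n_hrs    = r(N)
scalar mean_hrs = r(mean)
scalar sd_hrs   = r(sd)

* standard error = SD / square root of sample size
scalar se_hrs = sd_hrs / sqrt(n_hrs)

* t statistic = (sample mean - population mean) / standard error
scalar t_hrs = (mean_hrs - 42.5) / se_hrs

display "SE = " se_hrs
display "t = " t_hrs
* Two-tailed p-value with n - 1 degrees of freedom
display "p = " 2 * ttail(n_hrs - 1, abs(t_hrs))
```

These values should match the Std. err., t, and two-sided p-value reported by ttest hrs1 == 42.5 if wrkstat == "working".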