Advanced Quantitative Techniques


Advanced Quantitative Techniques Lab 3 Sept 22nd

LAB 3: Hypothesis testing #1
Review from last week: CI (skim over)
Hypothesis testing #1: hypotheses; proportion tests (prtest), single sample and multiple samples; t-tests (ttest), single sample
Read more with policy examples here: http://www.urban.org/research/data-methods/data-analysis/quantitative-data-analysis/impact-analysis/paired-testing
Also: importing from Excel; groups (by); errors

Inferential Statistics What can we infer about the population based on a sample? From now on, we’re estimating the population mean (μ) with the sample mean (x̄). We are no longer talking about individual behavior; we’re talking about average behavior.

Sampling Distributions: The Key Points
We generally don’t know anything about the population distribution.
We have a sample of data from the population.
We assume that the average/mean is the most appropriate description of the population (no more median, because we assume a normal distribution).
The sample is to be random and representative (“large enough”).

Distribution of Means Take a random sample over, and over, and over again (random means each data point has an equal chance of being chosen). You get many sample means. Plot the sampling distribution of these means: you get a distribution of averages (not raw data points!).

Distribution of Means Sampling Distribution of Means: Frequency distribution (histogram) of the sample means, not of the data themselves. It is the distribution of all possible sample means. **This is not the distribution of x**
[Figure: histogram of sample means; vertical axis: Frequency]

Remember . . . If we sample randomly and each sample is large enough, the distribution of the averages of the data (not the population data) is approximately a bell curve (normal distribution). This is the case regardless of what the population distribution looks like.
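The point above can be checked with a small simulation. This is only a sketch: the exponential population, the sample size of 50, the 2,000 repetitions, and the helper name sample_means are all assumptions chosen for illustration, not part of the lab.

```python
import random
import statistics


def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from `n` independent draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


# Assumed population: exponential with mean 1 and SD 1 (strongly right-skewed).
means = sample_means(lambda rng: rng.expovariate(1.0), n=50, reps=2000)

center = statistics.fmean(means)  # should sit near the population mean, 1
spread = statistics.stdev(means)  # should sit near 1 / sqrt(50), about 0.141
print(round(center, 3), round(spread, 3))
```

Even though the raw data are skewed, the sample means cluster around the population mean, and their spread is close to the population SD divided by the square root of n.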

Example Question We take a random sample of 450 UP graduates. The average salary is $64,800. The standard deviation of this sample is $29,882. What is the probability that if we randomly gather another group of UP grads, their average salary will be greater than $67,000?

Solution
n = 450 (sample size)
x̄ = $64,800 (sample mean)
s = $29,882 (sample standard deviation)
σ_x̄ = ? (standard deviation of the distribution of all sample means, not of the data)

Solution Continued. . .
x̄ = $64,800, s = $29,882, σ_x̄ = ?
1) Calculate the standard error: σ_x̄ = σ / √n
2) Substitute in s: σ_x̄ ≈ s / √n
3) Compute: σ_x̄ ≈ 29,882 / √450 ≈ 1,409

Solution Continued. . .
n = 450, x̄ = $64,800, s = $29,882, σ_x̄ = 1,409
z = (67,000 − x̄) / σ_x̄ = (67,000 − 64,800) / 1,409 = 1.5614
Now, look up 1.5614 in the z-table: the area above it is 5.9%. There is a 5.9% chance that the average salary of our new sample group of UP graduates is > $67,000.
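The salary example can be reproduced in a few lines. This is a sketch in Python rather than Stata (an assumption made for illustration); it uses only the numbers given in the example and replaces the z-table lookup with the standard normal CDF.

```python
from math import sqrt
from statistics import NormalDist

n = 450             # sample size
x_bar = 64_800      # sample mean salary
s = 29_882          # sample standard deviation
threshold = 67_000  # salary level asked about

# Standard error of the mean, with s standing in for the unknown sigma.
standard_error = s / sqrt(n)

# z-score of the threshold within the distribution of sample means.
z = (threshold - x_bar) / standard_error

# Upper-tail area: chance that a new sample mean exceeds the threshold.
p_above = 1 - NormalDist().cdf(z)

print(round(standard_error), round(z, 2), round(p_above, 3))
```

The z-score here differs from 1.5614 in the fourth decimal only because the slide rounds the standard error to 1,409 before dividing.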

Confidence Intervals The goal of calculating confidence intervals is to determine how sure we are that the true population mean, μ, is approximated by the sample mean, x̄. We build a confidence interval around the sample mean. Confidence intervals are only for averages, not for individual data points.

How to Form a Confidence Interval To form a confidence interval we need to know:
1) x̄ : The mean of the sample
2) σ : The standard deviation of the population (this can be approximated by using the standard deviation of the sample (s) if σ is unknown)
3) n : The size of the sample, and
4) α : estimation error = 1 − confidence level (for example, α = .05 for a 95% CI).

One vs. two-tail? Estimation Error
α is the total estimation error (or error allowance).
α/2 on the left is the over-estimation error.
α/2 on the right is the under-estimation error.
[Figure: bell curve with α/2 shaded in each tail, labeled Overestimation Error and Underestimation Error]

The CI Formula We then use the following formula:
x̄ ± z_(α/2) × σ / √n
If we only have the sample standard deviation, then the interval can be approximated by:
x̄ ± z_(α/2) × s / √n
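As an illustration, the approximate interval x̄ ± z × s / √n can be computed for the UP salary sample from the earlier example. This is a sketch: the choice of a 95% level and the function name mean_confidence_interval are assumptions, and the critical value comes from the standard normal distribution rather than a printed z-table.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, confidence=0.95):
    """Large-sample CI for a mean: x_bar +/- z * s / sqrt(n)."""
    alpha = 1 - confidence                        # total error allowance
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    margin = z_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Salary sample: mean $64,800, SD $29,882, n = 450.
low, high = mean_confidence_interval(64_800, 29_882, 450)
print(round(low), round(high))
```

Raising the confidence level or shrinking n widens the interval, which is why two intervals should only be compared at the same error allowance.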

Comparing Two CIs The two CIs must have the same error allowance, but they can have different n and different s. If two confidence intervals do not overlap, then they are statistically different (regardless of their n and s). If two confidence intervals do overlap, then n and s will become important for judgment. Do not compare an interval and a single average (point estimate) from two different samples unless the standard deviations and the sample sizes are the same.

Comparing Two CIs
Comparing x̄ from Sample 2 to the confidence interval from Sample 1:
  s and n from both samples are the same: if x̄ falls within the CI, the population means are statistically equal. If x̄ falls outside the CI, the population means are statistically significantly different.
  s and n from the two samples are different: DO NOT COMPARE!
Comparing the two CIs from the two samples:
  The two CIs partially overlap: CAN’T TELL! …because the overlap could be caused by a change in mean OR a higher variability in one of the datasets.
  One CI fully covers the other one (complete overlap): the population means are statistically equal (i.e., no difference between the two means).
  The CIs do not overlap: the population means are statistically significantly different.

Single-mean hypothesis test Hypothesis testing with a single sample enables us to make an inference about the mean (μ) of a population. Which variable are you interested in? What is the null hypothesis? What is your alpha? What is the sample size? State appropriate assumptions.

Null and Alternative Hypotheses
Null Hypothesis (Ho): Prior belief or default belief (usually a statement of “no effect” or “no difference”)
Alternative Hypothesis (H1): New way of thinking or researcher’s claim (usually what we are interested in proving)
Ho and H1 are always stated in terms of population mean behavior (μ).
Ho and H1 never overlap and are exhaustive.

Probability testing PR test: one sample
Does less than half of the population support school prayer? Ho = ? Ha = ? [one tail or two tail?]
Download gss2002_chapter7 and open in STATA
recode prayer (1 = 1) (2 = 0), gen(schpray)
tab prayer schpray, missing
prtest schpray == .5
Note that for a 0/1 variable the mean equals the proportion.
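The arithmetic behind a one-sample proportion test can be sketched by hand. This is an illustration in Python, not the Stata output: the counts (40 supporters out of 100) are hypothetical and are not taken from gss2002_chapter7, and the function name is an assumption. It uses the textbook z-test, with the standard error computed under the null proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_test(successes, n, p0):
    """z-test of Ho: p = p0, with the standard error computed under Ho."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf
    return {
        "p_hat": p_hat,
        "z": z,
        "p_lower": cdf(z),                     # Ha: p < p0
        "p_upper": 1 - cdf(z),                 # Ha: p > p0
        "p_two_sided": 2 * (1 - cdf(abs(z))),  # Ha: p != p0
    }


# Hypothetical counts, for illustration only.
result = one_sample_proportion_test(successes=40, n=100, p0=0.5)
print(round(result["z"], 2), round(result["p_lower"], 4))
```

For the question "does less than half of the population support school prayer?", the lower-tail p-value is the relevant one, since Ha is p < .5.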

Probability testing PR test cont.
Treatment vs. control. ‘Success’ = 1. Policy example from each person!
Import pr_test_lab3.xlsx
prtest treat == control
Data layout: two columns, control (no change) and treated (your program), each recording success for households 1, 2, 3, 4, …40.
Interpret results.

Probability testing PR test cont.: 2-sample
Treatment vs. control: another way that data might be stored, with one row per household (1, 2, 3, 4, …40) and the columns household, Success?, and Treated?
Does support for school prayer vary by gender?
prtest schpray, by(sex)
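A two-sample comparison of proportions can be sketched the same way. The counts below (24 of 40 treated households succeeding versus 16 of 40 controls) are hypothetical and are not the contents of pr_test_lab3.xlsx; the function name is also an assumption. The sketch uses the common textbook pooled z-test, where the standard error is built from the combined proportion under Ho: p1 = p2.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_proportion_test(success_1, n_1, success_2, n_2):
    """Pooled z-test of Ho: p1 = p2; returns (z, two-sided p-value)."""
    p_1 = success_1 / n_1
    p_2 = success_2 / n_2
    pooled = (success_1 + success_2) / (n_1 + n_2)  # common proportion under Ho
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical counts: treated group first, control group second.
z, p_value = two_sample_proportion_test(24, 40, 16, 40)
print(round(z, 3), round(p_value, 3))
```

With these made-up counts the difference would be significant at α = 10% but not at α = 5%, which shows how the conclusion depends on the alpha chosen in advance.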

Testing means: z vs. t - stats General rule of thumb..not always

Stata Command: ttest
ttest “variable” = “null hypothesis” if “the condition”, level(?)
Note that one or two “=” signs are OK in the first part of the command.
Two “=” signs are required in the “if” clause.
Level is the “confidence” level; Stata defaults to the 95% level.
Even though your alternative hypothesis may be one-tailed, the Stata command ALWAYS uses “=”. Note that putting in > or < for the ttest command will cause Stata errors.

Import excel sheet to STATA Download Lab_3_Data.xls File -> import -> excel spreadsheet -> Lab_3_Data.xls Select “Import first row as variable names”

Working Hours
In the 1990s, the average workweek was 42.5 hours. In 1999, the legislature passed a bill to limit the average workweek to 40 hours. In 2000, are average work hours equal to 42.5? Use alpha 5%.
sum hrs1
n = 1818
α = 5% = .05
Dataset is a sample of the population. Sample is representative. We assume the sample is random. Distribution of means is normal.

ttest command
ttest hrs1=42.5 if wrkstat=="working"
Fail to reject the null hypothesis

Properties of p-values
The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
If the p-value is smaller than alpha, we reject Ho in favor of H1 (this does not prove that H1 is true).
We cannot compare one p-value to that of another sample.
The p-value is not dependent on alpha.
Compare the p-value to α and decide whether or not to reject the null:
p-value < α → reject Ho
p-value >= α → reserve judgment on Ho
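The decision rule can be written as a one-line function. This is a sketch: the function name decide and the p-value of 0.07 are assumptions for illustration. The value is hypothetical and was picked only because it lies between .05 and .10, the situation in which a test fails to reject at 5% but rejects at 10%.

```python
def decide(p_value, alpha):
    """Apply the rule: reject Ho only when the p-value is below alpha."""
    return "reject Ho" if p_value < alpha else "fail to reject Ho"


p_value = 0.07  # hypothetical p-value, not taken from the lab output

for alpha in (0.05, 0.10):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

The p-value itself does not change between the two runs; only the threshold it is compared against does.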

Interpreting Stata Key values: population mean, sample mean, t-value, d.f., one/two tailed, p-value Decide whether p-value is greater than or less than your alpha. Reject or fail to reject null hypothesis accordingly.

Working Hours: conclusions Based on this sample of 1818 workers taken in 2000, we were unable to say that the average hours worked per week was not 42.5. We cannot conclude that the bill to reduce work hours has lowered the average workweek since 1999. However, our sample could have included an unusual share of workaholics or of people working few hours (two-tailed), so that it does not reflect the reality of the population. In this case, we would have failed to reject the null when we should have rejected it (Type II error).

Errors: Alpha and Beta
α = alpha = Type I error, “false positives” (see Reinhart, p. 11)
You rejected the Null Hypothesis when you shouldn’t have.
Example: jury convicted innocent person.
Used for making decisions about null hypotheses.
Example: You found that the average number of cigarettes consumed this year is different from last year, when in fact average cigarette consumption did not change.
β = beta = Type II error, “false negatives” (see Reinhart, p. 11)
You failed to reject the Null Hypothesis when you should have.
Example: jury frees a guilty person.
Difficult to compute; we will not quantify it in this course.
Example: You found that the average number of cigarettes consumed this year is the same as last year, when in fact average cigarette consumption has changed.

Reject the null hypothesis
Do you reject the null with an alpha of 10%? Yes: reject the null hypothesis.
Based on this sample of 1818 workers taken in 2000, I found that the average hours worked per week was not equal to 42.5. I can conclude that average hours worked have been statistically significantly different from 42.5 since the bill was passed in 1999. However, with an alpha of 10%, there is a 10% chance of reaching this conclusion when the null hypothesis is actually true. For example, I might have asked a lot of people working fewer hours when in reality most people work more than the ones that I talked to. In this case, I would have rejected the null when I should not have rejected it (Type I error). Note that we only quantify Type I error.

Relationships / formulas
standard error = SD / sq root of sample size [sample] [pop estimate] [sample]
t statistic = (sample mean − pop mean) / standard error
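These two relationships can be sketched as a small function. The inputs below (sample mean 52, null value 50, SD 10, n = 25) are toy numbers chosen so the arithmetic is easy to follow; they are assumptions, not results from the lab data, and the function name t_statistic is likewise hypothetical.

```python
from math import sqrt


def t_statistic(sample_mean, pop_mean, sd, n):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    standard_error = sd / sqrt(n)  # SD divided by the square root of n
    t = (sample_mean - pop_mean) / standard_error
    return t, n - 1


# Toy numbers: standard error = 10 / 5 = 2, so t = (52 - 50) / 2.
t, df = t_statistic(sample_mean=52, pop_mean=50, sd=10, n=25)
print(t, df)
```

The parentheses around the difference matter: without them, only the population mean would be divided by the standard error.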