Statistical Inference and Regression Analysis: Stat-GB. 3302

Statistical Inference and Regression Analysis: Stat-GB , Stat-UB
Professor William Greene, Stern School of Business, IOMS Department, Department of Economics

Part 5 – Hypothesis Testing

Objectives of Statistical Analysis
Estimation: How long do hard drives last? What is the median income among the 99%ers?
Inference (hypothesis testing): Did minorities pay higher mortgage rates during the housing boom? Is there a link between environmental factors and breast cancer on eastern Long Island?

General Frameworks
Parametric Tests: Features of specific distributions, such as the mean of a Bernoulli or normal distribution.
Specification Tests (Semiparametric): Do the data arrive from a Poisson process? Are the data normally distributed?
Nonparametric Tests: Are two discrete processes independent?

Hypotheses
Hypotheses are labels. State 0 of Nature: Null Hypothesis. State 1: Alternative Hypothesis.
Exclusive: Prob(H0 ∩ H1) = 0. Exhaustive: Prob(H0) + Prob(H1) = 1.
Symmetric: Neither is intrinsically “preferred”; the objective of the study is only to support one or the other. (Rare?)

Testing Strategy

Posterior (to the Evidence) Odds

Does the New Drug Work?
Hypotheses: H0: θ = .50, H1: θ = .75
Priors: P0 = .40, P1 = .60
Clinical Trial: N = 50, 31 patients “respond,” p = .62
Likelihoods: L0(31 of 50 | θ = .50) = Binomial(50, 31, .50), L1(31 of 50 | θ = .75) = Binomial(50, 31, .75)
Posterior odds in favor of H0 = (.4/.6)(L0/L1) > 1
Priors favored H1 1.5 to 1, but the posterior odds favor H0. The evidence discredits H1.
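The posterior odds calculation can be sketched in a few lines. This is only an illustration that uses the priors, sample size, and response count from the slide; the helper name `binomial_pmf` is an assumption introduced here, not something from the course materials.

```python
from math import comb

def binomial_pmf(n, k, theta):
    """Probability of k successes in n independent Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1.0 - theta)**(n - k)

# Values taken from the slide.
n, k = 50, 31                    # trial size and number of responders
theta0, theta1 = 0.50, 0.75      # response rates under H0 and H1
prior0, prior1 = 0.40, 0.60      # prior probabilities of H0 and H1

like0 = binomial_pmf(n, k, theta0)   # likelihood of the data under H0
like1 = binomial_pmf(n, k, theta1)   # likelihood of the data under H1

# Posterior odds = prior odds times likelihood ratio.
posterior_odds = (prior0 / prior1) * (like0 / like1)

print(f"L0 = {like0:.5f}, L1 = {like1:.5f}")
print(f"Posterior odds in favor of H0 = {posterior_odds:.3f}")
```

Odds above 1 favor H0, which is the slide's conclusion even though the priors leaned toward H1.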

Decision Strategy
Prefer the hypothesis with the higher posterior odds.
A gap in the theory: How does the investigator do the cost benefit test?
Starting a new business venture or entering a new market: priors and market research.
FDA approving a new drug or medical device: priors and clinical trials.
Statistical Decision Theory adds the costs and benefits of decisions and errors.

An Alternative Strategy
Recognize the asymmetry of null and alternative hypotheses. Eliminate the prior odds (which are rarely formed or available).

http://query.nytimes.com/gst/fullpage.html

Classical Hypothesis Testing
The scientific method applied to statistical hypothesis testing:
Hypothesis: The world works according to my hypothesis.
Testing or supporting the hypothesis: Data gathering. Rejection of the hypothesis if the data are inconsistent with it. Retention and exposure to further investigation if the data are consistent with the hypothesis.
Failure to reject is not equivalent to acceptance.

Asymmetric Hypotheses
Null hypothesis: The proposed state of nature.
Alternative hypothesis: The state of nature that is believed to prevail if the null is rejected.

Hypothesis Testing Strategy
Formulate the null hypothesis.
Gather the evidence.
Question: If my null hypothesis were true, how likely is it that I would have observed this evidence?
Very unlikely: Reject the hypothesis.
Not unlikely: Do not reject. (Retain the hypothesis for continued scrutiny.)

Some Terms of Art
Type I error: Incorrectly rejecting a true null.
Type II error: Failure to reject a false null.
Power of a test: Probability a test will correctly reject a false null.
Alpha level: Probability that a test will incorrectly reject a true null. This is sometimes called the size of the test.
Significance Level: Probability that a test will retain a true null = 1 – alpha.
Rejection Region: Evidence that will lead to rejection of the null.
Test statistic: Specific sample evidence used to test the hypothesis.
Distribution of the test statistic under the null hypothesis: Probability model used to compute the probability of rejecting the null. (Crucial to the testing strategy: how does the analyst assess the evidence?)

Possible Errors in Testing
                                  Hypothesis is True    Hypothesis is False
I Do Not Reject the Hypothesis    Correct Decision      Type II Error
I Reject the Hypothesis           Type I Error          Correct Decision

A Legal Analogy: The Null Hypothesis is INNOCENT
                               Null Hypothesis: Not Guilty                      Alternative Hypothesis: Guilty
Finding: Verdict Not Guilty    Correct Decision                                 Type II Error: Guilty defendant goes free
Finding: Verdict Guilty        Type I Error: Innocent defendant is convicted    Correct Decision
The errors are not symmetric. Most thinkers consider Type I errors to be more serious than Type II in this setting.

(Jerzy) Neyman – (Egon) Pearson Methodology
“Statistical” testing methodology:
Formulate the “null” hypothesis.
Decide (in advance) what kinds of “evidence” (data) will lead to rejection of the null hypothesis, i.e., define the rejection region.
Gather the data.
Mechanically carry out the test.

Formulating the Null Hypothesis
Stating the hypothesis: A belief about the “state of nature.” A parameter takes a particular value. There is a relationship between variables. And so on…
The null vs. the alternative: By induction, if we wish to find evidence of something, first assume it is not true. Look for evidence that leads to rejection of the assumed hypothesis.
Evidence that rejects the null hypothesis is significant.

Example: Credit Scoring Rule
Investigation: I believe that Fair Isaac relies on home ownership in deciding whether to “accept” an application.
Null hypothesis: There is no relationship.
Alternative hypothesis: They do use homeownership data.
What decision rule should I use?

Some Evidence
            Renters    Homeowners
Accepted    5469       5030
Rejected    1845       1100

Hypothesis Test
Acceptance rate for homeowners = 5030/(5030 + 1100) = .82055
H0: Acceptance rate for renters is not less than for owners.
H0: p(renters) ≥ .82055
H1: p(renters) < .82055

The Rejection Region
What is the “rejection region”? Data (evidence) that are inconsistent with my hypothesis.
Evidence is divided into two types: data that are inconsistent with my hypothesis (the rejection region), and everything else.

My Testing Procedure
I will reject H0 if p(renters) < 0.815 (chosen arbitrarily).
The rejection region is sample values of p(renters) < 0.815.

Distribution of the Test Statistic Under the Null Hypothesis
Test statistic: p(renters) = (1/N) Σi Accept_i, where Accept_i = 1 or 0.
Use the central limit theorem: Assumed mean = .82055. Implied standard deviation = sqrt(.82055*.17945/7314) = .00459. Using the CLT, p(renters) is normally distributed (N is very large).
Use z = (p(renters) - .82055)/.00459.

Alpha Level and Rejection Region
Prob(Reject H0 | H0 true) = Prob(p < .815 | H0 is true) = Prob[(p - .82055)/.00459 < (.815 - .82055)/.00459] = Prob[z < -1.209] = .11333
This is the probability of a Type I error, which is the alpha level for this test.
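As a rough check of the alpha level, the cutoff can be standardized and passed through the normal CDF. The sketch below takes the slide's mean (.82055), standard deviation (.00459), and cutoff (.815) as given; `normal_cdf` is a small helper written for this illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

null_mean = 0.82055   # acceptance rate for homeowners, assumed under H0
std_dev = 0.00459     # standard deviation as reported on the slide (taken as given)
cutoff = 0.815        # reject H0 when the sample proportion falls below this value

z_cut = (cutoff - null_mean) / std_dev   # standardized cutoff
alpha = normal_cdf(z_cut)                # Prob(reject H0 | H0 true)

print(f"z cutoff = {z_cut:.3f}, alpha = {alpha:.4f}")
```

The result should be close to the area of .11333 shown for the rejection region.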

Distribution of the Test Statistic and the Rejection Region
Area=.11333

The Test
The observed proportion is 5469/(5469 + 1845) = 5469/7314 = .74774. This is below .815, so the null hypothesis is rejected at the level set by the design of the test (α = .11333).

Power of the test

Power Function for the Test (Power = size when the alternative = the null)
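One way to trace out a power function for the credit scoring test is to ask how often the sample proportion falls below .815 for different true acceptance rates. The sketch below holds the standard deviation fixed at the slide's .00459 for every alternative, which is a simplifying assumption made here; the slide's own power curve may have been computed differently.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power(true_p, cutoff=0.815, std_dev=0.00459):
    """Prob(sample proportion < cutoff) when the true acceptance rate is true_p.

    std_dev is held at the slide's null value for all alternatives (an assumption).
    """
    return normal_cdf((cutoff - true_p) / std_dev)

for p in (0.82055, 0.815, 0.810, 0.800):
    print(f"true p = {p:.5f}  power = {power(p):.4f}")
```

At the null value .82055 the power equals the size of the test, and it rises toward 1 as the true acceptance rate for renters falls further below the cutoff.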

Application: Breast Cancer On Long Island
Null Hypothesis: There is no link between the high cancer rate on LI and the use of pesticides and toxic chemicals in dry cleaning, farming, etc.
Neyman-Pearson Procedure: Examine the physical and statistical evidence. If there is convincing covariation, reject the null hypothesis. What is the rejection region?
The NCI study: Working null hypothesis: There is a link. We will find the evidence. How do you reject this hypothesis?

Formulating the Testing Procedure
Usually: What kind of data will lead me to reject the hypothesis? Thinking scientifically: If you want to “prove” a hypothesis is true (or you want to support one) begin by assuming your hypothesis is not true, and look for evidence that contradicts the assumption.

Hypothesis About a Mean
I believe that the average income of individuals in a population is $30,000.
H0: μ = $30,000 (The null)
H1: μ ≠ $30,000 (The alternative)
I will draw the sample and examine the data. The rejection region is data for which the sample mean is far from $30,000. How far is far? That is the test.

Application The mean of a population takes a specific value:
Null hypothesis: H0: μ = $30,000 H1: μ ≠ $30,000 Test: Sample mean close to hypothesized population mean? Rejection region: Sample means that are far from $30,000

Deciding on the Rejection Region
If the sample mean is far from $30,000, reject the hypothesis. Choose the region, for example, sample means below a lower boundary or above an upper boundary on either side of $30,000. The probability that the mean falls in the rejection region even though the hypothesis is true (should not be rejected) is the probability of a type I error. Even if the true mean really is $30,000, the sample mean could fall in the rejection region.
[Figure: number line around $30,000 with a rejection region in each tail]

Reduce the Probability of a Type I Error by Making the (non)Rejection Region Wider
Reduce the probability of a type I error by moving the boundaries of the rejection region farther out.
[Figure: number line around $30,000 with a narrower and a wider interval. For the narrower interval, the probability outside the interval is large. For the wider interval, the probability outside the interval is much smaller.]
You can make a type I error impossible by making the rejection region very far from the null. Then you would never make a type I error because you would never reject H0.

Setting the α Level “α” is the probability of a type I error
Choose the width of the interval by choosing the desired probability of a type I error, based on the t or normal distribution. (How confident do I want to be?) Multiply the z or t value by the standard error of the mean.

Testing Procedure
The rejection region will be the range of values greater than μ0 + zσ/√N or less than μ0 - zσ/√N.
Use z = 1.96 for 1 - α = 95%.
Use z = 2.576 for 1 - α = 99%.
Use the t table if the sample is small, the variance is estimated, and sampling is from a normal distribution.
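The rejection boundaries μ0 ± zσ/√N can be computed directly. The sketch below is illustrative only: the income figures and the value σ = 1500 are hypothetical, and `two_sided_z_test` is a name introduced for this example.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Return the sample mean, both rejection boundaries, and the decision."""
    n = len(sample)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 when alpha = .05
    half_width = z_crit * sigma / sqrt(n)
    lower, upper = mu0 - half_width, mu0 + half_width
    xbar = mean(sample)
    reject = xbar < lower or xbar > upper
    return xbar, lower, upper, reject

# Hypothetical incomes, for illustration only.
incomes = [28500, 31200, 29800, 30400, 27900, 32100, 30900, 29300, 31500, 28800]
xbar, lower, upper, reject = two_sided_z_test(incomes, mu0=30000, sigma=1500)

print(f"sample mean = {xbar:.0f}")
print(f"reject if mean < {lower:.1f} or mean > {upper:.1f}")
print("reject H0" if reject else "do not reject H0")
```

Passing `alpha=0.01` widens the non-rejection interval, which is the trade-off the slides describe.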

Deciding on the Rejection Region
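If the sample mean is far from $30,000, reject the hypothesis. Choose the region so that the probability of landing in it when the hypothesis is true is 5%.
[Figure: number line around $30,000 with a rejection region in each tail]
I am 95% certain that I will not commit a type I error (reject the hypothesis in error). (I cannot be 100% certain.)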

The Testing Procedure (For a Mean)

Application

The Test Procedure
Choosing z = 1.96 makes the probability of a Type I error 0.05. Choosing z = 2.576 would reduce the probability of a Type I error to 0.01. Reducing the probability of a Type I error reduces the power of the test because it reduces the probability that the null hypothesis will be rejected.

P Value
Probability of observing sample evidence at least as extreme as what was observed, assuming the null hypothesis is true. The null hypothesis is rejected if P value < α.

P value < α
P value = Prob[p(renter) < .74774] = Prob[z < (.74774 - .82055)/.00459] = Φ(-15.86) ≈ 0 (effectively impossible under H0), which is less than α = .11333.
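The p value for the credit scoring test can be evaluated numerically. The sketch below uses the slide's figures; the complementary error function is used because a tail this far out underflows to zero if computed as 1 minus a CDF. The helper name `normal_cdf_lower` is an assumption of this example.

```python
from math import erfc, sqrt

def normal_cdf_lower(z):
    """Lower-tail standard normal probability; erfc keeps precision far in the tail."""
    return 0.5 * erfc(-z / sqrt(2.0))

observed = 0.74774    # observed acceptance rate for renters
null_mean = 0.82055   # acceptance rate assumed under H0
std_dev = 0.00459     # standard deviation as reported on the slide
alpha = 0.11333       # size of the test

z = (observed - null_mean) / std_dev
p_value = normal_cdf_lower(z)
reject = p_value < alpha

print(f"z = {z:.2f}, p value = {p_value:.3e}")
print("reject H0" if reject else "do not reject H0")
```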

Confidence Intervals For a two sided test about a parameter, a confidence interval is the complement of the rejection region. (Proof in text, p. 338)

Confidence Interval
If the sample mean is far from $30,000, reject the hypothesis. The interval between the two rejection regions is the confidence interval.
[Figure: number line with regions labeled Rejection, Confidence, Rejection]
I am 95% certain that the confidence interval contains the true mean of the distribution of incomes. (I cannot be 100% certain.)

One Sided Tests
H0: μ = μ0, H1: μ ≠ μ0. The rejection region is a sample mean far from μ0 in either direction.
H0: μ = μ0, H1: μ > μ0. Sample means less than μ0 cannot be in the rejection region. The entire rejection region is above μ0.
Reformulate: H0: μ ≤ μ0, H1: μ > μ0.

Likelihood Ratio Tests

Carrying Out the LR Test
In most cases, the exact distribution of the statistic is unknown.
Use -2 log λ, which is approximately distributed as chi squared [1].
For a test about 1 parameter, the threshold value is 3.84 (5%) or 6.63 (1%).

Poisson Likelihood Ratio Test
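A minimal sketch of a likelihood ratio test for a Poisson mean follows. It assumes the hypothesis is that the mean equals a specific value (`rate0`), so the unrestricted estimate is the sample mean and -2 log λ simplifies to 2[N(rate0 - x̄) + N x̄ log(x̄/rate0)]. The counts are hypothetical and the function name is introduced here for illustration; the slide's own example may differ.

```python
from math import log

def poisson_lr_statistic(counts, rate0):
    """-2 log(likelihood ratio) for H0: Poisson mean = rate0 against an unrestricted mean."""
    n = len(counts)
    xbar = sum(counts) / n          # unrestricted maximum likelihood estimate
    if xbar == 0:
        return 2.0 * n * rate0      # limit of the expression as xbar goes to 0
    # The log(x!) terms cancel between the two log likelihoods.
    return 2.0 * (n * (rate0 - xbar) + n * xbar * log(xbar / rate0))

# Hypothetical counts, for illustration only.
counts = [0, 2, 1, 3, 0, 1, 4, 2, 1, 0, 2, 3]
stat = poisson_lr_statistic(counts, rate0=1.0)

critical_5pct = 3.84   # chi squared [1] threshold at 5%
print(f"-2 log lambda = {stat:.3f}")
print("reject H0" if stat > critical_5pct else "do not reject H0")
```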

Generalities About LR Test

Gamma Application

Specification Tests Generally a test about a distribution where the alternative is “some other distribution.” Test is generally based on a feature of the distribution that is true under the null but not true under the alternative.

Poisson Specification Tests
3820 observations on doctor visits. Poisson distribution?

Deviance Test
Poisson Distribution: p(x) = exp(-λ)λ^x/x!
H0: Everyone has the same Poisson distribution.
H1: Everyone has their own Poisson distribution.
Under H0, observations will tend to be near the mean. Under H1, there will be much more variation.
Likelihood ratio statistic (Text, p. 348)

Deviance Test

Dispersion Test
Poisson Distribution: p(x) = exp(-λ)λ^x/x!
H0: The distribution is Poisson.
H1: The distribution is something else.
Under H0, the mean will be (almost) the same as the variance.
Approximate likelihood ratio statistic (Text, p. 348) = N * Variance / Mean
For the doctor visit data, this is 22,348.6 vs. chi squared with 1 degree of freedom. H0 is rejected.
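The dispersion statistic N * Variance / Mean is simple to compute. The sketch below uses hypothetical visit counts (not the 3820 observations from the slides) and assumes the variance is computed with divisor N, since the slide does not say which divisor is used.

```python
from statistics import mean, pvariance

def dispersion_statistic(counts):
    """N * Variance / Mean; the variance uses divisor N (an assumption)."""
    return len(counts) * pvariance(counts) / mean(counts)

# Hypothetical doctor visit counts, for illustration only.
visits = [0, 0, 1, 0, 2, 0, 5, 1, 0, 12, 0, 3, 0, 1, 8, 0]
stat = dispersion_statistic(visits)

critical_5pct = 3.84   # chi squared threshold with 1 degree of freedom, as on the slide
print(f"dispersion statistic = {stat:.2f}")
print("reject Poisson" if stat > critical_5pct else "do not reject Poisson")
```

A variance far above the mean, as in these made-up counts, produces a large statistic and points away from the Poisson model.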

Specification Test - Normality
Normal Distribution is symmetric and has kurtosis = 3. Compare observed 3rd and 4th moments to what would be expected from a normal distribution.

Symmetric and Skewed Distributions

Kurtosis: t[5] vs. Normal
Kurtosis of normal(0,1) = 3 Kurtosis of t[k] = 3 + 6/(k-4); for t[5] = 3+6/(5-4) = 9.

Bowman and Shenton Test for Normality
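The slide's formula did not survive in this transcript. A commonly stated form of the Bowman and Shenton statistic is N[skewness²/6 + (kurtosis - 3)²/24], compared with chi squared with 2 degrees of freedom; the sketch below assumes that form and uses a tiny made-up sample purely to show the arithmetic.

```python
def moments_normality_statistic(sample):
    """Return (skewness, kurtosis, statistic) using moments with divisor N."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in sample) / n   # third central moment
    m4 = sum((x - m) ** 4 for x in sample) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)
    return skewness, kurtosis, statistic

# Hypothetical sample, for illustration only.
sample = [-2, -1, 0, 1, 2]
skewness, kurtosis, statistic = moments_normality_statistic(sample)
print(f"skewness = {skewness:.3f}, kurtosis = {kurtosis:.3f}, statistic = {statistic:.4f}")
```

Skewness near 0 and kurtosis near 3 keep the statistic small; large values relative to the chi squared [2] table cast doubt on normality.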

Testing for a Distribution
H0: The distribution is as assumed.
H1: The assumed distribution is incorrect.
Strategy: Do the features of the sample resemble what we would observe if H0 were correct?
Continuous: The CDF of the data resembles the CDF of the assumed distribution.
Discrete: Sample cell probabilities resemble predictions from the assumed distribution.

Probability Plot for Normality

Normal (log)Income?

Random Sample from Normal

Normality Tests

Kolmogorov - Smirnov Test

Chi Squared Test for a Discrete Distribution
Outcomes = A1, A2,…, AM
Predicted probabilities based on a theoretical distribution = E1(θ), E2(θ),…,EM(θ).
Sample cell frequencies = O1,…,OM

Test Statistics

V2 Rocket Hits
Adapted from Richard Isaac, The Pleasures of Probability, Springer Verlag, 1995, pp.
Km² areas of South London in a grid (24 by 24).
535 rockets were fired randomly into the grid = N.
P(a rocket hits a particular grid area) = 1/576 = .00174 = θ
Expected number of rocket hits in a particular area = 535/576 = .929
How many rockets will hit any particular area? 0, 1, 2, … could be anything up to 535.
The .929 is the λ for a Poisson distribution.

[Figure: 13 by 13 grid of squares]

Poisson Process
θ = 1/169, N = 144, λ = 144 * 1/169 = 0.852
True Probabilities: P(X=0) = .4266, P(X=1) = .3634, P(X=2) = .1548, P(X=3) = .0437, P(X=4) = .0094, P(X>4) = .0021

Interpreting The Process
λ = 0.852
Probabilities: P(X=0) = .4266, P(X=1) = .3634, P(X=2) = .1548, P(X=3) = .0437, P(X=4) = .0094, P(X>4) = .0021
There are 169 squares. There are 144 “trials.”
Expect .4266*169 = 72.1 squares to have 0 hits.
Expect .3634*169 = 61.4 squares to have 1 hit.
Etc.
Expect the average number of hits/square to = .852.

Does the Theory Work?
           Theoretical Outcomes             Sample Outcomes
Outcome    Probability    Number of Cells   Sample Proportion   Number of Cells
0          .4266          72                .4733               80
1          .3634          61                .2899               49
2          .1548          26                .1539               26
3          .0437           7                .0769               13
4          .0094           2                .0059                1
> 4        .0021           0                .0000                0
Theoretical number of cells = 169*Prob(Outcome). Sample number of cells = observed frequencies.

Chi Squared for the Bombing Run
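A sketch of the chi squared comparison for the grid example follows, using the usual goodness of fit statistic: the sum over cells of (Observed - Expected)²/Expected. Expected counts are 169 times the Poisson probabilities with λ = 144/169. The observed counts 80, 49, and 13 appear in the table; the counts 26, 1, and 0 are recovered here as sample proportion times 169, which is an assumption of this illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson probability of exactly x events when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)

n_squares, n_hits = 169, 144
lam = n_hits / n_squares               # about 0.852 hits per square

# Number of squares with 0, 1, 2, 3, 4, and more than 4 hits.
observed = [80, 49, 26, 13, 1, 0]

probs = [poisson_pmf(x, lam) for x in range(5)]
probs.append(1.0 - sum(probs))         # P(X > 4) as the remainder
expected = [n_squares * p for p in probs]

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for x, (o, e) in enumerate(zip(observed, expected)):
    label = f"{x}" if x < 5 else "> 4"
    print(f"hits {label:>3}: observed {o:3d}, expected {e:6.2f}")
print(f"chi squared = {chi_squared:.2f}")
```

The probabilities computed here may differ from the slide's in the last digit. The statistic is then compared with a critical value from the chi squared table.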

Difference in Means of Two Populations
Two Independent Normal Populations: Common known variance. Common unknown variance. Different variances. One and two sided tests.
Paired Samples: Means of paired observations. Treatments and Controls: Diff-in-Diff, SAT.
Nonparametric: Mann/Whitney.
Two Bernoulli Populations.

Comparing Two Normal Populations

Unknown Common Variance

Household Incomes, Equal Variances
[Output: t test of equal means, INCOME by MARRIED (MARRIED = 0 vs. MARRIED = 1); t[3872], P value = .0002; table of Mean, Std.Dev, and Std.Error of INCOME by group]

Unknown Different Variances

2 Proportions
Two Bernoulli Populations: Xi ~ Bernoulli with Prob(xi=1) = θx, Yi ~ Bernoulli with Prob(yi=1) = θy
H0: θx = θy
The sample proportions are px = (1/Nx)Σi xi and py = (1/Ny)Σi yi.
Sample variances are px(1-px) and py(1-py).
Use the Central Limit Theorem to form the test statistic.
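A sketch of the resulting z statistic follows. It uses the two separate sample variances named on the slide, giving z = (px - py)/sqrt(px(1-px)/Nx + py(1-py)/Ny). The counts are hypothetical and `two_proportion_z` is a name introduced for this example.

```python
from math import erfc, sqrt

def two_proportion_z(successes_x, n_x, successes_y, n_y):
    """z statistic and two-sided p value for H0: equal Bernoulli probabilities."""
    px = successes_x / n_x
    py = successes_y / n_y
    # Standard error built from the two sample variances px(1-px) and py(1-py).
    se = sqrt(px * (1.0 - px) / n_x + py * (1.0 - py) / n_y)
    z = (px - py) / se
    p_value = erfc(abs(z) / sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return px, py, z, p_value

# Hypothetical counts, for illustration only.
px, py, z, p_value = two_proportion_z(120, 400, 150, 400)
print(f"px = {px:.3f}, py = {py:.3f}, z = {z:.3f}, p value = {p_value:.4f}")
```

Some treatments pool the two samples to estimate a single variance under H0; this sketch does not.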

z Test for Equality of Proportions
Application: Take-up of public health insurance.
[Output: t test of equal means, PUBLIC by FEMALE (FEMALE = 0 vs. FEMALE = 1); t[3375]; table of Mean, Std.Dev, and Std.Error of PUBLIC by group]

Paired Sample t and z Test
Observations are pairs (Xi,Yi), i = 1,…,N.
Hypothesis: μx = μy. Both are normal distributions. They may be correlated.
Medical Trials: Smoking vs. Nonsmoking (separate individuals, probably independent).
SAT repeat tests, before and after. (Definitely correlated.)
The test is based on Di = Xi – Yi. Same as earlier with H0: μD = 0.
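The paired test reduces to a one-sample t statistic on the differences Di. The sketch below uses hypothetical first and second SAT scores; the function name is introduced for this example.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """t statistic for H0: mean of D = X - Y is zero, with its degrees of freedom."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))   # stdev uses divisor n - 1
    return t, n - 1

# Hypothetical scores, for illustration only.
first = [1100, 1250, 980, 1400, 1180, 1320]
second = [1130, 1240, 1020, 1450, 1210, 1330]

t, df = paired_t_statistic(first, second)
print(f"t[{df}] = {t:.3f}")
```

The statistic is compared with the t table using N - 1 degrees of freedom.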

Treatment Effects SAT Do Overs
Experiment: X1, X2, …, XN = first SAT score, Y1, Y2, …, YN = second SAT score.
Treatment: T1,…,TN = whether or not the student took a Kaplan (or similar) prep course.
Hypothesis: μy > μx.
Placebo: In medical trials, N1 subjects receive a drug (treatment), N2 receive a placebo.
Hypothesis: The effect is greater in the treatment group than in the control (placebo) group.

Measuring Treatment Effects

Treatment Effects in Clinical Trials
Does Phenogyrabluthefentanoel (Zorgrab) work? Investigate: Carry out a clinical trial.
                   Placebo    Drug Treatment
No Effect          N00        N0T
Positive Effect    N+0        N+T
N+0 = “The placebo effect”
N+T – N+0 = “The treatment effect”
The hypothesis is that the difference in differences has mean zero.

A Test of Independence
In the credit card example, are Own/Rent and Accept/Reject independent?
Hypothesis: Prob(Ownership) and Prob(Acceptance) are independent.
Formal hypothesis, based only on the laws of probability: Prob(Own,Accept) = Prob(Own)Prob(Accept) (and likewise for the other three possibilities).
Rejection region: Joint frequencies that do not look like the products of the marginal frequencies.

Contingency Table Analysis
The Data: Frequencies
         Reject    Accept     Total
Rent      1,845     5,469     7,314
Own       1,100     5,030     6,130
Total     2,945    10,499    13,444
Step 1: Convert to actual proportions by dividing each frequency by the total, 13,444.

Independence Test
Step 2: Expected proportions assuming independence: If the factors are independent, then the joint proportions should equal the product of the marginal proportions.
[Rent,Reject] = P(Rent) x P(Reject)
[Rent,Accept] = P(Rent) x P(Accept)
[Own,Reject] = P(Own) x P(Reject)
[Own,Accept] = P(Own) x P(Accept)

Comparing Actual to Expected

When is the Chi Squared Large?
Critical values come from the chi squared table. Degrees of freedom = (R-1)(C-1).
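The independence test for the credit data can be sketched as follows. Expected counts are row total times column total divided by the grand total, and the statistic sums (Observed - Expected)²/Expected over the cells. The assignment of the four counts to cells (renters: 1845 rejected, 5469 accepted; owners: 1100 rejected, 5030 accepted) is this example's reading of the evidence slide, and the function name is introduced here.

```python
def chi_squared_independence(table):
    """Chi squared statistic and degrees of freedom for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total   # count implied by independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Rent, Own. Columns: Reject, Accept.
credit = [[1845, 5469],
          [1100, 5030]]

stat, df = chi_squared_independence(credit)
critical_5pct = 3.84   # chi squared threshold with 1 degree of freedom at 5%
print(f"chi squared[{df}] = {stat:.2f}")
print("reject independence" if stat > critical_5pct else "do not reject independence")
```

The same function applies to larger tables, such as a 3 by 4 income by travel mode table, with the critical value taken from the row for (R-1)(C-1) degrees of freedom.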

Analyzing Default
[Table: cross tabulation of DEFAULT by OWNRENT, with totals]
Do renters default more often (at a different rate) than owners? To investigate, we study the cardholders (only).

Hypothesis Test

Multiple Choices: Travel Mode
210 travelers between Sydney and Melbourne.
4 available modes: air, train, bus, car.
Among the observed variables is income. Does income help to explain mode choice?
Hypothesis: Mode choice and income are independent.

Travel Mode Choices

Travel Mode Choices and Income
[Table: Travel MODE (AIR, TRAIN, BUS, CAR) by INCOME (LOW, MEDIUM, HIGH), with row and column totals]

Contingency Table
[Table: Travel MODE (AIR, TRAIN, BUS, CAR) by INCOME (LOW, MEDIUM, HIGH), with row and column totals]
Assuming independence, P(Income,Mode) = P(Income) x P(Mode).

Computing Chi Squared
For our transport mode problem, R = 3, C = 4, so DF = 2x3 = 6. The critical value comes from the chi squared table with 6 degrees of freedom. The hypothesis of independence is rejected.
