HYPOTHESIS TESTING
STATISTICAL INFERENCE It is the procedure by which an inference about a population is made on the basis of the results obtained from a sample drawn from that population.
STATISTICAL INFERENCE This can be achieved by: 1. Hypothesis testing 2. Estimation: interval estimation (confidence interval)
Hypothesis It is a statement about one or more populations. It is usually concerned with the parameters of the population about which the statement is made.
Purpose The purpose of hypothesis testing is to help the researcher or administrator in reaching a decision concerning a population by examining a sample from that population.
Research Hypothesis It is the assumption that motivates the research. When a health investigator seeks to understand or explain something, for example the effect of a toxin or a drug, he or she usually formulates the research question in the form of a hypothesis.
Statistical Hypothesis This is a hypothesis stated in a way that can be evaluated by an appropriate statistical technique.
Statistical Hypothesis We define five stages when carrying out a hypothesis test:
1. Define the null and alternative hypotheses under study.
2. Collect relevant data from a sample of individuals, then compute statistics from the sample (such as a mean or a proportion).
3. Calculate the value of the test statistic (the calculated value) specific to the null hypothesis (a value of Z, t, chi-square, ...).
4. Compare the value of the test statistic (the calculated value) with values from the table of a known probability distribution (such as the Z or t tables).
5. Interpret the P-value and the results.
Statistical hypothesis Null hypothesis (H0): the particular hypothesis under test; it is a statement of “no difference”. Alternative hypothesis (HA): the hypothesis that disagrees with the null hypothesis. The researcher then seeks evidence against the null hypothesis.
Example If we are interested in comparing smoking rates in men and women in the population, the null hypothesis would be: H0: smoking rates are the same in men and women in the population. We then define the alternative hypothesis (HA), which holds if the null hypothesis is not true. The alternative hypothesis relates more directly to the theory we wish to investigate. So, in the example, we might have: HA: the smoking rates are different in men and women in the population.
Errors There are two possible errors that lead to the wrong conclusion. Type I error: rejecting the null hypothesis when it is true, and so concluding that there is an effect when, in reality, there is none. Its probability is denoted by alpha (α), the level of significance; the 5%, 1%, and 0.1% levels (α = 0.05, 0.01, and 0.001) are often chosen. The selection depends on the particular problem. α = level of significance = P(reject H0 | H0 is true)
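The meaning of α as P(reject H0 | H0 is true) can be checked by simulation. The sketch below is only an illustration: the function name, the sample size of 16, the number of trials and the random seed are all assumptions chosen for the example. It draws many samples from a population in which H0 really is true and counts how often a two-sided Z test rejects it.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=16, trials=20000, seed=0):
    """Estimate P(reject H0 | H0 is true) for a two-sided Z test by simulation."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    mu0, sigma = 0.0, 1.0              # H0 is true: data really come from N(mu0, sigma)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:            # test statistic falls in the rejection region
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α, which is what the definition above says.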
Errors Type II error: not rejecting the null hypothesis when it is false, and so concluding that there is no effect when one really exists. Its probability is denoted by beta (β). In practice β is hard to bring below 5%; it can be reduced only by taking a sufficiently large sample and by other factors.
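How β shrinks as the sample grows can be sketched for the simplest case, a two-sided Z test with known σ. The function name and the numbers passed to it (a true difference of 0.5 with σ = 1) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

def type_two_error(delta, sigma, n, alpha=0.05):
    """Beta for a two-sided Z test when the true mean differs from the
    hypothesized mean by delta (sigma known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma   # mean of Z when the alternative is true
    # H0 is (wrongly) not rejected when -z_crit <= Z <= z_crit
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

for n in (10, 30, 100):
    print(n, round(type_two_error(delta=0.5, sigma=1.0, n=n), 3))
```

With everything else held fixed, each increase in n gives a smaller β, which matches the statement that a large enough sample is the main practical way to reduce the Type II error.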
P-value It is the smallest value of α for which H0 can be rejected. It therefore gives a more precise statement than the alpha level alone: instead of saying only that the test statistic is significant or not, we report the exact probability, computed assuming H0 is true, of obtaining a result at least as extreme as the one observed.
Test Statistic It is a mathematical expression of the sample values that provides a basis for testing a statistical hypothesis. The result of the test determines whether we accept the null hypothesis, so that HA is rejected, or reject the null hypothesis, so that HA is accepted.
Test statistic It uses the sample data to reach a decision to reject or to accept the null hypothesis. The general formula for a test statistic is: $\text{test statistic} = \frac{\text{relevant statistic} - \text{hypothesized parameter}}{\text{standard error of the relevant statistic}}$
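For a Z statistic the P-value can be computed directly from the standard normal distribution. This is a minimal sketch for the two-sided case; the function name is an assumption made for the example.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) under H0, where Z is standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A statistic exactly at the 5% critical value has a P-value of about 0.05.
print(round(two_sided_p_value(1.96), 4))
# A statistic further out in the tail has a smaller P-value.
print(two_sided_p_value(3.0))
```

H0 is rejected at level α exactly when the P-value is at most α, so reporting the P-value lets readers apply whichever α they prefer.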
Test statistic--Example $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Decision Rule It tells us to reject the null hypothesis if the test statistic falls in the rejection region, and to accept it if it falls in the acceptance region.
Decision Rule The critical values (tabulated values for the chosen α) that separate the acceptance and rejection regions depend on the level of significance α. If the value of the test statistic (the calculated value) falls in the rejection region, it is considered statistically significant. If it falls in the acceptance region, it is considered not statistically significant.
Decision Rule Whenever we reject a null hypothesis, there is always a possibility of a Type I error (rejecting H0 when it is true). This is why we should keep the probability of this error as small as possible.
Critical values The values of the test statistic, determined by α, that separate the rejection region from the acceptance region.
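For a Z test the critical values are usually read from a table, but they can also be computed from α. A small sketch, with a helper name chosen for illustration:

```python
from statistics import NormalDist

def z_critical(alpha, two_sided=True):
    """Critical value of the standard normal distribution for a given alpha."""
    tail_area = alpha / 2 if two_sided else alpha   # area left in each rejection tail
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01, 0.001):
    print(alpha,
          round(z_critical(alpha), 3),                   # two-sided: reject if |Z| exceeds this
          round(z_critical(alpha, two_sided=False), 3))  # one-sided: reject beyond this in one tail
```

For α = 0.05 this reproduces the familiar two-sided critical value of 1.96.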
Acceptance region A set of values of the test statistic leading to acceptance of the null hypothesis (the values of the test statistic not included in the critical, i.e. rejection, region).
Rejection region A set of values of the test statistic leading to rejection of the null hypothesis
Computed test statistic This should be computed and compared with the acceptance and rejection regions
Conclusion If H0 is rejected, we conclude that HA is true. If H0 is not rejected, we conclude that HA may still be true, but the evidence is not enough to accept HA.
Two sided test If the rejection region is divided between the two tails, the test is called a two-sided test. This is the case in the smoking example, where we have not specified any direction for the difference in smoking rates, i.e. we have not stated whether men have higher or lower rates than women in the population.
One sided test If the whole rejection region lies in one tail only, the test is called a one-sided test. The choice depends on the nature of the research question being asked by the researcher. When both large and small values will cause rejection of H0, a two-sided test is indicated. When either sufficiently “small” values only or sufficiently “large” values only will cause rejection of H0, a one-sided test is indicated.
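The difference between the two kinds of test shows up in where the rejection region sits. The sketch below applies the decision rule to a Z statistic; the function name and the tail labels ("two-sided", "greater", "less") are naming assumptions made for this example.

```python
from statistics import NormalDist

def reject_h0(z, alpha=0.05, tail="two-sided"):
    """Return True when the Z statistic falls in the rejection region."""
    std_normal = NormalDist()
    if tail == "two-sided":            # alpha split between both tails
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "greater":              # only large values reject H0
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "less":                 # only small values reject H0
        return z < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two-sided', 'greater' or 'less'")

# The same statistic can lead to different decisions depending on the test chosen.
for tail in ("two-sided", "greater", "less"):
    print(tail, reject_h0(1.8, tail=tail))
```

A statistic of 1.8 is not significant in a two-sided test at α = 0.05 but is significant in a one-sided test for large values, which is why the direction must be fixed by the research question before looking at the data.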
Student’s t-distribution
Student’s t-distribution In cases where the population variance $\sigma^2$ is unknown, we can use the sample variance $s^2$ as the best point estimate of the population variance $\sigma^2$.
Student’s t-distribution The test statistic then does not follow the standard normal distribution (Z distribution); it follows the t-distribution with n − 1 degrees of freedom.
Student’s t-distribution The most important characteristics of the t-distribution are:
1. It has a mean of zero.
2. It is symmetric around the mean.
3. It ranges from −∞ to +∞.
4. Compared with the standard normal distribution, the curve is less peaked with higher tails.
5. The t-distribution approaches the standard normal distribution as the degrees of freedom approach infinity.
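The last two characteristics can be checked numerically by evaluating the two density functions. The sketch below writes out the standard formula for the t density using only the Python standard library; the function names are illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal (Z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for df in (2, 5, 30, 1000):
    # height at the centre (peak) and at x = 3 (tail)
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("normal", round(normal_pdf(0), 4), round(normal_pdf(3), 4))
```

For small degrees of freedom the t curve is lower at the centre and higher in the tail than the normal curve, and both values move toward the normal ones as the degrees of freedom increase.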
Student’s t-distribution The formula for calculating the value of t: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
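t-table [Table: percentiles t0.90, t0.95, t0.975, t0.99 and t0.995 of the t-distribution, with one row per degrees of freedom (df) from 1 to 200; tabulated values not reproduced.]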
Single population mean, known population variance $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Single population mean with unknown population variance $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
Difference between two population means with known population variances $Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
Difference between two population means with unknown but assumed equal variances $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ (pooled variance) $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}}$
This test is used when we compare the means of 2 different samples. The sample size should be < 30 (small sample). Degrees of freedom: df = (n1 + n2) − 2. H0: (µ1 − µ2) = 0
Paired t-test $t = \frac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$, where $\bar{d} = \frac{\sum d}{n}$. It is used when we compare means from the same sample on 2 occasions. df = n − 1
Single population proportion $Z = \frac{\tilde{P} - P}{\sqrt{P(1-P)/n}}$
Difference between two population proportions $Z = \frac{(\tilde{P}_1 - \tilde{P}_2) - (P_1 - P_2)}{\sqrt{P_1(1-P_1)/n_1 + P_2(1-P_2)/n_2}}$ If the population proportions are unavailable, we use the pooled sample proportion $\bar{P}$ in their place with the same Z test: $\bar{P} = \frac{x_1 + x_2}{n_1 + n_2}$
Example A certain breed of rats shows a mean weight gain of 65 gm during the first 3 months of life. 16 of these rats were fed a new diet from birth until the age of 3 months. Their mean weight gain was 60.75 gm. If the population variance is 10 gm², is there reason to believe, at the 5% level of significance, that the new diet causes a change in the average amount of weight gained?
Answer H0: µ = 65, HA: µ ≠ 65. α = 0.05, so the critical value is $Z_{1-\alpha/2} = 1.96$. $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{60.75 - 65}{\sqrt{10}/\sqrt{16}} = -5.38$ Since the calculated value falls in the rejection region, we reject H0 and accept HA.
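The arithmetic of this answer can be reproduced in a few lines. The variable names are chosen for the example; the numbers are the ones given in the question.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 65.0        # hypothesized mean weight gain (gm)
x_bar = 60.75     # sample mean (gm)
variance = 10.0   # known population variance
n = 16
alpha = 0.05

z = (x_bar - mu0) / (sqrt(variance) / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value
reject = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject}")
```

The statistic is about −5.38, well beyond ±1.96, so it falls in the rejection region.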
Example Solve the example above again, assuming that the population variance is unknown and that the sample standard deviation is s = 3.84 gm.
Answer df = 15, so the critical values are $t_{1-\alpha/2} = t_{0.975} = \pm 2.1315$. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{60.75 - 65}{3.84/\sqrt{16}} = -4.43$ Since the calculated value falls in the rejection region, we reject H0 and accept HA.
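As a check on the arithmetic, the t statistic can be evaluated directly from the numbers in the example. The critical value is taken from the slide's table lookup rather than computed, because the Python standard library has no t-distribution; the variable names are illustrative.

```python
from math import sqrt

mu0 = 65.0       # hypothesized mean weight gain (gm)
x_bar = 60.75    # sample mean (gm)
s = 3.84         # sample standard deviation (gm)
n = 16
df = n - 1
t_crit = 2.1315  # tabulated t(0.975) for df = 15, as quoted in the answer

t = (x_bar - mu0) / (s / sqrt(n))
reject = abs(t) > t_crit

print(f"t = {t:.2f} with df = {df}, critical value = ±{t_crit}, reject H0: {reject}")
```

Evaluating the formula gives t of about −4.43, which lies outside ±2.1315, so H0 is rejected.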
Question In a study, two types of dental cement were used to hold a crown on a tooth cast. The amount of force, in foot pounds, required to pull each cemented crown from the cast was reported:

            Mean (x̄)   n     σ
Cement 1    45          50    4.1
Cement 2    42          50    3.4

1. Test the hypothesis that µ1 = µ2 at α = 0.05.
2. If σ1 and σ2 are unknown but assumed to be equal, test the same hypothesis if S1 = 6.2 and S2 = 5.2.
3. If σ1 and σ2 are unknown and unequal, test the same hypothesis.
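One way to work through this question is to compute the three test statistics from the summary values. The sketch below uses the formulas for known variances and for pooled variances; for the unequal-variance case it uses the usual statistic with separate sample variances, which is an assumption here because the slides do not give that formula. Variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the question (n = 50 for both cements).
mean1, mean2 = 45.0, 42.0
n1 = n2 = 50

# Known population standard deviations: Z statistic.
sigma1, sigma2 = 4.1, 3.4
z = (mean1 - mean2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Unknown but equal variances: pooled-variance t statistic.
s1, s2 = 6.2, 5.2
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_pooled = (mean1 - mean2) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Unknown and unequal variances (assumed formula: separate variances in the standard error).
t_unequal = (mean1 - mean2) / sqrt(s1**2 / n1 + s2**2 / n2)

print(f"Z = {z:.2f}")
print(f"pooled t = {t_pooled:.2f} with df = {df_pooled}")
print(f"unequal-variance t = {t_unequal:.2f}")
```

Z is about 3.98, which exceeds 1.96, so H0 is rejected when the variances are known. The two t statistics are both about 2.62 (they coincide because the sample sizes are equal) and are compared with the tabulated t value for the relevant degrees of freedom.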
Question In a dental clinic it is hypothesized that 90% of all 4-year-old children give no evidence of dental caries. In a study of 100 children, 82 gave no such evidence. Would you accept the quoted value of 90%? Use α = 0.05.
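A possible way to answer this question is with the Z statistic for a single population proportion. The sketch below assumes a two-sided test; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.90             # hypothesized population proportion
n = 100
p_sample = 82 / n     # observed sample proportion
alpha = 0.05

z = (p_sample - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # assumes a two-sided test
reject = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject}")
```

Z is about −2.67, which lies beyond −1.96, so under this two-sided test the quoted value of 90% would be rejected.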
Question Two communities were sampled to learn their positions concerning fluoridation prior to campaigns being launched. The results were: n1 = 110, n2 = 75, $\tilde{P}_1 = 0.52$, $\tilde{P}_2 = 0.55$. Do these two communities have equal proportions?
Question Consider the effectiveness of a sleeping drug for which the sleep of 10 patients was observed (in hours) during one night with the drug and one night with a placebo. The results are presented in the table below. Are the results of the two nights significantly different?
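This question can be approached with the Z test for two proportions, using the pooled sample proportion because the population proportions are not available. The sketch assumes a two-sided test at α = 0.05 (the question does not state a level); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 110, 75
p1, p2 = 0.52, 0.55
alpha = 0.05          # assumption: the question does not state a significance level

# Pooled proportion (x1 + x2) / (n1 + n2), with x = n * p recovered from the sample proportions.
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)

# Under H0 the hypothesized difference P1 - P2 is 0.
z = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) / n1 + p_pooled * (1 - p_pooled) / n2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit

print(f"pooled proportion = {p_pooled:.3f}, Z = {z:.2f}, reject H0: {reject}")
```

Z is about −0.40, well inside ±1.96, so under these assumptions there is not enough evidence that the two proportions differ.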
Patient No.   Placebo   Drug
1             5.2       6.1
2             7.9       7.0
3             3.9       8.2
4             4.3       7.6
5             4.3       6.5
6             5.4       8.4
7             4.2       6.9
8             6.1       6.7
9             3.8       7.4
10            6.3       5.8
Patient No.   Placebo   Drug   d = Drug − Placebo   d²
1             5.2       6.1     0.9                  0.81
2             7.9       7.0    −0.9                  0.81
3             3.9       8.2     4.3                 18.49
4             4.3       7.6     3.3                 10.89
5             4.3       6.5     2.2                  4.84
6             5.4       8.4     3.0                  9.00
7             4.2       6.9     2.7                  7.29
8             6.1       6.7     0.6                  0.36
9             3.8       7.4     3.6                 12.96
10            6.3       5.8    −0.5                  0.25
Total                          19.2                 65.70

$\bar{d} = 1.92$, $S_d = 1.79$
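The paired t-test for these data can be sketched as follows. Two placebo values (patients 5 and 8) do not appear in the extracted table; the values 4.3 and 6.1 used here are inferred from the d column and the totals, so treat them as an assumption. The critical value 2.2622 is the standard tabulated t(0.975) for df = 9, entered by hand because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Hours of sleep per patient; patients 5 and 8 placebo values are inferred (see above).
placebo = [5.2, 7.9, 3.9, 4.3, 4.3, 5.4, 4.2, 6.1, 3.8, 6.3]
drug    = [6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8]

d = [after - before for before, after in zip(placebo, drug)]   # d = drug - placebo
n = len(d)
d_bar = mean(d)
s_d = stdev(d)            # sample standard deviation (divides by n - 1)

mu_d = 0.0                # H0: no mean difference between the two nights
t = (d_bar - mu_d) / (s_d / sqrt(n))
df = n - 1
t_crit = 2.2622           # tabulated t(0.975), df = 9
reject = abs(t) > t_crit

print(f"d_bar = {d_bar:.2f}, S_d = {s_d:.2f}, t = {t:.2f}, df = {df}, reject H0: {reject}")
```

This reproduces the mean difference of 1.92 and the standard deviation of 1.79, and gives t of about 3.39. Since that exceeds 2.2622, the difference between the two nights is statistically significant at α = 0.05 for a two-sided test.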