Inferential statistics Study a sample Conclude about the population Two processes: Estimation (Point or Interval) Hypothesis testing
Objective of hypothesis testing Detect differences between groups Between men and women in taking alcohol Between rural and urban in incomes Detect associations between variables Between smoking and lung cancer Between birth weight and maternal height
Hypothesis testing Standard statistical procedure Make judgments about unknown population parameters based on sample estimates
Null hypothesis (Ho) Relates to particular hypothesis under study States, No relationship or difference A hypothesis to be rejected or not
Alternative hypothesis (H1) It disagrees with Ho States, there is a relationship or difference
Note: Ho and H1 Both are concerned with the population Statements referring to the population Conclusions based on the sample Possibility of errors (sampling errors)
Note: Ho and H1 You can never be 100% sure whether Ho is true or not You can ONLY say how likely the hypotheses are
Example Of interest is the familial aggregation of cardiovascular risk factors in general and lipid levels in particular. Suppose we know that the average cholesterol level in children is 175 mg%/mL. Identify a group of men who have died from heart disease within the past year. Measure the cholesterol levels of their offspring
Example The average cholesterol level of these children is 175 mg%/mL (Ho) The average cholesterol level of these children is not 175 mg%/mL (H1) Note: Two-sided
Referring to the mean Ho: μA = μB H1: μA ≠ μB (both statements are about population means; the sample mean x-bar is only the evidence used to test them)
Test statistics Measures used to reject or fail to reject the Ho Examples: Z-test (SND), t-test, Chi-square test, etc
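As an illustration of what a test statistic measures, here is a minimal sketch of a one-sample z statistic for the cholesterol example. Only the reference mean of 175 comes from the example; the sample mean (180), the population standard deviation (30) and the sample size (100) are hypothetical values chosen for the sketch, and the function name z_statistic is made up for this illustration.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample z statistic: how many standard errors the sample mean lies from mu0."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# Hypothetical sample: 100 offspring, mean cholesterol 180, assumed known SD of 30.
z = z_statistic(sample_mean=180.0, mu0=175.0, sigma=30.0, n=100)
print(round(z, 3))  # 1.667
```

The larger the absolute value of z, the less compatible the sample is with Ho. A t-test follows the same pattern but replaces the assumed known SD with the sample SD.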
Type I and II errors Decision to reject Ho has errors One may reject the ‘correct’ hypothesis
Type I and Type II errors
Decision on Ho    In reality: Ho is TRUE      In reality: Ho is FALSE
REJECT            Type I error (α)            Correct decision (1 – β)
DO NOT REJECT     Correct decision (1 – α)    Type II error (β)
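The four cells of this table can be put into numbers for a two-sided z-test. The sketch below is an illustration under assumptions, not a general recipe: the population SD (30), sample size (100) and the alternative true mean (180) are hypothetical, and rejection_probability is a helper name invented here.

```python
import math
from statistics import NormalDist


def rejection_probability(true_mean, mu0, sigma, n, alpha=0.05):
    """P(reject Ho) for a two-sided z-test when the population mean is true_mean."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under true_mean the z statistic is normal with this mean and SD 1.
    shift = (true_mean - mu0) / (sigma / math.sqrt(n))
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Ho is TRUE (mean really is 175): rejecting is a Type I error, probability alpha.
type_1 = rejection_probability(true_mean=175.0, mu0=175.0, sigma=30.0, n=100)

# Ho is FALSE (hypothetical true mean 180): rejecting is correct, probability 1 - beta.
power = rejection_probability(true_mean=180.0, mu0=175.0, sigma=30.0, n=100)
type_2 = 1 - power  # beta: failing to reject a false Ho

print(type_1, power, type_2)
```

With these assumed numbers the Type II error is large, which shows that not rejecting Ho is not the same as showing Ho is true.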
Example Of interest is the familial aggregation of cardiovascular risk factors in general and lipid levels in particular. Suppose we know that the average cholesterol level in children is 175 mg%/mL. Identify a group of men who have died from heart disease within the past year. Measure the cholesterol levels of their offspring
Example What would be Type I error?
Example Type I error: the probability of deciding that the offspring of men who have died from heart disease have an average cholesterol level not equal to 175 mg%/mL when in fact their average cholesterol level is 175 mg%/mL What would be Type II error? The probability of deciding that the offspring have normal cholesterol levels when in fact their average cholesterol level is not equal to 175 mg%/mL
Significance level English meaning: ‘Important’ Here: ‘Probably true’ (not due to chance) A finding may be true but NOT important The significance level shows how likely it is that results are due to chance (how rare results must be before chance is ruled out)
Significance level Probability value small enough for you to reject the null hypothesis Normally set at 5% IF Ho is true for the ENTIRE POPULATION, a sample leads to wrongly rejecting it only 5% of the time and to correctly not rejecting it 95% of the time
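One way to see what a 5% significance level means is to draw many samples from a population in which Ho really is true and count how often a two-sided z-test rejects it anyway. The simulation below is a rough sketch: the population SD (30), sample size (50), number of trials and random seed are all arbitrary assumptions, and false_rejection_rate is a name made up for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def false_rejection_rate(mu0=175.0, sigma=30.0, n=50, alpha=0.05, trials=2000, seed=1):
    """Share of samples drawn under a true Ho in which the z-test still rejects Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Ho is true here: every observation comes from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials


rate = false_rejection_rate()
print(rate)  # expected to be close to alpha = 0.05, not exactly equal
```

The printed rate should hover around 0.05, which is the long-run share of false rejections that the chosen significance level allows.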
Critical region/ value The critical region of a hypothesis test is the set of all outcomes which, if they occur, cause the null hypothesis to be rejected and the alternative hypothesis accepted
Critical region/ value The value of a test statistic at or beyond which we will reject Ho A boundary that is “improbable” if the null hypothesis is true
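For a two-sided z-test, the critical value and the critical region can be computed directly from the standard normal distribution. This is a minimal sketch assuming a z-test and the usual α = 0.05; the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha):
    """Boundary z such that P(|Z| >= z) = alpha when Ho is true."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def in_critical_region(z, alpha=0.05):
    """True when the test statistic is at or beyond the critical value (reject Ho)."""
    return abs(z) >= two_sided_critical_value(alpha)


crit = two_sided_critical_value(0.05)
print(round(crit, 2))            # 1.96
print(in_critical_region(2.5))   # True: reject Ho
print(in_critical_region(1.0))   # False: do not reject Ho
```

Other tests (t, Chi-square) use the same idea with critical values taken from their own distributions.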
p-value IF Ho is TRUE, it is the probability of obtaining a test statistic value as or more extreme than the observed test statistic value Loosely: the probability that the obtained results are due to chance IF Ho is true It is compared with α, the accepted probability of a Type I error
p-value Large p-values (p > 0.05): the data are consistent with Ho (Ho is not rejected) Small p-values (p < 0.05): evidence for H1 (= existence of a difference or relationship) p < 0.01: very strong evidence in favour of H1 (there is a difference) In statistical software, indicated by “Sig.”
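The definition of the p-value can be sketched for a two-sided z-test as the area in both tails of the standard normal distribution beyond the observed statistic. The observed z of 5/3 is a hypothetical value (for example, a sample mean of 180 against 175 with SD 30 and n = 100), and two_sided_p_value is a name invented for this sketch.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) under Ho: chance of a statistic as or more extreme than observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
p = two_sided_p_value(5 / 3)  # hypothetical observed z statistic
print(round(p, 3))
print("reject Ho" if p < alpha else "do not reject Ho")
```

Because this p is above 0.05, Ho would not be rejected at the 5% level for these assumed numbers, even though the sample mean differs from 175.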
Statistical tests Mathematical formulae that produce p-values to allow investigators to assess the likelihood that chance accounts for the results observed in the study