Basic Statistics II. Significance/hypothesis tests.


RCT comparing drug A and drug B for the treatment of hypertension. 50 patients allocated to A, 50 patients allocated to B. Outcome = systolic BP at 3 months.

Results: Group A mean = 145, sd = 9.9; Group B mean = 135, sd = 10.0.

Null hypothesis: "μ(A) = μ(B)" [i.e. the difference equals 0]. Alternative hypothesis: "μ(A) ≠ μ(B)" [i.e. the difference does not equal 0]. [where μ = population mean]

Statistical problem: when can we conclude that the observed difference mean(A) - mean(B) is large enough to suspect that μ(A) - μ(B) is not zero?

P-value: the probability of obtaining data at least as extreme as those observed if the null hypothesis were true [e.g. if there were no difference in systolic BP between the two groups].

How do we evaluate the probability?

Test statistic: a numerical value that can be compared with a known statistical distribution. It is expressed in terms of the observed data and the data expected if the null hypothesis were true.

Test statistic = [mean(A) - mean(B)] / sd[mean(A) - mean(B)], i.e. the difference in means divided by its standard error. Under the null hypothesis this ratio follows a Normal distribution with mean = 0 and sd = 1.

Hypertension example: Test statistic = [mean(A) - mean(B)] / sd[mean(A) - mean(B)] = [145 - 135] / 1.99 = 5.0 → p < 0.001
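
A minimal sketch of this calculation in Python, using only the summary statistics reported above (means, SDs, and 50 patients per group); the Normal approximation mirrors the formula on the slide:

```python
from math import sqrt
from scipy import stats

# Summary statistics from the trial slides
mean_a, sd_a, n_a = 145.0, 9.9, 50
mean_b, sd_b, n_b = 135.0, 10.0, 50

# Standard error of the difference between the two means
se_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)   # ~1.99

# Test statistic: difference in means divided by its standard error
z = (mean_a - mean_b) / se_diff                 # ~5.0

# Two-sided p-value from the standard Normal distribution
p = 2 * stats.norm.sf(abs(z))                   # < 0.001
print(f"z = {z:.2f}, p = {p:.2g}")
```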

Interpretation Drug B results in lower systolic blood pressure in patients with hypertension than does Drug A

Two-sample t-test Compares two independent groups of Normally distributed data
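
If only summary statistics are available, SciPy can also run the two-sample t-test directly from them; a sketch using the hypertension figures (the t distribution gives a very similar p-value to the Normal approximation above):

```python
from scipy import stats

# Two-sample t-test from summary statistics (hypertension example)
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=145.0, std1=9.9, nobs1=50,
    mean2=135.0, std2=10.0, nobs2=50,
)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```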

Significance test example I

Null hypothesis: "μ(A) = μ(B)" [i.e. the difference equals 0]. Alternative hypothesis: "μ(A) ≠ μ(B)" [i.e. the difference does not equal 0]. Two-sided test.

Null hypothesis: "μ(A) = μ(B) or μ(A) < μ(B)". Alternative hypothesis: "μ(A) > μ(B)". One-sided test.

A one-sided test is only appropriate if a difference in the opposite direction would have the same meaning or result in the same action as no difference
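
If a one-sided test really is justified, the direction must be stated explicitly in software. A sketch using SciPy's `alternative` argument on hypothetical blood-pressure readings (the arrays below are simulated for illustration, not the trial data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
bp_a = rng.normal(145, 10, 50)   # hypothetical systolic BP readings, drug A
bp_b = rng.normal(135, 10, 50)   # hypothetical systolic BP readings, drug B

# Two-sided test: H1 is mu(A) != mu(B)
print(stats.ttest_ind(bp_a, bp_b, alternative="two-sided"))

# One-sided test: H1 is mu(A) > mu(B)
print(stats.ttest_ind(bp_a, bp_b, alternative="greater"))
```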

Paired-sample t-test Compares two dependent groups of Normally distributed data

Paired-sample t-test Mean daily dietary intake of 11 women measured over 10 pre-menstrual and 10 post-menstrual days

Dietary intake example: Pre-menstrual (n = 11): mean = 6753 kJ, sd = 1142. Post-menstrual (n = 11): mean = 5433 kJ, sd = 1217. Difference: mean = 1320 kJ, sd = 367.

Dietary intake example: Test statistic = 1320 / [367 / sqrt(11)] = 11.9 → p < 0.001
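
A sketch reproducing the paired test statistic from the reported mean and SD of the within-woman differences:

```python
from math import sqrt
from scipy import stats

n = 11                  # number of women (paired observations)
mean_diff = 1320.0      # mean of the (pre - post) differences, kJ
sd_diff = 367.0         # sd of the differences

# Paired t statistic: mean difference / standard error of the differences
t = mean_diff / (sd_diff / sqrt(n))        # ~11.9

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p = 2 * stats.t.sf(abs(t), df=n - 1)       # < 0.001
print(f"t = {t:.1f}, p = {p:.2g}")
```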

Dietary intake example: dietary intake during the pre-menstrual period was significantly greater than that during the post-menstrual period.

The equivalent non-parametric tests: the Mann-Whitney U-test (for two independent groups) and the Wilcoxon matched-pairs signed-rank test (for paired data).

Non-parametric tests are based on the ranks of the data and use more complicated formulae, so a computer package is recommended.
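
For illustration, both tests are one call each in SciPy; the data arrays below are simulated and purely hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(145, 10, 50)    # hypothetical independent samples
group_b = rng.normal(135, 10, 50)

# Mann-Whitney U-test: two independent groups
print(stats.mannwhitneyu(group_a, group_b, alternative="two-sided"))

pre = rng.normal(6753, 1142, 11)     # hypothetical paired measurements
post = pre - rng.normal(1320, 367, 11)

# Wilcoxon matched-pairs signed-rank test: two dependent (paired) groups
print(stats.wilcoxon(pre, post))
```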

Significance test example II

Type I error: a significant result when the null hypothesis is in fact true; its probability is the significance level α (e.g. 0.05). Type II error: a non-significant result when the null hypothesis is in fact false; its probability is β. [Power = 1 - β, the probability of detecting a real difference]
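
Power is usually considered at the design stage. As an illustration of the idea (not part of the original slides), statsmodels can relate power, sample size, and effect size for a two-sample t-test; the standardised effect size of 0.5 below is an arbitrary choice:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a standardised effect of 0.5
# with a 5% two-sided Type I error rate and 80% power
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"n per group = {n_per_group:.0f}")      # roughly 64

# Power achieved with 50 patients per group for the same effect size
power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05)
print(f"power with n = 50 per group = {power:.2f}")
```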

The chi-square test: used to investigate the relationship between two qualitative variables, i.e. the analysis of cross-tabulations.

The chi-square test Compares proportions in two independent samples

Chi-square test example In an RCT comparing infra-red stimulation (IRS) with placebo on pain caused by osteoarthritis, 9/12 in IRS group ‘improved’ compared with 4/13 in placebo group

Chi-square test example. Observed frequencies:
Improve?    Yes    No
Placebo      4      9
IRS          9      3

Placebo: 4/13 = 31% improve; IRS: 9/12 = 75% improve.

Cross-tabulations The chi-square test tests the null hypothesis of no relationship between ‘group’ and ‘improvement’ by comparing the observed frequencies with those expected if the null hypothesis were true

Cross-tabulations: Expected frequency = (row total × column total) / grand total.

Chi-square test example (observed frequencies as above). Expected value for the observed '4' (Placebo, Yes) = 13 × 13 / 25 = 6.8.

Expected values:
Improve?    Yes    No
Placebo     6.8    6.2
IRS         6.2    5.8

Test statistic = Σ (observed frequency − expected frequency)² / expected frequency

Test statistic = Σ (O − E)² / E = (4 − 6.8)²/6.8 + (9 − 6.2)²/6.2 + (9 − 6.2)²/6.2 + (3 − 5.8)²/5.8 = 4.9 (expected values used to full precision) → p = 0.027
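
The same test in SciPy, with Yates' correction switched off so the result matches the hand calculation above:

```python
import numpy as np
from scipy import stats

#                    Improved  Not improved
observed = np.array([[4, 9],     # Placebo
                     [9, 3]])    # IRS

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.3f}")   # ~4.9, df = 1, p ~ 0.027
print(expected)                                              # expected frequencies
```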

Chi-square test example Statistically significant difference in improvement between the IRS and placebo groups

Small samples The chi-square test is valid if: at least 80% of the expected frequencies exceed 5 and all the expected frequencies exceed 1

Small samples If criterion not satisfied then combine or delete rows and columns to give bigger expected values

Small samples. Alternatively, use Fisher's exact test [calculates the probability of the observed table of frequencies, or more extreme tables, under the null hypothesis].
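
Fisher's exact test on the same 2×2 table is a single call in SciPy:

```python
from scipy import stats

# Rows: placebo, IRS; columns: improved, not improved
odds_ratio, p = stats.fisher_exact([[4, 9], [9, 3]])
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.3f}")
```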

Yates' correction improves the approximation of the discrete distribution of the test statistic by the continuous chi-square distribution.

Chi-square test with Yates' correction: subtract ½ from each |O − E| difference, giving Test statistic = Σ (|O − E| − ½)² / E.
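
For a 2×2 table SciPy applies Yates' correction by default, so the corrected statistic is simply:

```python
import numpy as np
from scipy import stats

observed = np.array([[4, 9], [9, 3]])

# correction=True (the default for 2x2 tables) applies Yates' continuity correction
chi2_yates, p_yates, dof, expected = stats.chi2_contingency(observed, correction=True)
print(f"chi-square (Yates) = {chi2_yates:.1f}, p = {p_yates:.3f}")
```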

Significance test example III

McNemar’s test Compares proportions in two matched samples

McNemar's test example: a 2×2 table cross-tabulating 'severe cold at age 14' (Yes/No) against 'severe cold at the earlier age' (Yes/No) for the same subjects. [Cell counts not shown in the transcript.]

McNemar's test example: Null hypothesis = the proportions saying 'yes' on the 1st and 2nd occasions are the same, i.e. the frequencies for 'yes, no' and 'no, yes' (the discordant pairs) are equal.

McNemar’s test Test statistic based on observed and expected ‘discordant’ frequencies Similar to that for simple chi-square test

McNemar's test example: Test statistic = 31.4 → p < 0.001. Significant difference between the two ages.
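
The cell counts for this example are not shown in the transcript. The sketch below uses statsmodels with illustrative counts only (chosen so that the discordant pairs roughly reproduce the quoted test statistic of 31.4):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired table: rows = severe cold at the earlier age (Yes/No),
# columns = severe cold at age 14 (Yes/No). Counts are illustrative only.
table = np.array([[215, 144],    # Yes earlier: Yes at 14, No at 14
                  [256, 707]])   # No earlier:  Yes at 14, No at 14

# exact=False uses the chi-square approximation; correction=False gives the
# simple form (b - c)^2 / (b + c) based on the discordant counts b and c
result = mcnemar(table, exact=False, correction=False)
print(f"chi-square = {result.statistic:.1f}, p = {result.pvalue:.2g}")
```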

Significance test example IV

Comparison of means: for 2 groups, the two-sample t-test; for 3 or more groups, ANOVA (analysis of variance).

One-way analysis of variance. Example: assessing the effect of treatment on the stress levels of a cohort of 60 subjects in 3 age groups (15-25, 26-45, and an older group). Stress measured on a scale of 0-100.

Stress levels
Group                 Mean (SD)
15-25 (n=20)          52.8 (11.2)
26-45 (n=20)          33.4 (15.0)
Older group (n=20)    35.6 (11.7)

Graph of stress levels

ANOVA (numerical entries not shown in the transcript)
Source            Sum of squares    df    Mean square    F    Sig.
Between groups         -             -         -         -    <0.001
Within groups          -             -         -
Total                  -             -
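
The numerical entries of the ANOVA table are not shown in the transcript, but the F statistic can be reconstructed from the group summaries above (a sketch, assuming exactly the means, SDs, and n = 20 per group reported on the stress-levels slide):

```python
from scipy import stats

# Group summaries from the stress-levels slide
means = [52.8, 33.4, 35.6]
sds = [11.2, 15.0, 11.7]
n = 20                                    # subjects per group
k = len(means)
grand_mean = sum(means) / k               # valid because group sizes are equal

# Between-groups and within-groups sums of squares
ss_between = sum(n * (m - grand_mean) ** 2 for m in means)
ss_within = sum((n - 1) * s ** 2 for s in sds)

df_between, df_within = k - 1, k * (n - 1)
ms_between = ss_between / df_between
ms_within = ss_within / df_within

F = ms_between / ms_within                # ~13.9
p = stats.f.sf(F, df_between, df_within)  # < 0.001
print(f"F({df_between}, {df_within}) = {F:.1f}, p = {p:.2g}")
```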

Interpretation Significant difference between the three age-groups with respect to stress levels But what about the specific (pairwise) differences?

Stress levels
Group                 Mean (SD)
15-25 (n=20)          52.8 (11.2)
26-45 (n=20)          33.4 (15.0)
Older group (n=20)    35.6 (11.7)

Multiple comparisons: comparing each pair of means in turn gives a high probability of finding a significant result by chance. A multiple comparison method (e.g. Scheffé, Duncan, Newman-Keuls) makes the appropriate adjustment.

Scheffé's test (pairwise comparisons):
15-25 vs 26-45: p < 0.001
15-25 vs older group: p < 0.001
26-45 vs older group: p = 0.86

Stress levels
Group                 Mean (SD)
15-25 (n=20)          52.8 (11.2)
26-45 (n=20)          33.4 (15.0)
Older group (n=20)    35.6 (11.7)

Comparison of medians: for 2 groups, the Mann-Whitney U-test; for 3 or more groups, the Kruskal-Wallis test.

Kruskal-Wallis example: stress levels. Overall comparison of the 3 groups: p < 0.001.
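
A sketch of the Kruskal-Wallis test in SciPy; the stress scores below are simulated for illustration, not the original data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group1 = rng.normal(52.8, 11.2, 20)   # hypothetical stress scores, 15-25 group
group2 = rng.normal(33.4, 15.0, 20)   # hypothetical stress scores, 26-45 group
group3 = rng.normal(35.6, 11.7, 20)   # hypothetical stress scores, older group

# Kruskal-Wallis: non-parametric comparison of 3 or more groups
stat, p = stats.kruskal(group1, group2, group3)
print(f"H = {stat:.1f}, p = {p:.3g}")
```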

Multiple comparisons: there are no non-parametric equivalents of multiple comparison tests such as Scheffé's, so Bonferroni's correction is applied to multiple Mann-Whitney U-tests.

Bonferroni’s correction For k comparisons between means: multiply each p value by k

Mann-Whitney U-tests (pairwise comparisons):
15-25 vs 26-45: p < 0.001
15-25 vs older group: p < 0.001
26-45 vs older group: p = 0.68
Each p-value needs to be multiplied by 3 (Bonferroni's correction).
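
A sketch of Bonferroni-adjusted pairwise Mann-Whitney U-tests on simulated stress scores (each raw p-value is multiplied by the number of comparisons and capped at 1):

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
groups = {
    "15-25": rng.normal(52.8, 11.2, 20),    # hypothetical stress scores
    "26-45": rng.normal(33.4, 15.0, 20),
    "older": rng.normal(35.6, 11.7, 20),
}

pairs = list(combinations(groups, 2))
k = len(pairs)                              # 3 pairwise comparisons

for name_a, name_b in pairs:
    _, p_raw = stats.mannwhitneyu(groups[name_a], groups[name_b],
                                  alternative="two-sided")
    p_adj = min(1.0, p_raw * k)             # Bonferroni: multiply each p by k
    print(f"{name_a} vs {name_b}: raw p = {p_raw:.3g}, Bonferroni p = {p_adj:.3g}")
```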

Significance test example V