Hypothesis testing. Parametric tests


Hypothesis testing. Parametric tests Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine

Outline: Statistical inference; Hypothesis testing; Type I and type II errors; Student t-test; ANOVA; Parametric vs non-parametric tests

Importance of biostatistics
Diabetes type 2 study. Experimental group: mean blood sugar level 103 mg/dl. Control group: mean blood sugar level 107 mg/dl.
Pancreatic cancer study. Experimental group: 1-year survival rate 23%. Control group: 1-year survival rate 20%.
Is there a difference? Statistics are needed to quantify differences that are too small to recognize through clinical experience alone.

Statistical inference
Diabetes type 2 study. Experimental group: mean blood sugar level 103 mg/dl. Control group: mean blood sugar level 107 mg/dl.
Increased sample size. Experimental group: mean blood sugar level 99 mg/dl. Control group: mean blood sugar level 112 mg/dl.

Statistical inference assesses whether the means of two samples are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two samples or conditions.
Mean: the arithmetic average, calculated by adding up all scores and dividing by the number of scores. It is a hypothetical value that can be calculated for a data set; it does not have to be a value that is actually observed in the data set.
Assumptions of a t-test: the data come from a normally distributed (parametric) population; the distribution is not seriously skewed; there are no outliers; the samples are independent.
Comparing the mean between 2 samples or conditions (population means µ1 and µ2, sample means X1 and X2): if the 2 means are statistically different, then the samples are likely to be drawn from 2 different populations, i.e. they really are different.

Statistical inference
Diabetes type 2 study. Experimental group: mean blood sugar level 103 mg/dl. Control group: mean blood sugar level 107 mg/dl.
Increased sample size. Experimental group: mean blood sugar level 105 mg/dl. Control group: mean blood sugar level 106 mg/dl.

Statistical inference: comparing the mean between 2 samples or conditions. If 2 samples are taken from the same population (one population mean µ, sample means X1 and X2), then they should have fairly similar means.

Hypothesis testing The general idea of hypothesis testing involves: Making an initial assumption; Collecting evidence (data); Based on the available evidence (data), deciding whether to reject or not reject the initial assumption. Every hypothesis test — regardless of the population parameter involved — requires the above three steps.

Criminal trial The criminal justice system assumes the defendant is innocent until proven guilty. That is, our initial assumption is that the defendant is innocent. In the practice of statistics, we make our initial assumption when we state our two competing hypotheses: the null hypothesis (H0) and the alternative hypothesis (HA). Here, our hypotheses are H0: the defendant is not guilty (innocent), and HA: the defendant is guilty. In statistics, we always assume the null hypothesis is true. That is, the null hypothesis is always our initial assumption.

Null hypothesis – H0 This is the hypothesis under test, denoted as H0. The null hypothesis is usually stated as the absence of a difference or an effect. The null hypothesis says there is no effect. The null hypothesis is rejected if the significance test shows the data are inconsistent with the null hypothesis.

Alternative hypothesis – H1 This is the alternative to the null hypothesis. It is denoted as H', H1, or HA. It is usually the complement of the null hypothesis. If, for example, the null hypothesis says two population means are equal, the alternative says the means are unequal.

Criminal trial The prosecution team then collects evidence with the hopes of finding sufficient evidence to make the assumption of innocence refutable. In statistics, the data are the evidence. The jury then makes a decision based on the available evidence: If the jury finds sufficient evidence — beyond a reasonable doubt — to make the assumption of innocence refutable, the jury rejects H0 and deems the defendant guilty. We behave as if the defendant is guilty. If there is insufficient evidence, then the jury does not reject H0. We behave as if the defendant is innocent.

Making the decision Recall that it is either likely or unlikely that we would observe the evidence we did given our initial assumption. If it is likely, we do not reject the null hypothesis. If it is unlikely, then we reject the null hypothesis in favor of the alternative hypothesis. Effectively, then, making the decision reduces to determining likely or unlikely.

Making the decision In statistics, there are two ways to determine whether the evidence is likely or unlikely given the initial assumption: We could take the critical value approach (favored in many of the older textbooks). Or, we could take the P-value approach (what is used most often in research, journal articles, and statistical software).
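The two approaches lead to the same decision. A minimal Python sketch, assuming a two-sided z-test at α = 0.05 and a hypothetical test statistic of z = 2.1 (the statistic and the function name are illustrative, not taken from the slides):

```python
from statistics import NormalDist

def z_test_decisions(z, alpha=0.05):
    """Decide a two-sided z-test by the critical value and by the P-value."""
    std_normal = NormalDist()
    # Critical value approach: reject if |z| exceeds the cutoff for alpha.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_critical = abs(z) > critical
    # P-value approach: reject if the two-sided tail probability is below alpha.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha
    return critical, p_value, reject_by_critical, reject_by_p

# z = 2.1 is a hypothetical test statistic chosen for illustration.
critical, p_value, by_crit, by_p = z_test_decisions(z=2.1)
print(f"critical value = {critical:.3f}, P = {p_value:.4f}")
print(f"reject H0 (critical value): {by_crit}, reject H0 (P-value): {by_p}")
```

Both rules always agree, because the P-value falls below α exactly when the statistic lies beyond the critical value.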

Making the decision Suppose we find a difference between two groups in survival: patients on a new drug have a survival of 15 months; patients on the old drug have a survival of 18 months. So, the difference is 3 months. Do we accept or reject the hypothesis of no true difference between the groups (the two drugs)? Is a difference of 3 a lot, statistically speaking – a huge difference that is rarely seen? Or is it not much – the sort of thing that happens all the time?

Probability A measure of the likelihood that a particular event will happen. It is expressed by a value between 0 and 1, where 0.0 means the event cannot happen and 1.0 means it is sure to happen. First, note that we talk about the probability of an event, but what we measure is the rate in a group. If we observe that 5 babies in every 1 000 have congenital heart disease, we say that the probability of a (single) baby being affected is 5 in 1 000 or 0.005.

Making the decision A statistical test tells you how often you would get a difference of 3 or more, simply by chance, if the null hypothesis is correct, i.e. there is no real difference between the two groups. Suppose the test is done and its result is P = 0.32. This means that you would get a difference at least this large quite often just by the play of chance (32 times in 100) even when there is in reality no true difference between the groups.

Making the decision On the other hand, if we did the statistical analysis and found P = 0.0001, we would say that, with no real difference between the groups, you would get a difference as big as 3 by the play of chance only 1 time in 10 000. That is so rare that we reject our hypothesis of no difference: there is something different about the new therapy.
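The idea of "how often would chance alone produce a difference this large" can be simulated directly. The sketch below uses a permutation test, which is not the parametric test discussed in these slides but makes the meaning of the P-value concrete. The survival times are invented for illustration so that the group means are 15 and 18 months.

```python
import random

# Hypothetical survival times in months (invented for illustration only).
new_drug = [12, 14, 15, 16, 18]   # mean 15
old_drug = [15, 17, 18, 19, 21]   # mean 18

def mean_difference(a, b):
    """Absolute difference between the two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Share of random relabelings with a difference at least as large as observed."""
    rng = random.Random(seed)
    observed_diff = mean_difference(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        if mean_difference(pooled[:len(a)], pooled[len(a):]) >= observed_diff - 1e-9:
            hits += 1
    return hits / n_shuffles

observed = mean_difference(new_drug, old_drug)
p_value = permutation_p_value(new_drug, old_drug)
print(f"observed difference = {observed} months, P = {p_value:.3f}")
```

The printed P is the fraction of shuffles in which chance alone produced a difference of 3 months or more.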

Hypothesis testing Somewhere between 0.32 and 0.0001 we may not be sure whether to reject the null hypothesis or not. By convention, we reject the null hypothesis when, if it were true, a result at least as extreme as the one we got would have happened less than 5 times in 100 by chance. This is the conventional cutoff of 5%, or P < 0.05. The cutoff is commonly used but arbitrary, i.e. there is no particular reason to use 0.05 rather than 0.06 or 0.048.

Hypothesis testing
Decision | Reject null hypothesis | Do not reject null hypothesis
Null hypothesis is true | Type I error | No error
Null hypothesis is false | No error | Type II error

Type I and II errors A type I error is the incorrect rejection of a true null hypothesis (also known as a false positive finding). The probability of a type I error is denoted by the Greek letter α (alpha). A type II error is incorrectly retaining a false null hypothesis (also known as a false negative finding). The probability of a type II error is denoted by the Greek letter β (beta).

Level of significance Level of significance (α) – the threshold for declaring if a result is significant. If the null hypothesis is true, α is the probability of rejecting the null hypothesis. α is decided as part of the research design, while P-value is computed from data. α = 0.05 is most commonly used. Small α value reduces the chance of Type I error, but increases the chance of Type II error. Trade-off based on the consequences of Type I (false-positive) and Type II (false-negative) errors.
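That α is the probability of rejecting a true null hypothesis can be checked by simulation. A minimal sketch, assuming a two-sided z-test with known standard deviation; the population values, sample size, and number of simulated experiments are hypothetical choices for illustration:

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
# Hypothetical population and sample size; the null hypothesis (mean = MU0) is TRUE here.
MU0, SIGMA, N = 100.0, 10.0, 25
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96

def rejects_null(sample):
    """Two-sided z-test of H0: population mean equals MU0."""
    z = (mean(sample) - MU0) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT

rng = random.Random(1)
n_experiments = 2000
# Every rejection is a false positive, because the data are drawn under H0.
false_positives = sum(
    rejects_null([rng.gauss(MU0, SIGMA) for _ in range(N)])
    for _ in range(n_experiments)
)
type_i_rate = false_positives / n_experiments
print(f"Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

The simulated rate should land close to α; lowering `ALPHA` lowers it, at the cost of more Type II errors when a real effect exists.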

Power Power – the probability of rejecting a false null hypothesis. Statistical power is inversely related to β or the probability of making a Type II error (power is equal to 1 – β). Power depends on the sample size, variability, significance level and hypothetical effect size. You need a larger sample when you are looking for a small effect and when the standard deviation is large.
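How power depends on sample size, variability, significance level and effect size can be sketched for the simplest case. The function below assumes a two-sided one-sample z-test with known standard deviation, which is a simplification of the t-tests covered later; the function name and the numbers are illustrative.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a true mean shift of `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true shift equals `effect`.
    shift = effect * n ** 0.5 / sigma
    # Probability that the statistic falls in the upper or the lower rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical effect of 3 units with a standard deviation of 10.
for n in (10, 30, 100):
    print(f"n = {n:3d}: power = {z_test_power(effect=3.0, sigma=10.0, n=n):.3f}")
```

Power rises with the sample size and with the effect size, and falls as the standard deviation grows; with no effect at all it equals α.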

Common misconceptions The P-value is different from the level of significance α. The P-value is computed from data, while α is decided as part of the experimental design. The P-value is not the probability of the null hypothesis being true. The P-value answers the following question: if the null hypothesis is true, what is the chance that random sampling will lead to a difference as large as or larger than that observed in the study? A statistically significant result does not necessarily mean that the finding is clinically important. Look at the size of the effect and its precision. Lack of difference may be a meaningful result too!

Choosing a statistical test Choice of a statistical test depends on: Level of measurement for the dependent and independent variables; Number of groups or dependent measures; Number of units of observation; Type of distribution; The population parameter of interest (mean, variance, differences between means and/or variances).

Choosing a statistical test
Multiple comparison – two or more data sets to be analyzed: repeated measurements made on the same individuals, or entirely independent samples.
Degrees of freedom – the number of scores, items, or other units in the data set that are free to vary.
One- and two-tailed tests – a one-tailed test of significance is used for a directional hypothesis; two-tailed tests are used in all other situations.
Sample size – the number of cases on which data have been obtained. Which of the basic characteristics of a distribution are more sensitive to the sample size?

Student t-test

1-sample t-test Comparison of a sample mean with a population mean. It is known that the weight of young adult males has a mean value of 70.0 kg with a standard deviation of 4.0 kg. Thus the population mean is µ = 70.0 and the population standard deviation is σ = 4.0. Data from a random sample of 28 males of similar age but with a specific enzyme defect: mean body weight of 67.0 kg and sample standard deviation of 4.2 kg. Question: does the studied group have a significantly lower body weight than the general population?
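The question can be worked through numerically. The sketch below uses the sample standard deviation, as a 1-sample t-test does, and the critical value approach with a one-sided α = 0.05. The critical value 1.703 for 27 degrees of freedom is taken from a standard t table and is not given in the slides. Since σ is stated as known, a z-test would also be defensible.

```python
import math

mu0 = 70.0          # population mean weight (kg)
sample_mean = 67.0  # mean weight in the enzyme-defect sample (kg)
sample_sd = 4.2     # sample standard deviation (kg)
n = 28              # sample size

# Standard error of the mean and the t statistic.
std_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / std_error
df = n - 1

# One-sided critical value for alpha = 0.05 and df = 27 (from a t table; an assumption here).
T_CRIT = 1.703
reject_null = t_stat < -T_CRIT

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```

The statistic lies far beyond the critical value, so under these assumptions the sample weighs significantly less than the general population.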

2-sample t-test
Aim: Compare two means.
Example: Comparing pulse rate in people taking two different drugs.
Assumption: Both data sets are sampled from Gaussian distributions with the same population standard deviation.
Effect size: Difference between two means.
Null hypothesis: The two population means are identical.
Meaning of P value: If the two population means are identical, what is the chance of observing such a difference (or a bigger one) between means by chance alone?
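A minimal sketch of the 2-sample t statistic with a pooled variance, which is what the equal standard deviation assumption allows. The pulse rates and the function name are invented for illustration.

```python
import math
from statistics import mean, variance

# Hypothetical pulse rates (beats per minute), invented for illustration.
drug_a = [72, 75, 78, 71, 74]
drug_b = [80, 82, 79, 85, 84]

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal population standard deviations."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(a) - mean(b)) / std_error
    return t, n_a + n_b - 2

t_stat, df = pooled_t_statistic(drug_a, drug_b)
effect_size = mean(drug_a) - mean(drug_b)
print(f"effect size = {effect_size} bpm, t = {t_stat:.3f}, df = {df}")
```

For 8 degrees of freedom the two-sided 5% critical value from a t table is 2.306, so a statistic of this magnitude would lead to rejecting the null hypothesis of identical population means.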

Paired t-test
Aim: Compare a continuous variable before and after an intervention.
Example: Comparing pulse rate before and after taking a drug.
Assumption: The population of paired differences is Gaussian.
Effect size: Mean of the paired differences.
Null hypothesis: The population mean of paired differences is zero.
Meaning of P value: If there is no difference in the population, what is the chance of observing such a mean paired difference (or a bigger one) by chance alone?
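The paired test reduces to a 1-sample t-test on the differences. A minimal sketch, with hypothetical before and after pulse rates for five people:

```python
import math
from statistics import mean, stdev

# Hypothetical pulse rates for the same five people (invented for illustration).
before = [78, 82, 75, 90, 85]
after = [74, 80, 73, 84, 82]

def paired_t_statistic(x, y):
    """t statistic for H0: the population mean of the paired differences (y - x) is zero."""
    diffs = [b - a for a, b in zip(x, y)]
    mean_diff = mean(diffs)                          # effect size
    std_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_error, mean_diff, len(diffs) - 1

t_stat, mean_diff, df = paired_t_statistic(before, after)
print(f"mean paired difference = {mean_diff} bpm, t = {t_stat:.3f}, df = {df}")
```

Only the differences enter the calculation, which is why the Gaussian assumption applies to the differences rather than to the raw measurements.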

One-way ANOVA
Aim: Compare three or more means.
Example: Comparing pulse rate in 3 groups of people, each group taking a different drug.
Assumption: All data sets are sampled from Gaussian distributions with the same population standard deviation.
Effect size: Fraction of the total variation explained by variation among group means.
Null hypothesis: All population means are identical.
Meaning of P value: If the population means are identical, what is the chance of observing such differences (or bigger ones) between means by chance alone?
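The F statistic and the effect size named above (the fraction of total variation explained by the group means, often called eta squared) can be computed from sums of squares. A minimal sketch with hypothetical pulse rates for three drug groups:

```python
from statistics import mean

# Hypothetical pulse rates for three drug groups (invented for illustration).
groups = {
    "drug A": [70, 72, 74],
    "drug B": [75, 77, 79],
    "drug C": [80, 82, 84],
}

def one_way_anova(samples):
    """Return the F statistic, eta squared, and the two degrees of freedom."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    # Variation among group means (between) and inside the groups (within).
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared, df_between, df_within

f_stat, eta_squared, df_b, df_w = one_way_anova(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta_squared:.3f}")
```

A large F means the group means differ by more than the within-group scatter would explain; the P-value then comes from the F distribution with these two degrees of freedom.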

Parametric and non-parametric tests
Parametric test – assumes that the variable we have measured in the sample is normally distributed in the population to which we plan to generalize our findings.
Non-parametric test – distribution free; makes no assumption about the distribution of the variable in the population.

Parametric and non-parametric tests
Type of test | Non-parametric | Non-parametric | Parametric
Scale | Nominal | Ordinal | Ordinal, Interval, Ratio
1 group | χ2 goodness of fit test | Wilcoxon signed rank test | 1-sample t-test
2 unrelated groups | χ2 test | Mann–Whitney U test | 2-sample t-test
2 related groups | McNemar test | | Paired t-test
K unrelated groups | | Kruskal–Wallis H test | ANOVA
K related groups | | Friedman matched samples test | ANOVA with repeated measurements