Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)

Informal lecture objectives
- Objective 1: To enable the student to distinguish between observed data and the underlying tendencies which give rise to those data
- Objective 2: To understand the concept of random variation

- Objective 3: Describe how 'observed' values provide knowledge of the 'true' values using:
  a) tests of hypotheses about the true value
  b) confidence intervals, which give a range that includes the 'true' value with a specified probability.

Neural tube defects in Western Australia ( ) – hypothetical data

Hypothesis testing
1. Calculate the probability of getting an observation as extreme as, or more extreme than, the one observed if the stated hypothesis were true.
2. If this probability is very small, then either:
   a) something very unlikely has occurred; or
   b) the hypothesis is wrong.
3. It is then reasonable to conclude that the data are incompatible with the hypothesis.
- This probability is called a 'p-value'.

Remember!
- IMPORTANT: think of the implications. Rejecting H0 is of little use without a conclusion.
- p<0.05 is arbitrary; nothing special happens between p=0.049 and p=0.051.
- Very small or clearly non-significant p-values (e.g. p=0.6) are easy to interpret.
- False positive and false negative results.
- Statistical significance depends on sample size: flip a coin 3 times and the minimum possible p-value is 0.25 (i.e. 2 × 1/8).
- Statistically significant ≠ clinically important.
- P-values are widely used.
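
The coin-flip point can be checked with a small sketch (not part of the original slides; the function name and the convention of doubling the smaller tail are illustrative choices):

```python
from math import comb

def two_sided_p(heads, flips, p0=0.5):
    """Two-sided binomial p-value for a result at least as extreme as
    `heads` out of `flips`, if the true probability of heads is p0.
    Computed by doubling the smaller one-sided tail (one common convention)."""
    upper = sum(comb(flips, k) * p0**k * (1 - p0)**(flips - k)
                for k in range(heads, flips + 1))   # P(X >= heads)
    lower = sum(comb(flips, k) * p0**k * (1 - p0)**(flips - k)
                for k in range(0, heads + 1))       # P(X <= heads)
    return min(1.0, 2 * min(upper, lower))

# Even the most extreme result from 3 flips (3 heads out of 3) gives p = 0.25,
# so 3 flips can never reach p < 0.05: significance depends on sample size.
print(two_sided_p(3, 3))  # 0.25
```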

Conclusions - range of values
- Objective 3: Describe how 'observed' values help us towards a knowledge of the 'true' values by:
  a) allowing us to test hypotheses about the true value
  b) confidence intervals, which give a range that includes the 'true' value with a specified probability.

Any questions?

Estimation  In a study, we observe a 30% higher risk of TB in Warwick than in the rest of the UK  IRR of 1.3  H 0 ‘rejected’ (p=0.01)  But, what is our ‘best guess’ at the true excess risk?

Informally:
- Values outside the range [10% excess risk to 50% excess risk] are in some sense 'inconsistent' with the data.
- The range [10% excess risk to 50% excess risk] probably includes the true value.

The 95% confidence interval
- A range which we can be 95% certain includes the true value of the underlying tendency.
- The IRR for Warwick lies in (1.1, 1.5) with probability 95%.
- Centred on the observed value (our best guess at the real underlying value).
- So the observed value always falls inside the 95% confidence interval.

The 95% confidence interval
- Fortunately, the link between hypothesis tests and confidence intervals means that we don't have to calculate lots of p-values and check whether to reject each hypothesis. For this course, just use the 'error factor'.
- Simply calculate the 'observed value' and a second quantity called the 'error factor' (e.f.). Then:
  - (observed value ÷ e.f.) is called the lower 95% confidence limit (CL)
  - (observed value × e.f.) is called the upper 95% confidence limit (CL)
  - The full range between the lower and upper 95% CLs is called the 95% confidence interval.
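
A minimal sketch of the rule above (not from the slides): the confidence limits come from dividing and multiplying the observed value by the error factor. The slides never state how the error factor itself is computed; the form exp(2/√d), where d is the number of events, is an assumption here, chosen because it reproduces the error factors quoted in the worked examples that follow (about 1.33 for 50 cases and 1.15 for 200 cases).

```python
from math import exp, sqrt

def error_factor(events):
    """Approximate 95% error factor for a rate based on `events` observed cases.
    The form exp(2 / sqrt(events)) is assumed; the slides quote the resulting
    values (e.g. 1.33 for 50 cases) but not the formula."""
    return exp(2 / sqrt(events))

def confidence_limits(observed, ef):
    """Lower and upper 95% confidence limits: observed / e.f. and observed * e.f."""
    return observed / ef, observed * ef

print(round(error_factor(50), 2))                  # 1.33
print(confidence_limits(0.005, error_factor(50)))  # roughly (0.0038, 0.0066)
```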

An example  Observe 50 new cases of diabetes in a population of 2,000 people over 5 years.  Exposure = 2,000  5 = 10,000 person years  New cases = 50  Incidence = 50/10,000 = per person year = 5 per 1000 person years 

Diabetes example continued
- Incidence = 50/10,000 = 0.005 per person year = 5 per 1,000 person years
- Lower 95% CL = 0.005 ÷ 1.33 = 0.0038
- Upper 95% CL = 0.005 × 1.33 = 0.0067
- So our best estimate of the true incidence is 5 cases per 1,000 person years, and we are 95% certain that the range 3.8 to 6.7 cases per 1,000 person years includes the true rate.

As we get more data
- We get more and more sure about the underlying value: the e.f. gets smaller and the 95% CI narrower.
- Observe 200 new cases of diabetes in a population of 40,000 people over 1 year.
- Estimated rate = 200/40,000 = 0.005 (same as before)
- Lower 95% CL = 0.005 ÷ 1.15 = 0.0043
- Upper 95% CL = 0.005 × 1.15 = 0.0058
- Best estimate still 5 cases per 1,000 person years, but now 95% certain that the true rate lies between 4.3 and 5.8 cases per 1,000 person years.
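
Under the same assumed error-factor form (exp(2/√cases)), a short sketch reproduces both diabetes examples and shows the interval narrowing as the number of cases grows:

```python
from math import exp, sqrt

def rate_with_ci(cases, person_years):
    """Incidence rate with an approximate 95% CI, using the assumed
    error factor exp(2 / sqrt(cases))."""
    rate = cases / person_years
    ef = exp(2 / sqrt(cases))
    return rate, rate / ef, rate * ef

# Same estimated rate (0.005) in both data sets, but the CI is narrower
# when it is based on more cases.
for cases, pyrs in [(50, 10_000), (200, 40_000)]:
    rate, lo, hi = rate_with_ci(cases, pyrs)
    print(f"{cases} cases: {rate * 1000:.1f} per 1,000 p-y, "
          f"95% CI {lo * 1000:.1f} to {hi * 1000:.1f}")
# roughly: 5.0 (3.8 to 6.6) and 5.0 (4.3 to 5.8) per 1,000 person years
```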

Any questions?

Confidence intervals
- Reflect uncertainty about the true value of something, e.g. an incidence, a population prevalence, a population average height etc.
- NOT a range within which 95% of individual observations lie.

- 50 cases, 10,000 person-years.
- Estimate = 0.005; 95% CI (see above) = 3.8 to 6.7 cases per 1,000 person years.
- But rates in 3 individual years fall outside this range!!

Another example  A sample of 50 students  Observed mean height = 1.675m  The 95% confidence interval for mean height is 1.65m to 1.70m  But 95% of the 50 students fall between 1.55m and 1.85m in height. This is called a reference range (or normal range) not a confidence interval.  This is an important distinction

Inference on a rate ratio
- Population 1: d1 cases in P1 person years
- Population 2: d2 cases in P2 person years
- Rate ratio = (d1/P1) ÷ (d2/P2)
- Both the confidence interval and the test are based on this rate ratio.

Estimation versus hypothesis testing
- Estimation is more informative.
- Estimation can incorporate a hypothesis test:
  - Hypothesis: the incidence of diabetes in population A is the same as that in B.
  - Data: Population A: 12 cases in 2,000 patient years; Population B: 16 cases in 4,000 patient years.
  - Rates: A: 12/2,000 = 0.006; B: 16/4,000 = 0.004
  - Ratio of rates: A ÷ B = 1.5

Estimation vs hypothesis testing ...
- Estimation can incorporate a hypothesis test:
  - Ratio of rates = 1 if the rates are the same.
  - Ratio of rates: A ÷ B = 1.5
  - 95% CI for rate ratio = 1.5 ÷ 2.15 to 1.5 × 2.15 = 0.70 to 3.23
  - The range [0.70 to 3.23] includes 1.00: the data are consistent with the original hypothesis, so we cannot reject it (p>0.05). This does not prove it is true!!
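
A sketch of the rate-ratio calculation. As before, the error-factor formula is an assumption: exp(2·√(1/d1 + 1/d2)) reproduces the factors quoted in these slides (about 2.15 for 12 vs 16 cases, and 1.43 for the mortality example that follows).

```python
from math import exp, sqrt

def rate_ratio_with_ci(d1, p1, d2, p2):
    """Rate ratio (population 1 vs population 2) with an approximate 95% CI,
    using the assumed error factor exp(2 * sqrt(1/d1 + 1/d2))."""
    ratio = (d1 / p1) / (d2 / p2)
    ef = exp(2 * sqrt(1 / d1 + 1 / d2))
    return ratio, ratio / ef, ratio * ef

# Diabetes example: A = 12 cases in 2,000 p-y, B = 16 cases in 4,000 p-y.
# The CI (about 0.70 to 3.22) includes 1.00, so the data are consistent with
# equal rates (p > 0.05), although this does not prove equality.
print(rate_ratio_with_ci(12, 2000, 16, 4000))

# Mortality example: 80 deaths in 8,000 p-y (M) vs 50 in 10,000 p-y (F).
# The CI (about 1.39 to 2.87) excludes 1.00, so equality can be rejected (p < 0.05).
print(rate_ratio_with_ci(80, 8000, 50, 10000))
```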

Another example  80 deaths in 8,000 person-yrs (male)  50 deaths in 10,000 person-yrs (female)  Rate M = 10 per 1,000 p-y; Rate F = 5 per 1,000 p-y  Observed rate ratio (M/F) = 2.0   95% CI: [2÷1.43 to 2×1.43] = [1.40 to 2.86]  Best estimate of true rate ratio=2.0, and 95% certain that true rate ratio lies between 1.40 and This range does not include 1.00 so able to reject hypothesis of equality (p<0.05)

Inference on an SMR
- Observe O deaths.
- Expect E deaths (based on age-specific rates in the standard population and age-specific population sizes in the test population).
- SMR = (O/E) × 100

Example for SMR
- On the basis of age-specific rates in the standard population we expect 50 deaths in the test population; we observe 60 (O=60, E=50).
- SMR = (60/50) × 100 = 120
- 95% CI for SMR = 120 ÷ 1.29 to 120 × 1.29 = 93 to 155.
- The CI includes 100, so the data are consistent with equality of the death rates in the test and standard populations (p>0.05). But they are also consistent with, e.g., a 50% excess, so this certainly does not prove equality.
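
A matching sketch for the SMR, again assuming the error factor has the form exp(2/√O), which reproduces the 1.29 quoted above for 60 observed deaths:

```python
from math import exp, sqrt

def smr_with_ci(observed, expected):
    """SMR (as a percentage) with an approximate 95% CI, using the assumed
    error factor exp(2 / sqrt(observed))."""
    smr = observed / expected * 100
    ef = exp(2 / sqrt(observed))
    return smr, smr / ef, smr * ef

# O = 60, E = 50: SMR = 120 with 95% CI roughly 93 to 155. The interval
# includes 100, so the test population's death rate is consistent with the
# standard population's (p > 0.05), but equality is not proven.
print(smr_with_ci(60, 50))
```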

Any questions?

Summary  All observations (disease rates, levels of occupational risk, effectiveness of new drugs etc) are subject to random variation  We always want to know about the underlying tendency = the true value of rates or risks  We use observed data to test hypotheses about the underlying value  We use observed data to estimate the underlying tendency

Summary  In this course the best estimate of the true value of the underlying tendency is the observed value  We express uncertainty by calculating error factors and deriving confidence intervals  A 95% confidence interval is the range which includes the true value of the statistic of interest with probability 95%.  It can also be viewed as the range of true values that are consistent with the observed data. If different values consistent with the observed data would lead to different conclusions you can only be uncertain what to conclude

Summary  Population A: rate=0.008; B: rate=0.002  Rate ratio = 4, e.f.=2, 95% CI [2 to 8]  All values in the 95% CI suggest A higher than B. Can safely conclude A higher than B. This is equivalent to saying the 95% CI does not include 1.00 (null hypothesis) so the rate ratio is significantly different from 1.00 (p<0.05)

Summary
- Population A: rate = 0.01; B: rate = 0.005
- Rate ratio = 2, 95% CI [0.5 to 8]
- Values in the 95% CI are consistent with A being much higher than B, A being somewhat lower than B, or both being the same. We cannot really conclude anything very firmly.
- In this case the 95% CI does include 1.00 (the null hypothesis), so the rate ratio is not significantly different from 1.00 (p>0.05) and we cannot reject the hypothesis of equality.
- But this does not prove that the rates are equal.

Any questions?