Statistical Analysis II Lan Kong Associate Professor Division of Biostatistics and Bioinformatics Department of Public Health Sciences December 15, 2015.

Presentation transcript:

Statistical Analysis II Lan Kong Associate Professor Division of Biostatistics and Bioinformatics Department of Public Health Sciences December 15, 2015 CTSI BERD Research Methods Seminar Series

Basic statistical concepts
- Descriptive statistics (numeric/graphical)
- Population distribution vs. sampling distribution
- Standard deviation vs. standard error
- Estimation of population mean/proportion
- Confidence interval
- Hypothesis testing
- P-value

Confidence Interval for population mean
- An approximate 95% confidence interval for the population mean μ is x̄ ± 2 × SEM, or more precisely x̄ ± 1.96 × SEM.
- x̄ is a random variable (it varies from sample to sample), so the confidence interval is random, and it has a 95% chance of covering μ before a sample is selected.
- Once a sample is taken, we observe x̄; then either μ is within the calculated interval or it is not.
- The confidence interval gives the range of plausible values for μ.

Example
- 95% CI for μ (mean blood pressure in the population) is 125 ± 2 × 1.4 = 125 ± 2.8
- Ways to write the CI: 122.2 to 127.8; (122.2, 127.8); (122.2 – 127.8)
- The 95% error bound on x̄ is 2.8.
- We are highly confident that the population mean falls in the range 122.2 to 127.8.
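The arithmetic in this example can be checked with a short sketch. It assumes SEM = 1.4 mmHg, which is what the slide's margin of 2.8 = 2 × SEM implies, and it uses the exact normal multiplier (about 1.96) rather than the rounded 2, so the endpoints differ slightly from the slide's (122.2, 127.8). The function name `mean_ci` is illustrative only.

```python
from statistics import NormalDist

def mean_ci(xbar, sem, level=0.95):
    """Normal-based confidence interval: xbar +/- z * SEM."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    return xbar - z * sem, xbar + z * sem

# Sample mean 125 mmHg from the slide; SEM 1.4 mmHg is inferred from 2.8 / 2
low, high = mean_ci(125, 1.4)
print(f"95% CI: ({low:.1f}, {high:.1f})")
```

Using 2 instead of 1.96 widens the interval by less than 0.1 mmHg on each side here, which is why the slide treats the two as interchangeable.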

Confidence Interval Interpretation
Technical interpretation
- The CI “works” (includes μ) 95% of the time.
- If we were to take 100 random samples, each of the same size, approximately 95 of the CIs would include the true value of μ.

Confidence Interval Interpretation
[Figure: each bar represents a 95% CI created from a random sample of size n.]
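The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch: the population (normal with mean 120 and standard deviation 14), the sample size, and the number of trials are all assumptions chosen for illustration, not values taken from the slides.

```python
import random
from statistics import mean, stdev

def coverage(mu=120.0, sigma=14.0, n=100, trials=2000, seed=1):
    """Fraction of simulated 95% CIs (xbar +/- 1.96 * SEM) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        sem = stdev(sample) / n ** 0.5  # estimated standard error of the mean
        if xbar - 1.96 * sem <= mu <= xbar + 1.96 * sem:
            hits += 1
    return hits / trials

rate = coverage()
print(f"Fraction of intervals covering mu: {rate:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either covers μ or does not; the 95% describes the procedure over many samples.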

Underlying Assumptions
To use the confidence interval formula for μ, we assume:
- Random sample from the population (important!)
- Observations in the sample are independent.
- Sample size is large enough to support the Central Limit Theorem; how large depends on the population distribution.

Estimation of population proportion (p)
Examples:
- Proportion of patients who became infected
- Proportion of patients who are cured
- Proportion of individuals positive on a blood test
- Proportion of adverse drug reactions
- Proportion of premature infants who survive

Sampling Distribution of Sample Proportion
- The sampling distribution of the sample proportion p̂ can be approximated by a normal distribution when the sample size is sufficiently large (central limit theorem).
- The standard error of a sample proportion is estimated by: SE(p̂) = √( p̂(1 − p̂) / n )
- 95% confidence interval for a proportion: p̂ ± 1.96 × SE(p̂)
- The rule of thumb for a good normal approximation is: [formula not preserved in the transcript]

Example
- In a study of 200 patients, 90 patients experienced adverse drug reactions.
- The estimated proportion who experience an adverse drug reaction is p̂ = 90/200 = 0.45.
- The 95% confidence interval for the population proportion is 0.45 ± 1.96 × √(0.45 × 0.55 / 200) = 0.45 ± 0.07 = (0.38, 0.52).
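The proportion example can be reproduced with a few lines. This sketch assumes the usual normal-approximation (Wald) interval, with standard error √(p̂(1 − p̂)/n) and multiplier 1.96; the function name `proportion_ci` is illustrative.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Wald interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# 90 of 200 patients experienced an adverse drug reaction
p_hat, se, (low, high) = proportion_ci(90, 200)
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}, 95% CI = ({low:.2f}, {high:.2f})")
# prints: p_hat = 0.45, SE = 0.0352, 95% CI = (0.38, 0.52)
```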

Hypothesis Testing
One-sample test
- Hypothesis specification
- Test statistics
- p-value
- Significance level

Hypothesis for blood pressure example
Suppose we want to know if the mean systolic blood pressure for the student population is different from the normal cutoff.
- Null hypothesis H0: μ = μ0 (= 120)
- Alternative hypothesis HA: μ ≠ 120
  - HA typically represents what you are trying to prove.
- We reject H0 if the sample mean is far away from 120.

Hypothesis Testing Question
- Do our sample results allow us to reject H0 in favor of HA?
  - The sample mean would have to be far from 120 to claim HA is true.
  - Is x̄ = 125 large enough to claim HA is true?
  - Maybe we have a large sample mean of 125 from a chance occurrence.
  - Maybe H0 is true, and we just have an unusual sample.
  - We need some measure of how probable the result from our sample is, if the null hypothesis is true. → p-value

Test Statistics
A test statistic is a score that measures how many standard errors the observed sample mean is away from the null mean μ0. If H0 is true (μ = μ0), consider:
- Z test statistic, Z = (x̄ − μ0) / (σ/√n) (normal distribution), when (i) the population is normally distributed or the sample size is large enough and (ii) the population variance σ² is known.
- T test statistic, T = (x̄ − μ0) / (s/√n) (t-distribution with n − 1 degrees of freedom), when (i) the population is normally distributed or the sample size is large enough and (ii) the population variance σ² is unknown.

How are p-values calculated?
- In the SBP example, the observed value of the T statistic is t = (x̄ − μ0) / SEM = (125 − 120) / 1.4 = 3.57.
- We observed a sample mean that was 3.57 standard errors away from what we would have expected the mean to be if we assume H0 is true.
- Is a result of 3.57 standard errors above its mean unusual?
  - It depends on what kind of distribution we are dealing with.
- The p-value is the probability of getting a test statistic as extreme as, or more extreme than, what you observed (3.57) by chance if H0 were true.
- The p-value comes from the sampling distribution of the test statistic.

Blood Pressure example
- The T statistic follows a t-distribution with degrees of freedom = n − 1 = 99.
- p-value = P{|T| ≥ |t|} = P{|T| ≥ 3.57} (the red area)
[Figure: sampling distribution of the T test statistic (t-distribution).]
- Interpretation: if the mean SBP in the student population is the same as the normal cutoff of 120 mmHg, the p-value is the chance of seeing a sample mean as extreme as, or more extreme than, 125 in a sample of 100 students.
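A rough way to reproduce this calculation without a statistics package is to integrate the t density numerically. The sketch below assumes x̄ = 125, μ0 = 120, n = 100, and s = 14; s is inferred from the standard error of 1.4 implied by the earlier confidence interval, and is not stated on this slide. The helper names `t_pdf` and `two_sided_p` are illustrative.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability that |T| < |t|
    return 1 - central

xbar, mu0, s, n = 125, 120, 14, 100  # s = 14 is an inferred value
t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = two_sided_p(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, df = {n - 1}, two-sided p = {p_value:.5f}")
```

In practice a library routine for the t distribution would replace the hand-rolled integration; it is written out here only to show where the tail area comes from.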

Using the p-value to Make a Decision
- We need to decide if our sample result is unlikely enough to have occurred by chance if the null were true. Our measure of this “unlikeliness” is our p-value.
- We need to have a cutoff such that all p-values less than the cutoff result in a rejection of the null hypothesis.
  - The standard cutoff is 0.05, which is a somewhat arbitrary value.
  - The cutoff value is referred to as α, or the significance level of the test.

Using the p-value to Make a Decision
- At the 0.05 level, the test result for the student SBP example is statistically significant. There is sufficient evidence to conclude that the mean systolic blood pressure for the student population is different from the normal cutoff.
- The p-value alone imparts no information about the scientific importance or substantive content of a study.

More on the p-value
- Statistical significance is not the same as scientific significance.
- Suppose in the student SBP example:
  - n = 100,000; x̄ = [value missing] mmHg; s = 14
  - p-value = [value missing]
- A large n can produce a small p-value, even though the magnitude of the difference is very small and may not be scientifically or substantively significant.
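The effect of sample size on the p-value can be sketched numerically. The slide's own sample mean and p-value did not survive in the transcript, so the sample mean of 120.2 mmHg used below is purely hypothetical; only μ0 = 120 and s = 14 come from the slides. A normal approximation is used for the p-value, which is reasonable at these sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, s, n):
    """Large-sample two-sided test of H0: mu = mu0 (normal approximation)."""
    z = (xbar - mu0) / (s / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical sample mean (assumed for illustration): 0.2 mmHg above the cutoff
HYPOTHETICAL_XBAR = 120.2
p_by_n = {}
for n in (100, 10_000, 100_000):
    z, p = z_test(HYPOTHETICAL_XBAR, 120, 14, n)
    p_by_n[n] = p
    print(f"n = {n:>7}: z = {z:.2f}, p = {p:.2g}")
```

The same 0.2 mmHg difference is nowhere near significant at n = 100 but is highly significant at n = 100,000, even though its clinical relevance has not changed.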

More on the p-value
- Not rejecting H0 is not the same as accepting H0.
- Suppose in the student SBP example:
  - n = 5; x̄ = 135; s = 14
  - p-value = 0.07
  - We cannot reject H0 at significance level α = 0.05.
- But are we really convinced that the mean SBP for the student population is not different from the normal cutoff, 120 mmHg?
- Maybe we should have taken a bigger sample?

Connection Between Hypothesis Testing and Confidence Intervals
- The confidence interval gives a range of plausible values for the population parameter. If μ0 is not in the 95% CI, then we would reject the null hypothesis that μ = μ0 at level α = 0.05. (The p-value will be < 0.05.)
- In the student SBP example, the 95% confidence interval (122, 128) does not contain 120, so we know that the result is statistically significant. Thus, the p-value is less than 0.05. But the interval does not tell us the exact p-value.
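The link between the interval and the test can be checked directly. This sketch uses a normal approximation for both (the slides use a t-distribution, which gives nearly the same answers at n = 100) and assumes x̄ = 125 with SEM = 1.4 from the SBP example. The candidate null values other than 120 are arbitrary choices for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def confidence_interval(xbar, sem, alpha=0.05):
    """Two-sided (1 - alpha) normal-based confidence interval."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return xbar - z * sem, xbar + z * sem

def two_sided_p(xbar, mu0, sem):
    """Two-sided p-value for H0: mu = mu0 under the normal approximation."""
    z = (xbar - mu0) / sem
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

xbar, sem, alpha = 125, 1.4, 0.05
low, high = confidence_interval(xbar, sem, alpha)
results = {}
for mu0 in (120, 122, 124, 126, 128, 130):  # arbitrary candidate null values
    outside_ci = not (low <= mu0 <= high)
    rejected = two_sided_p(xbar, mu0, sem) < alpha
    results[mu0] = (outside_ci, rejected)
    print(f"mu0 = {mu0}: outside CI = {outside_ci}, rejected = {rejected}")
```

For every candidate value the two columns agree: a null value is rejected at level 0.05 exactly when it falls outside the 95% interval.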

What if my data are clearly not normal?
- Is the sample size large enough to apply the central limit theorem?
- Are there any obvious outliers?
- Nonparametric tests: Wilcoxon signed-rank test or sign test
  - Make few assumptions about the distribution of the data.
  - Test the median instead of the mean.
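Of the two nonparametric tests named above, the sign test is simple enough to write out in full. The sketch below is an exact two-sided sign test of a hypothesized median; the ten SBP readings are invented for illustration and are not data from the presentation.

```python
from math import comb

def sign_test_p(values, median0):
    """Exact two-sided sign test of H0: population median == median0."""
    diffs = [v - median0 for v in values if v != median0]  # ties are dropped
    n = len(diffs)
    k = sum(d > 0 for d in diffs)  # number of observations above median0
    # Under H0 the number of positive signs is Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical SBP readings in mmHg (invented for this example)
sbp = [118, 131, 127, 142, 125, 122, 138, 129, 133, 119]
p = sign_test_p(sbp, 120)
print(f"sign test p-value = {p:.4f}")  # prints: sign test p-value = 0.1094
```

Only the signs of the deviations are used, which is why the test needs so few assumptions and also why it tends to have less power than a t-test when the data really are close to normal.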

Paired design
- Paired design
  - Self-pairing: measurements are taken at two distinct points in time from a single subject (e.g., before vs. after)
  - Matched pairs (e.g., twins, eyes, subjects matched on important characteristics such as age and gender)
- Why pairing?
  - Control extraneous noise
  - Control confounding factors that affect the comparison
  - Make the comparison more precise

Example: Blood Pressure and Oral Contraceptive Use

Participant | BP Before OC (1st sample) | BP After OC (2nd sample) | After − Before
…

The 1st and 2nd samples are paired samples.

Example (cont.)
Scientific questions:
- What is the mean change in blood pressure after OC use in a population of women who use oral contraceptives? → Estimate the mean change by a confidence interval approach
- Is there any change in mean blood pressure after OC use in a population of women who use oral contraceptives? → Hypothesis testing

Inference on mean change
- Due to the design of the study, we can reduce the BP information on two samples (women’s BP prior to OC use and the same subjects’ BP after OC use) into one piece of information: the differences in BP between the two time points for the same subject.
- Perform the one-sample inference on the differences for the relevant research question.
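The reduction from two paired samples to one sample of differences can be sketched as follows. The before and after readings are invented for illustration (the slide's table values did not survive in the transcript), and the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired SBP readings in mmHg, one pair per participant
before = [118, 124, 110, 131, 127, 115, 122, 119]
after = [123, 128, 109, 138, 130, 121, 125, 124]

# Step 1: collapse each pair into a single difference (after - before)
diffs = [a - b for a, b in zip(after, before)]

# Step 2: one-sample inference on the differences, H0: mean difference = 0
n = len(diffs)
d_bar = mean(diffs)
se = stdev(diffs) / sqrt(n)
t_stat = d_bar / se

print(f"mean change = {d_bar:.1f}, SE = {se:.3f}, t = {t_stat:.2f}, df = {n - 1}")
```

The resulting t statistic is compared against a t-distribution with n − 1 degrees of freedom, exactly as in the one-sample SBP example; a confidence interval for the mean change is built from the same mean and standard error.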

THE END
Want to learn more statistics or have more questions? programs/biostatisticsepidemiologyresearch-design/