Slide 2
Randomized clinical trial with two independent arms. Examples: Drug vs. Placebo, Drugs vs. Surgery, New Tx vs. Standard Tx. Let X = cholesterol level (mg/dL). Patients satisfying the inclusion criteria are RANDOMIZED to a Treatment Arm or a Control Arm, yielding two RANDOM SAMPLES. At the end of the study, the treatment and control populations are compared with a T-test or an ANOVA F-test: is the experiment statistically significant? [Figure: possible expected distributions of X for the treatment and control populations.]
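To make the end-of-study comparison concrete, here is a minimal sketch of a two-sample T-test in Python with SciPy. The cholesterol values are simulated, and the means, SDs, and sample sizes are illustrative assumptions, not data from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated end-of-study cholesterol levels (mg/dL); all numbers are illustrative.
treatment = rng.normal(loc=185, scale=30, size=50)   # treatment arm
control   = rng.normal(loc=200, scale=30, size=50)   # control arm

# Two-sample T-test of H0: equal population means (Welch version, no pooling).
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```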
Slide 3
Paired (pre- vs. post-treatment) design. Examples: Drug vs. Placebo, Drugs vs. Surgery, New Tx vs. Standard Tx. Let X = cholesterol level (mg/dL), with the change from baseline measured on the same patients. Patients satisfying the inclusion criteria contribute both a Pre-Tx and a Post-Tx measurement, giving PAIRED SAMPLES. At the end of the study, the pre- and post-treatment populations are compared with a Paired T-test or a "repeated measures" ANOVA F-test: is the experiment statistically significant? [Figure: possible expected distributions for the Pre-Tx and Post-Tx populations, centered about 0 change from baseline.]
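A corresponding sketch for the paired design, again with simulated (not real) cholesterol values for the same patients before and after treatment, using SciPy's paired T-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated cholesterol levels (mg/dL) for the same 40 patients before and after treatment.
pre  = rng.normal(loc=210, scale=25, size=40)
post = pre - rng.normal(loc=12, scale=15, size=40)   # built-in average drop of about 12 mg/dL

# Paired T-test of H0: the mean change from baseline is zero.
t_stat, p_value = stats.ttest_rel(pre, post)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```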
Slide 4
Survival analysis. Examples: Drug vs. Placebo, Drugs vs. Surgery, New Tx vs. Standard Tx. Let T = survival time (months), with survival function S(t) = P(T > t), the probability of surviving beyond time t; S(t) decreases from 1 at t = 0 toward 0. Kaplan-Meier estimates of the population survival curves S1(t) (Treatment) and S2(t) (Control) are compared at the end of the study using the Log-Rank Test or a Cox Proportional Hazards Model: is the difference between the curves (e.g., the AUC difference) statistically significant? [Figure: survival probability vs. time, showing the two estimated survival curves and the area between them.]
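A minimal, hand-rolled Kaplan-Meier estimator illustrating the product-limit formula S(t) = product over event times t_i <= t of (1 - d_i / n_i). The survival times and censoring indicators below are invented for illustration; in practice a survival-analysis package would also supply the Log-Rank test and Cox model.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(T > t).
    times  : observed follow-up times (months)
    events : 1 if the event (e.g., death) was observed, 0 if censored
    Returns the distinct event times and the estimated survival probabilities."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(times >= t)                 # still under observation just before t
        d = np.sum((times == t) & (events == 1))       # events occurring at t
        s *= 1.0 - d / n_at_risk                       # multiply in the factor (1 - d_i/n_i)
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical survival times (months); 1 = event observed, 0 = censored.
t_treat = [6, 7, 10, 15, 19, 25, 30, 36]; e_treat = [1, 0, 1, 1, 0, 1, 0, 1]
t_ctrl  = [3, 4, 5,  8, 12, 16, 18, 22]; e_ctrl  = [1, 1, 0, 1, 1, 1, 0, 1]

for label, t, e in [("Treatment", t_treat, e_treat), ("Control", t_ctrl, e_ctrl)]:
    et, s = kaplan_meier(t, e)
    print(label, dict(zip(et, np.round(s, 3))))
```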
Slide 5
Case-Control studies Cohort studies
Slide 6
Observational study designs test for a statistically significant association between a disease D and exposure E to a potential risk (or protective) factor, measured via the "odds ratio," "relative risk," etc. Classic example: lung cancer and smoking.

Case-Control studies start in the PRESENT with cases (D+) and controls (D–, the reference group) and look back to the PAST, asking: E+ vs. E–? They are relatively easy and inexpensive, but subject to faulty records and "recall bias."

Cohort studies start in the PRESENT with exposed (E+) and unexposed (E–) groups and follow them into the FUTURE, asking: D+ vs. D–? They measure the direct effect of E on D, but are expensive and extremely lengthy. Example: the Framingham, MA study.

Both types of study yield a 2 × 2 "contingency table" for the binary variables D and E, where a, b, c, d are the observed counts of individuals in each cell:

            D+        D–        Total
  E+        a         b         a + b
  E–        c         d         c + d
  Total     a + c     b + d     n

H0: No association between D and E. At the end of the study, this is tested with a Chi-squared Test, or a McNemar Test for paired case-control study designs.
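A sketch of the chi-squared test and odds ratio for such a 2 × 2 table, using SciPy; the cell counts a, b, c, d below are made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 table of counts (rows: E+, E-; columns: D+, D-); numbers are illustrative.
#                   D+   D-
table = np.array([[40, 60],    # E+  (a, b)
                  [15, 85]])   # E-  (c, d)

# Chi-squared test of H0: no association between D and E.
chi2, p, dof, expected = stats.chi2_contingency(table)

# Odds ratio: (a*d) / (b*c).
a, b = table[0]
c, d = table[1]
odds_ratio = (a * d) / (b * c)

print(f"chi-squared = {chi2:.2f}, df = {dof}, p = {p:.4g}, odds ratio = {odds_ratio:.2f}")
```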
Slide 7
As seen, testing for association between categorical variables, such as disease D and exposure E, can generally be done via a Chi-squared Test. But what if the two variables, say X and Y, are numerical measurements? Furthermore, if the sample data do suggest that an association exists, what is its nature, and how can it be quantified, or modeled via Y = f(X)? A scatterplot of Y against X gives a first impression, and the correlation coefficient r measures the strength of linear association between X and Y: r ranges from –1 (perfect negative linear correlation) through 0 (no linear correlation) to +1 (perfect positive linear correlation). [Example data from JAMA. 2003;290:1486-1493.]
Slide 12
For this example, r = –0.387, a weak negative linear correlation. Regression Methods quantify the association: among all lines, we want the unique one that minimizes the sum of the squared residuals (the vertical distances between the data points and the fitted line). Simple Linear Regression gives this "least squares" regression line, the line that best fits the data: here, Y = 8.790 – 4.733 X (p = .0055). It can also be shown that the proportion of total variability in the data that is accounted for by the line equals r², which in this case is (–0.387)² = 0.1497, about 15%... very small.
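A sketch of computing r and the least-squares line with SciPy; the (X, Y) data are simulated to mimic a weak negative linear association, so the numbers will not match the JAMA example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated (X, Y) measurements with a weak negative linear association; illustrative only.
x = rng.uniform(0.0, 1.0, size=50)
y = 8.79 - 4.733 * x + rng.normal(scale=5.0, size=50)

# Pearson correlation coefficient r (strength of linear association, -1 <= r <= +1).
r, p_corr = stats.pearsonr(x, y)

# Least-squares simple linear regression Y = b0 + b1*X; r**2 = proportion of variability explained.
fit = stats.linregress(x, y)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")
print(f"Y = {fit.intercept:.3f} + {fit.slope:.3f} X   (slope p = {fit.pvalue:.4f})")
```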
Slide 14
Numerical (Quantitative) response, e.g., $ Annual Income.

Two populations, H0: μ1 = μ2 (picture two distributions of X with means μ1, μ2 and standard deviations σ1, σ2).

Independent samples (e.g., RCT):
- Normally distributed? Check with Q-Q plots, Shapiro-Wilk, Anderson-Darling, others.
  - Yes: Equivariance (equal variances)? Check with the F-test, Bartlett, others.
    - Yes: 2-sample T-test (with pooling).
    - No: 2-sample T-test (without pooling), i.e., the Satterthwaite / Welch "approximate" T.
  - No: "Nonparametric Tests": Wilcoxon Rank Sum (aka Mann-Whitney U).

Paired (matched) samples (e.g., Pre- vs. Post-, Sample 1 vs. Sample 2):
- Normally distributed?
  - Yes: Paired T-test.
  - No: "Nonparametric Tests": Sign Test, Wilcoxon Signed Rank.

More than two populations: ANOVA F-test and Regression Methods (independent samples), or the ANOVA F-test with "repeated measures" or "blocking" (paired/matched samples); nonparametric counterparts include Kruskal-Wallis (independent), Friedman and Kendall's W (matched), and various other modifications.
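One possible way to walk this decision path in code, assuming SciPy is available. The income data are simulated, and the 0.05 cutoff for the Shapiro-Wilk check is an illustrative convention, not a rule from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated annual incomes ($) for two independent groups; right-skewed, illustrative data.
group1 = rng.lognormal(mean=10.8, sigma=0.6, size=40)
group2 = rng.lognormal(mean=11.0, sigma=0.6, size=40)

# Step 1: check normality in each sample (Shapiro-Wilk).
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (group1, group2))

if normal:
    # Step 2a: roughly normal -> two-sample T (Welch version avoids the equal-variance assumption).
    stat, p = stats.ttest_ind(group1, group2, equal_var=False)
    test = "Welch two-sample T"
else:
    # Step 2b: not normal -> nonparametric Wilcoxon Rank Sum / Mann-Whitney U test.
    stat, p = stats.mannwhitneyu(group1, group2, alternative="two-sided")
    test = "Mann-Whitney U"

print(f"{test}: statistic = {stat:.2f}, p = {p:.4f}")
```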
Slide 15
Categorical (Qualitative) response, e.g., Income Level: Low, Mid, High.

Two or more categories for each of two variables I and J, giving an r × c contingency table. H0: "There is no association between (the categories of) I and (the categories of) J."

Chi-squared Tests:
- Test of Independence (1 population, 2 categorical variables)
- Test of Homogeneity (2 populations, 1 categorical variable)
- "Goodness-of-Fit" Test (1 population, 1 categorical variable)

Modifications:
- McNemar Test, for paired 2 × 2 categorical data, to control for "confounding variables," e.g., case-control studies
- Fisher's Exact Test, for small "expected values" (< 5), to avoid possible "spurious significance"
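A sketch of the two modifications in Python. Fisher's Exact Test comes with SciPy, while the McNemar Test here is taken from the third-party statsmodels package (an assumption, since the slides do not prescribe software); all cell counts are illustrative.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar  # third-party; used for paired 2x2 data

# Fisher's Exact Test: preferable to chi-squared when expected cell counts are small (< 5).
small_table = np.array([[3, 7],
                        [9, 2]])          # illustrative counts only
odds_ratio, p_fisher = stats.fisher_exact(small_table)

# McNemar Test: paired 2x2 data, e.g., matched case-control pairs.
# Rows/columns index exposure status of case vs. matched control; discordant cells drive the test.
paired_table = np.array([[30, 25],
                         [10, 35]])       # illustrative counts only
result = mcnemar(paired_table, exact=True)

print(f"Fisher exact: OR = {odds_ratio:.2f}, p = {p_fisher:.4f}")
print(f"McNemar: statistic = {result.statistic:.1f}, p = {result.pvalue:.4f}")
```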
Slide 16
Introduction to Basic Statistical Methods. Part 1: Statistics in a Nutshell. Part 2: Overview of Biostatistics: "Which Test Do I Use??" UWHC Scholarly Forum, May 21, 2014. Ismor Fischer, Ph.D., UW Dept of Statistics, ifischer@wisc.edu. Sincere thanks to Judith Payne, Heidi Miller, Samantha Goodrich, Troy Lawrence, and YOU! All slides posted at http://www.stat.wisc.edu/~ifischer/Intro_Stat/UWHC