Introduction to Epidemiology for Medical Students Université Paris-Descartes Babak Khoshnood INSERM U1153, Equipe EPOPé (Dir. Pierre-Yves Ancel) Obstetric,


Introduction to Epidemiology for Medical Students
Université Paris-Descartes
Babak Khoshnood
INSERM U1153, Equipe EPOPé (Dir. Pierre-Yves Ancel)
Obstetric, Perinatal and Pediatric Epidemiologic Research Team

Introduction: A survey (if you will)
Continuous (interval-scale) vs. discrete data; Nominal / ordinal / categorical data; Probability; Random sample; Binary response; Mean - median - mode; Variance - standard deviation - standard error; Frequency distribution - histogram; Parameter estimate; Inference; Hypothesis testing; Type I error; p-value; Type II error; Power; 95% confidence interval; Sample size; Chi-square; t-test; ANOVA; Nonparametric tests (Wilcoxon / Kruskal-Wallis); Sensitivity; Specificity; False positive rate; False negative rate

Introduction: A survey (2)
Positive predictive value; Negative predictive value; ROC curve; Incidence - incidence rate - cumulative incidence; Prevalence; Rate, ratio, proportion; Fetal mortality rate; Infant mortality rate; Neonatal mortality rate; Postneonatal mortality rate; Odds / odds ratio; Risk difference; Risk ratio; Standardization - direct and indirect; Bias (selection - recall - screening); Confounding; Interaction; Effect modification; Adjustment; Mantel-Haenszel method; Matching; Case-control study; Cohort study; Prospective; Retrospective

Introduction: A survey (3)
Monotonic relation; Linear relation; Bernoulli process; Binomial distribution; Normal distribution; Poisson distribution; Multiple linear regression; Logistic regression; Propensity score; Mixed (hierarchical) models; Meta-analysis; Network meta-analysis; Path analysis; …

Study aim vs. its caveats
One way to characterize the main aim of any study: does the exposure (treatment) make a difference in the outcome?
The main caveats are:
- Chance
- Confounding
- Bias

Evaluating the Role of Chance
Problem of statistical inference / hypothesis testing: could the observed difference be due to chance? Or: is the observed association statistically significant?
Null hypothesis (H0): outcome (e.g., mean) in group 1 = outcome in group 2; i.e., the observed difference is (could be) due to chance / random error.

Errors of Hypothesis Testing
- Type I error [α]: false rejection of the null hypothesis
- Type II error [β]: false acceptance of the null hypothesis
- Power = 1 - β
- Statistically significant vs. clinically meaningful / causally related

What determines the (statistical) power of a study?
- Effect sizes: differences to be detected / “worth” detecting
  - In the case of categorical data, power also depends on the (projected) proportions of the outcome in the treatment and the control groups
- Variances: usually estimated, often guessed at based on previous literature or on what seems reasonable
- Sample sizes, including the ratio of the sizes of the two groups
- Type I error (α)
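How these four ingredients combine can be sketched with a normal approximation for comparing two means. This is only an illustration, not a method taken from the slides: the function name `power_two_means`, the assumption of a common known standard deviation, and the numbers in the example are all hypothetical.

```python
from statistics import NormalDist

def power_two_means(delta, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in two means.

    delta: difference worth detecting (effect size)
    sd:    assumed common standard deviation in both groups
    n1,n2: group sizes (their ratio matters, not only their sum)
    alpha: Type I error rate
    """
    z = NormalDist()
    se = sd * (1 / n1 + 1 / n2) ** 0.5      # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)       # two-sided critical value
    # Probability of rejecting H0 when the true difference is delta;
    # the negligible rejection region in the opposite tail is ignored.
    return z.cdf(abs(delta) / se - z_crit)

# Hypothetical scenario: detect a 5-unit difference with SD = 10
balanced = power_two_means(delta=5, sd=10, n1=64, n2=64)
unbalanced = power_two_means(delta=5, sd=10, n1=32, n2=96)
print(round(balanced, 2), round(unbalanced, 2))
```

With the same total sample size, the unbalanced design gives lower power, which is why the ratio of the group sizes appears in the list above.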

Parametric tests for continuous / interval-scale data
i) t-test: equality of two means
- two-sample
- paired / one-sample
ii) Analysis of Variance (ANOVA): equality of several means
- F test (global test of significance)
- post-hoc testing (testing for significance of differences in the means of two of the groups while controlling the overall Type I error rate)
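As a small worked illustration of the two-sample t-test, the sketch below computes the equal-variance (pooled) t statistic and its degrees of freedom. The function name and the data are made up for the example, and the p-value lookup in a t distribution is left out.

```python
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Two-sample t statistic assuming equal variances.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5   # SE of the mean difference
    return (mean(x) - mean(y)) / se, df

# Made-up measurements for two independent groups
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 3, 4, 5, 6]
t_stat, df = pooled_t_statistic(group_1, group_2)
print(t_stat, df)
```

The statistic is then compared with a t distribution on `df` degrees of freedom to obtain the p-value.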

II. Non-parametric tests
- Particularly useful in case of small sample size (e.g., N < 30) or highly skewed data
- Make fewer assumptions and are therefore more robust
- If the assumptions used in parametric tests (e.g., t-test) are met or partially met, then (all else equal) non-parametric tests tend to be less efficient, i.e., they have higher Type II (β) error / lower power

II. Non-parametric tests
- Wilcoxon two-sample rank-sum test / Mann-Whitney U / median tests: ~ equality of two medians
- One-sample Wilcoxon (signed-rank) test: non-parametric equivalent of the paired t-test
- Kruskal-Wallis: several groups; non-parametric ANOVA
- Friedman test: paired data
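To show what a rank-based test actually counts, here is a minimal sketch of the Mann-Whitney U statistic computed directly from pairwise comparisons. The function name and data are illustrative, and the step from U to a p-value (exact tables or a normal approximation) is not shown.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x.

    Counts the (x_i, y_j) pairs in which x_i exceeds y_j; ties count as 1/2.
    Only the ordering of the values is used, not their magnitudes.
    """
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )

# Made-up data: one extreme value in the first group barely affects U
x = [3, 5, 8, 120]
y = [1, 2, 4, 6]
u_x = mann_whitney_u(x, y)
u_y = mann_whitney_u(y, x)
print(u_x, u_y, u_x + u_y == len(x) * len(y))
```

The two U values always add up to the number of pairs, so a U close to 0 or to n1 × n2 indicates strong separation between the groups.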

III. Categorical Data Analysis
- Nominal data (e.g., gender / racial/ethnic groups): no natural ordering of categories
- Ordinal data (e.g., mild / moderate / severe): ordered categories
- Incidence rate = number of new cases of disease divided by the total person-time of observation
- Cumulative incidence = number of new cases of disease divided by the total population at risk
- Prevalence = number of existing cases of a disease at a point in time divided by the number of persons in the population
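The three frequency measures differ only in their numerators and denominators, which a few lines of arithmetic make explicit. The function names and all counts below are hypothetical.

```python
def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time (e.g., per person-year)."""
    return new_cases / person_time

def cumulative_incidence(new_cases, population_at_risk):
    """Proportion of the at-risk population that develops the disease."""
    return new_cases / population_at_risk

def prevalence(existing_cases, population):
    """Proportion of the population with the disease at a point in time."""
    return existing_cases / population

# Hypothetical cohort: 1000 people at risk, 30 new cases,
# 1500 person-years of follow-up accumulated in total
print(incidence_rate(30, 1500))        # cases per person-year
print(cumulative_incidence(30, 1000))  # proportion (no time unit)

# Hypothetical cross-sectional survey: 50 existing cases among 2000 people
print(prevalence(50, 2000))
```

Note that the incidence rate carries a time unit, whereas cumulative incidence and prevalence are proportions.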

III. Categorical Data Analysis
- 2x2 table and its extensions: sets of 2x2 tables; 2xk / kx2 tables
- Chi-square tests of general association: differences between observed and expected frequencies
- Others (e.g., Mantel-Haenszel (score) correlation statistic for ordered data; Cochran-Armitage test for trend)
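The comparison of observed and expected frequencies can be sketched for a single 2x2 table as follows. This is an uncorrected Pearson chi-square (no continuity correction); the function name, the cell layout (a, b in the first row; c, d in the second) and the counts are assumptions made for the example.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under H0 of no association
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical table: 20/100 events among exposed, 10/100 among unexposed
chi2, p = chi_square_2x2(20, 80, 10, 90)
print(round(chi2, 3), round(p, 4))
```

The closed-form p-value only works for one degree of freedom; larger tables need the general chi-square distribution.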

Measures of Association
- Odds Ratio (OR) = odds of disease among the exposed divided by the odds of disease among the unexposed. OR = 1 implies no association between disease and exposure (H0: OR = 1).
- Risk Ratio (RR) = probability of disease among the exposed divided by the probability of disease among the unexposed (H0: RR = 1).

Measures of Association – Risk Ratio vs. Odds Ratio
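A short sketch can make the difference between the two measures concrete. The cell layout is an assumption for this example (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease), and the confidence interval uses the standard log-based (Woolf) approximation, which the slides do not specify.

```python
from math import exp, log, sqrt

def odds_ratio(a, b, c, d):
    """Odds of disease among the exposed / odds among the unexposed."""
    return (a / b) / (c / d)

def risk_ratio(a, b, c, d):
    """Probability of disease among the exposed / among the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the OR on the log scale (Woolf method)."""
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = log(odds_ratio(a, b, c, d))
    return exp(log_or - z * se_log_or), exp(log_or + z * se_log_or)

# Hypothetical cohort: 20/100 diseased among exposed, 10/100 among unexposed
table = (20, 80, 10, 90)
print(odds_ratio(*table), risk_ratio(*table))
print(odds_ratio_ci(*table))
```

In this made-up table the OR is further from 1 than the RR; the two measures come close to each other only when the disease is rare in both groups.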

Screening

                      Disease
                  Positive   Negative
Test  Positive       a          b
      Negative       c          d

- Sensitivity: P(T+ | D+) = a / (a + c) = 1 - false negative rate
- Specificity: P(T- | D-) = d / (b + d) = 1 - false positive rate
- Positive Predictive Value: P(D+ | T+) = a / (a + b)
- Negative Predictive Value: P(D- | T-) = d / (c + d)
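The four screening measures can be computed directly from the cells of the table, with a = true positives, b = false positives, c = false negatives and d = true negatives. In this sketch the negative predictive value is taken as d / (c + d), i.e., the true negatives among all negative tests. The function name and counts are hypothetical.

```python
def screening_measures(a, b, c, d):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 screening table.

    a: test +, disease +   b: test +, disease -
    c: test -, disease +   d: test -, disease -
    """
    return {
        "sensitivity": a / (a + c),  # P(T+ | D+)
        "specificity": d / (b + d),  # P(T- | D-)
        "ppv": a / (a + b),          # P(D+ | T+)
        "npv": d / (c + d),          # P(D- | T-)
    }

# Hypothetical screening programme: 100 diseased and 900 healthy persons
for name, value in screening_measures(a=90, b=50, c=10, d=850).items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are conditioned on disease status, whereas the predictive values are conditioned on the test result and therefore change with the prevalence of disease.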

Confounding and Interaction (effect modification)
- Confounding factor(s) must be associated with both the exposure and the disease
- A more rigorous approach, which includes the distinction between a given confounding factor and the presence of confounding, and how to take multiple confounders into account, is based on Directed Acyclic Graphs / causal diagrams
- Aim to adjust for / “purge” the effect of confounders, e.g., the effect of alcohol on the risk of heart disease controlling for the confounding effect of smoking
- Effect modifiers are factors that modify the effect of other exposures
  - Effect of smoking on the risk of stroke associated with oral contraceptive use: higher risk of stroke associated with the use of oral contraceptives, but only among smokers
  - Differential efficacy of a treatment for patients depending on underlying pathophysiology or severity of disease
  - Higher efficacy of treatment for patients with relatively mild disease or, alternatively, for patients with more severe disease


Approaches to the problem of confounding: Analysis
Stratified analysis: stratify by the potential confounder
- Sets of 2x2 tables: Mantel-Haenszel test
- Woolf's test of homogeneity: is effect modification or interaction present?
- In the absence of significant interactions, obtain the Mantel-Haenszel chi-square test and odds ratio of the overall association between exposure and disease, adjusted for the potential confounder(s)
Regression
- A major use of all regression models (multiple regression / logistic regression) is to arrive at estimates of the effect(s) of the main factor(s) of interest on the outcome under study, controlling for the effects of other factors (potential confounders) in the model.
- In addition, regression models allow explicit modeling of interaction effects.
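The stratified approach can be illustrated with the Mantel-Haenszel pooled odds ratio, which combines the stratum-specific 2x2 tables. The layout of each stratum (a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases), the function names and the counts are assumptions for this sketch; the data were constructed so that the crude and adjusted estimates differ.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def crude_or(strata):
    """Odds ratio from the single table obtained by ignoring the strata."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)

# Hypothetical data stratified by a confounder (e.g., smokers / non-smokers).
# Within each stratum the exposure is unrelated to disease (OR = 1).
strata = [
    (40, 10, 20, 5),   # stratum 1: disease common, exposure common
    (5, 20, 10, 40),   # stratum 2: disease rare, exposure rare
]
print(crude_or(strata), mantel_haenszel_or(strata))
```

Here the crude odds ratio suggests an association that disappears after adjustment, because the stratifying factor is related to both the exposure and the disease.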

Logistic regression
- Analysis of binary (“yes/no”; “dead/alive”) data
- Can also be extended to some models of ordinal data (ordinal logit models) or multinomial data (i.e., several categories without natural ordering)
- Suppose $X_1$ represents an indicator (dummy) variable coded as 1 / 0 denoting the presence / absence of an exposure. Then the odds ratio for the effect of exposure $X_1$, controlling for the effect of the other terms in the model, is equal to $e^{\beta_1}$, where $\beta_1$ is the coefficient of $X_1$.
- Can also obtain standard errors of the coefficients to calculate 95% confidence intervals for the estimated odds ratio and for hypothesis testing.
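Going from a fitted coefficient to an odds ratio with its confidence interval is a small calculation, sketched below. The coefficient and standard error are hypothetical values standing in for the output of a fitted model, and the interval is the usual Wald interval computed on the log-odds scale.

```python
from math import exp

def odds_ratio_from_coefficient(beta, se, z=1.96):
    """Odds ratio and approximate 95% Wald CI from a logistic coefficient.

    beta: estimated coefficient of a 0/1 exposure indicator (log odds ratio)
    se:   standard error of that coefficient
    """
    odds_ratio = exp(beta)
    lower = exp(beta - z * se)   # interval is built on the log scale,
    upper = exp(beta + z * se)   # then exponentiated
    return odds_ratio, lower, upper

# Hypothetical model output: beta_1 = 0.693, SE = 0.20
or_1, lower, upper = odds_ratio_from_coefficient(0.693, 0.20)
print(round(or_1, 2), round(lower, 2), round(upper, 2))
```

If the interval excludes 1, the corresponding Wald test rejects H0: OR = 1 at the 5% level.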

Logistic regression – Interaction effects
In the presence of significant interaction / effect modification, i.e., when $\beta_4$ (the coefficient of the $X_1 \times X_2$ interaction term) is significant, the effect of $X_1$ depends on whether $X_2$ is also present:
- Effect of $X_1$ in the absence of $X_2$: odds ratio = $e^{\beta_1}$
- Effect of $X_1$ in the presence of $X_2$: odds ratio = $e^{\beta_1 + \beta_4}$
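The two odds ratios can be computed from the main-effect and interaction coefficients as in the sketch below. The coefficient values are hypothetical, and the code assumes that the interaction coefficient multiplies the product of two 0/1 indicators.

```python
from math import exp, log

def exposure_odds_ratios(beta_exposure, beta_interaction):
    """Odds ratios for an exposure X1 at the two levels of a modifier X2.

    beta_exposure:    coefficient of X1 (effect when X2 = 0)
    beta_interaction: coefficient of the X1 * X2 product term
    """
    or_without_modifier = exp(beta_exposure)
    or_with_modifier = exp(beta_exposure + beta_interaction)
    return or_without_modifier, or_with_modifier

# Hypothetical coefficients chosen to give round odds ratios
or_absent, or_present = exposure_odds_ratios(log(2.0), log(1.5))
print(or_absent, or_present)
```

On the odds-ratio scale the interaction acts multiplicatively: the odds ratio in the presence of the modifier is the odds ratio in its absence multiplied by the exponentiated interaction coefficient.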

Measures of Impact - Attributable Fractions
- Assume a causal relation between the exposure and the disease
- Attributable fraction among the exposed: $AF_e = (RR - 1) / RR$
- Attributable fraction in the population: $AF_p = P_c \times AF_e$
  - Proportion of disease in the population that is attributable to the exposure and thus could be eliminated if the exposure were eliminated.
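A brief numerical sketch of the two attributable fractions follows. The slide does not define P c; the code assumes it is the proportion of cases that are exposed, which is the usual reading of this formula. The function names and input values are hypothetical.

```python
def attributable_fraction_exposed(rr):
    """AF_e = (RR - 1) / RR: share of disease among the exposed due to exposure."""
    return (rr - 1) / rr

def attributable_fraction_population(rr, proportion_cases_exposed):
    """AF_p = P_c * AF_e, with P_c ASSUMED to be the proportion of cases exposed."""
    return proportion_cases_exposed * attributable_fraction_exposed(rr)

# Hypothetical inputs: risk ratio of 3, half of all cases are exposed
print(attributable_fraction_exposed(3.0))
print(attributable_fraction_population(3.0, 0.5))
```

Both fractions are interpretable as preventable proportions only under the stated assumption that the exposure causes the disease.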