Basic Biostatistics Prof Paul Rheeder Division of Clinical Epidemiology.



Overview
Bias vs chance
Types of data
Descriptive statistics
Histograms and boxplots
Inferential statistics
Hypothesis testing: P and CI
Comparing groups
Correlation and regression

Research questions?
Does CK level predict in-hospital mortality post MI?
Is there an association between troponin I and renal function?
What is the incidence of amputation in diabetics with renal failure?
HOW ARE THEY MEASURED?

Research question
Does aspirin reduce CV mortality in diabetics when used for primary prevention?
Is cell phone use associated with an increased risk of brain cancer?
Does level of SES correlate with depression?

Research question
Phrase your research question so that it can be answered YES or NO, or with some form of quantification.

Data analysis
Aim: to provide information on the study sample and to answer the research question!

Problems !

Problems
Bias and confounding (also called systematic error): typically dealt with in the planning and execution of the study; it can also be controlled for in the data analysis (e.g. multivariate analysis).
Chance (also called random error): classically, P values and confidence intervals (CI) are used to judge the role of chance.

First important issues
What type of data are you collecting?
Typically one has an outcome variable and one or more exposure variables.
How, and with what, are they measured?

Outcome and exposure?
Does CK level predict in-hospital mortality post MI?
Is there an association between troponin I and renal function?
What is the incidence of amputation in diabetics with renal failure?
HOW ARE THEY MEASURED?


Types of data
Categorical: HT yes or no, sex, smoking status (usually summarised as a %)
Ordinal versus nominal
Continuous data
Spread of continuous data

Data analysis
Descriptive stats
Mean/median
SD or range

Hypothesis testing
Differences between groups, for example:
T test / Mann-Whitney (2 groups)
ANOVA / Kruskal-Wallis (>2 groups)
Chi-square if the outcome is a %

Associations between variables
Does coffee cause cancer? (OR, RR)
Efficacy of Rx (RRR, ARR, NNT)
Is BMI associated with BP? (correlation and regression)
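
The efficacy measures named above (ARR, RRR, NNT) follow directly from the event rates in the control and treated groups. The sketch below shows the arithmetic; the function name and the event rates of 20% and 15% are hypothetical values chosen for illustration, not data from this presentation.

```python
def treatment_effect(control_event_rate, treated_event_rate):
    """Return (ARR, RRR, NNT) from two event rates given as proportions.

    Assumes the treated rate is lower than the control rate; if the two
    rates are equal, ARR is zero and NNT is undefined (division by zero).
    """
    arr = control_event_rate - treated_event_rate  # absolute risk reduction
    rrr = arr / control_event_rate                 # relative risk reduction
    nnt = 1 / arr                                  # number needed to treat
    return arr, rrr, nnt


# Hypothetical trial: 20% of controls and 15% of treated patients have the event.
arr, rrr, nnt = treatment_effect(0.20, 0.15)
print(f"ARR = {arr:.3f}, RRR = {rrr:.2f}, NNT = {nnt:.1f}")
```

By convention the NNT is rounded up to the next whole patient when reported.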

2 x 2 table
             Cancer   No cancer
Smoker       a        b
Non-smoker   c        d
RR = [a/(a+b)] / [c/(c+d)]
OR = (a/b) / (c/d) = ad/bc
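
As a minimal sketch of the 2 x 2 table formulas, the functions below compute the RR and OR from the four cell counts a, b, c and d. The function names and the counts are hypothetical and serve only to show the calculation.

```python
def risk_ratio(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]: risk in the exposed over risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d), which equals ad/bc."""
    return (a / b) / (c / d)


# Hypothetical counts: 30 of 100 smokers and 10 of 100 non-smokers develop cancer.
a, b, c, d = 30, 70, 10, 90
rr = risk_ratio(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")
```

With these counts the OR is further from 1 than the RR, which is the usual pattern when the outcome is not rare.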

TYPES OF DATA

DESCRIPTIVE STATS

Graphics

Using the SD and the Normal Curve

Mean ± 1.96 SD = 95% range of the sample
Mean ± 1.96 SEM = 95% confidence interval
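
The difference between the two intervals can be sketched in a few lines. The glucose values below are hypothetical, and the 1.96 multiplier assumes approximate normality; with a sample this small a t-distribution multiplier would normally be used for the confidence interval.

```python
import math
import statistics

# Hypothetical glucose values in mmol/l (not data from the presentation).
values = [4.8, 5.1, 5.6, 6.0, 6.3, 6.9, 7.2, 7.4, 8.0, 8.7]

mean = statistics.mean(values)
sd = statistics.stdev(values)        # sample standard deviation
sem = sd / math.sqrt(len(values))    # standard error of the mean

# Range expected to contain about 95% of individual observations.
range_95 = (mean - 1.96 * sd, mean + 1.96 * sd)
# Interval expressing the uncertainty about the population mean.
ci_95 = (mean - 1.96 * sem, mean + 1.96 * sem)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
print(f"95% range of sample: {range_95[0]:.2f} to {range_95[1]:.2f}")
print(f"95% CI of the mean:  {ci_95[0]:.2f} to {ci_95[1]:.2f}")
```

The confidence interval is always narrower than the 95% range because the SEM shrinks with the square root of the sample size.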

One of many samples

95% Confidence Intervals

Hypothesis Testing

Type I & II Errors Have an Inverse Relationship
If you reduce the probability of one error (α or β), the probability of the other increases, provided everything else is unchanged.

Factors Affecting Type II Error
True value of population parameter: β increases when the difference between the hypothesized parameter and its true value decreases
Significance level: β increases when α decreases
Population standard deviation: β increases when σ increases
Sample size: β increases when n decreases
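
The four factors listed above can be checked numerically. The sketch below approximates the type II error (β) for a two-sided one-sample z-test with a known standard deviation, which is a simplifying assumption; the function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(delta, sigma, n, alpha=0.05):
    """Approximate beta for a two-sided one-sample z-test.

    delta: true difference between the population mean and the H0 value
    sigma: population standard deviation (assumed known)
    n:     sample size
    alpha: significance level
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) * math.sqrt(n) / sigma
    # Power is the probability that the test statistic falls in either rejection region.
    power = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    return 1 - power


baseline = type_ii_error(delta=5, sigma=10, n=50)
print(f"baseline beta         = {baseline:.3f}")
print(f"smaller difference    = {type_ii_error(2, 10, 50):.3f}")
print(f"smaller alpha (0.01)  = {type_ii_error(5, 10, 50, alpha=0.01):.3f}")
print(f"larger SD             = {type_ii_error(5, 20, 50):.3f}")
print(f"smaller sample        = {type_ii_error(5, 10, 20):.3f}")
```

Each of the four changes raises β relative to the baseline, matching the directions stated on the slide.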

Examples
Difference in glucose between survivors and non-survivors = 5 mmol/l (95% CI -5 to 10 mmol/l)
RR for cancer = 1.4 (95% CI 0.7 to 1.3)

P value
H0 is NO difference, BUT a difference can be found by chance.
E.g. what is the probability of finding a difference between groups of 5 mmol/l (or more) when in TRUTH the difference is ZERO?
P = 0.10
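
A rough sketch of how such a P value arises: under H0 the observed difference divided by its standard error is treated as approximately standard normal, and the P value is the two-tailed area beyond it. The standard error of 3.04 mmol/l is a hypothetical value chosen so that the result lands near P = 0.10; it is not taken from the presentation.

```python
from statistics import NormalDist


def two_sided_p(difference, standard_error):
    """Two-sided P value for an observed difference under H0: true difference = 0.

    Uses a normal approximation; a t distribution would be used for small samples.
    """
    z = difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Observed difference of 5 mmol/l with a hypothetical standard error of 3.04 mmol/l.
p = two_sided_p(5.0, 3.04)
print(f"P = {p:.3f}")
```

A P value of about 0.10 means a difference at least this large would be seen roughly one time in ten if the true difference were zero.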

[Stata cross-tabulation output. Key: frequency, column percentage. Columns 0 and 1; row totals N = 48, Y = 49, Total = 97. Pearson chi2(1): Pr = 0.356]
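
The Pearson chi-square test reported in this kind of output can be reproduced by hand for a 2 x 2 table. The sketch below uses the shortcut formula without continuity correction; the function name and cell counts are hypothetical and are not the counts from the table above.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and P value for a 2 x 2 table.

    Rows are (a, b) and (c, d). With 1 degree of freedom the upper tail
    probability of chi-square equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 15 of 50 in one group and 25 of 50 in the other have the outcome.
chi2, p_value = pearson_chi2_2x2(15, 35, 25, 25)
print(f"Pearson chi2(1) = {chi2:.4f}, Pr = {p_value:.3f}")
```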

Differences between groups

Parametric comparisons

?

T-test ?

What about 3 groups?
anova age ethngr, cat(ethngr)
Number of obs = 37
[ANOVA table: Source (Model, ethngr, Residual, Total) with Partial SS, df, MS, F and Prob > F; values not shown]

Differences between the 3
. regress
F(2, 34) = 1.13
[Regression table: age on ethngr with _cons, one ethngr level dropped; Coef., Std. Err., t, P>|t| and 95% Conf. Interval values not shown]

Repeated measures
One group of schoolkids
Muscle strength in January
Muscle strength again in March
Did things change significantly over time?
Paired t-test
Two or more groups: RM ANOVA
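
The paired t-test works on the within-child differences rather than on the two sets of measurements separately. A minimal sketch follows; the strength values are hypothetical, and the critical value 2.571 is the standard two-sided 5% cut-off for 5 degrees of freedom.

```python
import math
import statistics

# Hypothetical muscle strength for the same six children (arbitrary units).
january = [20.0, 22.5, 19.0, 24.0, 21.5, 23.0]
march = [21.5, 23.0, 20.5, 25.5, 21.0, 25.0]

# The test is a one-sample t-test on the paired differences.
diffs = [after - before for before, after in zip(january, march)]
mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean_diff / se_diff
df = len(diffs) - 1

T_CRIT_5DF = 2.571  # two-sided 5% critical value of t with 5 degrees of freedom
print(f"mean change = {mean_diff:.2f}, t = {t_stat:.2f}, df = {df}")
print("significant at 5%" if abs(t_stat) > T_CRIT_5DF else "not significant at 5%")
```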

Non-parametric comparisons
Two groups
ranksum age, by(menopaus)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
Ho: age(menopaus==0) = age(menopaus==1)
[Output table: obs, rank sum and expected by menopaus, with unadjusted variance, adjustment for ties and adjusted variance; z and Prob > |z| values not shown]

Non-parametric
Three groups
kwallis s_tg, by(ethngr)
Test: Equality of populations (Kruskal-Wallis test)
ethngr 1: Obs = 17; ethngr 2: Obs = 10; ethngr 3: Obs = 10
[Rank sums, chi-squared with 2 d.f. and probability (with and without ties) not shown]

Summarize
Continuous, non-normal: 2 groups: Mann-Whitney; 3 groups: Kruskal-Wallis
Continuous, normal: 2 groups: t-test; 3 groups: ANOVA

Categorical data

Relationships

Linear Regression

Here the DEPENDENT (logTG) and INDEPENDENT (waist) VARIABLES are both continuous.
So how much does logTG increase if waist increases by 1 cm? That is the beta coefficient.
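
The beta coefficient is the least-squares slope of logTG on waist. The sketch below fits a simple linear regression by hand; the helper name and the five data points are hypothetical.

```python
def ols_fit(x, y):
    """Simple least-squares regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: waist circumference in cm and log triglycerides.
waist = [80, 90, 100, 110, 120]
log_tg = [0.10, 0.22, 0.28, 0.41, 0.49]

intercept, beta = ols_fit(waist, log_tg)
print(f"logTG = {intercept:.3f} + {beta:.4f} x waist")
```

Here beta is the expected change in logTG for each additional 1 cm of waist circumference.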

What if the INDEP = Categorical?
regress age menop
F(1, 84)
[Regression table: age on menopaus with _cons; Coef., Std. Err., t, P>|t| and 95% Conf. Interval values not shown]
Menop = 0 or 1. INTERPRETATION??
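
One way to see the interpretation: when the independent variable is coded 0 or 1, the intercept equals the mean of the group coded 0 and the coefficient equals the difference between the two group means. The sketch below demonstrates this with a hand-written least-squares fit; the helper name and the ages are hypothetical.

```python
import statistics


def ols_fit(x, y):
    """Simple least-squares regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical ages; menopause status coded 0 (no) or 1 (yes).
age_pre = [42, 45, 47, 50]
age_post = [55, 58, 60, 63]
menop = [0] * len(age_pre) + [1] * len(age_post)
age = age_pre + age_post

cons, coef = ols_fit(menop, age)
mean_pre = statistics.mean(age_pre)
mean_post = statistics.mean(age_post)
print(f"_cons = {cons:.1f} (mean age when menop = 0 is {mean_pre})")
print(f"coef  = {coef:.1f} (difference in means is {mean_post - mean_pre})")
```

The t-test on the coefficient in this setting is equivalent to a two-sample t-test with equal variances.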

Logistic regression
Outcome is heart disease (Yes/No… ?)
Independent var = age
logistic CVD age
Logistic regression: Number of obs = 48, LR chi2(1) = 2.51
[Output table: died | Odds Ratio, Std. Err., z, P>|z| and 95% Conf. Interval for age; values not shown]
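
The odds ratio in logistic regression output is the exponential of the coefficient on the log-odds scale. The sketch below shows the conversion, including an approximate 95% CI; the function name, the coefficient of 0.05 per year and the standard error of 0.03 are hypothetical values, not results from the output above.

```python
import math


def odds_ratio_from_coef(beta, se, units=1.0):
    """Convert a log-odds coefficient and its standard error to an OR with 95% CI.

    units rescales the effect, e.g. units=10 gives the OR per 10 years of age.
    """
    odds_ratio = math.exp(beta * units)
    lower = math.exp((beta - 1.96 * se) * units)
    upper = math.exp((beta + 1.96 * se) * units)
    return odds_ratio, lower, upper


# Hypothetical coefficient for age: 0.05 per year with standard error 0.03.
or_year, lo_year, hi_year = odds_ratio_from_coef(0.05, 0.03)
or_decade, lo_decade, hi_decade = odds_ratio_from_coef(0.05, 0.03, units=10)
print(f"OR per year     = {or_year:.3f} (95% CI {lo_year:.3f} to {hi_year:.3f})")
print(f"OR per 10 years = {or_decade:.3f} (95% CI {lo_decade:.3f} to {hi_decade:.3f})")
```

A confidence interval that includes 1 corresponds to a P value above 0.05 for that coefficient.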