Basic Biostatistics Prof Paul Rheeder Division of Clinical Epidemiology.

1 Basic Biostatistics Prof Paul Rheeder Division of Clinical Epidemiology

2

3

4

5 Overview: Bias vs chance; Types of data; Descriptive statistics; Histograms and boxplots; Inferential statistics; Hypothesis testing: P and CI; Comparing groups; Correlation and regression

6

7 Research questions: Does CK level predict in-hospital mortality post MI? Is there an association between troponin I and renal function? What is the incidence of amputation in diabetics with renal failure? HOW ARE THEY MEASURED?

8 Research question: Does aspirin reduce CV mortality in diabetics when used for primary prevention? Is cell phone use associated with an increased risk of brain cancer? Does level of SES correlate with depression?

9 Research question So your research question must be phrased in such a manner that you can answer YES or NO or provide some quantification of sorts.

10 Data analysis Aim: to describe the study sample and to answer the research question!

11 Problems !

12 Problems: Bias and confounding (also called systematic error) are typically dealt with in the planning and execution of the study; they can also be controlled for in the data analysis (e.g. multivariate analysis). Chance (also called random error): classically, P values (and CI) are used to judge the role of chance.

13 First important issues: What type of data are you collecting? Typically one has some outcome variable and some exposure variable or variables. How and with what are they measured?

14 Outcome and exposure? Does CK level predict in-hospital mortality post MI? Is there an association between troponin I and renal function? What is the incidence of amputation in diabetics with renal failure? HOW ARE THEY MEASURED?

15 Research question: Does aspirin reduce CV mortality in diabetics when used for primary prevention? Is cell phone use associated with an increased risk of brain cancer? Does level of SES correlate with depression?

16 Research question So your research question must be phrased in such a manner that you can answer YES or NO or provide some quantification of sorts.

17 Types of data: Categorical (HT yes or no, sex, smoking status; usually reported as a %), ordinal versus nominal. Continuous data; spread of continuous data.

18

19 Data analysis: Descriptive stats: mean or median; SD or range

20 Hypothesis testing: differences between groups. Examples: t-test / Mann-Whitney (2 groups); ANOVA / Kruskal-Wallis (>2 groups); chi-square if the outcome is a %

21 Associations between variables: Does coffee cause cancer? (OR, RR). Efficacy of Rx (RRR, ARR, NNT). Is BMI associated with BP? (correlation and regression)
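
The efficacy measures named on this slide (ARR, RRR, NNT) can be sketched in a few lines. Python is used here only for illustration; the slides themselves use Stata. The function name and the trial counts are hypothetical and do not come from the presentation.

```python
import math

def treatment_effect(events_control, n_control, events_treated, n_treated):
    """Return (ARR, RRR, NNT) from event counts in a two-arm trial."""
    cer = events_control / n_control      # control event rate
    eer = events_treated / n_treated      # experimental (treated) event rate
    arr = cer - eer                       # absolute risk reduction
    rrr = arr / cer                       # relative risk reduction
    nnt = 1 / arr if arr != 0 else math.inf   # number needed to treat
    return arr, rrr, nnt

# Hypothetical trial: 20/100 events on placebo, 15/100 events on treatment.
arr, rrr, nnt = treatment_effect(20, 100, 15, 100)
print(f"ARR = {arr:.2f}, RRR = {rrr:.2f}, NNT = {nnt:.0f}")
```

A small ARR gives a large NNT even when the RRR looks impressive, which is why both are usually reported together.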

22 2 X 2 table

              Cancer   No cancer
Smoker          a          b
Non-smoker      c          d

RR = [a/(a+b)] / [c/(c+d)]
OR = (a/b) / (c/d) = ad/bc
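
A minimal sketch of the two formulas on this slide, written in Python for illustration (the slides use Stata). The function names and the cell counts are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """RR: risk in the exposed row divided by risk in the unexposed row."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR: odds in the exposed row divided by odds in the unexposed row (= ad/bc)."""
    return (a * d) / (b * c)

# Hypothetical counts: smokers 30 cancer / 70 no cancer, non-smokers 10 / 90.
a, b, c, d = 30, 70, 10, 90
rr = risk_ratio(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")
```

With these counts the OR is further from 1 than the RR, which is the usual pattern when the outcome is common.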

23

24

25 TYPES OF DATA

26 DESCRIPTIVE STATS

27

28

29

30

31

32

33

34

35

36 Graphics

37

38 Using the SD and the Normal Curve

39

40 Mean ± 1.96 SD = range containing about 95% of the sample values (if roughly Normal). Mean ± 1.96 SEM = 95% confidence interval for the mean (SEM = SD/√n).
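
The difference between the two intervals can be sketched as follows. This is an illustration in Python, not part of the original slides; the glucose values are invented, and the Normal approximation with 1.96 is assumed throughout.

```python
import math
import statistics

def sample_range_and_ci(values, z=1.96):
    """Return (mean, 95% range of the sample, 95% CI of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(n)             # standard error of the mean
    sample_range = (mean - z * sd, mean + z * sd)
    ci = (mean - z * sem, mean + z * sem)
    return mean, sample_range, ci

# Hypothetical glucose values (mmol/l), invented for illustration only.
glucose = [4.8, 5.1, 5.6, 6.0, 5.3, 4.9, 5.8, 6.2, 5.0]
mean, sample_range, ci = sample_range_and_ci(glucose)
print(f"mean = {mean:.2f}")
print(f"95% range of sample: {sample_range[0]:.2f} to {sample_range[1]:.2f}")
print(f"95% CI of the mean:  {ci[0]:.2f} to {ci[1]:.2f}")
```

The CI is narrower than the sample range by a factor of √n, so it shrinks as the sample grows while the SD-based range does not.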

41

42 One of many samples

43

44 95% Confidence Intervals

45

46

47 Hypothesis Testing

48

49

50

51

52

53

54

55

56 Type I & II Errors Have an Inverse Relationship: if you reduce the probability of one error, the probability of the other increases, provided everything else is unchanged.

57 Factors Affecting Type II Error (β): True value of population parameter: β increases when the difference between the hypothesized parameter and its true value decreases. Significance level (α): β increases when α decreases. Population standard deviation (σ): β increases when σ increases. Sample size (n): β increases when n decreases.
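
The four factors can be checked numerically. The sketch below assumes a two-sided one-sample z-test of a mean with known population SD, which is a simplification chosen for illustration; the function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist

def type_ii_error(delta, sigma, n, alpha=0.05):
    """Type II error for a two-sided one-sample z-test.

    delta: true difference from the hypothesized mean
    sigma: population standard deviation
    n:     sample size
    alpha: significance level
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sigma
    power = std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)
    return 1 - power

baseline = type_ii_error(delta=5, sigma=10, n=30)
print(f"baseline beta       = {baseline:.3f}")
print(f"smaller difference  = {type_ii_error(2.5, 10, 30):.3f}")
print(f"smaller alpha       = {type_ii_error(5, 10, 30, alpha=0.01):.3f}")
print(f"larger SD           = {type_ii_error(5, 20, 30):.3f}")
print(f"smaller sample size = {type_ii_error(5, 10, 15):.3f}")
```

Each of the four changes raises the Type II error above the baseline, matching the directions listed on the slide.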

58

59

60 Examples Difference in glucose between survivors and non survivors = 5 mmol/l (95% CI -5 to 10 mmol/l) RR for cancer =1.4 (95% CI 0.7 to 1.3)

61

62 P value: The H0 is NO difference, BUT I can find a difference by chance. E.g. WHAT is the probability of finding a difference between groups of 5 mmol/l or more when in TRUTH the difference is ZERO? P=0.10

63

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |        0=L E=1
      Y/NR |         0          1 |     Total
-----------+----------------------+----------
         N |        28         20 |        48
           |     53.85      44.44 |     49.48
-----------+----------------------+----------
         Y |        24         25 |        49
           |     46.15      55.56 |     50.52
-----------+----------------------+----------
     Total |        52         45 |        97
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   0.8530   Pr = 0.356
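
The Pearson chi-square on this slide can be reproduced by hand from the four cell counts. The sketch below is in Python for illustration (the slide output is from Stata); the function name is hypothetical, and the P value uses the fact that a chi-square with 1 d.f. is the square of a standard Normal.

```python
import math

def pearson_chi2_2x2(table):
    """Pearson chi-square (no continuity correction) and P value for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of chi-square with 1 d.f. = two-sided tail of a standard Normal.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Frequencies from the table on this slide (rows N and Y, columns 0 and 1).
chi2, p = pearson_chi2_2x2([[28, 20], [24, 25]])
print(f"Pearson chi2(1) = {chi2:.4f}   Pr = {p:.3f}")
```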

64

65 Differences between groups

66 Parametric comparisons

67 ?

68 T-test ?

69 What about 3 groups

anova age ethngr, cat(ethngr)

                  Number of obs =      37     R-squared     =  0.0621
                  Root MSE      =  7.7883     Adj R-squared =  0.0069

     Source |  Partial SS    df       MS           F     Prob > F
 -----------+----------------------------------------------------
      Model |  136.560095     2  68.2800477       1.13     0.3362
            |
     ethngr |  136.560095     2  68.2800477       1.13     0.3362
            |
   Residual |  2062.35882    34  60.6576125
 -----------+----------------------------------------------------
      Total |  2198.91892    36  61.0810811
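
The summary figures in this ANOVA table all follow from the sums of squares and degrees of freedom. The sketch below (Python, for illustration only) recomputes them from the numbers printed on the slide; the P value is not recomputed because the F distribution is not in the Python standard library.

```python
import math

# Values copied from the ANOVA table on this slide.
ss_model, df_model = 136.560095, 2
ss_resid, df_resid = 2062.35882, 34

ms_model = ss_model / df_model          # mean square between groups
ms_resid = ss_resid / df_resid          # mean square within groups
f_stat = ms_model / ms_resid            # F = MS model / MS residual
ss_total = ss_model + ss_resid
df_total = df_model + df_resid
r_squared = ss_model / ss_total         # share of variance explained by ethngr
adj_r_squared = 1 - (1 - r_squared) * df_total / df_resid
root_mse = math.sqrt(ms_resid)

print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
print(f"R-squared = {r_squared:.4f}, Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE = {root_mse:.4f}")
```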

70 Differences between the 3

. regress

      Source |       SS       df       MS              Number of obs =      37
-------------+------------------------------           F(  2,    34) =    1.13
       Model |  136.560095     2  68.2800477           Prob > F      =  0.3362
    Residual |  2062.35882    34  60.6576125           R-squared     =  0.0621
-------------+------------------------------           Adj R-squared =  0.0069
       Total |  2198.91892    36  61.0810811           Root MSE      =  7.7883

------------------------------------------------------------------------------
         age |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |       56.6   2.462877    22.98   0.000     51.59483    61.60517
      ethngr |
          1  |   4.635294   3.103845     1.49   0.145    -1.672479    10.94307
          2  |        2.5   3.483034     0.72   0.478    -4.578376    9.578376
          3  |  (dropped)
------------------------------------------------------------------------------

71 Repeated measures: One group of schoolkids. Muscle strength in January; muscle strength again in March. Did things change significantly over time? Paired t-test. Two or more groups: RM ANOVA

72 Non-parametric comparisons: Two groups

ranksum age, by(menopaus)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

    menopaus |      obs    rank sum    expected
-------------+---------------------------------
           0 |       19         210       826.5
           1 |       67        3531      2914.5
-------------+---------------------------------
    combined |       86        3741        3741

unadjusted variance     9229.25
adjustment for ties      -28.04
                     ----------
adjusted variance       9201.21

Ho: age(menopaus==0) = age(menopaus==1)
             z =  -6.427
    Prob > |z| =   0.0000
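
The z statistic in this output can be traced back to the rank sums. The sketch below (Python, for illustration) recomputes the expected rank sum, the unadjusted variance and z from the numbers on the slide. The tie-adjusted variance is copied from the output, because recomputing it would need the raw data.

```python
import math
from statistics import NormalDist

# Values copied from the ranksum output on this slide.
n1, n2 = 19, 67                  # observations with menopaus == 0 and == 1
rank_sum_1 = 210                 # observed rank sum for menopaus == 0
adjusted_variance = 9201.21      # tie-adjusted variance, taken as given

n = n1 + n2
expected_1 = n1 * (n + 1) / 2                    # expected rank sum under H0
unadjusted_variance = n1 * n2 * (n + 1) / 12     # variance ignoring ties
z = (rank_sum_1 - expected_1) / math.sqrt(adjusted_variance)
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"expected = {expected_1}, unadjusted variance = {unadjusted_variance}")
print(f"z = {z:.3f}, Prob > |z| = {p_two_sided:.4f}")
```

The observed rank sum in the menopaus == 0 group is far below its expected value, so ages in that group are systematically lower.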

73 Non Parametric: Three groups

kwallis s_tg, by(ethngr)

Test: Equality of populations (Kruskal-Wallis test)

  +-------------------------+
  | ethngr | Obs | Rank Sum |
  |--------+-----+----------|
  |      1 |  17 |   381.00 |
  |      2 |  10 |   149.50 |
  |      3 |  10 |   172.50 |
  +-------------------------+

chi-squared =     3.350 with 2 d.f.
probability =     0.1873

chi-squared with ties =     3.352 with 2 d.f.
probability =     0.1871
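
The Kruskal-Wallis statistic (without the tie correction) depends only on the group sizes and rank sums shown above. A short Python sketch for illustration, using the standard formula H = 12/(N(N+1)) × Σ(R²/n) − 3(N+1); the shortcut for the P value is valid only for 2 d.f., that is, three groups.

```python
import math

# (observations, rank sum) per ethngr group, copied from the slide.
groups = {1: (17, 381.0), 2: (10, 149.5), 3: (10, 172.5)}

n_total = sum(n for n, _ in groups.values())
sum_r2_over_n = sum(rank_sum ** 2 / n for n, rank_sum in groups.values())
h = 12 / (n_total * (n_total + 1)) * sum_r2_over_n - 3 * (n_total + 1)

# For 2 d.f. the chi-square upper tail has the closed form exp(-x / 2).
p_value = math.exp(-h / 2)

print(f"chi-squared = {h:.3f} with {len(groups) - 1} d.f.")
print(f"probability = {p_value:.4f}")
```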

74 Summary: Continuous, non-Normal: 2 groups: Mann-Whitney; 3 groups: Kruskal-Wallis. Continuous, Normal: 2 groups: t-test; 3 groups: ANOVA

75 Categorical data

76

77

78 Relationships

79

80 Linear Regression

81 Here the DEPENDENT (logTG) and INDEPENDENT (waist) VARIABLES are both continuous. How much logTG increases when waist increases by 1 cm = the beta coefficient.

82 What if the INDEP=Categorical

regress age menop

      Source |       SS       df       MS              Number of obs =      86
-------------+------------------------------           F(  1,    84) =  135.01
       Model |  3499.71205     1  3499.71205           Prob > F      =  0.0000
    Residual |  2177.49725    84  25.9225863           R-squared     =  0.6164
-------------+------------------------------           Adj R-squared =  0.6119
       Total |   5677.2093    85  66.7906977           Root MSE      =  5.0914

------------------------------------------------------------------------------
         age |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    menopaus |   15.37628   1.323348    11.62   0.000     12.74465     18.0079
       _cons |   46.57895   1.168053    39.88   0.000     44.25615    48.90175
------------------------------------------------------------------------------

Menop = 0 or 1. INTERPRETATION??
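
One way to read this output, sketched in Python for illustration: with a 0/1 predictor, the constant is the mean of the outcome in the group coded 0 and the coefficient is the difference in means between the two groups. The variable names below are hypothetical; the numbers are copied from the table.

```python
# Values copied from the regression table on this slide.
intercept = 46.57895      # _cons: mean age when menopaus == 0
coef = 15.37628           # menopaus: difference in mean age (1 versus 0)
std_err = 1.323348        # standard error of the coefficient

mean_age_group0 = intercept
mean_age_group1 = intercept + coef
t_stat = coef / std_err
f_stat = t_stat ** 2      # with a single predictor, the model F equals t squared

print(f"mean age, menopaus = 0: {mean_age_group0:.2f}")
print(f"mean age, menopaus = 1: {mean_age_group1:.2f}")
print(f"t = {t_stat:.2f}, F = {f_stat:.2f}")
```

This is the same comparison as a two-sample t-test with equal variances, expressed as a regression.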

83 Logistic regression

Outcome is heart disease (Yes/No… ?) Independent var = age.

logistic CVD age

Logistic regression                               Number of obs   =         48
                                                  LR chi2(1)      =       2.51
                                                  Prob > chi2     =     0.1133
Log likelihood = -29.945379                       Pseudo R2       =     0.0402

------------------------------------------------------------------------------
        died | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.093467    .064069     1.52   0.127     .9748363    1.226535
------------------------------------------------------------------------------

?
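
The z statistic and confidence interval in this output are computed on the log-odds scale and then exponentiated. The Python sketch below, for illustration only, assumes the reported standard error of the odds ratio equals the odds ratio times the standard error of the log odds ratio (the delta method).

```python
import math
from statistics import NormalDist

# Values copied from the logistic output on this slide.
odds_ratio = 1.093467     # odds ratio per 1-year increase in age
se_odds_ratio = 0.064069  # standard error as reported on the odds-ratio scale

log_or = math.log(odds_ratio)
se_log_or = se_odds_ratio / odds_ratio     # assumption: delta-method conversion
z = log_or / se_log_or
z_crit = NormalDist().inv_cdf(0.975)       # about 1.96
ci_low = math.exp(log_or - z_crit * se_log_or)
ci_high = math.exp(log_or + z_crit * se_log_or)

print(f"z = {z:.2f}")
print(f"95% CI for the odds ratio: {ci_low:.4f} to {ci_high:.4f}")
print("CI includes 1:", ci_low <= 1 <= ci_high)
```

Because the interval includes 1, the data are compatible with no association between age and the outcome, in line with P = 0.127.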

84

