
1 Introduction to biostatistics
Lecture plan
1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics:
   - Confidence intervals
   - Hypothesis testing

2 DEFINITIONS
STATISTICS can mean 2 things:
- the numbers we get when we measure and count things (data)
- a collection of procedures for describing and analysing data.
BIOSTATISTICS: the application of statistics in the natural sciences, when biomedical problems are analysed.

3 Why do we need statistics?

4 Basic parts of statistics:
- Descriptive
- Inferential

5 Terminology
- Population
- Sample
- Variables

6 Variable types
- Categorical (qualitative)
- Numerical (quantitative)
- Combined

7 Categorical data
- Nominal
  - 2 categories
  - >2 categories
- Ordinal

8 Numerical data
- Continuous
- Discrete

9 Description of categorical data
- Arranging data
- Frequencies, tables
- Visualization (graphical presentation)

10 Frequencies and contingency tables
Of those who were unsatisfied, 4 were males and 6 were females.

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)      4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)
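The column percentages in this table can be recomputed from the raw counts. The following is a minimal Python sketch using the counts from the slide; the dictionary layout and the `column_percentages` helper are illustrative assumptions, not part of the lecture.

```python
# Counts from the satisfaction table on the slide (rows: satisfaction, columns: sex).
counts = {
    "Satisfied": {"Males": 14, "Females": 26},
    "Unsatisfied": {"Males": 4, "Females": 6},
}


def column_percentages(table):
    """Return each cell as a percentage of its column total (hypothetical helper)."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / totals[c] for c in columns}
        for label, row in table.items()
    }


percentages = column_percentages(counts)
for label, row in percentages.items():
    print(label, {c: f"{v:.2f}%" for c, v in row.items()})
```

The females column works out to 81.25% and 18.75%, which the slide reports as 81.3% and 18.7%.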

11-14 Graphical presentation [four slides of charts; figures not reproduced in the transcript]

15 Graphical presentation
Other:
- Maps
- Chernoff faces
- Star plots, etc.

16 Description of numerical data
- Arranging data
- Frequencies (relative and cumulative), graphical presentation
- Measures of central tendency and dispersion
- Assessing normality

17 Grouping
- Sorting data
- Groups (5-17 groups) according to the researcher's criteria
- Used to assess the distribution and for graphical presentation in Excel

18 Frequencies, their comparison and calculation
197 students were asked how much money (in litas) they had in cash at that moment.
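Grouping numeric data and computing relative and cumulative frequencies might look like the sketch below. The survey data are not in the transcript, so the `amounts` list, the bin width of 50 and the `frequency_table` helper are all hypothetical. The toy sample also yields fewer groups than the 5-17 suggested above.

```python
from collections import Counter

# Hypothetical cash amounts in litas; the real survey data are not in the transcript.
amounts = [0, 5, 12, 18, 20, 25, 33, 40, 47, 55, 60, 75, 90, 110, 150]
bin_width = 50  # assumed group width


def frequency_table(values, width):
    """Group values into [k*width, (k+1)*width) bins.

    Returns rows of (lower, upper, count, relative %, cumulative %).
    """
    bins = Counter((v // width) * width for v in values)
    n = len(values)
    rows, cumulative = [], 0
    for lower in sorted(bins):
        count = bins[lower]
        cumulative += count
        rows.append((lower, lower + width, count, 100 * count / n, 100 * cumulative / n))
    return rows


table = frequency_table(amounts, bin_width)
for lower, upper, count, rel, cum in table:
    print(f"{lower:>4}-{upper:<4} n={count:<3} {rel:5.1f}%  cumulative {cum:5.1f}%")
```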

19 Graphical presentation of frequencies

20 Normal distributions
- Most observations lie around the centre
- Fewer observations lie above and below the central values, in approximately equal proportions
- Most often a Gaussian distribution

21 Non-normal distributions
- More observations fall in one part of the range.

22 Asymmetrical distribution

23 How would you describe/present your respondents if the data are numeric?
2 groups of measures:
1. Central tendency (central value, average)
2. Dispersion (variability)

24 MEASURES OF CENTRAL TENDENCY
- Means/averages (arithmetic, geometric, harmonic, etc.)
- Mode
- Median
- Quartiles

25 MEASURES OF CENTRAL TENDENCY
- Arithmetic mean (x̄ for a sample, μ for a population)

26 MEASURES OF CENTRAL TENDENCY
Median (Me): the middle value, or 50th percentile (the value that divides the sorted data into two equal halves). It is found this way:
- When n is odd: the median is the middle observation
- When n is even: the median is the average of the two middle observations

27 MEASURES OF CENTRAL TENDENCY
- Mode (Mo): the most common value
- There can be more than one mode

28 MEASURES OF CENTRAL TENDENCY
- Quartiles (Q1, Q2, Q3): cut points that divide the sorted sample into 4 equal parts, with 25% of the observations in each. Q2 is the median.
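The measures of central tendency listed on these slides can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical sample. The `"inclusive"` quartile method is one of several conventions, and other software may return slightly different quartiles.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 8, 10]  # hypothetical sample, n = 8 (even)

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # average of the two middle values when n is even
modes = statistics.multimode(data)    # all most-common values; there can be several

# Three cut points that split the sorted data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print("mean:", mean)
print("median:", median)
print("modes:", modes)
print("quartiles:", q1, q2, q3)
```

In this sample there are two modes (3 and 8), and the second quartile coincides with the median.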

29 Is a measure of central tendency enough to describe the respondents?

30 MEASURES OF DISPERSION
- Min and max
- Range
- Variance: V = ∑(x_i − x̄)² / (n − 1)
- Standard deviation (SD): the square root of the variance
- Interquartile range (IQR): Q3 − Q1, i.e. 75th percentile − 25th percentile
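As a worked illustration, the measures on this slide can be computed for a small sample. The sketch below uses hypothetical data and writes out the sample variance with the n − 1 denominator. The quartile method (`"inclusive"`) is an assumption, since the slides do not name one.

```python
import math
import statistics

data = [2, 3, 3, 5, 7, 8, 8, 10]  # hypothetical sample

n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)

value_range = max(data) - min(data)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"min={min(data)} max={max(data)} range={value_range}")
print(f"variance={variance:.2f} SD={sd:.2f} IQR={iqr}")
```

The hand-written variance agrees with `statistics.variance(data)`, which uses the same n − 1 denominator.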

31 What measures are to be used for sample description?
If the distribution is NORMAL:
- Mean
- Variance (or standard deviation)
If the distribution is NOT NORMAL:
- Median
- IQR or min/max
The median and IQR are also used with numerically coded ordinal data.

32 x̄, Mo, Me: Mean ≈ Median ≈ Mode; SD and the empirical rule

33 EMPIRICAL RULE
Percentage of observations within 1, 2 and 2.5 SD of the mean if the distribution is normal
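The percentages behind the empirical rule were presumably shown in a figure that is not in the transcript. They follow from the normal distribution: the share of observations within k SD of the mean equals erf(k/√2). A small Python sketch, with a hypothetical helper name:

```python
import math


def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 2.5):
    print(f"within {k} SD of the mean: {100 * share_within(k):.1f}%")
```

This gives roughly 68.3%, 95.4% and 98.8% for 1, 2 and 2.5 SD respectively.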

34 Example
x̄ = 8, SD = 2.5, so x̄ − 2SD = 3 and x̄ + 2SD = 13

35 Normality assessment: summary
- Graphical
- Comparison of measures of central tendency; empirical rule (mean and standard deviation)
- Skewness and excess kurtosis (both equal 0 for a Gaussian distribution)
- Kolmogorov-Smirnov test
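Skewness and excess kurtosis can be computed from the central moments of the sample. The sketch below uses the simple moment-based formulas on hypothetical data. Statistical packages often apply bias corrections, so their values may differ slightly.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # subtract 3 so that a Gaussian gives 0
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]          # hypothetical, symmetric around 3
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical, long right tail

sym_skew, sym_kurt = skewness_and_excess_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_excess_kurtosis(right_skewed)

print(f"symmetric:    skewness={sym_skew:.2f}, excess kurtosis={sym_kurt:.2f}")
print(f"right-skewed: skewness={skew_right:.2f}, excess kurtosis={kurt_right:.2f}")
```

A skewness of 0 indicates symmetry, and a positive value indicates a longer right tail.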

36 Boxplot [figure labels: median, mean (*), 75th percentile, 25th percentile, outliers]

37 Boxplot example

38 Central limit theorem
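The slide gives only the title. The central limit theorem states that means of repeated samples are approximately normally distributed around the population mean, with a standard deviation of about σ/√n, even when the data themselves are not normal. The simulation below is a sketch of that idea. The uniform population, sample size and number of samples are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30            # observations per sample (assumed)
n_samples = 2000  # number of repeated samples (assumed)

# Draw from a uniform distribution on [0, 1), which is clearly not Gaussian.
sample_means = [
    statistics.mean(random.random() for _ in range(n))
    for _ in range(n_samples)
]

population_mean = 0.5
population_sd = (1 / 12) ** 0.5           # SD of the uniform distribution on [0, 1)
expected_se = population_sd / n ** 0.5    # sigma / sqrt(n)

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_se:.3f})")
```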

39 Inferential statistics
- Confidence intervals
- Hypothesis testing

40 Confidence intervals
The interval within which the "true" (population) value most likely lies.

41 The variability of samples and their measures
Population: μ, σ, p0
Samples: (x̄1, SD1, p1); (x̄2, SD2, p2); (x̄3, SD3, p3); (x̄4, SD4, p4)

42 The variability of samples and confidence intervals [figure; population values μ, p0]

43 Confidence interval
Statistical definition: if the study were carried out 100 times, giving 100 results and 100 CIs, then on average about 95 of the 100 intervals would contain the "true" value and about 5 would not.
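This repeated-study interpretation can be checked by simulation. The sketch below draws many hypothetical studies from a normal population with a known mean and counts how often the 95% CI covers it. The population values (taken from the x̄ = 8, SD = 2.5 example), the sample size, the number of studies and the `ci95` helper are all assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

true_mean, true_sd = 8.0, 2.5  # assumed population values
n = 50                         # observations per study (assumed)
n_studies = 2000               # number of simulated studies (assumed)


def ci95(sample):
    """Large-sample 95% CI for the mean: mean +/- 1.96 * SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * se, mean + 1.96 * se


hits = 0
for _ in range(n_studies):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    low, high = ci95(sample)
    hits += low <= true_mean <= high  # True counts as 1

coverage = hits / n_studies
print(f"share of intervals containing the true mean: {100 * coverage:.1f}%")
```

The observed share should be close to 95%, though not exactly 95% in any single run.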

44 Confidence intervals (general, most common calculation)
- 95% CI for a mean: x̄ ± 1.96 SE, giving (x̄ min; x̄ max). Note: for a normal distribution, when n is large.
- 95% CI for a proportion: p ± 1.96 SE, giving (p min; p max). Note: when both p and 1 − p are > 5/n.

45 Standard error (SE)
- Numeric data (x̄)
- Categorical data (p)
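The SE formulas on this slide did not survive in the transcript. The sketch below assumes the standard textbook forms, SE = SD/√n for a mean and SE = √(p(1 − p)/n) for a proportion, and combines them with the 95% CI rule from the previous slide. The helper names and the sample size n = 100 for the mean are assumptions. The proportion uses the 40 satisfied respondents out of 50 from the contingency table.

```python
import math


def ci95_mean(mean, sd, n):
    """95% CI for a mean, assuming SE = SD / sqrt(n) and a large sample."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se


def ci95_proportion(p, n):
    """95% CI for a proportion, assuming SE = sqrt(p * (1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se


mean_ci = ci95_mean(8, 2.5, 100)      # x-bar = 8, SD = 2.5; n = 100 is assumed
prop_ci = ci95_proportion(40 / 50, 50)  # 40 of 50 respondents satisfied

print(f"mean:       {mean_ci[0]:.2f} to {mean_ci[1]:.2f}")
print(f"proportion: {prop_ci[0]:.3f} to {prop_ci[1]:.3f}")
```

For the proportion, p = 0.8 and 1 − p = 0.2 both exceed 5/n = 0.1, so the condition noted on the previous slide is met.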

46 The width of a confidence interval depends on:
a) sample size;
b) confidence level (usually 95%, but any level can be chosen);
c) dispersion.

47 Hypothesis testing
H0: μ1 = μ2; p1 = p2 (RR = 1, OR = 1, difference = 0)
HA: μ1 ≠ μ2; p1 ≠ p2 (two-sided; a one-sided alternative is also possible)

48 Hypothesis testing
- Significance level α (by convention 0.05).
- A test is used to obtain the P value (t-test, χ², etc.).
- The P value is the probability of obtaining a difference (association) at least as large as the one observed if the null hypothesis is true.
- Equivalently: the P value is the probability of obtaining such a difference (association) due to chance alone when the null hypothesis is true.

49 Statistical conventions
- If P < 0.05, we say that the results cannot be explained by chance alone; therefore we reject H0 and accept HA.
- If P ≥ 0.05, we say that the difference found can be due to chance alone; therefore we do not reject H0.
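As an illustration of this decision rule, the sketch below applies a χ² test (without continuity correction) to the satisfaction-by-sex counts from the contingency table on slide 10. The choice of test and the `chi_square_2x2` helper are assumptions made for the example. One expected count here is below 5, so in practice Fisher's exact test would usually be preferred.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: satisfied / unsatisfied; columns: males / females (counts from slide 10).
chi2, p_value = chi_square_2x2(14, 26, 4, 6)

alpha = 0.05  # conventional significance level
decision = "reject H0" if p_value < alpha else "do not reject H0"

print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}: {decision}")
```

Here P is far above 0.05, so the difference in satisfaction between males and females can be due to chance alone and H0 is not rejected.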

50 Tests
The test depends on:
- Study design
- Variable type
- Distribution
- Number of groups, etc.
Tests (probability distributions):
- z test
- t test (one sample, two independent samples, paired)
- χ² (+ trend)
- F test
- Fisher exact test
- Mann-Whitney
- Wilcoxon
- and others.

51 Inferential statistics: summary
- The P value tells whether there is a statistically significant difference (association).
- The CI gives the interval where the true value can be.

52 Inferential statistics: summary
- Neither the P value nor the CI accounts for other explanations of the result (bias and confounding).
- Neither the P value nor the CI tells anything about the biological, clinical or public health meaning of the results.

