Published by Brianne Willis. Modified over 9 years ago.
1
Medical Statistics as a science
2
Medical Statistics
Allows general conclusions to be made from limited amounts of data.
Extrapolates from the data collected to make general conclusions about the larger population from which the sample was derived.
To do this we must assume that all data are randomly sampled from an infinitely large population, then analyse this sample and use the results to make inferences about the population.
3
Statistical Analysis in a Simple Experiment
Define the population of interest
Randomly select a sample of subjects to study (apply exclusion criteria, but define a precise patient population)
Half the subjects receive one treatment and the other half another treatment (usually placebo)
Measure baseline variables in each group (e.g. age, APACHE II score, to check that randomisation was successful)
Use statistical techniques to make inferences about the distribution of the variables in the general population and about the effect of the treatment
4
Outline
Power
Basic Sample Size Information
Examples (see text for more)
Changes to the basic formula
Multiple comparisons
Poor proposal sample size statements
Conclusion and Resources
7
Types of descriptive statistics:
Measures of central tendency
Measures of variability
Graphs
8
Data
Categorical data: values belong to categories
  Nominal data: there is no natural order to the categories e.g. blood groups
  Ordinal data: there is natural order e.g. Adverse Events (Mild/Moderate/Severe/Life Threatening)
Numerical data: the value is a number (either measured or counted)
9
Data
Categorical data: values belong to categories
  Nominal data: there is no natural order to the categories e.g. blood groups
  Ordinal data: there is natural order e.g. Adverse Events (Mild/Moderate/Severe/Life Threatening)
  Binary data: there are only two possible categories e.g. alive/dead
Numerical data: the value is a number (either measured or counted)
  Continuous data: measurement is on a continuum e.g. height, age, haemoglobin
  Discrete data: a “count” of events e.g. number of pregnancies
10
Descriptive Statistics: concerned with summarising or describing a sample, e.g. mean, median
Inferential Statistics: concerned with generalising from a sample, to make estimates and inferences about a wider population, e.g. T-Test, Chi Square test
11
Why do we need to study statistics?
1) Basic requirement of medical research
2) Update your medical knowledge
3) Data management and treatment
12
Statistical Terms
Mean: the average of the data; sensitive to outlying data
Median: the middle value of the data; not sensitive to outlying data
Mode: the most commonly occurring value
Range: the spread of the data (maximum minus minimum)
Interquartile (IQ) range: the spread of the middle 50% of the data; commonly used for skewed data
Standard deviation: a single number which measures how much the observations vary around the mean
Symmetrical data: e.g. data that follow a normal distribution (mean = median = mode); report mean, standard deviation and n
Skewed data: not normally distributed (mean ≠ median ≠ mode); report median and IQ range
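The terms above can be illustrated with a minimal sketch using Python's standard `statistics` module. The data are hypothetical lengths of stay (not from the slides), chosen so that one outlying value pulls the mean away from the median.

```python
import statistics

# Hypothetical lengths of hospital stay in days; the last value is an outlier.
stays = [2, 3, 3, 4, 5, 6, 30]

mean = statistics.mean(stays)        # pulled upwards by the outlier
median = statistics.median(stays)    # middle value, barely affected
mode = statistics.mode(stays)        # most commonly occurring value
data_range = max(stays) - min(stays)

# Quartiles; the interquartile range is the distance between Q1 and Q3.
q1, q2, q3 = statistics.quantiles(stays, n=4)
iqr = q3 - q1

sd = statistics.stdev(stays)         # sample SD (n - 1 denominator)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} IQR={iqr} SD={sd:.2f}")
```

Because these data are skewed, the median and IQ range describe them better than the mean and standard deviation, which is the reporting rule given on the slide.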
13
Standard Normal Distribution
14
For normally distributed data:
Mean +/- 1 SD encompasses about 68% of observations
Mean +/- 2 SD encompasses about 95% of observations
Mean +/- 3 SD encompasses about 99.7% of observations
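These percentages follow from the normal distribution itself and can be checked with a short sketch. It uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is only illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"mean +/- {k} SD covers {coverage(k):.1%} of observations")
```

The exact multiplier for 95% coverage is about 1.96 SD rather than 2, which is why 1.96 appears in 95% confidence interval formulas.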
15
1. Experimental Design
Convenience Sampling
Use results that are easy to get
16
1. Experimental Design
Stratified Sampling
Draw a sample from each stratum
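As a rough sketch of stratified sampling, the function below groups subjects by stratum and then draws a simple random sample from each group. The function name, the subject records and the choice of sex as the stratum are all hypothetical, used only to make the idea concrete.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` subjects at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical population: 20 subjects, alternating female and male.
subjects = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(1, 21)]
sample = stratified_sample(subjects, lambda s: s["sex"], per_stratum=3)
print(sample)
```

A convenience sample, by contrast, would simply take whichever subjects are easiest to reach, with no guarantee that every stratum is represented.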
17
Basic concepts
Homogeneity: all individuals have similar values or belong to the same category. Example: all individuals are Chinese, women, middle-aged (30 to 40 years old) and work in a textile mill, i.e. homogeneity in nationality, gender, age and occupation.
Variation: the differences in height, weight…
18
Steps in Statistical Testing
Null hypothesis Ho: there is no difference between the groups
Alternative hypothesis H1: there is a difference between the groups
Collect data
Calculate the test statistic, e.g. T test, Chi square
Interpret the P value and confidence intervals
  P value ≤ 0.05: reject Ho
  P value > 0.05: do not reject Ho
Draw conclusions
19
Population and sample
Population: the whole collection of individuals that one intends to study
Sample: a representative part of the population
Randomization: an important way to make the sample representative
20
Probability
Measures the possibility of occurrence of a random event.
A: random event
P(A): probability of the random event A
P(A) = 1 if an event always occurs.
P(A) = 0 if an event never occurs.
21
2. Descriptive Statistics & Distributions
Parameter: population quantity
Statistic: summary of the sample
Inference for parameters: use sample
Central Tendency
  Mean (average)
  Median (middle value)
Variability
  Variance: measure of variation
  Standard deviation (sd): square root of variance
  Standard error (se): sd of the estimate
  Median, quartiles, min., max, range, boxplot
Proportion
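The chain from variance to standard deviation to standard error can be sketched as follows. The sample values are hypothetical, and the standard error shown is that of the sample mean (sd divided by the square root of n), which is the most common case.

```python
import math
import statistics

# Hypothetical sample of six measurements.
sample = [4.1, 4.8, 5.0, 5.3, 5.9, 6.2]
n = len(sample)

mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance (n - 1 denominator)
sd = math.sqrt(variance)                # standard deviation
se = sd / math.sqrt(n)                  # standard error of the mean

print(f"mean={mean:.2f} variance={variance:.3f} sd={sd:.3f} se={se:.3f}")
```

The sd describes how much individual observations vary, while the se describes how precisely the sample mean estimates the population mean, so the se shrinks as n grows even when the sd stays the same.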
22
2. Descriptive Statistics & Distributions
Normal distribution
23
2. Descriptive Statistics & Distributions
Standard normal distribution: mean 0, variance 1
24
2. Descriptive Statistics & Distributions
Z-test for means
T-test for means if the population sd is unknown
25
Meaning of P
P value: the probability of observing a result as extreme as, or more extreme than, the one actually observed if chance alone were operating (i.e. if the null hypothesis were true)
Lets us decide whether or not to reject the null hypothesis
P > 0.05           Not significant
P = 0.01 to 0.05   Significant
P = 0.001 to 0.01  Very significant
P < 0.001          Extremely significant
26
3. Inference for Means
Click ‘File’ to import data and create the SAS data set.
Click ‘File’ to open the SAS data set.
Click ‘Solutions’ to create a project to run the statistical test.
Click ‘Statistics’ to select the statistical procedure.
27
3. Inference for Means
Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
Nonparametric alternative to two-sample t-test
The populations don’t need to be normal
H0: The two samples come from populations with equal medians
H1: The two samples come from populations with different medians
28
3. Inference for Means
Mann-Whitney U-Test Procedure
Temporarily combine the two samples into one big sample, then replace each sample value with its rank
Find the sum of the ranks for either one of the two samples
Calculate the value of the z test statistic
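The three steps of the procedure can be sketched in Python. The slide does not show its formula for z, so this sketch assumes the usual textbook normal approximation, with expected rank sum n1(n1 + n2 + 1)/2 and standard deviation sqrt(n1 n2 (n1 + n2 + 1)/12), and no correction for ties. The function name and the BMI-like data are hypothetical.

```python
import math
from statistics import NormalDist

def rank_sum_z(sample1, sample2):
    """Rank sum of sample1, its z statistic and a two-sided p-value."""
    # Step 1: combine the samples and rank them (tied values share the mean rank).
    combined = sorted(sample1 + sample2)
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Step 2: sum the ranks belonging to the first sample.
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank_of[x] for x in sample1)

    # Step 3: z statistic under the (assumed) normal approximation.
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, z, p

# Hypothetical BMI values for two small groups.
group_a = [17.7, 19.6, 23.8, 24.6, 26.2, 28.5]
group_b = [19.2, 21.4, 22.0, 25.1, 27.5, 33.5]
r1, z, p = rank_sum_z(group_a, group_b)
print(f"R1={r1} z={z:.2f} p={p:.2f}")
```

The normal approximation is reasonable only when both samples are fairly large; for samples as small as these, exact tables or an exact test would normally be preferred.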
29
T Test
The T test checks whether two samples are likely to have come from the same or different populations
Used on continuous variables
Example: Age of patients in the APC study (APC/placebo)

            PLACEBO      APC
Mean age    60.6 years   60.5 years
SD          +/- 16.5     +/- 17.2
n           840          850
95% CI      59.5-61.7    59.3-61.7

What is the P value? 0.01, 0.05, 0.10, 0.90 or 0.99?
P = 0.903: not significant. The data are consistent with the patients coming from the same population (the groups were designed to be matched by randomisation, so no surprise!)
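The P value in this example can be approximately reproduced from the summary figures alone. The sketch below assumes an unpooled standard error and, because both groups are large, uses the normal distribution in place of the t distribution. The slide does not say which variant of the T test was used, so this is an illustration rather than the original calculation.

```python
import math
from statistics import NormalDist

def two_sample_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Test statistic and two-sided p-value from group means, SDs and sizes."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE of the difference in means
    t = (mean1 - mean2) / se
    # Large-sample assumption: t is treated as standard normal.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Age in the placebo and APC groups, taken from the table on the slide.
t, p = two_sample_from_summary(60.6, 16.5, 840, 60.5, 17.2, 850)
print(f"t={t:.3f} p={p:.3f}")
```

A difference of 0.1 years is tiny compared with its standard error of about 0.8 years, which is why the P value is close to 1.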
30
T Test: SAFE “Serum Albumin”

          PLACEBO     ALBUMIN
n         3500        3500
mean      28          30
SD        10          10
95% CI    27.7-28.3   29.7-30.3

Q: Are these albumin levels different?
Ho: levels are the same (any difference is there by chance)
H1: levels are too different to have occurred purely by chance
Statistical test: T test
P < 0.0001 (extremely significant)
Reject the null hypothesis (Ho) and accept the alternative hypothesis (H1), i.e. if both samples came from the same overall group, a difference this large would arise less than 1 time in 10 000, so we can say they are very likely to be different
31
Effect of Sample Size Reduction

          PLACEBO     ALBUMIN
n         350         350
mean      28          30
SD        10          10
95% CI    27.0-29.0   29.0-31.0

Smaller sample size (one tenth of the original)
Causes wider CI (less confident where the mean is)
P = 0.008 (i.e. approx 0.01; P is still significant but less so)
This influence of sample size on the ability to find any particular difference statistically significant is a major consideration in study design
32
Reducing Sample Size (again)

          PLACEBO     ALBUMIN
n         35          35
mean      28          30
SD        10          10
95% CI    24.6-31.4   26.6-33.4

Using an even smaller sample size (now 1/100 of the original)
Much wider confidence intervals
P = 0.41 (not significant anymore)
SMALLER STUDY has LOWER POWER to find any particular difference to be statistically significant (mean and SD unchanged)
POWER: the ability of a study to detect an actual effect or difference
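The three albumin examples differ only in n, so the effect of sample size can be shown by running the same calculation three times. This sketch assumes an equal-variance two-sample t test and obtains the two-sided P value by numerically integrating the t density with Simpson's rule, so that only the standard library is needed. The helper names are illustrative.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, by Simpson integration of the density."""
    b = abs(t)
    if b == 0:
        return 1.0
    # Log of the normalising constant; lgamma avoids overflow for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                 # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

def compare_means(mean1, mean2, sd, n):
    """t statistic and p-value for two groups of size n with a common SD."""
    se = sd * math.sqrt(2 / n)
    t = (mean2 - mean1) / se
    return t, t_two_sided_p(t, df=2 * n - 2)

# Same means (28 vs 30) and SD (10) as on the slides; only n changes.
results = {n: compare_means(28, 30, 10, n) for n in (3500, 350, 35)}
for n, (t, p) in results.items():
    print(f"n={n:5d} per group  t={t:.2f}  p={p:.4f}")
```

The difference between the means is 2 in every case; only the standard error changes, growing as n shrinks, which is what drives the P value from below 0.0001 to about 0.4.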
33
3. Inference for Means
Mann-Whitney U-Test, Example
Numbers in parentheses are the ranks, beginning with a rank of 1 assigned to the lowest value of 17.7.
R1 and R2: sums of ranks
34
3. Inference for Means
Hypothesis: The group means are different
Ho: Men and women have the same median BMI
H1: Men and women have different median BMIs
The p-value exceeds 0.05, thus we do not reject Ho at α = 0.05. There is no significant difference in BMI between men and women.
35
3. Inference for Means
SAS Programming for Mann-Whitney U-Test Procedure
Data steps:
  The same as slide 21.
Procedure steps:
  Click ‘Solutions’, then ‘Analysis’, then ‘Analyst’
  Click ‘File’, then ‘Open By SAS Name’
  Select the SAS data set and click ‘OK’
  Click ‘Statistics’, then ‘ANOVA’
  Click ‘Nonparametric One-Way ANOVA’
  Select the ‘Dependent’ and ‘Independent’ variables respectively and choose the test of interest
  Click ‘OK’
36
3. Inference for Means
Click ‘File’ to open the SAS data set.
Click ‘Statistics’ to select the statistical procedure.
Select the dependent and independent variables.
37
3. Inference for Means
Notation for paired t-test
d = individual difference between the two values of a single matched pair
µ_d = mean value of the differences d for the population of paired data
d̄ (d-bar) = mean value of the differences d for the paired sample data
s_d = standard deviation of the differences d for the paired sample data
n = number of pairs
38
3. Inference for Means
Example: Systolic Blood Pressure
OC: Oral contraceptive

ID   Without OC's   With OC's   Difference
1    115            128         13
2    112            115         3
3    107            106         -1
4    119            128         9
5    115            122         7
6    138            145         7
7    126            132         6
8    105            109         4
9    104            102         -2
10   115            117         2
39
3. Inference for Means
Hypothesis: The group means are different
Ho: µ_d = 0 vs. H1: µ_d ≠ 0
Significance level: α = 0.05
Degrees of freedom (df): n - 1 = 10 - 1 = 9
Test statistic: t = d̄ / (s_d / √n) = 4.8 / (4.57 / √10) ≈ 3.32, where d̄ and s_d are the mean and standard deviation of the paired differences
P-value: 0.009, thus reject Ho at α = 0.05
The data support the claim that oral contraceptives affect the systolic bp.
40
3. Inference for Means
Confidence interval for matched pairs
100(1 - α)% CI: d̄ ± t(α/2, n-1) × s_d / √n, where d̄ and s_d are the mean and standard deviation of the paired differences and t(α/2, n-1) is the critical value of the t distribution with n - 1 degrees of freedom
95% CI for the mean difference of the systolic bp: (1.53, 8.07)
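The paired analysis of the blood pressure data can be reproduced with a short standard-library sketch. It computes the ten differences, the t statistic, a two-sided P value and the 95% confidence interval. The t distribution is handled by numerical integration and bisection, an implementation choice made here to avoid external libraries; the helper names are illustrative.

```python
import math
import statistics

# Systolic blood pressure readings from the example table (IDs 1 to 10).
without_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
with_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, by Simpson integration of the density."""
    b = abs(t)
    if b == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

def t_critical(alpha, df):
    """Critical value t with two-sided tail area alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

d = [after - before for before, after in zip(without_oc, with_oc)]
n = len(d)
d_bar = statistics.mean(d)      # mean difference
s_d = statistics.stdev(d)       # SD of the differences
se = s_d / math.sqrt(n)

t_stat = d_bar / se
p_value = t_two_sided_p(t_stat, df=n - 1)
margin = t_critical(0.05, df=n - 1) * se
ci = (d_bar - margin, d_bar + margin)

print(f"d_bar={d_bar:.1f} s_d={s_d:.2f} t={t_stat:.2f} p={p_value:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval excludes 0, it leads to the same conclusion as the hypothesis test: a mean difference of zero is not compatible with these data at the 5% level.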
41
3. Inference for Means
Click ‘File’ to open the SAS data set.
Click ‘Statistics’ to select the statistical procedure.
Put the two group variables into ‘Group 1’ and ‘Group 2’.
42
Chi Square Test
Proportions or frequencies
Binary data e.g. alive/dead
PROWESS Study: Primary endpoint: 28 day all cause mortality

           ALIVE          DEAD          TOTAL          % DEAD
PLACEBO    581 (69.2%)    259 (30.8%)   840 (100%)     30.8
APC        640 (75.3%)    210 (24.7%)   850 (100%)     24.7
TOTAL      1221 (72.2%)   469 (27.8%)   1690 (100%)

Perform Chi Square test: P = 0.006 (very significant)
If there were no true difference between the groups, a difference this large would arise by chance about 6 times in 1000
Reduction in death rate = 30.8% - 24.7% = 6.1%, i.e. an absolute reduction in mortality of 6.1 percentage points in the APC group
43
Reducing Sample Size
Same results but using a much smaller sample size (one tenth)

           ALIVE         DEAD         TOTAL        % DEAD
PLACEBO    58 (69.2%)    26 (30.8%)   84 (100%)    30.8
APC        64 (75.3%)    21 (24.7%)   85 (100%)    24.7
TOTAL      122 (72.2%)   47 (27.8%)   169 (100%)

Reduction in death rate = 6.1% (still the same)
Perform Chi Square test: P = 0.39
If there were no true difference, a difference in mortality this large could arise by chance about 39 times in 100, therefore the result is not significant
Again, the power of a study to find a difference depends a lot on sample size, for binary data as well as continuous data
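Both mortality tables can be run through the same chi-square calculation. The sketch below assumes Pearson's chi-square test with Yates' continuity correction for a 2 x 2 table; the slides do not say which variant was used. With the correction, the full-size table gives a P value close to the 0.006 quoted. For the reduced table the exact P value depends on the variant and on the rounding of the counts, so it may not match the slide to the last digit, but it is clearly not significant.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic and p-value for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n)

    chi2 = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(0.0, diff - 0.5)  # Yates' continuity correction (assumed)
        chi2 += diff ** 2 / e

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: placebo, APC. Columns: alive, dead.
chi2_full, p_full = chi_square_2x2(581, 259, 640, 210)
chi2_small, p_small = chi_square_2x2(58, 26, 64, 21)

print(f"full study:  chi2={chi2_full:.2f}  p={p_full:.4f}")
print(f"one tenth:   chi2={chi2_small:.2f}  p={p_small:.2f}")
```

The proportions are almost identical in the two tables, yet only the larger study reaches significance, which is the point the slide makes about power.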
44
Summary
Size matters = BIGGER IS BETTER
Spread matters = SMALLER IS BETTER
Bigger difference = EASIER TO FIND
Smaller difference = MORE DIFFICULT TO FIND
To find a small difference you need a big study