Medical Statistics as a science


Medical Statistics:
- Allows general conclusions to be made from limited amounts of data
- Extrapolate from the data collected to make general conclusions about the larger population from which the data sample was derived
- To do this we must assume that all data are randomly sampled from an infinitely large population, then analyse this sample and use the results to make inferences about the population

Statistical Analysis in a Simple Experiment
- Define the population of interest
- Randomly select a sample of subjects to study (apply exclusion criteria, but define a precise patient population)
- Half the subjects receive one treatment and the other half another treatment (usually placebo)
- Measure baseline variables in each group (e.g. age, APACHE II, to ensure randomisation was successful)
- Use statistical techniques to make inferences about the distribution of the variables in the general population and about the effect of the treatment

Outline
- Power
- Basic Sample Size Information
- Examples (see text for more)
- Changes to the basic formula
- Multiple comparisons
- Poor proposal sample size statements
- Conclusion and Resources

Types of descriptive statistics:
- Measures of central tendency
- Measures of variability
- Graphs

Data
Categorical data: values belong to categories
- Nominal data: there is no natural order to the categories, e.g. blood groups
- Ordinal data: there is a natural order, e.g. Adverse Events (Mild/Moderate/Severe/Life Threatening)
- Binary data: there are only two possible categories, e.g. alive/dead
Numerical data: the value is a number (either measured or counted)
- Continuous data: measurement is on a continuum, e.g. height, age, haemoglobin
- Discrete data: a "count" of events, e.g. number of pregnancies

Descriptive Statistics: concerned with summarising or describing a sample, e.g. mean, median
Inferential Statistics: concerned with generalising from a sample, to make estimates and inferences about a wider population, e.g. t test, chi-square test

Why do we need to study statistics?
1) Basic requirement of medical research
2) Update your medical knowledge
3) Data management and treatment

Statistical Terms
- Mean: the average of the data; sensitive to outlying data
- Median: the middle of the data; not sensitive to outlying data
- Mode: most commonly occurring value
- Range: the spread of the data
- Interquartile (IQ) range: the spread of the middle half of the data; commonly used for skewed data
- Standard deviation: a single number which measures how much the observations vary around the mean
- Symmetrical data: data that follow a normal distribution (mean = median = mode); report mean, standard deviation and n
- Skewed data: not normally distributed (mean ≠ median ≠ mode); report median and IQ range
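The terms above can be computed directly with Python's standard `statistics` module. This is only a minimal sketch: the function name `summarise` and the length-of-stay values are hypothetical, chosen to show how a single outlying value moves the mean but not the median.

```python
import statistics

def summarise(data):
    """Return the summary measures listed on the slide for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "iq_range": q3 - q1,
        "sd": statistics.stdev(data),  # sample SD (n - 1 in the denominator)
    }

# Hypothetical right-skewed sample: one outlying value pulls the mean upward.
lengths_of_stay = [2, 3, 3, 4, 5, 6, 30]
summary = summarise(lengths_of_stay)
for name, value in summary.items():
    print(name, value)
```

For skewed data like this hypothetical sample, the median and IQ range describe the bulk of the observations better than the mean and standard deviation, which is why the slide recommends reporting them.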

Standard Normal Distribution

Mean +/- 1 SD encompasses about 68% of observations
Mean +/- 2 SD encompasses about 95% of observations
Mean +/- 3 SD encompasses about 99.7% of observations
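These percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage_within` is an assumption made for illustration.

```python
from statistics import NormalDist

def coverage_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"mean +/- {k} SD covers {coverage_within(k):.4f} of observations")
```

The rule of thumb rounds the exact values slightly; the interval that covers exactly 95% is mean +/- 1.96 SD, which is why 1.96 appears in many confidence-interval formulas.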

1. Experimental Design
Convenience Sampling
- Use results that are easy to get

1. Experimental Design
Stratified Sampling
- Draw a sample from each stratum
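Stratified sampling as described above can be sketched in a few lines. Everything here is illustrative: the function `stratified_sample`, the patient records and the choice of sex as the stratum are hypothetical, and a fixed seed is used only so the sketch is reproducible.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=0):
    """Draw a simple random sample of up to `per_stratum` units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):  # fixed order keeps the result reproducible
        members = strata[name]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical population: (patient id, sex)
patients = [(i, "F" if i % 3 else "M") for i in range(1, 31)]
chosen = stratified_sample(patients, stratum_of=lambda p: p[1], per_stratum=4)
print(chosen)
```

Unlike convenience sampling, every stratum is guaranteed to be represented, and the selection within each stratum is still random.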

Basic concepts
Homogeneity: all individuals have similar values or belong to the same category. Example: all individuals are Chinese, women, middle-aged (30 to 40 years old) and work in a textile mill, so the group is homogeneous in nationality, gender, age and occupation.
Variation: the differences between individuals in height, weight…

Steps in Statistical Testing
- Null hypothesis Ho: there is no difference between the groups
- Alternative hypothesis H1: there is a difference between the groups
- Collect data
- Perform a statistical test, e.g. t test, chi-square test
- Interpret the P value and confidence intervals
  P value ≤ 0.05: reject Ho
  P value > 0.05: do not reject Ho
- Draw conclusions

Population and sample
Population: the whole collection of individuals that one intends to study
Sample: a representative part of the population
Randomization: an important way to make the sample representative

Probability
- Measures the possibility of occurrence of a random event
- A: random event
- P(A): probability of the random event A
- P(A) = 1 if an event always occurs
- P(A) = 0 if an event never occurs

2. Descriptive Statistics & Distributions
- Parameter: population quantity
- Statistic: summary of the sample
- Inference for parameters: use sample
- Central Tendency
  Mean (average)
  Median (middle value)
- Variability
  Variance: measure of variation
  Standard deviation (sd): square root of variance
  Standard error (se): sd of the estimate
  Median, quartiles, min., max, range, boxplot
- Proportion
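The distinction between the standard deviation (spread of the observations) and the standard error (spread of an estimate) is easiest to see in code. This is a sketch using the usual textbook formulas, sd / sqrt(n) for a mean and sqrt(p(1 - p) / n) for a proportion; the function names and the data are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Standard error of the sample mean: sample sd divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def standard_error_of_proportion(events, n):
    """Standard error of a sample proportion p = events / n."""
    p = events / n
    return math.sqrt(p * (1 - p) / n)

# Hypothetical haemoglobin values (g/dL)
haemoglobin = [12.1, 13.4, 11.8, 14.0, 12.9, 13.3, 12.5, 13.0]
print("sd of observations:", statistics.stdev(haemoglobin))
print("se of the mean:    ", standard_error_of_mean(haemoglobin))
print("se of a proportion:", standard_error_of_proportion(30, 120))
```

The standard deviation stays roughly the same as more subjects are added, while the standard error shrinks with the square root of the sample size.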

2. Descriptive Statistics & Distributions
Normal distribution

2. Descriptive Statistics & Distributions
Standard normal distribution: mean 0, variance 1

2. Descriptive Statistics & Distributions
- Z-test for means
- T-test for means if sd is unknown
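For reference, the two one-sample test statistics named above take the following standard textbook form. The symbols are assumptions introduced here, not taken from the slides: $\bar{x}$ is the sample mean, $\mu_0$ the hypothesised population mean, $\sigma$ the known population sd, $s$ the sample sd and $n$ the sample size.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}},
\qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad \text{with } n - 1 \text{ degrees of freedom}
```

The only difference is that the t statistic replaces the unknown $\sigma$ with its sample estimate $s$, and is therefore referred to the t distribution rather than the standard normal.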

Meaning of P
- P value: the probability of observing a result as extreme or more extreme than the one actually observed from chance alone
- Lets us decide whether to reject or accept the null hypothesis
  P > 0.05: not significant
  P = 0.01 to 0.05: significant
  P = 0.001 to 0.01: very significant
  P < 0.001: extremely significant

3. Inference for Means
- Click ‘File’ to import data and create the SAS data set.
- Click ‘Solution’ to create a project to run the statistical test.
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.

3. Inference for Means
Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
- Nonparametric alternative to the two-sample t-test
- The populations don’t need to be normal
- H0: the two samples come from populations with equal medians
- H1: the two samples come from populations with different medians

3. Inference for Means
Mann-Whitney U-Test Procedure
- Temporarily combine the two samples into one big sample, then replace each sample value with its rank
- Find the sum of the ranks for either one of the two samples
- Calculate the value of the z test statistic
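The three steps of the procedure can be sketched as follows. This is an illustrative implementation, not the SAS procedure used in the slides: it uses the usual large-sample normal approximation for the rank sum (mean n1(n1 + n2 + 1)/2, variance n1 n2 (n1 + n2 + 1)/12), gives tied values their average rank, and applies no tie correction to the variance. The function names and BMI values are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 (lowest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_z(sample1, sample2):
    """Return (rank sum of sample1, z statistic, two-sided p) by normal approximation."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # step 1: combine and rank
    r1 = sum(ranks[:n1])                                  # step 2: rank sum of sample 1
    mean_r1 = n1 * (n1 + n2 + 1) / 2
    sd_r1 = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)       # no tie correction
    z = (r1 - mean_r1) / sd_r1                            # step 3: z test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, z, p

# Hypothetical BMI values for two small groups
group_a = [22.1, 24.3, 26.0, 23.5, 27.8, 21.9]
group_b = [25.4, 28.1, 24.3, 29.0, 26.7, 30.2]
r1, z, p = mann_whitney_z(group_a, group_b)
print(f"R1 = {r1}, z = {z:.3f}, two-sided p = {p:.3f}")
```

The normal approximation is only reasonable when both samples are not too small; for very small samples, exact tables or a statistics package would normally be used instead.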

T Test
- T test checks whether two samples are likely to have come from the same or different populations
- Used on continuous variables
- Example: Age of patients in the APC study (APC/placebo)

            PLACEBO        APC
mean age    60.6 years     60.5 years
SD          +/- 16.5       +/- …
n           840            850
CI          …              …

- What is the P value?
- P = …: not significant, i.e. patients from the same population (groups designed to be matched by randomisation so no surprise!!)

T Test: SAFE “Serum Albumin”

            PLACEBO    ALBUMIN
n           …          …
mean        28         30
SD          …          …
CI          …          …

Q: Are these albumin levels different?
Ho: levels are the same (any difference is there by chance)
H1: levels are too different to have occurred purely by chance
Statistical test: T test, P < … (extremely significant)
Reject the null hypothesis (Ho) and accept the alternative hypothesis (H1), i.e. there is a 1 in … chance that these samples are both from the same overall group, therefore we can say they are very likely to be different

Effect of Sample Size Reduction
- Smaller sample size (one tenth the size)
- Causes wider CI (less confident where the mean is)
- P = … (i.e. approx 0.01: P is significant but less so)
- This influence of sample size on the ability to find any particular difference statistically significant is a major consideration in study design

            PLACEBO    ALBUMIN
n           …          …
mean        28         30
SD          …          …
CI          …          …

Reducing Sample Size (again)
- Using an even smaller sample size (now 1/100)
- Much wider confidence intervals
- P = 0.41 (not significant anymore)
- SMALLER STUDY has LOWER POWER to find any particular difference to be statistically significant (mean and SD unchanged)
- POWER: the ability of a study to detect an actual effect or difference

            PLACEBO    ALBUMIN
n           35         35
mean        28         30
SD          …          …
CI          …          …
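The effect of shrinking the sample while keeping the means and SD fixed can be illustrated numerically. The sketch below is not the t test used in the slides: it swaps in a simpler large-sample normal (z) approximation, and the common SD of 10 is a hypothetical value because the SD is not recoverable from the slide. Only the means (28 and 30) and the group size of 35 come from the slides; 350 and 3500 follow from the stated one-tenth and 1/100 ratios.

```python
import math
from statistics import NormalDist

def compare_means(mean1, mean2, sd, n_per_group, level=0.95):
    """Difference in means with its CI and two-sided p value (normal approximation)."""
    se = sd * math.sqrt(2 / n_per_group)  # SE of a difference, equal SD and group sizes
    diff = mean2 - mean1
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z_crit * se, diff + z_crit * se, p

HYPOTHETICAL_SD = 10.0  # assumption: the SD on the slide is not recoverable
results = {}
for n in (3500, 350, 35):
    low, high, p = compare_means(28.0, 30.0, HYPOTHETICAL_SD, n)
    results[n] = (low, high, p)
    print(f"n per group = {n:4d}: CI for difference ({low:.2f}, {high:.2f}), p = {p:.4f}")
```

With the difference and spread held fixed, the confidence interval widens and the P value rises as n falls, which is the loss of power the slides describe.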

3. Inference for Means
Mann-Whitney U-Test, Example
- Numbers in parentheses are their ranks, beginning with a rank of 1 assigned to the lowest value of …
- R1 and R2: sum of ranks

3. Inference for Means
- Hypothesis: the group medians are different
- H0: men and women have the same median BMI
- H1: men and women have different median BMIs
- p-value = …, thus we do not reject H0 at α = 0.05.
- There is no significant difference in BMI between men and women.

3. Inference for Means
SAS Programming for Mann-Whitney U-Test Procedure
Data steps:
- The same as slide 21.
Procedure steps:
- Click ‘Solutions’, click ‘Analysis’, click ‘Analyst’
- Click ‘File’, click ‘Open By SAS Name’
- Select the SAS data set and click ‘OK’
- Click ‘Statistics’, click ‘ANOVA’
- Click ‘Nonparametric One-Way ANOVA’
- Select the ‘Dependent’ and ‘Independent’ variables respectively, choose the test of interest and click ‘OK’

3. Inference for Means
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.
- Select the dependent and independent variables.

3. Inference for Means
Notation for paired t-test
- $d$ = individual difference between the two values of a single matched pair
- $\mu_d$ = mean value of the differences $d$ for the population of paired data
- $\bar{d}$ = mean value of the differences $d$ for the paired sample data
- $s_d$ = standard deviation of the differences $d$ for the paired sample data
- $n$ = number of pairs

3. Inference for Means
Example: Systolic Blood Pressure
OC: Oral contraceptive

ID    Without OC’s    With OC’s    Difference
…     …               …            …

3. Inference for Means
- Hypothesis: the group means are different
- $H_0: \mu_d = 0$ vs. $H_1: \mu_d \neq 0$
- Significance level: $\alpha = 0.05$
- Degrees of freedom (df): $n - 1$
- Test statistic: $t = \bar{d} / (s_d / \sqrt{n})$
- P-value: 0.009, thus reject $H_0$ at $\alpha = 0.05$
- The data support the claim that oral contraceptives affect the systolic bp.

3. Inference for Means
Confidence interval for matched pairs
- $100(1-\alpha)\%$ CI: $\bar{d} \pm t_{\alpha/2,\,n-1} \, s_d / \sqrt{n}$
- 95% CI for the mean difference of the systolic bp: (1.53, 8.07)
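The paired t statistic and the matched-pairs confidence interval can be sketched as below. The blood pressure readings are hypothetical (the table from the slide is not available), so the output will not reproduce the interval (1.53, 8.07). Because the Python standard library has no t distribution, the critical value is passed in by the caller; 3.182 is the two-sided 5% critical value for 3 degrees of freedom.

```python
import math
import statistics

def paired_t(first, second, t_crit):
    """Paired t statistic and CI for the mean of the differences second - first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)   # mean of the paired differences
    s_d = statistics.stdev(diffs)    # sd of the paired differences
    se = s_d / math.sqrt(n)
    return {
        "d_bar": d_bar,
        "s_d": s_d,
        "df": n - 1,
        "t": d_bar / se,             # tests H0: mean difference = 0
        "ci": (d_bar - t_crit * se, d_bar + t_crit * se),
    }

# Hypothetical systolic blood pressures (mm Hg) for four women
without_oc = [110, 120, 130, 140]
with_oc = [112, 124, 136, 148]
result = paired_t(without_oc, with_oc, t_crit=3.182)  # t critical value, df = 3
print(result)
```

Working on the within-pair differences removes the variation between subjects, which is what makes the paired design more sensitive than comparing two independent groups.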

3. Inference for Means
- Click ‘File’ to open the SAS data set.
- Click ‘Statistics’ to select the statistical procedure.
- Put the two group variables into ‘Group 1’ and ‘Group 2’.

Chi Square Test
- Proportions or frequencies
- Binary data, e.g. alive/dead
- PROWESS Study: primary endpoint: 28 day all cause mortality

            ALIVE           DEAD           TOTAL          % DEAD
PLACEBO     581 (69.2%)     259 (30.8%)    840 (100%)     30.8
APC         640 (75.3%)     210 (24.7%)    850 (100%)     24.7
TOTAL       1221 (72.2%)    469 (27.8%)    1690 (100%)

- Perform chi-square test: P is very significant
- If there were no true difference, a result like this would happen by chance about 6 times in 1000, so chance variation alone is an unlikely explanation
- Reduction in death rate = 30.8% - 24.7% = 6.1%, i.e. an absolute reduction of 6.1 percentage points in the APC group

Reducing Sample Size
- Same results but using a much smaller sample size (one tenth)

            ALIVE          DEAD          TOTAL         % DEAD
PLACEBO     58 (69.2%)     26 (30.8%)    84 (100%)     30.8
APC         64 (75.3%)     21 (24.7%)    85 (100%)     24.7
TOTAL       122 (72.2%)    47 (27.8%)    169 (100%)

- Reduction in death rate = 6.1% (still the same)
- Perform chi-square test: P = …; in … of 100 times this difference in mortality could have happened by chance, therefore the results are not significant
- Again, the power of a study to find a difference depends a lot on sample size, for binary data as well as continuous data
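The chi-square calculation for the two 2x2 tables above (the full PROWESS counts and the one-tenth version) can be sketched with the standard library alone. The function name `chi_square_2x2` is hypothetical. The slides do not say whether a continuity correction was applied, so the sketch offers Yates' correction as an option; the p value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic and p value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # Yates' continuity correction
            chi2 += diff ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))       # upper tail of chi-square with 1 df
    return chi2, p

# Rows: placebo, APC. Columns: alive, dead. Counts taken from the two tables.
full_chi2, full_p = chi_square_2x2(581, 259, 640, 210)
small_chi2, small_p = chi_square_2x2(58, 26, 64, 21)
print(f"full study:      chi2 = {full_chi2:.2f}, p = {full_p:.4f}")
print(f"one-tenth study: chi2 = {small_chi2:.2f}, p = {small_p:.4f}")
```

The death rates are essentially the same in both tables, yet only the full-size study gives a P value below 0.05, which is the point about power made on the slide.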

Summary
- Size matters = BIGGER IS BETTER
- Spread matters = SMALLER IS BETTER
- Bigger difference = EASIER TO FIND
- Smaller difference = MORE DIFFICULT TO FIND
- To find a small difference you need a big study