Group Comparisons Part 1 Robert Boudreau, PhD Co-Director of Methodology Core PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal.

Presentation transcript:

Group Comparisons Part 1
Robert Boudreau, PhD
Co-Director of Methodology Core, PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics, Center for Aging and Population Health
Dept. of Epidemiology, GSPH

Flow chart for group comparisons
Measurements to be compared:
- continuous: Distribution approx. normal or N ≥ 20?
  - No: non-parametrics
  - Yes: t-tests
- discrete (binary, nominal, ordinal with few values)

Outline For Today
Continuous distributions
- Normal distribution
- Mean
- Standard deviation (computation, interpretation)
- Confidence intervals, t-distribution
- Comparing 2 groups
- T-tests
Next lecture
- Wilcoxon Rank-Sum (non-parametric)

Confidence Interval For a Continuous Variable
Aflatoxin levels of raw peanut kernels (n=15): 30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
Aflatoxin is a natural toxin produced by certain strains of the molds Aspergillus flavus and A. parasiticus, which grow on peanuts stored in warm, humid silos. Peanuts aren't the only affected crop: aflatoxins have been found in pecans, pistachios and walnuts, as well as milk, grains, soybeans and spices. Aflatoxin is a potent carcinogen, known to cause liver cancer in laboratory animals, and it may contribute to liver cancer in Africa, where peanuts are a dietary staple.

Aflatoxin levels of raw peanut kernels: stem-and-leaf plot
Stem (tens) | Leaf (units)
1 | 6
2 | 2 3 6 6 7 8
3 | 0 1 5 6 7
4 | 8
5 | 0 2
Range = max - min = 52 - 16 = 36
Mode = 26 (highest frequency)

Aflatoxin levels of raw peanut kernels
Raw data: 30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
Sorted: 16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
Q1 (1st quartile, 25%) = 26; median = 30; Q3 (3rd quartile, 75%) = 37
IQR = Q3 - Q1 = 37 - 26 = 11
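The summary statistics on the last two slides can be reproduced with a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative. The quartile convention is an assumption: the default "exclusive" method of `statistics.quantiles` happens to reproduce the slide's Q1 = 26 and Q3 = 37, but other conventions can give slightly different quartiles.

```python
import statistics

# Aflatoxin levels of raw peanut kernels (n = 15), as listed on the slide
aflatoxin = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

data_range = max(aflatoxin) - min(aflatoxin)   # max - min
mode = statistics.mode(aflatoxin)              # most frequent value

# Quartiles with the default method="exclusive" (an assumed convention)
q1, median, q3 = statistics.quantiles(aflatoxin, n=4)
iqr = q3 - q1

print(f"range = {data_range}, mode = {mode}")
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
```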

[Plot of the aflatoxin data] No outliers; slightly skewed.

Box-and-Whisker Plot (full Bell-labs version with outliers)

Standard Deviation (SD)
$SD = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}}$
N - 1 = degrees of freedom (df)
N data points (total pieces of information). Parameters estimated: mean: 1 df, SD: N - 1 df
Large SD => data points widely spread out from the mean
Small SD => data points clustered closely around the mean
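As an illustration of the computation, the SD of the aflatoxin data can be worked out step by step and checked against the library routine. This is a sketch with illustrative variable names, not code from the lecture; it uses the N - 1 divisor described above.

```python
import math
import statistics

# Aflatoxin levels of raw peanut kernels (n = 15), from the earlier slide
aflatoxin = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

n = len(aflatoxin)
mean = sum(aflatoxin) / n

# Sum of squared deviations from the mean, divided by the N - 1 degrees of freedom
sum_sq_dev = sum((x - mean) ** 2 for x in aflatoxin)
sd = math.sqrt(sum_sq_dev / (n - 1))

# statistics.stdev uses the same N - 1 divisor
assert math.isclose(sd, statistics.stdev(aflatoxin))

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
```

The result agrees with the SD of 10.63 used later in the confidence-interval slide.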

Empirical rule for interpreting SD in normal distributions

Empirical rule for interpreting SD
Hsieh et al. Effects of high-intensity exercise training in a pulmonary rehabilitation programme for patients with chronic obstructive pulmonary disease. Respirology (2007) 12:381–388
Age of cohort: 73.9 ± 6.7 (mean ± SD)
“Patients who completed high-intensity training had significant improvements in FVC (2.47 ± 0.70 L, P = 0.024) at rest”.
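The figure for the empirical rule did not survive in this transcript. Assuming it refers to the usual 68-95-99.7 rule for normal distributions, the coverage figures can be checked numerically and applied to the cohort's age of 73.9 ± 6.7. Treating age as approximately normal is an assumption made here for illustration, not a claim from the paper.

```python
from statistics import NormalDist

std_normal = NormalDist()        # standard normal: mean 0, SD 1
mean_age, sd_age = 73.9, 6.7     # mean ± SD of age, from the slide

coverage = {}
for k in (1, 2, 3):
    # Fraction of a normal distribution lying within k SDs of the mean
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    low, high = mean_age - k * sd_age, mean_age + k * sd_age
    print(f"within {k} SD: {coverage[k]:.1%} of values, ages {low:.1f} to {high:.1f}")
```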

Rules for interpreting SDs that apply to any distribution
Chebyshev’s Inequality
- At least 50% of the values are within √2 SDs of the mean
- At least 75% of the values are within 2 SDs
- At least 89% of the values are within 3 SDs
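The three figures follow from a single bound. In the usual notation (the symbols $\mu$, $\sigma$ and $k$ are not on the slide), Chebyshev's inequality for any distribution with finite variance reads:

```latex
P\bigl(|X - \mu| < k\sigma\bigr) \;\ge\; 1 - \frac{1}{k^{2}}, \qquad k > 1,
\qquad\text{so}\qquad
k=\sqrt{2}:\ 1-\tfrac{1}{2} = 50\%,\quad
k=2:\ 1-\tfrac{1}{4} = 75\%,\quad
k=3:\ 1-\tfrac{1}{9} \approx 89\%.
```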

Rules for interpreting SDs that apply to any distribution
Women’s Health Initiative Observational Study (WHI-OS)
- ~90,000 women
- longitudinal cohort study (8 yrs and continuing)
Osteoporotic Fractures Ancillary Substudy => case-control study
- 1200 cases (fractures), 1200 controls
- Inflammatory markers (e.g. IL-6)
- Hormones (estradiol), bone mineral density, …
- 25(OH)2 Vitamin D3 (ng/ml)

Rules for interpreting SDs that apply to any distribution
Women’s Health Initiative Observational Study, Osteoporotic Fractures Ancillary Substudy
25(OH)2 Vitamin D3 (ng/ml), mean ± SD:
- 32.8 ± 10.7 (controls)
- 21.6 ± 13.6 (cases)

Rules for interpreting SDs that apply to any distribution
Women’s Health Initiative Observational Study, Osteoporotic Fractures Ancillary Substudy
25(OH)2 Vitamin D3 (ng/ml), mean ± SD:
- 32.8 ± 10.7 (controls)
- 21.6 ± 13.6 (cases)
Cases (SD = 13.6):
- At least 50% within √2 SDs (21.6 ± 19.2, i.e. 2.4 to 40.8)
- At least 75% within 2 SDs (21.6 ± 27.2, i.e. -5.6 to 48.8)
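The Chebyshev intervals for the cases can be reproduced with a small helper. This is an illustrative sketch: the function name `chebyshev_interval` is made up for this example, and the only inputs are the mean and SD quoted on the slide.

```python
import math

def chebyshev_interval(mean, sd, k):
    """Return (lower, upper, min_fraction): at least min_fraction of the
    values of any distribution lie within k SDs of the mean."""
    min_fraction = 1 - 1 / k ** 2
    return mean - k * sd, mean + k * sd, min_fraction

case_mean, case_sd = 21.6, 13.6   # vitamin D (ng/ml) in cases, from the slide

results = [chebyshev_interval(case_mean, case_sd, k) for k in (math.sqrt(2), 2)]
for low, high, fraction in results:
    print(f"at least {fraction:.0%} of values between {low:.1f} and {high:.1f}")
```

The 2-SD interval extends below zero, which a concentration cannot do. That is a reminder that the bound is distribution-free and can be loose for skewed data.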

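Confidence Interval for a Population Mean
Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Standard error* of the mean: $SE(\bar{x}) = s/\sqrt{n}$, where $s$ is the sample SD
Confidence interval: $\bar{x} \pm t_{\alpha/2,\,n-1} \cdot SE(\bar{x})$
* Standard error is the general term for the standard deviation of an estimator.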

[t-distribution table] As df → ∞, the critical value approaches 1.96, the normal-distribution limit. Example: n = 19, df = 18.

Aflatoxin levels of raw peanut kernels
n = 15, df = 14 (= n - 1)
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
$t_{0.025,14} = 2.145$
95% C.I.: 32.47 ± 2.145*(10.63/√15) = 32.47 ± 2.145*2.744 = 32.47 ± 5.89
95% C.I.: (26.58, 38.36)
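The interval can be recomputed from the raw data. This sketch uses only Python's standard library, which has no t-distribution quantile function, so the critical value 2.145 is copied from the slide's table rather than computed. The variable names are illustrative.

```python
import math
import statistics

# Aflatoxin levels of raw peanut kernels (n = 15), from the slide
aflatoxin = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

n = len(aflatoxin)
mean = statistics.mean(aflatoxin)
sd = statistics.stdev(aflatoxin)      # sample SD with the n - 1 divisor
se = sd / math.sqrt(n)                # standard error of the mean

T_CRIT = 2.145                        # t(0.025, df = 14), taken from the slide
margin = T_CRIT * se
ci_low, ci_high = mean - margin, mean + margin

print(f"{mean:.2f} ± {margin:.2f} -> ({ci_low:.2f}, {ci_high:.2f})")
```

Because the slide rounds the mean and margin before adding them, the last digit of the upper limit can differ by 0.01 from the unrounded computation.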

Aflatoxin levels of raw peanut kernels
n = 15 peanuts sampled from silo: 16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
95% C.I.: (26.58, 38.36)
95% C.I. <=> p < 0.05 (using t-test): a hypothesized mean is rejected at p < 0.05 exactly when it lies outside the 95% C.I.
Hypothesis: Mean of entire silo = 30 (p > 0.05 => not rejected)
H0: Mean of all silos = 25 (p < 0.05 => rejected)

Aflatoxin levels of raw peanut kernels
n = 15 peanuts sampled from silo
95% C.I.: (26.58, 38.36)
95% C.I. <=> p < 0.05 (using t-test)
Hypothesis: Mean of entire silo = 30
t = (mean - 30) / Stderr(mean) = (32.47 - 30)/2.744 = 0.90
t = 0.90, df = 14, p > 0.05 (look up p/2, the one-tailed area, in the t table)
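The test statistic and the decision for both hypothesized means (30 and 25) can be checked as follows. This is a sketch: `one_sample_t` is an illustrative helper name, and the 5% critical value 2.145 for df = 14 is taken from the slides rather than computed.

```python
import math
import statistics

# Aflatoxin levels of raw peanut kernels (n = 15), from the slide
aflatoxin = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]
T_CRIT = 2.145   # two-sided 5% critical value for df = 14, from the slides

def one_sample_t(sample, mu0):
    """t statistic for H0: population mean equals mu0."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

t_30 = one_sample_t(aflatoxin, 30)
t_25 = one_sample_t(aflatoxin, 25)

for mu0, t in ((30, t_30), (25, t_25)):
    decision = "rejected" if abs(t) > T_CRIT else "not rejected"
    print(f"H0: mean = {mu0}: t = {t:.2f}, {decision} at the 5% level")
```

The decisions match the confidence interval: 30 lies inside (26.58, 38.36) and 25 lies outside it.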

[t-distribution table] Lookup for t = 0.90, df = 14.
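Where a printed table is not at hand, the lookup can be approximated numerically. The sketch below integrates the Student's t density with Simpson's rule, using only the standard library. It is a stand-in for the table, not the method used in the lecture, and the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value from Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3      # P(0 < T < |t|)
    return 1 - 2 * central_half

p = two_sided_p(0.90, 14)
print(f"two-sided p = {p:.3f}, one-tailed area p/2 = {p / 2:.3f}")
```

As a sanity check, `two_sided_p(2.145, 14)` should come out very close to 0.05, since 2.145 is the tabulated 5% critical value for df = 14.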

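2-sample independent t-test for comparing means of two groups
General formula: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$, where stdev = sqrt(variance)
If two independent estimators (e.g. group means): Variance(of difference) = sum of variances, i.e. $Var(\bar{x}_1 - \bar{x}_2) = Var(\bar{x}_1) + Var(\bar{x}_2)$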

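2-sample t-test to compare two groups
Case 1: Equal variances
“Pooled” variance estimate: $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$
df = n1 + n2 - 2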
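A minimal sketch of the equal-variance case follows. The function name `pooled_t` and the two small samples are hypothetical, made up purely to exercise the formula; they are not data from the lecture.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, df

# Hypothetical samples, for illustration only
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_stat, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```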

2-sample t-test to compare two groups
Case 2: Unequal variances
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ (denominator = stderr of numerator)
df = Welch-Satterthwaite equation (best approximate df)
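The Welch-Satterthwaite equation is not written out on the slide. The sketch below uses its standard form, df = (a + b)² / (a²/(n1 - 1) + b²/(n2 - 1)) with a = s1²/n1 and b = s2²/n2. The function name `welch_t` and the sample values are hypothetical and serve only as an illustration.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Unequal-variance (Welch) t statistic and approximate degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    a = statistics.variance(sample1) / n1   # squared standard error of mean 1
    b = statistics.variance(sample2) / n2   # squared standard error of mean 2
    se = math.sqrt(a + b)                   # variance of difference = sum of variances
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical samples, for illustration only
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

With equal group sizes the t statistic coincides with the pooled version; only the degrees of freedom shrink when the variances differ.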

Does Cell Phone Use While Driving Impair Reaction Times?
Sample of 64 students from Univ of Utah
- Randomly assigned: cell phone group or control => 32 in each group
- On machine that simulated driving situations => at irregular periods a target flashed red or green
- Participants instructed to hit “brake button” as soon as possible when they detected red light
- Control group listened to radio or to books-on-tape
- Cell phone group carried on conversation about a political issue with someone in another room

Does Cell Phone Use While Driving Impair Reaction Times? (milliseconds)
[Table: N, mean and SD of reaction times for the cell phone and control groups; difference in means = 51.5]
SE(difference) = sqrt(SD1²/32 + SD2²/32) = 19.6
t = 51.5/19.6 = 2.63, p = 0.011

Removing one high outlier from cell phone group
[Table: N, mean and SD for the cell phone and control groups; difference in means = 39.4]
“Equal variances” case, using the pooled variance estimate
df = n1 + n2 - 2 = 61
t = 39.4/(62.69*√(1/31 + 1/32)) = 2.52 (p = 0.015)