CBER Selecting the Appropriate Statistical Distribution for a Primary Analysis P. Lachenbruch.


A Study of Xeroderma Pigmentosum (XP)
• A characteristic of XP is the formation of actinic keratoses (AKs).
• Multiple lesions appear haphazardly on a patient's back.
• The rate of appearance may not be the same for different patients.

Background
• Analysis: rank-sum test.
• Late in the study, the Statistical Analysis Plan (SAP) was amended to use Poisson regression.
• It is unclear whether stepwise selection of covariates was planned a priori.

Study Results
• Poisson regression analysis showed a highly significant treatment difference (p = 0.009), adjusting for baseline AK, age, and the age × treatment interaction (stepwise selection).
• All of these effects were highly significant.
• There was a substantial outlier problem.

Assumptions
• Each patient has the same incidence rate, λ, per area unit.
• The chance of more than one AK in a small area unit is negligible.
• Non-overlapping lesions are independent; that is, lesions occurring in one area of the body are not affected by those occurring in another area.

Outliers
• Outliers are observations that are jarringly different from the remainder of the data.
  • There may be multiple outliers.
  • If their frequency is large, this may be evidence that we have a mixture distribution.
• Outliers can substantially affect the analysis.

Analyses
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
     trt | obs   rank sum   expected
         |
combined |
unadjusted variance
adjustment for ties
adjusted variance
Ho: ak12tot(trt==0) = ak12tot(trt==1)
          z =
 Prob > |z| =
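The quantities named in this output (rank sum, expected rank sum, unadjusted variance, adjustment for ties, adjusted variance, z) can be sketched in a few lines. This is only an illustration of the standard normal-approximation formulas, not the code used for the review; the function name `rank_sum_test` and the two toy samples are hypothetical and are not the study data.

```python
import math
from collections import Counter

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test: midranks plus a tie-adjusted normal approximation."""
    pooled = sorted(x + y)
    # Assign each distinct value the average (mid) rank of its tied block.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum = sum(ranks[v] for v in x)          # rank sum of the first group
    expected = n1 * (n + 1) / 2                  # expected rank sum under Ho
    var_unadj = n1 * n2 * (n + 1) / 12           # unadjusted variance
    tie_sizes = Counter(pooled).values()
    adjustment = -n1 * n2 * sum(t**3 - t for t in tie_sizes) / (12 * n * (n - 1))
    var_adj = var_unadj + adjustment             # adjusted variance
    z = (rank_sum - expected) / math.sqrt(var_adj)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return rank_sum, expected, var_adj, z, p_two_sided

# Hypothetical toy counts (NOT the study data), with ties and one extreme value.
control = [0, 2, 2, 5, 193]
treated = [0, 0, 1, 2, 3, 7]
print(rank_sum_test(control, treated))
```

Because the test uses only ranks, the extreme value contributes no more than the largest rank, which is why a rank-based analysis is less sensitive to outliers than a model fitted to the raw counts.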

Distribution of AK Data at Baseline (Stem and Leaf) (Yarosh et al., Lancet)
Lead | Trailing digits
  0* |
  //
  4* | 27
  //
 10* | 0   ← oops!

Distribution of 12 Month AK Total Data (Stem and Leaf)
. stem ak12tot,w(10)
Lead | Trailing digits
  0* |
   * |
   * |
  3* | 7
  //
  7* | 1
  8* | 9
  //
 19* | 3   ← same patient - in placebo group

Results of Poisson Analyses
Poisson regression          Number of obs = 29
                            LR chi2(3)    =
                            Prob > chi2   =
Log likelihood =            Pseudo R2     =
ak12tot | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
    age |
    trt |
    akb |
  _cons |
• G-O-F in control group: χ² = with 8 d.f.
• G-O-F in treatment group: χ² = 682.5 with 19 d.f.
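One simple way to see how a goodness-of-fit check can reject the Poisson assumption is the dispersion (variance-to-mean) chi-square statistic, which has n - 1 degrees of freedom for n counts. The slides do not say which goodness-of-fit statistic was used, so the sketch below is an assumption chosen for illustration; the function name `poisson_dispersion` and the toy counts are hypothetical.

```python
from statistics import mean

def poisson_dispersion(counts):
    """Return (chi-square statistic, degrees of freedom) for the Poisson dispersion test.

    Under a common Poisson rate the statistic is roughly equal to its
    degrees of freedom; values far above that indicate overdispersion.
    """
    m = mean(counts)
    stat = sum((y - m) ** 2 for y in counts) / m
    return stat, len(counts) - 1

# Hypothetical toy counts (NOT the study data).
well_behaved = [0, 1, 2, 3, 4]
with_outlier = [0, 1, 2, 3, 193]   # one extreme value dominates the statistic

for label, counts in [("no outlier", well_behaved), ("with outlier", with_outlier)]:
    stat, df = poisson_dispersion(counts)
    print(f"{label}: chi2 = {stat:.1f} on {df} d.f., chi2/d.f. = {stat / df:.1f}")
```

A single jarring observation is enough to push the statistic far beyond its degrees of freedom, which is the pattern of the 682.5 on 19 d.f. reported for the treatment group.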

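Permutation Test
• Procedure: scramble the treatment codes and redo the analysis. Repeat many (5,000?) times.
• Count the number of times the treatment coefficient from a scrambled data set is at least as large, in absolute value, as the observed coefficient. The p-value is this count divided by the number of replications.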
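The procedure can be sketched as follows. To stay self-contained, this sketch swaps the full Poisson regression for a simpler statistic: the log ratio of group mean counts, which is the treatment coefficient of a Poisson regression that has treatment as its only covariate (age and baseline AK are omitted here). The function names, the seed, and the toy data are hypothetical, and the sketch assumes every group has at least one positive count so that the logarithm is defined.

```python
import math
import random

def log_rate_ratio(counts, trt):
    """Log ratio of mean counts, treated (1) vs control (0).

    Assumes both groups are non-empty and have a positive total count.
    """
    treated = [y for y, t in zip(counts, trt) if t == 1]
    control = [y for y, t in zip(counts, trt) if t == 0]
    return math.log((sum(treated) / len(treated)) / (sum(control) / len(control)))

def permutation_p_value(counts, trt, reps=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value, p = c / n."""
    rng = random.Random(seed)
    observed = log_rate_ratio(counts, trt)
    labels = list(trt)
    c = 0
    for _ in range(reps):
        rng.shuffle(labels)                      # scramble the treatment codes
        t_perm = log_rate_ratio(counts, labels)  # redo the analysis
        if abs(t_perm) >= abs(observed) - 1e-12: # c = #{|T| >= |T(obs)|}
            c += 1
    return observed, c / reps

# Hypothetical toy counts (NOT the study data).
counts = [1, 2, 3, 4, 5, 6, 7, 8]
trt = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_p_value(counts, trt))
```

The reference distribution comes from the data themselves, so the p-value does not rely on the counts actually following a Poisson distribution.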

Command and Output
. permute trt "permpois trt ak12tot age akb" rtrt=rtrt rage=rage rakb=rakb,reps(5000) d
command:      permpois trt ak12tot age akb
statistics:   rtrt = rtrt
              rage = rage
              rakb = rakb
permute var:  trt
Monte Carlo permutation statistics      Number of obs = 30
                                        Replications  =
   T | T(obs)   c   n   p=c/n   SE(p)
rtrt |
rage |
rakb |
Note: c = #{|T| >= |T(obs)|}
I deleted the confidence intervals for the proportions.

Permutation Tests (2)
• Poisson with 5000 replications
• Treatment: p = 0.57
• Age: p = 0.62
• AK Baseline: p = 0.28
• All significant results disappear.

Results of Poisson Analysis
• The sponsor found that all terms were highly significant (including the treatment × age interaction).
• We reproduced this analysis.
• We also did a Poisson goodness-of-fit test that strongly rejected the assumption of a Poisson distribution.
• What does a highly significant result mean when the model is wrong?

Conclusions
• The data are poorly fit by both Poisson and negative binomial distributions.
• Permutation tests suggest no treatment effect unless the treatment by age interaction is included.
• Justification of the interaction term by a stepwise procedure is exploratory.
• Outliers are a problem and can affect the conclusions.

Conclusions (2)
• The results of the study are based on exploratory data analysis.
• The analysis is based on incorrect assumptions about the data.
• Our analyses based on distribution-free tests do not agree with the sponsor's results.
• The results based on appropriate assumptions do not support approval of the product.

Suggestions
• Conduct a phase II study to determine appropriate covariates.
• Use appropriate inclusion / exclusion criteria.
• Stratification.
• A priori specification of the full analysis.

Reference
Yarosh D. et al., "Effect of topically applied T4 endonuclease V in liposomes on skin cancer in xeroderma pigmentosum: a randomised study", Lancet 357, 2001.

The End

Grid on “Back”

The Data
[Patient listing with columns sex, trt, akb, ak12tot]

Descriptive Statistics (1)
                    | N | Mean | SD
Baseline AK
  Control           |
  Treatment         |
12 Months Total AK
  Control           |
  Treatment         |

Descriptive Statistics (2)
                    | Median | Min | Max
Baseline AK
  Control           |
  Treatment         |
12 Months Total AK
  Control           |
  Treatment         |

Negative Binomial Model
• Need a model that allows for individual variability.
• The negative binomial distribution assumes that each patient's counts are Poisson, but that the incidence rate varies from patient to patient according to a gamma distribution.
• Treatment: p = 0.64
• Age: p = 0.45
• AK Baseline: p =
• Age × Treatment: p < 0.001
• The main effect of treatment is not interpretable; the effects need to be examined separately by age.
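The gamma-mixing idea behind the negative binomial model can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original analysis: the mean of 5, the gamma shape of 1, the seed, and the helper names `poisson_draw` and `simulate_counts` are all hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is suitable only for modest rates.

```python
import math
import random
from statistics import mean, variance

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count (Knuth's method; fine for modest lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(n, mu, shape=None, seed=1):
    """Simulate n counts with mean mu.

    shape=None: every patient shares the same rate mu (Poisson).
    shape=k:    each patient's rate is Gamma(shape=k, scale=mu/k), so the
                counts are negative binomial with variance mu + mu**2 / k.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        lam = mu if shape is None else rng.gammavariate(shape, mu / shape)
        counts.append(poisson_draw(rng, lam))
    return counts

same_rate = simulate_counts(20000, mu=5.0)                # Poisson
varying_rate = simulate_counts(20000, mu=5.0, shape=1.0)  # gamma-mixed Poisson

print("variance/mean, same rate:   ", variance(same_rate) / mean(same_rate))
print("variance/mean, varying rate:", variance(varying_rate) / mean(varying_rate))
```

With a common rate the variance-to-mean ratio stays near 1, while letting the rate vary across patients inflates it (to about 1 + mu/k in theory), which is the individual variability the negative binomial model is meant to absorb.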

Negative Binomial Results
• This model shows only that the baseline AK and age × treatment effects are significant factors.
• It also gives a test for whether the data are Poisson; the test rejects the Poisson distribution: p <
• A chi-square test based on observed minus expected counts (obs - exp) suggests that these data are not negative binomial.