Stratified Covariate Balancing Using R


Farrokh Alemi, Ph.D.
This presentation focuses on how to use the stratified covariate balancing package within R.

Download Stratified Balancing
The first step is to install the package. Note the capital letters in the package name: R package names are case-sensitive.

Load Package into Library
Next, load the package into your library with library(). You need to load the plyr package as well.

Prepare Your Data: Remove Impossible Values
Before doing the analysis, make sure that your data fit the assumptions. Remove impossible values and other anomalies in the data, such as:
- negative IDs
- zero blood pressure
- visits recorded after the patient is reported to have died
- pregnant males
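The screen can be sketched in Python as follows. This is not the package's R code. The sketch assumes a list of patient records with hypothetical field names (`id`, `systolic_bp`, `sex`, `pregnant`, `visit_date`, `death_date`), and each rule mirrors one of the anomalies listed above.

```python
from datetime import date

def is_possible(row):
    """Return False if the record contains a value that cannot be true."""
    if row["id"] < 0:                                   # negative ID
        return False
    if row["systolic_bp"] == 0:                         # zero blood pressure
        return False
    if row["sex"] == "M" and row["pregnant"]:           # pregnant male
        return False
    died = row["death_date"]
    if died is not None and row["visit_date"] > died:   # visit after death
        return False
    return True

# Hypothetical records: only id 1 is free of anomalies.
records = [
    {"id": 1, "systolic_bp": 120, "sex": "F", "pregnant": True,
     "visit_date": date(2016, 1, 5), "death_date": None},
    {"id": -2, "systolic_bp": 130, "sex": "F", "pregnant": False,
     "visit_date": date(2016, 2, 1), "death_date": None},
    {"id": 3, "systolic_bp": 0, "sex": "M", "pregnant": False,
     "visit_date": date(2016, 3, 1), "death_date": None},
    {"id": 4, "systolic_bp": 118, "sex": "M", "pregnant": True,
     "visit_date": date(2016, 4, 1), "death_date": None},
    {"id": 5, "systolic_bp": 125, "sex": "F", "pregnant": False,
     "visit_date": date(2016, 6, 1), "death_date": date(2016, 5, 1)},
]

clean = [r for r in records if is_possible(r)]
print([r["id"] for r in clean])  # [1]
```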

Prepare Your Data: Impute Missing Values
Impute values that are missing at random. Typically, the mode (for categorical variables) or the average (for continuous variables) can be used. It is also helpful to predict missing values from the levels of other variables.

Be careful: if data are not missing at random, create a new dummy variable that is 1 when the variable is missing and 0 otherwise. Similarly, when no diagnosis is reported, the patient can be treated as not having that diagnosis.
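The Python sketch below uses hypothetical variable names and is not the package's R code. It shows three of the options above:
- mean imputation for a continuous variable and mode imputation for a categorical one;
- a missing-value indicator for data that are not missing at random;
- the "no report, no diagnosis" convention.

Predicting a missing value from other variables is not shown.

```python
from statistics import mean, mode

def impute(values, kind):
    """Replace None with the mode (categorical) or the mean (continuous)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if kind == "categorical" else mean(observed)
    return [fill if v is None else v for v in values]

def missing_indicator(values):
    """1 where the value is missing, 0 otherwise (use when not missing at random)."""
    return [1 if v is None else 0 for v in values]

age = [60, None, 70, 80]
smoker = ["yes", "no", None, "no"]
print(impute(age, "continuous"))      # [60, 70, 70, 80]
print(impute(smoker, "categorical"))  # ['yes', 'no', 'no', 'no']
print(missing_indicator(smoker))      # [0, 0, 1, 0]

# "No report, no diagnosis": an unreported diagnosis is coded as absent (0).
diabetes = [1, None, 0, None]
diabetes_filled = [0 if v is None else v for v in diabetes]
print(diabetes_filled)                # [1, 0, 0, 0]
```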

Prepare Your Data: Binary Indicators
The package transforms your continuous data into binary variables, using values above and below the average to define each binary variable. This allows a coarse matching of cases and controls, so more cases find matching controls. If you prefer a different split, revise your data before reading it into R. For example, you can code the worst category vs. all others, treat no report as no diagnosis, or use R discretization software.
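Two of these codings are sketched below in Python, as an illustration rather than the package's code. The sketch assumes the split is at the sample mean and that a value exactly equal to the mean is coded 0; the slides do not state the package's tie rule.

```python
from statistics import mean

def above_average(values):
    """Code 1 if a value is above the sample average, 0 otherwise."""
    cut = mean(values)
    return [1 if v > cut else 0 for v in values]

def worst_vs_rest(values, worst):
    """Code 1 for the worst category and 0 for all other categories."""
    return [1 if v == worst else 0 for v in values]

bp = [110, 150, 125, 135]                  # average is 130
print(above_average(bp))                   # [0, 1, 0, 1]
severity = ["mild", "severe", "moderate", "severe"]
print(worst_vs_rest(severity, "severe"))   # [0, 1, 0, 1]
```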

Keep in mind that binary data lead to coarse matching; more refined matching can come from additional categories in the data. We find that in most data, breaking continuous variables into quintiles gives a more refined matching of cases and controls.
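A quintile coding can be sketched in Python with `statistics.quantiles`. This is an illustration, not the package's own discretization. The cut points depend on the quantile definition (Python's default here is the "exclusive" method), and values equal to a cut point fall into the lower group.

```python
from statistics import quantiles

def quintile_codes(values):
    """Assign each value to quintile 1..5 using the sample's four cut points."""
    cuts = quantiles(values, n=5)          # 20th, 40th, 60th, 80th percentiles
    return [1 + sum(v > c for c in cuts) for v in values]

ages = list(range(1, 11))                  # 1, 2, ..., 10
print(quintile_codes(ages))                # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Finer categories improve balance within each stratum but leave fewer controls per stratum, so fewer cases may find a match.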

Read Data
Read your data into R from your working directory. Here we read a csv file called “simulated bundled data.” The data came from http://openonlinecourses.com/causalanalysis/simulated%20bundled%20data.csv

Look at Data Using fix(data)
Examine the data with the R function fix, as in fix(data). The network shown here is the one used to simulate the data. If you know the causal structure of your data, check that the data reflect it.

Select Right Subset of Data
The data should include only treatment, outcome, and covariates. The file you read into R must have a treatment variable, an outcome variable, and one or more covariates. It should not contain any other variables; for example, it should not include a row number or an ID.
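Trimming a data set to the analysis columns can be sketched in Python as below. The column names (`row`, `patient_id`, `age_over_65`, `diabetes`, `bundle`, `readmitted`) are hypothetical, and in practice this step would be done in R or before export. Identifiers and row numbers are simply left out of the kept columns.

```python
def keep_analysis_columns(rows, keep):
    """Return rows restricted to the listed columns, in the listed order."""
    return [{k: row[k] for k in keep} for row in rows]

rows = [
    {"row": 1, "patient_id": "A17", "age_over_65": 1, "diabetes": 0,
     "bundle": 1, "readmitted": 0},
    {"row": 2, "patient_id": "B02", "age_over_65": 0, "diabetes": 1,
     "bundle": 0, "readmitted": 1},
]
covariates = ["age_over_65", "diabetes"]
# Covariates, then treatment ("bundle"), then outcome ("readmitted").
subset = keep_analysis_columns(rows, covariates + ["bundle", "readmitted"])
print(list(subset[0]))  # ['age_over_65', 'diabetes', 'bundle', 'readmitted']
```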

Don’t Stratify Variables on Causal Path
The data file should not contain any variables that are on the causal path from treatment to outcome. For example, complications of treatment should not be included in the data file. Otherwise the package will stratify on them and distort the relationship between treatment and outcome. To avoid this:
- examine the sequence of events;
- leave out complications of treatment;
- don't stratify on mediators;
- conduct collider tests.
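One way to examine the sequence of events is to keep only variables recorded before treatment starts, since later variables may be complications or mediators. The Python sketch below assumes hypothetical recording days for each variable. It covers only the timing check; collider tests need separate analysis.

```python
def pre_treatment_covariates(recorded_day, treatment_day):
    """Names of variables recorded strictly before the treatment day."""
    return sorted(name for name, day in recorded_day.items() if day < treatment_day)

# Hypothetical day each variable was recorded, relative to admission (day 0).
recorded_day = {"age": 0, "diabetes": 0, "prior_admission": -30,
                "surgical_site_infection": 5}   # a complication after treatment
print(pre_treatment_covariates(recorded_day, treatment_day=2))
# ['age', 'diabetes', 'prior_admission']
```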

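Balance Data
balanced=stratadisc(4,5,subset)
This command shows how the package is called and its minimum output. Judging from the comments on the sensitivity analysis slide, the arguments give the column of the treatment variable (4), the column of the outcome (5), and the data set (subset).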

Check Significance
In this output, the confidence interval for the common odds ratio across the strata does not include 1, so the treatment effect is statistically significant at an alpha level of 5%.
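The slides do not say which estimator the package reports. The Mantel-Haenszel estimator is the standard common odds ratio across 2×2 strata, and the Python sketch below pairs it with the Robins-Breslow-Greenland confidence interval. Both choices are assumptions for illustration, not a description of the package's output. The significance check is whether the interval excludes 1.

```python
import math

def mantel_haenszel(strata, z=1.96):
    """Common odds ratio across 2x2 strata with a Robins-Breslow-Greenland CI.

    Each stratum is (a, b, c, d):
      a = treated with outcome,  b = treated without outcome,
      c = control with outcome,  d = control without outcome.
    """
    R = S = PR = PS_QR = QS = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        R += r
        S += s
        PR += p * r
        PS_QR += p * s + q * r
        QS += q * s
    or_mh = R / S
    var_log = PR / (2 * R ** 2) + PS_QR / (2 * R * S) + QS / (2 * S ** 2)
    half = z * math.sqrt(var_log)
    log_or = math.log(or_mh)
    return or_mh, math.exp(log_or - half), math.exp(log_or + half)

strata = [(20, 10, 10, 20), (15, 5, 10, 10)]   # hypothetical counts
or_mh, low, high = mantel_haenszel(strata)
print(f"{or_mh:.2f}", low > 1)                 # 3.57 True: the CI excludes 1
```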

Check Clinical Significance
In massive data, the effect size should be large. Keep in mind that in large data sets almost everything is statistically significant, so you also need to make sure that the effect size is large enough to be clinically meaningful.
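A small worked example makes the point, using a single 2×2 table with a Woolf (log-scale) confidence interval and hypothetical counts. With 100,000 treated and 100,000 controls, an outcome rate of 52.5% versus 50% gives an odds ratio of only about 1.1. The interval still excludes 1 by a wide margin.

```python
import math

def woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a Woolf 95% confidence interval.

    a, b = treated with / without outcome; c, d = controls with / without outcome.
    """
    odds_ratio = a * d / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical: 52.5% outcome among treated vs. 50% among controls.
odds_ratio, low, high = woolf_ci(52_500, 47_500, 50_000, 50_000)
print(f"OR {odds_ratio:.3f}, 95% CI {low:.3f} to {high:.3f}")
# odds ratio near 1.1, yet the interval excludes 1
```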

Check Overlap
At least 60% of cases should match to controls. Check the overlap between cases and controls; if the overlap is too low, the results cannot be generalized.
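Overlap can be computed as the share of cases whose stratum (their exact pattern of binary covariates) also contains at least one control. The Python sketch below uses that definition, which is an assumption about how "match" is counted, together with hypothetical data.

```python
from collections import defaultdict

def case_overlap(rows):
    """Share of cases whose covariate stratum also contains a control.

    rows: iterable of (stratum, treated) pairs; treated is 1 for cases, 0 for controls.
    """
    strata_with_control = set()
    cases = defaultdict(int)
    for stratum, treated in rows:
        if treated:
            cases[stratum] += 1
        else:
            strata_with_control.add(stratum)
    matched = sum(n for s, n in cases.items() if s in strata_with_control)
    return matched / sum(cases.values())

rows = [((1, 0), 1), ((1, 0), 1), ((1, 0), 0),
        ((0, 1), 1), ((0, 1), 0),
        ((1, 1), 1)]                 # no control has the pattern (1, 1)
share = case_overlap(rows)
print(share, share >= 0.60)          # 0.75 True
```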

Check the Strata
# Check the strata you have created
fix(balanced)
Look at the various strata that the package has created. Cases always get a weight of 1. Controls are weighted so that, in each stratum, the weighted number of controls equals the number of cases. Check that the control weights in some strata are not radically different from the control weights in other strata.
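The weighting rule described above can be sketched in Python as below; this is an illustration, not the package's code. Within each stratum every case counts once, and each control gets the weight (number of cases) / (number of controls). Strata without both cases and controls get no weight in this sketch. The ratio of the largest to the smallest control weight is one simple way to spot radically different weights.

```python
from collections import Counter

def control_weights(rows):
    """Control weight per stratum: n_cases / n_controls (cases weigh 1).

    rows: iterable of (stratum, treated) pairs. Strata lacking cases or
    controls are left out.
    """
    cases = Counter(s for s, t in rows if t == 1)
    controls = Counter(s for s, t in rows if t == 0)
    return {s: cases[s] / controls[s] for s in cases if controls[s] > 0}

rows = [("A", 1), ("A", 1), ("A", 0),
        ("B", 1), ("B", 0), ("B", 0), ("B", 0), ("B", 0)]
w = control_weights(rows)
print(w)                                   # {'A': 2.0, 'B': 0.25}
print(max(w.values()) / min(w.values()))   # 8.0: a spread worth a closer look
```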

Are Covariates Balanced?
Check that the weighting procedure balances the data by comparing the odds before and after balancing. After weighting, the odds of observing each covariate should be the same in the treated and untreated groups, an odds ratio of 1 to 1. Stratified covariate balancing is guaranteed to balance all main effects and interactions among the covariates, so you will always see that the covariates are balanced. Still, check and use these charts, as many people do not believe that the data are balanced until they see it with their own eyes.
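The balance check can be sketched in Python for one binary covariate that defines the strata. The example is hypothetical, and the weights follow the rule on the previous slide (cases 1, controls n_cases/n_controls per stratum). Before weighting, the covariate is nine times as likely (in odds) among the treated. After weighting, the odds ratio is 1. The sketch assumes no cell of the covariate table is empty.

```python
from collections import Counter

def control_weights(rows):
    """Control weight per stratum: n_cases / n_controls (cases weigh 1)."""
    cases = Counter(s for s, t, _ in rows if t == 1)
    controls = Counter(s for s, t, _ in rows if t == 0)
    return {s: cases[s] / controls[s] for s in cases if controls[s] > 0}

def covariate_odds_ratio(rows, weights=None):
    """Odds of the covariate among treated divided by odds among untreated.

    rows: (stratum, treated, covariate) triples; weights=None means unweighted.
    """
    total = Counter()
    for s, t, x in rows:
        if weights is None:
            w = 1.0
        elif s not in weights:
            continue                      # stratum without both groups is dropped
        else:
            w = 1.0 if t == 1 else weights[s]
        total[(t, x)] += w
    return (total[(1, 1)] / total[(1, 0)]) / (total[(0, 1)] / total[(0, 0)])

# Hypothetical data: the stratum is the value of a single binary covariate x.
rows = ([("x=1", 1, 1)] * 3 + [("x=0", 1, 0)] * 1 +
        [("x=1", 0, 1)] * 1 + [("x=0", 0, 0)] * 3)
print(round(covariate_odds_ratio(rows), 6))                        # 9.0 before
print(round(covariate_odds_ratio(rows, control_weights(rows)), 6))  # 1.0 after
```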

Conduct Sensitivity Analysis
# Sensitivity analysis of treatment in column 4
# Outcome in column 5
# In data set called subset
revised=sensdisc(4,5,subset)
revised
Use the function sensdisc to conduct a sensitivity analysis of the conclusions.

Most of the work in using stratified covariate balancing is in preparing the data before starting the analysis.