Nearest Neighbor Matching Using the Greedy Match Macro

Note: Much of the code was originally written by Lori Parsons.

This code has been written with simplicity as a primary concern. If you do not have a large number of controls, you may want to modify it.

/* Define the library for formats */
LIBNAME saslib "G:\oldpeople\sasdata\" ;
OPTIONS NOFMTERR FMTSEARCH = (saslib) ;

/* Define the library for study data */
LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;

Include the macro:
%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearest macro.sas' ;

%propen (libname, dsname, idvariable, dependent, propensity)

LIBNAME = directory for data sets
DSNAME = data set with study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable
PROPENSITY = propensity score produced in logistic regression

For example:
%propen(study,allpropen,id,athome,prob);

Remember, we already have the study.allpropen data set with the propensity score (prob) from the PROC LOGISTIC we just did.
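For readers who do not have the earlier step at hand, a propensity score like prob could be produced roughly as follows. This is a sketch under stated assumptions, not the presenter's actual model: the data set work.demo, the covariates age and female, and all simulated values are hypothetical. Only athome and prob are names taken from the call above.

```sas
/* Simulated stand-in data: every name and value here is hypothetical */
DATA work.demo ;
   CALL STREAMINIT(2012) ;
   DO id = 1 TO 200 ;
      age    = 65 + FLOOR(25 * RAND('UNIFORM')) ;
      female = (RAND('UNIFORM') < 0.5) ;
      /* in this made-up example the chance of being at home falls with age */
      athome = (RAND('UNIFORM') < 1 / (1 + EXP(0.08 * (age - 77)))) ;
      OUTPUT ;
   END ;
RUN ;

/* DESCENDING models the probability that athome = 1 */
PROC LOGISTIC DATA = work.demo DESCENDING ;
   MODEL athome = age female ;
   /* P= names the variable that receives the predicted probability */
   OUTPUT OUT = work.allpropen P = prob ;
RUN ;
```

The output data set keeps every input variable and adds prob, which is the value the macro's fifth argument points at.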

Explaining the macro
A Challenge

%macro propen(lib,dsn,id,depend,prob);
Data in5 ;
set &lib..&dsn ;

Creates a temporary data set.

Propensity scores are rounded to 5, then 4, 3, 2 and 1 decimals.

%Do countr = 1 %to 5 ;
%let digits = %eval(6 - &countr) ;
%let roundto = %eval(10**&digits) ;
%let roundto = %sysevalf(1/&roundto) ;
%let nextin = %eval(&digits - 1) ;

MACRO NOTES

%Do countr = 1 %to 5 ; /* Starts %DO loop */

Use the %EVAL function to do integer arithmetic:
%let digits = %eval(6 - &countr) ;

Use the %SYSEVALF function to do non-integer arithmetic.
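To see what these macro functions produce on each pass, the loop arithmetic can be run on its own. This is a minimal sketch: the macro name showround is hypothetical, and the two roundto assignments from the macro are collapsed into one %SYSEVALF call for brevity.

```sas
%MACRO showround ;
   %DO countr = 1 %TO 5 ;
      /* %EVAL handles the integer arithmetic: digits runs 5, 4, 3, 2, 1 */
      %LET digits  = %EVAL(6 - &countr) ;
      /* %SYSEVALF is needed here because the result is a fraction */
      %LET roundto = %SYSEVALF(1 / 10**&digits) ;
      %LET nextin  = %EVAL(&digits - 1) ;
      %PUT countr=&countr digits=&digits roundto=&roundto nextin=&nextin ;
   %END ;
%MEND showround ;

%showround
```

The log shows one line per pass. Replacing %SYSEVALF with %EVAL in the roundto line would not work, because %EVAL does integer division and 1/100000 would come back as 0.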

/* Output control to one data set, intervention to another */
/* Create random number to sort within group */

Create 2 data sets

DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum) ;
SET in&digits ;

We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places.
We only keep four variables.

Assignment statements

randnum = RANUNI(0) ;
&prob = ROUND(&prob,&roundto) ;

Create a random number and round the propensity score to a set number of digits.
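The effect of the rounding step can be seen with two made-up scores. In this sketch the values 0.31416 and 0.31421 and the variable names are hypothetical; the point is only that two subjects who differ at five decimals become identical, and so matchable, once the rounding unit gets coarser.

```sas
DATA _NULL_ ;
   case_prob    = 0.31416 ;   /* hypothetical intervention subject */
   control_prob = 0.31421 ;   /* hypothetical control subject      */
   DO digits = 5 TO 1 BY -1 ;
      roundto = 1 / (10 ** digits) ;   /* same unit the macro computes */
      /* same = 1 when the two rounded scores are equal */
      same = (ROUND(case_prob, roundto) = ROUND(control_prob, roundto)) ;
      PUT digits= roundto= same= ;
   END ;
RUN ;
```

In the log, same is 0 at five digits and 1 from four digits down, which is why unmatched subjects are passed on to the next, coarser round.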

Output to case data set …

IF &depend = 1 THEN DO ;
   id_y = &id ;
   depend_y = &depend ;
   OUTPUT yes1 ;
END ;

We need to rename the dependent & id variables or they’ll get overwritten.

… or output to control data set

ELSE IF &depend = 0 THEN DO ;
   id_n = &id ;
   depend_n = &depend ;
   OUTPUT no1 ;
END ;

Notice the data sets were named no1 and yes1. It becomes evident why shortly.

/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */

%Do i = 1 %to 20 ;
%let j = %eval(&i + 1) ;
proc sort data = yes&i ;
by &prob randnum ;
data yes&i yes&j ;
set yes&i ;
by &prob ;
if first.&prob then output yes&i ;
else output yes&j ;

NOTE: Matching without replacement

Same thing for controls

proc sort data = no&i ;
by &prob randnum ;
data no&i no&j ;
set no&i ;
by &prob ;
if first.&prob then output no&i ;
else output no&j ;

The randnum ensures that subjects with matching scores are pulled at random.
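One pass of this split can be tried on a toy data set. The sketch below uses four made-up intervention subjects, three of whom share the same rounded score; the IDs, scores and random numbers are all hypothetical, and prob stands in for &prob.

```sas
DATA yes1 ;
   INPUT id_y prob randnum ;
   DATALINES ;
101 0.3 0.72
102 0.3 0.15
103 0.3 0.48
104 0.5 0.90
;
RUN ;

/* Within each score, the lowest random number comes first */
PROC SORT DATA = yes1 ;
   BY prob randnum ;
RUN ;

/* Keep one subject per score in yes1; push the rest to yes2 */
DATA yes1 yes2 ;
   SET yes1 ;
   BY prob ;
   IF FIRST.prob THEN OUTPUT yes1 ;
   ELSE OUTPUT yes2 ;
RUN ;
```

After the step, yes1 holds subjects 102 and 104 (one per distinct score) and yes2 holds 103 and 101, which wait for the next pass of the loop. This is why the data sets carry a numeric suffix.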

Merge matches, end loop

DATA match&i ;
MERGE yes&i (in= ina) no&i (in= inb) ;
BY &prob ;
IF ina AND inb ;
run ;
%END ;
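The merge keeps a row only when the same score appears on both sides. A small sketch with made-up data (all IDs and scores are hypothetical, and prob stands in for &prob) shows the behaviour:

```sas
DATA yes1 ;   /* one intervention subject per score, already sorted */
   INPUT prob id_y ;
   DATALINES ;
0.3 102
0.5 104
0.7 105
;
RUN ;

DATA no1 ;    /* one control subject per score, already sorted */
   INPUT prob id_n ;
   DATALINES ;
0.3 201
0.6 202
0.7 203
;
RUN ;

DATA match1 ;
   MERGE yes1 (IN = ina) no1 (IN = inb) ;
   BY prob ;
   /* subsetting IF: keep the row only if both data sets contributed */
   IF ina AND inb ;
RUN ;
```

Here match1 ends up with two rows, the pairs 102/201 and 105/203. Subjects 104 and 202 have no partner at this score and are dropped from this pass. Because each side holds at most one subject per score, every merged row is a one-to-one pair.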

/* Adds all matches into a single data set */

DATA allmatches ;
SET %DO k = 1 %TO 20 ; match&k %END ; ; /* the extra semicolon ends the SET statement */

Concatenate all data sets with matches (N=20).
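The %DO loop inside the SET statement only generates text: a list of data set names. A reduced sketch with three toy data sets makes that visible in the log. The macro name stack and the data values are hypothetical.

```sas
DATA match1 ; id_y = 102 ; id_n = 201 ; RUN ;
DATA match2 ; id_y = 103 ; id_n = 204 ; RUN ;
DATA match3 ; id_y = 101 ; id_n = 207 ; RUN ;

OPTIONS MPRINT ;   /* write the generated SAS code to the log */

%MACRO stack(n) ;
   DATA allmatches ;
      /* the loop emits: match1 match2 ... match&n            */
      /* first semicolon ends %END, second one ends SET itself */
      SET %DO k = 1 %TO &n ; match&k %END ; ;
   RUN ;
%MEND stack ;

%stack(3)
```

With MPRINT on, the log shows the statement SAS actually runs, SET match1 match2 match3, and allmatches receives the three rows stacked in that order.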

Create two data sets with IDs

DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend)) ;
SET allmatches ;

Create one file of all matched IDs

DATA matchfile ;
SET allyes allno ;

And sort it …

proc sort data = matchfile ;
by &id &depend ;

proc sort data = in&digits ; by &id &depend ;

DATA MATCHES&DIGITS IN&NEXTIN ;
MERGE IN&DIGITS (IN = INA) MATCHFILE (IN= INB) ;
BY &ID &DEPEND ;
/* Creates a data set of all subjects with n-digit match */
IF INA AND INB THEN OUTPUT MATCHES&DIGITS ;
/* Creates a second data set of subjects with no match */
ELSE OUTPUT IN&NEXTIN ;

TITLE "MATCHES &ROUNDTO " ;
PROC FREQ DATA = MATCHES&DIGITS ;
TABLES &DEPEND ;
RUN ;
%END ;

Just a good habit to check as the loop runs through.
End loop. Now match to 4 decimal places, etc.

/* Adds 1- to 5-digit matches into a single data set */

data &lib..finalset ;
set %do m = 1 %to 5 ; matches&m %end ; ; /* the extra semicolon ends the SET statement */

One final check & done!

Title "Distribution of Dependent Variable in &lib..finalset " ;
proc freq data = &lib..finalset ;
tables &depend ;
run ;
%mend propen ;
run ;

Did it work?

Columns: Variable | Quintiles (At Home, Not Home, Prob) | Nearest Neighbor (At Home, Not Home, Prob)
Age:
ER visits: 4.5 ****, 3.8 ****
Female: 52%, 54%, .36, 50%, .74
Race:

** P < .01    **** P < .0001
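A comparison of this kind could be run on the matched file along the following lines. This is only one possible check, sketched on simulated data, and not necessarily how the figures on the slide were produced: work.finalset stands in for the macro's finalset, and the covariates age and female and all values are hypothetical.

```sas
/* Simulated stand-in for the matched data set */
DATA work.finalset ;
   CALL STREAMINIT(42) ;
   DO id = 1 TO 100 ;
      athome = MOD(id, 2) ;                       /* 50 per group  */
      age    = 70 + FLOOR(20 * RAND('UNIFORM')) ; /* made-up ages  */
      female = (RAND('UNIFORM') < 0.5) ;
      OUTPUT ;
   END ;
RUN ;

/* Continuous covariate: compare group means */
PROC TTEST DATA = work.finalset ;
   CLASS athome ;
   VAR age ;
RUN ;

/* Categorical covariate: compare group proportions */
PROC FREQ DATA = work.finalset ;
   TABLES female * athome / CHISQ ;
RUN ;
```

If the matching worked, covariates that differed significantly before matching should show smaller, non-significant differences between the at-home and not-at-home groups afterwards.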

Model Comparison

Test | Without Matching | Quintile Matching | Nearest Neighbor
Likelihood Ratio
Score
Wald

Odds ratio

No Match | Quintiles | Nearest Neighbor
: 1 | 3.6 : 1 | 3.7 : 1

How near?

Decimals | # Matches