Nearest Neighbor Matching Using the Greedy Match Macro
Note: Much of this code was originally written by Lori Parsons. It was written with simplicity as the primary concern; if you do not have a large number of controls, you may want to modify it.
/* Define the library for formats */
LIBNAME saslib "G:\oldpeople\sasdata\" ;
OPTIONS NOFMTERR FMTSEARCH = (saslib) ;
/* Define the library for study data */
LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;
Include the macro:
%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearest macro.sas' ;
%propen (libname, dsname, idvariable, dependent, propensity)
LIBNAME = library (directory) for the data sets
DSNAME = data set with the study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable, coded 1 for cases and 0 for controls
PROPENSITY = propensity score produced by the logistic regression
For example:
%propen(study,allpropen,id,athome,prob);
Remember, we already have the study.allpropen data set with the propensity score (prob) from the PROC LOGISTIC we just ran.
Explaining the macro A Challenge
%macro propen(lib,dsn,id,depend,prob);
Data in5 ;
set &lib..&dsn ;
Creates a temporary data set, in5, holding a copy of the study data.
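Propensity scores are rounded to 5, then 4, 3, 2 and 1 decimal places
%Do countr = 1 %to 5 ;
%let digits = %eval(6 - &countr) ;
%let roundto = %eval(10**&digits) ;
%let roundto = %sysevalf(1/&roundto) ;
%let nextin = %eval(&digits - 1) ;
MACRO NOTES
%Do countr = 1 %to 5 ; /* Starts %DO loop */
Use the %EVAL function to do integer arithmetic:
%let digits = %eval(6 - &countr) ;
Use the %SYSEVALF function to do non-integer (floating-point) arithmetic, such as 1/&roundto. &roundto first holds 10**&digits and is then replaced by its reciprocal (0.00001 on the first pass); &nextin names the data set that will hold subjects left unmatched on this pass.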
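The loop header can be traced outside SAS. The sketch below is a hypothetical Python re-expression (not part of the macro) that lists the values &digits, &roundto and &nextin take on each of the five passes, assuming the same 6 - countr formula.

```python
def rounding_schedule():
    """Trace the macro's %DO countr = 1 %TO 5 header (hypothetical helper).

    Returns one (digits, roundto, nextin) tuple per pass:
    digits  -- decimal places used by ROUND() on this pass
    roundto -- the rounding unit, 1/10**digits
    nextin  -- suffix of the data set that receives unmatched subjects
    """
    schedule = []
    for countr in range(1, 6):
        digits = 6 - countr           # %eval(6 - &countr): integer arithmetic
        roundto = 1 / 10 ** digits    # %sysevalf(1/&roundto): floating point
        nextin = digits - 1           # %eval(&digits - 1)
        schedule.append((digits, roundto, nextin))
    return schedule


if __name__ == "__main__":
    for digits, roundto, nextin in rounding_schedule():
        print(f"round to {digits} decimals (unit {roundto:g}); unmatched go to in{nextin}")
```

On the last pass &nextin is 0, so in0 ends up holding the subjects who never found a match at any precision.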
/* Output control to one data set, intervention to another */ /* Create random number to sort within group */
Create 2 data sets
DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum) ;
SET in&digits ;
We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places. We keep only four variables.
Assignment statements
randnum = RANUNI(0) ;
&prob = ROUND(&prob,&roundto) ;
Create a random number and round the propensity score to the current number of decimal places.
Output to the case data set …
IF &depend = 1 THEN DO ;
   id_y = &id ;
   depend_y = &depend ;
   OUTPUT yes1 ;
END ;
We need to rename the dependent and ID variables, or they will be overwritten when the case and control data sets are merged by propensity score.
… or output to the control data set
ELSE IF &depend = 0 THEN DO ;
   id_n = &id ;
   depend_n = &depend ;
   OUTPUT no1 ;
END ;
Notice the data sets were named yes1 and no1. It becomes evident why shortly.
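To make the DATA step concrete, here is a hypothetical Python sketch of the same split: each record gets a uniform random number (standing in for RANUNI(0)), its score is rounded, and it goes to the case list or the control list depending on the dependent variable. The field names 'id', 'depend' and 'prob' are assumptions, and Python's round() can differ from SAS ROUND() for values that fall exactly on a rounding boundary.

```python
import random


def split_cases_controls(records, digits, rng):
    """Sketch of DATA yes1 no1: round scores and split cases from controls.

    records -- dicts with hypothetical keys 'id', 'depend' (1/0) and 'prob'
    digits  -- decimal places to round the propensity score to
    rng     -- random.Random instance standing in for RANUNI(0)
    """
    yes, no = [], []
    for rec in records:
        randnum = rng.random()               # random sort key for ties
        prob = round(rec["prob"], digits)    # like ROUND(&prob, &roundto)
        if rec["depend"] == 1:
            # Renamed fields so case values survive the later merge
            yes.append({"prob": prob, "id_y": rec["id"],
                        "depend_y": rec["depend"], "randnum": randnum})
        elif rec["depend"] == 0:
            no.append({"prob": prob, "id_n": rec["id"],
                       "depend_n": rec["depend"], "randnum": randnum})
        # Records with any other value of depend go to neither list
    return yes, no


if __name__ == "__main__":
    data = [{"id": 1, "depend": 1, "prob": 0.123456},
            {"id": 2, "depend": 0, "prob": 0.123449}]
    cases, controls = split_cases_controls(data, 5, random.Random(42))
    print(cases, controls)
```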
/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */
%Do i = 1 %to 20 ;
%let j = %eval(&i + 1) ;
proc sort data = yes&i ; by &prob randnum ;
data yes&i yes&j ;
   set yes&i ;
   by &prob ;
   if first.&prob then output yes&i ;
   else output yes&j ;
NOTE: Matching without replacement. The first case at each propensity score stays in yes&i; the rest move to yes&j for the next pass.
Same thing for controls
proc sort data = no&i ; by &prob randnum ;
data no&i no&j ;
   set no&i ;
   by &prob ;
   if first.&prob then output no&i ;
   else output no&j ;
Sorting by randnum within each propensity score ensures that, among subjects with tied scores, the one kept on each pass is chosen at random.
Merge matches, end loop
DATA match&i ;
   MERGE yes&i (in = ina) no&i (in = inb) ;
   BY &prob ;
   IF ina AND inb ;
run ;
%END ;
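The 20-pass loop can be sketched in Python as follows. This is a hypothetical re-expression, assuming rows shaped like the yes/no records (keys 'prob' and 'randnum'); each pass takes at most one case and one control per score value and pairs them when both exist.

```python
from itertools import groupby


def first_per_score(pool):
    """One PROC SORT + 'if first.&prob' pass.

    Sorts by (prob, randnum) and splits the pool into the first record for
    each score value and everything else (which waits for the next pass).
    """
    pool = sorted(pool, key=lambda r: (r["prob"], r["randnum"]))
    firsts, rest = [], []
    for _, group in groupby(pool, key=lambda r: r["prob"]):
        group = list(group)
        firsts.append(group[0])
        rest.extend(group[1:])
    return firsts, rest


def match_identical_scores(yes, no, max_passes=20):
    """Pair cases and controls that share an identical (rounded) score.

    A score shared by several subjects yields up to max_passes pairs. As in
    the macro, a record left unpaired in pass i is not offered again in later
    passes at this precision.
    """
    matches = []
    for _ in range(max_passes):
        yes_i, yes = first_per_score(yes)
        no_i, no = first_per_score(no)
        controls_by_score = {r["prob"]: r for r in no_i}
        for case in yes_i:
            control = controls_by_score.get(case["prob"])
            if control is not None:          # IF ina AND inb
                matches.append((case, control))
    return matches
```

Sorting on the random number before taking the first record is what makes the choice among tied subjects random rather than dependent on input order.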
/* Adds all matches into a single data set */
DATA allmatches ;
SET %DO k = 1 %TO 20 ; match&k %END ; ;
Concatenate all data sets with matches (N = 20). The second semicolon after %END ends the SET statement; the first one belongs to %END itself.
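Create two data sets with IDs
DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend)) ;
SET allmatches ;
With no OUTPUT statement, every matched pair is written to both data sets: allyes restores the original variable names for the case, allno for the control.
Create one file of all matched IDs
DATA matchfile ;
SET allyes allno ;
And sort it …
proc sort data = matchfile ; by &id &depend ;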
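The rename-and-stack step turns each matched pair (one row) into two rows, one per subject. A hypothetical Python sketch of that reshaping, with (id, depend) tuples standing in for the matchfile records and assumed key names id_y, depend_y, id_n and depend_n:

```python
def stack_matched_ids(allmatches):
    """Sketch of allyes/allno + DATA matchfile + PROC SORT BY &id &depend.

    allmatches -- merged rows with hypothetical keys id_y, depend_y (case)
                  and id_n, depend_n (control)
    Returns one (id, depend) tuple per matched subject, sorted.
    """
    allyes = [(row["id_y"], row["depend_y"]) for row in allmatches]
    allno = [(row["id_n"], row["depend_n"]) for row in allmatches]
    return sorted(allyes + allno)
```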
proc sort data = in&digits ; by &id &depend ;
DATA MATCHES&DIGITS IN&NEXTIN ;
MERGE IN&DIGITS (IN = INA) MATCHFILE (IN = INB) ;
BY &ID &DEPEND ;
IF INA AND INB THEN OUTPUT MATCHES&DIGITS ; /* Creates a data set of all subjects with an n-digit match */
ELSE OUTPUT IN&NEXTIN ; /* Creates a second data set of subjects with no match */
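The merge above splits the current pool into matched and unmatched subjects. Because it merges BY both the ID and the dependent variable, a case and a control that happen to share an ID value are still kept apart. A hypothetical Python sketch, assuming records with keys 'id' and 'depend':

```python
def split_matched(in_records, matchfile):
    """Sketch of the MERGE BY &id &depend that separates matched subjects.

    in_records -- dicts with hypothetical keys 'id' and 'depend' (in&digits)
    matchfile  -- iterable of (id, depend) tuples for matched subjects
    Returns (matches, leftover); leftover plays the role of in&nextin.
    """
    matched_keys = set(matchfile)
    matches, leftover = [], []
    for rec in in_records:
        if (rec["id"], rec["depend"]) in matched_keys:   # IF INA AND INB
            matches.append(rec)
        else:                                             # ELSE OUTPUT IN&NEXTIN
            leftover.append(rec)
    return matches, leftover
```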
TITLE "MATCHES &ROUNDTO " ;
PROC FREQ DATA = MATCHES&DIGITS ;
TABLES &DEPEND ;
RUN ;
%END ;
It is just a good habit to check the results each time through the loop. %END closes the loop; the next pass matches the remaining subjects to 4 decimal places, and so on.
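Putting the pieces together, the following self-contained Python sketch re-expresses the macro's overall greedy logic under stated assumptions: it matches cases and controls on scores rounded to 5 decimals, removes the matched subjects, and retries the rest at 4, 3, 2 and 1 decimals. Record fields ('id', 'depend', 'prob') are hypothetical, random.Random stands in for RANUNI, and Python's round() may not agree with SAS ROUND() at exact rounding boundaries, so results need not reproduce the macro's output.

```python
import random
from itertools import groupby


def _first_per_score(pool):
    """Split pool into the first record per score (random among ties) and the rest."""
    pool = sorted(pool, key=lambda r: (r["prob"], r["randnum"]))
    firsts, rest = [], []
    for _, group in groupby(pool, key=lambda r: r["prob"]):
        group = list(group)
        firsts.append(group[0])
        rest.extend(group[1:])
    return firsts, rest


def greedy_match(records, seed=0, max_passes=20):
    """Greedy 1:1 matching without replacement, from 5 down to 1 decimal place.

    records -- dicts with hypothetical keys 'id', 'depend' (1 = case,
               0 = control) and 'prob' (propensity score)
    Returns (matched, remaining): matched holds (digits, case id, control id)
    tuples; remaining plays the role of in0, the never-matched subjects.
    """
    rng = random.Random(seed)              # stands in for RANUNI(0)
    remaining = list(records)              # in5, then in4, ..., in0
    matched = []
    for digits in range(5, 0, -1):
        yes, no = [], []
        for rec in remaining:
            row = {"id": rec["id"], "randnum": rng.random(),
                   "prob": round(rec["prob"], digits)}
            if rec["depend"] == 1:
                yes.append(row)
            elif rec["depend"] == 0:
                no.append(row)
        used = set()                       # (id, depend) keys, like matchfile
        for _ in range(max_passes):
            yes_i, yes = _first_per_score(yes)
            no_i, no = _first_per_score(no)
            controls = {r["prob"]: r for r in no_i}
            for case in yes_i:
                control = controls.get(case["prob"])
                if control is not None:    # IF ina AND inb
                    matched.append((digits, case["id"], control["id"]))
                    used.update({(case["id"], 1), (control["id"], 0)})
        # Unmatched subjects carry over to the next, coarser precision
        remaining = [r for r in remaining
                     if (r["id"], r["depend"]) not in used]
    return matched, remaining
```

Because each pair contributes one case and one control, the PROC FREQ check on each matches data set should show equal counts for the two values of the dependent variable.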
/* Adds 1- to 5-digit matches into a single data set */
data &lib..finalset ;
set %do m = 1 %to 5 ; matches&m %end ; ;
One final check, and we're done!
Title "Distribution of Dependent Variable in &lib..finalset " ;
proc freq data = &lib..finalset ;
tables &depend ;
run;
%mend propen;
run ;
Did it work?
Variable | Quintiles: At Home, Not Home, Prob | Nearest Neighbor: At Home, Not Home, Prob
Age
ER visits: 4.5 **** 3.8 ****
Female: 52% 54% .36 50% .74
Race: **
** p < .01   **** p < .0001
Model Comparison
Test | Without Matching | Quintile Matching | Nearest Neighbor
Likelihood Ratio
Score
Wald
Odds ratio
No Match | Quintiles | Nearest Neighbor
: 13.6: 13.7 : 1
How near?
Decimals | # Matches