Published by Dulcie Simmons. Modified over 9 years ago.
1
Nearest neighbor matching
USING THE GREEDY MATCH MACRO
Note: Much of the code was originally written by Lori Parsons: http://www2.sas.com/proceedings/sugi26/p214-26.pdf
This code has been written with simplicity as a primary concern. If you do not have a large number of controls, you may want to modify it.
2
/* Define the library for formats */
LIBNAME saslib "G:\oldpeople\sasdata\" ;
OPTIONS NOFMTERR FMTSEARCH = (saslib) ;
3
/* Define the library for study data */
LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;
4
Include the Macro
%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearest macro.sas' ;
5
%propen (libname, dsname, idvariable, dependent, propensity)
LIBNAME = libref (library) for the data sets
DSNAME = data set with study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable
PROPENSITY = propensity score produced in logistic regression
6
FOR EXAMPLE
%propen(study,allpropen,id,athome,prob);
Remember, we already have the study.allpropen data set with the propensity score (prob) from the PROC LOGISTIC we just did.
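For readers who skipped the logistic regression step, here is a minimal sketch of how a data set like allpropen could be produced. Everything in it is an assumption for illustration: the simulated data, the predictors age and female, and the WORK library. Only the variable names id, athome and prob come from the macro call above.

```sas
/* Hypothetical toy data: 200 subjects with a binary outcome */
data work.demo ;
  call streaminit(2024) ;
  do id = 1 to 200 ;
    age    = 65 + floor(30 * rand('uniform')) ;
    female = rand('bernoulli', 0.5) ;
    athome = rand('bernoulli', 0.3 + 0.2 * female) ;
    output ;
  end ;
run ;

/* DESCENDING models the probability that athome = 1 */
/* PROB= names the variable that holds the predicted probability */
proc logistic data = work.demo descending ;
  model athome = age female ;
  output out = work.allpropen prob = prob ;
run ;
```

With a data set like this in place, the call would be %propen(work,allpropen,id,athome,prob).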
7
Explaining the macro
A Challenge
8
%macro propen(lib,dsn,id,depend,prob);
Data in5 ;
  set &lib..&dsn ;
Creates a temporary data set
9
Propensity scores rounded to 5, then 4, 3, 2 and 1 decimals
%Do countr = 1 %to 5 ;
  %let digits = %eval(6 - &countr) ;
  %let roundto = %eval(10**&digits) ;
  %let roundto = %sysevalf(1/&roundto) ;
  %let nextin = %eval(&digits - 1) ;
10
MACRO NOTES
%Do countr = 1 %to 5 ; /* Starts %DO loop */
Use %EVAL function to do integer arithmetic
%let digits = %eval(6 - &countr) ;
Use %SYSEVALF function to do non-integer arithmetic
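To see what the loop arithmetic produces, the same %LET statements can be wrapped in a small stand-alone macro that writes each value to the log. The macro name show_rounding is made up for this sketch; the statements inside are the ones from the slides.

```sas
%macro show_rounding ;
  %do countr = 1 %to 5 ;
    %let digits  = %eval(6 - &countr) ;      /* 5, 4, 3, 2, 1 */
    %let roundto = %eval(10**&digits) ;      /* integer power of ten */
    %let roundto = %sysevalf(1/&roundto) ;   /* rounding unit, non-integer */
    %let nextin  = %eval(&digits - 1) ;      /* suffix of the next input data set */
    %put countr=&countr digits=&digits roundto=&roundto nextin=&nextin ;
  %end ;
%mend show_rounding ;

%show_rounding
```

Each pass should show digits dropping by one and the rounding unit growing by a factor of ten, which is why unmatched subjects get a looser match on every pass.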
11
/* Output control to one data set, intervention to another */
/* Create random number to sort within group */
12
Create 2 data sets
DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum) ;
  SET in&digits ;
We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places.
We only keep four variables.
13
Assignment statements
  randnum = RANUNI(0) ;
  &prob = ROUND(&prob,&roundto) ;
Create a random number and round the propensity score to a set number of digits.
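A quick way to see why rounding creates matches: take two scores that are close but not equal and round both to coarser and coarser units. The two values below are made up for illustration.

```sas
data _null_ ;
  a = 0.61237 ;   /* hypothetical case score */
  b = 0.61241 ;   /* hypothetical control score */
  do roundto = 0.00001, 0.0001, 0.001 ;
    ra = round(a, roundto) ;
    rb = round(b, roundto) ;
    same = (ra = rb) ;   /* 1 if the rounded scores agree */
    put roundto= ra= rb= same= ;
  end ;
run ;
```

These two scores differ at five decimals but should agree once rounded to four, so this pair would be picked up on the second pass of the loop.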
14
Output to Case Data set …
  IF &depend = 1 THEN DO ;
    id_y = &id ;
    depend_y = &depend ;
    OUTPUT yes1 ;
  END ;
We need to rename the dependent & id variables or they’ll get overwritten.
15
… Or output control data set
  ELSE IF &depend = 0 THEN DO ;
    id_n = &id ;
    depend_n = &depend ;
    OUTPUT no1 ;
  END ;
Notice the data sets were named no1 and yes1. It becomes evident why shortly.
16
/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */
17
%Do i = 1 %to 20 ;
  %let j = %eval(&i +1) ;
  proc sort data = yes&i ;
    by &prob randnum ;
  data yes&i yes&j ;
    set yes&i ;
    by &prob ;
    if first.&prob then output yes&i ;
    else output yes&j ;
NOTE: Matching without replacement
18
Same thing for controls
  proc sort data = no&i ;
    by &prob randnum ;
  data no&i no&j ;
    set no&i ;
    by &prob ;
    if first.&prob then output no&i ;
    else output no&j ;
The randnum ensures that subjects with matching scores are pulled at random.
19
Merge matches, end loop
  DATA match&i ;
    MERGE yes&i (in = ina) no&i (in = inb) ;
    BY &prob ;
    IF ina AND inb ;
  run ;
%END ;
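One pass of this inner loop can be tried in open code on toy data. The IDs and scores below are hypothetical and already rounded; the data set names follow the slides, with the macro variables written out for i = 1 and the score variable called prob.

```sas
/* Hypothetical cases: two share a score of 0.3 */
data yes1 ;
  input id_y prob ;
  randnum = ranuni(0) ;
  datalines ;
1 0.3
2 0.3
3 0.5
;

/* Hypothetical controls: two share a score of 0.5 */
data no1 ;
  input id_n prob ;
  randnum = ranuni(0) ;
  datalines ;
11 0.3
12 0.5
13 0.5
14 0.7
;

/* Keep one randomly ordered case per score, push the rest to yes2 */
proc sort data = yes1 ; by prob randnum ; run ;
data yes1 yes2 ;
  set yes1 ;
  by prob ;
  if first.prob then output yes1 ;
  else output yes2 ;
run ;

/* Same for controls */
proc sort data = no1 ; by prob randnum ; run ;
data no1 no2 ;
  set no1 ;
  by prob ;
  if first.prob then output no1 ;
  else output no2 ;
run ;

/* One-to-one merge on the score keeps only scores present in both */
data match1 ;
  merge yes1 (in = ina) no1 (in = inb) ;
  by prob ;
  if ina and inb ;
run ;

proc print data = match1 ; run ;
```

Here match1 should hold one pair at 0.3 and one at 0.5. The leftover case (in yes2) and leftover control (in no2) have different scores, so a second pass would find nothing more. Because each score appears at most once in yes1 and no1 after the split, a subject can never be matched twice.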
20
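/* Adds all matches into a single data set */
DATA allmatches ;
  SET %DO k = 1 %TO 20 ; match&k %END ; ;
Concatenate all data sets with matches (N=20).
The first semicolon after %END closes the %END statement; the second one ends the SET statement.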
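The %DO loop inside the SET statement only generates text. A small helper macro (list_matches is a made-up name for this sketch) shows the list of data set names it produces.

```sas
%macro list_matches(n) ;
  %do k = 1 %to &n ; match&k %end ;
%mend list_matches ;

/* Writes the generated data set list to the log */
%put SET statement would read: %list_matches(3) ;
```

Note that the semicolon after %END belongs to the macro %END statement and never reaches the DATA step, so a SET statement built this way still needs its own closing semicolon.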
21
Create two data sets with IDs
DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend)) ;
  SET allmatches ;
22
Create one file of all matched IDs
DATA matchfile ;
  SET allyes allno ;
And sort it …
proc sort data = matchfile ;
  by &id &depend ;
23
proc sort data = in&digits ; by &id &depend ;
24
DATA MATCHES&DIGITS IN&NEXTIN ;
  MERGE IN&DIGITS (IN = INA) MATCHFILE (IN = INB) ;
  BY &ID &DEPEND ;
  IF INA AND INB THEN OUTPUT MATCHES&DIGITS ;
  ELSE OUTPUT IN&NEXTIN ;
/* Creates a data set of all subjects with n-digit match */
/* Creates a second data set of subjects with no match */
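The split into matched and still-unmatched subjects can be checked on a toy example. The four subjects below are hypothetical; the data set names are the ones the macro would use on the first pass (digits = 5, nextin = 4).

```sas
/* Hypothetical input: all subjects entering the 5-digit pass */
data in5 ;
  input id athome ;
  datalines ;
1 1
2 0
3 1
4 0
;

/* Hypothetical match file: subjects 1 and 2 were paired */
data matchfile ;
  input id athome ;
  datalines ;
1 1
2 0
;

proc sort data = in5 ; by id athome ; run ;
proc sort data = matchfile ; by id athome ; run ;

/* Matched subjects go to matches5, everyone else moves on to in4 */
data matches5 in4 ;
  merge in5 (in = ina) matchfile (in = inb) ;
  by id athome ;
  if ina and inb then output matches5 ;
  else output in4 ;
run ;
```

Subjects 1 and 2 should land in matches5, while 3 and 4 go to in4 and get another chance at four decimals.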
25
TITLE "MATCHES &ROUNDTO " ;
PROC FREQ DATA = MATCHES&DIGITS ;
  TABLES &DEPEND ;
RUN ;
%END ;
JUST A GOOD HABIT TO CHECK AS THE LOOP RUNS THROUGH
End loop. Now match to 4 decimal places, etc.
26
/* Adds 1- to 5-digit matches into a single data set */
data &lib..finalset ;
  set %do m = 1 %to 5 ; matches&m %end ; ;
27
One final check & done !
Title "Distribution of Dependent Variable in &lib..finalset " ;
proc freq data = &lib..finalset ;
  tables &depend ;
run ;
%mend propen ;
run ;
28
Did it work?
                 QUINTILES                       NEAREST NEIGHBOR
Variable     At Home    Not Home    Prob     At Home / Not Home    Prob
Age          79.2       79.3        .60      79.1                  .76
ER visits    4.5 ****   3.8 ****    .0001    4.2                   .88
Female       52%        54%         .36      50%                   .74
Race                                .97                            .67
** P < .01   **** P < .0001
29
Model Comparison
TEST                Without Matching    Quintile Matching    Nearest Neighbor
Likelihood Ratio    643.1               180.8                186.6
Score               582.4               176.0                181.4
Wald                485.6               165.7                170.4
30
Odds ratio
No Match    Quintiles    Nearest Neighbor
.154        .281         .269
6.5 : 1     3.6 : 1      3.7 : 1
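The "x : 1" figures are the reciprocals of the odds ratios in the row above them, which a short DATA step can confirm. The variable names are made up for this sketch.

```sas
data _null_ ;
  do odds_ratio = 0.154, 0.281, 0.269 ;
    inverse = round(1 / odds_ratio, 0.1) ;   /* e.g. 1 / .154 is about 6.5 */
    put odds_ratio= inverse= ;
  end ;
run ;
```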
31
How near?
Decimals    # Matches
5           902
4           14
3           143
2           101
1           38