SAS Macro: Some Tips for Debugging Stat St. Paul’s Hospital April 2, 2007
When is a SAS Macro usually written?
- the same analyses are repeated for a number of variables
- the same logic search is to be performed on a number of variables
- reports are to be generated on a regular basis
- passing value(s) from one data step to another
- …
How to find the error(s)?
- Log file
- Results obtained differ from those expected
- Via "MPRINT" (i.e. options mprint;)
- Any other ways?
Some extra SAS system options:
- SYMBOLGEN: shows the value that each Macro variable resolves to
- MLOGIC: keeps track of the parameter values, the logic that drives %DO loops, and %IF logic checks
- MFILE: used together with MPRINT, this option writes the resolved macro code (proper SAS code) to a file
Example 1
    data stattalk;
      input id:$2. age wt ht;
      cards;
    ;
    run;
This macro computes the mean of each continuous variable using PROC MEANS:
    * varlist: List of continuous variables for computation
    * nvar: total number of variables for computation
    %macro getave(varlist,nvar);
      %do i=1 %to &nvar;
        %let var=%scan(&varlist,&i,' ');
        proc means data=stattalk noprint;
          var &var;
          output out=&var.out mean=mean;
        run;
      %end;
    %mend getave;
Log output from SYMBOLGEN
SAS Code:
    options symbolgen;
    %getave(age wt ht,3);
SAS Log File:
    :
    :
    SYMBOLGEN: Macro variable NVAR resolves to 3
    SYMBOLGEN: Macro variable VARLIST resolves to age wt ht
    SYMBOLGEN: Macro variable I resolves to 1
    SYMBOLGEN: Macro variable VAR resolves to age
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.AGEOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.07 seconds
          cpu time 0.07 seconds
    SYMBOLGEN: Macro variable VARLIST resolves to age wt ht
    SYMBOLGEN: Macro variable I resolves to 2
    SYMBOLGEN: Macro variable VAR resolves to wt
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.WTOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.07 seconds
          cpu time 0.06 seconds
    SYMBOLGEN: Macro variable VARLIST resolves to age wt ht
    SYMBOLGEN: Macro variable I resolves to 3
    SYMBOLGEN: Macro variable VAR resolves to ht
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.HTOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.06 seconds
          cpu time 0.05 seconds
Log output from MLOGIC
SAS Code:
    options mlogic;
    %getave(age wt ht,3);
SAS Log File:
    MLOGIC(GETAVE): Beginning execution.
    MLOGIC(GETAVE): Parameter VARLIST has value age wt ht
    MLOGIC(GETAVE): Parameter NVAR has value 3
    MLOGIC(GETAVE): %DO loop beginning; index variable I; start value is 1; stop value is 3; by value is 1.
    MLOGIC(GETAVE): %LET (variable name is VAR)
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.AGEOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.06 seconds
          cpu time 0.06 seconds
    MLOGIC(GETAVE): %DO loop index variable I is now 2; loop will iterate again.
    MLOGIC(GETAVE): %LET (variable name is VAR)
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.WTOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.06 seconds
          cpu time 0.06 seconds
    MLOGIC(GETAVE): %DO loop index variable I is now 3; loop will iterate again.
    MLOGIC(GETAVE): %LET (variable name is VAR)
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.HTOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.06 seconds
          cpu time 0.05 seconds
    MLOGIC(GETAVE): %DO loop index variable I is now 4; loop will not iterate again.
    MLOGIC(GETAVE): Ending execution.
Log output from MPRINT
SAS Code:
    options mprint;
    %getave(age wt ht,3);
SAS Log File:
    MPRINT(GETAVE): proc means data=stattalk noprint;
    MPRINT(GETAVE): var age;
    MPRINT(GETAVE): output out=ageout mean=mean;
    MPRINT(GETAVE): run;
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.AGEOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.06 seconds
          cpu time 0.06 seconds
    MPRINT(GETAVE): proc means data=stattalk noprint;
    MPRINT(GETAVE): var wt;
    MPRINT(GETAVE): output out=wtout mean=mean;
    MPRINT(GETAVE): run;
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.WTOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.06 seconds
          cpu time 0.05 seconds
    MPRINT(GETAVE): proc means data=stattalk noprint;
    MPRINT(GETAVE): var ht;
    MPRINT(GETAVE): output out=htout mean=mean;
    MPRINT(GETAVE): run;
    NOTE: There were 3 observations read from the data set WORK.STATTALK.
    NOTE: The data set WORK.HTOUT has 1 observations and 3 variables.
    NOTE: PROCEDURE MEANS used (Total process time):
          real time 0.06 seconds
          cpu time 0.06 seconds
What does MFILE look like?
SAS Code:
    filename mprint "~/mfileoutput.sas";
    options mprint mfile;
    %getave(age wt ht,3);
SAS Log:
    … same log as in MPRINT…
But you will find a file "mfileoutput.sas" in your (main) directory! And it looks like this:
    proc means data=stattalk noprint;
    var age;
    output out=ageout mean=mean;
    run;
    proc means data=stattalk noprint;
    var wt;
    output out=wtout mean=mean;
    run;
    proc means data=stattalk noprint;
    var ht;
    output out=htout mean=mean;
    run;
Example 2 – MLOGIC output for %IF logic check
SAS Code:
    %macro sillyeg(catvar);
      %if &catvar=1 %then %do;
        %put This is a categorical variable;
      %end;
      %else %if &catvar=0 %then %do;
        %put This is not a categorical variable;
      %end;
    %mend sillyeg;
    %sillyeg(0);
SAS Log:
    46 %sillyeg(0);
    MLOGIC(SILLYEG): Beginning execution.
    MLOGIC(SILLYEG): Parameter CATVAR has value 0
    MLOGIC(SILLYEG): %IF condition &catvar=1 is FALSE
    MLOGIC(SILLYEG): %IF condition &catvar=0 is TRUE
    MLOGIC(SILLYEG): %PUT This is not a categorical variable
    This is not a categorical variable
    MLOGIC(SILLYEG): Ending execution.
    NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA
    NOTE: The SAS System used:
          real time 0.35 seconds
          cpu time 0.26 seconds
Example 3 – combination of different options can be helpful!
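One possible combination, sketched here with a small hypothetical macro %demo (it is not one of the examples in this talk): MPRINT shows the generated SAS code, MLOGIC traces the %DO loop, and SYMBOLGEN shows what &i resolves to on each pass. The options are switched off again afterwards so that later log output stays readable.

```sas
/* %demo is a hypothetical macro used only for illustration */
%macro demo(n);
    %local i;
    %do i=1 %to &n;
        data _null_;
            x=&i;
            put x=;
        run;
    %end;
%mend demo;

/* Turn on all three macro debugging options at once */
options mprint mlogic symbolgen;

%demo(2)

/* Turn them off again once debugging is finished */
options nomprint nomlogic nosymbolgen;
```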
Common Mistakes
- IF or %IF; DO or %DO
- MISSING vs. NULL VALUE
- SCAN, %SCAN, %QSCAN
- SUBSTR, %SUBSTR, %QSUBSTR
- %STR, %NRSTR, %BQUOTE, %NRBQUOTE
- Doing math in MACRO environment
- Range comparison
IF or %IF; DO or %DO
- %IF (and %DO) can only be used within a MACRO definition, to control what code is written or how the logic is evaluated within the MACRO.
- IF (and DO) statements can be used in a MACRO, but they are executed as part of the DATA step code that the MACRO generates.
Example 4
SAS code:
    %macro whatif(condition=gt 50);
      data subset;
        set stattalk;
        %if age &condition %then output;;
      run;
      proc print data=subset;
      run;
    %mend whatif;
    %whatif;
Dataset:
    data stattalk;
      input id:$2. age wt ht;
      cards;
    ;
    run;
SAS output:
    Obs id age wt ht
Why did we get such incorrect output??
- Macro code is ALWAYS executed before the DATA step is even compiled
- AGE in the %IF is not seen as a DATA step variable, but rather as the letters a-g-e
- Digits sort before letters, so in a character comparison the text "age" is greater than "50". The %IF condition is therefore always true, and an unconditional OUTPUT statement is generated.
So, an example where both IF and %IF are used in a MACRO….
SAS code:
    %macro ifagain(condition=gt 30, print=1);
      data subset;
        set stattalk;
        if age &condition then output;
      run;
      proc means data=subset %if &print^=1 %then noprint;;
        var age;
        output out=subset_out mean=mean std=sd;
      run;
      %if &print>=1 %then %do;
        proc print data=subset_out;
        run;
      %end;
    %mend ifagain;
    %ifagain;
Missing vs. NULL
- In the DATA step, there is no such thing as a truly NULL value. A character or numeric variable has a "value" for missing: a single blank space or a period, respectively.
  E.g.) if sex=' ' then delete;
        if age=. then delete;
- In the MACRO language, there are no characters used to represent a missing value. So when a MACRO variable is NULL, it truly has no value.
  E.g.) %if &age=. %then %do; – WRONG!!
        %if &gender=" " %then %do; – WRONG!!
3 ways to specify NULL in the logic check:
Method 1:
    %macro sillycheck(age=);
      %if &age= %then %do; %put It works; %end;
      %else %do; %put It did not work; %end;
    %mend sillycheck;
Method 2:
    %macro sillycheck(age=);
      %if "&age"="" %then %do; %put It works; %end;
      %else %do; %put It did not work; %end;
    %mend sillycheck;
Method 3:
    %macro sillycheck(age=);
      %if &age=%str() %then %do; %put It works; %end;
      %else %do; %put It did not work; %end;
    %mend sillycheck;
A side remark: In MACRO language, everything is TEXT!
SAS code:
    %macro sillyeg(age=50,sex=F);
      %if &age=50 %then %do; %put Patient is 50 years old; %end;
      %if &sex=F %then %do; %put Female patient; %end;
    %mend sillyeg;
    %sillyeg;
SAS LOG:
    SYMBOLGEN: Macro variable AGE resolves to 50
    Patient is 50 years old
    SYMBOLGEN: Macro variable SEX resolves to F
    Female patient
    NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA
    NOTE: The SAS System used:
          real time 0.52 seconds
          cpu time 0.41 seconds
SAS code:
    %macro sillyeg(age=50,sex=F);
      %if &age=50 %then %do; %put Patient is 50 years old; %end;
      %if &sex="F" %then %do; %put Female patient; %end;
    %mend sillyeg;
    %sillyeg;
SAS LOG:
    SYMBOLGEN: Macro variable AGE resolves to 50
    Patient is 50 years old
    SYMBOLGEN: Macro variable SEX resolves to F
    NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA
    NOTE: The SAS System used:
          real time 0.52 seconds
          cpu time 0.41 seconds
SCAN, %SCAN, %QSCAN
In DATA step:
    data example;
      string='XYZ,A*BC&HOS';
      word1=scan(string,2);
      word2=scan(string,2,',');
    run;
SAS output:
    Obs  string        word1  word2
    1    XYZ,A*BC&HOS  A      A*BC&HOS
In MACRO:
    %let hos=SPH;
    %let string=%nrstr(XYZ,A*BC&HOS);
    %let word1=%scan(&string,2);
    %let word2=%scan(&string,2,%str(,));
    %let word3=%qscan(&string,2,%str(,));
    %put word1=&word1;
    %put word2=&word2;
    %put word3=&word3;
SAS Log:
    word1=A
    word2=A*BCSPH
    word3=A*BC&HOS
- %scan DOES NOT mask & (and %) as regular text, so &HOS in the result is resolved
- %qscan masks & (and %) as regular text
SUBSTR, %SUBSTR, %QSUBSTR
SAS Code:
    %let stuff = clinics;
    %let string=%nrstr(*&stuff*&dsn*&morestuff );
    %let word1=%substr(&string,2,7);
    %let word2=%qsubstr(&string,2,7);
    %put word1=&word1;
    %put word2=&word2;
SAS Log:
    word1=clinics*
    word2=&stuff*
- Syntax for %SUBSTR and %QSUBSTR is exactly the same as for SUBSTR in the data step
- The difference between %SUBSTR and %QSUBSTR:
  %SUBSTR does not mask & (and %) as part of the text
  %QSUBSTR treats & (and %) as part of the text
Macro Quoting Functions
- Macro language is a character-based language, and it uses some special characters (e.g. % & ;) and mnemonics (e.g. GE AND LE OR) as part of its own syntax
- Macro quoting functions tell the macro processor to interpret special characters/mnemonics simply as text
- The special characters/mnemonics that might require masking are:
  blank ; ^ ~ , ' " ) ( + - * / = | AND OR NOT EQ NE LE LT GE GT IN % & #
- The most commonly used macro quoting functions are: %STR, %NRSTR, %BQUOTE, %NRBQUOTE, %SUPERQ
- Two types of macro quoting functions:
  a) Compilation functions: the processor masks the special characters as text in open code or while compiling a macro. E.g. %STR, %NRSTR
  b) Execution functions: the processor first resolves a macro expression and then masks the special characters in the result as text. E.g. %QUOTE, %NRQUOTE, %BQUOTE, %NRBQUOTE
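A minimal sketch of the two compilation functions in open code. The macro variable names step and company are made up for illustration. Without %STR, the first semicolon would end the %LET statement early; without %NRSTR, the macro processor would try to resolve &T as a macro variable.

```sas
/* %STR masks the semicolons so the whole step is stored as one value */
%let step=%str(proc print data=sashelp.class; run;);
%put step=&step;

/* %NRSTR also masks & and %, so &T is not treated as a macro variable reference */
%let company=%nrstr(AT&T);
%put company=&company;
```

The two %PUT statements should write the stored text back to the log unchanged, semicolons and ampersand included.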
Example 5
    %macro fileit(infile);
      %if %bquote(&infile) NE %then %do;
        %let char1 = %bquote(%substr(&infile,1,1));
        %if %bquote(&char1) = %str(%') or %bquote(&char1) = %str(%")
          %then %let command=FILE &infile;
        %else %let command=FILE "&infile";
      %end;
      %put &command;
    %mend fileit;
    %fileit('stattalk.sas')
- %bquote is used to quote the resolved value of a macro variable or expression
- %str is used to quote a constant value (i.e. the right side of the logic check)
- An unmatched single or double quotation mark, or an unmatched parenthesis, must always be preceded by % in %str, but there is no need to add % in %bquote (B=by itself)
Example 5 (continued)
    data test;
      store="Susan's Office Supplies";
      call symput('s',store);
    run;
    %macro readit;
      %if %bquote(&s) ne %then %put *** valid ***;
      %else %put *** null value ***;
    %mend readit;
    %readit;
- If you change %BQUOTE to %STR, you will get an error message! Try it…
Example 6
SAS Code:
    options ps=36 ls=69 nocenter;
    data _null_;
      call symput('authors','Smith&Jones');
      call symput('macroname','%macro test;');
    run;
    %let aa=SPH;
    %let jones=%nrstr(&aa);
    title1 "Authors1: %SUPERQ(authors)";
    title2 "Authors2: %NRSTR(&authors)";
    title3 "Authors3: %NRBQUOTE(&authors)";
    title4 "Authors4: %UNQUOTE(%NRBQUOTE(&authors))";
    footnote1 "Name of Macro: %SUPERQ(macroname)";
SAS Output:
    Authors1: Smith&Jones
    Authors2: &authors
    Authors3: Smith&aa
    Authors4: SmithSPH
    Name of Macro: %macro test;
- %NRSTR: masks & as part of the text during compilation
- %NRBQUOTE: resolves the macro variable during execution; if the result contains &, it is treated as part of the text
- NR = Not Resolved
Doing Math in the Macro Language
- %EVAL and %SYSEVALF allow the language to handle arithmetic operations
- %EVAL: only for integer arithmetic
- %SYSEVALF: for non-integer arithmetic (e.g. 1.0, .3, 2.)
- Error message if %SYSEVALF should be used instead of %EVAL:
Example 7
    %let x=5;
    %let y=&x+1;
    %let z=%EVAL(&x+1);
    %let w=%SYSEVALF(&x+1.8);
    %put &x &y &z &w;
The %PUT writes the following to the LOG:
    5 5+1 6 6.8
Without %EVAL, &y is simply the text 5+1; no arithmetic is performed.
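The same integer-only rule applies to the implicit %EVAL behind every %IF condition. The following is a sketch using a hypothetical macro %cmp (not from the original slides): because 9.5 is not an integer, the plain %IF compares the two values as text, while %SYSEVALF compares them as numbers. The last lines show integer division under %EVAL next to %SYSEVALF with and without its optional CEIL conversion type.

```sas
/* %cmp is a hypothetical macro used only for illustration */
%macro cmp(x, limit);
    /* Implicit %EVAL: 9.5 is not an integer, so the values are compared as text */
    %if &x > &limit %then %put implicit EVAL: &x is greater than &limit;
    %else %put implicit EVAL: &x is NOT greater than &limit;

    /* %SYSEVALF compares the values as floating-point numbers */
    %if %sysevalf(&x > &limit) %then %put SYSEVALF: &x is greater than &limit;
    %else %put SYSEVALF: &x is NOT greater than &limit;
%mend cmp;

%cmp(10,9.5)

%let a=%eval(7/2);           /* integer division, the fraction is dropped */
%let b=%sysevalf(7/2);       /* floating-point division */
%let c=%sysevalf(7/2, ceil); /* optional conversion type rounds up */
%put a=&a b=&b c=&c;
```

As text, "10" sorts before "9.5" because the character 1 comes before 9, so the implicit comparison should report that 10 is NOT greater than 9.5, while the %SYSEVALF version reports that it is.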
Range Comparisons
SAS Code:
    data _null_;
      do val=-10,-2,2,10;
        if -5 le val le 0 then do;
          put val " is in the negative range (-5 to 0)";
        end;
        else if 1 le val le 5 then do;
          put val " is in the positive range (1 to 5)";
        end;
        else put val " is WAY out of range";
      end;
    run;
SAS Log:
    -10 is WAY out of range
    -2 is in the negative range (-5 to 0)
    2 is in the positive range (1 to 5)
    10 is WAY out of range
    NOTE: DATA statement used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds
SAS Code:
    %macro checkit(val=);
      %if -5 le &val le 0 %then %put &val is in the negative range (-5 to 0);
      %else %if 1 le &val le 5 %then %put &val is in the positive range (1 to 5);
      %else %put &val is WAY out of range;
    %mend checkit;
    %checkit(val=-10);
    %checkit(val=-2);
    %checkit(val=2);
    %checkit(val=10);
SAS Log:
    182 %checkit(val=-10);
    -10 is in the negative range (-5 to 0)
    183 %checkit(val=-2);
    -2 is in the positive range (1 to 5)
    184 %checkit(val=2);
    2 is in the positive range (1 to 5)
    185 %checkit(val=10);
    10 is in the positive range (1 to 5)
    ????
In DATA step:
    if -5 le val le 0 then do;
is interpreted as
    if -5 le val and val le 0 then do;
In Macro Language:
    %if -5 le &val le 0 %then %put &val is in the negative range (-5 to 0);
is interpreted as
    %if (-5 le &val) le 0 %then %put &val is in the negative range (-5 to 0);
So, if &val=-10, the %if becomes
    %if (-5 le -10) le 0 %then …
The comparison first checks whether -5 is less than or equal to -10. Since this is FALSE, a zero is returned, and the expression becomes
    %if 0 le 0 %then …;
This comparison is true, and hence "-10 is in the negative range (-5 to 0)" is printed in the LOG file.
In summary, for range comparison in Macro Language, always use a compound expression (e.g. -5 le &val AND &val le 0)
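The compound expression can be sketched as a corrected version of the macro. The name %checkit2 and the added parentheses are choices made here for illustration only; the parentheses are optional and simply make the grouping explicit.

```sas
/* %checkit2: range check written as a compound expression (hypothetical name) */
%macro checkit2(val=);
    %if (-5 le &val) and (&val le 0) %then
        %put &val is in the negative range (-5 to 0);
    %else %if (1 le &val) and (&val le 5) %then
        %put &val is in the positive range (1 to 5);
    %else %put &val is WAY out of range;
%mend checkit2;

%checkit2(val=-10)
%checkit2(val=-2)
%checkit2(val=2)
%checkit2(val=10)
```

With integer inputs such as these, the four calls should now classify the values the same way as the DATA step version: -10 and 10 out of range, -2 negative, 2 positive.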