YET ANOTHER TIPS, TRICKS, TRAPS, TECHNIQUES PRESENTATION:
A Random Selection of What I Learned From 15+ Years of SAS Programming

John Pirnat
Kaiser Permanente
Oct 15, 2009
ARE THESE EQUIVALENT?

#1
    data allschools1;
      set school1 school2;
      keep id lname fname;
    run;

#2
    data allschools2 (keep = id lname fname);
      set school1 school2;
    run;

#3
    data allschools3;
      set school1 (keep = id lname fname)
          school2 (keep = id lname fname);
    run;
EXAMPLE OF WHY HOW YOU WRITE THE CODE MIGHT MATTER:

    data school1;
      input id 8. region $1. lname $25. fname $10.
            pass_math $1. pass_eng $1. pass_span $1.;
      datalines;
    Mouse Minnie YYN
    Duck Donald NNN
    Hall Annie YYY
    Sawyer Tom NYN
    ;
    run;

    data school2;
      input id 8. region 8. lname $25. fname $10.
            pass_math $1. pass_eng $1. pass_span $1.;
      datalines;
    Lane Lois NYN
    Mouse Mickey YYN
    Kent Clark NYY
    Bunker Edith YYY
    ;
    run;
CODE #1

    99   data allschools1;
    100  set school1 school2;
    ERROR: Variable region has been defined as both character and numeric.
    101  keep id lname fname;
    102  run;

    NOTE: The SAS System stopped processing this step because of errors.
    WARNING: The data set WORK.ALLSCHOOLS1 may be incomplete. When this step was stopped there were 0 observations and 3 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.01 seconds
          cpu time 0.00 seconds
CODE #2

    105  data allschools2
    106  (keep = id lname fname);
    107  set school1 school2;
    ERROR: Variable region has been defined as both character and numeric.
    108  run;

    NOTE: The SAS System stopped processing this step because of errors.
    WARNING: The data set WORK.ALLSCHOOLS2 may be incomplete. When this step was stopped there were 0 observations and 3 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds
CODE #3

    110  data allschools3;
    111  set school1 (keep = id lname fname)
    112  school2 (keep = id lname fname);
    113  run;

    NOTE: There were 4 observations read from the data set WORK.SCHOOL1.
    NOTE: There were 4 observations read from the data set WORK.SCHOOL2.
    NOTE: The data set WORK.ALLSCHOOLS3 has 8 observations and 3 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.01 seconds
          cpu time 0.00 seconds
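IN CODE #3, ONLY THE COLUMNS NAMED IN THE KEEP= DATA SET OPTIONS ON THE SET STATEMENT ARE LOADED INTO THE PROGRAM DATA VECTOR (PDV), SO THE CONFLICTING REGION COLUMN IS NEVER READ.

IN CODE #1 AND #2, ALL COLUMNS OF THE TWO SCHOOL DATASETS ARE READ INTO THE PDV, AND ONLY THE COLUMNS NAMED IN THE KEEP STATEMENT OR KEEP= OPTION ARE WRITTEN TO THE OUTPUT DATASET. THE TYPE CONFLICT ON REGION THEREFORE STOPS THE STEP BEFORE ANY OUTPUT IS WRITTEN.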
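The same idea works in the other direction: rather than listing the columns to keep, the conflicting column can be dropped as each dataset is read. The following is a minimal sketch, not part of the original slides. The sample rows and the dataset name allschools4 are made up for illustration, and list input is used so the step runs without fixed column positions.

```sas
/* Hypothetical sample data: region is character in school1, numeric in school2 */
data school1;
  input id region $ lname $ fname $;
  datalines;
101 A Mouse Minnie
102 B Duck Donald
;
run;

data school2;
  input id region lname $ fname $;
  datalines;
201 1 Lane Lois
202 2 Kent Clark
;
run;

/* DROP= on the input datasets keeps region out of the PDV,
   so the character/numeric conflict never arises */
data allschools4;
  set school1 (drop = region)
      school2 (drop = region);
run;

proc print data = allschools4 noobs;
run;
```

DROP= is convenient when only one or two columns clash and the rest should pass through; KEEP= is the safer choice when the full column list of the inputs is not known in advance.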
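DON’T “FREQ” OUT

    proc freq data = school1;
      tables pass_math pass_eng pass_span / out = grade_results;
    run;

    proc print data = grade_results;
    run;

    The SAS System

    Obs   pass_span   COUNT   PERCENT
     1        N          3       75
     2        Y          1       25

THE OUT= DATASET CONTAINS ONLY THE LAST TABLE REQUESTED (PASS_SPAN); THE PASS_MATH AND PASS_ENG COUNTS ARE NOT SAVED.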
USE ODS INSTEAD

    proc freq data = school1;
      tables pass_math pass_eng pass_span;
      ods output OneWayFreqs = grade_results;
    run;

    proc print data = grade_results;
      var pass_eng pass_math pass_span frequency percent;
    run;

IN THE ODS OUTPUT STATEMENT, ONEWAYFREQS IS THE ODS TABLE NAME AND GRADE_RESULTS IS THE SAS DATASET NAME.
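One way to find the ODS table name for a procedure is ODS TRACE, which writes the name of each output object to the log. The sketch below is an illustration rather than part of the original slides; the four sample rows are a hypothetical stand-in for school1, entered with list input.

```sas
/* Hypothetical stand-in for school1 */
data school1;
  input lname $ pass_math $ pass_eng $ pass_span $;
  datalines;
Mouse Y Y N
Duck N N N
Hall Y Y Y
Sawyer N Y N
;
run;

/* ODS TRACE lists each output object (Name, Label, Path) in the log */
ods trace on;

proc freq data = school1;
  tables pass_math pass_eng pass_span;
  /* One row per level of each variable, all three tables stacked */
  ods output OneWayFreqs = grade_results;
run;

ods trace off;

proc print data = grade_results;
run;
```

Turning the trace off afterward keeps later steps from cluttering the log.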
YOU CAN BE TOO CLEVER IN YOUR CODE

    data school1;
      input schoolid 8. lname $25. fname $10.
            pass_math $1. pass_eng $1. pass_span $1.;
      datalines;
    Mouse Minnie YYN
    Duck Donald NNN
    Hall Annie YYY
    Sawyer Tom NYN
    ;
    run;

    data school2;
      input id 8. lname $25. fname $10.
            pass_math $1. pass_eng $1. pass_span $1.;
      datalines;
    Lane Lois NYN
    Mouse Mickey YYN
    Kent Clark NYY
    Bunker Edith YYY
    ;
    run;
    data allschools;
      set school1 school2;
      if id = . then id = schoolid;
    run;

    proc print data = allschools noobs;
      var lname fname id schoolid;
    run;

    lname     fname     id     schoolid
    Mouse     Minnie
    Duck      Donald
    Hall      Annie
    Sawyer    Tom
    Lane      Lois
    Mouse     Mickey
    Kent      Clark
    Bunker    Edith

HUH?
“When variables are read with a SET, MERGE, or UPDATE statement, the SAS System sets the value to missing only before the first iteration of the DATA step… Thereafter, the variables retain their values until new values become available…”
- SAS® Language Reference, Version 6, First Edition
ONE WORKAROUND:

    data allschools;
      set school1 (rename = (schoolid = id))
          school2;
    run;

    proc print data = allschools;
      var lname fname id;
    run;

    Obs   lname     fname     id
     1    Mouse     Minnie
     2    Duck      Donald
     3    Hall      Annie
     4    Sawyer    Tom
     5    Lane      Lois
     6    Mouse     Mickey
     7    Kent      Clark
     8    Bunker    Edith     42643
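Another way around the retained value, sketched here as an alternative to RENAME= and not taken from the original slides, is to test where the row came from instead of testing whether id is missing. The IN= variable name from1 and the sample ID values are hypothetical.

```sas
/* Hypothetical sample data with made-up ID values */
data school1;
  input schoolid lname $ fname $;
  datalines;
11 Mouse Minnie
12 Duck Donald
;
run;

data school2;
  input id lname $ fname $;
  datalines;
21 Lane Lois
22 Kent Clark
;
run;

data allschools;
  /* from1 is 1 for rows read from school1, 0 otherwise */
  set school1 (in = from1)
      school2;
  /* Assign on every school1 row, so a retained id is always overwritten */
  if from1 then id = schoolid;
  drop schoolid;
run;

proc print data = allschools noobs;
  var lname fname id;
run;
```

Because the assignment no longer depends on id being missing, it does not matter what value id carried over from the previous iteration.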
QUICK HITTERS:

1. TS486 – Quick Reference Guide to SAS Functions, Informats, Formats
   Updated at: and_Formats

2. =:

    data school1;
      input id 8. region $1. lname $25. fname $10.
            pass_math $1. pass_eng $1. pass_span $1.;
      datalines;
    Mouse Minnie YYN
    Duck Donald NNN
    Hall Annie YYY
    Sawyer Tom NYN
    ;
    run;
    data test1;
      set school1;
      if lname =: 'M';
    run;

    proc print data = test1 noobs;
      var lname;
    run;

    lname
    Mouse

    data test2;
      set school1;
      if lname >=: 'M';
    run;

    proc print data = test2 noobs;
      var lname;
    run;

    lname
    Mouse
    Sawyer
    data test3;
      set school1;
      if lname <=: 'M';
    run;

    proc print data = test3 noobs;
      var lname;
    run;

    lname
    Mouse
    Duck
    Hall
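The colon modifier truncates the longer operand to the length of the shorter one before comparing, so =: 'M' behaves like a test on the first character. The sketch below, which is an illustration and not from the original slides, writes the same selection two ways for comparison; the dataset names test_colon and test_substr and the cut-down school1 are hypothetical.

```sas
/* Hypothetical cut-down version of school1 */
data school1;
  input lname $;
  datalines;
Mouse
Duck
Hall
Sawyer
;
run;

data test_colon test_substr;
  set school1;
  /* Colon modifier: compare only as many characters as the shorter operand has */
  if lname =: 'M' then output test_colon;
  /* Equivalent explicit test on the first character */
  if substr(lname, 1, 1) = 'M' then output test_substr;
run;

proc print data = test_colon noobs;
run;

proc print data = test_substr noobs;
run;
```

The colon form saves typing and adjusts automatically when the literal is longer than one character, for example lname =: 'Mo'.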
OFFBEAT MACRO USES

    %macro skip;
      Lotsa comments /* */ in code
    %mend;

Use when debugging a program and you don’t want to run a heavily commented section of code. The macro is defined but never called, so the code inside it never runs.

    %let abc = Big chunk of code that you will repeat often ;
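Both tricks can be sketched in a few lines. This example is illustrative only: it uses the SASHELP.CLASS sample dataset that ships with SAS, and the macro variable names keepvars and teens are hypothetical. The repeated text here deliberately contains no semicolons; text with semicolons would need macro quoting such as %str().

```sas
/* Trick 1: wrap a commented block in a macro that is never called. */
%macro skip;
  /* This step is compiled into the macro but never executed */
  proc print data = sashelp.class;
    /* inner comment: no nesting problem, unlike one big comment block */
  run;
%mend skip;

/* Trick 2: store text that is repeated often in a macro variable. */
%let keepvars = name age height;
%let teens    = age >= 13;

data class_small;
  set sashelp.class (keep = &keepvars);
  where &teens;
run;

proc print data = class_small noobs;
  var &keepvars;
run;
```

To re-enable the skipped block, delete the %macro and %mend lines or call %skip.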
AND THEN THERE IS PROC IMPORT

How many of you receive spreadsheets like this?

    SOUTHLAND SCHOOL

    ID      LNAME       FNAME      PASS_MATH   PASS_ENG   PASS_SPAN
    38793   NEWMAN                 N           N          N
    5763    GEKKO       GORDON     Y           Y          Y
            SPAULDING   GEOFFREY   Y           Y          N
            O'HARA      SCARLETT   N           N          Y
    43256   MARPLE      JANE       Y           N          Y
            CHARLES     NORA       N           Y          Y
            MASON       PERRY      Y           Y          Y
    proc import
      datafile = 'P:\My Documents\CO DAY 2009\SOUTHLAND SCHOOL.xls'
      out = school3
      dbms = excel replace;
    run;

    proc print data = school3;
    run;
CORRECTION ATTEMPT 1: INSERT THE MIXED STATEMENT

    proc import
      datafile = 'P:\My Documents\CO DAY 2009\SOUTHLAND SCHOOL.xls'
      out = school3
      dbms = excel replace;
      mixed = yes;
    run;

    proc print data = school3;
    run;
CORRECTION ATTEMPT 2

    proc import
      datafile = 'P:\My Documents\CO DAY 2009\SOUTHLAND SCHOOL NEW.xls'
      out = school3
      dbms = excel replace;
      mixed = yes;
    run;

    proc print data = school3;
    run;

TO GET MIXED RESULTS BEYOND THE FIRST 8 OBSERVATIONS YOU NEED TO ADJUST YOUR WINDOWS REGISTRY. I WOULDN’T TRY IT EVEN IF I COULD.
THIS COULD HAPPEN TO YOU IF YOU ARE NOT CAREFUL: ANOTHER PROC IMPORT EXAMPLE

    proc import
      datafile = 'P:\My Documents\CO DAY 2009\ACCEPTED_PROCEDURES.xls'
      out = procedures
      dbms = excel replace;
    run;

    data patients;
      input id 8. lname $25. fname $10. procedure 8.;
      datalines;
    Mouse Minnie
    Duck Donald
    Hall Annie
    Sawyer Tom
    ;
    run;

    proc sql;
      select id, lname, procedure
        from patients
        where procedure in (select procedure from procedures);
    quit;
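    id   lname   procedure
         Mouse
         Duck
         Hall    .

WHY DID THIS HAPPEN?

THE SPREADSHEET “ACCEPTED PROCEDURES” CONTAINED BLANK CELLS IN THE “PROCEDURE” COLUMN. PROC IMPORT READ THEM AS MISSING VALUES, SO A PATIENT WITH A MISSING PROCEDURE MATCHED THE SUBQUERY.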
WORKAROUNDS:

1. Use DDE cf. and search on “DDE”
2. Save the EXCEL spreadsheet in CSV or tab-delimited format and input it into SAS through your code
3. Put criteria in your code to ensure you input what you really desire
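Workaround 3 can be as small as one extra WHERE clause. The sketch below is an illustration with hypothetical data: the procedures dataset stands in for the imported spreadsheet with one blank cell, and the procedure codes and IDs are made up.

```sas
/* Hypothetical stand-in for the imported spreadsheet; "." is the blank cell */
data procedures;
  input procedure;
  datalines;
100
.
200
;
run;

/* Hypothetical patients; Hall has no procedure recorded */
data patients;
  input id lname $ procedure;
  datalines;
1 Mouse 100
2 Duck 200
3 Hall .
4 Sawyer 300
;
run;

proc sql;
  select id, lname, procedure
    from patients
    where procedure in (select procedure
                          from procedures
                          /* exclude blank cells from the accepted list */
                          where procedure is not missing);
quit;
```

With the filter in place only Mouse and Duck are selected; without it, Hall's missing procedure would match the blank cell, because SAS treats a missing value as equal to another missing value.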
QUICK REPORT TIPS FOR THE HARD-TO-SATISFY CLIENT

FOR ASSEMBLING SUMMARY DATA FROM VARIOUS DATASETS INTO A SPECIFIED LAYOUT

SCHOOL EXAMPLE: THE DISTRICT SCHOOL SUPERINTENDENT WANTS COUNTS AND PERCENTAGES FOR PASSING MATH IN A SPREADSHEET, IN A LAYOUT ONLY HE/SHE WOULD COME UP WITH

    %macro pass(num);
      /* SAVE MACRO VARIABLES OUTSIDE MACRO */
      %global tot&num pass&num;

      /* CREATING MACRO VARIABLE IN PROC SQL */
      proc sql;
        select count(*) into :tot&num
          from school&num;
      quit;

      proc sql;
        select count(*) into :pass&num
          from school&num
          where pass_math = 'Y';
      quit;
    %mend;
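For a single school, both counts can also be pulled in one query. This is a minimal sketch under stated assumptions, not the presenter's method: the four sample rows are hypothetical, NOPRINT suppresses the query listing, and the %let reassignment is a common idiom for stripping the leading blanks that INTO : leaves in the value.

```sas
/* Hypothetical stand-in for school1 */
data school1;
  input lname $ pass_math $;
  datalines;
Mouse Y
Duck N
Hall Y
Sawyer N
;
run;

proc sql noprint;
  /* (pass_math = 'Y') evaluates to 1 or 0, so SUM counts the passes */
  select count(*), sum(pass_math = 'Y')
    into :tot1, :pass1
    from school1;
quit;

/* Reassigning with %let removes leading and trailing blanks */
%let tot1  = &tot1;
%let pass1 = &pass1;

%put tot1=&tot1 pass1=&pass1;
```

Run in open code like this, the macro variables are global already; the %global statement matters only when the query sits inside a macro, as in the %pass example.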
    %pass(1);
    %pass(2);
    %pass(3);
    run;

    data _NULL_;
      file 'P:\My Documents\CO DAY 2009\MATH COUNTS.txt';
      pct1 = &pass1 / &tot1;
      pct2 = &pass2 / &tot2;
      pct3 = &pass3 / &tot3;
      retain t '09'x;   /* TAB-DELIMITED */
      if _N_ = 1 then put
        "MATH PASS RESULTS" / /
        "SCHOOL" t "STUDENTS" t "NUMBER PASSED" t "PERCENT PASSED" /
        "SCHOOL1" t "&tot1" t "&pass1" / t t t pct1 /
        "SCHOOL2" t "&tot2" t "&pass2" / t t t pct2 /
        "SOUTHLAND SCHOOL" t "&tot3" t "&pass3" / t t t pct3;
      format pct1 pct2 pct3 percent10.2;
      options missing = 0;
    run;
AFTER BRINGING THE FILE INTO EXCEL AS TAB-DELIMITED AND MAKING SOME MANUAL ADJUSTMENTS TO COLUMN WIDTH, THE CLIENT GETS THE DATA IN THE LAYOUT HE/SHE DESIRES