1 Hints and Tips SAUSAG Q2 2015

2 SORTING – NOUNIQUEKEY

The NOUNIQUEKEY option on PROC SORT, new in 9.3, is an easy way to retain only those records that have a duplicate key. NOUNIQUERECS and UNIQUEOUT= are related options. With UNIQUEOUT= the records that have a unique key are written to a separate dataset instead of being discarded.

    proc sort data=results out=chk_results uniqueout=rest_of_results nouniquekeys;
      by key_1 key_n;
    run;

Prior to 9.3 the usual method was:

    proc sort data=results out=chk_results;
      by key_1 key_n;
    run;

    data chk_results;
      set chk_results;
      by key_1 key_n;
      * Keep only records whose key occurs more than once ;
      if not (first.key_n and last.key_n);
    run;
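A small self-contained sketch of the pre-9.3 method, using a hypothetical RESULTS dataset with a single key variable KEY_1 (key_1 key_n on the slide stands for the full key list). Keys B and C occur more than once while A and D are unique, so five records should survive; NOUNIQUEKEY on PROC SORT is meant to produce the same subset.

```sas
/* Hypothetical sample data: keys B and C are duplicated, A and D are unique */
data results;
  input key_1 $ amount;
  datalines;
A 1
B 2
B 3
C 4
C 5
C 6
D 7
;
run;

proc sort data=results out=chk_results;
  by key_1;
run;

/* A record is unique when it is both the first and the last of its BY group */
data chk_results;
  set chk_results;
  by key_1;
  if not (first.key_1 and last.key_1);
run;
```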

3 SQL macro variables - TRIMMED

By default, when macro variables are created in PROC SQL, leading and trailing blanks are retained. This can be inconvenient, especially for numeric values, and is usually solved by adding %LET mvar = &mvar; afterwards, since %LET removes leading and trailing blanks. In 9.3 the TRIMMED option was added to address this. E.g.:

    data raw_data;
      value = 2;  output;
      value = 15; output;
      value = 1;  output;
    run;

    proc sql noprint;
      select sum(value) into :total_u from raw_data;
      select sum(value) into :total_t TRIMMED from raw_data;
    quit;

    %put Total (untrimmed) : ***&total_u***;
    %put Total (trimmed) : ***&total_t***;
    %let total_u = &total_u;
    %put Total (re-trimmed): ***&total_u***;

Giving:

    Total (untrimmed) : *** 18***
    Total (trimmed) : ***18***
    Total (re-trimmed): ***18***
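A sketch of why the blanks matter in practice: a value used to build a name. It recreates RAW_DATA so it runs on its own; the macro variable N_ROWS and the dataset name are hypothetical. With TRIMMED the name resolves cleanly, whereas an untrimmed numeric value would embed blanks in the middle of the dataset name and cause a syntax error.

```sas
data raw_data;
  value = 2;  output;
  value = 15; output;
  value = 1;  output;
run;

proc sql noprint;
  /* TRIMMED removes leading and trailing blanks from the stored value */
  select count(*) into :n_rows trimmed from raw_data;
quit;

/* Resolves to WORK.RAW_DATA_3ROWS; the dot ends the macro variable reference */
data work.raw_data_&n_rows.rows;
  set raw_data;
run;
```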

4 SQL macro variable range creation

In 9.3 it is possible to specify an open-ended range when creating macro variables in PROC SQL. The old method was to specify a large end value or (more creatively) to use &sysmaxlong as the end of the range. E.g.:

    proc sql noprint;
      /* Old method: an arbitrarily large upper bound */
      select distinct(value) into :o_val1-:o_val999999 from raw_data;
      /* New in 9.3: an open-ended range */
      select distinct(value) into :val1- from raw_data;
    quit;

    %let num_vals = &sqlobs;
    %put Number of distinct values: &num_vals;
    %put &val1, &val2, &val3;

Giving:

    Number of distinct values: 3
    1, 2, 15
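The usual companion to an open-ended range is a %DO loop over &&val&i up to the count saved from &SQLOBS. A sketch, assuming the same RAW_DATA values; the macro LIST_VALUES and the variable ALL_VALS are hypothetical, and ORDER BY is added so the order of the values is guaranteed rather than a side effect of DISTINCT.

```sas
data raw_data;
  value = 2;  output;
  value = 15; output;
  value = 1;  output;
run;

proc sql noprint;
  select distinct value into :val1- from raw_data order by value;
quit;
%let num_vals = &sqlobs;

/* Join &val1 ... &valN into a single blank-separated list */
%macro list_values;
  %local i list;
  %do i = 1 %to &num_vals;
    %let list = &list &&val&i;
  %end;
  &list
%mend list_values;

%let all_vals = %list_values;
%put All values: &all_vals;
```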

5 DOSUB and DOSUBL

DOSUB and DOSUBL were introduced as experimental functions in 9.3 (production in 9.4) and extend the CALL EXECUTE concept. They execute SAS code immediately and then return to the calling data step, whereas CALL EXECUTE stacks the generated code for execution after the data step has completed (although macro calls passed to CALL EXECUTE are executed immediately). DOSUB takes a fileref (as a quoted string) pointing to a file that contains the code to be executed, while DOSUBL takes the code itself as a character string (a literal or any character expression). For example:

    data dosubtst;
      rc1 = dosubl('data tst; a=42; run;');
      rc2 = dosubl('%runcode(parm);');
    run;

Most uses would be to call a macro for convenience, as in the second call. A return code of 0 means the code could be executed; a non-zero value means it could not.
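A minimal sketch of the "run it now, then carry on" behaviour, assuming 9.4 (or a 9.3 release without the macro variable issue mentioned on the next slide). The macro variable N_CLASS is hypothetical; SASHELP.CLASS is the standard sample table. The result is read with SYMGET because a plain &n_class reference would be resolved when the step is compiled, before DOSUBL has run.

```sas
/* Create the macro variable globally first so DOSUBL updates it there */
%let n_class = ;

data _null_;
  /* The PROC SQL step runs to completion before the next statement executes */
  rc = dosubl('proc sql noprint; select count(*) into :n_class trimmed from sashelp.class; quit;');
  /* SYMGET reads the value at execution time, after DOSUBL has set it */
  n = input(symget('n_class'), 32.);
  put rc= n=;
run;
```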

6 DOSUB and DOSUBL (cont)

This opens up the possibility of executing global statements such as LIBNAME, as well as macros and DATA steps, within the submitted code and then accessing the results via macro variables or via the OPEN, FETCH and CLOSE functions. There are plenty of examples of interesting usage on the net, for example:

http://support.sas.com/resources/papers/proceedings12/227-2012.pdf
https://support.sas.com/resources/papers/proceedings13/032-2013.pdf

There is a problem with passing macro variables in releases before 9.3 TS1M2, so check http://support.sas.com/kb/53/059.html for the workaround if applicable.

The next slide shows how it can be used to create recursive code (but this can be dangerous!).
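A sketch of the OPEN/FETCH/CLOSE route: DOSUBL runs a PROC MEANS step that writes a one-row summary dataset, and the same DATA step then reads the value back with the data set functions OPEN, FETCH, VARNUM, GETVARN and CLOSE. The dataset WORK.HT_STATS and the variable MEAN_HT are hypothetical names; the last statement copies the value to a macro variable only so it can be checked afterwards.

```sas
%let mean_ht = ;

data _null_;
  /* Run a complete PROC step in the middle of this DATA step */
  rc = dosubl('proc means data=sashelp.class noprint; var height; output out=work.ht_stats mean=mean_ht; run;');

  /* OPEN runs at execution time, so the new dataset already exists here */
  dsid = open('work.ht_stats');
  if dsid > 0 then do;
    rc = fetch(dsid);                                 /* read the first (only) row */
    mean_ht = getvarn(dsid, varnum(dsid, 'mean_ht')); /* numeric value by column */
    rc = close(dsid);
  end;
  put mean_ht=;
  call symputx('mean_ht', mean_ht);
run;
```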

7 DOSUB and DOSUBL (cont)

Just for later perusal, recursive code to calculate a factorial:

    filename code temp;

    * Write the recursive step to a temporary file ;
    data _null_;
      file code;
      put 'data _null_;';
      put ' x = input(symget("parm"),32.);';
      put ' y = coalesce(input(symget("control"),32.),1);';
      put ' if x > 0 then do;';
      put ' call symputx("control",x*y);';
      put ' call symputx("parm",x-1);';
      put ' rc = dosub("code");';
      put ' end;';
      put ' else';
      put ' call symputx("result",y);';
      put 'run;';
    run;

    * Seed the recursion: PARM is the number, CONTROL the running product ;
    data factorial;
      x = 5;
      call symputx('parm',x);
      call symputx('control',.);
      rc = dosub('code');
      y = input(symget('result'),32.);
    run;
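For comparison, a plain iterative version that can be used to check the recursive result. FACT is the built-in SAS factorial function; the dataset name FACTORIAL_CHECK is illustrative. Both values should be 120 for x = 5, the same as the recursive code produces.

```sas
data factorial_check;
  x = 5;
  y = 1;
  /* Multiply 1 * 2 * ... * x */
  do i = 1 to x;
    y = y * i;
  end;
  /* Built-in factorial function, for comparison */
  y_fact = fact(x);
  drop i;
  put x= y= y_fact=;
run;
```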

8 Tracking task progress in EG

The SYSECHO global statement, which displays a string in the EG task status bar (and window), can be combined with the DOSUBL function to build a neat task progress display. The macro takes the total number of records that will be processed plus how often (in percent) the display is to be updated, and maintains a running display on the status bar. There is an undocumented bug in 9.3 TS1M0 (probably fixed in M2) which abends the task after 32 DOSUBL calls within a data step, so on that release set the update interval to at least 5%; the by_pct=1 in the example assumes a release without the bug.

    %macro display_pct_complete(totalrecs,by_pct=10);
      if int(100*(_n_-1)/(&totalrecs)) ne int(100*_n_/(&totalrecs)) then do;
        if mod(int(100*(_n_-1)/(&totalrecs)),&by_pct) = 0 then do;
          * Report on each complete by_pct ;
          drop _rc _s;
          _rc = dosubl(cat('SYSECHO "Percentage complete: ',
                           put(int(100*(_n_-1)/(&totalrecs)),3.), '%";'));
          _s = sleep(.1,1); * A delay for fast code ;
        end;
      /* No semicolon after this END: the macro call supplies it */
      end
    %mend display_pct_complete;

    data results;
      set large_dataset nobs=totrecs;
      * processing ;
      %display_pct_complete(totrecs,by_pct=1);
    run;
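One way to check a setting against the 32-call limit is to count how often the macro's trigger condition fires, without calling DOSUBL at all. This sketch repeats the macro's two IF conditions for a hypothetical 1,000-record dataset (N stands in for _N_). With by_pct = 5 it should count 20 updates, comfortably under 32; with by_pct = 1 it would count 100.

```sas
data _null_;
  totalrecs = 1000;   /* hypothetical record count */
  by_pct    = 5;      /* update interval in percent */
  n_updates = 0;
  do n = 1 to totalrecs;
    /* Same conditions as in %display_pct_complete */
    old_pct = int(100*(n-1)/totalrecs);
    new_pct = int(100*n/totalrecs);
    if old_pct ne new_pct and mod(old_pct, by_pct) = 0 then n_updates = n_updates + 1;
  end;
  put n_updates=;
  call symputx('n_updates', n_updates);
run;
```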

9 ZIP file processing

New with 9.4 is the FILENAME ZIP access method, which makes processing standard WinZip-style zip files much easier than using the undocumented SASZIPAM filename engine or unnamed pipes. It makes the zip file look and act like a directory, allowing selective read/write access to individual files. It does have a limitation in that it won’t handle other compression types such as bzip2, so pipes still have their place, as long as the data is in a line-feed-delimited text format rather than binary.

This is an example of using a more complicated pipe construct to read a group of related files (ID_DATA_01, ID_DATA_02 etc.) from a zip file containing bzipped members without having to unzip any of them, something the new ZIP engine can’t handle. The data is CSV-like (tilde-delimited):

    /* LATEST_ARCHIVE is assumed to hold the path of the zip file.            */
    /* unzip -p writes the matching members to stdout and bunzip2 decompresses */
    filename archive pipe "unzip -p '&latest_archive' 'ID_DATA_*.csv.bz2' | bunzip2";

    data id_data;
      infile archive dsd dlm='~' termstr=lf missover lrecl=300;
      length id $20 type_code $6;
      input id type_code;
    run;
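A minimal sketch of the "zip file as a directory" idea, assuming SAS 9.4: it writes two small members into a hypothetical DEMO.ZIP in the WORK directory, lists them with the directory functions DOPEN, DNUM and DREAD, and reads one member back by name. The archive, member and dataset names are illustrative only.

```sas
%let zippath = %sysfunc(pathname(work))/demo.zip;

/* Write two members into the archive */
filename zmem1 zip "&zippath" member="a.txt";
data _null_;
  file zmem1;
  put 'first member';
run;

filename zmem2 zip "&zippath" member="b.txt";
data _null_;
  file zmem2;
  put 'second member';
run;

/* List the members as if the archive were a directory */
filename inzip zip "&zippath";
data members;
  length memname $200;
  fid = dopen('inzip');
  if fid > 0 then do;
    do i = 1 to dnum(fid);
      memname = dread(fid, i);
      output;
    end;
    rc = dclose(fid);
  end;
  keep memname;
run;

/* Read a single member back without unzipping the archive */
filename amem zip "&zippath" member="a.txt";
data first_member;
  infile amem truncover;
  input line $100.;
run;
```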

10 ZIP file processing (cont)

The advantage of the FILENAME ZIP access method is that all the standard and, more importantly, the less used filename options are available (and work properly). Probably the most useful is binary streaming, the RECFM=S option.

    filename inzip zip "path/ebcdic_data.zip" member="VB_data";

    * Reads a variable blocked mainframe sourced EBCDIC file with RDW from a ZIP archive ;
    data ebcdic_data;
      infile inzip recfm=s nbyte=_datalen;
      length line $300; * maximum variable line length ;
      * Read the (4 byte) Record Descriptor Word to determine the line length ;
      _datalen = 4;
      input;
      * Reset the amount of data to read next based on the RDW (only the first 2 bytes used) ;
      * and save the line length in the dataset ;
      _datalen = input(_infile_,s370fibu2.)-4;
      data_len = _datalen;
      * Read the exact number of bytes in the variable length line ;
      * LINE still holds the raw EBCDIC bytes, use the $EBCDICw. informat to convert if needed ;
      input;
      line = _infile_;
    run;
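The RDW arithmetic can be checked on a single record held in memory. This sketch uses a made-up VB record: a 4-byte RDW whose first two bytes give a total length of 7 (RDW included), followed by the EBCDIC bytes for "ABC". It also shows the $EBCDICw. informat converting the data bytes to the session encoding, a step the slide's code leaves out. The dataset RDW_DEMO is illustrative.

```sas
data rdw_demo;
  /* Hypothetical VB record: RDW '00070000'x followed by EBCDIC 'ABC' */
  record = '00070000C1C2C3'x;
  /* First 2 bytes of the RDW: record length including the 4-byte RDW */
  rec_len  = input(substr(record, 1, 2), s370fibu2.);
  data_len = rec_len - 4;
  /* Convert the data bytes from EBCDIC to the session encoding */
  line = input(substr(record, 5, data_len), $ebcdic3.);
  put rec_len= data_len= line=;
run;
```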

11 Questions?

