SET statement in DATA step

Based on S. David Riba’s "The SET Statement and Beyond: Uses and Abuses of the SET Statement"
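The examples below read two data sets, temp1 and temp2, that are never created in these notes. A minimal sketch of suitable inputs follows. It assumes, from how the later steps use them, that temp1 holds i and x, that temp2 holds i and y, and that both are sorted by i. The number of observations and the uniform random values are arbitrary choices, not part of the original.

```sas
/* Hypothetical sample data (an assumption, not from the source): */
/* temp1 holds i and x, temp2 holds i and y, both sorted by i.    */
data temp1;
  do i = 1 to 10;
    x = rand('uniform');
    output;
  end;
run;

data temp2;
  do i = 1 to 10;
    y = rand('uniform');
    output;
  end;
run;
```

Giving both data sets the same number of observations keeps the one-to-one reads (two SET statements in one step) from stopping early.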
Simple SET statement

*Simple SET statement;
data temp1b;
  set temp1;
run;

*Concatenate;
data temp1x3;
  set temp1 temp1 temp1;
run;

*Interleave (both data sets must be sorted by i);
data temp12a;
  set temp1 temp2;
  by i;
run;

*Combine (one-to-one read, stops at the end of the shorter data set);
data temp12b;
  set temp1;
  set temp2;
run;

*Attach first observation of temp1 to all observations of temp2;
data temp12c;
  if (_n_ eq 1) then set temp1;
  set temp2;
run;
Data set options inside SET statement

/* Data set options in the SET statement
   DROP     = varlist
   KEEP     = varlist
   FIRSTOBS = num
   IN       = var
   OBS      = num
   RENAME   = varlist
   WHERE    = condition
*/

*Combine a data set with itself to calculate the change in the value of a variable;
data temp3;
  set temp1 (keep = x);
  set temp1 (firstobs = 2 rename = (x = frwx));
  delta = x - frwx;
run;

*The IN = data set option is used with multiple data sets where it is important to know which data set contributed an observation;
data temp4;
  set temp1 (in = in_1) temp2 (in = in_2);
  by i;
  if (in_1) then x2 = x**2;
  else if (in_2) then yexp = exp(y);
run;

*The WHERE = data set option filters observations as each data set is read;
data temp5;
  set temp1 (where = (x > .5)) temp2 (where = (y < .5));
run;
SET statement options

/* SET statement options
   END   = var
   KEY   = index
   NOBS  = var
   POINT = var
*/

*END = option;
*The END = option is used to identify the last observation processed by a SET statement;
data temp6;
  set temp1 end = eof;
  set temp2;
  if (eof) then do;
    lx = x;
    ly = y;
  end;
run;
*KEY = option;
*The KEY = option retrieves observations from an indexed data set based on the index key, which can be either a simple key or a composite key;
data pan1;
  do i = 1 to 30;
    k = i;
    x = rand('unif');
    output;
  end;
run;

data pan2 (index = (k));
  do i = 1 to 20;
    k = i + 10;
    y = rand('unif');
    output;
  end;
run;

data pan3;
  set pan1;
  set pan2 key = k;
  xymax = max(x, y);
run;
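In the KEY = example, pan1 contains k = 1 to 30 while pan2 is indexed only for k = 11 to 30, so the first ten lookups find no match. When a keyed read fails, SAS sets the automatic variable _IORC_ to a nonzero value and sets _ERROR_ to 1. A hedged sketch of one common way to handle this is shown below. The output name pan3b and the decision to set y to missing on a failed lookup are illustrative assumptions, not part of the original.

```sas
data pan3b;                     /* pan3b is a hypothetical name */
  set pan1;
  set pan2 key = k;
  if _iorc_ ne 0 then do;
    /* No matching k in pan2: clear the error flag and discard */
    /* any y value left over from an earlier successful match. */
    _error_ = 0;
    y = .;
  end;
  xymax = max(x, y);
run;
```

Because MAX ignores missing arguments, xymax equals x for the observations that have no match in pan2.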
*NOBS = option;
*The NOBS = option creates a variable that contains the total number of observations in the input data set(s). If multiple data sets are listed in the SET statement, the value of the NOBS = variable is the total number of observations in all the listed data sets;

*Read the data sets only if they contain observations;
data temp7;
  if (0) then set temp1 (drop = _ALL_) temp2 (drop = _ALL_) nobs = totobs;
  if (totobs) then set temp1 temp2;
  else abort;
run;

*Store the number of observations in a macro variable without reading any data;
data _null_;
  call symput('n_obs', put(n_obs, 5.));
  stop;
  set temp1 temp2 nobs = n_obs;
run;
%put &n_obs;
*POINT = option;
*The POINT = option uses a numeric variable for direct (or random) access into a SAS data set. The value of the POINT = variable must be specified before it can be used;

*Use the third observation;
data temp8;
  ptr = 3;
  set temp1 point = ptr;
  if (_error_) then abort;
  output;
  stop;
run;

*Reverse the order of your data;
data temp9;
  do ptr = lastrec to 1 by -1;
    set temp1 point = ptr nobs = lastrec;
    output;
  end;
  stop;
run;
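With POINT =, SAS never detects an end of file, so a step that reads only through POINT = needs an explicit STOP statement to avoid looping forever. The same pattern as the examples above can pick any subset of observations by number. The sketch below keeps every other observation of temp1. The output name temp1_odd and the step size of 2 are illustrative assumptions.

```sas
data temp1_odd;                 /* temp1_odd is a hypothetical name */
  do ptr = 1 to lastrec by 2;   /* observations 1, 3, 5, ... */
    set temp1 point = ptr nobs = lastrec;
    output;
  end;
  stop;                         /* required: POINT = never reaches end of file */
run;
```

The NOBS = variable is assigned when the step is compiled, which is why lastrec can appear in the DO statement before the SET statement executes.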
*Random replicates of a data set (10 draws with replacement);
data john1;
  do i = 1 to 20;
    x = rand('unif');
    output;
  end;
run;

data john2;
  do _i_ = 1 to 10;
    ptr = ceil(totobs * ranuni(totobs));
    set john1 point = ptr nobs = totobs;
    if (_error_) then abort;
    output;
  end;
  stop;
run;
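*Replicates of observations;
data kevin1;
  do i = 1 to 10;
    start = 1;
    stop = i;
    output;
  end;
run;

*kevin2 needs at least 10 observations for the lookups in kevin3;
data kevin2;
  do i = 1 to 10;
    x = rand('unif');
    output;
  end;
run;

data kevin3;
  set kevin1;
  do ptr = start to stop;
    set kevin2 point = ptr;
    if (_error_) then abort;
    output;
  end;
run;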
*Attach the minimum, maximum, sum, etc. of x to every observation of your data set;
data voytek;
  retain minval maxval sumval;
  if (_N_ eq 1) then do until (lastrec);
    set temp1 (keep = x) end = lastrec;
    minval = min(minval, x);
    maxval = max(maxval, x);
    sumval = sum(sumval, x);
  end;
  set temp1;
run;