Combining Lags and Arrays [and a little macro]
Lynn Lethbridge
SHRUG, OCT 28, 2011
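Lag Function
A tool to manipulate data across observations
Can lag back as far as you like (default is a lag of 1)
Can be frustrating!
Don’t put a lag in a conditional statement: LAG only updates its queue when it executes, so a conditional lag returns the value from the last observation where the condition was true, not from the previous observation
Create a variable with the lag value first and use that variable in the conditional statement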
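Example 1a
data work.temp;
   input x sex $;
   lx=lag(x);              /* lag executes on every observation */
   if sex='M' then z=lx;   /* the condition uses the stored lag value */
   drop lx;
   cards;
1 M
1 M
3 F
5 M
5 F
8 F
8 F
8 F
9 F
10 M
;
run;

Obs    x    sex    z
  1    1     M     .
  2    1     M     1
  3    3     F     .
  4    5     M     3
  5    5     F     .
  6    8     F     .
  7    8     F     .
  8    8     F     .
  9    9     F     .
 10   10     M     9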
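Example 1b
data work.temp;
   input x sex $;
   /* lag executes only when sex='M', so it returns x from the previous 'M' row */
   if sex='M' then z=lag(x);
   cards;
1 M
1 M
3 F
5 M
5 F
8 F
8 F
8 F
9 F
10 M
;
run;

Obs    x    sex    z
  1    1     M     .
  2    1     M     1
  3    3     F     .
  4    5     M     1
  5    5     F     .
  6    8     F     .
  7    8     F     .
  8    8     F     .
  9    9     F     .
 10   10     M     5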
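One way to keep the lag unconditional without a helper variable is the IFN function, which evaluates all of its arguments on every observation. The following is a minimal sketch under that assumption; the dataset work.demo and its values are made up for illustration.

```sas
/* Illustrative data only: work.demo is a hypothetical dataset. */
data work.demo;
   input x sex $;
   /* lag(x) is an argument to IFN, so it executes on every observation */
   /* and its queue always holds the previous row's x.                  */
   z = ifn(sex='M', lag(x), .);
   datalines;
1 M
3 F
5 M
8 F
10 M
;
run;

proc print data=work.demo;
run;
/* Expected z by observation: . . 3 . 8 */
```

The helper-variable approach of Example 1a is easier to read; the IFN form is only a compact alternative with the same result.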
Arrays
Allow you to assign a single name to multiple variables
Can use an array to manipulate data across variables without having to write them out multiple times
Can create new variables in arrays
data work.temp;
   input id sex $ a b c;
   array oldvars {*} a b c;
   array newvars {*} var1-var3;   /* creates var1, var2, var3 */
   do i=1 to dim(oldvars);
      if sex='f' then newvars{i}=oldvars{i};
   end;
   drop i;
   cards;
1 m m m f m f f m
;
run;
Obs id sex a b c var1 var2 var3
(var1-var3 hold the values of a, b, c where sex='f' and are missing otherwise)
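Another common use of an array is to apply the same recode to several variables in place. The sketch below assumes a hypothetical dataset work.survey in which 99 is a made-up sentinel for "no answer"; neither the names nor the values come from the example above.

```sas
/* Illustrative data only: work.survey, q1-q3 and the 99 sentinel are hypothetical. */
data work.survey;
   input id q1 q2 q3;
   array q {*} q1-q3;                /* one name for three variables */
   do i = 1 to dim(q);
      if q{i} = 99 then q{i} = .;    /* recode the sentinel to missing */
   end;
   drop i;
   datalines;
1 4 99 2
2 99 99 5
3 1 3 99
;
run;

proc print data=work.survey;
run;
```

Using dim(q) for the loop bound means the loop does not need to change if more variables are added to the array.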
Lags and Arrays
Lags allow you to work across observations
Arrays allow you to work across variables
Combining the two lets you work efficiently with your data as a matrix
Combined Example
Suppose you have multiple observations per ID
Ordered first by ID and then by date
You want to know if a code appears twice in a row for any individual ID across multiple variables
Data
data work.temp;
   input id date a $ b $ c $;
   cards;
a1 b2 g
a4 a1 v
s3 f5 g
a1 a1 g
h6 i8 t
b2 f4 f
d4 f4 s
c1 g4 d
f4 d4 s
d1 s2 w
a1 s2 d
a1 a2 a
f4 f3 f
d1 a1 a1
;
run;
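Lags and Arrays
data work.temp2;
   set work.temp;
   lagid=lag(id);
   array vars {*} a b c;
   array newvars {*} $ na nb nc;
   do i=1 to 3;
      /* A single LAG call inside the loop shares one queue across a, b and c. */
      /* With 3 calls per observation, lag3 returns the same variable's value  */
      /* from the previous observation; plain lag would return the value       */
      /* passed on the previous loop iteration.                                */
      lagvars=lag3(vars{i});
      if id=lagid and lagvars=vars{i} then newvars{i}=vars{i};
   end;
   drop lagid i lagvars;
run;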
Printed Output
Obs  id  date   a   b   c   na  nb  nc
  1      JUN95  a1  b2  g
  2      AUG95  a4  a1  v
  3      SEP97  s3  f5  g
  4      JUN90  a1  a1  g
  5      JUN93  h6  i8  t
  6      JUL96  b2  f4  f
  7      JUL99  d4  f4  s       f4
  8      MAY87  c1  g4  d
  9      JUN90  f4  d4  s
 10      FEB87  d1  s2  w
 11      MAY87  a1  s2  d       s2
 12      NOV96  a1  a2  a   a1
 13      NOV99  f4  f3  f
 14      OCT03  d1  a1  a1
Higher Order Lag
data work.temp;
   input id a;
   la3=lag3(a);   /* value of a from three observations back */
   cards;
;
run;

Obs a la3
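To make the lag orders concrete, here is a minimal sketch that computes first-, second- and third-order lags side by side. The dataset work.demo and its five values are made up for illustration.

```sas
/* Illustrative data only: work.demo is a hypothetical dataset. */
data work.demo;
   input a;
   la1 = lag(a);    /* same as lag1(a): previous observation */
   la2 = lag2(a);   /* two observations back */
   la3 = lag3(a);   /* three observations back */
   datalines;
10
20
30
40
50
;
run;

proc print data=work.demo;
run;
/* Expected values by observation:   */
/*   la1: .  10 20 30 40             */
/*   la2: .  .  10 20 30             */
/*   la3: .  .  .  10 20             */
```

Each LAGn call keeps its own queue of n values, which is why the first n observations are missing.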
Higher Order Lag with a Macro
%let num=3;
data work.temp;
   input id a;
   la&num=lag&num(a);   /* resolves to la3=lag3(a) */
   cards;
;
run;

Obs a la3
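The macro variable becomes useful when the lag sits inside an array loop: a single LAG call in the loop shares one queue, so the lag order has to equal the number of array elements. The sketch below assumes a made-up dataset work.demo with a numeric id and three character codes; the names lagvar, na, nb and nc are illustrative, and num must be kept in step with the variable list by hand.

```sas
%let num=3;   /* number of variables in the array (assumed to match the list below) */

/* Illustrative data only: work.demo is a hypothetical dataset. */
data work.demo;
   input id a $ b $ c $;
   array vars {&num} $ a b c;
   array newvars {&num} $ na nb nc;
   lagid = lag(id);
   do i = 1 to &num;
      /* The loop calls this one LAG &num times per observation, so     */
      /* lag&num returns the same variable's value from the prior row.  */
      lagvar = lag&num(vars{i});
      if id = lagid and lagvar = vars{i} then newvars{i} = vars{i};
   end;
   drop i lagid lagvar;
   datalines;
1 a1 b2 g
1 a1 f5 g
2 a1 f5 t
2 d4 f5 s
;
run;

proc print data=work.demo;
run;
/* Expected: obs 2 has na=a1 and nc=g; obs 4 has nb=f5; obs 3 is blank */
/* because it is the first row of a new id.                            */
```

This only works because the lag executes on every loop iteration of every observation; putting the LAG call inside a condition would put the queue out of step.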
Thank you for your attention!