Published by Quentin Collins. Modified over 9 years ago.
1
“LAG with a WHERE” and other DATA Step Stories Neil Howard A
2
Table of Contents
Chapter 1 “LAG with a WHERE”
Chapter 2 “A DIFferent LAG”
Chapter 3 “To LAG or to LEAD”
Chapter 4 “When RETAIN Doesn’t Retain”
Chapter 5 “Don’t Order My Variables Around”
Chapter 6 “The Case of the Missing Values”
3
Chapter One LAG with a WHERE
4
“LAG with a WHERE”
an interesting fairy tale
must first understand:
- LAG function
- WHERE statement
- implications of conditional execution
5
LAG function
Syntax: LAG<n>(argument)
- n specifies the number of lagged values
- argument is numeric or character
6
LAG function
- LAG functions return values from a queue
- A LAGn function stores a value in a queue and returns a value stored previously in that queue
- Each occurrence of a LAGn function generates its own queue
- n is the length of the queue
7
LAG function
- The LAG function is executable
- The LAG function can be conditionally executed
NOTE: storing and returning values from the queue occurs only when the function is executed
8
SIMPLE LAG
    data new;
      input x @@;
      lag1 = lag1(x);
      lag2 = lag2(x);
    cards;
    1 2 3 4 5 6
    ;
9
    X   LAG1   LAG2
    1     .      .
    2     1      .
    3     2      1
    4     3      2
    5     4      3
    6     5      4
(Note initialization to missing)
10
CONDITIONAL LAG
    data new;
      input a b @@;
      LAGa = LAG(a);
      if b=2 then LAGb = LAG(a);
    cards;
    1 1 2 1 3 2 4 1 5 2 6 1
    ;
11
    A   B   LAGA   LAGB
    1   1     .      .
    2   1     1      .
    3   2     2      .
    4   1     3      .
    5   2     4      3
    6   1     5      .
12
Every other lagged value?
    data new;
      input x @@;
      * conditional;
      if mod(x,2)=0 then condLAG1 = lag(x);
      * unconditional;
      LAGx = lag(x);
      if mod(x,2)=0 then condLAG2 = LAGx;
    cards;
    1 2 3 4 5 6 7 8
    ;
13
    X   LAGx   condLAG1   condLAG2
    1     .        .          .
    2     1        .          1
    3     2        .          .
    4     3        2          3
    5     4        .          .
    6     5        4          5
    7     6        .          .
    8     7        6          7
(condLAG2 is the right answer)
14
WHERE statement
- Selects observations before they’re brought into the LPDV
- Applied after data set options
- Applied before any other DATA step statements are executed, including SET, BY, etc.
- Behaves differently with BY and the FIRST. and LAST. variables
- Only works with SAS data sets (not raw data)
15
Given this data:
    VISIT       WEIGHT
    01JAN2003       88    <- CUTOFF
    02JAN2003       22
    03JAN2003      154
    04JAN2003       21
    05JAN2003      112
16
DIFFERENCE?

WHERE:
    data w;
      set q;
      lagwgt = lag(weight);
      where visit > "01jan2003"d;
    run;

Subsetting IF:
    data w;
      set q;
      lagwgt = lag(weight);
      if visit > "01jan2003"d;
    run;

- WHERE will not pick up the first lagged value
- subsetting IF will…
17
Output from WHERE
    VISIT       WEIGHT   LAGWGT
    02JAN2003       22        .
    03JAN2003      154       22
    04JAN2003       21      154
    05JAN2003      112       21
18
Output from subsetting IF
    VISIT       WEIGHT   LAGWGT
    02JAN2003       22       88
    03JAN2003      154       22
    04JAN2003       21      154
    05JAN2003      112       21
19
Chapter Two A DIFferent LAG
20
“A DIFferent LAG”
DIF function
Syntax: DIF<n>(argument)
- n specifies the number of lags
- argument is numeric
21
DIF function
The DIF function returns the first difference between the argument and its nth lag.
Defined as: DIFn(X) = X - LAGn(X) ;
22
DIF function
- The same rules for storing to and returning from the LAGn queues apply
- The same caveats for conditional execution apply
23
    data new;
      input x @@;
      lagx = lag(x);
      difx = dif(x);
    cards;
    1 2 8 4 3 9 7
    ;
24
    x   lagx   difx
    1     .      .
    2     1      1    (2 - 1 = 1)
    8     2      6
    4     8     -4
    3     4     -1
    9     3      6
    7     9     -2
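The conditional-execution caveat can be seen with DIF in the same way as with LAG. The following is only a sketch: the data set name difdemo and the variables dif_cond, dif_all and dif_even are made up for illustration.

```sas
data difdemo;
  input x @@;
  /* conditional: this DIF queue is updated only when x is even, */
  /* so the result is the difference from the previous EVEN x    */
  if mod(x,2)=0 then dif_cond = dif(x);
  /* unconditional: this queue is updated on every iteration */
  dif_all = dif(x);
  /* pick up the ordinary first difference on the even rows only */
  if mod(x,2)=0 then dif_even = dif_all;
cards;
1 2 3 4 5 6
;

proc print data=difdemo;
run;
```

With this input, dif_cond should be missing for x=2 and 2 for x=4 and x=6, while dif_even should be 1 on each even row.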
25
Chapter Three To LAG or to LEAD
26
Is there a LEAD function?
No LEAD function or negative LAG
Several solutions at:
- www.sconsig.com
Including:
- Sort in descending order (reverse)
- …then use the LAG function
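The sort-and-LAG idea could be sketched roughly as follows. The data set master and the variables seq, var and nextvar are assumptions made for this example; an explicit ordering variable (seq here) is needed so the data can be reversed and then restored.

```sas
/* hypothetical input with an explicit sequence variable */
data master;
  input seq var @@;
cards;
1 10 2 20 3 30 4 40
;

/* reverse the order */
proc sort data=master out=reversed;
  by descending seq;
run;

/* in reversed order, the previous row is the original NEXT row */
data lead;
  set reversed;
  nextvar = lag(var);
run;

/* restore the original order */
proc sort data=lead;
  by seq;
run;
```

The last observation (seq=4) ends up with a missing nextvar, since nothing follows it. The cost of this approach is two extra sorts.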
27
Most elegant solution:
MERGE the data set with itself
Read the data set twice
- Using a 1:1 MERGE
- No BY statement
- Using firstobs=2
28
    data lagged;
      merge master ( keep = var )
            master ( firstobs = 2
                     rename = (var = nextvar) );
      **** no BY statement ;
    run;
29
Results of merge
    var   nextvar
     1       2
     2       3
     3       4
     4       5
     5       6
     6       .
30
Chapter Four When RETAIN Doesn’t Retain
31
Retained Variables
- all SAS special variables, e.g.
  - _N_
  - _ERROR_
- all vars in RETAIN statement
- all vars from SET or MERGE
- accumulator vars in SUM statement
32
Variables Not Retained
- Variables from INPUT statement
- User-defined variables / vars created in DATA step
UNLESS……what?
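A small sketch of the difference between retained and non-retained variables. All names here (totals, amt, kept, notkept, running) are made up for illustration.

```sas
data totals;
  input amt @@;
  /* named in a RETAIN statement: the value carries over between iterations */
  retain kept 0;
  kept = kept + amt;
  /* ordinary assignment: notkept is reset to missing at the top of each */
  /* iteration, so missing + amt stays missing                           */
  notkept = notkept + amt;
  /* sum statement: running is implicitly retained and starts at 0 */
  running + amt;
cards;
5 10 15
;

proc print data=totals;
run;
```

Here kept and running should both accumulate to 5, 15, 30, while notkept stays missing on every observation (with a note in the log about missing values being generated).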
33
concatenation
    data A;
      input id $ site $;
    cards;
    10212 00
    10213 00
    ;

    data B;
      input id $;
    cards;
    02001
    03005
    06900
    ;

    data C;
      set A B;
      if missing(site) then site = substr(id,1,2);
    run;
34
    id      site
    10212   00
    10213   00
    02001   02
    03005   02
    06900   02
??
35
Solution
    data C;
      set A B (in=inb);
      if inb then site = substr(id,1,2);
    run;
Test that the observation has come from B and only then extract the site value....
36
    id      site
    10212   00
    10213   00
    02001   02
    03005   03
    06900   06
!
37
Chapter Five Don’t Order My Variables Around “the variable order is not always declared where it seems to occur…” Ron Fehd
38
Question posed: How do I reorder the variables in my SAS data set?
39
“Don’t Order My Variables Around”
WHY?
- exporting / export wizard
- SAS Viewer end users
- manipulate groups/lists of vars (age--diag)
- with PUT or ARRAY
- what else?
40
“Don’t Order My Variables Around”
storage:
- in LPDV
- in SAS data set
presentation layer
41
My question to you: What forces the order of the variables in a SAS data set in the first place? The order in which they are seen by the compiler when the data set is created.
42
“Don’t Order My Variables Around”
- RETAIN statement
- (ATTRIB statement)
- (LENGTH statement)
- (PROC TRANSPOSE)
- ??????
43
“Don’t Order My Variables Around”
Why RETAIN?
- retain functionality is implicit for vars coming from SET or MERGE
- Nothing you can mess up (attributes, etc.)!
44
Original
CONTENTS PROCEDURE
-----Variables Ordered by Position-----
    #   Variable   Type   Len   Pos
    1   NAME       Char     8     0
    2   SEX        Char     8     8
    3   AGE        Num      8    16
    4   ID         Num      8    24
    5   RX_GRP     Num      8    32
45
Original
    NAME     SEX   AGE   ID    RX_GRP
    John     M      35   101        2
    Dan      M      53   206        1
    Howard   M      45   321        3
46
    data new;
      retain id rx_grp name sex age;
      *** 1st reference to compiler;
      set master;
    run;
47
Reordered
CONTENTS PROCEDURE
-----Variables Ordered by Position-----
    #   Variable   Type   Len   Pos
    1   ID         Num      8     0
    2   RX_GRP     Num      8     8
    3   NAME       Char     8    16
    4   SEX        Char     8    24
    5   AGE        Num      8    32
48
Reordered
    ID    RX_GRP   NAME     SEX   AGE
    101        2   John     M      35
    206        1   Dan      M      53
    321        3   Howard   M      45
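Because the order is fixed by the first reference the compiler sees, the placement of the RETAIN statement matters. The sketch below contrasts the two placements; the data set names early and late are made up, and master is rebuilt here from the values shown on the slides so the example is self-contained.

```sas
/* rebuild the example data (values taken from the slides) */
data master;
  input name $ sex $ age id rx_grp;
cards;
John M 35 101 2
Dan M 53 206 1
Howard M 45 321 3
;

/* RETAIN before SET: the compiler sees the new order first */
data early;
  retain id rx_grp name sex age;
  set master;
run;

/* RETAIN after SET: too late, SET has already fixed the order */
data late;
  set master;
  retain id rx_grp name sex age;
run;

/* VARNUM lists variables in their position order */
proc contents data=early varnum;
run;
proc contents data=late varnum;
run;
```

The early data set should list ID first, while late should keep the original NAME, SEX, AGE, ID, RX_GRP order.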
49
Chapter Six The Case of the Missing Values
50
“How do MISSINGs compare?” QUESTION: If A > B then ; If either A or B is missing, isn’t the statement just ignored? What if both are missing?
51
28* NUMERIC Missing Values
    ._   .   .a   .b   .c   …   .z
    >>> low … to … high >>>
All less than all negative numbers
* Confirmed in SAS documentation: only 28
52
To answer a question raised in the meeting about missing values:

A SPECIAL MISSING VALUE is a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore. SAS accepts either uppercase or lowercase letters. Values are displayed and printed as uppercase.

If you do not begin a special numeric missing value with a period, SAS identifies it as a variable name. Therefore, to use a special numeric missing value in a SAS expression or assignment statement, you must begin the value with a period, followed by the letter or underscore, as in the following example:
    x=.d;

When SAS prints a special missing value, it prints only the letter or underscore. When data values contain characters in numeric fields that you want SAS to interpret as special missing values, use the MISSING statement to specify those characters.
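As a sketch of the MISSING statement in use, the step below reads raw data in which X and _ appear in numeric fields. The data set name old and the values are borrowed from the neighboring slides; how the original data set was actually built is not shown in the deck, so this is an assumption.

```sas
data old;
  /* treat X and _ in numeric fields as special missing values */
  missing X _;
  input name $ a b;
cards;
John 10 7
Dan X _
Howard . .
;

proc print data=old;
run;
```

Without the MISSING statement, the X and _ fields would be flagged as invalid numeric data and set to the ordinary missing value.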
53
Master File
    NAME     A    B
    John     10   7
    Dan      X    _
    Howard   .    .
(A and B are numeric)
54
    data subset;
      set old;
      if A > B;
    run;
55
Records deleted
    NAME     A        B
    John     10   >   7
    Dan      X    >   _
    Howard   .    =   .     deleted
(A and B are numeric)
56
One (1) Character Missing Value
- less than all negative numbers
- regardless of length
- collating sequence determines where “ ” falls in order of values
<blank> b
57
    if C=D then MSG='same';

    NAME     C (length=4)   D (length=1)   MSG
    John     XXXX           Y
    Dan      Q              Q              same
    Howard                                 same
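The comparison above can be reproduced with a sketch like the one below. The data set name chars is made up; the lengths follow the slide. In list input, a single period in a character field is read as a missing (blank) value.

```sas
data chars;
  length name $8 c $4 d $1 msg $4;
  input name $ c $ d $;
  /* the shorter value is padded with blanks before comparing, */
  /* so the differing lengths of C and D do not matter         */
  if c = d then msg = 'same';
cards;
John XXXX Y
Dan Q Q
Howard . .
;

proc print data=chars;
run;
```

Dan and Howard should both get MSG='same': "Q" padded to length 4 equals "Q", and two blank values compare equal regardless of length.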
58
Deus ex machina*
- www.sasCommunity.org
- http://groups.google.com/group/comp.soft-sys.sas/topics
- SAS-L
- www.support.sas.com
- Documentation and HELP facility
- TRIAL and ERROR Testing!!
* resolution
59
Thank you!!
Neil.Howard@amgen.com
60
Subscribing to SAS-L
To have the messages mailed to you as they are available, send e-mail to any of the mail servers:
- listserv@vm.marist.edu (Marist University)
- listserv@listserv.vt.edu (Virginia Polytechnic University)
- listserv@listserv.uga.edu (University of Georgia)
- listserv@AKH-WIEN.AC.AT (University of Vienna)
The subject line is ignored and the body should contain the command:
    subscribe sas-l your name here
e.g.
    subscribe sas-l Tom Smith
is how Tom Smith would subscribe.
61
SAS-L Stuff
- http://www.listserv.uga.edu/archives/sas-l.html
- http://groups.google.com/group/comp.soft-sys.sas/topics
From www.sconsig.com:
- SAS-L On-Line
- How to Subscribe to SAS-L