Published by Émilie Lavoie. Modified over 6 years ago.
1
SAS Essentials: How SAS Thinks
2
“The DATA step is your most powerful programming tool. So understand and use it well.” Socrates
3
Objectives
Understand the DATA step:
- processes
- internals
- defaults
4
Processes:
- compilation of DATA step source code
- execution of resultant machine code
5
Compile and execute phases of:
- INPUT (non-SAS data)
- SET
6
Compile Time Activities
- syntax scan
- source code translation to machine language
- definition of input and output files
7
Compile Time Activities
- input buffer
- LPDV (logical program data vector)
- data set descriptor information
8
Creation of LPDV
Variables are added in the order seen by the compiler during parsing and interpretation of source statements.
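A minimal sketch of that ordering, using a hypothetical data set order_demo with made-up variables. The LENGTH statement is the first place the compiler meets last_seen, so that variable should take the first slot in the LPDV even though nothing assigns it until after the INPUT statement.

```sas
data order_demo;                 /* hypothetical example data set */
   length last_seen $ 8;         /* compiler meets LAST_SEEN first */
   input id name $;              /* then ID and NAME */
   doubled = id * 2;             /* then DOUBLED */
   last_seen = name;
   put _all_;                    /* writes the LPDV in creation order */
   datalines;
1 Ann
2 Bob
;
run;

proc contents data=order_demo varnum;   /* VARNUM lists variables by position */
run;
```

Moving the LENGTH statement below the INPUT statement and rerunning is a quick way to see the order change.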
9
Compile Time Statements
- location critical: BY, WHERE, ARRAY, ATTRIB, FORMAT, INFORMAT, LENGTH
- location irrelevant: DROP, KEEP, LABEL, RENAME, RETAIN
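As an illustration of why location matters for some of these statements and not others, here is a small sketch with invented names (place_demo, scratch, city). The DROP statement names a variable the compiler has not yet seen and still takes effect, while the LENGTH statement arrives after the first reference to city and so cannot change its length.

```sas
data place_demo;                 /* hypothetical example data set */
   drop scratch;                 /* location irrelevant: SCRATCH is defined later */
   city = 'Rome';                /* first reference: character, length 4 */
   length city $ 20;             /* too late: SAS logs a message, CITY keeps length 4 */
   scratch = 1;
   city_len = vlength(city);     /* compile-time length of CITY */
   put city= city_len=;
run;
```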
10
Retained Variables
- all SAS special variables (_N_, _ERROR_)
- all vars in RETAIN statement
- all vars from SET, MERGE, or UPDATE
- accumulator vars in SUM statement(s)
11
Variables Not Retained
- variables from INPUT statement
- user-defined variables (other than SUM statement)
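The difference between retained and non-retained variables can be watched in the log. The sketch below uses made-up names (retain_demo, kept, running, plain) and writes the LPDV immediately after each read.

```sas
data retain_demo;                /* hypothetical example data set */
   input x;
   retain kept 0;                /* RETAIN statement: value carries over */
   put _all_;                    /* state of the LPDV right after the read */
   kept  = kept + x;
   running + x;                  /* sum statement: retained, starts at 0 */
   plain = x * 10;               /* ordinary assignment: reset to missing each loop */
   datalines;
1
2
3
;
run;
```

From the second pass on, kept and running should still hold their totals from the previous pass when PUT _ALL_ runs, while plain is back to missing.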
12
Type and Length of Variables
- determined at compile time by the first reference seen by the compiler (in the DATA step)
- Numerics: length is 8 during DATA step processing; length is an output property
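A short sketch of first-reference typing, with invented names (type_demo, status, n). The first assignment fixes status as a 2-byte character variable, so the longer value assigned next is truncated. The LENGTH statement for n only affects how the number is stored in the output data set.

```sas
data type_demo;                  /* hypothetical example data set */
   status = 'no';                /* first reference: character, length 2 */
   status = 'yes';               /* value is truncated to fit length 2 */
   length n 4;                   /* numeric: 4 bytes applies to the stored data set */
   n = 123;
   put status= n=;
run;

proc contents data=type_demo;    /* shows the stored type and length of each variable */
run;
```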
13
INPUT statement reading non-SAS data
14
Compile Loop and LPDV
data a ;
  put _all_ ;   *write LPDV to LOG;
  input idnum diagdate: mmddyy8. sex $ rx_grp $ 10. ;
  time = intck ('year', diagdate, today() ) ;
  put _all_;    *write LPDV to LOG;
  *ID and date fields below restored from the LOG values;
cards ;
1 09/09/52 F placebo
2 11/15/64 M 300 mg.
3 04/07/48 F 600 mg.
run;
15
input buffer
logical program data vector
idnum    diagdate  sex   rx_grp  time
numeric  numeric   char  char    numeric
Building descriptor portion of SAS data set
16
logical program data vector
       idnum    diagdate  sex   rx_grp  time     _N_   _ERROR_
type   numeric  numeric   char  char    numeric
DKR*   keep     keep      keep  keep    keep     drop  drop
*Drop/keep/rename
17
Execution of a DATA Step
18
Execution of a DATA Step
1. Initialization of LPDV
2. read input file
3. end of file? Y: next step; N: continue
4. process statements in step
5. termination: implied output, then back to initialization
19
DATA Step Execution
Implied read/write loop, stopped by:
- no more data to read
- explicit STOP
- no input data
- some execution time errors
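Two of these stopping conditions are easy to try out. The sketch below uses made-up data set names (once, nums, first_two): the first step has no input data and so executes a single time, and the last step ends on an explicit STOP.

```sas
data once;                       /* no INPUT or SET: the step executes a single time */
   x = 1;
   put 'once: ' _n_= x=;
run;

data nums;                       /* small hypothetical input data set */
   input x;
   datalines;
1
2
3
4
;
run;

data first_two;
   set nums;
   if _n_ > 2 then stop;         /* explicit STOP ends the step before the implied output */
run;
```

Because STOP fires on the third pass before the implied output at the bottom of the step, first_two should end up with two observations.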
20
Execution Time Activities
- execute initialize-to-missing (ITM)
- read from input source
- modify data using user-controlled statements
- supply values of variables to LPDV
- output observation to SAS data set
21
Initialization
- _N_ set to loop count
- _ERROR_ set to 0
- user variables set to missing
22
Execution Loop - raw data
data a ;
  put _all_ ;   *write LPDV to LOG;
  input idnum diagdate: mmddyy8. sex $ rx_grp $ 10. ;
  time = intck ('year', diagdate, today() ) ;
  put _all_;    *write LPDV to LOG;
  *ID and date fields below restored from the LOG values;
cards ;
1 09/09/52 F placebo
2 11/15/64 M 300 mg.
3 04/07/48 F 600 mg.
run;
proc contents; run;
proc print; run;
23
LPDV (over all executions of DATA step)
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  _N_
1      -2670     F    placebo  49    1
2      1780      M    300 mg.  37    2
3      -4286     F    600 mg.  53    3
24
2    data a ;
3    put _all_ ; *write LPDV to LOG;
4    input idnum diagdate: mmddyy8. sex $ rx_grp $ 10. ;
8    time = intck ('year', diagdate, today() ) ;
9    put _all_; *write LPDV to LOG;
10   cards ;
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=1
IDNUM=1 DIAGDATE=-2670 SEX=F RX_GRP=placebo TIME=49 _ERROR_=0 _N_=1
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=2
IDNUM=2 DIAGDATE=1780 SEX=M RX_GRP=300 mg. TIME=37 _ERROR_=0 _N_=2
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=3
IDNUM=3 DIAGDATE=-4286 SEX=F RX_GRP=600 mg. TIME=53 _ERROR_=0 _N_=3
IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=4
NOTE: The data set WORK.A has 3 observations and 5 variables.
NOTE: The DATA statement used 0.59 seconds.
14   run;
15
16   proc contents; run;
NOTE: The PROCEDURE CONTENTS used 0.39 seconds.
25
Data Set Name: WORK.A
Observations:
Member Type: DATA
Variables:
Engine: V
Indexes:
Created: :18 Saturday, January 20,
Observation Length: 42
Last Modified: 11:18 Saturday, January 20,
Deleted Observations: 0
Protection:
Compressed: NO
Data Set Type:
Sorted: NO
Label:

-----Engine/Host Dependent Information-----
Data Set Page Size:
Number of Data Set Pages: 1
File Format:
First Data Page:
Max Obs per Page:
Obs in First Data Page: 3

-----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos
---------------------------
5  TIME      Num
2  DIAGDATE  Num
1  IDNUM     Num
4  RX_GRP    Char
3  SEX       Char
26
PROC PRINT
IDNUM  DIAGDATE  SEX  RX_GRP   TIME
1      -2670     F    placebo  48
2      1780      M    300 mg.  36
3      -4286     F    600 mg.  52
27
SET statement reading existing SAS data
28
DATA Step Compile
- no input buffer
- compiler reads descriptor portion of input SAS data set to build the LPDV
- returns same variables/attributes, including new variables
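One way to see that the LPDV comes from the descriptor at compile time is a SET statement that can never execute. This is a sketch with hypothetical data sets (src, shell): the compiler still reads the descriptor of src, so shell receives the same variables without a single observation being read.

```sas
data src;                        /* hypothetical input data set */
   input idnum sex $;
   datalines;
1 F
2 M
;
run;

data shell;
   if 0 then set src;            /* never executes; the descriptor is still read at compile time */
   put _all_;                    /* the variables exist, with missing values */
   stop;                         /* end the step, since no SET ever reaches end of file */
run;
```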
29
SET
- determine which SAS data set to be read
- identify next observation to be read
- copy variable values to LPDV
30
Execution Loop - SAS data
data sas_a ;
  put _all_ ;
  set a ;
  tot_rec + 1 ;
run;
31
Building LPDV from descriptor portion of old SAS data set
logical program data vector
idnum    diagdate  sex   rx_grp  time     tot_rec
numeric  numeric   char  char    numeric  numeric
Building descriptor portion of new SAS data set
32
LPDV (over all executions of DATA step)
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC  _N_
1      -2670     F    placebo  48    1        1
1      -2670     F    placebo  48    1        2
2      1780      M    300 mg.  36    2        2
2      1780      M    300 mg.  36    2        3
3      -4286     F    600 mg.  52    3        3
3      -4286     F    600 mg.  52    3        4
33
LOG
idnum=. diagdate=. sex= rx_grp= time=. tot_rec=0 _ERROR_=0 _N_=1
idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=1
idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=2
idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=2
idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=3
idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=3
idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=4
34
PROC PRINT
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC
1      -2670     F    placebo  48    1
2      1780      M    300 mg.  36    2
3      -4286     F    600 mg.  52    3
35
Logic of a MERGE
- compile
- execute
36
data left;
  input ID X Y ;
cards;
;
data right;
  input ID A $ B $ ;
cards;
1 A14 B32
3 A53 B11
;
37
proc sort data=left; by ID; run;
proc sort data=right; by ID; run;
data both;
  merge left (in=inleft) right (in=inright);
  by ID ;
run;
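The IN= variables set up by this merge live only in the LPDV, and they can be used to label where each observation came from. The sketch below assumes made-up X and Y values and IDs 1 to 3 for the left data set (only the right data set's values appear in the text above). The variable source is also hypothetical.

```sas
data left;
   input ID X Y;                 /* X and Y values are illustrative only */
   datalines;
1 10 100
2 20 200
3 30 300
;
data right;
   input ID A $ B $;
   datalines;
1 A14 B32
3 A53 B11
;
data both;
   merge left (in=inleft) right (in=inright);
   by ID;                        /* both inputs are already in ID order */
   length source $ 10;
   if inleft and inright then source = 'match';
   else if inleft        then source = 'left only';
   else                       source = 'right only';
   put _all_;                    /* write the LPDV for each iteration to the log */
run;
```

With these assumed inputs, ID 2 exists only in left, so that iteration should be labelled 'left only' with A and B missing.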
38
logical program data vector
first iteration: MATCH
ID X Y A B INLEFT INRIGHT _N_ _ERROR_
A=A14 B=B32
39
logical program data vector second iteration: NO MATCH
ID X Y A B INLEFT INRIGHT _N_ _ERROR_
40
logical program data vector
third iteration: MATCH
ID X Y A B INLEFT INRIGHT _N_ _ERROR_
A=A53 B=B11
41
Let’s try this again...
data left;
  input ID X Y ;
cards;
;
data right;
  input ID A $ B $ ;
cards;
1 A14 B32
3 A53 B11
;
42
proc sort data=left; by ID; run;
proc sort data=right; by ID; run;
data both;
  merge left (in=inleft) right (in=inright);
  ***** by ID (one-on-one merge);
run;
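Without the BY statement, observations are paired purely by position, and a variable present in both data sets takes its value from the data set read last. A minimal sketch follows, again assuming made-up X and Y values and IDs 1 to 3 for left. The output name one_to_one is hypothetical.

```sas
data left;
   input ID X Y;                 /* X and Y values are illustrative only */
   datalines;
1 10 100
2 20 200
3 30 300
;
data right;
   input ID A $ B $;
   datalines;
1 A14 B32
3 A53 B11
;
data one_to_one;
   merge left right;             /* no BY statement: rows are paired by position */
   put _all_;                    /* write the LPDV for each iteration to the log */
run;
```

With these assumed inputs, the second pass pairs left's ID 2 with right's ID 3, and the ID from right overwrites it. On the third pass right has no observations left, so A and B are missing.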
43
logical program data vector
first iteration: 1:1 “MATCH”
ID X Y A B _N_ _ERROR_
A=A14 B=B32
ID=1 OVERWRITTEN: value came from data set “right”
44
logical program data vector
second iteration: 1:1 “MATCH”
ID X Y A B _N_ _ERROR_
A=A53 B=B11
ID=3 OVERWRITTEN: value came from data set “right”
45
logical program data vector
third iteration: 1:1 “NO MATCH”
ID X Y A B _N_ _ERROR_
MISSING: no values from “right”
46
Output SAS data set ID X Y A B A14 B32 A53 B11
47
DATA Step Conclusions
Understanding internals and default activities allows you to:
- make informed coding decisions
- write flexible and efficient code
- debug and test effectively
- interpret results readily
48
Remember: we have discussed DEFAULTS.
As soon as you add options, statements, features, etc., the default actions change; TEST them! You can use these same tools to track what’s happening.
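One example of a default changing, sketched with a made-up data set out_demo: once an explicit OUTPUT statement appears anywhere in the step, the implied output at the bottom of the step no longer happens, and PUT can be used to follow what the step is doing.

```sas
data out_demo;                   /* hypothetical example data set */
   input x;
   if x > 1 then output;         /* explicit OUTPUT turns off the implied output */
   put _n_= x=;                  /* track each iteration in the log */
   datalines;
1
2
3
;
run;
```

Here only the observations with x greater than 1 are written, so out_demo should hold two observations even though the step executed its statements three times.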