Understanding SAS Data Step Processing Alan C. Elliott stattutorials.com.


1 Understanding SAS Data Step Processing Alan C. Elliott stattutorials.com

2 Reading Raw Data
Using the following SAS program:

    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    0001 24 37.3
    0002 35 38.2
    ;
    run;
    proc print;run;

3 Overview of SAS Data Step
– Compile Phase (Look at Syntax)
– Execution Phase (Read data, Calculate)
– Output Phase (Create Data Set)

4 Compile Phase

    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    0001 24 37.3
    0002 35 38.2
    ;
    run;
    proc print;run;

– SAS checks the syntax of the program.
– It identifies the type and length of each variable.
– It determines whether any variable needs a type conversion.
– If everything is okay, SAS proceeds to the next step.
– If errors are discovered, SAS attempts to interpret what you mean. If SAS can't correct the error, it prints an error message to the log.

5 Create Input Buffer
SAS creates an input buffer. The INPUT BUFFER contains the data as it is read in.

    DATALINES;
    0001 24 37.3
    0002 35 38.2
    ;

INPUT BUFFER (first record)
Column:  1  2  3  4  5  6  7  8  9 10 11 12
Value:   0  0  0  1     2  4     3  7  .  3

6 Execution Phase
The PROGRAM DATA VECTOR (PDV) is created and contains information about the variables: the two automatic variables _N_ and _ERROR_, and a position for each of the four variables in the DATA step.
SAS sets _N_ = 1 and _ERROR_ = 0 (no initial error), and sets the remaining variables to missing. A missing character value (ID) is blank; a missing numeric value is a period.

_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0              .    .      .

7 Buffer to PDV
SAS reads the 1st record from the buffer into the PDV.

Buffer
Column:  1  2  3  4  5  6  7  8  9 10 11 12
Value:   0  0  0  1     2  4     3  7  .  3

PDV (TEMPF is initially missing)
_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0        0001  24   37.3   .

If there is an executable statement, SAS processes the code TEMPF=TEMPC*(9/5)+32;

PDV (TEMPF now holds the calculated value)
_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0        0001  24   37.3   99.14
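One way to watch the PDV change during an iteration is to write it to the log with PUT _ALL_. The following is a minimal sketch based on the program from slide 2; the PUT statements and their label text are additions for illustration and are not part of the original slides.

```sas
DATA NEW;
  /* PUT with no FILE statement writes to the SAS log. */
  /* _ALL_ lists every variable in the PDV, including _N_ and _ERROR_. */
  PUT 'Top of iteration: ' _ALL_;
  INPUT ID $ AGE TEMPC;
  PUT 'After INPUT:      ' _ALL_;
  TEMPF=TEMPC*(9/5)+32;
  PUT 'After assignment: ' _ALL_;
DATALINES;
0001 24 37.3
0002 35 38.2
;
run;
```

The log should show ID, AGE, TEMPC and TEMPF as missing at the top of each iteration, filled in after INPUT, and TEMPF calculated after the assignment. Expect one extra "Top of iteration" line with _N_=3, because the first PUT runs before the INPUT statement discovers that there is no more data.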

8 Output Phase
The values in the PDV are written to the output data set (NEW) as the first observation.

From the PDV:
_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0        0001  24   37.3   99.14

Write data to the data set:
ID    AGE  TEMPC  TEMPF
0001  24   37.3   99.14

This is the first record in the output data set named "NEW." Note that _N_ and _ERROR_ are dropped.

9 Exceptions to Missing in PDV
Initial values are usually set to missing in the PDV:

_N_  _ERROR_  ID    AGE  TEMPC  TEMPF
1    0              .    .      .

Some data values are not initially set to missing in the PDV:
– variables in a RETAIN statement
– variables created in a SUM statement
– data elements in a _TEMPORARY_ array
– variables created with options in the FILE or INFILE statements
These exceptions are covered later.
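The first two exceptions can be sketched with the same data. The data set name TRACK and the variables MAXTEMP and COUNT below are hypothetical names chosen for illustration; they do not appear in the original slides.

```sas
DATA TRACK;
  INPUT ID $ AGE TEMPC;
  /* RETAIN: MAXTEMP is not reset to missing at the top of each iteration. */
  RETAIN MAXTEMP;
  /* MAX ignores missing arguments, so the first record sets MAXTEMP. */
  MAXTEMP = MAX(MAXTEMP, TEMPC);
  /* Sum statement: COUNT starts at 0 and is retained automatically. */
  COUNT + 1;
DATALINES;
0001 24 37.3
0002 35 38.2
;
run;
proc print data=TRACK; run;
```

Because both variables keep their values between iterations, the second observation should carry MAXTEMP=38.2 and COUNT=2 rather than values computed from that record alone.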

10 Next data record read
Once SAS finishes processing the first data record, it repeats the same process for the second record, sending the results to the output data set (named NEW in this case), and so on for all records.

ID    AGE  TEMPC  TEMPF
0001  24   37.3   99.14
0002  35   38.2   100.76

11 Descriptor Information
SAS creates and maintains descriptor information for each SAS data set:
– data set attributes
– variable attributes
– the name of the data set
– the member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

12 Data Set Description

    proc datasets ;
    contents data=new;
    run;

Contents output (abbreviated):

#  Name  Member Type  File Size  Last Modified
1  NEW   DATA         5120       20Nov13:08:59:32

Alternate program:

    proc contents data=new;
    run;

13 Description output continued…

Data Set Name        WORK.NEW                         Observations          2
Member Type          DATA                             Variables             4
Engine               V9                               Indexes               0
Created              Wed, Nov 20, 2013 08:59:32 AM    Observation Length    32
Last Modified        Wed, Nov 20, 2013 08:59:32 AM    Deleted Observations  0
Protection                                            Compressed            NO
Data Set Type                                         Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)

14 Description output continued…
Alphabetic List of Variables and Attributes

#  Variable  Type  Len
2  AGE       Num   8
1  ID        Char  8
3  TEMPC     Num   8
4  TEMPF     Num   8
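The listing above is alphabetical, while the # column records the order in which the variables were created in the PDV. As a small sketch (these option choices are additions, not part of the original slides), PROC CONTENTS can list variables in creation order with the VARNUM option, and an interactive PROC DATASETS session is ended with QUIT:

```sas
/* List variables in creation order (ID, AGE, TEMPC, TEMPF) */
/* instead of alphabetically.                               */
proc contents data=new varnum;
run;

/* PROC DATASETS stays active after RUN; QUIT ends it. */
proc datasets library=work nolist;
  contents data=new;
run;
quit;
```

Both forms assume the data set NEW from the earlier slides still exists in the WORK library.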

15 Original Program

    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    0001 24 37.3
    0002 35 38.2
    ;
    run;
    proc print;run;

16 Original Program

    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    0001 24 37.3
    0002 35 38.2
    ;
    run;
    proc print;run;

Program output:

Obs  ID    AGE  TEMPC  TEMPF
1    0001  24   37.3   99.14
2    0002  35   38.2   100.76

17 Example of Error
Missing semicolon at the end of the TEMPF line:

    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32
    DATALINES;
    0001 24 37.3
    0002 35 38.2
    ;
    run;
    proc print;run;
    proc datasets ;
    contents data=new;
    run;

18 Error found during compilation

    76   DATA NEW;
    77   INPUT ID $ AGE TEMPC;
    78   TEMPF=TEMPC*(9/5)+32
    79   DATALINES;
         ---------
         22
    80   0001 24 37.3
         ----
         180
    ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, ^=, |, ||, ~=.
    ERROR 180-322: Statement is not valid or it is used out of proper order.
    81   0002 35 38.2
    82   ;
    83   run;
    ERROR: No DATALINES or INFILE statement.

19 Summary – Compilation Phase
During compilation, SAS:
– checks syntax
– identifies the type and length of each new variable (is a data type conversion needed?)
– creates the input buffer if there is an INPUT statement for an external file
– creates the Program Data Vector (PDV)
– creates descriptor information for data sets and variable attributes
– handles other options not discussed here: DROP; KEEP; RENAME; RETAIN; WHERE; LABEL; LENGTH; FORMAT; ARRAY; BY; ATTRIB; END=, IN=, FIRST., LAST., POINT=

20 Summary – Execution Phase
1. The DATA step iterates once for each observation being created.
2. Each time the DATA statement executes, _N_ is incremented by 1.
3. Newly created variables are set to missing in the PDV.
4. SAS reads a data record from a raw data file into the input buffer (there are other possibilities not discussed here).
5. SAS executes any other programming statements for the current record.
6. At the end of the DATA step statements (RUN;), SAS writes an observation to the SAS data set (OUTPUT PHASE).
7. SAS returns to the top of the DATA step (Step 2 above).
8. The DATA step terminates when there is no more data.
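The iteration count described in the steps above can be made visible in the output data set. This is a minimal sketch; the data set name ITERS and the variable ITER are hypothetical names added for illustration. Because _N_ is dropped when the observation is written, its value is copied into an ordinary variable first.

```sas
DATA ITERS;
  INPUT ID $ AGE TEMPC;
  TEMPF=TEMPC*(9/5)+32;
  /* _N_ is not written to the data set, so keep a copy in ITER. */
  ITER = _N_;
  /* No explicit OUTPUT statement: SAS writes the observation */
  /* automatically when it reaches the end of the step.       */
DATALINES;
0001 24 37.3
0002 35 38.2
;
run;
proc print data=ITERS; run;
```

With the two data lines shown, ITER should be 1 for the first observation and 2 for the second.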

21 End

