Understanding SAS Data Step Processing Alan C. Elliott stattutorials.com.


Reading Raw Data
Using the following SAS program:
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;

Overview of SAS Data Step
– Compile Phase (Look at Syntax)
– Execution Phase (Read data, Calculate)
– Output Phase (Create Data Set)

Compile Phase
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;
SAS checks the syntax of the program.
– Identifies the type and length of each variable
– Does any variable need conversion?
If everything is okay, SAS proceeds to the next step. If errors are discovered, SAS attempts to interpret what you mean. If SAS can’t correct the error, it prints an error message to the log.

Create Input Buffer
SAS creates an input buffer, which holds the data as it is read in from the lines after the DATALINES; statement.

Execution Phase
The Program Data Vector (PDV), created during compilation, contains information about the variables: two automatic variables, _N_ and _ERROR_, and a position for each of the four variables in the DATA step. SAS sets _N_ = 1 and _ERROR_ = 0 (no initial error) and sets the remaining variables to missing.
    PDV:  _N_  _ERROR_  ID  AGE  TEMPC  TEMPF

Buffer to PDV
– SAS reads the 1st record from the input buffer into the PDV. TEMPF is initially missing.
– If there is an executable statement, SAS processes the code, here TEMPF=TEMPC*(9/5)+32;, and places the calculated value in TEMPF.
    PDV:  _N_  _ERROR_  ID  AGE  TEMPC  TEMPF
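
One way to watch the PDV change is to write it to the log with PUT _ALL_ before and after the record is read. The following is only a sketch: the two data lines are hypothetical values, since the transcript does not preserve the original data.

```sas
/* Sketch only: the data values below are hypothetical. */
data new;
  put 'Before INPUT:     ' _all_;   /* PDV at the top of the iteration      */
  input id $ age tempc;             /* record moves from input buffer to PDV */
  tempf = tempc*(9/5)+32;           /* calculated value stored in TEMPF      */
  put 'After assignment: ' _all_;   /* PDV just before the observation is written */
datalines;
A01 34 37.0
B02 51 38.5
;
run;
```

In the log, the "Before INPUT" lines should show ID, AGE, TEMPC and TEMPF as missing, and the "After assignment" lines should show the values read and the computed TEMPF, along with _N_ and _ERROR_.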

Output Phase
The values in the PDV are written to the output data set (NEW) as the first observation:
    From PDV:     _N_  _ERROR_  ID  AGE  TEMPC  TEMPF
    To data set:                ID  AGE  TEMPC  TEMPF
This is the first record in the output data set named “NEW.” Note that _N_ and _ERROR_ are dropped.
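
Because _N_ is dropped on output, a program that needs the iteration number in the data set has to copy it into an ordinary variable. A minimal sketch, assuming hypothetical data values and a hypothetical variable name RECNO:

```sas
/* Sketch only: data values and the name RECNO are illustrative assumptions. */
data new_numbered;
  input id $ age tempc;
  tempf = tempc*(9/5)+32;
  recno = _n_;          /* ordinary variable, so it is written to the data set */
datalines;
A01 34 37.0
B02 51 38.5
;
run;

proc print data=new_numbered;
run;
```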

Exceptions to Missing in PDV
Initial values are usually set to missing in the PDV. Some data values are not initially set to missing in the PDV:
– variables in a RETAIN statement
– variables created in a SUM statement
– data elements in a _TEMPORARY_ array
– variables created with options in the FILE or INFILE statements
These exceptions are covered later.
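
The first two exceptions can be illustrated with a short sketch. The data set name RUNNING, the variable names MAXC and COUNT, and the data values are all assumptions made for this example.

```sas
/* Sketch only: names and data values are hypothetical. */
data running;
  input tempc;
  retain maxc;               /* RETAIN: MAXC is not reset to missing each iteration */
  count + 1;                 /* sum statement: COUNT starts at 0 and is retained    */
  maxc = max(maxc, tempc);   /* MAX ignores the initial missing value of MAXC       */
datalines;
37.0
38.5
36.2
;
run;

proc print data=running;
run;
```

Without the RETAIN statement, MAXC would be reset to missing at the top of every iteration and would simply equal TEMPC on each observation.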

Next data record read
Once SAS finishes reading the first data record, it repeats the same process for the second record, sending the results to the output data set (named NEW in this case), and so on for all records.
    Data set:  ID  AGE  TEMPC  TEMPF

Descriptor Information
SAS creates and maintains descriptor information about each SAS data set:
– data set attributes
– variable attributes
– the name of the data set
– member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

Data Set Description
    proc datasets ;
    contents data=new;
    run;
Contents output… (abbreviated)
    #  Name  Member Type  File Size  Last Modified
    1  NEW   DATA         5120       20Nov13:08:59:32
Alternate program
    proc contents data= new;
    run;

Description output continued…
    Data Set Name        WORK.NEW                   Observations          2
    Member Type          DATA                       Variables             4
    Engine               V9                         Indexes               0
    Created              Wed, Nov 20, :59:32 AM     Observation Length    32
    Last Modified        Wed, Nov 20, :59:32 AM     Deleted Observations  0
    Protection                                      Compressed            NO
    Data Set Type                                   Sorted                NO
    Label
    Data Representation  WINDOWS_64
    Encoding             wlatin1 Western (Windows)

Description output continued…
Alphabetic List of Variables and Attributes
    #  Variable  Type  Len
    2  AGE       Num   8
    1  ID        Char  8
    3  TEMPC     Num   8
    4  TEMPF     Num   8
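
The listing above is alphabetic, so the # column is the only clue to the order in which the variables were created. As a sketch, the VARNUM option of PROC CONTENTS lists the variables by position instead. The data values here are hypothetical.

```sas
/* Sketch only: the data values are hypothetical. */
data new;
  input id $ age tempc;
  tempf = tempc*(9/5)+32;
datalines;
A01 34 37.0
B02 51 38.5
;
run;

/* VARNUM lists variables in creation order rather than alphabetically */
proc contents data=new varnum;
run;
```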

Original Program
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;

Original Program
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32;
    DATALINES;
    ;
    run;
    proc print;run;
Program output
    Obs  ID  AGE  TEMPC  TEMPF

Example of Error
    DATA NEW;
    INPUT ID $ AGE TEMPC;
    TEMPF=TEMPC*(9/5)+32
    DATALINES;
    ;
    run;
    proc print;run;
    proc datasets ;
    contents data=new;
    run;
Missing semicolon: the TEMPF assignment statement does not end with a semicolon.

    76 DATA NEW;
    77 INPUT ID $ AGE TEMPC;
    78 TEMPF=TEMPC*(9/5) DATALINES;
    ERROR : Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, ^=, |, ||, ~=.
    ERROR : Statement is not valid or it is used out of proper order
    ;
    83 run;
    ERROR: No DATALINES or INFILE statement.
Error found during compilation

Summary - Compilation Phase
During compilation, SAS:
– checks syntax
– identifies the type and length of each new variable (is a data type conversion needed?)
– creates the input buffer if there is an INPUT statement for an external file
– creates the Program Data Vector (PDV)
– creates descriptor information for data sets and variable attributes
– Other statements and options not discussed here: DROP; KEEP; RENAME; RETAIN; WHERE; LABEL; LENGTH; FORMAT; ARRAY; BY; ATTRIB; END=, IN=, FIRST., LAST., POINT=
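
Because variable lengths are fixed at compile time, a character variable read with simple list input gets the default length of 8 (as in the variable listing for NEW) unless a length is declared first. The following sketch uses a LENGTH statement, which the slides list but do not discuss. The data set name NEW_LEN and the data values are assumptions.

```sas
/* Sketch only: the data set name and data values are hypothetical. */
data new_len;
  length id $ 12;            /* length is fixed at compile time, before INPUT names ID */
  input id $ age tempc;
  tempf = tempc*(9/5)+32;
datalines;
PATIENT-0001 34 37.0
PATIENT-0002 51 38.5
;
run;

/* The descriptor information should now report ID as Char with length 12 */
proc contents data=new_len;
run;
```

Without the LENGTH statement, the 12-character IDs would be truncated to their first 8 characters.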

Summary - Execution Phase
1. The DATA step iterates once for each observation being created.
2. Each time the DATA statement executes, _N_ is incremented by 1.
3. Newly created variables are set to missing in the PDV.
4. SAS reads a data record from a raw data file into the input buffer (there are other possibilities not discussed here).
5. SAS executes any other programming statements for the current record.
6. At the end of the DATA step statements (RUN;), SAS writes an observation to the SAS data set (OUTPUT PHASE).
7. SAS returns to the top of the DATA step (Step 3 above).
8. The DATA step terminates when there is no more data.
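
The iteration described in the steps above can be traced in the log. This is a sketch under assumptions: the data set name TRACE and the two data lines are hypothetical.

```sas
/* Sketch only: the data set name and data values are hypothetical. */
data trace;
  putlog 'Top of DATA step:    ' _n_=;   /* runs at the start of every iteration    */
  input id $ age tempc;                  /* the step stops here when no data remain */
  tempf = tempc*(9/5)+32;
  putlog 'Bottom of DATA step: ' _n_=;   /* runs only when a record was read        */
datalines;
A01 34 37.0
B02 51 38.5
;
run;
```

With two data records, the "Top" message should appear once more than the "Bottom" message: SAS begins a third iteration, finds no more data at the INPUT statement, and terminates the step without writing another observation.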

End