SAS Essentials: How SAS Thinks (Neil.Howard@amgen.com)


“The DATA step is your most powerful programming tool. So understand and use it well.” Socrates

Objectives: understand the DATA step's processes, internals, and defaults.

Compilation of DATA step source code; execution of the resulting machine code.

Compile and execute phases of: INPUT (non-SAS data); SET (existing SAS data).

Compile Time Activities: syntax scan; translation of source code to machine language; definition of input and output files.

Compile Time Activities (continued): creation of the input buffer (raw data only), the LPDV (logical program data vector), and the data set descriptor information.

Creation of LPDV: variables are added in the order the compiler sees them during parsing and interpretation of the source statements.
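Because variables enter the LPDV in the order the compiler first meets them, a statement that names a variable ahead of the SET statement changes the column order of the output data set. A minimal sketch; the VISITS and REORDERED data sets are hypothetical:

```sas
/* Hypothetical sample data: variable order is ID, VISIT_DATE, SCORE. */
data visits;
  input id visit_date :date9. score;
  format visit_date date9.;
  datalines;
1 05JAN2001 7
2 12FEB2001 9
;
run;

/* LENGTH is the first statement that names SCORE, so the compiler    */
/* adds SCORE to the LPDV first. ID and VISIT_DATE follow in the order */
/* the compiler meets them in the descriptor read for SET.             */
data reordered;
  length score 8;
  set visits;
run;
/* Output column order: SCORE, ID, VISIT_DATE */
```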

Compile Time Statements
Location critical: BY, WHERE, ARRAY, ATTRIB, FORMAT, INFORMAT, LENGTH
Location irrelevant: DROP, KEEP, LABEL, RENAME, RETAIN
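To illustrate the "location irrelevant" group, here is a sketch (hypothetical SCORES and RENAMED data sets) in which DROP and RENAME appear before the statements that still use the original names. Both act only on the output data set, so their placement in the step does not matter:

```sas
data scores;
  input id score;
  datalines;
1 7
2 9
;
run;

data renamed;
  set scores;
  drop id;                 /* compile time: removes ID from the output only  */
  rename score = points;   /* compile time: renames SCORE in the output only */
  id_x10  = id * 10;       /* ID is still available inside the step          */
  doubled = score * 2;     /* and the step still uses the old name SCORE     */
run;
/* Output variables: POINTS, ID_X10, DOUBLED */
```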

Retained Variables (not reset to missing at the start of each iteration): all SAS special variables (_N_, _ERROR_); all variables named in a RETAIN statement; all variables read with SET, MERGE, or UPDATE; accumulator variables in SUM statements.

Variables Not Retained (reset to missing at the start of each iteration): variables read with an INPUT statement; user-defined variables (other than SUM statement accumulators).
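A short sketch of the difference, assuming hypothetical three-line raw input: TOTAL comes from a SUM statement and is retained, while NAIVE is an ordinary assignment and is reset to missing at the top of every iteration, so it never gets past missing (the log reports that missing values were generated):

```sas
data running;
  input x;
  total + x;             /* SUM statement: TOTAL is retained and starts at 0 */
  naive = naive + x;     /* assignment: NAIVE is reset to missing on every   */
                         /* iteration, so missing + X stays missing          */
  datalines;
1
2
3
;
run;
/* TOTAL: 1, 3, 6    NAIVE: ., ., . */
```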

Type and Length of Variables: determined at compile time by the first reference the compiler sees in the DATA step. Numerics: the length is always 8 bytes during DATA step processing; a shorter numeric length is an output property that affects only how values are stored in the data set.
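The first-reference rule matters most for character variables. In this sketch (hypothetical data set names), STATUS is first assigned the two-character constant 'OK', so the compiler gives it length 2 and 'FAILED' is truncated to 'FA'; declaring the length before first use avoids this:

```sas
data status_truncated;
  do i = 1 to 2;
    if i = 1 then status = 'OK';   /* first reference: STATUS gets length 2 */
    else status = 'FAILED';        /* stored as 'FA'                        */
    output;
  end;
run;

data status_fixed;
  length status $6;                /* length set before the first use */
  do i = 1 to 2;
    if i = 1 then status = 'OK';
    else status = 'FAILED';        /* stored in full */
    output;
  end;
run;
```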

INPUT statement: reading non-SAS data

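Compile Loop and LPDV

    data a;
      put _all_;                        * write LPDV to LOG;
      input idnum
            diagdate :mmddyy8.          * two-digit years use the YEARCUTOFF= option;
            sex $
            rx_grp $10.;                * reads 10 columns, keeping embedded blanks (300 mg.);
      * INTCK counts January 1 boundaries crossed, not completed years;
      time = intck('year', diagdate, today());
      put _all_;                        * write LPDV to LOG;
      cards;
    1 09-09-52 F placebo
    2 11-15-64 M 300 mg.
    3 04-07-48 F 600 mg.
    ;
    run;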

Input buffer, then logical program data vector: building the descriptor portion of the SAS data set

    Variable  idnum    diagdate  sex   rx_grp  time
    Type      numeric  numeric   char  char    numeric
    Length    8        8         8     10      8

Logical program data vector

    Variable  idnum    diagdate  sex   rx_grp  time     _N_   _ERROR_
    Type      numeric  numeric   char  char    numeric
    Length    8        8         8     10      8
    D/K/R*    keep     keep      keep  keep    keep     drop  drop

    *Drop/keep/rename flag
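Because _N_ and _ERROR_ carry a drop flag, they never reach the output data set. A sketch (hypothetical NUMBERED data set) that keeps the loop count by copying _N_ into an ordinary variable, and uses the DROP= data set option to set the drop flag on X:

```sas
data numbered(drop=x);   /* DROP= flags X as drop for the output data set */
  input x;
  obs_no  = _n_;         /* _N_ itself is never written; this copy is     */
  doubled = x * 2;
  datalines;
5
6
;
run;
/* Output variables: OBS_NO, DOUBLED */
```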

Execution of a DATA Step

1. Initialize the LPDV.
2. Read the input file.
3. End of file? If yes: termination, go on to the next step. If no: continue.
4. Process the statements in the step.
5. Implied output; return to 1.
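For comparison, here is a sketch of the read/process/output cycle written out by hand with a DO UNTIL loop around SET (hypothetical SCORES data). All observations are processed within a single DATA step iteration, so _N_ stays 1 and no initialize-to-missing happens between observations; this pattern deliberately departs from the default flow described above:

```sas
data scores;
  input id score;
  datalines;
1 7
2 9
3 4
;
run;

data explicit_loop;
  do until (eof);
    set scores end=eof;   /* EOF becomes 1 when the last observation is read */
    iteration = _n_;      /* stays 1: the DATA step itself iterates once     */
    output;               /* explicit OUTPUT replaces the implied output     */
  end;
  stop;                   /* end the step instead of returning to the top    */
run;
```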

DATA Step Execution: an implied read/write loop, stopped by: no more data to read; an explicit STOP statement; no input data (a step with no INPUT, SET, MERGE, UPDATE, or MODIFY statement executes only once); some execution-time errors.
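Two of the stopping conditions, sketched with hypothetical data set names: a step with no input statement runs exactly once, and a step that reads with SET and POINT= never reaches end of file, so it needs an explicit STOP:

```sas
data scores;
  input id score;
  datalines;
1 7
2 9
3 4
;
run;

/* No INPUT, SET, MERGE, or UPDATE: the step executes exactly once. */
data once;
  pass = _n_;
run;

/* POINT= reads by observation number and never detects end of file, */
/* so without STOP this step would loop indefinitely.                 */
data first_two;
  do p = 1 to 2;
    set scores point=p;   /* P is not written to the output */
    output;
  end;
  stop;
run;
```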

Execution Time Activities: execute initialize-to-missing (ITM); read from the input source; modify data using user-controlled statements; supply values of variables to the LPDV; output the observation to the SAS data set.

Initialization: _N_ is set to the loop count; _ERROR_ is set to 0; user variables that are not retained are set to missing.
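A small sketch of _ERROR_ initialization, assuming hypothetical raw data with one invalid numeric value: _ERROR_ is set to 1 on the iteration that hits the bad value and back to 0 at the start of the next one. SAS also writes the offending input line and the LPDV contents to the log for that iteration:

```sas
data err_demo;
  input x;
  bad = _error_;   /* 1 on the iteration with invalid data, 0 otherwise */
  datalines;
1
abc
3
;
run;
/* BAD: 0, 1, 0    X: 1, ., 3 */
```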

Execution Loop - raw data

    data a;
      put _all_;                        * write LPDV to LOG;
      input idnum
            diagdate :mmddyy8.
            sex $
            rx_grp $10.;
      time = intck('year', diagdate, today());
      put _all_;                        * write LPDV to LOG;
      cards;
    1 09-09-52 F placebo
    2 11-15-64 M 300 mg.
    3 04-07-48 F 600 mg.
    ;
    run;

    proc contents; run;
    proc print; run;

LPDV (over all executions of the DATA step)

    IDNUM  DIAGDATE  SEX  RX_GRP   TIME  _N_
        .         .                   .    1
        1     -2670  F    placebo    48    1
        .         .                   .    2
        2      1780  M    300 mg.    36    2
        .         .                   .    3
        3     -4286  F    600 mg.    52    3
        .         .                   .    4

LOG (TIME is computed from TODAY(), so its values depend on the run date; this log appears to come from a later run, on January 20, 2001, than the tables, which is why TIME is one higher here.)

    2    data a ;
    3    put _all_ ; *write LPDV to LOG;
    4    input idnum
    5          diagdate: mmddyy8.
    6          sex $
    7          rx_grp $ 10. ;
    8    time = intck ('year', diagdate, today() ) ;
    9    put _all_; *write LPDV to LOG;
    10   cards ;

    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=1
    IDNUM=1 DIAGDATE=-2670 SEX=F RX_GRP=placebo TIME=49 _ERROR_=0 _N_=1
    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=2
    IDNUM=2 DIAGDATE=1780 SEX=M RX_GRP=300 mg. TIME=37 _ERROR_=0 _N_=2
    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=3
    IDNUM=3 DIAGDATE=-4286 SEX=F RX_GRP=600 mg. TIME=53 _ERROR_=0 _N_=3
    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=4
    NOTE: The data set WORK.A has 3 observations and 5 variables.
    NOTE: The DATA statement used 0.59 seconds.

    14   run;
    15
    16   proc contents; run;

    NOTE: The PROCEDURE CONTENTS used 0.39 seconds.

PROC CONTENTS

    Data Set Name:  WORK.A                             Observations:          3
    Member Type:    DATA                               Variables:             5
    Engine:         V612                               Indexes:               0
    Created:        11:18 Saturday, January 20, 2001   Observation Length:    42
    Last Modified:  11:18 Saturday, January 20, 2001   Deleted Observations:  0
    Protection:                                        Compressed:            NO
    Data Set Type:                                     Sorted:                NO
    Label:

    -----Engine/Host Dependent Information-----
    Data Set Page Size:        8192
    Number of Data Set Pages:  1
    File Format:               607
    First Data Page:           1
    Max Obs per Page:          194
    Obs in First Data Page:    3

    -----Alphabetic List of Variables and Attributes-----
     #  Variable  Type  Len  Pos
    -----------------------------
     2  DIAGDATE  Num     8    8
     1  IDNUM     Num     8    0
     4  RX_GRP    Char   10   24
     3  SEX       Char    8   16
     5  TIME      Num     8   34

PROC PRINT

    IDNUM  DIAGDATE  SEX  RX_GRP   TIME
        1     -2670  F    placebo    48
        2      1780  M    300 mg.    36
        3     -4286  F    600 mg.    52

SET statement: reading existing SAS data

DATA Step Compile (SET): no input buffer is created. The compiler reads the descriptor portion of the input SAS data set to build the LPDV, which receives the same variables and attributes, plus any new variables created in the step.
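Because the compiler reads the descriptor portion at compile time, a SET statement that never executes still defines its variables in the LPDV. A sketch of the common IF 0 THEN SET idiom, with hypothetical SCORES and SHELL data sets:

```sas
data scores;
  input id score;
  datalines;
1 7
2 9
;
run;

data shell;
  if 0 then set scores;   /* never executes, but the compiler still reads  */
                          /* the descriptor of SCORES and adds ID, SCORE   */
  length comment $20;
  comment = 'structure only';
  output;                 /* one observation: ID and SCORE are missing     */
  stop;
run;
```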

SET: determines which SAS data set is to be read; identifies the next observation to be read; copies variable values to the LPDV.

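Execution Loop - SAS data

    data sas_a;
      put _all_;       * write LPDV to LOG;
      set a;
      tot_rec + 1;     * SUM statement: TOT_REC is retained and starts at 0;
      put _all_;       * write LPDV to LOG;
    run;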

Building the LPDV from the descriptor portion of the old SAS data set, then building the descriptor portion of the new SAS data set

    Variable  idnum    diagdate  sex   rx_grp  time     tot_rec
    Type      numeric  numeric   char  char    numeric  numeric
    Length    8        8         8     10      8        8

LPDV (over all executions of the DATA step)

    IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC  _N_
        .         .                   .        0    1
        1     -2670  F    placebo    48        1    1
        1     -2670  F    placebo    48        1    2
        2      1780  M    300 mg.    36        2    2
        2      1780  M    300 mg.    36        2    3
        3     -4286  F    600 mg.    52        3    3
        3     -4286  F    600 mg.    52        3    4

LOG

    idnum=. diagdate=. sex= rx_grp= time=. tot_rec=0 _ERROR_=0 _N_=1
    idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=1
    idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=2
    idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=2
    idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=3
    idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=3
    idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=4

PROC PRINT

    IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC
        1     -2670  F    placebo    48        1
        2      1780  M    300 mg.    36        2
        3     -4286  F    600 mg.    52        3
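The SUM statement TOT_REC + 1 behaves like a RETAIN statement with an initial value of 0 combined with the SUM function, which ignores missing values. A sketch comparing the two forms on hypothetical SCORES data; both should produce TOT_REC values 1, 2, 3:

```sas
data scores;
  input id score;
  datalines;
1 7
2 9
3 4
;
run;

/* Form 1: SUM statement (implicit RETAIN, initial value 0). */
data with_sum_stmt;
  set scores;
  tot_rec + 1;
run;

/* Form 2: the same behavior spelled out explicitly. */
data with_retain;
  set scores;
  retain tot_rec 0;
  tot_rec = sum(tot_rec, 1);
run;
```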

Logic of a MERGE: compile and execute

    data left;
      input ID X Y;
      cards;
    1 88 99
    2 66 77
    3 44 55
    ;
    run;

    data right;
      input ID A $ B $;
      cards;
    1 A14 B32
    3 A53 B11
    ;
    run;

    proc sort data=left;  by ID; run;
    proc sort data=right; by ID; run;

    data both;
      merge left (in=inleft) right (in=inright);
      by ID;
    run;

Logical program data vector, first iteration: MATCH

    ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
    1   88  99  A14  B32  1       1        1    0

Logical program data vector, second iteration: NO MATCH

    ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
    2   66  77            1       0        2    0

Logical program data vector, third iteration: MATCH

    ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
    3   44  55  A53  B11  1       1        3    0
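The IN= variables exist only in the LPDV (they are dropped from the output), but they can drive output decisions. A sketch that re-creates LEFT and RIGHT, with the third LEFT record taken as ID 3 as the iterations above show, and separates matches from LEFT-only records; MATCHED and LEFT_ONLY are hypothetical names:

```sas
data left;
  input id x y;
  datalines;
1 88 99
2 66 77
3 44 55
;
run;

data right;
  input id a $ b $;
  datalines;
1 A14 B32
3 A53 B11
;
run;

/* Both inputs are already in ID order, as BY processing requires. */
data matched left_only;
  merge left (in=inleft) right (in=inright);
  by id;
  if inleft and inright then output matched;   /* IDs 1 and 3 */
  else if inleft then output left_only;        /* ID 2        */
run;
```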

Let's try this again, with the same LEFT and RIGHT data sets but without a BY statement:

    proc sort data=left;  by ID; run;
    proc sort data=right; by ID; run;

    data both;
      merge left (in=inleft) right (in=inright);
      *** by ID;   *** BY commented out: one-to-one merge;
    run;

Logical program data vector, first iteration: 1:1 "MATCH"

    ID  X   Y   A    B    _N_  _ERROR_
    1   88  99  A14  B32  1    0

ID = 1 is OVERWRITTEN: the value came from data set RIGHT.

Logical program data vector, second iteration: 1:1 "MATCH"

    ID  X   Y   A    B    _N_  _ERROR_
    3   66  77  A53  B11  2    0

ID 2 from LEFT is OVERWRITTEN by 3: the value came from data set RIGHT.

Logical program data vector, third iteration: 1:1 "NO MATCH"

    ID  X   Y   A    B    _N_  _ERROR_
    3   44  55            3    0

A and B are MISSING: there are no values left in RIGHT.

Output SAS data set

    ID  X   Y   A    B
    1   88  99  A14  B32
    3   66  77  A53  B11
    3   44  55
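One way to catch an accidental one-to-one merge is the MERGENOBY= system option. A sketch that re-creates LEFT and RIGHT: with MERGENOBY=WARN the step still runs and produces the overwritten IDs shown in the output above, but the log now carries a warning (MERGENOBY=ERROR would stop the step instead):

```sas
options mergenoby=warn;     /* warn whenever MERGE has no BY statement */

data left;
  input id x y;
  datalines;
1 88 99
2 66 77
3 44 55
;
run;

data right;
  input id a $ b $;
  datalines;
1 A14 B32
3 A53 B11
;
run;

data both;
  merge left right;         /* no BY: one-to-one merge, now flagged in the log */
run;

options mergenoby=nowarn;   /* restore the SAS default */
```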

DATA Step Conclusions: understanding internals and default activities allows you to make informed coding decisions, write flexible and efficient code, debug and test effectively, and interpret results readily.

Remember: we have discussed DEFAULTS. As soon as you add options, statements, features, etc., the default actions change, so TEST them! You can use these same tools (PUT _ALL_, PROC CONTENTS, PROC PRINT) to track what's happening.
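As an example of tracking what happens after changing code, a sketch (hypothetical SCORES and TRACED data sets) that uses PUTLOG, which writes to the log regardless of any FILE statement, with a condition on _N_ to limit how much of the LPDV is traced:

```sas
data scores;
  input id score;
  datalines;
1 7
2 9
3 4
;
run;

data traced;
  set scores;
  if _n_ <= 2 then putlog 'BEFORE SUM: ' _all_;   /* first two iterations only */
  running + score;                                /* SUM statement             */
  if _n_ <= 2 then putlog 'AFTER SUM:  ' _all_;
run;
```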