SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 8, 13, & 24 By Tasha Chapman, Oregon Health Authority
Topics covered: DO loops; DO groups; sum statement; iterative DO loops; DO UNTIL / DO WHILE; BY-group processing; FIRST. / LAST.; arrays
Cody’s rules of SAS programming “If you are writing a SAS program, and it is becoming very tedious, stop. There is a good chance that there is a SAS tool that will make your task less tedious.”
DO Groups
If, Then, Else
If Score >= 90 Then Grade = 'A';
ELSE If Score >= 80 Then Grade = 'B';
ELSE If Score >= 70 Then Grade = 'C';
ELSE If Score >= 60 Then Grade = 'D';
ELSE If Score < 60 Then Grade = 'F';
Student  Score  Grade
Jane     75     C
Dave     56     F
Jack     90     A
Sue      68     D
If, Then, Else
If Score >= 90 Then Pass_Fail = 'Pass';
ELSE If Score >= 80 Then Pass_Fail = 'Pass';
ELSE If Score >= 70 Then Pass_Fail = 'Pass';
ELSE If Score >= 60 Then Pass_Fail = 'Fail';
ELSE If Score < 60 Then Pass_Fail = 'Fail';
Student  Score  Grade  Pass_Fail
Jane     75     C      Pass
Dave     56     F      Fail
Jack     90     A      Pass
Sue      68     D      Fail
If, Then, Else
DO Groups: get all the stuff you need done in just one pass.
Syntax: IF condition THEN DO; statement; statement; statement; END;
Example: If Score >= 90 Then Do; Grade = 'A'; Pass_Fail = 'Pass'; End;
DO Groups: DO groups can be nested within each other.
DO Groups (diagram of nested groups; labels: DO Group #1, DO Group #2)
DO Groups (diagram of nested groups; labels: DO Group #1, DO Group #2, DO Group #A, DO Group #B, DO Group #C). Each DO group must begin with a DO; and end with an END;
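A minimal sketch of nested DO groups, reusing the Score / Grade / Pass_Fail example from the earlier slides. The Honors variable and the dataset name are hypothetical, added only so the inner group has more than one statement. Note that every DO; has a matching END;.

```sas
data grades;
    length Grade $ 1 Pass_Fail $ 4 Honors $ 3;
    input Student $ Score;
    if Score >= 70 then do;                 /* outer DO group */
        Pass_Fail = 'Pass';
        if Score >= 90 then do;             /* inner DO group */
            Grade  = 'A';
            Honors = 'Yes';                 /* hypothetical extra variable */
        end;                                /* closes the inner group */
        else if Score >= 80 then Grade = 'B';
        else Grade = 'C';
    end;                                    /* closes the outer group */
    else do;
        Pass_Fail = 'Fail';
        if Score >= 60 then Grade = 'D';
        else Grade = 'F';
    end;
    datalines;
Jane 75
Dave 56
Jack 90
Sue 68
;
run;
```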
Sum statement
Adds the result of an expression to an accumulator variable. Allows you to calculate running totals or counters in your dataset.
Syntax: variable + expression;
Sum statement: How do we calculate a running total?
Sum statement: Creates a variable called “Total” (initial value = 0). Adds the value of Revenue for each observation.
Sum statement: Will skip over missing data.
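A small sketch of the running total described above. Total and Revenue are the names from the slides; the dataset name, the Day variable, and the data values are made up for illustration.

```sas
data sales;
    input Day Revenue;
    /* Sum statement: Total starts at 0, keeps its value from one
       observation to the next, and ignores a missing Revenue */
    Total + Revenue;
    datalines;
1 100
2 250
3 .
4 50
;
run;

proc print data=sales noobs;
run;
```

With these values Total should read 100, 350, 350, 400: the missing Revenue on day 3 leaves the running total unchanged instead of setting it to missing.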
Sum statement: Can be used with conditional logic.
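One way this might look: a counter that only increments when a condition is met. The variable BigDays, the cutoff of 100, and the data are all assumptions for the sake of the example.

```sas
data sales_count;
    input Day Revenue;
    /* Counter: add 1 only on days with Revenue of at least 100 */
    if Revenue >= 100 then BigDays + 1;
    datalines;
1 100
2 250
3 .
4 50
;
run;
```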
Iterative DO Loops
Iterative DO Loop: Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? Create a variable “Interest” with a value of .0375 (for all observations).
Iterative DO Loop: Create a variable “Balance” with an initial value of 100 (to be modified later by sum statements).
Iterative DO Loop: Create a variable “Year”. Add 1 to “Year”. Add “Interest*Balance” to “Balance”. OUTPUT is an explicit instruction to write out an observation to the dataset.
Iterative DO Loop: Ditto (the same statements are repeated for each additional year).
Iterative DO Loop: …but there’s an easier way…
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year?
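A sketch of the investment question written as an iterative DO loop. Interest, Balance, and Year are the variable names from the slides; the dataset name and the choice of 5 years are assumptions.

```sas
data compound;
    Interest = .0375;
    Balance  = 100;
    do Year = 1 to 5;                   /* index variable runs 1, 2, ..., 5 */
        Balance + Interest*Balance;     /* sum statement adds this year's interest */
        output;                         /* write one observation per year */
    end;
    format Balance dollar10.2;
run;
```

The loop replaces the repeated "add 1 to Year, add interest, output" statements, so changing the number of years means changing only the stop value.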
Iterative DO Loop: Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? DO loops can also be nested.
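A possible nested version, assuming (this is not stated on the slide) that interest is compounded quarterly: the inner loop runs four times for every pass of the outer loop. The dataset name and the 3-year horizon are also assumptions.

```sas
data quarterly;
    Interest = .0375;
    Balance  = 100;
    do Year = 1 to 3;                           /* outer loop: years */
        do Quarter = 1 to 4;                    /* inner loop: quarters */
            Balance + (Interest/4)*Balance;     /* a quarter of the annual rate */
        end;
        output;                                 /* one observation per year */
    end;
    format Balance dollar10.2;
    drop Quarter;
run;
```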
DO Until: Want to invest $100 with a 3.75% interest rate. When will the fund balance reach $200? DO UNTIL: keep running the loop until the condition is true.
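A sketch of the DO UNTIL version, using the variable names from the earlier slides (the dataset name is an assumption). The condition is checked at the bottom of the loop, so the body always runs at least once.

```sas
data double_until;
    Interest = .0375;
    Balance  = 100;
    do until (Balance >= 200);          /* stop once the balance has doubled */
        Year + 1;
        Balance + Interest*Balance;
        output;
    end;
    format Balance dollar10.2;
run;
```

At 3.75% the balance first passes $200 in year 19, so the last observation written should have Year = 19.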
DO While: Want to invest $100 with a 3.75% interest rate. When will the fund balance reach $200? DO WHILE: keep running the loop while the condition is true (that is, until it becomes false).
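The same question as a DO WHILE sketch (dataset name assumed). The condition is the logical opposite of the UNTIL condition and is checked at the top of the loop, so the body is skipped entirely if the condition starts out false.

```sas
data double_while;
    Interest = .0375;
    Balance  = 100;
    do while (Balance < 200);           /* keep going while still under $200 */
        Year + 1;
        Balance + Interest*Balance;
        output;
    end;
    format Balance dollar10.2;
run;
```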
DO Loop Whoops: When using UNTIL or WHILE, make sure that your stopping condition is met at some point (the UNTIL condition becomes true, or the WHILE condition becomes false). Otherwise you could end up in an infinite loop! In the example, the loop will run forever because the balance will never equal exactly 200.
DO Loop Whoops: Safeguard alternative: combine an iterative DO with UNTIL. The loop will run until the condition is true or 100 times, whichever comes first.
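A sketch of the safeguard: an iterative DO with an UNTIL clause attached. The stop value of 100 comes from the slide; the dataset name is an assumption.

```sas
data double_safe;
    Interest = .0375;
    Balance  = 100;
    /* Stops when Balance reaches 200 OR after 100 iterations,
       whichever comes first */
    do Year = 1 to 100 until (Balance >= 200);
        Balance + Interest*Balance;
        output;
    end;
    format Balance dollar10.2;
run;
```

Even if the condition were mistyped as Balance = 200, the index variable would still end the loop after 100 passes.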
A review of DO
DO group processing: designates a group of statements to be executed as a unit.
Iterative DO loop: executes statements repetitively based on the value of an index variable.
DO UNTIL: executes the DO loop until a condition is true. Checks the condition after each iteration, so the loop always runs at least once.
DO WHILE: executes the DO loop until a condition is false. Checks the condition before each iteration, so the loop may not run at all.
BY-group processing
BY statement (PROC Print redux)
ID statement: identifies each observation by the listed variable (instead of the OBS number).
BY statement: produces a separate section of the report for each BY group.
PAGEBY statement: creates a page break after each BY group (not shown). Must be used with a BY statement.
From Week 6 – Chapters 14 & 19
BY statement (PROC Print redux) From Week 6 – Chapters 14 & 19
BY statement (MERGE redux) DATA step merge From Week 4 – Chapters 7 & 10
BY-group processing: A BY group is a set of observations with the same BY value. BY-group processing is a method of processing observations that are grouped by this common value. It can be invoked in both DATA steps and PROC steps using a BY statement. Every PROC and DATA step with a BY statement must use a dataset sorted (or indexed) by the BY variable.
Clinic dataset
ID   VisitDate    Dx  Dx_Desc          HR  SBP  DBP
101  10/21/2005   4   GI Problems      …   …    …
…    …/25/2006    2   Cold             …   …    …
…    …/1/2005     1   Routine Visit    …   …    …
…    …/18/2005    1   Routine Visit    …   …    …
…    …/1/2006     3   Heart Problems   …   …    …
…    …/1/2006     3   Heart Problems   …   …    …
…    …/10/2006    1   Routine Visit    …   …    …
…    …/1/2005     6   Injury           …   …    …
…    …/2/2005     1   Routine Visit    …   …    …
…    …/15/2006    1   Routine Visit    …   …    …
…    …/6/2006     7   Infection        …   …    …
…    …/15/2006    7   Infection        …   …    …
Clinic dataset (same table as above): multiple visits per patient.
FIRST. / LAST.: When using the BY statement, SAS creates two temporary variables: FIRST.var and LAST.var. These can be used for identifying the first and last observation in a group. (Table columns on the slide: ID, First.ID, Last.ID.)
FIRST. / LAST.: When was the first visit for each patient? Observations are grouped by patient (ID), with the first visit at the top of the list.
FIRST. / LAST.: Using the BY statement will create the temp variables FIRST.ID and LAST.ID.
FIRST. / LAST.: The subsetting IF statement will only include the first visit for each patient in the new dataset (Initial_Visit).
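The three steps above can be sketched as follows. Initial_Visit, ID, and VisitDate are names from the slides; the small clinic dataset here is made up for illustration and is not the class data.

```sas
data clinic;
    input ID VisitDate :mmddyy10.;
    format VisitDate mmddyy10.;
    datalines;
101 10/21/2005
102 02/25/2006
102 08/01/2005
103 11/18/2005
103 03/10/2006
;
run;

/* Group by patient, earliest visit first */
proc sort data=clinic;
    by ID VisitDate;
run;

data Initial_Visit;
    set clinic;
    by ID;              /* creates FIRST.ID and LAST.ID */
    if first.ID;        /* subsetting IF: keep only the first row per patient */
run;
```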
Clinic dataset (same table as above): multiple visits for the same issue per patient.
FIRST. / LAST.: When was the first visit for each health issue for each patient? Observations are grouped by patient (ID), then diagnosis, with the first visit at the top of the list.
FIRST. / LAST.: Using the BY statement will create the temp variables FIRST.ID, LAST.ID, FIRST.Dx_Desc, and LAST.Dx_Desc.
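FIRST. / LAST.
ID   Dx_Desc          First.ID  Last.ID  First.Dx_Desc  Last.Dx_Desc
101  GI Problems      …         …        …              …
…    Cold             …         …        …              …
…    Heart Problems   …         …        …              …
…    Heart Problems   …         …        …              …
…    Routine Visit    …         …        …              …
…    Routine Visit    …         …        …              …
…    Routine Visit    …         …        …              …
…    Injury           …         …        …              …
…    Routine Visit    …         …        …              …
…    Routine Visit    …         …        …              …
…    Infection        …         …        …              …
…    Infection        0         1        0              1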
FIRST. / LAST.: When was the first visit for each health issue for each patient? The subsetting IF statement will only include the first visit for each new diagnosis per patient.
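A sketch with two BY variables. The variable names follow the slides; the data values and the output dataset name First_Dx_Visit are assumptions.

```sas
data clinic;
    length Dx_Desc $ 20;
    input ID VisitDate :mmddyy10. Dx_Desc $ &;   /* & allows embedded blanks */
    format VisitDate mmddyy10.;
    datalines;
101 10/21/2005 GI Problems
102 03/01/2006 Heart Problems
102 06/01/2006 Heart Problems
102 08/01/2005 Routine Visit
103 11/18/2005 Routine Visit
103 03/10/2006 Routine Visit
;
run;

/* Group by patient, then diagnosis, earliest visit first */
proc sort data=clinic;
    by ID Dx_Desc VisitDate;
run;

data First_Dx_Visit;            /* hypothetical dataset name */
    set clinic;
    by ID Dx_Desc;              /* creates FIRST./LAST. for both variables */
    if first.Dx_Desc;           /* first visit for each diagnosis per patient */
run;
```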
FIRST. / LAST.: How many visits did each patient have per diagnosis? Every time a new Dx group is encountered (FIRST.Dx_Desc = 1), N_visits is reset to 0.
FIRST. / LAST.: For each observation encountered in the group, N_visits is incremented by 1 (using the sum statement).
FIRST. / LAST.: When the last observation in the group is encountered (LAST.Dx_Desc = 1), an observation is written to the new dataset (Count_Visits).
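The reset / increment / output pattern above, sketched with made-up data. N_visits and Count_Visits are the names from the slides.

```sas
data clinic;
    length Dx_Desc $ 20;
    input ID Dx_Desc $ &;
    datalines;
101 GI Problems
102 Heart Problems
102 Heart Problems
102 Routine Visit
103 Routine Visit
103 Routine Visit
;
run;

proc sort data=clinic;
    by ID Dx_Desc;
run;

data Count_Visits;
    set clinic;
    by ID Dx_Desc;
    if first.Dx_Desc then N_visits = 0;     /* reset at the start of each group */
    N_visits + 1;                           /* count every observation */
    if last.Dx_Desc then output;            /* one row per patient per diagnosis */
run;
```

With this data, patient 102 should get two rows (Heart Problems with N_visits = 2, Routine Visit with N_visits = 1).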
Sampling: BY-group processing can also be used as a quick and dirty way to get a random sample. If you need to use a statistically rigorous sampling method, use PROC SURVEYSELECT (part of SAS/STAT).
Sampling: Need to randomly select 25 records per coder for proofing. Create a dummy variable (X) that holds a random number for every observation.
Sampling: Group by Coder_ID and sort randomly by X.
Sampling: Every time a new Coder_ID group is encountered, Count is reset to 0. For each observation encountered in the group, Count is incremented by 1.
Sampling: If Count is less than or equal to 25 (i.e. the first 25 observations per coder), then the observation is output to the new dataset (“Sample”).
Sampling: The dummy variables created for this process (X and Count) are dropped from the final dataset.
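A sketch of the whole sampling recipe. X, Count, Coder_ID, and Sample come from the slides; the input dataset (2 coders with 40 records each), the Record_ID variable, and the seed are assumptions, and RAND('uniform') is just one way to generate the random number.

```sas
/* Hypothetical input: 2 coders, 40 records each */
data records;
    do Coder_ID = 1 to 2;
        do Record_ID = 1 to 40;
            output;
        end;
    end;
run;

data records_x;
    set records;
    call streaminit(2024);      /* assumed seed, for a repeatable sample */
    X = rand('uniform');        /* random number for every observation */
run;

/* Group by coder, random order within each coder */
proc sort data=records_x;
    by Coder_ID X;
run;

data Sample;
    set records_x;
    by Coder_ID;
    if first.Coder_ID then Count = 0;
    Count + 1;
    if Count <= 25 then output;     /* first 25 per coder */
    drop X Count;
run;
```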
A review of BY
BY-group processing can be a useful way of dealing with groups of observations. It can be used for:
De-duping observations
Finding the first or last observation
Counting or summing observations
Comparing observations
Finding a quick and dirty random sample
…and much more.
Arrays
SAS arrays are a collection of elements defined as a single group. Arrays allow you to write SAS statements referencing a group of variables. SAS arrays are different from arrays in many other programming languages.
Example array: Let’s say you have a dataset created in SPSS where all unknown numeric data is set to 999. How do you convert this to a missing value? One option is performing the same calculation on multiple variables …maybe there’s an easier way…
Example array Let’s say you have a dataset created in SPSS where all unknown numeric data is set to 999. How do you convert this to a missing value?
Example array: Define the array. List all the variables you want to perform the manipulation on.
Example array: Do the DO. Use an iterative DO loop to run through all seven variables.
Example array: Drop i. i is just the temp variable created for the iterative DO loop.
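The three steps above (define the array, do the DO, drop i) can be sketched as below. The array name oldvars and the seven variable names come from the slides; the dataset names and data values are made up.

```sas
data vitals;
    input Height Weight Age SBP DBP Temp HR;
    datalines;
68 999 45 120 80 98.6 72
999 150 999 130 999 99.1 999
;
run;

data vitals_clean;
    set vitals;
    array oldvars{7} Height Weight Age SBP DBP Temp HR;   /* define the array */
    do i = 1 to 7;                                        /* do the DO */
        if oldvars{i} = 999 then oldvars{i} = .;
    end;
    drop i;                                               /* drop the index variable */
run;
```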
Array statement: ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;
array-name: specifies the name of the array. Think of it as an alias for this group of variables. Cannot be the name of an existing SAS variable in the same DATA step. Should not be the name of a SAS function.
Array statement, subscript: describes the number and arrangement of elements in the array. Dimension-size(s): explicitly specify the number of elements in the array. Lower/upper bounds: range from 1 to n. Asterisk: have SAS count the variables in the array.
Array statement, $: specifies that the elements in the array are character (optional). Useful when the array creates new variables. length: specifies the length of the elements in the array (optional). Useful when the array creates new variables.
Array statement, array-elements: the elements (variables) that make up the array (optional). Must be either all character or all numeric. Can be listed in any order. Can use keywords _NUMERIC_, _CHARACTER_, or _ALL_. Can also use _TEMPORARY_ to create an array of temporary elements. initial-value-list: initial values for the elements in the array (optional).
Array statement: A simple (and common) array statement looks like this: ARRAY array-name {subscript} array-elements;
array-name: name of the array. subscript: number of elements in the array. array-elements: list of elements in the array.
Example array
Variable name   Array reference
Height          oldvars{1}
Weight          oldvars{2}
Age             oldvars{3}
SBP             oldvars{4}
DBP             oldvars{5}
Temp            oldvars{6}
HR              oldvars{7}
Example array: these two statements do the same thing:
if oldvars{1} = 999 then oldvars{1} = .;
if Height = 999 then Height = .;
More examples of arrays Convert monthly average temperature from Fahrenheit to Celsius
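A sketch of the temperature conversion using two arrays, one for the existing Fahrenheit variables and one that creates the Celsius variables. The variable names F1-F12 and C1-C12, the dataset names, and the data values are all assumptions.

```sas
data temps_f;
    input F1-F12;
    datalines;
39 43 48 53 60 66 72 72 66 55 45 39
;
run;

data temps_c;
    set temps_f;
    array fahr{12} F1-F12;      /* existing variables */
    array cels{12} C1-C12;      /* new numeric variables created by the array */
    do i = 1 to 12;
        cels{i} = (fahr{i} - 32) * 5/9;
    end;
    drop i;
run;
```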
More examples of arrays: If the DART rate is missing at the full NAICS level, impute missing values with the DART rate at the 3-digit NAICS level.
More examples of arrays Collapse monthly income into quarterly income
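One way the collapse might be written: for quarter q, add up months 3q-2, 3q-1, and 3q. The variable names Inc1-Inc12 and Q1-Q4, the dataset names, and the data are assumptions.

```sas
data monthly_income;
    input Inc1-Inc12;
    datalines;
100 100 100 200 200 200 300 300 300 400 400 400
;
run;

data quarterly_income;
    set monthly_income;
    array mon{12} Inc1-Inc12;
    array qtr{4}  Q1-Q4;
    do q = 1 to 4;
        /* months 1-3 go to Q1, months 4-6 to Q2, and so on */
        qtr{q} = sum(mon{3*q-2}, mon{3*q-1}, mon{3*q});
    end;
    drop q;
run;
```

With this data Q1 through Q4 should come out as 300, 600, 900, and 1200.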
* and DIM(): Use the asterisk {*} as the subscript to have SAS count the elements for you. Cannot be used with an array of temporary elements or with multidimensional arrays. Use the DIM function in the DO loop to return the stop value by counting the number of elements in the array.
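A sketch of {*} together with DIM(), applied to the 999-to-missing example. Using _NUMERIC_ as the element list is an assumption made here to show why letting SAS count is handy; the dataset names and data are made up.

```sas
data vitals;
    input Height Weight Age SBP DBP Temp HR;
    datalines;
68 999 45 120 80 98.6 72
999 150 999 130 999 99.1 999
;
run;

data vitals_clean;
    set vitals;
    array nums{*} _numeric_;        /* SAS counts the numeric variables read so far */
    do i = 1 to dim(nums);          /* DIM returns the number of elements */
        if nums{i} = 999 then nums{i} = .;
    end;
    drop i;
run;
```

Because the stop value comes from DIM, the step keeps working if numeric variables are later added to or removed from the input dataset.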
Creating character variables: By default, newly created variables will be numeric. Use the $ to denote that they should be character. May also need to define the length.
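A sketch of an array that creates new character variables, with both the $ and a length. The variable names, the cutoff of 70, and the data are assumptions.

```sas
data results;
    input Score1-Score3;
    array sc{3}  Score1-Score3;
    array res{3} $ 4 Result1-Result3;   /* $ = character, 4 = length */
    do i = 1 to 3;
        if sc{i} >= 70 then res{i} = 'Pass';
        else res{i} = 'Fail';
    end;
    drop i;
    datalines;
75 56 90
68 82 71
;
run;
```

Without the $, SAS would create Result1-Result3 as numeric and the assignments of 'Pass' and 'Fail' would fail to convert.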
Temporary arrays: You can create a temporary array of values to use during the DO loop. The array only exists for the duration of the DATA step. Useful for storing constant values used in calculations.
Temporary arrays How do you apply a performance bonus to monthly income?
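A possible sketch, assuming the bonus is a fixed percentage that differs by month. Only three months are shown to keep it short; the bonus rates, variable names, and data are all made up. The bonus array holds constants only, so its elements never appear in the output dataset.

```sas
data income_bonus;
    input Inc1-Inc3;
    array inc{3}   Inc1-Inc3;
    array bonus{3} _temporary_ (.05 .10 .00);   /* assumed bonus rates */
    array tot{3}   Total1-Total3;
    do i = 1 to 3;
        tot{i} = inc{i} * (1 + bonus{i});
    end;
    drop i;
    datalines;
1000 1000 1000
2000 2500 3000
;
run;
```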
A review of arrays
Whenever you need to run a set of variables through the same DATA step manipulations, think arrays! Can be used to:
Read data
Compare variables
Create many variables with the same attributes
Perform repetitive calculations
Transpose datasets
…and more!
Additional reading
Summing with SAS
DO which? Loop, Until, or While?
The power of the BY statement
A closer look at FIRST.var and LAST.var
Arrays made easy: An introduction to arrays and array processing
Arrays in SAS
Using SAS Arrays to Manipulate Data
For next week… Read chapter 25