Published by Ezra Curtis. Modified over 8 years ago.
1
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 8, 13, & 24 By Tasha Chapman, Oregon Health Authority
2
Topics covered…
  DO Loops
    DO Groups
    Sum statement
    Iterative DO loops
    DO Until / DO While
  BY-group Processing
    FIRST. / LAST.
  Arrays
3
Cody’s rules of SAS programming “If you are writing a SAS program, and it is becoming very tedious, stop. There is a good chance that there is a SAS tool that will make your task less tedious.”
4
DO Groups
5
If, Then, Else
  If Score >= 90 Then Grade = 'A';
  ELSE If Score >= 80 Then Grade = 'B';
  ELSE If Score >= 70 Then Grade = 'C';
  ELSE If Score >= 60 Then Grade = 'D';
  ELSE If Score < 60 Then Grade = 'F';

  Student  Score  Grade
  Jane     75     C
  Dave     56     F
  Jack     90     A
  Sue      68     D
6
If, Then, Else
  If Score >= 90 Then Pass_Fail = 'Pass';
  ELSE If Score >= 80 Then Pass_Fail = 'Pass';
  ELSE If Score >= 70 Then Pass_Fail = 'Pass';
  ELSE If Score >= 60 Then Pass_Fail = 'Fail';
  ELSE If Score < 60 Then Pass_Fail = 'Fail';

  Student  Score  Grade  Pass_Fail
  Jane     75     C      Pass
  Dave     56     F      Fail
  Jack     90     A      Pass
  Sue      68     D      Fail
7
If, Then, Else
8
DO Groups
  IF <condition> THEN DO;
     <statement>;
     <statement>;
     <statement>;
  END;

  If Score >= 90 Then Do;
     Grade = 'A';
     Pass_Fail = 'Pass';
  End;
Get all the stuff you need done in just one pass
9
DO Groups DO Groups can be nested within each other
10
DO Groups DO Group #1 DO Group #2
11
DO Groups
  Diagram labels: DO Group #1, DO Group #2, DO Group #A, DO Group #B, DO Group #C
  Each DO Group must begin with a DO; and end with an END;
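The nesting rule above can be sketched with the grade example from the earlier slides. This is only an illustration: the dataset name grades and the variable Honors are hypothetical additions, and the cutoffs follow the If, Then, Else slides.

```sas
data grades;
   input Student $ Score;
   length Grade $ 1 Pass_Fail $ 4 Honors $ 3;
   if Score >= 70 then do;               /* outer DO group */
      Pass_Fail = 'Pass';
      if Score >= 90 then do;            /* nested DO group */
         Grade  = 'A';
         Honors = 'Yes';                 /* hypothetical extra variable */
      end;                               /* closes the nested group */
      else if Score >= 80 then Grade = 'B';
      else Grade = 'C';
   end;                                  /* closes the outer group */
   else do;
      Pass_Fail = 'Fail';
      if Score >= 60 then Grade = 'D';
      else Grade = 'F';
   end;
   datalines;
Jane 75
Dave 56
Jack 90
Sue 68
;
run;
```

Indenting each DO group makes it easier to check that every DO; has its matching END;.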
12
Sum statement
13
Sum statement
  Adds the result of an expression to an accumulator variable
  Allows you to calculate running totals or counters in your dataset
    variable + expression;
14
Sum statement How do we calculate a running total?
15
Sum statement Creates a variable called “Total” (initial value = 0) Adds the value of Revenue for each observation
16
Sum statement Will skip over missing data
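A minimal sketch of the running total described on these slides. The dataset name sales and the revenue figures are made up for illustration; Total and Revenue are the names used on the slide.

```sas
data sales;
   input Day Revenue;
   Total + Revenue;    /* sum statement: Total starts at 0 and is retained across observations */
   datalines;
1 100
2 250
3 .
4 75
;
run;

proc print data=sales noobs;
run;
/* Total is 100, 350, 350, 425: the missing Revenue on day 3 is skipped */
```

Writing Total = Total + Revenue; instead would not work the same way: Total would be reset to missing at the start of each observation, and adding to a missing value yields missing.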
17
Sum statement Can be used with conditional logic
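As a sketch of the sum statement combined with conditional logic, the step below counts the days on which revenue exceeds 100. The counter name Big_Days, the threshold, and the data are hypothetical.

```sas
data sales_count;
   input Day Revenue;
   if Revenue > 100 then Big_Days + 1;   /* counter only increments when the condition holds */
   datalines;
1 100
2 250
3 .
4 75
5 300
;
run;
/* Big_Days is 0, 1, 1, 1, 2 across the five observations */
```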
18
Iterative DO Loops
19
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? Create a variable “Interest” with a value of .0375 (for all observations)
20
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? Create a variable “Balance” with an initial value of 100 (to be modified later by SUM statements)
21
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year?
  Create a variable “Year”
  Add 1 to “Year”
  Add “Interest*Balance” to Balance
  Output: explicit instruction to write out an observation to the dataset
22
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? Ditto
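The step-by-step approach described on the last few slides might look like the sketch below. The dataset name invest and the choice of three years are assumptions; the slide code itself is not in the transcript.

```sas
data invest;
   Interest = .0375;
   Balance  = 100;

   Year + 1;                        /* year 1 */
   Balance + Interest * Balance;
   output;                          /* write an observation for this year */

   Year + 1;                        /* year 2: same three statements again */
   Balance + Interest * Balance;
   output;

   Year + 1;                        /* year 3 */
   Balance + Interest * Balance;
   output;

   format Balance dollar10.2;
run;
```

Repeating the same three statements for every year is exactly the kind of tedium the iterative DO loop removes.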
23
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? …but there’s an easier way…
24
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year?
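A sketch of the same calculation with an iterative DO loop. The three-year stop value and the dataset name invest are assumptions for illustration.

```sas
data invest;
   Interest = .0375;
   Balance  = 100;
   do Year = 1 to 3;                    /* Year is the index variable */
      Balance + Interest * Balance;     /* add one year of interest */
      output;                           /* one observation per year */
   end;
   format Balance dollar10.2;
run;
```

Changing the stop value (for example, 1 to 10) extends the projection without adding any statements.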
25
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year?
26
Iterative DO Loop Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? Nested DO loops
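The slide's nested-loop code is not in the transcript, so the following is only one plausible sketch: it assumes interest is compounded quarterly, with an outer loop over years and an inner loop over quarters. The variable Quarter and the dataset name compound are hypothetical.

```sas
data compound;
   Interest = .0375;
   Balance  = 100;
   do Year = 1 to 3;                          /* outer loop */
      do Quarter = 1 to 4;                    /* inner loop runs 4 times per year */
         Balance + (Interest / 4) * Balance;  /* one quarter of interest */
         output;
      end;                                    /* ends the inner loop */
   end;                                       /* ends the outer loop */
   format Balance dollar10.2;
run;
```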
27
DO Until Want to invest $100 with a 3.75% interest rate. When will the fund balance reach $200? DO UNTIL : Keep running the loop until the condition is true
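A minimal sketch of the DO UNTIL version. The dataset name double is hypothetical, and the condition uses >= rather than an equality test so that the loop is guaranteed to stop.

```sas
data double;
   Interest = .0375;
   Balance  = 100;
   do until (Balance >= 200);           /* condition is checked at the bottom of the loop */
      Year + 1;
      Balance + Interest * Balance;
      output;
   end;
   format Balance dollar10.2;
run;
/* The last observation written is Year 19, the first year the balance passes $200 */
```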
28
DO While Want to invest $100 with a 3.75% interest rate. When will the fund balance reach $200? DO WHILE : Keep running the loop until the condition is false
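The DO WHILE version can be sketched the same way, with the condition reversed. The dataset name double is hypothetical.

```sas
data double;
   Interest = .0375;
   Balance  = 100;
   do while (Balance < 200);            /* condition is checked at the top of the loop */
      Year + 1;
      Balance + Interest * Balance;
      output;
   end;
   format Balance dollar10.2;
run;
```

Because the check happens before each iteration, a DO WHILE loop does not run at all if the condition is already false, whereas a DO UNTIL loop always runs at least once.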
29
DO Loop Whoops When using UNTIL or WHILE, make sure that the loop's exit condition will be met at some point (the UNTIL condition becomes true, or the WHILE condition becomes false). Otherwise you could end up in an infinite loop! Loop will run forever because the balance will never equal exactly 200.
30
DO Loop Whoops When using UNTIL or WHILE, make sure that the loop's exit condition will be met at some point (the UNTIL condition becomes true, or the WHILE condition becomes false). Otherwise you could end up in an infinite loop! Safeguard alternative: the loop will run until the condition is true or 100 times, whichever comes first.
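One way to write the safeguard is to combine an iterative DO with an UNTIL clause, as sketched below. The exact code on the slide is not in the transcript, so the condition and the dataset name double are assumptions.

```sas
data double;
   Interest = .0375;
   Balance  = 100;
   /* Stops when Balance reaches 200 or after 100 iterations, whichever comes first */
   do Year = 1 to 100 until (Balance >= 200);
      Balance + Interest * Balance;
      output;
   end;
   format Balance dollar10.2;
run;
```

Even if the UNTIL condition were mistyped as Balance = 200, the index variable would still end the loop after 100 passes.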
31
A review of DO
  DO group processing
    Designates a group of statements to be executed as a unit
  Iterative DO loop
    Executes statements repetitively based on the value of an index variable
  DO UNTIL
    Executes DO loop until a condition is true
    Checks the condition after each iteration of the DO loop
  DO WHILE
    Executes DO loop until a condition is false
    Checks the condition before each iteration of the DO loop
32
BY-group processing
33
BY statement (PROC Print redux)
  id statement: Assigns an observation ID based on the listed variable (instead of OBS number)
  by statement: Produces a separate section of the report for each BY group
  pageby statement: Creates a page break after each BY group (not shown)
    Must be used with a BY statement
From Week 6 – Chapters 14 & 19
34
BY statement (PROC Print redux) From Week 6 – Chapters 14 & 19
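A short sketch of the three PROC PRINT statements working together. The dataset visits and its values are hypothetical stand-ins; the slide's own example is not in the transcript.

```sas
data visits;
   input ID VisitDate :mmddyy10. HR;
   format VisitDate mmddyy10.;
   datalines;
255 9/1/2005 76
101 10/21/2005 68
255 12/18/2005 74
101 2/25/2006 68
;
run;

proc sort data=visits;     /* BY-group processing needs sorted (or indexed) data */
   by ID;
run;

proc print data=visits;
   id ID;                  /* ID replaces the OBS column */
   by ID;                  /* separate report section per patient */
   pageby ID;              /* page break after each BY group */
   var VisitDate HR;
run;
```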
35
BY statement (MERGE redux) DATA step merge From Week 4 – Chapters 7 & 10
36
BY-group processing
  A BY group is a set of observations with the same BY value
  BY-group processing is a method of processing observations that are grouped by this common value
  Can be invoked in both DATA steps and PROC steps using a BY statement
  Every PROC and DATA step with a BY statement must use a dataset sorted (or indexed) by the BY variable
37
Clinic dataset
  ID   VisitDate   Dx  Dx_Desc         HR  SBP  DBP
  101  10/21/2005  4   GI Problems     68  120  80
  101  2/25/2006   2   Cold            68  122  84
  255  9/1/2005    1   Routine Visit   76  188  100
  255  12/18/2005  1   Routine Visit   74  180  95
  255  2/1/2006    3   Heart Problems  79  210  110
  255  4/1/2006    3   Heart Problems  72  180  88
  303  10/10/2006  1   Routine Visit   72  138  84
  409  9/1/2005    6   Injury          88  142  92
  409  10/2/2005   1   Routine Visit   72  136  90
  409  12/15/2006  1   Routine Visit   68  130  84
  712  4/6/2006    7   Infection       58  118  70
  712  4/15/2006   7   Infection       56  118  72
38
Clinic dataset (same table as on slide 37)
  Multiple visits per patient
39
FIRST. / LAST.
  When using the BY statement, SAS creates two temporary variables: FIRST.var and LAST.var
  These can be used for identifying the first and last observation in a group

  ID   First.ID  Last.ID
  101  1         0
  101  0         1
  255  1         0
  255  0         0
  255  0         0
  255  0         1
  303  1         1
  409  1         0
  409  0         0
  409  0         1
  712  1         0
  712  0         1
40
When was the first visit for each patient? FIRST. / LAST. Observations grouped by patient (ID) with the first visit at the top of the list
41
When was the first visit for each patient? FIRST. / LAST. Using the BY statement will create the temp variables FIRST.ID and LAST.ID
42
When was the first visit for each patient? FIRST. / LAST. The subsetting IF statement will only include the first visit for each patient in the new dataset (Initial_Visit)
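The three steps described on these slides (sort, BY statement, subsetting IF) can be sketched as follows. The data step uses a hypothetical subset of the Clinic dataset; Initial_Visit is the dataset name given on the slide.

```sas
data clinic;
   length Dx_Desc $ 20;
   input ID VisitDate :mmddyy10. Dx_Desc $ &;   /* & lets Dx_Desc contain single blanks */
   format VisitDate mmddyy10.;
   datalines;
255 12/18/2005 Routine Visit
101 10/21/2005 GI Problems
255 9/1/2005 Routine Visit
101 2/25/2006 Cold
303 10/10/2006 Routine Visit
;
run;

proc sort data=clinic;
   by ID VisitDate;        /* earliest visit first within each patient */
run;

data Initial_Visit;
   set clinic;
   by ID;                  /* creates FIRST.ID and LAST.ID */
   if first.ID;            /* subsetting IF keeps only the first visit per patient */
run;
```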
43
Clinic dataset (same table as on slide 37)
  Multiple visits for same issue per patient
44
When was the first visit for each health issue for each patient? FIRST. / LAST. Observations grouped by patient (ID), then diagnosis, with the first visit at the top of the list
45
When was the first visit for each health issue for each patient? FIRST. / LAST. Using the BY statement will create the temp variables FIRST.ID, LAST.ID, FIRST.Dx_Desc, and LAST.Dx_Desc
46
FIRST. / LAST.
  ID   Dx_Desc         First.ID  Last.ID  First.Dx_Desc  Last.Dx_Desc
  101  Cold            1         0        1              1
  101  GI Problems     0         1        1              1
  255  Heart Problems  1         0        1              0
  255  Heart Problems  0         0        0              1
  255  Routine Visit   0         0        1              0
  255  Routine Visit   0         1        0              1
  303  Routine Visit   1         1        1              1
  409  Injury          1         0        1              1
  409  Routine Visit   0         0        1              0
  409  Routine Visit   0         1        0              1
  712  Infection       1         0        1              0
  712  Infection       0         1        0              1
47
When was the first visit for each health issue for each patient? FIRST. / LAST. Using the BY statement will create the temp variables FIRST.ID, LAST.ID, FIRST.Dx_Desc, and LAST.Dx_Desc Subsetting IF statement will only include first visit for each new diagnosis per patient
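A sketch of the two-variable BY group described above. The input rows are a hypothetical subset of the Clinic dataset, and the output dataset name First_Dx is an assumption.

```sas
data clinic;
   length Dx_Desc $ 20;
   input ID VisitDate :mmddyy10. Dx_Desc $ &;
   format VisitDate mmddyy10.;
   datalines;
255 9/1/2005 Routine Visit
255 12/18/2005 Routine Visit
255 2/1/2006 Heart Problems
255 4/1/2006 Heart Problems
303 10/10/2006 Routine Visit
;
run;

proc sort data=clinic;
   by ID Dx_Desc VisitDate;   /* patient, then diagnosis, earliest visit first */
run;

data First_Dx;
   set clinic;
   by ID Dx_Desc;             /* creates FIRST./LAST. flags for both ID and Dx_Desc */
   if first.Dx_Desc;          /* first visit for each diagnosis within each patient */
run;
```

FIRST.Dx_Desc is also set to 1 whenever FIRST.ID is 1, so a new patient always starts a new diagnosis group.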
48
How many visits did each patient have per diagnosis? FIRST. / LAST. Every time a new Dx group is encountered (FIRST.Dx_Desc = 1), N_visits is reset to 0
49
How many visits did each patient have per diagnosis? FIRST. / LAST. For each observation encountered in the group, N_visits is incremented by 1 (using the SUM statement)
50
How many visits did each patient have per diagnosis? FIRST. / LAST. When the last observation in the group is encountered (LAST.Dx_Desc = 1), an observation is written to the new dataset (Count_Visits)
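The reset, increment, and output steps from the last three slides can be sketched together. The input rows are a hypothetical subset of the Clinic dataset; N_visits and Count_Visits are the names used on the slides.

```sas
data clinic;
   length Dx_Desc $ 20;
   input ID VisitDate :mmddyy10. Dx_Desc $ &;
   format VisitDate mmddyy10.;
   datalines;
255 9/1/2005 Routine Visit
255 12/18/2005 Routine Visit
255 2/1/2006 Heart Problems
255 4/1/2006 Heart Problems
303 10/10/2006 Routine Visit
;
run;

proc sort data=clinic;
   by ID Dx_Desc;
run;

data Count_Visits;
   set clinic;
   by ID Dx_Desc;
   if first.Dx_Desc then N_visits = 0;   /* reset at the start of each diagnosis group */
   N_visits + 1;                         /* sum statement: count this observation */
   if last.Dx_Desc then output;          /* one row per patient and diagnosis */
   keep ID Dx_Desc N_visits;
run;
```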
51
Sampling
  BY-group processing can also be used as a quick and dirty way to get a random sample.
  If you need to use a statistically rigorous sampling method, use PROC SURVEYSELECT (part of SAS/STAT).
52
Sampling Need to randomly select 25 records per coder for proofing Creates a dummy variable (X) that generates a random number for every observation
53
Sampling Need to randomly select 25 records per coder for proofing Grouped by Coder_ID and randomly sorted by X
54
Sampling Need to randomly select 25 records per coder for proofing. Every time a new Coder_ID group is encountered, Count is reset to 0. For each observation encountered in the group, Count is incremented by 1.
55
Sampling Need to randomly select 25 records per coder for proofing If the Count is less than or equal to 25 (i.e. the first 25 observations per coder), then the observation is output to the new dataset (“Sample”)
56
Sampling Need to randomly select 25 records per coder for proofing The dummy variables created for this process (X and Count) are dropped from the final dataset
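Putting the sampling steps together might look like the sketch below. The input dataset coded_records, the variable Record_ID, and the seed are hypothetical; X, Count, Coder_ID, and Sample come from the slides. The slides do not say which random-number function was used, so rand('uniform') is an assumption.

```sas
/* Hypothetical input: 3 coders with 100 records each */
data coded_records;
   do Coder_ID = 1 to 3;
      do Record_ID = 1 to 100;
         output;
      end;
   end;
run;

data records;
   set coded_records;
   call streaminit(2024);      /* fixed seed so the sample is reproducible */
   X = rand('uniform');        /* random number for every observation */
run;

proc sort data=records;
   by Coder_ID X;              /* grouped by coder, random order within coder */
run;

data Sample (drop = X Count);  /* helper variables are not kept */
   set records;
   by Coder_ID;
   if first.Coder_ID then Count = 0;
   Count + 1;
   if Count <= 25 then output; /* first 25 records per coder */
run;
```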
57
A review of BY
  BY-group processing can be a useful way of dealing with groups of observations
  Can be used for:
    De-duping observations
    Finding the first or last observation
    Counting or summing observations
    Comparing observations
    Finding a quick and dirty random sample
    …and much more.
58
Arrays
59
SAS arrays are a collection of elements defined as a single group. Arrays allow you to write SAS statements referencing a group of variables. SAS arrays are different from arrays in many other programming languages.
60
Example array Let’s say you have a dataset created in SPSS where all unknown numeric data is set to 999. How do you convert this to a missing value? Performing the same calculation on multiple variables …maybe there’s an easier way…
61
Example array Let’s say you have a dataset created in SPSS where all unknown numeric data is set to 999. How do you convert this to a missing value?
62
Example array Define the array List all the variables you want to perform the manipulation on
63
Example array Do the DO Use an iterative DO loop to run through all seven variables
64
Example array Drop i i is just the temp variable created for the iterative DO loop
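The three steps (define the array, loop over it, drop the index) can be sketched as one DATA step. The variable names and the array name oldvars follow the slides; the dataset names spss_import and fixed and the data values are hypothetical.

```sas
data spss_import;
   input Height Weight Age SBP DBP Temp HR;
   datalines;
68 150 34 120 80 98.6 72
999 180 51 999 999 99.1 68
;
run;

data fixed;
   set spss_import;
   array oldvars{7} Height Weight Age SBP DBP Temp HR;   /* define the array */
   do i = 1 to 7;                                        /* run through all seven variables */
      if oldvars{i} = 999 then oldvars{i} = .;
   end;
   drop i;                                               /* i is only the loop index */
run;
```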
65
Array statement
  ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;
  array-name: specifies the name of the array
    Think of it as an alias for this group of variables
    Cannot be the name of an existing SAS variable in the same DATA step
    Should not be the name of a SAS function
66
Array statement
  ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;
  subscript: describes the number and arrangement of elements in the array
    Dimension-size(s): Explicitly specify the number of elements in the array
    Lower/Upper bounds: Range from 1 to n
    Asterisk: Have SAS count the variables in the array
67
Array statement
  ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;
  $: specifies that the elements in the array are character (optional)
    Useful when the array creates new variables
  length: specifies the length of the elements in the array (optional)
    Useful when the array creates new variables
68
Array statement
  ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;
  array-elements: the elements (variables) that make up the array (optional)
    Must be either all character or all numeric
    Can be listed in any order
    Can use keywords _NUMERIC_, _CHARACTER_, or _ALL_
    Can also use _TEMPORARY_ to create an array of temporary elements
  initial-value-list: initial values for the elements in the array (optional)
69
Array statement
  A simple (and common) array statement looks like this:
    ARRAY array-name {subscript} array-elements;
  array-name: name of the array
  {subscript}: number of elements in the array
  array-elements: list of elements in the array
70
Example array
  Variable name  Array reference
  Height         oldvars{1}
  Weight         oldvars{2}
  Age            oldvars{3}
  SBP            oldvars{4}
  DBP            oldvars{5}
  Temp           oldvars{6}
  HR             oldvars{7}
71
Example array
  if oldvars{1} = 999 then oldvars{1} = .;
is equivalent to
  if Height = 999 then Height = .;
72
More examples of arrays Convert monthly average temperature from Fahrenheit to Celsius
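A sketch of the temperature conversion using two parallel arrays. All dataset, array, and variable names here (temps_f, temps_c, Fahr, Cels, F1-F12, C1-C12) and the data values are hypothetical; the slide's own code is not in the transcript.

```sas
data temps_f;
   input City $ F1-F12;    /* made-up monthly averages in Fahrenheit */
   datalines;
Salem 41 44 48 52 58 64 69 69 64 54 46 41
;
run;

data temps_c;
   set temps_f;
   array Fahr{12} F1-F12;                 /* existing variables */
   array Cels{12} C1-C12;                 /* new variables created by the array */
   do m = 1 to 12;
      Cels{m} = (Fahr{m} - 32) * 5 / 9;   /* Fahrenheit to Celsius */
   end;
   drop m;
run;
```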
73
More examples of arrays If the DART rate is missing at the full NAICS level, impute missing values with the DART rate at the 3-digit NAICS level
74
More examples of arrays Collapse monthly income into quarterly income
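One possible way to collapse twelve monthly values into four quarterly totals with arrays is sketched below. The names monthly, quarterly, Income, Quarterly, Inc1-Inc12, and Q1-Q4 are hypothetical, and the data are made up.

```sas
data monthly;
   input ID Inc1-Inc12;
   datalines;
1 100 100 100 200 200 200 300 300 300 400 400 400
;
run;

data quarterly;
   set monthly;
   array Income{12} Inc1-Inc12;
   array Quarterly{4} Q1-Q4;
   do q = 1 to 4;
      /* months 3q-2, 3q-1, and 3q belong to quarter q */
      Quarterly{q} = sum(Income{3*q - 2}, Income{3*q - 1}, Income{3*q});
   end;
   drop q;
run;
/* For this row, Q1-Q4 are 300, 600, 900, 1200 */
```

The SUM function is used instead of + so that a missing month does not make the whole quarter missing.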
75
* and Dim()
  Use the asterisk {*} as the subscript to have SAS count the elements for you
    Cannot use with an array of temporary elements or multidimensional arrays
  Use the DIM function in the DO Loop to return the stop value by counting the number of elements in the array
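A sketch of {*} and DIM() applied to the 999-to-missing example, so that the code keeps working if variables are added or removed. The dataset names and data values are hypothetical, and using _NUMERIC_ as the element list is an illustrative choice.

```sas
data spss_import;
   input Height Weight Age SBP DBP Temp HR;
   datalines;
68 150 34 120 80 98.6 72
999 180 51 999 999 99.1 68
;
run;

data fixed;
   set spss_import;
   array nums{*} _numeric_;     /* SAS counts the numeric variables defined so far */
   do i = 1 to dim(nums);       /* DIM returns the number of elements */
      if nums{i} = 999 then nums{i} = .;
   end;
   drop i;
run;
```

The index variable i is created after the ARRAY statement, so it is not part of the _NUMERIC_ list.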
76
Creating character variables By default, newly created variables will be numeric Use the $ to denote that they should be character May also need to define the length
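A sketch of an array that creates new character variables, using $ and a length. The names scores, results, Score1-Score3, Result1-Result3, Marks, and Outcome are hypothetical; the cutoff of 70 follows the Pass_Fail slide.

```sas
data scores;
   input Student $ Score1-Score3;
   datalines;
Jane 75 82 68
Dave 56 71 90
;
run;

data results;
   set scores;
   array Marks{3} Score1-Score3;
   array Outcome{3} $ 4 Result1-Result3;   /* $ makes them character, 4 sets the length */
   do i = 1 to 3;
      if Marks{i} >= 70 then Outcome{i} = 'Pass';
      else Outcome{i} = 'Fail';
   end;
   drop i;
run;
```

Without the $, SAS would create Result1-Result3 as numeric variables and the character assignments would fail to store the text.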
77
Temporary arrays You can create a temporary array of values to use during the DO Loop The array only exists for the duration of the DATA step Useful for storing constant values used in calculations
78
Temporary arrays How do you apply a performance bonus to monthly income?
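The slide's code is not in the transcript, so this is only a sketch under assumptions: bonus percentages are stored as constants in a temporary array and applied month by month. The rates, data values, and all names (monthly, with_bonus, Income, BonusPct, Adjusted, Inc1-Inc12, Adj1-Adj12) are hypothetical.

```sas
data monthly;
   input ID Inc1-Inc12;
   datalines;
1 100 100 100 200 200 200 300 300 300 400 400 400
;
run;

data with_bonus;
   set monthly;
   array Income{12} Inc1-Inc12;
   /* Temporary array: constants that are not written to the output dataset */
   array BonusPct{12} _temporary_ (.01 .01 .01 .02 .02 .02 .03 .03 .03 .05 .05 .05);
   array Adjusted{12} Adj1-Adj12;
   do m = 1 to 12;
      Adjusted{m} = Income{m} * (1 + BonusPct{m});
   end;
   drop m;
run;
```

Because the array is temporary, its elements have no variable names, so the {*} subscript cannot be used and the size must be given explicitly.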
79
A review of arrays
  Whenever you need to run a set of variables through the same DATA step manipulations, think arrays!
  Can be used to:
    Read data
    Compare variables
    Create many variables with the same attributes
    Perform repetitive calculations
    Transpose datasets
    …and more!
80
Additional reading
  Summing with SAS
  DO which? Loop, Until, or While?
  The power of the BY statement
  A closer look at FIRST.var and LAST.var
  Arrays made easy: An introduction to arrays and array processing
  Arrays in SAS
  Using SAS Arrays to Manipulate Data
81
For next week… Read chapter 25