Practical Uses of the DOW Loop. Richard Allen, Peak Stat. April 8, 2009.



Introduction: The DOW loop is a powerful technique that moves the DATA step SET statement inside an explicitly coded DO-loop. This gives the programmer complete control over retention of variable values and the population of the Program Data Vector (PDV) by allowing a natural isolation of DO-loop instructions related to a certain break-event. The DOW isolates actions performed before and after the DO-loop from the instructions inside the loop and eliminates the necessity of retaining variables in most applications. In its most basic and well-known form using a DO UNTIL (LAST.ID) construct, it naturally lends itself to BY-processing of grouped data.

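The basic structure of the DOW loop is as follows:

data... ;
  /* actions done before the break-event group is read */ ;
  do until ( break-event ) ;
    set … ;
    by … ;
    /* actions done for each record */ ;
  end ;
  /* actions done after the break-event */ ;
run ;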

There are three separate sections of code in this structure where instructions can be grouped for different types of processing, depending on how you need to handle the data.
1. Between the top of the implied data step loop and the reading of the first record in a by-group, for actions that need to be done before the by-group is processed.
2. Inside the DOW-loop, for actions that need to be done to each record in the by-group.
3. After the last record in the by-group has been processed and before the bottom of the implied data step loop, for actions such as summarizing that need to be done after the by-group is processed.

Here’s a brief description of how this works: The DOW-loop begins with the DO UNTIL statement and takes over control from the traditional implicit DATA step loop. Because the SET statement is inside the DOW-loop, the loop is not exited until the last record for the break-event has been processed. Variables populated inside the loop keep their values while all of the records for the break-event are read, because the implicit DATA step loop, which would reset them to missing, does not iterate in the meantime. A single record for the break-event, containing for example any arrays filled inside the loop, is output at the completion of the DOW-loop.
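The three sections described above can be sketched with a small by-group summary. This is a minimal illustration, not code from the referenced papers; the data set names sales and totals, the variables id, amount, n, total, and mean, and all data values are hypothetical.

```sas
/* Hypothetical input data, already sorted by id */
data sales;
  input id amount;
  datalines;
1 10
1 20
2 5
2 7
2 8
;
run;

data totals (keep=id n total mean);
  /* Section 1: runs once before each by-group is read */
  n = 0;
  total = 0;
  do until (last.id);
    set sales;
    by id;
    /* Section 2: runs once for every record in the by-group */
    n = n + 1;
    total = total + amount;
  end;
  /* Section 3: runs once after the last record of the by-group */
  mean = total / n;
  /* the implicit OUTPUT at the bottom of the step writes one record per id */
run;
```

With these values, totals should hold one record per id: n=2 and total=30 for id 1, n=3 and total=20 for id 2. No RETAIN statement is needed, because n and total are only reset when the implicit loop returns to the top of the step for the next by-group.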

Applications of the DOW: The compact data step structure of the DOW loop lends itself to a number of interesting programming applications. As an introduction to this type of coding we will look at examples of the following applications of the DOW.
- Transposing multiple variables simultaneously
- Change from Baseline Calculations
- LOCF (Last Observation Carried Forward)
- Select and delete all records in a by-group with a certain characteristic using a Double DOW

Example 1: Transposing several variables within a by-group is not an easy task with PROC TRANSPOSE. One would have to perform multiple transposes (or split one transpose output) and a merge. This can all be done in one data step using the DOW-loop. The first Brucken paper has a very nice example of this. I use this technique often to transpose data for presentation purposes as follows: Suppose we have the following summarized data, with one row per Disease, analyte, and Group (the Group, n, and mean values are not shown):

Disease analyte Group n mean
DL HDL
DL HDL
DL HDL
DL LDL
DL LDL
DL LDL
DL Total
DL Total
DL Total
DM A1c
DM A1c
DM A1c

Example 1: We’d like to have the final output look like the following:

Disease analyte n1 n2 n3 Mean1 Mean2 Mean3 Diff13 Diff23
DL HDL
DL LDL
DL Total
DM A1c

If we use proc transpose to do this as follows:

proc transpose data=Stats out=Raw_MeansDiffs_Trans;
  by Disease Analyte;
  var n mean;
run;

We get this output:

Disease analyte _NAME_ COL1 COL2 COL3
DL HDL n
DL HDL mean
DL LDL n
DL LDL mean
DL Total n
DL Total mean
DM A1c n
DM A1c mean

Example 1: The transpose output needs to be split and merged back together by Disease and Analyte to obtain the desired results. However, this can be done in one data step using the following DOW code:

data Raw_MeansDiffs(drop=n Group mean);
  /* explicit dimensions so the arrays can be indexed by Group */
  array _en(3) n1-n3;
  array _mn(3) Mean1-Mean3;
  do until (last.Analyte);
    set Stats;
    by Disease Analyte;
    /* Group (1-3) selects the target column */
    _en(Group)=n;
    _mn(Group)=mean;
  end;
  /* computed once per by-group, after all three groups are loaded */
  Diff13=Mean3-Mean1;
  Diff23=Mean3-Mean2;
  output;
run;

Example 1: One can even get fancy and use a multi-dimensional array inside the DOW loop. The following is a simple sample of how this can be done with the above data.

data Raw_MeansDiffs_2dim(drop=n Group mean);
  array _grp(3,2) n1 Mean1 n2 Mean2 n3 Mean3;
  do until(last.Analyte);
    set Stats;
    by Disease Analyte;
    _grp(Group,1)=n;
    _grp(Group,2)=mean;
  end;
  Diff13=Mean3-Mean1;
  Diff23=Mean3-Mean2;
  output;
run;

Example 2: Change-from-baseline calculations can be simplified considerably using the DOW. The second Brucken paper has a great example of this and explains how stepping through the DOW loop changes the PDV from record to record. Let’s say we have the following blood pressure data. We’d like to calculate the baseline value as the last value on or before the treatment start date, and the change from baseline for all later visits. Most commonly the baseline values would be created in a separate dataset with one record per subject and merged with the original dataset. The change-from-baseline calculation can then be done in the resulting data. This can all be done in one data step using DOW logic.

Example 2: SAMPLE DATA. The BP data set is created in a data step with datalines. It holds one record per subject and visit, with the variables subject, visit, vsdt (visit date), tmtstdt (treatment start date), sysbp, and diabp; the dates are read with the mmddyy10. informat. (The data values are not shown.)

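Example 2: The following is the DOW loop code to perform these change from baseline calculations:

data Change;
  do until(last.subject);
    set BP;
    by subject visit;
    /* on or before treatment start: candidate baseline */
    if vsdt<=tmtstdt then do;
      b_sysbp=sysbp;
      b_diabp=diabp;
    end;
    /* after treatment start: change from the kept baseline */
    else do;
      c_sysbp=sysbp-b_sysbp;
      c_diabp=diabp-b_diabp;
    end;
    /* blank the baseline on records dated strictly before treatment start;
       as written, a baseline survives only from a visit on the start date */
    if vsdt<tmtstdt then do;
      b_sysbp=.;
      b_diabp=.;
    end;
    output;
  end;
run;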
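As written, the code above blanks the baseline on every record dated before the treatment start date, so a subject needs a visit exactly on that date for a baseline to carry forward. The following is a hedged sketch of a variant, not the code from the Brucken paper, that keeps the last value on or before the start date instead. The data set names BP_demo and Change_demo and all data values are hypothetical, and only sysbp is shown to keep it short.

```sas
/* Hypothetical data: subject 1 has no visit on the treatment start date */
data BP_demo;
  input subject visit vsdt :mmddyy10. tmtstdt :mmddyy10. sysbp;
  format vsdt tmtstdt mmddyy10.;
  datalines;
1 1 01/01/2009 01/15/2009 120
1 2 01/10/2009 01/15/2009 124
1 3 01/29/2009 01/15/2009 130
2 1 01/30/2009 01/30/2009 140
2 2 02/13/2009 01/30/2009 135
;
run;

data Change_demo;
  do until (last.subject);
    set BP_demo;
    by subject visit;
    /* each visit on or before treatment start overwrites the baseline,
       so the last such visit wins */
    if vsdt <= tmtstdt then b_sysbp = sysbp;
    /* later visits are compared with the kept baseline */
    else c_sysbp = sysbp - b_sysbp;
    output;
  end;
run;
```

With these values, subject 1 ends with a baseline of 124 and a change of 6 at visit 3, and subject 2 has a baseline of 140 and a change of -5 at visit 2. The trade-off is that pre-treatment records show the running candidate baseline rather than a missing value.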

Example 2: RESULTS: The output data set Change has one record per subject and visit with the following columns (the values are not shown):

subject visit vsdt tmtstdt sysbp diabp b_sysbp b_diabp c_sysbp c_diabp

Example 3: One of the most useful properties of the DOW-loop is its implicit retain, which we can use to greatly simplify LOCF (Last Observation Carried Forward) calculations. The Chakravarthy article is a great source for learning more about this method. Let’s say we have some data with missing values at some visits, and we’d like to fill each missing value in with the most recent non-missing value. Below is some sample data (the data rows are not shown):

data lab;
  input PT:$3. visit labstd;
  cards;
;;;;
run;

Example 3: These are the results we’d like to get, with the most recent value added as the LOCF value at each visit for each subject when that data is missing. The output has the following columns:

PT visit labstd locfLabStd

Example 3: We start by creating a shell dataset for the visits that we should have for each subject. There are multiple ways to do this, including the following:

data ClassData;
  do visit=1 to 8;
    output;
  end;
run;

proc summary data=lab nway classdata=ClassData;
  by pt;
  class visit;
  output out=VisitFrame(drop=_:);
run;

We then use the VisitFrame data set created above, which has each patient (pt) and the visits they should have (visits 1-8), to update the lab data and create the LOCF variable in one data step.

Example 3: The following is the DOW data step that accomplishes this update:

data work.locf;
  do until(last.pt);
    merge VisitFrame Lab;
    by pt visit;
    if missing(labstd)=0 then locfLabStd=labstd;
    output;
  end;
run;

Another possible way to do this without DOW logic is as follows:

data work.locf2;
  merge work.visitFrame work.lab;
  by pt visit;
  retain locfLabStd;
  if missing(labstd)=0 then locfLabStd=labstd;
  if first.pt & missing(labstd) then locfLabStd=.;
run;

Note that when using the DOW it is not necessary to issue a retain statement, and the adjustment for a missing value on a patient's first record is not necessary either.
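The implicit retain can be seen in isolation with a smaller, self-contained sketch. It assumes the missing visits are already present as records with a missing labstd, so the VisitFrame merge is skipped. The data set names lab_demo and locf_demo and all data values are hypothetical.

```sas
/* Hypothetical data: patient 102 starts with a missing value */
data lab_demo;
  input pt :$3. visit labstd;
  datalines;
101 1 5.0
101 2 .
101 3 6.2
101 4 .
102 1 .
102 2 4.1
102 3 .
;
run;

data locf_demo;
  do until (last.pt);
    set lab_demo;
    by pt;
    /* locfLabStd keeps its value across records of the same patient,
       because the implicit DATA step loop does not iterate inside the DOW */
    if not missing(labstd) then locfLabStd = labstd;
    output;
  end;
  /* when the step returns to the top for the next patient,
     locfLabStd is reset to missing automatically */
run;
```

With these values, locfLabStd should be 5.0, 5.0, 6.2, 6.2 for patient 101 and missing, 4.1, 4.1 for patient 102. The first record of patient 102 stays missing rather than inheriting 6.2 from patient 101, which is the adjustment the non-DOW version has to code explicitly.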

Example 4: This example came across SAS-L just a few days ago as an answer from Paul Dorfman to a question about “Removing groups of observations”. It illustrates the use of the double DOW, which was introduced by Howard Schreier.

Question: I have some data that is something like this. I would like to remove all groups of observations that have a missing probability. I need to select and delete the group by date, track, and racenum if probability is missing. So in the sample data I would like to delete all observations that have date= , track=AP, and racenum=4.

data test ;
  input date: yymmdd8. track $ racenum probability ;
  format date yymmdd10. ;
  datalines ;
;
run ;

(The data rows, all for track AP, are not shown.)

Solution:

data nomissprob (drop = _:) ;
  do _n_ = 1 by 1 until (last.racenum) ;
    set test ;
    by date track racenum ;
    if nmiss (probability) then _missprob = 1 ;
  end ;
  do _n_ = 1 to _n_ ;
    set test ;
    if not _missprob then output ;
  end ;
run ;

Note that the first DOW loop reads one by-group, counts its records in _n_, and sets the flag _missprob if any probability in the group is missing. The flag is not reset between the two loops, because both run within the same iteration of the implicit data step loop. The second loop then re-reads the same _n_ records of that by-group from “test” and outputs them only if the flag was not set, so every by-group containing a missing probability is dropped entirely.

Conclusions: The DOW-loop is an extremely powerful programming technique which
1. gives the programmer more control over the input and retention of variables,
2. allows the programmer to create a number of variables for each by-group and output these for the last record of the by-group,
3. can reduce the number of passes through a dataset required by a program, resulting in faster and more efficient programs.
Though these methods may at first seem peculiar, especially to the novice programmer, once they have been understood and mastered they can provide a very simple technique for programming a variety of situations involving by-group processing.

References:
- 2 PROC TRANSPOSEs = 1 DATA Step DOW-Loop, Nancy Brucken, PharmaSUG 2007 Proceedings
- One-Step Change from Baseline Calculations, Nancy Brucken, PharmaSUG 2008 Proceedings
- The DOW (not that DOW!!!) and the LOCF in Clinical Trials, Venky Chakravarthy, SUGI 28 Proceedings
- The DOW loop unrolled, Paul Dorfman & Lessia Shajenko, PharmaSUG 2008 Proceedings