Published by Marylou Nichols. Modified over 8 years ago.
1
The Worst Data Step Traps and Pitfalls
How to Recognize Them
How to Avoid Them
2
Focus on Pitfalls that …
- Produce wrong/unintended results
- Issue no WARNING or ERROR
- Are likely to be encountered in data step programming
3
Scope (Exclusions)
Just data step traps/pitfalls, not covering:
- PROC SQL traps/pitfalls
- Macro traps/pitfalls
- Other proc specific traps/pitfalls
4
Pitfalls with Missing Values
5
Missing Compares Low
Aspect: Specifics
Mistake: You do a < or <= compare and don’t account for missing comparing low
Result: Incorrect branch of logic taken; missing becomes non-missing when it shouldn’t
Run into: Dichotomization, recoding variables
Protection: Always start an IF/ELSE/IF block with “IF missing(X) then …”
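A minimal sketch of this trap, assuming a hypothetical dataset `scores` with a numeric variable `x` (the dataset and the variable names `group_bad` and `group_ok` are illustrative, not from the slides). Because a missing value sorts below every number, it satisfies `x < 10` and is recoded as if it were a real low value:

```sas
data scores;
  input id x;
  datalines;
1 5
2 15
3 .
;
run;

data recoded;
  set scores;
  length group_bad group_ok $ 8;

  /* Trap: missing x compares low, so id 3 is silently coded 'low' */
  if x < 10 then group_bad = 'low';
  else group_bad = 'high';

  /* Protection: test for missing first, then do the compares */
  if missing(x) then group_ok = ' ';
  else if x < 10 then group_ok = 'low';
  else group_ok = 'high';
run;
```

For id 3, `group_bad` ends up as 'low' while `group_ok` stays blank.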
6
Missing Is FALSE
Aspect: Specifics
Mistake: You do an IF X THEN … and expect X to evaluate TRUE when X is missing. (Missing certainly isn’t zero!)
Result: Incorrect branch of logic taken, wrong action applied
Run into: From time to time
Protection: Always use explicit compares, e.g. IF X ne 0 rather than IF X
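The difference can be seen in a small sketch (the variable `x` and the messages are illustrative). A bare `IF X` treats both zero and missing as false, while the explicit compare `X ne 0` is true for a missing value:

```sas
data _null_;
  x = .;

  /* Bare test: missing is treated as FALSE, so the ELSE branch runs */
  if x then put 'IF X: true branch';
  else put 'IF X: false branch';

  /* Explicit compare: missing is not equal to 0, so the THEN branch runs */
  if x ne 0 then put 'IF X ne 0: true branch';
  else put 'IF X ne 0: false branch';
run;
```

Writing the compare explicitly forces a decision about which branch a missing value should take.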
7
Comparing to “.”
Aspect: Specifics
Mistake: You forget about the possibility of special missing values being in the data
Result: .A, .B, … do not compare equal to “.”, so the intended action is skipped for them
Run into: Whenever special missing values are or might be used
Protection: Just always use “MISSING(X)” in place of “X eq .” => always!
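As a small illustration (the variable `x` and the messages are made up for this sketch), a special missing value such as `.A` fails the `eq .` test but is caught by `MISSING()`:

```sas
data _null_;
  x = .A;

  /* .A is a missing value, but it is not equal to the ordinary missing "." */
  if x eq . then put 'x eq . : true';
  else put 'x eq . : false';

  /* MISSING() is true for ".", "._" and ".A" through ".Z" */
  if missing(x) then put 'missing(x): true';
  else put 'missing(x): false';
run;
```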
8
Pitfalls with FIRST/LAST
9
Failing to RETAIN
Aspect: Specifics
Mistake: You use FIRST and LAST to summarize data across a BY group but the summary variable is not RETAINed
Result: Value of the variable is based on just the last obs in the BY group, so you get wrong answers
Run into: Dichotomization, recoding variables
Protection: Always have a RETAIN statement when using FIRST/LAST (at least in concept)
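A sketch of the failure and the fix, assuming a hypothetical dataset `sales` already sorted by `region` (all names here are illustrative). Without RETAIN, `total` is reset to missing at the top of every iteration, so only the last observation of each group contributes:

```sas
data sales;
  input region $ amount;
  datalines;
East 10
East 20
West 5
West 7
;
run;

/* Trap: total is not retained, so it holds only the last amount per region */
data totals_bad;
  set sales;
  by region;
  if first.region then total = 0;
  total = sum(total, amount);
  if last.region then output;   /* East 20, West 7 */
run;

/* Protection: RETAIN carries total across observations within the group */
data totals_ok;
  set sales;
  by region;
  retain total;
  if first.region then total = 0;
  total = sum(total, amount);
  if last.region then output;   /* East 30, West 12 */
run;
```

A sum statement (`total + amount;`) retains implicitly, which is presumably what "at least in concept" allows for.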
10
FIRST/LAST Variable Confusion
Aspect: Specifics
Mistake: You have data “BY state city;” and use FIRST.city when you should use FIRST.state
Result: Population for the last city in each state rather than the intended population sums of all cities in each state
Run into: Exposure whenever using FIRST/LAST with multiple BY variables
Protection: FIRST means first of a group; ask yourself: What variable defines the group?
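A minimal sketch with a hypothetical dataset `cities`, sorted by `state` and `city` (the data values and the name `state_total` are invented for illustration). The group being summed is the state, so the reset and the output are keyed on `first.state` and `last.state`:

```sas
data cities;
  input state $ city $ pop;
  datalines;
NC Apex 10
NC Cary 20
VA Reston 5
VA Vienna 7
;
run;

data state_pop;
  set cities;
  by state city;
  retain state_total;

  /* Trap would be "if first.city then state_total = 0;",      */
  /* which resets the sum at every city instead of every state */
  if first.state then state_total = 0;
  state_total = sum(state_total, pop);

  if last.state then output;    /* NC 30, VA 12 */
  keep state state_total;
run;
```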
11
FIRST/LAST with Subsetting IF
Aspect: Specifics
Mistake: Using FIRST/LAST after a subsetting IF
Result: Too few output obs, with some values of summary stats right and some wrong
Run into: When selecting and summarizing at the same time
Protection: Use WHERE/WHERE= instead of subsetting IF
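The following sketch uses a hypothetical dataset `visits`, sorted by `id` (all names and values are illustrative). FIRST/LAST flags are set before the subsetting IF runs, so when the first or last observation of a group is deleted, the reset or the output for that group never happens. WHERE= filters before BY-group processing, so the flags refer to the surviving observations:

```sas
data visits;
  input id type $ cost;
  datalines;
1 A 10
1 B 20
2 B 5
2 A 7
3 A 1
3 A 2
;
run;

/* Trap: subsetting IF deletes obs that carry first.id or last.id */
data bad;
  set visits;
  by id;
  if type = 'A';
  retain total;
  if first.id then total = 0;
  total = sum(total, cost);
  if last.id then output;   /* id 1 is lost, id 2 gets 17, id 3 gets 3 */
run;

/* Protection: WHERE= filters first, then FIRST/LAST are computed */
data good;
  set visits(where=(type = 'A'));
  by id;
  retain total;
  if first.id then total = 0;
  total = sum(total, cost);
  if last.id then output;   /* id 1 gets 10, id 2 gets 7, id 3 gets 3 */
run;
```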
12
Pitfalls with Merging
13
MERGE without BY
Aspect: Specifics
Mistake: You forget to add the BY statement after your MERGE statement
Result: Contents are slammed together line by line till one DS runs out of obs
Run into: When doing lots of merges with the same BY variables
Protection: Set the option MERGENOBY=WARN or ERROR
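A short sketch of the protection, using two hypothetical datasets `demog` and `labs` (names and values are illustrative). With `MERGENOBY=ERROR` in effect, the step with the forgotten BY statement stops with an error instead of quietly pairing observations by position:

```sas
data demog;
  input id age;
  datalines;
1 30
2 40
3 50
;
run;

data labs;
  input id result;
  datalines;
2 7.1
3 6.4
;
run;

/* Make a MERGE without a BY statement an error instead of a silent success */
options mergenoby=error;

data combined;
  merge demog labs;   /* the intended "by id" statement was forgotten */
run;
```

Setting the option once, for example in a shared autoexec, would cover every merge in the session.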
14
MERGE with Overwrites
Aspect: Specifics
Mistake: You MERGE DS2 onto DS1 to pick up some additional variables but fail to notice that variables other than the BY variables are shared
Result: Values from DS2 overwrite values in DS1
Run into: MERGEing DS with large numbers of variables, MERGEing 3 or more DS
Protection: Use KEEP= with DS2; use %OverlappedVars
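A minimal sketch of the overwrite and of the KEEP= protection. The variables `status` and `extra` and the data values are invented for illustration; only the dataset names DS1 and DS2 come from the slide:

```sas
data ds1;
  input id status $;
  datalines;
1 old1
2 old2
;
run;

data ds2;
  input id status $ extra;
  datalines;
1 new1 100
2 new2 200
;
run;

/* Trap: status exists in both datasets, so DS2 values replace DS1 values */
data merged_bad;
  merge ds1 ds2;
  by id;              /* status is new1, new2 */
run;

/* Protection: bring in only the BY variable and the variables you want */
data merged_ok;
  merge ds1 ds2(keep=id extra);
  by id;              /* status stays old1, old2 */
run;
```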
15
%macro OverlappedVars(
    listOfDs=,      /* the datasets vars scanned/compared */
    outDs=,         /* the DS detailing the overlaps found */
    sp=olv_tmp,     /* i.e. scratch prefix */
    cleanUp=1       /* if true deletes scratch datasets */
);
/* Invoke with: %OverlappedVars(listOfDs=, outDs= );
   Determine the variables that occur in two or more of a user supplied
   list of datasets. These variables must be dealt with before a merge
   of these datasets can be safely done. */
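The body of the macro is not shown on the slide. The sketch below is not that macro. It is one possible way to perform the same kind of check for just two datasets, using PROC CONTENTS and a merge by variable name. The datasets `ds1` and `ds2` and the scratch names `vars1`, `vars2`, and `overlap` are assumptions made for this example:

```sas
data ds1;
  input id status $;
  datalines;
1 old1
;
run;

data ds2;
  input id status $ extra;
  datalines;
1 new1 100
;
run;

/* List the variable names of each dataset */
proc contents data=ds1 out=vars1(keep=name) noprint; run;
proc contents data=ds2 out=vars2(keep=name) noprint; run;

/* Normalize case so that "ID" and "id" are treated as the same name */
data vars1; set vars1; name = upcase(name); run;
data vars2; set vars2; name = upcase(name); run;

proc sort data=vars1; by name; run;
proc sort data=vars2; by name; run;

/* Keep names present in both lists; BY variables will appear here too */
data overlap;
  merge vars1(in=in1) vars2(in=in2);
  by name;
  if in1 and in2;
run;
```

Any name in `overlap` that is not an intended BY variable is a candidate for KEEP=, DROP=, or RENAME= before merging.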
16
Other Assorted Pitfalls
17
Variable Range vs. Subtraction
Aspect: Specifics
Mistake: You use “x1-x10” meaning x1, x2, …, x10; SAS sees it as x1 minus x10, e.g. with SUM
Result: SUM(x1-x10) evaluates to x1 minus x10
Run into: Functions such as sum, min, max, std, sumabs, coalesce, cv, gcd, harmean, …
Protection: Use func(of x1-x10) and be wary!
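A small sketch of the two readings, with illustrative values 1 through 10 loaded into `x1`-`x10` (the names `wrong` and `right` are made up for this example):

```sas
data _null_;
  array x{10} x1-x10 (1 2 3 4 5 6 7 8 9 10);

  wrong = sum(x1-x10);      /* subtraction: 1 - 10 = -9 */
  right = sum(of x1-x10);   /* variable list: 1 + 2 + ... + 10 = 55 */

  put wrong= right=;
run;
```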
18
Default String Lengths
Aspect: Specifics
Mistake: You forget to set the length of a string variable and you get the length at first use
Result: String length shorter than expected and important characters can be discarded
Run into: String build-ups, values retained on some condition, when setting an initial value
Protection: Explicitly set the length of any new string variables, examine values in datasets
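A minimal sketch of the truncation (the variable names `answer` and `answer2` are illustrative). The first assignment fixes the length of `answer` at 2, so the longer value assigned later is cut off without any WARNING:

```sas
data _null_;
  /* Trap: length is taken from the first value assigned (2 characters) */
  answer = 'No';
  answer = 'Unknown';
  put answer=;              /* answer=Un */

  /* Protection: declare the length before the first use */
  length answer2 $ 8;
  answer2 = 'No';
  answer2 = 'Unknown';
  put answer2=;             /* answer2=Unknown */
run;
```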
19
Take Aways
- SAS data step programming can be tricky
- Good coding practices and awareness can help you avoid the worst traps
- Inspection, cross-checking, and testing are not really optional; they are a MUST
David.Abbott@va.gov