Published by Hugo Hunt. Modified over 9 years ago.
1
What’s wrong NOW?!
An introduction to debugging SAS programs for beginners
Martha Cox
Cancer Outcomes Research Program
CDHA / Dalhousie
2
What’s wrong NOW?!
What we ARE going to talk about:
- Types of Bugs
- Bug Droppings: the types of messages you get when bugs occur
- Bug Repellant: how to prevent errors in the first place
- Bug Killing – General Approach
- Bug Killing – Specific Bugs: a look at a few common problems and some tips for diagnosing and fixing them
What we’re NOT going to talk about:
For this talk, I’m assuming that you’re reading in SAS data sets, so I will not discuss the various problems that can occur with INPUT and INFILE. I’m sticking with the basic DATA step and procs from Base SAS, so I won't cover bugs associated with macros or with the specialized procs in SAS/STAT, SAS/GRAPH, etc.
3
Types of Bugs: Syntax / Semantic
Syntax and semantic errors are detected at compile time.
Syntax errors: You wrote it wrong. The programming statements do not conform to the rules of the SAS language. Examples:
- misspelling a SAS keyword
- using an unmatched quotation mark
- forgetting a semicolon
- specifying an invalid statement option
- specifying an invalid data set option
Semantic errors: You wrote it right, but you put it in the wrong place. The language element is correct, but it is not valid for that particular usage. Examples:
- specifying the wrong number of arguments for a function
- using a numeric variable name where only a character variable is valid
- using an illegal reference to an array
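The following minimal sketch contrasts the two kinds of compile-time error. It reads SASHELP.CLASS, the sample data set shipped with SAS, so it runs as written. The variable SHORT_NAME is hypothetical and exists only for illustration. Each broken form is kept in a comment next to the working statement that replaces it.

```sas
data demo;
  /* Syntax error: an invalid data set option (ROP instead of DROP), e.g.
       set sashelp.class (rop=height weight);                              */
  set sashelp.class (drop=height weight);

  length short_name $3;   /* hypothetical variable for this example */

  /* Semantic error: valid pieces, wrong usage. SUBSTR needs at least
     two arguments, e.g.
       short_name = substr(name);                                          */
  short_name = substr(name, 1, 3);
run;
```

The first observation in SASHELP.CLASS is Alfred, so SHORT_NAME should be "Alf" there.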
4
Types of Bugs: Syntax / Semantic, Execution-Time
Execution-time errors occur when SAS executes a program that processes data values. Examples:
- illegal arguments to functions
- illegal mathematical operations (e.g., division by zero)
- observations not sorted for BY-group processing
- a reference to a non-existent member of an array (array subscript out of range)
- other things related to INPUT and INFILE
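Here is a sketch of guarding against three of these execution-time problems. The data values in the DATALINES are invented for illustration, and SASHELP.CLASS is the sample data set that ships with SAS. The guards shown are common habits, not the only fixes:
- check the denominator before dividing;
- loop to DIM() instead of a hard-coded bound;
- sort before BY-group processing.

```sas
/* Division by zero: only divide when DEN is neither 0 nor missing */
data ratios;
  input num den;
  if den not in (0, .) then ratio = num / den;
  else ratio = .;
  datalines;
10 2
5 0
7 .
;
run;

/* Array subscript out of range: let DIM() supply the upper bound */
data scores;
  array s{3} s1-s3 (90 85 77);
  do i = 1 to dim(s);
    total = sum(total, s{i});
  end;
run;

/* BY-group processing: sort first (OUT= leaves the original untouched) */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data first_by_sex;
  set class_sorted;
  by sex;
  if first.sex;   /* keep the first student of each sex */
run;
```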
5
Types of Bugs: Syntax / Semantic, Execution-Time, Data
Data errors occur when some data values are not appropriate for the SAS statements that you have specified in the program. Examples:
- character-to-numeric conversions
- invalid data values as arguments to a function
6
Types of Bugs: Syntax / Semantic, Execution-Time, Data, Logic
Logic errors: perfectly good code that does not produce the intended result. There are no messages in the log. Examples:
- Using FIRST. and LAST. processing in a DATA step to summarize something, but forgetting to include a conditional OUTPUT statement.
- Using a DO loop to search for a special value among an array of variables and setting a flag to 1 when you find it, but forgetting to stop the loop, so that subsequent array values overwrite the fact that you found it.
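The sketch below shows corrected versions of the two logic errors just described.
- **FIRST./LAST. summary.** It uses SASHELP.CLASS (sorted by SEX) and totals WEIGHT within each group.
- **Array search.** It uses a few invented procedure codes read from DATALINES; the variable names PCODE1-PCODE3 mirror the PCODE array used later in this talk.

```sas
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data weight_by_sex;
  set class_sorted;
  by sex;
  if first.sex then total_weight = 0;
  total_weight + weight;      /* sum statement: TOTAL_WEIGHT is retained automatically */
  if last.sex then output;    /* leave this out and every student's row is written */
  keep sex total_weight;
run;

data flagged;
  input (pcode1-pcode3) ($);
  array pcodes{*} pcode1-pcode3;
  found = 0;
  do i = 1 to dim(pcodes);
    /* Buggy version: found = (pcodes{i} =: '689'); with no exit lets
       a later element overwrite an earlier match */
    if pcodes{i} =: '689' then do;
      found = 1;
      leave;                  /* stop searching once the value is found */
    end;
  end;
  drop i;
  datalines;
6891 4010 2500
4010 2500 6894
4010 2500 1234
;
run;
```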
7
Bug Droppings: ERROR, WARNING, NOTE, no messages
ERROR
- Printed in red. Hard to ignore.
- Probably not your biggest problem; usually things that would make Homer Simpson say “d’oh!”
- SAS tries to help, with one or more of the following:
  - underlining the point at which it detected the error (not necessarily where the error occurred)
  - dumping the Program Data Vector
  - giving you the line and column numbers in the log where the problem was detected
  - stopping processing of the step
WARNING
- Printed in green. Gets your attention.
- Sometimes critical; sometimes just information.
- SAS makes assumptions and keeps going.
NOTE
- Printed in blue. Too damn many of them. Easy to ignore.
- Can indicate serious problems.
no messages
- SAS is happy, but the results are wrong. Very easy to ignore.
- Can be your worst nightmare.
8
Bug Repellant: Make your program easy to read
My 5 rules for writing code that you and others can still understand 2 weeks from now:
- Indent within a step or a DO block.
- One semicolon per line (or fewer!).
- Every step gets a RUN.
- Every PROC gets a DATA=.
- Comments, comments, comments.
Every reference suggested this first.
9
Make your program easy to read
data hyst; set hospital(keep=year pcode1-pcode10); array pcodes{*} pcode1-pcode10;
length code $3 year 3; do i = 1 to 10; if pcodes{i}=:'689' then do;
code = substr(pcodes{i},1,3); output; end; end; keep year code;
title1 'Number of Selected Hysterectomy Procedures by Year';
proc freq noprint; tables year*code/list out=temp; run;
This code would work just fine, but...
10
/* Create a data set containing hysterectomy codes. */
data hyst;
   set hospital (keep=year pcode1-pcode10);
   array pcodes{*} pcode1-pcode10;
   length code $3
          year 3 ;
   do i = 1 to 10;
      if pcodes{i} =: '689' then do;
         code = substr(pcodes{i},1,3);
         output;
      end; ** if **;
   end; ** do i **;
   keep year code;
run;

title1 'Number of Selected Hysterectomy Procedures by Year';
proc freq data=hyst noprint;
   tables year*code/list out=temp;
run;
... isn’t this easier to read and understand?
11
Bug Repellant
- Make your program easy to read
- Don’t destroy your data
12
Don’t Destroy Your Data: access=readonly
libname survey  'C:\My Documents\Survey\' access=readonly;
libname indata  'C:\Thesis\Data\' access=readonly;
libname outdata 'C:\Thesis\Data\' ;
Prevent overwriting source data by adding the option access=readonly to your LIBNAME statement. But what if you want to protect the input data set and also need to write a new data set to that same directory? You can have multiple LIBNAME statements pointing to the same directory. Then just make a habit of using INDATA on your SET statement and OUTDATA on your DATA statement.
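The following minimal sketch shows the INDATA/OUTDATA habit in practice. The librefs and paths come from the slide. The data set names HOSPITAL and ADMISSIONS_2005 and the YEAR filter are hypothetical. If you accidentally name INDATA on the DATA statement instead, SAS refuses to write, because that libref is read-only.

```sas
libname indata  'C:\Thesis\Data\' access=readonly;
libname outdata 'C:\Thesis\Data\';

/* Read through the read-only libref, write through the writable one */
data outdata.admissions_2005;   /* hypothetical output data set */
  set indata.hospital;          /* hypothetical input data set */
  if year = 2005;
run;
```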
13
Don’t Destroy Your Data: Use OUT= on PROC SORT
proc sort data=thesis.hospital (keep=patient addate);
   by patient;
run;

proc sort data=thesis.hospital (keep=patient addate) out=patlist;
   by patient;
run;
What if you paid somebody like me to extract some hospital data to use for your thesis, and then you accidentally put this code in a program (see the first PROC on the slide)? Oops. Without OUT=, the sorted subset, holding only PATIENT and ADDATE, replaces THESIS.HOSPITAL, and you’ll be calling me (and paying me) again. Instead, use the OUT= option to write the subset to a new name (see the second PROC on the slide).
14
Don’t Destroy Your Data: options DATASTMTCHK=
7    options datastmtchk=none;
8    data adults
        set survey.adult;
        if age > 65;
11   run;
NOTE: Variable age is uninitialized.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.ADULTS may be incomplete. When this step was stopped there were 0 observations and 1 variables.
SAS will let you use almost any word as a data set name or variable name, sometimes even though that word is a SAS function or a keyword in another context. Sometimes this is great. If you make a mistake in your code, however, it can be disastrous. DATASTMTCHK controls whether certain SAS keywords are allowed in the DATA statement. Look what happens if it is set to NONE and you leave the semicolon off the DATA statement. SAS treats everything up to the first semicolon as part of the DATA statement, so it thinks you want to create three data sets: ADULTS, SET, and SURVEY.ADULT. The SET statement has been obliterated, so you have no input data, and SAS doesn’t know where to find the variable AGE. Worse, SURVEY.ADULT is now one of the output data sets. If that library were not read-only, your source data could be replaced by an empty data set.
15
Don’t Destroy Your Data: options DATASTMTCHK=
12   options datastmtchk=corekeywords;
13   data adults
NOTE: SCL source line.
        set survey.adult;
        ---
        56
ERROR : SET is not allowed in the DATA statement when option DATASTMTCHK=COREKEYWORDS. Check for a missing semicolon in the DATA statement, or use DATASTMTCHK=NONE.
        if age > 65;
16   run;
With DATASTMTCHK=COREKEYWORDS (the default), “SET” is not allowed in a DATA statement, so you get a much more meaningful error message. COREKEYWORDS protects only MERGE, RETAIN, SET, and UPDATE. You can also use DATASTMTCHK=ALLKEYWORDS to prevent any keyword that can begin a SAS statement in the DATA step (e.g., ABORT, ARRAY, INFILE) from being used as a one-level data set name.
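To check which setting you are running with before relying on it, you can ask PROC OPTIONS. This is a small sketch. Putting the stricter ALLKEYWORDS setting at the top of a program is one possible habit, not a requirement.

```sas
/* Write the current DATASTMTCHK setting to the log */
proc options option=datastmtchk;
run;

/* Stricter setting: no DATA step statement keyword may be used as a
   one-level data set name */
options datastmtchk=allkeywords;
```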
16
Don’t Destroy Your Data: options MERGENOBY=
data valid not_valid;
   merge TEST_DATA(keep=ID in=a) master.master02(in=b);
   if a and b then output valid;
   else if not b then output not_valid;
run;
NOTE: There were 899 observations read from the data set TEST_DATA.
NOTE: There were observations read from the data set MASTER.MASTER02.
NOTE: The data set WORK.VALID has 899 observations and 9 variables.
NOTE: The data set WORK.NOT_VALID has 0 observations and 9 variables.
Here’s an example of a merge to subset MASTER02 to the list of subject IDs in TEST_DATA. Whenever you do a merge, look at the number of observations read in and the number written out, and see if they look reasonable. In this case, TEST_DATA has 899 obs, which represent the people we want, and VALID has 899 obs. So everything worked, right? Well... notice that the programmer forgot to include a BY statement. This is legal: SAS assumes you want the data sets to be matched up one-to-one sequentially. That is, the first record in TEST_DATA gets merged with the first record in MASTER02, the second with the second, and so on. Because ID comes from both data sets, each ID from TEST_DATA is overwritten by the ID on the matching MASTER02 record. So VALID simply holds the first 899 records of MASTER02, not the 899 people we wanted. Oops!
17
Don’t Destroy Your Data: options MERGENOBY=
options mergenoby=error;
data valid not_valid;
   merge TEST_DATA(keep=ID in=a) master.master02(in=b);
   if a and b then output valid;
   else if not b then output not_valid;
run;
ERROR: No BY statement was specified for a MERGE statement.
NOTE: The SAS System stopped processing this step because of errors.
If you use the MERGENOBY option, SAS will tell you that you’ve done this. If you specify MERGENOBY=WARN, SAS generates a WARNING message and keeps processing. If you specify MERGENOBY=ERROR, SAS generates an ERROR message and stops processing. This is safer, since data sets created in subsequent code will not be overwritten incorrectly. But you can’t have this turned on if you are intentionally doing a sequential merge somewhere in your code.
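The fix for the merge above is a BY statement on data sorted by ID. The sketch below assumes ID is the matching key in both data sets. The WORK names TEST_IDS and MASTER02 are hypothetical. OUT= is used so the sort never rewrites MASTER.MASTER02; if MASTER02 is already sorted by ID, that sort can be skipped.

```sas
options mergenoby=error;

proc sort data=test_data (keep=id) out=test_ids;
  by id;
run;

/* OUT= leaves MASTER.MASTER02 untouched */
proc sort data=master.master02 out=master02;
  by id;
run;

data valid not_valid;
  merge test_ids (in=a) master02 (in=b);
  by id;                                    /* match on ID, not on position */
  if a and b then output valid;
  else if not b then output not_valid;      /* IDs in TEST_DATA with no master record */
run;
```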
18
Bug Repellant
- Make your program (& your log) easy to read
- Don’t destroy your data
- Know your data
19
Know your data
21   data males;
22      set survey.adult;
23      if gender = 'M';
24   run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      23:12
NOTE: Invalid numeric data, 'M' , at line 23 column 12.
There are a million ways that lack of familiarity with the data values or variable attributes can cause you grief. Here’s just a simple example. GENDER is stored as a numeric code, so comparing it to 'M' forces SAS to convert 'M' to a number, and that conversion fails. We’ll see a few more examples when we talk about some specific bugs later. When you begin working with a new data set, run a PROC CONTENTS, do a PROC FREQ on key variables, and PROC PRINT the first 25 observations. Or, if you have access to interactive SAS, open the data set and poke around. Do whatever you need to familiarize yourself with its structure and expected values.
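The three "get to know it" steps just listed might look like this for the SURVEY.ADULT data set used on this slide. GENDER stands in for whichever key variables matter in your own data.

```sas
/* Variable names, types, lengths, formats, and labels */
proc contents data=survey.adult;
run;

/* Actual values of a key variable, counting missing values as a level */
proc freq data=survey.adult;
  tables gender / missing;
run;

/* First 25 observations */
proc print data=survey.adult (obs=25);
run;
```

Here PROC CONTENTS alone would have shown that GENDER is numeric. That is exactly the fact that made the comparison to 'M' fail.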
20
Bug Repellant
- Make your program (& your log) easy to read
- Don’t destroy your data
- Know your data
- Do a syntax check
SAS runs a program by compiling the first step and reporting any syntax or semantic errors, then executing that step. It then compiles and executes the second step, and so on. Since the most common errors are syntax errors, when you’re writing a long program it is handy to have SAS just compile all the code and report syntax and semantic problems, without waiting for and wading through execution errors and data problems. Here’s how...
21
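Do a syntax check: options OBS=0 NOREPLACE;
OBS=0 tells SAS to read zero observations from any data set it’s supposed to read. NOREPLACE tells SAS not to replace existing permanent data sets, so the empty data sets these steps create can’t overwrite your real ones. Each step is still compiled, and therefore syntax-checked, but essentially no data are processed.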
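In a program, the syntax check might be bracketed like this. This is a minimal sketch. The comment line stands in for your own steps. OBS=MAX and REPLACE restore the usual defaults for the real run.

```sas
options obs=0 noreplace;   /* compile and check every step; read no observations */

/* the DATA and PROC steps you want to check go here */

options obs=max replace;   /* back to normal before running for real */
```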
22
Bug Repellant
- Make your program (& your log) easy to read
- Don’t destroy your data
- Know your data
- Do a syntax check
So, if you’ve done all you can to prevent bugs and you still get one, what do you do? First, a few ideas about your overall approach to the problem.
23
Bug Killing: Read the LOG!
Don’t just search for “ERROR” or “WARNING.” Remember, NOTEs can be your worst nightmare. And for batch jobs, don’t rely only on the system message you get when the job finishes; open the log.
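For long batch logs, a small DATA step can pull out the lines worth reading, including the NOTEs this talk warns about. This is a sketch only; it does not replace reading the log.
- The log path is hypothetical.
- The search phrases are taken from messages shown in this talk, so extend them to suit your own programs.

```sas
/* Hypothetical path: point this at your own batch log */
filename joblog 'C:\Thesis\Programs\analysis.log';

data log_problems;
  infile joblog truncover;
  input text $char256.;
  line_no = _n_;
  /* ERROR and WARNING lines, plus NOTEs that often signal trouble */
  if text =: 'ERROR'
     or text =: 'WARNING'
     or index(text, 'uninitialized') > 0
     or index(text, 'Missing values were generated') > 0
     or index(text, 'have been converted') > 0
     or index(text, 'Invalid') > 0
     then output;
run;

proc print data=log_problems noobs;
  var line_no text;
run;
```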
24
Bug Killing: Start at the top & deal with one mess at a time.
With each problem, SAS makes some assumptions and tries to continue. These assumptions may not be appropriate for what you’re trying to do. All subsequent processing is based on these assumptions and therefore may generate more messages, even though your code is OK.
25
Bug Killing
- Start at the top & deal with one mess at a time.
- Use interactive SAS, if possible.
By “interactive SAS,” I mean windowed SAS (with a little “w”, as opposed to SAS for Windows with a big Microsoft “W”). Most people are familiar with this mode from using SAS on the PC. However, it is also possible to submit batch jobs on your PC, and windowed “interactive” SAS is available for UNIX workstations, VMS workstations, and other platforms. Interactive mode lets you submit a single DATA or PROC step and check the output in the Explorer window. You can also run a PROC FREQ or PROC PRINT, as appropriate, to verify that the code is doing what you want or to diagnose problems.
26
Bug Killing
- Start at the top & deal with one mess at a time.
- Use interactive SAS, if possible.
- Using ENDSAS
For BATCH JOBS ONLY: Try ENDSAS when you need to run the first few steps of your program without commenting out or deleting the subsequent lines. This is especially handy when you’ve added some PUTs or a PROC PRINT to check something.
27
Using ENDSAS
< lots of SAS statements >
proc freq data=workhrs;
   tables hrs / nofreq nocum norow;
run;
endsas;
< lots more SAS statements >
28
Bug Killing
- Start at the top & deal with one mess at a time.
- Use interactive SAS, if possible.
- Using ENDSAS
- Using OPTIONS ERRORABEND
For BATCH JOBS ONLY: Try OPTIONS ERRORABEND. This causes SAS to abort the job when it encounters an error, even one that would not otherwise terminate the job. You can then find and fix the problem without waiting while SAS struggles through all those assumptions and writes more messages.
Do not use ENDSAS or ERRORABEND in interactive SAS. These techniques kill the SAS session. In interactive mode, this means all the windows disappear, with no written log or other evidence of what happened.
29
Using ERRORABEND
1    options errorabend;
     /* Number of Hours Worked Per Week */
     data workhrs(drop=b151);
        set survey.newadult (keep=wrk);
ERROR: The variable wrk in the DROP, KEEP, or RENAME list has never been referenced.
        length hrs 8;
        hrs = input(b151, 8.);
     run;
ERROR: SAS ended due to errors.
You specified: OPTIONS ERRORABEND;.
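Once ERRORABEND has stopped the job, the fix is usually small. The corrected step below is a sketch. It assumes B151 is the character variable holding hours worked, as the INPUT call implies, and that WRK was simply a wrong name on the KEEP= list.

```sas
/* Number of Hours Worked Per Week */
data workhrs (drop=b151);
  set survey.newadult (keep=b151);   /* keep the variable the step actually uses */
  length hrs 8;
  hrs = input(b151, 8.);             /* character B151 -> numeric HRS */
run;
```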
30
Bug Killing: Look for horses, not zebras.
Medical students sometimes see a patient with common symptoms and diagnose whatever bizarre, exotic illness they learned about in class that week, even though the patient has something simple. Hence the adage “When you hear hooves, don’t look for zebras.” In SAS programming, most problems are syntax errors. The most common of these, even for experienced programmers, are missing semicolons, unmatched quotation marks, and unclosed comments. First, make sure these aren’t your problem.
31
Bug Killing (section summary)
- Read the LOG!
- Start at the top & deal with one mess at a time.
- Look for horses, not zebras.
32
Bug Killing: Some Specific Bugs
33
WARNING or ERROR: misspelled keywords
33   data adult;
NOTE: SCL source line.
        st aborig.newadult;
        --
        1
WARNING 1-322: Assuming the symbol SET was misspelled as st.
35   run;
SAS tries to figure out what you want to do. Sometimes it guesses correctly; sometimes not. If the beginning of the word is wrong (such as UNFILE instead of INFILE), you will usually get an error instead:
ERROR: Invalid options name UNFILE.
34
ERROR: Invalid option name, parameter, or statement.
This slide refers to a group of error messages like:
ERROR: Invalid option name.
ERROR: The option or parameter is not recognized.
ERROR: Statement is not valid or it is used out of proper order.
Possible causes:
- missing semicolon
- misspelled keyword
- unmatched quotation mark
- unclosed comment
- a DATA step statement in a PROC step
- a valid option used in the wrong place
35
ERROR: Invalid option name, parameter, or statement.
3    data males(rop=gender);
                ---
                22
ERROR 22-7: Invalid option name ROP.
        set survey.adult;
        if gender = 2;
6    run;
NOTE: The SAS System stopped processing this step because of errors.
Misspelled keyword: First, look at what SAS has underlined. Is it something obvious, like a misspelled word? (Here, ROP should be DROP.)
36
ERROR: Invalid option name, parameter, or statement.
8    proc print data=survey.adult;
NOTE: SCL source line.
        set gender;
        ---
        180
ERROR : Statement is not valid or it is used out of proper order.
10   run;
NOTE: The SAS System stopped processing this step because of errors.
Good statement in a bad place: Next, check the preceding line of code. Does the statement SAS underlined go with the statement above it? This example is more common than you think. A SET statement goes in a DATA step, not in a PROC step. This was probably supposed to be a VAR statement.
37
ERROR: Invalid option name, parameter, or statement.
11   proc print data=survey.adult
NOTE: SCL source line.
        var gender;
        ---  --
ERROR : Syntax error, expecting one of the following: ;, (, DATA, DOUBLE, HEADING, LABEL, N, NOOBS, OBS, ROUND, ROWS, SPLIT, STYLE, UNIFORM, WIDTH.
ERROR : The option or parameter is not recognized and will be ignored.
13   run;
NOTE: The SAS System stopped processing this step because of errors.
Missing semicolon: If the statements look OK and belong together, check for a missing semicolon. In this example, because there’s no semicolon on the PROC PRINT statement, SAS tries to interpret “var” and “gender” as options for PROC PRINT. Hence the messages.
38
ERROR: Invalid option name, parameter, or statement.
14   * Print a list of Adult ages.
15   proc print data=survey.newadult;
NOTE: SCL source line.
        var b4;
        ---
        180
ERROR : Statement is not valid or it is used out of proper order.
17   run;
Unclosed comment: You may need to look back more than one line before what SAS underlined. In this example, the VAR statement seems out of proper order because the comment is missing its semicolon. So SAS assumes the comment ends at the next semicolon, the one at the end of the PROC PRINT statement. That obliterates the PROC statement and leaves the VAR statement with no context.
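Here is the same step with the comment closed properly, showing both SAS comment styles. The statement-style comment needs its own semicolon. The block comment ends only at the closing star-slash, so semicolons inside it are harmless.

```sas
/* Block comment: everything up to the closing star-slash is ignored,
   semicolons included. */
* Print a list of Adult ages.;
proc print data=survey.newadult;
  var b4;
run;
```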
39
WARNING: unbalanced quotes
proc print data=hospital(obs=10);
   title1 "Sample of Hospital Data;
run;
93   < more SAS statements >
     _____________________
     32
WARNING : The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks.
The problem seems obvious in this case. But other problems or messages can appear, depending on where the missing quote occurs and how soon SAS hits the 262-character limit or bumps into a matching quotation mark. If no matching quotation marks remain and the rest of the program is shorter than 262 characters, the job may appear to have just quit with no explanation.
Color-coded editors can help prevent this. The Enhanced Editor in interactive SAS does this, and UltraEdit can be configured to recognize SAS conventions. With one of these, the text changes color when you type a quotation mark and doesn’t change back until you type a matching one.
Once you have the problem, count the quotation marks:
- If your editor’s search tool has a word count, use it to count the single quotes and then the double quotes.
- In some editors, find/replace with “replace all” reports how many replacements were made. Use this to “replace” a single quote with a single quote, and so on.
If one of the counts is an odd number, that MAY be the one to look for. Also, beware of strings that you started with a single quote and ended with a double quote.
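In interactive SAS, an unbalanced quote can leave the session waiting for a closing quote that never comes. A widely circulated community idiom for recovering is to submit the line below on its own. This talk does not prescribe it, so treat it as a sketch: each piece closes whichever open single-quoted string, double-quoted string, or block comment is swallowing your code, and then ends the pending step.

```sas
;*';*";*/;quit;run;
```

After the session responds again, fix the quotation mark in your program and resubmit the affected steps.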
40
NOTE: Missing values generated.
48   data test;
        a = 2; b = 4; c = .;
        x = a + b + c;
        y = sum(a,b,c);
        a = .; b = .;
        z = sum(a,b,c);
        z2 = sum(a,b,c,0);
        put x= y= z= z2= ;
56   run;
x=. y=6 z=. z2=0
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 50:  at 53:7
Again, know your data. Are a few missing values expected and OK? SAS tells you how many times this happened. Is it reasonable for all or nearly all values of a variable to be missing? If not, investigate.
Missing values are often caused by simple arithmetic. If you use (a+b+c) and any one of those values is missing, the result will be missing. Use SUM, MEAN, or other functions instead. These functions ignore missing values and perform the operation on the non-missing values. In this example, SUM(a,b,c) results in 6 because a+b = 2+4 = 6 and the missing c is ignored.
Trick: If ALL the arguments to a function are missing, the result will be missing. If you would prefer a result of zero in that case, just include zero in your list of arguments (see z2 in the example on the slide).
41
NOTE: Numeric to character conversion (or vice versa)
109  data adult2;
        length sexage $5 household 8 ;
        set adult;
        sexage = sex || age ;
        household = sum(kids, adults, 1);
114  run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      112:19
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      113:19
In this example:
- SEX is a character variable (M, F).
- AGE is numeric, in years.
- KIDS, the number of children living in the household, is character.
- ADULTS, the number of adults living in the household, is numeric.
We want to create a character variable called SEXAGE with values like “F28.” But since AGE is numeric, SAS converts it to character in order to perform the operation you asked for. SAS uses the BEST12. format for this conversion, so the digits are right-justified in a 12-character field. For a 28-year-old woman, the concatenation is “F” followed by ten blanks and then “28”. Since we defined a length of only 5, this value is truncated to “F” plus four blanks, and the digits that represent AGE don’t make it into the value at all.
The second example here is HOUSEHOLD, which is supposed to be the number of people living in the survey respondent’s household. For some reason, our data entry clerk entered KIDS (the number of children living there) as a character variable. So when we add it to another variable, SAS performs a character-to-numeric conversion and writes a NOTE about it.
42
NOTE: Numeric to character conversion (or vice versa)
122  data adult3;
        length sexage $5 household 8 ;
        set adult;
        sexage = sex || put(age, 3. -L);
        kidsnum = input(kids, 8.);
        household = sum(kidsnum, adults, 1);
128  run;
NOTE: There were 493 observations read from the data set WORK.ADULT.
NOTE: The data set WORK.ADULT3 has 493 observations and 8 variables.
Here’s a better way to code this, using explicit type conversions where needed.
- AGE (numeric to character): Use the PUT function. The format should be the same type as the variable being PUT, in this case a numeric format. To get around the right-justification problem, add the format modifier “-L”, which left-justifies the value.
- KIDS (character to numeric): Use the INPUT function. The informat should be the same type as the variable being created; since you want a numeric result, use a numeric informat. Make sure all values of KIDS really are numerals before doing this, or you’ll get an invalid-argument NOTE.
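Here is a variation that addresses the last caution by reporting suspicious KIDS values, and that uses CATS instead of PUT for SEXAGE. It is a sketch, not the method on the slide.
- CATS converts the numeric AGE itself and strips leading and trailing blanks.
- The ?? modifier on the informat suppresses the invalid-data messages, so the step writes its own.
- The "Check KIDS" text is hypothetical.
- A KIDS value such as "." is also reported, because it converts to missing.

```sas
data adult3;
  length sexage $5 household 8;
  set adult;
  /* CATS converts AGE to character and strips blanks, e.g. "F28" */
  sexage = cats(sex, age);
  /* ?? suppresses the invalid-data messages and keeps _ERROR_ at 0 */
  kidsnum = input(kids, ?? 8.);
  if missing(kidsnum) and not missing(kids) then
    put 'Check KIDS: ' _n_= kids=;
  household = sum(kidsnum, adults, 1);
run;
```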
43
NOTE: Variable is uninitialized.
Possible causes:
- misspelling the variable name
- using a variable that was dropped in a previous step
- using the wrong data set
- using a variable before it is created
44
NOTE: Variable is uninitialized.
140  data males;
        set survey.adult (keep=id community);
        if gender=2;
144  run;
NOTE: Variable gender is uninitialized.
NOTE: There were 493 observations read from the data set SURVEY.ADULT.
NOTE: The data set WORK.MALES has 0 observations and 3 variables.
This is a common one for me. I use DROP and KEEP frequently to avoid dragging a lot of unneeded variables around. But if you edit that code later and add a new condition, it’s easy to forget to add the new variables to the KEEP= option. Here GENDER was never read in, so it is missing on every observation. As a result, no observation satisfies gender=2, and MALES ends up empty.
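The repair is to add the new variable to the KEEP= list. This sketch keeps the slide's coding, in which males have GENDER = 2.

```sas
data males;
  /* GENDER is used in the IF below, so it must be on the KEEP= list too */
  set survey.adult (keep=id community gender);
  if gender = 2;
run;
```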
45
No Messages: Character values are being truncated.
data groups;
   set survey.adult;
   if age < 20 then AgeGroup = 'Teen';
   else if age > 65 then AgeGroup = 'Senior';
   else AgeGroup = 'Adult';
run;
proc freq data=groups;
   tables agegroup / nocum;
run;
Let’s say we want to determine the number of subjects in our survey by age group. Here’s a program to assign the groups and count them.
46
No Messages: Character values are being truncated.
Output from first try:
The FREQ Procedure
Age Group    Frequency    Percent
Adul
Seni
Teen
What happened? AGEGROUP was never explicitly defined, so SAS sets its length at compile time from the first place the variable appears in the DATA step. Here that is the assignment AgeGroup = 'Teen', which gives AGEGROUP a length of 4.
47
No Messages: Character values are being truncated.
data groups;
   set survey.adult;
   attrib AgeGroup length=$6 label='Age Group';
   if age < 20 then AgeGroup = 'Teen';
   else if age > 65 then AgeGroup = 'Senior';
   else AgeGroup = 'Adult';
run;
proc freq data=groups;
   tables agegroup / nocum;
run;
To fix it, include a LENGTH statement:
LENGTH AgeGroup $6;
Or, better yet, use an ATTRIB statement, which also lets you assign a format and a label. It’s good practice to use ATTRIB statements to give a label to every variable you create, even the temporary ones. It’s another way of documenting what you’re trying to do, so that you’ll understand the code when you pick it up again cold months from now.
48
No Messages: Character values are being truncated.
The FREQ Procedure
Age Group    Frequency    Percent
Adult
Senior
Teen
Voila! With the variable’s length explicitly defined long enough to hold all possible values, we get the results we wanted.
49
Questions?
50
References
Delwiche, Lora D., and Susan J. Slaughter. The Little SAS Book: A Primer, Third Edition. SAS Institute, Cary, NC, 2003.
Staum, Roger. “To Err is Human; to Debug Divine.” NESUG 15 Conference Proceedings.
Howard, Neil. “How SAS Thinks, or Why the DATA Step Does What It Does.” SUGI 29 Conference Proceedings.
Knapp, Peter. “Debugging 101.” NESUG 15 Conference Proceedings.
Rhodes, Dianne Louise. “So You Want to Write a SUGI Paper? That Paper About Writing a Paper.” SUGI 29 Conference Proceedings.
51
Resources
Online documentation: http://v9doc.sas.com/sasdoc/
Online conference proceedings:
SUGI: sugi/proceedings/index.html
NESUG: