SAS PROGRAMMING I RETAIN statement RENAME statement Formats and labels in a DATA step Creating user-defined formats Conditional execution Indicator variables.

1 SAS PROGRAMMING I RETAIN statement RENAME statement Formats and labels in a DATA step Creating user-defined formats Conditional execution Indicator variables DO groups Arrays Transformations involving missing values Sum statement

2 Declarative Statements These statements supply information to SAS during the compilation phase of DATA step processing. They define and modify the actions to be taken during the execution phase and affect the composition and contents of the PDV and the new data set being created. These statements are "non-executable" and their placement in the DATA step usually is unimportant, although there are cases where the order of these statements does affect the outcome. Remember, the PDV is created during the compile phase by compiling the statements in the order in which the compiler comes to them.

3 THE COMPILATION PHASE [Diagram: the compilation phase produces the machine language, the PDV, and the descriptor.]

4 Examples: statements handled in the compilation phase – RETAIN – DROP, KEEP, RENAME – LENGTH, LABEL, FORMAT

5 THE RETAIN STATEMENT The RETAIN statement lists those variables in the PDV that should not be initialized to missing at the beginning of each execution of the DATA step. SYNTAX RETAIN var-list [initial value]..., where Var-list is a list of variable names to be exempt. Initial value is the value placed in the PDV at compile time. Multiple RETAIN statements may be entered in the same DATA step. A single RETAIN statement may specify both numeric and character variables.

6 THE RETAIN STATEMENT If the first reference to a variable is in the RETAIN statement, SAS assumes it is numeric. To indicate a character variable, provide a character initial value of the proper length, or declare the variable in a preceding LENGTH statement. Only variables created by assignment and INPUT statements may be retained. Retaining a variable brought into the DATA step with a SET, MERGE, or UPDATE statement is not an error, just an action with no effect. Values of retained variables are held over from the last observation, but they can be changed with SET or assignment statements.
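A minimal sketch of retaining a character variable, assuming classlib.class (used throughout these slides) contains a one-character variable sex. The names flagged and firstsex are hypothetical and only illustrate the point.

```sas
DATA flagged;                       /* hypothetical output data set */
  SET classlib.class;
  LENGTH firstsex $ 1;              /* declare the variable as character first */
  RETAIN firstsex;                  /* no initial value given: starts out blank */
  IF _N_ = 1 THEN firstsex = sex;   /* set once; the value is held for later observations */
RUN;
```

Without the LENGTH statement, the RETAIN statement would be the first reference to firstsex and SAS would treat it as numeric.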

7 EXAMPLE: Cumulative Totals Generate several cumulative measures.
First attempt:
    DATA one;
      SET classlib.class;
      cumht = cumht + ht;
      cumwt = cumwt + wt;
      cumage = cumage + age;
    RUN;
This one does not work. But why? Because the PDV is initialized to missing at the beginning of each execution of the DATA step, so every cumulative variable is missing when the new value is added to it, and the sum is missing.
Second attempt:
    DATA one;
      SET classlib.class;
      RETAIN cumht cumwt 0 cumage;
      cumht = cumht + ht;
      cumwt = cumwt + wt;
      cumage = cumage + age;
    RUN;
This one works, but not for all of the variables. cumht and cumwt start at 0, but cumage has no initial value. If no initial value is given, character variables are initially blank and numeric variables are initially missing (.), so cumage stays missing.

8 RETAIN EXAMPLE: COUNTING Find the # of cases, and the # of males in the data set.
    DATA one;
      SET classlib.class;
      RETAIN count 0 nmales 0;
      count = count + 1;
      nmales = nmales + (sex='M');
      KEEP name sex count nmales;
    RUN;
(sex='M') is a Boolean logic indicator: it is 1 when the comparison is true and 0 when it is false.

9 THE RENAME STATEMENT The RENAME statement changes the name of a variable between the PDV and the output data set. SYNTAX RENAME oldname1=newname1 oldname2=newname2...; EXAMPLE: DATA b; SET a; RENAME name=lastname ht=height; IF ht > 20; halfht = ht/2; RUN;

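10 THE RENAME STATEMENT [Diagram: statements in the DATA step, such as halfht = ht/2;, use the old names; RENAME ht=height name=lastname; is applied between the PDV and the output data set.]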

11 Using Labels in a DATA Step The LABEL statement is used during the DATA step to add variable labels of up to 256 characters each to the descriptor section of the output data set being created. SYNTAX LABEL varname1 = '256 char label' varname2 = '256 char label'; Labels containing special characters such as " or ; must be enclosed in single quotes. Labels containing single quotes must be enclosed in double quotes, or you can substitute two single quotes for one single quote inside single quotes, e.g., LABEL name = 'Subject''s Name'; or LABEL name = "Subject's Name";

12 Using Labels in a DATA Step Otherwise, labels can be enclosed in either single or double quotes. Existing variable labels from the input data set are included on the output data set, unless they are modified by a LABEL statement. DATA newclass; SET classlib.class; LABEL age = ' ' ht = 'Height in Inches'; RUN;

13 An Introduction to SAS Formats What is a SAS Format? – A SAS format is an instruction that SAS uses to write data values. – For example, if you had the number 587, you could ask SAS to apply the WORDSw. format and display the value as "five hundred eighty-seven". Formats are used to – 1. Control the written appearance (or display) of data values to make your SAS output more readable. You can print numeric values as character values (for example, print 1 as Male and 2 as Female). You can print one character string as another character string (for example, print YES as OUI). – 2. Group data values together for analysis.

14 SAS Formats SAS formats have the form [$]format-name[w].[d] where – $ Indicates a character format; no $ indicates a numeric format – format-name Gives the name of the format to be applied. Can be either a format supplied by SAS or a user-defined format created with PROC FORMAT. – w (optional) Gives the number of characters to be used in the display; if omitted, SAS makes a reasonable choice. – d (optional) For numeric formats, how many of the w are to be to the right of the decimal place.

15 SAS Formats The name of a format that is being supplied always contains a period (.) as part of the name. – SAS issues an error message if you try to use a character format with a numeric variable or vice versa. – Most formats support widths and alignments. – Width refers to the number of columns used to display the values. – Alignment refers to how SAS behaves when a value’s formatted length is less than the specified width. Usually, numeric formats right-align and character formats left-align.

16 SAS Formats This means that if there are leftover columns, numbers are shifted flush right in the columns and characters are shifted flush left in the columns. – If a value is displayed as a series of asterisks (***), the applied format width was too narrow for the value. – There are many ways to apply SAS formats, but when using a procedure you apply them with a FORMAT statement inside the procedure step. FORMAT statement syntax: FORMAT var-list format. var-list format. … ; PROC PRINT DATA=classlib.sales; FORMAT cost price dollar9.2; RUN;

17 Apply SAS Formats Remember: Applying a format does not change the “internal” values of a variable. It only changes the way the values are displayed.
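The point that a format changes only the display can be sketched as follows. This is an illustration, not part of the original slides: it uses the PUT function (which the slides do not cover) to produce the formatted text, and the variable names price, shown, and doubled are hypothetical.

```sas
DATA _NULL_;
  price = 1234.5;
  shown = PUT(price, DOLLAR9.2);   /* formatted text: $1,234.50 */
  doubled = price * 2;             /* arithmetic still uses the internal value 1234.5 */
  PUT price= shown= doubled=;      /* writes price=1234.5 shown=$1,234.50 doubled=2469 */
RUN;
```

The stored value of price is untouched; only the character string produced by the format carries the dollar sign, comma, and two decimals.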

18 Using Formats: DATA vs. PROC
In a DATA step:
    DATA class2;
      SET classlib.class;
      FORMAT sex $hex2. age roman6. ht words15. wt e8.;
    RUN;
It permanently changes the way the values are displayed, because the formats are stored with the data set class2.
In a PROC step:
    PROC PRINT DATA=classlib.class;
      VAR sex age ht wt;
      FORMAT sex $hex2. age roman6. ht words15. wt e8.;
    RUN;
It changes the way the values are displayed in the output window for this procedure only.

19 Types of SAS-Supplied Formats
    Category        Description
    Character       instructs SAS to write character data values from character variables.
    Date and Time   instructs SAS to write data values from variables that represent dates, times, and datetimes.
    ISO 8601        instructs SAS to write date, time, and datetime values using the ISO 8601 standard.
    Numeric         instructs SAS to write numeric data values from numeric variables.

20 Creating User-Defined Formats User-defined formats can be used to assign value labels for variables in either DATA or PROC steps. They can also be used to recode variables in a DATA step or to collapse variable categories in a PROC step. There are two types of user-defined formats. – VALUE formats convert one or more user-specified values into a single character string. – PICTURE formats specify a template for how to print a number or range of numbers. Syntax: PROC FORMAT; VALUE fmt1name range1 = 'label1' range2 = 'label2' … ; RUN;

21 Creating User-Defined Formats Format names: – Are 32 characters or fewer in length. – Begin with a letter or underscore for formats to be assigned to numeric variables. – Begin with a dollar sign ($) followed by a letter or underscore for formats to be assigned to character variables. – Cannot end in a number. – Cannot conflict with the names of SAS-supplied formats. Note: – Format names do not end in a dot (.) in a VALUE statement, although they do everywhere else. – User-defined formats may be used in DATA step FORMAT statements. If the data set is stored permanently, you must have the format available whenever the data set is used, since the format is part of the permanent data set.

22 Creating User-Defined Formats Ranges can be – Single value or OTHER – Ranges of values including LOW and HIGH – Lists of values and ranges – Character values, ranges, or lists
    VALUE AFT 1='Agree' 2='Disagree' OTHER='ERROR';
    VALUE AGEFT LOW-12 = 'Kids' 13-19 = 'Teens' 20-HIGH = 'Adults';
    VALUE SFMT 1,3 = 'Male' 2,4 = 'Female' 0,5-9,. = 'Miscoded';
    VALUE $GR 'A+' = 'H' 'A'-'C' = 'P' 'D' = 'L';

23 User-Defined Formats: Notes Values not in any ranges are displayed as is, unformatted. Missing values are not included in the LOW specification. Missing values can be included in the OTHER specification. Except in special cases (see the MULTILABEL option), ranges should not overlap. The less than symbol (<) can be used in ranges to exclude either end point of the range, e.g., – 1-10 (values 1-10 inclusive of each end point) – 1<-10 (values greater than 1 up to and including 10) – 1-<10 (1 up to but not including 10)
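The range rules above can be tried out with a small sketch. The format name scorefmt, its cut points, and the four test values are all hypothetical; the PUT function is used only to show which label each value receives.

```sas
PROC FORMAT;
  VALUE scorefmt              /* hypothetical format for illustration */
    LOW-<60 = 'Fail'          /* below 60; LOW does not include missing */
    60-<80  = 'Pass'          /* 60 up to but not including 80 */
    80-HIGH = 'Honors'        /* 80 and above */
    .       = 'No score';     /* missing is listed explicitly */
RUN;

DATA _NULL_;
  INPUT score;
  band = PUT(score, scorefmt.);
  PUT score= band=;
  DATALINES;
.
59.9
60
80
;
RUN;
```

The log should show No score, Fail, Pass, and Honors in that order: the boundary values 60 and 80 fall in the range that includes them, and the missing value is not picked up by LOW.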

24 Example: the Problem Print the SALES2 data set (first 10 observations) and then run PROC FREQ on PRICE and WEEKDAY, in both cases formatting PRICE and WEEKDAY as described below. In the FREQ step, include missing values in calculations of percentages and other statistics. Do not use a DATA step.
    Value of PRICE          Displayed as
    0 < price <= 100        Low
    100 < price <= 500      Medium
    500 < price < 700       High
    price >= 700            Very High
    . < price <= 0          Invalid
    . (missing)             Missing
    Value of WEEKDAY        Displayed as
    MON, TUE, WED, SUN      No R
    THR, FRI, SAT           R
    Other                   Invalid

25 Example: the Solution
    PROC FORMAT;
      VALUE prfmt 0<-100 = 'Low'
                  100<-500 = 'Medium'
                  500<-<700 = 'High'
                  700-high = 'Very High'
                  LOW-0 = 'Invalid'
                  Other = 'Missing';
      VALUE $rfmt 'MON','TUE','WED','SUN' = 'No R'
                  'THR','FRI','SAT' = 'R'
                  other = 'Invalid';
    RUN;
    PROC PRINT DATA=sales2(OBS=10);
      FORMAT price prfmt. weekday $rfmt.;
    RUN;
    PROC FREQ DATA=sales2;
      TABLES price weekday / MISSING;
      FORMAT price prfmt. weekday $rfmt.;
    RUN;

26 Example: Another example User-defined formats for display & grouping purposes. The formats must be defined before the steps that use them.
    PROC FORMAT;
      VALUE hft low-<55.0 = 'Short'
                70.1-high = 'Tall';
      VALUE wft low-75.0 = 'Skinny'
                75.0<-125.0 = 'Average'
                125.0<-high = 'Robust'
                other = 'Missing';
      VALUE $sft 'M' = 'Male'
                 'F' = 'Female'
                 other = 'Invalid';
    RUN;
    PROC PRINT DATA=class;
      FORMAT ht hft. wt wft. sex $sft.;
    RUN;
    PROC FREQ DATA=class;
      TABLES ht wt;
      TABLES sex / MISSING;
      FORMAT ht hft. wt wft. sex $sft.;
    RUN;

27 CONDITIONAL EXECUTION Up to this point, the statements in the DATA step have been executed sequentially for each observation processed. The ability to execute or not execute a statement based on whether or not some condition is met is one of the most powerful features of a computer. We have already discussed a very specialized conditional statement, the subsetting IF, which is used to control whether or not observations are added to the output data set. The general form of the IF statement is IF expression THEN statement1; ELSE statement2; where expression is any valid SAS expression, and statements 1 and 2 are any executable SAS statements.

28 CONDITIONAL EXECUTION NOTES: – If expression is "true" (non-zero and non-missing) then statement 1 is executed. If expression is "false" (zero or missing) then statement 2 is executed. – The expression is usually a comparison expression (x LT 4), in which case the expression has a value of 1 for true and 0 for false. – Arithmetic expressions (y + z) are also valid. – The ELSE statement is optional. If it is not used and the expression is false, control is transferred to the next statement.
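A small sketch of these rules, with hypothetical variables y, z, m, and flag. It shows an arithmetic expression used as a condition, a missing value treated as false, and the 1/0 value of a comparison.

```sas
DATA _NULL_;
  y = 3; z = -3; m = .;
  IF y + z THEN PUT 'y + z is true';
  ELSE PUT 'y + z is false (zero)';      /* this branch runs: 3 + (-3) = 0 */
  IF m THEN PUT 'm is true';
  ELSE PUT 'm is false (missing)';       /* this branch runs: missing counts as false */
  flag = (y LT 4);                       /* a comparison evaluates to 1 or 0 */
  PUT flag=;                             /* writes flag=1 */
RUN;
```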

29 CONDITIONAL EXECUTION RECODING EXAMPLE /* Create age, height, and weight categories */ DATA class; SET classlib.class; IF age < 30 THEN agecat = 'Young'; ELSE agecat = 'Old'; IF ht < 30 THEN htcat = 'Short'; ELSE IF 30 <= ht < 70 THEN htcat = 'Ave'; ELSE IF ht >= 70 THEN htcat = 'Tall'; IF 2 < wt < 100 THEN wtcat = 'Light'; ELSE IF wt >= 100 THEN wtcat = 'Heavy'; RUN;

30 CONDITIONAL EXECUTION IF / THEN / ELSE RETAIN EXAMPLE DATA one; SET classlib.class; RETAIN nmales nfemales problem 0; IF sex = 'M' THEN nmales = nmales + 1; ELSE IF sex = 'F' THEN nfemales = nfemales + 1; ELSE problem = problem + 1; RUN;

31 CONDITIONAL EXECUTION IF / THEN / ELSE RETAIN EXAMPLE, + IF DATA class; SET classlib.class end=last; RETAIN nmales nfemales problem 0; IF sex = 'M' THEN nmales = nmales + 1; ELSE IF sex = 'F' THEN nfemales = nfemales + 1; ELSE problem = problem + 1; IF last; RUN;

32 CREATING INDICATOR VARIABLES An indicator variable has the value – 1 if certain conditions are true – 0 otherwise For example: Make an indicator variable HighBP such that HighBP is 1 if SYSBA > 162 or DSTBA > 92, otherwise HighBP is 0. Two methods: if sysba > 162 or dstba > 92 then HighBP = 1; else HighBP = 0; OR HighBP = (sysba > 162) or (dstba > 92);

33 CREATING INDICATOR VARIABLES Be careful if your conditionals check for “less than” conditions, to avoid having missing values result in a value of 1 for the indicator variable. Use expressions like these: if (.z < sysba < 110) or (.z < dstba < 70) then LowBP = 1; else LowBP = 0; OR LowBP = (.z < sysba < 110 ) or (.z < dstba < 70);
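The pitfall can be seen in a short sketch. The values below are made up: sysba is missing and dstba is 85, so the subject should not be flagged as having low blood pressure. The names naive and safe are hypothetical.

```sas
DATA _NULL_;
  sysba = .;
  dstba = 85;
  naive = (sysba < 110) or (dstba < 70);              /* 1: missing compares lower than 110 */
  safe  = (.z < sysba < 110) or (.z < dstba < 70);    /* 0: the .z lower bound excludes missing values */
  PUT naive= safe=;                                   /* writes naive=1 safe=0 */
RUN;
```

The lower bound .z works because .z is the largest of the missing values, so every missing value fails the test .z < x.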

34 DO/END STATEMENTS The DO and END statements define the beginning and end of a group of statements called a DO group. The DO group can be used within IF-THEN/ELSE statements to conditionally execute groups of statements. Execution of a DO statement specifies that all statements between the DO and its matching END statement are to be executed. SYNTAX:
    DO;
      DO group statements;
      …
    END;

35 Example Consider a data set where some height and weight measurements were collected in English units and some were collected in metric units. Convert all measurements to centimeters and grams. DATA fixed; SET mixed; DROP units; IF units = 'M' THEN DO; /* metric units */ ht = ht * 100; /* meters to cm */ wt = wt * 1000; /* kilos to grams */ END; ELSE DO; /* English units */ ht = ht * 2.54; /* inches to cm */ wt = wt * 1000/2.2; /* pounds to grams */ END; RUN;

36 ITERATIVE EXECUTION OF DO GROUPS Iterative DO loops are used to repeatedly execute the statements within a DO group, changing the value of the index-variable each time. The number of iterations and the value of the index variable are determined by the "start", "stop", and "increment" parameters, or by the list of "values".
    DO index-variable=start TO stop BY increment;
    DO index-variable=start TO stop;
    DO index-variable=value1, value2, …, valueN;

37 Example For example, the following code, which adds the odd integers from 1 to 7,
    X = 0;
    DO I = 1 TO 7 BY 2;
      X = X + I;
    END;
is equivalent to:
    Statement     Condition (I > 7)     Value of I   Action
    X = 0;        (initial value of X)
    I = 1;        False                 1            X = X + I;  X = 0 + 1;
    I = I + 2;    False                 3            X = X + I;  X = 1 + 3;
    I = I + 2;    False                 5            X = X + I;  X = 4 + 5;
    I = I + 2;    False                 7            X = X + I;  X = 9 + 7;
    I = I + 2;    True                  9            Stop (leave loop)

38 INTERPRETATION OF THE CONTROL EXPRESSION DO index-variable = start TO stop BY increment;
1) The first time the DO statement is encountered, the index-variable is set to "start".
2) If the index-variable is greater than "stop", then control passes to the statement following the END statement.
3) If the value of the index-variable is less than or equal to "stop", the statements in the DO group are executed.
4) At the end of the DO group, "increment" is added to the index-variable and control branches back to the test against "stop" (step 2 above).
5) This process is repeated until the index-variable is greater than the "stop" value. Control then passes to the statement following the END statement.
"Start", "stop", and "increment" can all be arbitrarily complex expressions whose values are only evaluated once, the first time through the loop.
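The "evaluated once" rule can be checked with a small sketch. The variable n is hypothetical; it is used as the stop value and then changed inside the loop.

```sas
DATA _NULL_;
  n = 3;
  DO i = 1 TO n;      /* stop value is fixed at 3 when the loop starts */
    n = 10;           /* changing n here does not add iterations */
    PUT i=;           /* writes i=1, i=2, i=3 */
  END;
  PUT i= n=;          /* after the loop: i=4 n=10 */
RUN;
```

The final value of i is 4, not 3, because the increment is added at the end of the last pass before the test against "stop" fails.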

39 EXAMPLES 1) Compute the final balance resulting from depositing a given amount (CAPITAL) for a given number of years (TERM) at a given rate of interest (RATE). Assume interest is compounded yearly. DATA compound; SET money; DO year=1 TO term BY 1; interest = capital * rate; capital = capital + interest; END; DROP year interest; RUN;

40 EXAMPLES 2) Count the number of leap years experienced before 1999 by each member of the data set WORK.KIDS. The test MOD(year,4)=0 is a simplification that is correct for the years 1901 through 2099.
    DATA leaps;
      SET kids;
      nleap = 0;
      DO year = birthyr TO 1999 BY 1;
        IF MOD(year,4)=0 THEN nleap=nleap+1;
      END;
      DROP year;
    RUN;

41 EXAMPLES 3) Compute the factorial of X, X!, where X! = 1 * 2 * 3 * ... * X. By definition, 0! = 1 and X! is undefined for X < 0.
    DATA factorial;
      SET math;
      factx = 1;
      IF x < 0 THEN factx = .;
      ELSE IF x > 0 THEN DO i = 1 TO x;
        factx = factx * i;
      END;
    RUN;
Result:
    Obs   x     factx        i
    1     1     1            2
    2     4     24           5
    3     -3    .            .
    4     0     1            .
    5     12    479001600    13

42 DO-LOOPS: USAGE NOTES DO statements have five components: – 1. The INDEX is a numeric variable controlling the execution of the loop. – 2. The START is a numeric variable, constant, or expression that defines the beginning value taken by the INDEX. – 3. The STOP is a numeric variable, constant, or expression that defines the last value for which the loop is to be executed. – 4. The INCREMENT is a numeric variable, constant, or expression that controls how the value of the INDEX changes. Its default value is 1. Generally, once INDEX plus INCREMENT exceeds STOP, the loop terminates. – 5. If a list of values is used, VALUE is a numeric or character variable, constant, or expression.

43 DO-LOOPS: USAGE NOTES Loops execute until the index exceeds the stop value. This means that the stop value must be reachable from start. You cannot have a start at 100 and a stop at 50 unless you use a negative increment. The index becomes part of the SAS data set being created unless it is included in a DROP statement. The start, stop, increment, and value must be non-missing. You can combine the various forms of the indexed DO statements, using start, stop, and optionally, increment, with one or more "value" specifications. Each form of these DO loops can be nested within each other or a DO group.

44 MORE COMPLEX DO LOOPS The general syntax of the iterative DO statement is; DO control expression 1, control expression 2,...; Where each "control expression" has the general form; start TO stop BY increment If more than one control expression is included, each is executed in turn. Thus the statements within the loop; DO i=1 to 3 by 1, 10 to 40 by 10; END; would be executed 7 times, with I having, successively, the values 1, 2, 3, 10, 20, 30, 40. If "BY increment" is omitted, "BY 1" is assumed. If "TO stop BY increment" is omitted, the loop will be executed once with "index variable=start"
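A minimal sketch of a DO statement with several control expressions, extending the slide's example with one single-value specification (99, chosen arbitrarily) to show the "TO stop BY increment omitted" case.

```sas
DATA _NULL_;
  DO i = 1 TO 3, 10 TO 40 BY 10, 99;
    PUT i=;           /* writes i=1, 2, 3, 10, 20, 30, 40, 99 on successive lines */
  END;
RUN;
```

The DO group runs eight times: three for the first expression (BY 1 assumed), four for the second, and once for the single value.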

45 Examples
1) Check the variable Y for the missing value codes 99, 998, and 999 and recode to a SAS missing value:
    DO miss=99, 998, 999;
      IF y=miss THEN y=.;
    END;
2) Creating a data set without input data:
    DATA one;
      LENGTH var1 3;
      DO var1 = 1 TO 8 BY 2, 10, 40, 50;
        var2 = SQRT(var1);
      END;
    RUN;
    DATA one;
      LENGTH var1 3;
      DO var1 = 1 TO 8 BY 2, 10, 40, 50;
        var2 = SQRT(var1);
        OUTPUT;
      END;
    RUN;
Q: How many cases are in each of these data sets?

46 MORE COMPLEX DO LOOPS A DO loop can also "count down." In that case "increment" is negative and "stop" must be less than "start." In such cases, the loop is repeated until the value of the index variable is less than "stop". Example: Find the length of the value of a character variable, NAME. The length will be defined as the position of the rightmost non-blank character in NAME. Assume that the length of the variable NAME is 20.
    length = 0;
    DO i=20 TO 1 BY -1;
      IF (SUBSTR(name,i,1) NE ' ') AND (length EQ 0) THEN length=i;
    END;

47 Other Forms of Iterative DO Groups Two additional forms of DO loop available are the DO WHILE and DO UNTIL loops. These are used in cases in which you want to execute a loop as long as (WHILE) some logical expression is true or as long as some logical expression is false (UNTIL it is TRUE). DO WHILE (Expression); DO UNTIL(Expression);

48 DO WHILE statement General form: DO WHILE (Expression); Executable statements END; In a DO WHILE statement, the expression is evaluated at the top of the loop, before the statements in the DO group are executed. If the expression is true, the DO group is executed. Example: Count the number of years needed to double an initial amount (CAPITAL) at a given rate of interest (RATE), compounding yearly.

49 Example: the Double Time Count the number of years needed to double an initial amount (CAPITAL) at a given rate of interest (RATE), compounding yearly. DATA double; SET compound; total = capital; term = 0; DO WHILE(total < (capital*2)); total = total + (total*rate); term = term + 1; END; RUN;

50 Example: Make Change Compute the smallest number of quarters, dimes, nickels, and pennies which add to an arbitrary amount between 0 and 99. E.g. 97, 32, and 11. DATA change; amount=97; OUTPUT; amount=32; OUTPUT; amount=11; OUTPUT; RUN; DATA coins; SET change; left = amount; quarters=0; dimes=0; nickels=0; pennies=0; DO WHILE(left >=25); quarters = quarters+1; left = left-25; END; DO WHILE(left >=10); dimes = dimes + 1; left = left - 10; END; IF (left >=5) THEN DO; nickels = 1; left = left - 5; END; pennies = left; DROP left; RUN;

51 DO UNTIL statement In a DO UNTIL Statement, the expression is evaluated at the bottom of the loop, after the statements in the DO group are executed. If the expression is true, the DO group is not executed again. The DO group is always executed at least once. General form: DO UNTIL (Expression); Executable statements END; Example: Find the position of the first occurrence of a character (CHAR) in NAME
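The difference between testing at the top and testing at the bottom can be sketched with two loops whose stopping condition is already satisfied on entry. The variables n, w, and u are hypothetical counters.

```sas
DATA _NULL_;
  n = 5;
  w = 0;
  DO WHILE (n < 5);     /* tested at the top: false at once, body never runs */
    w = w + 1;
  END;
  u = 0;
  DO UNTIL (n >= 5);    /* tested at the bottom: body runs once, then the loop stops */
    u = u + 1;
  END;
  PUT w= u=;            /* writes w=0 u=1 */
RUN;
```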

52 DO UNTIL statement Find the position of the first occurrence of a character (CHAR) in NAME:
    i = 1;
    char = '*';
    name = 'JOE*SMITH';
    DO UNTIL (SUBSTR(name,i,1) EQ char);
      i = i + 1;
    END;
    PUT i= ;
Q: WHAT HAPPENS IF CHAR IS NOT IN NAME?
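One way to guard the search, offered as a sketch rather than as the slide's intended answer: bound the loop by the length of NAME and stop as soon as a match is found. The variable pos is hypothetical and holds 0 when CHAR does not occur. The unguarded loop above has nothing to stop it once i moves past the end of NAME.

```sas
DATA _NULL_;
  name = 'JOE*SMITH';
  char = '*';
  pos = 0;                                    /* 0 means "not found" */
  DO i = 1 TO LENGTH(name) UNTIL (pos > 0);   /* never reads past the end of name */
    IF SUBSTR(name, i, 1) EQ char THEN pos = i;
  END;
  PUT pos=;                                   /* writes pos=4 */
RUN;
```

If char is changed to a character that is not in name, the loop ends after the last position and pos stays 0.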

53 ARRAY statement Sometimes, we need to conduct a similar task for several variables, e.g.,
    ratio1 = verbal1/math1;
    ratio2 = verbal2/math2;
    ratio3 = verbal3/math3;
    if date1=98 or date1=99 then date1=.;
    if date2=98 or date2=99 then date2=.;
    if date3=98 or date3=99 then date3=.;
ARRAY – a group of variables given a collective name – a convenient way of temporarily identifying a group of variables by assigning an alias to them – Calculations in the DATA step can operate on arrays as they can on variables. – ARRAY statements are usually used in conjunction with DO loops so that you can execute one or more statements for each of a group of related variables. – An array in SAS is not a permanent data structure as in some other languages. It exists only for the duration of the DATA step.

54 Introductory Array Example The following ARRAY statement lists the three variables ExamScore1, ExamScore2, and Paper:
    ARRAY scorelist {3} ExamScore1 ExamScore2 Paper;
Here scorelist is the group name, {3} is the number of members, and ExamScore1 ExamScore2 Paper is the list of members. This statement tells SAS to – Make a group named SCORELIST for the duration of this DATA step. – Put three variable names in SCORELIST: ExamScore1, ExamScore2, and Paper. – From this point in the DATA step, one of these variables can be referenced by either its original name or its array reference. For example, the names ExamScore1 and SCORELIST{1} are equivalent.

55 Array statement By listing a variable in an ARRAY statement, you assign the variable an extra name with the form array-name{position}, where position is the position of the variable in the list (1, 2, or 3 in this case). The position can either be a number or the name of a variable whose value is the number. The additional name is called an array reference, and the position is called the subscript. A simple ARRAY statement has the following form: ARRAY array-name{number-of-elements} variable-1 [... variable-n]; For example, ARRAY bpmeasures{4} bp_visit1 bp_visit2 bp_visit3 bp_visit4;

56 ARRAY SYNTAX ARRAY name{dim} [$] [len] [elements] [(starting_values)] ; – NAME is the name of the array; cannot be a variable or an array already in the data set. – DIM is the number of elements in the array. An asterisk (*) may also be entered. The DIM can also be enclosed in [ ] or (). – The $ indicates that the elements of the array are character variables. – The LEN indicates the length of any variables that have not yet been assigned a length. – The ELEMENTS are the names of the variables in the array. Any combination of variable lists and variable names are permitted. All elements in the array must be of the same data type. – STARTING_VALUES indicate initial values for array elements. These values are separated by a comma and/or one or more blanks. Starting values do not replace variable values already known to SAS.

57 ARRAY EXAMPLES The following are valid ARRAY statements: ARRAY test{3} test1-test3 ; ARRAY test{*} test1-test3; ARRAY test{3} ; /* create variables test1-test3 */ ARRAY day{4} $2 day1-day4 ('S','M','TU','W') ; ARRAY x{*} _NUMERIC_ ; ARRAY y{*} test1-test3 score4-score6 ; ARRAY z{*} x y ; ARRAY scores{5,3} score1-score15 ;

58 EXAMPLES
1) Convert the homework scores HW1-HW3 from a 10-point scale to a 100-point scale.
    DATA newhw;
      SET oldhw;
      ARRAY homework(3) hw1 hw2 hw3;
      DO i = 1 TO 3;
        homework(i) = homework(i) * 10;
      END;
      DROP i;
    RUN;
2) The DATA step below searches through the homework scores in NEWHW, recording each student's worst homework score.
    DATA work.worst;
      SET work.newhw;
      KEEP name score;
      ARRAY hw(3) hw1 hw2 hw3;
      score = 101;
      DO j = 1 TO 3;
        IF (hw(j) < score) THEN score = hw(j);
      END;
    RUN;

59 EXAMPLES 3) Convert HW1-HW3 & EXAM from the NEWHW data set to letter grades L1-L3 and LEXAM using the following grading scale: – Grade >= 90 A – 80 <= Grade < 90 B – 70 <= Grade < 80 C – Grade < 70 or missing F
    DATA letterhw;
      SET newhw;
      ARRAY hw(4) hw1-hw3 exam;
      ARRAY l(4) $1 l1-l3 lexam;
      DO i = 1 TO 4;
        IF (hw(i) < 70) THEN l(i) = 'F';
        ELSE IF (70 <= hw(i) < 80) THEN l(i) = 'C';
        ELSE IF (80 <= hw(i) < 90) THEN l(i) = 'B';
        ELSE l(i) = 'A';
      END;
    RUN;

60 EXAMPLES 4) Data set ONE contains one record per subject. Each record contains three scores. Convert the data set to 3 records per subject, where each record contains one score and the score number (1, 2, or 3).
    DATA one;
      id = 101; score1 = 20; score2 = 30; score3 = 40; OUTPUT;
      id = 102; score1 = 50; score2 = 60; score3 = 70; OUTPUT;
    RUN;
    DATA two;
      SET one;
      DROP score1-score3;
      ARRAY s(3) score1-score3;
      DO i = 1 TO 3;
        score = s(i);
        OUTPUT;
      END;
    RUN;

61 EXAMPLES 5) Convert data set TWO (from previous example) from three observations per subject back to one observation per subject. DATA three; SET two; RETAIN score1-score3; DROP i score; IF i=1 THEN score1=score; ELSE IF i=2 THEN score2=score; ELSE IF i=3 THEN score3=score; IF i=3 THEN OUTPUT; RUN; DATA three; SET two; RETAIN score1-score3; DROP i j score; ARRAY s{3} score1-score3; DO j=1 TO 3; if j=i then s{i}=score; END; if i=3 then output; RUN;

62 EXAMPLES 6) Data set ONE contains three systolic blood pressure measures for each subject. Find the first non-missing measure.
    DATA one;
      sbp1=.; sbp2=90; sbp3=104; OUTPUT;
      sbp1=101; sbp2=120; sbp3=130; OUTPUT;
      sbp1=.; sbp2=.; sbp3=95; OUTPUT;
      sbp1=.; sbp2=.; sbp3=.; OUTPUT;
    RUN;
    DATA two;
      SET one;
      ARRAY s{3} sbp1-sbp3;
      SBP=.;
      i=1;
      DO UNTIL (SBP>.z OR i>3);
        IF s{i} >.z THEN SBP=s{i};
        i = i+1;
      END;
    RUN;

63 Other Useful Array Notes You can refer to an entire array using a special asterisk notation. ARRAY var{3} var1-var3; x2 = SUM(OF var{*}); You can use the DIM function to return the number of elements in a dimension of an array. ARRAY test{*} test1-test3; DO i=1 TO DIM(test); test{i} = test{i} + 10; END; ARRAY var{*} _NUMERIC_; DO i=1 TO DIM(var); IF var{i} =. THEN var{i} = 0; END;

64 Other Useful Array Notes You can define an array of temporary data elements by using the _TEMPORARY_ keyword to do calculations. The elements of temporary arrays are not variables, but constants. DATA one; ARRAY test{6} _TEMPORARY_ (90 80 70 70 95 60); ARRAY grade{6} $; DO j = 1 TO DIM(test); IF test(j) >= 95 THEN grade{j} = 'A'; ELSE IF test{j} >= 85 THEN grade{j} = 'B'; ELSE IF test{j} >= 75 THEN grade{j} = 'C'; ELSE IF test{j} >= 70 THEN grade{j} = 'D'; ELSE grade{j} = 'F'; END; DROP j; RUN;

65 TRANSFORMATIONS INVOLVING MISSING VALUES Missing values occur in most data, and it is important to understand what effect these missing values have on transformations of variables. Missing values for numeric variables are represented in programming statements by ._, ., and .A through .Z. Missing values for character variables are represented by a blank (' '). – Variables can be assigned missing values. – Missing values propagate through arithmetic expressions. – Illegal arguments sent to functions return missing values. Missing values are treated like minus infinity in comparison expressions: they compare lower than any number. Special missing values have their own order in the sort sequence: ._ is lowest, followed by ., then .A through .Z.
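A short sketch of these rules with made-up values. The SUM function is included for contrast because it skips missing arguments, unlike the + operator; the variable names are hypothetical.

```sas
DATA _NULL_;
  a = 5;
  b = .;
  total_plus = a + b;         /* missing: the + operator propagates missing values */
  total_sum  = SUM(a, b);     /* 5: the SUM function ignores missing arguments */
  is_low     = (b < 0);       /* 1: missing compares lower than any number */
  PUT total_plus= total_sum= is_low=;   /* writes total_plus=. total_sum=5 is_low=1 */
RUN;
```

The log also carries a note that missing values were generated by an operation on missing values, which is a useful signal when a transformation silently produces missing results.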

66 THE SUM STATEMENT A sum statement can be used to sum expressions over observations. It implies a RETAIN and only sums nonmissing values. SYNTAX Sum_variable + Expression; Example: generate cumulative measures
    DATA accumulate2;
      SET classlib.class;
      cumht + ht;
      cumwt + wt;
      cumage + age;
    RUN;
This one works, b/c the sum statement implies a RETAIN.
    DATA one;
      SET classlib.class;
      cumht = cumht + ht;
      cumwt = cumwt + wt;
      cumage = cumage + age;
    RUN;
This one does not work. It needs a RETAIN statement.

