Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 3: Working With Your Data

Similar presentations


Presentation on theme: "Chapter 3: Working With Your Data"— Presentation transcript:

1 Chapter 3: Working With Your Data
**SAS Alert** In this chapter, we will see some examples of SAS at its worst. The “implicit” looping of the DATA step makes several operations that would be easy in R difficult or clumsy in SAS We see good things too © Fall 2011 John Grego and the University of South Carolina

2 Assigning Variables Creating variables based on other variables is easily done within the data step Assignment is carried out with the = sign input var1 var2 var3; mysum=var1+var2+var3; mycube=var3**3; always6=6; newname=var2;

3 Assigning Variables II
Order of operations is followed, but use parentheses when necessary and for clarity Previously defined variables can be over-written var1=var1-15;

4 SAS functions In addition to simple math expressions, you can use built-in SAS functions to create variables Section 3.3 (pages 78-81) lists many built-in functions

5 Working with your data Some of the most useful: LOG, MEAN, ROUND, SUM, TRIM, UPCASE, SUBSTR, CAT, COMPRESS, DAY, MONTH Note: MEAN takes the mean of several variables, not the mean of all values of one variable. Similarly with SUM, etc. They are row operators, not column operators. Open charfunctions.sas

6 Using IF-THEN Statements
Conditional statements in SAS rely on several important keywords like IF, THEN and ELSE and logical keywords like EQ,NE,GT,LT,GE,LE,IN,AND,OR All of these have symbolic equivalents (see page 82) IN: Checks whether a variable value occurs in a specified list

7 Simple IF-THEN statement
An IF-THEN statement is a simple conditional statement, usually resulting in only one action, unless the keywords DO and END are specified (like braces in R) IF X>0 AND X<2 THEN Y=X; ELSE Y=2-X; Open pact.sas

8 IF THEN/ELSE IF/ELSE Several conditions may be checked using ELSE IF or ELSE statements The last action is carried out if none of the previous conditions are true IF .. THEN ..; ELSE IF .. THEN ..; ELSE ..; Open retain2008.sas

9 IF-THEN vs. IF-THEN/ELSE
Using several ELSE statements is more efficient than using several IF-THEN statements (though errors in logic are more likely) Note: Parentheses may be useful with AND/OR statements.

10 Missing Values Be careful with missing values when making comparisons! SAS considers missing values to be “less than” practically any value, so if data contains missing values, handle them separately IF weight=. THEN size=‘unknown’; ELSE IF weight<25 THEN size=‘small’; ELSE IF ..

11 Using IF to select a subset of data
We can retain cases from the data using logical operators with a subsetting IF. The syntax is unusual; it seems as though part of the statement is missing DATA B; SET A; IF type=‘Pine’; Open pact.sas. again

12 Implict vs Explicit subsetting IF
Data set B will then include only the cases that match the condition Most people get used to this odd syntax, but you can include a THEN KEEP/THEN DELETE clause if this makes you really uncomfortable

13 SAS Dates SAS stores dates internally as the number of days since January 1, 1960 Special informats for reading dates (pp ) When a year is specified by two digits (‘03, ‘45, etc.), you can use YEARCUTOFF to specify the century

14 SAS Dates-Year cutoff The default is 1920; SAS assumes dates range from 1920 to 2019. It’s better to simply avoid this ambiguity options yearcutoff=1930; options yearcutoff=1800;

15 SAS Dates-TODAY Handy function: TODAY() is set to the current date
Special date form for logical operators if birthdate>’01JAN1988’d then age=‘Under 21; 15

16 SAS Date functions and formats
Printing dates in a conventional format: Use FORMAT command in PROC PRINT; We can also output data using date formats (pp ) Other useful functions: MONTH(),DAY(),YEAR(),MDY(),QTR() Run date.sas. DATETIME formats not covered here.

17 cumul_sum+value_added;
RETAIN statement The RETAIN statement tells SAS to retain the value of a variable as SAS moves from observation to observation A clumsy solution to a common SAS problem Can be useful for cumulative analyses A sum statement creates a cumulative sum: cumul_sum+value_added; David’s code—open sas_quakes_retain.sas. SAS supports implicit addition too.

18 Using arrays We have seen how to alter variables that have been read into a SAS data set Sometimes we want to operate on more than one variable in the same way This can be accomplished quickly by creating an array (another example of an awkward solution to a common problem in SAS)

19 Array properties An array is a group of variables (either all numeric or all character) These could be already-existing variables or new ones The array itself does not become an object in the data set

20 Defining an array Once an array is defined, you can refer to its variables using “subscripts”. Ex: array_name(2) Using a DO statement is clunky—I like to use DO OVER instead ARRAY array_name (n) $ ; Book ex: ARRAY song (5) wj kt tr filp ttr; Do i=1 TO 5; IF song(i)=9 THEN song(i)=.; END; (n) is used for explicit arrays with predefined numbers of variables. Use [] or {} in addition to (). Open array.sas, which changes missing values to 0’s, and counts item response categories.

21 Shortcuts for lists of variables
Suppose variable names begin with a common character string, and end with a number sequence: var1, var2, var3, var4 You can refer to them in shortcut fashion: var1-var4

22 Lists of variables in functions
When specifying abbreviated lists in functions, you must use the keyword OF sum(of var1-var4); mean(of var1-var7);

23 proc contents data=dsname position;
Variable ordering You can abbreviate lists of named variables using a double hyphen: firstvar—lastvar These must follow the creation order of the variables as defined in the SAS data sheet. Check this either in the worksheet or by entering: proc contents data=dsname position; Open array.sas rather than doubeldash.txt. Open variable_order.sas

24 Special variables _ALL_ is short for “all variables in the data set”
_NUMERIC_ is short for “all numeric variables in the data set” _CHARACTER_ is short for “all character variables in the data set” _N_ is short for “current observation index in the data set SUM(OF _NUMERIC_);


Download ppt "Chapter 3: Working With Your Data"

Similar presentations


Ads by Google