Chapter 3: Working With Your Data

Slides:



Advertisements
Similar presentations
Instructor: Craig Duckett CASE, ORDER BY, GROUP BY, HAVING, Subqueries
Advertisements

Basic Elements of C++ Chapter 2.
Chapter 18: Modifying SAS Data Sets and Tracking Changes 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
CIS162AD - C# Decision Statements 04_decisions.ppt.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1.
Chapter 4: Decision Making with Control Structures and Statements JavaScript - Introductory.
Using Client-Side Scripts to Enhance Web Applications 1.
Chapter 1: Introduction to SAS  SAS programs: A sequence of statements in a particular order  Rules for SAS statements: –Every SAS statement ends in.
Flow of Control Part 1: Selection
Chapter 05 (Part III) Control Statements: Part II.
©TheMcGraw-Hill Companies, Inc. Permission required for reproduction or display. Selection Statements Selection Switch Conditional.
Chapter 7 Selection Dept of Computer Engineering Khon Kaen University.
Chapter 4 concerns various SAS procedures (PROCs). Every PROC operates on: –the most recently created dataset –all the observations –all the appropriate.
Java Programming Fifth Edition Chapter 5 Making Decisions.
Section 3.9: RETAIN & sum statements –because all variables are set to missing at the start of each iteration of the DATA step, we need a statement to.
Chapter 7: Macros in SAS  Macros provide for more flexible programming in SAS  Macros make SAS more “object-oriented”, like R 1 © Fall 2011 John Grego.
Midterm Review Important control structures Functions Loops Conditionals Important things to review Binary Boolean operators (and, or, not) Libraries (import.
Chapter 6: Modifying and Combining Data Sets  The SET statement is a powerful statement in the DATA step DATA newdatasetname; SET olddatasetname;.. run;
1 Agenda  Unit 7: Introduction to Programming Using JavaScript T. Jumana Abu Shmais – AOU - Riyadh.
C++ for Engineers and Scientists Second Edition Chapter 4 Selection Structures.
Control Structures- Decisions. Smart Computers Computer programs can be written to make computers seem smart Making computers smart is based on decision.
Copyright © 2014 Pearson Addison-Wesley. All rights reserved. 4 Simple Flow of Control.
JavaScript: Conditionals contd.
Java Programming Fifth Edition
Chapter Topics The Basics of a C++ Program Data Types
Chapter 5: Enhancing Your Output with ODS
Loops BIS1523 – Lecture 10.
Chapter 6: Modifying and Combining Data Sets
Shell Scripting March 1st, 2004 Class Meeting 7.
Basic Elements of C++.
EGR 2261 Unit 4 Control Structures I: Selection
IF statements.
Chapter 2: Getting Data into SAS
The Selection Structure
Miscellaneous Items Loop control, block labels, unless/until, backwards syntax for “if” statements, split, join, substring, length, logical operators,
Other Kinds of Arrays Chapter 11
Chapter 3 Selections Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved
Chapter 7 Arrays.
2. Understanding VB Variables
Conditionals & Boolean Expressions
JavaScript: Control Statements.
CHAPTER 4 CLIENT SIDE SCRIPTING PART 2 OF 3
Intro to PHP & Variables
MATLAB: Structures and File I/O
Creating a Workbook Part 2
Basic Elements of C++ Chapter 2.
Chapter 18: Modifying SAS Data Sets and Tracking Changes
Boolean Expressions And if…else Statements.
Chapter 1: Introduction to SAS
Arrays, For loop While loop Do while loop
Chapter 4: Sorting, Printing, Summarizing
Chapter 10 Programming Fundamentals with JavaScript
More Selections BIS1523 – Lecture 9.
Variables In programming, we often need to have places to store data. These receptacles are called variables. They are called that because they can change.
Chapter 7: Macros in SAS Macros provide for more flexible programming in SAS Macros make SAS more “object-oriented”, like R Not a strong suit of text ©
Topics Introduction to File Input and Output
SAS Essentials How SAS Thinks
Number and String Operations
Introduction to SAS A SAS program is a list of SAS statements executed in order Every SAS statement ends with a semicolon! SAS statements can be in caps.
PHP.
T. Jumana Abu Shmais – AOU - Riyadh
Starting Out with Programming Logic & Design
Chapter 3: Selection Structures: Making Decisions
Boolean Expressions to Make Comparisons
Chapter 3 Selections Liang, Introduction to Java Programming, Ninth Edition, (c) 2013 Pearson Education, Inc. All rights reserved.
Programming Logic and Design Fifth Edition, Comprehensive
Instructor: Raul Cruz-Cano 7/19/2012
Chapter 3: Selection Structures: Making Decisions
Topics Introduction to File Input and Output
Class code for pythonroom.com cchsp2cs
Presentation transcript:

Chapter 3: Working With Your Data **SAS Alert** In this chapter, we will see some examples of SAS at its worst. The “implicit” looping of the DATA step makes several operations that would be easy in R difficult or clumsy in SAS We see good things too © Fall 2011 John Grego and the University of South Carolina

Assigning Variables Creating variables based on other variables is easily done within the data step Assignment is carried out with the = sign input var1 var2 var3; mysum=var1+var2+var3; mycube=var3**3; always6=6; newname=var2;

Assigning Variables II Order of operations is followed, but use parentheses when necessary and for clarity Previously defined variables can be over-written var1=var1-15;

SAS functions In addition to simple math expressions, you can use built-in SAS functions to create variables Section 3.3 (pages 78-81) lists many built-in functions

Working with your data Some of the most useful: LOG, MEAN, ROUND, SUM, TRIM, UPCASE, SUBSTR, CAT, COMPRESS, DAY, MONTH Note: MEAN takes the mean of several variables, not the mean of all values of one variable. Similarly with SUM, etc. They are row operators, not column operators. Open charfunctions.sas

Using IF-THEN Statements Conditional statements in SAS rely on several important keywords like IF, THEN and ELSE and logical keywords like EQ,NE,GT,LT,GE,LE,IN,AND,OR All of these have symbolic equivalents (see page 82) IN: Checks whether a variable value occurs in a specified list

Simple IF-THEN statement An IF-THEN statement is a simple conditional statement, usually resulting in only one action, unless the keywords DO and END are specified (like braces in R) IF X>0 AND X<2 THEN Y=X; ELSE Y=2-X; Open pact.sas

IF THEN/ELSE IF/ELSE Several conditions may be checked using ELSE IF or ELSE statements The last action is carried out if none of the previous conditions are true IF .. THEN ..; ELSE IF .. THEN ..; ELSE ..; Open retain2008.sas

IF-THEN vs. IF-THEN/ELSE Using several ELSE statements is more efficient than using several IF-THEN statements (though errors in logic are more likely) Note: Parentheses may be useful with AND/OR statements.

Missing Values Be careful with missing values when making comparisons! SAS considers missing values to be “less than” practically any value, so if data contains missing values, handle them separately IF weight=. THEN size=‘unknown’; ELSE IF weight<25 THEN size=‘small’; ELSE IF ..

Using IF to select a subset of data We can retain cases from the data using logical operators with a subsetting IF. The syntax is unusual; it seems as though part of the statement is missing DATA B; SET A; IF type=‘Pine’; Open pact.sas. again

Implict vs Explicit subsetting IF Data set B will then include only the cases that match the condition Most people get used to this odd syntax, but you can include a THEN KEEP/THEN DELETE clause if this makes you really uncomfortable

SAS Dates SAS stores dates internally as the number of days since January 1, 1960 Special informats for reading dates (pp. 44-45) When a year is specified by two digits (‘03, ‘45, etc.), you can use YEARCUTOFF to specify the century

SAS Dates-Year cutoff The default is 1920; SAS assumes dates range from 1920 to 2019. It’s better to simply avoid this ambiguity options yearcutoff=1930; options yearcutoff=1800;

SAS Dates-TODAY Handy function: TODAY() is set to the current date Special date form for logical operators if birthdate>’01JAN1988’d then age=‘Under 21; 15

SAS Date functions and formats Printing dates in a conventional format: Use FORMAT command in PROC PRINT; We can also output data using date formats (pp. 90-91) Other useful functions: MONTH(),DAY(),YEAR(),MDY(),QTR() Run date.sas. DATETIME formats not covered here.

cumul_sum+value_added; RETAIN statement The RETAIN statement tells SAS to retain the value of a variable as SAS moves from observation to observation A clumsy solution to a common SAS problem Can be useful for cumulative analyses A sum statement creates a cumulative sum: cumul_sum+value_added; David’s code—open sas_quakes_retain.sas. SAS supports implicit addition too.

Using arrays We have seen how to alter variables that have been read into a SAS data set Sometimes we want to operate on more than one variable in the same way This can be accomplished quickly by creating an array (another example of an awkward solution to a common problem in SAS)

Array properties An array is a group of variables (either all numeric or all character) These could be already-existing variables or new ones The array itself does not become an object in the data set

Defining an array Once an array is defined, you can refer to its variables using “subscripts”. Ex: array_name(2) Using a DO statement is clunky—I like to use DO OVER instead ARRAY array_name (n) $ .. .. ..; Book ex: ARRAY song (5) wj kt tr filp ttr; Do i=1 TO 5; IF song(i)=9 THEN song(i)=.; END; (n) is used for explicit arrays with predefined numbers of variables. Use [] or {} in addition to (). Open array.sas, which changes missing values to 0’s, and counts item response categories.

Shortcuts for lists of variables Suppose variable names begin with a common character string, and end with a number sequence: var1, var2, var3, var4 You can refer to them in shortcut fashion: var1-var4

Lists of variables in functions When specifying abbreviated lists in functions, you must use the keyword OF sum(of var1-var4); mean(of var1-var7);

proc contents data=dsname position; Variable ordering You can abbreviate lists of named variables using a double hyphen: firstvar—lastvar These must follow the creation order of the variables as defined in the SAS data sheet. Check this either in the worksheet or by entering: proc contents data=dsname position; Open array.sas rather than doubeldash.txt. Open variable_order.sas

Special variables _ALL_ is short for “all variables in the data set” _NUMERIC_ is short for “all numeric variables in the data set” _CHARACTER_ is short for “all character variables in the data set” _N_ is short for “current observation index in the data set SUM(OF _NUMERIC_);