SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 7 & 10 By Tasha Chapman, Oregon Health Authority.

Slides:



Advertisements
Similar presentations
Haas MFE SAS Workshop Lecture 3:
Advertisements

CSE 1301 Lecture 5B Conditionals & Boolean Expressions Figures from Lewis, “C# Software Solutions”, Addison Wesley Briana B. Morrison.
Introduction to SQL Session 2 Retrieving Data From Multiple Tables.
XP Chapter 3 Succeeding in Business with Microsoft Office Access 2003: A Problem-Solving Approach 1 Analyzing Data For Effective Decision Making.
Decisions (Conditional Programming) Chapter 5 (Sec. 5.1 & 5.2)
Introduction to Structured Query Language (SQL)
Introduction to Structured Query Language (SQL)
WRITING BASIC SQL SELECT STATEMENTS Lecture 7 1. Outlines  SQL SELECT statement  Capabilities of SELECT statements  Basic SELECT statement  Selecting.
Chapter 18: Modifying SAS Data Sets and Tracking Changes 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
1 Copyright © Oracle Corporation, All rights reserved. Writing Basic SQL SELECT Statements.
Chapter 3 Single-Table Queries
Analyzing Data For Effective Decision Making Chapter 3.
Key Data Management Tasks in Stata
Chapter 5 Using Data and COBOL Operators. Initializing Variables When you define a variable in WORKING- STORAGE, you also can assign it an initial value.
SAS Efficiency Techniques and Methods By Kelley Weston Sr. Statistical Programmer Quintiles.
About the Presentations The presentations cover the objectives found in the opening of each chapter. All chapter objectives are listed in the beginning.
2 Copyright © 2004, Oracle. All rights reserved. Restricting and Sorting Data.
1 Single Table Queries. 2 Objectives  SELECT, WHERE  AND / OR / NOT conditions  Computed columns  LIKE, IN, BETWEEN operators  ORDER BY, GROUP BY,
Making Decisions Chapter 5.  Thus far we have created classes and performed basic mathematical operations  Consider our ComputeArea.java program to.
Saeed Ghanbartehrani Summer 2015 Lecture Notes #5: Programming Structures IE 212: Computational Methods for Industrial Engineering.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Conditional Expression One of the most useful tools for processing information in an event procedure is a conditional expression. A conditional expression.
Chapter 22: Using Best Practices 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 3 Decision Making. What is IF? The primary method of changing the flow of a program is by making decisions using the IF verb. The following example.
Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida.
 2008 Pearson Education, Inc. All rights reserved JavaScript: Introduction to Scripting.
Summer SAS Workshop Lecture 3. Summer SAS Workshop Website
SAS for Data Management and Analysis
Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.
2 Copyright © 2009, Oracle. All rights reserved. Restricting and Sorting Data.
A Guide to SQL, Eighth Edition Chapter Four Single-Table Queries.
Chapter 6: Modifying and Combining Data Sets  The SET statement is a powerful statement in the DATA step DATA newdatasetname; SET olddatasetname;.. run;
Chapter 4 Select … Case Multiple-Selection Statement & Logical Operators 1 © by Pearson Education, Inc. All Rights Reserved. -Edited By Maysoon.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapter 26 By Tasha Chapman, Oregon Health Authority.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 8, 13, & 24 By Tasha Chapman, Oregon Health Authority.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 14 & 19 By Tasha Chapman, Oregon Health Authority.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 16 & 17 By Tasha Chapman, Oregon Health Authority.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
IFS180 Intro. to Data Management Chapter 10 - Unions.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 3 & 4 By Tasha Chapman, Oregon Health Authority.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
More SQL: Complex Queries,
Writing Basic SQL SELECT Statements
Putting tables together
Chapter 6: Modifying and Combining Data Sets
DATA MANAGEMENT MODULE: Concatenating, Stacking and Merging
Intro to C Tutorial 4: Arithmetic and Logical expressions
Chapter 4 Select…Case Multiple-Selection Statement & Logical Operators
Chapter 3: Working With Your Data
Chapter 4 Select…Case Multiple-Selection Statement & Logical Operators
ECONOMETRICS ii – spring 2018
Chapter 18: Modifying SAS Data Sets and Tracking Changes
And now for something completely different . . .
More SQL: Complex Queries, Triggers, Views, and Schema Modification
Introduction to DATA Step Programming SAS Basics II
Chapter 4 Select…Case Multiple-Selection Statement & Logical Operators
Introduction to DATA Step Programming: SAS Basics II
Combining Data Sets in the DATA step.
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Chapter 4 Select…Case Multiple-Selection Statement & Logical Operators
Stata Basic Course Lab 4.
Chapter 4 Select…Case Multiple-Selection Statement & Logical Operators
Lab 2 HRP223 – 2010 October 18, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected.
SE1H421 Procedural Programming LECTURE 4 Operators & Conditionals (1)
Chapter 3: Selection Structures: Making Decisions
Boolean Expressions to Make Comparisons
Chapter 3 Selections Liang, Introduction to Java Programming, Ninth Edition, (c) 2013 Pearson Education, Inc. All rights reserved.
Chapter 3: Selection Structures: Making Decisions
Chapter 4 Select…Case Multiple-Selection Statement & Logical Operators
Chapter 4 Select…Case Multiple-Selection Statement & Logical Operators
Presentation transcript:

SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 7 & 10 By Tasha Chapman, Oregon Health Authority

Topics covered…  Conditional logic  If, Then, Else  Subsetting datasets  IF and WHERE  Selecting variables  DROP and KEEP  Appending datasets  Joins and merges

If, Then, Else Conditional logic

If, Then, Else IF THEN ; ELSE ; If Score >= 70 Then Grade = 'Passing Grade'; Else Grade = 'Failing Grade'; StudentScoreGrade Jane75Passing Grade Dave56Failing Grade Jack90Passing Grade Sue68Failing Grade

If, Then, Else IF THEN ; ELSE ; If Score >= 70 Then Grade = 'Passing Grade'; Else Grade = 'Failing Grade'; StudentScoreGrade Jane75Passing Grade Dave56Failing Grade Jack90Passing Grade Sue68Failing Grade Assignment statement Used to create new variables

If, Then, Else IF THEN ; ELSE IF THEN ; ELSE ; If Score >= 70 Then Grade = 'Passing Grade'; Else If 60 <= Score <= 69 Then Grade = 'Incomplete'; Else Grade = 'Failing Grade'; StudentScoreGrade Jane75Passing Grade Dave56Failing Grade Jack90Passing Grade Sue68Incomplete

If, Then, Else  When using ELSE IF:  Processes IF-THEN conditions until first true statement is met, then it moves on to the next observation  Once a condition is met, the observation is not reevaluated

If, Then, Else If Score >= 70 Then Grade = 'Passing Grade'; Else If 60 <= Score <= 69 Then Grade = 'Incomplete'; Else Grade = 'Failing Grade'; StudentScoreGrade Jane75Passing Grade Dave56Failing Grade Jack90Passing Grade Sue68Incomplete

If, Then, Else If Score >= 90 Then Grade = 'A'; If Score >= 80 Then Grade = 'B'; If Score >= 70 Then Grade = 'C'; If Score >= 60 Then Grade = 'D'; If Score < 60 Then Grade = 'F'; StudentScoreGrade Jane75D Dave56F Jack90D Sue68D

If, Then, Else If Score >= 90 Then Grade = 'A'; If Score >= 80 Then Grade = 'B'; If Score >= 70 Then Grade = 'C'; If Score >= 60 Then Grade = 'D'; If Score < 60 Then Grade = 'F'; StudentScoreGrade Jane75D Dave56F Jack90D Sue68D Common mistake: Not using ELSE IF Each subsequent IF re-evaluated every observation Common mistake: Not using ELSE IF Each subsequent IF re-evaluated every observation

If, Then, Else If Score >= 90 Then Grade = 'A'; If Score >= 80 Then Grade = 'B'; If Score >= 70 Then Grade = 'C'; If Score >= 60 Then Grade = 'D'; If Score < 60 Then Grade = 'F'; StudentScoreGrade Jack90 ABCD

If, Then, Else If Score >= 90 Then Grade = 'A'; ELSE If Score >= 80 Then Grade = 'B'; ELSE If Score >= 70 Then Grade = 'C'; ELSE If Score >= 60 Then Grade = 'D'; ELSE If Score < 60 Then Grade = 'F'; StudentScoreGrade Jack90 A

If, Then, Else If Score >= 90 Then Grade = 'A'; ELSE If Score >= 80 Then Grade = 'B'; ELSE If Score >= 70 Then Grade = 'C'; ELSE If Score >= 60 Then Grade = 'D'; ELSE If Score < 60 Then Grade = 'F'; StudentScoreGrade Jane75C Dave56F Jack90A Sue68D

Operators

Arithmetic operators ArithmeticSymbolExample Addition+ Xplus = 4+2; Subtraction– Xminus = 4-2; Multiplication* Xmult = 4*2; Division/ Xdiv = 4/2; Exponents** Xexp = 4**2; Negative numbers– Xneg = -2;

Comparison operators Logical comparisonMnemonicSymbol Equal toEQ= Not equal toNE^= or ~= Less thanLT< Less than or equal toLE<= Greater thanGT> Greater than or equal toGE>= Equal to one in a listIN Not equal to any in a listNOT IN Note: <> also used for not equal to, but only in PROC SQL

Logical operators Boolean operator And Or Not

Logical operators  POP QUIZ: If A or B and C; is the same as… If (A or B) and C; If A or (B and C);

Logical operators  POP QUIZ: If A or B and C; is the same as… If (A or B) and C; If A or (B and C);

Order of operations 1. Arithmetic operators 2. Comparison operators (, =, LIKE, etc.) 3. Logical operators a. NOT b. AND c. OR Use parentheses to control the order of operations

If, Then, Else  Logical conditions can be as complicated as you need them to be  Just make sure your order of operations is correct

Using IF and WHERE Subsetting datasets

 Can use IF or WHERE statements to only include observations you need  Both IF and WHERE statements can be used within DATA step if using SET statement to read in SAS data  IF statement must be used within DATA step if using INPUT statement to read raw data  WHERE statement must be used within PROC step

Subsetting datasets  Can use either IF or WHERE in a DATA step with SET statement  In both examples “Minors” dataset will only include observations where age is less than 18

Subsetting datasets  Think of if age lt 18; as short for if age lt 18 then output;  Can output to multiple datasets using IF/THEN logic

Subsetting IF  Use IF in the DATA step to bring in only the selected observations when using an INPUT statement  “Females” dataset will only include observations where gender = ‘F’

Subsetting IF  To improve efficiency, read the value of gender before using it to subset the data  Must use a sign  (See chapter 21, section 11)

WHERE statement  Use a WHERE statement in a PROC step to only include selected observations  “Test_data” dataset still includes all observations  Only observations where age is less than 18 will be included in the calculations and output of the procedure

OperatorDescriptionExampleMatches IS MISSINGMatches a missing value where gender is missing; A missing character value IS NULLEquivalent to IS MISSING where age is null; A missing numeric value BETWEEN AND An inclusive range where age between 20 and 40; All values between 20 and 40, including 20 and 40 CONTAINSMatches a substring where name contains 'MAC'; MACON, IMMACULATE LIKEMatching with wildcards where name like 'R_N%'; RON, RONALD, RUN, RUNNING =*Phonetic matching where name =* 'NICK'; NICK, NACK, NIKKI WHERE statement  WHERE statement allows for additional operators

WHERE statement OperatorDescriptionExampleMatches IS MISSINGMatches a missing value where gender is missing; A missing character value IS NULLEquivalent to IS MISSING where age is null; A missing numeric value BETWEEN AND An inclusive range where age between 20 and 40; All values between 20 and 40, including 20 and 40 CONTAINSMatches a substring where name contains 'MAC'; MACON, IMMACULATE LIKEMatching with wildcards where name like 'R_N%'; RON, RONALD, RUN, RUNNING =*Phonetic matching where name =* 'NICK'; NICK, NACK, NIKKI  WHERE statement allows for additional operators

Using DROP and KEEP Selecting variables

 By default, SAS will keep all variables of the input dataset  Use DROP to exclude certain variables from the output dataset  Use KEEP to include only certain variables from the output dataset

Selecting variables  Can be written as statements in a DATA step or as DROP= / KEEP= DATA step options Which method you use will affect which variables are available within the DATA step as well as processing efficiency.

Selecting variables  Can be written as statements in a DATA step or as DROP= / KEEP= DATA step options All variables (including gender) will be read to into working memory. Gender will be excluded when ‘DEMO’ dataset is written. Gender available for use in the DATA step Less processing efficiency All variables (including gender) will be read to into working memory. Gender will be excluded when ‘DEMO’ dataset is written. Gender available for use in the DATA step Less processing efficiency

Selecting variables  Can be written as statements in a DATA step or as DROP= / KEEP= DATA step options Gender will not be read to into working memory (and will be excluded when ‘DEMO’ dataset is written ) Gender not available for use in the DATA step More processing efficiency Gender will not be read to into working memory (and will be excluded when ‘DEMO’ dataset is written ) Gender not available for use in the DATA step More processing efficiency

Selecting variables  Can use DROP= and KEEP= together ID, Quiz1, Quiz2, and Quiz3 will be read into working memory (and available for use in the DATA step) Only ID and QuizGrade will be written to the “Final” dataset ID, Quiz1, Quiz2, and Quiz3 will be read into working memory (and available for use in the DATA step) Only ID and QuizGrade will be written to the “Final” dataset

Appending datasets

SET statement  The simplest method for adding observations to a SAS dataset is through the SET statement  Datasets are simply stacked on top of each other (concatenation)  Can use a BY statement to interleave datasets (if input datasets are sorted)  Duplicate observations are not overwritten

PROC Append  SET statement method works best if datasets are small and manageable  For better processing efficiency, use PROC Append  Can only append one dataset at a time  Input datasets must have same variables and attributes (otherwise use FORCE option) Master (larger) dataset Smaller dataset to be added

Appending raw data  Use FILENAME statement to append raw data during input  Input datasets must have same format (same variables and attributes)

Joins and merges

 Merging data – combining columns from two or more datasets  Can use DATA step MERGE statement or PROC SQL  Can produce inner joins and outer joins  Can merge entire datasets or subsets of datasets  Can produce one-to-one and one-to-many, but not many-to-many joins  Can produce many-to-many joins in PROC SQL

MERGE statement  Input datasets must have a common identifying variable (primary key)  Input datasets must be sorted by this key variable  Key variable must have same name and attributes  All other variables must have a unique name or they will be overwritten by last merged dataset

One-to-one joins Quiz 1Quiz 2 One-to-one Student_IDQuiz Student_IDQuiz

Student_IDQuiz1Quiz One-to-one joins Quiz 1Quiz 2 One-to-one

One-to-many joins Student demographicsCourses taken Student_IDGender 001F 002M 003M 004F Student_IDCourse_Title 001Psychology Philosophy Math Writing Psychology Spanish 301 …… One-to-many

One-to-many joins Student demographicsCourses taken Student_IDGenderCourse_Title 001FPsychology FPhilosophy FMath MWriting MPsychology MSpanish 301 ……… One-to-many

MERGE statement  Example – 3 datasets Want one dataset with all three quiz grades PersQuizSocQuizCogQuiz

MERGE statement  First sort the datasets using the PROC Sort Datasets must be sorted or the merge will not work properly

MERGE statement  Then merge in the DATA step

Inner Joins  Example – 2 datasets  Some students took quiz 1, but not quiz 2  Some students took quiz 2, but not quiz 1  Some students took both quizzes

Inner Joins Students who took quiz 2 Students who took quiz 1 Students who took both quizzes

Inner Joins  Use the IN= option Use IN= to create temporary aliases for the input datasets Use subsetting IF to only include observations in both datasets Q1 = 1 means the observation is in the Q1 (PersQuiz) dataset

Excluding Joins Students who took quiz 2 Students who took quiz 1 Students who took both quizzes

Excluding Joins  Who took Quiz 1, but not Quiz 2? Use subsetting IF to only include observations in one dataset that are not in the other Q2 = 0 means the observation is in not Q2 (SocQuiz) dataset We’ll discuss other inner and outer joins with PROC SQL

Read chapters 9, 11 & 12 and section For next week…