Published by Tracy Arnold. Modified over 8 years ago.
1
SAS® 101
Based on Learning SAS by Example: A Programmer’s Guide, Chapters 7 & 10
By Tasha Chapman, Oregon Health Authority
2
Topics covered…
  Conditional logic: If, Then, Else
  Subsetting datasets: IF and WHERE
  Selecting variables: DROP and KEEP
  Appending datasets
  Joins and merges
3
Conditional logic: If, Then, Else
4
If, Then, Else
  Syntax: IF condition THEN action; ELSE action;
  If Score >= 70 Then Grade = 'Passing Grade';
  Else Grade = 'Failing Grade';
  Student  Score  Grade
  Jane     75     Passing Grade
  Dave     56     Failing Grade
  Jack     90     Passing Grade
  Sue      68     Failing Grade
5
If, Then, Else
  Grade = 'Passing Grade'; is an assignment statement.
  Assignment statements are used to create new variables.
6
If, Then, Else
  Syntax: IF condition THEN action; ELSE IF condition THEN action; ELSE action;
  If Score >= 70 Then Grade = 'Passing Grade';
  Else If 60 <= Score <= 69 Then Grade = 'Incomplete';
  Else Grade = 'Failing Grade';
  Student  Score  Grade
  Jane     75     Passing Grade
  Dave     56     Failing Grade
  Jack     90     Passing Grade
  Sue      68     Incomplete
7
If, Then, Else
  When using ELSE IF:
  SAS checks the IF-THEN conditions in order until the first true condition is met, then skips the rest of the chain.
  Once a condition is met, the observation is not re-evaluated by the remaining ELSE branches.
8
If, Then, Else
  If Score >= 70 Then Grade = 'Passing Grade';
  Else If 60 <= Score <= 69 Then Grade = 'Incomplete';
  Else Grade = 'Failing Grade';
  Student  Score  Grade
  Jane     75     Passing Grade
  Dave     56     Failing Grade
  Jack     90     Passing Grade
  Sue      68     Incomplete
9
If, Then, Else
  If Score >= 90 Then Grade = 'A';
  If Score >= 80 Then Grade = 'B';
  If Score >= 70 Then Grade = 'C';
  If Score >= 60 Then Grade = 'D';
  If Score < 60 Then Grade = 'F';
  Student  Score  Grade
  Jane     75     D
  Dave     56     F
  Jack     90     D
  Sue      68     D
10
If, Then, Else
  Common mistake: not using ELSE IF.
  Each subsequent IF re-evaluates every observation, so a later true condition overwrites the grade assigned by an earlier one.
11
If, Then, Else
  If Score >= 90 Then Grade = 'A';
  If Score >= 80 Then Grade = 'B';
  If Score >= 70 Then Grade = 'C';
  If Score >= 60 Then Grade = 'D';
  If Score < 60 Then Grade = 'F';
  Jack (Score 90) satisfies the first four conditions, so Grade is set to A, then B, then C, then D.
12
If, Then, Else
  If Score >= 90 Then Grade = 'A';
  ELSE If Score >= 80 Then Grade = 'B';
  ELSE If Score >= 70 Then Grade = 'C';
  ELSE If Score >= 60 Then Grade = 'D';
  ELSE If Score < 60 Then Grade = 'F';
  Jack (Score 90) satisfies the first condition, so Grade is set to A and the remaining branches are skipped.
13
If, Then, Else
  With the ELSE IF chain, every student receives the intended grade:
  Student  Score  Grade
  Jane     75     C
  Dave     56     F
  Jack     90     A
  Sue      68     D
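The ELSE IF chain can be tried as a complete program. This is a minimal sketch: the dataset names `scores` and `graded` are assumptions, and the data come from the table on the slide.

```sas
/* Sample data from the slide; dataset names are illustrative */
data scores;
    input Student $ Score;
    datalines;
Jane 75
Dave 56
Jack 90
Sue 68
;
run;

data graded;
    set scores;
    /* Only the first true condition assigns Grade; later branches are skipped */
    if Score >= 90 then Grade = 'A';
    else if Score >= 80 then Grade = 'B';
    else if Score >= 70 then Grade = 'C';
    else if Score >= 60 then Grade = 'D';
    else if Score < 60 then Grade = 'F';  /* a missing Score also lands here, because missing sorts below any number */
run;

proc print data=graded noobs;
run;
```

Removing the ELSE keywords from this step should reproduce the "everyone gets D or F" result shown earlier.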
14
Operators
15
Arithmetic operators
  Arithmetic        Symbol  Example
  Addition          +       Xplus = 4+2;
  Subtraction       -       Xminus = 4-2;
  Multiplication    *       Xmult = 4*2;
  Division          /       Xdiv = 4/2;
  Exponents         **      Xexp = 4**2;
  Negative numbers  -       Xneg = -2;
16
Comparison operators
  Logical comparison           Mnemonic  Symbol
  Equal to                     EQ        =
  Not equal to                 NE        ^= or ~=
  Less than                    LT        <
  Less than or equal to        LE        <=
  Greater than                 GT        >
  Greater than or equal to     GE        >=
  Equal to one in a list       IN
  Not equal to any in a list   NOT IN
  Note: <> also means not equal to in PROC SQL and in WHERE expressions. In other DATA step expressions, <> is the MAX operator.
17
Logical operators
  Boolean operators: And, Or, Not
18
Logical operators
  POP QUIZ: If A or B and C; is the same as…
    If (A or B) and C;
    If A or (B and C);
19
Logical operators
  POP QUIZ: If A or B and C; is the same as…
    If (A or B) and C;
    If A or (B and C);
  Answer: If A or (B and C); because AND is evaluated before OR.
20
Order of operations
  1. Arithmetic operators
  2. Comparison operators (<, >, =, LIKE, etc.)
  3. Logical operators
     a. NOT
     b. AND
     c. OR
  Use parentheses to control the order of operations.
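The effect of AND being evaluated before OR can be checked in a short DATA step. This is a sketch with made-up variable names; logical expressions in SAS evaluate to 1 (true) or 0 (false).

```sas
data _null_;
    A = 1; B = 0; C = 0;
    NoParens = (A or B and C);     /* AND binds tighter than OR */
    OrFirst  = ((A or B) and C);   /* parentheses force OR first */
    AndFirst = (A or (B and C));   /* same grouping SAS uses by default */
    put NoParens= OrFirst= AndFirst=;
run;
```

With these values the log shows `NoParens=1 OrFirst=0 AndFirst=1`, so the unparenthesized form matches `A or (B and C)`.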
21
If, Then, Else
  Logical conditions can be as complicated as you need them to be.
  Just make sure your order of operations is correct.
22
Subsetting datasets: Using IF and WHERE
23
Use IF or WHERE statements to include only the observations you need.
  Both IF and WHERE can be used in a DATA step that reads SAS data with a SET statement.
  Only IF can be used in a DATA step that reads raw data with an INPUT statement.
  Only WHERE can be used in a PROC step.
24
Subsetting datasets
  Can use either IF or WHERE in a DATA step with a SET statement.
  In both examples, the “Minors” dataset will only include observations where age is less than 18.
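The two examples on this slide are not preserved in the transcript. A minimal sketch of what they might look like follows; the input dataset `test_data`, its contents, and the output names `minors_if` and `minors_where` are assumptions (the slide calls the output "Minors" in both cases).

```sas
/* Hypothetical input data */
data test_data;
    input Name $ Age;
    datalines;
Ann 15
Bob 42
Cal 17
Dee 18
;
run;

/* Subsetting IF: every observation is read, then tested */
data minors_if;
    set test_data;
    if age lt 18;
run;

/* WHERE: observations are filtered as they are read */
data minors_where;
    set test_data;
    where age lt 18;
run;
```

Both output datasets should contain the same two observations (Ann and Cal).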
25
Subsetting datasets
  Think of
    if age lt 18;
  as short for
    if age lt 18 then output;
  Can output to multiple datasets using IF/THEN logic.
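Writing to multiple datasets with IF/THEN logic could look like the sketch below. The dataset names `test_data`, `minors`, and `adults` and the sample values are assumptions for illustration.

```sas
/* Hypothetical input data */
data test_data;
    input Name $ Age;
    datalines;
Ann 15
Bob 42
Cal 17
Dee 18
;
run;

/* Name every output dataset on the DATA statement,
   then route each observation with an explicit OUTPUT */
data minors adults;
    set test_data;
    if age lt 18 then output minors;
    else output adults;
run;
```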
26
Subsetting IF
  Use IF in the DATA step to bring in only the selected observations when using an INPUT statement.
  The “Females” dataset will only include observations where gender = 'F'.
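A sketch of a subsetting IF with raw data, assuming an inline three-column layout (name, gender, age) that is not from the original slide:

```sas
data females;
    input Name $ Gender $ Age;
    if Gender = 'F';   /* observations failing the test are not written */
    datalines;
Ann F 15
Bob M 42
Dee F 18
;
run;
```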
27
Subsetting IF
  To improve efficiency, read the value of gender before using it to subset the data.
  Must use a trailing @ sign (see chapter 21, section 11).
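One way to apply the trailing @ is sketched below. It assumes a fixed-column layout (name in columns 1-3, gender in column 5, age in columns 7-8), which is an illustration rather than the layout used in the original slide.

```sas
data females;
    input @5 Gender $1. @;        /* read only Gender and hold the line */
    if Gender = 'F';              /* stop here for other records */
    input @1 Name $3. @7 Age 2.;  /* read the rest of the held line */
    datalines;
Ann F 15
Bob M 42
Dee F 18
;
run;
```

For records that fail the test, the second INPUT statement never runs, so the remaining fields are not read.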
28
WHERE statement
  Use a WHERE statement in a PROC step to include only selected observations.
  The “Test_data” dataset still includes all observations.
  Only observations where age is less than 18 will be included in the calculations and output of the procedure.
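A sketch of WHERE inside a procedure follows. PROC MEANS is chosen here as an example procedure, and the contents of `test_data` are made up.

```sas
/* Hypothetical input data */
data test_data;
    input Name $ Age;
    datalines;
Ann 15
Bob 42
Cal 17
Dee 18
;
run;

/* Only observations with age < 18 enter the calculations;
   test_data itself is left unchanged */
proc means data=test_data;
    where age lt 18;
    var age;
run;
```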
29
WHERE statement
  The WHERE statement allows for additional operators:
  Operator     Description               Example                       Matches
  IS MISSING   Matches a missing value   where gender is missing;      A missing character value
  IS NULL      Equivalent to IS MISSING  where age is null;            A missing numeric value
  BETWEEN AND  An inclusive range        where age between 20 and 40;  All values between 20 and 40, including 20 and 40
  CONTAINS     Matches a substring       where name contains 'MAC';    MACON, IMMACULATE
  LIKE         Matching with wildcards   where name like 'R_N%';       RON, RONALD, RUN, RUNNING
  =*           Phonetic matching         where name =* 'NICK';         NICK, NACK, NIKKI
31
Selecting variables: Using DROP and KEEP
32
By default, SAS will keep all variables of the input dataset.
  Use DROP to exclude certain variables from the output dataset.
  Use KEEP to include only certain variables in the output dataset.
33
Selecting variables
  Can be written as DROP / KEEP statements in a DATA step or as DROP= / KEEP= dataset options.
  Which method you use affects which variables are available within the DATA step, as well as processing efficiency.
34
Selecting variables
  Can be written as DROP / KEEP statements in a DATA step or as DROP= / KEEP= dataset options.
  All variables (including gender) will be read into working memory.
  Gender will be excluded when the ‘DEMO’ dataset is written.
  Gender is available for use in the DATA step.
  Less processing efficiency.
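The code for this slide is not preserved. The behavior described matches a DROP statement (or DROP= on the output dataset), so a sketch under that assumption follows; `test_data` and the derived variable `Female` are hypothetical.

```sas
/* Hypothetical input data */
data test_data;
    input ID Gender $ Age;
    datalines;
1 F 15
2 M 42
;
run;

data demo;
    set test_data;
    drop gender;                      /* takes effect only when DEMO is written */
    if gender = 'F' then Female = 1;  /* gender is still usable inside the step */
    else Female = 0;
run;
```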
35
Selecting variables
  Can be written as DROP / KEEP statements in a DATA step or as DROP= / KEEP= dataset options.
  Gender will not be read into working memory (and will be excluded when the ‘DEMO’ dataset is written).
  Gender is not available for use in the DATA step.
  More processing efficiency.
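The behavior described here matches a DROP= option on the input dataset. The sketch below assumes that reading; `test_data` is a hypothetical input.

```sas
/* Hypothetical input data */
data test_data;
    input ID Gender $ Age;
    datalines;
1 F 15
2 M 42
;
run;

data demo;
    set test_data(drop=gender);  /* gender is never read into the step */
    AgeNextYear = Age + 1;       /* only ID and Age are available here */
run;
```

Referring to `gender` inside this step would not raise an error; SAS would create a new variable with missing values, which is an easy mistake to overlook.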
36
Selecting variables
  Can use DROP= and KEEP= together.
  ID, Quiz1, Quiz2, and Quiz3 will be read into working memory (and available for use in the DATA step).
  Only ID and QuizGrade will be written to the “Final” dataset.
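A sketch combining the two options is shown below. The input dataset `quizzes`, its extra `Age` column, and the use of MEAN to compute `QuizGrade` are assumptions; the slide only states which variables are read and written.

```sas
/* Hypothetical input data */
data quizzes;
    input ID Quiz1 Quiz2 Quiz3 Age;
    datalines;
1 80 90 100 15
2 70 75 80 42
;
run;

data final(drop=Quiz1 Quiz2 Quiz3);          /* output: ID and QuizGrade only */
    set quizzes(keep=ID Quiz1 Quiz2 Quiz3);  /* input: Age is never read */
    QuizGrade = mean(Quiz1, Quiz2, Quiz3);
run;
```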
37
Appending datasets
38
SET statement
  The simplest method for adding observations to a SAS dataset is through the SET statement.
  Datasets are simply stacked on top of each other (concatenation).
  Can use a BY statement to interleave datasets (if input datasets are sorted).
  Duplicate observations are not overwritten.
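Concatenation and interleaving can be compared side by side. This is a sketch with two small hypothetical datasets, `class_a` and `class_b`, each already sorted by `Student_ID`.

```sas
data class_a;
    input Student_ID $ Score;
    datalines;
001 85
003 95
;
run;

data class_b;
    input Student_ID $ Score;
    datalines;
002 86
004 97
;
run;

/* Concatenate: all of class_a, then all of class_b */
data stacked;
    set class_a class_b;
run;

/* Interleave: observations are combined in Student_ID order */
data interleaved;
    set class_a class_b;
    by Student_ID;
run;
```

`stacked` holds the IDs in the order 001, 003, 002, 004, while `interleaved` holds them as 001, 002, 003, 004.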
39
PROC Append
  The SET statement method works best if datasets are small and manageable.
  For better processing efficiency, use PROC APPEND.
  Can only append one dataset at a time.
  Input datasets must have the same variables and attributes (otherwise use the FORCE option).
  BASE= names the master (larger) dataset; DATA= names the smaller dataset to be added.
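A minimal PROC APPEND sketch follows; the dataset names `master` and `new_rows` and their contents are hypothetical.

```sas
data master;
    input Student_ID $ Score;
    datalines;
001 85
002 86
003 95
;
run;

data new_rows;
    input Student_ID $ Score;
    datalines;
004 97
;
run;

/* Adds the observations of new_rows to the end of master
   without re-reading master */
proc append base=master data=new_rows;
run;
```

If the two datasets differ in variables or attributes, adding the FORCE option to the PROC APPEND statement asks SAS to append anyway.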
40
Appending raw data
  Use a FILENAME statement to append raw data files during input.
  Input files must have the same layout (same variables and attributes).
41
Joins and merges
42
Merging data: combining columns from two or more datasets.
  Can use the DATA step MERGE statement or PROC SQL.
  Can produce inner joins and outer joins.
  Can merge entire datasets or subsets of datasets.
  The MERGE statement can produce one-to-one and one-to-many joins, but not many-to-many joins.
  PROC SQL can produce many-to-many joins.
43
MERGE statement
  Input datasets must have a common identifying variable (primary key).
  Input datasets must be sorted by this key variable.
  The key variable must have the same name and attributes in every dataset.
  All other variables must have a unique name, or they will be overwritten by the last merged dataset.
44
One-to-one joins
  Quiz 1:
  Student_ID  Quiz1
  001         85
  002         86
  003         95
  004         97
  Quiz 2:
  Student_ID  Quiz2
  001         96
  002         88
  003         85
  004         94
45
One-to-one joins
  Quiz 1 merged with Quiz 2:
  Student_ID  Quiz1  Quiz2
  001         85     96
  002         86     88
  003         95     85
  004         97     94
46
One-to-many joins
  Student demographics:
  Student_ID  Gender
  001         F
  002         M
  003         M
  004         F
  Courses taken:
  Student_ID  Course_Title
  001         Psychology 101
  001         Philosophy 105
  001         Math 212
  002         Writing 222
  002         Psychology 101
  002         Spanish 301
  …           …
47
One-to-many joins
  Student demographics merged with courses taken:
  Student_ID  Gender  Course_Title
  001         F       Psychology 101
  001         F       Philosophy 105
  001         F       Math 212
  002         M       Writing 222
  002         M       Psychology 101
  002         M       Spanish 301
  …           …       …
48
MERGE statement
  Example: 3 datasets (PersQuiz, SocQuiz, CogQuiz).
  Want one dataset with all three quiz grades.
49
MERGE statement
  First sort the datasets using PROC SORT.
  Datasets must be sorted by the key variable or the merge will not work properly.
50
MERGE statement Then merge in the DATA step
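The sort and merge steps for the three quiz datasets might look like the sketch below. The dataset names come from the slides; the variable names (`Student_ID`, `PersScore`, `SocScore`, `CogScore`), the sample values, and the output name `AllQuiz` are assumptions.

```sas
/* Hypothetical quiz data, deliberately unsorted */
data PersQuiz;
    input Student_ID $ PersScore;
    datalines;
002 86
001 85
;
run;

data SocQuiz;
    input Student_ID $ SocScore;
    datalines;
001 96
002 88
;
run;

data CogQuiz;
    input Student_ID $ CogScore;
    datalines;
002 91
001 79
;
run;

/* Step 1: sort each input by the key variable */
proc sort data=PersQuiz; by Student_ID; run;
proc sort data=SocQuiz;  by Student_ID; run;
proc sort data=CogQuiz;  by Student_ID; run;

/* Step 2: merge on the key; each score variable has a unique name */
data AllQuiz;
    merge PersQuiz SocQuiz CogQuiz;
    by Student_ID;
run;
```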
51
Inner Joins
  Example: 2 datasets.
  Some students took quiz 1, but not quiz 2.
  Some students took quiz 2, but not quiz 1.
  Some students took both quizzes.
52
Inner Joins
  [Venn diagram: students who took quiz 1, students who took quiz 2; the overlap is students who took both quizzes]
53
Inner Joins
  Use the IN= option.
  Use IN= to create temporary aliases for the input datasets.
  Use a subsetting IF to only include observations that are in both datasets.
  Q1 = 1 means the observation is in the Q1 (PersQuiz) dataset.
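An inner join with IN= could be sketched as follows. The aliases Q1 and Q2 and the dataset names PersQuiz and SocQuiz follow the slides; the variable names, sample values, and the output name `BothQuizzes` are assumptions.

```sas
/* Hypothetical data, already sorted by Student_ID */
data PersQuiz;
    input Student_ID $ Quiz1;
    datalines;
001 85
002 86
003 95
;
run;

data SocQuiz;
    input Student_ID $ Quiz2;
    datalines;
002 88
003 85
004 94
;
run;

data BothQuizzes;
    merge PersQuiz(in=Q1) SocQuiz(in=Q2);
    by Student_ID;
    if Q1 = 1 and Q2 = 1;   /* keep only students found in both inputs */
run;
```

With this sample data, only students 002 and 003 are kept. Q1 and Q2 are temporary and are not written to the output dataset.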
54
Excluding Joins
  [Venn diagram: students who took quiz 1, students who took quiz 2; the overlap is students who took both quizzes]
55
Excluding Joins
  Who took Quiz 1, but not Quiz 2?
  Use a subsetting IF to only include observations in one dataset that are not in the other.
  Q2 = 0 means the observation is not in the Q2 (SocQuiz) dataset.
  We’ll discuss other inner and outer joins with PROC SQL.
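The excluding join differs from the inner join only in the subsetting IF. In this sketch the aliases and input dataset names follow the slides, while the variables, sample values, and the output name `Quiz1Only` are assumptions.

```sas
/* Hypothetical data, already sorted by Student_ID */
data PersQuiz;
    input Student_ID $ Quiz1;
    datalines;
001 85
002 86
003 95
;
run;

data SocQuiz;
    input Student_ID $ Quiz2;
    datalines;
002 88
003 85
004 94
;
run;

data Quiz1Only;
    merge PersQuiz(in=Q1) SocQuiz(in=Q2);
    by Student_ID;
    if Q1 = 1 and Q2 = 0;   /* in PersQuiz but not in SocQuiz */
run;
```

With this sample data, only student 001 is kept, and Quiz2 is missing for that observation.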
56
For next week…
  Read chapters 9, 11 & 12 and section 10.13.