Lesson 4 - Topics: Creating new variables in the data step; SAS functions


Creating New Variables
Direct assignments (formulas):
    c = a + b ;
    d = 2*a + 3*b + 7*c ;
    bmi = weight/(height*height);
Indirect assignments (if/then/else):
    if age < 50 then young = 1;
    else young = 2;
    if income < 15 then tax = 1;
    else if income < 25 then tax = 2;
    else if income >= 25 then tax = 3;

Direct Assignments (Formulas)
Example:
    c = a + b ;
So if a = 2 and b = 3, then c = 5.
What if a is missing? What is c? c will be missing.
What if b is missing?
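The questions on this slide can be checked with a small test step. The following is only a sketch on made-up values: the dataset name demo and the variable s are assumptions for illustration and are not part of the course programs. It also shows the SUM function, which ignores missing arguments, as a contrast to the + operator.

```sas
/* Hypothetical toy data: a or b may be missing (.) */
DATA demo;
  INPUT a b;
  c = a + b;       /* missing whenever a or b is missing */
  s = SUM(a, b);   /* SUM adds only the non-missing arguments */
  DATALINES;
2 3
. 3
2 .
;
RUN;

PROC PRINT DATA=demo;
RUN;
```

On these three rows c should be 5, missing, and missing, while s should be 5, 3, and 2. SAS also writes a note to the log when missing values are generated by an operation on missing values.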

If/then/else Statements
With if-then-else definitions, SAS stops executing after the first true condition.
    if income < 15 then tax = 1;
    else if income < 25 then tax = 2;
    else if income >= 25 then tax = 3;
What if income is 10? tax = 1
What if income is 23? tax = 2
What if income is 30? tax = 3
What if income is missing? tax = 1 (a missing value compares as less than any number)
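A quick way to confirm these four answers is to run the statements on the four incomes asked about. This is a sketch only: the dataset taxdemo and the guarded variable tax2 are hypothetical names added for illustration.

```sas
/* Hypothetical toy data: four income values, the last one missing */
DATA taxdemo;
  INPUT income;
  /* Unguarded version from the slide */
  if income < 15 then tax = 1;
  else if income < 25 then tax = 2;
  else if income >= 25 then tax = 3;
  /* Guarded version: test for missing first */
  if income = . then tax2 = .;
  else if income < 15 then tax2 = 1;
  else if income < 25 then tax2 = 2;
  else tax2 = 3;
  DATALINES;
10
23
30
.
;
RUN;

PROC PRINT DATA=taxdemo;
RUN;
```

For the missing income, tax should come out as 1 while tax2 stays missing, because the missing check is evaluated before any of the comparisons.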

Creating New Variables
Create a new variable with 2 levels, one for college graduates and one for non-college graduates.

Program 5
    DATA tdata;
    INFILE 'C:\SAS_Files\tomhs.data' ;
    1 ptid 49 educ sbp12 3. ;
    * This way will code missing values to the value 2;
    if educ < 7 then grad1 = 2 ;
    else if educ >= 7 then grad1 = 1 ;
    * The next two ways are equivalent and are correct;
    if educ < 7 and educ ne . then grad2 = 2;
    else if educ >= 7 then grad2 = 1;
    * IN is a useful operator in SAS ;
    if educ IN(1,2,3,4,5,6) then grad3 = 2;
    else if educ IN(7,8,9) then grad3 = 1;
New variable definitions go after the INPUT statement.

    PROC FREQ DATA=tdata;
    TABLES educ grad1 grad2 grad3 ;
[Output: one-way frequency tables for educ, grad1, grad2, and grad3, each with columns Frequency, Percent, Cumulative Frequency, and Cumulative Percent. The tables for educ, grad2, and grad3 report Frequency Missing = 1; the table for grad1 reports no missing values.]
grad1 coded the missing value for educ to 2.

    PROC FREQ DATA=tdata;
    TABLES educ*grad1 /MISSING NOCUM NOPERCENT NOROW NOCOL;
    TITLE 'Use Crosstabulation to Verify Recoding';
    RUN;
Table of educ by grad1 (cell entries are frequencies)
    educ     grad1=1   grad1=2   Total
    .              0         1       1
    1              0         3       3
    3              0         4       4
    4              0        23      23
    5              0        14      14
    6              0        12      12
    7             16         0      16
    8             10         0      10
    9             17         0      17
    Total
This shows that the missing value for educ was assigned a value of 2.

    * Recode sbp12 into 3 levels;
    if sbp12 = . then sbp12c = . ;
    else if sbp12 < 120 then sbp12c = 1 ;
    else if sbp12 < 140 then sbp12c = 2 ;
    else if sbp12 >= 140 then sbp12c = 3 ;
With if-then-else definitions, SAS stops executing after the first true condition.
Values < 120 will be assigned a value of 1
Values from 120 to < 140 will be assigned a value of 2
Values >= 140 will be assigned a value of 3
Missing values will be assigned to missing

    PROC FREQ DATA=tdata;
    TABLES sbp12c sbp12;
    RUN;
[Output: one-way frequency tables for sbp12c and sbp12, each with columns Frequency, Percent, Cumulative Frequency, and Cumulative Percent. Both tables report Frequency Missing = 8; the sbp12 table continues with more values.]

    * Easy but costly error to make;
    if sbp12 = . then sbp12c = . ;
    else if sbp12 < 120 then sbp12c = 1 ;
    else if sbp12 < 140 then sbp12 = 2 ;
    else if sbp12 >= 140 then sbp12c = 3 ;
    PROC FREQ DATA=tdata;
    TABLES sbp12c;
    RUN;
[Output: The FREQ Procedure, one-way frequency table for sbp12c with columns Frequency, Percent, Cumulative Frequency, and Cumulative Percent. Frequency Missing = 51.]
How come no values of 2 and why so many missing?

Important Facts When Creating New Variables
1. New variables are initialized to missing.
2. Missing values are < any value:
    if var < value (true if var is missing)
3. Reference missing values for numeric variables as . (a period):
    if sbp = . then ... (or if missing(sbp))
4. Reference missing values for character variables as ' ':
    if clinic = ' ' then ...
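The four facts can be seen together in one short step. This is an illustrative sketch on invented values: the dataset missdemo and the variables sbp_is_missing, clinic_is_missing, below120, and high are assumed names, not part of the course programs.

```sas
/* Hypothetical toy data: one missing sbp and one missing clinic */
DATA missdemo;
  INPUT sbp clinic $;
  sbp_is_missing    = MISSING(sbp);     /* 1 if sbp is missing, else 0 */
  clinic_is_missing = (clinic = ' ');   /* 1 if clinic is blank, else 0 */
  below120 = (sbp < 120);               /* also 1 when sbp is missing */
  if sbp >= 140 then high = 1;          /* high stays missing otherwise */
  DATALINES;
118 A
. B
142 .
;
RUN;

PROC PRINT DATA=missdemo;
RUN;
```

With list input, a lone period is read as a missing value for both numeric and character variables. On the second row below120 should be 1 even though sbp is missing, and high should be non-missing only on the third row.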

SAS Handling of Missing Data When Creating New Variables
Direct assignments (formulas):
    c = a + b ;
    d = 2*a + 3*b + 7*c ;
    bmi = weight/(height*height);
If any variable on the right-hand side is missing, then the new variable will be missing.
Indirect assignments:
    if age < 50 then young = 1;
    else young = 2;
New variables are initialized to missing but may be given a value if any of the IF conditions are true.

What Value to Set New Variable
    if age < 20 then teenager = 1;
    else if age >= 20 then teenager = 2;
or
    if age < 20 then teenager = 1;
    else if age >= 20 then teenager = 0;
or
    if age < 20 then teenager = 'YES';
    else if age >= 20 then teenager = 'NO';

    * Program 6 SAS Functions ;
    DATA example;
    INFILE 'C:\SAS_Files\tomhs.data' ;
    height weight ursod (se1-se10) ( );
    bmi = (weight* )/(height*height);
    rbmi1 = ROUND(bmi,1);
    lursod = LOG(ursod);
    seavg = MEAN (OF se1-se10);
    semax = MAX (OF se1-se10);
    semin = MIN (OF se1-se10);

    * Use of dash notation ;
    seavg = MEAN (OF se1-se10);
This is the same as
    seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10);
The OF is very important. Otherwise SAS thinks you are subtracting se10 from se1.
To use this notation the ROOT of the name must be the same.
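A three-variable version makes the role of OF easy to check by hand. This is a sketch with assumed names (dashdemo, with_of, with_list, diff) and a single invented row of scores.

```sas
/* Hypothetical toy data: three side-effect scores */
DATA dashdemo;
  INPUT se1-se3;
  with_of   = MEAN(OF se1-se3);     /* variable list se1, se2, se3 */
  with_list = MEAN(se1, se2, se3);  /* same list written out */
  diff      = se1 - se3;            /* without OF the dash is a subtraction */
  DATALINES;
1 2 4
;
RUN;

PROC PRINT DATA=dashdemo;
RUN;
```

Here with_of and with_list should both equal 7/3 (about 2.33), while diff is -3, which is the value SAS would work with if OF were left out of the function call.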

    * Two ways of computing average ;
    seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10);
versus
    seavg = (se1+se2+se3+se4+se5+se6+se7+se8+se9+se10)/10;
Using the MEAN function computes the average of the non-missing values. The result is missing only if all values are missing.
Using the + formula requires all values to be non-missing; otherwise the result will be missing.
    if N(of se1-se10) > 5 then seavg = MEAN(of se1-se10);
What does this statement do?
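One way to explore the closing question is to run both formulas and the N-guarded statement on rows with different numbers of missing scores. The sketch below uses invented data, and the names ndemo, nvalid, avg_fn, and avg_sum are assumptions for illustration.

```sas
/* Hypothetical toy data: 10, 8, and 2 non-missing scores per row */
DATA ndemo;
  INPUT se1-se10;
  nvalid  = N(OF se1-se10);       /* count of non-missing scores */
  avg_fn  = MEAN(OF se1-se10);    /* average of non-missing scores */
  avg_sum = (se1+se2+se3+se4+se5+se6+se7+se8+se9+se10)/10;
  /* Only compute the average when more than 5 scores are present */
  if N(OF se1-se10) > 5 then seavg = MEAN(OF se1-se10);
  DATALINES;
1 2 1 1 3 2 1 1 2 1
1 2 . 1 3 . 1 1 2 1
1 . . . . . . 2 . .
;
RUN;

PROC PRINT DATA=ndemo;
RUN;
```

avg_fn should be 1.5 on every row, avg_sum should be non-missing only on the first row, and seavg should be missing on the third row because only 2 of the 10 scores are present. In other words, the N check sets a minimum number of answered items before an average is reported.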

    PROC PRINT DATA = example (OBS=15);
    VAR bmi rbmi1 rbmi2 seavg semin semax ;
    TITLE 'Listing of Selected Data for 15 Patients ';
    RUN;
    PROC FREQ DATA = example;
    TABLES semax;
    TITLE 'Distribution of Worse Side Effect Value';
    TITLE2 'Side Effect Scores Range from 1 to 4';
    RUN;
    ods graphics on;
    PROC UNIVARIATE DATA = example ;
    VAR ursod lursod;
    QQPLOT ursod lursod;
    TITLE 'Quantile Plots for Urine Sodium Data';
    RUN;

Listing of Selected Data for 10 Patients
[Output: PROC PRINT listing with columns Obs, bmi, rbmi1, seavg, semin, semax]

Distribution of Worse Side Effect Value
Side Effect Scores Ranges from 1 to 4
[Output: The FREQ Procedure, one-way frequency table for semax with columns Frequency, Percent, Cumulative Frequency, and Cumulative Percent]
patients had at least 1 severe side effect

The log-transformed values show a more linear pattern.