Introduction to SAS Lecture 2 Brian Healy.

Why use statistical packages
Built-in functions
Data manipulation
Updated often to include new applications
Different packages complete certain tasks more easily than others
Packages we will introduce: SAS, R (S-plus)

SAS
Easy to input and output data sets
Preferred for data manipulation
"proc" used to complete analyses with built-in functions
Macros used to build your own functions

Outline
SAS Structure
Efficient SAS Code for Large Files
SAS Macro Facility

Common errors
Missing semicolon
Misspelling
Unmatched quotes/comments
Mixed proc and data statements
Using wrong options

SAS Structure
Data Step: input, create, manipulate or output data
Always starts with a data line. Ex. data one;
Procedure Step: complete an operation on data
Always starts with a proc line. Ex. proc contents;

SAS System Options
System options are global instructions that affect the entire SAS session and control the way SAS performs operations. SAS system options differ from SAS data set options and statement options in that once you invoke a system option, it remains in effect for all subsequent data and proc steps in a SAS job, unless you respecify it.
To view which options are available and in effect for your SAS session, use
    proc options;
    run;

Log, output and procedure options
center: controls whether SAS procedure output is centered. By default, output is centered. To specify not centered, use nocenter.
date: prints the date and time to the log and output window. By default, the date and time are printed. To suppress the printing of the date, use nodate.
label: allows SAS procedures to use labels with variables. By default, labels are permitted. To suppress the printing of labels, use nolabel.
notes: controls whether notes are printed to the SAS log. By default, notes are printed. To suppress the printing of notes, use nonotes.
number: controls whether page numbers are printed. By default, page numbers are printed. To suppress the printing of page numbers, use nonumber.
linesize=: specifies the line size (printer line width) for the SAS log and the SAS procedure output file used by the data step and procedures.
pagesize=: specifies the number of lines that can be printed per page of SAS output.
missing=: specifies the character to be printed for missing numeric values.
formchar=: specifies the list of graphics characters that define table boundaries.
Example:
    OPTIONS NOCENTER NODATE NONOTES LINESIZE=80 MISSING=. ;

SAS data set control options
SAS data set control options specify how SAS data sets are input, processed, and output.
firstobs=: causes SAS to begin reading at a specified observation in a data set. The default is firstobs=1.
obs=: specifies the last observation from a data set or the last record from a raw data file that SAS is to read. To return to using all observations in a data set, use obs=max.
replace: specifies whether permanently stored SAS data sets are to be replaced. By default, the SAS system will overwrite an existing SAS data set if the data set is re-specified in a data step. To suppress this option, use noreplace.
Example:
    OPTIONS OBS=100 NOREPLACE;
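A minimal sketch of how firstobs= and obs= interact, assuming a data set named treat already exists in the WORK library (it is created later in these notes). Because system options stay in effect for the rest of the session, the last statement resets them.

```sas
/* Assumption: WORK.TREAT already exists. */
options firstobs=5 obs=10;    /* read observations 5 through 10 only */

proc print data=treat;
run;

options firstobs=1 obs=max;   /* restore the defaults: read every observation */
```

Note that obs= gives the number of the last observation to read, not a count, so firstobs=5 with obs=10 prints six observations.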

Error handling options
Error handling options specify how the SAS System reports on and recovers from error conditions.
errors=: controls the maximum number of observations for which complete error messages are printed. The default is errors=20.
fmterr: controls whether the SAS System generates an error message when the system cannot find a format to associate with a variable. SAS will generate an ERROR message for every unknown format it encounters and will terminate the SAS job without running any following data and proc steps. To read a SAS system data set without requiring a SAS format library, use nofmterr.
Example:
    OPTIONS ERRORS=100 NOFMTERR;

Statements for Reading Data
The data statement names the data set you are making.
You can use any of the following statements to input data:
infile: identifies an external raw data file to read with an INPUT statement
input: lists variable names in the input file
cards: indicates that the data lines follow in the program (internal data)
set: reads a SAS data set

Looking at the data
To look at the variables in a data set, use
    proc contents data=dataset;
    run;
To look at the actual data in the data set, use
    proc print data=dataset (obs=num);
      var varlist;
    run;

Example
    data treat;
      infile "g:\shared\BIO271\treat.dat";
      input id bpa bpb chola cholb;
    run;
    proc print data=treat (obs=10);
    run;
    proc contents data=treat;
    run;

Delimiter Option
The default delimiter is a blank space.
The DELIMITER= option specifies that the INPUT statement use a character other than a blank as the delimiter for data values that are read with list input.

Delimiter Example
Sometimes you want to input the data yourself. Try the following data step:
    data nums;
      infile datalines dsd delimiter='&';
      input X Y Z;
      datalines;
    1&2&3
    4&5&6
    7&8&9
    ;
Notice that there are no semicolons until the end of the datalines.
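The DSD option in the example above matters when a value is missing. A small sketch, using a hypothetical data set name nums2: with DSD, two consecutive delimiters are read as a missing value rather than being treated as a single delimiter.

```sas
/* nums2 is a hypothetical name used only for this illustration. */
data nums2;
  infile datalines dsd delimiter='&';
  input x y z;
  datalines;
1&&3
4&5&6
;
run;

/* In the first observation, y is read as missing because of the "&&". */
proc print data=nums2;
run;
```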

Cards
Another way to input data using the keyboard (and often a last resort if you are having problems inputting the data) is cards. It is similar to datalines.
    data score;
      input test1 test2 test3;
      cards;
    91 87 95
    97 . 92
    . 89 99
    ;
    run;

Inputting character variables
Sometimes your data will have characters. Example:
    data fam;
      input name $ age;
      cards;
    Brian 27
    Andrew 29
    Kate 24
    ;
    run;
    proc print data=fam;
    run;
What is different, and what happens if you don't have the dollar sign?

Using the libname command
The final way we will show to input data applies when you already have a SAS data set: you can use a libname statement.
    libname summer "g:\shared\bio271";
    data treat2;
      set summer.treat2;
    run;
Look at the data set with proc print.

Labeling variables
Variable label: Use the label statement in the data step to assign labels to the variables. You could also assign labels to variables in proc steps, but then the labels only exist for that step. When labels are assigned in the data step, they are available for all procedures that use that data set.
Example:
    DATA labtreat;
      SET treat;
      LABEL id="patient id"
            bpa="BP on treatment A"
            bpb="BP on treatment B"
            cholA="Cholesterol on treatment A"
            cholB="Cholesterol on treatment B";
    RUN;
    PROC CONTENTS DATA=labtreat;
    RUN;
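For contrast, here is a sketch of a label assigned inside a proc step, assuming the treat data set from the earlier example. The labels apply to this one PROC PRINT only and are not stored with the data set; PROC PRINT shows labels only when the LABEL option is given.

```sas
/* Assumption: WORK.TREAT exists with variables bpa and bpb. */
proc print data=treat label;
  label bpa = "BP on treatment A"
        bpb = "BP on treatment B";
run;
```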

Try on your own
Make a data set called redsox with the following data:
    8, 58, 491, 163
    7, 50, 469, 133
    31, 107, 458, 136
    33, 111, 410, 117
Label the variables HR, RBI, AB, HITS.
Use proc print to ensure that you have input the data correctly.

Data Manipulations
One of the best parts of SAS is the ability to complete data manipulations.
There are four major types of manipulations:
1. Subset the data (drop / keep variables, drop observations)
2. Concatenate data files
3. Merge data files
4. Create new variables

Drop / Keep
SAS easily allows you to make a data set with a subset of the variables.
What do you think happens with this code?
    DATA redsox2;
      SET redsox;
      KEEP ba rbi;
    RUN;
How do you think you could use drop to do the same thing?

Dropping observations
We can also get a subset of the observations. This is helpful when we want to remove missing data.
Read in treat2 from the g: drive, then run:
    DATA notreat2;
      SET treat2;
      IF cholA ^= . ;
    RUN;

Concatenating data files in SAS
SAS allows us to combine data sets by adding more observations, using
    data tottreat;
      set treat treat2;
    run;
Check that it worked using proc print.
If a variable is called by a different name in each data set, you must use:
    data momdad;
      set dads(RENAME=(dadinc=inc)) moms(RENAME=(mominc=inc));
    run;

Merge data files
SAS also allows us to add more variables by merging data files.
The data set demo gives demographic information about the patients in treat. Read in demo.
Now, use this code to combine the information:
    data extratreat;
      merge treat demo;
      by id;
    run;
Note: each data set must already be sorted by id to use this code.
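A sketch of the full sequence, assuming treat and demo both contain the variable id. The sort steps satisfy the sorting requirement, and the IN= flags (intreat and indemo are hypothetical names chosen here) record which data set contributed to each merged observation. The subsetting IF is an optional choice, not part of the original example.

```sas
/* Sort both inputs by the merge key first. */
proc sort data=treat;
  by id;
run;

proc sort data=demo;
  by id;
run;

data extratreat;
  merge treat (in=intreat) demo (in=indemo);
  by id;
  /* Optional: keep only patients that appear in both data sets. */
  if intreat and indemo;
run;
```

Dropping the subsetting IF keeps every id from either data set, with missing values for the variables that the other data set would have supplied.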

Making new variables
We can make new variables in a data step.
Let's make new variables in the redsox data set: batting average (ba) and an indicator for 30 or more home runs (hr30).
    data redsox2;
      set redsox;
      ba=hits/ab;
      if hr>=30 then hr30=1;
      else hr30=0;
    run;

Try on your own
Make a new data set called redsox3 using the following data and combine it with redsox:
    7, 51, 378, 113
    4, 41, 367, 99
    20, 58, 361, 109
Make a new variable in redsox3 that equals 1 if rbi is more than 100 and 0 if rbi is less than or equal to 100.

Statements for Outputting Data
file: specifies the current output file for PUT statements.
put: writes lines to the SAS log, to the SAS procedure output file, or to an external file that is specified in the most recent FILE statement.
Example:
    data _null_;
      set redsox;
      file 'p:\redsox.csv' delimiter=',' dsd;
      put hr rbi ab hits;
    run;

Comparisons
The INFILE statement specifies the input file for any INPUT statements in the DATA step. The FILE statement specifies the output file for any PUT statements in the DATA step. Both the FILE and INFILE statements allow you to use options that provide SAS with additional information about the external file being used.
An INFILE statement usually identifies data from an external file. A DATALINES statement indicates that data follow in the job stream. You can use the INFILE statement with the file specification DATALINES to take advantage of certain data-reading options that affect how the INPUT statement reads in-stream data.

Missing values
Missing numeric values in SAS are shown by a period (.).
As a general rule, SAS procedures that perform computations handle missing data by omitting the missing values. This includes proc means, proc freq, proc corr, and proc reg.
Check the SAS web page for more information.

Missing values in logical statements
SAS treats a missing value as the smallest possible value (e.g., negative infinity) in logical statements.
    data times6;
      set times;
      if (var1 <= 1.5) then varc1 = 0;
      else varc1 = 1;
    run;
Output:
    Obs  id  var1  varc1
    1    1   1.5   0
    2        .     0
    3        2.1   1
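Because a missing var1 satisfies var1 <= 1.5, the recode above silently puts missing values in the 0 group. One way to avoid that is to test for missing first, sketched here with illustrative input values modeled on the output above (the id values are assumptions).

```sas
/* Illustrative input; the id values are assumed for this sketch. */
data times;
  input id var1;
  datalines;
1 1.5
2 .
3 2.1
;
run;

data times6;
  set times;
  if missing(var1) then varc1 = .;     /* keep missing as missing */
  else if var1 <= 1.5 then varc1 = 0;
  else varc1 = 1;
run;

proc print data=times6;
run;
```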

Basic procs
proc print and proc contents (we have seen these)
proc sort
proc means
proc univariate
proc plot

Options in most procs
var: lists the variables you want to perform the proc on
by: breaks the data into groups
where: limits the data set to a specific group of observations
output: allows you to output the results into a data set

Sort data
We can use proc sort to sort data. The code to complete this is
    proc sort data=extratreat;
      by gender;
    run;
    proc sort data=extratreat out=extreat;
      by gender;
    run;
    proc sort data=extratreat out=extreat2;
      by descending gender;
    run;
    proc sort data=extratreat out=extreat3 noduplicates;
      by gender;
    run;

proc means / univariate
The basic form of proc means is
    proc means data=extratreat;
      var ______;
      by _______;
      where _______;
      output out=stat mean=bpamean cholamean;
    run;
The basic form of proc univariate is the same, but much more information is given.
It is helpful to use the output window to get the info you need.
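One way the blanks in the template might be filled in, assuming extratreat contains bpa, chola and gender as in the earlier examples. The names bygender and stat are data set names chosen for this sketch. A BY statement needs data sorted by the BY variable, so the sort comes first.

```sas
/* Sort a copy of the data by the grouping variable. */
proc sort data=extratreat out=bygender;
  by gender;
run;

proc means data=bygender;
  var bpa chola;                 /* variables to summarize */
  by gender;                     /* one set of statistics per gender */
  where bpa ne .;                /* only observations with bpa present */
  output out=stat mean=bpamean cholamean;   /* save the means */
run;

proc print data=stat;
run;
```

The names after mean= are matched to the VAR variables in order, so bpamean holds the mean of bpa and cholamean the mean of chola.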

proc plot
To make different plots in SAS, you use proc plot.
Scatterplot:
    proc plot data=redsox;
      plot rbi*ab;
    run;
You can also make plots using proc univariate with the plot option:
    proc univariate data=redsox plot;
      var rbi;
    run;

Try on your own
Find the mean blood pressure on treatment A in women.
Make a scatterplot of blood pressure on treatment B versus blood pressure on treatment A in men.
Find the median number of home runs hit by the Red Sox.

SAS Macro
Macros are the SAS method of making functions.
Avoid repetitious SAS code
Create generalizable and flexible SAS code
Pass information from one part of a SAS job to another
Conditionally execute data steps and PROCs

SAS Macro Facility
SAS macro variable
SAS macro
There are many discussions of macro variables on the web; one good one is given here: http://www2.sas.com/proceedings/sugi30/130-30.pdf

SAS Macro Delimiters
Two delimiters will trigger the macro processor in a SAS program.
&macro-variable: refers to a macro variable. The current value of the variable will replace &macro-variable.
%macro-name: refers to a macro, which consists of one or more complete SAS statements, or even whole data or proc steps.

SAS Macro Variables SAS Macro variables can be defined and used anywhere in a SAS program, except in data lines. They are independent of a SAS dataset.

SAS Macro Variables
%LET: assigns text to a macro variable: %LET macrovar = value;
1. macrovar is the name of a global macro variable.
2. value is the macro variable value, which is a character string without quotation marks or a macro expression.
%PUT: displays macro variable values as text in the SAS log: %put _all_; %put _user_;
&macrovar: substitutes the value of a macro variable in a program.

SAS Macro Variables
Here is an example of how to use a macro variable:
    %let int=treat;
    proc means data=&int;
    run;
Now we can rerun the code simply by changing the value of the macro variable, without altering the rest of the code.
    %let int=redsox;
This is extremely helpful when you have a large amount of code that references the same value.
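A short sketch showing %PUT and macro variable resolution inside a title, assuming the treat data set exists. The macro variable name dsn is chosen for this illustration. Macro variables resolve inside double quotes but not inside single quotes.

```sas
%let dsn = treat;

/* Write the current value to the log to confirm it. */
%put The macro variable dsn resolves to &dsn;

title "Summary statistics for &dsn";   /* double quotes: &dsn is resolved */
proc means data=&dsn;
run;
title;                                 /* clear the title afterwards */
```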

Create SAS Macro
Definition:
    %MACRO macro-name(parm1, parm2, …, parmk);
      macro definition (&parm1, &parm2, …, &parmk)
    %MEND macro-name;
Application:
    %macro-name(values of parm1, parm2, …, parmk);
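As a concrete instance of the template above, here is a minimal sketch of a two-parameter macro. The macro name summarize and its parameters dsn and vars are hypothetical, and the calls assume the treat and redsox data sets from earlier examples.

```sas
/* Hypothetical macro: run PROC MEANS on chosen variables of a data set. */
%macro summarize(dsn, vars);
  proc means data=&dsn;
    var &vars;
  run;
%mend summarize;

/* Each call generates a complete PROC MEANS step. */
%summarize(treat, bpa bpb)
%summarize(redsox, hr rbi)
```

Since a macro call expands into complete statements, it does not need a trailing semicolon of its own.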

SAS Macro Example
Import Excel to SAS data sets by a macro:
    %macro excelsas(in, out);
      proc import out=work.&out
        datafile="g:\shared\bio271\&in"
        dbms=excel replace;
        getnames=yes;
      run;
    %mend excelsas;
    %excelsas(practice.xls, test)
Use proc print to ensure that you have the data input properly.

What is this macro doing?
    %let int=treat;
    %let dop=%str(id bpa);
    %macro happy;
      data new;
        set &int;
        drop &dop;
      run;
      proc means data=new;
      run;
    %mend happy;
    %happy

In class practice
Use the auto data from the g: drive:
Read the data into SAS (variables: id, weight, mpg, foreign)
Create a new variable for better than 20 mpg
Get means/frequencies for weight and mpg for foreign and domestic vehicles
Are there any missing values?
Write a macro to sort a data set by a variable and then print the first 10 observations (use macro variables)