BMTRY 789 Lecture 2: SAS Syntax, Entering Raw Data, etc.
Lecturer: Annie N. Simpson, MSc.
Readings: Chapters 1, 2, 12, & 13
Lab Problems: 1.1, 1.2, 1.3, 1.5, 1.10, 12.1, 12.2, 12.6, 12.16, 13.3, 13.8
Homework Due: None
Homework for Next Week: No class, but turn in HW1!

Summer 2009 BMTRY 789 Intro. To SAS Programming2
Parts of a SAS Program
What are the two main parts of a SAS program?

Parts of a SAS Program
What is a SAS STATEMENT?

DATA Step
What takes place in a DATA step?

DATA Step = Do/Create Things
What takes place in a DATA step?
- Input data (what types?)
- DO-END loops
- IF-THEN-ELSE statements
- Subset data: IF expression / IF expression THEN DELETE
- Create and redefine variables
- Functions
- Interleave, merge, and update

PROC Step
What takes place in a PROC step?

PROC Step = Produce Results
What takes place in a PROC step?
- Perform a specific analysis or function
- Sorting
- Printing
- Univariate analysis
- Analysis of variance
- Regression…

PROC Step
What PROCs have you learned about in your readings so far?

PROC Step
What PROC would you use to produce Simple Descriptive Statistics?
What about to produce a stem-and-leaf plot, boxplot, histogram, QQPlot, etc?

PROC Step broken down into subgroups
How do you get the Proc Means output separately for men and women if you have a GENDER variable?
What descriptive stats can you do on the non-numeric data? What Proc would you use?

PROC Step for Graphics?
What PROCs can you use to produce graphs and charts?

PROC Step for Graphics?
What is the difference between Proc Plot and GPlot? Proc Chart and Gchart?

DATA…How do we work with it?
What type of data is this?
    Data EX1;
      INPUT Group $ X Y Z;
      DATALINES;
    Control
    Treat
    Control
    Treat
    ;
    Run;

SAS INPUT & INFILE Statements
In what 2 situations do you use an INPUT statement?
1. ________
2. ________
When is the only time that you use an INFILE statement?
What is the INPUT statement really accomplishing? (i.e., why does SAS need it?)

SAS INPUT Statement
- Before you can analyze your data with SAS software, your data must be in a form that SAS can read.
- If you put raw data directly in your SAS program, then your data are internal.
- You may want to do this when you have small amounts of data, or when you are testing a program with a small test data set.
- INPUT is used to read data from an external source or from internal data contained in your SAS program.
- The INFILE statement names an external file from which to read the data; otherwise the CARDS (or DATALINES) statement is used to precede the internal data.

External raw data files
- Usually you will want to keep your data in external files, separating the data from the program.
- Use the INFILE statement to tell SAS the filename and path (directory) of the external file containing the data.
- The INFILE statement follows the DATA statement and must precede the INPUT statement.
- After the INFILE keyword, the file path and name are enclosed in quotation marks.

INPUT statement example
    * Reading from an external file into a SAS data set;
    Data one;
      INFILE 'c:\MyData\diabetes.dat';
      Input a $ b c;
    Run;

    * Reading internal data to create SAS data set one;
    Data one;
      Input a $ b c;
      cards;
    ;
    Run;

*Note: SAS log
- Whenever you read data from an external file, SAS gives some very valuable information about the file in the SAS log.
- Always check this information after you read a file, as it could indicate problems.
- A simple comparison of the number of records read from the INFILE with the number of observations in the SAS data set can tell you a lot about whether or not SAS is reading your data correctly.

*Note: Long Records
- In some operating environments, SAS assumes external files have a record length of 256 or less. (The record length is the number of characters, including spaces, on a data line.)
- If your data lines are long, and it looks like SAS is not reading all your data, then use the LRECL= option in the INFILE statement to specify a record length at least as long as the longest record in your data file.
    INFILE 'c:\MyData\Diabetes.dat' LRECL=2000;

Controlling INPUT with Options in the INFILE statement
The following options are useful for reading particular types of data files. Place these options after the filename in the INFILE statement.
- FIRSTOBS= tells SAS at what line to begin reading data. This is useful if you have a data file that contains descriptive text or header information at the beginning and you want to skip over these lines to begin reading the data.
- OBS= tells SAS to stop reading when it gets to that line in the raw data file.

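The two options can be combined. The sketch below is only an illustration: the file name survey_hdr.txt, its two header lines, and the variable list are assumptions, not part of the course data.

```sas
/* Hypothetical file: lines 1-2 hold header text, data begin on line 3 */
DATA Survey;
   INFILE 'c:\MyData\survey_hdr.txt' FIRSTOBS=3 OBS=12;
   INPUT ID Gender $ Age;
RUN;
```

Note that OBS= gives the number of the last line to read, not a count of lines, so this step reads raw data lines 3 through 12.
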
Controlling INPUT with Options in the INFILE statement (cont.)
- MISSOVER: By default, SAS will go to the next data line to read more data if it has reached the end of the data line and there are still variables in the INPUT statement that have not been assigned values.
- The MISSOVER option tells SAS that if it runs out of data, it should not go to the next data line. Instead, it assigns missing values to any remaining variables before proceeding to the next line.

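A small sketch of MISSOVER with internal data may help. The data set name, variable names, and scores are made up for illustration; INFILE DATALINES is used so that the option can be applied to in-program data.

```sas
/* Hypothetical test scores: some students took fewer than three tests */
DATA Scores;
   INFILE DATALINES MISSOVER;
   INPUT Name $ Test1 Test2 Test3;
   DATALINES;
Ann 90 85 88
Bob 72
Cy 81 79
;
RUN;

PROC PRINT DATA=Scores;
RUN;
```

With MISSOVER, each data line yields one observation, and the tests Bob and Cy did not take are set to missing. Try removing MISSOVER and compare the log and the printed output.
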
Controlling INPUT with Options in the INFILE statement (cont.)
- PAD: You need this option when you are reading data using column or formatted input and some data lines are shorter than others.
- If a variable's field extends past the end of the data line, then, by default, SAS will go to the next line to start reading the variable's value.
- The PAD option pads each short line with blanks out to the record length, so SAS reads whatever is in the variable's field on the current line instead of moving to the next line.

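As a rough sketch, PAD is simply added after the file name. The file address.dat and its column layout are assumptions made for this illustration; an external file is used because the issue arises when lines in a file have different lengths.

```sas
/* Hypothetical file: street names vary in length, so some lines end before column 37 */
DATA Address;
   INFILE 'c:\MyData\address.dat' PAD;
   INPUT Name $ 1-15 Number 16-19 Street $ 22-37;
RUN;
```
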
Data Step: input statement
There are three basic forms of the INPUT statement:
1. List input (free form): data fields must be separated by at least one blank. List the names of the variables, and follow the name with $ for character data.
   Example: Input Name $ Age;
2. Column input: follow the variable name (and $ for character) with startingcolumn-endingcolumn.
   Example: Input Name $ 1-15;
3. Formatted input: optionally precede the variable name with a column pointer (@startingcolumn); follow the variable name with a SAS informat.
   Example: Input Name $ DOB mmddyy8.;

LIST INPUT: Reading Raw Data Separated by Spaces
- If the values in your raw data file are all separated by at least one space, then using list input to read the data may be appropriate.
- Any missing data must be indicated with a period.
- Character data, if present, must be simple: no embedded spaces, and no values greater than eight characters in length. (Use the LENGTH statement to change the length.)
    LENGTH Name $ 20;
- If the data file contains dates or other values which need special treatment, then list input may not be appropriate.
    INPUT Name $ Age Height;
- The $ after Name indicates that it is a character variable, whereas the Age and Height variables are both numeric.

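The points above can be put together in a short sketch. The names and values are invented for illustration; the LENGTH statement comes before INPUT so that Name can hold more than eight characters.

```sas
/* Hypothetical data: one long name, and one missing Age marked with a period */
DATA Heights;
   LENGTH Name $ 20;
   INPUT Name $ Age Height;
   DATALINES;
Christopherson 34 70.5
Lee . 64
;
RUN;

PROC PRINT DATA=Heights;
RUN;
```
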
COLUMN INPUT: Reading Raw Data Separated by Columns
- If each of the variable's values is always found in the same place in the data line, then you can use column input as long as all values are character or standard numeric.
- Standard numeric data contain only numerals, decimal points, plus and minus signs, and E for scientific notation. Dates or numbers with embedded commas, for example, are not standard.
    INPUT Name$ 1-10 Age Height 14-18;
- The first variable, Name, is character and the data values are in columns 1 through 10. The Age and Height variables are both numeric, since they are not followed by a $, and data values for both of these variables are in the column ranges listed after their names.

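A runnable sketch of column input follows. The column range 11-13 for Age and the data values are assumptions made for this illustration. Notice that column input can read a name containing an embedded blank, which list input cannot.

```sas
/* Hypothetical layout: Name in columns 1-10, Age in 11-13, Height in 14-18 */
DATA People;
   INPUT Name $ 1-10 Age 11-13 Height 14-18;
   DATALINES;
Mary Ann   34 64.5
Bob        41 70
;
RUN;

PROC PRINT DATA=People;
RUN;
```
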
FORMATTED INPUT: Reading Raw Data NOT in Standard Format
- This is where you want to use formatted input or mixed input.
- Informats are useful anytime you have non-standard data.
- Numbers with embedded commas or dollar signs are examples of non-standard data.
- Dates are perhaps the most common non-standard data.
- Using date informats, SAS will convert conventional forms of dates into a number: the number of days since January 1, 1960. This number is referred to as a SAS date value (January 1, 1960 itself is 0).

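To see SAS date values directly, one option is to print the same variable with and without a format. This is a minimal sketch; the data set and variable names are made up.

```sas
/* Read three dates with the MMDDYY10. informat */
DATA Dates;
   INPUT Visit MMDDYY10.;
   DATALINES;
01/01/1960
01/02/1960
10/31/1999
;
RUN;

/* No format: prints the stored SAS date values (number of days) */
PROC PRINT DATA=Dates;
RUN;

/* DATE9. format: prints the same values as readable dates */
PROC PRINT DATA=Dates;
   FORMAT Visit DATE9.;
RUN;
```

In the first listing, 01/01/1960 is stored as 0 and 01/02/1960 as 1.
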
Difference between INFORMAT and FORMAT?
- INFORMATs give SAS special instructions for reading a variable.
- FORMATs give SAS special instructions for writing a variable.
- If specified in a DATA step, the name of the informat or format will be saved in the data set and will be printed by PROC CONTENTS.
- Like the LABEL statement, these can also be used in the PROC step to customize your reports, but they would not be stored in the data set.

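The distinction can be sketched as follows. The data set Sales, its variables, and its values are hypothetical; the informats and formats used (COMMA10., MMDDYY10., DOLLAR12.2, DATE9.) are standard SAS ones.

```sas
DATA Sales;
   INFORMAT Amount COMMA10. SaleDate MMDDYY10.;   /* how to READ the raw values   */
   FORMAT   Amount DOLLAR12.2 SaleDate DATE9.;    /* how to WRITE them; stored    */
   INPUT Amount SaleDate;
   DATALINES;
1,250.50 03/15/2009
980 04/02/2009
;
RUN;

/* The stored informats and formats are listed here */
PROC CONTENTS DATA=Sales;
RUN;

/* A FORMAT statement in a PROC step applies to this report only */
PROC PRINT DATA=Sales;
   FORMAT SaleDate MMDDYY10.;
RUN;
```
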
Informats: 3 basic types
Character, numeric, date
- Character: $informatw.
- Numeric: informatw.d
- Date: informatw.
The $ indicates character informats, INFORMAT is the name of the informat, w is the total width, and d is the number of decimal places (numeric only).
Two informats do not have names: $w., which reads standard character data, and w.d, which reads standard numeric data.

Informats (cont.)
The period in an informat is very important because it distinguishes an informat from a variable name, which, by default, cannot contain any special characters except the underscore.
    INPUT Name : $10. Age : 3. Height : 5.1 DOB : MMDDYY10.;
*Selected informats are listed in “The Little SAS Book” (3rd Ed.).

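The INPUT statement above can be tried with a few lines of internal data. The data set name and values are invented for illustration. The colon tells SAS to read each value up to the next blank and then apply the informat.

```sas
/* Hypothetical data; every Height value includes its decimal point */
DATA Kids;
   INPUT Name : $10. Age : 3. Height : 5.1 DOB : MMDDYY10.;
   DATALINES;
Alexandria 7 48.5 02/14/2002
Sam 10 55.0 11/03/1998
;
RUN;

PROC PRINT DATA=Kids;
   FORMAT DOB DATE9.;
RUN;
```

One caution: with a w.d informat such as 5.1, a raw value that has no decimal point is divided by 10 to the power d, so 55 would be read as 5.5. That is why the heights here are written with explicit decimal points.
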
Formatted Input Example
    INPUT Name : $16. Age : 3. Type : $1. +1 Date MMDDYY10.
          (Score1 Score2 Score3 Score4 Score5) (4.1);
- The variable Name has an informat of $16., meaning that it is a character variable 16 columns wide.
- Variable Age has an informat of 3., is numeric, three columns wide, and has no decimal places.
- The +1 skips over one column.
- Variable Type is character, and it is one column wide.
- Variable Date has an informat of MMDDYY10. and reads dates in forms such as 10/31/1999, each 10 columns wide.
- The remaining variables, Score1 through Score5, all require the same informat, 4.1. By putting the variables and the informat in separate sets of parentheses, you only have to list the informat once.

Mixing Input Styles
- List style is the easiest; column style is a bit more work; and formatted style is the hardest of the three.
- However, column and formatted styles do not require spaces (or other delimiters) between variables and can read embedded blanks.
- Sometimes you use one style, sometimes another, and sometimes the easiest way is to use a combination of styles.
- SAS is so flexible that you can mix and match any of the input styles for your own convenience.

Mixing Input Styles (cont.)
- With list style input, SAS automatically scans to the next non-blank field and starts reading.
- With column style input, SAS starts reading in the exact column that you specify.
- But with formatted input, SAS just starts reading: wherever the pointer is, that is where SAS reads.
- Sometimes you need to move the pointer explicitly, and you can do that by using the column pointer, @n, where n is the number of the column SAS should move to.

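A small sketch of the @n column pointer with formatted input is shown here. The layout (ID in column 1, visit date in column 10, weight in column 25) and the values are assumptions for illustration only.

```sas
/* Hypothetical layout: @n moves the pointer to column n before each read */
DATA Visits;
   INPUT @1 ID $4. @10 Visit MMDDYY10. @25 Weight 5.1;
   DATALINES;
A001     03/15/2009     150.5
A002     04/02/2009     98.6
;
RUN;

PROC PRINT DATA=Visits;
   FORMAT Visit DATE9.;
RUN;
```
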
Mixed Input example
    INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;
Raw data:
    Yellowstone ID/MT/WY 1872 *4,065,493
    Everglades FL 1934 *1,398,800
    Yosemite CA 1864 * 760,917
    Great Smoky Mountains NC/TN 1926* 520,269
    Wolf Trap Farm VA 1966 * 130
Without the column pointer:
    INPUT ParkName $ 1-22 State $ Year Acreage COMMA9.;
Acreage would look like (it would start reading at the *):

Reading Multiple Lines of Raw Data per Observation
- In a typical raw data file each line of data represents one observation, but sometimes the data for each observation are spread out over more than one line.
- To tell SAS when to skip to a new line, you simply add line pointers to your INPUT statement.
- To read more than one line of raw data for a single observation, you simply insert a slash (/) into your INPUT statement when you want to skip to the next line of raw data.

Reading Multiple Lines of Raw Data per Observation (cont.)
- The #n line pointer works the same as the slash (/), but it is more flexible.
- The #n works by giving the number of the line, within that observation, from which you want to read your raw data.
    INPUT City $ State $ / NormHi NormLo #3 RecHi RecLo;
Raw data:
    Nome AK
    Miami FL
    …

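Here is a runnable sketch of the same INPUT statement. The temperature values are illustrative placeholders, not the values from the original slide. Each observation uses three raw data lines.

```sas
/* Line 1: city and state; line 2: normal high/low; line 3: record high/low */
DATA Temps;
   INPUT City $ State $
         / NormHi NormLo
         #3 RecHi RecLo;
   DATALINES;
Nome AK
55 44
88 29
Miami FL
90 75
97 65
;
RUN;

PROC PRINT DATA=Temps;
RUN;
```

The six raw data lines produce two observations.
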
Reading Multiple Observations per Line of Raw Data
- When you have multiple observations per line of raw data, you can use double trailing at signs (@@) at the end of your INPUT statement.
- SAS will hold that line of data, continuing to read observations until it either runs out of data or reaches an INPUT statement that does not end with a double trailing @.
- This is also known as a “hard hold”.
    INPUT City $ State $ NormHi NormLo RecHi RecLo @@;
Raw data:
    Nome AK Miami FL Atlanta

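A minimal sketch of the double trailing @ follows. The temperature values and the third city are made up for illustration.

```sas
/* Two observations on the first raw data line, one on the second */
DATA Temps2;
   INPUT City $ State $ NormHi NormLo RecHi RecLo @@;
   DATALINES;
Nome AK 55 44 88 29 Miami FL 90 75 97 65
Raleigh NC 88 68 105 50
;
RUN;

PROC PRINT DATA=Temps2;
RUN;
```

The two raw data lines produce three observations, because SAS keeps reading from the held line until it runs out of data.
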
Reading Part of a Raw Data File
- You don't have to read all the data before you tell SAS whether to keep an observation. Instead, you can read just enough variables to decide whether to keep the current observation.
- Similar to the @@, SAS will hold that line of data with a single trailing @. This is known as a “soft hold”.
- While the trailing @ holds that line, you can test the observation with an IF statement to see if it's one you want to keep. If it is, you can then read the data for the remaining variables with a second INPUT statement.
- Without the trailing @, SAS would automatically start reading the next line of raw data with each INPUT statement.

Reading Part of a Raw Data File Example
Suppose you have a dataset containing heart and lung transplant information but you are trying to construct a dataset of only lung transplant patients. It is a very large data set that takes a lot of time to run so you don’t want to read it all in first and then select out the portion you want to keep. It would be better to read in only those data that you want initially.

Reading Part of a Raw Data File Example (cont.)
Raw data:
    Heart nov1989
    Heart sep1992
    Lung jul1995
    Heart jan1990
    Lung mar1998
Program:
    DATA Lung;
      INFILE 'c:\MyData\Trnsplnt.dat';
      INPUT Type $ @;
      IF Type = 'Heart' THEN DELETE;
      INPUT RecNum TranDt : Date9.;
    Run;

Reading external comma-delimited data
We have two choices when given this type of data:
- We can use an editor and replace all the commas with blanks, or
- We can leave the commas in the data and use the DLM= option in the INFILE statement.
    Data HtWt;
      Infile 'c:\MyData\survey.txt' DLM=',';
      Input ID Gender $ Age Height Weight;
    Run;

Reading external comma-delimited data (cont.)
Another method besides the DLM= option is to use DSD in the INFILE statement. This option performs several other functions besides treating commas as delimiters:
- If it finds two adjacent commas, it will assign a missing value.
- It will allow text strings surrounded by quotes to be read into a character variable and will strip the quotes in the process.
    Data HtWt;
      Infile 'c:\MyData\survey.txt' DSD;
      Input ID Gender $ Age Height Weight;
    Run;

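Both DSD behaviors can be seen in a short sketch with internal data. The data set HtWt2, the Name variable, and the values are hypothetical; INFILE DATALINES is used so that DSD can be applied to in-program data.

```sas
/* Line 1 has two adjacent commas (missing Height) and a quoted name containing a comma */
DATA HtWt2;
   INFILE DATALINES DSD;
   INPUT ID Name : $20. Age Height Weight;
   DATALINES;
1,"Smith, Jane",34,,130
2,"Lee",41,70,185
;
RUN;

PROC PRINT DATA=HtWt2;
RUN;
```

In the printed output, Height is missing for the first observation, and the names appear without their quotation marks.
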
Permanent SAS Data Sets
- A permanent SAS data set has a two-level name: LibraryName.DataSetName
- A temporary SAS data set has the one-level name that we have been using.
- Temporary SAS data sets will not exist after you shut down the instance of SAS in which they were created.
    Data new;
      Set AIDS;
    Run;
- To create a permanent SAS data set, first define a SAS library (libref).

Libname Statement
Use this statement to define your SAS library location before using your SAS data sets.
Example:
    LIBNAME Annie 'C:\SASDATA';
    Proc Means Data = Annie.EX4A N MEAN STD;
      Var X Y Z;
    Run;

Creating Permanent SAS Data Sets
    Libname annie "C:\SASDATA";
    Data Annie.EX1;
      INPUT Group $ X Y Z;
      DATALINES;
    Control
    Treat
    Control
    Treat
    ;
    Run;

Using the Permanent SAS Data Sets
    Libname xyz "C:\SASDATA";
    Title "Means from EX1";
    Proc Means Data=xyz.EX1;
      Var X Y Z;
    Run;

Now let’s try the in-class problems listed on slide 1