Lesson 7 - Topics Reading SAS data sets

Sub-setting and merging SAS data sets
Permanent SAS data sets
Programs in course notes
LSB 2: :7 6:1-2,4-5,9,11-13

Welcome to Lesson 7. In this lesson we will look at working with SAS data sets. This includes creating and using permanent data sets, sub-setting data sets, and merging data sets. These topics are illustrated in programs in the course notes and discussed in the indicated sections of the LSB.

Working With SAS Data Sets
Reading a SAS dataset: SET statement
Merging SAS datasets: MERGE statement
Done within a DATA step

There are two key SAS statements when working with SAS data sets. The first is the SET statement, which reads (brings in) a SAS dataset. The second is the MERGE statement, which brings in multiple SAS datasets and merges them into one SAS dataset. Both actions take place in the DATA step. That is where new datasets are created, whether from raw data or from an existing SAS dataset.

SET STATEMENT

   DATA new;
      SET old (KEEP = varlist);
      WHERE condition;
   RUN;

Reads a SAS data set (one row at a time)
Replaces the INFILE and INPUT statements used when reading in raw data
KEEP= brings in selected variables (columns)
WHERE brings in selected observations (rows)

The SET statement is a simple but important statement: it reads a SAS dataset one row at a time. The syntax is the keyword SET followed by the name of the data set to read. An optional KEEP= option in parentheses can be used to restrict the variables brought in. If there is no KEEP= option then all variables are brought in. The WHERE statement limits the observations brought in, based on one or more variables in the dataset. So as the KEEP= option makes the new dataset skinnier, the WHERE statement makes the new dataset shorter. These 4 lines of SAS code create a new dataset called new that contains the variables in varlist and selected observations from old. If there were no KEEP= option or WHERE statement then the dataset new would be identical to old. The SET statement replaces both the INFILE and INPUT statements used when reading in raw data. You can see that it is much easier to read a SAS dataset than raw data.
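As a rough, runnable illustration of the SET statement with a KEEP= option and a WHERE statement, the sketch below uses SASHELP.CLASS, a sample dataset shipped with SAS, in place of old. The dataset name teens and the age cutoff are assumptions made only for this example.

```sas
* Sketch only: SASHELP.CLASS stands in for the dataset called old;
DATA teens;
   SET sashelp.class (KEEP = name sex age);   * bring in three columns only;
   WHERE age >= 13;                           * bring in only rows meeting the condition;
RUN;

PROC PRINT DATA=teens;
   TITLE 'Students Aged 13 or Older';
RUN;
```

Removing both the KEEP= option and the WHERE statement would make teens an exact copy of SASHELP.CLASS.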

PROGRAM 10 - Making SAS Datasets from Other SAS Datasets

   DATA tdata;
      INFILE 'C:\SAS_Files\tomhs.data';
      INPUT @ ptid $10. @ clinic $1. @ group @ sex @ sbp12 @ randdate $10. ;
   RUN;

   * Making a new dataset containing only men;
   DATA men;
      SET tdata;        * reads the existing dataset;
      WHERE sex = 1;    * this does the selection;
      if group in(1,2,3,4,5) then active = 1;
      else if group in(6) then active = 2;
      KEEP ptid clinic group sbp12 randdate active;
   RUN;

We will illustrate working with SAS datasets in program 10. We first create a dataset called tdata, reading in six variables from the raw file tomhs.data. The RUN statement completes the DATA step. We then create a "subset" data set called men which will contain the rows from tdata for which sex = 1. We do this by using the SET statement followed by the WHERE statement. We do not have a KEEP= option so all variables are brought in. We then add a new variable called active based on the variable group. We add a KEEP statement to limit the variables written to the data set men. The new dataset will have 6 variables and as many rows as there are men in tdata. There is no need to have the variable sex on the data set since we know the data set contains only men. Note: Even if we did not want to limit the new data set to men we would still need a DATA step to add the variable active.

   * Making a new dataset containing only women;
   DATA women;
      SET tdata;
      WHERE sex = 2;
      if group in(1,2,3,4,5) then active = 1;
      else if group in(6) then active = 2;
      KEEP ptid clinic group sbp12 randdate active;
   RUN;

We now have 3 "active" datasets: tdata, men, women

We can do the same thing to create a data set containing only women. We now have three datasets to use: tdata, which has all rows, and two subset data sets, one for men and one for women. We can use any of these datasets in a procedure depending on which group of participants we want to analyze.

   * Making both datasets in one data step;
   DATA men women;
      SET tdata;
      if group in(1,2,3,4,5) then active = 1;
      else if group in(6) then active = 2;
      if sex = 1 then OUTPUT men;
      else if sex = 2 then OUTPUT women;
      KEEP ptid clinic randdate group sbp12 active;
   RUN;

Partial Log:
   NOTE: There were 100 obs read from WORK.TDATA
   NOTE: The data set WORK.MEN has 73 obs and 7 variables
   NOTE: The data set WORK.WOMEN has 27 obs and 7 variables

If SAS sees an OUTPUT statement then SAS will output only then; if there is no OUTPUT statement SAS outputs at the end of the data step.

We could actually create both datasets in one DATA step as illustrated here. We start by naming both datasets on the DATA statement. We then set the dataset tdata (without a WHERE statement) and use a conditional OUTPUT statement to output the observation to the dataset men or women depending on the variable sex. Note that when a DATA step has no OUTPUT statement, SAS outputs the observation at the end of the DATA step. If SAS encounters an OUTPUT statement then SAS will output only at that time and only to the dataset specified. The log shows us that there are 73 observations on the dataset men and 27 observations on the dataset women.
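The same pattern can be tried without the course data file. The sketch below is an illustration under assumptions: it splits the SAS sample dataset SASHELP.CLASS by its character variable sex, and the dataset names boys and girls are made up for the example.

```sas
* Sketch only: split one input dataset into two output datasets;
DATA boys girls;
   SET sashelp.class;
   IF sex = 'M' THEN OUTPUT boys;          * written only to boys;
   ELSE IF sex = 'F' THEN OUTPUT girls;    * written only to girls;
RUN;
```

Because the step contains explicit OUTPUT statements, a row whose sex value matched neither condition would not be written to either dataset.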

KEEP OPTION vs KEEP STATEMENT
Purpose: Restricts variables read in or written out

   DATA highbp;
      * Brings in only these variables;
      SET tdata (KEEP = ptid sex sbp12);
   RUN;

   DATA highbp;
      SET tdata;   * Reads in all variables;
      * Writes out only these variables;
      KEEP ptid sex sbp12;
   RUN;

   * There is also a DROP option/statement;

You may have noticed that there are two uses of the keyword KEEP in SAS. There is the KEEP= option on the SET statement and there is the KEEP statement in the DATA step. These are different, although they sometimes have the same effect. The KEEP= option on the SET statement limits the variables that are brought in from the dataset specified, whereas the KEEP statement limits the variables written to the created dataset. In the first example only the variables ptid, sex, and sbp12 are brought in from the dataset tdata. In the second example all the variables are brought in but only these 3 variables are written to the new dataset. In these two cases the results are the same but the first DATA step would be more efficient. The KEEP statement would be required if you wanted to keep variables created after the SET statement. There is also a corresponding DROP statement and DROP= option. I tend to use KEEP rather than DROP unless only a couple of variables need to be dropped.
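A case where the two forms are not interchangeable can be sketched as follows. This is an illustration only: SASHELP.CLASS is the SAS sample dataset, and the dataset name ratio and variable name wtperin are hypothetical.

```sas
* Sketch only: KEEP= option limits what is read, KEEP statement limits what is written;
DATA ratio;
   SET sashelp.class (KEEP = name height weight);   * read three variables;
   wtperin = weight / height;                       * created after the SET statement;
   KEEP name wtperin;                               * write two variables, one of them new;
RUN;
```

The new variable wtperin does not exist on the input dataset, so it can only be named on the KEEP statement, not on the KEEP= option of the SET statement.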

WHERE Versus Logical IF
   DATA highbp;
      * Brings in only certain rows;
      SET tdata;
      WHERE sbp12 > 140;
   RUN;

   DATA highbp;
      SET tdata;
      * Brings in all rows, outputs only certain rows (can be a new variable);
      IF sbp12 > 140;   * If true then continue;
   RUN;

In addition to the WHERE statement you can also use what is called a subsetting IF statement to create a subset dataset. However, the two methods differ in how SAS processes them. The WHERE statement brings in only those observations (rows) where the condition is true. The WHERE variable (or variables) must be on the dataset referenced by the SET statement. Using the subsetting IF statement (without a WHERE) brings in all the rows but outputs to the new dataset only the rows where the IF condition is true. Variables in the subsetting IF statement can be variables created in the DATA step. So if you want to subset a dataset based on newly created variables you need to use a subsetting IF statement. In the example above, both DATA steps produce the same results but in two different ways. SAS DATA steps are pretty fast so either method may be similarly efficient. However, with a large dataset you might want to use the WHERE statement to save computer resources.
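To make the difference concrete, here is a small sketch that subsets on a variable created inside the DATA step, which is the situation where a subsetting IF is needed. SASHELP.CLASS is the SAS sample dataset; the names heavy and wtperin and the cutoff 1.7 are assumptions chosen only for illustration.

```sas
* Sketch only: subsetting on a variable that does not exist on the input dataset;
DATA heavy;
   SET sashelp.class;
   wtperin = weight / height;   * new variable computed for each row;
   IF wtperin > 1.7;            * subsetting IF: continue only when true;
RUN;
```

Replacing the IF with WHERE wtperin > 1.7 would fail, because WHERE can only reference variables already on the dataset named in the SET statement.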

PROGRAM 11 - Merging SAS Datasets
   DATA clinic;
      INFILE DATALINES;
      INPUT id $ sbp ;
      DATALINES;
   C B B D A B B B A … more data
   ;

   DATA lab;
      INFILE DATALINES;
      INPUT id $ glucose;
      DATALINES;
   C B D A B B B A D … more data
   ;

Frequently you will have multiple SAS datasets that you want to merge together. For example, laboratory data may be in one dataset and clinical data in another dataset. To perform analyses across the datasets (for example, relating laboratory variables and clinical variables) you first need to merge them together. Program 11 illustrates how to merge datasets using a simple example where all the data is within the program. In this example data comes from two sources: the first is data collected at the clinical center; the second is data from the laboratory. In the first DATA step we create the dataset clinic, reading in two variables, the patient ID and systolic BP. The second DATA step creates the dataset lab, reading in the patient ID and glucose. Note that the variable name for patient ID is identical in both datasets (variable id). When you merge datasets you need to have a common variable to link the data together. This is typically an ID of some sort.

   * Creating merged dataset;
   PROC SORT DATA= clinic; BY id;
   PROC SORT DATA= lab; BY id;

   DATA study;
      MERGE clinic lab;
      BY id ;
   RUN;

Note: The BY statement is very important!

Before we can merge datasets we need to be sure each dataset is sorted by the variable we want to merge by, in this case the variable id. (Think of trying to put two stacks of exams together by student ID; it will be much easier and faster if the piles are sorted by student ID.) To sort a dataset you use PROC SORT. The syntax is PROC SORT followed by the dataset to be sorted. You follow this by the keyword BY followed by the variable you want the dataset sorted by. PROC SORT sorts the data by the variable in BY and writes the sorted dataset (by default) back to the same dataset. After the two PROC SORTs shown here the datasets clinic and lab will be sorted by patient ID. We are then ready to create a new dataset that is the merged data of the two datasets. We do this in a DATA step. Instead of a SET statement we use a MERGE statement followed by the datasets to be merged. This is immediately followed by a BY statement giving the variable that links the data together. That is all you need! In just a couple of lines of code we have created a merged dataset called study. An important note here: Do not forget the BY statement. If you omit the BY statement then SAS does a one-to-one matching, taking the first row of clinic and merging it with the first row of lab. This is not what you want to do; the data from one patient may get merged with a different patient (think of the example of merging the two stacks of exams).
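The sort-then-merge sequence can be sketched end to end with a few made-up records. The dataset names clinic_demo and lab_demo, the single-letter ids, and all values below are hypothetical and are not the course data.

```sas
* Sketch only: made-up ids and values;
DATA clinic_demo;
   INPUT id $ sbp;
   DATALINES;
C 120
A 110
B 135
;

DATA lab_demo;
   INPUT id $ glucose;
   DATALINES;
D 101
A 99
C 88
;

PROC SORT DATA=clinic_demo; BY id; RUN;
PROC SORT DATA=lab_demo;    BY id; RUN;

DATA study_demo;
   MERGE clinic_demo lab_demo;
   BY id;                       * match rows on id rather than by row position;
RUN;

PROC PRINT DATA=study_demo;
RUN;
```

In this sketch id B appears only in clinic_demo and id D only in lab_demo, so the merged dataset has four rows, with glucose missing for B and sbp missing for D.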

Merged Dataset

   Obs   id       sbp   glucose
    1    A00869   110     99
    2    A01088   117     93

Here is the result of the merged dataset, displaying the variables using PROC PRINT. The two observations in red had data only in the clinic dataset and the observation in blue had data only in the lab dataset. Note their data is missing for all variables from the missing dataset. That is what you would expect SAS to do. Here then is an important thing to remember: when merging datasets, if an observation is not in a dataset then all variables from that dataset are set to missing. What if the same variable name is in both datasets and the subject has data in both datasets? Well, this could cause a problem. SAS will take the value from the right-most dataset. If that is what you want then you are OK. In general, though, you want variables to have unique names across the datasets (except for the merge BY variable).
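One way to keep both values when two datasets share a variable name is the RENAME= dataset option, which is not covered in this lesson and is shown here only as a possible approach. The dataset names, ids, and weights in the sketch are made up.

```sas
* Sketch only: both inputs carry a variable called weight;
DATA clinic_demo;
   INPUT id $ weight;
   DATALINES;
A 150
B 180
;

DATA lab_demo;
   INPUT id $ weight;
   DATALINES;
A 152
B 179
;

* Both demo datasets are already in id order, so no PROC SORT is needed here;
DATA both;
   MERGE clinic_demo (RENAME=(weight=wt_clinic))
         lab_demo    (RENAME=(weight=wt_lab));
   BY id;
RUN;
```

Without the two RENAME= options the merged dataset would have a single weight variable holding the lab_demo values, since lab_demo is the right-most dataset.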

What if you want only observations that are in both datasets?
   DATA study;
      MERGE clinic (IN=in1) lab (IN=in2);
      BY id;
      if in1 and in2;
   RUN;

   PROC PRINT DATA=study;
      TITLE 'Patients with Clinic and Lab';
   RUN;

What if you want to include on the merged dataset only persons that were in both datasets (i.e. you want a dataset that includes only persons who had clinic and lab data)? You can do this by using the IN= option on the MERGE statement. For each dataset set IN= to a variable name. Here I simply use the names in1 and in2. What does SAS set these variables to? Well, as SAS merges the two datasets together, if the id is in clinic then in1 will be set to 1; if not then in1 is set to 0. The variable in2 is defined similarly. We can use these variables to select which patients to include on the merged data, based on which datasets the patients are found in. We do this by using a subsetting IF statement. Here the simple statement IF in1 and in2 tells SAS to include only merged observations that have data in both datasets. Without any restrictions SAS will include on the merged dataset patients that are in either dataset.

Logical Statements

   * Must be in 1st dataset;
   if in1;    * Same as: if in1 = 1;

   * Must be in 2nd dataset;
   if in2;

   * Must be in both datasets;
   if in1 and in2;

Here are other logical statements you could use to restrict which patients are included in the merged dataset. If you want to include only patients that are in the clinic dataset then you would use the statement IF in1; similarly, if you want to include only patients in the lab dataset you would use IF in2. Note the statement IF in1 is the same as IF in1 = 1 (the value 1 represents TRUE and 0 represents FALSE).
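The IN= variables can also be combined with NOT to find records that failed to match. The sketch below, built on made-up data with hypothetical dataset names, keeps only ids that are in the clinic data but not in the lab data.

```sas
* Sketch only: made-up ids and values, already in id order;
DATA clinic_demo;
   INPUT id $ sbp;
   DATALINES;
A 110
B 135
C 120
;

DATA lab_demo;
   INPUT id $ glucose;
   DATALINES;
A 99
C 88
D 101
;

DATA clinic_only;
   MERGE clinic_demo (IN=in1) lab_demo (IN=in2);
   BY id;
   IF in1 AND NOT in2;   * in the clinic data but not in the lab data;
RUN;
```

With these records only id B is kept. A check like this is a quick way to list patients who are missing lab results.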

Things to Remember When Merging Datasets
Need to have a common variable name in each dataset to use as the linking variable
Variables in a dataset with no match will be set to missing
Rows matched that have the same variable names will be assigned the right-most dataset value
Always remember the BY statement in the merge

Here is a summary of important points to remember when merging datasets. First, you need to have a common variable in each dataset to be merged. This is usually an ID of some sort. Second, if an ID is found in one dataset but not the other then the variables from the dataset where it is not found are set to missing. Third, if there are common variable names in both datasets then the values from the second (right-most) dataset will be used. As mentioned, you usually want unique names across datasets, except for the linking variable. Lastly, always remember the BY statement after the MERGE. Otherwise you will usually get the wrong results. If you are database inclined, you may want to explore PROC SQL to merge your data. It is more complicated to use, but it is more flexible, and if you have database experience you will be familiar with the syntax.
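For readers curious about the PROC SQL route, the following is a minimal sketch of an inner join, which corresponds to keeping only ids found in both datasets. The dataset names and values are made up for the example, and this is one possible formulation rather than the approach used in the course programs.

```sas
* Sketch only: made-up ids and values;
DATA clinic_demo;
   INPUT id $ sbp;
   DATALINES;
C 120
A 110
B 135
;

DATA lab_demo;
   INPUT id $ glucose;
   DATALINES;
D 101
A 99
C 88
;

PROC SQL;
   CREATE TABLE study_sql AS
   SELECT c.id, c.sbp, l.glucose
   FROM clinic_demo AS c
        INNER JOIN lab_demo AS l
        ON c.id = l.id
   ORDER BY c.id;
QUIT;
```

Unlike the DATA step merge, the join does not require the inputs to be sorted first. Here only ids A and C appear in the result because they are the only ids present in both tables.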

Temporary vs Permanent SAS Datasets
Temporary (or working) SAS dataset - After the SAS session is over the dataset is deleted.

   DATA bp;   * bp is deleted after SAS session;
   (rest of program)

Permanent SAS dataset - After the program is run the dataset is saved and is available for use in future programs. You need to tell SAS where to store/retrieve the dataset.

Note: For PC SAS the working dataset is available until you end the SAS session.

In the programs we have looked at so far the SAS data sets created have been temporary SAS datasets. They are temporary in that after your SAS session is over the datasets are deleted. These are also called "work" datasets. In the DATA statement here the dataset bp is temporary. It will be deleted after the SAS session is over. This is usually not a problem because we created the data set with a DATA step in a program. To recreate the data set we can just re-run the program. However, there are times when you would like to have the dataset available after your SAS session is over, without having to re-create it using a DATA step. To create a permanent data set you will need to tell SAS where to store the file containing the dataset. We will look at some reasons why you may want to create a permanent SAS dataset and the syntax for how to do that.

Reasons to Create Permanent SAS Datasets
Read raw data and compute calculated variables only once
All variables have assigned names and labels. Data is ready to be analyzed.
Dataset can be sent to other computers or users.

Listed here are reasons for creating a permanent SAS dataset. One reason is that you will need to read the raw data and create your calculated variables only once. The calculated variables may involve complicated formulas or logic. You don't want to do that every time, especially if multiple programmers are using the data. You want to get all the variables defined up front. This will reduce the chance of errors. With the SAS dataset all the variables have assigned names and the data is ready to go, i.e. ready to be analyzed. The DATA step that read the raw data and computed new variables is no longer needed once the dataset is created. Often you may still need to create or recode additional variables in a DATA step, but this will usually be simpler and more straightforward. The last reason given here is that SAS data sets are a good way to send data to other users or computers. You simply send the file to another user and everything is ready to go: all the variables have been defined. SAS will need to be installed on the new computer for the dataset to be accessed.

Creating a Permanent Dataset
   LIBNAME mylib 'C:\My SAS Datasets';

LIBNAME assigns a directory (folder) reference name. In this example the directory 'C:\My SAS Datasets' is assigned a reference name of mylib.

   DATA mylib.sescore;

Tells SAS to create a dataset called sescore in the directory referenced by mylib, which is 'C:\My SAS Datasets'.

To create a permanent SAS dataset you need to use a LIBNAME statement to tell SAS where to store the dataset. LIBNAME stands for library name. After the keyword LIBNAME follows what is called a library reference. This is a name you assign that points to a directory (i.e. a folder) on your system. After the library reference you indicate the directory that the library reference points to. In the example the library reference named mylib points to the 'My SAS Datasets' folder. LIBNAME statements are similar to the FILENAME statements that we learned about in program 2, except that FILENAME statements point to a file whereas LIBNAME statements point to a folder. You then use the library reference name in the DATA statement as shown here. The DATA statement here tells SAS to create a data set called sescore in the directory referenced by mylib, which points to the C:\My SAS Datasets directory. Note the form of specifying the dataset: the library reference (where to put the file), a period, then the name of the dataset. One note: temporary datasets have a library reference called WORK. In this case you do not need to specify the WORK library, i.e. if no library reference is given, SAS assumes the WORK library, which SAS assigns to some place on your computer. As stated before, these files are deleted by SAS when your SAS session ends.
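The point about the implicit WORK library can be checked with a short sketch. SASHELP.CLASS is the SAS sample dataset, used here only as convenient input; the dataset name bp is reused from the earlier slide purely as an illustration.

```sas
* Sketch only: a one-level name and the WORK two-level name refer to the same dataset;
DATA bp;                    * one-level name, stored in WORK;
   SET sashelp.class;
RUN;

PROC PRINT DATA=work.bp;    * two-level name for the same dataset;
RUN;
```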

Examples of LIBNAME Statements

LIBNAME points to a directory (folder)

   LIBNAME mylib 'C:\My SAS Files';
   LIBNAME class 'C:\My SAS Files';
   LIBNAME ph6420 'C:\My SAS Files\SASClass\';

   DATA mylib.datasetname;
   DATA class.datasetname;
   DATA ph6420.datasetname;

On UNIX and PC the file will be called datasetname.sas7bdat

Here are some examples of LIBNAME statements and how they would be used with the DATA statement. Note the first two LIBNAME statements have different library references (mylib and class) but point to the same folder. This illustrates that you can use whatever name you like for a library reference, subject to the naming rules: it must be 8 characters or less. However, once you define the reference name you must use that same name in the DATA statement. Thus, if you use CLASS as the reference in the LIBNAME you must use CLASS in the DATA statement. One important note: The name of the system file created for the dataset will be the dataset name with an extension of sas7bdat. This is true for both PC and UNIX platforms. The library reference does not become any part of the file name or extension.
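To try a LIBNAME statement without committing to a particular folder, one option is to point a second library reference at the folder SAS is already using for WORK. This is only a sketch under assumptions: the libref demo and dataset name classcopy are hypothetical, and because the folder is the WORK folder the file is still removed when the session ends.

```sas
* Sketch only: demo is a made-up libref pointing at the current WORK folder;
LIBNAME demo "%SYSFUNC(PATHNAME(work))";

DATA demo.classcopy;
   SET sashelp.class;
RUN;

* The host information in the output shows the physical file name;
PROC CONTENTS DATA=demo.classcopy VARNUM;
RUN;
```

The file name reported by PROC CONTENTS is built from the dataset name classcopy, not from the libref demo.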

PROGRAM 12

   LIBNAME mylib 'C:\SAS_Files';

   DATA mylib.sescore;
      INFILE 'C:\SAS_Files\tomhs.data' LRECL =400;
      INPUT @ ptid $10. @ clinic $1. @ randdate mmddyy10. @ group @ educ
            @ wtbl @ wt12 @ sbpbl @ sbp12
            @ (sebl_1-sebl_20) (1. +1)
            @ (se12_1-se12_20) (1. +1) ;

In program 12 we will create a permanent SAS data set called sescore that will be stored in the 'C:\SAS_Files' folder. We first define a library reference called mylib that points to this folder. We could have called it anything; we would just need to be consistent with the name in the DATA statement. The two words in red here need to be the same. We then read in several variables from tomhs.data, including 20 side-effect items at baseline and the same 20 side-effect items at the 12-month visit. We use array-type notation as a shortcut in reading in these variables.

      wtd12 = wt12 - wtbl;
      sbpd12 = sbp12 - sbpbl;
      sescrbl = MEAN (OF sebl_1 - sebl_20) ;
      sescr12 = MEAN (OF se12_1 - se12_20) ;
      sescrd12 = sescr12 - sescrbl ;

      LABEL educ = 'Highest Education Level';
      LABEL wt12 = 'Weight (lbs) at 12 Months';
      LABEL wtbl = 'Weight (lbs) at Baseline';
      LABEL wtd12 = 'Weight Change at Baseline';
      LABEL sbpbl = 'Systolic BP (mmHg) at Baseline';
      LABEL sbp12 = 'Systolic BP (mmHg) at 12 Months';
      LABEL sbpd12 = 'Systolic BP Change at 12 Months';
      LABEL group = 'Treatment Group (1-6)';
      LABEL sescrbl = 'Side Effect at Baseline';
      LABEL sescr12 = 'Side Effect at 12 Months';
      LABEL sescrd12 = 'Side Effect Change Score';

      FORMAT randdate mmddyy10. ;
      DROP sebl_1-sebl_20 se12_1-se12_20 ;
   RUN;

We then create several new variables: weight and blood pressure change, and two side-effect summary measures. The first is a variable named sescrbl which is the average "score" of the 20 items at baseline. The second is a similar variable using items at 12 months. We then compute a variable called sescrd12 which is the difference between the two variables. This can be used as a measure of improvement (or worsening) over time in the side-effect profile, which can be compared among the treatment groups. We add labels for all variables to document the dataset. We also include a format for randdate so that this variable will always display as a date. Lastly, we use a DROP statement that tells SAS to "drop" the individual side-effect items, i.e. not to include these variables on the data set. We do not need these variables in the analysis.

   60 LIBNAME mylib 'C:\SAS_Files';
   NOTE: Libref MYLIB was successfully assigned as follows:
         Engine: V9
         Physical Name: C:\SAS_Files

   DATA mylib.sescore;

   NOTE: The infile 'C:\SAS_Files\tomhs.data' is:
         File Name=C:\SAS_Files\tomhs.data, RECFM=V,LRECL=400
   NOTE: 100 records were read from the infile 'C:\SAS_Files\tomhs.data'.
   NOTE: The data set MYLIB.SESCORE has 100 observations and 14 variables.

This is a partial SAS log from running the program. After the LIBNAME statement we get a note that the library reference MYLIB was successfully assigned. If the referenced directory did not exist then you would get an error message. It would most likely mean that you incorrectly typed the folder path. After the DATA step is a note that the dataset MYLIB.SESCORE has 100 observations and 14 variables. This means that the data set was successfully created. There is one observation for each record read in from the raw data file tomhs.data.

What is inside a SAS dataset?
   PROC CONTENTS DATA=mylib.sescore VARNUM ;
      TITLE 'Description of Variables in Dataset SESCORE' ;
   RUN;

A SAS dataset contains:
Names, labels, and formats of all variables
Data

PROC CONTENTS reads the descriptor portion of the dataset

After creating the dataset you will usually want to run PROC CONTENTS on the dataset to check that the variables on the dataset are what you expect, and that other information about the variables and the dataset is correct. This brings up a point about what is inside a SAS dataset. A SAS data set contains two parts. The first part is called the descriptor portion. This contains all the information about the dataset, like variable names, labels, and formats. When you run PROC CONTENTS only this part of the file is read. So PROC CONTENTS will always run quickly even if there are many observations on the dataset. After the descriptor portion follows the data. This portion, of course, is read for any procedures involving data analyses.

Note: mylib is not a part of the dataset name

   Description of Variables in Dataset SESCORE

   The CONTENTS Procedure

   Data Set Name: MYLIB.SESCORE                   Observations:
   Member Type:   DATA                            Variables:
   Engine:        V                               Indexes:
   Created:       :59 Wednesday, August 11,       Observation Length:    112
   Last Modified: 10:59 Wednesday, August 11,     Deleted Observations:  0
   Protection:                                    Compressed:            NO
   Data Set Type:                                 Sorted:                NO
   Label:

   -----Engine/Host Dependent Information-----

   File Name:         C:\SAS_Files\sescore.sas7bdat
   Release Created:
   Host Created:      XP_PRO
   File Size (bytes):

Here is part of the output from PROC CONTENTS for dataset sescore. The top portion gives information about the dataset, for example, the date it was created. Other information you may want to look at is the number of observations and number of variables. There are 100 observations in sescore, which is what we expected. Most other parts in this top portion can usually be ignored. The engine information tells us we created a version 9 data set. The middle section is titled Engine/Host Dependent Information. The file name gives you the entire path of where the dataset is located on your computer. Note the name of the file is the dataset name with the extension sas7bdat.

Variables listed in creation order

    #   Variable   Type   Len   Pos   Format      Label
    1   ptid       Char                           Patient ID
    2   clinic     Char                           Clinical Center
    3   randdate   Num                MMDDYY10.   Randomization Date
    4   group      Num                            Treatment Group (1-6)
    5   educ       Num                            Highest Education Level
    6   wtbl       Num                            Weight (lbs) at Baseline
    7   wt12       Num                            Weight (lbs) at 12 Months
    8   sbpbl      Num                            Systolic BP (mmHg) at Baseline
    9   sbp12      Num                            Systolic BP (mmHg) at 12 Months
   10   wtd12      Num                            Weight Change at Baseline
   11   sbpd12     Num                            Systolic BP Change at 12 Months
   12   sescrbl    Num                            Side Effect at Baseline
   13   sescr12    Num                            Side Effect at 12 Months
   14   sescrd12   Num                            Side Effect Change Score

This becomes the documentation of the dataset

The last section of the output displays a list of each variable on the dataset with the variable type (numeric or character) and any label or format assigned. The Len (length) column gives the length of the variable; this is usually only important for character variables. We see variable ptid is of length 10. The Pos (position) column relates to where the variable is stored on the file, which is not important for the user. The column labeled # displays the order of the variable on the file. Using the VARNUM option the variables are displayed in creation order, as displayed here. This order is often the most useful because newly created variables will be displayed last. Thus, it is easier to see if the variables you thought should be on the dataset are there. Without the VARNUM option the variables are listed in alphanumeric order. If you want both the alphanumeric and creation order displayed then use the option POSITION in PROC CONTENTS.

Using PROC COPY to copy work dataset to permanent dataset
Make a work dataset first. Then, when you know that it is working correctly, copy the work dataset to a permanent dataset.

   LIBNAME mylib 'C:\SAS_Files';

   DATA sescore;
      ….
   RUN;

   PROC COPY IN=work OUT=mylib;
      SELECT sescore;
   RUN;

Instead of using a two-part name to create a permanent dataset you can use the COPY procedure to copy the work dataset to a permanent library or folder on your computer. You use the IN= option to specify the WORK library and the OUT= option to specify the library (folder) where you want the dataset stored. This is usually the way I make permanent datasets; that way my program always has work datasets and I use PROC COPY to create the permanent dataset when I need to.

PROGRAM 13

   LIBNAME class 'C:\SAS_Files';
   * Tells SAS where to find the SAS dataset;

   PROC MEANS DATA=class.sescore ;
      TITLE 'Means of All Numeric Variables on SAS Permanent Dataset';
   RUN;

   PROC CORR DATA=class.sescore;
      VAR wtd12 sbpd12 sescrd12;
      TITLE 'Correlation Matrix of 3 Change Variables';
   RUN;

What if the dataset was moved to a different folder? Just need to change LIBNAME

So you have created a permanent dataset called sescore. Now you or another user want to use it (or maybe the dataset was sent to you, you have stored it on your computer, and now want to use it). Just as you needed to tell SAS where to store it when you created the dataset, you will need to tell SAS where it is stored on your computer in order to read it. For this you will need to use (again) the LIBNAME statement. The LIBNAME statement is used for both reading and writing SAS datasets. Now the library reference that was used to create the dataset does not matter (in fact, you may not even know this information). You do need to know where on your computer the file is located. From program 12 we know the dataset sescore is located in the 'C:\SAS_Files' folder. In program 13 we use the LIBNAME statement to assign a library reference to this folder. We will use the name class. Now we can start running SAS procedures immediately without a DATA step. We need to refer to the dataset on the PROC statement with the library reference, a period, then the dataset name. The PROC MEANS statement here tells SAS to find a dataset called sescore in the directory referenced by class and run PROC MEANS on this dataset. We then run PROC CORR on the dataset. You may think that using library references is an odd way to tell SAS where the data set is located. Perhaps that is true. However, what if the dataset you have been using gets moved to a different folder (perhaps by the data administrator)? If that happens, you just need to change the LIBNAME statement and everything will work.

Means of All Numeric Variables on SAS Permanent Dataset
   The MEANS Procedure

   Variable   Label                              N   Mean
   randdate   Randomization Date
   group      Treatment Group (1-6)
   educ       Highest Education Level
   wtbl       Weight (lbs) at Baseline
   wt12       Weight (lbs) at 12 Months
   sbpbl      Systolic BP (mmHg) at Baseline
   sbp12      Systolic BP (mmHg) at 12 Months
   wtd12      Weight Change at Baseline
   sbpd12     Systolic BP Change at 12 Months
   sescrbl    Side Effect at Baseline
   sescr12    Side Effect at 12 Months
   sescrd12   Side Effect Change Score

Here is the result from PROC MEANS. We notice that the labels appear automatically for each variable. This is because the labels are part of the data set.

   Pearson Correlation Coefficients
   Prob > |r| under H0: Rho=0
   Number of Observations

                                                 wtd12   sbpd12   sescrd12
   wtd12      Weight Change at Baseline
   sbpd12     Systolic BP Change at 12 Months
   sescrd12   Side Effect Change Score

Here is the output from PROC CORR. Displayed is the 3x3 matrix of correlation coefficients. The correlation between change in systolic BP and change in weight is positive: the more weight that is lost, the more the blood pressure goes down.

   * Often you will read the permanent SAS dataset in a DATA step
     to modify or add variables. Usually these will be put on a new
     temporary SAS dataset. The SET statement reads a SAS dataset;

   LIBNAME class 'C:\SAS_Files';

   DATA rxdata;
      SET class.sescore;
      if group in(1,2,3,4,5) then rx = 1;
      else rx = 2;
   RUN;

   PROC MEANS DATA=rxdata N MEAN MAXDEC=2 FW=7;
      CLASS rx;
      VAR sbpd12 wtd12 sescrd12;
      TITLE 'Change in SBP, Weight, and Side Effect Score by Treatment';
   RUN;

Finally, here at the end of the program I create a new dataset called rxdata that reads the permanent dataset using the SET statement. I create one new variable called rx and run PROC MEANS using rx as a class variable. Creating a new dataset from a permanent SAS dataset is no different than creating a SAS dataset from a work dataset. You need to include a LIBNAME statement where you assign the library reference. Then on the SET statement add the library reference to the dataset name. Here we read the SAS dataset sescore from the class library, which points to the folder 'C:\SAS_Files' where the file resides.
