Lesson 8 - Topics Creating SAS datasets from procedures

Lesson 8 - Topics: Creating SAS datasets from procedures; Creating reports using data-step and PROC TABULATE; Using PROC RANK; Programs 14-15 in course notes; LSB 4:11;13-17;5:3

Welcome to lesson 8. In this lesson we will see how you can create SAS datasets from procedures and how to use ODS and some data step techniques we have learned to create customized reports. We will also look at PROC RANK, a utility procedure that creates new variables that are ranks or quantile groups of existing variables.

Making SAS Datasets From Procedures
Output from SAS PROCs can be put into SAS datasets:
- To do further processing of the information from the output
- To reformat output to make a report

One of the nice features of SAS is that you can take pieces of output from procedures and send them to a SAS dataset. Why would you want to do this? Putting the output into a SAS dataset allows you to do further processing of the output using the DATA step. Rows of output become observations and columns of output become variables. You can then do anything you could do with any dataset, such as creating new variables and running procedures. This “massaging” of the output can then be used to create a customized report. We will also see how procedure output datasets can be used to restructure our original dataset and create new variables.
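As a minimal sketch of this idea (not one of the course programs), the code below uses SASHELP.CLASS, the sample table that ships with SAS, in place of the TOMHS data. PROC MEANS writes one row per sex plus an overall row to a dataset, and a DATA step then derives a new variable, the interquartile range, from the output columns. The dataset names stats and stats2 and the variable iqr are hypothetical.

```sas
/* Sketch: send PROC MEANS statistics to a dataset, then add a variable. */
/* SASHELP.CLASS stands in for the TOMHS data (assumption).             */
proc means data=sashelp.class noprint;
   class sex;
   var weight;
   /* one row per sex plus an overall row with _TYPE_ = 0 */
   output out=stats n=n q1=p25 q3=p75;
run;

/* Output rows are now observations, so a DATA step can derive new variables */
data stats2;
   set stats;
   iqr = p75 - p25;   /* interquartile range */
run;

proc print data=stats2 noobs;
   var sex _type_ n p25 p75 iqr;
run;
```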

Ways to Put Output into SAS Datasets
- Using the OUTPUT statement available in many procedures
- Using the ODS OUTPUT statement: any output table can be put into a SAS dataset

There are two ways to make a dataset from output. The first is using the OUTPUT statement that is available in many procedures. The second is with ODS, using the ODS OUTPUT statement. You may remember from a previous session that each piece of output has a table name. With the ODS OUTPUT statement you can place that output in a SAS dataset. The examples in the programs to follow will illustrate each method.
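If you don't know a table's name, ODS TRACE ON writes the name of every output table to the log. The sketch below, again using SASHELP.CLASS and a hypothetical dataset name means_tbl, first traces PROC MEANS (whose statistics table is named Summary) and then captures that table with ODS OUTPUT.

```sas
/* Sketch: discover a table name with ODS TRACE, then capture it. */
ods trace on;      /* table names are written to the SAS log */
proc means data=sashelp.class n mean;
   class sex;
   var height;
run;
ods trace off;

/* The PROC MEANS statistics table is named Summary */
ods output summary = means_tbl;
proc means data=sashelp.class n mean;
   class sex;
   var height;
run;

proc print data=means_tbl;
run;
```

The ODS version keeps the table's layout (here one row per sex), while the OUTPUT statement in the previous approach also adds the overall row and the _TYPE_ and _FREQ_ variables.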

Report We Want to Generate

Quartiles of Weight Change by Clinical Center
Clinic   N     P25     P50    P75
A       15   -19.0   -12.0   -9.0
B       25   -11.0    -8.0   -2.8
C       35   -20.8   -14.0   -6.5
D       17   -12.8    -6.5   -1.5
Total   92   -17.7    -9.9   -5.2

Suppose we want to create the report above based on the TOMHS data. For each clinic, and for all clinics combined, the report gives statistics on the distribution of weight change (12-month minus baseline weight): the number of participants with a weight change and the 25th, 50th, and 75th percentiles. Think about how we could get this information from SAS. We know we can get percentiles from PROC UNIVARIATE or PROC MEANS, and we could get them separately for each clinic using a CLASS statement or a BY statement. However, a lot of output would be generated and it would not be displayed in this compact way. Let’s see how we can generate this report by creating a dataset with the OUTPUT statement of PROC MEANS.

Program 14
LIBNAME class 'C:\SAS_Files';
* Will use SAS dataset version of TOMHS data;
DATA wt;
   SET class.tomhs (KEEP=ptid clinic wtbl wt12);
   wtchg = wt12 - wtbl;
RUN;

This is the first portion of program 14. The DATA step creates a dataset called wt, reading in data from the SAS dataset version of the TOMHS data, class.tomhs. We bring in the variables ptid, clinic, wtbl, and wt12 (the baseline and 12-month weights). We then compute the change in weight from baseline to 12 months, wtchg.

* Create report by clinic using OUTPUT;
PROC MEANS DATA = wt NOPRINT;
   CLASS clinic;
   VAR wtchg;
   OUTPUT OUT=summary N=n Q1=p25 MEDIAN=p50 Q3=p75;
RUN;

- OUT= gives the name of the new dataset
- Each statistic is written as statistic keyword = variable name
- Dataset summary will have one observation for each clinic and the total

We then run PROC MEANS on the weight change variable wtchg, with clinic as a CLASS variable (unlike a BY statement, CLASS does not require the data to be sorted first). We use the OUTPUT statement within the procedure to create a SAS dataset containing the statistics we want. The syntax is the keyword OUTPUT followed by the keyword OUT= and the name of the dataset we are creating (summary here). This is followed by the statistics we want, each given by its keyword and the variable name we assign to it. Here we tell SAS to output to the dataset summary the N and the three quartiles (Q1, MEDIAN, and Q3, i.e. the 25th, 50th, and 75th percentiles) of wtchg, and to name these four statistics n, p25, p50, and p75. The dataset summary will have one observation for each clinic plus one for all clinics combined, 5 in all. To see what the dataset looks like we will run a PROC PRINT on it. Note that although the OUTPUT statement is rather long, it is just one statement, i.e. there is only one semicolon. You may also have noticed the NOPRINT option on the PROC statement. This tells SAS not to send the standard output to the output window; we will get the information we need from the output dataset.

PROC PRINT DATA = summary; RUN;

Obs  clinic  _TYPE_  _FREQ_   n     p25     p50    p75
 1             0      100    92  -17.65   -9.9  -5.15
 2     A       1       18    15  -19.00  -12.0  -9.00
 3     B       1       29    25  -11.00   -8.0  -2.80
 4     C       1       36    35  -20.80  -14.0  -6.50
 5     D       1       17    17  -12.80   -6.5  -1.50

Here is the PROC PRINT statement and the output. It is usually a good idea to run a PROC PRINT after creating an output dataset so you can see what was actually created. Sometimes SAS will add variables as well, so run the PROC PRINT without a VAR statement so all variables will be displayed, as we do here. SAS has added _TYPE_ (0 for the overall row, 1 for the clinic rows) and _FREQ_, the number of observations in each row's group; _FREQ_ can be larger than n because n counts only nonmissing values of wtchg. We see that this listing is pretty much the report we wanted: there is one row for each clinic plus a total row (the row with clinic missing) with the statistics we want. To finish the report we will move the total row to the bottom, label it, drop the extra variables, remove the Obs column, and display the percentile variables to one decimal place.

* Put total row at the bottom;
PROC SORT; BY DESCENDING _type_ clinic; RUN;
PROC PRINT; RUN;

Obs  clinic  _TYPE_  _FREQ_   n     p25     p50    p75
 1     A       1       18    15  -19.00  -12.0  -9.00
 2     B       1       29    25  -11.00   -8.0  -2.80
 3     C       1       36    35  -20.80  -14.0  -6.50
 4     D       1       17    17  -12.80   -6.5  -1.50
 5             0      100    92  -17.65   -9.9  -5.15

With no DATA= option, PROC SORT and PROC PRINT use the most recently created dataset, summary. Sorting by descending _TYPE_ puts the clinic rows (_TYPE_ = 1) first and the overall row (_TYPE_ = 0) last, which is where we want the total in the report.

* Create final report;
DATA summary;
   LENGTH clinic $5;
   SET summary;
   if missing(clinic) then clinic = 'Total';
   DROP _type_ _freq_;
RUN;
PROC PRINT NOOBS;
   FORMAT p25 p50 p75 6.1;
RUN;

clinic   n     p25     p50    p75
A       15   -19.0   -12.0   -9.0
B       25   -11.0    -8.0   -2.8
C       35   -20.8   -14.0   -6.5
D       17   -12.8    -6.5   -1.5
Total   92   -17.7    -9.9   -5.2

In this final DATA step the LENGTH statement, placed before the SET statement, makes clinic 5 characters long so that the text 'Total' fits; without it, clinic would keep its shorter length from summary and the label would be truncated. The total row is the one with clinic missing, so we assign it the value 'Total', and we drop _TYPE_ and _FREQ_ since they are not part of the report. The NOOBS option removes the Obs column, and the FORMAT statement displays the percentiles to one decimal place. The result is the report we set out to create.
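To try the whole Program 14 pattern without the TOMHS files, here is a sketch that substitutes SASHELP.CLASS, with sex standing in for clinic and weight standing in for wtchg (an assumption made only so the code runs anywhere). The steps are the same: OUTPUT in PROC MEANS, sort the overall row to the bottom, label it Total, and print without the Obs column.

```sas
/* Sketch: Program 14 pattern on SASHELP.CLASS (assumption: sex = clinic, weight = wtchg) */
proc means data=sashelp.class noprint;
   class sex;
   var weight;
   output out=summary n=n q1=p25 median=p50 q3=p75;
run;

/* _TYPE_ = 0 is the overall row, so descending order puts it last */
proc sort data=summary;
   by descending _type_ sex;
run;

data summary;
   length sex $5;                       /* room for the text Total */
   set summary;
   if missing(sex) then sex = 'Total';  /* label the overall row */
   drop _type_ _freq_;
run;

proc print data=summary noobs;
   format p25 p50 p75 6.1;
run;
```

Because sex is only one character long in SASHELP.CLASS, the LENGTH statement before SET is what lets the value Total fit; without it the label would be stored as T.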

Using ODS to Send Output to a SAS Dataset
Syntax: ODS OUTPUT output-table = new-data-set;

* Output quantile table to a dataset;
ODS OUTPUT quantiles = qwt;
PROC UNIVARIATE DATA = wt;
   VAR wtbl wt12;
RUN;
ODS OUTPUT CLOSE;
PROC PRINT DATA = qwt;
RUN;

The more general method of putting output into a SAS dataset is with the ODS OUTPUT statement. The syntax is ODS OUTPUT followed by the output table name, an equals sign, and the name you want to give the new dataset. The output table must correspond to a table name used in the procedure you will call. Here we will be running PROC UNIVARIATE on two variables, weight at baseline and weight at 12 months, and we want to put the quantile table into a SAS dataset, which we will name qwt. We then run the univariate procedure as usual and follow the RUN statement with an ODS OUTPUT CLOSE statement. This captures the output into our new dataset qwt. To “see what we get” we run a PROC PRINT on the new dataset.

Display of Output Dataset

Obs  Varname  Quantile     Estimate
  1  wtbl     100% Max       279.30
  2  wtbl     99%            274.15
  3  wtbl     95%            246.40
  4  wtbl     90%            237.40
  5  wtbl     75% Q3         215.15
  6  wtbl     50% Median     192.65
  7  wtbl     25% Q1         165.90
  8  wtbl     10%            141.50
  9  wtbl     5%             137.40
 10  wtbl     1%             130.25
 11  wtbl     0% Min         128.50
 12  wt12     100% Max       271.50
 13  wt12     99%            271.50
 14  wt12     95%            239.00
 15  wt12     90%            227.00
 16  wt12     75% Q3         202.50
 17  wt12     50% Median     180.00
 18  wt12     25% Q1         153.50
 19  wt12     10%            133.00
 20  wt12     5%             130.00
 21  wt12     1%             123.00
 22  wt12     0% Min         123.00

We would like to put the two variables side by side.

This is the display of the data. We get two sections of 11 rows each, one for weight at baseline and one for weight at 12 months, with the name of the statistic and its value as variables. This report might be good enough, but we might like to put the baseline and 12-month weight data together on the same rows. We know how to do this from methods we have used before to restructure datasets. Let’s see how this is done here.

* Separate the data into 2 datasets;
DATA wtbl wt12;
   SET qwt;
   if varname = 'wtbl' then output wtbl;
   else if varname = 'wt12' then output wt12;
RUN;

* PROC DATASETS used for changing variable names;
PROC DATASETS;
   MODIFY wtbl; RENAME estimate = wtbl;
   MODIFY wt12; RENAME estimate = wt12;
QUIT;

* Put 2 datasets side-by-side (note: no BY statement, OK here);
DATA all;
   MERGE wtbl wt12;
   DROP varname;
RUN;
PROC PRINT;
RUN;

We will use the method we have used earlier, where we create separate datasets for certain rows and then merge them together to get them on the same rows. Here we create a dataset for the 11 rows containing the weight statistics at baseline and a dataset for the 11 rows containing the weight statistics at 12 months, conditionally outputting the rows based on the variable VARNAME. We then use PROC DATASETS to rename the variable estimate, which contains the percentile statistics, to wtbl or wt12. The last DATA step uses the MERGE statement to put the two datasets together; we name this combined dataset ALL. No BY statement is needed here because both datasets contain the same 11 quantiles in the same order, so a one-to-one merge lines the rows up correctly. The PROC PRINT will show us what our new dataset looks like.
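The same reshaping can also be done with PROC TRANSPOSE, a different technique from the split-and-merge approach above. In this sketch, which uses SASHELP.CLASS (height and weight) instead of wtbl and wt12, a DATA step first numbers the 11 quantile rows within each variable so their original order survives sorting; the dataset names qlong, qlong2, and qwide and the variable seq are hypothetical.

```sas
/* Sketch: long-to-wide reshaping with PROC TRANSPOSE.         */
/* SASHELP.CLASS height and weight stand in for wtbl and wt12. */
ods output quantiles = qlong;
proc univariate data=sashelp.class;
   var height weight;
run;

/* Number the quantile rows 1-11 within each analysis variable */
data qlong2;
   set qlong;
   by varname notsorted;
   if first.varname then seq = 0;
   seq + 1;
run;

proc sort data=qlong2;
   by seq;   /* the default EQUALS option keeps height before weight */
run;

/* One row per quantile, one column per analysis variable */
proc transpose data=qlong2 out=qwide(drop=_name_);
   by seq quantile;
   id varname;
   var estimate;
run;

proc print data=qwide noobs;
run;
```

The ID statement turns each value of VarName into a column name, so adding a third variable to the VAR statement of PROC UNIVARIATE simply adds a third column.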

Obs  Quantile       wtbl     wt12
  1  100% Max     279.30   271.50
  2  99%          274.15   271.50
  3  95%          246.40   239.00
  4  90%          237.40   227.00
  5  75% Q3       215.15   202.50
  6  50% Median   192.65   180.00
  7  25% Q1       165.90   153.50
  8  10%          141.50   133.00
  9  5%           137.40   130.00
 10  1%           130.25   123.00
 11  0% Min       128.50   123.00

This new dataset ALL has 11 rows with the statistics for each weight variable as separate variables. Now we can easily compare any percentile for the two time periods. Getting to know a few data step “tricks” is useful because they can be used over and over again to reformat the data to produce the desired report.

PROC RANK
- Used to divide observations into equal-size categories based on values of a variable
- Creates a new variable containing the categories
- New variable is added to the dataset or to a new dataset
- Example: Divide weight change into 5 equal categories (quintiles)

When investigating the relationship of a continuous independent variable to a dependent variable, it is often desirable to divide the independent variable into categories and then see how the dependent variable changes across the categories. Suppose you want to investigate the relationship between change in weight and change in blood cholesterol. One analysis you could do is divide people into categories based on their weight change and compare the average cholesterol change across the weight change groups. Sometimes you may have specific weight change categories of interest, for example, weight loss > 10 lbs, weight loss 1-10 lbs, weight gain 1-10 lbs, and weight gain > 10 lbs. However, in some cases you have no pre-specified categories, and you may just want to divide people into categories of equal size. To do this you could run PROC UNIVARIATE on weight change, look at the quantiles to determine the cutoff levels, and then go back and create a new variable using IF/THEN logic. However, SAS has a utility procedure that can do this for you automatically: PROC RANK. You simply tell SAS the name of the variable you want to form groups for and how many groups you want. SAS will then compute a new variable containing the categories and add it to the dataset.
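When the categories are specified in advance, IF/THEN logic is enough. The sketch below uses the four categories named above on a few made-up weight changes (the dataset wtcat, the variable wtcat, and the values are hypothetical); how to classify a change of exactly 0, or between -1 and 1, is not stated in the lesson, so counting it as a gain of up to 10 lbs is an assumption. Missing values are tested first because SAS treats a missing numeric value as smaller than any number, so it would otherwise fall into the largest-loss category.

```sas
/* Sketch: pre-specified weight change categories with IF/THEN.        */
/* Toy values; the handling of a change of exactly 0 is an assumption. */
data wtcat;
   input ptid $ wtchg;
   if missing(wtchg) then wtcat = .;   /* test first: . is below every number */
   else if wtchg < -10 then wtcat = 1; /* lost more than 10 lbs   */
   else if wtchg < 0   then wtcat = 2; /* lost up to 10 lbs       */
   else if wtchg <= 10 then wtcat = 3; /* gained up to 10 lbs     */
   else wtcat = 4;                     /* gained more than 10 lbs */
   datalines;
P01 -21
P02 -9.5
P03 .
P04 4
P05 12
;
run;

proc freq data=wtcat;
   tables wtcat / missing;
run;
```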

PROC RANK Syntax
PROC RANK DATA = dataset OUT = outdataset GROUPS = number-of-categories;
   VAR varname;
   RANKS newvarname;
RUN;
- Most of the time you can set OUT to be the same dataset specified in DATA.
- PROC RANK writes no output.

Here is the general syntax for calling PROC RANK. DATA= is the input dataset. OUT= is the output dataset that will contain the new variable or variables. Since this dataset will also include all variables in the original dataset, you can usually specify the OUT dataset to be the same as the input dataset specified in DATA. GROUPS is set to the number of categories you want to divide the variables into; GROUPS=5 would create quintiles. In VAR you list the continuous variables for which you want to create new categorical variables, and RANKS is the list of names you want to give the new variables. Since PROC RANK just creates a dataset, no output goes to the output window.
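To see how GROUPS= assigns categories, here is a small sketch with made-up values 1 through 10 plus one missing value (the dataset toy and the variables x and qx are hypothetical). SAS documents the group number as FLOOR(rank*k/(n+1)), where k is the GROUPS= value and n is the number of nonmissing values, so with k = 5 and n = 10 each quintile gets two values, and the missing value stays missing.

```sas
/* Sketch: quintiles with PROC RANK on made-up values. */
data toy;
   do id = 1 to 10;
      x = id;          /* values 1 through 10 */
      output;
   end;
   id = 11;
   x = .;              /* one missing value */
   output;
run;

proc rank data=toy out=toy groups=5;
   var x;
   ranks qx;           /* 0 = lowest fifth ... 4 = highest fifth */
run;

proc freq data=toy;
   tables qx / missing;
run;
```

For example, x = 3 has rank 3, and FLOOR(3*5/11) = FLOOR(1.36) = 1, so it lands in the second quintile (group 1).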

Program 15
LIBNAME class 'C:\SAS_Files';
DATA wtchol;
   SET class.tomhs (KEEP=ptid wtbl wt12 cholbl chol12);
   wtchg = wt12 - wtbl;
   cholchg = chol12 - cholbl;
RUN;

* This PROC will add a new variable to the dataset which is the tertile of weight change. The new variable will be 0, 1, or 2;
PROC RANK DATA = wtchol GROUPS = 3 OUT = wtchol;
   VAR wtchg;
   RANKS twtchg;   * name of new variable;
RUN;

We will see how this works in program 15. We start by creating a dataset called wtchol, reading in weight and cholesterol variables from the TOMHS SAS dataset. We then compute new variables for weight and cholesterol change, 12-month minus baseline. Negative values indicate decreases and positive values indicate increases. The RUN statement ends the DATA step. We then run PROC RANK. We want to divide the weight change variable into three equal-size groups. The new variable will be called twtchg (T for tertile). The output dataset will be the same as the input dataset; it will now contain one new variable.

PARTIAL LOG
8    DATA wtchol;
9    SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
10   wtchg = wt12 - wtbl;
11   cholchg = chol12 - cholbl;
12   RUN;
NOTE: There were 100 observations read from the data set CLASS.TOMHSP.
NOTE: The data set WORK.WTCHOL has 100 observations and 9 variables.
     PROC RANK DATA = wtchol GROUPS=3 OUT = wtchol;
20   VAR wtchg; RANKS twtchg;
21   RUN;
NOTE: The data set WORK.WTCHOL has 100 observations and 10 variables.

Here is a partial log from running the program. This run read CLASS.TOMHSP and also kept clinic and sex, which is why the dataset has 9 variables rather than the 7 that the KEEP= list in the program above would give. The DATA step creates the dataset wtchol, which has 100 observations and 9 variables. The note after PROC RANK states that there are now 10 variables in the dataset; the new one is twtchg, the variable PROC RANK created.

PROC FREQ DATA = wtchol; TABLES twtchg; RUN;

OUTPUT:
                 Rank for Variable wtchg
                                 Cumulative    Cumulative
twtchg    Frequency    Percent    Frequency       Percent
0                31      33.70           31         33.70
1                30      32.61           61         66.30
2                31      33.70           92        100.00
Frequency Missing = 8

To check the values of the new variable we display a frequency distribution using PROC FREQ. We note that twtchg takes on three values, 0, 1, and 2, and that each contains about one third of the data. Values of 0 indicate the lowest third of weight change, values of 1 the middle third, and values of 2 the upper third. There are 8 missing values; these are people with missing weight change.

Partial Listing of Dataset wtchol with New Variable Added
PROC PRINT DATA = wtchol (OBS=20);
   VAR ptid wtchg twtchg;
   TITLE 'Partial Listing of Dataset wtchol with new variable added';
RUN;

Partial Listing of Dataset wtchol with new variable added
Obs  PTID      wtchg   twtchg
  1  A00083   -12.00      1
  2  A00301        .      .
  3  A00312    -9.50      1
  4  A00354   -21.00      0
  5  A00400        .      .
  6  A00504    -9.25      1
  7  A00608        .      .
  8  A00720   -18.50      0
  9  A00762    -5.25      2
 10  A00811    -6.75      1

We then do a PROC PRINT displaying the original and ranked weight change variables. We limit the display to 20 observations; 10 are shown here. The first patient lost 12 pounds, which put him or her in the middle weight change category. Patient A00354 lost 21 pounds and is in the lowest category of weight change, which corresponds to the greatest weight loss. Patients with missing weight change, such as A00301, have a missing rank.

PROC MEANS N MEAN MIN MAX MAXDEC=2;
   VAR cholchg wtchg;
   CLASS twtchg;
   TITLE 'Mean Cholesterol Change by Tertile of Weight Change';
RUN;

We now want to display the average change in cholesterol by the weight change categories. We do that with PROC MEANS with a CLASS variable; with no DATA= option, PROC MEANS uses the most recently created dataset, wtchol. We include in the VAR list cholesterol change and the original weight change variable, wtchg. The latter displays information that tells us the cutpoints used to define the 3 levels of weight change.

Mean Cholesterol Change by Tertile of Weight Change
The MEANS Procedure

Rank for
Variable    N
wtchg      Obs   Variable    N      Mean    Minimum    Maximum
----------------------------------------------------------------
0           31   cholchg    30    -13.43     -55.00      47.00
                 wtchg      31    -22.51     -36.50     -14.30
1           30   cholchg    30     -4.70     -37.00      26.00
                 wtchg      30    -10.21     -14.00      -6.80
2           31   cholchg    31     -0.74     -52.00      45.00
                 wtchg      31     -1.82      -6.50      13.00

- The Maximum values of wtchg give the cutpoints for the tertiles
- Could graph this data in an x-y plot (3 points)

Here is the output from PROC MEANS. We see that the mean decrease in serum cholesterol is largest in the greatest weight loss category (a decrease of 13.43 mg/dl). The drop in cholesterol for the middle weight change category is smaller (4.70 mg/dl), and for the upper weight change category the cholesterol drop is just 0.74 mg/dl. So we see a direct relationship between weight change and cholesterol change: the more weight lost, the larger the average drop in cholesterol. The MAX values for the variable wtchg tell us the cutoffs for the three levels of weight change: -14.30 and -6.80. A simple summary of the relationship could be made by plotting the three points noted here, with mean weight change on the X-axis and mean cholesterol change on the Y-axis. You could add standard error bars to the plot if you like.
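A sketch of that plot with PROC SGPLOT follows. Because the TOMHS data are not included here, it simulates 60 hypothetical participants (the simulation settings, the seed, and the names sim and pts are all assumptions, so the numbers will not match the output above). The steps are PROC RANK for tertiles, PROC MEANS with OUTPUT for the means and standard errors, and a SCATTER statement with YERRORLOWER= and YERRORUPPER= for the bars.

```sas
/* Sketch: mean cholesterol change by weight change tertile, +/- 1 SE. */
/* Simulated data, NOT TOMHS. Settings and names are hypothetical.     */
data sim;
   call streaminit(2024);
   do ptid = 1 to 60;
      wtchg   = rand('normal', -10, 8);
      cholchg = 0.5*wtchg + rand('normal', 0, 15);
      output;
   end;
run;

proc rank data=sim out=sim groups=3;
   var wtchg;
   ranks twtchg;
run;

/* NWAY keeps only the three tertile rows (no overall row) */
proc means data=sim noprint nway;
   class twtchg;
   var wtchg cholchg;
   output out=pts mean(wtchg cholchg)=mwt mchol stderr(cholchg)=sechol;
run;

data pts;
   set pts;
   lo = mchol - sechol;   /* lower end of the error bar */
   hi = mchol + sechol;   /* upper end of the error bar */
run;

proc sgplot data=pts;
   scatter x=mwt y=mchol / yerrorlower=lo yerrorupper=hi;
   series  x=mwt y=mchol;
   xaxis label='Mean weight change';
   yaxis label='Mean cholesterol change';
run;
```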

TABLE GENERATION: PROC TABULATE

SAS has a procedure called PROC TABULATE that can be used to generate a table of descriptive summaries. The output from, say, PROC MEANS or PROC UNIVARIATE may give you the information you want, but it isn’t put together in a table the way you may want it. PROC TABULATE can be used to generate various formatted tables. Here is an example of one. A table is made up of rows and columns. Here the rows are each treatment group in TOMHS along with the total; the columns are the N and MEAN for diastolic and systolic blood pressure. You could get this information from PROC MEANS with a CLASS statement, but the output would not be organized so nicely. Let’s look at the PROC TABULATE syntax to produce this table.

PROC TABULATE DATA=class.tomhs FORMAT=8.0;
   CLASS group;            * same as PROC MEANS;
   VAR sbp12 dbp12;        * same as PROC MEANS;
   TABLES group ALL='Total', (dbp12 sbp12)*(N MEAN*f=8.1) / RTS=20;
   LABEL dbp12 = 'Diastolic BP';
   LABEL sbp12 = 'Systolic BP';
   LABEL group = 'RX Group';
   FORMAT group fgroup.;
   TITLE 'Average Blood Pressure at 12-Months';
RUN;

Note first that the CLASS and VAR statements in PROC TABULATE are the same as you would use in PROC MEANS. What follows is the TABLES statement. This can be a bit tricky to follow, but let’s give it a try. Take consolation that there are entire classes and books on how to use PROC TABULATE, so don’t be discouraged if you don’t get it all the first time. Remember from the output on the previous slide that the row information related to treatment group in TOMHS, the variable group. That is placed first, followed by the keyword ALL to give the total; the text in quotes is the label for the total. A comma is then typed, followed by the column information you want: here the N and MEAN for each of the variables dbp12 and sbp12. The f=8.1 tells SAS to display the mean in a column 8 characters wide with one decimal place, while FORMAT=8.0 on the PROC statement sets the default cell format used for N. The RTS= option sets the width of the first column, where only labels are placed and not any data. We add LABEL statements for each variable and apply a format to group so that the formatted values are displayed rather than the values 1-6.

Closer Look at the TABLES Statement
TABLES group ALL='Total', (dbp12 sbp12)*(N MEAN*f=8.1) / RTS=20;
- The part before the comma indicates the row information to display
- The part after the comma indicates the column information to display
- A * indicates to crosstabulate data
- A space indicates to concatenate data
- In words: for each group and the total, display the N and mean of diastolic and systolic BP

Let’s take a closer look at the TABLES statement. Remember that tables have rows and columns. The code before the comma is the row information; the code after the comma is the column information. There are two important characters in the TABLES statement, the space and the asterisk (*). The * tells SAS to crosstabulate the information; a space tells it to concatenate the information. In English, the TABLES statement is telling SAS: for each group and the total, display the N and mean of diastolic and systolic BP. I encourage you to take this program and first run it as is to produce the output. Then make some changes to the TABLES statement. Take away the ALL keyword and see what happens, or add a statistic such as the standard deviation to the columns.
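Following that suggestion, here is a sketch that adds the standard deviation (STD) to the columns. It runs on SASHELP.CLASS, with sex playing the role of treatment group and height and weight replacing the blood pressure variables (an assumption so the code runs without the TOMHS files), and the OUT= option, with the hypothetical dataset name tabout, also saves the table values as a dataset.

```sas
/* Sketch: the TABLES statement with STD added to the columns. */
/* SASHELP.CLASS stands in for TOMHS (assumption).             */
proc tabulate data=sashelp.class format=8.0 out=tabout;
   class sex;
   var height weight;
   tables sex all='Total',
          (height weight)*(n mean*f=8.1 std*f=8.1) / rts=15;
run;
```

Removing all='Total' from the row dimension drops the total row, which is an easy way to see what the ALL keyword contributes.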

Here is one more example using the tabulate procedure. Back in program 4 we used PROC FREQ to display a crosstabulation of clinical center and gender. All the information was there in the output, but it was hard to find because there were so many numbers. Here is the crosstabulation using PROC TABULATE, where the table is formatted to display selected information more clearly. The row information is the clinical center (plus the total); the column information is gender. The numbers in the cells are the counts and the row percentages, that is, the percent of men and women in each clinic. Note that the percentages in each row add to 100%. Let’s look at the TABULATE code that generated the table.

PROC TABULATE DATA=class.tomhsp FORMAT=8.;
   CLASS clinic sex;
   TABLE (clinic ALL='Total'), (sex=' ')*(N ROWPCTN*f=10.1) / RTS=15;
   FORMAT sex sex. clinic $clinic.;
   LABEL clinic = 'Clinical Center';
   KEYLABEL ROWPCTN = 'Percent';
   TITLE 'N and Percent Men and Women Enrolled by Center';
RUN;

ODS HTML FILE = 'mytable.html';

Here there is no VAR statement, only a CLASS statement with the two variables clinic and sex. The row portion of the TABLE statement is just the variable clinic with the keyword ALL; the column portion crosses the variable sex with two statistics, N and the row percent (ROWPCTN). The blank text in quotes after the variable sex tells SAS not to include a label for sex. Note also the KEYLABEL statement, which sets the heading text for the row percent. To see what this statement does, simply remove it, run the program, and see how the output differs. If you want the output to be in HTML format, include the ODS HTML statement before the procedure, and an ODS HTML CLOSE statement after it so the file is completely written.
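Putting the pieces together, a sketch of the HTML version follows. It uses SASHELP.CLASS, with age standing in for clinic, since the TOMHS data and the $clinic. and sex. formats are not available here (an assumption made only so the code runs). The ODS HTML CLOSE statement at the end closes the HTML destination so that mytable.html is completely written.

```sas
/* Sketch: counts and row percents by age and sex, written to HTML. */
/* SASHELP.CLASS stands in for TOMHS; age plays the role of clinic. */
ods html file='mytable.html';   /* open the HTML destination */

proc tabulate data=sashelp.class format=8.;
   class age sex;
   table (age all='Total'), (sex=' ')*(n rowpctn*f=10.1) / rts=15;
   keylabel rowpctn = 'Percent';
   title 'N and Percent by Age and Sex';
run;

ods html close;                 /* finish writing mytable.html */
```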