Getting Started with the SGPLOT Procedure: A Hands-On Workshop
1/14/2019
About the Presenter
Josh Horstman is an independent statistical programming consultant and trainer based in Indianapolis with 20 years’ experience using SAS in the life sciences industry. He specializes in analyzing clinical trial data, and his clients have included major pharmaceutical corporations, biotech companies, and research organizations. Josh is a SAS Certified Advanced Programmer who loves coding as well as talking about coding at SAS Global Forum and other SAS User Group meetings.
Getting Started with the SGPLOT Procedure
WUSS 2018 - Hands-On Workshop
Josh Horstman
INTRODUCTION TO SGPLOT
Overview: The Output Delivery System (ODS)
Prior to ODS, SAS was limited to text-based “SAS listing” output.
ODS output makes use of colors, fonts, graphics, and more!
ODS can produce output in various formats: … and more!
ODS has been part of the Base SAS product since version 7 (no separate license required).
Overview: ODS Statistical Graphics
An extension to ODS used to create analytical graphs
Introduced in SAS 9.2 as part of SAS/GRAPH (experimental in v9.1)
Moved into the Base SAS product in version 9.3
Based on the Graph Template Language (GTL)
ODS Statistical Graphics – Components
Graph Template Language (GTL) – comprehensive language for creating statistical graphics
ODS Graphics procedures – provide a procedural interface to the most common features of GTL
ODS GRAPHICS statement – controls various graphic-related settings and options
ODS Graphics Editor – interactive tool for modifying graphs
ODS Graphics Designer – graphical interface for designing graphs
ODS Statistical Graphics – Procedures
SGPLOT – single-cell plots
SGPANEL – multiple-panel plots
SGSCATTER – advanced scatter plots
SGRENDER – render graphs written in GTL
SGDESIGN – used with ODS Graphics Designer
Statistical Graphics vs. Legacy SAS/GRAPH
SG Procedures:
    SGPLOT, SGPANEL, SGSCATTER, etc.
    Based on templates
    Creates image files
    Use ODS GRAPHICS statement to control environment
    Visual properties are set within the procedure
SAS/GRAPH:
    GPLOT, GCHART, GSLIDE, GBARLINE, GCONTOUR, etc.
    Based on device drivers
    Creates catalog entries
    Use GOPTIONS statement to control environment
    Many properties set with global statements such as AXIS, LEGEND, SYMBOL, etc.
About ODS Destinations
To create ODS graphs, a valid ODS destination must be open.
Build an “ODS sandwich” around your graph code.
For example, to output a graph to the PDF destination:
    ods pdf file="c:\example.pdf";
    <SG procedure code goes here...>;
    ods pdf close;
The syntax is similar for ODS HTML, ODS RTF, etc.
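The sandwich can be sketched end to end as follows. This is a minimal illustration, not the workshop's own code: the file name example.pdf is a placeholder written to the current working directory, and SASHELP.CLASS is one of the sample tables shipped with SAS.

```sas
/* Open the PDF destination (file name is a placeholder) */
ods pdf file="example.pdf";

/* Any SG procedure code goes between the open and the close */
proc sgplot data=sashelp.class;
   scatter x=height y=weight;
run;

/* Close the destination so the PDF file is written */
ods pdf close;
```

Swapping both ODS PDF statements for ODS HTML or ODS RTF (with a matching file extension) should route the same graph to those destinations.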
Example Datasets
Datasets in the SASHELP library are included with SAS – you already have them!
SASHELP.CLASS (Demographics on 19 students)
SASHELP.CARS (Data about 428 car models)
More Example Datasets
SASHELP.HEART (5,209 patients from a heart study)
SASHELP.STOCKS (Stock prices of IBM, Intel, & Microsoft)
Basic SGPLOT Syntax
    proc sgplot data=<input-data-set> <options>;
       <one or more plot requests>
       <other optional statements>
    run;
There are dozens of plot request statements available – SCATTER, SERIES, VBOX, VBAR, HIGHLOW, BUBBLE, etc.
Other optional statements control specific graph features – XAXIS, YAXIS, REFLINE, INSET, KEYLEGEND, etc.
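To make the skeleton above concrete, here is a small sketch that pairs one plot request with a few of the optional statements. The axis labels and the reference value of 100 are arbitrary choices made for illustration, not part of the workshop material.

```sas
proc sgplot data=sashelp.class;
   /* Plot request */
   scatter x=height y=weight;

   /* Optional statements: a horizontal reference line and axis labels */
   refline 100 / axis=y;
   xaxis label="Height (inches)";
   yaxis label="Weight (pounds)";
run;
```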
SIMPLE PLOTS: EXERCISES 1-11
The SCATTER Statement
Creates a scatter plot.
    proc sgplot data=<input-data-set> <options>;
       scatter x=variable y=variable < / options>;
    run;
X and Y are required arguments that specify the variables to plot.
Include a slash before specifying one or more options.
Exercise #1: Basic Scatter Plot
Goal: Create a scatter plot of WEIGHT vs. HEIGHT
Input: SASHELP.CLASS
Syntax: SCATTER statement, X= argument, Y= argument
Exercise #1: Basic Scatter Plot (Solution)
    proc sgplot data=sashelp.class;
       scatter x=height y=weight;
    run;
The GROUP= Option
Specifies a variable used to group the data.
    proc sgplot data=<input-data-set> <options>;
       scatter x=variable y=variable / group=variable <more options>;
    run;
Plot elements for each group value are automatically distinguished by different visual attributes.
The GROUP= option is available on almost every plot type.
Exercise #2: Grouped Scatter Plot
Goal: Create a scatter plot of WEIGHT vs. HEIGHT, grouped by SEX
Input: SASHELP.CLASS
Syntax: SCATTER statement, X= argument, Y= argument, GROUP= option
Exercise #2: Grouped Scatter Plot (Solution)
    proc sgplot data=sashelp.class;
       scatter x=height y=weight / group=sex;
    run;
GROUP= specifies a grouping variable.
The legend is generated automatically.
Alternative: Use a BY statement to get separate graphs for each value.
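The BY-statement alternative mentioned above might look like the sketch below. BY-group processing expects the input to be sorted by the BY variable, so the data are sorted first; the output data set name class_sorted is a hypothetical name chosen for this example.

```sas
/* Sort a copy of the data by the BY variable (class_sorted is a made-up name) */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* One separate graph is produced for each value of SEX */
proc sgplot data=class_sorted;
   by sex;
   scatter x=height y=weight;
run;
```

Unlike GROUP=, which overlays all groups in one graph with a legend, this produces one graph per BY value.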
Exercise #2: Grouped Scatter Plot - BONUS
    proc sgplot data=sashelp.class;
       scatter x=height y=weight / group=sex datalabel=name;
    run;
DATALABEL= specifies a variable used to label each data point.
The BUBBLE Statement
Creates a bubble plot.
    proc sgplot data=<input-data-set> <options>;
       bubble x=variable y=variable size=variable < / options>;
    run;
X and Y are required arguments that specify the variables to plot.
SIZE is a required argument that specifies a variable that controls the size of the bubbles.
Exercise #3: Grouped Bubble Plot
Goal: Create a bubble plot of WEIGHT vs. HEIGHT, grouped by SEX with bubbles sized by AGE.
Input: SASHELP.CLASS
Syntax: BUBBLE statement, X= argument, Y= argument, SIZE= argument, GROUP= option
Exercise #3: Grouped Bubble Plot (Solution)
    proc sgplot data=sashelp.class;
       bubble x=height y=weight size=age / group=sex;
    run;
The SERIES Statement
Creates a line plot.
    proc sgplot data=<input-data-set> <options>;
       series x=variable y=variable < / options>;
    run;
X and Y are required arguments that specify the variables to plot.
By default, only lines are shown, not the points themselves.
To add markers to points, use the MARKERS option.
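As a quick illustration of the MARKERS option described above, the sketch below plots one stock for one year so the individual points are easy to see. The WHERE subset is an arbitrary choice for this example.

```sas
proc sgplot data=sashelp.stocks;
   /* Restrict to a single company and year to keep the plot readable */
   where stock='IBM' and year(date)=2005;

   /* MARKERS adds a symbol at each data point along the line */
   series x=date y=close / markers;
run;
```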
Exercise #4: Grouped Series Plot
Goal: Create a series plot of closing price (CLOSE) by date (DATE) grouped by company (STOCK). Add a title to your plot.
Input: SASHELP.STOCKS
Syntax: SERIES statement, X= argument, Y= argument, GROUP= option, TITLE statement
Exercise #4: Grouped Series Plot (Solution)
    proc sgplot data=sashelp.stocks;
       title "Stock Prices 1986-2005";
       series x=date y=close / group=stock;
    run;
The HIGHLOW Statement
Creates floating vertical or horizontal lines representing high and low values.
    proc sgplot data=<input-data-set> <options>;
       highlow x=variable | y=variable high=variable low=variable < / options>;
    run;
Use either X OR Y to specify values to plot along the X or Y axis.
Use both HIGH AND LOW to specify upper and lower values for the floating lines.
Add the CLOSE= option to specify a variable for a closing tick mark.
Exercise #5: High-Low Plot
Goal: Create a high-low plot of monthly stock prices with closing ticks for the stock IBM during the year 2005.
Input: SASHELP.STOCKS
Syntax: HIGHLOW statement, X= argument, HIGH= argument, LOW= argument, CLOSE= option, WHERE statement
Exercise #5: High-Low Plot (Solution)
    proc sgplot data=sashelp.stocks;
       where stock='IBM' and date >= "01jan2005"d;
       highlow x=date high=high low=low / close=close;
    run;
HIGH and LOW specify the endpoints of each bar.
Use X= for vertical bars or Y= for horizontal, but not both!
The CLOSE variable determines the locations of the closing ticks.
The HBOX Statement
Creates a horizontal box plot.
    proc sgplot data=<input-data-set> <options>;
       hbox variable < / options>;
    run;
The analysis variable must be numeric!
Use the CATEGORY= option to create a box for each distinct value of a category variable. (Can be combined with GROUPing.)
The VBOX statement is analogous for vertical box plots.
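A possible sketch of the vertical analogue, combining CATEGORY= with GROUP= as the slide suggests is allowed. The choice of MPG_CITY, ORIGIN, and DRIVETRAIN from SASHELP.CARS is only for illustration.

```sas
proc sgplot data=sashelp.cars;
   /* One cluster of boxes per ORIGIN, with one box per DRIVETRAIN value inside it */
   vbox mpg_city / category=origin group=drivetrain;
run;
```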
Anatomy of a Box Plot
[Figure: annotated box plot labeling Q1, the median, the mean, Q3, the minimum value above the lower fence, the maximum value beneath the upper fence, and an outlier.]
The distance between Q1 and Q3 is the Inter-Quartile Range (IQR).
Lower Fence = Q1 – 1.5*IQR
Upper Fence = Q3 + 1.5*IQR
Values outside the fences are considered outliers.
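The fence formulas can be worked through numerically. The sketch below computes the quartiles of MSRP in SASHELP.CARS with PROC MEANS and then applies the two formulas in a DATA step. The data set names quartiles and fences are hypothetical, and the quartiles may differ slightly from those SGPLOT draws if the two use different percentile definitions.

```sas
/* Compute Q1 and Q3 for the analysis variable */
proc means data=sashelp.cars noprint;
   var msrp;
   output out=quartiles q1=q1 q3=q3;
run;

/* Apply the fence formulas from the slide */
data fences;
   set quartiles;
   iqr         = q3 - q1;
   lower_fence = q1 - 1.5*iqr;
   upper_fence = q3 + 1.5*iqr;
run;

proc print data=fences noobs;
   var q1 q3 iqr lower_fence upper_fence;
run;
```

Any MSRP value below lower_fence or above upper_fence would be drawn as an outlier marker.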
Exercise #6: Horizontal Box Plot
Goal: Create a horizontal box plot of vehicle price (MSRP) by vehicle type (TYPE).
Input: SASHELP.CARS
Syntax: HBOX statement, numeric analysis variable, CATEGORY= option
Exercise #6: Horizontal Box Plot (Solution)
    proc sgplot data=sashelp.cars;
       title "Price by Car Type";
       hbox msrp / category=type;
    run;
The VBAR Statement
Creates a vertical bar chart.
    proc sgplot data=<input-data-set> <options>;
       vbar categorical-variable < / options>;
    run;
The RESPONSE= option specifies a response variable to control the length of the bars. (Otherwise, bars represent frequency counts.)
The STAT= option specifies the statistic for the length of the bars. (Default is SUM when a RESPONSE variable is included, FREQ otherwise.)
The HBAR statement is analogous for horizontal bar charts.
Exercise #7: Vertical Bar Chart
Goal: Create a vertical bar chart of mean engine size (ENGINESIZE) by vehicle origin (ORIGIN).
Input: SASHELP.CARS
Syntax: VBAR statement, categorical variable, RESPONSE= option, STAT= option
Exercise #7: Vertical Bar Chart (Solution)
    proc sgplot data=sashelp.cars;
       title "Mean Engine Size by Origin";
       vbar origin / response=enginesize stat=mean;
    run;
Exercise #7: Vertical Bar Chart - BONUS
    proc sgplot data=sashelp.cars;
       title "Mean Engine Size by Origin";
       vbar origin / response=enginesize stat=mean limits=both;
    run;
The LIMITS= option adds upper limits, lower limits, or both.
The LIMITSTAT= option specifies the statistic used for the limits (default is confidence limits).
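As a sketch of the LIMITSTAT= option mentioned above, the variation below draws standard-error bars instead of the default confidence limits. STDERR is one of the documented LIMITSTAT= values (alongside CLM and STDDEV); choosing it here is purely illustrative.

```sas
proc sgplot data=sashelp.cars;
   title "Mean Engine Size by Origin";
   /* Limit bars show the standard error of the mean rather than confidence limits */
   vbar origin / response=enginesize stat=mean
                 limits=both limitstat=stderr;
run;
```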
The GROUP= Option (Bar Charts)
    proc sgplot data=<input-data-set> <options>;
       vbar categorical-variable / group=variable <more options>;
    run;
The GROUP= option creates a bar for each distinct value of a grouping variable, within each category.
Use GROUPDISPLAY= to specify how bars are grouped (CLUSTER or STACK).
Exercise #8: Grouped Vertical Bar Chart
Goal: Create a vertical bar chart of mean engine size (ENGINESIZE) by vehicle type (TYPE), grouped into clusters by vehicle origin (ORIGIN).
Input: SASHELP.CARS
Syntax: VBAR statement, categorical variable, RESPONSE= option, STAT= option, GROUP= option, GROUPDISPLAY= option
Exercise #8: Grouped Vertical Bar Chart (Solution)
    proc sgplot data=sashelp.cars;
       title "Mean Engine Size by Type and Origin";
       vbar type / response=enginesize stat=mean group=origin groupdisplay=cluster;
    run;
Exercise #8: Grouped Vertical Bar Chart - BONUS
    proc sgplot data=sashelp.cars;
       title "Mean Engine Size by Type and Origin";
       vbar type / response=enginesize stat=mean group=origin groupdisplay=stack;
    run;
To stack the bars, use the GROUPDISPLAY= option with a value of STACK instead of CLUSTER.
The HEATMAP Statement
Color-codes rectangles based on two-dimensional binning of data.
    proc sgplot data=<input-data-set> <options>;
       heatmap x=variable y=variable < / options>;
    run;
X and Y are required arguments that specify the variables to plot.
Options are available to control the size and/or number of bins in each dimension as well as the colors used.
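A hedged sketch of the bin-size and color options mentioned above, using XBINSIZE=, YBINSIZE=, and COLORMODEL=. The bin widths (10 and 25) and the white-to-blue color ramp are arbitrary values picked for this example, and option availability may depend on the SAS 9.4 maintenance release in use.

```sas
proc sgplot data=sashelp.heart;
   /* Bins are 10 units wide on X and 25 units wide on Y;       */
   /* counts are shaded from white (low) to blue (high).        */
   heatmap x=weight y=cholesterol / xbinsize=10 ybinsize=25
                                    colormodel=(white blue);
run;
```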
Exercise #9: Heat Map
Goal: Create a heat map of CHOLESTEROL vs. WEIGHT.
Input: SASHELP.HEART
Syntax: HEATMAP statement, X= argument, Y= argument
Exercise #9: Heat Map (Solution)
    proc sgplot data=sashelp.heart;
       heatmap x=weight y=cholesterol;
    run;
Data are grouped into bins in two dimensions.
Bins are colored according to frequency count.
Exercise #9: Heat Map - BONUS
    proc sgplot data=sashelp.heart;
       heatmap x=weight y=cholesterol / nxbins=50 nybins=50;
    run;
NXBINS=50 and NYBINS=50 specify 50 bins in the X dimension and 50 bins in the Y dimension (2,500 total rectangles).
The VLINE Statement
Creates a vertical line chart (line is horizontal).
    proc sgplot data=<input-data-set> <options>;
       vline categorical-variable < / options>;
    run;
VLINE plots statistics; SERIES plots raw data points.
The RESPONSE= and STAT= options are similar to VBAR.
The HLINE statement is analogous for horizontal line charts.
Exercise #10: Grouped Vertical Line Chart
Goal: Create a vertical line chart of mean HEIGHT by AGE, grouped by SEX, and include plot markers.
Input: SASHELP.CLASS
Syntax: VLINE statement, categorical variable, RESPONSE= option, STAT= option, GROUP= option, MARKERS option
Exercise #10: Grouped Vertical Line Chart (Solution)
    proc sgplot data=sashelp.class;
       title "Height by Age and Sex";
       vline age / response=height stat=mean markers group=sex;
    run;
Exercise #10: Grouped Vertical Line Chart - BONUS
    proc sgplot data=sashelp.class;
       title "Height by Age and Sex";
       vline age / response=height stat=mean markers group=sex limits=both;
    run;
LIMITS=BOTH adds confidence limits.
The REG Statement
Fits a regression line or curve.
    proc sgplot data=<input-data-set> <options>;
       reg x=variable y=variable < / options>;
    run;
X and Y are required arguments that specify the variables to plot.
Includes both plot markers and line by default. Remove markers with the NOMARKERS option.
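To illustrate the "line or curve" wording and the NOMARKERS option, the sketch below requests a quadratic fit through the REG statement's DEGREE= option and suppresses the data markers. The choice of degree 2 is arbitrary and only meant to show a curved fit.

```sas
proc sgplot data=sashelp.class;
   /* DEGREE=2 fits a quadratic curve; NOMARKERS hides the data points */
   reg x=height y=weight / degree=2 nomarkers;
run;
```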
Exercise #11: Regression Plot
Goal: Create a regression plot of WEIGHT vs. HEIGHT.
Input: SASHELP.CLASS
Syntax: REG statement, X= argument, Y= argument
Exercise #11: Regression Plot (Solution)
    proc sgplot data=sashelp.class;
       title "Height vs. Weight";
       reg x=height y=weight;
    run;
Exercise #11: Regression Plot - BONUS
    proc sgplot data=sashelp.class;
       title "Height vs. Weight";
       reg x=height y=weight / clm cli;
    run;
CLM adds confidence limits. CLI adds prediction limits.
COMBINATION PLOTS: EXERCISES 12-14
About Combination Plots
Combination plots are created when more than one plot statement is used.
    proc sgplot data=<input-data-set> <options>;
       <plot-request-statement-1>
       <plot-request-statement-2>
       <plot-request-statement-3>
       ...
       <other optional statements>
    run;
All plots are overlaid atop one another in the same graph space, using the same axis system.
The HISTOGRAM and DENSITY Statements
HISTOGRAM creates a histogram.
DENSITY creates a density curve.
    proc sgplot data=<input-data-set> <options>;
       histogram response-variable < / options>;
       density response-variable < / options>;
    run;
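One way these statements might be combined is sketched below, overlaying both a normal and a kernel density estimate on a histogram. The TYPE= and LEGENDLABEL= options on DENSITY are used here for illustration; the label text is made up for this example.

```sas
proc sgplot data=sashelp.heart;
   histogram weight;
   /* Two density curves over the same histogram, labeled in the legend */
   density weight / type=normal legendlabel="Normal";
   density weight / type=kernel legendlabel="Kernel";
run;
```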
Exercise #12: Histogram and Density Plot
Goal: Create a combination plot including both a histogram and a density plot of WEIGHT. Suppress the automatic legend.
Input: SASHELP.HEART
Syntax: HISTOGRAM statement, response variable, DENSITY statement, NOAUTOLEGEND option (on PROC SGPLOT)
Exercise #12: Histogram and Density Plot (Solution)
    proc sgplot data=sashelp.heart noautolegend;
       title "Weight Distribution of Patients";
       histogram weight;
       density weight;
    run;
NOAUTOLEGEND suppresses the automatic legend.
Two plot statements are used. Either could have been used separately.
The density curve is drawn on top of the histogram because it appears later in the SGPLOT step.
Exercise #13: Layered Vertical Bar Chart
Goal: Create a layered vertical bar chart showing both city (MPG_CITY) and highway mileage (MPG_HIGHWAY) by type.
Input: SASHELP.CARS
Syntax: VBAR statement (twice!), categorical variable, RESPONSE= option, STAT= option, BARWIDTH=0.5 option (only on second VBAR)
Exercise #13: Layered Vertical Bar Chart (Solution)
    proc sgplot data=sashelp.cars;
       title 'Mileage by Type';
       vbar type / response=mpg_city stat=mean;
       vbar type / response=mpg_highway stat=mean barwidth=0.5;
    run;
BARWIDTH=0.5 decreases the bar width to 50%.
MPG_HIGHWAY is layered over MPG_CITY because we plotted MPG_HIGHWAY second.
The VBARBASIC and HBARBASIC Statements
Some combinations of plot statements are not allowed. For example:
    proc sgplot data=sashelp.class;
       vbar age / response=height stat=mean;
       scatter x=height y=age;
    run;
ERROR: Attempting to overlay incompatible plot or chart types.
To combine VBAR or HBAR with basic plot types (such as SCATTER, SERIES, etc.), use VBARBASIC or HBARBASIC.
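A possible way to rework the failing example with VBARBASIC is sketched below. The scatter plot's X and Y variables are swapped relative to the original so that both plots share AGE on the X axis and HEIGHT on the Y axis; that change is an assumption about the intended graph, not something stated on the slide. VBARBASIC is assumed to be available in the SAS release in use (it was added in a SAS 9.4 maintenance release).

```sas
proc sgplot data=sashelp.class;
   /* Mean height per age, drawn as bars that can be overlaid with basic plots */
   vbarbasic age / response=height stat=mean;

   /* Individual students overlaid on the same axes */
   scatter x=age y=height;
run;
```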
Multiple Axis Systems in Combination Plots
By default, all plot requests use a common set of axes.
Use the X2AXIS and/or Y2AXIS options on a plot statement to use the secondary axes.
[Figure: diagram labeling the primary X axis, secondary X axis, primary Y axis, and secondary Y axis.]
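A minimal sketch of the Y2AXIS option, plotting two series whose scales differ by orders of magnitude. Restricting the data to IBM is an arbitrary choice made to keep the example small.

```sas
proc sgplot data=sashelp.stocks;
   where stock='IBM';
   /* Closing price uses the primary (left) Y axis */
   series x=date y=close;
   /* Trading volume uses the secondary (right) Y axis */
   series x=date y=volume / y2axis;
run;
```

Without Y2AXIS, both series would share one Y axis and the closing price would be flattened against the bottom by the much larger volume values.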
Exercise #14: Bar and Line Chart
Goal: Combine a series plot of closing price and a vertical bar chart of trading volume for monthly IBM stock data for 2005.
Input: SASHELP.STOCKS
Syntax: VBARBASIC statement (categorical variable, RESPONSE= option, Y2AXIS option), SERIES statement (X= and Y= arguments, MARKERS option), WHERE statement
Exercise #14: Bar and Line Chart (Solution)
    proc sgplot data=sashelp.stocks;
       title 'IBM Stock Price and Volume for 2005';
       where stock='IBM' and year(date)=2005;
       vbarbasic date / response=volume y2axis;
       series x=date y=close / markers;
    run;
WRAP-UP
Conclusion
The SGPLOT procedure is extremely versatile.
Presentation-ready graphics with minimal coding.
We've only scratched the surface! Dozens more statements and hundreds more options provide:
    Complete customization of plot symbols, lines, labels, titles, axes, legends, and much more
    Advanced features like axis tables and attribute maps
    Custom annotation facility
Recommended Resources
SAS 9.4 ODS Graphics: Procedures Guide, Sixth Edition (SAS Press, 2016)
View in browser at http://support.sas.com/documentation/cdl/en/grstatproc/69716/HTML/default/viewer.htm#titlepage.htm
Download PDF free at http://support.sas.com/documentation/cdl/en/grstatproc/69716/PDF/default/grstatproc.pdf
Recommended Resources (continued)
Over 30,000 SAS User Group papers are available free online at http://www.lexjansen.com.
40+ years of papers from SAS Global Forum (formerly SUGI), PharmaSUG, and regional conferences (WUSS, MWSUG, etc.)
Fully searchable database
Includes many papers authored by SAS staff, including ODS Statistical Graphics experts such as Dan Heath, Warren Kuhfeld, Sanjay Matange, Cynthia Zender, and others.
Contact Information
Thank you for attending! Feel free to contact me with questions or comments:
Josh Horstman
Nested Loop Consulting
josh@nestedloopconsulting.com
317-721-1009