Published by Sheena McDonald. Modified over 9 years ago.
1
1 Graphics HRP223 – 2013 November 18, 2013 Copyright © 1999-2013 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.
2
2 Robbins Creating More Effective Graphs by Naomi Robbins is a wonderful book showing the right and wrong ways to visualize scientific data. Read it when you have an afternoon off. It is an ideal read on a transcontinental flight.
3
3 How I do graphics Exploratory stuff – Use the quick and dirty graphics built into EG Production quality graphics – Write SAS or R code to make better looking graphics – Edit in Adobe Illustrator
4
4 Visualization Tools This is an excellent book that covers how to visualize stuff using many tools (including R). It has a great introduction to Adobe Illustrator.
5
5 Why Do Data Visualization? Well designed pictures will show you the details and the whole pattern in your data. Numeric descriptions can easily hide important information. Some patterns are hard to detect in tables. – Whenever data is reported over time or locations, you need art. YOU CAN LEARN A LOT BY JUST LOOKING. -Yogi Berra
6
6 Fisher’s Plot Data Reported in Cleveland Based on code written by Robert Allison at SAS Institute (Panels: Year 1, Year 2)
7
7 Scatter Plot for Correlations All have r² = .67. Anscombe 1973, Graphs in Statistical Analysis
8
8 Good Statistical Graphics Show central tendency and variability Do not require the reader to think… – Include no extra information – Avoid unnecessary ink – Highlight the key points – Directly plot the conclusion/inference – Use mnemonic colors – Label categories Include the sample size
9
9 Bad Things First, I want to talk about bad graphics that I frequently see. – 3d – Extra Ink – Pie – Donuts – Stacked graphics
10
10 General 3D graphics – Don’t, Don’t, Don’t While the SAS implementation of 3D graphics is relatively good, don’t use 3D effects, unless you are measuring something in 3D. Even then, don’t.
11
11 Tufte is a God to many. The empiricist in me is very nervous about the amount of pontificating in his books… – I want to have evidence-based advice. His best advice is to put no extra ink on the page. – Think about the ink-to-information ratio. – Remove all chart junk. Note: the irony of the chart junk on this slide….
12
12 Ink-to-Information Ratio How much ink for seven numbers? Based on Soukup & Davidson, 2002 Visual Data Mining
13
13 Pie is bad. Work by Cleveland (and experimental psychologists) suggests that: – people are bad at judging the relative magnitude of angles – if you twist the rotation of the pie you can cause people to systematically misjudge the size of the angles – a 3rd dimension makes judgment worse If you get a glossy handout with a 3D pie, assume someone is lying to you. Don’t use them.
14
14 Don’t Explode! This exploded 3D pie (brought to you by Excel) is nearly useless for judging amounts.
15
15 Forbidden Donut…. Donut plots have the same problems as pies (if not worse) ….
16
16 Cleveland If you want to know how to do scientific visualization, you must read William Cleveland’s work. – He attempted to quantify what makes a good graphic good. His early work on graphics is one of the reasons why R/S-plus is taking over the statistical world.
17
17 Stacking is Bad Cleveland also quantified the fact that people are bad at judging the relative height of stacked data.
18
18 Wow, a cinnamon roll plot! Good luck making rapid judgments using this stacked 3D pie.
19
19 Bar Chart ~ Default
20
20 How did I know that? Start using the sgplot documentation: http://support.sas.com/documentation/cdl/en/grstatproc/64978/HTML/default/viewer.htm#n0yjdd910dh59zn1toodgupaj4v9.htm … and/or get this: http://library.stanford.edu/bks24_id=45422
21
21 sgplot documentation: http://support.sas.com/documentation/cdl/en/grstatproc/64978/HTML/default/viewer.htm#n0yjdd910dh59zn1toodgupaj4v9.htm
22
22 Custom Axis I prefer to have ticks beyond the largest value
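A minimal sketch of a custom axis, assuming sashelp.heart and its smoking_status variable stand in for the data on the slide (the original bar chart data is not shown). The tick range is an assumption; pick values that run past your tallest bar.

```sas
/* Assumption: sashelp.heart is only a stand-in for the slide's data. */
proc sgplot data = sashelp.heart;
  vbar smoking_status;
  /* Request ticks that extend beyond the largest bar. */
  yaxis values = (0 to 3000 by 500);
run;
```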
23
23 yaxis stuff ; The popular options are not listed first…
24
24 Ticks on Both Y Axes Ticks on both axes may help guesstimate the counts
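One way this could be done is with the refticks option on the yaxis statement, which mirrors the tick marks on the opposite side of the plot. This is a sketch that assumes sashelp.heart as stand-in data; it may not be the exact code behind the slide.

```sas
/* Assumption: sashelp.heart is only a stand-in for the slide's data. */
proc sgplot data = sashelp.heart;
  vbar smoking_status;
  /* refticks repeats the y-axis tick marks on the right-hand side. */
  yaxis values = (0 to 3000 by 500) refticks;
run;
```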
25
25 Things in <> are optional arguments
26
26 Data in Jail The lines are on top so they look prominent. Can I lighten them?
27
27 Things in <> are optional arguments. Give a variable name OR a number. Optionally add other numbers. Optionally add a / and options for the reference line(s).
28
28
29
29 Line Attributes I wanted to adjust the line attributes so I will use / lineattrs=. Give a set of line options in () OR a style element with options in (). I wish I knew how that works…
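A sketch of the first form, a set of line options in parentheses, assuming sashelp.heart as stand-in data and reference values chosen only for illustration.

```sas
/* Assumption: sashelp.heart and the reference values are illustrative only. */
proc sgplot data = sashelp.heart;
  vbar smoking_status;
  /* Line options go in parentheses after lineattrs=. */
  refline 500 1000 1500 2000 / axis = y
          lineattrs = (color = gray pattern = dash thickness = 1);
run;
```

The second form names a style element and then overrides part of it, as in lineattrs = graphdata1 (color = lime).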
30
30 Details in parentheses:
31
31 Gray is here…
32
32 Reference Lines First draw reference lines then draw bars on top.
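Because sgplot draws statements in the order they are listed, putting the refline statement first should leave the bars on top. A sketch, again assuming sashelp.heart as stand-in data:

```sas
/* Assumption: sashelp.heart and the reference values are illustrative only. */
proc sgplot data = sashelp.heart;
  /* Listed first, so the lines are drawn first ... */
  refline 500 1000 1500 2000 / axis = y lineattrs = (color = gray);
  /* ... and the bars are drawn over them. */
  vbar smoking_status;
run;
```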
33
33 Reducing Ink In Tufte’s world less ink is good. I could remove the bar color but that is removing necessary ink for this style of plot. I can remove the black edges to the bars.
34
34 Second axis labels are gone
35
35
36
36 Put White Lines Over the Bars
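A sketch of the white-line trick, assuming sashelp.heart as stand-in data: drop the bar outlines, then list the refline statement after vbar so the white lines cut across the bars instead of sitting behind them.

```sas
/* Assumption: sashelp.heart and the reference values are illustrative only. */
proc sgplot data = sashelp.heart;
  vbar smoking_status / nooutline;   /* remove the black bar edges */
  /* Listed last, so the white lines are drawn on top of the bars. */
  refline 500 1000 1500 2000 / axis = y lineattrs = (color = white);
run;
```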
37
37 Uh oh…
38
38
39
39 Awful Design…. I specify a title and it does not show up in the plot.
40
40
41
41 I want to draw attention to this.
42
42 Specify you want a different color group for each bar. Specify the bar and outline colors.
43
43 Notice the blue.
44
44
45
45 Cleveland made Dot Plots Replace Bar Charts
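A minimal dot plot sketch using the sgplot dot statement, with sashelp.heart assumed as stand-in data. By default it plots the frequency of each category.

```sas
/* Assumption: sashelp.heart is only a stand-in for the slide's data. */
proc sgplot data = sashelp.heart;
  dot smoking_status;
run;
```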
46
46 Adding an Offset to the y-axis
47
47 Adding Value Labels
48
48 Show Frequency Counts
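A sketch that combines an axis offset with value labels, assuming sashelp.heart as stand-in data. The offset values are assumptions; they are fractions of the axis length reserved as padding at each end.

```sas
/* Assumption: sashelp.heart and the offset values are illustrative only. */
proc sgplot data = sashelp.heart;
  /* datalabel prints the plotted statistic (here the frequency) next to each dot. */
  dot smoking_status / datalabel;
  /* Pad both ends of the category axis so the dots do not sit on the frame. */
  yaxis offsetmin = 0.1 offsetmax = 0.1;
run;
```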
49
49 Label Attributes
50
50
51
51 You can easily specify that each value is its own color group.
52
52
53
53 What is a good graphic? Don’t make your audience think unnecessarily! – The point of the graphic should stand out instantly. – Plot the quantity (inference) that you want people to notice. Show the central tendency and the variability. Minimize the amount of ink on the page. Be sure colorblind people can understand it. – Use a black and white photocopier and make sure you can distinguish all groups.
54
54 What is wrong with this? What is the point that the reader should learn from this? How is the variability represented? What are the error bars? Can you interpret 1 SD error bars? How many people are included? Ink to information … How many numbers are depicted? Never contrast black on blue.
55
55 Another Way
56
56 How did I do that?
57
57
58
58 Don’t put a legend on the plot, add a pop-up description tooltip for pages on the web and save the graphics template.
59
59 The Graphic Template Premade complex plot designs are stored in a graphic template. You can also save the graphic template for any plot you make. Add tmplout= to the proc sgplot statement.
60
60 The Template I gave it a name Use the new template on the dataset.
61
61 With the title in the graphic
62
62 Another version of the same data
63
63 Moderately Awful Code Bubble sizes are set by providing a radius… I force the area by setting the smallest and largest bubble size in centimeters. I want to use this part of the plot for the legend so I name it.
64
64 9.3 How do I set the blue and pink? For grouped data you can specify the details for each plotted element in a style template:
65
65 What is wrong with this? What is the point of this graphic? How are the two sexes represented? What data is this?
66
66
67
67 Easy But Awful Boxplots
68
68 What is wrong with this? Lovely white space What is the point? How are the sexes represented? How many people? Where is the mean? What data is this?
69
69 What is the point of this graphic? How are the two sexes represented? A Good Boxplot
70
70 What is the point of this graphic? A Very Good Boxplot
71
71 Code for a Very Good Boxplot 9.4
72
72
73
73 Built in Graphics Many Enterprise Guide analyses have built in diagnostic graphics.
74
74 When you test for the difference in the mean SAS gives you a great plot.
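A sketch of a two-group t-test with ODS graphics turned on, assuming sashelp.class as example data (it is not the data used in the deck).

```sas
/* Assumption: sashelp.class is only example data. */
ods graphics on;
proc ttest data = sashelp.class;
  class sex;     /* the two groups being compared */
  var height;    /* the outcome whose means are tested */
run;
```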
75
75 Avoid Thinking Put labels on the graphic directly instead of using a key. If you want people to compare the difference between two lines, plot the difference, not the two lines.
76
76 Bivariate Comparisons with Lines People are extremely bad at judging the distance between two curves. Never ask people to judge up and down (vertical) distances between curves. Based on: Robbins Creating More Effective Graphs, 2005 The distance between the two curves is the same at all points.
77
77 Plot Types Categorical variables – Descriptive Bar charts Dot plots – Inferential Continuous variables – Histogram – Box plot – Violin plots – Quantile and QQ plots
78
78 Frequency Plots EG for frequency plots Custom code
79
79 I Typically Use HTML This says the images should show tooltips with extra statistical details when you hover the mouse over parts of the graphic. (I can’t image these.) This is the appearance template. For optimal results use: Analysis: color Default : overdistinguishes symbols for color or B&W Journal or journal2, etc: black and white Statistical or statistical2, etc: color Include image_dpi = 300 to set the resolution to be higher than the default 100 dots per inch. Try 300 for final images pasting into MS Office.
80
80 ods graphics on; This turns on the ODS statistical graphics. Behind the scenes this combines your data with a pre-specified description of what to plot and the aesthetics of the appearance. Your data Graph template Style template What Where? Colors Fonts
81
81 Useful ods graphics options
ods graphics on / width = 8in height = 11in imagefmt = jpg imagename = thingy;
– width = / height =: if you set only width or height, it will use a 4:3 aspect ratio.
– imagename = thingy: make a series of graphics called thingy1, thingy2, etc.
– imagefmt = staticmap (instead of jpg): use pop-up tooltips with details.
ods graphics / reset; – Reset the graphic counter back to 1.
ods graphics off; – If you want to disable ods graphics for a procedure.
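Put together, the options might be used like this. It is a sketch that assumes sashelp.heart as stand-in data; the image name is quoted here, which is the form the SAS documentation shows.

```sas
/* Assumption: sashelp.heart is only a stand-in dataset. */
ods graphics on / reset width = 8in imagefmt = jpg imagename = "thingy";

proc sgplot data = sashelp.heart;
  histogram cholesterol;
run;

proc sgplot data = sashelp.heart;
  histogram weight;
run;

/* Put the counter and the other options back to their defaults. */
ods graphics / reset;
```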
82
82
83
83 ODS Graphics Editor with EG If you want to do extensive tweaking to a graphic, you can use the WYSIWYG ODS Graphics editor. Unfortunately it only works with ODS graphics procedures and you need to rerun the code in SAS to invoke it.
84
84 Move code from EG to SAS 1. Use the query builder to put your data in a permanent SAS library (not the work library). 2. Right click on the graphic node which is run on data in a permanent library and choose Open… Open Last Submitted Code. 3. Copy the code beginning with the SQL that makes the data. 4. Start SAS and paste the code into the program editor.
85
85 Move all your code to SAS Because the ODS graphics editor is not in EG (yet), you can export the entire set of code for the project and then rerun it in SAS.
86
86 ODS Graphics Editor with EG (2) After exporting all your EG project, open the code in SAS and add these lines at the top of the program: ods rtf file = "c:\blah\somefile.rtf"; ods listing sge = on; Then open the graphic of interest.
87
87
88
88 WYSIWYG Editing Right click and/or double click to set properties for objects in the plot. The tool is optimized for some of the ODS style templates but you can use custom colors.
89
89 Right click on things to set properties. – Colors, text details, fonts – Point and click annotation – Symbols, arrows, text, circles
90
90 WYSIWYG Editing While the Statistical graphics editor is a much needed improvement, it is incomplete. You can only use a few style templates (for setting default colors and such) and you cannot use custom style templates. This means that you cannot do critical tasks like manually set the color for different values in scatter plots.
91
91 Bar Charts The ink-to-information ratio is lousy. A one dimensional quantity is being “expanded” into two dimensions. – Doubling of the amount corresponds to how much of an increase in area?
92
92 SAS Bar Charts SAS makes the reader do extra work by rotating the axis labels in ActiveX images. They pointlessly include variable labels by default.
93
93 How to do it? Notice you can Edit the data and apply filters. You can right click on variables and apply user-defined formats off the Properties dialog.
94
94 First create the format. In the Data windowpane of the Bar Chart GUI, right click on the variable and change the format to the User Defined format you had created.
95
95 The GUI is Solid My only complaints are that the rotate grouping values text does not work (position in this example) and the summary statistics do not show up when you request ActiveX images.
96
96 .PNG format, ActiveX image format
97
97 Saving the Graphic for Publication The easiest way to get publication quality graphics is to set the output type to be RTF.
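In code, sending a plot to RTF could look like the sketch below. The file path reuses the placeholder path from elsewhere in the deck, and sashelp.heart is assumed as stand-in data.

```sas
/* Assumption: the path and dataset are placeholders. */
ods rtf file = "c:\blah\somefile.rtf";

proc sgplot data = sashelp.heart;
  histogram cholesterol;
run;

ods rtf close;   /* closing the destination writes the file */
```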
98
98 Default Output and Graphics The default graphic format in EG is ActiveX. These images can be edited (even on the web) but they only display with Internet Explorer. I have set my graphics to display as ActiveX images. Tweak this with Tools> Options… > Graph.
99
99
100
100 Types of Images The default formats of the images are determined by the ODS destinations you are using: – LISTING: png visible in the Windows Image Fax Viewer – HTML: png, gif, jpg contained in web pages and visible in Internet Explorer, Firefox or Opera – LATEX: PostScript, epsi, gif, jpeg, png are visible in GhostView – PCL or PS: contained in PostScript file are visible in GhostView – PDF: contained in pdf, which is visible with Adobe Reader – RTF: visible in MS Word. RTF graphics are done at 300 dpi by default.
101
101 You can browse the ODS appearance templates from the Style Manager on the Tools menu.
102
102 ODS SGraphics Compared to the competition, for the last 10 years SAS graphics have been between poor and pathetic. – Graphics procedures rendered with okay quality, at best. – No “what you see is what you get” editing. – Many plots were nearly impossible to render. – Custom graphics required extensive programming. SAS 9.x has attempted to solve this problem.
103
103 Old vs. New Procedures The old (commonly used) graphics procedures were gchart, gplot. Now most analysis procedures have built in high quality graphics that can be invoked with an ODS graphics on statement. – Early on in the class I told you to tweak the EG options to include “ODS graphics on” with every run. There are also new “easy to use” statistical graphics (sg) procedures.
104
104 New Graphics Statistical Graphics Procs proc sgPlot – general plotting procedure that replaces gplot proc sgScatter – lots of tools for scatterplots and scatter matrices proc sgPanel – quick and easy trellis/lattice/matrix/panel of plots Proc sgRender – used with proc template to make totally custom plots – It replaces proc greplay
105
105 Plot Types Categorical variables – Bar charts – Dot plots Continuous variables – Histogram – Box plot – Violin plots – Quantile and QQ plots
106
106 Dot charts Categorical variables
107
107 Grouped Categorical Variables To graph categorical data in SAS you need to get Michael Friendly’s Visualizing Categorical Data. Unfortunately, his macros are copyrighted with the book… So I will show you the R versions. – Fourfold plots – Mosaic plots – Association plots Grouped categorical variables
108
108 If you want to use R Download R for Mac or PC cran.cnr.berkeley.edu/bin/macosx/ cran.cnr.berkeley.edu/bin/windows/base
109
109 How to learn R I usually teach R classes in the summer. – www.stanford.edu/~balise/ has links to my slide decks for R classes.
110
110 Plots for inference Categorical plots – Confidence limits on odds ratios – Four-fold plots – Expectancy plots – Mosaic plots
111
111 Fourfold Plots They draw 4 slices of pie with the area corresponding to the number of people in each cell of a 2x2 table and they have confidence bands such that if the confidence bounds overlap on adjacent pie pieces, they are not statistically significantly different. Grouped categorical variables 45% male vs. 30% female admission
112
112 More males were admitted than females. There is clear evidence of sexist policies in admissions! Grouped categorical variables
113
113 Department A admitted more females than males and every other department had no bias! The joy of Simpson’s paradox. Grouped categorical variables
114
114 Mosaic Plots So you have a contingency table and you want to know if there is an association. You do a chi-square test and it says there are associations between the rows and columns. What next? Grouped categorical variables
115
115 Some basic voodoo in R shows which combinations are over (in blue) or under represented (in red). Grouped categorical variables
116
116 I prefer the simpler association plots. Grouped categorical variables
117
117 Continuous Outcomes The Distribution Analysis menu option can do basic plots. Continuous variables
118
118 The resolution of the histogram is okay but the others are unacceptable.
119
119 Use sgplot for high resolution plots. Continuous variables
120
120 As you add more requests to the plot, it resizes and shifts things to make room. It draws them in the order you request them. It reads the requests from the first listed to the bottom. Change the order if you want to have an item appear layered on top of, or behind, another thing. Some colors are not set yet in the enhanced editor. Use the menu Tools>Options>Enhanced Editor… then click User Defined Keywords to add the coloring.
121
121 I want the title!
122
122 How is that made? proc format library = work; value $smoked "Non-smoker" = "None " missing = "Missing" other = "Not none" ; run; data fram; set sashelp.heart; smokin = put(smoking_Status, $smoked.); run;
123
123 How is that made? title "5209 Cholesterol Measures from Framingham Heart Study"; proc sgplot data = fram tmplout="c:\blah\plate.sas"; histogram cholesterol; density cholesterol / type = kernel; density cholesterol / type = normal; keylegend / location=inside position=topright across=1; run; Make a new graphics template
124
124 proc template;
  define statgraph sgplotFram;
    begingraph /;
      EntryTitle "5209 Cholesterol Measures from Framingham Heart Study" /;
      layout overlay;
        Histogram 'Cholesterol'n / primary=true binaxis=false LegendLabel="Cholesterol";
        DensityPlot 'Cholesterol'n / Lineattrs=GraphFit kernel() LegendLabel="Kernel" NAME="DENSITY";
        DensityPlot 'Cholesterol'n / Lineattrs=GraphFit2 normal() LegendLabel="Normal" NAME="DENSITY1";
        DiscreteLegend "DENSITY" "DENSITY1" / Location=Inside across=1 halign=right valign=top;
      endlayout;
    endgraph;
  end;
run;
proc sgrender data = work.fram template = sgplotFram;
run;
The template was saved in plate.sas. Note I changed the name of this template. The proc sgrender step renders a graphic with the template and dataset specified.
125
125 How to set the color for a histogram
126
126 proc sgplot data = fram; histogram weight / fillattrs = (color = coral); run;
127
127 You can also tweak the style template
128
128 Tweaking the Style Template proc template; define style myStyle; parent = styles.Statistical; style GraphDataDefault / color=coral; end; run; ods html style = myStyle; proc sgplot data = fram; histogram weight ; run; ods html close;
129
129 vbar Version proc sgplot data = fram; vbar weight / group = sex; run;
130
130 proc sgplot data = fram; vbar weight / group = sex; xaxis fitpolicy = thin ; run;
131
131 proc template; define style myStyle; parent = styles.Statistical; style graphdata1 from graphdata1 / contrastColor=pink color = pink; style graphdata2 from graphdata1 / contrastColor=blue color = blue; end; run; ods html style = myStyle; proc sgplot data = fram; vbar weight / group = sex; xaxis fitpolicy = thin ; run; ods html close;
132
132 Customizing graphics You can tweak the graphics that ship with SAS by modifying their graph template or you can create truly custom graphics by making your own statistical graph template. Your data Graph template Style template
133
133 If you do not want to explain what Kernel density estimation is… remove the lines.
134
134 Finding the template Before the procedure that draws the graphic, add ods trace on; and include ods trace off; afterwards. This prints the names of all the templates used by the procedure in the log. product.procedure.Graphics.TemplateName
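A sketch of the trace wrapper around a procedure, assuming sashelp.class as example data. The template names appear in the log next to each output object.

```sas
/* Assumption: sashelp.class is only example data. */
ods graphics on;
ods trace on;          /* start listing output objects and templates in the log */

proc ttest data = sashelp.class;
  class sex;
  var height;
run;

ods trace off;         /* stop tracing */
```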
135
135 Looking at a Template You can ask proc template to display the template with the source statement: proc template; source stat.ttest.graphics.summary2; run; Remember to type this before you start editing: ods path(prepend) work.template (update);
136
136 Don’t Panic This is a complete template except for the proc template statement here and a run statement at the bottom. Copy this into an editor window and add proc template.
137
137 After adding proc template and commenting out the Kernel statements rerun the code.
138
138 Oops. Unknown key words… You can fix the color coding on the template code easily.
139
139 Fixed (permanently) All your subsequent plots will have no density line.
140
140 Continuous variables
141
141 Violin A violin plot mirrors the shape of the histogram (density). They can be done in R. Continuous variables
142
142 Grouped Continuous Variables You can use the Distribution Analysis to get basic grouped plots. For better looking plots you need to write sgplot and/or sgpanel code. Grouped continuous variables
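A minimal sgpanel sketch for grouped histograms, assuming sashelp.heart as stand-in data. Stacking the panels in one column keeps the x-axes aligned, so the distributions can be compared by eye.

```sas
/* Assumption: sashelp.heart is only a stand-in dataset. */
proc sgpanel data = sashelp.heart;
  panelby sex / columns = 1;   /* one panel per group, stacked vertically */
  histogram cholesterol;
run;
```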
143
143 Request distinct graphics by subgroups. Grouped continuous variables
144
144 Grouped continuous variables
145
145 Actually this took a bit of voodoo. Grouped continuous variables
146
146 1st 2nd Grouped continuous variables
147
147 Double click here. Put details on the histogram tweaks here. I use/tweak nrow ncol and endpoints often. endpoints = 2 to 10 by 0.5 midpoints = 5.6 5.8 6.0 6.2 6.4 Grouped continuous variables
148
148 Grouped continuous variables
149
149 Grouped continuous variables
150
150 I want to add in a reference line showing what is normal and put the categories in order.
151
151
152
152 Side by Side Violin Plots Grouped continuous variables
153
153 Paired Continuous Variables People typically show paired data with scatterplots. EG can generate them: Grouped continuous variables
154
154 Scatter Plot Grouped continuous variables
155
155 Jittered Plot
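One way to jitter by hand is to add a little random noise in a data step before plotting. This is a sketch under stated assumptions: sashelp.heart is stand-in data, the variable names height_j and weight_j are hypothetical, and the half-unit noise width is arbitrary. It is not necessarily how the slide's plot was made.

```sas
/* Assumption: dataset, new variable names, and noise width are illustrative. */
data jittered;
  set sashelp.heart;
  if _n_ = 1 then call streaminit(223);   /* fixed seed for a reproducible plot */
  /* Shift each point by uniform noise between -0.5 and 0.5. */
  height_j = height + rand("uniform") - 0.5;
  weight_j = weight + rand("uniform") - 0.5;
run;

proc sgplot data = jittered;
  scatter x = height_j y = weight_j;
run;
```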
156
156 Jitter vs. Sunflowers In R you can also do sunflower plots. Grouped continuous variables
157
157 Ordinary Least Squares Regression People typically plot a regression line to show a relationship between two continuous variables. Grouped continuous variables
158
158 Regression line You can easily add a regression line to the scatter plot.
159
159 proc sgplot data = fram; scatter x = height y = weight; run; proc sgplot data = fram; reg x = height y = weight; run;
160
160 ods listing sge = on style = statistical; proc sgplot data = fram; reg x = height y = weight / markerattrs = (color = green) lineattrs = graphdata1 (color = lime); run;
161
161 ods listing style = statistical; proc sgplot data = fram; reg x = height y = weight / group = sex ; run;
162
162 proc template; define style sexE; parent = styles.Statistical; style graphdata1 / contrastColor=pink markersymbol = "star"; style graphdata2 / contrastColor=blue markersymbol = "plus"; end; run; ods html body="C:\blah\stuff.html" gpath="c:\blah" style = sexE; proc sgplot data = fram; scatter x = height y = weight / group = sex ; reg x = height y = weight / group = sex ; run; ods html close;
163
163
164
164 Bisquare Figure out what is an odd value and then put a weight on it to devalue it. There are many robust regression algorithms around. R and S-Plus software have them well implemented. Grouped continuous variables
165
165 Loess and Splines Loess is a technique that essentially creates a rolling window and gets a weighted average across the values visible inside the window. Splines are curved lines that allow different amounts of stiffness to the curves. Grouped continuous variables
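A sketch that overlays a loess fit and a penalized B-spline on a scatter plot with sgplot, assuming sashelp.class as example data. The smooth = 0.5 value is an arbitrary choice for illustration; if you leave it off, the procedure picks a value itself.

```sas
/* Assumption: sashelp.class and smooth = 0.5 are illustrative only. */
proc sgplot data = sashelp.class;
  scatter x = height y = weight;
  /* nomarkers avoids drawing the points a second time. */
  loess    x = height y = weight / nomarkers smooth = 0.5;
  pbspline x = height y = weight / nomarkers;
run;
```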
166
166 Smooth = 25 Smooth = 50 Smooth = 99
167
167 Proc phreg has a lot of new features but nothing major in the graphics. With phreg, if you specify ods graphics on you do not automatically get any plots. Here I request survival and cumulative hazard plots including the global confidence limits option (cl). Once again the option names are not consistent with the table names.
168
168 Proc lifetest can show the number at risk but the implementation is weak. It labels the groups with numbers even if the strata are character strings. You have to manually edit them and this affords ample opportunity for mistakes. I don’t see a way to change the censoring symbol in the legend. This shows the number of people at risk after 20, 40 etc days.
169
169 Too Many Graphics If the ods graphics on statement gives you too many graphics, you can specify which graphics you want by including code designed for the procedure. Typically it looks like this: plots(only) = (table names). This design is poorly implemented because you need to know where to put the plots= option and what the table names are. Does it go on the proc line (like phreg), the tables line (like proc freq), or some other line? Also the table names specified with a plots= option do not always match the ODS table names.
170
170 Usually you can use an ODS exclude statement or an ODS select statement to pick the correct things to print. Using the plots(only) = syntax is more efficient.
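For example, in proc freq the plots= request goes on the tables statement. A sketch, assuming sashelp.class as example data:

```sas
/* Assumption: sashelp.class is only example data. */
ods graphics on;
proc freq data = sashelp.class;
  /* (only) suppresses the default plots and keeps just the one requested. */
  tables sex / plots(only) = freqplot;
run;
```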
171
171 Splitting a Grid Some procedures produce a grid of plots. You can get access to the individual plots by specifying plots(unpack). Then you can use plots(only)=tableName to get just the right parts. ODS select or exclude statements will not work.
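A sketch with proc reg, which normally draws its fit diagnostics as one panel of plots. It assumes sashelp.class as example data.

```sas
/* Assumption: sashelp.class is only example data. */
ods graphics on;
proc reg data = sashelp.class plots(unpack);
  /* With unpack, each diagnostic plot is produced as its own graphic. */
  model weight = height;
run;
quit;
```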
172
172 plots(GlobalOptionsGoHere). The global options apply to all graphics in this procedure.
173
173 Beyond the Basic Univariate plots There are 4 SG procedures that allow you to build up complex univariate plots and do multivariate (trellis/lattice) plots.
174
174 Statistical Graphics Procs proc sgPlot – general plotting procedure that replaces gplot Proc sgRender – used with proc template to make totally custom plots – It replaces proc greplay proc sgScatter – lots of tools for scatterplots and scatter matrices proc sgPanel – quick and easy trellis/lattice/matrix/panel of plots
175
175 Grids You can produce lattices full of graphics with proc sgpanel.
176
176
177
177 Spaghetti Plots Data from Singer and Willett: www.ats.ucla.edu/stat/examples/alda.htm
178
178 SGPlot vs Template You can replicate everything done with proc sgplot using the template language but don’t reinvent the wheel if you don’t need to. You will want to use proc template to build custom graphics that use many panels. Proc sgplot uses statements that start like reg but template uses names like regressionplot. – Similar but not identical names… boo.
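To make the naming difference concrete, here is the same regression plot written both ways. It is a sketch that assumes sashelp.class as example data; the template name regDemo is hypothetical.

```sas
/* Assumption: sashelp.class and the name regDemo are illustrative only. */

/* sgplot: the statement is called reg and draws the points itself. */
proc sgplot data = sashelp.class;
  reg x = height y = weight;
run;

/* Template language: points and line are separate statements,
   and the line is called regressionplot. */
proc template;
  define statgraph regDemo;
    begingraph;
      layout overlay;
        scatterplot    x = height y = weight;
        regressionplot x = height y = weight;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data = sashelp.class template = regDemo;
run;
```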
179
179
180
180
181
181 layout gridded = ticks do not have to align layout lattice = ticks must align
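A minimal two-cell template using layout lattice, assuming sashelp.class as example data and a hypothetical template name twoHistograms. Swapping layout lattice for layout gridded gives the looser arrangement.

```sas
/* Assumption: sashelp.class and the name twoHistograms are illustrative only. */
proc template;
  define statgraph twoHistograms;
    begingraph;
      layout lattice / columns = 2;
        /* Each nested layout overlay fills one cell of the lattice. */
        layout overlay;
          histogram height;
        endlayout;
        layout overlay;
          histogram weight;
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data = sashelp.class template = twoHistograms;
run;
```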
182
182
183
183