ggplot2 II
EPID 799C
Wed Sep 19 2017
New Packages (install now!): RColorBrewer, ggrepel, plotly
Today’s Overview
More ggplot: scales, colors, ggplot extension packages, themes
Homework 1: due today!
Homework 2: out by tomorrow morning!
ggplot2 stats, scales, colors, themes
ggplot components
data
aesthetic mapping
geometric object
statistical transformations
scales
coordinate system
position adjustments
faceting
Let’s Try
Review: make a scatterplot ggplot of the mpg dataset where displ and hwy are x and y. Overlay a linear relationship line by group using geom_smooth(). Color all of it by the class variable.
Add and tweak some geoms:
Add a layer with a black line modeling the overall relationship of the data. Hint: override… something!
We have overplotting going on (how can you tell?). Lower the points' alpha and jitter them to see the problem.
Add a geom_rug(), just cuz. Jitter it as well.
Answers
# start with
ggplot(mpg, aes(displ, hwy, color=class))+
  geom_smooth(method="lm")+                 # linear fit, one line per class
  geom_point(alpha=0.2)+                    # faint points at their true positions
  # add geoms
  geom_jitter()+                            # jittered copies reveal the overplotting
  geom_smooth(method="lm", color="black")+  # setting color overrides the class mapping: one overall line
  geom_rug(position = "jitter")
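scales: position
Some form of scale_x_continuous(limits=c(a,b), breaks=a:b) is pretty common.
A note on clipping: limits set in a scale_*() call drop out-of-range data before stats are computed; coord_cartesian() only zooms the view and keeps all the data.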
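The clipping note can be checked directly. This is a minimal sketch, assuming ggplot2 is installed; the limits c(1, 5) and the helper count_points() are illustrative choices, not from the slides.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Scale limits: points with displ outside [1, 5] are turned into NA
# (ggplot2 warns about removed rows), so the smoother is fit without them.
p_scale <- base + scale_x_continuous(limits = c(1, 5), breaks = 1:5)

# Coordinate limits: every point is kept and the fit is unchanged;
# only the visible window shrinks.
p_coord <- base + coord_cartesian(xlim = c(1, 5))

# Hypothetical helper: count the points that still have an x position.
count_points <- function(p) {
  d <- suppressWarnings(layer_data(p, 1))
  sum(!is.na(d$x))
}
n_scale <- count_points(p_scale)
n_coord <- count_points(p_coord)
```

Print p_scale and p_coord side by side: the black-box difference shows up as a different fitted line, not just a different window.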
Let’s Try: Playing with scales
Answers
ggplot(mpg, aes(displ, hwy, color=class))+
  geom_jitter()+
  geom_smooth()+
  geom_smooth(color="black")+
  #scale_x_continuous(limits=c(0,NA))+
  #scale_y_continuous(limits=c(0,NA))
  #scale_x_continuous(limits=c(0,5))
  coord_cartesian(xlim = c(0,5)) #change the "window" (needs all limits)
scales: color Typically using brewer or manual
scales: color brewer
RColorBrewer::display.brewer.all()
http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3
Also, not covered: if you’re making maps, see tmaptools::palette_explorer()
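A short sketch of the two usual routes, brewer and manual, assuming ggplot2 is installed. The palette name "Dark2" and the three colors are arbitrary picks for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, color = drv)) + geom_point()

# ColorBrewer palette, chosen by name; "Dark2" is a qualitative palette.
p_brewer <- p + scale_color_brewer(palette = "Dark2")

# Manual palette: a named vector pins each level of drv to a color.
p_manual <- p + scale_color_manual(
  values = c("4" = "black", "f" = "steelblue", "r" = "firebrick")
)

# Colors actually assigned to the points in each version.
brewer_cols <- unique(layer_data(p_brewer)$colour)
manual_cols <- unique(layer_data(p_manual)$colour)
```

Naming the values in scale_color_manual() keeps the mapping stable even if the factor levels get reordered later.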
Let’s Try
Answers
ggplot(mpg, aes(displ, hwy, color=class))+
  geom_point()+
  scale_color_brewer(palette="Blues") # name the argument: a bare "Blues" becomes the legend title
# Alternatives to swap in for the last line:
#scale_color_discrete()
#scale_color_manual(values = c("red", "yellow", "green", "purple", "orange", "blue", "black"))
#scale_color_brewer(type="seq", palette="OrRd", guide="colorbar") # colorbar only works with continuous scales
#scale_color_brewer(type="div", palette="Spectral", guide="legend")
#scale_color_brewer(type="div", palette="OrRd") # type is ignored when palette is given by name
#scale_color_manual(values = RColorBrewer::brewer.pal(7, "Blues"))
positions
positions
ggplot(mpg, aes(fl, fill=drv))+ geom_bar(position="<POSITION>")
Where <POSITION> is one of: dodge, fill, jitter, nudge, stack
Can be tweaked further with the function equivalents: position = position_dodge(width=1)
Or a few more detailed versions: …+geom_label(aes(label=mylabs), nudge_y=1)
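A minimal sketch comparing the string form and the function form of a position, assuming ggplot2 is installed. The object names are illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(fl, fill = drv))

p_stack <- p + geom_bar(position = "stack")  # the default for geom_bar()
p_fill  <- p + geom_bar(position = "fill")   # stacks, then rescales each bar to height 1
p_dodge <- p + geom_bar(position = position_dodge(width = 1))  # side by side

# Inspect the rectangles each position produces.
d_fill  <- layer_data(p_fill)
d_dodge <- layer_data(p_dodge)
```

In d_fill the tallest ymax is 1, while in d_dodge every bar starts at ymin = 0 because nothing is stacked.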
Let’s Try
Answers
# Positions
ggplot(mpg, aes(fl, fill=drv))+geom_bar()
ggplot(mpg, aes(fl, fill=drv))+geom_bar(position="stack")
ggplot(mpg, aes(fl, fill=drv))+geom_bar(position="dodge")
ggplot(mpg, aes(fl, fill=drv))+geom_bar(position="fill")
ggplot(mpg, aes(fl, fill=drv))+geom_bar(position="jitter") #ugh.
ggplot(mpg, aes(cty, hwy, color=drv))+geom_point(position="jitter")
ggplot(mpg, aes(cty, hwy, color=drv, label=drv))+geom_point()+geom_text(nudge_y = 2)
coordinate systems Rarely used, except for coord_flip() Another solution in helper packages
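A minimal sketch of the coord_flip() case, assuming ggplot2 is installed. The choice of class and hwy is just for illustration.

```r
library(ggplot2)

# Boxplots are built with the category on x, then the whole plot is
# flipped so the category labels read horizontally.
p_flip <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  coord_flip()
```

The aesthetics stay as written (class on x, hwy on y); only the drawing is rotated.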
labels
labs(), like last time, covers almost every label: x, y, title, subtitle, caption
And can also do legend titles… or do that in the scale_*() function
Use g+annotate() to just stick something (anything!) right where you want it.
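A minimal sketch of labs() plus annotate(), assuming ggplot2 is installed. All label text and the annotation position are made up for illustration.

```r
library(ggplot2)

p_labels <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point() +
  labs(
    x = "Engine displacement (L)",
    y = "Highway mpg",
    title = "Bigger engines, lower mileage",
    subtitle = "ggplot2::mpg",
    caption = "EPID 799C example",
    color = "Vehicle class"   # legend title, named after the aesthetic
  ) +
  # annotate() takes fixed values, not a data frame
  annotate("text", x = 6, y = 40, label = "Note: nothing in\nthis corner")
```

The legend title is set by naming the aesthetic (color) inside labs(); the same title could instead go in the name argument of a scale_color_*() call.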
Let’s Try
Answers
# Labels
ggplot(mpg, aes(displ, hwy, color=class))+
  geom_point()+
  annotate(geom="text", x=1, y=4, label="Note: nothing in \n this corner")+
  annotate(geom="point", x=1, y=4, color="red")+
  scale_y_continuous(limits = c(0,NA))+
  scale_x_continuous(limits = c(0,NA))
# Add labs
#coord_flip()
Slightly More Advanced: stats Put on your thinking caps
Layer=data+stats+geoms
Hadley’s (package author) underlying theory from grammar of graphics: layer=data+stats+geoms.
Often stat or geom imply the other, so each has a default parameter of the other.
http://vita.had.co.nz/papers/layered-grammar.pdf
Excellent Stack Overflow Review: https://stackoverflow.com/questions/38775661/what-is-the-difference-between-geoms-and-stats-in-ggplot2
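A minimal sketch of "each implies the other", assuming ggplot2 is installed: the same layer written geom-first and stat-first.

```r
library(ggplot2)

# geom-first: geom_bar() with its default stat spelled out
p_geom <- ggplot(mpg, aes(class)) + geom_bar(stat = "count")

# stat-first: stat_count() with its default geom spelled out
p_stat <- ggplot(mpg, aes(class)) + stat_count(geom = "bar")

# Both layers compute the same data frame behind the scenes.
d_geom <- layer_data(p_geom)
d_stat <- layer_data(p_stat)
```

Either spelling produces the same bars; which one to write is mostly a question of what you want to emphasize.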
Let’s Try http://ggplot2.tidyverse.org/reference/
Let’s dig in…
Default stats for geoms: http://sape.inf.usi.ch/quick-reference/ggplot2/geom
stat variables
geoms, behind the scenes, often calculate a new dataframe to actually plot on screen.
stat_<thing> calculates a new dataframe explicitly.
That dataframe has some “secret” (documented) variable names, accessible by special inline variables: ..count.. ..ncount.. ..density.. ..ndensity.. ..prop.. etc.
What if I want to refer to those new variables elsewhere in the ggplot call? (kinda guts / rare use)
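A minimal sketch of referring to computed variables, assuming ggplot2 3.3.0 or later, where after_stat(density) is the newer spelling of ..density.. (the dotted form still works in older code). The bins value is arbitrary.

```r
library(ggplot2)

# y uses the computed density, fill uses the computed count.
p_hist <- ggplot(mpg, aes(hwy)) +
  geom_histogram(
    aes(y = after_stat(density), fill = after_stat(count)),
    bins = 20
  )

# The data frame the stat calculated, with its "secret" columns.
computed <- layer_data(p_hist)
```

Looking at names(computed) is a quick way to see which computed variables a given stat makes available.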
Let’s Try
Answers
# Stats and geoms
ggplot(mpg, aes(displ))+geom_bar(stat="count")
ggplot(mpg, aes(displ))+stat_count(geom="bar") # geom must be given as a string; a bare geom is an error
ggplot(mpg, aes(displ))+geom_bar(stat="density", aes(fill=..density..), color=NA)
ggplot(mpg, aes(displ))+geom_bar(stat="density", aes(color=..scaled..))
ggplot(mpg, aes(displ))+geom_bar(stat="density", aes(color=..count..))
ggplot(mpg, aes(displ))+geom_bar(stat="bin")
ggplot(mpg, aes(displ, hwy))+geom_tile(stat="bin_2d")
ggplot(mpg, aes(displ, hwy))+geom_hex(stat="binhex") # geom_hex's default stat; an empty string is an error. Needs the hexbin package
ggplot(mpg, aes(displ, hwy))+geom_bar(stat="bin_2d")
ggplot(mpg, aes(cty, hwy, z=displ))+
  geom_point(alpha=0.2)+geom_jitter(stat="unique")+
  stat_ellipse()
ggplot(mpg, aes(cty))+geom_bar(stat="density", aes(fill=..count..)) # stat = count or bin
ggplot(mpg, aes(y=cty, x=manufacturer))+geom_boxplot(aes(color=..lower..))
#ggsave()
And: Saving a ggplot
The usual GUI / RStudio way
Simplest version is super easy. Save my last ggplot: ggsave("filename.png"), ggsave("filename.pdf"), etc.
Can fool with size (width, height in inches, pixels, etc.)
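A minimal sketch of ggsave() with an explicit plot and size, assuming ggplot2 is installed. The file name and dimensions are arbitrary, and tempdir() is used only so the example does not clutter the working directory.

```r
library(ggplot2)

p_save <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# The file extension picks the output format (.pdf here; .png works the same way).
out_file <- file.path(tempdir(), "mpg_scatter.pdf")

# Passing plot = explicitly avoids relying on "the last plot displayed".
ggsave(out_file, plot = p_save, width = 6, height = 4, units = "in")
```

Leaving out plot = saves whatever last_plot() returns, which is the "save my last ggplot" shortcut.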
ggplot2 Extensions Extending the language
Overview Extensions to ggplot2 add geometries, statistics, themes… all within the language you now know. Dedicated website for them: http://www.ggplot2-exts.org/ And many in development elsewhere. Let’s look at some greatest hits
plotly
https://plot.ly/
Minimal use? ggplotly()
Can also send plots to your account online, integrate with shiny (renderPlotly() and plotlyOutput()), etc.
ggrepel
Already have geom_text() and geom_label() (label draws a text box, text is just text).
Now have geom_text_repel() and geom_label_repel(), which use a “force” to push labels away from each other and from the data points.
Let’s Try
Answers
library(plotly)
ggplot(mpg, aes(y=cty, x=manufacturer))+geom_boxplot(aes(color=..middle..))
ggplotly()
ggplot(mpg, aes(cty, hwy))+geom_label(aes(label=manufacturer))
library(ggrepel)
ggplot(mpg, aes(cty, hwy))+geom_jitter()+geom_text(aes(label=manufacturer))
ggplot(mpg, aes(cty, hwy, color=manufacturer))+geom_jitter()+geom_text(aes(label=manufacturer), nudge_y=1)
ggplot(mpg, aes(displ, cyl, color=drv))+geom_point(stat="unique")+geom_text_repel(aes(label=manufacturer), stat="unique",force=0.01)
ggforce
https://github.com/thomasp85/ggforce
https://cran.r-project.org/web/packages/ggforce/vignettes/Visual_Guide.html
ggstance coord_flip() alternative
ggmosaic
https://github.com/haleyjeppson/ggmosaic/blob/master/vignettes/ggmosaic.Rmd
survminer (also GGally::ggsurv())
ggridges
https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html
GGally
So, so many themes
https://github.com/cttobin/ggthemr
ggthemes, etc.
ggsci https://ggsci.net/ : ggsci offers a collection of ggplot2 color palettes inspired by scientific journals, data visualization libraries, science fiction movies, and TV shows
Special mention: broom
Broom… tidies data!
Available Tidiers
Currently broom provides tidying methods for many S3 objects from the built-in stats package, including: lm, glm, htest, anova, nls, kmeans, manova, TukeyHSD, arima
It also provides methods for S3 objects in popular third-party packages, including: lme4, glmnet, boot, gam, survival, lfe, zoo, multcomp, sp, maps
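A minimal sketch of what tidying looks like for an lm object, assuming the broom and ggplot2 packages are installed. The model hwy ~ displ is just an illustration.

```r
library(broom)

# Fit an ordinary linear model on the mpg data that ships with ggplot2.
fit <- lm(hwy ~ displ, data = ggplot2::mpg)

# tidy(): one row per coefficient, as a data frame ready for ggplot.
coefs <- tidy(fit)

# glance(): one row of model-level summaries (R squared, AIC, ...).
model_stats <- glance(fit)
```

Because both results are plain data frames, they can go straight into a ggplot() call, for example to plot estimates with their standard errors.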
Thinking big: other packages For example, treemapify https://github.com/wilkox/treemapify How to install dev versions of things…
Thinking big: Sankey diagrams
How many ways to cut it: https://stackoverflow.com/questions/9968433/sankey-diagrams-in-r
Good example
Next week: dplyr! Highly recommend doing something this weekend about dplyr.