Ggplot2 I EPID 799C Mon Sep 18 2017.

Slides:



Advertisements
Similar presentations
PRE-SCHOOL QUANT WORKSHOP II R THROUGH EXCEL. NEW YORK TIMES INFOGRAPHICS GALARY The Jobless Rate for People Like You Home Prices in Selected Cities For.
Advertisements

Plotting with ggplot2: Part 1
Data visualization and graphic design
Statistics for the Social Sciences
Statistics—Chapter 4 Analyzing Frequency Distributions Read pp , , 116, ,
Introduction to SPSS (For SPSS Version 16.0)
Mind Maps are an effective method of note taking and especially useful for our generation that forms ideas by associations. By using Mind Maps, you.
Baburao Kamble (Ph.D) University of Nebraska-Lincoln
Ggplot2 A cool way for creating plots in R Maria Novosolov.
Chapter 3: Organizing Data. Raw data is useless to us unless we can meaningfully organize and summarize it (descriptive statistics). Organization techniques.
Introduction to Graphics in R 3/12/2014. First, let’s get some data Load the Duncan dataset It’s in the car package. Remember how to get it? – library(car)
R PROGRAMMING FOR SQL DEVELOPERS Kiran Math Developer : Proterra in Greenville SC
Frequency Distributions Chapter 2. Descriptive Statistics Distributions are part of descriptive statistics…we are learning how to describe some data by.
R PROGRAMMING FOR SQL DEVELOPERS Kiran Math Developer : Proterra in Greenville SC
Statistical Programming Using the R Language Lecture 2 Basic Concepts II Darren J. Fitzpatrick, Ph.D April 2016.
Lecturer’s desk INTEGRATED LEARNING CENTER ILC 120 Screen Row A Row B Row C Row D Row E Row F Row G Row.
How Story Elements Interact: Questions to ask. Setting: 1)How does the setting affect what happens in the story? 2) How does the setting affect how the.
STATISTICS 200 Lecture #2Thursday, August 25, 2016 Distinguish between: - A statistic and a parameter - A categorical and a quantitative variable - A response.
Visual Displays of Data Chapter 3. Uses of Graphs Positive and negative uses – Can accurately and succinctly present information – Can reveal/conceal.
MATH 598: Statistics & Modeling for Teachers June 4, 2014
Introduction to R and RStudio
Tidy data, wrangling, and pipelines in R
Statistics in SPSS Lecture 3
Overview of R and ggplot2 for graphics
Ggplot2 Wu Shaohuan.
Graphing Lines Using Intercepts
ggplot2 Merrill Rudd TAs: Brooke Davis and Megsie Siple
Common Core State Standards
Getting your data into R
Next Generation R tidyr, dplyr, ggplot2
Summary Statistics in R Commander
R Assignment #4: Making Plots with R (Due – by ) BIOL
Data manipulation in R: dplyr
Data visualization in Python
Dplyr I EPID 799C Mon Sep
R Programming III: Real Things with Real Data!
LINDSEY BREWER CSSCR (CENTER FOR SOCIAL SCIENCE COMPUTATION AND RESEARCH) UNIVERSITY OF WASHINGTON September 17, 2009 Introduction to SPSS (Version 16)
Data Visualization using R
Four Column layout Cheat Sheet
Graphical Descriptives in (Base) R
ETL – Using R Kiran Math Developer : Flour in Greenville SC
ggplot2 II EPID 799C Wed Sep New Packages (install now!)
R Programming I EPID 799C Fall 2017.
Maths in Year 8 and 9.
Numerical Descriptives in R
R Programming II EPID 799C Fall 2017.
Recoding II: Numerical & Graphical Descriptives
R Programming For Sql Developers ETL USING R
Chapter 6 transformations
L07 Apply and purrr EPID 799C Fall 2018.
Tidy data, wrangling, and pipelines in R
Bell Ringer What does it mean to substitute?.
To get ready for class: 1. Get births ready as usual
Overview of R and ggplot2 for graphics
ADDITIVE VS. MULTIPLICATIVE RELATIONSHIPS
6.3 Finding the Roots The roots of a quadratic function are the places that a parabola crosses the x axis. Hint: Roots can also be called Zeros or x intercepts.
Lecture 7 – Delivering Results with R
Generalized Linear Models (GLM) in R: Part 2
R course 6th lecture.
Straight Line Graphs Lesson Objectives:
Quadratic Graphs.
Geometry 3.6 – 3.7 Brett Solberg AHS ‘11-’12.
Maths in Year 8 and 9.
2-2 Logic Part 2 Truth Tables.
R for Epi Workshop Module 2: Data Manipulation & Summary Statistics
R for Epi Workshop Module 3: Data visualization with ggplot
Plotting Points.
DATA VISUALISATION (QUANTITATIVE).
Finding the Missing Coordinate
Data visualization and graphic design
Presentation transcript:

ggplot2 I EPID 799C Mon Sep 18 2017

Today’s Overview The ggplot package: Basic Syntax Homework 1: due Wed 9/20 notes: <20 weeks considered not viable, exclude by “recode as missing”, we mean NA. Homework 2: Out this week, starting today

Ggplot2 Basic Syntax

ggplot components data aesthetic mapping geometric object statistical transformations scales coordinate system position adjustments faceting

ggplot components

ggplot components Or minimally, ggplot(data=<DATA>)+ <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) e.g., using mpg dataset ggplot(mpg)+ geom_point(aes(displ, hwy, color=class))

hist(mpg$cyl) ggplot(mpg)+geom_histogram(aes(x=cyl)) ggplot(mpg, aes(cyl))+geom_histogram() qplot(mpg$cyl, geom="histogram")

Anatomy of a ggplot

data Ggplot likes “long”, well structured data.frames ggplot “stats” can make quick transformations dplyr will help with complicated transformations tidyr will help go from wide to long

geoms …and many more in sister packages

Let’s Try: Create a scatterplot of wksgest and mage using ggplot and geom_point. D’oh! Overplotting! Use the geom_jitter() geometry instead. Let’s try colors. Map cigdur to color. That’s it! Fancier: change color to preterm_f, change the point character to a “.”, and set alpha = 0.1

Answers ggplot(births)+geom_point(aes(wksgest, mage)) ggplot(births)+geom_jitter(aes(wksgest, mage)) ggplot(births)+geom_jitter(aes(wksgest, mage, color=cigdur)) ggplot(births)+geom_jitter(aes(wksgest, mage, color=preterm_f), pch=".", alpha=0.2)

Cheatsheet: https://www. rstudio Cheatsheet: https://www.rstudio.com/wp- content/uploads/2015/03/ggplot2-cheatsheet.pdf ... Or right in RStudio!

Let’s Try: Exploring geometries Pick a few variables of interest from the raw variables of interest… mdif wksgest mage mrace methnic cigdur cores …or our recoded versions! pnc5_f preterm_f raceeth Pick a relevant geometry from the cheat sheet Run the example geometry using the supplied ggplot2 dataset (e.g. diamonds, economics, mpg, iris) Create a comparable graph using the births dataset by mapping your variables onto one or more aesthetic elements: e.g. X, y, color, fill, shape, group, linetype, size, weight, alpha

Answers Various!

Facets Facets take an R formula object (e.g . ~ x, y ~ x) and split your graph into small multiples based on that. Can also “free the scales” so they aren’t shared across plots. ggplot(births, aes(mage, wksgest))+ geom_jitter(aes(color=preterm_f), pch=".", alpha=0.2)+ geom_smooth()+ facet_grid( ~ raceeth_f)

Themes Change the theme with theme_NAME(), e.g. theme_minimal(). Can define your own themes, or tweak existing ones. See https://cran.r- project.org/web/packages/ggthemes/vignettes/ggthemes.html for more themes.

Back to data Sometimes we need to transform our data to get at the question we have in mind. e.g. What is the relationship of preterm birth and maternal age?

Let’s Try: ggplots from new data Create a plot of the relationship of preterm birth and maternal age using: ggplot(births)+ geom_histogram(aes(x=mage, fill=preterm_f), position="fill") OK, but ew. Let’s create a new dataset to plot with. Create a plot of the relationship of preterm birth and maternal age. Hint: dplyr will make this easy. But for now, use the table function and data.frame functions to create a data.frame with a column for each maternal age, the number of births, and the percent preterm. Challenge: Sew a dplyr statement to a ggplot statement all in one go. Hint %>% … %>% ….+ …+ … +

Answers t1 = table(births$mage, births$preterm) mage_df = data.frame(mage = as.numeric(row.names(t1)), n_preterm = t1[,2], total = t1[,1] + t1[,2]) mage_df$pct_preterm = mage_df$n_preterm / mage_df$total * 100 ggplot(mage_df, aes(mage, pct_preterm))+geom_point() ggplot(mage_df, aes(mage, pct_preterm))+geom_point(aes(size=total))+geom_smooth(color="blue") ggplot(mage_df, aes(mage, pct_preterm))+ geom_point(aes(size=total))+ geom_smooth(aes(weight=total), color="blue")+ labs(title="% Preterm vs. Maternal Age", x="maternal age", y="% preterm", subtitle="Seems to be a quadratic relationship...")

Homework 2 Plots coming up

Plots in HW 2

Plots in HW 2

Plots in HW 2

Plots in future HW

Plots in future HW

Worth a skim: http://r-statistics Worth a skim: http://r-statistics.co/Top50-Ggplot2-Visualizations- MasterList-R-Code.html http://chartmaker.visualisingdata.com/

Coming up next in ggplot Colors, themes, ggplot extension packages