ggplot2 Merrill Rudd TAs: Brooke Davis and Megsie Siple

ggplot2 Merrill Rudd TAs: Brooke Davis and Megsie Siple FISH 512: Super-Advanced R

Goals for the lecture
If you have used ggplot2 before: learn some new tricks, obtain a useful handout, and share your tricks with others.
If you have not used ggplot2 before: learn the basics and start using ggplot2 for easy data exploration.

Figure Credit: Sean Anderson

Sources
http://docs.ggplot2.org/current/
Elegant Graphics for Data Analysis: http://ggplot2.org/book/
Sean Anderson – notes for ggplot2, FSH 554
Cookbook for R: http://www.cookbook-r.com/Graphs/
http://sharpstatistics.co.uk/r/ggplot2-guide/
https://github.com/hadley/ggplot2/wiki
http://vita.had.co.nz/papers/tidy-data.pdf

Fundamentals of ggplot2: data massaging, layering, themes

Data massaging: AGRRA database

Data format requirements
Data should be in “long” format (“tidy data”), with each aesthetic or facet variable in its own column.
Useful packages: reshape2, plyr, dplyr (Lecture 6)

Merging multiple data sets with shared columns
base R: merge() joins 2 data frames by matching ID variables.
plyr: the join functions merge data frames by ID variables and are more flexible than merge(). join_all() takes a list of data frames and lets you specify which rows to keep (rows from the 1st data frame only, rows from all data frames, etc.).
reshape2: melt() reshapes data into long form, with the option to specify ID and measurement variables. The cast functions reshape data into wide form: use dcast() or acast() depending on whether you want data frame or array output.
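The following is a minimal sketch of these three steps. It assumes the reshape2 package is installed. The two data frames, their column names (Site, Year, NumCorals, TotalFish), and their values are hypothetical toy data, not the AGRRA files.

```r
library(reshape2)  # melt(), dcast()

# Hypothetical toy data sharing the ID columns Site and Year (not AGRRA values)
coral <- data.frame(Site = c("A", "B"), Year = c(2011, 2011),
                    NumCorals = c(40, 55))
fish  <- data.frame(Site = c("A", "B"), Year = c(2011, 2011),
                    TotalFish = c(1200, 950))

# base R: join the two data frames on their shared ID columns
both <- merge(coral, fish, by = c("Site", "Year"))

# reshape2: wide -> long, one row per ID combination and measurement variable
long <- melt(both, id.vars = c("Site", "Year"),
             variable.name = "Measure", value.name = "Value")

# reshape2: long -> wide again; dcast() gives a data frame (acast() an array)
wide <- dcast(long, Site + Year ~ Measure, value.var = "Value")
```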

Atlantic and Gulf Rapid Reef Assessment (AGRRA)

Variables in the data
3 data files; variables simplified for this example; average and standard deviation for most variables.
Shared ID variables: Country, Year, Site, Date, Depth, Zone, Number of Transects
Coral mortality: Number of Corals, Total Standing Dead
Algae abundance: Crustose, Turf, Macro
Fish biomass: Total Fish Biomass, Herbivore Biomass, Invertivore Biomass, Piscivore Biomass

DataMassageAGRRA.R
Wide form to long form: specify the variables that should have separate value columns, so that each factor gets 2 separate value columns (e.g. average and standard deviation).
A couple of other boring steps follow in the code, and the same is repeated for the algae and fish biomass data.
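Here is one possible way, using reshape2, to give each variable an average column and an SD column. It rests on two assumptions: the hypothetical wide columns are named like NumCorals_avg and NumCorals_sd, and the values are made up. It is an illustration, not the code in DataMassageAGRRA.R.

```r
library(reshape2)

# Hypothetical wide data: one average and one SD column per variable
coral_wide <- data.frame(Site = c("A", "B"),
                         NumCorals_avg = c(40, 55), NumCorals_sd = c(4, 6),
                         Dead_avg = c(10, 12), Dead_sd = c(2, 3))

# Step 1: wide -> long; every non-ID column becomes a row
m <- melt(coral_wide, id.vars = "Site")

# Step 2: split names like "NumCorals_avg" into the variable and the statistic
parts <- do.call(rbind, strsplit(as.character(m$variable), "_"))
m$Measure <- parts[, 1]
m$Stat    <- parts[, 2]

# Step 3: cast Stat back out, giving separate avg and sd value columns
coral_long <- dcast(m, Site + Measure ~ Stat, value.var = "value")
```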

DataMassageAGRRA.R
Join multiple data frames from a list, including rows from all data frames. This results in NAs for ID variable combinations that don’t have data for some measurement variables.
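The sketch below shows the "rows from all data frames" join. It assumes the plyr package is installed, and the three data frames and their values are hypothetical. With type = "full", join_all() keeps every ID combination and fills the missing measurements with NA.

```r
library(plyr)  # join_all()

# Hypothetical data frames sharing the ID columns Site and Year
coral <- data.frame(Site = c("A", "B"), Year = 2011, NumCorals = c(40, 55))
algae <- data.frame(Site = c("A", "B"), Year = 2011, Turf = c(30, 25))
fish  <- data.frame(Site = c("A", "C"), Year = 2011, TotalFish = c(1200, 800))

# type = "full": keep rows from every data frame; ID combinations that are
# missing from one data frame get NA in that data frame's measurement columns
all_data <- join_all(list(coral, algae, fish), by = c("Site", "Year"),
                     type = "full")
```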

Take a look at the data structure: “SelectAGRRA_long.csv” is read into “Explore_ggplot2.R”. See “DataMassageAGRRA.R” and the original csv files for reference.

Layering ggplot2 functions

Difference between qplot() and ggplot()
qplot() is a wrapper around ggplot() that needs less syntax for common tasks.
ggplot() will work in all cases.
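The two forms are shown side by side below using made-up data. Newer ggplot2 releases deprecate qplot(), so the first call may print a deprecation warning. The ggplot() form is the one to rely on.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(Depth = c(2, 5, 10, 15), NumCorals = c(40, 55, 38, 20))

# qplot(): quick wrapper with less syntax (may warn as deprecated)
p1 <- qplot(Depth, NumCorals, data = df)

# ggplot(): the full layered equivalent of the same scatterplot
p2 <- ggplot(df, aes(x = Depth, y = NumCorals)) + geom_point()
```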

ggplot() function
Layered grammar: data + geometric representation + aesthetics + layout
See file: ggplot2_explore.R
Hadley Wickham – Elegant Graphics for Data Analysis
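This sketch shows each part of the layered grammar as a separate piece added with +. The data frame is hypothetical toy data.

```r
library(ggplot2)

# Hypothetical toy data (not AGRRA values)
df <- data.frame(Depth = c(2, 5, 10, 15), NumCorals = c(40, 55, 38, 20))

p <- ggplot(data = df,                                  # data
            mapping = aes(x = Depth, y = NumCorals)) +  # aesthetics
  geom_point() +                               # geometric representation (layer 1)
  geom_line() +                                # a second geom on top (layer 2)
  labs(x = "Depth", y = "Number of corals") +  # layout: axis labels
  theme_bw()                                   # layout: overall appearance
```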

Data and aesthetic mapping
Data: a data frame. Aesthetics: columns in that data frame.
Aesthetics can go in the initial aes() call, or be added later in another layer on top of the base plot.
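Both options are sketched below with hypothetical data. In the first, the colour mapping goes in the initial aes() and every layer inherits it. In the second, the mapping is supplied in aes() inside a single geom that is added to a base plot.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(Depth = c(2, 5, 10, 15), NumCorals = c(40, 55, 38, 20),
                 Zone = c("Shallow", "Shallow", "Deep", "Deep"))

# Option 1: all aesthetics in the initial aes(); every layer inherits them
p1 <- ggplot(df, aes(x = Depth, y = NumCorals, colour = Zone)) + geom_point()

# Option 2: build the base plot first, then map colour in a later layer only
base <- ggplot(df, aes(x = Depth, y = NumCorals))
p2 <- base + geom_point(aes(colour = Zone))
```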

Geoms (slide courtesy of Sean Anderson)

Position adjustment
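Position adjustments decide how overlapping objects share space. The sketch below uses made-up bar data. By default, geom_col() stacks bars that share an x value, while position = "dodge" places them side by side.

```r
library(ggplot2)

# Hypothetical counts by zone and year
df <- data.frame(Zone = rep(c("Shallow", "Deep"), times = 2),
                 Year = factor(rep(c(2011, 2012), each = 2)),
                 NumCorals = c(40, 55, 38, 60))

# Default position for geom_col(): bars with the same x are stacked
p_stack <- ggplot(df, aes(Zone, NumCorals, fill = Year)) + geom_col()

# position = "dodge": bars with the same x are placed side by side
p_dodge <- ggplot(df, aes(Zone, NumCorals, fill = Year)) +
  geom_col(position = "dodge")
```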

Scales: control the mapping from data to aesthetic attributes, using functions of the form scale_xxx_yyy()
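This sketch illustrates the scale_xxx_yyy() naming pattern, where the aesthetic comes first and the scale type second. The data are hypothetical and the manual colour palette is arbitrary.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(Depth = c(2, 5, 10, 15),
                 TotalFish = c(100, 1000, 500, 10000),
                 Zone = c("Shallow", "Shallow", "Deep", "Deep"))

p <- ggplot(df, aes(Depth, TotalFish, colour = Zone)) +
  geom_point() +
  # colour aesthetic, manual scale: map each Zone level to a chosen colour
  scale_colour_manual(values = c(Shallow = "skyblue", Deep = "navy")) +
  # y aesthetic, log10 scale, with an axis title
  scale_y_log10(name = "Total fish biomass") +
  # x aesthetic, continuous scale with chosen breaks
  scale_x_continuous(breaks = c(0, 5, 10, 15))
```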

Faceting: investigating whether patterns hold across all conditions, using discrete variables in the data frame
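The sketch below uses placeholder data. facet_wrap() makes one panel per level of a single discrete variable. facet_grid() arranges panels in rows and columns defined by two discrete variables.

```r
library(ggplot2)

# Placeholder data: every combination of Country, Zone and Depth
df <- expand.grid(Country = c("A", "B"), Zone = c("Shallow", "Deep"),
                  Depth = c(2, 5, 10))
df$NumCorals <- seq_len(nrow(df))  # illustrative values only

# One panel per Country
p_wrap <- ggplot(df, aes(Depth, NumCorals)) + geom_point() +
  facet_wrap(~ Country)

# Grid of panels: rows = Zone, columns = Country
p_grid <- ggplot(df, aes(Depth, NumCorals)) + geom_point() +
  facet_grid(Zone ~ Country)
```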

Subsetting data
So far, all examples used data that was available for every ID variable. Country, Depth, Zone, and Number of Transects have all data points. Number of Corals and Total Fish Biomass do not have all data points, but they do cover every level of the categorical variables above.
When you’re only interested in 1 factor from a list of possible factors, you need to subset the data to choose what to plot.
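The following sketch subsets the data before plotting. It assumes long data in which several measurement variables share one Value column. The data frame and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical long data: two measurement variables stacked in one column
long <- data.frame(Site = rep(c("A", "B", "C"), times = 2),
                   Measure = rep(c("NumCorals", "TotalFish"), each = 3),
                   Value = c(40, 55, 30, 1200, NA, 800))

# Keep only the factor level of interest, dropping missing values
corals <- subset(long, Measure == "NumCorals" & !is.na(Value))

p <- ggplot(corals, aes(Site, Value)) + geom_col() +
  labs(y = "Number of corals")
```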

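Specifying data in a geom
As an alternative to subsetting beforehand, data can be passed directly to a geom. This example just used a different subset of the same dataset. The approach is also useful when working with multiple datasets, or when you would like to refer to any other object.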
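In this sketch, which uses hypothetical data, the first layer uses the data given to ggplot(). The second layer receives its own data, here a subset of the same data frame. Any other data frame would work the same way.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(Depth = c(2, 5, 10, 15), NumCorals = c(40, 55, 38, 20),
                 Zone = c("Shallow", "Shallow", "Deep", "Deep"))

p <- ggplot(df, aes(Depth, NumCorals)) +
  # layer 1 uses the plot-level data: all four rows, drawn in grey
  geom_point(colour = "grey60") +
  # layer 2 brings its own data: only the Deep rows, highlighted in red
  geom_point(data = subset(df, Zone == "Deep"), colour = "red", size = 3)
```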

Adding lines and rectangles to plots See also: geom_abline() geom_vline() geom_hline() geom_rect()
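This sketch combines the reference-line geoms with a shaded rectangle. The data, line positions, and rectangle bounds are arbitrary values chosen for illustration. geom_rect() is given inherit.aes = FALSE so that it does not look for the plot's x and y columns in its own small data frame.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(Depth = c(2, 5, 10, 15), NumCorals = c(40, 55, 38, 20))

p <- ggplot(df, aes(Depth, NumCorals)) +
  # layer 1: shaded band spanning the full y range between two depths
  geom_rect(data = data.frame(xmin = 4, xmax = 11, ymin = -Inf, ymax = Inf),
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            inherit.aes = FALSE, alpha = 0.2) +
  # layer 2: horizontal line at the mean number of corals
  geom_hline(yintercept = mean(df$NumCorals), linetype = "dashed") +
  # layer 3: vertical line at an arbitrary depth
  geom_vline(xintercept = 10) +
  # layer 4: line with a given intercept and slope
  geom_abline(intercept = 50, slope = -2) +
  # layer 5: the data points, drawn on top
  geom_point()
```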

Statistics
stat_smooth(): with fewer than 1,000 points the default method is “loess”; with 1,000 or more points the default is “gam”.
This would take a lot more time to create in base R!
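The sketch below uses simulated data with 50 points, so the automatic choice is loess. The explicit method = "lm" call is included only to show that the default can be overridden.

```r
library(ggplot2)

# Simulated data: 50 points around a straight line
set.seed(1)
df <- data.frame(x = 1:50)
df$y <- 2 * df$x + rnorm(50, sd = 5)

# No method given: with < 1000 points ggplot2 picks loess (and prints a message)
p_auto <- ggplot(df, aes(x, y)) + geom_point() + stat_smooth()

# Overriding the default with a straight-line fit
p_lm <- ggplot(df, aes(x, y)) + geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)
```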

Publication-quality figures: adjusting the theme

See the example code for creating a custom theme in “ggplot2_explore.R” (courtesy of http://sharpstatistics.co.uk/r/ggplot2-guide/).
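The custom theme in ggplot2_explore.R is not reproduced here. What follows is a hypothetical stand-in: the name theme_pub and its settings are assumptions. It shows a common pattern, which is to start from a built-in theme and override individual elements with theme().

```r
library(ggplot2)

# Hypothetical custom theme: theme_bw() with a few elements overridden
theme_pub <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(panel.grid.major = element_blank(),   # remove major gridlines
          panel.grid.minor = element_blank(),   # remove minor gridlines
          axis.text = element_text(colour = "black"),
          legend.position = "bottom",
          strip.background = element_rect(fill = "white"))
}

# Hypothetical toy data
df <- data.frame(Depth = c(2, 5, 10, 15), NumCorals = c(40, 55, 38, 20))
p <- ggplot(df, aes(Depth, NumCorals)) + geom_point() + theme_pub()
```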

ggthemes package

Exercises in code ending with… Recreating this plot!

Coordinate systems
Coordinate systems map the position of objects onto the plane of the plot using (x, y) coordinates. There is potential for more dimensions, but ggplot2 is not yet capable of this.
Examples: Cartesian, semi-log, polar
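This sketch applies two of the coordinate systems to the same hypothetical bar chart. A semi-log plot can be produced with coord_trans(y = "log10"), which is not shown here.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(Zone = c("Shallow", "Deep"), NumCorals = c(40, 55))
bars <- ggplot(df, aes(Zone, NumCorals)) + geom_col()

# Cartesian (the default); coord_cartesian() zooms without dropping data
p_cart <- bars + coord_cartesian(ylim = c(0, 60))

# Polar: the x aesthetic becomes the angle and y the radius
p_polar <- bars + coord_polar()
```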