Graphical Descriptives in (Base) R

EPID 799C, Wed Sep 12 2017

Today’s Overview
- Lecture & practice: back to births
- Homework 1: graphics & recoding
- Lecture: primer on info-viz theory (groundwork for ggplot2 next week)

Graphics in Base R Using births

Base Graphics
- Why R for graphics? Fast, flexible, etc. Yes, you get super powers.
- Why (not) base R for graphics? We want to take advantage of higher-level, more human-friendly abstractions.

Base Graphics
Generally two flavors:
- Functions that accept raw data (like vectors) as arguments
- Functions that accept more complex objects built from data (like tables, models, shapefiles)
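A minimal sketch of the two flavors, using R’s built-in mtcars data as a stand-in for births (an assumption so the code runs anywhere). The same plot() call draws a scatterplot from two vectors, but dispatches to different methods when handed a table or a fitted lm model.

```r
# Flavor 1: functions that take raw vectors
wt  <- mtcars$wt    # car weight (1000 lbs)
mpg <- mtcars$mpg   # miles per gallon
plot(wt, mpg)       # scatterplot from two numeric vectors

# Flavor 2: functions that take objects built from data
cyl_tbl <- table(mtcars$cyl)        # a contingency table object
plot(cyl_tbl)                       # plot() dispatches to its table method
fit <- lm(mpg ~ wt, data = mtcars)  # a fitted linear model object
op <- par(mfrow = c(2, 2))          # 2 x 2 grid for the four diagnostic panels
plot(fit)                           # plot() dispatches to plot.lm()
par(op)                             # restore the previous layout
```

The point of the design: you rarely need a separate function per object type, because plot() looks at the class of its first argument.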

Key Functions for Base Graphics
- Main functions: plot() (the multitool), hist(), barplot(), boxplot()
- Parameters: col=, xlab=, ylab=, pch= (point character), main=
- Helpful data helpers: jitter(), density()
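A quick tour of the functions and parameters above, again on the built-in mtcars data (a stand-in, not the course data). The titles, labels, and the choice of pch = 19 are illustrative only.

```r
# Main functions + common parameters
plot(mtcars$wt, mtcars$mpg,
     col  = "darkblue",                 # point color
     pch  = 19,                         # point character: 19 = solid circle
     main = "Fuel economy vs. weight",  # title
     xlab = "Weight (1000 lbs)",        # x-axis label
     ylab = "Miles per gallon")         # y-axis label

hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
barplot(table(mtcars$gear), xlab = "Number of gears", ylab = "Count")
boxplot(mtcars$mpg, ylab = "Miles per gallon")

# Helpers: density() returns an object that plot() knows how to draw
mpg_density <- density(mtcars$mpg)
plot(mpg_density, main = "Kernel density of mpg")

# jitter() adds small random noise, handy for integer-valued data
set.seed(1)
jittered_gear <- jitter(mtcars$gear)
```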

Let’s Try
Create a scatterplot of wksgest and mage using plot(). Please note: there are faster, more intuitive ways to do all of this right around the corner!
D’oh! Overplotting! Use the jitter() function to help.
Let’s try colors:
- Create an empty vector called my_colors with the same length as the other variables, using rep() with length() or nrow().
- Using square brackets, assign “red” or “blue” to my_colors when cigdur is “Y” or “N”, respectively.
- Use plot() with the col=my_colors argument to plot with colors.
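Since births isn’t bundled with R, here is a sketch on a small simulated stand-in. The data frame births_demo and all of its values are hypothetical; only the column names mage, wksgest, and cigdur come from the exercise. It follows the steps above (rep() + nrow(), square-bracket assignment, col=), then shows an equivalent one-line lookup with a named vector.

```r
set.seed(799)  # reproducible fake data
n <- 2000
# Hypothetical stand-in for births: integer-valued ages and weeks overplot heavily
births_demo <- data.frame(
  mage    = sample(15:45, n, replace = TRUE),
  wksgest = sample(30:42, n, replace = TRUE),
  cigdur  = sample(c("Y", "N"), n, replace = TRUE, prob = c(0.1, 0.9))
)

plot(births_demo$mage, births_demo$wksgest)                  # points stack on the same spots
plot(jitter(births_demo$mage), jitter(births_demo$wksgest))  # jitter() spreads them out

# Step by step, as in the exercise
my_colors <- rep(NA, nrow(births_demo))
my_colors[births_demo$cigdur == "Y"] <- "red"
my_colors[births_demo$cigdur == "N"] <- "blue"
plot(jitter(births_demo$mage), jitter(births_demo$wksgest), col = my_colors)

# Equivalent lookup: index a named vector of colors by the category value
my_colors2 <- unname(c(Y = "red", N = "blue")[births_demo$cigdur])
```

The named-vector version scales better when there are many categories, and any NA in cigdur simply yields an NA color.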

Let’s Try: scatterplots, cont.
- Put a title on the graph using the main= argument to plot().
- Add x and y labels using the xlab= and ylab= arguments to plot().
- Change the marker type using the pch= option (try ".", or google the numeric codes that translate to symbols).
- Let’s add another “layer” with points(), lines(), or abline(). Calculate the mean of each variable and place this point on the graph using points().
- Place a green vertical and a green horizontal dashed line on the graph using abline() and the col= and lty= parameters.
- Now save the plot by calling pdf("plot.pdf") before the plotting functions and dev.off() afterwards.
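A sketch of the layering and saving steps, with mtcars standing in for births. The output path from tempfile() is an assumption so the example doesn’t litter your working directory; in class you would write pdf("plot.pdf") instead. The regression line is an extra layer not in the exercise.

```r
out_file <- tempfile(fileext = ".pdf")

pdf(out_file)                           # open the PDF device *before* plotting
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy vs. weight",  # title
     xlab = "Weight (1000 lbs)",        # x label
     ylab = "Miles per gallon",         # y label
     pch  = 1)                          # open circles
# Layer 1: a point at the two means
points(mean(mtcars$wt), mean(mtcars$mpg), pch = 19, col = "red")
# Layer 2: green dashed reference lines through the means
abline(v = mean(mtcars$wt),  col = "green", lty = "dashed")
abline(h = mean(mtcars$mpg), col = "green", lty = "dashed")
# Layer 3: a fitted regression line
abline(lm(mpg ~ wt, data = mtcars))
dev.off()                               # close the device so the file is written
```

Forgetting dev.off() is the usual reason a saved PDF comes out empty or unreadable.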

Let’s Try: other plots
- Create a boxplot of mage using… boxplot()!
- Create a histogram of mdif using hist(). Then change the bins with breaks=0:100.
- Create a table of mage, then plot() and barplot() it.
- Create a table of cigdur vs. pnc5; plot() and barplot() again.
- Create a sample() of the dataset with 1000 random rows and a few columns, then plot() it.
- Create a boxplot of mage by preterm_f, pnc5_f, or cigdur_f using the ~ operator.
- Plot the density() of mage.
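A sketch of the grouped boxplot, the row sample, and density() on hypothetical data. births_demo, its values, and the "term"/"preterm" labels are made up; only the column names echo the exercise. The injected NAs show why na.rm = TRUE matters.

```r
set.seed(12)
n <- 5000
# Hypothetical stand-in for births
births_demo <- data.frame(
  mage      = round(rnorm(n, mean = 28, sd = 6)),
  mdif      = sample(1:9, n, replace = TRUE),
  wksgest   = sample(30:42, n, replace = TRUE),
  preterm_f = factor(sample(c("term", "preterm"), n, replace = TRUE, prob = c(0.9, 0.1)))
)
births_demo$mage[sample(n, 50)] <- NA  # a few missing values, as real data often has

# Boxplot by group with the formula interface; data= avoids repeating births_demo$
boxplot(mage ~ preterm_f, data = births_demo)

# 1000 random rows, three columns; plot() on a data frame draws a scatterplot matrix
births_sample <- births_demo[sample(nrow(births_demo), 1000), c("mage", "mdif", "wksgest")]
plot(births_sample)

# density() stops with an error on NA unless na.rm = TRUE
plot(density(births_demo$mage, na.rm = TRUE))
```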

Answers
#.............................
# Graphical Exploration
# Base R graphical experiments...
plot(births$mage, births$wksgest)
plot(jitter(births$mage), jitter(births$wksgest), pch = ".")  # jitter() eases overplotting
my_colors = rep(NA, nrow(births))
my_colors[births$cigdur == "Y"] = "red"
my_colors[births$cigdur == "N"] = "blue"
plot(jitter(births$mage), jitter(births$wksgest), pch = ".", col = my_colors)
points(mean(births$mage, na.rm = TRUE), mean(births$wksgest, na.rm = TRUE))
abline(v = mean(births$mage, na.rm = TRUE), col = "green", lty = "dashed")
abline(h = mean(births$wksgest, na.rm = TRUE), col = "green", lty = "dashed")
boxplot(births$mage)
hist(births$mdif)
hist(births$mdif, breaks = 0:100)
table(births$cigdur, births$pnc5_f)
cig_tbl = table(births$cigdur, births$pnc5_f)
plot(cig_tbl)
barplot(cig_tbl)
births_sample = births[sample(nrow(births), 1000), c("mage", "mdif", "wksgest")]
plot(births_sample)
boxplot(births$mage ~ births$pnc5_f)  # try notch = TRUE
plot(density(births$mage, na.rm = TRUE))

Resources
- DataCamp
- The web!

Homework 1: Graphics & Recoding

Graphics on HW1
HW 1 questions #5 B & (optional) C, #6 b.a.
We don’t really have the tools yet to explore as much as we want to. More graphics in HW2.

Recoding race/ethnicity
- Subsetting (square-bracket assignment)
- Nested ifelse()
- The merge() function
- factor() directly

Let’s Try: recoding race

Answers
# Options for coding mrace
race_sample = data.frame(mrace = sample(5, 20, replace = TRUE))  # note the 5! (code 5 has no label below)
race_helper = data.frame(mrace = 1:4,
                         race1 = c("White", "Black", "American Indian or Alaska Native", "Other"))  # could read as csv
race_coded = merge(race_sample, race_helper)  # defaults to inner join! Will drop non-matches without param help.
race_coded = merge(race_sample, race_helper, all.x = TRUE, all.y = FALSE)  # left join: unmatched codes get NA
race_coded$race2 = NA
race_coded$race2[race_coded$mrace == 1] = "White"
race_coded$race2[race_coded$mrace == 2] = "Black"
race_coded$race2[race_coded$mrace == 3] = "American Indian or Alaska Native"
race_coded$race2[race_coded$mrace == 4] = "Other"
race_coded$race3 = ifelse(race_coded$mrace == 1, "White",
                   ifelse(race_coded$mrace == 2, "Black",
                   ifelse(race_coded$mrace == 3, "American Indian or Alaska Native",
                   ifelse(race_coded$mrace == 4, "Other", NA))))
race_coded$race_f = factor(race_coded$mrace, levels = 1:4,
                           labels = c("White", "Black", "American Indian or Alaska Native", "Other"))
race_coded
str(race_coded)
# Thinking ahead to raceeth variable… or any other options
raceeth_helper = data.frame(race = c("White", rep("Black", 2), rep("American Indian or Alaska Native", 2)),
                            methic = c("N", "Y", "N", "Y", "N"),
                            race_eth = c("White nH", rep("Black", 2), rep("American Indian or Alaska Native", 2)))
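To see the “defaults to inner join!” warning in action, here is a tiny deterministic example. The codes in race_sample are hypothetical, chosen so that code 5 has no match in the helper table.

```r
# Hypothetical codes: 1-4 have labels, 5 does not
race_sample <- data.frame(mrace = c(1, 2, 5, 3, 5, 4))
race_helper <- data.frame(mrace = 1:4,
                          race1 = c("White", "Black",
                                    "American Indian or Alaska Native", "Other"))

inner <- merge(race_sample, race_helper)                # default inner join: the two 5s vanish
left  <- merge(race_sample, race_helper, all.x = TRUE)  # left join: the 5s stay, race1 is NA

nrow(inner)              # 4
nrow(left)               # 6
sum(is.na(left$race1))   # 2
```

Checking nrow() before and after a merge is a cheap habit that catches silently dropped records. Note also that merge() sorts the result by the key, so row order changes.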

Info-Viz Theory

Why Graphics?
The obvious:
- Powerfully conveys content
- Takes advantage of our powerful visual systems
- Reaches a broader audience than a table of numbers or a paragraph of findings
The less obvious:
- Can be a way to explore / understand data… if fast and intuitive enough!

High Level

High Level
- Graphics serve a story… when there’s a narrative.
- Graphical integrity: don’t cheat, on purpose or unintentionally.
- Maximize the “data-ink” ratio (i.e., minimize ink that doesn’t represent data).
- Consider data “words,” small multiples, and sentences!
Wouldn’t be a graphics lecture without a Tufte reference: Edward Tufte (2001). The Visual Display of Quantitative Information.

Graphics Serve a Story
Graphical Excellence: http://www.pointerpointer.com/

Graphical Integrity
Avoid:
- Distortion
- Chartjunk
- Dimensionality mixing (3D*)
…
See http://www.vox.com/2015/9/29/9417845/planned-parenthood-terrible-chart
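One common source of distortion is a bar chart whose value axis doesn’t start at zero. The sketch below uses made-up values and draws the same two bars with a zero baseline and with a truncated one. It also computes Tufte’s lie factor (size of effect shown in the graphic divided by size of effect in the data), under the simplifying assumption that the visible bar length equals the value minus the baseline.

```r
values   <- c(A = 100, B = 110)  # hypothetical data: a 10% difference
baseline <- 95                   # truncated axis starts at 95 instead of 0

op <- par(mfrow = c(1, 2))
barplot(values, ylim = c(0, 120), main = "Zero baseline")
# xpd = FALSE clips the bars to the plot region, so the axis really starts at 95
barplot(values, ylim = c(baseline, 120), xpd = FALSE, main = "Truncated axis")
par(op)

# Lie factor = (effect size shown in graphic) / (effect size in data)
effect_size    <- function(first, second) (second - first) / first
data_effect    <- effect_size(values[["A"]], values[["B"]])                        # 0.1
graphic_effect <- effect_size(values[["A"]] - baseline, values[["B"]] - baseline)  # (15 - 5) / 5 = 2
lie_factor     <- graphic_effect / data_effect                                     # about 20
```

A lie factor near 1 means the picture is faithful to the data; here the truncated axis exaggerates a 10% gap into a visible tripling.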

Low Level
- Pre-attentive attributes… and a side-note on color
- Reduce processing demands, chiefly through simplicity and Gestalt principles
Stephen Few (2009). Now You See It: Simple Visualization Techniques for Quantitative Analysis.
Stephen Few (2012). Show Me the Numbers: Designing Tables and Graphs to Enlighten.

(Some) Pre-attentive attributes of visual perception

And two theoretical side-notes on color…
1: Color group language
- Alpha (not greyscale, but “see-through-ness”)
- Brewer (is cool)! http://colorbrewer2.org/ : sequential, diverging, qualitative
- Grey (intensity)
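A base-R sketch of these color ideas on simulated data. adjustcolor() adds alpha; the palettes come from hcl.colors() (available since R 3.6.0), whose named schemes are HCL-based approximations rather than ColorBrewer itself. For the actual ColorBrewer palettes, the RColorBrewer package provides brewer.pal(). The palette names used here are taken from hcl.pals() and are one reasonable choice among many.

```r
set.seed(3)
x <- rnorm(5000)
y <- x + rnorm(5000)

# Alpha ("see-through-ness"): overlapping translucent points reveal density
see_through <- adjustcolor("black", alpha.f = 0.1)
plot(x, y, pch = 19, col = see_through)

# Brewer-style scheme types
seq_pal  <- hcl.colors(5, "Blues 3")   # sequential: ordered low -> high
div_pal  <- hcl.colors(5, "Blue-Red")  # diverging: two directions from a midpoint
qual_pal <- hcl.colors(5, "Dark 3")    # qualitative: unordered categories
grey_pal <- gray.colors(5)             # grey intensity

barplot(rep(1, 5), col = seq_pal, border = NA, axes = FALSE)
```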

2: Color is meaningful (a priori):
- Meaning-loaded
- Culture-specific
- Organization-specific: PMS 288, PMS 542 (http://styleguide.duke.edu/identity/color-palette/, http://identity.unc.edu/colors/). Blue tones matter to many people. Yet: “If you prick us, do we not bleed?” (The Merchant of Venice)
- Girls/women vs. boys/men color coding; aposematism (warning coloration). EMOTIONAL associations! Some semi-borne out through research.
- Also: LINKS (and visited ones, etc.). Note how this PPT theme messes with this.
Gendered color coding is heteronormative & dominant-culture reinforcing. Don’t do this. This is a classic example… but ALSO an oversimplification of culture, as if it were homogeneous and independent!
For more, check out: http://lifehacker.com/learn-the-basics-of-color-theory-to-know-what-looks-goo-1608972072

Gestalt Principles of Visual Perception
Simplicity, proximity, similarity, enclosure, closure, continuity, connection, figure & ground
http://graphicdesign.spokanefalls.edu/tutorials/process/gestaltprinciples/gestaltprinc.htm
http://www.smashingmagazine.com/2014/03/design-principles-visual-perception-and-the-principles-of-gestalt/
P.S. I’m leaving some out!

Think with a Grammar of Graphics (R: ggplot2, and other things)
- Data! Reshaping (long/wide) & statistical transforms are sometimes required. dplyr:: in two weeks!
- Aesthetic “mappings”, e.g., x position in space → var1, color → var2, shape → var3
- Geometries: column, bar, boxplot… violin, map, slopegraph, etc.
- Scales
- Coordinate systems
- Positional adjustments (tweaks)
- Facets (small multiples)
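Before ggplot2 makes facets a one-liner, here is a rough base-R approximation of small multiples, again on mtcars as stand-in data: split the data by a grouping variable, then draw one panel per group with shared axis ranges so the panels are directly comparable.

```r
groups  <- split(mtcars, mtcars$cyl)  # list of data frames, one per cylinder count
x_range <- range(mtcars$wt)           # shared scales across panels
y_range <- range(mtcars$mpg)

op <- par(mfrow = c(1, length(groups)))  # one row of panels
for (g in names(groups)) {
  d <- groups[[g]]
  plot(d$wt, d$mpg, xlim = x_range, ylim = y_range,
       main = paste(g, "cylinders"),
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
}
par(op)
```

Every piece of the grammar is still here (data, mapping of wt and mpg to x and y, a point geometry, fixed scales, facets), just wired up by hand, which is exactly the bookkeeping ggplot2 automates.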

Next Week ggplot2!