DATA VISUALISATION (QUANTITATIVE).

Slides:



Advertisements
Similar presentations
Statistics 100 Lecture Set 7. Chapters 13 and 14 in this lecture set Please read these, you are responsible for all material Will be doing chapters
Advertisements

Stat 350 Lab Session GSI: Yizao Wang Section 016 Mon 2pm30-4pm MH 444-D Section 043 Wed 2pm30-4pm MH 444-B.
Data Description Tables and Graphs Data Reduction.
CHAPTER 1: Picturing Distributions with Graphs
INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY SESSION 19 – GETTING DATA AND VISUALIZING IT SEAN J. TAYLOR.
Chapter 1 Descriptive Analysis. Statistics – Making sense out of data. Gives verifiable evidence to support the answer to a question. 4 Major Parts 1.Collecting.
Baburao Kamble (Ph.D) University of Nebraska-Lincoln
Tutor: Prof. A. Taleb-Bendiab Contact: Telephone: +44 (0) CMPDLLM002 Research Methods Lecture 9: Quantitative.
STATISTICS!!! The science of data. What is data? Information, in the form of facts or figures obtained from experiments or surveys, used as a basis for.
+ The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 1: Exploring Data Introduction Data Analysis: Making Sense of Data.
 Quantitative Numbers  Qualitative Descriptions.
Chapter 1 Review MDM 4U Mr. Lieff. 1.1 Displaying Data Visually Types of data Quantitative Discrete – only whole numbers are possible Continuous – decimals/fractions.
A Brief Introduction to R Programming Darren J. Fitzpatrick, PhD The Bioinformatics Support Team 27/08/2015.
AP STATISTIC LESSON 1-1 EXPLORING DATA DISPLAYING DISTRIBUTION WITH GRAPHS.
1 Chapter 3 Looking at Data: Distributions Introduction 3.1 Displaying Distributions with Graphs Chapter Three Looking At Data: Distributions.
CADA Final Review Assessment –Continuous assessment (10%) –Mini-project (20%) –Mid-test (20%) –Final Examination (50%) 40% from Part 1 & 2 60% from Part.
Presenting and Summarizing Data The Goal Is To Put Together A Coherent and Meaningful Story.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
Describing Data: Graphical Methods ● So far we have been concerned with moving from asking a research question to collecting good quality empirical data.
Ggplot2 A cool way for creating plots in R Maria Novosolov.
Writing Prompts are FUN!!! 8/28/06 In complete sentences write down something you would measure in kilometers and something you would measure in millimeters.
Introduction to Graphics in R 3/12/2014. First, let’s get some data Load the Duncan dataset It’s in the car package. Remember how to get it? – library(car)
Bell Ringer You will need a new bell ringer sheet – write your answers in the Monday box. 3. Airport administrators take a sample of airline baggage and.
Linear Best Fit Models Learn to identify patterns in scatter plots, and informally fit and use a linear model to solve problems and make predictions as.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 1 Exploring Data Introduction Data Analysis:
Frequency Distributions Chapter 2. Descriptive Statistics Distributions are part of descriptive statistics…we are learning how to describe some data by.
IENG-385 Statistical Methods for Engineers SPSS (Statistical package for social science) LAB # 1 (An Introduction to SPSS)
R PROGRAMMING FOR SQL DEVELOPERS Kiran Math Developer : Proterra in Greenville SC
Stem-and-Leaf Plots …are a quick way to arrange a set of data and view its shape or distribution A key in the top corner shows how the numbers are split.
Unit 1 Review. 1.1: representing data Types of data: 1. Quantitative – can be represented by a number Discrete Data Data where a fraction/decimal is not.
Visual Displays of Data Chapter 3. Uses of Graphs Positive and negative uses – Can accurately and succinctly present information – Can reveal/conceal.
Chapter 1: Exploring Data
Overview of R and ggplot2 for graphics
Ggplot2 Wu Shaohuan.
Welcome to AP Stats!.
Add More Zing to your Dashboards – Creating Zing Plot Gadgets
Using R Graphs in R.
AP Statistics Variables.
Next Generation R tidyr, dplyr, ggplot2
Chapter 1: Exploring Data
Summary Statistics in R Commander
CHAPTER 1: Picturing Distributions with Graphs
Ggplot2 I EPID 799C Mon Sep
Data Visualization using R
Advantages and disadvantages of types of graphs
CHAPTER 1: Picturing Distributions with Graphs
Chapter 1 Data Analysis Ch.1 Introduction
Chapter 1: Exploring Data
Let’s say you are planning to study Kim Possible’s pet
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
Welcome!.
Overview of R and ggplot2 for graphics
CHAPTER 1: Picturing Distributions with Graphs
Welcome to AP Statistics
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Different kinds of graphs
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
R course 6th lecture.
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Chapter 1: Exploring Data
CHAPTER 1 Exploring Data
Presentation transcript:

DATA VISUALISATION (QUANTITATIVE)

Visualising Data More important than statistical tests!(?) Telling a story in a graph R has some great packages for visualising your data, especially ggplot

ggplot2 Part of (and complementary to) the tidyverse collection Beautiful and informative code, for beautiful and informative graphs Expects tidy data; one observation per row (See tidyverse introduction from this morning)

Plan for the Session Introduce ggplot2 Explain how it works, and show some example output Practice on some data we’re interested in

ggplot2 Syntax Load in packages library(tidyverse) Data iris %>% ggplot(aes(x=Sepal.Length, y=Petal.Length)) + geom_point() Data Pipe data into the plot Build ggplot object Add feature to ggplot object Plot feature

ggplot2 Syntax - Pipes Remember, this is equivalent to… library(tidyverse) iris %>% ggplot(aes(x=Sepal.Length, y=Petal.Length)) + geom_point() Remember, this is equivalent to…

ggplot2 Syntax - Pipes library(tidyverse) ggplot(iris, aes(x=Sepal.Length, y=Petal.Length)) + geom_point()

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length)) + geom_point()

geom_x Multiple possible “geoms” Which geom you want will depend on what your variables are like: How many? Are they discrete (categorical) or continuous (numerical)? Most useful geoms can be found on the ggplot cheat sheat A full list of geoms is available on the ggplot reference page Google to find the geom you want, e.g.: “ggplot scatter graph” will return results showing that you want geom_point()

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length)) + geom_point()

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length)) + geom_smooth()

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length)) + geom_point() + geom_smooth()

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length)) + geom_point() + geom_smooth(method = “lm”)

Aesthetics: aes() Tells ggplot what variables to plot, and which visual features should represent them Possible aesthetics include: x y colour / color fill shape

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length)) + geom_point()

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length, colour = Species)) + geom_point()

ggplot2 syntax iris %>% ggplot(aes(x = Sepal.Length, y = Petal.Length, colour = Species)) + geom_point() + geom_smooth(method = “lm”)

Visualising Distributions - Histograms iris %>% ggplot(aes(x = Sepal.Width)) + geom_histogram(binwidth = 0.1)

Visualising Distributions – Density Plots iris %>% ggplot(aes(x = Sepal.Length)) + geom_density()

Visualising Distributions – Density Plots iris %>% ggplot(aes(x = Sepal.Length, fill = Species)) + geom_density()

Visualising Distributions – Density Plots iris %>% ggplot(aes(x = Sepal.Length, fill = Species)) + geom_density(alpha = 0.5)

Ban the Bar Graph! Bar graphs can show us summaries about our data, but don’t tell us much about the underlying distributions.

One Alternative – Violinbox Plots iris %>% ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) + geom_violin(alpha = 0.5) + geom_boxplot(width = 0.2)

Time to Practice!

Example data Available in the .zip folder: haggis.csv Heights for two groups, based on lifetime breakfast habits: Porridge eaters Haggis eaters 64 Participants in each group (128 participants in total) Who will be taller?

SPSS and Microsoft Excel – Bar Graphs (Yuck!) *Error bars show SEM

So Haggis Eaters are Taller than Porridge Eaters, right? Well, let’s load the data and have a gander! TASK 1: Create a new (or use an existing) .Rmd file to make notes in Load in the tidyverse packages Load in the dataset hint: read_csv() Have a go, and then we’ll go through it together

So Haggis Eaters are Taller than Porridge Eaters, right? Well, let’s load the data and have a gander! TASK 1: Create a new (or use an existing) .Rmd file to make notes in Load in the tidyverse packages Load in the dataset hint: read_csv() Have a go, and then we’ll go through it together

Task 1 – Solution haggis <- read_csv("haggis.csv")

Task 2 – Visualise Height by Breakfast Group Create a violinbox plot, showing how height differs between haggis and porridge eaters. Have a go, and then we’ll go through it together

Task 2 – Solution haggis %>% ggplot(aes(x = group, y = height, fill = group)) + geom_violin(alpha = 0.5) + geom_boxplot(width = 0.1)

Task 3 – Check the Distributions Create two density plots to see: How the distribution of height differs between pop fans and classical fans How the distribution of age differs between pop fans and classical fans Have a go, and then we’ll go through it together

Task 3 – Solution a) haggis %>% ggplot(aes(x = height, fill = music_taste)) + geom_density(alpha = 0.5)

Task 3 – Solution b) haggis %>% ggplot(aes(x = age, fill = music_taste)) + geom_density(alpha = 0.5)

Task 4 – Does Age predict Height? Draw a scatter plot to see how age predicts height. Add a line showing the linear relationship between age and height. Have a go, and then we’ll go through it together

Task 4 – Solution haggis %>% ggplot(aes(x = age, y = height)) + geom_point() + geom_smooth(method = "lm")

Task 5 – Does Age interact with Breakfast Habits? Recreate the previous graph, but colour the points and line by participants' breakfast habits. Have a go, and then we’ll go through it together

Task 5 – Solution haggis %>% ggplot(aes(x = age, y = height, colour = group)) + geom_point() + geom_smooth(method = "lm")

Task 6 – Does Music Taste interact with Breakfast Preference? Create a violinbox plot as you did in Practice Question 2, but split by music taste *as well as* breakfast group. Have a go, and then we’ll go through it together

Task 6 – Solution haggis %>% ggplot(aes(x = group, y = height, fill = music_taste)) + geom_violin(alpha = 0.5) + geom_boxplot(width = 0.2, position = position_dodge(width = 0.9))

Conclusion + = ?

Conclusion Well… no. We don’t know anything about the possible causal relationships, and only looked at our data in an exploratory way Also, the data was kind of made up.

Real Conclusion Data Visualisation in R is fun, easy, and informative! If we’re looking at differences between groups, it’s important to not hide the distributions behind bar graphs and summary statistics. Googling for R solutions is a skill in itself.