Data Visualization with R (II)

Dr. Jieh-Shan George YEH (jsyeh@pu.edu.tw)

Outline
- Data Visualization with R
- Visualizing Different Types of Data
  - Univariate
  - Univariate Categorical
  - Bivariate Categorical
  - Bivariate: Continuous vs. Categorical
  - Bivariate: Continuous vs. Continuous
  - Bivariate: Continuous vs. Time

Data Visualization with R
Both anecdotally and according to Google Trends, R is the language and tool most closely associated with creating data visualizations: http://www.google.com/trends/explore?hl=en-US#q=R%20language,%20Data%20Visualization,%20D3.js,%20Processing.js&cmpt=q

Google Trends: R and Data Visualization

Graphs for Data Mining

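Hierarchical Clustering
hc <- hclust(dist(mtcars))   # cluster the cars on all 11 numeric variables
plot(hc)                     # draw the dendrogram
rect.hclust(hc, k = 4)       # outline the cut that yields 4 clusters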
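rect.hclust() only draws boxes on the dendrogram. A minimal sketch, using only base R's stats package, of recovering the same four groups as data with cutree(). The standardized variant via scale() is an illustrative assumption, motivated by the fact that dist() on the raw mtcars columns is dominated by large-valued variables such as disp and hp.

```r
# Hierarchical clustering of mtcars, then extracting the k = 4 groups
# that rect.hclust() outlines on the dendrogram.
hc <- hclust(dist(mtcars))        # Euclidean distance, complete linkage (the defaults)
groups <- cutree(hc, k = 4)       # named integer vector: car name -> cluster id
print(table(groups))              # cluster sizes

# Illustrative variant: standardize each column first so that large-valued
# variables (e.g. disp, hp) do not dominate the Euclidean distances.
hc_scaled <- hclust(dist(scale(mtcars)))
groups_scaled <- cutree(hc_scaled, k = 4)
print(table(raw = groups, scaled = groups_scaled))  # cross-tabulate the two solutions
```

The cross-tabulation shows whether the raw and standardized analyses put the same cars together.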

Decision Tree
library(rpart)
library(rpart.plot)
# classification tree predicting the number of cylinders from mpg
rp1 <- rpart(factor(cyl) ~ mpg, data = mtcars)
prp(rp1)   # plot the fitted tree
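A short sketch of inspecting the same kind of tree numerically, assuming only the rpart package (a recommended package shipped with R): print() lists the splits that prp() draws, and predict(type = "class") returns the fitted class for each car. The names cars and pred are illustrative. Accuracy is measured on the training data here, so it is optimistic.

```r
library(rpart)  # recommended package bundled with R

cars <- datasets::mtcars                # start from the unmodified built-in data
cars$cyl <- factor(cars$cyl)            # make the response explicitly categorical
rp1 <- rpart(cyl ~ mpg, data = cars)    # factor response -> classification tree

print(rp1)                              # text listing of the splits
pred <- predict(rp1, type = "class")    # predicted class for each training row
print(table(observed = cars$cyl, predicted = pred))  # confusion matrix
cat("Training accuracy:", mean(pred == cars$cyl), "\n")
```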

OTHERS

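Financial Time Series
quantmod: Quantitative Financial Modeling Framework
library(quantmod)
# Note: the Google Finance source (src = "google") has been discontinued and the
# YHOO ticker was delisted in 2017, so substitute a currently listed symbol.
getSymbols("YHOO", from = "2014-01-01")  # default source is Yahoo Finance; creates an xts object named YHOO
chartSeries(YHOO)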

barChart(YHOO)
candleChart(YHOO, multi.col = TRUE, theme = "white")
chartSeries(to.weekly(YHOO), up.col = "white", dn.col = "blue")
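getSymbols() needs network access and a ticker the data source still serves. As an offline sketch of what to.weekly() does, the block below builds a simulated daily OHLC series (random-walk prices, not market data) with the xts package that quantmod builds on, then aggregates it to weekly bars. All object names are hypothetical.

```r
library(xts)   # time-series class used by quantmod; provides to.weekly()

set.seed(42)
n <- 90
dates <- seq(as.Date("2014-01-01"), by = "day", length.out = n)
# Simulated random-walk closing prices: illustrative only, NOT real market data
close <- 100 * cumprod(1 + rnorm(n, mean = 0, sd = 0.01))
open  <- c(100, head(close, -1))          # each day opens at the previous close
daily <- xts(cbind(Open  = open,
                   High  = pmax(open, close) * 1.005,
                   Low   = pmin(open, close) * 0.995,
                   Close = close),
             order.by = dates)

weekly <- to.weekly(daily)   # one Open/High/Low/Close bar per week
print(head(weekly))
```

With quantmod loaded, an OHLC object such as weekly can be passed to chartSeries() in the same way as YHOO.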

ggplot2

ggplot2
The ggplot2 package, created by Hadley Wickham, offers a powerful graphics language for creating elegant and complex plots. Based on Leland Wilkinson's The Grammar of Graphics, ggplot2 lets you create graphs that represent both univariate and multivariate numerical and categorical data in a straightforward manner. Grouping can be represented by color, symbol, size, and transparency, and creating trellis plots (i.e., conditioning, called faceting in ggplot2) is relatively simple.
qplot() (for "quick plot") hides much of this complexity when creating standard graphs.

qplot()
The qplot() function can create the most common graph types. While it does not expose the full power of ggplot(), it covers a very wide range of useful plots. The format is:
qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=)
Notes:
- At present, ggplot2 cannot be used to create 3D graphs or mosaic plots.
- Use I(value) to set an aesthetic to a fixed value instead of mapping it to data. For example, size=z scales the size of the plotted points or lines according to the values of a variable z, whereas size=I(3) draws every point or line at a fixed size of 3.
- Since ggplot2 3.4.0, qplot() is deprecated: it still works but emits a warning, and ggplot() is recommended for new code.
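The I() rule is the qplot() form of a distinction that ggplot() makes explicit: an aesthetic inside aes() is mapped to data, while a value passed directly to the geom is set to a constant. A minimal sketch using the built-in mtcars data; mapping size to hp is an illustrative choice.

```r
library(ggplot2)

cars <- datasets::mtcars

# Mapping: size is an aesthetic inside aes(), so it varies with the data (here hp).
# Corresponds to qplot(wt, mpg, data = cars, size = hp).
p_mapped <- ggplot(cars, aes(x = wt, y = mpg, size = hp)) + geom_point()

# Setting: size is a fixed parameter outside aes(), so every point gets size 3.
# Corresponds to qplot(wt, mpg, data = cars, size = I(3)).
p_set <- ggplot(cars, aes(x = wt, y = mpg)) + geom_point(size = 3)

print(p_mapped)
print(p_set)
```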

Customizing ggplot2 Graphs
Unlike base R graphs, ggplot2 graphs are not affected by most of the options set with par(). Instead, they can be modified with the theme() function and by passing graphical parameters to qplot(). For greater control, use ggplot() together with the other functions the package provides; the components of a plot are combined with the + operator to build up the final plot.
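A sketch of that workflow: the plot is built piece by piece with +, then its appearance is adjusted with a complete theme (theme_bw()) followed by individual theme() tweaks. The title, labels, and chosen theme elements are illustrative, not taken from the slides.

```r
library(ggplot2)

# Each "+" adds a component and returns a new ggplot object.
p <- ggplot(datasets::mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per Gallon") +
  theme_bw() +                                     # complete theme replacing the grey default
  theme(plot.title = element_text(face = "bold"),  # then tweak individual elements
        panel.grid.minor = element_blank())
print(p)
```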

Example
# ggplot2 examples
library(ggplot2)

# create factors with value labels
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5),
                      labels = c("3gears", "4gears", "5gears"))
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))
mtcars$cyl <- factor(mtcars$cyl, levels = c(4, 6, 8),
                     labels = c("4cyl", "6cyl", "8cyl"))

# Kernel density plots for mpg,
# grouped by number of gears (indicated by fill color)
qplot(mpg, data = mtcars, geom = "density", fill = gear, alpha = I(.5),
      main = "Distribution of Gas Mileage", xlab = "Miles Per Gallon",
      ylab = "Density")
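For comparison, a sketch of the same density plot written with ggplot() instead of qplot(). The factor conversion is redone on datasets::mtcars so the block runs on its own, even in a session where mtcars has already been converted.

```r
library(ggplot2)

# Start from the pristine built-in data and label the gear levels.
cars <- transform(datasets::mtcars,
                  gear = factor(gear, levels = c(3, 4, 5),
                                labels = c("3gears", "4gears", "5gears")))

# One semi-transparent density curve per gear level.
p <- ggplot(cars, aes(x = mpg, fill = gear)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Gas Mileage",
       x = "Miles Per Gallon", y = "Density")
print(p)
```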

# Scatterplot of mpg vs. hp for each combination of gears and cylinders;
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data = mtcars, shape = am, color = am,
      facets = gear ~ cyl, size = I(3),
      xlab = "Horsepower", ylab = "Miles per Gallon")
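The faceted scatterplot as a ggplot() sketch, where facet_grid(gear ~ cyl) plays the role of qplot()'s facets = gear ~ cyl. The factor labels follow the example's own. No car in mtcars has 4 gears and 8 cylinders, so that panel is empty.

```r
library(ggplot2)

# Rebuild the labelled factors from the unmodified built-in data.
cars <- transform(datasets::mtcars,
                  gear = factor(gear, levels = c(3, 4, 5), labels = c("3gears", "4gears", "5gears")),
                  cyl  = factor(cyl,  levels = c(4, 6, 8), labels = c("4cyl", "6cyl", "8cyl")),
                  am   = factor(am,   levels = c(0, 1),    labels = c("Automatic", "Manual")))

# Rows of the grid = gear, columns = cyl; transmission shown by shape and color.
p <- ggplot(cars, aes(x = hp, y = mpg, shape = am, color = am)) +
  geom_point(size = 3) +
  facet_grid(gear ~ cyl) +
  labs(x = "Horsepower", y = "Miles per Gallon")
print(p)
```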

# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data = mtcars, geom = c("point", "smooth"),
      method = "lm", formula = y ~ x, color = cyl,
      xlab = "Weight", ylab = "Miles per Gallon",
      main = "Regression of MPG on Weight")
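A ggplot() sketch of the per-cylinder regressions. Because color is mapped to the factor cyl, geom_smooth() fits a separate linear model within each group; the sapply() step is an added check that computes the same three slopes directly with lm().

```r
library(ggplot2)

cars <- transform(datasets::mtcars,
                  cyl = factor(cyl, levels = c(4, 6, 8), labels = c("4cyl", "6cyl", "8cyl")))

# color = cyl creates three groups, so geom_smooth() fits one lm(y ~ x) per group.
p <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Regression of MPG on Weight", x = "Weight", y = "Miles per Gallon")
print(p)

# Slope of mpg on wt within each cylinder group, computed directly.
slopes <- sapply(split(cars, cars$cyl),
                 function(d) coef(lm(mpg ~ wt, data = d))[["wt"]])
print(slopes)
```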

# Boxplots of mpg by number of gears;
# observations (points) are overlaid and jittered
qplot(gear, mpg, data = mtcars, geom = c("boxplot", "jitter"),
      fill = gear, main = "Mileage by Gear Number",
      xlab = "", ylab = "Miles per Gallon")
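The boxplot-with-jitter as a ggplot() sketch. Two settings here are assumptions rather than part of the original: outlier.shape = NA hides the boxplot's own outlier points (every observation is already drawn by the jitter layer), and height = 0 restricts the jitter to the horizontal direction so mpg values are not perturbed.

```r
library(ggplot2)

cars <- transform(datasets::mtcars,
                  gear = factor(gear, levels = c(3, 4, 5), labels = c("3gears", "4gears", "5gears")))

p <- ggplot(cars, aes(x = gear, y = mpg, fill = gear)) +
  geom_boxplot(outlier.shape = NA) +     # avoid drawing outliers twice
  geom_jitter(width = 0.2, height = 0) + # spread points sideways only
  labs(title = "Mileage by Gear Number", x = "", y = "Miles per Gallon")
print(p)
```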

To learn more, see the ggplot2 reference site: http://docs.ggplot2.org/current/index.html (the documentation is now hosted at https://ggplot2.tidyverse.org/reference/).