Data Visualization
The commonality between science and art is in trying to see profoundly - to develop strategies of seeing and showing
Edward Tufte



Visualization skills
Humans are particularly skilled at processing visual information
An innate capability compared
Our ancestors were efficient visual processors who quickly detected threats and used this information to make effective decisions

A graphical representation of Napoleon Bonaparte's invasion of, and subsequent retreat from, Russia during 1812. The graph shows the size of the army, its location, and the direction of its movement. The temperature during the retreat is drawn at the bottom of the figure, which was drawn by Charles Joseph Minard in 1869 and is generally considered to be one of the finest graphs ever produced.

Wilkinson’s grammar of graphics
Data     A set of data operations that create variables from datasets
Trans    Variable transformations
Scale    Scale transformations
Coord    A coordinate system
Element  Graph and its aesthetic attributes
Guide    One or more guides

ggvis
An implementation of the grammar of graphics in R
The grammar describes the structure of a graphic
A graphic is a mapping of data to a visual representation

Data
Spreadsheet approach
  Use an existing spreadsheet or create a new one
  Export as CSV file
Database
  Execute SQL query
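The spreadsheet route can be sketched in base R as follows. This is a minimal illustration, not the course's own data: the file is a temporary one created on the spot, and the column names and values in `carbon_demo` are assumptions chosen only to resemble a year/CO2 table.

```r
# Illustrative values only: a tiny stand-in for a CSV exported from a spreadsheet.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("year,CO2",
             "1960,316.9",
             "1990,354.4",
             "2020,414.2"), csv_path)

# read.csv() expects a header row and comma separators by default.
carbon_demo <- read.csv(csv_path)

# Inspect the column names and types before plotting.
str(carbon_demo)
```

Checking the structure first helps catch columns that were read as text instead of numbers.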

Transformation
A transformation converts data into a format suitable for the intended visualization
    # compute a new column in carbon containing the relative change in CO2
    carbon$relCO2 = (carbon$CO2-280)/280
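To see what the relative-change formula produces, here is a small worked sketch. The data frame `carbon_demo` and its values are hypothetical, picked so the arithmetic is easy to check by hand; only the formula and the constant 280 come from the slide.

```r
# Hypothetical CO2 readings (ppm), chosen for easy arithmetic.
carbon_demo <- data.frame(year = c(1960, 1990, 2020),
                          CO2  = c(315, 350, 420))

baseline <- 280  # the constant used in the slide's formula

# Relative change: how far each reading sits above the baseline,
# expressed as a fraction of the baseline.
carbon_demo$relCO2 <- (carbon_demo$CO2 - baseline) / baseline

print(carbon_demo)
```

A reading of 350 is 70 above the baseline, so its relative change is 70/280 = 0.25.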

Coord
A coordinate system describes where things are located
Most graphs are plotted on a two-dimensional (2D) grid with x (horizontal) and y (vertical) coordinates
The default coordinate system for most graphic packages is Cartesian

Element
An element is a graph and its aesthetic attributes
Build a graph by adding layers
    library(ggvis)
    url <- '...'  # URL not preserved in this transcript
    carbon <- read.table(url, header=TRUE, sep=',')
    # Select year(x) and CO2(y) to create an x-y point plot
    # Specify red points, as you find that aesthetically pleasing
    carbon %>%
      ggvis(~year, ~CO2) %>%
      layer_points(fill:='red')
    # Notice how '%>%' is used for creating a pipeline of commands

Element

Scale
    # zero=TRUE forces the y scale to include zero
    carbon %>%
      ggvis(~year, ~CO2) %>%
      layer_points(fill:='red') %>%
      scale_numeric('y', zero=TRUE)

Axes
    # Compute a new column containing the relative change in CO2
    carbon$relCO2 = (carbon$CO2-280)/280
    carbon %>%
      ggvis(~year, ~relCO2) %>%
      layer_lines(stroke:='blue') %>%
      scale_numeric('y', zero=TRUE) %>%
      add_axis('y', title = "CO2 ppm of the atmosphere", title_offset=50) %>%
      add_axis('x', title = 'Year', format = '####')

Guides
Axes and legends are both forms of guides
They help the viewer to understand a graphic
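The deck's examples show axes but no legend, so the following sketch adds one. It is an illustration under assumptions: it uses R's built-in `mtcars` data rather than the course data, and it only builds the plot when the ggvis package is installed.

```r
# A sketch only: the plot is built when ggvis is available.
if (requireNamespace("ggvis", quietly = TRUE)) {
  library(ggvis)

  p <- mtcars %>%
    ggvis(~wt, ~mpg, fill = ~factor(cyl)) %>%   # map cylinder count to fill colour
    layer_points() %>%
    add_axis("x", title = "Weight (1000 lbs)") %>%
    add_axis("y", title = "Miles per gallon") %>%
    add_legend("fill", title = "Cylinders")     # the legend is a guide for the fill scale

  # In an interactive session, typing p renders the graphic.
}
```

Each guide explains one scale: the axes explain position, and the legend explains colour.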

Exercise Create a line plot using the data in the following table.

Histogram
    library(ggvis)
    url <- '...'  # URL not preserved in this transcript
    t <- read.table(url, header=TRUE, sep=',')
    # Convert Fahrenheit to Celsius, rounded to one decimal place
    t$C <- round((t$temperature - 32)*5/9, 1)
    t %>%
      ggvis(~C) %>%
      layer_histograms(width = 2, fill:='cornflowerblue') %>%
      add_axis('x', title='Celsius') %>%
      add_axis('y', title='Frequency')
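The two steps behind the histogram, converting units and counting values per bin, can be checked without plotting. This is a base R sketch with made-up readings in `temp_f`; ggvis may align its bins differently, so the counts here illustrate the idea and are not a reproduction of its output.

```r
# Made-up Fahrenheit readings, not the course data.
temp_f <- c(32, 50, 68, 86, 41, 59)

# Same conversion as the slide: Fahrenheit to Celsius, one decimal place.
temp_c <- round((temp_f - 32) * 5 / 9, 1)

# Bins of width 2, mirroring width = 2 in layer_histograms().
breaks <- seq(floor(min(temp_c) / 2) * 2,
              ceiling(max(temp_c) / 2) * 2 + 2,
              by = 2)

# Count how many readings fall in each half-open bin [lower, upper).
counts <- table(cut(temp_c, breaks = breaks, right = FALSE))
print(counts[counts > 0])
```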

Bar graph
    library(ggvis)
    library(RMySQL)
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    # Query the database and create a data frame for use with R
    d <- dbGetQuery(conn, "SELECT productLine from Products;")
    # Plot the number of products in each product line by specifying the appropriate column name
    d %>%
      ggvis(~productLine) %>%
      layer_bars(fill:='chocolate') %>%
      add_axis('x', title='Product line') %>%
      add_axis('y', title='Count')

Exercise Create a bar graph using the data in the following table

Scatterplot
    library(ggvis)
    library(RMySQL)
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    # Get the monthly value of orders
    d <- dbGetQuery(conn, "SELECT MONTH(orderDate) AS orderMonth,
        sum(quantityOrdered*priceEach) AS orderValue
        FROM Orders, OrderDetails
        WHERE Orders.orderNumber = OrderDetails.orderNumber
        GROUP BY orderMonth;")
    # Plot order value by month
    # Show the points and the line
    d %>%
      ggvis(~orderMonth, ~orderValue/1000000) %>%
      layer_lines(stroke:='blue') %>%
      layer_points(fill:='red') %>%
      add_axis('x', title = 'Month') %>%
      add_axis('y', title='Order value (millions)', title_offset=30)

Scatterplot

    library(ggvis)
    library(RMySQL)
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    d <- dbGetQuery(conn, "SELECT YEAR(orderDate) AS orderYear,
        MONTH(orderDate) AS Month,
        sum((quantityOrdered*priceEach)) AS Value
        FROM Orders, OrderDetails
        WHERE Orders.orderNumber = OrderDetails.orderNumber
        GROUP BY orderYear, Month;")
    # Plot order value by month and display by year
    # ggvis expects grouping variables to be a factor, so convert
    d$Year <- as.factor(d$orderYear)
    d %>%
      group_by(Year) %>%
      ggvis(~Month, ~Value/1000, stroke = ~Year) %>%
      layer_lines() %>%
      add_axis('x', title = 'Month') %>%
      add_axis('y', title='Order value (thousands)', title_offset=50)

Scatterplot

Bar graph
    # Divide by 1000 so the values match the axis title
    d %>%
      group_by(Year) %>%
      ggvis(~Month, ~Value/1000, fill = ~Year) %>%
      layer_bars() %>%
      add_axis('x', title = 'Month') %>%
      add_axis('y', title='Order value (thousands)', title_offset=50)

Multiple files
    library(ggvis)
    library(sqldf)
    library(RMySQL)
    options(sqldf.driver = "SQLite") # to avoid conflict with RMySQL
    # Connect to the database
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    orders <- dbGetQuery(conn, "SELECT 'Orders' as Category,
        MONTH(orderDate) AS month,
        sum((quantityOrdered*priceEach)) AS value
        FROM Orders, OrderDetails
        WHERE Orders.orderNumber = OrderDetails.orderNumber
        and YEAR(orderDate) = 2004
        GROUP BY Month;")
    payments <- dbGetQuery(conn, "SELECT 'Payments' as Category,
        MONTH(paymentDate) AS month,
        SUM(amount) AS value
        FROM Payments
        WHERE YEAR(paymentDate) = 2004
        GROUP BY MONTH;")
    # concatenate the two files
    m <- sqldf("select month, Category, value from orders
        UNION
        select month, Category, value from payments")
    m %>%
      group_by(Category) %>%
      ggvis(~month, ~value, stroke = ~Category) %>%
      layer_lines() %>%
      add_axis('x', title='Month') %>%
      add_axis('y', title='Value', title_offset=70)
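The concatenation step can also be done without SQL. The sketch below swaps in base R's `rbind()` for the sqldf `UNION`; the two small data frames are made-up stand-ins for the query results, with the same three column names.

```r
# Made-up stand-ins for the two query results.
orders_demo   <- data.frame(Category = "Orders",   month = 1:3,
                            value = c(100, 150, 120))
payments_demo <- data.frame(Category = "Payments", month = 1:3,
                            value = c(90, 140, 130))

# Stack the rows; both frames must have the same column names.
m_demo <- rbind(orders_demo, payments_demo)

# Sort by month then category so each month's pair sits together.
m_demo <- m_demo[order(m_demo$month, m_demo$Category), ]
print(m_demo)
```

One difference worth knowing: `rbind()` keeps duplicate rows, which matches SQL's `UNION ALL`, whereas `UNION` removes them. Here the Category column makes every row distinct, so both give the same result.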

Multiple files

Smoothing
    library(ggvis)
    library(sqldf)
    options(sqldf.driver = "SQLite") # to avoid conflict with RMySQL
    url <- '...'  # URL not preserved in this transcript
    t <- read.table(url, header=TRUE, sep=',')
    # Keep only the August readings
    t8 <- sqldf('select * from t where month = 8')
    t8 %>%
      ggvis(~year, ~temperature) %>%
      layer_lines(stroke:='red') %>%
      layer_smooths(se=TRUE, stroke:='blue') %>%
      add_axis('x', title='Year', format='####') %>%
      add_axis('y', title='Temperature (F)', title_offset=30)

Exercise
National GDP and fertility data have been extracted from a web site and saved as a CSV file
Compute the correlation between GDP and fertility
Do a scatterplot of GDP versus fertility with a smoother
Log transform both GDP and fertility and repeat the scatterplot with a smoother

Box plot
    library(ggvis)
    library(RMySQL)
    options(sqldf.driver = "SQLite") # to avoid conflict with RMySQL
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    d <- dbGetQuery(conn, "SELECT amount from Payments;")
    # Boxplot of amounts paid
    d %>%
      ggvis(~factor(0), ~amount) %>%
      layer_boxplots() %>%
      add_axis('x', title='Checks') %>%
      add_axis('y', title='')

Box plot

    library(ggvis)
    library(RMySQL)
    options(sqldf.driver = "SQLite") # to avoid conflict with RMySQL
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    d <- dbGetQuery(conn, "SELECT month(paymentDate) as month, amount from Payments;")
    # Boxplot of amounts paid, one box per month
    d %>%
      ggvis(~factor(month), ~amount) %>%
      layer_boxplots()
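The numbers a box plot summarizes can be inspected directly. This sketch uses base R's `boxplot.stats()` on made-up payment amounts; it illustrates the general idea of a box plot, and the exact quartile method used by `layer_boxplots()` is not confirmed here and may differ slightly.

```r
# Made-up payment amounts with one unusually large value.
amounts <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 500)

s <- boxplot.stats(amounts)

# Five values: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(s$stats)

# Values beyond 1.5 times the box height from the hinges are flagged
# and drawn as individual points.
print(s$out)
```

The median is the line inside the box, the hinges are the box edges, and the whiskers stop at the most extreme values that are not flagged as outliers.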

Box plot

Heatmap
    library(ggvis)
    library(RMySQL)
    options(sqldf.driver = "SQLite") # to avoid conflict with RMySQL
    # Connect to the database
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    d <- dbGetQuery(conn, 'SELECT count(*) as Frequency,
        productLine as Line, productScale as Scale
        from Products
        group by productLine, productScale')
    d %>%
      ggvis(~Scale, ~Line, fill = ~Frequency) %>%
      layer_rects(width = band(), height = band()) %>%
      # add frequency to each cell
      layer_text(text:=~Frequency, stroke:='white', align:='left', baseline:='top')
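The heatmap's input is a count for each pair of categories. As a sketch of that shape, the same kind of table can be built in base R with `table()`; the rows in `products_demo` are made up for illustration and are not the database contents.

```r
# Made-up product rows: one row per product.
products_demo <- data.frame(
  Line  = c("Classic Cars", "Classic Cars", "Planes", "Planes", "Planes"),
  Scale = c("1:18", "1:18", "1:72", "1:72", "1:18")
)

# Count products for every Line/Scale pair, in long format.
freq <- as.data.frame(
  table(Line = products_demo$Line, Scale = products_demo$Scale),
  responseName = "Frequency",
  stringsAsFactors = FALSE
)

# SQL's GROUP BY omits combinations with no rows, so drop the zero counts.
freq <- freq[freq$Frequency > 0, ]
print(freq)
```

Each remaining row corresponds to one coloured cell in the heatmap.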

Heatmap

Interactive graphics
Function               Purpose
input_checkbox()       Check one or more boxes
input_checkboxgroup()  A group of checkboxes
input_numeric()        A spin box
input_radiobuttons()   Pick one from a set of options
input_select()         Select from a drop-down text box
input_slider()         Select using a slider
input_text()           Input text

Interactive graphics
Select a property from a drop-down list
    carbon$relCO2 = (carbon$CO2-280)/280
    carbon %>%
      ggvis(~year, ~relCO2) %>%
      layer_lines(stroke:=input_select(c("red", "green", "blue"))) %>%
      scale_numeric('y', zero=TRUE) %>%
      add_axis('y', title = "CO2 ppm of the atmosphere", title_offset=50) %>%
      add_axis('x', title = 'Year', format='####')

Interactive graphics
Select a numeric value with a slider
    carbon$relCO2 = (carbon$CO2-280)/280
    slider <- input_slider(1, 5, label = "Width")
    select_color <- input_select(label='Color', c("red", "green", "blue"))
    carbon %>%
      ggvis(~year, ~relCO2) %>%
      layer_lines(stroke:=select_color, strokeWidth:=slider) %>%
      scale_numeric('y', zero=TRUE) %>%
      add_axis('y', title = "CO2 ppm of the atmosphere", title_offset=50) %>%
      add_axis('x', title = 'Year', format='####')

dplyr
Designed to work with ggvis and %>%
Function     Purpose
filter()     Select rows
select()     Select columns
arrange()    Sort rows
mutate()     Add new columns
summarize()  Compute summary statistics

dplyr
    library(sqldf)
    library(dplyr)
    options(sqldf.driver = "SQLite") # to avoid conflict with RMySQL
    url <- '...'  # URL not preserved in this transcript
    t <- read.table(url, header=TRUE, sep=',')
    # filter
    sqldf("select * from t where year = 1999")
    filter(t, year==1999)
    # select
    sqldf("select temperature from t")
    select(t, temperature)
    # a combination of filter and select
    sqldf("select * from t where year > 1989 and year < 2000")
    select(t, year, month, temperature) %>% filter(year > 1989 & year < 2000)
    # arrange
    sqldf("select * from t order by year desc, month")
    arrange(t, desc(year), month)
    # mutate -- create a new column
    t_SQL <- sqldf("select year, month, temperature, (temperature-32)*5/9 as CTemp from t")
    t_dplyr <- mutate(t, CTemp = (temperature-32)*5/9)
    # summarize
    sqldf("select avg(temperature) from t")
    summarize(t, mean(temperature))
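The verbs above are shown one at a time; in practice they are usually chained. The sketch below combines several in one pipeline on a made-up data frame `t_demo`, and runs only when dplyr is installed. `group_by()` is not in the table above but is used elsewhere in this deck for grouping.

```r
# A sketch only: the pipeline runs when dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Made-up temperature readings in Fahrenheit.
  t_demo <- data.frame(year = c(1998, 1998, 1999, 1999),
                       month = c(1, 8, 1, 8),
                       temperature = c(32, 77, 41, 86))

  by_year <- t_demo %>%
    mutate(CTemp = (temperature - 32) * 5 / 9) %>%  # add a Celsius column
    group_by(year) %>%                              # one group per year
    summarize(meanC = mean(CTemp)) %>%              # average within each group
    arrange(desc(year))                             # most recent year first

  print(by_year)
}
```

Reading the pipeline top to bottom gives the order of operations, which is the main appeal of `%>%` over nested function calls.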

dplyr & ggvis
    library(ggvis)
    library(dplyr)
    url <- '...'  # URL not preserved in this transcript
    t <- read.table(url, header=TRUE, sep=',')
    slider <- input_slider(1, 12, label="Month")
    # The slider value selects which month's readings are plotted
    t %>%
      ggvis(~year, ~temperature) %>%
      filter(month == eval(slider)) %>%
      layer_points() %>%
      add_axis('y', title = "Temperature", title_offset=50) %>%
      add_axis('x', title = 'Year', format='####')

Geographic data
ggmap supports multiple mapping systems, including Google maps
    library(ggplot2)
    library(ggmap)
    library(mapproj)
    library(RMySQL)
    options(sqldf.driver = "SQLite") # to avoid conflict with RMySQL
    # connect to the database
    conn <- dbConnect(RMySQL::MySQL(), "richardtwatson.com",
                      dbname="ClassicModels", user="db1", password="student")
    # Google maps requires lon and lat, in that order, to create markers
    d <- dbGetQuery(conn, "SELECT y(officeLocation) AS lon, x(officeLocation) AS lat FROM Offices;")
    # show offices in the United States
    # vary zoom to change the size of the map
    map <- get_googlemap('united states', markers=d, zoom=4)
    ggmap(map) +
      labs(x = 'Longitude', y = 'Latitude') +
      ggtitle('US offices')

Map

John Snow
1854 Broad Street cholera map
Water pump

Cholera map (now Broadwick Street)
    library(ggplot2)
    library(ggmap)
    library(mapproj)
    url <- '...'  # URL of the pumps file not preserved in this transcript
    pumps <- read.table(url, header=TRUE, sep=',')
    url <- '...'  # URL of the deaths file not preserved in this transcript
    deaths <- read.table(url, header=TRUE, sep=',')
    map <- get_googlemap('broadwick street, london, united kingdom', markers=pumps, zoom=15)
    ggmap(map) +
      labs(x = 'Longitude', y = 'Latitude') +
      ggtitle('Pumps and deaths') +
      geom_point(aes(x=longitude, y=latitude, size=count), color='blue', data=deaths) +
      xlim(-.14,-.13) +
      ylim(51.51,51.516)

Florence Nightingale

Florence Nightingale (code)

Key points
ggvis is based on a grammar of graphics
  Very powerful and logical
  Supports interactive graphics
You can visualize the results of SQL queries using R
The combination of MySQL and R provides a strong platform for data reporting