
Data Visualization: "The commonality between science and art is in trying to see profoundly - to develop strategies of seeing and showing" (Edward Tufte)


1 Data Visualization
"The commonality between science and art is in trying to see profoundly - to develop strategies of seeing and showing" (Edward Tufte)

2 Visualization skills
Humans are particularly skilled at processing visual information
This is a natural capability, whereas reading is a learned skill
Our ancestors were those who were efficient visual processors, quickly detected threats, and used this information to make effective decisions

3 A graphical representation of Napoleon Bonaparte's invasion of and subsequent retreat from Russia during 1812. The graph shows the size of the army, its location, and the direction of its movement. The temperature during the retreat is drawn at the bottom of the figure, which was drawn by Charles Joseph Minard in 1869 and is generally considered to be one of the finest graphs ever produced.

4 Wilkinson’s grammar of graphics
Data: A set of data operations that create variables from datasets (e.g., spreadsheets and databases such as Classic Models)
Trans: Variable transformations (converting data into a format suitable for the intended visualization)
Scale: Scale transformations (good for controlling the visualization of data)

5 Wilkinson’s grammar of graphics
Coord: A coordinate system describing where things are located (e.g., longitude and latitude for maps, and x-axis and y-axis for graphs)
Element: A graph and its aesthetic attributes (e.g., a scatterplot of year against CO2 emissions)
Guide: One or more guides (e.g., axes and legends, which help the reader interpret what is plotted in a graph)

6 ggvis An implementation of the grammar of graphics in R
The grammar describes the structure of a graphic
A graphic is a mapping of data to a visual representation
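As a minimal sketch of what "mapping data to a visual representation" can look like in ggvis, the example below contrasts mapping a column to a visual property (`=` with `~`) against setting a constant (`:=`). The data frame `df` and its columns are invented for illustration, and the plotting code is guarded so that the script still runs where ggvis is not installed.

```r
# Hypothetical data, invented for illustration only.
df <- data.frame(
  year  = c(2000, 2005, 2010, 2000, 2005, 2010),
  value = c(1.0, 1.5, 2.2, 0.8, 1.1, 1.3),
  group = c("a", "a", "a", "b", "b", "b")
)

if (requireNamespace("ggvis", quietly = TRUE)) {
  library(ggvis)
  # Mapping: the fill colour depends on the data column 'group'.
  p_mapped <- df %>% ggvis(~year, ~value) %>% layer_points(fill = ~group)
  # Setting: every point gets the same constant fill colour.
  p_set <- df %>% ggvis(~year, ~value) %>% layer_points(fill := 'red')
}
```

Typing `p_mapped` or `p_set` at the console renders the corresponding plot.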

7 Data
Spreadsheet approach: use an existing spreadsheet or create a new one, then export it as a CSV file
Database approach: execute an SQL query
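To illustrate the spreadsheet route, here is a small sketch that reads CSV content into a data frame. The CSV text is made up and passed inline so the example is self-contained; with a real export you would pass the file path or URL to `read.csv()` instead.

```r
# Made-up CSV content standing in for a file exported from a spreadsheet.
csv_text <- "year,CO2
1960,317
1980,339
2000,369"

# read.csv() parses comma-separated text; the first row supplies the column names.
carbon_demo <- read.csv(text = csv_text, header = TRUE)

# Inspect the result before plotting.
str(carbon_demo)
head(carbon_demo)
```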

8 Transformation
A transformation converts data into a format suitable for the intended visualization
# TRANSFORMATION:
url <- ''  # the URL is missing from this transcript
carbon <- read.table(url, header=T, sep=',')
head(carbon)
# compute a new column in carbon containing the relative change in CO2 since
# pre-industrial periods, when the value was 280ppm.
carbon$relCO2 = (carbon$CO2-280)/280
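A worked example of the relative-change formula may help. The three CO2 values below are chosen for easy arithmetic and are not taken from the course dataset; only the 280 ppm baseline comes from the slide.

```r
# Illustrative CO2 concentrations in ppm (not from the course data).
co2_ppm <- c(280, 350, 420)

# Pre-industrial baseline stated on the slide.
baseline <- 280

# Relative change since the baseline: (value - baseline) / baseline.
rel_co2 <- (co2_ppm - baseline) / baseline
print(rel_co2)
# 0.00 0.25 0.50, i.e. no change, 25% above, and 50% above the baseline
```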

9 Coord A coordinate system describes where things are located
Most graphs are plotted on a two-dimensional (2D) grid with x (horizontal) and y (vertical) coordinates
The default coordinate system is Cartesian (histogram)

10 Element An element is a graph and its aesthetic attributes
Build a graph by adding layers
library(ggvis)
library(readr)
# ELEMENT: CO2 EMISSION BY YEAR
carbon %>% ggvis(~year,~CO2) %>% layer_points(fill:='red')
# Use the pipe function (%>%) to create a pipeline of commands.
# The code above reads like a recipe. It says:
# 1. take the carbon data, then
# 2. use the package ggvis to plot year by CO2, and
# 3. specify the plot to contain red points.
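"Adding layers" can be sketched by piping more than one layer into the same plot. The small `carbon_demo` data frame below uses made-up values as a stand-in for the course's carbon data, and the plotting step is guarded so the script still runs without ggvis installed.

```r
# Made-up stand-in for the carbon data (values are illustrative).
carbon_demo <- data.frame(
  year = c(1960, 1980, 2000),
  CO2  = c(317, 339, 369)
)

if (requireNamespace("ggvis", quietly = TRUE)) {
  library(ggvis)
  # Two layers on the same x/y mapping: a grey line first, red points on top.
  p <- carbon_demo %>%
    ggvis(~year, ~CO2) %>%
    layer_lines(stroke := 'grey') %>%
    layer_points(fill := 'red')
}
```

Each `layer_*()` call adds one element to the graphic, so the order of the calls determines what is drawn on top.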

11 Element

12 Scale
# SCALE: GOOD IDEA TO HAVE A ZERO POINT FOR THE Y-AXIS (DON'T DISTORT THE SLOPE!)
carbon %>% ggvis(~year,~CO2) %>% layer_points(fill:='red') %>% scale_numeric('y',zero=T)
# perform steps 1-3 of the ELEMENT code, and then,
# 4. force the y-axis scale to include zero.
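A rough way to see why the zero point matters is to compare how much of the axis height the data occupies in each case. This is a back-of-the-envelope sketch with illustrative values; it ignores any padding a plotting package adds around the data.

```r
# Illustrative minimum and maximum CO2 values in ppm (not from the course data).
co2 <- c(315, 400)

# Axis fitted to the data: the change fills the whole axis height.
share_fitted <- diff(range(co2)) / (max(co2) - min(co2))

# Axis starting at zero: the same change fills only part of the height.
share_zero <- diff(range(co2)) / (max(co2) - 0)

print(share_fitted)  # 1
print(share_zero)    # 0.2125
```

With the fitted axis, an increase of 85 ppm spans the full height and looks dramatic; with a zero-based axis it spans about a fifth of the height, which matches the actual proportional change more closely.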

13 Axes
# AXES: HELP THE READER UNDERSTAND THE GRAPH
carbon %>% ggvis(~year,~relCO2) %>% layer_lines(stroke:='blue') %>% scale_numeric('y',zero=T) %>%
  add_axis('y', title = "CO2 ppm of the atmosphere", title_offset=50) %>%
  add_axis('x', title ='Year', format = '####')
# The code above says:
# 1. take the carbon data, then
# 2. use the package ggvis to plot year by relCO2, then
# 3. specify the plot to contain a continuous blue line, then
# 4. force the y-axis scale to include zero, then
# 5. add a title for the y-axis that is moved a bit to the left to improve readability, and
# 6. add a title for the x-axis, specifying a format of 4 consecutive digits for displaying year on the x-axis

14 Axes

15 Guides
Axes and legends are both forms of guides
Guides help the viewer to understand a graphic

16 Exercise
Create a point plot using the data in the following table. Add a title for both the x- and y-axes.
Year    Population (billions)
1804    1
1927    2
1960    3
1974    4
1987    5
1999    6
2012    7
2027    8
2046    9
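As a possible starting point for the exercise, the table can be entered as a data frame. The column names `year` and `billions` are my own choice; the plot itself is left to the reader.

```r
# The exercise table entered as a data frame (column names are assumptions).
population <- data.frame(
  year     = c(1804, 1927, 1960, 1974, 1987, 1999, 2012, 2027, 2046),
  billions = 1:9
)

# Years taken to add each successive billion, useful for checking the plot.
years_per_billion <- diff(population$year)
print(years_per_billion)
# 123 33 14 13 12 13 15 19
```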

17 Histogram
# HISTOGRAM: USEFUL FOR SHOWING THE DISTRIBUTION OF VALUES IN A SINGLE COLUMN
url <- ''  # the URL is missing from this transcript
t <- read.table(url, header=T, sep=',')
t$C <- round((t$temperature - 32)*5/9,1)
t %>% ggvis(~C) %>% layer_histograms(width = 2, fill:='cornflowerblue') %>%
  add_axis('x',title='Celsius') %>% add_axis('y',title='Frequency')
# width refers to the size of the bin.
# This means that the bin above the tick mark 10 contains all values in the range 9 to 11.
# The code above says:
# 1. store the url, then
# 2. read the url content as table t, then
# 3. create a new column in t that converts the Fahrenheit temperature to Celsius and rounds it to one decimal place, then
# 4. take the t data, then
# 5. use the package ggvis to plot Celsius temperature, then
# 6. specify the plot to be a histogram with width 2 and color cornflowerblue, then
# 7. add a title for the x-axis, and
# 8. add a title for the y-axis.
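The conversion and binning steps can be checked by hand with base R. The Fahrenheit readings below are invented, and `cut()` is used here only to mimic bins of width 2 centred on even numbers; it is a sketch of the idea, not a reproduction of how ggvis computes its bins internally.

```r
# Invented Fahrenheit readings.
fahrenheit <- c(49, 50, 51, 60.8, 68)

# Same conversion as on the slide: Fahrenheit to Celsius, one decimal place.
celsius <- round((fahrenheit - 32) * 5 / 9, 1)
print(celsius)
# 9.4 10.0 10.6 16.0 20.0

# Bins of width 2 with edges at odd numbers, so each bin is centred on an even number.
bins <- cut(celsius, breaks = seq(-1, 41, by = 2))
print(table(bins)[c("(9,11]", "(15,17]", "(19,21]")])
```

Three of the five readings fall in the bin from 9 to 11, which is the bar that would sit above the tick mark 10.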

18 Histogram

19 Exercise
Create a histogram of CO2 using the carbon data. Add a title for both the x- and y-axes.
url <- ''  # the URL is missing from this transcript
carbon <- read.table(url, header=T, sep=',')

20 Bar graph
# BAR GRAPH: USEFUL FOR GRAPHING CATEGORICAL DATA
library(DBI)
library(RMySQL)
# set a driver
m <- dbDriver("MySQL")
# connect to the database
conn <- dbConnect(m,user='student',password='student',host='wallaby.terry.uga.edu',dbname='ClassicModels')
# If the error "in .local(drv, ...): cannot allocate a new connection: 16 connections already opened"
# appears, run the two lines below to close the open connections, then connect again.
# If there is no problem, move on to query the database.
# cons <- dbListConnections(MySQL())
# for(con in cons) dbDisconnect(con)
# query the database and create a data frame for use with R
d <- dbGetQuery(conn,"select productLine from Products;")
# plot the number of products in each product line by specifying the appropriate column name
d %>% ggvis(~productLine) %>% layer_bars(fill:='chocolate') %>%
  add_axis('x',title='Product line') %>% add_axis('y',title='Count')
# The code immediately above says:
# 1. take the d data, then
# 2. use the package ggvis to plot productLine, then
# 3. specify the plot to be a bar graph with color chocolate, then
# 4. add a title for the x-axis, and
# 5. add a title for the y-axis.
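The heights of the bars are simply counts of rows per category. The sketch below shows that counting step in base R on a small made-up data frame `d_demo`, standing in for the query result `d`, so it runs without a database connection.

```r
# Made-up stand-in for the query result: one row per product.
d_demo <- data.frame(
  productLine = c("Classic Cars", "Motorcycles", "Classic Cars",
                  "Planes", "Classic Cars", "Motorcycles")
)

# table() counts how many rows fall in each category;
# these counts are what a bar graph of productLine would display.
counts <- table(d_demo$productLine)
print(counts)
# Classic Cars: 3, Motorcycles: 2, Planes: 1
```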

21 Bar graph

22 Exercise Using Classic Models, create a bar graph to show how many offices each country has.

