Visualization Analytics


Visualization Analytics Principles

Visual Analytics motivation

Mean of x: 9 (exact)
Sample variance of x: 11
Mean of y: 7.50 (to 2 decimal places)
Sample variance of y: 4.125 (plus/minus 0.003)
Correlation between x and y: 0.816 (to 3 decimal places)
Linear regression line: y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively)
Reference: Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with Identical Statistics but Dissimilar Graphics: A Follow up to the Anscombe Dataset". American Statistician. 61 (3): 248–254. doi:10.1198/000313007X2200
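
The summary statistics on this slide can be checked with a few lines of code. The sketch below is a minimal illustration, not part of the original slides: the data values are the standard published Anscombe (1973) quartet, which the slides do not list, and should be checked against a trusted copy. The names ANSCOMBE and summary are hypothetical.

```python
from math import sqrt

# Assumption: standard published Anscombe quartet values (not given in the slides).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Return the summary statistics listed on the slide for one data set."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)  # sum of squared deviations of x
    syy = sum((b - mean_y) ** 2 for b in y)  # sum of squared deviations of y
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # least-squares slope
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    for name, (x, y) in ANSCOMBE.items():
        s = summary(x, y)
        print(name, {k: round(v, 3) for k, v in s.items()})
```

The four data sets print nearly identical numbers, which is the motivation for plotting the data rather than relying on summary statistics alone.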

Shneiderman’s Event Quartet: #1 http://eventevent.github.io/papers/EVENT_2016_paper_7.pdf

Shneiderman’s Event Quartet: #2 Website statistics show high weekday visitation with gaps on weekends

Shneiderman’s Event Quartet: #3 Slowdown: fewer events per unit time, e.g. earthquake aftershocks

Shneiderman’s Event Quartet: #4

Visual Analytics: Where Art Meets Science
Background: "split-brain" research in the 1960s, such as that which later won Roger Sperry of Caltech a Nobel Prize.
The left brain is also referred to as the digital brain. It controls reading and writing, calculation, and logical thinking.
The right brain is referred to as the analog brain. It controls three-dimensional sense, creativity, and artistic senses.
Visual Analytics gives us an opportunity to use our whole brain. Visual Analytics includes static graphics and combines automated analysis techniques with interactive visualizations for effective understanding, reasoning, and decision making.
There are two applications of Visual Analytics: 1) messaging and 2) learning. Methods and tools are different for each application.

Table: Cancer Incidence by Type
Data from http://www.cancer.gov/cancertopics/types/commoncancers

Graph: Cancer Incidence by Type
In general a graphic is better than a table. A table is better in a few exceptions:
- Convey a handful of numbers
- Report precise values for lookup
- Small cases
Graphic better for comparisons
Data from http://www.cancer.gov/cancertopics/types/commoncancers

Pre-attentive processing

Target Selection Visual Cue: Color

Target Selection Visual Cue: Shape

Target Selection Visual Cue: Conjunction

Boundary Detection

Visual Analytics: Good graphical principles

Safety Graphics Wiki: General Principles
http://www.ctspedia.org/do/view/CTSpedia/BestPractices
Graphics are almost always better than tables but not all graphics are equal.
- Content: Every graphic should stand on its own. It should tell its story without a need for detailed explanatory text or supporting documents.
- Communication: Tailor each graphic to its primary communication purpose.
- Information: Maximize the data-to-ink ratio.
- Annotation: Provide legible text and information.
- Axes: Design axes to aid interpretation of a graph.
- Styles: Make symbols and plot lines distinct and readable.
- Techniques: Use established techniques to clarify the message.
- Types of plots: Use the simplest plot that is appropriate for the information to be displayed.
- Colors: Make use of color if appropriate for the medium of communication.

Graph 1a: Bar Chart of Distribution of Eye Irritation
Graph 2a in the paper: Bar Chart of Distribution of Eye Irritation (graph before enhancements). The data are given in http://support.sas.com/kb/39/166.html along with the code we used to create this graph. The data are the percent of subjects with eye irritation at five time points (weeks 1, 2, 4, 6, and 8) and at endpoint. There is a lot of ink here, but the main information is in the percent of subjects, with confidence intervals (or some measure of variability; it is not clear what it is). Using weeks and endpoint as categorical variables doesn’t show the time differences between them.
This example illustrates two common issues:
- It is not optimal in terms of data-to-ink ratio, as only the height of the bars is important, rather than the filled bars themselves. (At least this example has the virtue of starting the bars at zero.)
- Looking at what happens over time, it is not clear that the time points are not equidistant, and the endpoint is just another set of bars, not clearly distinguished from the ‘over time’ view.
From paper: 5.2. Maximize data-to-ink ratio
Our second example further illustrates the good graphing principle “maximize the data-to-ink ratio”. Graph 2a shows the percentage of subjects with eye redness over time in a study for three treatment groups. The graph is pleasing to the eye but has several possible areas for improvement. Much of the ink used in Graph 2a does not aid the reader in their interpretation of the data. In fact the important message of the plot is somewhat obscured. The main information is the percent of subjects, with confidence intervals (or some measure of variability; it is not clear what the measure of variability is). However, all the ink in the bars obscures this information. Only the height of the bars is important, rather than the filled bars themselves.
Another problem with this graph is that the x-axis represents the continuous variable of time as a categorical variable. The eye irritation was measured at weeks 1, 2, 4, 6, and 8, but the spacing on the x-axis makes it appear that the measurements were taken at equal time intervals. Additionally, the “End Point” is not clearly distinguished from the data at the specific weeks.
A more minor issue is the choice of colors. The plot indicates that the three treatment groups are Placebo, Drug A, and Drug B. Graph 2a uses sequential colors (light to dark), which might be a better choice for treatment groups that progress from low to high, such as placebo, low dose, and high dose. In this case, it would be better to choose colors that suggest a qualitative difference between the groups. Qualitative schemes do not imply magnitude differences between legend classes; hues are used to create the primary visual differences between classes. (We do recognize, however, that depending on what Drug A and Drug B are, a sequential scheme might be appropriate.) Graphs that use an appropriate color scheme will communicate their messages more effectively. There are many resources to help statisticians choose effective color schemes, such as ColorBrewer 2.0: Color Advice for Maps [8].
- Lots of ink doesn’t help the message
- Not clear what the measure of variability is
- Endpoint is just another set of bars, not distinguished from ‘over time’ info

Graph 1b: Dotplot of Distribution of Eye Irritation
This graph also treats the time in weeks as a quantitative variable, rather than a categorical one. It is subtle here, but weeks 1 and 2 are now visually closer together than weeks 2, 4, 6, and 8 are to each other. The endpoint is clearly separated from the time in weeks.
- Main message not obscured by all the ink
- Weeks 1 and 2 visually closer than weeks 2, 4, 6 and 8
- Endpoint clearly separated from time in weeks
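
The difference between a categorical and a quantitative time axis can be sketched by comparing where each visit lands on the x-axis. This is only an illustration under stated assumptions: the visit weeks come from the slide text, while the function names and the endpoint offset of 3.0 are hypothetical choices, not taken from the original graphs or their SAS code.

```python
# Visit weeks as stated on the slides.
WEEKS = [1, 2, 4, 6, 8]


def categorical_positions(weeks):
    """Bar-chart style: each visit and the endpoint get equally spaced slots."""
    return list(range(len(weeks) + 1))


def quantitative_positions(weeks, endpoint_gap=3.0):
    """Dot-plot style: visits sit at their actual week.

    The endpoint is placed endpoint_gap units past the last visit
    (a hypothetical offset) so that it reads as separate from the time axis.
    """
    xs = [float(w) for w in weeks]
    return xs + [xs[-1] + endpoint_gap]


def gaps(positions):
    """Distances between neighbouring x-axis positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    print("categorical gaps: ", gaps(categorical_positions(WEEKS)))
    print("quantitative gaps:", gaps(quantitative_positions(WEEKS)))
```

With the categorical axis every gap is the same, so the one-week interval between weeks 1 and 2 looks as long as the two-week intervals that follow. With the quantitative axis the first gap is half the size of the later ones, and the endpoint sits apart from the weekly visits.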

Pie Charts & Quantitative Information Can you describe the data? Is there a pattern in the areas?

Dot Charts & Quantitative Information Did you realize some were 50% smaller than others?

Take home message....

Cognitive scale of visual cues for quantitative variables

Cognitive scale of visual cues for qualitative variables

New Drug Application (NDA)