Scientific Figure Design

Presentation transcript:

Scientific Figure Design v2018-11 Simon Andrews, Anne Segonds-Pichon, Boo Virk, Jo Montgomery simon.andrews@babraham.ac.uk anne.segonds-pichon@babraham.ac.uk bhupinder.virk@babraham.ac.uk jo.montgomery@babraham.ac.uk

Figures are the way your science is presented to an audience. Before we start, I’d like you to have a look at this graph; talk to the person next to you about its pitfalls.

What this course covers: theory of data visualisation (why do some figures work better than others?); applying theory to common plot types; ethics of data representation; using graphic design; editing bitmap images in GIMP; vector editing and compositing in Inkscape.

What this course doesn’t cover: how to draw graphs in specific programs (R Introduction; Statistics with R; Statistics with GraphPad; Plotting with R/ggplot).

Timetable. Morning: Introduction; Data Visualisation Theory; Coffee; Data Representation Practical; Plots and ethics talk; Design theory talk. Afternoon: GIMP Tutorial; GIMP Practical; Coffee; Inkscape Tutorial; Inkscape Practical; Final practical.

Data Visualisation Process: Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Conclusion.

Exploratory visualisation: understand your data; multiple ways to present and summarise; crude representations; interactive; not intended for final publication; can be adapted for publication.

Reference visualisation: using your data as a resource; allows users to look up data of interest; tabular / configurable; interactive.

Illustrative visualisation: intended to convey a specific point; carefully chosen subset of data; optimised presentation; good design; used for figures in papers.

What makes a good figure? It has a clear message (helps to tell a story; adds to the text, and links to it). It is focused (don’t confuse one message with another). It is easy to interpret correctly (good data visualisation; good design). It is an honest and true reflection of the data.

The theory of data visualisation Simon Andrews, Phil Ewels simon.andrews@babraham.ac.uk phil.ewels@scilifelab.se

Data Visualisation: a scientific discipline involving the creation and study of the visual representation of data, whose goal is to communicate information clearly and efficiently to users. Data visualisation is both an art and a science.

ISBN-10: 1466508914 http://www.cs.ubc.ca/~tmm/talks.html

Data Viz Process: Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Visualisation → Generate Conclusion.

A data visualisation should: show the data; not distort the data; summarise to make things clearer; serve a clear purpose; link to the accompanying text and statistics.

Different representations have common elements

Graphical Representations: basic questions. How are you going to turn the data into a graphical form (weight becomes length etc.)? How are you going to arrange things in space? How are you going to use colours, shapes etc. to clarify the point you want to make?

Marks and Channels. Marks: geometric primitives (lines, points, areas) used to represent data sets. Channels: the graphical appearance of a mark (colour, length, position, angle), used to encode data.

Figures are a combination of marks and channels. 1 mark = rectangle, 1 channel = length of longest side. 1 mark = circle segment, 1 channel = angle. 1 mark = diamond shape, 2 channels = X position, Y position. 1 mark = circle, 4 channels = X position, Y position, area, colour.

Golden Rules. Effectiveness: encode the most important information with the most effective channel. Expressiveness: match the properties of the data and channel.

Types of channel. Quantitative: position on scale; length; angle; area; colour (saturation); colour (lightness). Qualitative: spatial grouping; colour (hue); shape.

Colour. Technical representations of colour: Red + Green + Blue (RGB); Cyan + Magenta + Yellow + Black (CMYK). Perceptual representation of colour: Hue + Saturation + Lightness (HSL).

HSL Representation. Hue = shade of colour = qualitative. Saturation = amount of colour = quantitative. Lightness = amount of white = quantitative. Humans have no innate quantitative perception of hue, but we have learned some orderings (cold to hot, rainbow etc.). Our perception of hue is not linear.
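As a rough illustration of using lightness as a quantitative channel while holding hue fixed, the sketch below builds a small light-to-dark palette with Python's standard `colorsys` module. The function name `sequential_palette`, the lightness range (0.85 down to 0.25) and the default saturation are assumptions chosen for the example, not values from the slides.

```python
import colorsys


def sequential_palette(hue_degrees, steps=5, saturation=0.8):
    """Return hex colours that share one hue and run from light to dark.

    Hypothetical helper: the lightness range 0.85 -> 0.25 is an arbitrary choice.
    """
    hue = (hue_degrees % 360) / 360.0
    palette = []
    for i in range(steps):
        # Only lightness changes between steps, so it carries the quantity.
        lightness = 0.85 - i * (0.60 / max(steps - 1, 1))
        # colorsys uses H, L, S argument order (not H, S, L).
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette


if __name__ == "__main__":
    for colour in sequential_palette(210):
        print(colour)
```

Because hue is treated here as a qualitative channel, it is kept constant; a different hue would be used for a different category rather than a different amount.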


Data Types. Quantitative: height, length, weight, expression etc. Ordered: small, medium, large; January, February, March. Categorical: WT, Mutant1, Mutant2; GeneA, GeneB, GeneC.


Effectiveness of quantitation (figure: comparisons labelled 2X, 7X, 4.5X, 1.8X, 16X, 3.4X)

Quantitation Perception


Most Quantitative Representations, ordered from good quantitation to poor quantitation: bar chart; stacked bar chart with common start; stacked bar chart with different starts; pie charts; bubble plots (circular area); rectangular area; colour (luminance); colour (saturation).
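For bubble plots the channel is circular area, so a value has to be converted to a radius through a square root. If the radius is instead scaled linearly with the value, the area grows with the square of the value and differences are exaggerated. A minimal sketch, where `radius_for_value` and `area_per_unit` are names invented for this example:

```python
import math


def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose AREA is proportional to value.

    area_per_unit is a hypothetical scaling constant (drawing units per data unit).
    """
    if value < 0:
        raise ValueError("area encoding needs non-negative values")
    return math.sqrt(value * area_per_unit / math.pi)


if __name__ == "__main__":
    for value in (1, 4, 16):
        print(value, round(radius_for_value(value), 3))
```

With this mapping a value four times larger gets a radius only twice as large, which keeps the drawn area in proportion to the data.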

Discriminability: if you encode categorical data, are the differences between categories easy for the user to perceive correctly?

Qualitative Discrimination: how many colours can you discriminate?

Qualitative Discrimination: how many (fillable) shapes can you discriminate? Shape can be combined with colour, but you need to maintain similar fillable areas.

Separability: the effectiveness of a channel does not always survive being combined with a second channel. There are large variations in how much two different channels interfere with each other. Trying to put too much information on a figure can erode the impact of the main point you’re trying to make.

Separability: There is no confusion between the two channels. Larger points are easier to discriminate than smaller ones. We tend to focus on the area of the shape rather than the height/width separately. Humans are very bad at separating combined colours.

Popout: a distinct item immediately stands out from the others. It is triggered by our low-level visual system, so you don’t need to actively look at every point (slow!) to see it.

Popout (find the red circle)

Popout: speed of identification is independent of the number of distracting points.

Popout: colour pops out more than shape.

Popout: mixing channels removes the effect (find the red circle).

Use of space: where you want a viewer to focus on specific subsets of data, you can help their perception by using the layout or highlighting of the data to draw their attention to the point you’re making.

Grouping

Grouping (figure: features grouped as Exon, CGI, Intron, Repeat)

Ordering: is a monkey heavier than a dog?

Containment / Linking (figure: Wild Type vs Mutant)

Validation. Always try to validate plots you create: you have seen your data too often to get an unbiased view. Show the plot to someone not familiar with the data and ask: What does this plot tell you? Is this the message you wanted to convey? If they pick multiple points, do they choose the most important one first?

General Rules. No unnecessary figures: does a graphical representation make things clearer? Would a table be better? One point per figure: design each figure to illustrate a single point; adding complexity compromises the effectiveness of the main point. No absolute reliance on colour: figures should ideally still work in black and white; colour should help perception.

Making effective use of common plot types Anne Segonds-Pichon Simon Andrews Phil Ewels anne.segonds-pichon@babraham.ac.uk simon.andrews@babraham.ac.uk phil.ewels@scilifelab.se

Types of plot: things you can illustrate

Plot Properties: exploration, presentation or both? Effectiveness; scalability; options; potential problems.

Distributions

Histograms / Density Plots. Exploration or presentation: both. Effectiveness: good. Scalability: poor.

Histogram Options / Problems. Bin size: too few categories; too many categories. Discrete data.
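The bin-size problem can be seen by counting the same values with different numbers of bins. The sketch below uses only the standard library; `histogram_counts` and the sample values are made up for illustration.

```python
def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data with two clusters.
    data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 10, 10, 11, 11, 11, 12, 12, 13]
    for n_bins in (1, 2, 12):
        print(n_bins, histogram_counts(data, n_bins))
```

With one or two bins the two clusters are hidden; with many bins each count becomes small and noisy, so the bin count is worth trying at several settings during exploration.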

Box Plots. Plot elements: maximum; upper quartile (Q3), the 75th percentile (3rd quartile); median; lower quartile (Q1), the 25th percentile (1st quartile); minimum; interquartile range (IQR), which covers 50% of the data; cutoff = Q1 - 1.5*IQR; outlier. Rated on: exploration or presentation, effectiveness, scalability. Ratings given: Presentation, Good.
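The quantities a box plot draws can be computed directly. This is a sketch using `statistics.quantiles` from the Python standard library; the function name `box_plot_summary` is invented here. The slide gives only the lower cutoff (Q1 - 1.5*IQR); the upper cutoff Q3 + 1.5*IQR used below is the usual matching convention and is an assumption, as is the "inclusive" quartile method (plotting tools differ in how they interpolate quartiles).

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, IQR, cutoffs and outliers a box plot would show."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr  # from the slide
    upper_cutoff = q3 + 1.5 * iqr  # assumed symmetric counterpart
    outliers = [v for v in values if v < lower_cutoff or v > upper_cutoff]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_cutoff": lower_cutoff,
        "upper_cutoff": upper_cutoff,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical sample with one extreme value.
    print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```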

BoxPlot Problems: assumes a large, normally distributed dataset; misleading plots from small or non-normal datasets; in most cases there are better alternatives.

Bean Plots. Plot elements: beans (individual data points); data density; sample mean; global mean. Exploration or presentation: both. Effectiveness: good. Scalability: good / intermediate.

BoxPlot vs Beanplot (figure: bimodal, uniform and normal distributions)

Comparisons

Stripcharts. Exploration or presentation: both. Effectiveness: good. Scalability: poor.

Barplot. Rated on: exploration or presentation, effectiveness, scalability. Rating given: Good.

Barplot Options: selection of suitable confidence measures (standard error; standard deviation).

Barplot Problems: setting a suitable baseline.

Barplot Options / Problems: dealing with ratio data.

Confidence Interval Plots. Rated on: exploration or presentation, effectiveness, scalability. Ratings given: Presentation, Good.
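The two error-bar measures named above answer different questions: the standard deviation describes the spread of the observations, while the standard error of the mean describes how precisely the mean is estimated and shrinks as the sample grows. A small standard-library sketch, with `sd_and_sem` and the sample values invented for illustration:

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)        # n - 1 denominator
    sem = sd / math.sqrt(len(values))    # SEM = SD / sqrt(n)
    return sd, sem


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
    sd, sem = sd_and_sem(sample)
    print(f"mean={statistics.mean(sample)} sd={sd:.3f} sem={sem:.3f}")
```

Because the standard error is always the smaller of the two for n > 1, a figure should state which measure its error bars show.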

Relationships

Line Graphs. Exploration or presentation: both. Effectiveness: good. Scalability: poor.

Line Graph Problems. Discrete data: a line implies interpolation; can be useful for exploration; shouldn’t be used for presentation.

Scatterplots. Exploration or presentation: both. Effectiveness: good. Scalability: intermediate.

Scatterplot Options / Problems: large data; equality of axes.

Composition

Pie Charts. Exploration or presentation: both. Effectiveness: intermediate. Scalability: poor.

Stacked Bar Charts. Exploration or presentation: both. Effectiveness: good / intermediate. Scalability: intermediate.

Stacked Bar Chart Options: scaling and ordering.

Heatmaps. Exploration or presentation: both. Effectiveness: poor. Scalability: excellent.

HeatMap Options: clustering.

HeatMap Options: colours. The choice of colours can turn quantitative differences into categorical ones.

Ethics of data representation Simon Andrews, Anne Segonds-Pichon simon.andrews@babraham.ac.uk anne.segonds-pichon@babraham.ac.uk

Data Visualisation Process: Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Visualisation → Generate Conclusion. There are two parts of the process where visualisation is important. They have different requirements and will need different visualisations.

What is Ethics when it comes to data visualisation? The figure/graph/image should show what is actually happening and not what you want to happen. Different ways of being unethical: not exploring/getting to know the data well enough; misusing your chosen graphical representation; deliberately showing the data in a misleading manner; choosing the ‘most representative’ image/experiment.

Is my plot ethical? Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?

Advertising and politics are built on unethical data representation. https://venngage.com/blog/misleading-graphs/

Not exploring/getting to know the data well enough. One experiment: change in the variable of interest between CondA and CondB. Data plotted as a bar chart.

Not exploring/getting to know the data well enough. Five experiments (Exp1 to Exp5): change in the variable of interest between 3 treatments and a control. Data plotted as a bar chart. Comparisons, treatments vs. control: p=0.001, p=0.04, p=0.32.

Choosing the wrong axis/scale. Example: increase in salary in the last term.

Choosing the y-axis/scale: be careful with linear vs. logarithmic scale.

Choosing the y-axis/scale: inappropriate use of a log scale can artificially minimise differences. (Figure: the same data on a linear scale and on a logarithmic scale.)
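To see how much a log axis can shrink an apparent difference, compare the drawn lengths of two bars on each kind of axis. This is a sketch only: `apparent_ratio` is a hypothetical helper, the linear bars are assumed to start at zero, and the log axis is assumed to start at `axis_min` (1 by default).

```python
import math


def apparent_ratio(a, b, scale="linear", axis_min=1.0):
    """Ratio of drawn bar lengths (bar b relative to bar a) on the chosen axis."""
    if scale == "linear":
        # Bars start at zero, so drawn length is proportional to the value.
        return b / a
    if scale == "log":
        if min(a, b) <= axis_min:
            raise ValueError("values must be greater than axis_min on a log axis")
        # Drawn length is proportional to the log-distance from the axis start.
        return math.log10(b / axis_min) / math.log10(a / axis_min)
    raise ValueError("scale must be 'linear' or 'log'")


if __name__ == "__main__":
    print(apparent_ratio(100, 200, "linear"))
    print(round(apparent_ratio(100, 200, "log"), 3))
```

Under these assumptions a doubling from 100 to 200 is drawn as a bar twice as long on the linear axis, but only about 15% longer on the log axis.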

Choosing the y-axis/scale: a logarithmic axis should be used for logarithmically spaced values and for lognormal data.

Simply Cheating: Manipulating images. ‘Playing’ too much with contrast: “Adjusting the contrast/brightness of a digital image is common practice and is not considered improper if the adjustment is applied to the whole image. Adjusting the contrast/brightness of only part of an image is improper, however, and this practice can usually be spotted by someone scrutinizing a file.” (Figure panels: Original; Brightness and Contrast Adjusted; Brightness and Contrast Adjusted Too Much: Oversaturation.)

Simply Cheating: Manipulating images: Cutting gels. Presenting bands out of context: Juxtaposing two lanes that were not next to each other in an original gel is common practice when preparing figures from hard copy photographs of the gel, and is acceptable manipulation if the figure is digital. Taking a band from one digital image and placing it in a lane in another is improper manipulation, which can usually be spotted by someone scrutinizing a file. ‘Rebuilding’ a gel from several cuts.

Image manipulation can be detected (DOI: 10.1172/JCI28824)

Is my plot ethical? Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?

Design Theory v2018-11 Boo Virk Simon Andrews boo.virk@babraham.ac.uk simon.andrews@babraham.ac.uk

Why does good design matter? Good design makes a great first impression. Good design makes for effective communication. Good design keeps the reader engaged. Image: Art Palvanov (http://www.palvanov.com/)

Elements of design: contrast, alignment, space, colour, symmetry, repetition, proximity, size.

Proximity: find logical and visually appealing ways to structure panels. Which figures logically group together? Are there sub-groups which should be connected? Is there a logical flow to the ordering? Is the layout balanced?

Alignment: Some arrangements are more visually appealing than others

We like symmetrical ordered layouts. Image source: Nutritional Immunology and Molecular Medicine Laboratory (2012) Modeling H. pylori using ENISI and Cell Designer

We like regular radial arrangements. Image source: A panoramic view of acute myeloid leukemia, Sai-Juan Chen, Yang Shen & Zhu Chen, Nature Genetics 45, 586–587 (2013)

Without symmetry we should consider visual weight: bold outline; strong colour; size variation. Image sources: O’Callaghan CA (2000) Molecular basis of human natural killer cell recognition of HLA-E (human leucocyte antigen-E) and its relevance to clearance of pathogen-infected and tumour cells, Clinical Science 99, (9–17); Greenblum S (2012) Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease, PNAS vol. 109 no. 2

Alignment: We are sensitive to aligned edges, even when they are separated. (Figure: panels comparing Control, Treatment A and Treatment B; axis labels include Day and Dead.)

Use a grid to help align disparate parts of a figure. (Figure: the same panels comparing Control, Treatment A and Treatment B, aligned to a grid.)

Leave space between elements of figures

Colour can be an essential or optional part of any figure

Colour can have multiple uses. Colour can be used to: highlight specific data; group categories of data; encode quantitative values. The more selective you are with colour, the greater its effect. Try to make figures work in black and white.

Sparing use of colour is most effective. Which is most effective at conveying your message?

Don’t invent your own colour schemes (see Colorbrewer2.org).

Use an appropriate colour scheme. Sequential: runs between two values; typically two main colours. Divergent: diverges from a central value to a min and a max; typically three colours. Categorical: colours have no intrinsic ordering.

If possible try to consider colour blind users. Colour blindness affects 1 in 12 men and 1 in 200 women worldwide. “If a submitted manuscript happens to go to three male reviewers of Northern European descent, the chance that at least one will be colour blind is 22 percent.”
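The reviewer figure in the quote is a "one minus the chance that nobody is affected" calculation. The sketch below reproduces the arithmetic, assuming reviewers are independent; `p_at_least_one` is a name made up for this example. Using the slide's 1 in 12 prevalence gives about 23%; the quoted 22% is what an 8% prevalence would give, and that 8% is a guess at the quote's input rather than something stated in the slides.

```python
def p_at_least_one(prevalence, n_people):
    """Probability that at least one of n independent people is affected."""
    return 1 - (1 - prevalence) ** n_people


if __name__ == "__main__":
    print(f"1 in 12, three men:   {p_at_least_one(1 / 12, 3):.3f}")
    print(f"8%, three men:        {p_at_least_one(0.08, 3):.3f}")
    print(f"1 in 200, three women: {p_at_least_one(1 / 200, 3):.3f}")
```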

You can see how well your figure works for colour blind people. Gradients are easy to change. Categorical colours are very limited. Basic interpretability in black and white is ideal. (Figure: normal colour vision vs protanopia.) http://www.color-blindness.com/coblis-color-blindness-simulator/

When overlaying information, make sure you have sufficient contrast. (Figure panels: poor contrast vs good contrast; vibrating colour; busy background.)

Add overlays to increase contrast. (Figure: poor contrast vs good contrast.)

Keep text and fonts simple: all text in figures should use sans-serif fonts; all text in figures should be black or white. (Figure: ‘Wild type’ and ‘Knockout’ labels set in sans-serif and in serif.)

Keep text horizontal. Numbers are small, text is big. All graphs still work when rotated 90°.

Make sure appropriate labels are added: each axis is labelled; quantitative axes have units; the colour scheme is explained; point shapes are explained. You need enough annotation that the figure is understandable on its own.

Make sure all text is legible at the final printed size. A 6 point font is the smallest you can comfortably read (just over 2 mm in height on paper).
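The "just over 2 mm" figure follows from the definition of the desktop-publishing point as 1/72 of an inch. The sketch below checks that conversion and also estimates the effective font size after a figure is shrunk to fit a column; both helper names and the example widths are hypothetical.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # desktop-publishing (PostScript) point


def points_to_mm(points):
    """Convert a nominal font size in points to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


def printed_point_size(designed_points, designed_width_mm, printed_width_mm):
    """Effective font size after the whole figure is scaled to the printed width."""
    return designed_points * printed_width_mm / designed_width_mm


if __name__ == "__main__":
    print(f"6 pt is about {points_to_mm(6):.2f} mm")
    # A figure drawn 180 mm wide with 10 pt labels, printed 90 mm wide.
    print(printed_point_size(10, 180, 90), "pt after scaling")
```

In the second example the labels end up below the 6 point threshold, so the text would need to be set larger in the original drawing.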

When resizing be aware of what can and cannot have its aspect ratio changed. Things that always need to maintain their aspect ratios: images; text; circular objects; axes with comparable units.

Simpler figures are easier to interpret

Consistency across figures makes interpretation easier: same colour/marker for the same group; comparable figures should be the same size; consistent positions of axis titles and labels; consistent font styles and sizes; consistent order (if you present ‘Sample A’ and then ‘Sample B’, maintain this throughout).

Elements of design: contrast, alignment, space, colour, symmetry, repetition, proximity, size.