Scientific Figure Design v2018-11 Simon Andrews, Anne Segonds-Pichon, Boo Virk, Jo Montgomery simon.andrews@babraham.ac.uk anne.segonds-pichon@babraham.ac.uk bhupinder.virk@babraham.ac.uk jo.montgomery@babraham.ac.uk
Figures are the way your science is presented to an audience Before we start, I’d like you to have a look at this graph; talk to the person next to you about its pitfalls
What this course covers… Theory of data visualisation Why do some figures work better than others? Applying theory to common plot types Ethics of data representation Using graphic design Editing bitmap images in GIMP Vector editing and compositing in Inkscape
What this course doesn’t cover… How to draw graphs in specific programs R Introduction Statistics with R Statistics with GraphPad Plotting with R/ggplot
Timetable Morning: Introduction; Data Visualisation Theory; Coffee; Data Representation Practical; Plots and ethics talk; Design theory talk. Afternoon: GIMP Tutorial; GIMP Practical; Coffee; Inkscape Tutorial; Inkscape Practical; Final practical.
Data Visualisation Process: Collect Raw Data; Process and Filter Data; Clean Dataset; Exploratory Analysis; Generate Conclusion
Exploratory visualisation Understand your data Multiple ways to present and summarise Crude representations Interactive Not intended for final publication Can be adapted for publication
Reference visualisation Using your data as a resource Allows users to look up data of interest Tabular / Configurable Interactive
Illustrative visualisation Intended to convey a specific point Carefully chosen subset of data Optimised presentation Good design Used for figures in papers
What makes a good figure? Has a clear message Helps to tell a story Adds to the text, and links to it Is focused Don’t confuse one message with another Is easy to interpret correctly Good data visualisation Good design Is an honest and true reflection of the data
The theory of data visualisation Simon Andrews, Phil Ewels simon.andrews@babraham.ac.uk phil.ewels@scilifelab.se
Data Visualisation A scientific discipline involving the creation and study of the visual representation of data whose goal is to communicate information clearly and efficiently to users. Data Visualisation is both an art and a science.
ISBN-10: 1466508914 http://www.cs.ubc.ca/~tmm/talks.html
Data Viz Process: Collect Raw Data; Process and Filter Data; Clean Dataset; Exploratory Analysis; Generate Visualisation; Generate Conclusion
A data visualisation should… Show the data Not distort the data Summarise to make things clearer Serve a clear purpose Link to the accompanying text and statistics
Different representations have common elements
Graphical Representations Basic questions How are you going to turn the data into a graphical form (weight becomes length etc.) How are you going to arrange things in space How are you going to use colours, shapes etc. to clarify the point you want to make
Marks and Channels Marks: geometric primitives (lines, points, areas) used to represent data sets. Channels: graphical appearance of a mark (colour, length, position, angle) used to encode data.
Figures are a combination of marks and channels. 1 Mark = Rectangle, 1 Channel = Length of longest side. 1 Mark = Circle segment, 1 Channel = Angle. 1 Mark = Diamond shape, 2 Channels = X position, Y position. 1 Mark = Circle, 4 Channels = X position, Y position, Area, Colour.
Golden Rules Effectiveness: encode the most important information with the most effective channel. Expressiveness: match the properties of the data and channel.
Types of channel Quantitative: position on scale, length, angle, area, colour (saturation), colour (lightness). Qualitative: spatial grouping, colour (hue), shape.
Colour Technical representations of colour Red + Green + Blue (RGB) Cyan + Magenta + Yellow + Black (CMYK) Perceptual representation of colour Hue + Saturation + Lightness (HSL)
HSL Representation Hue = Shade of colour = Qualitative Saturation = Amount of colour = Quantitative Lightness = Amount of white = Quantitative Humans have no innate quantitative perception of hue but we have learned some (cold – hot, rainbow etc.) Our perception of hue is not linear
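As a hedged illustration of the HSL idea, Python's standard `colorsys` module converts RGB values into hue, lightness and saturation (it returns them in H, L, S order, all on a 0 to 1 scale). The colour names and the `describe` helper are illustrative only and are not part of the course material.

```python
import colorsys

def describe(name, rgb):
    """Convert an RGB triple (floats in 0-1) to hue, saturation and lightness."""
    h, l, s = colorsys.rgb_to_hls(*rgb)  # note: colorsys uses H, L, S order
    return {"name": name, "hue": round(h, 3),
            "saturation": round(s, 3), "lightness": round(l, 3)}

# Three illustrative reds: same hue, different lightness.
colours = {
    "pure red": (1.0, 0.0, 0.0),
    "pale red": (1.0, 0.5, 0.5),
    "dark red": (0.5, 0.0, 0.0),
}
for name, rgb in colours.items():
    print(describe(name, rgb))
```

In this sketch the three reds share a hue and differ only in lightness, which is the sense in which hue is qualitative and lightness is quantitative.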
Data Types Quantitative: Height, Length, Weight, Expression etc. Ordered: Small, Medium, Large; January, February, March. Categorical: WT, Mutant1, Mutant2; GeneA, GeneB, GeneC.
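The expressiveness rule (match the properties of the data and the channel) can be sketched as a simple lookup. This is a minimal sketch: the channel names follow the "Types of channel" slide, but the function name and the treatment of ordered data are assumptions made here, not something the slides state.

```python
# Channel types as listed on the "Types of channel" slide.
QUANTITATIVE_CHANNELS = {"position", "length", "angle", "area",
                         "colour saturation", "colour lightness"}
QUALITATIVE_CHANNELS = {"spatial grouping", "colour hue", "shape"}

def is_expressive(data_type, channel):
    """Return True if the channel type matches the data type.

    Assumption (not stated in the slides): ordered data is treated like
    quantitative data, because both have an intrinsic order.
    """
    if data_type in ("quantitative", "ordered"):
        return channel in QUANTITATIVE_CHANNELS
    if data_type == "categorical":
        return channel in QUALITATIVE_CHANNELS
    raise ValueError(f"unknown data type: {data_type}")

print(is_expressive("quantitative", "length"))      # expression level as bar length
print(is_expressive("categorical", "colour hue"))   # WT vs Mutant as hue
print(is_expressive("quantitative", "shape"))       # weight as point shape
```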
Effectiveness of quantitation 2X 7X 4.5X 1.8X 16X 3.4X
Quantitation Perception
Most Quantitative Representations From good quantitation to poor quantitation: Bar chart; Stacked bar chart with common start; Stacked bar chart with different starts; Pie charts; Bubble plots (circular area); Rectangular area; Colour (luminance); Colour (saturation).
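The effectiveness rule can be sketched by treating the ranking on this slide as an ordered list and picking the highest-ranked option from whatever candidates are being considered. The list order comes from the slide; the function name is hypothetical.

```python
# Encodings in the order given on the slide, from good to poor quantitation.
EFFECTIVENESS_ORDER = [
    "bar chart",
    "stacked bar chart with common start",
    "stacked bar chart with different starts",
    "pie chart",
    "bubble plot (circular area)",
    "rectangular area",
    "colour (luminance)",
    "colour (saturation)",
]

def most_effective(candidates):
    """Return the candidate that ranks highest for quantitation."""
    return min(candidates, key=EFFECTIVENESS_ORDER.index)

options = ["colour (saturation)", "pie chart", "bubble plot (circular area)"]
print(most_effective(options))
```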
Discriminability If you encode categorical data are the differences between categories easy for the user to perceive correctly?
Qualitative Discrimination How many colours can you discriminate?
Qualitative Discrimination How many (fillable) shapes can you discriminate? Can combine with colour, but need to maintain similar fillable areas
Separability The effectiveness of a channel does not always survive being combined with a second channel. There are large variations in how much two different channels interfere with each other Trying to put too much information on a figure can erode the impact of the main point you’re trying to make
Separability There is no confusion between the two channels Larger points are easier to discriminate than smaller ones We tend to focus on the area of the shape rather than the height/width separately Humans are very bad at separating combined colours
Popout A distinct item immediately stands out from the others Triggered by our low level visual system You don’t need to actively look at every point (slow!) to see it
Popout (find the red circle)
Popout Speed of identification is independent of the number of distracting points
Popout Colour pops out more than shape
Popout Mixing channels removes the effect (Find the red circle)
Use of space Where you want a viewer to focus on specific subsets of data you can help their perception by using the layout or highlighting of data to draw their attention to the point you’re making
Grouping
Grouping Exon CGI Intron Repeat
Ordering Is a monkey heavier than a dog?
Containment / Linking Wild Type Mutant
Validation Always try to validate plots you create You have seen your data too often to get an unbiased view Show the plot to someone not familiar with the data What does this plot tell you? Is this the message you wanted to convey? If they pick multiple points, do they choose the most important one first?
General Rules No unnecessary figures: Does a graphical representation make things clearer? Would a table be better? One point per figure: Design each figure to illustrate a single point. Adding complexity compromises the effectiveness of the main point. No absolute reliance on colour: Figures should ideally still work in black and white. Colour should help perception.
Making effective use of common plot types Anne Segonds-Pichon Simon Andrews Phil Ewels anne.segonds-pichon@babraham.ac.uk simon.andrews@babraham.ac.uk phil.ewels@scilifelab.se
Types of plot Things you can illustrate
Plot Properties Exploration, Presentation or both? Effectiveness Scalability Options Potential Problems
Distributions
Histograms / Density Plots Exploration or Presentation: Both. Effectiveness: Good. Scalability: Poor.
Histogram Options / Problems Bin Size Too few categories Too many categories Discrete Data
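The bin size problem can be illustrated with a small pure-Python histogram. This is a sketch with made-up data; the `histogram` function is illustrative and assumes the values are not all identical.

```python
def histogram(values, n_bins):
    """Count values into n_bins equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin rather than a new one.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# Made-up bimodal data: one cluster around 2-3, another around 8-9.
data = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9, 9, 9]
for n in (2, 4, 16):
    print(n, histogram(data, n))
```

With 2 bins the two clusters look like a flat distribution, with 4 bins the gap between them becomes visible, and with 16 bins most bins are empty.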
Box Plots Exploration or Presentation / Effectiveness / Scalability: Presentation, Good. Figure labels: Maximum; Upper Quartile (Q3), 75th percentile (3rd quartile); Median; Lower Quartile (Q1), 25th percentile (1st quartile); Minimum; Interquartile Range (IQR): 50% of the data; Outlier; Cutoff = Q1 – 1.5*IQR.
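The quantities labelled on the box plot can be computed with Python's standard `statistics` module. This is a minimal sketch with made-up data. The slide gives only the lower cutoff; the upper cutoff of Q3 + 1.5*IQR used here is the usual companion convention. Different programs use slightly different quartile definitions, so values may not match a given plotting package exactly.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, IQR, outlier cutoffs and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr   # cutoff shown on the slide
    upper_cutoff = q3 + 1.5 * iqr   # assumed symmetric upper cutoff
    outliers = [v for v in values if v < lower_cutoff or v > upper_cutoff]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_cutoff": lower_cutoff, "upper_cutoff": upper_cutoff,
            "outliers": outliers}

example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # made-up data with one extreme value
print(box_plot_stats(example))
```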
BoxPlot Problems Assumes a large, normally distributed dataset Misleading plots from small or non-normal datasets In most cases there are better alternatives
Bean Plots Exploration or Presentation: Both. Effectiveness: Good. Scalability: Good / Intermediate. Figure labels: Beans (Individual data points); Data Density; Sample mean; Global mean.
BoxPlot vs Beanplot Bimodal Uniform Normal
Comparisons
Stripcharts Exploration or Presentation: Both. Effectiveness: Good. Scalability: Poor.
Barplot Exploration or Presentation Effectiveness Scalability Good
Barplot Options Selection of suitable confidence measures Standard error Standard deviation
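The two confidence measures named on this slide answer different questions: the standard deviation describes the spread of the data, while the standard error describes the uncertainty of the mean and shrinks as the sample size grows. A minimal sketch with made-up values (the function name is illustrative):

```python
import math
import statistics

def spread_measures(values):
    """Return (standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)        # sample standard deviation
    sem = sd / math.sqrt(len(values))    # standard error of the mean
    return sd, sem

measurements = [4, 8, 6, 5, 7]  # made-up replicate measurements
sd, sem = spread_measures(measurements)
print(f"mean={statistics.mean(measurements)} sd={sd:.3f} sem={sem:.3f}")
```

Because the standard error is always smaller than the standard deviation for more than one observation, the choice of error bar changes how variable the data appears, so the figure legend should state which one is shown.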
Barplot Problems Setting a suitable baseline
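The effect of the baseline can be quantified by comparing the drawn heights of two bars. This is a sketch with made-up values; `apparent_ratio` is a hypothetical helper, not something taken from the slides.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of bars for values a and b.

    A bar is drawn from the axis baseline up to its value, so its
    visible height is (value - baseline).
    """
    return (b - baseline) / (a - baseline)

print(apparent_ratio(95, 100))               # baseline at zero
print(apparent_ratio(95, 100, baseline=90))  # truncated axis
```

With a zero baseline the second bar is about 5% taller than the first; with the axis starting at 90 it is drawn twice as tall, although the data is unchanged.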
Barplot Options / Problems Dealing with ratio data
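The slide title does not spell out the recommended treatment of ratio data. One common approach, assumed here, is to plot ratios on a log scale so that a halving and a doubling sit at equal distances from "no change". A minimal sketch:

```python
import math

# Made-up fold changes relative to a control.
ratios = {"halved": 0.5, "unchanged": 1.0, "doubled": 2.0}

for label, ratio in ratios.items():
    linear_offset = ratio - 1.0      # distance from "no change" on a linear axis
    log_offset = math.log2(ratio)    # distance from "no change" on a log2 axis
    print(f"{label:10s} linear={linear_offset:+.1f} log2={log_offset:+.1f}")
```

On the linear axis the halving looks half as large as the doubling; on the log2 axis the two changes are symmetric.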
Confidence Interval Plots Exploration or Presentation Effectiveness Scalability Presentation Good
Relationships
Line Graphs Exploration or Presentation: Both. Effectiveness: Good. Scalability: Poor.
Line Graph Problems Discrete Data: joining the points with a line implies interpolation. Can be useful for exploration. Shouldn’t use for presentation.
Scatterplots Exploration or Presentation: Both. Effectiveness: Good. Scalability: Intermediate.
Scatterplot Options / Problems Large Data Equality of Axes
Composition
Pie Charts Exploration or Presentation: Both. Effectiveness: Intermediate. Scalability: Poor.
Stacked Bar Charts Exploration or Presentation: Both. Effectiveness: Good / Intermediate. Scalability: Intermediate.
Stacked Bar Chart Options Scaling and Ordering
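Scaling and ordering can be sketched as two small steps: convert each sample's counts to percentages so that samples with different totals are comparable, and stack the categories in a consistent order. The counts below are made up, and the helper names are illustrative.

```python
def to_percentages(counts):
    """Scale category counts so that they sum to 100."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}

def stacking_order(counts):
    """Order categories from largest to smallest, for consistent stacking."""
    return sorted(counts, key=counts.get, reverse=True)

# Made-up counts: sample B is five times deeper but has the same composition.
sample_a = {"Exon": 10, "Intron": 30, "Repeat": 60}
sample_b = {"Exon": 50, "Intron": 150, "Repeat": 300}

print(to_percentages(sample_a))
print(to_percentages(sample_b))
print(stacking_order(sample_a))
```

Plotted as raw counts these two samples look very different; after scaling they are identical, which may or may not be the point the figure needs to make.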
Heatmaps Exploration or Presentation: Both. Effectiveness: Poor. Scalability: Excellent.
HeatMap Options Clustering
HeatMap Options Colours Turns quantitative differences into categorical
Ethics of data representation Simon Andrews, Anne Segonds-Pichon simon.andrews@babraham.ac.uk anne.segonds-pichon@babraham.ac.uk
Data Visualisation Process: Collect Raw Data; Process and Filter Data; Clean Dataset; Exploratory Analysis; Generate Visualisation; Generate Conclusion. Two parts of the process where visualisation is important. They have different requirements and will need different visualisations.
What is ethics when it comes to data visualisation? The figure/graph/image should show what is actually happening and not what you want to happen. Different ways of being unethical: not exploring/getting to know the data well enough; misusing your chosen graphical representation; deliberately showing the data in a misleading manner; choosing the ‘most representative’ image/experiment.
Is my plot ethical? Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?
Advertising and politics are built on unethical data representation. https://venngage.com/blog/misleading-graphs/
Not exploring/getting to know the data well enough One experiment: change in the variable of interest from CondA to CondB. Data plotted as a bar chart.
Not exploring/getting to know the data well enough Five experiments: change in the variable of interest between 3 treatments and a control. Data plotted as a bar chart. Comparisons, Treatments vs. Control: p=0.001, p=0.04, p=0.32. Legend: Exp1, Exp2, Exp3, Exp4, Exp5.
Choosing the wrong axis/scale Example: increase in salary in the last term.
Choosing the y-axis/scale Be careful with Linear vs. logarithmic scale.
Choosing the y-axis/scale Inappropriate use of a log scale can artificially minimise differences Linear scale Logarithmic scale
Choosing the y-axis/scale Logarithmic axis should be used for: Logarithmically spaced values Lognormal data
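How a log axis can minimise a difference can be shown by comparing drawn bar heights on the two scales. This is a sketch with made-up values; the function and the choice of 1 as the bottom of the log axis are assumptions for illustration.

```python
import math

def bar_height_ratio(a, b, log_scale=False, axis_min=1.0):
    """Ratio of the drawn heights of bars for values a and b.

    On a linear axis bars start at 0. On a log axis they start at
    axis_min, so the drawn height is log10(value / axis_min).
    """
    if log_scale:
        return math.log10(b / axis_min) / math.log10(a / axis_min)
    return b / a

print(bar_height_ratio(100, 1000))                  # linear axis
print(bar_height_ratio(100, 1000, log_scale=True))  # log axis starting at 1
```

A tenfold difference is drawn as a bar ten times taller on the linear axis but only one and a half times taller on this log axis, which is why a log scale should be reserved for logarithmically spaced or lognormal data.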
Simply Cheating: Manipulating images ‘Playing’ too much with contrast “Adjusting the contrast/brightness of a digital image is common practice and is not considered improper if the adjustment is applied to the whole image. Adjusting the contrast/brightness of only part of an image is improper, however, and this practice can usually be spotted by someone scrutinizing a file.” Panels: Original; Brightness and Contrast Adjusted; Brightness and Contrast Adjusted Too Much: Oversaturation.
Simply Cheating: Manipulating images: Cutting gels Presenting bands out of context. Juxtaposing two lanes that were not next to each other in an original gel is common practice when preparing figures from hard copy photographs of the gel, and is acceptable manipulation if the figure is digital. Taking a band from one digital image and placing it in a lane in another is improper manipulation, which can usually be spotted by someone scrutinizing a file. ‘Rebuilding’ a gel from several cuts.
Image Manipulation can be detected 10.1172/JCI28824
Design Theory v2018-11 Boo Virk Simon Andrews boo.virk@babraham.ac.uk simon.andrews@babraham.ac.uk
Why does good design matter? Good design makes a great first impression Good design makes for effective communication Good design keeps the reader engaged Art Palvanov (http://www.palvanov.com/)
Elements of design Contrast Alignment Space Colour Symmetry Repetition Proximity Size
Proximity – Find logical and visually appealing ways to structure panels Which figures logically group together? Are there sub-groups which should be connected? Is there a logical flow to the ordering? Is the layout balanced?
Alignment: Some arrangements are more visually appealing than others
We like symmetrical ordered layouts Nutritional Immunology and Molecular Medicine Laboratory (2012) Modeling H. pylori using ENISI and Cell Designer
We like regular radial arrangements A panoramic view of acute myeloid leukemia Sai-Juan Chen, Yang Shen & Zhu Chen Nature Genetics 45, 586–587 (2013)
Without symmetry we should consider visual weight Bold Outline Strong Colour Size Variation O’Callaghan CA (2000) Molecular basis of human natural killer cell recognition of HLA-E (human leucocyte antigen-E) and its relevance to clearance of pathogen-infected and tumour cells, Clinical Science 99, (9–17) Greenblum S (2012) Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease, PNAS vol. 109 no. 2
Alignment: We are sensitive to aligned edges, even when they are separated [Figure: example panels; groups Control, Treatment A, Treatment B; axis Day; label Dead]
Use a grid to help align disparate parts of a figure [Figure: example panels; groups Control, Treatment A, Treatment B; axis Day; label Dead]
Leave space between elements of figures
Colour can be an essential or optional part of any figure
Colour can have multiple uses Colour can be used to: Highlight specific data Group categories of data Encode quantitative values The more selective you are with colour, the greater its effect Try to make figures work in black and white
Sparing use of colour is most effective Which is most effective at conveying your message?
Don’t invent your own colour schemes Colorbrewer2.org
Use an appropriate colour scheme Sequential: run between two values; typically two main colours. Divergent: diverging from a central value to a min and a max; typically three colours. Categorical: colours have no intrinsic ordering.
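The structure of sequential and divergent schemes can be sketched with simple linear interpolation between anchor colours. This only illustrates the idea: the anchor colours are arbitrary choices made here, plain RGB interpolation is not perceptually uniform, and for real figures an established palette is the safer choice.

```python
def lerp_colour(c1, c2, t):
    """Linearly interpolate between two RGB colours (0-255), with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))

def sequential(value, vmin, vmax, low, high):
    """Sequential scheme: runs between two values using two main colours."""
    return lerp_colour(low, high, (value - vmin) / (vmax - vmin))

def divergent(value, vmin, centre, vmax, low, mid, high):
    """Divergent scheme: three colours diverging from a central value."""
    if value < centre:
        return lerp_colour(low, mid, (value - vmin) / (centre - vmin))
    return lerp_colour(mid, high, (value - centre) / (vmax - centre))

# Arbitrary anchor colours chosen for this example.
WHITE, BLUE, RED = (255, 255, 255), (0, 0, 255), (255, 0, 0)

print(sequential(10, 0, 10, WHITE, BLUE))          # top of a 0-10 range
print(divergent(0, -2, 0, 2, BLUE, WHITE, RED))    # central value
print(divergent(-2, -2, 0, 2, BLUE, WHITE, RED))   # minimum
```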
If possible try to consider colour blind users Colour blindness affects 1 in 12 men and 1 in 200 women worldwide. “If a submitted manuscript happens to go to three male reviewers of Northern European descent, the chance that at least one will be colour blind is 22 percent.”
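The arithmetic behind the quoted figure is the chance that at least one of several independent reviewers is colour blind. A minimal sketch follows; the 8% prevalence is an assumption made here about the basis of the quotation, since it reproduces the quoted 22 percent, whereas the 1 in 12 figure from the slide gives a slightly higher value.

```python
def chance_at_least_one(prevalence, n_people):
    """Probability that at least one of n independent people is affected."""
    return 1.0 - (1.0 - prevalence) ** n_people

print(round(chance_at_least_one(1 / 12, 3), 3))  # prevalence from the slide
print(round(chance_at_least_one(0.08, 3), 3))    # assumed 8% prevalence
```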
You can see how well your figure works for colour blind people. Gradients are easy to change. Categorical colours are very limited. Basic interpretability in black and white is ideal. Panels: Normal colour vision; Protanopia. http://www.color-blindness.com/coblis-color-blindness-simulator/
When overlaying information, make sure you have sufficient contrast. Panels: Poor contrast vs. Good contrast; Vibrating colour; Busy background.
Add overlays to increase contrast Poor contrast Good contrast
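The slides judge contrast by eye. One way to put a number on it, swapped in here and not taken from the course, is the contrast ratio defined in the WCAG 2.x web accessibility guidelines, which ranges from 1 (identical colours) to 21 (black on white). A minimal sketch:

```python
def _linearise(channel):
    """Convert an sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB colour."""
    r, g, b = (_linearise(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """WCAG 2.x contrast ratio between two colours (order does not matter)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))      # black on white
print(round(contrast_ratio((255, 255, 0), (255, 255, 255)), 1))  # yellow on white
```

Yellow text on a white background scores close to 1, which matches the intuition that such a label would be hard to read without an overlay behind it.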
Keep text and fonts simple All text in figures should use sans-serif fonts. All text in figures should be black or white. Example panels: sans-serif vs. serif labels (Wild type, Knockout).
Keep text horizontal
Keep text horizontal Numbers are small, text is big. All graphs still work when rotated 90°.
Make sure appropriate labels are added Each axis is labelled Quantitative axes have units Colour scheme is explained Point shapes are explained You need enough annotation that the figure is understandable on its own.
Make sure all text is legible at the final printed size 6 point font is the smallest you can comfortably read (just over 2mm height on paper)
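The "just over 2mm" figure follows from the definition of a point as 1/72 of an inch. The sketch below also shows how a font size changes when a whole figure is scaled for print; the function names and the example numbers are illustrative.

```python
def points_to_mm(points):
    """Convert a font size in points to millimetres (1 pt = 1/72 inch)."""
    return points * 25.4 / 72.0

def size_after_scaling(points, scale):
    """Font size that results when the whole figure is scaled by `scale`."""
    return points * scale

print(round(points_to_mm(6), 2))      # size of 6 point text in mm
print(size_after_scaling(12, 0.5))    # 12 pt text in a figure printed at 50%
print(size_after_scaling(8, 0.5))     # 8 pt text in the same figure
```

In this example 12 point labels only just remain legible once the figure is halved, and 8 point labels drop below the 6 point limit.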
When resizing be aware of what can and cannot have its aspect ratio changed Things that always need to maintain their aspect ratios: Images; Text; Circular objects; Axes with comparable units.
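For the elements that must keep their aspect ratio, resizing means applying one scale factor to both dimensions. A minimal sketch, with a hypothetical helper that fits an image into an available box:

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside a box, preserving aspect ratio."""
    # Use the smaller of the two scale factors so neither side overflows.
    scale = min(max_width / width, max_height / height)
    return width * scale, height * scale

# An 800 x 600 image placed into a 400 x 400 panel.
print(fit_within(800, 600, 400, 400))
```

The result keeps the original 4:3 shape rather than being stretched to fill the square panel.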
Simpler figures are easier to interpret
Consistency across figures makes interpretation easier Same colour/marker for same group Size of comparable figures should be the same Positions of axis titles and labels Font styles and sizes Order: If presented ‘Sample A’ and then ‘Sample B’, maintain this throughout