1
Scientific Figure Design
Simon Andrews, Anne Segonds-Pichon, Boo Virk, Jo Montgomery
2
Figures are the way your science is presented to an audience
Before we start, I’d like you to have a look at this graph; talk to the person next to you about its pitfalls
3
What this course covers…
- Theory of data visualisation
- Why do some figures work better than others?
- Applying theory to common plot types
- Ethics of data representation
- Using graphic design
- Editing bitmap images in GIMP
- Vector editing and compositing in Inkscape
4
What this course doesn’t cover…
How to draw graphs in specific programs:
- R Introduction
- Statistics with R
- Statistics with GraphPad
- Plotting with R/ggplot
5
Timetable
Morning
- Introduction
- Data Visualisation Theory
- Coffee
- Data Representation Practical
- Plots and ethics talk
- Design theory talk
Afternoon
- GIMP Tutorial
- GIMP Practical
- Coffee
- Inkscape Tutorial
- Inkscape Practical
- Final practical
6
Data Visualisation Process
Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Conclusion
7
Exploratory visualisation
- Understand your data
- Multiple ways to present and summarise
- Crude representations
- Interactive
- Not intended for final publication
- Can be adapted for publication
9
Reference visualisation
- Using your data as a resource
- Allows users to look up data of interest
- Tabular / Configurable
- Interactive
11
Illustrative visualisation
- Intended to convey a specific point
- Carefully chosen subset of data
- Optimised presentation
- Good design
- Used for figures in papers
13
What makes a good figure?
- Has a clear message
  - Helps to tell a story
  - Adds to the text, and links to it
- Is focused
  - Don’t confuse one message with another
- Is easy to interpret correctly
  - Good data visualisation
  - Good design
- Is an honest and true reflection of the data
14
The theory of data visualisation
Simon Andrews, Phil Ewels
15
Data Visualisation
A scientific discipline involving the creation and study of the visual representation of data, whose goal is to communicate information clearly and efficiently to users.
Data Visualisation is both an art and a science.
17
ISBN-10:
18
Data Viz Process
Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Visualisation → Generate Conclusion
19
A data visualisation should…
- Show the data
- Not distort the data
- Summarise to make things clearer
- Serve a clear purpose
- Link to the accompanying text and statistics
20
Different representations have common elements
21
Graphical Representations
Basic questions
- How are you going to turn the data into a graphical form (weight becomes length etc.)?
- How are you going to arrange things in space?
- How are you going to use colours, shapes etc. to clarify the point you want to make?
22
Marks and Channels
Marks
- Geometric primitives: Lines, Points, Areas
- Used to represent data sets
Channels
- Graphical appearance of a mark: Colour, Length, Position, Angle
- Used to encode data
23
Figures are a combination of marks and channels
- 1 Mark = Rectangle; 1 Channel = Length of longest side
- 1 Mark = Circle segment; 1 Channel = Angle
- 1 Mark = Diamond shape; 2 Channels = X position, Y position
- 1 Mark = Circle; 4 Channels = X position, Y position, Area, Colour
24
Golden Rules
- Effectiveness: Encode the most important information with the most effective channel
- Expressiveness: Match the properties of the data and channel
25
Types of channel
Quantitative
- Position on scale
- Length
- Angle
- Area
- Colour (saturation)
- Colour (lightness)
Qualitative
- Spatial Grouping
- Colour (hue)
- Shape
26
Colour
Technical representations of colour
- Red + Green + Blue (RGB)
- Cyan + Magenta + Yellow + Black (CMYK)
Perceptual representation of colour
- Hue + Saturation + Lightness (HSL)
27
HSL Representation
- Hue = Shade of colour = Qualitative
- Saturation = Amount of colour = Quantitative
- Lightness = Amount of white = Quantitative
Humans have no innate quantitative perception of hue, but we have learned some (cold – hot, rainbow etc.)
Our perception of hue is not linear
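As a rough illustration of the HSL idea, the sketch below converts a few RGB colours with Python's standard `colorsys` module (which names the model HLS and returns the values in that order). The helper name `describe_hsl` and the example colours are assumptions chosen for illustration, not part of the course material.

```python
import colorsys

def describe_hsl(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation %, lightness %)."""
    # colorsys works on 0..1 floats and returns (hue, lightness, saturation)
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(l * 100)

# Three reds: same hue and saturation, only lightness changes
pure_red = describe_hsl(255, 0, 0)
pale_red = describe_hsl(255, 128, 128)
dark_red = describe_hsl(128, 0, 0)

print(pure_red, pale_red, dark_red)
```

All three colours share one hue (the qualitative part), while the lightness value orders them from dark to pale (the quantitative part).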
29
Data Types
Quantitative
- Height, Length, Weight, Expression etc.
Ordered
- Small, Medium, Large
- January, February, March
Categorical
- WT, Mutant1, Mutant2
- GeneA, GeneB, GeneC
32
Effectiveness of quantitation
[Figure labels: 2X, 7X, 4.5X, 1.8X, 16X, 3.4X]
33
Quantitation Perception
35
Most Quantitative Representations
Good quantitation
- Bar chart
- Stacked bar chart with common start
- Stacked bar chart with different starts
- Pie charts
- Bubble plots (circular area)
- Rectangular area
- Colour (luminance)
- Colour (saturation)
Poor quantitation
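For bubble plots, where the channel is circular area, one practical consequence is that the radius should grow with the square root of the value. The sketch below is a minimal illustration; the function name `radius_for_value` and the `area_per_unit` parameter are hypothetical.

```python
import math

def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose AREA (not radius) is proportional to value."""
    return math.sqrt(value * area_per_unit / math.pi)

r1 = radius_for_value(1)
r4 = radius_for_value(4)

radius_ratio = r4 / r1      # about 2: the radius only doubles
area_ratio = radius_ratio ** 2  # about 4: the area matches the 4x change in value

print(radius_ratio, area_ratio)
```

Scaling the radius directly by the value would instead make a 4x value look 16x larger in area.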
36
Discriminability
If you encode categorical data, are the differences between categories easy for the user to perceive correctly?
37
Qualitative Discrimination
How many colours can you discriminate?
38
Qualitative Discrimination
How many (fillable) shapes can you discriminate?
Can combine with colour, but need to maintain similar fillable areas
40
Separability
- The effectiveness of a channel does not always survive being combined with a second channel.
- There are large variations in how much two different channels interfere with each other.
- Trying to put too much information on a figure can erode the impact of the main point you’re trying to make.
41
Separability
- There is no confusion between the two channels
- Larger points are easier to discriminate than smaller ones
- We tend to focus on the area of the shape rather than the height/width separately
- Humans are very bad at separating combined colours
42
Popout
A distinct item immediately stands out from the others
- Triggered by our low level visual system
- You don’t need to actively look at every point (slow!) to see it
43
Popout (find the red circle)
44
Popout
Speed of identification is independent of the number of distracting points
45
Popout
Colour pops out more than shape
46
Popout
Mixing channels removes the effect (find the red circle)
47
Use of space
Where you want a viewer to focus on specific subsets of data, you can help their perception by using the layout or highlighting of data to draw their attention to the point you’re making.
48
Grouping
49
Grouping
[Figure labels: Exon, CGI, Intron, Repeat]
50
Ordering
Is a monkey heavier than a dog?
51
Containment / Linking
[Figure labels: Wild Type, Mutant]
52
Validation
Always try to validate plots you create
- You have seen your data too often to get an unbiased view
- Show the plot to someone not familiar with the data
  - What does this plot tell you?
  - Is this the message you wanted to convey?
  - If they pick multiple points, do they choose the most important one first?
53
General Rules
- No unnecessary figures
  - Does a graphical representation make things clearer?
  - Would a table be better?
- One point per figure
  - Design each figure to illustrate a single point
  - Adding complexity compromises the effectiveness of the main point
- No absolute reliance on colour
  - Figures should ideally still work in black and white
  - Colour should help perception
54
Making effective use of common plot types
Anne Segonds-Pichon, Simon Andrews, Phil Ewels
55
Types of plot
Things you can illustrate
56
Plot Properties
- Exploration, Presentation or both?
- Effectiveness
- Scalability
- Options
- Potential Problems
57
Distributions
58
Histograms / Density Plots
- Exploration or Presentation: Both
- Effectiveness: Good
- Scalability: Poor
59
Histogram Options / Problems
Bin Size
- Too few categories
- Too many categories
Discrete Data
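To make the bin size problem concrete, the sketch below counts the same values with a wide and a narrow bin. The `bin_counts` helper and the data are hypothetical and use only plain Python.

```python
def bin_counts(values, bin_width, start=0.0):
    """Count values per bin; keys are the left edge of each non-empty bin."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical measurements with a gap between 4.1 and 7.5
values = [1.2, 1.9, 2.1, 2.4, 2.8, 3.3, 3.7, 4.1, 7.5, 7.9]

coarse = bin_counts(values, bin_width=5)  # too few categories
fine = bin_counts(values, bin_width=1)    # gap becomes visible

print(coarse)
print(fine)
```

With two wide bins the gap in the data disappears; with unit-width bins the empty range between 5 and 7 shows up as missing bins. Going much narrower would leave most bins holding zero or one value.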
60
Box Plots
[Figure labels:]
- Maximum
- Upper Quartile (Q3): 75th percentile (3rd quartile)
- Median
- Lower Quartile (Q1): 25th percentile (1st quartile)
- Minimum
- Interquartile Range (IQR): 50% of the data
- Cutoff = Q1 – 1.5*IQR
- Outlier
Exploration or Presentation, Effectiveness, Scalability: Presentation, Good
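The box plot quantities can be computed directly, as in the sketch below. It assumes Python 3.8+ for `statistics.quantiles`, uses the "inclusive" quartile method as one of several conventions, and adds the upper cutoff Q3 + 1.5*IQR as the usual counterpart to the lower cutoff on the slide. The function name and data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, IQR, outlier cutoffs and outliers for a list of numbers."""
    # Quartile definitions vary between tools; "inclusive" is an assumption here
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr   # as on the slide
    upper_cutoff = q3 + 1.5 * iqr   # assumed symmetric counterpart
    outliers = [v for v in values if v < lower_cutoff or v > upper_cutoff]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_cutoff": lower_cutoff, "upper_cutoff": upper_cutoff,
        "outliers": outliers,
    }

data = [2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical measurements
summary = boxplot_summary(data)
print(summary)
```

For this data the value 30 lies above the upper cutoff and would be drawn as an outlier point rather than inside the whisker.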
61
BoxPlot Problems
- Assumes a large, normally distributed dataset
- Misleading plots from small or non-normal datasets
- In most cases there are better alternatives
62
Bean Plots
[Figure labels: Beans (Individual data points), Data Density, Sample mean, Global mean]
- Exploration or Presentation: Both
- Effectiveness: Good
- Scalability: Good / Intermediate
63
BoxPlot vs Beanplot
[Figure panels: Bimodal, Uniform, Normal]
64
Comparisons
65
Stripcharts
- Exploration or Presentation: Both
- Effectiveness: Good
- Scalability: Poor
66
Barplot
Exploration or Presentation, Effectiveness, Scalability: Good
67
Barplot Options
Selection of suitable confidence measures
- Standard error
- Standard deviation
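The choice between the two measures matters because they answer different questions: the standard deviation describes the spread of the data, while the standard error of the mean shrinks as the sample grows. A minimal sketch with hypothetical data:

```python
import math
import statistics

def sd_and_sem(values):
    """Sample standard deviation and standard error of the mean."""
    sd = statistics.stdev(values)          # uses n - 1 in the denominator
    sem = sd / math.sqrt(len(values))      # SEM = SD / sqrt(n)
    return sd, sem

small = [4.0, 5.0, 6.0, 5.0]   # hypothetical: 4 measurements
large = small * 4              # same spread, 16 measurements

sd_small, sem_small = sd_and_sem(small)
sd_large, sem_large = sd_and_sem(large)

print(round(sd_small, 3), round(sem_small, 3))
print(round(sd_large, 3), round(sem_large, 3))
```

The standard deviation stays in the same range for both samples, whereas the standard error is less than half as large for the bigger sample, so error bars based on it look much tighter.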
68
Barplot Problems
Setting a suitable baseline
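Because a bar encodes its value as length, moving the baseline changes the ratio a reader perceives. The sketch below works through one hypothetical pair of values; `apparent_ratio` is an illustrative name.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths (b relative to a) when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)

# Hypothetical values: b is 10% larger than a
zero_based = apparent_ratio(100, 110)               # bars drawn from 0
truncated = apparent_ratio(100, 110, baseline=90)   # axis starts at 90

print(zero_based, truncated)
```

With a zero baseline the second bar is 1.1 times as long as the first; starting the axis at 90 makes it twice as long, although the data are unchanged.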
69
Barplot Options / Problems
Dealing with ratio data
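One common way to handle ratio data, offered here as an assumption rather than the slide's stated recommendation, is to plot log2 ratios so that increases and decreases of the same fold size get bars of the same length. The ratio values below are hypothetical.

```python
import math

# Hypothetical treatment/control ratios
ratios = {"GeneA": 4.0, "GeneB": 1.0, "GeneC": 0.25}

# Distance from "no change" (ratio = 1) on a linear axis
linear_distance = {gene: ratio - 1.0 for gene, ratio in ratios.items()}

# log2 ratio: "no change" becomes 0, and up/down changes are symmetric
log2_ratios = {gene: math.log2(ratio) for gene, ratio in ratios.items()}

print(linear_distance)
print(log2_ratios)
```

On the linear axis the 4-fold increase sits 3 units from the baseline and the 4-fold decrease only 0.75 units; after the log2 transform they sit at +2 and -2.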
70
Confidence Interval Plots
Exploration or Presentation, Effectiveness, Scalability: Presentation, Good
71
Relationships
72
Line Graphs
- Exploration or Presentation: Both
- Effectiveness: Good
- Scalability: Poor
73
Line Graph Problems
Discrete Data
- Implies interpolation
- Can be useful for exploration
- Shouldn’t use for presentation
74
Scatterplots
- Exploration or Presentation: Both
- Effectiveness: Good
- Scalability: Intermediate
75
Scatterplot Options / Problems
- Large Data
- Equality of Axes
76
Composition
77
Pie Charts
- Exploration or Presentation: Both
- Effectiveness: Intermediate
- Scalability: Poor
78
Stacked Bar Charts
- Exploration or Presentation: Both
- Effectiveness: Good / Intermediate
- Scalability: Intermediate
79
Stacked Bar Chart Options
Scaling and Ordering
80
Heatmaps
- Exploration or Presentation: Both
- Effectiveness: Poor
- Scalability: Excellent
81
HeatMap Options
Clustering
82
HeatMap Options
Colours
Turns quantitative differences into categorical
83
Ethics of data representation
Simon Andrews, Anne Segonds-Pichon
84
Data Visualisation Process
Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Visualisation → Generate Conclusion
Two parts of the process where visualisation is important. They have different requirements and will need different visualisations.
85
What is ethics when it comes to data visualisation?
The figure/graph/image should show what is actually happening and not what you want to happen.
Different ways of being unethical:
- not exploring/getting to know the data well enough,
- misusing your chosen graphical representation,
- deliberately showing the data in a misleading manner,
- choosing the ‘most representative’ image/experiment.
86
Is my plot ethical?
Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?
87
Advertising and politics are built on unethical data representation.
88
Not exploring/getting to know the data well enough
One experiment: change in the variable of interest between CondA and CondB. Data plotted as a bar chart.
89
Not exploring/getting to know the data well enough
Five experiments: change in the variable of interest between 3 treatments and a control. Data plotted as a bar chart.
Comparisons: Treatments vs. Control
[Figure: points labelled Exp1 to Exp5; p-values shown: p=0.001, p=0.04, p=0.32]
90
Choosing the wrong axis/scale
Example: increase in salary in the last term.
91
Choosing the y-axis/scale
Be careful with linear vs. logarithmic scales.
92
Choosing the y-axis/scale
Inappropriate use of a log scale can artificially minimise differences
[Figure panels: Linear scale, Logarithmic scale]
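The size of this effect can be estimated with a small calculation. The sketch below compares how much longer one bar is than another on a linear axis starting at 0 and on a log10 axis starting at 1; the values and the axis minimum are hypothetical.

```python
import math

def log_bar_length(value, axis_min=1.0):
    """Length of a bar on a log10 axis that starts at axis_min."""
    return math.log10(value) - math.log10(axis_min)

a, b = 1000.0, 2000.0  # hypothetical values: b is twice a

linear_ratio = b / a                                   # bar lengths on a linear axis from 0
log_ratio = log_bar_length(b) / log_bar_length(a)      # bar lengths on a log axis from 1

print(linear_ratio, round(log_ratio, 3))
```

A twofold difference that is obvious on the linear axis becomes a bar that is only about 10% longer on the log axis.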
93
Choosing the y-axis/scale
Logarithmic axis should be used for:
- Logarithmically spaced values
- Lognormal data
94
Simply Cheating: Manipulating images
‘Playing’ too much with contrast
“Adjusting the contrast/brightness of a digital image is common practice and is not considered improper if the adjustment is applied to the whole image. Adjusting the contrast/brightness of only part of an image is improper, however, and this practice can usually be spotted by someone scrutinizing a file.”
[Figure panels: Original; Brightness and Contrast Adjusted; Brightness and Contrast Adjusted Too Much: Oversaturation]
95
Simply Cheating: Manipulating images: Cutting gels
Presenting bands out of context
Juxtaposing two lanes that were not next to each other in an original gel is common practice when preparing figures from hard copy photographs of the gel, and is acceptable manipulation if the figure is digital. Taking a band from one digital image and placing it in a lane in another is improper manipulation, which can usually be spotted by someone scrutinizing a file.
‘Rebuilding’ a gel from several cuts
96
Image Manipulation can be detected
/JCI28824
98
Design Theory
v2018-11
Boo Virk, Simon Andrews
boo.virk@babraham.ac.uk
99
Why does good design matter?
- Good design makes a great first impression
- Good design makes for effective communication
- Good design keeps the reader engaged
Art Palvanov (
100
Elements of design
- Contrast
- Alignment
- Space
- Colour
- Symmetry
- Repetition
- Proximity
- Size
101
Proximity – Find logical and visually appealing ways to structure panels
- Which figures logically group together?
- Are there sub-groups which should be connected?
- Is there a logical flow to the ordering?
- Is the layout balanced?
102
Alignment: Some arrangements are more visually appealing than others
103
We like symmetrical ordered layouts
Nutritional Immunology and Molecular Medicine Laboratory (2012) Modeling H. pylori using ENISI and Cell Designer
104
We like regular radial arrangements
A panoramic view of acute myeloid leukemia Sai-Juan Chen, Yang Shen & Zhu Chen Nature Genetics 45, 586–587 (2013)
105
Without symmetry we should consider visual weight
- Bold Outline
- Strong Colour
- Size Variation
O’Callaghan CA (2000) Molecular basis of human natural killer cell recognition of HLA-E (human leucocyte antigen-E) and its relevance to clearance of pathogen-infected and tumour cells, Clinical Science 99, (9–17)
Greenblum S (2012) Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease, PNAS vol. 109 no. 2
106
Alignment: We are sensitive to aligned edges, even when they are separated
[Figure: example panels for Control, Treatment A and Treatment B; labels include Day and Dead]
107
Use a grid to help align disparate parts of a figure
[Figure: the same panels for Control, Treatment A and Treatment B aligned to a grid; labels include Day and Dead]
108
Leave space between elements of figures
109
Colour can be an essential or optional part of any figure
110
Colour can have multiple uses
Colour can be used to:
- Highlight specific data
- Group categories of data
- Encode quantitative values
The more selective you are with colour, the greater its effect
Try to make figures work in black and white
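A quick way to check whether colours will survive black-and-white printing is to estimate their grey level. The sketch below uses the Rec. 601 luma weights (0.299, 0.587, 0.114), which is one common approximation and an assumption here; the course does not prescribe a method. The colours are hypothetical.

```python
def grey_level(r, g, b):
    """Approximate grey value (0-255) of an 8-bit RGB colour using Rec. 601 weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

red = grey_level(255, 0, 0)      # bright red
green = grey_level(0, 128, 0)    # mid green
blue = grey_level(0, 0, 255)     # bright blue

print(red, green, blue)
```

The red and the green are easy to tell apart in colour but land on almost the same grey, so a figure relying on that pair would not work in black and white; the blue comes out clearly darker.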
111
Sparing use of colour is most effective
Which is most effective at conveying your message?
112
Don’t invent your own colour schemes
Colorbrewer2.org
113
Use an appropriate colour scheme
Sequential
- Run between two values
- Typically two main colours
Divergent
- Diverging from a central value to a min and a max
- Typically three colours
Categorical
- Colours have no intrinsic ordering
114
If possible try to consider colour blind users
Affects 1 in 12 men and 1 in 200 women worldwide
“If a submitted manuscript happens to go to three male reviewers of Northern European descent, the chance that at least one will be colour blind is 22 percent.”
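The reviewer figure can be checked with a short calculation: the chance that at least one of n independent people is colour blind is 1 - (1 - p)^n. The 8% prevalence used below is an assumption that would reproduce the quoted 22 percent; it is not stated on the slide.

```python
def p_at_least_one(prevalence, n):
    """Probability that at least one of n independent people is affected."""
    return 1 - (1 - prevalence) ** n

p_one_in_twelve = p_at_least_one(1 / 12, 3)   # worldwide male rate from the slide
p_eight_percent = p_at_least_one(0.08, 3)     # assumed rate, gives about 22%

print(round(p_one_in_twelve, 3), round(p_eight_percent, 3))
```

With the 1 in 12 rate the result is about 23%, and with 8% it is about 22%, so either way roughly one submission in four or five meets a colour blind reviewer.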
115
You can see how well your figure works for colour blind people
- Gradients are easy to change
- Categorical colours are very limited
- Basic interpretability in black and white is ideal
[Figure panels: Normal colour vision, Protanopia]
116
When overlaying information, make sure you have sufficient contrast
[Figure panels: Poor contrast vs Good contrast for Vibrating colour and Busy background]
117
Add overlays to increase contrast
[Figure panels: Poor contrast, Good contrast]
118
Keep text and fonts simple
- All text in figures should use sans-serif fonts
- All text in figures should be black or white
[Figure: the labels ‘Wild type’ and ‘Knockout’ shown in sans-serif and serif fonts]
119
Keep text horizontal
120
Keep text horizontal
- Numbers are small, text is big
- All graphs still work when rotated 90°
121
Make sure appropriate labels are added
- Each axis is labelled
- Quantitative axes have units
- Colour scheme is explained
- Point shapes are explained
You need enough annotation that the figure is understandable on its own.
122
Make sure all text is legible at the final printed size
6 point font is the smallest you can comfortably read (just over 2mm height on paper)
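The "just over 2mm" figure follows from the point being 1/72 of an inch. The sketch below shows the conversion and, as an illustrative extension, the font size needed at design time when a figure will be shrunk for print; the function names and figure widths are hypothetical.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # desktop publishing point

def points_to_mm(points):
    """Convert a font size in points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH

def min_design_font(design_width_mm, printed_width_mm, min_printed_pt=6):
    """Smallest font size to use at design size so text prints at min_printed_pt."""
    return min_printed_pt * design_width_mm / printed_width_mm

six_point_mm = points_to_mm(6)              # about 2.12 mm
design_font = min_design_font(180, 90)      # figure drawn 180 mm wide, printed at 90 mm

print(round(six_point_mm, 2), design_font)
```

A figure drawn at twice its printed width needs at least 12 point text at design size to end up at 6 point on paper.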
123
When resizing be aware of what can and cannot have its aspect ratio changed
Things that always need to maintain their aspect ratios:
- Images
- Text
- Circular objects
- Axes with comparable units
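For the items in this list, resizing means applying one scale factor to both dimensions. A minimal sketch, with a hypothetical helper name and image size:

```python
def fit_to_width(width, height, new_width):
    """Scale (width, height) to new_width using a single factor for both dimensions."""
    scale = new_width / width
    return new_width, height * scale

# Hypothetical 1200 x 800 image placed in a 600-wide panel
new_size = fit_to_width(1200, 800, 600)
print(new_size)

# The width/height ratio is unchanged after resizing
original_ratio = 1200 / 800
new_ratio = new_size[0] / new_size[1]
```

Setting the width and height independently would change the ratio and stretch the image, text or circles.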
125
Simpler figures are easier to interpret
127
Consistency across figures makes interpretation easier
- Same colour/marker for same group
- Size of comparable figures should be the same
- Positions of axis titles and labels
- Font styles and sizes
- Order: If presented ‘Sample A’ and then ‘Sample B’, maintain this throughout