Scientific Figure Design


1 Scientific Figure Design
v Simon Andrews, Anne Segonds-Pichon, Boo Virk, Jo Montgomery

2 Figures are the way your science is presented to an audience
Before we start, I’d like you to have a look at this graph; talk to the person next to you about its pitfalls

3 What this course covers…
Theory of data visualisation
Why do some figures work better than others?
Applying theory to common plot types
Ethics of data representation
Using graphic design
Editing bitmap images in GIMP
Vector editing and compositing in Inkscape

4 What this course doesn’t cover…
How to draw graphs in specific programs
R Introduction
Statistics with R
Statistics with GraphPad
Plotting with R/ggplot

5 Timetable
Morning: Introduction, Data Visualisation Theory, Coffee, Data Representation Practical, Plots and ethics talk, Design theory talk
Afternoon: GIMP Tutorial, GIMP Practical, Coffee, Inkscape Tutorial, Inkscape Practical, Final practical

6 Data Visualisation Process
Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Conclusion

7 Exploratory visualisation
Understand your data
Multiple ways to present and summarise
Crude representations
Interactive
Not intended for final publication
Can be adapted for publication

8

9 Reference visualisation
Using your data as a resource Allows users to look up data of interest Tabular / Configurable Interactive

10

11 Illustrative visualisation
Intended to convey a specific point Carefully chosen subset of data Optimised presentation Good design Used for figures in papers

12

13 What makes a good figure?
Has a clear message
Helps to tell a story
Adds to the text, and links to it
Is focused: don’t confuse one message with another
Is easy to interpret correctly: good data visualisation, good design
Is an honest and true reflection of the data

14 The theory of data visualisation
Simon Andrews, Phil Ewels

15 Data Visualisation
A scientific discipline involving the creation and study of the visual representation of data, whose goal is to communicate information clearly and efficiently to users.
Data visualisation is both an art and a science.

16

17 ISBN-10:

18 Data Viz Process
Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Visualisation → Generate Conclusion

19 A data visualisation should…
Show the data
Not distort the data
Summarise to make things clearer
Serve a clear purpose
Link to the accompanying text and statistics

20 Different representations have common elements

21 Graphical Representations
Basic questions:
How are you going to turn the data into a graphical form (weight becomes length, etc.)?
How are you going to arrange things in space?
How are you going to use colours, shapes, etc. to clarify the point you want to make?

22 Marks and Channels
Marks: geometric primitives (lines, points, areas), used to represent data sets
Channels: graphical appearance of a mark (colour, length, position, angle), used to encode data

23 Figures are a combination of marks and channels
1 Mark = Rectangle; 1 Channel = Length of longest side
1 Mark = Circle segment; 1 Channel = Angle
1 Mark = Diamond shape; 2 Channels = X position, Y position
1 Mark = Circle; 4 Channels = X position, Y position, Area, Colour

24 Golden Rules
Effectiveness: Encode the most important information with the most effective channel
Expressiveness: Match the properties of the data and channel

25 Types of channel
Quantitative: Position on scale, Length, Angle, Area, Colour (saturation), Colour (lightness)
Qualitative: Spatial grouping, Colour (hue), Shape

26 Colour
Technical representations of colour: Red + Green + Blue (RGB); Cyan + Magenta + Yellow + Black (CMYK)
Perceptual representation of colour: Hue + Saturation + Lightness (HSL)

27 HSL Representation
Hue = Shade of colour = Qualitative
Saturation = Amount of colour = Quantitative
Lightness = Amount of white = Quantitative
Humans have no innate quantitative perception of hue, but we have learned some orderings (cold to hot, rainbow, etc.)
Our perception of hue is not linear
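To make the RGB and HSL representations concrete, here is a minimal sketch using Python's standard `colorsys` module. The helper name `rgb_to_hsl` and the example colours are illustrative assumptions, not part of the course material. Note that `colorsys` returns the components in hue, lightness, saturation order.

```python
import colorsys

def rgb_to_hsl(r, g, b):
    """Convert 0-255 RGB values to (hue in degrees, saturation %, lightness %)."""
    # colorsys works on 0-1 floats and returns (hue, lightness, saturation).
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(l * 100)

pure_red = rgb_to_hsl(255, 0, 0)
light_red = rgb_to_hsl(255, 128, 128)  # same hue as pure red, more "white"
pure_blue = rgb_to_hsl(0, 0, 255)      # a different hue, same lightness as pure red

print(pure_red, light_red, pure_blue)
```

Changing only lightness keeps the hue fixed, which is why lightness and saturation suit quantitative data while hue suits categories.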

28 Types of channel
Quantitative: Position on scale, Length, Angle, Area, Colour (saturation), Colour (lightness)
Qualitative: Spatial grouping, Colour (hue), Shape

29 Data Types
Quantitative: Height, Length, Weight, Expression etc.
Ordered: Small, Medium, Large; January, February, March
Categorical: WT, Mutant1, Mutant2; GeneA, GeneB, GeneC

30 Golden Rules
Effectiveness: Encode the most important information with the most effective channel
Expressiveness: Match the properties of the data and channel

32 Effectiveness of quantitation
2X 7X 4.5X 1.8X 16X 3.4X

33 Quantitation Perception

34 Golden Rules
Effectiveness: Encode the most important information with the most effective channel
Expressiveness: Match the properties of the data and channel

35 Most Quantitative Representations
Ordered from good quantitation to poor quantitation:
Bar chart
Stacked bar chart with common start
Stacked bar chart with different starts
Pie charts
Bubble plots (circular area)
Rectangular area
Colour (luminance)
Colour (saturation)
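Because bubble plots encode values as circular area, the radius has to grow with the square root of the value. The sketch below illustrates this; the function names and the two data values are hypothetical, chosen only for illustration.

```python
import math

def bubble_radius(value, area_per_unit=1.0):
    """Radius of a circle whose AREA is proportional to value."""
    return math.sqrt(value * area_per_unit / math.pi)

def circle_area(radius):
    return math.pi * radius ** 2

small, large = 1.0, 4.0  # hypothetical data values, large is 4x small

# Area encodes the value: a 4x value gives a 4x area (and only a 2x radius).
area_ratio = circle_area(bubble_radius(large)) / circle_area(bubble_radius(small))

# Radius encodes the value: a 4x value gives a 16x area, exaggerating the difference.
naive_area_ratio = circle_area(large) / circle_area(small)

print(round(area_ratio, 6), round(naive_area_ratio, 6))
```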

36 Discriminability
If you encode categorical data, are the differences between categories easy for the user to perceive correctly?

37 Qualitative Discrimination
How many colours can you discriminate?

38 Qualitative Discrimination
How many (fillable) shapes can you discriminate? Can combine with colour, but need to maintain similar fillable areas

39 Qualitative Discrimination
Can combine with colour, but need to maintain similar fillable areas

40 Separability
The effectiveness of a channel does not always survive being combined with a second channel.
There are large variations in how much two different channels interfere with each other.
Trying to put too much information on a figure can erode the impact of the main point you’re trying to make.

41 Separability
There is no confusion between the two channels
Larger points are easier to discriminate than smaller ones
We tend to focus on the area of the shape rather than the height/width separately
Humans are very bad at separating combined colours

42 Popout
A distinct item immediately stands out from the others
Triggered by our low level visual system
You don’t need to actively look at every point (slow!) to see it

43 Popout (find the red circle)

44 Popout
Speed of identification is independent of the number of distracting points

45 Popout
Colour pops out more than shape

46 Popout
Mixing channels removes the effect (find the red circle)

47 Use of space
Where you want a viewer to focus on specific subsets of data, you can help their perception by using the layout or highlighting of data to draw their attention to the point you’re making

48 Grouping

49 Grouping
Exon, CGI, Intron, Repeat

50 Ordering
Is a monkey heavier than a dog?

51 Containment / Linking
Wild Type, Mutant

52 Validation
Always try to validate plots you create
You have seen your data too often to get an unbiased view
Show the plot to someone not familiar with the data
What does this plot tell you?
Is this the message you wanted to convey?
If they pick multiple points, do they choose the most important one first?

53 General Rules
No unnecessary figures: Does a graphical representation make things clearer? Would a table be better?
One point per figure: Design each figure to illustrate a single point. Adding complexity compromises the effectiveness of the main point.
No absolute reliance on colour: Figures should ideally still work in black and white. Colour should help perception.

54 Making effective use of common plot types
Anne Segonds-Pichon Simon Andrews Phil Ewels

55 Types of plot Things you can illustrate

56 Plot Properties
Exploration, Presentation or both?
Effectiveness
Scalability
Options
Potential Problems

57 Distributions

58 Histograms / Density Plots
Exploration or Presentation: Both; Effectiveness: Good; Scalability: Poor

59 Histogram Options / Problems
Bin size: too few categories, too many categories
Discrete data
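The effect of bin size can be seen without any plotting library. This is a minimal sketch with a hypothetical `histogram_counts` helper and made-up values; it only counts how many values fall into each bin.

```python
def histogram_counts(values, bin_width, start=0.0):
    """Count values per bin; keys are the left edge of each non-empty bin."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical measurements with two clusters (around 2 and around 8).
weights = [1.2, 1.9, 2.1, 2.8, 3.3, 7.5, 7.9, 8.4]

print(histogram_counts(weights, 5))  # very wide bins: only two bars remain
print(histogram_counts(weights, 1))  # narrow bins: more detail, but sparse and noisy
```

With too few bins the shape of each cluster is lost; with too many, most bins hold one or two values and the overall shape is hard to read.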

60 Box Plots
Maximum
Upper Quartile (Q3): 75th percentile (3rd quartile)
Median
Lower Quartile (Q1): 25th percentile (1st quartile)
Minimum
Interquartile Range (IQR): 50% of the data
Cutoff = Q1 - 1.5*IQR
Outlier
Exploration or Presentation, Effectiveness, Scalability: Presentation, Good
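The quantities drawn on a box plot can be computed directly. The sketch below uses `statistics.quantiles` from the Python standard library. The upper cutoff (Q3 + 1.5*IQR) is the usual mirror image of the lower cutoff on the slide and is an assumption here, as are the function name and the data. Different programs use slightly different quartile definitions, so values may not match other software exactly.

```python
import statistics

def boxplot_summary(values):
    """Return the quartiles, IQR, outlier cutoffs and outliers for a dataset."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr
    upper_cutoff = q3 + 1.5 * iqr  # assumed symmetric rule, not stated on the slide
    outliers = [v for v in values if v < lower_cutoff or v > upper_cutoff]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_cutoff": lower_cutoff, "upper_cutoff": upper_cutoff,
        "outliers": outliers,
    }

# Hypothetical measurements with one extreme value.
summary = boxplot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```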

61 BoxPlot Problems
Assumes a large, normally distributed dataset
Misleading plots from small or non-normal datasets
In most cases there are better alternatives

62 Bean Plots
Beans (individual data points), Data density, Sample mean, Global mean
Exploration or Presentation: Both; Effectiveness: Good; Scalability: Good / Intermediate

63 BoxPlot vs Beanplot
Bimodal, Uniform, Normal

64 Comparisons

65 Stripcharts
Exploration or Presentation: Both; Effectiveness: Good; Scalability: Poor

66 Barplot Exploration or Presentation Effectiveness Scalability
Good

67 Barplot Options
Selection of suitable confidence measures: Standard error, Standard deviation
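The choice between the two measures changes the length of the error bars considerably. A minimal sketch, assuming the usual definitions (sample standard deviation, and standard error of the mean as SD divided by the square root of n); the helper name and the data are hypothetical.

```python
import math
import statistics

def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return sd, sem

# Hypothetical replicate measurements.
sd, sem = sd_and_sem([7, 11, 11, 11])
print(sd, sem)
```

The standard deviation describes the spread of the data, while the standard error shrinks as more replicates are added, so a figure legend should always say which one the bars show.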

68 Barplot Problems
Setting a suitable baseline

69 Barplot Options / Problems
Dealing with ratio data

70 Confidence Interval Plots
Exploration or Presentation Effectiveness Scalability Presentation Good

71 Relationships

72 Line Graphs
Exploration or Presentation: Both; Effectiveness: Good; Scalability: Poor

73 Line Graph Problems
Discrete data: a line implies interpolation
Can be useful for exploration
Shouldn’t use for presentation

74 Scatterplots
Exploration or Presentation: Both; Effectiveness: Good; Scalability: Intermediate

75 Scatterplot Options / Problems
Large Data Equality of Axes

76 Composition

77 Pie Charts
Exploration or Presentation: Both; Effectiveness: Intermediate; Scalability: Poor

78 Stacked Bar Charts
Exploration or Presentation: Both; Effectiveness: Good / Intermediate; Scalability: Intermediate

79 Stacked Bar Chart Options
Scaling and Ordering

80 Heatmaps
Exploration or Presentation: Both; Effectiveness: Poor; Scalability: Excellent

81 HeatMap Options Clustering

82 HeatMap Options Colours
Turns quantitative differences into categorical

83 Ethics of data representation
Simon Andrews, Anne Segonds-Pichon

84 Data Visualisation Process
Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Visualisation → Generate Conclusion
Two parts of the process where visualisation is important. They have different requirements and will need different visualisations.

85 What is ethics when it comes to data visualisation?
The figure/graph/image should show what is actually happening and not what you want to happen.
Different ways of being unethical:
not exploring/getting to know the data well enough
misusing your chosen graphical representation
deliberately showing the data in a misleading manner
choosing the ‘most representative’ image/experiment

86 Is my plot ethical?
Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?

87 Advertising and politics are built on unethical data representation.

88 Not exploring/getting to know the data well enough
One experiment: change in the variable of interest between CondA and CondB. Data plotted as a bar chart.

89 Not exploring/getting to know the data well enough
Five experiments (Exp1 to Exp5): change in the variable of interest between 3 treatments and a control. Data plotted as a bar chart.
Comparisons: Treatments vs. Control (p=0.001, p=0.04, p=0.32)

90 Choosing the wrong axis/scale
Example: increase in salary in the last term.

91 Choosing the y-axis/scale
Be careful with Linear vs. logarithmic scale.

92 Choosing the y-axis/scale
Inappropriate use of a log scale can artificially minimise differences
Panels: Linear scale, Logarithmic scale
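To see how a log scale compresses differences, compare where two values land along an axis under each scaling. This is a minimal sketch; the function name, the axis range (1 to 1000) and the two values are assumptions chosen for illustration.

```python
import math

def axis_fraction(value, axis_min, axis_max, log=False):
    """Fraction of the axis length (0 to 1) at which a value is drawn."""
    if log:
        value, axis_min, axis_max = (math.log10(x) for x in (value, axis_min, axis_max))
    return (value - axis_min) / (axis_max - axis_min)

# Two hypothetical measurements, one double the other.
linear_a = axis_fraction(100, 1, 1000)
linear_b = axis_fraction(200, 1, 1000)
log_a = axis_fraction(100, 1, 1000, log=True)
log_b = axis_fraction(200, 1, 1000, log=True)

print(round(linear_b / linear_a, 2))  # drawn heights differ by roughly 2x
print(round(log_b / log_a, 2))        # drawn heights differ by roughly 1.15x
```

A genuine doubling looks like a small change on the log axis, which is why the scale should match the nature of the data.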

93 Choosing the y-axis/scale
Logarithmic axis should be used for:
Logarithmically spaced values
Lognormal data

94 Simply Cheating: Manipulating images
‘Playing’ too much with contrast
“Adjusting the contrast/brightness of a digital image is common practice and is not considered improper if the adjustment is applied to the whole image. Adjusting the contrast/brightness of only part of an image is improper, however, and this practice can usually be spotted by someone scrutinizing a file.”
Panels: Original; Brightness and Contrast Adjusted; Brightness and Contrast Adjusted Too Much: Oversaturation
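The distinction between an acceptable whole-image adjustment and oversaturation can be sketched in a few lines. This is an illustration only, assuming an 8-bit greyscale image stored as a list of rows; the function name and the pixel values are hypothetical. The same linear adjustment is applied to every pixel, and the fraction of pixels pushed outside the 0 to 255 range is reported as a warning sign.

```python
def adjust_contrast(image, gain, offset=0):
    """Apply one linear adjustment to EVERY pixel; return (image, clipped fraction)."""
    clipped = 0
    total = 0
    adjusted = []
    for row in image:
        new_row = []
        for pixel in row:
            value = round(gain * pixel + offset)
            if value < 0 or value > 255:
                clipped += 1  # information is lost once a pixel saturates
            new_row.append(min(255, max(0, value)))
            total += 1
        adjusted.append(new_row)
    return adjusted, clipped / total

image = [[10, 60], [120, 200]]  # hypothetical 2x2 greyscale image

mild, mild_clipped = adjust_contrast(image, 1.2)
strong, strong_clipped = adjust_contrast(image, 2.0)
print(mild, mild_clipped)
print(strong, strong_clipped)
```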

95 Simply Cheating: Manipulating images: Cutting gels
Presenting bands out of context
Juxtaposing two lanes that were not next to each other in an original gel is common practice when preparing figures from hard copy photographs of the gel, and is acceptable manipulation if the figure is digital. Taking a band from one digital image and placing it in a lane in another is improper manipulation, which can usually be spotted by someone scrutinizing a file.
‘Rebuilding’ a gel from several cuts

96 Image Manipulation can be detected
/JCI28824

97 Is my plot ethical?
Would a reader come to a different conclusion if they could see the details of the data which were omitted from the plot?

98 Design Theory
v2018-11
Boo Virk, Simon Andrews
boo.virk@babraham.ac.uk

99 Why does good design matter?
Good design makes a great first impression
Good design makes for effective communication
Good design keeps the reader engaged
Art Palvanov (

100 Elements of design
Contrast, Alignment, Space, Colour, Symmetry, Repetition, Proximity, Size

101 Proximity – Find logical and visually appealing ways to structure panels
Which figures logically group together? Are there sub-groups which should be connected? Is there a logical flow to the ordering? Is the layout balanced?

102 Alignment: Some arrangements are more visually appealing than others

103 We like symmetrical ordered layouts
Nutritional Immunology and Molecular Medicine Laboratory (2012) Modeling H. pylori using ENISI and Cell Designer

104 We like regular radial arrangements
A panoramic view of acute myeloid leukemia Sai-Juan Chen, Yang Shen & Zhu Chen Nature Genetics 45, 586–587 (2013)

105 Without symmetry we should consider visual weight
Bold Outline Strong Colour Size Variation
O’Callaghan CA (2000) Molecular basis of human natural killer cell recognition of HLA-E (human leucocyte antigen-E) and its relevance to clearance of pathogen-infected and tumour cells, Clinical Science 99, (9–17)
Greenblum S (2012) Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease, PNAS vol. 109 no. 2

106 Alignment: We are sensitive to aligned edges, even when they are separated
Example figure panels: Control, Treatment A, Treatment B; x-axis: Day; category: Dead

107 Use a grid to help align disparate parts of a figure
Example figure panels: Control, Treatment A, Treatment B; x-axis: Day; category: Dead

108 Leave space between elements of figures

109 Colour can be an essential or optional part of any figure

110 Colour can have multiple uses
Colour can be used to:
Highlight specific data
Group categories of data
Encode quantitative values
The more selective you are with colour, the greater its effect
Try to make figures work in black and white
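One quick way to check whether a colour choice survives black-and-white printing is to convert each colour to a grey level and compare. The sketch below assumes the common Rec. 601 luma weights (0.299, 0.587, 0.114), which is one of several possible conversions; the function name and the example colours are hypothetical.

```python
def grey_level(r, g, b):
    """Approximate grey level (0-255) of an RGB colour using Rec. 601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

red = grey_level(255, 0, 0)
dark_green = grey_level(0, 130, 0)
blue = grey_level(0, 0, 255)
yellow = grey_level(255, 255, 0)

# Red and dark green are clearly different hues but almost the same grey.
print(round(red, 1), round(dark_green, 1))
# Blue and yellow stay well separated in grey.
print(round(blue, 1), round(yellow, 1))
```

Pairs that collapse to nearly the same grey level need a second cue, such as shape or line style, if the figure has to work without colour.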

111 Sparing use of colour is most effective
Which is most effective at conveying your message?

112 Don’t invent your own colour schemes
Colorbrewer2.org

113 Use an appropriate colour scheme
Sequential: run between two values; typically two main colours
Divergent: diverging from a central value to a min and a max; typically three colours
Categorical: colours have no intrinsic ordering

114 If possible try to consider colour blind users
Affects 1:12 men and 1:200 women worldwide
“If a submitted manuscript happens to go to three male reviewers of Northern European descent, the chance that at least one will be colour blind is 22 percent.”
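The quoted 22 percent can be checked with a short calculation: the chance that at least one of several reviewers is colour blind is one minus the chance that none are. The sketch assumes reviewers are independent; the 8% prevalence is an assumption that reproduces the quoted figure, while the 1:12 rate from the slide gives a slightly higher value.

```python
def prob_at_least_one(prevalence, reviewers):
    """Probability that at least one of the reviewers is colour blind."""
    return 1 - (1 - prevalence) ** reviewers

quoted = prob_at_least_one(0.08, 3)        # assumed 8% prevalence
from_slide = prob_at_least_one(1 / 12, 3)  # 1:12 prevalence from the slide

print(round(quoted * 100), round(from_slide * 100))
```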

115 You can see how well your figure works for colour blind people
Gradients are easy to change
Categorical colours are very limited
Basic interpretability in black and white is ideal
Panels: Normal colour vision, Protanopia

116 When overlaying information, make sure you have sufficient contrast
Poor contrast Good contrast Poor contrast Good contrast Vibrating colour Busy background

117 Add overlays to increase contrast
Poor contrast Good contrast

118 Keep text and fonts simple
Figures should use sans-serif fonts
All text in figures should be black or white
Example labels shown in sans-serif and serif: Wild type, Knockout

119 Keep text horizontal

120 Keep text horizontal
Numbers are small, text is big
All graphs still work when rotated 90°

121 Make sure appropriate labels are added
Each axis is labelled
Quantitative axes have units
Colour scheme is explained
Point shapes are explained
You need enough annotation that the figure is understandable on its own.

122 Make sure all text is legible at the final printed size
6 point font is the smallest you can comfortably read (just over 2mm height on paper)
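The "just over 2mm" figure follows from the definition of the typographic point as 1/72 of an inch. A small sketch of the conversion, with a hypothetical helper name; note that the result is the nominal font size, and individual letters are somewhat smaller than this.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # desktop-publishing point

def points_to_mm(points):
    """Convert a font size in points to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH

six_point_mm = points_to_mm(6)
print(round(six_point_mm, 2))
```

The same conversion helps when a figure will be shrunk for print: a 12 point label on a figure reduced to half size ends up at the 6 point limit.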

123 When resizing be aware of what can and cannot have its aspect ratio changed
Things that always need to maintain their aspect ratios:
Images
Text
Circular objects
Axes with comparable units
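Resizing while keeping the aspect ratio means applying a single scale factor to both dimensions. A minimal sketch, with a hypothetical function name and made-up sizes, that fits an element into an available box without distorting it:

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside the box while preserving aspect ratio."""
    # Use the smaller of the two scale factors so neither dimension overflows.
    scale = min(max_width / width, max_height / height)
    return width * scale, height * scale

# Hypothetical 800x600 image placed into a 400x400 panel.
new_size = fit_within(800, 600, 400, 400)
print(new_size)
```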

125 Simpler figures are easier to interpret

126 Simpler figures are easier to interpret

127 Consistency across figures makes interpretation easier
Same colour/marker for same group
Size of comparable figures should be the same
Positions of axis titles and labels
Font styles and sizes
Order: If presented ‘Sample A’ and then ‘Sample B’, maintain this throughout

128 Elements of design
Contrast, Alignment, Space, Colour, Symmetry, Repetition, Proximity, Size

