1 Good practice in constructing charts
Mathematics & Statistics Help (MASH), University of Sheffield

2 Learning outcomes
By the end of this workshop you should be aware of good practice when displaying data in charts.

3 Download the slides from the MASH website
MASH > statistics > workshop_statistics

4 A few warm-up questions
Have a look at the following slides. Decide which of the two options works best. Why did you decide this?

5 Q1: Which graph makes it easier to determine which days of the week are the busiest in A&E?

6 Q2: Which graph of average house prices in 2017 is easier to read?

7 Q3: Which graph makes it easier to focus on patterns of change over time?

8 Q4: Which graph presents the data accurately? Which skews the values in a misleading manner?

10 Q5: Which map makes it easier to see the busiest times in A&E?

11 Q6: Which graph makes it easier to determine differences in house prices?

12 Q7: In which graph are the labels easier to read?

13 Q8: Which graph is easier to look at?

14 Exercise: what do you think are the key elements of a good chart?

15 In this section
“Often the most effective way to describe, explore, and summarize a set of numbers is to look at pictures of those numbers” (Tufte, 2001).
Today, visualisation is key for:
Business and the public sector (patterns, predictions, sales)
Science (simulation and remote sensing)
Communication (media, websites, etc.), e.g. the RSS journalism awards

16 Top blunders by visual designers (Horton, 1995)
Crayola effect and Photoshop envy: “remember what happened when, as a kid, you got that box of 64 Crayola crayons?”
20-20 myopia: “far too many graphics…seem designed as if the whole human race has perfect vision…”

17 Top blunders by visual designers (Horton, 1995)
Bullying backgrounds
Horseless-carriage thinking: designing for computer displays as if they were paper

18 Top blunders by visual designers (Horton, 1995)
Ransom-note typography

19 Top blunders by visual designers (Horton, 1995)
Why 99% of page layouts are wrong: “…squander reader’s attention and complicate understanding”
Going gaga over gewgaws: “…communicate better if designers valued simplicity over decoration…”
Data-ink ratio (Tufte)

20 Top blunders by visual designers (Horton, 1995)
Pictures do lie
Salacious selection
Multidimensional scaling
Suppressed zero

21 Exercise: How could this graph be improved?

23 Good practice in visualisation

24 ‘There is no single statistical tool that is as powerful as a well-chosen graph… even for small sets of data, there are many relationships that are considerably easier to discern in graphical displays than by any other analytic method’ (Chambers, Cleveland et al., Graphical Methods for Data Analysis)

25 Why bother?

26 Why bother?
Exploration
Description
Tabulation
Communication

28 An example of how a chart can be used to illuminate…

31 Make sure that the chart you produce reflects what you want to say
The authors of the paper that contained the following chart stated that ‘it showed the mean deprivation score for the areas in which cases and controls lived according to participation status. Although the selected controls lived in areas of similar material wealth to their corresponding cases, the controls who participated differed markedly from those who did not. Furthermore, we found significant differences (P < 0.05) between the non-participating groups’.
(Smith et al. Representativeness of samples from general practice lists in epidemiological studies: case-control study. BMJ 2004;328:932)

32 Significant differences?
Mean deprivation?
(Smith et al. Representativeness of samples from general practice lists in epidemiological studies: case-control study. BMJ 2004;328:932)

33 Guidelines for getting it right

34 Labelling

35 Figure 1: Results
(S Hernandez-Diaz et al. Risk of pre-eclampsia in first and subsequent pregnancies. BMJ 2009;338:b2255)

36 Figure 1: Risk of pre-eclampsia in second pregnancy by years since first pregnancy and history of pre-eclampsia
(S Hernandez-Diaz et al. Risk of pre-eclampsia in first and subsequent pregnancies. BMJ 2009;338:b2255)

37 Colour

38 Radar plot of SF12 dimensions at baseline and 12 months for pilot study of 30 patients with swallowing difficulties following a stroke What do you think will happen to this plot if you photocopy it?
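A rough way to anticipate what photocopying does to a colour chart is to convert each colour to a grey level before you choose a palette. The sketch below is an approximation and does not model any particular copier. It applies the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) to hex colour codes. The 40-grey-level threshold and the helper names are illustrative assumptions.

```python
def to_grey(hex_colour: str) -> float:
    """Approximate grey level (0-255) of a '#RRGGBB' colour using BT.601 luma weights."""
    hex_colour = hex_colour.lstrip("#")
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def distinguishable_in_grey(colours, min_gap=40.0):
    """True if every pair of colours differs by at least min_gap grey levels (assumed threshold)."""
    greys = sorted(to_grey(c) for c in colours)
    return all(high - low >= min_gap for low, high in zip(greys, greys[1:]))


# Pure red and mid green look very different on screen but give almost the same grey.
print(round(to_grey("#FF0000")), round(to_grey("#008000")))    # 76 75
print(distinguishable_in_grey(["#FF0000", "#008000"]))         # False
# Varying lightness (black, mid grey, light grey) survives the conversion.
print(distinguishable_in_grey(["#000000", "#808080", "#D9D9D9"]))  # True
```

In practice, this suggests that series which must survive photocopying should differ in lightness, not only in hue.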

42 Ordering by size

43 Timeliness of criminal cases in criminal courts, by offence group, 2013 Q1
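The ‘order by size’ rule applies only when the categories have no natural order. The sketch below is minimal and assumes the data arrive as a dictionary of category totals. The function name and the example values are hypothetical; they are not the court-timeliness figures.

```python
def order_for_chart(totals, natural_order=None):
    """Return (category, value) pairs in the order they should be plotted.

    totals: dict mapping each category to its value.
    natural_order: the meaningful order of an ordinal variable, or None for a
    nominal variable, in which case categories are sorted largest first.
    """
    if natural_order is not None:
        # Ordinal: keep the meaningful order, whatever the values are.
        return [(category, totals[category]) for category in natural_order]
    # Nominal: no natural order, so order by size (ties keep their input order).
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(order_for_chart({"Group A": 120, "Group B": 310, "Group C": 95}))
# [('Group B', 310), ('Group A', 120), ('Group C', 95)]

satisfaction = {"satisfied": 40, "dissatisfied": 12, "highly satisfied": 25}
print(order_for_chart(satisfaction, ["dissatisfied", "satisfied", "highly satisfied"]))
# [('dissatisfied', 12), ('satisfied', 40), ('highly satisfied', 25)]
```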

45 2D vs 3D

47 3D with pattern – definitely not recommended

48 Population (in millions) in 2004 for 20 European countries ordered by size

50 Gridlines

55 Make sure the axis accurately reflects what’s going on
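Edward Tufte quantifies this kind of distortion with the ‘lie factor’. It is the size of an effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the change relative to the first value. A value of 1 means the picture is accurate. The sketch below applies this to two bars drawn upwards from a baseline. The coverage figures of 80% and 88% are hypothetical; they are not the MMR data shown on the slides.

```python
def relative_change(first, second):
    """Size of an effect as Tufte defines it: change relative to the first value."""
    return (second - first) / first


def lie_factor(v1, v2, baseline=0.0):
    """Lie factor for two bars drawn upwards from `baseline`.

    The bars the reader sees have lengths v1 - baseline and v2 - baseline,
    while the data change from v1 to v2. A result of 1 means the picture
    shows the change faithfully.
    """
    actual = relative_change(v1, v2)
    if actual == 0:
        raise ValueError("no change in the data, so the lie factor is undefined")
    shown = relative_change(v1 - baseline, v2 - baseline)
    return shown / actual


# Hypothetical coverage figures: 80% one year, 88% the next.
print(lie_factor(80, 88))                         # 1.0  (axis starts at zero)
print(round(lie_factor(80, 88, baseline=75), 2))  # 16.0 (axis starts at 75)
```

Starting the axis at 75 makes a 10% rise look like a 160% rise. This is the ‘suppressed zero’ blunder in numbers.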

56 MMR vaccination rates from 1990 to 2007
Taken from:

59 Pictograms: make sure that they are scaled properly
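If a pictogram grows in both width and height, its area (which is what the eye compares) grows with the square of the scale factor. Doubling both sides therefore shows a fourfold increase. The usual fix is to scale each side by the square root of the ratio of the values. The sketch below is minimal: it assumes icons of the same shape whose area should be proportional to the value, and the function name is hypothetical.

```python
import math


def icon_side(value, reference_value, reference_side=1.0):
    """Side length for an icon whose AREA is proportional to value.

    reference_side is the side length used to draw reference_value. Icons are
    assumed to keep the same shape, so area scales with the square of the side.
    """
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be non-negative and reference_value positive")
    return reference_side * math.sqrt(value / reference_value)


# A value twice the reference: the icon should be about 1.41 times wider and taller...
side = icon_side(200, 100, reference_side=10.0)
print(round(side, 2))                # 14.14
print(round((side / 10.0) ** 2, 6))  # 2.0  (area doubles, matching the data)

# ...whereas doubling the side length makes the area four times as large.
print((20.0 / 10.0) ** 2)            # 4.0
```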

60 Always include the number of observations

61 In summary

62 Guidelines for good practice when constructing graphs
Limit the use of colour; this is particularly important if your original is to be photocopied.
For categorical data with no natural ordering, order the categories by size.
Include the number of observations.

63 Guidelines for good practice when constructing graphs
Maximise the amount of information for the minimum amount of ink: only include information that contributes to understanding, and make every mark count.
Give every figure a title that clearly explains what is being displayed.
Label axes clearly and round the numbers on them sensibly.
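Rounding axis numbers sensibly can also be automated. One widely used approach, which the workshop does not describe, is Paul Heckbert's ‘nice numbers for graph labels’ algorithm (Graphics Gems, 1990). It places ticks at 1, 2 or 5 times a power of ten. The sketch below follows that idea; the function names and the default of six target ticks are illustrative assumptions. Passing data_min=0 respects the ‘start at zero’ guideline for bar charts, and the same values can serve as a sparse set of gridlines.

```python
import math


def nice_number(x, round_result):
    """Return a 'nice' number (1, 2 or 5 times a power of ten) close to x > 0.

    With round_result=True, roughly the nearest nice number is chosen;
    otherwise the smallest nice number that is >= x.
    """
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # between 1 and 10
    if round_result:
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent


def nice_ticks(data_min, data_max, target_ticks=6):
    """Evenly spaced, rounded tick values covering [data_min, data_max]."""
    if data_max <= data_min:
        raise ValueError("data_max must exceed data_min")
    span = nice_number(data_max - data_min, round_result=False)
    step = nice_number(span / (target_ticks - 1), round_result=True)
    lower = math.floor(data_min / step) * step
    upper = math.ceil(data_max / step) * step
    count = round((upper - lower) / step)
    return [lower + i * step for i in range(count + 1)]


print(nice_ticks(0, 87.3))  # [0, 20, 40, 60, 80, 100]
print(nice_ticks(3, 47))    # [0, 10, 20, 30, 40, 50]
```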

64 Guidelines for good practice when constructing graphs
Never use 3-D charts, as these can be difficult to read.
Keep gridlines to a minimum; use only enough to aid interpretation.
Start the axis at zero when graphing absolute numbers or standard bar charts.

65 Ask yourself: would a table be better?

66 Figure or table? The choice is not always obvious
Tables are suitable for presenting information about large numbers of variables at once.
Figures are good for showing multiple observations on individuals or groups.
Figures can be particularly useful when conveying information to an audience.
How much information is being conveyed? A figure displaying only two means and their standard errors or confidence intervals may be a waste of space: either more information should be included or the summary values should be put in the text.

67 Guidelines for getting it right
Maximise the amount of information for the minimum amount of ink.
Give every figure a title that clearly explains what is being displayed.
Label axes clearly and round the numbers on them sensibly.

68 Final thoughts
Who are you presenting to? Know your audience.
Decide what you want to present: are you presenting data or results?
Tables are good for quantification; charts are good for illustrating specific points.
What do you want to show? What methods are available? Is the method chosen the best, or would another have been even better?
Have you done all that you can to make it as clear as possible?

69 Final thoughts: Tufte’s principles
Above all else, show the data.
Maximise the data-ink ratio, within reason.
Erase non-data-ink, within reason.
Erase redundant data-ink.
Revise and edit.
After this exercise, start the video. Stop it after it has described the types of plots.
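For reference, Tufte (2001) defines the data-ink ratio used in these principles as follows:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of a graphic that can be erased without loss of data-information}
\]
```

‘Maximising the data-ink ratio’ therefore means removing ink, such as heavy gridlines, 3-D effects or decorative fills, that could be erased without losing any information about the data.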

70 Summary
Think about your audience and what you want to say to them.
Keep it simple: “Everything should be made as simple as possible, but not simpler” (Albert Einstein).

71 Final thoughts… Florence Nightingale is well known for her pioneering reform of healthcare. Less well known is that she was also an accomplished statistician and the first female member of the RSS (Royal Statistical Society). Her best-known statistical work was in graphical statistics: she is credited with developing a form of the pie chart now known as the polar area diagram.

84 John Snow’s cholera incidence map, 1854

87 Maths and Statistics Help
Statistics appointments: Mon-Fri (10am-1pm)
Statistics drop-in: Mon-Fri (10am-1pm), Weds (4-7pm)

88 Resources: All resources are available in paper form at MASH or on the MASH website

89 Contacts
Follow MASH on Twitter: @mash_uos
Statistics staff: Jenny Freeman, Basile Marquier, Marta Emmett

90 6 stages of visualisation design
(Andy Kirk: ‘Data Visualization: A Successful Design Process’)
1. Purpose
2. Editorial focus
3. Preparing your data
4. Develop designs
5. Construct
6. Evaluate
What types of data are there? Within the data structure there are observations (or individuals), and for each observation there are data variables. Variables can be divided into two main categories: numerical and categorical.
Categorical variables indicate categories, for example gender (male or female) and marital status (single, married, divorced or widowed). Sometimes they are coded as numbers, e.g. 1 = male. Categorical variables can be divided into two types: ordinal and nominal. If the categories are meaningfully ordered, the variable is ordinal; if the order of the categories does not matter, the variable is nominal. For example, satisfaction level (dissatisfied, satisfied, highly satisfied) and education level (secondary, sixth form, undergraduate, postgraduate) are ordinal variables, whereas religion (Christian, Muslim, Hindu, etc.) and gender (male, female) are nominal variables.
Numerical variables take meaningful, comparable numeric values, such as blood pressure, height, weight, income, age and probability of illness. They can be further divided into two subtypes: continuous and discrete. Continuous variables can take any value within a range and are the most common, e.g. body weight, height and income. Discrete variables can only take whole numbers, such as the number of students in a class or the number of new patients each day. However, they are often treated as continuous for statistical analysis if they span a large range of values.
There is another variable type, the ‘label’ variable, which identifies observations uniquely, such as a student ID or a subject’s name.
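The taxonomy above can be written down as a small data structure. This makes it harder to treat a categorical variable stored as numeric codes (e.g. 1 = male) as if it were numerical. The sketch below is minimal; the type and dictionary names are hypothetical, and the classification simply restates the examples in the text.

```python
from enum import Enum


class VariableType(Enum):
    """The variable types described above."""
    NOMINAL = "categorical, categories have no meaningful order"
    ORDINAL = "categorical, categories are meaningfully ordered"
    DISCRETE = "numerical, whole numbers (counts)"
    CONTINUOUS = "numerical, any value within a range"
    LABEL = "identifies each observation uniquely"


CATEGORICAL = {VariableType.NOMINAL, VariableType.ORDINAL}
NUMERICAL = {VariableType.DISCRETE, VariableType.CONTINUOUS}

# Examples from the text, classified (the dictionary keys are illustrative names).
EXAMPLES = {
    "gender": VariableType.NOMINAL,
    "marital status": VariableType.NOMINAL,
    "religion": VariableType.NOMINAL,
    "satisfaction level": VariableType.ORDINAL,
    "education level": VariableType.ORDINAL,
    "students in class": VariableType.DISCRETE,
    "new patients per day": VariableType.DISCRETE,
    "body weight": VariableType.CONTINUOUS,
    "height": VariableType.CONTINUOUS,
    "student ID": VariableType.LABEL,
}


def is_categorical(variable: str) -> bool:
    """True for nominal or ordinal variables, even if they are stored as numeric codes."""
    return EXAMPLES[variable] in CATEGORICAL


print(is_categorical("gender"))  # True
print(is_categorical("height"))  # False
```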

91 1. Purpose: the three E’s
Explain
Explore
Exhibit

92 1. Purpose: Explain
A visualisation built around a carefully constructed narrative, for example:
Business: corporate performance figures
Media: the complexity of problems in the economic crisis
Academia: a figure in a scientific paper
“Overload, clutter and confusion are not attributes of information, they are failures of design” (Edward Tufte)

93 1. Purpose: Explore
Provide users with a means to discover facts, patterns and relationships.
“The greatest value of a picture is when it forces us to notice what we never expected to see” (John Tukey)

94 1. Purpose: Exhibit
This can be thought of as ‘data art’ or aesthetics. However, it can lack narrative and visual capability.

95 1. Purpose: factors
Aim, audience, time pressures, costs, client pressures, format, delivery platforms, technical capabilities and resources, your (or your team’s) skills, tools.

96 2. Editorial focus
What is the reason for this visualisation project? For whom? What is its function? What’s the story? What question is it answering? What tone should it take?

97 3. Preparing your data
Data is your raw material. Where will you acquire it from? You need to consider:
Completeness: over what time period, which variables, what granularity, what format?
Quality: is it corrupt? Are there missing data, invalid codes or unusual values (outliers)?
Data types: categorical, ordinal, numeric.

98 3. Preparing your data
Transforming for quality:
Resolve errors
Plug gaps (missing data?)
Remove duplicates
Clean up erroneous data
Transforming for analysis:
Parse (split up) variables, e.g. dates
Merge and derive new variables
Convert (e.g. code free text)
Calculate
Remove redundant data (for faster loading)
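Several of these transformations can be sketched with the Python standard library alone. The illustration below is minimal and uses a small hypothetical A&E extract; the records and field names are invented for the example. Gaps are only flagged as None here, because whether and how to impute them is a separate decision.

```python
from datetime import date

WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

# Hypothetical raw extract: (patient id, attendance date as text, wait in minutes as text).
RAW = [
    ("P1", "2017-03-06", "45"),
    ("P2", "2017-03-06", ""),     # gap: wait not recorded
    ("P1", "2017-03-06", "45"),   # exact duplicate of the first row
    ("P3", "2017-03-11", "-10"),  # erroneous: a negative wait is impossible
    ("P4", "2017-03-12", "130"),
]


def prepare(rows):
    """Apply a few of the quality and analysis transforms listed above."""
    seen = set()
    clean = []
    for row in rows:
        if row in seen:  # remove duplicates
            continue
        seen.add(row)
        patient_id, date_text, wait_text = row
        day = date.fromisoformat(date_text)            # parse the date text
        wait = int(wait_text) if wait_text else None   # flag gaps as None
        if wait is not None and wait < 0:              # treat impossible values as missing
            wait = None
        clean.append({
            "id": patient_id,
            "year": day.year,                          # split the date into parts...
            "month": day.month,
            "weekday": WEEKDAY_NAMES[day.weekday()],   # ...and derive a new variable
            "wait": wait,
        })
    return clean


for record in prepare(RAW):
    print(record)
```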

99 3. Data resolution
Full: every record
Filtered: only include records matching certain criteria
Aggregate: ‘rolled up’ (by year, category, etc.)
Sample: a mathematical rule selects a subset of the data; especially useful with very large datasets for a rapid, initial analysis
Headline: descriptive statistics, distributions
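The five levels of resolution above can be illustrated on a tiny hypothetical dataset using only the Python standard library. The records are invented. The systematic ‘every second record’ rule is just one example of a sampling rule; `random.sample` would be another.

```python
import statistics
from collections import Counter

# Hypothetical full dataset: one record per A&E attendance.
RECORDS = [
    {"weekday": "Mon", "year": 2016, "wait": 40},
    {"weekday": "Mon", "year": 2017, "wait": 55},
    {"weekday": "Sat", "year": 2017, "wait": 90},
    {"weekday": "Sun", "year": 2017, "wait": 75},
    {"weekday": "Sun", "year": 2016, "wait": 60},
]

full = RECORDS                                        # every record
filtered = [r for r in RECORDS if r["year"] == 2017]  # records matching a criterion
aggregate = Counter(r["weekday"] for r in RECORDS)    # rolled up: attendances per weekday
sample = RECORDS[::2]                                 # systematic sample: every second record
waits = [r["wait"] for r in RECORDS]
headline = {                                          # descriptive statistics
    "n": len(waits),
    "mean_wait": statistics.mean(waits),
    "median_wait": statistics.median(waits),
}

print(len(full), len(filtered))  # 5 3
print(aggregate)
print(headline)
```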

100 4. Develop designs
Visualisation methods
Encoding

101 5. Construct
Existing tools versus programming toolkits: Excel, Tableau, Spotfire, Statistics eXplorer, XmdvTool, SigmaPlot, ggplot2 in R

102 6. Evaluate
Take a step back and reflect. Does my chosen method actually show what I want it to show? What would I do differently next time?