Visualising Variables – Validly!
Damien Jolley
School of Health & Social Development, Deakin University, Victoria
Originally presented at: ASCEPT Workshop: “Extracting the real significance from biological data”, AHMRC, Thursday 28 November 2002
AHMRC Posters
Department of Human Physiology & Anatomy, School of Human Biosciences, La Trobe University, 23 April 2004
Download slides from:
[Figure: Average daily retail petrol price, Melbourne, Oct-Nov 2002. Marked days: Th, Tu, Sat, Fri, Sat, Th. Source: 21 Nov 2002, http://]
[Figure: Average daily retail petrol price, selected cities, 4 weeks to 21 Apr. Labels: Sydney, Adelaide, Brisbane, Melbourne, Other weeks. Vertical lines indicate Sundays. Source: 23 Apr 2004, http://]
Obvious fact #1:
• Graphs can communicate data:
  - quickly
  - accurately
  - powerfully
  - efficiently
“Only 50% of American 17-year-olds can identify information in a graph”*
* US National Assessment of Educational Progress, June 1990
Source: Wainer H. Understanding graphs and tables. Educational Researcher 1992; 21:14-23
Whose fault?
“Like characterising someone’s ability to read by asking questions about a passage full of spelling and grammatical errors. What are we really testing?”
Source: Wainer H. Understanding graphs and tables. Educational Researcher 1992; 21:14-23
[Figure drawn using MS Excel ‘XY-chart’]
Obvious fact #2:
• Bad graphs can hinder communication
Less obvious facts #3, #4, #5:
• What characterises a “good” graph?
• What are the characteristics of a “bad” graph?
• What software to use? How to use it?
Howie’s Helpful Hints for bad graph displays
• Ten useful pointers to help you create uninformative, difficult-to-read scientific graphs
• Adapted from: Wainer H. (1997) Visual Revelations. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers
Steps for better graphs
1. Identify direction of effect
  • In almost all cases, the cause or predictor variable should be horizontal (X)
  • Effect or outcome variable is best vertical (Y)
2. Identify the levels of measurement
  • Nominal, ordinal or quantitative are different!
3. Think of visual perception guides
  • Columns or dots? Lines or scatterplot?
4. Minimise guides and non-data
  • Grid lines, tick marks, legends are non-data
Cause (X) and effect (Y)
“Figure 16 Standard deviation of batting averages for all full-time players by year for the first 100 years of professional baseball. Note the regular decline.”*
* My emphasis
[Figure axes: Standard deviation, Time]
Source: Gould, Stephen Jay. Full House: The Spread of Excellence from Plato to Darwin. Random House, cited: 24 Nov 2002, http://
Source: Killias M. International correlations between gun ownership and rates of homicide and suicide. Can Med Assoc J 1993; 148:
[Figure: Rate of homicide with a gun (per million per year) against % of households owning guns. Labelled points: USA, Norway, Canada, France, Finland, Belgium, Australia, Spain, Switzerland, Netherlands, West Germany, Scotland, England & Wales. Drawn using S-plus]
Levels of Measurement
• The right display for a variable depends on its level of measurement
• For univariate graphs:
  - qualitative: barplot
  - ordinal: column chart
  - quantitative: boxplot or histogram
• For bivariate graphs:
  - X ordinal, Y binary: connected percents
  - X & Y both quantitative: scatterplot
  - X categorical, Y quantitative: box plots
The levels:
• Binary (eg gender, death, pregnant)
• Categorical
  - Qualitative (eg race, political party, religion)
  - Diverging (eg change, -ve to +ve)
• Ordinal (eg rating scale, skin type, colour)
• Quantitative
  - Interval: only differences matter (eg BP, IQ)
  - Ratio: absolute zero, ratios matter (eg weight, height, volume)
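The display recommendations on this slide amount to a lookup from level of measurement to graph type. A minimal sketch of that lookup follows; the function name `suggest_display` and the dictionary names are hypothetical, introduced only for illustration, and the table contains nothing beyond the pairings listed on the slide.

```python
# Display suggestions copied from the slide; names here are illustrative only.
UNIVARIATE_DISPLAY = {
    "qualitative": "barplot",
    "ordinal": "column chart",
    "quantitative": "boxplot or histogram",
}

# Keys are (level of X, level of Y).
BIVARIATE_DISPLAY = {
    ("ordinal", "binary"): "connected percents",
    ("quantitative", "quantitative"): "scatterplot",
    ("categorical", "quantitative"): "box plots",
}


def suggest_display(x_level, y_level=None):
    """Return the slide's suggested display, or None if the slide gives none."""
    if y_level is None:
        return UNIVARIATE_DISPLAY.get(x_level)
    return BIVARIATE_DISPLAY.get((x_level, y_level))


print(suggest_display("ordinal"))
print(suggest_display("categorical", "quantitative"))
```

Combinations the slide does not mention return `None` rather than a guess.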
Source: Lewis S, Mason C, Srna J. Carbon monoxide exposure in blast furnace workers. Aust J Public Health Sep;16(3):
[Figure annotations:]
• Ordinal variable, but categories mixed
• Outcome is COHb%, but drawn on X
An alternative display...
[Figure: bubble plot of outcome variable (Y) against predictor variable (X). Area of circles proportional to n. Drawn using MS Excel ‘bubble plot’]
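Making the area of a circle, rather than its radius, proportional to n means the radius must grow with the square root of n. The sketch below shows that arithmetic; the function name `bubble_radius`, the `area_per_unit` parameter and the group sizes are assumptions made for illustration, not values from the slide.

```python
import math


def bubble_radius(n, area_per_unit=1.0):
    """Radius of a circle whose AREA equals n * area_per_unit."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return math.sqrt(n * area_per_unit / math.pi)


# Hypothetical group sizes, for illustration only.
group_sizes = {"small group": 10, "large group": 40}
radii = {label: bubble_radius(n) for label, n in group_sizes.items()}

# Four times the n gives four times the area, so only twice the radius.
ratio = radii["large group"] / radii["small group"]
print(ratio)
```

Scaling the radius directly by n would instead make the larger group's circle sixteen times the area, overstating the difference.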
Principles of visual perception
• WS Cleveland
  - much work in psychophysics of human visual understanding
Tells us:
• hierarchy of visual quantitative perception
• patterns and shade can cause vibration
• graphs can shrink with almost no loss of information
Source: Cleveland WS. The Elements of Graphing Data. Monterey: Wadsworth, 1985.
Ubiquitous column charts
Source: Jamrozik K, Spencer CA, et al. Does the Mediterranean paradox extend to abdominal aortic aneurism? Int J Epidemiol 2001; 30(5): 1071
A dotchart version…
[Figure drawn using S-plus “Trellis” graphics]
Moiré vibration is easy with a computer!!!
Moiré vibration
• Vibration is maximised with lines of equal separation
• This is common in scientific column charts
Cited in Tufte E. The Visual Display of Quantitative Information.
Minimise non-data ink
• Non-data ink includes tick marks, grid lines, background, legend
• Explanation of error bars, P-values can be included in caption or in text
[Figure: Relative mortality rate (all causes) for Greeks in Australia, Swedes in Sweden, Japanese in Japan, Anglo-Celts in Australia, Greeks in Greece]
Note the exception for X-Y orientation, because the predictor is qualitative (unordered).
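A dot chart with little non-data ink can be sketched even in plain text: one label, one axis stroke and one dot per category, with the unordered categories sorted by value since they have no natural order of their own. The function name `dot_chart` and the values below are hypothetical and are not the mortality figures from the slide.

```python
def dot_chart(values, width=40):
    """Return text lines of a dot chart, categories sorted largest first.

    Assumes all values are positive. Each line holds only a label,
    a single axis character and one dot: no grid, ticks or legend.
    """
    top = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        position = round(value / top * (width - 1))
        lines.append(f"{label:<{label_width}} |{' ' * position}o")
    return lines


# Made-up values for illustration only.
example = {"Group A": 0.6, "Group B": 1.0, "Group C": 0.8}
lines = dot_chart(example)
print("\n".join(lines))
```

Placing the categories on the vertical axis keeps the labels horizontal and readable, which is the orientation exception the slide notes for qualitative predictors.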
Software for scientific graphics
• Dedicated programs – thousands!
  - DeltaGraph (SPSS)
  - Prism
  - ViSta
• Business graphics
  - MS Excel
  - many other spreadsheet programs
• Graphics in statistical packages
  - S-Plus, R: powerful, difficult
  - SPSS interactive graphics: easy, expensive
  - Systat: good reputation
  - SAS GRAPH language: expensive, powerful
  - Stata: simple, limited
Advice: Avoid the “default” choice in all programs (almost always wrong). Avoid programs with “Chart Type” menus – wrong approach.
Graph formats
• Object-oriented
  - lines, shapes, etc can be identified within graph
  - each object has attributes (eg size, colour, font)
  - editable using selection and “grouping”
  - Common formats: Postscript (ps, eps), Windows metafile (wmf, emf)
• Bit-mapped
  - image exists as a collection of pixels
  - each pixel is light or dark, coloured
  - can edit only pixels, not objects
  - often “compressed” to save disk space, bandwidth
  - Common formats: graphics interchange (gif), Windows bitmap (bmp), JPEG interchange (jpg)
Advice: Use WMF format where possible. Paste WMF into PowerPoint, “ungroup”, then edit objects for publication quality.
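The difference between the two families of format can be illustrated with a toy model: an object-oriented graph is a list of shapes with attributes, while a bitmap is only the grid of pixels those shapes produce. The sketch below is an assumption-laden illustration, not a description of how PostScript, WMF or any real format stores data; `vector_graph` and `rasterise` are hypothetical names.

```python
# Object-oriented description: each shape is an object with editable attributes.
vector_graph = [
    {"type": "rect", "x": 1, "y": 1, "width": 3, "height": 2, "colour": "#"},
]


def rasterise(objects, width, height):
    """Turn the shape objects into a grid of 'pixels' (rows of characters)."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for obj in objects:
        if obj["type"] == "rect":
            for row in range(obj["y"], obj["y"] + obj["height"]):
                for col in range(obj["x"], obj["x"] + obj["width"]):
                    if 0 <= row < height and 0 <= col < width:
                        grid[row][col] = obj["colour"]
    return ["".join(row) for row in grid]


# Bit-mapped version: the rectangle no longer exists as an object, only as pixels.
bitmap = rasterise(vector_graph, 6, 4)

# Editing the object takes one attribute change; the bitmap must be redrawn.
vector_graph[0]["width"] = 4
wider = rasterise(vector_graph, 6, 4)

print("\n".join(bitmap))
print("\n".join(wider))
```

Once only `bitmap` is kept, widening the rectangle means repainting individual pixels, which is why object-oriented formats are easier to edit for publication.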
References, further reading
Tufte ER. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press
Cleveland WS. Visualizing Data. Summit, NJ: Hobart Press, 1993
Wainer H. Visual Revelations. Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers
Wilkinson L. The Grammar of Graphics. New York: Springer Verlag, 1999
Summary
• Howie’s Helpful Hints for bad graphs:
  - Don’t show the data
  - Show the data inaccurately
  - Obfuscate the data
• Steps for better graphs:
  - Identify direction of cause & effect
  - Exploit levels of measurement
  - Accommodate visual perception principles
  - Minimise non-data ink
• Don’t use Excel unless you have to
• And if you have to, don’t use the default chart!
Next …
Some principles for better scientific graphs …