Rudiments of Proper Display: I. Tables
“Getting information from a table is like extracting sunbeams from a cucumber” (Farquhar & Farquhar, 1891)
Howard Wainer
National Board of Medical Examiners
Why Tables?
1. Data usually start out as a table. Why not make it a good one?
2. Tables are easy to prepare - all you need is Excel.
3. For extracting specific details tables are superb.
4. Historically tables were used for data archiving.
5. Tables look more serious than graphs.
6. If one wants to present a number it is always important (and usually critical) to present it in context. Tables help to do this. For example:
“There are 57 doctors in our town.”
Is this a lot? “There are 57 doctors!”
Or too few? “There are only 57 doctors.”
A small table adds context and so emphasis:

Towns our size         # of MDs
Our town                     57
Town B                       18
Town C                        2
Average town our size        23
There are but three simple rules that almost guarantee the easing of comprehension problems that usually accompany tabular presentation. But first we need an orientating attitude: A Table is for communication not data storage. Modern data storage is accomplished electronically. Paper and print are meant for human eyes and human minds.
Some basic ideas that underlie all presentation.
1. Know what you are doing - All data displays are answers to questions. Before preparing a display, know what those questions are.
2. Don’t try to do too much - A display can have four purposes. Build the display to accomplish your primary goal. If it also satisfies a secondary one, fine. But that’s just gravy. It is a rare multipurpose display that is as good for a specific task as even a mediocre special purpose one.
3. Revise your display at least as often as you revise your prose - People will often look at your display more carefully than at your words, and they will certainly remember it better, hence it deserves at least as much of your attention. This usually means revising your display several times, as each revision tells you more about how you should have done it originally.
4. A picture may be worth 1000 words, but it may take 100 words to do it - Captions and legends are critical. Make sure they are correct, legible, and complete.
5. Less is more - Avoid chartjunk. If the material is not part of the story, exclude it.
What kinds of questions are these data meant to answer?
1. How many unexpected deaths occur?
2. Which countries had the highest accidental death rates? The lowest? Why?
3. What sorts of fatal accidents are the most frequent? The least frequent?
4. Are there any unusual data points? Which ones?
The three rules
1. Round - a lot. This is for three reasons:
a. Humans cannot comprehend more than two digits very easily.
b. We can only rarely justify more than two digits of accuracy statistically.
c. We almost never care about accuracy of more than two digits.
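Rule 1 can be sketched in a few lines of Python. This is only an illustration: the helper name round_sig and the sample values are hypothetical and do not come from the tables in this talk.

```python
import math

def round_sig(x, digits=2):
    """Round x to the given number of significant digits (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 3 for 1234.5, -2 for 0.01234
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - magnitude)

# Made-up values, chosen only to show the effect at different scales
values = [1234.5, 86.47, 0.01234]
rounded = [round_sig(v) for v in values]
print(rounded)
```

Rounding to significant digits, rather than to a fixed number of decimal places, keeps two meaningful digits whatever the scale of the entry.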
The three rules (continued)
2. Order the rows and columns in a way that makes sense. We are almost never interested in “Alabama First.” Two often useful ways to order are:
a. Size places - put the largest first. Often we look most carefully at what is on top and less carefully further down.
b. Naturally - time is ordered from the past to the future. Showing data in that order melds well with what the reader might expect; always a good idea.
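Both orderings in Rule 2 amount to choosing a sort key. A minimal sketch, using made-up labels and values that are not taken from this talk's data:

```python
# Hypothetical rows in alphabetical order: (label, value)
rows = [("Alabama", 12), ("Alaska", 45), ("Arizona", 30)]

# a. Size places: sort on the value, largest first
by_size = sorted(rows, key=lambda row: row[1], reverse=True)

# b. Natural order: sort time from the past to the future
yearly = {2003: 30, 2001: 10, 2002: 20}  # hypothetical year -> value
by_time = sorted(yearly.items())

for label, value in by_size:
    print(f"{label:<10}{value:>5}")
```

Sorting on the value instead of the label puts the largest entry at the top, where the reader looks first.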
The three rules (continued) 3. ALL is different and is important. Separate ALL from the rest spatially and perceptually: make it darker, bigger, and spaced away.
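One way to apply Rule 3 in a plain-text table is sketched below, using the doctors-per-town figures from the earlier slide. Plain text cannot be made darker or bigger, so this sketch assumes that a blank line and capital letters are an acceptable stand-in; the function name format_table is hypothetical.

```python
def format_table(header, rows, summary):
    """Return a plain-text table with the summary row set apart (a sketch)."""
    summary_label = summary[0].upper()  # capitals stand in for "darker, bigger"
    labels = [header[0], summary_label] + [label for label, _ in rows]
    width = max(len(label) for label in labels)
    lines = [f"{header[0]:<{width}}  {header[1]:>8}"]
    lines.append("-" * (width + 10))
    for label, value in rows:
        lines.append(f"{label:<{width}}  {value:>8}")
    lines.append("")  # blank line spaces the summary away from the body
    lines.append(f"{summary_label:<{width}}  {summary[1]:>8}")
    return "\n".join(lines)

table_text = format_table(
    ("Towns our size", "# of MDs"),
    [("Our town", 57), ("Town B", 18), ("Town C", 2)],
    ("Average town our size", 23),
)
print(table_text)
```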
Let’s start with rounding. Some might complain that profound rounding loses valuable information. Not likely.
Now that we have reordered the rows of the table by the overall death rate we note immediately that: France and Austria have the highest rates; The United Kingdom and Japan the lowest. Why?
The last table makes clear the most fundamental rule of data display. Think hard about the data being displayed to understand what the point is, then think hard about how best to make it. Data do not speak for themselves. They can only tell their tale through you, and that only happens when you know what you are doing and why you are doing it.
Once the row and column effects have been removed the residuals become obvious. We can just pick out the large ones to emphasize in a reformatted table. This is how I knew which entries in the accidental death table to put a box around.
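The slide does not say how the row and column effects were removed. One common choice is median polish, a resistant two-way fit; the sketch below assumes that method. The small table and the cutoff of 5 are made up for illustration and are not the accidental death data.

```python
from statistics import median

def median_polish(table, n_iter=10):
    """Split table[i][j] into overall + row[i] + col[j] + residual[i][j]."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep each row's median out of the residuals
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Move the median of the column effects into the overall term
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep each column's median out of the residuals
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        # Move the median of the row effects into the overall term
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid

# Hypothetical table: additive in rows and columns except the last cell
table = [[1, 2, 3],
         [2, 3, 4],
         [3, 4, 15]]
overall, row_eff, col_eff, resid = median_polish(table)

# Cells whose residual exceeds an arbitrary cutoff are candidates for a box
cutoff = 5
flagged = [(i, j) for i in range(3) for j in range(3) if abs(resid[i][j]) > cutoff]
print(flagged)
```

Medians are used here rather than means so that one unusual entry does not distort the row and column effects it is being compared against.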
These tabular data can be displayed as a graph. Exactly how to do this has a number of simple steps. I’ll get to this next.
Next let us look at a real-life example that has to do with satisfying laws about racial integration in Princeton, NJ elementary schools.
Yet another example of great interest to us all: Understanding Canadian unemployment data
Let us begin with a sample from the whole data set and then we’ll return to the whole thing. The only reason for sampling is to ease visibility problems.