How do we (MIS)interpret graphs? Nataly Beribisky
Interpreting graphs Commonly used graphs such as histograms and boxplots are often a cleaner and more accessible way to present data. However, although graphs may help us in interpreting data, we may sometimes misinterpret them.
Tversky’s graph design principles (1) the use of spatial metaphors Natural use of space Area of a graphical element should correspond to its property like proportion or frequency (2) the use of directionality. We focus more on vertical dimensions than horizontal ones ”the top is naturally viewed as more or better than the bottom”
On the misinterpretation of histograms and box plots Lem, onghena, Lieven, verschaffel & Van dooren (2012) N = 125 university students in an educational science course. Participants were asked to respond to statements made about four different items.
iTEM 1: comparing SAMPLE SIZE
TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD. both CHILDREN THEN PRESENTED THEIR DATA WITH THE REPRESENTATION BELOW A B 2 4 6 8 10 12 14 16 18 2 4 6 8 10 12 14 16 18 ITEM 1 - B asked more children about their pets than A: TRUE FALSE INDETERMINABLE I DON’T KNOW
TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD. both CHILDREN THEN PRESENTED THEIR DATA WITH THE REPRESENTATION BELOW 987654321 987654321 A B 2 4 6 8 10 12 14 16 18 2 4 6 8 10 12 14 16 18 ITEM 1 - B asked more children about their pets than A: TRUE FALSE INDETERMINABLE I DON’T KNOW
TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD. both CHILDREN THEN PRESENTED THEIR DATA WITH THE REPRESENTATION BELOW ITEM 1 - B asked more children about their pets than A: TRUE FALSE INDETERMINABLE I DON’T KNOW
IN WHICH CASE DID Participants HAVE THE MOST TROUBLE?
Results to item 1 “According to design principle on the use of space in graphs, people will try to interpret space between elements in a graph, but also area, in a natural way. In the case of the box plot, this means that students can be expected to interpret the area of the box as representing frequency or proportion of observations. As it is in fact not possible to derive from the box plot the number of observations on which it is based, this is likely to lead to an incorrect conclusion.” Histogram (%) Box plot (%) Descriptives (%) A – True 61 37 95 B – False 30 C – Indeterminable 2 26 5 D – I don’t know 7
iTEM 2: comparing MEANS
TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD. both CHILDREN THEN PRESENTED THEIR DATA WITH THE REPRESENTATION BELOW A B 2 4 6 8 10 12 14 16 18 2 4 6 8 10 12 14 16 18 ITEM 2 – The mean number of pets is the same in both figures: TRUE FALSE INDETERMINABLE I DON’T KNOW
TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD. both CHILDREN THEN PRESENTED THEIR DATA WITH THE REPRESENTATION BELOW 987654321 987654321 A B 2 4 6 8 10 12 14 16 18 2 4 6 8 10 12 14 16 18 ITEM 1 - The mean number of pets is the same in both figures: TRUE FALSE INDETERMINABLE I DON’T KNOW
TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD TWO CHILDREN ASKED THEIR FRIENDS HOW MANY PETS THEY HAD. both CHILDREN THEN PRESENTED THEIR DATA WITH THE REPRESENTATION BELOW ITEM 1 - The mean number of pets is the same in both figures: TRUE FALSE INDETERMINABLE I DON’T KNOW
IN WHICH CASE DID Participants HAVE THE MOST TROUBLE?
Results to item 2 Possible misconceptions: (coming from directionality principle) For histograms: would think that figure with higher bars would make participants believe that means was higher. For boxplots: expected that students might interpret the higher box as representing a higher mean Results: Students comparing descriptive statistics all got the correct answer. When students answered the histogram response incorrectly, they often stated that the mean was not the same. Boxplot confusion occurred much less frequently
other misinterpretations Evidence that participants may not view the whiskers of the boxplots as containing data. Illusion that we know specific values from graphs and from descriptive statistics When asked about a specific value, participants stated that it was possible to tell that it was in the dataset from reading boxplots, histograms, and descriptive statistics.
Conclusions In these instances, descriptive statistics were easier to interpret than graphs. When using graphs to communicate, we should try to implement good design principles, so that interpretations are as clear as possible.