Techniques for Decision-Making: Data Visualization
Sam Affolter
Basics of Visualization
Analytical Navigation
Analytical Interaction
Techniques & Practices
Analytical Patterns
We attend to whatever contrasts with the norm.
Visualizations should display patterns that are easy to spot.
Memory plays an important role in cognition, but its capacity is very limited.
Pre-attentive Visual Attributes
Stephen Few, Now You See It, pages
Working in groups, help one another to:
– Find two datasets for your project.
– Open Tableau and import the datasets.
– Join the two datasets as appropriate.
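Tableau performs the join for you, but the logic behind the join step can be sketched in a few lines of Python. This is a minimal illustration, assuming two hypothetical tables that share a `store_id` column; the data and the helper names `inner_join` and `left_join` are illustrative and are not Tableau features. The join type decides which rows survive: an inner join keeps only rows with a match in both tables, while a left join keeps every row from the first table.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical example tables; the column names are illustrative assumptions.
sales: List[Row] = [
    {"store_id": 1, "revenue": 1200.0},
    {"store_id": 2, "revenue": 800.0},
    {"store_id": 3, "revenue": 450.0},
]
stores: List[Row] = [
    {"store_id": 1, "region": "East"},
    {"store_id": 2, "region": "West"},
]


def _index(rows: List[Row], key: str) -> Dict[Any, List[Row]]:
    """Group rows by their value in `key` so each lookup is a dict access."""
    index: Dict[Any, List[Row]] = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep only left rows that have at least one match in right."""
    index = _index(right, key)
    # Merge each matching pair; right-hand values win on column-name clashes.
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep every left row; unmatched rows get None for the right-hand columns."""
    index = _index(right, key)
    right_columns = sorted({col for row in right for col in row if col != key})
    joined: List[Row] = []
    for l_row in left:
        matches = index.get(l_row[key])
        if matches:
            joined.extend({**l_row, **r_row} for r_row in matches)
        else:
            joined.append({**l_row, **{col: None for col in right_columns}})
    return joined


print(inner_join(sales, stores, "store_id"))
print(left_join(sales, stores, "store_id"))
```

Store 3 has no matching region, so the inner join drops it while the left join keeps it with `region` set to `None`.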
Directed analysis begins with a specific question that we hope to answer, searches for an answer to that question, and then produces an answer. In directed navigation, data visualization is typically not used until we attempt to communicate the answer.
Exploratory analysis begins by looking at the data without knowing what we’ll find. When we find something that seems interesting, we ask a question about it and then proceed in the directed fashion to find an answer. Data visualization, particularly in powerful data visualization tools such as Tableau, is often the best way to work through exploratory navigation.
“Overview first, zoom and filter, then details-on-demand.” - Ben Shneiderman
An overview reduces search and allows patterns to be detected. Zooming and filtering cut away the inconsequential and focus on the relevant data space. Having detailed data at one’s fingertips is essential to developing a full understanding of what is happening.
Hierarchical navigation is a useful method of navigation: it starts from high-level views of the data set and then progresses to ever finer grains. Node-link diagrams (the tree diagrams from the diagramming class) can be used to visualize the structure of the hierarchy, and tree maps are often used to give the high-level view.
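Hierarchical navigation as described above can be sketched as repeated roll-ups: totals at the top level, then totals for the children of whichever node the analyst selects. This is a minimal sketch assuming a hypothetical three-level hierarchy (region, country, city) with made-up revenue figures; `roll_up` and `drill_down` are illustrative names. The node totals are the kind of quantities a tree map could use to size its rectangles.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical rows: (region, country, city, revenue). Illustrative data only.
ROWS: List[Tuple[str, str, str, float]] = [
    ("Americas", "US", "Boston", 120.0),
    ("Americas", "US", "Denver", 80.0),
    ("Americas", "Canada", "Toronto", 60.0),
    ("Europe", "France", "Paris", 90.0),
    ("Europe", "France", "Lyon", 30.0),
]


def roll_up(rows, depth: int) -> Dict[Tuple[str, ...], float]:
    """Sum revenue over the first `depth` levels of the hierarchy."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for *path, revenue in rows:
        totals[tuple(path[:depth])] += revenue
    return dict(totals)


def drill_down(rows, node: Sequence[str]) -> Dict[Tuple[str, ...], float]:
    """Totals for the children of `node`, one level finer than the node itself."""
    subset = [row for row in rows if tuple(row[: len(node)]) == tuple(node)]
    return roll_up(subset, len(node) + 1)


print(roll_up(ROWS, 1))                      # high-level view: regions
print(drill_down(ROWS, ["Americas"]))        # countries within Americas
print(drill_down(ROWS, ["Americas", "US"]))  # cities within the US
```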
Using your new datasets in Tableau, create a tree map. Spend some time determining which quantitative measures are appropriate for high-level study, then get into your groups and discuss.
Comparison is “the beating heart of analysis” - Stephen Few
Specifically, we are looking for similarities or differences within data. We can compare magnitudes, or patterns within larger data sets.
Some problems inherent in visual comparison:
– Hidden data
– Obscured patterns
Nominal: comparing values that have no particular order
Ranking: comparing values that are arranged by magnitude
Part-to-whole: comparing values that make up parts of a whole
Deviation: comparing the differences between two sets of values
Time-series: comparing measures recorded at different points in time to see how they change
Sorting can quickly point to meaningful relationships in the underlying data. Sorting can be done on the quantitative data (to determine which category is the largest or smallest driver). It can also be done on the categories, which lets the user easily find specific categorical elements.
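The two kinds of sorting described above differ only in the sort key. A minimal sketch, assuming a hypothetical dictionary of revenue totals per product; `sort_by_measure` and `sort_by_category` are illustrative names.

```python
from typing import Dict, List, Tuple

# Hypothetical revenue totals per product; illustrative data only.
revenue_by_product: Dict[str, float] = {
    "Widgets": 420.0,
    "Gadgets": 910.0,
    "Doohickeys": 130.0,
    "Sprockets": 560.0,
}


def sort_by_measure(totals: Dict[str, float], descending: bool = True) -> List[Tuple[str, float]]:
    """Order categories by their quantitative value (largest driver first by default)."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=descending)


def sort_by_category(totals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order categories alphabetically (case-insensitive) so a specific one is easy to find."""
    return sorted(totals.items(), key=lambda item: item[0].lower())


print(sort_by_measure(revenue_by_product))
print(sort_by_category(revenue_by_product))
```

Sorting by measure answers "what drives the total?"; sorting by category answers "where is the item I am looking for?"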
When exploring a data set, adding variables allows us to segment the data, which in turn can pinpoint interesting anomalies.
Viewing Revenue by Country may show us that the US generates the most revenue for our company. However, by pulling in Region, we find that one of our US Regions is the smallest revenue generator in the entire company.
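The Revenue-by-Country example above amounts to grouping by one more categorical field. A minimal sketch, assuming hypothetical rows constructed to mirror that example (they are not real figures) and a hypothetical helper `total_by`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical revenue rows built to mirror the example; not real data.
rows: List[Dict] = [
    {"country": "US", "region": "Northeast", "revenue": 500.0},
    {"country": "US", "region": "South", "revenue": 450.0},
    {"country": "US", "region": "Mountain", "revenue": 40.0},
    {"country": "Canada", "region": "East", "revenue": 300.0},
    {"country": "Canada", "region": "West", "revenue": 250.0},
    {"country": "Mexico", "region": "Central", "revenue": 200.0},
]


def total_by(rows: List[Dict], *fields: str) -> Dict[Tuple, float]:
    """Sum revenue for every combination of the given categorical fields."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[field] for field in fields)] += row["revenue"]
    return dict(totals)


by_country = total_by(rows, "country")
by_region = total_by(rows, "country", "region")

print(max(by_country, key=by_country.get))  # largest country overall
print(min(by_region, key=by_region.get))    # smallest segment once region is added
```

The country view says the US leads; adding the region field shows that the smallest segment in the whole data set is also in the US.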
In contrast to adding variables, filtering allows the user to remove superfluous data; it is used as one proceeds deeper into the data set. Filtering typically removes extra data elements, but it can also be used to eliminate outliers to better understand the underlying trends.
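Filtering out outliers requires a rule for what counts as an outlier, and the text above does not name one. The sketch below swaps in Tukey's fences (points more than k times the interquartile range beyond the quartiles) as an assumption; the data and the helper name `drop_outliers` are hypothetical.

```python
import statistics
from typing import List

# Hypothetical daily order counts with one obvious outlier; illustrative only.
orders: List[float] = [102, 98, 105, 99, 101, 97, 500, 103]


def drop_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Remove points outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR].

    The fence rule and k = 1.5 are assumptions chosen for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the original order so a time series stays in sequence.
    return [v for v in values if low <= v <= high]


print(drop_outliers(orders))
```

With the outlier removed, the remaining values sit in a narrow band, and any trend among them is no longer flattened by a single extreme point on the axis.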
Looking at the dataset to the left, at first glance one might see positive trends across the board. Removing some of the lines quickly shows that Item G’s revenue stream has been dropping steadily. Furthermore, Item F made a “stair step” increase in revenue at approximately the same time. Are these related?
Most data analysis is done on data sets whose elements are grouped. Grouping helps reduce volatility in thin data; binning sales data by price bucket is an example of this. At a minimum we aggregate by time, but aggregating across other categorical elements can be extremely powerful. All hierarchical data can be considered pre-aggregated.
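Binning by price bucket, mentioned above, can be sketched as mapping each sale to a fixed-width price range and aggregating within it. A minimal sketch, assuming hypothetical (unit price, quantity) sales, a bucket width of $10, and hypothetical helpers `price_bucket` and `bin_sales`.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical individual sales as (unit price, quantity); illustrative only.
sales: List[Tuple[float, int]] = [
    (4.99, 3), (7.50, 1), (12.00, 2), (15.25, 4), (19.99, 1), (23.00, 2),
]


def price_bucket(price: float, width: float) -> Tuple[float, float]:
    """Return the [low, high) price range of fixed `width` that contains `price`."""
    low = math.floor(price / width) * width
    return (low, low + width)


def bin_sales(sales, width: float = 10.0) -> Dict[Tuple[float, float], Dict[str, float]]:
    """Aggregate thin transaction data into price buckets: units sold and revenue."""
    bins: Dict[Tuple[float, float], Dict[str, float]] = defaultdict(
        lambda: {"units": 0, "revenue": 0.0}
    )
    for price, quantity in sales:
        bucket = bins[price_bucket(price, width)]
        bucket["units"] += quantity
        bucket["revenue"] += price * quantity
    return dict(sorted(bins.items()))  # buckets in ascending price order


for (low, high), totals in bin_sales(sales).items():
    print(f"${low:.0f}-${high:.0f}: {totals['units']} units, ${totals['revenue']:.2f}")
```

Six scattered transactions become three buckets, which is the volatility-reducing effect of grouping described above; the bucket width is the main design choice and trades detail for stability.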
Re-expression is one of the most commonly used analytical techniques and can substantially increase your data-ink ratio. Examples of re-expression include changing numbers to percentages, looking at deltas between data sets, and building rates.
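The three forms of re-expression listed above (percentages, deltas, rates) are each a one-line transformation. A minimal sketch, assuming hypothetical revenue and customer counts for two regions; the helper names are illustrative.

```python
from typing import Dict

# Hypothetical figures for two regions; illustrative only.
this_year: Dict[str, float] = {"East": 300.0, "West": 200.0}
last_year: Dict[str, float] = {"East": 250.0, "West": 250.0}
customers: Dict[str, int] = {"East": 60, "West": 25}


def percent_of_total(values: Dict[str, float]) -> Dict[str, float]:
    """Re-express raw numbers as each category's share of the total, in percent."""
    total = sum(values.values())
    return {k: 100.0 * v / total for k, v in values.items()}


def deltas(current: Dict[str, float], previous: Dict[str, float]) -> Dict[str, float]:
    """Re-express two data sets as the change between them."""
    return {k: current[k] - previous[k] for k in current}


def rates(numerator: Dict[str, float], denominator: Dict[str, int]) -> Dict[str, float]:
    """Re-express totals as a rate, e.g. revenue per customer."""
    return {k: numerator[k] / denominator[k] for k in numerator}


print(percent_of_total(this_year))
print(deltas(this_year, last_year))
print(rates(this_year, customers))
```

In this made-up data, West has the smaller share and is shrinking, yet it earns more per customer; each re-expression brings out a different aspect of the same raw numbers.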
Re-expression Example
From the top-level view you built in your tree map, use the analytical interaction techniques described above to begin diving into your data. Does comparing magnitudes bring questions to light? Does sorting? Adding variables? Are there categorical or quantitative elements that you should be aggregating or re-expressing?
When using a bar graph, begin the scale at zero and end it above the highest value. With every other type of graph, begin the scale a little below the lowest value and end it a little above the highest value. Begin and end the scale at round numbers, and make the intervals round numbers as well.
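The scaling rules above can be sketched as a small "nice numbers" routine: pick a round interval (1, 2, or 5 times a power of ten), then snap the ends of the scale outward to multiples of it. This is a sketch under stated assumptions: the target of about five intervals and the helper names `nice_step` and `axis_range` are hypothetical, and values are assumed to be in millions of dollars.

```python
import math
from typing import List, Tuple


def nice_step(span: float, target_ticks: int = 5) -> float:
    """Pick a round interval (1, 2 or 5 times a power of ten) giving about `target_ticks` intervals."""
    raw = span / target_ticks
    magnitude = 10.0 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5):
        if raw <= multiple * magnitude:
            return multiple * magnitude
    return 10 * magnitude


def axis_range(values: List[float], bar_chart: bool) -> Tuple[float, float, float]:
    """Return (start, end, step) for a value axis following the scaling rules in the text."""
    lo, hi = min(values), max(values)
    if bar_chart:
        # Bar lengths encode magnitude, so the scale must include zero.
        lo, hi = min(lo, 0.0), max(hi, 0.0)
    step = nice_step((hi - lo) or abs(hi) or 1.0)
    start = math.floor(lo / step) * step
    end = math.ceil(hi / step) * step
    if not bar_chart and math.isclose(start, lo):
        start -= step  # begin a little below the lowest value
    if math.isclose(end, hi):
        end += step  # end above the highest value
    # Round away floating-point noise such as 5.2000000000000002.
    return round(start, 10), round(end, 10), round(step, 10)


print(axis_range([5.2, 5.6, 5.9], bar_chart=True))     # bar graph: starts at zero
print(axis_range([5.23, 5.61, 5.87], bar_chart=False))  # line graph: hugs the data
```

The same data in the $5 to $6 million range gets a 0 to 6 scale as a bar graph but roughly a 5.2 to 6.0 scale as a line graph, both with round endpoints and round intervals.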
By scaling its axis between $5 million and $6 million, the chart to the left presents what Few terms a “visual lie”: compared with the chart below, the viewer perceives the variance as much larger than it really is. This is a common problem in programs such as MS Excel, which set the axis range automatically.
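The size of this distortion is simple arithmetic: a bar’s drawn length is its value minus the axis minimum. This sketch assumes two hypothetical revenues of $5.2 million and $5.8 million and a hypothetical helper `apparent_ratio`; with the axis starting at $5 million, a difference of about 12% is drawn as a fourfold difference.

```python
def apparent_ratio(small: float, large: float, axis_min: float = 0.0) -> float:
    """Ratio of the drawn bar lengths when the value axis starts at `axis_min`."""
    return (large - axis_min) / (small - axis_min)


# Hypothetical revenues of $5.2M and $5.8M.
small, large = 5_200_000, 5_800_000
honest = apparent_ratio(small, large)                # axis starts at zero
truncated = apparent_ratio(small, large, 5_000_000)  # axis starts at $5M
print(f"actual ratio {honest:.3f}, drawn ratio {truncated:.1f}")
```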
In contrast with bar charts, other charts are meant to compare data relative to itself; line charts are a perfect example. The line chart to the left begins at 0, so we are unable to see the trend, whereas the chart below displays the trend well. Switching from a bar chart to a line or other chart in Excel can cause problems with the axes: if we forced the low value to 0 to suit the bar chart, we may lose visual understanding when we switch.
Reference lines are used to compare the data set with another metric, and they help make outliers obvious. Reference lines can be derived from the data themselves (averages, standard deviations, etc.) or driven externally (year-end growth goals, call-time goals, etc.).
Consider the chart above of monthly changes in revenue. One thing of possible interest is understanding when we are above or below average. We may also want additional warnings if the changes become too great; in this case, I added lower and upper control limits at 1.5 times the standard deviation.
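The average line and control limits described above reduce to a mean and a standard deviation. This sketch assumes the limits sit 1.5 sample standard deviations on either side of the average (the text does not say whether a sample or population standard deviation was used) and uses made-up monthly changes; `reference_lines` and `classify` are hypothetical helper names. An externally driven reference line, such as a growth goal, would replace the computed average with a fixed target.

```python
import statistics
from typing import List, Tuple

# Hypothetical month-over-month revenue changes; illustrative only.
changes: List[float] = [2.0, -1.0, 3.0, 1.0, -2.0, 12.0, 0.0, 1.0]


def reference_lines(values: List[float], k: float = 1.5) -> Tuple[float, float, float]:
    """Return (average, lower limit, upper limit); limits are average -/+ k standard deviations."""
    average = statistics.mean(values)
    spread = statistics.stdev(values)  # sample standard deviation (an assumption)
    return average, average - k * spread, average + k * spread


def classify(values: List[float], k: float = 1.5) -> List[str]:
    """Label each point relative to the average and the control limits."""
    average, lower, upper = reference_lines(values, k)
    labels = []
    for v in values:
        if v > upper:
            labels.append("above upper limit")
        elif v < lower:
            labels.append("below lower limit")
        elif v > average:
            labels.append("above average")
        elif v < average:
            labels.append("below average")
        else:
            labels.append("at average")
    return labels


print(reference_lines(changes))
print(classify(changes))
```

Only the sixth month breaches a control limit here; every other point is simply labeled relative to the average line.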
Trellis displays should have the following characteristics:
– Graphs differ only in the data displayed. Each graph displays a subset of a single larger set, divided according to some categorical variable.
– Every graph is the same type, shape, and size, and shares the same categorical and quantitative scales.
– Graphs can be arranged horizontally, vertically, or both.
– Graphs are sequenced in a meaningful order, usually based on the values that are featured.
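Most of the trellis rules above are about keeping the panels comparable: one subset per category, identical scales, and a meaningful order. A minimal sketch of that preparation step, assuming hypothetical rows, a hypothetical `trellis_panels` helper, and that "meaningful order" means largest total first; drawing the panels is left to the charting tool.

```python
from collections import defaultdict
from typing import Dict, List


def trellis_panels(rows: List[Dict], by: str, measure: str) -> List[Dict]:
    """Split rows into one panel per category, with a shared scale and a meaningful order."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[measure])  # keeps the original (e.g. time) order
    all_values = [v for values in groups.values() for v in values]
    y_min, y_max = min(all_values), max(all_values)  # one quantitative scale for every panel
    # Sequence panels by the featured value: here, the total of the measure, largest first.
    ordered = sorted(groups.items(), key=lambda item: sum(item[1]), reverse=True)
    return [
        {"panel": name, "values": values, "y_min": y_min, "y_max": y_max}
        for name, values in ordered
    ]


# Hypothetical monthly revenue rows; illustrative only.
rows = [
    {"region": "East", "revenue": 10.0}, {"region": "East", "revenue": 14.0},
    {"region": "West", "revenue": 20.0}, {"region": "West", "revenue": 22.0},
    {"region": "North", "revenue": 5.0}, {"region": "North", "revenue": 7.0},
]
for panel in trellis_panels(rows, by="region", measure="revenue"):
    print(panel)
```

Sharing `y_min` and `y_max` across panels is what keeps a steep-looking line in one panel from being misread against a neighbor drawn on a different scale.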
Trellis Chart Example
Exercise 4: Once again using your current datasets in Tableau, build a trellis chart to analyze additional variables. Get into your groups to discuss your findings.