In 2015, this was mercifully replaced by a guest lecture from UBC colleague Tamara Munzner. The content below would be better with links to specific figures from Naomi Robbins’ book, but the Shiny app does not make that possible. Or perhaps I should just embed my versions of them.
According to Naomi Robbins, effective graphs “improve understanding of data”. They do not confuse or mislead.
To paraphrase: Most of us use a computer to write but we would never characterize a Nobel Prize-winning writer as being highly skilled with Microsoft Word. Similarly, advanced ggplot2 skills won’t necessarily lead to effective communication of numerical data. You have to master the principles of effective graphs in addition to the mechanics.
One graph is more effective than another if its quantitative information can be decoded more quickly or more easily by most observers.
When I’m lost in data and struggling to make a figure, I repeat this mantra distilled from Gelman, et al:
CMEG = Naomi Robbins’ book “Creating More Effective Graphs”; visual catalog of figures via the R Graph Catalog
The pie chart is the most loathed graph of all and yet surprisingly common. Give your average person a bunch of numbers that add up to one and they want to make a pie chart. Why? My hypothesis is it goes back to all the pies and pizzas referenced when kids learn to work with fractions.
Why do the pros hate pie charts? They are awful because they encode quantitative information in angles and areas, which are very hard for humans to judge. Skeptical? Read on.
Examples from CMEG and the R Graph Catalog:
We are best able to make comparisons via position of objects along a common scale, which is why these simple dot plots are so much more effective than the pie charts.
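To make the contrast concrete, here is a minimal ggplot2 sketch that draws the same numbers both ways. The `shares` data frame and its values are made up for illustration; they are not taken from CMEG or the R Graph Catalog.

```r
library(ggplot2)

# Hypothetical shares for five categories; illustrative numbers only.
shares <- data.frame(
  category = c("A", "B", "C", "D", "E"),
  share    = c(0.23, 0.21, 0.20, 0.19, 0.17)
)

# Order the categories by value so the ranking reads straight off the plot.
shares$category <- reorder(shares$category, shares$share)

# Pie chart: values are encoded as angles and areas.
pie_plot <- ggplot(shares, aes(x = "", y = share, fill = category)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")

# Dot plot: values are encoded as positions along one common scale.
dot_plot <- ggplot(shares, aes(x = share, y = category)) +
  geom_point(size = 3) +
  labs(x = "Share of total", y = NULL)

print(dot_plot)
```

With values this close together, the slices of `pie_plot` are hard to rank by eye, while the order of the points in `dot_plot` is easy to read.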
Tufte, as quoted by Robbins: “the only worse design than a pie chart is several of them.”
The average person, if told they should not make a pie chart, might then take that bunch of numbers for different categories and make a stacked bar chart, especially if they have a series of such numbers. But this is also a very difficult graph to decode.
Stacked bar charts are difficult to decode because we need a common baseline to judge changes in length. So the trend for the category on the “ground floor” is easy to see but trends for those stuck in the middle are hard to see.
Grouped bar charts also make it hard to see trends.
Grouped bar charts are difficult because it’s hard to make comparisons between things that aren’t adjacent or at least very near each other.
When your text (especially the caption!) and the figure contradict each other, it undermines the reader’s trust in everything you present. You can dramatically reduce your ability to shoot yourself in the foot this way by using an integrated reporting approach, such as R Markdown. If figures are made from live R code in chunks and numbers are inserted via live inline R code, the two cannot diverge.
Barring that, my advice is to proofread like a maniac.
We will look through this section (slides 1 - 36) of Karl Broman’s excellent talk How to Display Data Badly (see References for links).
This animation created by Darkhorse Analytics illustrates how communication can be greatly enhanced by eliminating clutter and de-emphasizing supporting elements. Every aspect of a figure should be there on a “need to have it” basis.
In CMEG, Figs 6.2 vs 6.3 make much the same point, i.e. stripping the figure way down is a huge improvement. Figs 5.4 and 5.5 are both decent graphs but using dots (Fig 5.5) instead of bars (Fig 5.4) improves the data:ink ratio.
If you’re going to talk about the difference between this and that, then please go ahead and plot the difference between this and that! Sure, it might be nice to plot this and that, on their own, but don’t stop there. You’ve got a computer. And software. Use them to do annoying arithmetic for your reader.
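As a small sketch of letting the software do the arithmetic: the data frame `d` below holds hypothetical before/after measurements (made-up numbers), and the column name `difference` is just an illustrative choice.

```r
library(ggplot2)

# Hypothetical before/after measurements for six subjects (made-up numbers).
d <- data.frame(
  subject = factor(1:6),
  before  = c(12.1, 15.3, 9.8, 14.0, 11.2, 13.5),
  after   = c(12.9, 15.1, 11.0, 14.8, 11.9, 13.4)
)

# Do the subtraction for the reader.
d$difference <- d$after - d$before

# Plot the differences themselves, with a dashed reference line at zero.
diff_plot <- ggplot(d, aes(x = difference, y = subject)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_point(size = 3) +
  labs(x = "After minus before", y = "Subject")

print(diff_plot)
```

The reader can now judge the sign and size of each change from its position relative to zero, instead of mentally subtracting two plotted values.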
We are best able to make comparisons if items are positioned along a common scale. Design your graphs to take advantage of this.
We have a harder time with area, volume, length of non-adjacent things, length without a common baseline, angle, color, and shape.
We can see differences in angles when they’re around 45 degrees. But as they get steeper, our ability to compare goes down quickly. You control the angles of line segments in your graphs by controlling the aspect ratio. Pick the ratio so that the “average line segment” is around 45 degrees, a.k.a. banking to 45.
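One simple way to approximate banking to 45 is to pick the aspect ratio that gives the median line segment a drawn slope of 1. The sketch below assumes that median-absolute-slope heuristic, which is only one of several banking variants and not necessarily the one Robbins has in mind. The series in `d` is made up.

```r
library(ggplot2)

# Hypothetical series: a steadily rising, wiggly curve (made-up data).
d <- data.frame(x = 0:50)
d$y <- 20 * d$x + 200 * sin(d$x / 3)

# Slope of each line segment, in data units.
slopes <- diff(d$y) / diff(d$x)

# coord_fixed(ratio = r) draws one y unit r times as long as one x unit,
# so a segment with data slope s is drawn with slope s * r. Choosing
# r = 1 / median(|s|) gives the median segment a drawn slope of 1 (45 degrees).
bank_ratio <- 1 / median(abs(slopes))

banked_plot <- ggplot(d, aes(x = x, y = y)) +
  geom_line() +
  coord_fixed(ratio = bank_ratio)

print(banked_plot)
```

Because `coord_fixed()` ties the panel shape to the data ranges, the banked plot may come out wide and short; that is the point of the exercise, but you can still adjust by eye.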
There is no global rule about whether axis limits must be chosen to include zero. It depends.
Robbins proposes you always include it in bar charts, but use your judgement with, e.g., line charts or dot plots.
Figs 7.3, 7.4, and 7.5 explore the inclusion of zero.
Logarithmically transformed scales are useful when
Figs 7.7 and 7.8 show a skewed dataset before and after log transformation. We are also used to logging the gdpPercap variable in the Gapminder data, for the same reasons.
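A minimal sketch of the same idea in ggplot2, using simulated right-skewed values as a stand-in for something like gdpPercap. The variable names `income` and `outcome` and all the numbers are hypothetical; this is not the Gapminder data.

```r
library(ggplot2)

# Simulated, right-skewed values standing in for a variable like gdpPercap.
set.seed(1)
income <- rlnorm(200, meanlog = 8, sdlog = 1.2)
d <- data.frame(
  income  = income,
  outcome = 40 + 5 * log10(income) + rnorm(200)
)

# On the raw scale, most points pile up near the left edge.
raw_plot <- ggplot(d, aes(x = income, y = outcome)) +
  geom_point()

# Same data and same axis labels in original units, but a log10-spaced x-axis.
log_plot <- raw_plot + scale_x_log10()

print(log_plot)
```

Using `scale_x_log10()` rather than plotting `log10(income)` directly keeps the tick labels in the original units, which is usually easier on the reader.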
How about presenting two scales for the same axis?
Consider two quantitative variables, where the x-axis is time or something similar. There are many legitimate ways to present such data. In ggplot2 jargon, there are many relevant geoms.
Beware connecting the dots when the x axis represents an unordered categorical variable.
Consider two quantitative variables, plus a third categorical variable. How to encode the factor?
If superposing, you have shape, filled-ness, and color at your disposal.
It is often better to avoid superposition and, instead, to put the groups into different facets.
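Both options are easy to try in ggplot2. The sketch below uses simulated data with a hypothetical three-level factor `group`; the variable names and values are illustrative only.

```r
library(ggplot2)

# Simulated data: two quantitative variables plus a three-level factor.
set.seed(2)
d <- data.frame(
  group = rep(c("control", "low dose", "high dose"), each = 30),
  x     = runif(90, 0, 10)
)
d$y <- 2 + 0.5 * d$x + rnorm(90)

# Superposition: the factor is encoded with color and shape in one panel.
superposed <- ggplot(d, aes(x = x, y = y, color = group, shape = group)) +
  geom_point()

# Facets: one panel per group, with shared axes so panels stay comparable.
faceted <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ group)

print(faceted)
```

By default `facet_wrap()` keeps the x and y scales the same in every panel, so positions can still be compared across groups.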
We will look through another section (slides 48 - 62) of Karl Broman’s excellent talk How to Display Data Badly (see References for links).
“Creating More Effective Graphs” by Naomi Robbins
The R Graph Catalog presents the figures from “Creating More Effective Graphs” as a visual quilt. Click on a figure to see the ggplot2 code that makes it.
Karl Broman’s talk “How to display data badly”
ggplot2, written by Hadley Wickham
Winston Chang’s book “R Graphics Cookbook” and the Graphs section of his Cookbook for R website
“Let’s Practice What We Preach: Turning Tables into Graphs” by Gelman A, Pasarica C, Dodhia R. The American Statistician, Volume 56, Number 2, 1 May 2002, pp. 121-130. Via JSTOR