What would C. J. Minard have done if he had access to R and ggplot2? The goal of this exercise is to reproduce, to some reasonable approximation, Minard’s famous graphic of Napoleon’s March on Moscow. Along the way, we’ll learn some techniques for developing plots using ggplot2.
(The original source of this exercise was the documentation example for the Minard data, example("Minard", package="HistData"), with the steps explained here. Other ideas were taken from Andrew Heiss, Exploring Minard’s 1812 plot with ggplot2.)
This graph looks very complicated. How should we get started?
The first step is to understand the available data.
The data are contained in three data.frames in the HistData package. Let’s load each one and examine its structure.
Minard.troops
The main data on Napoleon’s troop strength at points (lat, long) along the campaign path, giving the number of survivors, stratified by direction, a factor with levels A (“Advance”) and R (“Retreat”), and group (Napoleon had three generals commanding portions of his troops).
data(Minard.troops, package="HistData")
str(Minard.troops)
## 'data.frame': 51 obs. of 5 variables:
## $ long : num 24 24.5 25.5 26 27 28 28.5 29 30 30.3 ...
## $ lat : num 54.9 55 54.5 54.7 54.8 54.9 55 55.1 55.2 55.3 ...
## $ survivors: int 340000 340000 340000 320000 300000 280000 240000 210000 180000 175000 ...
## $ direction: Factor w/ 2 levels "A","R": 1 1 1 1 1 1 1 1 1 1 ...
## $ group : int 1 1 1 1 1 1 1 1 1 1 ...
Minard.cities
The (lat, long) locations of various places along the path of Napoleon’s army, with the name of the city.
data(Minard.cities, package="HistData")
str(Minard.cities)
## 'data.frame': 20 obs. of 3 variables:
## $ long: num 24 25.3 26.4 26.8 27.7 27.6 28.5 28.7 29.2 30.2 ...
## $ lat : num 55 54.7 54.4 54.3 55.2 53.9 54.3 55.5 54.4 55.3 ...
## $ city: Factor w/ 20 levels "Bobr","Chjat",..: 5 18 15 9 4 7 16 13 1 19 ...
Minard.temp
The temperature at various places along the march of retreat from Moscow, with their date.
data(Minard.temp, package="HistData")
str(Minard.temp)
## 'data.frame': 9 obs. of 4 variables:
## $ long: num 37.6 36 33.2 32 29.2 28.5 27.2 26.7 25.3
## $ temp: int 0 0 -9 -21 -11 -20 -24 -30 -26
## $ days: int 6 6 16 5 10 4 3 5 1
## $ date: Factor w/ 8 levels "Dec01","Dec06",..: 7 8 4 5 NA 6 1 2 3
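The next step is to decompose the graph into the components to be plotted.
First, the graph really consists of two separate plots, stacked vertically:
a plot of troop strength by location (lat, long)
a plot of temperature by location (temp, long)
The graph of troop strength has two layers:
the path of the troops, with its width showing survivors
the labels for the cities, from Minard.cities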
First, load the packages we will need. In addition to ggplot2 we will use the scales package to provide convenient formatting of the scale for survivors and the gridExtra package to combine the two separate plots.
library(ggplot2)
library(scales) # additional formatting for scales
library(grid) # combining plots
library(gridExtra) # combining plots
library(dplyr) # tidy data manipulations
The basic plot uses long and lat as the ggplot (x, y) coordinates. The line below just sets up an empty plot frame for long and lat.
ggplot(Minard.troops, aes(long, lat))
The flow-map path of the surviving troops is a geom_path layer. The important aesthetic attribute is to map the size (width) of the path to survivors. Here is a first try:
ggplot(Minard.troops, aes(long, lat)) +
geom_path(aes(size = survivors))
That is pretty hideous, but it is at least a first approximation. What’s wrong here:
the paths of Advance and Retreat are not distinguished in the graph.
the aspect ratio of the plot doesn’t reflect the equal scaling of degrees of latitude and longitude on a map, or Minard’s scaling in the graphic.
For the first problem, we need to map the color of the path to direction. In Minard’s map, there are also some side paths of parts of the army diverted to separate battles. These are distinguished by the group variable.
The scaling of the horizontal and vertical axes is easily fixed by coord_fixed(), which makes equal units appear equal on the two axes.
ggplot(Minard.troops, aes(long, lat)) +
geom_path(aes(size = survivors, colour = direction, group = group)) +
coord_fixed()
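A side note on the size aesthetic: since ggplot2 3.4.0, the width of lines is mapped with linewidth, and using size for lines gives a deprecation warning. The code in this tutorial still runs, but here is a minimal sketch of the newer spelling. It assumes ggplot2 3.4.0 or later; the data frame troops_head and the name p_lw are made up for this illustration, with the first five rows copied from the str() output shown earlier.

```r
library(ggplot2)  # assumption: ggplot2 >= 3.4.0, which provides linewidth

# First five rows of Minard.troops, copied from the str() output above
troops_head <- data.frame(
  long = c(24, 24.5, 25.5, 26, 27),
  lat = c(54.9, 55, 54.5, 54.7, 54.8),
  survivors = c(340000, 340000, 340000, 320000, 300000)
)

# Same idea as geom_path(aes(size = survivors)), using the newer aesthetic
p_lw <- ggplot(troops_head, aes(long, lat)) +
  geom_path(aes(linewidth = survivors)) +
  scale_linewidth(name = "Survivors", range = c(1, 10))

p_lw
```

If you switch to linewidth, the matching scale is scale_linewidth() rather than scale_size(), and the legend is suppressed with guides(linewidth = "none").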
Before going further, here are some things to try:
What if Minard simply made a line graph of the path, without using survivors as the size of the path? What would he have gotten with just geom_path() – no size, color or group aesthetics?
What if Minard had added points (+ geom_point(aes(size=survivors))) to reflect the remaining size of the Grand Army?
What if he also tried to distinguish the points by color, based on direction (+ geom_point(aes(size=survivors, color=direction)))?
In Minard’s version, the two upward diversions of troops on the Retreat are drawn “behind” the path of the Advance, whereas in our version they appear in front. How can we change this? (Hint: consider sorting the Minard.troops data or using transparent versions of the two colors.)
The graph above is correct geographically, but the vertical size is too small to accommodate other graphical elements. In the plots below, we omit coord_fixed(), and use knitr options fig.height=3.5, fig.width=10 to scale the plot in proportion to Minard’s original.
As well, the individual segments of the path don’t fit together very well and leave big gaps. We can fix that by adding a rounded line ending to each segment (lineend="round").
ggplot automatically makes discrete categories for the survivors variable and the values are printed in an unpleasant “e” (exponential) notation. We can override the default using scale_size(), providing our own breaks, and using scales::comma() to format the values.
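To see what comma() does before using it in the plot, here is a small sketch that formats the same three break values on their own. The variable names example_breaks, default_labels and comma_labels are made up for this illustration.

```r
library(scales)  # provides comma()

example_breaks <- c(1, 2, 3) * 10^5

# Base R formatting, which may fall back to exponential notation
default_labels <- format(example_breaks)

# comma() writes the numbers out in full with thousands separators
comma_labels <- comma(example_breaks)

print(default_labels)
print(comma_labels)
```

The character vector returned by comma() is what gets passed to the labels argument of scale_size() below.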
While we’re at it, we can also override the colors for the direction variable. I used the Eyedropper tool in Firefox Tools -> Web Developer to get the HEX values from the original graph.
breaks <- c(1, 2, 3) * 10^5
ggplot(Minard.troops, aes(long, lat)) +
geom_path(aes(size = survivors, colour = direction, group = group),
lineend="round") +
scale_size("Survivors", range = c(1,10), #c(0.5, 15),
breaks=breaks, labels=scales::comma(breaks)) +
scale_color_manual("Direction",
values = c("#E8CBAB", "#1F1A1B"),
labels=c("Advance", "Retreat"))
plot_troops <- last_plot()
I’m also using a ggplot trick here: I liked the result of this plot, so I can assign it to a variable, plot_troops, that I’ll use later, rather than reproducing all the code each time. last_plot() always returns the last ggplot created or modified.
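If you prefer not to rely on last_plot(), the plot can also be assigned directly when it is built. The sketch below uses a tiny made-up data frame (toy) rather than the Minard data, and the names p and p2 are only for illustration.

```r
library(ggplot2)

# A small made-up data frame, for illustration only (not the Minard data)
toy <- data.frame(x = 1:3, y = c(2, 1, 3))

# Assign the plot directly instead of calling last_plot() afterwards
p <- ggplot(toy, aes(x, y)) + geom_path()

# Adding to a saved plot returns a new object; p itself is left unchanged
p2 <- p + labs(x = NULL, y = NULL)

p2
```

Either style gives the same kind of object, so it is mostly a matter of taste. Direct assignment has the small advantage that it does not depend on which plot happened to be drawn last.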
When we assemble this into the complete graphic, we might want to suppress the legends for survivors and direction, and perhaps also the axis labels (lat and long). This is easy to do with ggplot, even though we saved this plot in a ggplot object.
Before going further, here are some things to try:
Open a copy of Minard’s graphic, http://euclid.psych.yorku.ca/www/psy6135/images/Minard-march.png, in a web browser or other application. Find the color-picking tool that lets you hover the mouse on the graphic and get the color values to use for the Advance and Retreat paths.
When Minard combines this plot with others, he will want to make some adjustments. Try some of the following:
Minard will not need the default labels and scales for the horizontal and vertical axes. Try running: plot_troops + labs(x = NULL, y = NULL)
Minard would not like the default ggplot theme, with a gray background and white grid lines. Try running: plot_troops + theme_bw(). There is a large collection of other themes, such as theme_minimal() and theme_void().
He will also want to delete the legends for survivors and direction. In ggplot2, these are handled by guides(), and we can set them to "none" to suppress them. Try: plot_troops + guides(color = "none", size = "none")
The locations of the cities in Minard’s graphic provide the geographical context for this graphic story of Napoleon’s terrible defeat. The cities he chose for the labels in the graph reflect important battles or other locations from historical accounts of the 1812 campaign. In ggplot terms, it is just another layer on the graph of the troops, added with +.
Using the Minard.cities data, we can use geom_point() to plot city locations, and/or geom_text() to plot their names. If we use both, we have to figure out how to deal with overlap of points and text.
plot_troops + geom_text(data = Minard.cities, aes(label = city), size = 3)
Here is another version using both points and text labels for the cities. geom_text() has several other arguments: hjust and vjust to move the text away from the points, angle to print the labels at an angle, and family to change the font family (e.g., family = "Times New Roman").
plot_troops +
geom_point(data = Minard.cities) +
geom_text(data = Minard.cities, aes(label = city), vjust = 1.5)
Plotting both points and text labels is a common problem in graphics. You often have to tweak the positions of the labels so they don’t overlap the points. A separate ggplot-compatible package, ggrepel, provides a function, geom_text_repel(), to automatically move the labels away from points and to ensure none of the labels overlap.
if (!require(ggrepel)) {install.packages("ggrepel"); require(ggrepel)}
library(ggrepel)
plot_troops +
geom_point(data = Minard.cities) +
geom_text_repel(data = Minard.cities, aes(label = city))
plot_troops_cities <- last_plot()
We like this version best, so we save it as plot_troops_cities.
The second plot in Minard’s graphic is the plot of temperature against longitude on the path of the retreat. Minard first takes a quick look at the data.
Minard.temp
## long temp days date
## 1 37.6 0 6 Oct18
## 2 36.0 0 6 Oct24
## 3 33.2 -9 16 Nov09
## 4 32.0 -21 5 Nov14
## 5 29.2 -11 10 <NA>
## 6 28.5 -20 4 Nov28
## 7 27.2 -24 3 Dec01
## 8 26.7 -30 5 Dec06
## 9 25.3 -26 1 Dec07
He then decides that this is just another geom_path(). His first version just adds the points.
In making this plot, I again measured the size of this part of the graph in Minard’s original, and used fig.height=1.2, fig.width=10 as the knitr options in this chunk to make this plot have approximately the right size and shape.
ggplot(Minard.temp, aes(long, temp)) +
geom_path(color="grey", size=1.5) +
geom_point(size=2)
If you look carefully at Minard’s graph, he labeled each point using the temperature and the date, in the form \(-26^o \textrm{le } 7 X^{bre}\) for the left-most point for December 7. We construct a nicer label combining temperature and date as follows:
Minard.temp <- Minard.temp %>%
mutate(label = paste0(temp, "° ", date))
head(Minard.temp$label)
## [1] "0° Oct18" "0° Oct24" "-9° Nov09" "-21° Nov14" "-11° NA" "-20° Nov28"
(There is one temperature in the data without a date. What R function could you use to change “NA” to “ ” in the code above?)
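One possible answer, offered as a sketch rather than the only way: test for the missing value with is.na() and substitute an empty string with ifelse() before pasting. The vectors below are copied from the printed Minard.temp listing above so the example runs on its own; the names temp, date and label are just for this illustration. Because date is a factor in the real data, it is converted with as.character() first; otherwise ifelse() would return the factor's integer codes.

```r
# Values copied from the printed Minard.temp listing above
temp <- c(0, 0, -9, -21, -11, -20, -24, -30, -26)
date <- factor(c("Oct18", "Oct24", "Nov09", "Nov14", NA,
                 "Nov28", "Dec01", "Dec06", "Dec07"))

# Replace a missing date with an empty string before building the label;
# as.character() avoids getting the factor's integer codes from ifelse()
date_text <- ifelse(is.na(date), "", as.character(date))
label <- paste0(temp, "° ", date_text)

label
```

In the dplyr pipeline above, the same ifelse() expression could be used inside mutate() in place of the bare date.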
His next version of the plot of temperature used this label variable as follows:
ggplot(Minard.temp, aes(long, temp)) +
geom_path(color="grey", size=1.5) +
geom_point(size=1) +
geom_text(aes(label=label), size=2, vjust=-1)
However, putting all the labels above the points gives the result that those for the right-most two points are clipped, because they are outside the plot frame. He tries this again, using geom_text_repel():
ggplot(Minard.temp, aes(long, temp)) +
geom_path(color="grey", size=1.5) +
geom_point(size=1) +
geom_text_repel(aes(label=label), size=2.5)
This is not too bad. He can always come back and tweak this later, assuming he saves his R code!
plot_temp <- last_plot()
OK, the pieces are done, and it is time to try to paste them together into a single graphic. The tool for this is grid.arrange() from the gridExtra package. Here is the first attempt, just placing one plot on top of the other, as is.
grid.arrange(plot_troops_cities, plot_temp)
In the knitr options, I specified the final dimensions of the combined plot, fig.height=4.7, fig.width=10. By default, grid.arrange() stacks the plots and gives each the same vertical height. We can fix this later.
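First, let’s fix the separate plots. In the plot of troops and cities, we need to:
set the horizontal axis limits so that they match those of the temperature plot;
remove the axis labels and the legends for survivors and direction;
remove the remaining ggplot2 theme elements.
plot_troops_cities +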
coord_cartesian(xlim = c(24, 38)) +
labs(x = NULL, y = NULL) +
guides(color = "none", size = "none") +
theme_void()
plot_troops_cities_fixed <- last_plot()
In the plot of temperature, we use similar techniques, but it takes a bit more work to suppress the horizontal axis tick marks and labels. The theme() function allows all aspects of a graph to be controlled.
plot_temp +
coord_cartesian(xlim = c(24, 38)) +
labs(x = NULL, y="Temperature") +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
axis.text.x = element_blank(), axis.ticks = element_blank(),
panel.border = element_blank())
plot_temp_fixed <- last_plot()
These plots are now both on the same horizontal scale. To combine them in a single plot, we can again use grid.arrange(). It would be nice to add a border around the entire graphic. The function grid.rect() in the grid package does this for us.
grid.arrange(plot_troops_cities_fixed, plot_temp_fixed, nrow=2, heights=c(3.5, 1.2))
grid.rect(width = .99, height = .99, gp = gpar(lwd = 2, col = "gray", fill = NA))
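A quick check on the heights argument: as far as I understand it, grid.arrange() treats plain numbers here as relative sizes, so c(3.5, 1.2) divides the available height in those proportions. The sketch below just does the arithmetic, using the figure heights quoted earlier; the names heights and shares are made up for this illustration.

```r
# Heights measured from Minard's original, as used in the knitr options
heights <- c(troops = 3.5, temp = 1.2)

# Together they account for the full fig.height of the combined plot
sum(heights)

# Share of the vertical space given to each panel
shares <- round(heights / sum(heights), 2)
shares
```

So the troop plot gets roughly three quarters of the vertical space and the temperature plot the remaining quarter, which matches the separate figure sizes used for the two plots.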
This is as far as we will take the reconstruction of Minard’s graphic in this tutorial. But, here are some suggestions for going further.
Resizing and adding a title or descriptive text: My calculation of the height of the graphic (3.5” for the plot of troops, 4.7” overall) did not take into account the top portion (about 0.64”) that Minard devotes to the title and descriptive text. You could use the ggplot2 function annotate() for this, but that is quite tedious for lots of text. Alternatively, find a way to read in this graphic and combine it with the image.
Add a map background: Minard drew some map elements, largely rivers, on his graphic to provide more geographic context. The package ggmap works nicely with ggplot2 and provides tools for getting map data from various web sources. Andrew Heiss, Exploring Minard’s 1812 plot with ggplot2 shows how to do this.
Re-visions of Minard: My web page Re-Visions of Minard contains a collection of graphs that others have produced to either recreate Minard’s graphic in other software or to attempt to display the data in some other way. Study this page and try one or more of the ideas illustrated there.
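For the title suggestion above, here is a minimal sketch of what annotate() looks like for a single line of text. The data frame toy, its coordinates, the title string and its position are all placeholders made up for this illustration; they are not taken from Minard’s graphic or from the Minard data.

```r
library(ggplot2)

# Made-up coordinates, just to have a path to draw the text over
toy <- data.frame(long = c(24, 31, 37.6), lat = c(55, 54.5, 55.8))

# annotate() adds one fixed piece of text at the given (x, y) position;
# hjust = 0 left-aligns the text at x
p_title <- ggplot(toy, aes(long, lat)) +
  geom_path() +
  annotate("text", x = 24, y = 56.2, hjust = 0, size = 4,
           label = "Placeholder title text")

p_title
```

Each additional line of text needs its own label and position, which is why this approach becomes tedious for Minard’s long description.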