Wachsmuth et al. (2003) noticed that a loess smooth through Galton's data on the heights of mid-parents and their offspring exhibited a slightly non-linear trend. They asked whether this might be due to Galton having pooled the heights of fathers and mothers, and of sons and daughters, in constructing his tables and graphs.

To answer this question, they used analogous data from English families at about the same time, tabulated by Karl Pearson and Alice Lee (1896, 1903), in which the heights of parents and children were each classified by gender.




A frequency data frame with 746 observations on the following 6 variables.


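child: child height in inches, a numeric vector

parent: parent height in inches, a numeric vector

frequency: frequency of the parent-child height combination, a numeric vector

gp: a factor with levels fd fs md ms (parent-child gender combinations, e.g. fs = father and son)

par: a factor with levels Father Mother

chl: a factor with levels Daughter Son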


The variables gp, par and chl are provided to allow stratifying the data according to the gender of the parent (father or mother) and of the child (son or daughter).
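As a minimal sketch of such stratification, the snippet below fits a frequency-weighted regression of child on parent within each level of gp. The helper name `weighted_slopes` and the `toy` data frame are hypothetical, and the toy values are made up for illustration (they are not the Pearson-Lee counts); only the column names follow the format described above.

```r
# Hypothetical helper: weighted least-squares slope of child on parent,
# computed separately within each level of gp.
weighted_slopes <- function(df) {
  sapply(split(df, df$gp), function(d) {
    fit <- lm(child ~ parent, data = d, weights = frequency)
    coef(fit)[["parent"]]
  })
}

# Toy frequency table (made-up numbers, NOT the Pearson-Lee values),
# using the same column names as the data set.
toy <- data.frame(
  child     = c(60.5, 62.5, 63.5, 66.5, 61.5, 64.5, 65.5, 67.5),
  parent    = c(62.5, 64.5, 66.5, 68.5, 62.5, 64.5, 66.5, 68.5),
  frequency = c(1, 2.5, 2, 0.5, 0.25, 3, 1.5, 1),
  gp        = factor(rep(c("fd", "fs"), each = 4))
)

slopes <- weighted_slopes(toy)
print(slopes)  # one slope per level of gp
```

With the real data loaded, the same helper could be called as `weighted_slopes(PearsonLee)`, assuming the columns are named as documented here.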


data(PearsonLee)
str(PearsonLee)
#> 'data.frame':	746 obs. of  6 variables:
#>  $ child    : num  59.5 59.5 59.5 60.5 60.5 61.5 61.5 61.5 61.5 61.5 ...
#>  $ parent   : num  62.5 63.5 64.5 62.5 66.5 59.5 60.5 62.5 63.5 64.5 ...
#>  $ frequency: num  0.5 0.5 1 0.5 1 0.25 0.25 0.5 1 0.25 ...
#>  $ gp       : Factor w/ 4 levels "fd","fs","md",..: 2 2 2 2 2 2 2 2 2 2 ...
#>  $ par      : Factor w/ 2 levels "Father","Mother": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ chl      : Factor w/ 2 levels "Daughter","Son": 2 2 2 2 2 2 2 2 2 2 ...

    # evaluate the plotting code with the columns of PearsonLee in scope
    with(PearsonLee, {
        lim <- c(55, 80)
        xv <- seq(55, 80, .5)
        sunflowerplot(parent, child, number=frequency, xlim=lim, ylim=lim, seg.col="gray", size=.1)
        # frequency-weighted linear fit and loess smooth
        abline(lm(child ~ parent, weights=frequency), col="blue", lwd=2)
        lines(xv, predict(loess(child ~ parent, weights=frequency), data.frame(parent=xv)),
              col="blue", lwd=2)
        # NB: dataEllipse doesn't take frequency into account
        if(require(car)) {
            dataEllipse(parent, child, xlim=lim, ylim=lim, plot.points=FALSE)
        }
    })

## separate plots for combinations of (chl, par)

# this doesn't quite work, because xyplot can't handle weights
library(lattice)
xyplot(child ~ parent|par+chl, data=PearsonLee, type=c("p", "r", "smooth"), col.line="red")

# Using ggplot [thx: Dennis Murphy]
library(ggplot2)
# weight = frequency in the top-level aes() is inherited by both smoothers
ggplot(PearsonLee, aes(x = parent, y = child, weight = frequency)) +
   geom_point(size = 1.5, position = position_jitter(width = 0.2)) +
   geom_smooth(method = lm, aes(colour = 'Linear'), se = FALSE, size = 1.5) +
   geom_smooth(aes(colour = 'Loess'), se = FALSE, size = 1.5) +
   facet_grid(chl ~ par) +
   scale_colour_manual(breaks = c('Linear', 'Loess'),
                       values = c('green', 'red')) +
   theme(legend.position = c(0.14, 0.885),
        legend.background = element_rect(fill = 'white'))

# inverse regression, as in Wachsmuth et al. (2003)

# weight = frequency in the top-level aes() is inherited by both smoothers
ggplot(PearsonLee, aes(x = child, y = parent, weight = frequency)) +
   geom_point(size = 1.5, position = position_jitter(width = 0.2)) +
   geom_smooth(method = lm, aes(colour = 'Linear'), se = FALSE, size = 1.5) +
   geom_smooth(aes(colour = 'Loess'), se = FALSE, size = 1.5) +
   facet_grid(chl ~ par) +
   scale_colour_manual(breaks = c('Linear', 'Loess'),
                       values = c('green', 'red')) +
   theme(legend.position = c(0.14, 0.885),
        legend.background = element_rect(fill = 'white'))
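
The xyplot() call in the examples ignores frequency when drawing its regression lines. One possible workaround, sketched here under stated assumptions, is a custom lattice panel function that fits a weighted lm() within each panel. The names `panel_weighted`, `wts` and `toy` are hypothetical, and the toy values are made up (not the Pearson-Lee counts); the sketch relies on lattice forwarding unrecognized arguments and the `subscripts` vector to the panel function.

```r
library(lattice)

# Toy frequency table (made-up values, NOT the Pearson-Lee data),
# using the same column names as PearsonLee.
toy <- data.frame(
  parent    = rep(c(62.5, 64.5, 66.5, 68.5), times = 4),
  child     = c(61.5, 63.5, 63.5, 66.5,  63.5, 64.5, 66.5, 67.5,
                60.5, 62.5, 63.5, 64.5,  62.5, 64.5, 65.5, 68.5),
  frequency = c(0.5, 2, 3, 1,  1, 2.5, 2, 0.25,
                0.5, 1, 2, 1.5,  1, 3, 2, 0.5),
  par       = factor(rep(c("Father", "Mother"), each = 8)),
  chl       = factor(rep(rep(c("Daughter", "Son"), each = 4), times = 2))
)

# Hypothetical panel function: `wts` is our own argument name, forwarded by
# xyplot() to the panel; `subscripts` indexes the rows of the data shown
# in the current panel.
panel_weighted <- function(x, y, subscripts, wts, ...) {
  panel.xyplot(x, y, ...)                       # the points
  fit <- lm(y ~ x, weights = wts[subscripts])   # frequency-weighted fit
  panel.abline(reg = fit, col = "red")          # weighted regression line
}

p <- xyplot(child ~ parent | par + chl, data = toy,
            wts = toy$frequency, panel = panel_weighted)
print(p)
```

To try this on the real data, one would replace `toy` with `PearsonLee` in both the `data` and `wts` arguments. This sketch adds only the weighted linear fit; a weighted loess line would need a similar panel-level call.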
