The methods discussed in Chapter 10 provide the basis for a rather complete multivariate analysis paralleling the traditional univariate methods for the same designs. You can carry out multiple regression, ANOVA, or indeed any classical linear model with the standard collection of analysis tools you would use for a single outcome variable, naturally extended in most cases to analysing several outcomes together.

As nice as these mathematical and statistical ideas might be, the fact that the analysis is conducted for the response variables collectively means that it may be harder to interpret and explain what the results say about the separate responses. Here’s where multivariate model visualization comes to the rescue!

Packages

In this chapter we use the following packages. I load them now.
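
A minimal loading chunk, based on the package list reported at the end of this chapter, would be:

library(car)       # Anova(), linearHypothesis(), scatterplotMatrix()
library(carData)   # datasets used with car
library(heplots)   # heplot(), pairs() for "mlm" objects, etasq(), example datasets
library(candisc)   # candisc(), cancor()
library(broom)     # tidy summaries of model fits
library(dplyr)     # data manipulation
library(tidyr)     # data reshaping
library(ggplot2)   # graphics
library(knitr)     # table formatting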

11.1 HE plot framework

Chapter 9 illustrated the basic ideas of the framework for visualizing multivariate linear models in the context of a simple two group design, using Hotelling’s \(T^2\). The main ideas were illustrated in Figure 9.9.

Having described the statistical ideas behind the MLM in Chapter 10, we can proceed to extend this framework to larger designs. Figure 11.1 illustrates these ideas using the simple one-way MANOVA design of the dogfood data from Section 10.2.1.

Figure 11.1: Dogfood quartet: Illustration of the conceptual ideas of the HE plot framework for the dogfood data. (a) Scatterplot of the data; (b) Summary using data ellipses; (c) HE plot shows the variation in the means in relation to pooled within group variance; (d) Transformation from data space to canonical space
  • In data space (a), each group is summarized by its data ellipse (b), representing the means and covariances.

  • Variation against the hypothesis of equal means is shown by the \(\mathbf{H}\) ellipse in the HE plot (c), representing the data ellipse of the fitted values. Error variance is shown in the \(\mathbf{E}\) ellipse, representing the pooled within-group covariance matrix, \(\mathbf{S}_p\), which is also the data ellipse of the residuals from the model. For the dogfood data, the group means have a negative relation: longer time to start eating is associated with a smaller amount eaten.

  • The MANOVA (or Hotelling’s \(T^2\)) is formally equivalent to a discriminant analysis, predicting group membership from the response variables which can be seen in data space. (The main difference is emphasis and goals: MANOVA seeks to test differences among group means, while discriminant analysis aims at classification of the observations into groups.)

  • This effectively projects the \(p\)-dimensional space of the response variables into the smaller canonical space (d) that shows the greatest differences among the groups. As in a biplot, vectors show the relations of the response variables with the canonical dimensions. (A code sketch of the model behind this quartet follows below.)
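
To make this concrete, the model behind Figure 11.1 can be fit and displayed in a few lines. This is only a sketch: it assumes the dogfood data from the heplots package has a grouping factor named formula and responses start (time to start eating) and amount (amount eaten); treat those names as assumptions.

data(dogfood, package = "heplots")
# One-way MANOVA for the two responses against the grouping factor
dogfood.mod <- lm(cbind(start, amount) ~ formula, data = dogfood)
heplot(dogfood.mod)          # panel (c): H and E ellipses in data space
plot(candisc(dogfood.mod))   # panel (d): the same effect in canonical space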

For more complex models such as MANOVA with multiple factors or multivariate multiple regression, there is one sum of squares and products (SSP) matrix, and therefore one \(\mathbf{H}\) ellipse, for each term in the model. For example, in a two-way MANOVA design with the model formula cbind(y1, y2) ~ A + B + A:B and equal sample sizes in the groups, the total sum of squares accounted for by the model is \[\begin{aligned} \mathbf{SSP}_{\text{Model}} & = \mathbf{SSP}_{A} + \mathbf{SSP}_{B} + \mathbf{SSP}_{AB} \\ & = \mathbf{H}_A + \mathbf{H}_B + \mathbf{H}_{AB} \:\: . \end{aligned}\]
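
In R, such a two-way model is fit in the usual way; here is a schematic sketch, assuming a hypothetical data frame dat with numeric responses y1, y2 and crossed factors A and B.

# Hypothetical two-way MANOVA; dat, y1, y2, A and B are placeholders
mod.AB <- lm(cbind(y1, y2) ~ A + B + A:B, data = dat)
car::Anova(mod.AB)   # multivariate tests of A, B, and A:B, each based on its own H matrix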

11.2 HE plot construction

The HE plot is constructed to allow a direct visualization of the “size” of hypothesized terms in a multivariate linear model in relation to unexplained error variation. These can be displayed in 2D or 3D plots, so I use the term “ellipsoid” below to cover all cases.

Error variation is represented by a standard 68% data ellipsoid of the \(\mathbf{E}\) matrix of the residuals in \(\boldsymbol{\Large\varepsilon}\). This is divided by the residual degrees of freedom, so the size of \(\mathbf{E} / \text{df}_e\) is analogous to a mean square error in univariate tests. The choice of 68% coverage allows you to “read” the residual standard deviation as the half-length of the shadow of the \(\mathbf{E}\) ellipsoid on any axis (see Figure 3.10). The \(\mathbf{E}\) ellipsoid is then translated to the overall (grand) means \(\bar{\mathbf{y}}\) of the variables plotted, which allows us to show the means for factor levels on the same scale, facilitating interpretation. In the notation of Equation 3.2, the error ellipsoid is given by \[ \mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{E}) = \bar{\mathbf{y}} \; \oplus \; c\,\mathbf{E}^{1/2} \:\: , \tag{11.1}\] where \(c = \sqrt{2 F_{2, n-2}^{0.68}}\) for 2D plots and \(c = \sqrt{3 F_{3, n-3}^{0.68}}\) for 3D for standard 68% coverage.
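
The scaling constant \(c\) is just a quantile of an \(F\) distribution, so it is easy to compute; for example:

# Radii for standard 68% coverage ellipsoids with n = 150 observations (as for the iris data)
n <- 150
c2 <- sqrt(2 * qf(0.68, df1 = 2, df2 = n - 2))   # 2D radius, about 1.5
c3 <- sqrt(3 * qf(0.68, df1 = 3, df2 = n - 3))   # 3D radius, somewhat larger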

An ellipsoid representing variation in the means of a factor (or any other term reflected in a general linear hypothesis test, Equation 10.7) in the \(\mathbf{H}\) matrix is simply the data ellipse of the fitted values for that term.

Dividing the hypothesis matrix by the error degrees of freedom, giving \(\mathbf{H} / \text{df}_e\), puts this on the same scale as the \(\mathbf{E}\) ellipse. I refer to this as effect size scaling, because it is similar to an effect size index used in univariate models, e.g., \(ES = (\bar{y}_1 - \bar{y}_2) / s_e\) in a two-group, univariate design.
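
In code, effect size scaling amounts to dividing the SSP matrices by \(\text{df}_e\). The sketch below assumes that the object returned by car::Anova() for a multivariate model stores these matrices in components named SSP (a list, one matrix per term) and SSPE; mod stands for any fitted "mlm".

A <- car::Anova(mod)         # mod is any fitted multivariate linear model
H <- A$SSP[[1]]              # hypothesis SSP matrix for the first term (component name assumed)
E <- A$SSPE                  # error SSP matrix (component name assumed)
df_e <- df.residual(mod)     # error degrees of freedom
H_effect <- H / df_e         # now on the same scale as E / df_e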

11.2.1 Example: Iris data

Perhaps the most famous (or infamous) dataset in the history of multivariate data analysis is that of measurements on three species of Iris flowers collected by Edgar Anderson (1935) in the Gaspé Peninsula of Québec, Canada. Anderson wanted to quantify the outward appearance (“morphology”: shape, structure, color, pattern, size) of species as a method to study variation within and between such groups. Although Anderson published in the obscure Bulletin of the American Iris Society, R. A. Fisher (1936) saw this as a challenge and opportunity to introduce the method now called discriminant analysis—how to find a weighted composite of variables to best discriminate among existing groups.

History corner

I said “infamous” above because Fisher published in the Annals of Eugenics, was an ardent eugenicist himself, and the work of eugenicists was often pervaded by prejudice against racial, ethnic and disabled groups. Through guilt by association, the Iris data, having mistakenly been called “Fisher’s Iris Data”, have become deprecated, even called “racist data”.1 The voices of the Setosa, Versicolor and Virginica of Gaspé protest: we don’t have a racist bone in our bodies, nor any prejudice against any other species.

Bodmer et al. (2021) present a careful account of Fisher’s views on eugenics within the context of his time and his contributions to modern statistical theory and practice. Fisher’s views on race were largely formed by Darwin and Galton, but “nearly all of Fisher’s statements were about populations, groups of populations, or the human species as a whole”. Regardless, the iris data were Anderson’s and should not be blamed. After all, if Anderson had lent his car to Fisher, would the car be tainted by Fisher’s eugenicist leanings?

Figure 11.2: Diagram of an iris flower showing the measurements of petal and sepal size. Each flower has three sepals and three alternating petals. The sepals have brightly colored central sections. Source: Gayan De Silva (2020)

So that we understand what the measurements represent, Figure 11.2 superposes labels on a typical iris flower. Sepals are like ostentatious petals, with attractive decorations in the central section. Length is the distance from the center to the tip and width is the transverse dimension.

As always, it is useful to start with overview displays to see the data. A scatterplot matrix (Figure 11.3) shows that versicolor and virginica are more similar to each other than either is to setosa, both in their pairwise means (setosa are smaller) and in the slopes of regression lines. Further, the ellipses suggest that the assumption of constant within-group covariance matrices is problematic: While the shapes and sizes of the concentration ellipses for versicolor and virginica are reasonably similar, the shapes and sizes of the ellipses for setosa are different from the other two.

iris_colors <- c("blue", "darkgreen", "brown4")
scatterplotMatrix(~ Sepal.Length + Sepal.Width + 
                    Petal.Length + Petal.Width | Species,
  data = iris,
  col = iris_colors,
  pch = 15:17,
  smooth=FALSE,
  regLine = TRUE,
  ellipse=list(levels=0.68, fill.alpha=0.1),
  diagonal = FALSE,
  legend = list(coords = "bottomleft", 
                cex = 1.3, pt.cex = 1.2))
Figure 11.3: Scatterplot matrix of the iris dataset. The species are summarized by 68% data ellipses and linear regression lines in each pairwise plot.

11.2.2 MANOVA model

We proceed nevertheless to fit a one-way MANOVA model to the iris data. The MANOVA model for these data addresses the question: “Do the means of the Species differ significantly for the sepal and petal variables taken together?” \[ \mathcal{H}_0 : \boldsymbol{\mu}_\textrm{setosa} = \boldsymbol{\mu}_\textrm{versicolor} = \boldsymbol{\mu}_\textrm{virginica} \]

Because there are three species, the test involves \(s = \min(p, g-1) =2\) degrees of freedom, and we are entitled to represent this by two 1-df contrasts, or sub-questions. From the separation among the groups shown in Figure 11.3 (or more botanical knowledge), it makes sense to compare:

  • Setosa vs. others: \(\mathbf{c}_1 = (1,\: -\frac12, \: -\frac12)\)
  • Versicolor vs. Virginica: \(\mathbf{c}_2 = (0,\: 1, \: -1)\)

You can do this by putting these vectors as columns in a matrix and assigning this to the contrasts() of Species. It is important to do this before fitting with lm(), because the contrasts in effect determine how the \(\mathbf{X}\) matrix is set up, and hence the names of the coefficients representing Species.

C <- matrix(c(1,-1/2,-1/2,  
              0,   1,  -1), nrow=3, ncol=2)
contrasts(iris$Species) <- C
contrasts(iris$Species)
#>            [,1] [,2]
#> setosa      1.0    0
#> versicolor -0.5    1
#> virginica  -0.5   -1

Now let’s fit the model. As you would expect from Figure 11.3, the differences among groups are highly significant.

iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
                 Species, data=iris)
Anova(iris.mod)
#> 
#> Type II MANOVA Tests: Pillai test statistic
#>         Df test stat approx F num Df den Df Pr(>F)    
#> Species  2      1.19     53.5      8    290 <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As a quick follow-up, it is useful to examine the univariate tests for each of the iris variables, using heplots::glance() or heplots::uniStats(). It is of interest that the univariate \(R^2\) values are much larger for the petal variables than for sepal length and width.2 For comparison, heplots::etasq() gives the overall \(\eta^2\) proportion of variance accounted for in all responses.

glance(iris.mod)
#> # A tibble: 4 × 8
#>   response     r.squared sigma fstatistic numdf dendf  p.value  nobs
#>   <chr>            <dbl> <dbl>      <dbl> <dbl> <dbl>    <dbl> <int>
#> 1 Sepal.Length     0.619 0.515      119.      2   147 1.67e-31   150
#> 2 Sepal.Width      0.401 0.340       49.2     2   147 4.49e-17   150
#> 3 Petal.Length     0.941 0.430     1180.      2   147 2.86e-91   150
#> 4 Petal.Width      0.929 0.205      960.      2   147 4.17e-85   150

etasq(iris.mod)
#>         eta^2
#> Species 0.596

But these statistics don’t help to understand how the species differ. For this, we turn to HE plots.

11.3 HE plots

The heplot() function takes a "mlm" object and produces an HE plot for one pair of variables specified by the variables argument. By default, it plots the first two. Figure 11.4 shows the HE plots for the two sepal and the two petal variables.

heplot(iris.mod, size = "effect",
       cex = 1.5, cex.lab = 1.5,
       fill = TRUE, fill.alpha = c(0.3, 0.1))
heplot(iris.mod, size = "effect", variables = 3:4,
       cex = 1.5, cex.lab = 1.5,
       fill = TRUE, fill.alpha = c(0.3, 0.1))
Figure 11.4: HE plots for the multivariate model iris.mod. The left panel shows the plot for the Sepal variables; the right panel plots the Petal variables.

The interpretation of the plots in Figure 11.4 is as follows:

  • For the Sepal variables, length and width are positively correlated within species (the \(\mathbf{E}\) = “Error” ellipsoid). The means of the groups (the \(\mathbf{H}\) = “Species” ellipsoid), however, are negatively correlated. This plot is the HE plot representation of the data shown in row 2, column 1 of Figure 11.3. It reflects the relative shape of the iris sepals: shorter and wider for setosa than the other two species.

  • For the Petal variables, length and width are again positively correlated within species, but now the means of the groups are also positively correlated: longer petals go with wider ones across species. This reflects the relative size of the iris petals. The analogous data plot appears in row 4, column 3 of Figure 11.3.

11.4 Significance scaling

The geometry of ellipsoids and multivariate tests allow us to go further with another re-scaling of the \(\mathbf{H}\) ellipsoid that gives a visual test of significance for any term in a MLM. This is done simply by dividing \(\mathbf{H} / df_e\) further by the \(\alpha\)-critical value of the corresponding test statistic to show the strength of evidence against the null hypothesis.

Among the various multivariate test statistics, Roy’s maximum root test, based on the largest eigenvalue \(\lambda_1\) of \(\mathbf{H} \mathbf{E}^{-1}\), gives \(\mathbf{H} / (\lambda_\alpha df_e)\) which has the visual property that the scaled \(\mathbf{H}\) ellipsoid will protrude somewhere outside the standard \(\mathbf{E}\) ellipsoid if and only if Roy’s test is significant at significance level \(\alpha\). The critical value \(\lambda_\alpha\) for Roy’s test is \[ \lambda_\alpha = \left(\frac{\text{df}_1}{\text{df}_2}\right) \; F_{\text{df}_1, \text{df}_2}^{1-\alpha} \:\: , \] where \(\text{df}_1 = \max(p, \text{df}_h)\) and \(\text{df}_2 = \text{df}_h + \text{df}_e - \text{df}_1\).
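
For the Species effect in the iris model of Section 11.2.2 (\(p = 4\), \(\text{df}_h = 2\), \(\text{df}_e = 147\)), the critical value at \(\alpha = .05\) can be computed directly; a small sketch:

p <- 4; df_h <- 2; df_e <- 147
df1 <- max(p, df_h)                # = 4
df2 <- df_h + df_e - df1           # = 145
alpha <- 0.05
lambda_alpha <- (df1 / df2) * qf(1 - alpha, df1, df2)
# Significance scaling uses H / (lambda_alpha * df_e): the scaled H ellipsoid
# protrudes beyond the E ellipsoid if and only if Roy's test rejects at level alpha.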

For these data, the HE plot using significance scaling is shown in the right panel of Figure 11.5. The left panel is the same as that shown for sepal width and length in Figure 11.4, but with axis limits to make the two plots directly comparable.
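
The code for Figure 11.5 is not shown in the text; something along the following lines would produce it. The axis limits here are hypothetical, chosen only to make the two panels comparable.

heplot(iris.mod, size = "effect",            # left panel: effect-size scaling
       xlim = c(3, 9), ylim = c(1.5, 4.5),   # hypothetical common limits
       fill = TRUE, fill.alpha = c(0.3, 0.1))
heplot(iris.mod,                             # right panel: the default significance scaling
       xlim = c(3, 9), ylim = c(1.5, 4.5),
       fill = TRUE, fill.alpha = c(0.3, 0.1))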

Figure 11.5: HE plots for sepal width and sepal length in the iris dataset. Left: effect scaling of the \(\mathbf{H}\) matrix; right: significance scaling, where protrusion of \(\mathbf{H}\) outside \(\mathbf{E}\) indicates a significant effect by Roy’s test.

You can interpret the plot using effect scaling as indicating that the overall “size” of variation of the group means is roughly the same as that of within-group variation for the two sepal variables. Significance scaling, in contrast, shows the strength of evidence against the null hypothesis that a given effect is zero.

11.5 Visualizing contrasts and linear hypotheses

As described in Section 10.3.1, tests of linear hypotheses and contrasts represented by the general linear test \(\mathcal{H}_0: \mathbf{C} \;\mathbf{B} = \mathbf{0}\) provide a powerful way to probe the specific effects represented within the global null hypothesis, \(\mathcal{H}_0: \mathbf{B} = \mathbf{0}\), that all effects are zero.

In this example the contrasts \(\mathbf{c}_1\) (Species1) and \(\mathbf{c}_2\) (Species2) among the iris species are orthogonal, i.e., \(\mathbf{c}_1^\top \mathbf{c}_2 = 0\). Therefore, their tests are statistically independent, and their \(\mathbf{H}\) matrices are additive. They fully decompose the general question of differences among the groups into two independent questions regarding the contrasts.

\[ \mathbf{H}_\text{Species} = \mathbf{H}_\text{Species1} + \mathbf{H}_\text{Species2} \tag{11.2}\]

car::linearHypothesis() is the means for testing these statistically, and heplot() provides the way to show these tests visually. Using the contrasts set up in Section 11.2.2, \(\mathbf{c}_1\), representing the difference between setosa and the other species, is labeled Species1, and the comparison of versicolor with virginica is Species2. The coefficients for these in \(\mathbf{B}\) give the differences in the means. The line for (Intercept) gives the grand means of the variables.

coef(iris.mod)
#>             Sepal.Length Sepal.Width Petal.Length Petal.Width
#> (Intercept)        5.843       3.057        3.758       1.199
#> Species1          -0.837       0.371       -2.296      -0.953
#> Species2          -0.326      -0.102       -0.646      -0.350

Numerical tests of hypotheses using linearHypothesis() can be specified in a very general way: A matrix (or vector) \(\mathbf{C}\) giving linear combinations of coefficients by rows, or a character vector giving the hypothesis in symbolic form. A character variable or vector tests whether the named coefficients are different from zero for all responses.

linearHypothesis(iris.mod, "Species1") |> print(SSP=FALSE)
#> 
#> Multivariate Tests: 
#>                  Df test stat approx F num Df den Df Pr(>F)    
#> Pillai            1      0.97     1064      4    144 <2e-16 ***
#> Wilks             1      0.03     1064      4    144 <2e-16 ***
#> Hotelling-Lawley  1     29.55     1064      4    144 <2e-16 ***
#> Roy               1     29.55     1064      4    144 <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

linearHypothesis(iris.mod, "Species2") |> print(SSP=FALSE)
#> 
#> Multivariate Tests: 
#>                  Df test stat approx F num Df den Df Pr(>F)    
#> Pillai            1     0.745      105      4    144 <2e-16 ***
#> Wilks             1     0.255      105      4    144 <2e-16 ***
#> Hotelling-Lawley  1     2.925      105      4    144 <2e-16 ***
#> Roy               1     2.925      105      4    144 <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The various test statistics are all equivalent here—they give the same \(F\) statistics—because they have 1 degree of freedom.

In passing, from Equation 11.2, note that the joint test of these contrasts is exactly equivalent to the overall test of Species (results not shown).

linearHypothesis(iris.mod, c("Species1", "Species2"))
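
You can also check Equation 11.2 numerically from the hypothesis SSP matrices of these tests. The sketch below assumes that a linearHypothesis() result for an "mlm" stores its hypothesis SSP matrix in a component named SSPH.

H1  <- linearHypothesis(iris.mod, "Species1")$SSPH    # component name assumed
H2  <- linearHypothesis(iris.mod, "Species2")$SSPH
H12 <- linearHypothesis(iris.mod, c("Species1", "Species2"))$SSPH
all.equal(H1 + H2, H12)   # should be TRUE: the H matrices of these orthogonal contrasts are additive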

We can show these contrasts in an HE plot by supplying a named list for the hypotheses argument. The names are used as labels in the plot. In the case of a 1-df multivariate test, the \(\mathbf{H}\) ellipses plot as a degenerate line.

hyp <- list("S:Vv" = "Species1", "V:v" = "Species2")
heplot(iris.mod, hypotheses=hyp,
       cex = 1.5, cex.lab = 1.5,
       fill = TRUE, fill.alpha = c(0.3, 0.1),
       col = c("red", "blue", "darkgreen", "darkgreen"),
       lty = c(0,0,1,1), label.pos = c(3, 1, 2, 1),
       xlim = c(2, 10), ylim = c(1.4, 4.6))
Figure 11.6: HE plot for sepal length and width in the iris data showing the tests of the two contrasts

This HE plot shows that, for the two sepal variables, the greatest between-species variation is accounted for by the contrast (S:Vv) between setosa and the others, for which the effect is very large in relation to error variation. The second contrast (V:v), between the versicolor and virginica species is relatively smaller, but still explains significant variation of the sepal variables among the species.

The directions of these hypotheses in a given plot show how the group means differ in terms of a given contrast.3 For example, the contrast S:Vv is the line that separates setosa from the others and indicates that setosa flowers have shorter but wider sepals.

11.6 HE plot matrices

In base R graphics, 2D scatterplots are extended to all pairwise views of multivariate data with a pairs() method. For multivariate linear models, the heplots package defines a pairs.mlm() method to display HE plots for all pairs of the response variables.

pairs(iris.mod,
      fill=TRUE, fill.alpha=c(0.3, 0.1))
Figure 11.7: All pairwise HE plots for the iris data.

Figure 11.7 provides a fairly complete visualization of the results of the multivariate tests and answers the question: how do the species differ? Sepal length and the two petal variables have the group means nearly perfectly correlated, in the order setosa < versicolor < virginica. For Sepal width, however, setosa has the largest mean, and so the \(\mathbf{H}\) ellipses show a negative correlation in the second row and column.

11.7 Low-D views: Canonical analysis

The HE plot framework so far provides views of all the effects in a MLM in variable space. We can view this in 2D for selected pairs of response variables, or for all pairwise views in scatterplot matrix format. There is also an heplot3d() function giving plots for three response variables together. The 3D plots are interactive, in that they can be rotated and zoomed by mouse control, and dynamic, in that they can be made to spin and saved as movies. To save space, these plots are not shown here.

However, in a one-way MANOVA design with more than three response variables, it is difficult to visualize how the groups vary on all responses together, and how the different variables contribute to discrimination among groups. In this situation, canonical discriminant analysis (CDA) is often used to provide a low-D visualization of between-group variation. When the predictors are also continuous, the analogous technique is canonical correlation analysis (CCA). The advantage here is that we can also show the relations of the response variables to these dimensions, similar to a biplot (Section 4.3) for a PCA of purely quantitative variables.

The key to this is the eigenvalue decomposition, \(\mathbf{H}\mathbf{E}^{-1} \mathbf{v}_i = \lambda_i \mathbf{v}_i\) (Equation 10.5), of \(\mathbf{H}\) relative to \(\mathbf{E}\). The eigenvalues, \(\lambda_i\), give the “size” of each of the \(s\) orthogonal dimensions on which the multivariate tests are based. The corresponding eigenvectors, \(\mathbf{v}_i\), give the weights for the response variables in \(s\) linear combinations that maximally discriminate among the groups or, equivalently, maximize the (canonical) \(R^2\) between a linear combination of the predictor \(\mathbf{X}\)s and a linear combination of the response \(\mathbf{Y}\)s.

Thus, CDA amounts to a transformation of the \(p\) responses, \(\mathbf{Y}_{n \times p}\) into the canonical space, \[ \mathbf{Z}_{n \times s} = \mathbf{Y} \; \mathbf{E}^{-1/2} \; \mathbf{V} \;, \] where \(\mathbf{V}\) contains the eigenvectors of \(\mathbf{H}\mathbf{E}^{-1}\) and \(s=\min ( p, df_h )\). It is well-known (e.g., Gittins (1985)) that canonical discriminant plots of the first two (or three, in 3D) columns of \(\mathbf{Z}\) corresponding to the largest canonical correlations provide an optimal low-D display of the variation between groups relative to variation within groups.
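
The ingredients of this decomposition are easy to compute for the iris example; a sketch, again assuming the SSP and SSPE components of the car::Anova() result:

A <- car::Anova(iris.mod)
H <- A$SSP$Species                 # hypothesis SSP matrix (component names assumed)
E <- A$SSPE                        # error SSP matrix
lambda <- Re(eigen(solve(E) %*% H)$values)   # eigenvalues are real; Re() drops numerical noise
lambda[1:2]                        # compare the Eigenvalue column of the candisc() output below
lambda[1:2] / (1 + lambda[1:2])    # the corresponding canonical R^2 (CanRsq) values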

Canonical discriminant analysis is typically carried out in conjunction with a one-way MANOVA design. The candisc package (Friendly & Fox, 2025) generalizes this to multi-factor designs in the candisc() function. For any given term in a "mlm", the generalized canonical discriminant analysis amounts to a standard discriminant analysis based on the \(\mathbf{H}\) matrix for that term in relation to the full-model \(\mathbf{E}\) matrix.4

Tests based on the eigenvalues \(\lambda_i\), initially given by Bartlett (1938), use Wilks’ \(\Lambda\) likelihood ratio tests. These allow you to determine the number of significant canonical dimensions, that is, the number of different aspects to consider for the relations between the responses and predictors. They take the form of sequential tests of the hypothesis that the canonical correlation in the current row and all that follow are zero; thus, if the test in a given row is the first non-significant one, only the dimensions before it need be considered. The canonical \(R^2\), CanRsq, gives the R-squared value of fitting the \(i\)th response canonical variate to the corresponding \(i\)th canonical variate for the predictors.

For the iris data, we get the following printed summary:

iris.can <- candisc(iris.mod) |> print()
#> 
#> Canonical Discriminant Analysis for Species:
#> 
#>   CanRsq Eigenvalue Difference Percent Cumulative
#> 1  0.970     32.192       31.9  99.121       99.1
#> 2  0.222      0.285       31.9   0.879      100.0
#> 
#> Test of H0: The canonical correlations in the 
#> current row and all that follow are zero
#> 
#>   LR test stat approx F numDF denDF Pr(> F)    
#> 1        0.023    199.1     8   288 < 2e-16 ***
#> 2        0.778     13.8     3   145 5.8e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This analysis shows a very simple result: The differences among the iris species can be nearly entirely accounted for by the first canonical dimension (99.1%). Interestingly, the second dimension is also highly significant, even though it accounts for only 0.88%.

11.7.1 Coefficients

The coef() method for “candisc” objects returns a matrix of weights for the response variables in the canonical dimensions. By default, these are given for the response variables standardized to \(\bar{y}=0\) and \(s^2_y = 1\). The type argument also allows for raw score weights (type = "raw") used to compute the observation scores on Can1, Can2, … . type = "structure" gives the canonical structure coefficients, which are the correlations between each response and the canonical scores.

coef(iris.can, type = "std")
#>                Can1    Can2
#> Sepal.Length  0.427  0.0124
#> Sepal.Width   0.521  0.7353
#> Petal.Length -0.947 -0.4010
#> Petal.Width  -0.575  0.5810
coef(iris.can, type = "structure")
#>                Can1  Can2
#> Sepal.Length -0.792 0.218
#> Sepal.Width   0.531 0.758
#> Petal.Length -0.985 0.046
#> Petal.Width  -0.973 0.223

Thus, the two weighted sums for the canonical variates, using mean-centered standardized scores, can be shown as follows:

Can1 = 0.43 x S.len + 0.52 x S.wid - 0.95 x P.len - 0.58 x P.wid
Can2 = 0.01 x S.len + 0.74 x S.wid - 0.40 x P.len + 0.58 x P.wid

11.7.2 Canonical scores plot

The plot() method for "candisc" objects gives a plot of these observation scores. ellipse=TRUE overlays this with their standard data ellipses for each species, as shown in Figure 11.8. The response variables are shown as vectors, using the structure coefficients, as in a biplot. Thus, the relative size of the projection of these vectors on the canonical axes reflects the correlation of the observed response on the canonical dimension. For ease of interpretation I flipped the sign of the first canonical dimension, so that the positive Can1 direction corresponds to larger flowers.

TODO: var.labels not a graphical parameter

vars <- names(iris)[1:4] |> 
  stringr::str_replace("\\.", "\n")
plot(iris.can,
     var.labels = vars,
     var.col = "black",
     var.lwd = 2,
     ellipse=TRUE,
     scale = 9,
     col = iris_colors,
     pch = 15:17,
     cex = 0.7, var.cex = 1.25,
     rev.axes = c(TRUE, FALSE),
     xlim = c(-10, 10),
     cex.lab = 1.5)
Figure 11.8: Plot of canonical scores for the iris data.

The interpretation of this plot is simple: in canonical space, variation of the means for the iris species is essentially one-dimensional (99.1% of the effect of Species), and this dimension corresponds to overall size of the iris flowers. All variables except for Sepal.Width are positively aligned with this axis, but the two petal variables show the greatest discrimination. The negative direction for Sepal.Width reflects the pattern seen in Figure 11.7, where setosa have wider sepals.

For the second dimension, look at the projections of the variable vectors on the Can2 axis. All are positive, but this is dominated by Sepal.Width. We could call this a flower shape dimension.

11.7.3 Canonical HE plot

For a one-way design, the canonical HE plot is simply the HE plot of the canonical scores in the analogous MLM model that substitutes \(\mathbf{Z}\) for \(\mathbf{Y}\). In effect, it is a more compact visual summary of the plot shown in Figure 11.8.

This is shown in Figure 11.9 for the iris data. In canonical space, the residuals are always uncorrelated, so the \(\mathbf{E}\) ellipse plots as a circle. The \(\mathbf{H}\) ellipse for Species here reflects a data ellipse for the fitted values— group means— shown as labeled points in the plot. The differences among Species are so large that this plot uses size = "effect" scaling, making the axes comparable to those in Figure 11.8.5

The vectors for each response variable are based on the same structure coefficients as in the ordinary canonical plot. They can again be reflected for interpretation and scaled in length to fill the plot window.

heplot(iris.can,
       size = "effect",
       scale = 8,
       var.labels = vars,
       var.col = "black",
       var.lwd = 2,
       fill = TRUE, fill.alpha = 0.2,
       rev.axes = c(TRUE, FALSE),
       xlim = c(-10, 10))
Figure 11.9: Canonical HE plot for the iris data.

The collection of plots shown for the iris data here can be seen as progressive visual summaries of the data:

  • The scatterplot matrix in Figure 11.3 shows the iris flowers in the data space of the sepal and petal variables.
  • Canonical analysis substitutes for these the two linear combinations reflected in Can1 and Can2. The plot in Figure 11.8 portrays exactly the same relations among the species, but in the reduced canonical space of only two dimensions.
  • The HE plot version, shown in Figure 11.9, summarizes the separate data ellipses for the species by the pooled within-group variance of the \(\mathbf{E}\) matrix for the canonical variables, which are always uncorrelated. The variation among the group means is reflected in the size and shape of the ellipse for the \(\mathbf{H}\) matrix.

11.8 Quantitative predictors: MMRA

The ideas behind HE plots extend naturally to multivariate multiple regression (MMRA). A purely visual feature of HE plots in these cases is that the \(\mathbf{H}\) ellipse for a quantitative predictor with 1 df appears as a degenerate line. Consequently, the angles between these lines for different predictors have a simple interpretation as the correlation between their predicted effects. Moreover, it is easy to show visual overall tests of joint linear hypotheses for two or more predictors together.

TODO: Use these examples below

11.8.1 Example: NLSY data

Here I’ll continue the analysis of the NLSY data from Section 10.5.1. In the model NLSY.mod1 I used only father’s income and education to predict scores in reading and math, and both of these demographic variables were highly significant.

data(NLSY, package = "heplots")
NLSY.mod1 <- lm(cbind(read, math) ~ income + educ, 
                data = NLSY)

heplot(NLSY.mod1, 
  fill=TRUE, fill.alpha = 0.2, 
  cex = 1.5, cex.lab = 1.5,
  lwd=c(2, 3, 3),
  label.pos = c("bottom", "top", "top")
  )
Figure 11.10: HE plot for the simple model for the NLSY data fitting reading and math scores from income and education.

Father’s income and education are positively correlated in their effects on the outcome scores. From the angles in the plot, income is most related to the math score, while education is related to both, but slightly more to the reading score.

The overall joint test for both predictors can then be visualized as the test of the linear hypothesis \(\mathcal{H}_0 : \mathbf{B} = [\boldsymbol{\beta}_\text{income}, \boldsymbol{\beta}_\text{educ}] = \mathbf{0}\). For heplot(), we specify the names of the coefficients to be tested with the hypotheses argument.

coefs <- rownames(coef(NLSY.mod1))[-1] |> print()
#> [1] "income" "educ"

heplot(NLSY.mod1, 
       hypotheses = list("Overall" = coefs),
       fill=TRUE, fill.alpha = 0.2, 
       cex = 1.5, cex.lab = 1.5,
       lwd=c(2, 3, 3, 2),
       label.pos = c("bottom", "top"))
Figure 11.11: HE plot adding the \(\mathbf{H}\) ellipse for the overall test that both predictors have no effect on the outcome scores.

11.9 Canonical correlation analysis

Just as we saw for MANOVA designs, a canonical analysis for multivariate regression involves finding a low-D view of the relations between predictors and outcomes that maximally explains their associations in terms of linear combinations of each. That is, the goal is to find weights for one set of variables, say \(\mathbf{X}\), not to predict each variable in the other set \(\mathbf{Y} =[\mathbf{y}_1, \mathbf{y}_2, \dots]\) individually, but rather to simultaneously find weights for the \(\mathbf{y}\)s so that the resulting linear combination of the \(\mathbf{y}\)s is as highly correlated as possible with the linear combination of the \(\mathbf{x}\)s.

In this sense, canonical correlation analysis (CCA) is symmetric in the \(x\) and \(y\) variables: the \(y\) set is not considered responses. Rather the goal is simply to explain the correlations between the two sets. For a thorough treatment of this topic, see Gittins (1985).

Geometrically, these linear combinations are vectors representing projections in the observation space of the \(x\) and \(y\) variables, and CCA can also be thought of as minimizing the angle between these vectors or maximizing the cosine of this angle. This is illustrated in Figure 11.12.

Figure 11.12: Diagram illustrating canonical correlation. For two \(y\) variables, all linear combinations are vectors in their plane, and similarly for the \(x\) variables. Maximizing the correlation between linear combinations of each is equivalent to making the angle \(\theta\) between them as small as possible, or maximizing \(\cos(\theta)\), shown in the diagram at the right. The thick grey arrow indicates that the two planes should be overlaid at a common origin. Source: Re-drawn by Udi Alter following a Cross-Validated discussion by user ‘ttnphns’, https://bit.ly/4dgq2cp

Specifically, we want to find one set of weights \(\mathbf{a}_1\) for the \(x\) variables and another for the \(y\) variables to give the linear combinations \(\mathbf{u}_1\) and \(\mathbf{v}_1\),

\[\begin{aligned} \mathbf{u}_1 & = \mathbf{X} \ \mathbf{a}_1 = a_{11} \mathbf{x}_1 + a_{12} \mathbf{x}_2 + \cdots + a_{1q} \mathbf{x}_q \\ \mathbf{v}_1 & = \mathbf{Y} \ \mathbf{b}_1 = b_{11} \mathbf{y}_1 + b_{12} \mathbf{y}_2 + \cdots + b_{1p} \mathbf{y}_p \; , \end{aligned}\]

such that the correlation \(\rho_1 = \textrm{corr}(\mathbf{u}_1, \mathbf{v}_1)\) is maximized, or equivalently, minimizing the angle between them.

Using \(\mathbf{S}_{xx}\), \(\mathbf{S}_{yy}\) to represent the covariance matrices of the \(x\) and \(y\) variables, and \(\mathbf{S}_{xy}\) for the cross-covariances between the two sets, the correlation between the linear combinations of each can be expressed as

\[\begin{aligned} \rho_1 & = \textrm{corr}(\mathbf{u}_1, \mathbf{v}_1) = \textrm{corr}(\mathbf{X} \ \mathbf{a}_1, \mathbf{Y} \ \mathbf{b}_1) \\ & = \frac{\mathbf{a}_1^\top \ \mathbf{S}_{xy} \ \mathbf{b}_1 }{\sqrt{\mathbf{a}_1^\top \ \mathbf{S}_{xx} \ \mathbf{a}_1 } \sqrt{\mathbf{b}_1^\top \ \mathbf{S}_{yy} \ \mathbf{b}_1 }} \end{aligned}\]

But, the \(y\) variables lie in a \(p\)-dimensional (observation) space, and the \(x\) variables in a \(q\)-dimensional one, so what they have in common is a space of \(s = \min(p, q)\) dimensions. Therefore, we can find additional pairs of canonical variables,

\[\begin{aligned} \mathbf{u}_2 = \mathbf{X} \ \mathbf{a}_2 & \quad\quad \mathbf{v}_2 = \mathbf{Y} \ \mathbf{b}_2 \\ & \vdots \\ \mathbf{u}_s = \mathbf{X} \ \mathbf{a}_s & \quad\quad \mathbf{v}_s = \mathbf{Y} \ \mathbf{b}_s \\ \end{aligned}\]

such that each pair \((\mathbf{u}_i, \mathbf{v}_i)\) has the maximum possible correlation and all distinct pairs are uncorrelated:

\[\begin{aligned} \rho_i & = \max _{\mathbf{a}_i, \mathbf{b}_i}\left\{\mathbf{u}_i^{\top} \mathbf{v}_i\right\} \quad \text{subject to} \\ \left\|\mathbf{u}_i\right\| & = 1, \quad\left\|\mathbf{v}_i\right\| = 1, \\ \mathbf{u}_i^{{\top}} \mathbf{u}_j & = 0, \quad \mathbf{v}_i^{\top} \mathbf{v}_j = 0 \quad\quad \forall j \neq i: i, j \in\{1,2, \ldots, s\} \ . \end{aligned}\]

In words, the correlations among canonical variables are zero, except when they are associated with the same canonical correlation, that is, the weights \((\mathbf{a}_i, \mathbf{b}_i)\) for the same pair. Alternatively, all \(p \times q\) correlations between the variables in \(\mathbf{Y}\) and \(\mathbf{X}\) are fully summarized in the \(s = \min(p, q)\) canonical correlations \(\rho_i\) for \(i = 1, 2, \dots, s\).

The solution, developed by Hotelling (1936), is a form of a generalized eigenvalue problem, that can be stated in two equivalent ways,

\[ \begin{aligned} & \left(\mathbf{S}_{y x} \ \mathbf{S}_{x x}^{-1} \ \mathbf{S}_{x y} - \rho^2 \ \mathbf{S}_{y y}\right) \mathbf{b} = \mathbf{0} \\ & \left(\mathbf{S}_{x y} \ \mathbf{S}_{y y}^{-1} \ \mathbf{S}_{y x} - \rho^2 \ \mathbf{S}_{x x}\right) \mathbf{a} = \mathbf{0} \ . \end{aligned} \] Both equations have the same form and have the same eigenvalues. And, given the eigenvectors for one of these equations, we can find the eigenvectors for the other.

TODO: Fill in details of canonical correlations

11.9.1 Example: School data

The schooldata dataset analyzed in Section 10.5.2 can also be illuminated by the methods of this chapter. There I fit the multivariate regression model predicting students’ scores on reading, mathematics and a measure of self-esteem, using as predictors measures of parents’ education, occupation, school visits, counseling help with school assignments, and number of teachers per school. But I also found two highly influential observations (cases 44, 59; see Figure 10.16) whose effect on the coefficients is rather large, so I remove them from the analysis here.6

data(schooldata, package = "heplots")

bad <- c(44, 59)
OK <- (1:nrow(schooldata)) |> setdiff(bad)
school.mod2 <- lm(cbind(reading, mathematics, selfesteem) ~ ., 
                  data=schooldata[OK, ])

In this model, parent’s education and occupation and their visits to the schools were highly predictive of student’s outcomes but their counseling efforts and the number of teachers in the schools did not contribute much. However, the nature of these relationships was largely uninterpreted in that analysis.

Here is where HE plots can help. You can think of this as a way to visualize what is entailed in the coefficients for this model by showing the magnitude of the predictor effects by their size and their relations to the outcome variable by their direction. The table of raw score coefficients isn’t very helpful in this regard.

coef(school.mod2)
#>             reading mathematics selfesteem
#> (Intercept)  2.7096       3.561    0.39751
#> education    0.2233       0.132   -0.01088
#> occupation   3.3336       4.284    1.79574
#> visit        0.0101      -0.123    0.20005
#> counseling  -0.3953      -0.293    0.00868
#> teacher     -0.1945      -0.360    0.01129

Figure 11.13 shows the HE plot for reading and mathematics scores in this model, using the default significance scaling.

heplot(school.mod2, 
       fill=TRUE, fill.alpha=0.1,
       cex = 1.5,
       cex.lab = 1.5,
       label.pos = c(rep("top", 4), "bottom", "bottom"))
Figure 11.13: HE plot for reading and mathematics scores in the multivariate regression model…

Parent’s occupation and education are both significant in this view, but what is more important is their orientation: both are positively associated with reading and math scores, with education somewhat more related to reading than to mathematics. Number of teachers and degree of parental counseling, which did not contribute significantly, have a similar positive orientation, with teachers more related to mathematics scores and counseling more to reading. The school-visits measure tilts in the opposite direction, its coefficients having different signs for reading and math.

In the pairs() plot for all three responses (Figure 11.14), we see something different in the relations for self-esteem. While occupation has a large positive relation in all the plots in the third row and column, education, counseling and teachers have negative relations in these plots, particularly with mathematics scores.

pairs(school.mod2, 
      fill=TRUE, fill.alpha=0.1,
      var.cex = 2.5,
      cex = 1.3)
Figure 11.14: Pairwise HE plots for the three outcome variables in the multivariate regression model …

11.9.2 Canonical analysis

With \(p = 3\) responses and \(q = 5\) predictors there are three possible sets of canonical variables, which together account for 100% of the total linear relations between them. candisc::cancor() gives the percentage associated with each of the eigenvalues and the canonical correlations.

For this dataset, the first pair of canonical variates, with Can \(R = 0.995\), accounts for 98.6% of the association, so you might think that that is sufficient. Yet the likelihood ratio tests show that the second pair, with Can \(R = 0.744\), is also significant, even though it only accounts for 1.3%.

# bad <- c(44, 59)
# OK <- (1:nrow(schooldata)) |> setdiff(bad)
school.can2 <- cancor(cbind(reading, mathematics, selfesteem) ~
                        education + occupation + visit + counseling + teacher,
                     data=schooldata[OK, ])
school.can2
#> 
#> Canonical correlation analysis of:
#>   5   X  variables:  education, occupation, visit, counseling, teacher 
#>   with  3   Y  variables:  reading, mathematics, selfesteem 
#> 
#>     CanR CanRSQ    Eigen  percent    cum
#> 1 0.9946 0.9892 91.41999 98.57540  98.58
#> 2 0.7444 0.5541  1.24267  1.33994  99.92
#> 3 0.2698 0.0728  0.07852  0.08466 100.00
#>                          scree
#> 1 ****************************
#> 2                             
#> 3                             
#> 
#> Test of H0: The canonical correlations in the 
#> current row and all that follow are zero
#> 
#>    CanR LR test stat approx F numDF denDF Pr(> F)    
#> 1 0.995        0.004     67.5    15   166 < 2e-16 ***
#> 2 0.744        0.413      8.5     8   122 4.1e-09 ***
#> 3 0.270        0.927      1.6     3    62    0.19    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The virtue of CCA is that all correlations between the X and Y variables are completely captured in the correlations between the pairs of canonical scores: The \(p \times q\) correlations between the sets are entirely represented by the \(s = \min(p, q)\) canonical ones. Whether the second dimension is useful here depends on whether it adds some interpretable increment to what is going on in these relations. One could be justifiably happy with an explanation based on the first dimension that accounts for nearly all the total association between the sets.

The class "cancor" object returned by cancor() contains the canonical coefficients, for which there is a coef() method as in candisc(), and also a scores() method to return the scores on the canonical variables, called Xcan1, Xcan2, … and Ycan1, Ycan2.

names(school.can2)
#>  [1] "cancor"    "names"     "ndim"      "dim"       "coef"     
#>  [6] "scores"    "X"         "Y"         "weights"   "structure"
#> [11] "call"      "terms"
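
Given the X and Y components (assuming they hold the data matrices for the two sets of variables), you can verify that the canonical correlations follow from the covariance matrices via the eigen-equations of Section 11.9; a sketch:

X <- school.can2$X            # assumed to hold the predictor data matrix
Y <- school.can2$Y            # assumed to hold the response data matrix
Sxx <- cov(X); Syy <- cov(Y); Sxy <- cov(X, Y)
M <- solve(Syy) %*% t(Sxy) %*% solve(Sxx) %*% Sxy
sqrt(Re(eigen(M)$values))     # should reproduce CanR above: 0.995, 0.744, 0.270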

You can use the plot() method or heplot() method to visualize and help interpret the results. The plot() method plots the canonical scores$X against the scores$Y for a given dimension (selected by the which argument). The id.n argument gives a way to flag noteworthy observations.

op <- par(mfrow = c(1, 2))     # two panels, one per canonical dimension
plot(school.can2, which = 1,
     pch = 16, id.n = 3,
     cex.lab = 1.5, id.cex = 1.5,
     ellipse.args = list(fill = TRUE, fill.alpha = 0.1))
text(-2, 1.5, paste("Can R =", round(school.can2$cancor[1], 3)), 
     cex = 1.4, pos = 4)

plot(school.can2, which = 2, 
     pch = 16, id.n = 3,
     cex.lab = 1.5, id.cex = 1.5,
     ellipse.args = list(fill = TRUE, fill.alpha = 0.1))
text(-3, 3, paste("Can R =", round(school.can2$cancor[2], 3)), 
     cex = 1.4, pos = 4)
par(op)
Figure 11.15: Plots of canonical scores for the first two canonical dimensions of the schooldata dataset, omitting the two highly influential cases.

It is worthwhile to look at an analogous plot of canonical scores for the original dataset including the two highly influential cases. As you can see in Figure 11.16, cases 44 and 59 are way outside the range of the rest of the data. Their influence increases the canonical correlation to a near perfect \(\rho = 0.997\).

school.can <- cancor(cbind(reading, mathematics, selfesteem) ~
                       education + occupation + visit + counseling + teacher,
                     data=schooldata)
plot(school.can, 
     pch=16, id.n = 3,
     cex.lab = 1.5, id.cex = 1.5,
     ellipse.args = list(fill = TRUE, fill.alpha = 0.1))
text(-5, 1, paste("Can R =", round(school.can$cancor[1], 3)), 
     cex = 1.4, pos = 4)
Figure 11.16: Plots of canonical scores on the first canonical dimension for the schooldata, including the influential cases, which stand out as being far from the rest of the observations.

Plots of canonical scores tell us about the strength of the canonical dimensions, but do not help in interpreting the analysis in relation to the original variables. The HE plot version for canonical correlation analysis re-fits a multivariate regression model for the Y variables against the Xs, but substitutes the canonical scores for each, essentially projecting the data into canonical space.

TODO: Check out signs of structure coefs from cancor(). Would be better to reflect the vectors for Ycan1.

heplot(school.can2,
       fill = TRUE, fill.alpha = 0.2,
       var.col = "red", 
       asp = NA, scale = 0.25,
       cex.lab = 1.5, cex = 1.25,
       prefix="Y canonical dimension ")
Figure 11.17: HE plot for the canonical correlation analysis of the schooldata. Vectors for the variables indicate their correlations with the canonical dimensions.

The red variable vectors shown in these plots are intended only to show the correlations of the Y variables with the canonical dimensions. The fact that they are so closely aligned reflects the fact that the first dimension accounts for nearly all of their associations with the predictors. The orientation of the \(\mathbf{H}\) ellipses/lines reflects the projection of those from Figure 11.14 into canonical space.

Only their relative lengths and angles with respect to the Y canonical dimensions have meaning in these plots. Relative lengths correspond to proportions of variance accounted for in the Y canonical dimensions plotted; angles between the variable vectors and the canonical axes correspond to the structure correlations. The absolute lengths of these vectors are arbitrary and are typically manipulated by the scale argument to provide better visual resolution and labeling for the variables.

11.10 MANCOVA models

HE plots for designs containing a collection of quantitative predictors and one or more factors are quite simple in MANCOVA models, where the effects are additive, i.e., don’t involve interactions. They are a bit more challenging when you allow separate slopes for groups on all quantitative variables, because there are then too many terms to display usefully.

If the evidence for heterogeneity of regressions is not very strong, it is still useful to fit the MANCOVA model and display it in an HE plot.

An alternative is to fit separate models for the groups and display these as HE plots. As noted earlier (Section 10.7.1), this is not ideal for testing hypotheses, but provides a useful and informative display of the relations between the predictors and responses and the groups effect. I illustrate these approaches for the Rohwer data, encountered in Section 10.7.1, below.

11.10.1 Example: Rohwer data

In Section 10.7.1 I fit several models for Rohwer’s data on the relations between paired-associate tasks and scholastic performance. The first model was the MANCOVA model testing the difference between the high and low SES groups, controlling for, or taking into account, differences on the paired-associate tasks.

Rohwer.mod1 <- lm(cbind(SAT, PPVT, Raven) ~ SES + n + s + ns + na + ss, 
                 data=Rohwer)

HE plots for all pairs of the responses (SAT, PPVT, Raven) in this model are shown in Figure 11.18. The result of an overall test, \(\mathcal{H}_0 : \mathbf{B} = \mathbf{0}\), that all the paired-associate predictors have null effects is added to the basic plot using the hypotheses argument.

colors <- c("red", "blue", rep("black",5), "#969696")
covariates <- rownames(coef(Rohwer.mod1))[-(1:2)]
pairs(Rohwer.mod1, 
       col=colors,
       hypotheses=list("Regr" = covariates),
       fill = TRUE, fill.alpha = 0.1,
       cex=1.5, cex.lab = 1.5, var.cex = 3,
       lwd=c(2, rep(3,5), 4))
Figure 11.18: All-pairs HE plot for SAT, PPVT and Raven using the MANCOVA model. The ellipses labeled ‘Regr’ show the test of the overall effect of the quantitative predictors.

The positive effect of SES on the outcome measures is seen in all pairwise plots: the high SES group is better on all responses. The positive orientation of the Regr ellipses for the covariates shows that the predicted values for all three responses are positively correlated (more so for SAT and PPVT): higher performance on the paired-associate tasks, in general, is associated with higher academic performance. The two significant predictors, na and ns, are the only ones that extend outside the error ellipses, but their orientations differ.

Homogeneity of regression

A second model tested the assumption of homogeneity of regression by adding interactions of SES with the PA tasks, allowing separate slopes for the two groups on each of the other predictors.

Rohwer.mod2 <- lm(cbind(SAT, PPVT, Raven) ~ SES * (n + s + ns + na + ss),
                  data=Rohwer)

This model has 11 terms, excluding the intercept: SES, plus 5 main effects for the predictors (\(x\)s) and 5 interactions (slope differences), too many for an understandable display. To visualize this in an HE plot (Figure 11.19), I simplify by showing the interaction terms collectively as a single ellipse representing their joint effect, specified as a linear hypothesis called “Slopes” that picks out the interaction coefficients.

The terms argument limits the \(\mathbf{H}\) ellipses shown to just the model terms specified. The combined effect of the interaction terms is specified as a hypothesis (“Slopes”) testing the interaction coefficients (those with a “:” in their names). Because SES is “treatment-coded” in this model, the interaction terms reflect the differences in slopes for the low SES group compared to the high.

(coefs <- rownames(coef(Rohwer.mod2)))
#>  [1] "(Intercept)" "SESLo"       "n"           "s"          
#>  [5] "ns"          "na"          "ss"          "SESLo:n"    
#>  [9] "SESLo:s"     "SESLo:ns"    "SESLo:na"    "SESLo:ss"

colors <- c("red", "blue", rep("black",5), "#969696")
heplot(Rohwer.mod2, col=c(colors, "darkgreen"), 
       terms=c("SES", "n", "s", "ns", "na", "ss"), 
       hypotheses=list("Regr" = c("n", "s", "ns", "na", "ss"),
                       "Slopes" = coefs[grep(":", coefs)]),
       fill = TRUE, fill.alpha = 0.2, cex.lab = 1.5)
Figure 11.19: HE plot for SAT and PPVT using the heterogeneous regression model. The ellipse labeled ‘Regr’ shows the test of the covariates combined, and the ellipse labeled ‘Slopes’ shows the combined difference in slopes between the two groups.

Separate models

When there is heterogeneity of regressions, using submodels for each of the groups has the advantage that you can easily visualize the slopes for the predictors in each of the groups, particularly if you overlay the individual HE plots. In this example, I’m using the models Rohwer.sesLo and Rohwer.sesHi fit to each of the groups.

Rohwer.sesLo <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, 
                   data=Rohwer, subset = SES=="Lo")
Rohwer.sesHi <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, 
                   data=Rohwer, subset = SES=="Hi")

Here I make use of the fact that several HE plots can be overlaid using the option add=TRUE as shown in Figure 11.20. The axis limits may need adjustment in the first plot so that the second one will fit.

heplot(Rohwer.sesLo, 
       xlim = c(0,100),               # adjust axis limits
       ylim = c(40,110), 
       col=c("red", "black"), 
       fill = TRUE, fill.alpha = 0.1,
       lwd=2, cex=1.2, cex.lab = 1.5)
heplot(Rohwer.sesHi, 
       add=TRUE, 
       col=c("brown", "black"), 
       grand.mean=TRUE, 
       error.ellipse=TRUE,            # not shown by default when add=TRUE
       fill = TRUE, fill.alpha = 0.1,
       lwd=2, cex=1.2)
Figure 11.20: Overlaid HE plots for SAT and PPVT, for the low and high SES groups, when each group is fit separately.

We can readily see the difference in means for the two SES groups (Hi has greater scores on both variables) and it also appears that the slopes of the s and n predictor ellipses are shallower for the High than the Low group, indicating greater relation with the SAT score. As well, the error ellipses show that on these measures, error variation is somewhat smaller in the low SES group.

11.11 What We Have Learned

  • HE plots transform multivariate model complexity into visual clarity - The HE plot framework brilliantly solves the interpretability problem of multivariate models by visualizing hypothesis (H) ellipses against error (E) ellipses. Instead of drowning in tables of coefficients and test statistics, you can literally see which effects matter, how they relate to each other, and whether they’re statistically significant—all in a single, intuitive plot that reveals the geometric structure underlying your multivariate analysis.

  • Canonical space is your secret weapon for high-dimensional visualization - When you have many response variables, canonical discriminant analysis and canonical correlation analysis project the complex multivariate relationships into a lower-dimensional space that captures the essential patterns. This isn’t just dimension reduction—it’s insight amplification, revealing the fundamental directions of variation that matter most while showing how your original variables contribute to these meaningful dimensions.

  • Visual hypothesis testing beats p-value hunting every time - HE plots make hypothesis testing immediate and intuitive: if a hypothesis ellipse extends outside the error ellipse (under significance scaling), the effect is significant. No more scanning tables of p-values or wrestling with multiple comparisons—the geometry tells the story directly. You can even decompose overall effects into meaningful contrasts and see their individual contributions as separate ellipses.

  • Ellipse orientations reveal the hidden correlational structure of your effects - The angles between hypothesis ellipses in HE plots directly show how different predictors relate to your response variables and to each other. When effect ellipses point in similar directions, those predictors have similar multivariate signatures; when they’re orthogonal, they capture independent aspects of variation. This geometric insight goes far beyond what correlation matrices can reveal.

  • Multivariate models become tractable through progressive visual summaries - The chapter demonstrates a powerful visualization strategy: start with scatterplot matrices to see the raw data structure, move to HE plots to understand model effects, then project to canonical space for the clearest possible view of multivariate relationships. Each step preserves the essential information while making it more interpretable, turning the complexity of multivariate analysis into a comprehensible visual narrative.

Packages:

#> 9  packages used here:
#>  broom, candisc, car, carData, dplyr, ggplot2, heplots, knitr, tidyr

  1. For example, Megan Stodel in a blog post Stop using iris says, “It is clear to me that knowingly using work that was itself used in pursuit of racist ideals is totally unacceptable.” A Reddit discussion on this topic, Is it socially acceptable to use the Iris dataset? has some interesting replies.↩︎

  2. Recall that \(R^2\) for a linear model is the proportion of variation in the response that is explained by the model, calculated as \(R^2 = \text{SS}_H / \text{SS}_T = \text{SS}_H / (\text{SS}_H + \text{SS}_E )\). For a multivariate model, these are obtained from the diagonal elements of \(\mathbf{H}\) and \(\mathbf{E}\).↩︎

  3. That the \(\mathbf{H}\) ellipses for the contrasts subtend that for the overall test of Species is no accident. In fact, this is true in \(p\)-dimensional space for any linear hypothesis, and orthogonal contrasts have the additional geometric property that they form conjugate axes for the overall \(\mathbf{H}\) ellipsoid relative to the \(\mathbf{E}\) ellipsoid (Friendly et al., 2013).↩︎

  4. candiscList() performs a generalized canonical discriminant analysis for all terms in a multivariate linear model, returning a list of the results for each factor.↩︎

  5. If significance scaling were used, the interpretation of the canonical HE plot would be the same as before: if the hypothesis ellipse extends beyond the error ellipse, then that dimension is significant.↩︎

  6. An alternative to fitting the model removing specific cases deemed troublesome is to use a robust method, such as heplots::robmlm(), which uses re-weighted least squares to down-weight observations with large residuals or other problems.↩︎