Introduction

The purpose of this exercise is to introduce you to some of the basics of:

To work through this exercise in R Studio, open a new R script (say, vismlm-ex2.R) in the editor and work from there, following this document (in another window). In this document, you are encouraged to try things yourself, but you can click Code / Hide for individual chunks. When code is shown, the icon at the upper left will copy it to the clipboard. The script for this excercise is contained in exercises/rohwer-mmra.R.

Load packages

It is a good idea to start a script by loading the packages that will be used. We will use the car, heplots, and stargazer packages here. Load them into your R workspace. (Use install.packages() if any are not found.)

library(car)
library(heplots)
library(stargazer)

Rohwer data

This example concerns regression analysis predicting measures of aptitude and achievement for a sample of Low SES kindergarten students from measures of performance on paired associate memory tests.
The response variables are:

  • SAT – A scholastic aptitude test
  • PPVT – Peabody picture vocabulary test
  • Raven – the Raven progressive matrices test

The predictors are scores on 5 paired associate tasks called N, S, NS, NA, SS. These differ in the way the stimulus term is presented:

  • N = named;
  • S = used in a ‘still’ task;
  • NS = used in a named still task; etc.

See: help("Rohwer", package="heplots") for more details.

Load the data

The dataset Rohwer contains 69 observations from two groups of kindergarden children. A group of 37 children from a a High-SES neighborhood (SES=='Hi') and a group of 32 children from a Low-SES neighborhood (SES=='Lo'). For the purposes of this exercise, use the following code to load the data and select only the Low SES group, giving the data.frame Rohwer2.

data("Rohwer", package="heplots")
Rohwer2 <- subset(Rohwer, subset=SES=='Lo')
rownames(Rohwer2)<- 1:nrow(Rohwer2)

Univariate regression models

A univariate approach entails fitting a separate regression model for each of the PPVT, SAT, and Raven variables. This code fits & tests the model for SAT.

rohwer.mod1 <-lm(SAT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod1)
## Anova Table (Type II tests)
## 
## Response: SAT
##            Sum Sq Df F value  Pr(>F)  
## n            59.7  1  0.1328 0.71806  
## s             1.3  1  0.0028 0.95786  
## ns         1626.8  1  3.6205 0.06640 .
## na          102.2  1  0.2275 0.63673  
## ss         1872.9  1  4.1681 0.04978 *
## Residuals 13929.5 31                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Fit and test an analogous model rohwer.mod2 for the PPVT score. I’m showing here the output you should see.

rohwer.mod2 <-lm(PPVT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod2)
## Anova Table (Type II tests)
## 
## Response: PPVT
##            Sum Sq Df F value   Pr(>F)   
## n            1.04  1  0.0117 0.914521   
## s          263.55  1  2.9550 0.095583 . 
## ns          48.24  1  0.5408 0.467612   
## na         903.61  1 10.1318 0.003307 **
## ss          38.91  1  0.4362 0.513818   
## Residuals 2764.76 31                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Do the same to give rohwer.mod3 for Raven.

rohwer.mod3 <-lm(Raven ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod3)
## Anova Table (Type II tests)
## 
## Response: Raven
##            Sum Sq Df F value Pr(>F)
## n           7.163  1  0.8277 0.3699
## s           2.113  1  0.2442 0.6247
## ns         24.728  1  2.8573 0.1010
## na          0.582  1  0.0673 0.7971
## ss          1.008  1  0.1165 0.7352
## Residuals 268.282 31

Interpretation

Review the Anova summaries above. What do you conclude for these response variables? In particular,

  • Which PA tests are significant predictors for each response variable?
  • Which PA tests are significant predictors for all response variables?

For presentation, or for ease of direct comparison, it is easier to combine the model summary statistics into a table. stargazer is one package that does this, and can produce output in text, LaTeX or HTML format. There are many, many options.

stargazer(rohwer.mod1, rohwer.mod2, rohwer.mod3,
    type = "text", 
    digits=2,
    star.cutoffs = c(0.05, 0.01, 0.001),
    omit = "Constant",
    no.space=TRUE,
    title="Univariate regression models for Rohwer data")
## 
## Univariate regression models for Rohwer data
## =============================================================
##                                     Dependent variable:      
##                               -------------------------------
##                                  SAT        PPVT      Raven  
##                                  (1)        (2)        (3)   
## -------------------------------------------------------------
## n                               -0.61      -0.08      0.21   
##                                 (1.67)     (0.74)    (0.23)  
## s                               -0.05      -0.72      0.06   
##                                 (0.94)     (0.42)    (0.13)  
## ns                              -1.73      -0.30      0.21   
##                                 (0.91)     (0.41)    (0.13)  
## na                               0.49      1.47**     -0.04  
##                                 (1.04)     (0.46)    (0.14)  
## ss                              2.25*       0.32      -0.05  
##                                 (1.10)     (0.49)    (0.15)  
## -------------------------------------------------------------
## Observations                      37         37        37    
## R2                               0.21       0.51      0.22   
## Adjusted R2                      0.08       0.43      0.10   
## Residual Std. Error (df = 31)   21.20       9.44      2.94   
## F Statistic (df = 5; 31)         1.63     6.47***     1.77   
## =============================================================
## Note:                           *p<0.05; **p<0.01; ***p<0.001

Re-evaluate your conclusions from these univariate models.

Multivariate model

Fit the multivariate regression model for all three responses. The syntax for a multivariate lm() uses cbind(Y1, Y2, ...) to “bind” several responses together.

rohwer.mlm <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mlm)
## 
## Type II MANOVA Tests: Pillai test statistic
##    Df test stat approx F num Df den Df  Pr(>F)  
## n   1  0.038358   0.3856      3     29 0.76418  
## s   1  0.111793   1.2167      3     29 0.32130  
## ns  1  0.225221   2.8100      3     29 0.05696 .
## na  1  0.267458   3.5294      3     29 0.02705 *
## ss  1  0.138962   1.5601      3     29 0.22030  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall tests

For a univariate model, the model \(F\)-test gives an overall test of the hypothesis that all regression coefficients are zero, i.e., none of the predictors explain the response beyond chance.

car::linearHypothesis() is more general. It can be used to test any hypothesis regarding subsets of predictors or their linear combinations. For convenience, create a vector of the names of the predictors for an overall test.

predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mod1, hypothesis=predictors)
## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: SAT ~ n + s + ns + na + ss
## 
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1     36 17583                           
## 2     31 13930  5    3653.8 1.6263 0.1824

This is equivalent to the test of the hypothesis that \(R^2_{SAT} = 0\).

You can try doing the same test for PPVT in rohwer.mod2, and for Raven in rohwer.mod3. What do you conclude?

linearHypothesis(rohwer.mod2, hypothesis=predictors)
## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: PPVT ~ n + s + ns + na + ss
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     36 5648.4                                  
## 2     31 2764.8  5    2883.7 6.4667 0.0003179 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(rohwer.mod3, hypothesis=predictors)
## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: Raven ~ n + s + ns + na + ss
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     36 344.81                           
## 2     31 268.28  5    76.529 1.7686 0.1486

Overall multivariate test

Try linearHypothesis() using the multivariate model, rohwer.mlm. This tests the multivariate hypothesis that \(\mathbf{B}_{3 \times 4} = [ \mathbf{\beta}_{SAT}, \mathbf{\beta}_{PPVT}, \mathbf{\beta}_{Raven}] = \mathbf{0}\).

predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mlm, hypothesis=predictors)
## 
## Sum of squares and products for the hypothesis:
##              SAT      PPVT     Raven
## SAT   3653.77324 2159.9966  63.36803
## PPVT  2159.99661 2883.6759 281.48418
## Raven   63.36803  281.4842  76.52860
## 
## Sum of squares and products for error:
##              SAT     PPVT    Raven
## SAT   13929.5241 1530.517 457.1995
## PPVT   1530.5169 2764.757 213.6780
## Raven   457.1995  213.678 268.2822
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df   den Df     Pr(>F)    
## Pillai            5 0.8252886 2.352859     15 93.00000 0.00659125 ** 
## Wilks             5 0.3431691 2.538140     15 80.45763 0.00391404 ** 
## Hotelling-Lawley  5 1.4487571 2.672152     15 83.00000 0.00234894 ** 
## Roy               5 1.0551154 6.541716      5 31.00000 0.00029228 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

HE plots: Visualizing the multivariate tests

A basic HE plot shows the relation of all predictors to a pair of response variables (the first two, by default). In MMRA models, individual predictors have 1 df, so their H ellipses plot as degenerate lines.

heplot(rohwer.mlm, fill = TRUE)

We can add the H ellipse to show the overall test for all predictors, using the hypothesis= argument. There are many options for heplot().

cols <-  c("red", "blue", "black", "darkgreen", "darkcyan", "magenta", 
            "gray50")
hyp <- list("Regr" = c("n", "s", "ns", "na", "ss"))
heplot(rohwer.mlm, 
       hypotheses=hyp, 
       fill = TRUE, fill.alpha=.10,
       cex = 1.4,
       col = cols)

Note how the orientation of the overall Regr ellipse reflects the contributions of ss and ns, aligned mainly with the SAT score, and s and na aligned mainly with the PPVT score.

pairs() plot

We can also view all pairwise HE plots using the pairs.mlm method.

pairs(rohwer.mlm, 
      hypotheses=hyp, 
      fill = TRUE,
      col = cols)