Exercise: Visualizing MMRA Models

Introduction

The purpose of this exercise is to introduce you to some of the basics of:

fitting multivariate regression models in R,
understanding the difference between univariate and multivariate tests,
data visualization for such models using the heplots package

To work through this exercise in R Studio, open a new R script (say, vismlm-ex2.R) in the editor and work from there, following this document (in another window). In this document, you are encouraged to try things yourself, but you can click Code / Hide for individual chunks. When code is shown, the icon at the upper left will copy it to the clipboard. The script for this excercise is contained in exercises/rohwer-mmra.R.

Load packages

It is a good idea to start a script by loading the packages that will be used. We will use the car, heplots, and stargazer packages here. Load them into your R workspace. (Use install.packages() if any are not found.)

library(car)
library(heplots)
library(stargazer)

Rohwer data

This example concerns regression analysis predicting measures of aptitude and achievement for a sample of Low SES kindergarten students from measures of performance on paired associate memory tests.
The response variables are:

SAT – A scholastic aptitude test
PPVT – Peabody picture vocabulary test
Raven – the Raven progressive matrices test

The predictors are scores on 5 paired associate tasks called N, S, NS, NA, SS. These differ in the way the stimulus term is presented:

N = named;
S = used in a ‘still’ task;
NS = used in a named still task; etc.

See: help("Rohwer", package="heplots") for more details.

Load the data

The dataset Rohwer contains 69 observations from two groups of kindergarden children. A group of 37 children from a a High-SES neighborhood (SES=='Hi') and a group of 32 children from a Low-SES neighborhood (SES=='Lo'). For the purposes of this exercise, use the following code to load the data and select only the Low SES group, giving the data.frame Rohwer2.

data("Rohwer", package="heplots")
Rohwer2 <- subset(Rohwer, subset=SES=='Lo')
rownames(Rohwer2)<- 1:nrow(Rohwer2)

Univariate regression models

A univariate approach entails fitting a separate regression model for each of the PPVT, SAT, and Raven variables. This code fits & tests the model for SAT.

rohwer.mod1 <-lm(SAT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod1)

## Anova Table (Type II tests)
## 
## Response: SAT
##            Sum Sq Df F value  Pr(>F)  
## n            59.7  1  0.1328 0.71806  
## s             1.3  1  0.0028 0.95786  
## ns         1626.8  1  3.6205 0.06640 .
## na          102.2  1  0.2275 0.63673  
## ss         1872.9  1  4.1681 0.04978 *
## Residuals 13929.5 31                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Fit and test an analogous model rohwer.mod2 for the PPVT score. I’m showing here the output you should see.

rohwer.mod2 <-lm(PPVT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod2)

## Anova Table (Type II tests)
## 
## Response: PPVT
##            Sum Sq Df F value   Pr(>F)   
## n            1.04  1  0.0117 0.914521   
## s          263.55  1  2.9550 0.095583 . 
## ns          48.24  1  0.5408 0.467612   
## na         903.61  1 10.1318 0.003307 **
## ss          38.91  1  0.4362 0.513818   
## Residuals 2764.76 31                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Do the same to give rohwer.mod3 for Raven.

rohwer.mod3 <-lm(Raven ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod3)

## Anova Table (Type II tests)
## 
## Response: Raven
##            Sum Sq Df F value Pr(>F)
## n           7.163  1  0.8277 0.3699
## s           2.113  1  0.2442 0.6247
## ns         24.728  1  2.8573 0.1010
## na          0.582  1  0.0673 0.7971
## ss          1.008  1  0.1165 0.7352
## Residuals 268.282 31

Interpretation

Review the Anova summaries above. What do you conclude for these response variables? In particular,

Which PA tests are significant predictors for each response variable?
Which PA tests are significant predictors for all response variables?

For presentation, or for ease of direct comparison, it is easier to combine the model summary statistics into a table. stargazer is one package that does this, and can produce output in text, LaTeX or HTML format. There are many, many options.

stargazer(rohwer.mod1, rohwer.mod2, rohwer.mod3,
    type = "text", 
    digits=2,
    star.cutoffs = c(0.05, 0.01, 0.001),
    omit = "Constant",
    no.space=TRUE,
    title="Univariate regression models for Rohwer data")

## 
## Univariate regression models for Rohwer data
## =============================================================
##                                     Dependent variable:      
##                               -------------------------------
##                                  SAT        PPVT      Raven  
##                                  (1)        (2)        (3)   
## -------------------------------------------------------------
## n                               -0.61      -0.08      0.21   
##                                 (1.67)     (0.74)    (0.23)  
## s                               -0.05      -0.72      0.06   
##                                 (0.94)     (0.42)    (0.13)  
## ns                              -1.73      -0.30      0.21   
##                                 (0.91)     (0.41)    (0.13)  
## na                               0.49      1.47**     -0.04  
##                                 (1.04)     (0.46)    (0.14)  
## ss                              2.25*       0.32      -0.05  
##                                 (1.10)     (0.49)    (0.15)  
## -------------------------------------------------------------
## Observations                      37         37        37    
## R2                               0.21       0.51      0.22   
## Adjusted R2                      0.08       0.43      0.10   
## Residual Std. Error (df = 31)   21.20       9.44      2.94   
## F Statistic (df = 5; 31)         1.63     6.47***     1.77   
## =============================================================
## Note:                           *p<0.05; **p<0.01; ***p<0.001

Re-evaluate your conclusions from these univariate models.

Multivariate model

Fit the multivariate regression model for all three responses. The syntax for a multivariate lm() uses cbind(Y1, Y2, ...) to “bind” several responses together.

rohwer.mlm <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mlm)

## 
## Type II MANOVA Tests: Pillai test statistic
##    Df test stat approx F num Df den Df  Pr(>F)  
## n   1  0.038358   0.3856      3     29 0.76418  
## s   1  0.111793   1.2167      3     29 0.32130  
## ns  1  0.225221   2.8100      3     29 0.05696 .
## na  1  0.267458   3.5294      3     29 0.02705 *
## ss  1  0.138962   1.5601      3     29 0.22030  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overall tests

For a univariate model, the model \(F\)-test gives an overall test of the hypothesis that all regression coefficients are zero, i.e., none of the predictors explain the response beyond chance.

car::linearHypothesis() is more general. It can be used to test any hypothesis regarding subsets of predictors or their linear combinations. For convenience, create a vector of the names of the predictors for an overall test.

predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mod1, hypothesis=predictors)

## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: SAT ~ n + s + ns + na + ss
## 
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1     36 17583                           
## 2     31 13930  5    3653.8 1.6263 0.1824

This is equivalent to the test of the hypothesis that \(R^2_{SAT} = 0\).

You can try doing the same test for PPVT in rohwer.mod2, and for Raven in rohwer.mod3. What do you conclude?

linearHypothesis(rohwer.mod2, hypothesis=predictors)

## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: PPVT ~ n + s + ns + na + ss
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     36 5648.4                                  
## 2     31 2764.8  5    2883.7 6.4667 0.0003179 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

linearHypothesis(rohwer.mod3, hypothesis=predictors)

## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: Raven ~ n + s + ns + na + ss
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     36 344.81                           
## 2     31 268.28  5    76.529 1.7686 0.1486

Overall multivariate test

Try linearHypothesis() using the multivariate model, rohwer.mlm. This tests the multivariate hypothesis that \(\mathbf{B}_{3 \times 4} = [ \mathbf{\beta}_{SAT}, \mathbf{\beta}_{PPVT}, \mathbf{\beta}_{Raven}] = \mathbf{0}\).

predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mlm, hypothesis=predictors)

## 
## Sum of squares and products for the hypothesis:
##              SAT      PPVT     Raven
## SAT   3653.77324 2159.9966  63.36803
## PPVT  2159.99661 2883.6759 281.48418
## Raven   63.36803  281.4842  76.52860
## 
## Sum of squares and products for error:
##              SAT     PPVT    Raven
## SAT   13929.5241 1530.517 457.1995
## PPVT   1530.5169 2764.757 213.6780
## Raven   457.1995  213.678 268.2822
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df   den Df     Pr(>F)    
## Pillai            5 0.8252886 2.352859     15 93.00000 0.00659125 ** 
## Wilks             5 0.3431691 2.538140     15 80.45763 0.00391404 ** 
## Hotelling-Lawley  5 1.4487571 2.672152     15 83.00000 0.00234894 ** 
## Roy               5 1.0551154 6.541716      5 31.00000 0.00029228 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

HE plots: Visualizing the multivariate tests

A basic HE plot shows the relation of all predictors to a pair of response variables (the first two, by default). In MMRA models, individual predictors have 1 df, so their H ellipses plot as degenerate lines.

The relative size of any H ellipse reflects the strength (effect size) of that predictor for the two response variables shown in the plot.
By default, these are scaled to show significance. The general rule is that any predictor is significant (using Roy’s test) if its H ellipse (line) protrudes anywhere outside the E ellipse.
The orientation of an H ellipse shows how a given predictor is associated with both response variables. E.g., the positive slope for ns indicates that it is associated with higher scores on both SAT and PPVT.

heplot(rohwer.mlm, fill = TRUE)

We can add the H ellipse to show the overall test for all predictors, using the hypothesis= argument. There are many options for heplot().

cols <-  c("red", "blue", "black", "darkgreen", "darkcyan", "magenta", 
            "gray50")
hyp <- list("Regr" = c("n", "s", "ns", "na", "ss"))
heplot(rohwer.mlm, 
       hypotheses=hyp, 
       fill = TRUE, fill.alpha=.10,
       cex = 1.4,
       col = cols)

Note how the orientation of the overall Regr ellipse reflects the contributions of ss and ns, aligned mainly with the SAT score, and s and na aligned mainly with the PPVT score.

`pairs()` plot

We can also view all pairwise HE plots using the pairs.mlm method.

pairs(rohwer.mlm, 
      hypotheses=hyp, 
      fill = TRUE,
      col = cols)

Exercise: Visualizing MMRA Models

Michael Friendly

2/20/2021

Introduction

Load packages

Rohwer data

Load the data

Univariate regression models

Interpretation

Multivariate model

Overall tests

Overall multivariate test

HE plots: Visualizing the multivariate tests

`pairs()` plot

Exercise: Visualizing MMRA Models

Michael Friendly

2/20/2021

Introduction

Load packages

Rohwer data

Load the data

Univariate regression models

Interpretation

Multivariate model

Overall tests

Overall multivariate test

HE plots: Visualizing the multivariate tests

pairs() plot

`pairs()` plot