The purpose of this exercise is to introduce you to some of the basics of:
heplots packageTo work through this exercise in R Studio, open a new R script (say, vismlm-ex2.R) in the editor and work from there, following this document (in another window). In this document, you are encouraged to try things yourself, but you can click Code / Hide for individual chunks. When code is shown, the icon at the upper left  will copy it to the clipboard. The script for this excercise is contained in exercises/rohwer-mmra.R.
It is a good idea to start a script by loading the packages that will be used. We will use the car, heplots, and stargazer packages here. Load them into your R workspace. (Use install.packages() if any are not found.)
library(car)
library(heplots)
library(stargazer)This example concerns regression analysis predicting measures of aptitude and achievement for a sample of Low SES kindergarten students from measures of performance on paired associate memory tests.
The response variables are:
SAT – A scholastic aptitude testPPVT – Peabody picture vocabulary testRaven – the Raven progressive matrices testThe predictors are scores on 5 paired associate tasks called N, S, NS, NA, SS. These differ in the way the stimulus term is presented:
N = named;S = used in a ‘still’ task;NS = used in a named still task; etc.See: help("Rohwer", package="heplots") for more details.
The dataset Rohwer contains 69 observations from two groups of kindergarden children. A group of 37 children from a a High-SES neighborhood (SES=='Hi') and a group of 32 children from a Low-SES neighborhood (SES=='Lo'). For the purposes of this exercise, use the following code to load the data and select only the Low SES group, giving the data.frame Rohwer2.
data("Rohwer", package="heplots")
Rohwer2 <- subset(Rohwer, subset=SES=='Lo')
rownames(Rohwer2)<- 1:nrow(Rohwer2)A univariate approach entails fitting a separate regression model for each of the PPVT, SAT, and Raven variables. This code fits & tests the model for SAT.
rohwer.mod1 <-lm(SAT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod1)## Anova Table (Type II tests)
## 
## Response: SAT
##            Sum Sq Df F value  Pr(>F)  
## n            59.7  1  0.1328 0.71806  
## s             1.3  1  0.0028 0.95786  
## ns         1626.8  1  3.6205 0.06640 .
## na          102.2  1  0.2275 0.63673  
## ss         1872.9  1  4.1681 0.04978 *
## Residuals 13929.5 31                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1Fit and test an analogous model rohwer.mod2 for the PPVT score. I’m showing here the output you should see.
rohwer.mod2 <-lm(PPVT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod2)## Anova Table (Type II tests)
## 
## Response: PPVT
##            Sum Sq Df F value   Pr(>F)   
## n            1.04  1  0.0117 0.914521   
## s          263.55  1  2.9550 0.095583 . 
## ns          48.24  1  0.5408 0.467612   
## na         903.61  1 10.1318 0.003307 **
## ss          38.91  1  0.4362 0.513818   
## Residuals 2764.76 31                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1Do the same to give rohwer.mod3 for Raven.
rohwer.mod3 <-lm(Raven ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod3)## Anova Table (Type II tests)
## 
## Response: Raven
##            Sum Sq Df F value Pr(>F)
## n           7.163  1  0.8277 0.3699
## s           2.113  1  0.2442 0.6247
## ns         24.728  1  2.8573 0.1010
## na          0.582  1  0.0673 0.7971
## ss          1.008  1  0.1165 0.7352
## Residuals 268.282 31Review the Anova summaries above. What do you conclude for these response variables? In particular,
For presentation, or for ease of direct comparison, it is easier to combine the model summary statistics into a table. stargazer is one package that does this, and can produce output in text, LaTeX or HTML format. There are many, many options.
stargazer(rohwer.mod1, rohwer.mod2, rohwer.mod3,
    type = "text", 
    digits=2,
    star.cutoffs = c(0.05, 0.01, 0.001),
    omit = "Constant",
    no.space=TRUE,
    title="Univariate regression models for Rohwer data")## 
## Univariate regression models for Rohwer data
## =============================================================
##                                     Dependent variable:      
##                               -------------------------------
##                                  SAT        PPVT      Raven  
##                                  (1)        (2)        (3)   
## -------------------------------------------------------------
## n                               -0.61      -0.08      0.21   
##                                 (1.67)     (0.74)    (0.23)  
## s                               -0.05      -0.72      0.06   
##                                 (0.94)     (0.42)    (0.13)  
## ns                              -1.73      -0.30      0.21   
##                                 (0.91)     (0.41)    (0.13)  
## na                               0.49      1.47**     -0.04  
##                                 (1.04)     (0.46)    (0.14)  
## ss                              2.25*       0.32      -0.05  
##                                 (1.10)     (0.49)    (0.15)  
## -------------------------------------------------------------
## Observations                      37         37        37    
## R2                               0.21       0.51      0.22   
## Adjusted R2                      0.08       0.43      0.10   
## Residual Std. Error (df = 31)   21.20       9.44      2.94   
## F Statistic (df = 5; 31)         1.63     6.47***     1.77   
## =============================================================
## Note:                           *p<0.05; **p<0.01; ***p<0.001Re-evaluate your conclusions from these univariate models.
Fit the multivariate regression model for all three responses. The syntax for a multivariate lm() uses cbind(Y1, Y2, ...) to “bind” several responses together.
rohwer.mlm <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mlm)## 
## Type II MANOVA Tests: Pillai test statistic
##    Df test stat approx F num Df den Df  Pr(>F)  
## n   1  0.038358   0.3856      3     29 0.76418  
## s   1  0.111793   1.2167      3     29 0.32130  
## ns  1  0.225221   2.8100      3     29 0.05696 .
## na  1  0.267458   3.5294      3     29 0.02705 *
## ss  1  0.138962   1.5601      3     29 0.22030  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1For a univariate model, the model \(F\)-test gives an overall test of the hypothesis that all regression coefficients are zero, i.e., none of the predictors explain the response beyond chance.
car::linearHypothesis() is more general. It can be used to test any hypothesis regarding subsets of predictors or their linear combinations. For convenience, create a vector of the names of the predictors for an overall test.
predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mod1, hypothesis=predictors)## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: SAT ~ n + s + ns + na + ss
## 
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1     36 17583                           
## 2     31 13930  5    3653.8 1.6263 0.1824This is equivalent to the test of the hypothesis that \(R^2_{SAT} = 0\).
You can try doing the same test for PPVT in rohwer.mod2, and for Raven in rohwer.mod3. What do you conclude?
linearHypothesis(rohwer.mod2, hypothesis=predictors)## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: PPVT ~ n + s + ns + na + ss
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     36 5648.4                                  
## 2     31 2764.8  5    2883.7 6.4667 0.0003179 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1linearHypothesis(rohwer.mod3, hypothesis=predictors)## Linear hypothesis test
## 
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
## 
## Model 1: restricted model
## Model 2: Raven ~ n + s + ns + na + ss
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     36 344.81                           
## 2     31 268.28  5    76.529 1.7686 0.1486Try linearHypothesis() using the multivariate model, rohwer.mlm. This tests the multivariate hypothesis that \(\mathbf{B}_{3 \times 4} = [ \mathbf{\beta}_{SAT}, \mathbf{\beta}_{PPVT}, \mathbf{\beta}_{Raven}] = \mathbf{0}\).
predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mlm, hypothesis=predictors)## 
## Sum of squares and products for the hypothesis:
##              SAT      PPVT     Raven
## SAT   3653.77324 2159.9966  63.36803
## PPVT  2159.99661 2883.6759 281.48418
## Raven   63.36803  281.4842  76.52860
## 
## Sum of squares and products for error:
##              SAT     PPVT    Raven
## SAT   13929.5241 1530.517 457.1995
## PPVT   1530.5169 2764.757 213.6780
## Raven   457.1995  213.678 268.2822
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df   den Df     Pr(>F)    
## Pillai            5 0.8252886 2.352859     15 93.00000 0.00659125 ** 
## Wilks             5 0.3431691 2.538140     15 80.45763 0.00391404 ** 
## Hotelling-Lawley  5 1.4487571 2.672152     15 83.00000 0.00234894 ** 
## Roy               5 1.0551154 6.541716      5 31.00000 0.00029228 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1A basic HE plot shows the relation of all predictors to a pair of response variables (the first two, by default). In MMRA models, individual predictors have 1 df, so their H ellipses plot as degenerate lines.
The relative size of any H ellipse reflects the strength (effect size) of that predictor for the two response variables shown in the plot.
By default, these are scaled to show significance. The general rule is that any predictor is significant (using Roy’s test) if its H ellipse (line) protrudes anywhere outside the E ellipse.
The orientation of an H ellipse shows how a given predictor is associated with both response variables. E.g., the positive slope for ns indicates that it is associated with higher scores on both SAT and PPVT.
heplot(rohwer.mlm, fill = TRUE)We can add the H ellipse to show the overall test for all predictors, using the hypothesis= argument. There are many options for heplot().
cols <-  c("red", "blue", "black", "darkgreen", "darkcyan", "magenta", 
            "gray50")
hyp <- list("Regr" = c("n", "s", "ns", "na", "ss"))
heplot(rohwer.mlm, 
       hypotheses=hyp, 
       fill = TRUE, fill.alpha=.10,
       cex = 1.4,
       col = cols)Note how the orientation of the overall Regr ellipse reflects the contributions of ss and ns, aligned mainly with the SAT score, and s and na aligned mainly with the PPVT score.
pairs() plotWe can also view all pairwise HE plots using the pairs.mlm method.
pairs(rohwer.mlm, 
      hypotheses=hyp, 
      fill = TRUE,
      col = cols)