The purpose of this exercise is to introduce you to some of the basics of:
heplots
packageTo work through this exercise in R Studio, open a new R script (say, vismlm-ex2.R
) in the editor and work from there, following this document (in another window). In this document, you are encouraged to try things yourself, but you can click Code / Hide for individual chunks. When code is shown, the icon at the upper left will copy it to the clipboard. The script for this excercise is contained in exercises/rohwer-mmra.R.
It is a good idea to start a script by loading the packages that will be used. We will use the car
, heplots
, and stargazer
packages here. Load them into your R workspace. (Use install.packages()
if any are not found.)
library(car)
library(heplots)
library(stargazer)
This example concerns regression analysis predicting measures of aptitude and achievement for a sample of Low SES kindergarten students from measures of performance on paired associate memory tests.
The response variables are:
SAT
– A scholastic aptitude testPPVT
– Peabody picture vocabulary testRaven
– the Raven progressive matrices testThe predictors are scores on 5 paired associate tasks called N, S, NS, NA, SS
. These differ in the way the stimulus term is presented:
N
= named;S
= used in a ‘still’ task;NS
= used in a named still task; etc.See: help("Rohwer", package="heplots")
for more details.
The dataset Rohwer
contains 69 observations from two groups of kindergarden children. A group of 37 children from a a High-SES neighborhood (SES=='Hi'
) and a group of 32 children from a Low-SES neighborhood (SES=='Lo'
). For the purposes of this exercise, use the following code to load the data and select only the Low SES group, giving the data.frame Rohwer2
.
data("Rohwer", package="heplots")
Rohwer2 <- subset(Rohwer, subset=SES=='Lo')
rownames(Rohwer2)<- 1:nrow(Rohwer2)
A univariate approach entails fitting a separate regression model for each of the PPVT
, SAT
, and Raven
variables. This code fits & tests the model for SAT
.
rohwer.mod1 <-lm(SAT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod1)
## Anova Table (Type II tests)
##
## Response: SAT
## Sum Sq Df F value Pr(>F)
## n 59.7 1 0.1328 0.71806
## s 1.3 1 0.0028 0.95786
## ns 1626.8 1 3.6205 0.06640 .
## na 102.2 1 0.2275 0.63673
## ss 1872.9 1 4.1681 0.04978 *
## Residuals 13929.5 31
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Fit and test an analogous model rohwer.mod2
for the PPVT
score. I’m showing here the output you should see.
rohwer.mod2 <-lm(PPVT ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod2)
## Anova Table (Type II tests)
##
## Response: PPVT
## Sum Sq Df F value Pr(>F)
## n 1.04 1 0.0117 0.914521
## s 263.55 1 2.9550 0.095583 .
## ns 48.24 1 0.5408 0.467612
## na 903.61 1 10.1318 0.003307 **
## ss 38.91 1 0.4362 0.513818
## Residuals 2764.76 31
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Do the same to give rohwer.mod3
for Raven
.
rohwer.mod3 <-lm(Raven ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod3)
## Anova Table (Type II tests)
##
## Response: Raven
## Sum Sq Df F value Pr(>F)
## n 7.163 1 0.8277 0.3699
## s 2.113 1 0.2442 0.6247
## ns 24.728 1 2.8573 0.1010
## na 0.582 1 0.0673 0.7971
## ss 1.008 1 0.1165 0.7352
## Residuals 268.282 31
Review the Anova summaries above. What do you conclude for these response variables? In particular,
For presentation, or for ease of direct comparison, it is easier to combine the model summary statistics into a table. stargazer
is one package that does this, and can produce output in text, LaTeX or HTML format. There are many, many options.
stargazer(rohwer.mod1, rohwer.mod2, rohwer.mod3,
type = "text",
digits=2,
star.cutoffs = c(0.05, 0.01, 0.001),
omit = "Constant",
no.space=TRUE,
title="Univariate regression models for Rohwer data")
##
## Univariate regression models for Rohwer data
## =============================================================
## Dependent variable:
## -------------------------------
## SAT PPVT Raven
## (1) (2) (3)
## -------------------------------------------------------------
## n -0.61 -0.08 0.21
## (1.67) (0.74) (0.23)
## s -0.05 -0.72 0.06
## (0.94) (0.42) (0.13)
## ns -1.73 -0.30 0.21
## (0.91) (0.41) (0.13)
## na 0.49 1.47** -0.04
## (1.04) (0.46) (0.14)
## ss 2.25* 0.32 -0.05
## (1.10) (0.49) (0.15)
## -------------------------------------------------------------
## Observations 37 37 37
## R2 0.21 0.51 0.22
## Adjusted R2 0.08 0.43 0.10
## Residual Std. Error (df = 31) 21.20 9.44 2.94
## F Statistic (df = 5; 31) 1.63 6.47*** 1.77
## =============================================================
## Note: *p<0.05; **p<0.01; ***p<0.001
Re-evaluate your conclusions from these univariate models.
Fit the multivariate regression model for all three responses. The syntax for a multivariate lm()
uses cbind(Y1, Y2, ...)
to “bind” several responses together.
rohwer.mlm <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mlm)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## n 1 0.038358 0.3856 3 29 0.76418
## s 1 0.111793 1.2167 3 29 0.32130
## ns 1 0.225221 2.8100 3 29 0.05696 .
## na 1 0.267458 3.5294 3 29 0.02705 *
## ss 1 0.138962 1.5601 3 29 0.22030
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For a univariate model, the model \(F\)-test gives an overall test of the hypothesis that all regression coefficients are zero, i.e., none of the predictors explain the response beyond chance.
car::linearHypothesis()
is more general. It can be used to test any hypothesis regarding subsets of predictors or their linear combinations. For convenience, create a vector of the names of the predictors
for an overall test.
predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mod1, hypothesis=predictors)
## Linear hypothesis test
##
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
##
## Model 1: restricted model
## Model 2: SAT ~ n + s + ns + na + ss
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 36 17583
## 2 31 13930 5 3653.8 1.6263 0.1824
This is equivalent to the test of the hypothesis that \(R^2_{SAT} = 0\).
You can try doing the same test for PPVT
in rohwer.mod2
, and for Raven
in rohwer.mod3
. What do you conclude?
linearHypothesis(rohwer.mod2, hypothesis=predictors)
## Linear hypothesis test
##
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
##
## Model 1: restricted model
## Model 2: PPVT ~ n + s + ns + na + ss
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 36 5648.4
## 2 31 2764.8 5 2883.7 6.4667 0.0003179 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(rohwer.mod3, hypothesis=predictors)
## Linear hypothesis test
##
## Hypothesis:
## n = 0
## s = 0
## ns = 0
## na = 0
## ss = 0
##
## Model 1: restricted model
## Model 2: Raven ~ n + s + ns + na + ss
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 36 344.81
## 2 31 268.28 5 76.529 1.7686 0.1486
Try linearHypothesis()
using the multivariate model, rohwer.mlm
. This tests the multivariate hypothesis that \(\mathbf{B}_{3 \times 4} = [ \mathbf{\beta}_{SAT}, \mathbf{\beta}_{PPVT}, \mathbf{\beta}_{Raven}] = \mathbf{0}\).
predictors <- c("n", "s", "ns", "na", "ss")
linearHypothesis(rohwer.mlm, hypothesis=predictors)
##
## Sum of squares and products for the hypothesis:
## SAT PPVT Raven
## SAT 3653.77324 2159.9966 63.36803
## PPVT 2159.99661 2883.6759 281.48418
## Raven 63.36803 281.4842 76.52860
##
## Sum of squares and products for error:
## SAT PPVT Raven
## SAT 13929.5241 1530.517 457.1995
## PPVT 1530.5169 2764.757 213.6780
## Raven 457.1995 213.678 268.2822
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 5 0.8252886 2.352859 15 93.00000 0.00659125 **
## Wilks 5 0.3431691 2.538140 15 80.45763 0.00391404 **
## Hotelling-Lawley 5 1.4487571 2.672152 15 83.00000 0.00234894 **
## Roy 5 1.0551154 6.541716 5 31.00000 0.00029228 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A basic HE plot shows the relation of all predictors to a pair of response variables (the first two, by default). In MMRA models, individual predictors have 1 df, so their H ellipses plot as degenerate lines.
The relative size of any H ellipse reflects the strength (effect size) of that predictor for the two response variables shown in the plot.
By default, these are scaled to show significance. The general rule is that any predictor is significant (using Roy’s test) if its H ellipse (line) protrudes anywhere outside the E ellipse.
The orientation of an H ellipse shows how a given predictor is associated with both response variables. E.g., the positive slope for ns
indicates that it is associated with higher scores on both SAT
and PPVT
.
heplot(rohwer.mlm, fill = TRUE)
We can add the H ellipse to show the overall test for all predictors, using the hypothesis=
argument. There are many options for heplot()
.
cols <- c("red", "blue", "black", "darkgreen", "darkcyan", "magenta",
"gray50")
hyp <- list("Regr" = c("n", "s", "ns", "na", "ss"))
heplot(rohwer.mlm,
hypotheses=hyp,
fill = TRUE, fill.alpha=.10,
cex = 1.4,
col = cols)
Note how the orientation of the overall Regr
ellipse reflects the contributions of ss
and ns
, aligned mainly with the SAT
score, and s
and na
aligned mainly with the PPVT
score.
pairs()
plotWe can also view all pairwise HE plots using the pairs.mlm
method.
pairs(rohwer.mlm,
hypotheses=hyp,
fill = TRUE,
col = cols)