This script reproduces all of the analysis and graphs for the MMRA of the Rohwer data in the paper and also includes other analyses not described there. It is set up as an R script that can be “compiled” to HTML, Word, or PDF using knitr::knit()
. This is most convenient within R Studio via the File -> Compile Notebook
option.
library(car)
library(heplots)
data("Rohwer", package="heplots")
Rohwer2 <- subset(Rohwer, subset=group==2)
rownames(Rohwer2)<- 1:nrow(Rohwer2) # apply rownames
Here, we use linearHypothesis
to test the overall null, B = 0
rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer2)
Anova(rohwer.mod)
##
## Type II MANOVA Tests: Pillai test statistic
## Df test stat approx F num Df den Df Pr(>F)
## n 1 0.202 2.02 3 24 0.1376
## s 1 0.310 3.59 3 24 0.0284 *
## ns 1 0.358 4.46 3 24 0.0126 *
## na 1 0.465 6.96 3 24 0.0016 **
## ss 1 0.089 0.78 3 24 0.5173
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(linearHypothesis(rohwer.mod,
c("n", "s", "ns", "na", "ss")), SSP=FALSE)
##
## Multivariate Tests:
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 5 1.0386 2.753 15 78.00 0.001912 **
## Wilks 5 0.2431 2.974 15 66.65 0.001154 **
## Hotelling-Lawley 5 2.0615 3.115 15 68.00 0.000697 ***
## Roy 5 1.4654 7.620 5 26.00 0.000160 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(rohwer.coef <- coef(rohwer.mod)[-1,])
## SAT PPVT Raven
## n 3.2571 0.06728 0.05935
## s 2.9966 0.36998 0.49244
## ns -5.8591 -0.37438 -0.16402
## na 5.6662 1.52301 0.11898
## ss -0.6227 0.41016 -0.12116
Alternatively, we can view all coefficients and significance values using the tidy()
function in the broom
package
library(broom)
tidy(rohwer.mod)
## response term estimate std.error statistic p.value
## 1 SAT (Intercept) -28.46747 25.7190 -1.1069 2.785e-01
## 2 SAT n 3.25713 1.2958 2.5136 1.848e-02
## 3 SAT s 2.99658 1.5000 1.9977 5.631e-02
## 4 SAT ns -5.85906 1.5450 -3.7924 8.016e-04
## 5 SAT na 5.66622 1.3384 4.2335 2.537e-04
## 6 SAT ss -0.62265 1.1406 -0.5459 5.898e-01
## 7 PPVT (Intercept) 39.69709 12.2687 3.2356 3.298e-03
## 8 PPVT n 0.06728 0.6181 0.1088 9.142e-01
## 9 PPVT s 0.36998 0.7155 0.5171 6.095e-01
## 10 PPVT ns -0.37438 0.7370 -0.5080 6.157e-01
## 11 PPVT na 1.52301 0.6385 2.3854 2.464e-02
## 12 PPVT ss 0.41016 0.5441 0.7538 4.577e-01
## 13 Raven (Intercept) 13.24384 2.6143 5.0660 2.824e-05
## 14 Raven n 0.05935 0.1317 0.4506 6.560e-01
## 15 Raven s 0.49244 0.1525 3.2298 3.346e-03
## 16 Raven ns -0.16402 0.1570 -1.0445 3.059e-01
## 17 Raven na 0.11898 0.1360 0.8746 3.898e-01
## 18 Raven ss -0.12116 0.1159 -1.0450 3.056e-01
For a publication-quality summary table of the coefficients, it is most convenient to fit the separate models for each response.
rohwer.mod1 <- lm(SAT ~ n + s + ns + na + ss, data = Rohwer2)
rohwer.mod2 <- lm(PPVT ~ n + s + ns + na + ss, data = Rohwer2)
rohwer.mod3 <- lm(Raven ~ n + s + ns + na + ss, data = Rohwer2)
The stargazer
package creates nice tables in HTML, LaTeX or just ASCII text. There are many, many options.
library(stargazer, quietly=TRUE)
stargazer(rohwer.mod1, rohwer.mod2, rohwer.mod3,
type = "text", digits=2,
omit.table.layout = "#",
star.cutoffs = c(0.05, 0.01, 0.001),
omit = "Constant",
title="Univariate regression models for Rohwer data")
##
## Univariate regression models for Rohwer data
## =============================================================
## Dependent variable:
## -------------------------------
## SAT PPVT Raven
## -------------------------------------------------------------
## n 3.26* 0.07 0.06
## (1.30) (0.62) (0.13)
##
## s 3.00 0.37 0.49**
## (1.50) (0.72) (0.15)
##
## ns -5.86*** -0.37 -0.16
## (1.54) (0.74) (0.16)
##
## na 5.67*** 1.52* 0.12
## (1.34) (0.64) (0.14)
##
## ss -0.62 0.41 -0.12
## (1.14) (0.54) (0.12)
##
## -------------------------------------------------------------
## Observations 32 32 32
## R2 0.56 0.35 0.31
## Adjusted R2 0.47 0.23 0.18
## Residual Std. Error (df = 26) 25.67 12.25 2.61
## F Statistic (df = 5; 26) 6.54*** 2.85* 2.32
## =============================================================
## Note: *p<0.05; **p<0.01; ***p<0.001
Set colors to use in the following plots. They correspond to the E ellipse and the H ellipses for all hypotheses.
cols <- c("red", "blue", "black", "darkgreen", "darkcyan",
"magenta", "gray20")
HE plot, with ellipse to test all 5 regressors
hyp <- list("Regr" = c("n", "s", "ns", "na", "ss"))
heplot(rohwer.mod,
hypotheses = hyp,
fill=TRUE, fill.alpha=0.1, cex=1.5, cex.lab=1.4, col=cols,
lwd=c(1,3))
pairs(rohwer.mod, hypotheses=hyp, col=cols, var.cex=4,
fill=TRUE, fill.alpha=0.1, cex=1.5)
This plot shows bivariate confidence regions for the each of the coefficients. Those that don’t overlap (0,0) are shaded.
coefplot(rohwer.mod, lwd=2, fill=(1:5) %in% 3:4,
main="Bivariate 95% coefficient plot for SAT and PPVT", level=0.95)
Canonical correlation analysis finds the linear combinations of the Y variables most highly correlated with linear combinations of the _X_s. Only one canonical correlation is significant here, and accounts for 73% of the shared variance between the predictors and responses, but 100% is accounted for in two dimensions.
library(candisc)
X <- as.matrix(Rohwer[1:37, 6:10]) # Extract X variables for High SES students
Y <- as.matrix(Rohwer[1:37, 3:5]) # Extract Y variables for High SES students
cc <- cancor(X, Y, set.names=c("PA", "Ability"))
cc
##
## Canonical correlation analysis of:
## 5 PA variables: n, s, ns, na, ss
## with 3 Ability variables: SAT, PPVT, Raven
##
## CanR CanRSQ Eigen percent cum scree
## 1 0.7165 0.51341 1.05512 72.829 72.83 ******************************
## 2 0.4906 0.24071 0.31702 21.882 94.71 *********
## 3 0.2668 0.07117 0.07662 5.289 100.00 **
##
## Test of H0: The canonical correlations in the
## current row and all that follow are zero
##
## CanR WilksL F df1 df2 p.value
## 1 0.717 0.343 2.54 15 80.5 0.004
## 2 0.491 0.705 1.43 8 60.0 0.203
## 3 0.267 0.929 0.79 3 31.0 0.508
This HE plot shows the representation of the regression relations in the space of the first two canonical variates for the Y variables, Ycan1
and Ycan2
. The X predictors are shown by their H ellipses in this space. The relationship of the Y response variables to their canonical scores is shown by the arrows, reflecting their correlations (structure coefficients) with the canonical variables.
Note we can also visualize the relationship of joint hypothesis to individual ones in canonical space.
heplot(cc, hypotheses=list("na+ns"=c("na", "ns")),
fill = TRUE, fill.alpha=0.1,
label.pos = c(3, rep(1,5), .1),
cex=1.4, var.cex=1.25, var.lwd=3, var.col="black")
## Vector scale factor set to 1